diff --git a/.gitignore b/.gitignore
index 4bc162c1b0f..8a745655cbf 100644
--- a/.gitignore
+++ b/.gitignore
@@ -159,6 +159,7 @@ website/package-lock.json
/programs/server/store
/programs/server/uuid
/programs/server/coordination
+/programs/server/workload
# temporary test files
tests/queries/0_stateless/test_*
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6c0d21a4698..90285582b4e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,5 @@
### Table of Contents
+**[ClickHouse release v24.10, 2024-10-31](#2410)**
**[ClickHouse release v24.9, 2024-09-26](#249)**
**[ClickHouse release v24.8 LTS, 2024-08-20](#248)**
**[ClickHouse release v24.7, 2024-07-30](#247)**
@@ -12,6 +13,165 @@
# 2024 Changelog
+### ClickHouse release 24.10, 2024-10-31
+
+#### Backward Incompatible Change
+* Allow to write `SETTINGS` before `FORMAT` in a chain of queries with `UNION` when subqueries are inside parentheses. This closes [#39712](https://github.com/ClickHouse/ClickHouse/issues/39712). Change the behavior when a query has the SETTINGS clause specified twice in a sequence. The closest SETTINGS clause will have a preference for the corresponding subquery. In the previous versions, the outermost SETTINGS clause could take a preference over the inner one. [#68614](https://github.com/ClickHouse/ClickHouse/pull/68614) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Reordering of filter conditions from `[PRE]WHERE` clause is now allowed by default. It could be disabled by setting `allow_reorder_prewhere_conditions` to `false`. [#70657](https://github.com/ClickHouse/ClickHouse/pull/70657) ([Nikita Taranov](https://github.com/nickitat)).
+* Remove the `idxd-config` library, which has an incompatible license. This also removes the experimental Intel DeflateQPL codec. [#70987](https://github.com/ClickHouse/ClickHouse/pull/70987) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+
+#### New Feature
+* Allow to grant access to the wildcard prefixes. `GRANT SELECT ON db.table_prefix_* TO user`. [#65311](https://github.com/ClickHouse/ClickHouse/pull/65311) ([pufit](https://github.com/pufit)).
+* If you press space bar during query runtime, the client will display a real-time table with detailed metrics. You can enable it globally with the new `--progress-table` option in clickhouse-client; a new `--enable-progress-table-toggle` is associated with the `--progress-table` option, and toggles the rendering of the progress table by pressing the control key (Space). [#63689](https://github.com/ClickHouse/ClickHouse/pull/63689) ([Maria Khristenko](https://github.com/mariaKhr)), [#70423](https://github.com/ClickHouse/ClickHouse/pull/70423) ([Julia Kartseva](https://github.com/jkartseva)).
+* Allow to cache read files for object storage table engines and data lakes using hash from ETag + file path as cache key. [#70135](https://github.com/ClickHouse/ClickHouse/pull/70135) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Support creating a table with a query: `CREATE TABLE ... CLONE AS ...`. It clones the source table's schema and then attaches all partitions to the newly created table. This feature is only supported with tables of the `MergeTree` family. Closes [#65015](https://github.com/ClickHouse/ClickHouse/issues/65015). [#69091](https://github.com/ClickHouse/ClickHouse/pull/69091) ([tuanpach](https://github.com/tuanpach)).
+* Add a new system table, `system.query_metric_log`, which contains the history of memory and metric values from the `system.events` table for individual queries, periodically flushed to disk. [#66532](https://github.com/ClickHouse/ClickHouse/pull/66532) ([Pablo Marcos](https://github.com/pamarcos)).
+* A simple SELECT query can be written with implicit SELECT to enable calculator-style expressions, e.g., `ch "1 + 2"`. This is controlled by a new setting, `implicit_select`. [#68502](https://github.com/ClickHouse/ClickHouse/pull/68502) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Support the `--copy` mode for clickhouse local as a shortcut for format conversion [#68503](https://github.com/ClickHouse/ClickHouse/issues/68503). [#68583](https://github.com/ClickHouse/ClickHouse/pull/68583) ([Denis Hananein](https://github.com/denis-hananein)).
+* Add a builtin HTML page for visualizing merges which is available at the `/merges` path. [#70821](https://github.com/ClickHouse/ClickHouse/pull/70821) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Add support for `arrayUnion` function. [#68989](https://github.com/ClickHouse/ClickHouse/pull/68989) ([Peter Nguyen](https://github.com/petern48)).
+* Allow parametrised SQL aliases. [#50665](https://github.com/ClickHouse/ClickHouse/pull/50665) ([Anton Kozlov](https://github.com/tonickkozlov)).
+* A new aggregate function `quantileExactWeightedInterpolated`, which is an interpolated version based on `quantileExactWeighted`. Some people may wonder why we need a new `quantileExactWeightedInterpolated` since we already have `quantileExactInterpolatedWeighted`. The reason is the new one is more accurate than the old one. This is for Spark compatibility. [#69619](https://github.com/ClickHouse/ClickHouse/pull/69619) ([李扬](https://github.com/taiyang-li)).
+* A new function `arrayElementOrNull`. It returns `NULL` if the array index is out of range or a Map key is not found. [#69646](https://github.com/ClickHouse/ClickHouse/pull/69646) ([李扬](https://github.com/taiyang-li)).
+* Allows users to specify regular expressions through new `message_regexp` and `message_regexp_negative` fields in the `config.xml` file to filter out logging. The logging is applied to the formatted un-colored text for the most intuitive developer experience. [#69657](https://github.com/ClickHouse/ClickHouse/pull/69657) ([Peter Nguyen](https://github.com/petern48)).
+* Added `RIPEMD160` function, which computes the RIPEMD-160 cryptographic hash of a string. Example: `SELECT HEX(RIPEMD160('The quick brown fox jumps over the lazy dog'))` returns `37F332F68DB77BD9D7EDD4969571AD671CF9DD3B`. [#70087](https://github.com/ClickHouse/ClickHouse/pull/70087) ([Dergousov Maxim](https://github.com/m7kss1)).
+* Support reading `Iceberg` tables on `HDFS`. [#70268](https://github.com/ClickHouse/ClickHouse/pull/70268) ([flynn](https://github.com/ucasfl)).
+* Support for CTE in the form of `WITH ... INSERT`, as previously we only supported `INSERT ... WITH ...`. [#70593](https://github.com/ClickHouse/ClickHouse/pull/70593) ([Shichao Jin](https://github.com/jsc0218)).
+* MongoDB integration: support for all MongoDB types, support for WHERE and ORDER BY statements on MongoDB side, restriction for expressions unsupported by MongoDB. Note that the new integration is disabled by default; to use it, please set `` to `false` in server config. [#63279](https://github.com/ClickHouse/ClickHouse/pull/63279) ([Kirill Nikiforov](https://github.com/allmazz)).
+* A new function `getSettingOrDefault` added to return the default value and avoid exception if a custom setting is not found in the current profile. [#69917](https://github.com/ClickHouse/ClickHouse/pull/69917) ([Shankar](https://github.com/shiyer7474)).
+
+#### Experimental feature
+* Refreshable materialized views are production ready. [#70550](https://github.com/ClickHouse/ClickHouse/pull/70550) ([Michael Kolupaev](https://github.com/al13n321)). Refreshable materialized views are now supported in Replicated databases. [#60669](https://github.com/ClickHouse/ClickHouse/pull/60669) ([Michael Kolupaev](https://github.com/al13n321)).
+* Parallel replicas are moved from experimental to beta. Reworked settings that control the behavior of parallel replicas algorithms. A quick recap: ClickHouse has four different algorithms for parallel reading involving multiple replicas, which is reflected in the setting `parallel_replicas_mode`; its default value is `read_tasks`. Additionally, the toggle-switch setting `enable_parallel_replicas` has been added. [#63151](https://github.com/ClickHouse/ClickHouse/pull/63151) ([Alexey Milovidov](https://github.com/alexey-milovidov)), ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Support for the `Dynamic` type in most functions by executing them on internal types inside `Dynamic`. [#69691](https://github.com/ClickHouse/ClickHouse/pull/69691) ([Pavel Kruglov](https://github.com/Avogar)).
+* Allow to read/write the `JSON` type as a binary string in `RowBinary` format under settings `input_format_binary_read_json_as_string/output_format_binary_write_json_as_string`. [#70288](https://github.com/ClickHouse/ClickHouse/pull/70288) ([Pavel Kruglov](https://github.com/Avogar)).
+* Allow to serialize/deserialize `JSON` column as single String column in the Native format. For output use setting `output_format_native_write_json_as_string`. For input, use serialization version `1` before the column data. [#70312](https://github.com/ClickHouse/ClickHouse/pull/70312) ([Pavel Kruglov](https://github.com/Avogar)).
+* Introduced a special (experimental) mode of a merge selector for MergeTree tables which makes it more aggressive for the partitions that are close to the limit by the number of parts. It is controlled by the `merge_selector_use_blurry_base` MergeTree-level setting. [#70645](https://github.com/ClickHouse/ClickHouse/pull/70645) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Implement generic ser/de between Avro's `Union` and ClickHouse's `Variant` types. Resolves [#69713](https://github.com/ClickHouse/ClickHouse/issues/69713). [#69712](https://github.com/ClickHouse/ClickHouse/pull/69712) ([Jiří Kozlovský](https://github.com/jirislav)).
+
+#### Performance Improvement
+* Refactor `IDisk` and `IObjectStorage` for better performance. Tables from `plain` and `plain_rewritable` object storages will initialize faster. [#68146](https://github.com/ClickHouse/ClickHouse/pull/68146) ([Alexey Milovidov](https://github.com/alexey-milovidov), [Julia Kartseva](https://github.com/jkartseva)). Do not call the LIST object storage API when determining if a file or directory exists on the plain rewritable disk, as it can be cost-inefficient. [#70852](https://github.com/ClickHouse/ClickHouse/pull/70852) ([Julia Kartseva](https://github.com/jkartseva)). Reduce the number of object storage HEAD API requests in the plain_rewritable disk. [#70915](https://github.com/ClickHouse/ClickHouse/pull/70915) ([Julia Kartseva](https://github.com/jkartseva)).
+* Added an ability to parse data directly into sparse columns. [#69828](https://github.com/ClickHouse/ClickHouse/pull/69828) ([Anton Popov](https://github.com/CurtizJ)).
+* Improved performance of parsing formats with a large number of missing values (e.g. `JSONEachRow`). [#69875](https://github.com/ClickHouse/ClickHouse/pull/69875) ([Anton Popov](https://github.com/CurtizJ)).
+* Supports parallel reading of parquet row groups and prefetching of row groups in single-threaded mode. [#69862](https://github.com/ClickHouse/ClickHouse/pull/69862) ([LiuNeng](https://github.com/liuneng1994)).
+* Support minmax index for `pointInPolygon`. [#62085](https://github.com/ClickHouse/ClickHouse/pull/62085) ([JackyWoo](https://github.com/JackyWoo)).
+* Use bloom filters when reading Parquet files. [#62966](https://github.com/ClickHouse/ClickHouse/pull/62966) ([Arthur Passos](https://github.com/arthurpassos)).
+* Lock-free parts rename to avoid INSERT affecting SELECT (due to the parts lock). Under normal circumstances with `fsync_part_directory`, QPS of SELECT with INSERT in parallel increased 2x; under heavy load the effect is even bigger. Note, this only includes `ReplicatedMergeTree` for now. [#64955](https://github.com/ClickHouse/ClickHouse/pull/64955) ([Azat Khuzhin](https://github.com/azat)).
+* Respect `ttl_only_drop_parts` on `materialize ttl`; only read necessary columns to recalculate TTL and drop parts by replacing them with an empty one. [#65488](https://github.com/ClickHouse/ClickHouse/pull/65488) ([Andrey Zvonov](https://github.com/zvonand)).
+* Optimized thread creation in the ThreadPool to minimize lock contention. Thread creation is now performed outside of the critical section to avoid delays in job scheduling and thread management under high load conditions. This leads to a much more responsive ClickHouse under heavy concurrent load. [#68694](https://github.com/ClickHouse/ClickHouse/pull/68694) ([filimonov](https://github.com/filimonov)).
+* Enable reading `LowCardinality` string columns from `ORC`. [#69481](https://github.com/ClickHouse/ClickHouse/pull/69481) ([李扬](https://github.com/taiyang-li)).
+* Use `LowCardinality` for `ProfileEvents` in system logs such as `part_log`, `query_views_log`, `filesystem_cache_log`. [#70152](https://github.com/ClickHouse/ClickHouse/pull/70152) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Improve performance of `fromUnixTimestamp`/`toUnixTimestamp` functions. [#71042](https://github.com/ClickHouse/ClickHouse/pull/71042) ([kevinyhzou](https://github.com/KevinyhZou)).
+* Don't disable nonblocking read from page cache for the entire server when reading from a blocking I/O. This was leading to a poorer performance when a single filesystem (e.g., tmpfs) didn't support the `preadv2` syscall while others do. [#70299](https://github.com/ClickHouse/ClickHouse/pull/70299) ([Antonio Andelic](https://github.com/antonio2368)).
+* `ALTER TABLE .. REPLACE PARTITION` doesn't wait anymore for mutations/merges that happen in other partitions. [#59138](https://github.com/ClickHouse/ClickHouse/pull/59138) ([Vasily Nemkov](https://github.com/Enmk)).
+* Don't do validation when synchronizing ACL from Keeper. It is validated during creation. It shouldn't matter that much, but there are installations with tens of thousands or even more users created, and the unnecessary hash validation can take a long time to finish during server startup (it synchronizes everything from Keeper). [#70644](https://github.com/ClickHouse/ClickHouse/pull/70644) ([Raúl Marín](https://github.com/Algunenano)).
+
+#### Improvement
+* `CREATE TABLE AS` will copy `PRIMARY KEY`, `ORDER BY`, and similar clauses (of `MergeTree` tables). [#69739](https://github.com/ClickHouse/ClickHouse/pull/69739) ([sakulali](https://github.com/sakulali)).
+* Support 64-bit XID in Keeper. It can be enabled with the `use_xid_64` configuration value. [#69908](https://github.com/ClickHouse/ClickHouse/pull/69908) ([Antonio Andelic](https://github.com/antonio2368)).
+* Command-line arguments for Bool settings are set to true when no value is provided for the argument (e.g. `clickhouse-client --optimize_aggregation_in_order --query "SELECT 1"`). [#70459](https://github.com/ClickHouse/ClickHouse/pull/70459) ([davidtsuk](https://github.com/davidtsuk)).
+* Added user-level settings `min_free_disk_bytes_to_throw_insert` and `min_free_disk_ratio_to_throw_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)).
+* Embedded documentation for settings will be strictly more detailed and complete than the documentation on the website. This is the first step before making the website documentation always auto-generated from the source code. This has long-standing implications: - it will be guaranteed to have every setting; - there is no chance of having default values obsolete; - we can generate this documentation for each ClickHouse version; - the documentation can be displayed by the server itself even without Internet access. Generate the docs on the website from the source code. [#70289](https://github.com/ClickHouse/ClickHouse/pull/70289) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Allow empty needle in the function `replace`, the same behavior as in PostgreSQL. [#69918](https://github.com/ClickHouse/ClickHouse/pull/69918) ([zhanglistar](https://github.com/zhanglistar)).
+* Allow empty needle in functions `replaceRegexp*`. [#70053](https://github.com/ClickHouse/ClickHouse/pull/70053) ([zhanglistar](https://github.com/zhanglistar)).
+* Symbolic links for tables in the `data/database_name/` directory are created for the actual paths to the table's data, depending on the storage policy, instead of the `store/...` directory on the default disk. [#61777](https://github.com/ClickHouse/ClickHouse/pull/61777) ([Kirill](https://github.com/kirillgarbar)).
+* While parsing an `Enum` field from `JSON`, a string containing an integer will be interpreted as the corresponding `Enum` element. This closes [#65119](https://github.com/ClickHouse/ClickHouse/issues/65119). [#66801](https://github.com/ClickHouse/ClickHouse/pull/66801) ([scanhex12](https://github.com/scanhex12)).
+* Allow `TRIM` -ing `LEADING` or `TRAILING` empty string as a no-op. Closes [#67792](https://github.com/ClickHouse/ClickHouse/issues/67792). [#68455](https://github.com/ClickHouse/ClickHouse/pull/68455) ([Peter Nguyen](https://github.com/petern48)).
+* Improve compatibility of `cast(timestamp as String)` with Spark. [#69179](https://github.com/ClickHouse/ClickHouse/pull/69179) ([Wenzheng Liu](https://github.com/lwz9103)).
+* Always use the new analyzer to calculate constant expressions when `enable_analyzer` is set to `true`. Support calculation of `executable` table function arguments without using `SELECT` query for constant expressions. [#69292](https://github.com/ClickHouse/ClickHouse/pull/69292) ([Dmitry Novik](https://github.com/novikd)).
+* Add a setting `enable_secure_identifiers` to disallow identifiers with special characters. [#69411](https://github.com/ClickHouse/ClickHouse/pull/69411) ([tuanpach](https://github.com/tuanpach)).
+* Add `show_create_query_identifier_quoting_rule` to define identifier quoting behavior in the `SHOW CREATE TABLE` query result. Possible values: - `user_display`: When the identifier is a keyword. - `when_necessary`: When the identifier is one of `{"distinct", "all", "table"}` and when it could lead to ambiguity: column names, dictionary attribute names. - `always`: Always quote identifiers. [#69448](https://github.com/ClickHouse/ClickHouse/pull/69448) ([tuanpach](https://github.com/tuanpach)).
+* Improve restoring of access entities' dependencies. [#69563](https://github.com/ClickHouse/ClickHouse/pull/69563) ([Vitaly Baranov](https://github.com/vitlibar)).
+* If you run `clickhouse-client` or other CLI application, and it starts up slowly due to an overloaded server, and you start typing your query, such as `SELECT`, the previous versions will display the remainder of the terminal echo contents before printing the greetings message, such as `SELECTClickHouse local version 24.10.1.1.` instead of `ClickHouse local version 24.10.1.1.`. Now it is fixed. This closes [#31696](https://github.com/ClickHouse/ClickHouse/issues/31696). [#69856](https://github.com/ClickHouse/ClickHouse/pull/69856) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Add new column `readonly_duration` to the `system.replicas` table. Needed to be able to distinguish actual readonly replicas from sentinel ones in alerts. [#69871](https://github.com/ClickHouse/ClickHouse/pull/69871) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
+* Change the type of the `join_output_by_rowlist_perkey_rows_threshold` setting to unsigned integer. [#69886](https://github.com/ClickHouse/ClickHouse/pull/69886) ([kevinyhzou](https://github.com/KevinyhZou)).
+* Enhance OpenTelemetry span logging to include query settings. [#70011](https://github.com/ClickHouse/ClickHouse/pull/70011) ([sharathks118](https://github.com/sharathks118)).
+* Add diagnostic info about higher-order array functions if lambda result type is unexpected. [#70093](https://github.com/ClickHouse/ClickHouse/pull/70093) ([ttanay](https://github.com/ttanay)).
+* Keeper improvement: less locking during cluster changes. [#70275](https://github.com/ClickHouse/ClickHouse/pull/70275) ([Antonio Andelic](https://github.com/antonio2368)).
+* Add `WITH IMPLICIT` and `FINAL` keywords to the `SHOW GRANTS` command. Fix a minor bug with implicit grants: [#70094](https://github.com/ClickHouse/ClickHouse/issues/70094). [#70293](https://github.com/ClickHouse/ClickHouse/pull/70293) ([pufit](https://github.com/pufit)).
+* Respect `compatibility` for MergeTree settings. The `compatibility` value is taken from the `default` profile on server startup, and default MergeTree settings are changed accordingly. Further changes of the `compatibility` setting do not affect MergeTree settings. [#70322](https://github.com/ClickHouse/ClickHouse/pull/70322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Avoid spamming the logs with large HTTP response bodies in case of errors during inter-server communication. [#70487](https://github.com/ClickHouse/ClickHouse/pull/70487) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Added a new setting `max_parts_to_move` to control the maximum number of parts that can be moved at once. [#70520](https://github.com/ClickHouse/ClickHouse/pull/70520) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Limit the frequency of certain log messages. [#70601](https://github.com/ClickHouse/ClickHouse/pull/70601) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* `CHECK TABLE` with `PART` qualifier was incorrectly formatted in the client. [#70660](https://github.com/ClickHouse/ClickHouse/pull/70660) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Support writing the column index and the offset index using parquet native writer. [#70669](https://github.com/ClickHouse/ClickHouse/pull/70669) ([LiuNeng](https://github.com/liuneng1994)).
+* Support parsing `DateTime64` for microsecond and timezone in joda syntax ("joda" is a popular Java library for date and time, and the "joda syntax" is that library's style). [#70737](https://github.com/ClickHouse/ClickHouse/pull/70737) ([kevinyhzou](https://github.com/KevinyhZou)).
+* Changed an approach to figure out if a cloud storage supports [batch delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) or not. [#70786](https://github.com/ClickHouse/ClickHouse/pull/70786) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Support for Parquet page v2 in the native reader. [#70807](https://github.com/ClickHouse/ClickHouse/pull/70807) ([Arthur Passos](https://github.com/arthurpassos)).
+* Added a check for whether a table has both `storage_policy` and `disk` set, and a check for whether a new storage policy is compatible with the old one when using the `disk` setting. [#70839](https://github.com/ClickHouse/ClickHouse/pull/70839) ([Kirill](https://github.com/kirillgarbar)).
+* Add `system.s3_queue_settings` and `system.azure_queue_settings`. [#70841](https://github.com/ClickHouse/ClickHouse/pull/70841) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Functions `base58Encode` and `base58Decode` now accept arguments of type `FixedString`. Example: `SELECT base58Encode(toFixedString('plaintext', 9));`. [#70846](https://github.com/ClickHouse/ClickHouse/pull/70846) ([Faizan Patel](https://github.com/faizan2786)).
+* Add the `partition` column to every entry type of the part log. Previously, it was set only for some entries. This closes [#70819](https://github.com/ClickHouse/ClickHouse/issues/70819). [#70848](https://github.com/ClickHouse/ClickHouse/pull/70848) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Add `MergeStart` and `MutateStart` events into `system.part_log` which helps with merges analysis and visualization. [#70850](https://github.com/ClickHouse/ClickHouse/pull/70850) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Add a profile event about the number of merged source parts. It allows the monitoring of the fanout of the merge tree in production. [#70908](https://github.com/ClickHouse/ClickHouse/pull/70908) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Background downloads to the filesystem cache are re-enabled. [#70929](https://github.com/ClickHouse/ClickHouse/pull/70929) ([Nikita Taranov](https://github.com/nickitat)).
+* Add a new merge selector algorithm, named `Trivial`, for professional usage only. It is worse than the `Simple` merge selector. [#70969](https://github.com/ClickHouse/ClickHouse/pull/70969) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Support for atomic `CREATE OR REPLACE VIEW`. [#70536](https://github.com/ClickHouse/ClickHouse/pull/70536) ([tuanpach](https://github.com/tuanpach)).
+* Added `strict_once` mode to aggregate function `windowFunnel` to avoid counting one event several times in case it matches multiple conditions. Closes [#21835](https://github.com/ClickHouse/ClickHouse/issues/21835). [#69738](https://github.com/ClickHouse/ClickHouse/pull/69738) ([Vladimir Cherkasov](https://github.com/vdimir)).
+
+#### Bug Fix (user-visible misbehavior in an official stable release)
+* Apply configuration updates in global context object. It fixes issues like [#62308](https://github.com/ClickHouse/ClickHouse/issues/62308). [#62944](https://github.com/ClickHouse/ClickHouse/pull/62944) ([Amos Bird](https://github.com/amosbird)).
+* Fix `ReadSettings` not using user-set values, because only defaults were used. [#65625](https://github.com/ClickHouse/ClickHouse/pull/65625) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix type mismatch issue in `sumMapFiltered` when using signed arguments. [#58408](https://github.com/ClickHouse/ClickHouse/pull/58408) ([Chen768959](https://github.com/Chen768959)).
+* Fix toHour-like conversion functions' monotonicity when optional time zone argument is passed. [#60264](https://github.com/ClickHouse/ClickHouse/pull/60264) ([Amos Bird](https://github.com/amosbird)).
+* Relax `supportsPrewhere` check for `Merge` tables. This fixes [#61064](https://github.com/ClickHouse/ClickHouse/issues/61064). It was hardened unnecessarily in [#60082](https://github.com/ClickHouse/ClickHouse/issues/60082). [#61091](https://github.com/ClickHouse/ClickHouse/pull/61091) ([Amos Bird](https://github.com/amosbird)).
+* Fix `use_concurrency_control` setting handling for proper `concurrent_threads_soft_limit_num` limit enforcing. This enables concurrency control by default because previously it was broken. [#61473](https://github.com/ClickHouse/ClickHouse/pull/61473) ([Sergei Trifonov](https://github.com/serxa)).
+* Fix incorrect `JOIN ON` section optimization in case of `IS NULL` check under any other function (like `NOT`) that may lead to wrong results. Closes [#67915](https://github.com/ClickHouse/ClickHouse/issues/67915). [#68049](https://github.com/ClickHouse/ClickHouse/pull/68049) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Prevent `ALTER` queries that would make the `CREATE` query of tables invalid. [#68574](https://github.com/ClickHouse/ClickHouse/pull/68574) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Fix inconsistent AST formatting for `negate` (`-`) and `NOT` functions with tuples and arrays. [#68600](https://github.com/ClickHouse/ClickHouse/pull/68600) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Fix insertion of incomplete type into `Dynamic` during deserialization. It could lead to `Parameter out of bound` errors. [#69291](https://github.com/ClickHouse/ClickHouse/pull/69291) ([Pavel Kruglov](https://github.com/Avogar)).
+* Zero-copy replication, which is experimental and should not be used in production: fix infinite loop after `restore replica` in the replicated merge tree with zero copy. [#69293](https://github.com/ClickHouse/ClickHouse/pull/69293) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
+* Restore the default value of `processing_threads_num` as the number of CPU cores in storage `S3Queue`. [#69384](https://github.com/ClickHouse/ClickHouse/pull/69384) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Bypass try/catch flow when de/serializing nested repeated protobuf to nested columns (fixes [#41971](https://github.com/ClickHouse/ClickHouse/issues/41971)). [#69556](https://github.com/ClickHouse/ClickHouse/pull/69556) ([Eliot Hautefeuille](https://github.com/hileef)).
+* Fix crash during insertion into FixedString column in PostgreSQL engine. [#69584](https://github.com/ClickHouse/ClickHouse/pull/69584) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix crash when executing `create view t as (with recursive 42 as ttt select ttt);`. [#69676](https://github.com/ClickHouse/ClickHouse/pull/69676) ([Han Fei](https://github.com/hanfei1991)).
+* Fixed `maxMapState` throwing 'Bad get' if value type is DateTime64. [#69787](https://github.com/ClickHouse/ClickHouse/pull/69787) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix `getSubcolumn` with `LowCardinality` columns by overriding `useDefaultImplementationForLowCardinalityColumns` to return `true`. [#69831](https://github.com/ClickHouse/ClickHouse/pull/69831) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
+* Fix permanently blocked distributed sends if a DROP of distributed table failed. [#69843](https://github.com/ClickHouse/ClickHouse/pull/69843) ([Azat Khuzhin](https://github.com/azat)).
+* Fix non-cancellable queries containing WITH FILL with NaN keys. This closes [#69261](https://github.com/ClickHouse/ClickHouse/issues/69261). [#69845](https://github.com/ClickHouse/ClickHouse/pull/69845) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Fix analyzer default with old compatibility value. [#69895](https://github.com/ClickHouse/ClickHouse/pull/69895) ([Raúl Marín](https://github.com/Algunenano)).
+* Don't check dependencies during CREATE OR REPLACE VIEW during DROP of old table. Previously CREATE OR REPLACE query failed when there are dependent tables of the recreated view. [#69907](https://github.com/ClickHouse/ClickHouse/pull/69907) ([Pavel Kruglov](https://github.com/Avogar)).
+* Something for Decimal. Fixes [#69730](https://github.com/ClickHouse/ClickHouse/issues/69730). [#69978](https://github.com/ClickHouse/ClickHouse/pull/69978) ([Arthur Passos](https://github.com/arthurpassos)).
+* Now DEFINER/INVOKER will work with parameterized views. [#69984](https://github.com/ClickHouse/ClickHouse/pull/69984) ([pufit](https://github.com/pufit)).
+* Fix parsing for view's definers. [#69985](https://github.com/ClickHouse/ClickHouse/pull/69985) ([pufit](https://github.com/pufit)).
+* Fixed a bug where the timezone could change the result of a query with `Date` or `Date32` arguments. [#70036](https://github.com/ClickHouse/ClickHouse/pull/70036) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
+* Fixes `Block structure mismatch` for queries with nested views and `WHERE` condition. Fixes [#66209](https://github.com/ClickHouse/ClickHouse/issues/66209). [#70054](https://github.com/ClickHouse/ClickHouse/pull/70054) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Avoid reusing columns among different named tuples when evaluating `tuple` functions. This fixes [#70022](https://github.com/ClickHouse/ClickHouse/issues/70022). [#70103](https://github.com/ClickHouse/ClickHouse/pull/70103) ([Amos Bird](https://github.com/amosbird)).
+* Fix wrong LOGICAL_ERROR when replacing literals in ranges. [#70122](https://github.com/ClickHouse/ClickHouse/pull/70122) ([Pablo Marcos](https://github.com/pamarcos)).
+* Check for Nullable(Nothing) type during ALTER TABLE MODIFY COLUMN/QUERY to prevent tables with such data type. [#70123](https://github.com/ClickHouse/ClickHouse/pull/70123) ([Pavel Kruglov](https://github.com/Avogar)).
+* Proper error message for illegal query `JOIN ... ON *`. Closes [#68650](https://github.com/ClickHouse/ClickHouse/issues/68650). [#70124](https://github.com/ClickHouse/ClickHouse/pull/70124) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Fix wrong result with skipping index. [#70127](https://github.com/ClickHouse/ClickHouse/pull/70127) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix data race in ColumnObject/ColumnTuple decompress method that could lead to heap use after free. [#70137](https://github.com/ClickHouse/ClickHouse/pull/70137) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix possible hang in ALTER COLUMN with Dynamic type. [#70144](https://github.com/ClickHouse/ClickHouse/pull/70144) ([Pavel Kruglov](https://github.com/Avogar)).
+* Now ClickHouse will consider more errors as retriable and will not mark data parts as broken in case of such errors. [#70145](https://github.com/ClickHouse/ClickHouse/pull/70145) ([alesapin](https://github.com/alesapin)).
+* Use correct `max_types` parameter during Dynamic type creation for JSON subcolumn. [#70147](https://github.com/ClickHouse/ClickHouse/pull/70147) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix the password being displayed in `system.query_log` for users with bcrypt password authentication method. [#70148](https://github.com/ClickHouse/ClickHouse/pull/70148) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Fix event counter for the native interface (InterfaceNativeSendBytes). [#70153](https://github.com/ClickHouse/ClickHouse/pull/70153) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fix possible crash related to JSON columns. [#70172](https://github.com/ClickHouse/ClickHouse/pull/70172) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix multiple issues with arrayMin and arrayMax. [#70207](https://github.com/ClickHouse/ClickHouse/pull/70207) ([Raúl Marín](https://github.com/Algunenano)).
+* Respect setting allow_simdjson in the JSON type parser. [#70218](https://github.com/ClickHouse/ClickHouse/pull/70218) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix a null pointer dereference on creating a materialized view with two selects and an `INTERSECT`, e.g. `CREATE MATERIALIZED VIEW v0 AS (SELECT 1) INTERSECT (SELECT 1);`. [#70264](https://github.com/ClickHouse/ClickHouse/pull/70264) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Don't modify global settings with startup scripts. Previously, changing a setting in a startup script would change it globally. [#70310](https://github.com/ClickHouse/ClickHouse/pull/70310) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix ALTER of `Dynamic` type with reducing max_types parameter that could lead to server crash. [#70328](https://github.com/ClickHouse/ClickHouse/pull/70328) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix crash when using WITH FILL incorrectly. [#70338](https://github.com/ClickHouse/ClickHouse/pull/70338) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix possible use-after-free in `SYSTEM DROP FORMAT SCHEMA CACHE FOR Protobuf`. [#70358](https://github.com/ClickHouse/ClickHouse/pull/70358) ([Azat Khuzhin](https://github.com/azat)).
+* Fix crash during GROUP BY JSON sub-object subcolumn. [#70374](https://github.com/ClickHouse/ClickHouse/pull/70374) ([Pavel Kruglov](https://github.com/Avogar)).
+* Don't prefetch parts for vertical merges if part has no rows. [#70452](https://github.com/ClickHouse/ClickHouse/pull/70452) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix crash in WHERE with lambda functions. [#70464](https://github.com/ClickHouse/ClickHouse/pull/70464) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix table creation with `CREATE ... AS table_function(...)` with database `Replicated` and unavailable table function source on secondary replica. [#70511](https://github.com/ClickHouse/ClickHouse/pull/70511) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Ignore all output on async insert with `wait_for_async_insert=1`. Closes [#62644](https://github.com/ClickHouse/ClickHouse/issues/62644). [#70530](https://github.com/ClickHouse/ClickHouse/pull/70530) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Ignore frozen_metadata.txt while traversing shadow directory from system.remote_data_paths. [#70590](https://github.com/ClickHouse/ClickHouse/pull/70590) ([Aleksei Filatov](https://github.com/aalexfvk)).
+* Fix creation of stateful window functions on misaligned memory. [#70631](https://github.com/ClickHouse/ClickHouse/pull/70631) ([Raúl Marín](https://github.com/Algunenano)).
+* Fixed rare crashes in `SELECT`-s and merges after adding a column of `Array` type with non-empty default expression. [#70695](https://github.com/ClickHouse/ClickHouse/pull/70695) ([Anton Popov](https://github.com/CurtizJ)).
+* Insert into table function s3 will respect query settings. [#70696](https://github.com/ClickHouse/ClickHouse/pull/70696) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Fix infinite recursion when inferring a protobuf schema when skipping unsupported fields is enabled. [#70697](https://github.com/ClickHouse/ClickHouse/pull/70697) ([Raúl Marín](https://github.com/Algunenano)).
+* Disable enable_named_columns_in_function_tuple by default. [#70833](https://github.com/ClickHouse/ClickHouse/pull/70833) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix S3Queue table engine setting processing_threads_num not being effective in case it was deduced from the number of cpu cores on the server. [#70837](https://github.com/ClickHouse/ClickHouse/pull/70837) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Normalize named tuple arguments in aggregation states. This fixes [#69732](https://github.com/ClickHouse/ClickHouse/issues/69732). [#70853](https://github.com/ClickHouse/ClickHouse/pull/70853) ([Amos Bird](https://github.com/amosbird)).
+* Fix a logical error due to negative zeros in the two-level hash table. This closes [#70973](https://github.com/ClickHouse/ClickHouse/issues/70973). [#70979](https://github.com/ClickHouse/ClickHouse/pull/70979) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Fix `limit by`, `limit with ties` for distributed and parallel replicas. [#70880](https://github.com/ClickHouse/ClickHouse/pull/70880) ([Nikita Taranov](https://github.com/nickitat)).
+
+
### ClickHouse release 24.9, 2024-09-26
#### Backward Incompatible Change
diff --git a/README.md b/README.md
index 3b5209dcbe9..dcaeda13acd 100644
--- a/README.md
+++ b/README.md
@@ -42,31 +42,19 @@ Keep an eye out for upcoming meetups and events around the world. Somewhere else
Upcoming meetups
-* [Jakarta Meetup](https://www.meetup.com/clickhouse-indonesia-user-group/events/303191359/) - October 1
-* [Singapore Meetup](https://www.meetup.com/clickhouse-singapore-meetup-group/events/303212064/) - October 3
-* [Madrid Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096564/) - October 22
-* [Oslo Meetup](https://www.meetup.com/open-source-real-time-data-warehouse-real-time-analytics/events/302938622) - October 31
* [Barcelona Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096876/) - November 12
* [Ghent Meetup](https://www.meetup.com/clickhouse-belgium-user-group/events/303049405/) - November 19
* [Dubai Meetup](https://www.meetup.com/clickhouse-dubai-meetup-group/events/303096989/) - November 21
* [Paris Meetup](https://www.meetup.com/clickhouse-france-user-group/events/303096434) - November 26
+* [Amsterdam Meetup](https://www.meetup.com/clickhouse-netherlands-user-group/events/303638814) - December 3
+* [New York Meetup](https://www.meetup.com/clickhouse-new-york-user-group/events/304268174) - December 9
+* [San Francisco Meetup](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/304286951/) - December 12
Recently completed meetups
-* [ClickHouse Guangzhou User Group Meetup](https://mp.weixin.qq.com/s/GSvo-7xUoVzCsuUvlLTpCw) - August 25
-* [Seattle Meetup (Statsig)](https://www.meetup.com/clickhouse-seattle-user-group/events/302518075/) - August 27
-* [Melbourne Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/302732666/) - August 27
-* [Sydney Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/302862966/) - September 5
-* [Zurich Meetup](https://www.meetup.com/clickhouse-switzerland-meetup-group/events/302267429/) - September 5
-* [San Francisco Meetup (Cloudflare)](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/302540575) - September 5
-* [Raleigh Meetup (Deutsche Bank)](https://www.meetup.com/triangletechtalks/events/302723486/) - September 9
-* [New York Meetup (Rokt)](https://www.meetup.com/clickhouse-new-york-user-group/events/302575342) - September 10
-* [Toronto Meetup (Shopify)](https://www.meetup.com/clickhouse-toronto-user-group/events/301490855/) - September 10
-* [Chicago Meetup (Jump Capital)](https://lu.ma/43tvmrfw) - September 12
-* [London Meetup](https://www.meetup.com/clickhouse-london-user-group/events/302977267) - September 17
-* [Austin Meetup](https://www.meetup.com/clickhouse-austin-user-group/events/302558689/) - September 17
-* [Bangalore Meetup](https://www.meetup.com/clickhouse-bangalore-user-group/events/303208274/) - September 18
-* [Tel Aviv Meetup](https://www.meetup.com/clickhouse-meetup-israel/events/303095121) - September 22
+* [Madrid Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096564/) - October 22
+* [Singapore Meetup](https://www.meetup.com/clickhouse-singapore-meetup-group/events/303212064/) - October 3
+* [Jakarta Meetup](https://www.meetup.com/clickhouse-indonesia-user-group/events/303191359/) - October 1
## Recent Recordings
* **Recent Meetup Videos**: [Meetup Playlist](https://www.youtube.com/playlist?list=PL0Z2YDlm0b3iNDUzpY1S3L_iV4nARda_U) Whenever possible recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Current featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"
diff --git a/base/glibc-compatibility/musl/getauxval.c b/base/glibc-compatibility/musl/getauxval.c
index ec2cce1e4aa..cc0cdf25b03 100644
--- a/base/glibc-compatibility/musl/getauxval.c
+++ b/base/glibc-compatibility/musl/getauxval.c
@@ -25,9 +25,10 @@
// We don't have libc struct available here.
// Compute aux vector manually (from /proc/self/auxv).
//
-// Right now there is only 51 AT_* constants,
-// so 64 should be enough until this implementation will be replaced with musl.
-static unsigned long __auxv_procfs[64];
+// Right now there are 51 AT_* constants. Custom kernels have been encountered
+// making use of up to 71. 128 should be enough until this implementation is
+// replaced with musl.
+static unsigned long __auxv_procfs[128];
static unsigned long __auxv_secure = 0;
// Common
static unsigned long * __auxv_environ = NULL;
diff --git a/ci/docker/fasttest/Dockerfile b/ci/docker/fasttest/Dockerfile
index 02595ad0d0a..66e48b163b8 100644
--- a/ci/docker/fasttest/Dockerfile
+++ b/ci/docker/fasttest/Dockerfile
@@ -33,6 +33,8 @@ RUN apt-get update \
# moreutils - provides ts fo FT
# expect, bzip2 - requried by FT
# bsdmainutils - provides hexdump for FT
+# nasm - nasm compiler for one of the submodules, required for the normal build
+# yasm - assembler for libhdfs3, required for the normal build
RUN apt-get update \
&& apt-get install \
@@ -53,6 +55,8 @@ RUN apt-get update \
pv \
jq \
bzip2 \
+ nasm \
+ yasm \
--yes --no-install-recommends \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
diff --git a/ci/jobs/build_clickhouse.py b/ci/jobs/build_clickhouse.py
new file mode 100644
index 00000000000..21ed8091608
--- /dev/null
+++ b/ci/jobs/build_clickhouse.py
@@ -0,0 +1,102 @@
+import argparse
+
+from praktika.result import Result
+from praktika.settings import Settings
+from praktika.utils import MetaClasses, Shell, Utils
+
+
+class JobStages(metaclass=MetaClasses.WithIter):
+ CHECKOUT_SUBMODULES = "checkout"
+ CMAKE = "cmake"
+ BUILD = "build"
+
+
+def parse_args():
+ parser = argparse.ArgumentParser(description="ClickHouse Build Job")
+ parser.add_argument("BUILD_TYPE", help="Type: ")
+ parser.add_argument("--param", help="Optional custom job start stage", default=None)
+ return parser.parse_args()
+
+
+def main():
+
+ args = parse_args()
+
+ stop_watch = Utils.Stopwatch()
+
+ stages = list(JobStages)
+ stage = args.param or JobStages.CHECKOUT_SUBMODULES
+ if stage:
+ assert stage in JobStages, f"--param must be one of [{list(JobStages)}]"
+ print(f"Job will start from stage [{stage}]")
+ while stage in stages:
+ stages.pop(0)
+ stages.insert(0, stage)
+
+ cmake_build_type = "Release"
+ sanitizer = ""
+
+ if "debug" in args.BUILD_TYPE.lower():
+ print("Build type set: debug")
+ cmake_build_type = "Debug"
+
+ if "asan" in args.BUILD_TYPE.lower():
+ print("Sanitizer set: address")
+ sanitizer = "address"
+
+ # if Environment.is_local_run():
+ # build_cache_type = "disabled"
+ # else:
+ build_cache_type = "sccache"
+
+ current_directory = Utils.cwd()
+ build_dir = f"{Settings.TEMP_DIR}/build"
+
+ res = True
+ results = []
+
+ if res and JobStages.CHECKOUT_SUBMODULES in stages:
+ Shell.check(f"rm -rf {build_dir} && mkdir -p {build_dir}")
+ results.append(
+ Result.create_from_command_execution(
+ name="Checkout Submodules",
+ command=f"git submodule sync --recursive && git submodule init && git submodule update --depth 1 --recursive --jobs {min([Utils.cpu_count(), 20])}",
+ )
+ )
+ res = results[-1].is_ok()
+
+ if res and JobStages.CMAKE in stages:
+ results.append(
+ Result.create_from_command_execution(
+ name="Cmake configuration",
+ command=f"cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE={cmake_build_type} \
+ -DSANITIZE={sanitizer} -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DENABLE_TESTS=0 \
+ -DENABLE_UTILS=0 -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_PREFIX=/usr \
+ -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
+ -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DCOMPILER_CACHE={build_cache_type} -DENABLE_TESTS=1 \
+ -DENABLE_BUILD_PROFILING=1 {current_directory}",
+ workdir=build_dir,
+ with_log=True,
+ )
+ )
+ res = results[-1].is_ok()
+
+ if res and JobStages.BUILD in stages:
+ Shell.check("sccache --show-stats")
+ results.append(
+ Result.create_from_command_execution(
+ name="Build ClickHouse",
+ command="ninja clickhouse-bundle clickhouse-odbc-bridge clickhouse-library-bridge",
+ workdir=build_dir,
+ with_log=True,
+ )
+ )
+ Shell.check("sccache --show-stats")
+ Shell.check(f"ls -l {build_dir}/programs/")
+ res = results[-1].is_ok()
+
+ Result.create_from(results=results, stopwatch=stop_watch).finish_job_accordingly()
+
+
+if __name__ == "__main__":
+ main()
diff --git a/ci/jobs/check_style.py b/ci/jobs/check_style.py
index 1b1b0bf689b..f9cdc76302d 100644
--- a/ci/jobs/check_style.py
+++ b/ci/jobs/check_style.py
@@ -68,7 +68,7 @@ def check_duplicate_includes(file_path):
def check_whitespaces(file_paths):
for file in file_paths:
exit_code, out, err = Shell.get_res_stdout_stderr(
- f'./ci_v2/jobs/scripts/check_style/double_whitespaces.pl "{file}"',
+ f'./ci/jobs/scripts/check_style/double_whitespaces.pl "{file}"',
verbose=False,
)
if out or err:
@@ -174,7 +174,7 @@ def check_broken_links(path, exclude_paths):
def check_cpp_code():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/check_cpp.sh"
+ "./ci/jobs/scripts/check_style/check_cpp.sh"
)
if err:
out += err
@@ -183,7 +183,7 @@ def check_cpp_code():
def check_repo_submodules():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/check_submodules.sh"
+ "./ci/jobs/scripts/check_style/check_submodules.sh"
)
if err:
out += err
@@ -192,7 +192,7 @@ def check_repo_submodules():
def check_other():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/checks_to_refactor.sh"
+ "./ci/jobs/scripts/check_style/checks_to_refactor.sh"
)
if err:
out += err
@@ -201,7 +201,7 @@ def check_other():
def check_codespell():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/check_typos.sh"
+ "./ci/jobs/scripts/check_style/check_typos.sh"
)
if err:
out += err
@@ -210,7 +210,7 @@ def check_codespell():
def check_aspell():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/check_aspell.sh"
+ "./ci/jobs/scripts/check_style/check_aspell.sh"
)
if err:
out += err
@@ -219,7 +219,7 @@ def check_aspell():
def check_mypy():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/check-mypy"
+ "./ci/jobs/scripts/check_style/check-mypy"
)
if err:
out += err
@@ -228,7 +228,7 @@ def check_mypy():
def check_pylint():
res, out, err = Shell.get_res_stdout_stderr(
- "./ci_v2/jobs/scripts/check_style/check-pylint"
+ "./ci/jobs/scripts/check_style/check-pylint"
)
if err:
out += err
diff --git a/ci/jobs/fast_test.py b/ci/jobs/fast_test.py
index b82c17aa42c..1dcd65b6ed2 100644
--- a/ci/jobs/fast_test.py
+++ b/ci/jobs/fast_test.py
@@ -1,12 +1,13 @@
+import argparse
import threading
from pathlib import Path
-from ci_v2.jobs.scripts.functional_tests_results import FTResultsProcessor
-from praktika.environment import Environment
from praktika.result import Result
from praktika.settings import Settings
from praktika.utils import MetaClasses, Shell, Utils
+from ci.jobs.scripts.functional_tests_results import FTResultsProcessor
+
class ClickHouseProc:
def __init__(self):
@@ -208,11 +209,18 @@ class JobStages(metaclass=MetaClasses.WithIter):
TEST = "test"
+def parse_args():
+ parser = argparse.ArgumentParser(description="ClickHouse Fast Test Job")
+ parser.add_argument("--param", help="Optional custom job start stage", default=None)
+ return parser.parse_args()
+
+
def main():
+ args = parse_args()
stop_watch = Utils.Stopwatch()
stages = list(JobStages)
- stage = Environment.LOCAL_RUN_PARAM or JobStages.CHECKOUT_SUBMODULES
+ stage = args.param or JobStages.CHECKOUT_SUBMODULES
if stage:
assert stage in JobStages, f"--param must be one of [{list(JobStages)}]"
print(f"Job will start from stage [{stage}]")
diff --git a/ci/jobs/scripts/check_style/check_cpp.sh b/ci/jobs/scripts/check_style/check_cpp.sh
index 7963bf982af..2e47b253bac 100755
--- a/ci/jobs/scripts/check_style/check_cpp.sh
+++ b/ci/jobs/scripts/check_style/check_cpp.sh
@@ -52,26 +52,6 @@ find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' 2>/dev/n
# Broken symlinks
find -L $ROOT_PATH -type l 2>/dev/null | grep -v contrib && echo "^ Broken symlinks found"
-# Duplicated or incorrect setting declarations
-SETTINGS_FILE=$(mktemp)
-ALL_DECLARATION_FILES="
- $ROOT_PATH/src/Core/Settings.cpp
- $ROOT_PATH/src/Storages/MergeTree/MergeTreeSettings.cpp
- $ROOT_PATH/src/Core/FormatFactorySettingsDeclaration.h"
-
-cat $ROOT_PATH/src/Core/Settings.cpp $ROOT_PATH/src/Core/FormatFactorySettingsDeclaration.h | grep "M(" | awk '{print substr($2, 0, length($2) - 1) " Settings" substr($1, 3, length($1) - 3) " SettingsDeclaration" }' | sort | uniq > ${SETTINGS_FILE}
-cat $ROOT_PATH/src/Storages/MergeTree/MergeTreeSettings.cpp | grep "M(" | awk '{print substr($2, 0, length($2) - 1) " MergeTreeSettings" substr($1, 3, length($1) - 3) " SettingsDeclaration" }' | sort | uniq >> ${SETTINGS_FILE}
-
-# Check that if there are duplicated settings (declared in different objects) they all have the same type (it's simpler to validate style with that assert)
-for setting in $(awk '{print $1 " " $2}' ${SETTINGS_FILE} | sed -e 's/MergeTreeSettings//g' -e 's/Settings//g' | sort | uniq | awk '{ print $1 }' | uniq -d);
-do
- echo "# Found multiple definitions of setting ${setting} with different types: "
- grep --line-number " ${setting}," ${ALL_DECLARATION_FILES} | awk '{print " > " $0 }'
-done
-
-# We append all uses of extern found in implementation files to validate them in a single pass and avoid reading the same files over and over
-find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' | xargs grep -e "^\s*extern const Settings" -e "^\s**extern const MergeTreeSettings" -T | awk '{print substr($5, 0, length($5) -1) " " $4 " " substr($1, 0, length($1) - 1)}' >> ${SETTINGS_FILE}
-
# Duplicated or incorrect setting declarations
bash $ROOT_PATH/utils/check-style/check-settings-style
diff --git a/ci/praktika/_environment.py b/ci/praktika/_environment.py
index ca84def1d29..ce9c6f5b486 100644
--- a/ci/praktika/_environment.py
+++ b/ci/praktika/_environment.py
@@ -29,9 +29,9 @@ class _Environment(MetaClasses.Serializable):
INSTANCE_TYPE: str
INSTANCE_ID: str
INSTANCE_LIFE_CYCLE: str
+ LOCAL_RUN: bool = False
PARAMETER: Any = None
REPORT_INFO: List[str] = dataclasses.field(default_factory=list)
- LOCAL_RUN_PARAM: str = ""
name = "environment"
@classmethod
@@ -185,6 +185,9 @@ class _Environment(MetaClasses.Serializable):
REPORT_URL = f"https://{path}/{Path(Settings.HTML_PAGE_FILE).name}?PR={self.PR_NUMBER}&sha={self.SHA}&name_0={urllib.parse.quote(self.WORKFLOW_NAME, safe='')}&name_1={urllib.parse.quote(self.JOB_NAME, safe='')}"
return REPORT_URL
+ def is_local_run(self):
+ return self.LOCAL_RUN
+
def _to_object(data):
if isinstance(data, dict):
diff --git a/ci/praktika/_settings.py b/ci/praktika/_settings.py
index bfd7ba6c1be..3052d8ef877 100644
--- a/ci/praktika/_settings.py
+++ b/ci/praktika/_settings.py
@@ -8,11 +8,7 @@ class _Settings:
######################################
# Pipeline generation settings #
######################################
- if Path("./ci_v2").is_dir():
- # TODO: hack for CH, remove
- CI_PATH = "./ci_v2"
- else:
- CI_PATH = "./ci"
+ CI_PATH = "./ci"
WORKFLOW_PATH_PREFIX: str = "./.github/workflows"
WORKFLOWS_DIRECTORY: str = f"{CI_PATH}/workflows"
SETTINGS_DIRECTORY: str = f"{CI_PATH}/settings"
diff --git a/ci/praktika/digest.py b/ci/praktika/digest.py
index 44317d5249e..93b62b13dc0 100644
--- a/ci/praktika/digest.py
+++ b/ci/praktika/digest.py
@@ -1,6 +1,8 @@
import dataclasses
import hashlib
+import os
from hashlib import md5
+from pathlib import Path
from typing import List
from praktika import Job
@@ -37,7 +39,9 @@ class Digest:
sorted=True,
)
- print(f"calc digest: hash_key [{cache_key}], include [{included_files}] files")
+ print(
+ f"calc digest for job [{job_config.name}]: hash_key [{cache_key}], include [{len(included_files)}] files"
+ )
# Sort files to ensure consistent hash calculation
included_files.sort()
@@ -91,10 +95,18 @@ class Digest:
@staticmethod
def _calc_file_digest(file_path, hash_md5):
- # Calculate MD5 hash
- with open(file_path, "rb") as f:
+ # Resolve file path if it's a symbolic link
+ resolved_path = file_path
+ if Path(file_path).is_symlink():
+ resolved_path = os.path.realpath(file_path)
+ if not Path(resolved_path).is_file():
+ print(
+ f"WARNING: No valid file resolved by link {file_path} -> {resolved_path} - skipping digest calculation"
+ )
+ return hash_md5.hexdigest()[: Settings.CACHE_DIGEST_LEN]
+
+ with open(resolved_path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hash_md5.update(chunk)
- res = hash_md5.hexdigest()[: Settings.CACHE_DIGEST_LEN]
- return res
+ return hash_md5.hexdigest()[: Settings.CACHE_DIGEST_LEN]
diff --git a/ci/praktika/hook_html.py b/ci/praktika/hook_html.py
index c998e817fe7..f4bd4435511 100644
--- a/ci/praktika/hook_html.py
+++ b/ci/praktika/hook_html.py
@@ -1,5 +1,8 @@
+import dataclasses
+import json
import urllib.parse
from pathlib import Path
+from typing import List
from praktika._environment import _Environment
from praktika.gh import GH
@@ -8,12 +11,50 @@ from praktika.result import Result, ResultInfo
from praktika.runtime import RunConfig
from praktika.s3 import S3
from praktika.settings import Settings
-from praktika.utils import Utils
+from praktika.utils import Shell, Utils
+
+
+@dataclasses.dataclass
+class GitCommit:
+ date: str
+ message: str
+ sha: str
+
+ @staticmethod
+ def from_json(json_data: str) -> List["GitCommit"]:
+ commits = []
+ try:
+ data = json.loads(json_data)
+
+ commits = [
+ GitCommit(
+ message=commit["messageHeadline"],
+ sha=commit["oid"],
+ date=commit["committedDate"],
+ )
+ for commit in data.get("commits", [])
+ ]
+ except Exception as e:
+ print(
+ f"ERROR: Failed to deserialize commit's data: [{json_data}], ex: [{e}]"
+ )
+
+ return commits
class HtmlRunnerHooks:
@classmethod
def configure(cls, _workflow):
+
+ def _get_pr_commits(pr_number):
+ res = []
+ if not pr_number:
+ return res
+ output = Shell.get_output(f"gh pr view {pr_number} --json commits")
+ if output:
+ res = GitCommit.from_json(output)
+ return res
+
# generate pending Results for all jobs in the workflow
if _workflow.enable_cache:
skip_jobs = RunConfig.from_fs(_workflow.name).cache_success
@@ -62,10 +103,14 @@ class HtmlRunnerHooks:
or_update_comment_with_substring=f"Workflow [",
)
if not (res1 or res2):
- print(
- "ERROR: Failed to set both GH commit status and PR comment with Workflow Status, cannot proceed"
+ Utils.raise_with_error(
+ "Failed to set both GH commit status and PR comment with Workflow Status, cannot proceed"
)
- raise
+
+ if env.PR_NUMBER:
+ commits = _get_pr_commits(env.PR_NUMBER)
+ # TODO: upload commits data to s3 to visualise it on a report page
+ print(commits)
@classmethod
def pre_run(cls, _workflow, _job):
diff --git a/ci/praktika/json.html b/ci/praktika/json.html
index fe7b65a5ec5..2f8c3e45d0b 100644
--- a/ci/praktika/json.html
+++ b/ci/praktika/json.html
@@ -24,13 +24,15 @@
margin: 0;
display: flex;
flex-direction: column;
- font-family: monospace, sans-serif;
+ font-family: 'IBM Plex Mono Condensed', monospace, sans-serif;
+ --header-background-color: #f4f4f4;
}
body.night-theme {
--background-color: #1F1F1C;
--text-color: #fff;
--tile-background: black;
+ --header-background-color: #1F1F1C;
}
#info-container {
@@ -50,27 +52,41 @@
background-color: var(--tile-background);
padding: 20px;
box-sizing: border-box;
- text-align: left;
font-size: 18px;
+ margin: 0;
+ }
+
+ #status-container a {
+ color: #007bff;
+ text-decoration: underline;
font-weight: bold;
- margin: 0; /* Remove margin */
- }
-
- #status-container button {
- display: block; /* Stack buttons vertically */
- width: 100%; /* Full width of container */
- padding: 10px;
- margin-bottom: 10px; /* Space between buttons */
- background-color: #4CAF50; /* Green background color */
- color: white;
- border: none;
- border-radius: 5px;
- font-size: 16px;
cursor: pointer;
+ display: inline-block;
+ margin-top: 5px;
+ margin-left: 20px;
+ padding: 2px 0;
+ font-size: 0.8em;
}
- #status-container button:hover {
- background-color: #45a049; /* Darker green on hover */
+ #status-container a:hover {
+ color: #0056b3;
+ text-decoration: none;
+ }
+
+ .key-value-pair {
+ display: flex; /* Enable Flexbox for alignment */
+ justify-content: space-between; /* Distribute space between key and value */
+ margin-bottom: 20px; /* Add space between each pair */
+ }
+
+ .json-key {
+ font-weight: bold;
+ }
+
+ .json-value {
+ font-weight: normal;
+ font-family: 'Source Code Pro', monospace, sans-serif;
+ letter-spacing: -0.5px;
}
#result-container {
@@ -203,7 +219,7 @@
}
th {
- background-color: #f4f4f4;
+ background-color: var(--header-background-color);
}
.status-success {
@@ -240,23 +256,6 @@
color: grey;
font-weight: bold;
}
-
- .json-key {
- font-weight: bold;
- margin-top: 10px;
- }
-
- .json-value {
- margin-left: 20px;
- }
-
- .json-value a {
- color: #007bff;
- }
-
- .json-value a:hover {
- text-decoration: underline;
- }
@@ -286,7 +285,6 @@
// Attach the toggle function to the click event of the icon
document.getElementById('theme-toggle').addEventListener('click', toggleTheme);
- // Function to format timestamp to "DD-mmm-YYYY HH:MM:SS.MM"
function formatTimestamp(timestamp, showDate = true) {
const date = new Date(timestamp * 1000);
const day = String(date.getDate()).padStart(2, '0');
@@ -304,6 +302,38 @@
: `${hours}:${minutes}:${seconds}`;
}
+ function formatDuration(durationInSeconds, detailed = false) {
+ // Check if the duration is empty, null, or not a number
+ if (!durationInSeconds || isNaN(durationInSeconds)) {
+ return '';
+ }
+
+ // Ensure duration is a floating-point number
+ const duration = parseFloat(durationInSeconds);
+
+ if (detailed) {
+ // Format in the detailed format with hours, minutes, and seconds
+ const hours = Math.floor(duration / 3600);
+ const minutes = Math.floor((duration % 3600) / 60);
+ const seconds = Math.floor(duration % 60);
+
+ const formattedHours = hours > 0 ? `${hours}h ` : '';
+ const formattedMinutes = minutes > 0 ? `${minutes}m ` : '';
+ const formattedSeconds = `${String(seconds).padStart(2, '0')}s`;
+
+ return `${formattedHours}${formattedMinutes}${formattedSeconds}`.trim();
+ } else {
+ // Format in the default format with seconds and milliseconds
+ const seconds = Math.floor(duration);
+ const milliseconds = Math.floor((duration % 1) * 1000);
+
+ const formattedSeconds = String(seconds);
+ const formattedMilliseconds = String(milliseconds).padStart(3, '0');
+
+ return `${formattedSeconds}.${formattedMilliseconds}`;
+ }
+ }
+
// Function to determine status class based on value
function getStatusClass(status) {
const lowerStatus = status.toLowerCase();
@@ -316,32 +346,13 @@
return 'status-other';
}
- // Function to format duration from seconds to "HH:MM:SS"
- function formatDuration(durationInSeconds) {
- // Check if the duration is empty, null, or not a number
- if (!durationInSeconds || isNaN(durationInSeconds)) {
- return '';
- }
-
- // Ensure duration is a floating-point number
- const duration = parseFloat(durationInSeconds);
-
- // Calculate seconds and milliseconds
- const seconds = Math.floor(duration); // Whole seconds
- const milliseconds = Math.floor((duration % 1) * 1000); // Convert fraction to milliseconds
-
- // Format seconds and milliseconds with leading zeros where needed
- const formattedSeconds = String(seconds);
- const formattedMilliseconds = String(milliseconds).padStart(3, '0');
-
- // Return the formatted duration as seconds.milliseconds
- return `${formattedSeconds}.${formattedMilliseconds}`;
- }
-
function addKeyValueToStatus(key, value) {
const statusContainer = document.getElementById('status-container');
+ let keyValuePair = document.createElement('div');
+ keyValuePair.className = 'key-value-pair';
+
const keyElement = document.createElement('div');
keyElement.className = 'json-key';
keyElement.textContent = key + ':';
@@ -350,8 +361,9 @@
valueElement.className = 'json-value';
valueElement.textContent = value;
- statusContainer.appendChild(keyElement);
- statusContainer.appendChild(valueElement);
+ keyValuePair.appendChild(keyElement);
+ keyValuePair.appendChild(valueElement);
+ statusContainer.appendChild(keyValuePair);
}
function addFileButtonToStatus(key, links) {
@@ -364,64 +376,68 @@
const keyElement = document.createElement('div');
keyElement.className = 'json-key';
- keyElement.textContent = key + ':';
+ keyElement.textContent = (columnSymbols[key] || key) + ':';
statusContainer.appendChild(keyElement);
if (Array.isArray(links) && links.length > 0) {
links.forEach(link => {
- // const a = document.createElement('a');
- // a.href = link;
- // a.textContent = link.split('/').pop();
- // a.target = '_blank';
- // statusContainer.appendChild(a);
- const button = document.createElement('button');
- button.textContent = link.split('/').pop();
- button.addEventListener('click', function () {
- window.location.href = link;
- });
- statusContainer.appendChild(button);
+ const textLink = document.createElement('a');
+ textLink.href = link;
+ textLink.textContent = link.split('/').pop();
+ textLink.target = '_blank';
+ statusContainer.appendChild(textLink);
+ statusContainer.appendChild(document.createElement('br'));
});
}
}
function addStatusToStatus(status, start_time, duration) {
- const statusContainer = document.getElementById('status-container');
+ const statusContainer = document.getElementById('status-container');
+ let keyValuePair = document.createElement('div');
+ keyValuePair.className = 'key-value-pair';
let keyElement = document.createElement('div');
let valueElement = document.createElement('div');
keyElement.className = 'json-key';
valueElement.className = 'json-value';
- keyElement.textContent = 'status:';
+ keyElement.textContent = (columnSymbols['status'] || 'status') + ':';
valueElement.classList.add('status-value');
valueElement.classList.add(getStatusClass(status));
valueElement.textContent = status;
- statusContainer.appendChild(keyElement);
- statusContainer.appendChild(valueElement);
+ keyValuePair.appendChild(keyElement);
+ keyValuePair.appendChild(valueElement);
+ statusContainer.appendChild(keyValuePair);
+ keyValuePair = document.createElement('div');
+ keyValuePair.className = 'key-value-pair';
keyElement = document.createElement('div');
valueElement = document.createElement('div');
keyElement.className = 'json-key';
valueElement.className = 'json-value';
- keyElement.textContent = 'start_time:';
+ keyElement.textContent = (columnSymbols['start_time'] || 'start_time') + ':';
valueElement.textContent = formatTimestamp(start_time);
- statusContainer.appendChild(keyElement);
- statusContainer.appendChild(valueElement);
+ keyValuePair.appendChild(keyElement);
+ keyValuePair.appendChild(valueElement);
+ statusContainer.appendChild(keyValuePair);
+ keyValuePair = document.createElement('div');
+ keyValuePair.className = 'key-value-pair';
keyElement = document.createElement('div');
valueElement = document.createElement('div');
keyElement.className = 'json-key';
valueElement.className = 'json-value';
- keyElement.textContent = 'duration:';
+ keyElement.textContent = (columnSymbols['duration'] || 'duration') + ':';
if (duration === null) {
// Set initial value to 0 and add a unique ID or data attribute to identify the duration element
valueElement.textContent = '00:00:00';
valueElement.setAttribute('id', 'duration-value');
} else {
// Format the duration if it's a valid number
- valueElement.textContent = formatDuration(duration);
+ valueElement.textContent = formatDuration(duration, true);
}
- statusContainer.appendChild(keyElement);
- statusContainer.appendChild(valueElement);
+ keyValuePair.appendChild(keyElement);
+ keyValuePair.appendChild(valueElement);
+ statusContainer.appendChild(keyValuePair);
}
function navigatePath(jsonObj, nameArray) {
@@ -470,11 +486,12 @@
const columns = ['name', 'status', 'start_time', 'duration', 'info'];
const columnSymbols = {
- name: '👤',
+ name: '📂',
status: '✔️',
start_time: '🕒',
duration: '⏳',
- info: '⚠️'
+ info: 'ℹ️',
+ files: '📄'
};
function createResultsTable(results, nest_level) {
@@ -626,6 +643,7 @@
footerRight.appendChild(a);
});
}
+
addStatusToStatus(targetData.status, targetData.start_time, targetData.duration)
// Handle links
@@ -639,7 +657,7 @@
const intervalId = setInterval(() => {
duration++;
- durationElement.textContent = formatDuration(duration);
+ durationElement.textContent = formatDuration(duration, true);
}, 1000);
}
diff --git a/ci/praktika/runner.py b/ci/praktika/runner.py
index 15e759397ec..797a799a74d 100644
--- a/ci/praktika/runner.py
+++ b/ci/praktika/runner.py
@@ -42,6 +42,7 @@ class Runner:
INSTANCE_ID="",
INSTANCE_TYPE="",
INSTANCE_LIFE_CYCLE="",
+ LOCAL_RUN=True,
).dump()
workflow_config = RunConfig(
name=workflow.name,
@@ -76,9 +77,6 @@ class Runner:
os.environ[key] = value
print(f"Set environment variable {key}.")
- # TODO: remove
- os.environ["PYTHONPATH"] = os.getcwd()
-
print("Read GH Environment")
env = _Environment.from_env()
env.JOB_NAME = job.name
@@ -132,9 +130,7 @@ class Runner:
f"Custom param for local tests must be of type str, got [{type(param)}]"
)
env = _Environment.get()
- env.LOCAL_RUN_PARAM = param
env.dump()
- print(f"Custom param for local tests [{param}] dumped into Environment")
if job.run_in_docker and not no_docker:
# TODO: add support for any image, including not from ci config (e.g. ubuntu:latest)
@@ -142,9 +138,13 @@ class Runner:
job.run_in_docker
]
docker = docker or f"{job.run_in_docker}:{docker_tag}"
- cmd = f"docker run --rm --user \"$(id -u):$(id -g)\" -e PYTHONPATH='{Settings.DOCKER_WD}' --volume ./:{Settings.DOCKER_WD} --volume {Settings.TEMP_DIR}:{Settings.TEMP_DIR} --workdir={Settings.DOCKER_WD} {docker} {job.command}"
+ cmd = f"docker run --rm --user \"$(id -u):$(id -g)\" -e PYTHONPATH='{Settings.DOCKER_WD}:{Settings.DOCKER_WD}/ci' --volume ./:{Settings.DOCKER_WD} --volume {Settings.TEMP_DIR}:{Settings.TEMP_DIR} --workdir={Settings.DOCKER_WD} {docker} {job.command}"
else:
cmd = job.command
+
+ if param:
+ print(f"Custom --param [{param}] will be passed to job's script")
+ cmd += f" --param {param}"
print(f"--- Run command [{cmd}]")
with TeePopen(cmd, timeout=job.timeout) as process:
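
Note on the `--param` change above: the value is appended to the shell command string unquoted, so a param containing spaces or shell metacharacters would be split or interpreted by the shell. A minimal sketch of one way to guard against that, assuming a hypothetical helper `append_param` (not part of praktika) and the standard-library `shlex.quote`:

```python
import shlex


def append_param(cmd: str, param: str = "") -> str:
    """Hypothetical helper: append ``--param`` to a shell command string.

    The value is shell-quoted so that spaces or metacharacters in ``param``
    reach the job script as a single argument.
    """
    if not param:
        # No custom param was given: leave the command untouched.
        return cmd
    return f"{cmd} --param {shlex.quote(param)}"


if __name__ == "__main__":
    print(append_param("python3 ./ci/jobs/fast_test.py", "test name; ls"))
```

Whether quoting is wanted here depends on how the job scripts parse `--param`; the sketch only shows the mechanism.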
diff --git a/ci/praktika/utils.py b/ci/praktika/utils.py
index 1983ce274a3..b96c78e4fa7 100644
--- a/ci/praktika/utils.py
+++ b/ci/praktika/utils.py
@@ -348,9 +348,9 @@ class Utils:
return multiprocessing.cpu_count()
@staticmethod
- def raise_with_error(error_message, stdout="", stderr=""):
+ def raise_with_error(error_message, stdout="", stderr="", ex=None):
Utils.print_formatted_error(error_message, stdout, stderr)
- raise
+ raise ex or RuntimeError()
@staticmethod
def timestamp():
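
Note on the `raise_with_error` change above: a bare `raise` only works while an exception is being handled, so the old code depended on being called from inside an `except` block. The new form raises the supplied exception, or a `RuntimeError` when none is given. A minimal sketch of that behavior, using a stand-in function without the printing step; passing the message to `RuntimeError` is an assumption added for illustration and is not in the diff:

```python
def raise_with_error(error_message, ex=None):
    """Stand-in for Utils.raise_with_error without the printing step."""
    # Assumption: attaching the message to RuntimeError is an addition for
    # illustration; the diff raises RuntimeError() with no arguments.
    raise ex or RuntimeError(error_message)


def describe(ex=None):
    """Return the type name and message of whatever raise_with_error raises."""
    try:
        raise_with_error("job failed", ex=ex)
    except Exception as caught:
        return type(caught).__name__, str(caught)


if __name__ == "__main__":
    print(describe())
    print(describe(ValueError("bad input")))
```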
diff --git a/ci/praktika/yaml_generator.py b/ci/praktika/yaml_generator.py
index 9c61b5e2f79..00c469fec0c 100644
--- a/ci/praktika/yaml_generator.py
+++ b/ci/praktika/yaml_generator.py
@@ -83,8 +83,8 @@ jobs:
{JOB_ADDONS}
- name: Prepare env script
run: |
- export PYTHONPATH=.:$PYTHONPATH
cat > {ENV_SETUP_SCRIPT} << 'ENV_SETUP_SCRIPT_EOF'
+ export PYTHONPATH=./ci:.
{SETUP_ENVS}
cat > {WORKFLOW_CONFIG_FILE} << 'EOF'
${{{{ needs.{WORKFLOW_CONFIG_JOB_NAME}.outputs.data }}}}
@@ -100,6 +100,7 @@ jobs:
- name: Run
id: run
run: |
+ . /tmp/praktika_setup_env.sh
set -o pipefail
{PYTHON} -m praktika run --job '''{JOB_NAME}''' --workflow "{WORKFLOW_NAME}" --ci |& tee {RUN_LOG}
{UPLOADS_GITHUB}\
diff --git a/ci/settings/definitions.py b/ci/settings/definitions.py
index 4e6a7f213f0..176e865e6f3 100644
--- a/ci/settings/definitions.py
+++ b/ci/settings/definitions.py
@@ -30,133 +30,133 @@ SECRETS = [
DOCKERS = [
# Docker.Config(
# name="clickhouse/binary-builder",
- # path="./ci_v2/docker/packager/binary-builder",
+ # path="./ci/docker/packager/binary-builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/cctools",
- # path="./ci_v2/docker/packager/cctools",
+ # path="./ci/docker/packager/cctools",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-old-centos",
- # path="./ci_v2/docker/test/compatibility/centos",
+ # path="./ci/docker/test/compatibility/centos",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-old-ubuntu",
- # path="./ci_v2/docker/test/compatibility/ubuntu",
+ # path="./ci/docker/test/compatibility/ubuntu",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-util",
- # path="./ci_v2/docker/test/util",
+ # path="./ci/docker/test/util",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/integration-test",
- # path="./ci_v2/docker/test/integration/base",
+ # path="./ci/docker/test/integration/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/fuzzer",
- # path="./ci_v2/docker/test/fuzzer",
+ # path="./ci/docker/test/fuzzer",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/performance-comparison",
- # path="./ci_v2/docker/test/performance-comparison",
+ # path="./ci/docker/test/performance-comparison",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
Docker.Config(
name="clickhouse/fasttest",
- path="./ci_v2/docker/fasttest",
+ path="./ci/docker/fasttest",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/test-base",
- # path="./ci_v2/docker/test/base",
+ # path="./ci/docker/test/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-util"],
# ),
# Docker.Config(
# name="clickhouse/clickbench",
- # path="./ci_v2/docker/test/clickbench",
+ # path="./ci/docker/test/clickbench",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/keeper-jepsen-test",
- # path="./ci_v2/docker/test/keeper-jepsen",
+ # path="./ci/docker/test/keeper-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/server-jepsen-test",
- # path="./ci_v2/docker/test/server-jepsen",
+ # path="./ci/docker/test/server-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqllogic-test",
- # path="./ci_v2/docker/test/sqllogic",
+ # path="./ci/docker/test/sqllogic",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqltest",
- # path="./ci_v2/docker/test/sqltest",
+ # path="./ci/docker/test/sqltest",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/stateless-test",
- # path="./ci_v2/docker/test/stateless",
+ # path="./ci/docker/test/stateless",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/stateful-test",
- # path="./ci_v2/docker/test/stateful",
+ # path="./ci/docker/test/stateful",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/stateless-test"],
# ),
# Docker.Config(
# name="clickhouse/stress-test",
- # path="./ci_v2/docker/test/stress",
+ # path="./ci/docker/test/stress",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/stateful-test"],
# ),
# Docker.Config(
# name="clickhouse/unit-test",
- # path="./ci_v2/docker/test/unit",
+ # path="./ci/docker/test/unit",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/integration-tests-runner",
- # path="./ci_v2/docker/test/integration/runner",
+ # path="./ci/docker/test/integration/runner",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
Docker.Config(
name="clickhouse/style-test",
- path="./ci_v2/docker/style-test",
+ path="./ci/docker/style-test",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/docs-builder",
- # path="./ci_v2/docker/docs/builder",
+ # path="./ci/docker/docs/builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
@@ -230,3 +230,4 @@ DOCKERS = [
class JobNames:
STYLE_CHECK = "Style Check"
FAST_TEST = "Fast test"
+ BUILD_AMD_DEBUG = "Build amd64 debug"
diff --git a/ci/settings/settings.py b/ci/settings/settings.py
index 153aab93506..8d5e7bc3c87 100644
--- a/ci/settings/settings.py
+++ b/ci/settings/settings.py
@@ -1,4 +1,4 @@
-from ci_v2.settings.definitions import (
+from ci.settings.definitions import (
S3_BUCKET_HTTP_ENDPOINT,
S3_BUCKET_NAME,
RunnerLabels,
diff --git a/ci/workflows/pull_request.py b/ci/workflows/pull_request.py
index 0e96329788b..74129177efb 100644
--- a/ci/workflows/pull_request.py
+++ b/ci/workflows/pull_request.py
@@ -1,26 +1,62 @@
from typing import List
-from ci_v2.settings.definitions import (
+from praktika import Artifact, Job, Workflow
+from praktika.settings import Settings
+
+from ci.settings.definitions import (
BASE_BRANCH,
DOCKERS,
SECRETS,
JobNames,
RunnerLabels,
)
-from praktika import Job, Workflow
+
+
+class ArtifactNames:
+ ch_debug_binary = "clickhouse_debug_binary"
+
style_check_job = Job.Config(
name=JobNames.STYLE_CHECK,
runs_on=[RunnerLabels.CI_SERVICES],
- command="python3 ./ci_v2/jobs/check_style.py",
+ command="python3 ./ci/jobs/check_style.py",
run_in_docker="clickhouse/style-test",
)
fast_test_job = Job.Config(
name=JobNames.FAST_TEST,
runs_on=[RunnerLabels.BUILDER],
- command="python3 ./ci_v2/jobs/fast_test.py",
+ command="python3 ./ci/jobs/fast_test.py",
run_in_docker="clickhouse/fasttest",
+ digest_config=Job.CacheDigestConfig(
+ include_paths=[
+ "./ci/jobs/fast_test.py",
+ "./tests/queries/0_stateless/",
+ "./src",
+ ],
+ ),
+)
+
+job_build_amd_debug = Job.Config(
+ name=JobNames.BUILD_AMD_DEBUG,
+ runs_on=[RunnerLabels.BUILDER],
+ command="python3 ./ci/jobs/build_clickhouse.py amd_debug",
+ run_in_docker="clickhouse/fasttest",
+ digest_config=Job.CacheDigestConfig(
+ include_paths=[
+ "./src",
+ "./contrib/",
+ "./CMakeLists.txt",
+ "./PreLoad.cmake",
+ "./cmake",
+ "./base",
+ "./programs",
+ "./docker/packager/packager",
+ "./rust",
+ "./tests/ci/version_helper.py",
+ ],
+ ),
+ provides=[ArtifactNames.ch_debug_binary],
)
workflow = Workflow.Config(
@@ -30,6 +66,14 @@ workflow = Workflow.Config(
jobs=[
style_check_job,
fast_test_job,
+ job_build_amd_debug,
+ ],
+ artifacts=[
+ Artifact.Config(
+ name=ArtifactNames.ch_debug_binary,
+ type=Artifact.Type.S3,
+ path=f"{Settings.TEMP_DIR}/build/programs/clickhouse",
+ )
],
dockers=DOCKERS,
secrets=SECRETS,
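
Note on `digest_config` above: `include_paths` lists the files and directories whose contents decide whether a cached job result can be reused. How praktika computes the digest is not shown in this diff. The sketch below is one plausible approach under that assumption, with a hypothetical `digest_paths` helper that hashes file names and contents in sorted order:

```python
import hashlib
import os
import tempfile


def digest_paths(include_paths):
    """Hypothetical helper: hash file names and contents under include_paths."""
    files = []
    for path in include_paths:
        if os.path.isdir(path):
            # Collect every file below a directory entry.
            for root, _dirs, names in os.walk(path):
                files.extend(os.path.join(root, name) for name in names)
        elif os.path.isfile(path):
            files.append(path)
    hasher = hashlib.sha256()
    # Sort so the digest does not depend on directory traversal order.
    for file_path in sorted(files):
        hasher.update(file_path.encode())
        with open(file_path, "rb") as f:
            hasher.update(f.read())
    return hasher.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "w") as f:
            f.write("one")
        print(digest_paths([tmp]))
```

With a scheme like this, a job is rerun only when something under its `include_paths` changes, which is why the build job lists many more paths than the fast test job.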
diff --git a/cmake/autogenerated_versions.txt b/cmake/autogenerated_versions.txt
index 91a7e976aaf..99141510248 100644
--- a/cmake/autogenerated_versions.txt
+++ b/cmake/autogenerated_versions.txt
@@ -2,11 +2,11 @@
# NOTE: VERSION_REVISION has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
-SET(VERSION_REVISION 54491)
+SET(VERSION_REVISION 54492)
SET(VERSION_MAJOR 24)
-SET(VERSION_MINOR 10)
+SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
-SET(VERSION_GITHASH b12a367741812f9e5fe754d19ebae600e2a2614c)
-SET(VERSION_DESCRIBE v24.10.1.1-testing)
-SET(VERSION_STRING 24.10.1.1)
+SET(VERSION_GITHASH c82cf25b3e5864bcc153cbe45adb8c6527e1ec6e)
+SET(VERSION_DESCRIBE v24.11.1.1-testing)
+SET(VERSION_STRING 24.11.1.1)
# end of autochange
diff --git a/contrib/numactl b/contrib/numactl
index 8d13d63a05f..ff32c618d63 160000
--- a/contrib/numactl
+++ b/contrib/numactl
@@ -1 +1 @@
-Subproject commit 8d13d63a05f0c3cd88bf777cbb61541202b7da08
+Subproject commit ff32c618d63ca7ac48cce366c5a04bb3563683a0
diff --git a/docker/test/base/setup_export_logs.sh b/docker/test/base/setup_export_logs.sh
index a39f96867be..12f1cc4d357 100755
--- a/docker/test/base/setup_export_logs.sh
+++ b/docker/test/base/setup_export_logs.sh
@@ -25,7 +25,7 @@ EXTRA_COLUMNS_EXPRESSION_TRACE_LOG="${EXTRA_COLUMNS_EXPRESSION}, arrayMap(x -> d
# coverage_log needs more columns for symbolization, but only symbol names (the line numbers are too heavy to calculate)
EXTRA_COLUMNS_COVERAGE_LOG="${EXTRA_COLUMNS} symbols Array(LowCardinality(String)), "
-EXTRA_COLUMNS_EXPRESSION_COVERAGE_LOG="${EXTRA_COLUMNS_EXPRESSION}, arrayMap(x -> demangle(addressToSymbol(x)), coverage)::Array(LowCardinality(String)) AS symbols"
+EXTRA_COLUMNS_EXPRESSION_COVERAGE_LOG="${EXTRA_COLUMNS_EXPRESSION}, arrayDistinct(arrayMap(x -> demangle(addressToSymbol(x)), coverage))::Array(LowCardinality(String)) AS symbols"
function __set_connection_args
diff --git a/docker/test/style/Dockerfile b/docker/test/style/Dockerfile
index fa6b087eb7d..564301f447c 100644
--- a/docker/test/style/Dockerfile
+++ b/docker/test/style/Dockerfile
@@ -28,7 +28,7 @@ COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt
RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen en_US.UTF-8
-ENV LC_ALL en_US.UTF-8
+ENV LC_ALL=en_US.UTF-8
# Architecture of the image when BuildKit/buildx is used
ARG TARGETARCH
diff --git a/docker/test/style/requirements.txt b/docker/test/style/requirements.txt
index cc87f6e548d..aab20b5bee0 100644
--- a/docker/test/style/requirements.txt
+++ b/docker/test/style/requirements.txt
@@ -12,6 +12,7 @@ charset-normalizer==3.3.2
click==8.1.7
codespell==2.2.1
cryptography==43.0.1
+datacompy==0.7.3
Deprecated==1.2.14
dill==0.3.8
flake8==4.0.1
@@ -23,6 +24,7 @@ mccabe==0.6.1
multidict==6.0.5
mypy==1.8.0
mypy-extensions==1.0.0
+pandas==2.2.3
packaging==24.1
pathspec==0.9.0
pip==24.1.1
diff --git a/docs/en/engines/table-engines/integrations/s3.md b/docs/en/engines/table-engines/integrations/s3.md
index fb759b948a5..fd27d4b6ed9 100644
--- a/docs/en/engines/table-engines/integrations/s3.md
+++ b/docs/en/engines/table-engines/integrations/s3.md
@@ -290,6 +290,7 @@ The following settings can be specified in configuration file for given endpoint
- `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`.
- `no_sign_request` - Ignore all the credentials so requests are not signed. Useful for accessing public buckets.
- `header` — Adds specified HTTP header to a request to given endpoint. Optional, can be specified multiple times.
+- `access_header` - Adds specified HTTP header to a request to given endpoint, in cases where there are no other credentials from another source.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set. Optional.
- `server_side_encryption_kms_key_id` - If specified, required headers for accessing S3 objects with [SSE-KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional.
- `server_side_encryption_kms_encryption_context` - If specified alongside `server_side_encryption_kms_key_id`, the given encryption context header for SSE-KMS will be set. Optional.
@@ -320,6 +321,32 @@ The following settings can be specified in configuration file for given endpoint
```
+## Working with archives
+
+Suppose that we have several archive files with the following URIs on S3:
+
+- 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-10.csv.zip'
+- 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-11.csv.zip'
+- 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-12.csv.zip'
+
+Extracting data from these archives is possible using `::`. Globs can be used both in the URL part and in the part after `::` (which names a file inside the archive).
+
+``` sql
+SELECT *
+FROM s3(
+ 'https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m-2018-01-1{0..2}.csv.zip :: *.csv'
+);
+```
+
+:::note
+ClickHouse supports three archive formats:
+- ZIP
+- TAR
+- 7Z
+
+While ZIP and TAR archives can be accessed from any supported storage location, 7Z archives can only be read from the local filesystem where ClickHouse is installed.
+:::
+
+
## Accessing public buckets
ClickHouse tries to fetch credentials from many different types of sources.
@@ -331,6 +358,10 @@ CREATE TABLE big_table (name String, value UInt32)
ENGINE = S3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', NOSIGN, 'CSVWithNames');
```
+## Optimizing performance
+
+For details on optimizing the performance of the `s3` function, see [our detailed guide](/docs/en/integrations/s3/performance).
+
## See also
- [s3 table function](../../../sql-reference/table-functions/s3.md)
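
Note on the archive syntax documented above: the path is split at `::` into an archive URL and a pattern for files inside the archive, and `{0..2}` expands to one URL per number. The sketch below only illustrates that reading of the syntax; the helper names are hypothetical, it is not ClickHouse's parser, and it ignores details such as zero-padded ranges:

```python
import re


def split_archive_path(path):
    """Split 'archive-url :: inner-pattern' into its two parts."""
    archive, sep, inner = path.partition("::")
    return archive.strip(), (inner.strip() if sep else None)


def expand_numeric_range(url):
    """Expand the first '{N..M}' range in url into a list of URLs."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    return [
        url[: match.start()] + str(i) + url[match.end():]
        for i in range(start, end + 1)
    ]


if __name__ == "__main__":
    archive, inner = split_archive_path(
        "https://s3-us-west-1.amazonaws.com/umbrella-static/"
        "top-1m-2018-01-1{0..2}.csv.zip :: *.csv"
    )
    for url in expand_numeric_range(archive):
        print(url, "->", inner)
```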
diff --git a/docs/en/engines/table-engines/mergetree-family/replacingmergetree.md b/docs/en/engines/table-engines/mergetree-family/replacingmergetree.md
index 5a0a2691a9e..3670c763da6 100644
--- a/docs/en/engines/table-engines/mergetree-family/replacingmergetree.md
+++ b/docs/en/engines/table-engines/mergetree-family/replacingmergetree.md
@@ -12,6 +12,10 @@ Data deduplication occurs only during a merge. Merging occurs in the background
Thus, `ReplacingMergeTree` is suitable for clearing out duplicate data in the background in order to save space, but it does not guarantee the absence of duplicates.
+:::note
+A detailed guide on ReplacingMergeTree, including best practices and how to optimize performance, is available [here](/docs/en/guides/replacing-merge-tree).
+:::
+
## Creating a Table {#creating-a-table}
``` sql
@@ -162,3 +166,51 @@ All of the parameters excepting `ver` have the same meaning as in `MergeTree`.
- `ver` - column with the version. Optional parameter. For a description, see the text above.
+
+## Query time de-duplication & FINAL
+
+At merge time, the ReplacingMergeTree identifies duplicate rows, using the values of the `ORDER BY` columns (used to create the table) as a unique identifier, and retains only the highest version. This, however, offers eventual correctness only - it does not guarantee rows will be deduplicated, and you should not rely on it. Queries can, therefore, produce incorrect answers because rows that represent updates and deletes are still visible until they are merged.
+
+To obtain correct answers, users need to complement background merges with query-time deduplication and deletion removal. This can be achieved using the `FINAL` operator. Consider the following example:
+
+```sql
+CREATE TABLE rmt_example
+(
+ `number` UInt16
+)
+ENGINE = ReplacingMergeTree
+ORDER BY number
+
+INSERT INTO rmt_example SELECT floor(randUniform(0, 100)) AS number
+FROM numbers(1000000000)
+
+0 rows in set. Elapsed: 19.958 sec. Processed 1.00 billion rows, 8.00 GB (50.11 million rows/s., 400.84 MB/s.)
+```
+Querying without `FINAL` produces an incorrect count (exact result will vary depending on merges):
+
+```sql
+SELECT count()
+FROM rmt_example
+
+┌─count()─┐
+│ 200 │
+└─────────┘
+
+1 row in set. Elapsed: 0.002 sec.
+```
+
+Adding `FINAL` produces the correct result:
+
+```sql
+SELECT count()
+FROM rmt_example
+FINAL
+
+┌─count()─┐
+│ 100 │
+└─────────┘
+
+1 row in set. Elapsed: 0.002 sec.
+```
+
+For further details on `FINAL`, including how to optimize `FINAL` performance, we recommend reading our [detailed guide on ReplacingMergeTree](/docs/en/guides/replacing-merge-tree).
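
Note on the `FINAL` section above: logically, query-time deduplication keeps one row per sorting key, choosing the highest version (or the last inserted row when no version column is used). The sketch below models only that rule in plain Python; `deduplicate_final` is a hypothetical name and this is not how ClickHouse implements `FINAL`:

```python
def deduplicate_final(rows, key, version=None):
    """Keep one row per key: the highest version, or the last inserted row."""
    latest = {}
    for row in rows:
        k = row[key]
        current = latest.get(k)
        if (
            current is None
            or version is None
            or row[version] >= current[version]
        ):
            # A later row with an equal or higher version replaces the kept one.
            latest[k] = row
    return list(latest.values())


if __name__ == "__main__":
    rows = [
        {"number": 1, "ver": 1},
        {"number": 1, "ver": 3},
        {"number": 2, "ver": 1},
        {"number": 1, "ver": 2},
    ]
    print(deduplicate_final(rows, "number", "ver"))
```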
diff --git a/docs/en/operations/server-configuration-parameters/settings.md b/docs/en/operations/server-configuration-parameters/settings.md
index b6238487725..02fa5a8ca58 100644
--- a/docs/en/operations/server-configuration-parameters/settings.md
+++ b/docs/en/operations/server-configuration-parameters/settings.md
@@ -1975,6 +1975,22 @@ The default is `false`.
true
```
+## async_load_system_database {#async_load_system_database}
+
+Asynchronous loading of system tables. Helpful if there is a large number of log tables and parts in the `system` database. Independent of the `async_load_databases` setting.
+
+If set to `true`, all system databases with `Ordinary`, `Atomic`, and `Replicated` engines are loaded asynchronously after the ClickHouse server starts. See the `system.asynchronous_loader` table and the `tables_loader_background_pool_size` and `tables_loader_foreground_pool_size` server settings. Any query that accesses a system table that is not yet loaded waits until exactly that table has started up. A table that at least one query is waiting for is loaded with higher priority. Also consider setting `max_waiting_queries` to limit the total number of waiting queries.
+
+If `false`, the system database is loaded before the server starts.
+
+The default is `false`.
+
+**Example**
+
+``` xml
+<async_load_system_database>true</async_load_system_database>
+```
+
## tables_loader_foreground_pool_size {#tables_loader_foreground_pool_size}
Sets the number of threads performing load jobs in foreground pool. The foreground pool is used for loading table synchronously before server start listening on a port and for loading tables that are waited for. Foreground pool has higher priority than background pool. It means that no job starts in background pool while there are jobs running in foreground pool.
@@ -2217,6 +2233,39 @@ If the table does not exist, ClickHouse will create it. If the structure of the
```
+## query_metric_log {#query_metric_log}
+
+It is disabled by default.
+
+**Enabling**
+
+To manually turn on metrics history collection [`system.query_metric_log`](../../operations/system-tables/query_metric_log.md), create `/etc/clickhouse-server/config.d/query_metric_log.xml` with the following content:
+
+``` xml
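+<clickhouse>
+    <query_metric_log>
+        <database>system</database>
+        <table>query_metric_log</table>
+        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
+        <collect_interval_milliseconds>1000</collect_interval_milliseconds>
+        <max_size_rows>1048576</max_size_rows>
+        <reserved_size_rows>8192</reserved_size_rows>
+        <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
+        <flush_on_crash>false</flush_on_crash>
+    </query_metric_log>
+</clickhouse>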
+```
+
+**Disabling**
+
+To disable the `query_metric_log` setting, create the file `/etc/clickhouse-server/config.d/disable_query_metric_log.xml` with the following content:
+
+``` xml
+<clickhouse>
+<query_metric_log remove="1" />
+</clickhouse>
+```
+
## query_cache {#server_configuration_parameters_query-cache}
[Query cache](../query-cache.md) configuration.
@@ -3109,7 +3158,7 @@ By default, tunneling (i.e, `HTTP CONNECT`) is used to make `HTTPS` requests ove
### no_proxy
By default, all requests will go through the proxy. In order to disable it for specific hosts, the `no_proxy` variable must be set.
-It can be set inside the `<proxy>` clause for list and remote resolvers and as an environment variable for environment resolver.
+It can be set inside the `<proxy>` clause for list and remote resolvers and as an environment variable for environment resolver.
It supports IP addresses, domains, subdomains and `'*'` wildcard for full bypass. Leading dots are stripped just like curl does.
Example:
@@ -3175,6 +3224,34 @@ Default value: "default"
**See Also**
- [Workload Scheduling](/docs/en/operations/workload-scheduling.md)
+## workload_path {#workload_path}
+
+The directory used as storage for all `CREATE WORKLOAD` and `CREATE RESOURCE` queries. By default, the `/workload/` folder under the server working directory is used.
+
+**Example**
+
+``` xml
+<workload_path>/var/lib/clickhouse/workload/</workload_path>
+```
+
+**See Also**
+- [Workload Hierarchy](/docs/en/operations/workload-scheduling.md#workloads)
+- [workload_zookeeper_path](#workload_zookeeper_path)
+
+## workload_zookeeper_path {#workload_zookeeper_path}
+
+The path to a ZooKeeper node that is used as storage for all `CREATE WORKLOAD` and `CREATE RESOURCE` queries. For consistency, all SQL definitions are stored as the value of this single znode. By default, ZooKeeper is not used and definitions are stored on [disk](#workload_path).
+
+**Example**
+
+``` xml
+<workload_zookeeper_path>/clickhouse/workload/definitions.sql</workload_zookeeper_path>
+```
+
+**See Also**
+- [Workload Hierarchy](/docs/en/operations/workload-scheduling.md#workloads)
+- [workload_path](#workload_path)
+
## max_authentication_methods_per_user {#max_authentication_methods_per_user}
The maximum number of authentication methods a user can be created with or altered to.
diff --git a/docs/en/operations/system-tables/merge_tree_settings.md b/docs/en/operations/system-tables/merge_tree_settings.md
index 48217d63f9d..473315d3941 100644
--- a/docs/en/operations/system-tables/merge_tree_settings.md
+++ b/docs/en/operations/system-tables/merge_tree_settings.md
@@ -18,6 +18,11 @@ Columns:
- `1` — Current user can’t change the setting.
- `type` ([String](../../sql-reference/data-types/string.md)) — Setting type (implementation specific string value).
- `is_obsolete` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) - Shows whether a setting is obsolete.
+- `tier` ([Enum8](../../sql-reference/data-types/enum.md)) — Support level for this feature. ClickHouse features are organized in tiers, varying depending on the current status of their development and the expectations one might have when using them. Values:
+ - `'Production'` — The feature is stable, safe to use and does not have issues interacting with other **production** features.
+ - `'Beta'` — The feature is stable and safe. The outcome of using it together with other features is unknown and correctness is not guaranteed. Testing and reports are welcome.
+ - `'Experimental'` — The feature is under development. Only intended for developers and ClickHouse enthusiasts. The feature might or might not work and could be removed at any time.
+ - `'Obsolete'` — No longer supported. Either it is already removed or it will be removed in future releases.
**Example**
```sql
diff --git a/docs/en/operations/system-tables/query_metric_log.md b/docs/en/operations/system-tables/query_metric_log.md
new file mode 100644
index 00000000000..38d44c0e19a
--- /dev/null
+++ b/docs/en/operations/system-tables/query_metric_log.md
@@ -0,0 +1,49 @@
+---
+slug: /en/operations/system-tables/query_metric_log
+---
+# query_metric_log
+
+Contains history of memory and metric values from table `system.events` for individual queries, periodically flushed to disk.
+
+Once a query starts, data is collected at periodic intervals of `query_metric_log_interval` milliseconds (which is set to 1000
+by default). The data is also collected when the query finishes if the query takes longer than `query_metric_log_interval`.
+
+Columns:
+- `query_id` ([String](../../sql-reference/data-types/string.md)) — ID of the query.
+- `hostname` ([LowCardinality(String)](../../sql-reference/data-types/string.md)) — Hostname of the server executing the query.
+- `event_date` ([Date](../../sql-reference/data-types/date.md)) — Event date.
+- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Event time.
+- `event_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Event time with microseconds resolution.
+
+**Example**
+
+``` sql
+SELECT * FROM system.query_metric_log LIMIT 1 FORMAT Vertical;
+```
+
+``` text
+Row 1:
+──────
+query_id: 97c8ba04-b6d4-4bd7-b13e-6201c5c6e49d
+hostname: clickhouse.eu-central1.internal
+event_date: 2020-09-05
+event_time: 2020-09-05 16:22:33
+event_time_microseconds: 2020-09-05 16:22:33.196807
+memory_usage: 313434219
+peak_memory_usage: 598951986
+ProfileEvent_Query: 0
+ProfileEvent_SelectQuery: 0
+ProfileEvent_InsertQuery: 0
+ProfileEvent_FailedQuery: 0
+ProfileEvent_FailedSelectQuery: 0
+...
+```
+
+**See also**
+
+- [query_metric_log setting](../../operations/server-configuration-parameters/settings.md#query_metric_log) — Enabling and disabling the setting.
+- [query_metric_log_interval](../../operations/settings/settings.md#query_metric_log_interval)
+- [system.asynchronous_metrics](../../operations/system-tables/asynchronous_metrics.md) — Contains periodically calculated metrics.
+- [system.events](../../operations/system-tables/events.md#system_tables-events) — Contains a number of events that occurred.
+- [system.metrics](../../operations/system-tables/metrics.md) — Contains instantly calculated metrics.
+- [Monitoring](../../operations/monitoring.md) — Base concepts of ClickHouse monitoring.
diff --git a/docs/en/operations/system-tables/resources.md b/docs/en/operations/system-tables/resources.md
new file mode 100644
index 00000000000..6329f05f610
--- /dev/null
+++ b/docs/en/operations/system-tables/resources.md
@@ -0,0 +1,37 @@
+---
+slug: /en/operations/system-tables/resources
+---
+# resources
+
+Contains information for [resources](/docs/en/operations/workload-scheduling.md#workload_entity_storage) residing on the local server. The table contains a row for every resource.
+
+Example:
+
+``` sql
+SELECT *
+FROM system.resources
+FORMAT Vertical
+```
+
+``` text
+Row 1:
+──────
+name: io_read
+read_disks: ['s3']
+write_disks: []
+create_query: CREATE RESOURCE io_read (READ DISK s3)
+
+Row 2:
+──────
+name: io_write
+read_disks: []
+write_disks: ['s3']
+create_query: CREATE RESOURCE io_write (WRITE DISK s3)
+```
+
+Columns:
+
+- `name` (`String`) - Resource name.
+- `read_disks` (`Array(String)`) - The array of disk names that use this resource for read operations.
+- `write_disks` (`Array(String)`) - The array of disk names that use this resource for write operations.
+- `create_query` (`String`) - The definition of the resource.
diff --git a/docs/en/operations/system-tables/settings.md b/docs/en/operations/system-tables/settings.md
index a04e095e990..1cfee0ba5f4 100644
--- a/docs/en/operations/system-tables/settings.md
+++ b/docs/en/operations/system-tables/settings.md
@@ -18,6 +18,11 @@ Columns:
- `1` — Current user can’t change the setting.
- `default` ([String](../../sql-reference/data-types/string.md)) — Setting default value.
- `is_obsolete` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) - Shows whether a setting is obsolete.
+- `tier` ([Enum8](../../sql-reference/data-types/enum.md)) — Support level for this feature. ClickHouse features are organized in tiers, varying depending on the current status of their development and the expectations one might have when using them. Values:
+ - `'Production'` — The feature is stable, safe to use and does not have issues interacting with other **production** features.
+ - `'Beta'` — The feature is stable and safe. The outcome of using it together with other features is unknown and correctness is not guaranteed. Testing and reports are welcome.
+ - `'Experimental'` — The feature is under development. Only intended for developers and ClickHouse enthusiasts. The feature might or might not work and could be removed at any time.
+ - `'Obsolete'` — No longer supported. Either it is already removed or it will be removed in future releases.
**Example**
@@ -26,19 +31,99 @@ The following example shows how to get information about settings which name con
``` sql
SELECT *
FROM system.settings
-WHERE name LIKE '%min_i%'
+WHERE name LIKE '%min_insert_block_size_%'
+FORMAT Vertical
```
``` text
-┌─name───────────────────────────────────────────────_─value─────_─changed─_─description───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────_─min──_─max──_─readonly─_─type─────────_─default───_─alias_for─_─is_obsolete─┐
-│ min_insert_block_size_rows │ 1048449 │ 0 │ Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough. │ ____ │ ____ │ 0 │ UInt64 │ 1048449 │ │ 0 │
-│ min_insert_block_size_bytes │ 268402944 │ 0 │ Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough. │ ____ │ ____ │ 0 │ UInt64 │ 268402944 │ │ 0 │
-│ min_insert_block_size_rows_for_materialized_views │ 0 │ 0 │ Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows) │ ____ │ ____ │ 0 │ UInt64 │ 0 │ │ 0 │
-│ min_insert_block_size_bytes_for_materialized_views │ 0 │ 0 │ Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes) │ ____ │ ____ │ 0 │ UInt64 │ 0 │ │ 0 │
-│ read_backoff_min_interval_between_events_ms │ 1000 │ 0 │ Settings to reduce the number of threads in case of slow reads. Do not pay attention to the event, if the previous one has passed less than a certain amount of time. │ ____ │ ____ │ 0 │ Milliseconds │ 1000 │ │ 0 │
-└────────────────────────────────────────────────────┴───────────┴─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-──────────────────────────────────────────────────────┴──────┴──────┴──────────┴──────────────┴───────────┴───────────┴─────────────┘
-```
+Row 1:
+──────
+name: min_insert_block_size_rows
+value: 1048449
+changed: 0
+description: Sets the minimum number of rows in the block that can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones.
+
+Possible values:
+
+- Positive integer.
+- 0 — Squashing disabled.
+min: ᴺᵁᴸᴸ
+max: ᴺᵁᴸᴸ
+readonly: 0
+type: UInt64
+default: 1048449
+alias_for:
+is_obsolete: 0
+tier: Production
+
+Row 2:
+──────
+name: min_insert_block_size_bytes
+value: 268402944
+changed: 0
+description: Sets the minimum number of bytes in the block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones.
+
+Possible values:
+
+- Positive integer.
+- 0 — Squashing disabled.
+min: ᴺᵁᴸᴸ
+max: ᴺᵁᴸᴸ
+readonly: 0
+type: UInt64
+default: 268402944
+alias_for:
+is_obsolete: 0
+tier: Production
+
+Row 3:
+──────
+name: min_insert_block_size_rows_for_materialized_views
+value: 0
+changed: 0
+description: Sets the minimum number of rows in the block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create/view.md). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage.
+
+Possible values:
+
+- Any positive integer.
+- 0 — Squashing disabled.
+
+**See Also**
+
+- [min_insert_block_size_rows](#min-insert-block-size-rows)
+min: ᴺᵁᴸᴸ
+max: ᴺᵁᴸᴸ
+readonly: 0
+type: UInt64
+default: 0
+alias_for:
+is_obsolete: 0
+tier: Production
+
+Row 4:
+──────
+name: min_insert_block_size_bytes_for_materialized_views
+value: 0
+changed: 0
+description: Sets the minimum number of bytes in the block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create/view.md). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage.
+
+Possible values:
+
+- Any positive integer.
+- 0 — Squashing disabled.
+
+**See also**
+
+- [min_insert_block_size_bytes](#min-insert-block-size-bytes)
+min: ᴺᵁᴸᴸ
+max: ᴺᵁᴸᴸ
+readonly: 0
+type: UInt64
+default: 0
+alias_for:
+is_obsolete: 0
+tier: Production
+```
Using of `WHERE changed` can be useful, for example, when you want to check:
diff --git a/docs/en/operations/system-tables/workloads.md b/docs/en/operations/system-tables/workloads.md
new file mode 100644
index 00000000000..d9c62372044
--- /dev/null
+++ b/docs/en/operations/system-tables/workloads.md
@@ -0,0 +1,40 @@
+---
+slug: /en/operations/system-tables/workloads
+---
+# workloads
+
+Contains information for [workloads](/docs/en/operations/workload-scheduling.md#workload_entity_storage) residing on the local server. The table contains a row for every workload.
+
+Example:
+
+``` sql
+SELECT *
+FROM system.workloads
+FORMAT Vertical
+```
+
+``` text
+Row 1:
+──────
+name: production
+parent: all
+create_query: CREATE WORKLOAD production IN `all` SETTINGS weight = 9
+
+Row 2:
+──────
+name: development
+parent: all
+create_query: CREATE WORKLOAD development IN `all`
+
+Row 3:
+──────
+name: all
+parent:
+create_query: CREATE WORKLOAD `all`
+```
+
+Columns:
+
+- `name` (`String`) - Workload name.
+- `parent` (`String`) - Parent workload name.
+- `create_query` (`String`) - The definition of the workload.
diff --git a/docs/en/operations/workload-scheduling.md b/docs/en/operations/workload-scheduling.md
index 08629492ec6..a43bea7a5b1 100644
--- a/docs/en/operations/workload-scheduling.md
+++ b/docs/en/operations/workload-scheduling.md
@@ -43,6 +43,20 @@ Example:
```
+An alternative way to express which disks are used by a resource is SQL syntax:
+
+```sql
+CREATE RESOURCE resource_name (WRITE DISK disk1, READ DISK disk2)
+```
+
+A resource can be used for any number of disks, for READ, for WRITE, or for both READ and WRITE. There is also a syntax that allows a resource to be used for all disks:
+
+```sql
+CREATE RESOURCE all_io (READ ANY DISK, WRITE ANY DISK);
+```
+
+Note that server configuration options take priority over resources defined with SQL.
+
## Workload markup {#workload_markup}
Queries can be marked with setting `workload` to distinguish different workloads. If `workload` is not set, than value "default" is used. Note that you are able to specify the other value using settings profiles. Setting constraints can be used to make `workload` constant if you want all queries from the user to be marked with fixed value of `workload` setting.
@@ -153,9 +167,48 @@ Example:
```
+## Workload hierarchy (SQL only) {#workloads}
+
+Defining resources and classifiers in XML can be challenging. ClickHouse provides SQL syntax that is much more convenient. All resources created with `CREATE RESOURCE` share the same hierarchy structure, but can differ in some aspects. Every workload created with `CREATE WORKLOAD` maintains a few automatically created scheduling nodes for every resource. A child workload can be created inside a parent workload. Here is an example that defines exactly the same hierarchy as the XML configuration above:
+
+```sql
+CREATE RESOURCE network_write (WRITE DISK s3)
+CREATE RESOURCE network_read (READ DISK s3)
+CREATE WORKLOAD all SETTINGS max_requests = 100
+CREATE WORKLOAD development IN all
+CREATE WORKLOAD production IN all SETTINGS weight = 3
+```
+
+The name of a leaf workload (one without children) can be used in query settings: `SETTINGS workload = 'name'`. Note that workload classifiers are also created automatically when using SQL syntax.
+
+To customize a workload, the following settings can be used:
+* `priority` - sibling workloads are served according to static priority values (lower value means higher priority).
+* `weight` - sibling workloads having the same static priority share resources according to weights.
+* `max_requests` - the limit on the number of concurrent resource requests in this workload.
+* `max_cost` - the limit on the total number of in-flight bytes of concurrent resource requests in this workload.
+* `max_speed` - the limit on byte processing rate of this workload (the limit is independent for every resource).
+* `max_burst` - maximum number of bytes that could be processed by the workload without being throttled (for every resource independently).
+
+Note that workload settings are translated into a proper set of scheduling nodes. For more details, see the description of the scheduling node [types and options](#hierarchy).
+
+There is no way to specify different hierarchies of workloads for different resources. But there is a way to specify a different value of a workload setting for a specific resource:
+
+```sql
+CREATE OR REPLACE WORKLOAD all SETTINGS max_requests = 100, max_speed = 1000000 FOR network_read, max_speed = 2000000 FOR network_write
+```
+
+Also note that a workload or resource cannot be dropped if it is referenced from another workload. To update the definition of a workload, use the `CREATE OR REPLACE WORKLOAD` query.
+
+## Workloads and resources storage {#workload_entity_storage}
+Definitions of all workloads and resources, in the form of `CREATE WORKLOAD` and `CREATE RESOURCE` queries, are stored persistently either on disk at `workload_path` or in ZooKeeper at `workload_zookeeper_path`. ZooKeeper storage is recommended to achieve consistency between nodes. Alternatively, the `ON CLUSTER` clause can be used along with disk storage.
+
## See also
- [system.scheduler](/docs/en/operations/system-tables/scheduler.md)
+ - [system.workloads](/docs/en/operations/system-tables/workloads.md)
+ - [system.resources](/docs/en/operations/system-tables/resources.md)
- [merge_workload](/docs/en/operations/settings/merge-tree-settings.md#merge_workload) merge tree setting
- [merge_workload](/docs/en/operations/server-configuration-parameters/settings.md#merge_workload) global server setting
- [mutation_workload](/docs/en/operations/settings/merge-tree-settings.md#mutation_workload) merge tree setting
- [mutation_workload](/docs/en/operations/server-configuration-parameters/settings.md#mutation_workload) global server setting
+ - [workload_path](/docs/en/operations/server-configuration-parameters/settings.md#workload_path) global server setting
+ - [workload_zookeeper_path](/docs/en/operations/server-configuration-parameters/settings.md#workload_zookeeper_path) global server setting
diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantileexactweighted.md b/docs/en/sql-reference/aggregate-functions/reference/quantileexactweighted.md
index 4ce212888c4..6004e8392f1 100644
--- a/docs/en/sql-reference/aggregate-functions/reference/quantileexactweighted.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/quantileexactweighted.md
@@ -23,7 +23,7 @@ Alias: `medianExactWeighted`.
- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
-- `weight` — Column with weights of sequence members. Weight is a number of value occurrences.
+- `weight` — Column with weights of sequence members. Weight is the number of occurrences of a value, given as an [unsigned integer type](../../../sql-reference/data-types/int-uint.md).
**Returned value**
diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantileexactweightedinterpolated.md b/docs/en/sql-reference/aggregate-functions/reference/quantileexactweightedinterpolated.md
new file mode 100644
index 00000000000..6b38e130cb2
--- /dev/null
+++ b/docs/en/sql-reference/aggregate-functions/reference/quantileexactweightedinterpolated.md
@@ -0,0 +1,77 @@
+---
+slug: /en/sql-reference/aggregate-functions/reference/quantileExactWeightedInterpolated
+sidebar_position: 176
+---
+
+# quantileExactWeightedInterpolated
+
+Computes [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence using linear interpolation, taking into account the weight of each element.
+
+To get the interpolated value, all the passed values are combined into an array, which is then sorted by the corresponding weights. Quantile interpolation is then performed using the [weighted percentile method](https://en.wikipedia.org/wiki/Percentile#The_weighted_percentile_method): a cumulative distribution is built from the weights, and linear interpolation over the weights and values is used to compute the quantiles.
+
+When using multiple `quantile*` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) function.
+
+We strongly recommend using `quantileExactWeightedInterpolated` instead of `quantileInterpolatedWeighted` because it is more accurate. Here is an example:
+
+``` sql
+SELECT
+ quantileExactWeightedInterpolated(0.99)(number, 1),
+ quantile(0.99)(number),
+ quantileInterpolatedWeighted(0.99)(number, 1)
+FROM numbers(9)
+
+
+┌─quantileExactWeightedInterpolated(0.99)(number, 1)─┬─quantile(0.99)(number)─┬─quantileInterpolatedWeighted(0.99)(number, 1)─┐
+│ 7.92 │ 7.92 │ 8 │
+└────────────────────────────────────────────────────┴────────────────────────┴───────────────────────────────────────────────┘
+```
+
+**Syntax**
+
+``` sql
+quantileExactWeightedInterpolated(level)(expr, weight)
+```
+
+Alias: `medianExactWeightedInterpolated`.
+
+**Arguments**
+
+- `level` — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a `level` value in the range of `[0.01, 0.99]`. Default value: 0.5. At `level=0.5` the function calculates [median](https://en.wikipedia.org/wiki/Median).
+- `expr` — Expression over the column values resulting in numeric [data types](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) or [DateTime](../../../sql-reference/data-types/datetime.md).
+- `weight` — Column with weights of sequence members. Weight is the number of occurrences of a value, given as an [unsigned integer type](../../../sql-reference/data-types/int-uint.md).
+
+**Returned value**
+
+- Quantile of the specified level.
+
+Type:
+
+- [Float64](../../../sql-reference/data-types/float.md) for numeric data type input.
+- [Date](../../../sql-reference/data-types/date.md) if input values have the `Date` type.
+- [DateTime](../../../sql-reference/data-types/datetime.md) if input values have the `DateTime` type.
+
+**Example**
+
+Input table:
+
+``` text
+┌─n─┬─val─┐
+│ 0 │ 3 │
+│ 1 │ 2 │
+│ 2 │ 1 │
+│ 5 │ 4 │
+└───┴─────┘
+```
+
+Result:
+
+``` text
+┌─quantileExactWeightedInterpolated(n, val)─┐
+│ 1.5 │
+└───────────────────────────────────────────┘
+```
+
+**See Also**
+
+- [median](../../../sql-reference/aggregate-functions/reference/median.md#median)
+- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles)
diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantiles.md b/docs/en/sql-reference/aggregate-functions/reference/quantiles.md
index e2c3295221d..aed017d295f 100644
--- a/docs/en/sql-reference/aggregate-functions/reference/quantiles.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/quantiles.md
@@ -9,7 +9,7 @@ sidebar_position: 177
Syntax: `quantiles(level1, level2, ...)(x)`
-All the quantile functions also have corresponding quantiles functions: `quantiles`, `quantilesDeterministic`, `quantilesTiming`, `quantilesTimingWeighted`, `quantilesExact`, `quantilesExactWeighted`, `quantileInterpolatedWeighted`, `quantilesTDigest`, `quantilesBFloat16`, `quantilesDD`. These functions calculate all the quantiles of the listed levels in one pass, and return an array of the resulting values.
+All the quantile functions also have corresponding quantiles functions: `quantiles`, `quantilesDeterministic`, `quantilesTiming`, `quantilesTimingWeighted`, `quantilesExact`, `quantilesExactWeighted`, `quantilesExactWeightedInterpolated`, `quantilesInterpolatedWeighted`, `quantilesTDigest`, `quantilesBFloat16`, `quantilesDD`. These functions calculate all the quantiles of the listed levels in one pass, and return an array of the resulting values.
## quantilesExactExclusive
diff --git a/docs/en/sql-reference/data-types/newjson.md b/docs/en/sql-reference/data-types/newjson.md
index 68952590eb9..7e6d4dd934f 100644
--- a/docs/en/sql-reference/data-types/newjson.md
+++ b/docs/en/sql-reference/data-types/newjson.md
@@ -5,7 +5,7 @@ sidebar_label: JSON
keywords: [json, data type]
---
-# JSON
+# JSON Data Type
Stores JavaScript Object Notation (JSON) documents in a single column.
diff --git a/docs/en/sql-reference/functions/type-conversion-functions.md b/docs/en/sql-reference/functions/type-conversion-functions.md
index 5c39f880a0e..91bae2fe9da 100644
--- a/docs/en/sql-reference/functions/type-conversion-functions.md
+++ b/docs/en/sql-reference/functions/type-conversion-functions.md
@@ -6867,6 +6867,18 @@ Same as for [parseDateTimeInJodaSyntax](#parsedatetimeinjodasyntax) except that
Same as for [parseDateTimeInJodaSyntax](#parsedatetimeinjodasyntax) except that it returns `NULL` when it encounters a date format that cannot be processed.
+## parseDateTime64InJodaSyntax
+
+Similar to [parseDateTimeInJodaSyntax](#parsedatetimeinjodasyntax), except that it returns a value of type [DateTime64](../data-types/datetime64.md).
+
+## parseDateTime64InJodaSyntaxOrZero
+
+Same as for [parseDateTime64InJodaSyntax](#parsedatetime64injodasyntax) except that it returns a zero date when it encounters a date format that cannot be processed.
+
+## parseDateTime64InJodaSyntaxOrNull
+
+Same as for [parseDateTime64InJodaSyntax](#parsedatetime64injodasyntax) except that it returns `NULL` when it encounters a date format that cannot be processed.
+
## parseDateTimeBestEffort
## parseDateTime32BestEffort
diff --git a/docs/en/sql-reference/statements/alter/user.md b/docs/en/sql-reference/statements/alter/user.md
index a56532e2ab0..1514b16a657 100644
--- a/docs/en/sql-reference/statements/alter/user.md
+++ b/docs/en/sql-reference/statements/alter/user.md
@@ -12,7 +12,7 @@ Syntax:
``` sql
ALTER USER [IF EXISTS] name1 [RENAME TO new_name |, name2 [,...]]
[ON CLUSTER cluster_name]
- [NOT IDENTIFIED | RESET AUTHENTICATION METHODS TO NEW | {IDENTIFIED | ADD IDENTIFIED} {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']}
+ [NOT IDENTIFIED | RESET AUTHENTICATION METHODS TO NEW | {IDENTIFIED | ADD IDENTIFIED} {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']} [VALID UNTIL datetime]
[, {[{plaintext_password | sha256_password | sha256_hash | ...}] BY {'password' | 'hash'}} | {ldap SERVER 'server_name'} | {...} | ... [,...]]]
[[ADD | DROP] HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE]
[VALID UNTIL datetime]
@@ -91,3 +91,15 @@ Reset authentication methods and keep the most recent added one:
``` sql
ALTER USER user1 RESET AUTHENTICATION METHODS TO NEW
```
+
+## VALID UNTIL Clause
+
+Allows you to specify the expiration date and, optionally, the time for an authentication method. It accepts a string as a parameter. It is recommended to use the `YYYY-MM-DD [hh:mm:ss] [timezone]` format for datetime. By default, this parameter equals `'infinity'`.
+The `VALID UNTIL` clause can only be specified along with an authentication method, except for the case where no authentication method has been specified in the query. In this scenario, the `VALID UNTIL` clause will be applied to all existing authentication methods.
+
+Examples:
+
+- `ALTER USER name1 VALID UNTIL '2025-01-01'`
+- `ALTER USER name1 VALID UNTIL '2025-01-01 12:00:00 UTC'`
+- `ALTER USER name1 VALID UNTIL 'infinity'`
+- `ALTER USER name1 IDENTIFIED WITH plaintext_password BY 'no_expiration', bcrypt_password BY 'expiration_set' VALID UNTIL '2025-01-01'`
diff --git a/docs/en/sql-reference/statements/create/user.md b/docs/en/sql-reference/statements/create/user.md
index a018e28306c..03d93fc3365 100644
--- a/docs/en/sql-reference/statements/create/user.md
+++ b/docs/en/sql-reference/statements/create/user.md
@@ -11,7 +11,7 @@ Syntax:
``` sql
CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [, name2 [,...]] [ON CLUSTER cluster_name]
- [NOT IDENTIFIED | IDENTIFIED {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']}
+ [NOT IDENTIFIED | IDENTIFIED {[WITH {plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | WITH NO_PASSWORD | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name' | SAN 'TYPE:subject_alt_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']} [VALID UNTIL datetime]
[, {[{plaintext_password | sha256_password | sha256_hash | ...}] BY {'password' | 'hash'}} | {ldap SERVER 'server_name'} | {...} | ... [,...]]]
[HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE]
[VALID UNTIL datetime]
@@ -178,13 +178,16 @@ ClickHouse treats `user_name@'address'` as a username as a whole. Thus, technica
## VALID UNTIL Clause
-Allows you to specify the expiration date and, optionally, the time for user credentials. It accepts a string as a parameter. It is recommended to use the `YYYY-MM-DD [hh:mm:ss] [timezone]` format for datetime. By default, this parameter equals `'infinity'`.
+Allows you to specify the expiration date and, optionally, the time for an authentication method. It accepts a string as a parameter. It is recommended to use the `YYYY-MM-DD [hh:mm:ss] [timezone]` format for datetime. By default, this parameter equals `'infinity'`.
+The `VALID UNTIL` clause can only be specified along with an authentication method, except for the case where no authentication method has been specified in the query. In this scenario, the `VALID UNTIL` clause will be applied to all existing authentication methods.
Examples:
- `CREATE USER name1 VALID UNTIL '2025-01-01'`
- `CREATE USER name1 VALID UNTIL '2025-01-01 12:00:00 UTC'`
- `CREATE USER name1 VALID UNTIL 'infinity'`
+- ```CREATE USER name1 VALID UNTIL '2025-01-01 12:00:00 `Asia/Tokyo`'```
+- `CREATE USER name1 IDENTIFIED WITH plaintext_password BY 'no_expiration', bcrypt_password BY 'expiration_set' VALID UNTIL '2025-01-01'`
## GRANTEES Clause
diff --git a/docs/en/sql-reference/statements/grant.md b/docs/en/sql-reference/statements/grant.md
index c11299baf38..19305675ec8 100644
--- a/docs/en/sql-reference/statements/grant.md
+++ b/docs/en/sql-reference/statements/grant.md
@@ -78,6 +78,10 @@ Specifying privileges you can use asterisk (`*`) instead of a table or a databas
Also, you can omit database name. In this case privileges are granted for current database.
For example, `GRANT SELECT ON * TO john` grants the privilege on all the tables in the current database, `GRANT SELECT ON mytable TO john` grants the privilege on the `mytable` table in the current database.
+:::note
+The feature described below is available starting with the 24.10 ClickHouse version.
+:::
+
You can also put asterisks at the end of a table or a database name. This feature allows you to grant privileges on an abstract prefix of the table's path.
Example: `GRANT SELECT ON db.my_tables* TO john`. This query allows `john` to execute the `SELECT` query over all the `db` database tables with the prefix `my_tables*`.
diff --git a/docs/en/sql-reference/statements/kill.md b/docs/en/sql-reference/statements/kill.md
index 667a5b51f5c..ff6f64a97fe 100644
--- a/docs/en/sql-reference/statements/kill.md
+++ b/docs/en/sql-reference/statements/kill.md
@@ -83,7 +83,7 @@ The presence of long-running or incomplete mutations often indicates that a Clic
- Or manually kill some of these mutations by sending a `KILL` command.
``` sql
-KILL MUTATION [ON CLUSTER cluster]
+KILL MUTATION
WHERE
[TEST]
[FORMAT format]
@@ -135,7 +135,6 @@ KILL MUTATION WHERE database = 'default' AND table = 'table'
-- Cancel the specific mutation:
KILL MUTATION WHERE database = 'default' AND table = 'table' AND mutation_id = 'mutation_3.txt'
```
-:::tip If you are killing a mutation in ClickHouse Cloud or in a self-managed cluster, then be sure to use the ```ON CLUSTER [cluster-name]``` option, in order to ensure the mutation is killed on all replicas:::
The query is useful when a mutation is stuck and cannot finish (e.g. if some function in the mutation query throws an exception when applied to the data contained in the table).
diff --git a/docs/en/sql-reference/table-functions/s3.md b/docs/en/sql-reference/table-functions/s3.md
index 181c92b92d4..b14eb84392f 100644
--- a/docs/en/sql-reference/table-functions/s3.md
+++ b/docs/en/sql-reference/table-functions/s3.md
@@ -93,7 +93,6 @@ LIMIT 5;
ClickHouse also can determine the compression method of the file. For example, if the file was zipped up with a `.csv.gz` extension, ClickHouse would decompress the file automatically.
:::
-
## Usage
Suppose that we have several files with following URIs on S3:
@@ -248,6 +247,25 @@ FROM s3(
LIMIT 5;
```
+## Using S3 credentials (ClickHouse Cloud)
+
+For non-public buckets, users can pass an `aws_access_key_id` and `aws_secret_access_key` to the function. For example:
+
+```sql
+SELECT count() FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/mta/*.tsv', '', '','TSVWithNames')
+```
+
+This is appropriate for one-off accesses or in cases where credentials can easily be rotated. However, this is not recommended as a long-term solution for repeated access or where credentials are sensitive. In this case, we recommend users rely on role-based access.
+
+Role-based access for S3 in ClickHouse Cloud is documented [here](/docs/en/cloud/security/secure-s3#access-your-s3-bucket-with-the-clickhouseaccess-role).
+
+Once configured, a `roleARN` can be passed to the s3 function via an `extra_credentials` parameter. For example:
+
+```sql
+SELECT count() FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/mta/*.tsv','CSVWithNames',extra_credentials(role_arn = 'arn:aws:iam::111111111111:role/ClickHouseAccessRole-001'))
+```
+
+Further examples can be found [here](/docs/en/cloud/security/secure-s3#access-your-s3-bucket-with-the-clickhouseaccess-role).
## Working with archives
@@ -266,6 +284,14 @@ FROM s3(
);
```
+:::note
+ClickHouse supports three archive formats:
+ZIP
+TAR
+7Z
+While ZIP and TAR archives can be accessed from any supported storage location, 7Z archives can only be read from the local filesystem where ClickHouse is installed.
+:::
+
## Virtual Columns {#virtual-columns}
diff --git a/docs/en/sql-reference/table-functions/s3Cluster.md b/docs/en/sql-reference/table-functions/s3Cluster.md
index 9bf5a6b4da6..0aa4784d27a 100644
--- a/docs/en/sql-reference/table-functions/s3Cluster.md
+++ b/docs/en/sql-reference/table-functions/s3Cluster.md
@@ -70,6 +70,15 @@ SELECT count(*) FROM s3Cluster(
)
```
+## Accessing private and public buckets
+
+Users can use the same approaches as documented for the s3 function [here](/docs/en/sql-reference/table-functions/s3#accessing-public-buckets).
+
+## Optimizing performance
+
+For details on optimizing the performance of the s3 function see [our detailed guide](/docs/en/integrations/s3/performance).
+
+
**See Also**
- [S3 engine](../../engines/table-engines/integrations/s3.md)
diff --git a/docs/ru/engines/table-engines/integrations/s3.md b/docs/ru/engines/table-engines/integrations/s3.md
index a1c69df4d0a..2bab78c0612 100644
--- a/docs/ru/engines/table-engines/integrations/s3.md
+++ b/docs/ru/engines/table-engines/integrations/s3.md
@@ -138,6 +138,7 @@ CREATE TABLE table_with_asterisk (name String, value UInt32)
- `use_insecure_imds_request` — признак использования менее безопасного соединения при выполнении запроса к IMDS при получении учётных данных из метаданных Amazon EC2. Значение по умолчанию — `false`.
- `region` — название региона S3.
- `header` — добавляет указанный HTTP-заголовок к запросу на заданную точку приема запроса. Может быть определен несколько раз.
+- `access_header` - добавляет указанный HTTP-заголовок к запросу на заданную точку приема запроса, в случае если не указаны другие способы авторизации.
- `server_side_encryption_customer_key_base64` — устанавливает необходимые заголовки для доступа к объектам S3 с шифрованием SSE-C.
- `single_read_retries` — Максимальное количество попыток запроса при единичном чтении. Значение по умолчанию — `4`.
diff --git a/programs/keeper-client/KeeperClient.cpp b/programs/keeper-client/KeeperClient.cpp
index 97caa142124..101ed270fc5 100644
--- a/programs/keeper-client/KeeperClient.cpp
+++ b/programs/keeper-client/KeeperClient.cpp
@@ -163,6 +163,10 @@ void KeeperClient::defineOptions(Poco::Util::OptionSet & options)
.argument("")
.binding("operation-timeout"));
+ options.addOption(
+ Poco::Util::Option("use-xid-64", "", "use 64-bit XID. default false.")
+ .binding("use-xid-64"));
+
options.addOption(
Poco::Util::Option("config-file", "c", "if set, will try to get a connection string from clickhouse config. default `config.xml`")
.argument("")
@@ -411,6 +415,7 @@ int KeeperClient::main(const std::vector & /* args */)
zk_args.connection_timeout_ms = config().getInt("connection-timeout", 10) * 1000;
zk_args.session_timeout_ms = config().getInt("session-timeout", 10) * 1000;
zk_args.operation_timeout_ms = config().getInt("operation-timeout", 10) * 1000;
+ zk_args.use_xid_64 = config().hasOption("use-xid-64");
zookeeper = zkutil::ZooKeeper::createWithoutKillingPreviousSessions(zk_args);
if (config().has("no-confirmation") || config().has("query"))
diff --git a/programs/keeper/Keeper.cpp b/programs/keeper/Keeper.cpp
index 3007df60765..74af9950e13 100644
--- a/programs/keeper/Keeper.cpp
+++ b/programs/keeper/Keeper.cpp
@@ -590,6 +590,7 @@ try
#if USE_SSL
CertificateReloader::instance().tryLoad(*config);
+ CertificateReloader::instance().tryLoadClient(*config);
#endif
});
diff --git a/programs/local/LocalServer.cpp b/programs/local/LocalServer.cpp
index b6b67724b0a..1dcef5eb25e 100644
--- a/programs/local/LocalServer.cpp
+++ b/programs/local/LocalServer.cpp
@@ -821,11 +821,11 @@ void LocalServer::processConfig()
status.emplace(fs::path(path) / "status", StatusFile::write_full_info);
LOG_DEBUG(log, "Loading metadata from {}", path);
- auto startup_system_tasks = loadMetadataSystem(global_context);
+ auto load_system_metadata_tasks = loadMetadataSystem(global_context);
attachSystemTablesServer(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::SYSTEM_DATABASE), false);
attachInformationSchema(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::INFORMATION_SCHEMA));
attachInformationSchema(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::INFORMATION_SCHEMA_UPPERCASE));
- waitLoad(TablesLoaderForegroundPoolId, startup_system_tasks);
+ waitLoad(TablesLoaderForegroundPoolId, load_system_metadata_tasks);
if (!getClientConfiguration().has("only-system-tables"))
{
diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp
index 35dae614d87..1f481381b2b 100644
--- a/programs/server/Server.cpp
+++ b/programs/server/Server.cpp
@@ -86,7 +86,7 @@
#include
#include
#include
-#include
+#include
#include
#include
#include "MetricsTransmitter.h"
@@ -168,9 +168,11 @@ namespace ServerSetting
{
extern const ServerSettingsUInt32 asynchronous_heavy_metrics_update_period_s;
extern const ServerSettingsUInt32 asynchronous_metrics_update_period_s;
+ extern const ServerSettingsBool asynchronous_metrics_enable_heavy_metrics;
extern const ServerSettingsBool async_insert_queue_flush_on_shutdown;
extern const ServerSettingsUInt64 async_insert_threads;
extern const ServerSettingsBool async_load_databases;
+ extern const ServerSettingsBool async_load_system_database;
extern const ServerSettingsUInt64 background_buffer_flush_schedule_pool_size;
extern const ServerSettingsUInt64 background_common_pool_size;
extern const ServerSettingsUInt64 background_distributed_schedule_pool_size;
@@ -205,7 +207,6 @@ namespace ServerSetting
extern const ServerSettingsBool format_alter_operations_with_parentheses;
extern const ServerSettingsUInt64 global_profiler_cpu_time_period_ns;
extern const ServerSettingsUInt64 global_profiler_real_time_period_ns;
- extern const ServerSettingsDouble gwp_asan_force_sample_probability;
extern const ServerSettingsUInt64 http_connections_soft_limit;
extern const ServerSettingsUInt64 http_connections_store_limit;
extern const ServerSettingsUInt64 http_connections_warn_limit;
@@ -620,7 +621,7 @@ void sanityChecks(Server & server)
#if defined(OS_LINUX)
try
{
- const std::unordered_set fastClockSources = {
+ const std::unordered_set fast_clock_sources = {
// ARM clock
"arch_sys_counter",
// KVM guest clock
@@ -629,7 +630,7 @@ void sanityChecks(Server & server)
"tsc",
};
const char * filename = "/sys/devices/system/clocksource/clocksource0/current_clocksource";
- if (!fastClockSources.contains(readLine(filename)))
+ if (!fast_clock_sources.contains(readLine(filename)))
server.context()->addWarningMessage("Linux is not using a fast clock source. Performance can be degraded. Check " + String(filename));
}
catch (...) // NOLINT(bugprone-empty-catch)
@@ -919,7 +920,6 @@ try
registerFormats();
registerRemoteFileMetadatas();
registerSchedulerNodes();
- registerResourceManagers();
CurrentMetrics::set(CurrentMetrics::Revision, ClickHouseRevision::getVersionRevision());
CurrentMetrics::set(CurrentMetrics::VersionInteger, ClickHouseRevision::getVersionInteger());
@@ -1060,6 +1060,7 @@ try
ServerAsynchronousMetrics async_metrics(
global_context,
server_settings[ServerSetting::asynchronous_metrics_update_period_s],
+ server_settings[ServerSetting::asynchronous_metrics_enable_heavy_metrics],
server_settings[ServerSetting::asynchronous_heavy_metrics_update_period_s],
[&]() -> std::vector
{
@@ -1927,10 +1928,6 @@ try
if (global_context->isServerCompletelyStarted())
CannotAllocateThreadFaultInjector::setFaultProbability(new_server_settings[ServerSetting::cannot_allocate_thread_fault_injection_probability]);
-#if USE_GWP_ASAN
- GWPAsan::setForceSampleProbability(new_server_settings[ServerSetting::gwp_asan_force_sample_probability]);
-#endif
-
ProfileEvents::increment(ProfileEvents::MainConfigLoads);
/// Must be the last.
@@ -2199,6 +2196,7 @@ try
LOG_INFO(log, "Loading metadata from {}", path_str);
+ LoadTaskPtrs load_system_metadata_tasks;
LoadTaskPtrs load_metadata_tasks;
// Make sure that if exception is thrown during startup async, new async loading jobs are not going to be called.
@@ -2222,12 +2220,8 @@ try
auto & database_catalog = DatabaseCatalog::instance();
/// We load temporary database first, because projections need it.
database_catalog.initializeAndLoadTemporaryDatabase();
- auto system_startup_tasks = loadMetadataSystem(global_context);
- maybeConvertSystemDatabase(global_context, system_startup_tasks);
- /// This has to be done before the initialization of system logs,
- /// otherwise there is a race condition between the system database initialization
- /// and creation of new tables in the database.
- waitLoad(TablesLoaderForegroundPoolId, system_startup_tasks);
+ load_system_metadata_tasks = loadMetadataSystem(global_context, server_settings[ServerSetting::async_load_system_database]);
+ maybeConvertSystemDatabase(global_context, load_system_metadata_tasks);
/// Startup scripts can depend on the system log tables.
if (config().has("startup_scripts") && !server_settings[ServerSetting::prepare_system_log_tables_on_startup].changed)
@@ -2258,6 +2252,8 @@ try
database_catalog.assertDatabaseExists(default_database);
/// Load user-defined SQL functions.
global_context->getUserDefinedSQLObjectsStorage().loadObjects();
+ /// Load WORKLOADs and RESOURCEs.
+ global_context->getWorkloadEntityStorage().loadEntities();
global_context->getRefreshSet().setRefreshesStopped(false);
}
@@ -2267,6 +2263,30 @@ try
throw;
}
+ bool found_stop_flag = false;
+
+ if (has_zookeeper && global_context->getMacros()->getMacroMap().contains("replica"))
+ {
+ try
+ {
+ auto zookeeper = global_context->getZooKeeper();
+ String stop_flag_path = "/clickhouse/stop_replicated_ddl_queries/{replica}";
+ stop_flag_path = global_context->getMacros()->expand(stop_flag_path);
+ found_stop_flag = zookeeper->exists(stop_flag_path);
+ }
+ catch (const Coordination::Exception & e)
+ {
+ if (e.code != Coordination::Error::ZCONNECTIONLOSS)
+ throw;
+ tryLogCurrentException(log);
+ }
+ }
+
+ if (found_stop_flag)
+ LOG_INFO(log, "Found a stop flag for replicated DDL queries. They will be disabled");
+ else
+ DatabaseCatalog::instance().startReplicatedDDLQueries();
+
LOG_DEBUG(log, "Loaded metadata.");
if (has_trace_collector)
@@ -2321,6 +2341,7 @@ try
#if USE_SSL
CertificateReloader::instance().tryLoad(config());
+ CertificateReloader::instance().tryLoadClient(config());
#endif
/// Must be done after initialization of `servers`, because async_metrics will access `servers` variable from its thread.
@@ -2369,17 +2390,28 @@ try
if (has_zookeeper && config().has("distributed_ddl"))
{
/// DDL worker should be started after all tables were loaded
- String ddl_zookeeper_path = config().getString("distributed_ddl.path", "/clickhouse/task_queue/ddl/");
+ String ddl_queue_path = config().getString("distributed_ddl.path", "/clickhouse/task_queue/ddl/");
+ String ddl_replicas_path = config().getString("distributed_ddl.replicas_path", "/clickhouse/task_queue/replicas/");
int pool_size = config().getInt("distributed_ddl.pool_size", 1);
if (pool_size < 1)
throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND, "distributed_ddl.pool_size should be greater then 0");
- global_context->setDDLWorker(std::make_unique<DDLWorker>(pool_size, ddl_zookeeper_path, global_context, &config(),
- "distributed_ddl", "DDLWorker",
- &CurrentMetrics::MaxDDLEntryID, &CurrentMetrics::MaxPushedDDLEntryID),
- load_metadata_tasks);
+ global_context->setDDLWorker(
+ std::make_unique<DDLWorker>(
+ pool_size,
+ ddl_queue_path,
+ ddl_replicas_path,
+ global_context,
+ &config(),
+ "distributed_ddl",
+ "DDLWorker",
+ &CurrentMetrics::MaxDDLEntryID,
+ &CurrentMetrics::MaxPushedDDLEntryID),
+ joinTasks(load_system_metadata_tasks, load_metadata_tasks));
}
/// Do not keep tasks in server, they should be kept inside databases. Used here to make dependent tasks only.
+ load_system_metadata_tasks.clear();
+ load_system_metadata_tasks.shrink_to_fit();
load_metadata_tasks.clear();
load_metadata_tasks.shrink_to_fit();
@@ -2405,7 +2437,6 @@ try
#if USE_GWP_ASAN
GWPAsan::initFinished();
- GWPAsan::setForceSampleProbability(server_settings[ServerSetting::gwp_asan_force_sample_probability]);
#endif
try
@@ -2999,7 +3030,7 @@ void Server::updateServers(
for (auto * server : all_servers)
{
- if (!server->isStopping())
+ if (server->supportsRuntimeReconfiguration() && !server->isStopping())
{
std::string port_name = server->getPortName();
bool has_host = false;
diff --git a/programs/server/config.xml b/programs/server/config.xml
index 10ad831465a..9807f8c0d5a 100644
--- a/programs/server/config.xml
+++ b/programs/server/config.xml
@@ -1195,6 +1195,19 @@
false
+
+
+ system
+
+ 7500
+ 1048576
+ 8192
+ 524288
+ 1000
+ false
+
+
+
+
+
@@ -1437,6 +1454,8 @@
<path>/clickhouse/task_queue/ddl</path>
+
+ <replicas_path>/clickhouse/task_queue/replicas</replicas_path>
diff --git a/programs/server/config.yaml.example b/programs/server/config.yaml.example
index 5d5499f876c..5b0330df572 100644
--- a/programs/server/config.yaml.example
+++ b/programs/server/config.yaml.example
@@ -743,6 +743,13 @@ error_log:
flush_interval_milliseconds: 7500
collect_interval_milliseconds: 1000
+# Query metric log contains history of memory and metric values from table system.events for individual queries, periodically flushed to disk.
+query_metric_log:
+ database: system
+ table: query_metric_log
+ flush_interval_milliseconds: 7500
+ collect_interval_milliseconds: 1000
+
# Asynchronous metric log contains values of metrics from
# system.asynchronous_metrics.
asynchronous_metric_log:
diff --git a/programs/static-files-disk-uploader/static-files-disk-uploader.cpp b/programs/static-files-disk-uploader/static-files-disk-uploader.cpp
index f7696dd37f1..590e0364040 100644
--- a/programs/static-files-disk-uploader/static-files-disk-uploader.cpp
+++ b/programs/static-files-disk-uploader/static-files-disk-uploader.cpp
@@ -4,6 +4,7 @@
#include
#include
+#include
#include
#include
#include
diff --git a/src/Access/AccessControl.h b/src/Access/AccessControl.h
index a91686433ec..a342c5300bf 100644
--- a/src/Access/AccessControl.h
+++ b/src/Access/AccessControl.h
@@ -9,6 +9,8 @@
#include
+#include "config.h"
+
namespace Poco
{
diff --git a/src/Access/Authentication.cpp b/src/Access/Authentication.cpp
index 8d5d04a4ed2..1d69a659cd6 100644
--- a/src/Access/Authentication.cpp
+++ b/src/Access/Authentication.cpp
@@ -12,6 +12,7 @@
#include "config.h"
+
namespace DB
{
diff --git a/src/Access/AuthenticationData.cpp b/src/Access/AuthenticationData.cpp
index 57a1cd756ff..37a4e356af8 100644
--- a/src/Access/AuthenticationData.cpp
+++ b/src/Access/AuthenticationData.cpp
@@ -1,12 +1,16 @@
#include
#include
#include
+#include
#include
#include
#include
#include
#include
#include
+#include
+#include
+#include
#include
#include
@@ -113,7 +117,8 @@ bool operator ==(const AuthenticationData & lhs, const AuthenticationData & rhs)
&& (lhs.ssh_keys == rhs.ssh_keys)
#endif
&& (lhs.http_auth_scheme == rhs.http_auth_scheme)
- && (lhs.http_auth_server_name == rhs.http_auth_server_name);
+ && (lhs.http_auth_server_name == rhs.http_auth_server_name)
+ && (lhs.valid_until == rhs.valid_until);
}
@@ -384,14 +389,34 @@ std::shared_ptr AuthenticationData::toAST() const
throw Exception(ErrorCodes::LOGICAL_ERROR, "AST: Unexpected authentication type {}", toString(auth_type));
}
+
+ if (valid_until)
+ {
+ WriteBufferFromOwnString out;
+ writeDateTimeText(valid_until, out);
+
+ node->valid_until = std::make_shared(out.str());
+ }
+
return node;
}
AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & query, ContextPtr context, bool validate)
{
+ time_t valid_until = 0;
+
+ if (query.valid_until)
+ {
+ valid_until = getValidUntilFromAST(query.valid_until, context);
+ }
+
if (query.type && query.type == AuthenticationType::NO_PASSWORD)
- return AuthenticationData();
+ {
+ AuthenticationData auth_data;
+ auth_data.setValidUntil(valid_until);
+ return auth_data;
+ }
/// For this type of authentication we have ASTPublicSSHKey as children for ASTAuthenticationData
if (query.type && query.type == AuthenticationType::SSH_KEY)
@@ -418,6 +443,7 @@ AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & que
}
auth_data.setSSHKeys(std::move(keys));
+ auth_data.setValidUntil(valid_until);
return auth_data;
#else
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "SSH is disabled, because ClickHouse is built without libssh");
@@ -451,6 +477,8 @@ AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & que
AuthenticationData auth_data(current_type);
+ auth_data.setValidUntil(valid_until);
+
if (validate)
context->getAccessControl().checkPasswordComplexityRules(value);
@@ -494,6 +522,7 @@ AuthenticationData AuthenticationData::fromAST(const ASTAuthenticationData & que
}
AuthenticationData auth_data(*query.type);
+ auth_data.setValidUntil(valid_until);
if (query.contains_hash)
{
diff --git a/src/Access/AuthenticationData.h b/src/Access/AuthenticationData.h
index a0c100264f8..2d8d008c925 100644
--- a/src/Access/AuthenticationData.h
+++ b/src/Access/AuthenticationData.h
@@ -74,6 +74,9 @@ public:
const String & getHTTPAuthenticationServerName() const { return http_auth_server_name; }
void setHTTPAuthenticationServerName(const String & name) { http_auth_server_name = name; }
+ time_t getValidUntil() const { return valid_until; }
+ void setValidUntil(time_t valid_until_) { valid_until = valid_until_; }
+
friend bool operator ==(const AuthenticationData & lhs, const AuthenticationData & rhs);
friend bool operator !=(const AuthenticationData & lhs, const AuthenticationData & rhs) { return !(lhs == rhs); }
@@ -106,6 +109,7 @@ private:
/// HTTP authentication properties
String http_auth_server_name;
HTTPAuthenticationScheme http_auth_scheme = HTTPAuthenticationScheme::BASIC;
+ time_t valid_until = 0;
};
}
diff --git a/src/Access/Common/AccessType.h b/src/Access/Common/AccessType.h
index e9f24a8c685..242dfcd8c35 100644
--- a/src/Access/Common/AccessType.h
+++ b/src/Access/Common/AccessType.h
@@ -99,6 +99,8 @@ enum class AccessType : uint8_t
M(CREATE_ARBITRARY_TEMPORARY_TABLE, "", GLOBAL, CREATE) /* allows to create and manipulate temporary tables
with arbitrary table engine */\
M(CREATE_FUNCTION, "", GLOBAL, CREATE) /* allows to execute CREATE FUNCTION */ \
+ M(CREATE_WORKLOAD, "", GLOBAL, CREATE) /* allows to execute CREATE WORKLOAD */ \
+ M(CREATE_RESOURCE, "", GLOBAL, CREATE) /* allows to execute CREATE RESOURCE */ \
M(CREATE_NAMED_COLLECTION, "", NAMED_COLLECTION, NAMED_COLLECTION_ADMIN) /* allows to execute CREATE NAMED COLLECTION */ \
M(CREATE, "", GROUP, ALL) /* allows to execute {CREATE|ATTACH} */ \
\
@@ -108,6 +110,8 @@ enum class AccessType : uint8_t
implicitly enabled by the grant DROP_TABLE */\
M(DROP_DICTIONARY, "", DICTIONARY, DROP) /* allows to execute {DROP|DETACH} DICTIONARY */\
M(DROP_FUNCTION, "", GLOBAL, DROP) /* allows to execute DROP FUNCTION */\
+ M(DROP_WORKLOAD, "", GLOBAL, DROP) /* allows to execute DROP WORKLOAD */\
+ M(DROP_RESOURCE, "", GLOBAL, DROP) /* allows to execute DROP RESOURCE */\
M(DROP_NAMED_COLLECTION, "", NAMED_COLLECTION, NAMED_COLLECTION_ADMIN) /* allows to execute DROP NAMED COLLECTION */\
M(DROP, "", GROUP, ALL) /* allows to execute {DROP|DETACH} */\
\
@@ -159,6 +163,7 @@ enum class AccessType : uint8_t
M(SYSTEM_SHUTDOWN, "SYSTEM KILL, SHUTDOWN", GLOBAL, SYSTEM) \
M(SYSTEM_DROP_DNS_CACHE, "SYSTEM DROP DNS, DROP DNS CACHE, DROP DNS", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_CONNECTIONS_CACHE, "SYSTEM DROP CONNECTIONS CACHE, DROP CONNECTIONS CACHE", GLOBAL, SYSTEM_DROP_CACHE) \
+ M(SYSTEM_PREWARM_MARK_CACHE, "SYSTEM PREWARM MARK, PREWARM MARK CACHE, PREWARM MARKS", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_MARK_CACHE, "SYSTEM DROP MARK, DROP MARK CACHE, DROP MARKS", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_UNCOMPRESSED_CACHE, "SYSTEM DROP UNCOMPRESSED, DROP UNCOMPRESSED CACHE, DROP UNCOMPRESSED", GLOBAL, SYSTEM_DROP_CACHE) \
M(SYSTEM_DROP_MMAP_CACHE, "SYSTEM DROP MMAP, DROP MMAP CACHE, DROP MMAP", GLOBAL, SYSTEM_DROP_CACHE) \
@@ -193,6 +198,7 @@ enum class AccessType : uint8_t
M(SYSTEM_SENDS, "SYSTEM STOP SENDS, SYSTEM START SENDS, STOP SENDS, START SENDS", GROUP, SYSTEM) \
M(SYSTEM_REPLICATION_QUEUES, "SYSTEM STOP REPLICATION QUEUES, SYSTEM START REPLICATION QUEUES, STOP REPLICATION QUEUES, START REPLICATION QUEUES", TABLE, SYSTEM) \
M(SYSTEM_VIRTUAL_PARTS_UPDATE, "SYSTEM STOP VIRTUAL PARTS UPDATE, SYSTEM START VIRTUAL PARTS UPDATE, STOP VIRTUAL PARTS UPDATE, START VIRTUAL PARTS UPDATE", TABLE, SYSTEM) \
+ M(SYSTEM_REDUCE_BLOCKING_PARTS, "SYSTEM STOP REDUCE BLOCKING PARTS, SYSTEM START REDUCE BLOCKING PARTS, STOP REDUCE BLOCKING PARTS, START REDUCE BLOCKING PARTS", TABLE, SYSTEM) \
M(SYSTEM_DROP_REPLICA, "DROP REPLICA", TABLE, SYSTEM) \
M(SYSTEM_SYNC_REPLICA, "SYNC REPLICA", TABLE, SYSTEM) \
M(SYSTEM_REPLICA_READINESS, "SYSTEM REPLICA READY, SYSTEM REPLICA UNREADY", GLOBAL, SYSTEM) \
diff --git a/src/Access/ContextAccess.cpp b/src/Access/ContextAccess.cpp
index 949fd37e403..a5d0451714b 100644
--- a/src/Access/ContextAccess.cpp
+++ b/src/Access/ContextAccess.cpp
@@ -701,15 +701,17 @@ bool ContextAccess::checkAccessImplHelper(const ContextPtr & context, AccessFlag
const AccessFlags dictionary_ddl = AccessType::CREATE_DICTIONARY | AccessType::DROP_DICTIONARY;
const AccessFlags function_ddl = AccessType::CREATE_FUNCTION | AccessType::DROP_FUNCTION;
+ const AccessFlags workload_ddl = AccessType::CREATE_WORKLOAD | AccessType::DROP_WORKLOAD;
+ const AccessFlags resource_ddl = AccessType::CREATE_RESOURCE | AccessType::DROP_RESOURCE;
const AccessFlags table_and_dictionary_ddl = table_ddl | dictionary_ddl;
const AccessFlags table_and_dictionary_and_function_ddl = table_ddl | dictionary_ddl | function_ddl;
const AccessFlags write_table_access = AccessType::INSERT | AccessType::OPTIMIZE;
const AccessFlags write_dcl_access = AccessType::ACCESS_MANAGEMENT - AccessType::SHOW_ACCESS;
- const AccessFlags not_readonly_flags = write_table_access | table_and_dictionary_and_function_ddl | write_dcl_access | AccessType::SYSTEM | AccessType::KILL_QUERY;
+ const AccessFlags not_readonly_flags = write_table_access | table_and_dictionary_and_function_ddl | workload_ddl | resource_ddl | write_dcl_access | AccessType::SYSTEM | AccessType::KILL_QUERY;
const AccessFlags not_readonly_1_flags = AccessType::CREATE_TEMPORARY_TABLE;
- const AccessFlags ddl_flags = table_ddl | dictionary_ddl | function_ddl;
+ const AccessFlags ddl_flags = table_ddl | dictionary_ddl | function_ddl | workload_ddl | resource_ddl;
const AccessFlags introspection_flags = AccessType::INTROSPECTION;
};
static const PrecalculatedFlags precalc;
diff --git a/src/Access/IAccessStorage.cpp b/src/Access/IAccessStorage.cpp
index 3249d89ba87..72e0933e214 100644
--- a/src/Access/IAccessStorage.cpp
+++ b/src/Access/IAccessStorage.cpp
@@ -554,7 +554,7 @@ std::optional IAccessStorage::authenticateImpl(
continue;
}
- if (areCredentialsValid(user->getName(), user->valid_until, auth_method, credentials, external_authenticators, auth_result.settings))
+ if (areCredentialsValid(user->getName(), auth_method, credentials, external_authenticators, auth_result.settings))
{
auth_result.authentication_data = auth_method;
return auth_result;
@@ -579,7 +579,6 @@ std::optional IAccessStorage::authenticateImpl(
bool IAccessStorage::areCredentialsValid(
const std::string & user_name,
- time_t valid_until,
const AuthenticationData & authentication_method,
const Credentials & credentials,
const ExternalAuthenticators & external_authenticators,
@@ -591,6 +590,7 @@ bool IAccessStorage::areCredentialsValid(
if (credentials.getUserName() != user_name)
return false;
+ auto valid_until = authentication_method.getValidUntil();
if (valid_until)
{
const time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
diff --git a/src/Access/IAccessStorage.h b/src/Access/IAccessStorage.h
index 84cbdd0a751..4e2b27a1864 100644
--- a/src/Access/IAccessStorage.h
+++ b/src/Access/IAccessStorage.h
@@ -236,7 +236,6 @@ protected:
bool allow_plaintext_password) const;
virtual bool areCredentialsValid(
const std::string & user_name,
- time_t valid_until,
const AuthenticationData & authentication_method,
const Credentials & credentials,
const ExternalAuthenticators & external_authenticators,
diff --git a/src/Access/RoleCache.h b/src/Access/RoleCache.h
index 75d1fd32685..b707a05346f 100644
--- a/src/Access/RoleCache.h
+++ b/src/Access/RoleCache.h
@@ -22,6 +22,10 @@ public:
const std::vector & current_roles,
const std::vector & current_roles_with_admin_option);
+ std::shared_ptr getEnabledRoles(
+ boost::container::flat_set current_roles,
+ boost::container::flat_set current_roles_with_admin_option);
+
private:
using SubscriptionsOnRoles = std::vector>;
diff --git a/src/Access/User.cpp b/src/Access/User.cpp
index 887abc213f9..1c92f467003 100644
--- a/src/Access/User.cpp
+++ b/src/Access/User.cpp
@@ -19,8 +19,7 @@ bool User::equal(const IAccessEntity & other) const
return (authentication_methods == other_user.authentication_methods)
&& (allowed_client_hosts == other_user.allowed_client_hosts)
&& (access == other_user.access) && (granted_roles == other_user.granted_roles) && (default_roles == other_user.default_roles)
- && (settings == other_user.settings) && (grantees == other_user.grantees) && (default_database == other_user.default_database)
- && (valid_until == other_user.valid_until);
+ && (settings == other_user.settings) && (grantees == other_user.grantees) && (default_database == other_user.default_database);
}
void User::setName(const String & name_)
@@ -88,7 +87,6 @@ void User::clearAllExceptDependencies()
access = {};
settings.removeSettingsKeepProfiles();
default_database = {};
- valid_until = 0;
}
}
diff --git a/src/Access/User.h b/src/Access/User.h
index 03d62bf2277..f54e74a305d 100644
--- a/src/Access/User.h
+++ b/src/Access/User.h
@@ -23,7 +23,6 @@ struct User : public IAccessEntity
SettingsProfileElements settings;
RolesOrUsersSet grantees = RolesOrUsersSet::AllTag{};
String default_database;
- time_t valid_until = 0;
bool equal(const IAccessEntity & other) const override;
std::shared_ptr clone() const override { return cloneImpl(); }
diff --git a/src/Access/tests/gtest_access_rights_ops.cpp b/src/Access/tests/gtest_access_rights_ops.cpp
index 902fc949840..41567905a10 100644
--- a/src/Access/tests/gtest_access_rights_ops.cpp
+++ b/src/Access/tests/gtest_access_rights_ops.cpp
@@ -284,7 +284,8 @@ TEST(AccessRights, Union)
"CREATE DICTIONARY, DROP DATABASE, DROP TABLE, DROP VIEW, DROP DICTIONARY, UNDROP TABLE, "
"TRUNCATE, OPTIMIZE, BACKUP, CREATE ROW POLICY, ALTER ROW POLICY, DROP ROW POLICY, "
"SHOW ROW POLICIES, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, "
- "SYSTEM MOVES, SYSTEM PULLING REPLICATION LOG, SYSTEM CLEANUP, SYSTEM VIEWS, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, SYSTEM VIRTUAL PARTS UPDATE, "
+ "SYSTEM MOVES, SYSTEM PULLING REPLICATION LOG, SYSTEM CLEANUP, SYSTEM VIEWS, SYSTEM SENDS, "
+ "SYSTEM REPLICATION QUEUES, SYSTEM VIRTUAL PARTS UPDATE, SYSTEM REDUCE BLOCKING PARTS, "
"SYSTEM DROP REPLICA, SYSTEM SYNC REPLICA, SYSTEM RESTART REPLICA, "
"SYSTEM RESTORE REPLICA, SYSTEM WAIT LOADING PARTS, SYSTEM SYNC DATABASE REPLICA, SYSTEM FLUSH DISTRIBUTED, "
"SYSTEM UNLOAD PRIMARY KEY, dictGet ON db1.*, GRANT TABLE ENGINE ON db1, "
diff --git a/src/AggregateFunctions/AggregateFunctionGroupArraySorted.cpp b/src/AggregateFunctions/AggregateFunctionGroupArraySorted.cpp
index 86f7661e53f..061a1e519e1 100644
--- a/src/AggregateFunctions/AggregateFunctionGroupArraySorted.cpp
+++ b/src/AggregateFunctions/AggregateFunctionGroupArraySorted.cpp
@@ -59,13 +59,13 @@ constexpr size_t group_array_sorted_sort_strategy_max_elements_threshold = 10000
template
struct GroupArraySortedData
{
+ static constexpr bool is_value_generic_field = std::is_same_v<T, Field>;
+
using Allocator = MixedAlignedArenaAllocator;
- using Array = PODArray;
+ using Array = typename std::conditional_t, PODArray>;
static constexpr size_t partial_sort_max_elements_factor = 2;
- static constexpr bool is_value_generic_field = std::is_same_v<T, Field>;
-
Array values;
static bool compare(const T & lhs, const T & rhs)
@@ -144,7 +144,7 @@ struct GroupArraySortedData
}
if (values.size() > max_elements)
- values.resize(max_elements, arena);
+ resize(max_elements, arena);
}
ALWAYS_INLINE void partialSortAndLimitIfNeeded(size_t max_elements, Arena * arena)
@@ -153,7 +153,23 @@ struct GroupArraySortedData
return;
::nth_element(values.begin(), values.begin() + max_elements, values.end(), Comparator());
- values.resize(max_elements, arena);
+ resize(max_elements, arena);
+ }
+
+ ALWAYS_INLINE void resize(size_t n, Arena * arena)
+ {
+ if constexpr (is_value_generic_field)
+ values.resize(n);
+ else
+ values.resize(n, arena);
+ }
+
+ ALWAYS_INLINE void push_back(T && element, Arena * arena)
+ {
+ if constexpr (is_value_generic_field)
+ values.push_back(element);
+ else
+ values.push_back(element, arena);
}
ALWAYS_INLINE void addElement(T && element, size_t max_elements, Arena * arena)
@@ -171,12 +187,12 @@ struct GroupArraySortedData
return;
}
- values.push_back(std::move(element), arena);
+ push_back(std::move(element), arena);
std::push_heap(values.begin(), values.end(), Comparator());
}
else
{
- values.push_back(std::move(element), arena);
+ push_back(std::move(element), arena);
partialSortAndLimitIfNeeded(max_elements, arena);
}
}
@@ -210,14 +226,6 @@ struct GroupArraySortedData
result_array_data[result_array_data_insert_begin + i] = values[i];
}
}
-
- ~GroupArraySortedData()
- {
- for (auto & value : values)
- {
- value.~T();
- }
- }
};
template
@@ -313,14 +321,12 @@ public:
throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Too large array size, it should not exceed {}", max_elements);
auto & values = this->data(place).values;
- values.resize_exact(size, arena);
- if constexpr (std::is_same_v<T, Field>)
+ if constexpr (Data::is_value_generic_field)
{
+ values.resize(size);
for (Field & element : values)
{
- /// We must initialize the Field type since some internal functions (like operator=) use them
- new (&element) Field;
bool has_value = false;
readBinary(has_value, buf);
if (has_value)
@@ -329,6 +335,7 @@ public:
}
else
{
+ values.resize_exact(size, arena);
if constexpr (std::endian::native == std::endian::little)
{
buf.readStrict(reinterpret_cast(values.data()), size * sizeof(values[0]));
diff --git a/src/AggregateFunctions/AggregateFunctionQuantile.h b/src/AggregateFunctions/AggregateFunctionQuantile.h
index 423fd4bc569..aa6755f237d 100644
--- a/src/AggregateFunctions/AggregateFunctionQuantile.h
+++ b/src/AggregateFunctions/AggregateFunctionQuantile.h
@@ -312,6 +312,9 @@ struct NameQuantilesExactInclusive { static constexpr auto name = "quantilesExac
struct NameQuantileExactWeighted { static constexpr auto name = "quantileExactWeighted"; };
struct NameQuantilesExactWeighted { static constexpr auto name = "quantilesExactWeighted"; };
+struct NameQuantileExactWeightedInterpolated { static constexpr auto name = "quantileExactWeightedInterpolated"; };
+struct NameQuantilesExactWeightedInterpolated { static constexpr auto name = "quantilesExactWeightedInterpolated"; };
+
struct NameQuantileInterpolatedWeighted { static constexpr auto name = "quantileInterpolatedWeighted"; };
struct NameQuantilesInterpolatedWeighted { static constexpr auto name = "quantilesInterpolatedWeighted"; };
diff --git a/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp b/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp
index 469abdf45a2..58b3b75b056 100644
--- a/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp
+++ b/src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp
@@ -1,13 +1,14 @@
-#include
#include
+#include
#include
+#include
#include
#include
-#include
-
#include
#include
+#include
+
namespace DB
{
@@ -29,7 +30,7 @@ namespace
* It uses O(distinct(N)) memory. Can be naturally applied for values with weight.
* In case of many identical values, it can be more efficient than QuantileExact even when weight is not used.
*/
-template <typename Value>
+template <typename Value, bool interpolated>
struct QuantileExactWeighted
{
struct Int128Hash
@@ -46,6 +47,7 @@ struct QuantileExactWeighted
/// When creating, the hash table must be small.
using Map = HashMapWithStackMemory;
+ using Pair = typename Map::value_type;
Map map;
@@ -58,8 +60,18 @@ struct QuantileExactWeighted
void add(const Value & x, Weight weight)
{
- if (!isNaN(x))
- map[x] += weight;
+ if constexpr (!interpolated)
+ {
+ /// Keep compatibility for function quantilesExactWeighted.
+ if (!isNaN(x))
+ map[x] += weight;
+ }
+ else
+ {
+ /// Ignore values with zero weight in function quantilesExactWeightedInterpolated.
+ if (!isNaN(x) && weight)
+ map[x] += weight;
+ }
}
void merge(const QuantileExactWeighted & rhs)
@@ -85,6 +97,43 @@ struct QuantileExactWeighted
/// Get the value of the `level` quantile. The level must be between 0 and 1.
Value get(Float64 level) const
+ {
+ if constexpr (interpolated)
+ return getInterpolatedImpl(level);
+ else
+ return getImpl(level);
+ }
+
+ /// Get the `size` values of `levels` quantiles. Write `size` results starting with `result` address.
+ /// indices - an array of index levels such that the corresponding elements will go in ascending order.
+ void getMany(const Float64 * levels, const size_t * indices, size_t num_levels, Value * result) const
+ {
+ if constexpr (interpolated)
+ getManyInterpolatedImpl(levels, indices, num_levels, result);
+ else
+ getManyImpl(levels, indices, num_levels, result);
+ }
+
+ Float64 getFloat(Float64 level) const
+ {
+ if constexpr (interpolated)
+ return getFloatInterpolatedImpl(level);
+ else
+ return getFloatImpl(level);
+ }
+
+ void getManyFloat(const Float64 * levels, const size_t * indices, size_t num_levels, Float64 * result) const
+ {
+ if constexpr (interpolated)
+ getManyFloatInterpolatedImpl(levels, indices, num_levels, result);
+ else
+ getManyFloatImpl(levels, indices, num_levels, result);
+ }
+
+private:
+ /// get implementation without interpolation
+ Value getImpl(Float64 level) const
+ requires(!interpolated)
{
size_t size = map.size();
@@ -92,7 +141,6 @@ struct QuantileExactWeighted
return std::numeric_limits<Value>::quiet_NaN();
/// Copy the data to a temporary array to get the element you need in order.
- using Pair = typename Map::value_type;
std::unique_ptr<Pair[]> array_holder(new Pair[size]);
Pair * array = array_holder.get();
@@ -135,9 +183,9 @@ struct QuantileExactWeighted
return it->first;
}
- /// Get the `size` values of `levels` quantiles. Write `size` results starting with `result` address.
- /// indices - an array of index levels such that the corresponding elements will go in ascending order.
- void getMany(const Float64 * levels, const size_t * indices, size_t num_levels, Value * result) const
+ /// getMany implementation without interpolation
+ void getManyImpl(const Float64 * levels, const size_t * indices, size_t num_levels, Value * result) const
+ requires(!interpolated)
{
size_t size = map.size();
@@ -149,7 +197,6 @@ struct QuantileExactWeighted
}
/// Copy the data to a temporary array to get the element you need in order.
- using Pair = typename Map::value_type;
std::unique_ptr<Pair[]> array_holder(new Pair[size]);
Pair * array = array_holder.get();
@@ -197,23 +244,165 @@ struct QuantileExactWeighted
}
}
- /// The same, but in the case of an empty state, NaN is returned.
- Float64 getFloat(Float64) const
+ /// getFloat implementation without interpolation
+ Float64 getFloatImpl(Float64) const
+ requires(!interpolated)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method getFloat is not implemented for QuantileExact");
}
- void getManyFloat(const Float64 *, const size_t *, size_t, Float64 *) const
+ /// getManyFloat implementation without interpolation
+ void getManyFloatImpl(const Float64 *, const size_t *, size_t, Float64 *) const
+ requires(!interpolated)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method getManyFloat is not implemented for QuantileExact");
}
+
+ /// get implementation with interpolation
+ Value getInterpolatedImpl(Float64 level) const
+ requires(interpolated)
+ {
+ size_t size = map.size();
+ if (0 == size)
+ return Value();
+
+ Float64 res = getFloatInterpolatedImpl(level);
+ if constexpr (is_decimal)
+ return Value(static_cast(res));
+ else
+ return static_cast<Value>(res);
+ }
+
+ /// getMany implementation with interpolation
+ void getManyInterpolatedImpl(const Float64 * levels, const size_t * indices, size_t num_levels, Value * result) const
+ requires(interpolated)
+ {
+ size_t size = map.size();
+ if (0 == size)
+ {
+ for (size_t i = 0; i < num_levels; ++i)
+ result[i] = Value();
+ return;
+ }
+
+ std::unique_ptr<Float64[]> res_holder(new Float64[num_levels]);
+ Float64 * res = res_holder.get();
+ getManyFloatInterpolatedImpl(levels, indices, num_levels, res);
+ for (size_t i = 0; i < num_levels; ++i)
+ {
+ if constexpr (is_decimal)
+ result[i] = Value(static_cast(res[i]));
+ else
+ result[i] = Value(res[i]);
+ }
+ }
+
+ /// getFloat implementation with interpolation
+ Float64 getFloatInterpolatedImpl(Float64 level) const
+ requires(interpolated)
+ {
+ size_t size = map.size();
+
+ if (0 == size)
+ return std::numeric_limits<Float64>::quiet_NaN();
+
+ /// Copy the data to a temporary array to get the element you need in order.
+ std::unique_ptr<Pair[]> array_holder(new Pair[size]);
+ Pair * array = array_holder.get();
+
+ size_t i = 0;
+ for (const auto & pair : map)
+ {
+ array[i] = pair.getValue();
+ ++i;
+ }
+
+ ::sort(array, array + size, [](const Pair & a, const Pair & b) { return a.first < b.first; });
+ std::partial_sum(array, array + size, array, [](const Pair & acc, const Pair & p) { return Pair(p.first, acc.second + p.second); });
+ Weight max_position = array[size - 1].second - 1;
+ Float64 position = max_position * level;
+ return quantileInterpolated(array, size, position);
+ }
+
+ /// getManyFloat implementation with interpolation
+ void getManyFloatInterpolatedImpl(const Float64 * levels, const size_t * indices, size_t num_levels, Float64 * result) const
+ requires(interpolated)
+ {
+ size_t size = map.size();
+ if (0 == size)
+ {
+ for (size_t i = 0; i < num_levels; ++i)
+ result[i] = std::numeric_limits<Float64>::quiet_NaN();
+ return;
+ }
+
+ /// Copy the data to a temporary array to get the element you need in order.
+ std::unique_ptr<Pair[]> array_holder(new Pair[size]);
+ Pair * array = array_holder.get();
+
+ size_t i = 0;
+ for (const auto & pair : map)
+ {
+ array[i] = pair.getValue();
+ ++i;
+ }
+
+ ::sort(array, array + size, [](const Pair & a, const Pair & b) { return a.first < b.first; });
+ std::partial_sum(array, array + size, array, [](const Pair & acc, const Pair & p) { return Pair(p.first, acc.second + p.second); });
+ Weight max_position = array[size - 1].second - 1;
+
+ for (size_t j = 0; j < num_levels; ++j)
+ {
+ Float64 position = max_position * levels[indices[j]];
+ result[indices[j]] = quantileInterpolated(array, size, position);
+ }
+ }
+
+ /// Calculate quantile, using linear interpolation between two closest values
+ Float64 NO_SANITIZE_UNDEFINED quantileInterpolated(const Pair * array, size_t size, Float64 position) const
+ requires(interpolated)
+ {
+ size_t lower = static_cast<size_t>(std::floor(position));
+ size_t higher = static_cast<size_t>(std::ceil(position));
+
+ const auto * lower_it = std::lower_bound(array, array + size, lower + 1, [](const Pair & a, size_t b) { return a.second < b; });
+ const auto * higher_it = std::lower_bound(array, array + size, higher + 1, [](const Pair & a, size_t b) { return a.second < b; });
+ if (lower_it == array + size)
+ lower_it = array + size - 1;
+ if (higher_it == array + size)
+ higher_it = array + size - 1;
+
+ UnderlyingType lower_key = lower_it->first;
+ UnderlyingType higher_key = higher_it->first;
+
+ if (lower == higher || lower_key == higher_key)
+ return static_cast<Float64>(lower_key);
+
+ return (static_cast<Float64>(higher) - position) * lower_key + (position - static_cast<Float64>(lower)) * higher_key;
+ }
};
-template using FuncQuantileExactWeighted = AggregateFunctionQuantile, NameQuantileExactWeighted, true, void, false, false>;
-template using FuncQuantilesExactWeighted = AggregateFunctionQuantile, NameQuantilesExactWeighted, true, void, true, false>;
+template
+using FuncQuantileExactWeighted = AggregateFunctionQuantile<
+ Value,
+ QuantileExactWeighted,
+ NameQuantileExactWeighted,
+ true,
+ std::conditional_t,
+ false,
+ false>;
+template
+using FuncQuantilesExactWeighted = AggregateFunctionQuantile<
+ Value,
+ QuantileExactWeighted,
+ NameQuantilesExactWeighted,
+ true,
+ std::conditional_t,
+ true,
+ false>;
-template class Function>
+template class Function, bool interpolated>
AggregateFunctionPtr createAggregateFunctionQuantile(
const std::string & name, const DataTypes & argument_types, const Array & params, const Settings *)
{
@@ -224,22 +413,23 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
WhichDataType which(argument_type);
#define DISPATCH(TYPE) \
- if (which.idx == TypeIndex::TYPE) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::TYPE) \
+ return std::make_shared>(argument_types, params);
FOR_BASIC_NUMERIC_TYPES(DISPATCH)
#undef DISPATCH
- if (which.idx == TypeIndex::Date) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::DateTime) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Date) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::DateTime) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::Decimal32) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::Decimal64) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::Decimal128) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::Decimal256) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::DateTime64) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Decimal32) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Decimal64) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Decimal128) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Decimal256) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::DateTime64) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::Int128) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::Int256) return std::make_shared>(argument_types, params);
- if (which.idx == TypeIndex::UInt256) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Int128) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::UInt128) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::Int256) return std::make_shared>(argument_types, params);
+ if (which.idx == TypeIndex::UInt256) return std::make_shared>(argument_types, params);
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument for aggregate function {}",
argument_type->getName(), name);
@@ -252,11 +442,17 @@ void registerAggregateFunctionsQuantileExactWeighted(AggregateFunctionFactory &
/// For aggregate functions returning array we cannot return NULL on empty set.
AggregateFunctionProperties properties = { .returns_default_when_only_null = true };
- factory.registerFunction(NameQuantileExactWeighted::name, createAggregateFunctionQuantile);
- factory.registerFunction(NameQuantilesExactWeighted::name, { createAggregateFunctionQuantile, properties });
+ factory.registerFunction(NameQuantileExactWeighted::name, createAggregateFunctionQuantile);
+ factory.registerFunction(
+ NameQuantilesExactWeighted::name, {createAggregateFunctionQuantile, properties});
+
+ factory.registerFunction(NameQuantileExactWeightedInterpolated::name, createAggregateFunctionQuantile);
+ factory.registerFunction(
+ NameQuantilesExactWeightedInterpolated::name, {createAggregateFunctionQuantile, properties});
/// 'median' is an alias for 'quantile'
factory.registerAlias("medianExactWeighted", NameQuantileExactWeighted::name);
+ factory.registerAlias("medianExactWeightedInterpolated", NameQuantileExactWeightedInterpolated::name);
}
}
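The interpolated path above (sort by value, accumulate weights, locate the two neighbouring ranks, interpolate linearly) can be sketched outside ClickHouse roughly as follows. This is a minimal illustration, not the code from the patch: `weightedQuantileInterpolated` and its `std::vector` of `(value, weight)` pairs are hypothetical stand-ins for the `Map`/`Pair` types, and positive integer weights are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical standalone helper (not part of the patch).
// Each element is (value, weight); weights are assumed to be positive integers.
double weightedQuantileInterpolated(std::vector<std::pair<double, std::uint64_t>> data, double level)
{
    if (data.empty())
        return std::numeric_limits<double>::quiet_NaN();

    // Order by value, then replace each weight with the running total of weights.
    std::sort(data.begin(), data.end(), [](const auto & a, const auto & b) { return a.first < b.first; });
    std::uint64_t total = 0;
    for (auto & item : data)
    {
        total += item.second;
        item.second = total;
    }
    assert(total > 0);

    // Zero-based fractional rank in the expanded (weight-repeated) sequence.
    const double position = static_cast<double>(total - 1) * level;
    const auto lower = static_cast<std::uint64_t>(std::floor(position));
    const auto higher = static_cast<std::uint64_t>(std::ceil(position));

    // First element whose cumulative weight covers the given rank; clamp to the last element.
    auto find_rank = [&](std::uint64_t rank)
    {
        auto it = std::lower_bound(data.begin(), data.end(), rank + 1,
            [](const auto & p, std::uint64_t r) { return p.second < r; });
        return it == data.end() ? std::prev(data.end()) : it;
    };

    const double lower_key = find_rank(lower)->first;
    const double higher_key = find_rank(higher)->first;

    if (lower == higher || lower_key == higher_key)
        return lower_key;

    return (static_cast<double>(higher) - position) * lower_key
        + (position - static_cast<double>(lower)) * higher_key;
}
```

With four values 1, 2, 3, 4 of weight 1 each and level 0.5, the position is 1.5, so the result is halfway between 2 and 3. With weights, a heavy value can occupy both neighbouring ranks, in which case no interpolation happens.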
diff --git a/src/Analyzer/Resolve/QueryAnalyzer.cpp b/src/Analyzer/Resolve/QueryAnalyzer.cpp
index 381edee607d..cb3087af707 100644
--- a/src/Analyzer/Resolve/QueryAnalyzer.cpp
+++ b/src/Analyzer/Resolve/QueryAnalyzer.cpp
@@ -227,8 +227,13 @@ void QueryAnalyzer::resolveConstantExpression(QueryTreeNodePtr & node, const Que
scope.context = context;
auto node_type = node->getNodeType();
+ if (node_type == QueryTreeNodeType::QUERY || node_type == QueryTreeNodeType::UNION)
+ {
+ evaluateScalarSubqueryIfNeeded(node, scope);
+ return;
+ }
- if (table_expression && node_type != QueryTreeNodeType::QUERY && node_type != QueryTreeNodeType::UNION)
+ if (table_expression)
{
scope.expression_join_tree_node = table_expression;
validateTableExpressionModifiers(scope.expression_join_tree_node, scope);
diff --git a/src/Backups/BackupIO_S3.cpp b/src/Backups/BackupIO_S3.cpp
index 4277639fbd0..7dacd8102cc 100644
--- a/src/Backups/BackupIO_S3.cpp
+++ b/src/Backups/BackupIO_S3.cpp
@@ -36,6 +36,24 @@ namespace Setting
extern const SettingsUInt64 s3_max_redirects;
}
+namespace S3AuthSetting
+{
+ extern const S3AuthSettingsString access_key_id;
+ extern const S3AuthSettingsUInt64 expiration_window_seconds;
+ extern const S3AuthSettingsBool no_sign_request;
+ extern const S3AuthSettingsString region;
+ extern const S3AuthSettingsString secret_access_key;
+ extern const S3AuthSettingsString server_side_encryption_customer_key_base64;
+ extern const S3AuthSettingsBool use_environment_credentials;
+ extern const S3AuthSettingsBool use_insecure_imds_request;
+}
+
+namespace S3RequestSetting
+{
+ extern const S3RequestSettingsBool allow_native_copy;
+ extern const S3RequestSettingsString storage_class_name;
+}
+
namespace ErrorCodes
{
extern const int S3_ERROR;
@@ -55,7 +73,7 @@ namespace
HTTPHeaderEntries headers;
if (access_key_id.empty())
{
- credentials = Aws::Auth::AWSCredentials(settings.auth_settings.access_key_id, settings.auth_settings.secret_access_key);
+ credentials = Aws::Auth::AWSCredentials(settings.auth_settings[S3AuthSetting::access_key_id], settings.auth_settings[S3AuthSetting::secret_access_key]);
headers = settings.auth_settings.headers;
}
@@ -64,7 +82,7 @@ namespace
const Settings & local_settings = context->getSettingsRef();
S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration(
- settings.auth_settings.region,
+ settings.auth_settings[S3AuthSetting::region],
context->getRemoteHostFilter(),
static_cast(local_settings[Setting::s3_max_redirects]),
static_cast(local_settings[Setting::backup_restore_s3_retry_attempts]),
@@ -95,15 +113,15 @@ namespace
client_settings,
credentials.GetAWSAccessKeyId(),
credentials.GetAWSSecretKey(),
- settings.auth_settings.server_side_encryption_customer_key_base64,
+ settings.auth_settings[S3AuthSetting::server_side_encryption_customer_key_base64],
settings.auth_settings.server_side_encryption_kms_config,
std::move(headers),
S3::CredentialsConfiguration
{
- settings.auth_settings.use_environment_credentials,
- settings.auth_settings.use_insecure_imds_request,
- settings.auth_settings.expiration_window_seconds,
- settings.auth_settings.no_sign_request
+ settings.auth_settings[S3AuthSetting::use_environment_credentials],
+ settings.auth_settings[S3AuthSetting::use_insecure_imds_request],
+ settings.auth_settings[S3AuthSetting::expiration_window_seconds],
+ settings.auth_settings[S3AuthSetting::no_sign_request]
});
}
@@ -143,7 +161,7 @@ BackupReaderS3::BackupReaderS3(
}
s3_settings.request_settings.updateFromSettings(context_->getSettingsRef(), /* if_changed */true);
- s3_settings.request_settings.allow_native_copy = allow_s3_native_copy;
+ s3_settings.request_settings[S3RequestSetting::allow_native_copy] = allow_s3_native_copy;
client = makeS3Client(s3_uri_, access_key_id_, secret_access_key_, s3_settings, context_);
@@ -242,8 +260,8 @@ BackupWriterS3::BackupWriterS3(
}
s3_settings.request_settings.updateFromSettings(context_->getSettingsRef(), /* if_changed */true);
- s3_settings.request_settings.allow_native_copy = allow_s3_native_copy;
- s3_settings.request_settings.storage_class_name = storage_class_name;
+ s3_settings.request_settings[S3RequestSetting::allow_native_copy] = allow_s3_native_copy;
+ s3_settings.request_settings[S3RequestSetting::storage_class_name] = storage_class_name;
client = makeS3Client(s3_uri_, access_key_id_, secret_access_key_, s3_settings, context_);
if (auto blob_storage_system_log = context_->getBlobStorageLog())
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 39499cc577d..3627d760d4c 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -136,6 +136,7 @@ add_headers_and_sources(dbms Storages/ObjectStorage/HDFS)
add_headers_and_sources(dbms Storages/ObjectStorage/Local)
add_headers_and_sources(dbms Storages/ObjectStorage/DataLakes)
add_headers_and_sources(dbms Common/NamedCollections)
+add_headers_and_sources(dbms Common/Scheduler/Workload)
if (TARGET ch_contrib::amqp_cpp)
add_headers_and_sources(dbms Storages/RabbitMQ)
diff --git a/src/Client/ClientApplicationBase.cpp b/src/Client/ClientApplicationBase.cpp
index d26641fe5f9..f7d2d0035d9 100644
--- a/src/Client/ClientApplicationBase.cpp
+++ b/src/Client/ClientApplicationBase.cpp
@@ -418,7 +418,7 @@ void ClientApplicationBase::init(int argc, char ** argv)
UInt64 max_client_memory_usage_int = parseWithSizeSuffix(max_client_memory_usage.c_str(), max_client_memory_usage.length());
total_memory_tracker.setHardLimit(max_client_memory_usage_int);
- total_memory_tracker.setDescription("(total)");
+ total_memory_tracker.setDescription("Global");
total_memory_tracker.setMetric(CurrentMetrics::MemoryTracking);
}
diff --git a/src/Client/ClientBase.cpp b/src/Client/ClientBase.cpp
index 23aa7e841cb..b6bf637ab44 100644
--- a/src/Client/ClientBase.cpp
+++ b/src/Client/ClientBase.cpp
@@ -470,8 +470,7 @@ void ClientBase::onData(Block & block, ASTPtr parsed_query)
{
if (!need_render_progress && select_into_file && !select_into_file_and_stdout)
error_stream << "\r";
- bool toggle_enabled = getClientConfiguration().getBool("enable-progress-table-toggle", true);
- progress_table.writeTable(*tty_buf, progress_table_toggle_on.load(), toggle_enabled);
+ progress_table.writeTable(*tty_buf, progress_table_toggle_on.load(), progress_table_toggle_enabled);
}
}
@@ -825,6 +824,9 @@ void ClientBase::initTTYBuffer(ProgressOption progress_option, ProgressOption pr
if (!need_render_progress && !need_render_progress_table)
return;
+ progress_table_toggle_enabled = getClientConfiguration().getBool("enable-progress-table-toggle");
+ progress_table_toggle_on = !progress_table_toggle_enabled;
+
/// If need_render_progress and need_render_progress_table are enabled,
/// use ProgressOption that was set for the progress bar for progress table as well.
ProgressOption progress = progress_option ? progress_option : progress_table_option;
@@ -881,7 +883,7 @@ void ClientBase::initTTYBuffer(ProgressOption progress_option, ProgressOption pr
void ClientBase::initKeystrokeInterceptor()
{
- if (is_interactive && need_render_progress_table && getClientConfiguration().getBool("enable-progress-table-toggle", true))
+ if (is_interactive && need_render_progress_table && progress_table_toggle_enabled)
{
keystroke_interceptor = std::make_unique(in_fd, error_stream);
keystroke_interceptor->registerCallback(' ', [this]() { progress_table_toggle_on = !progress_table_toggle_on; });
@@ -1151,6 +1153,7 @@ void ClientBase::receiveResult(ASTPtr parsed_query, Int32 signals_before_stop, b
if (keystroke_interceptor)
{
+ progress_table_toggle_on = false;
try
{
keystroke_interceptor->startIntercept();
@@ -1446,10 +1449,27 @@ void ClientBase::onProfileEvents(Block & block)
/// Flush all buffers.
void ClientBase::resetOutput()
{
+ if (need_render_progress_table && tty_buf)
+ progress_table.clearTableOutput(*tty_buf);
+
/// Order is important: format, compression, file
- if (output_format)
- output_format->finalize();
+ try
+ {
+ if (output_format)
+ output_format->finalize();
+ }
+ catch (...)
+ {
+ /// We need to make sure we continue resetting output_format (will stop threads on parallel output)
+ /// as well as cleaning other output related setup
+ if (!have_error)
+ {
+ client_exception
+ = std::make_unique(getCurrentExceptionMessageAndPattern(print_stack_trace), getCurrentExceptionCode());
+ have_error = true;
+ }
+ }
output_format.reset();
logs_out_stream.reset();
diff --git a/src/Client/ClientBase.h b/src/Client/ClientBase.h
index b06958f1d14..75f09e1d0a2 100644
--- a/src/Client/ClientBase.h
+++ b/src/Client/ClientBase.h
@@ -340,6 +340,7 @@ protected:
ProgressTable progress_table;
bool need_render_progress = true;
bool need_render_progress_table = true;
+ bool progress_table_toggle_enabled = true;
std::atomic_bool progress_table_toggle_on = false;
bool need_render_profile_events = true;
bool written_first_block = false;
diff --git a/src/Client/ProgressTable.cpp b/src/Client/ProgressTable.cpp
index 15da659d3fb..f63935440e4 100644
--- a/src/Client/ProgressTable.cpp
+++ b/src/Client/ProgressTable.cpp
@@ -180,9 +180,12 @@ void writeWithWidth(Out & out, std::string_view s, size_t width)
template <typename Out>
void writeWithWidthStrict(Out & out, std::string_view s, size_t width)
{
- chassert(width != 0);
+ constexpr std::string_view ellipsis = "…";
if (s.size() > width)
- out << s.substr(0, width - 1) << "…";
+ if (width <= ellipsis.size())
+ out << s.substr(0, width);
+ else
+ out << s.substr(0, width - ellipsis.size()) << ellipsis;
else
out << s;
}
@@ -219,7 +222,9 @@ void ProgressTable::writeTable(WriteBufferFromFileDescriptor & message, bool sho
writeWithWidth(message, COLUMN_EVENT_NAME, column_event_name_width);
writeWithWidth(message, COLUMN_VALUE, COLUMN_VALUE_WIDTH);
writeWithWidth(message, COLUMN_PROGRESS, COLUMN_PROGRESS_WIDTH);
- writeWithWidth(message, COLUMN_DOCUMENTATION_NAME, COLUMN_DOCUMENTATION_WIDTH);
+ auto col_doc_width = getColumnDocumentationWidth(terminal_width);
+ if (col_doc_width)
+ writeWithWidth(message, COLUMN_DOCUMENTATION_NAME, col_doc_width);
message << CLEAR_TO_END_OF_LINE;
double elapsed_sec = watch.elapsedSeconds();
@@ -257,9 +262,12 @@ void ProgressTable::writeTable(WriteBufferFromFileDescriptor & message, bool sho
writeWithWidth(message, formatReadableValue(value_type, progress) + "/s", COLUMN_PROGRESS_WIDTH);
- message << setColorForDocumentation();
- const auto * doc = getDocumentation(event_name_to_event.at(name));
- writeWithWidthStrict(message, doc, COLUMN_DOCUMENTATION_WIDTH);
+ if (col_doc_width)
+ {
+ message << setColorForDocumentation();
+ const auto * doc = getDocumentation(event_name_to_event.at(name));
+ writeWithWidthStrict(message, doc, col_doc_width);
+ }
message << RESET_COLOR;
message << CLEAR_TO_END_OF_LINE;
@@ -372,6 +380,14 @@ size_t ProgressTable::tableSize() const
return metrics.empty() ? 0 : metrics.size() + 1;
}
+size_t ProgressTable::getColumnDocumentationWidth(size_t terminal_width) const
+{
+ auto fixed_columns_width = column_event_name_width + COLUMN_VALUE_WIDTH + COLUMN_PROGRESS_WIDTH;
+ if (terminal_width < fixed_columns_width + COLUMN_DOCUMENTATION_MIN_WIDTH)
+ return 0;
+ return terminal_width - fixed_columns_width;
+}
+
ProgressTable::MetricInfo::MetricInfo(ProfileEvents::Type t) : type(t)
{
}
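The two sizing rules in this file (truncate with an ellipsis only when there is room for it, and drop the documentation column when the terminal is too narrow) can be sketched in isolation. The helpers below are hypothetical and return values instead of writing to a buffer; the ellipsis is spelled as explicit UTF-8 bytes, so its `size()` is 3, and truncation is byte-based as in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper mirroring the truncation rule of writeWithWidthStrict.
// Widths are counted in bytes, so a multi-byte character may be cut.
std::string truncateToWidth(std::string_view s, std::size_t width)
{
    constexpr std::string_view ellipsis = "\xE2\x80\xA6"; // U+2026 encoded as UTF-8

    if (s.size() <= width)
        return std::string(s);

    // Not enough room for the ellipsis itself: hard cut.
    if (width <= ellipsis.size())
        return std::string(s.substr(0, width));

    std::string out(s.substr(0, width - ellipsis.size()));
    out += ellipsis;
    return out;
}

// Hypothetical helper mirroring getColumnDocumentationWidth:
// returns 0 when the column does not fit, otherwise the remaining width.
std::size_t documentationWidth(std::size_t terminal_width, std::size_t fixed_columns_width, std::size_t min_width)
{
    if (terminal_width < fixed_columns_width + min_width)
        return 0;
    return terminal_width - fixed_columns_width;
}
```

Returning 0 as a "do not render" signal lets the caller guard both the header and the per-row output with a single check, which is how `col_doc_width` is used above.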
diff --git a/src/Client/ProgressTable.h b/src/Client/ProgressTable.h
index a55326e8d3a..f2563d91217 100644
--- a/src/Client/ProgressTable.h
+++ b/src/Client/ProgressTable.h
@@ -87,6 +87,7 @@ private:
};
size_t tableSize() const;
+ size_t getColumnDocumentationWidth(size_t terminal_width) const;
using MetricName = String;
@@ -110,7 +111,7 @@ private:
static constexpr std::string_view COLUMN_DOCUMENTATION_NAME = "Documentation";
static constexpr size_t COLUMN_VALUE_WIDTH = 20;
static constexpr size_t COLUMN_PROGRESS_WIDTH = 20;
- static constexpr size_t COLUMN_DOCUMENTATION_WIDTH = 100;
+ static constexpr size_t COLUMN_DOCUMENTATION_MIN_WIDTH = COLUMN_DOCUMENTATION_NAME.size();
std::ostream & output_stream;
int in_fd;
diff --git a/src/Columns/ColumnArray.cpp b/src/Columns/ColumnArray.cpp
index 83d4c24c769..0c6d7c4e5c6 100644
--- a/src/Columns/ColumnArray.cpp
+++ b/src/Columns/ColumnArray.cpp
@@ -369,6 +369,23 @@ void ColumnArray::popBack(size_t n)
offsets_data.resize_assume_reserved(offsets_data.size() - n);
}
+ColumnCheckpointPtr ColumnArray::getCheckpoint() const
+{
+ return std::make_shared(size(), getData().getCheckpoint());
+}
+
+void ColumnArray::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ checkpoint.size = size();
+ getData().updateCheckpoint(*assert_cast(checkpoint).nested);
+}
+
+void ColumnArray::rollback(const ColumnCheckpoint & checkpoint)
+{
+ getOffsets().resize_assume_reserved(checkpoint.size);
+ getData().rollback(*assert_cast(checkpoint).nested);
+}
+
int ColumnArray::compareAtImpl(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint, const Collator * collator) const
{
const ColumnArray & rhs = assert_cast(rhs_);
diff --git a/src/Columns/ColumnArray.h b/src/Columns/ColumnArray.h
index f77268a8be6..ec14b096055 100644
--- a/src/Columns/ColumnArray.h
+++ b/src/Columns/ColumnArray.h
@@ -161,6 +161,10 @@ public:
ColumnPtr compress() const override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
+
void forEachSubcolumn(MutableColumnCallback callback) override
{
callback(offsets);
diff --git a/src/Columns/ColumnDynamic.cpp b/src/Columns/ColumnDynamic.cpp
index 4c7c9b53654..41a9096bc0c 100644
--- a/src/Columns/ColumnDynamic.cpp
+++ b/src/Columns/ColumnDynamic.cpp
@@ -1000,6 +1000,56 @@ ColumnPtr ColumnDynamic::compress() const
});
}
+void ColumnDynamic::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ auto & nested = assert_cast(checkpoint).nested;
+ const auto & variants = variant_column_ptr->getVariants();
+
+ size_t old_size = nested.size();
+ chassert(old_size <= variants.size());
+
+ for (size_t i = 0; i < old_size; ++i)
+ {
+ variants[i]->updateCheckpoint(*nested[i]);
+ }
+
+ /// If column has new variants since last checkpoint create checkpoints for them.
+ if (old_size < variants.size())
+ {
+ nested.resize(variants.size());
+ for (size_t i = old_size; i < variants.size(); ++i)
+ nested[i] = variants[i]->getCheckpoint();
+ }
+
+ checkpoint.size = size();
+}
+
+void ColumnDynamic::rollback(const ColumnCheckpoint & checkpoint)
+{
+ const auto & nested = assert_cast(checkpoint).nested;
+ chassert(nested.size() <= variant_column_ptr->getNumVariants());
+
+ /// The structure hasn't changed, so we can use generic rollback of Variant column
+ if (nested.size() == variant_column_ptr->getNumVariants())
+ {
+ variant_column_ptr->rollback(checkpoint);
+ return;
+ }
+
+ /// Manually rollback internals of Variant column
+ variant_column_ptr->getOffsets().resize_assume_reserved(checkpoint.size);
+ variant_column_ptr->getLocalDiscriminators().resize_assume_reserved(checkpoint.size);
+
+ auto & variants = variant_column_ptr->getVariants();
+ for (size_t i = 0; i < nested.size(); ++i)
+ variants[i]->rollback(*nested[i]);
+
+ /// Keep the structure of variant as is but rollback
+ /// to 0 variants that are not in the checkpoint.
+ for (size_t i = nested.size(); i < variants.size(); ++i)
+ variants[i] = variants[i]->cloneEmpty();
+}
+
String ColumnDynamic::getTypeNameAt(size_t row_num) const
{
const auto & variant_col = getVariantColumn();
diff --git a/src/Columns/ColumnDynamic.h b/src/Columns/ColumnDynamic.h
index 17b0d80e5eb..57a1545a832 100644
--- a/src/Columns/ColumnDynamic.h
+++ b/src/Columns/ColumnDynamic.h
@@ -304,6 +304,15 @@ public:
variant_column_ptr->protect();
}
+ ColumnCheckpointPtr getCheckpoint() const override
+ {
+ return variant_column_ptr->getCheckpoint();
+ }
+
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+
+ void rollback(const ColumnCheckpoint & checkpoint) override;
+
void forEachSubcolumn(MutableColumnCallback callback) override
{
callback(variant_column);
diff --git a/src/Columns/ColumnMap.cpp b/src/Columns/ColumnMap.cpp
index 536da4d06d0..7ebbed930d8 100644
--- a/src/Columns/ColumnMap.cpp
+++ b/src/Columns/ColumnMap.cpp
@@ -312,6 +312,21 @@ void ColumnMap::getExtremes(Field & min, Field & max) const
max = std::move(map_max_value);
}
+ColumnCheckpointPtr ColumnMap::getCheckpoint() const
+{
+ return nested->getCheckpoint();
+}
+
+void ColumnMap::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ nested->updateCheckpoint(checkpoint);
+}
+
+void ColumnMap::rollback(const ColumnCheckpoint & checkpoint)
+{
+ nested->rollback(checkpoint);
+}
+
void ColumnMap::forEachSubcolumn(MutableColumnCallback callback)
{
callback(nested);
diff --git a/src/Columns/ColumnMap.h b/src/Columns/ColumnMap.h
index 39d15a586b9..575114f8d3a 100644
--- a/src/Columns/ColumnMap.h
+++ b/src/Columns/ColumnMap.h
@@ -102,6 +102,9 @@ public:
size_t byteSizeAt(size_t n) const override;
size_t allocatedBytes() const override;
void protect() override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
void forEachSubcolumn(MutableColumnCallback callback) override;
void forEachSubcolumnRecursively(RecursiveMutableColumnCallback callback) override;
bool structureEquals(const IColumn & rhs) const override;
diff --git a/src/Columns/ColumnNullable.cpp b/src/Columns/ColumnNullable.cpp
index 390df390ae6..6e8bd3fc70c 100644
--- a/src/Columns/ColumnNullable.cpp
+++ b/src/Columns/ColumnNullable.cpp
@@ -302,6 +302,23 @@ void ColumnNullable::popBack(size_t n)
getNullMapColumn().popBack(n);
}
+ColumnCheckpointPtr ColumnNullable::getCheckpoint() const
+{
+ return std::make_shared(size(), nested_column->getCheckpoint());
+}
+
+void ColumnNullable::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ checkpoint.size = size();
+ nested_column->updateCheckpoint(*assert_cast(checkpoint).nested);
+}
+
+void ColumnNullable::rollback(const ColumnCheckpoint & checkpoint)
+{
+ getNullMapData().resize_assume_reserved(checkpoint.size);
+ nested_column->rollback(*assert_cast(checkpoint).nested);
+}
+
ColumnPtr ColumnNullable::filter(const Filter & filt, ssize_t result_size_hint) const
{
ColumnPtr filtered_data = getNestedColumn().filter(filt, result_size_hint);
diff --git a/src/Columns/ColumnNullable.h b/src/Columns/ColumnNullable.h
index 78274baca51..32ce66c5965 100644
--- a/src/Columns/ColumnNullable.h
+++ b/src/Columns/ColumnNullable.h
@@ -143,6 +143,10 @@ public:
ColumnPtr compress() const override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
+
void forEachSubcolumn(MutableColumnCallback callback) override
{
callback(nested_column);
diff --git a/src/Columns/ColumnObject.cpp b/src/Columns/ColumnObject.cpp
index b962887f7b5..18ba8ed36ee 100644
--- a/src/Columns/ColumnObject.cpp
+++ b/src/Columns/ColumnObject.cpp
@@ -30,6 +30,23 @@ const std::shared_ptr & getDynamicSerialization()
return dynamic_serialization;
}
+struct ColumnObjectCheckpoint : public ColumnCheckpoint
+{
+ using CheckpointsMap = std::unordered_map;
+
+ ColumnObjectCheckpoint(size_t size_, CheckpointsMap typed_paths_, CheckpointsMap dynamic_paths_, ColumnCheckpointPtr shared_data_)
+ : ColumnCheckpoint(size_)
+ , typed_paths(std::move(typed_paths_))
+ , dynamic_paths(std::move(dynamic_paths_))
+ , shared_data(std::move(shared_data_))
+ {
+ }
+
+ CheckpointsMap typed_paths;
+ CheckpointsMap dynamic_paths;
+ ColumnCheckpointPtr shared_data;
+};
+
}
ColumnObject::ColumnObject(
@@ -698,6 +715,69 @@ void ColumnObject::popBack(size_t n)
shared_data->popBack(n);
}
+ColumnCheckpointPtr ColumnObject::getCheckpoint() const
+{
+ auto get_checkpoints = [](const auto & columns)
+ {
+ ColumnObjectCheckpoint::CheckpointsMap checkpoints;
+ for (const auto & [name, column] : columns)
+ checkpoints[name] = column->getCheckpoint();
+
+ return checkpoints;
+ };
+
+ return std::make_shared<ColumnObjectCheckpoint>(size(), get_checkpoints(typed_paths), get_checkpoints(dynamic_paths_ptrs), shared_data->getCheckpoint());
+}
+
+void ColumnObject::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ auto & object_checkpoint = assert_cast<ColumnObjectCheckpoint &>(checkpoint);
+
+ auto update_checkpoints = [&](const auto & columns_map, auto & checkpoints_map)
+ {
+ for (const auto & [name, column] : columns_map)
+ {
+ auto & nested = checkpoints_map[name];
+ if (!nested)
+ nested = column->getCheckpoint();
+ else
+ column->updateCheckpoint(*nested);
+ }
+ };
+
+ checkpoint.size = size();
+ update_checkpoints(typed_paths, object_checkpoint.typed_paths);
+ update_checkpoints(dynamic_paths, object_checkpoint.dynamic_paths);
+ shared_data->updateCheckpoint(*object_checkpoint.shared_data);
+}
+
+void ColumnObject::rollback(const ColumnCheckpoint & checkpoint)
+{
+ const auto & object_checkpoint = assert_cast<const ColumnObjectCheckpoint &>(checkpoint);
+
+ auto rollback_columns = [&](auto & columns_map, const auto & checkpoints_map)
+ {
+ NameSet names_to_remove;
+
+ /// Rollback subcolumns and remove paths that were not in checkpoint.
+ for (auto & [name, column] : columns_map)
+ {
+ auto it = checkpoints_map.find(name);
+ if (it == checkpoints_map.end())
+ names_to_remove.insert(name);
+ else
+ column->rollback(*it->second);
+ }
+
+ for (const auto & name : names_to_remove)
+ columns_map.erase(name);
+ };
+
+ rollback_columns(typed_paths, object_checkpoint.typed_paths);
+ rollback_columns(dynamic_paths, object_checkpoint.dynamic_paths);
+ shared_data->rollback(*object_checkpoint.shared_data);
+}
+
StringRef ColumnObject::serializeValueIntoArena(size_t n, Arena & arena, const char *& begin) const
{
StringRef res(begin, 0);
diff --git a/src/Columns/ColumnObject.h b/src/Columns/ColumnObject.h
index a2f9964a243..74ae7e136ce 100644
--- a/src/Columns/ColumnObject.h
+++ b/src/Columns/ColumnObject.h
@@ -161,6 +161,9 @@ public:
size_t byteSizeAt(size_t n) const override;
size_t allocatedBytes() const override;
void protect() override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
void forEachSubcolumn(MutableColumnCallback callback) override;
diff --git a/src/Columns/ColumnSparse.cpp b/src/Columns/ColumnSparse.cpp
index a908d970a15..a0e47e65fc6 100644
--- a/src/Columns/ColumnSparse.cpp
+++ b/src/Columns/ColumnSparse.cpp
@@ -308,6 +308,28 @@ void ColumnSparse::popBack(size_t n)
_size = new_size;
}
+ColumnCheckpointPtr ColumnSparse::getCheckpoint() const
+{
+ return std::make_shared(size(), values->getCheckpoint());
+}
+
+void ColumnSparse::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ checkpoint.size = size();
+ values->updateCheckpoint(*assert_cast(checkpoint).nested);
+}
+
+void ColumnSparse::rollback(const ColumnCheckpoint & checkpoint)
+{
+ _size = checkpoint.size;
+
+ const auto & nested = *assert_cast(checkpoint).nested;
+ chassert(nested.size > 0);
+
+ values->rollback(nested);
+ getOffsetsData().resize_assume_reserved(nested.size - 1);
+}
+
ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const
{
if (_size != filt.size())
diff --git a/src/Columns/ColumnSparse.h b/src/Columns/ColumnSparse.h
index 7a4d914e62a..619dce63c1e 100644
--- a/src/Columns/ColumnSparse.h
+++ b/src/Columns/ColumnSparse.h
@@ -149,6 +149,10 @@ public:
ColumnPtr compress() const override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
+
void forEachSubcolumn(MutableColumnCallback callback) override;
void forEachSubcolumnRecursively(RecursiveMutableColumnCallback callback) override;
diff --git a/src/Columns/ColumnString.cpp b/src/Columns/ColumnString.cpp
index 00cf3bd9c30..269c20397b4 100644
--- a/src/Columns/ColumnString.cpp
+++ b/src/Columns/ColumnString.cpp
@@ -240,6 +240,23 @@ ColumnPtr ColumnString::permute(const Permutation & perm, size_t limit) const
return permuteImpl(*this, perm, limit);
}
+ColumnCheckpointPtr ColumnString::getCheckpoint() const
+{
+ auto nested = std::make_shared<ColumnCheckpoint>(chars.size());
+ return std::make_shared<ColumnCheckpointWithNested>(size(), std::move(nested));
+}
+
+void ColumnString::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ checkpoint.size = size();
+ assert_cast<ColumnCheckpointWithNested &>(checkpoint).nested->size = chars.size();
+}
+
+void ColumnString::rollback(const ColumnCheckpoint & checkpoint)
+{
+ offsets.resize_assume_reserved(checkpoint.size);
+ chars.resize_assume_reserved(assert_cast<const ColumnCheckpointWithNested &>(checkpoint).nested->size);
+}
void ColumnString::collectSerializedValueSizes(PaddedPODArray & sizes, const UInt8 * is_null) const
{
diff --git a/src/Columns/ColumnString.h b/src/Columns/ColumnString.h
index ec0563b3f00..c2371412437 100644
--- a/src/Columns/ColumnString.h
+++ b/src/Columns/ColumnString.h
@@ -194,6 +194,10 @@ public:
offsets.resize_assume_reserved(offsets.size() - n);
}
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
+
void collectSerializedValueSizes(PaddedPODArray & sizes, const UInt8 * is_null) const override;
StringRef serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const override;
diff --git a/src/Columns/ColumnTuple.cpp b/src/Columns/ColumnTuple.cpp
index 1cb6d0b60d8..c3f7d10f650 100644
--- a/src/Columns/ColumnTuple.cpp
+++ b/src/Columns/ColumnTuple.cpp
@@ -254,6 +254,37 @@ void ColumnTuple::popBack(size_t n)
column->popBack(n);
}
+ColumnCheckpointPtr ColumnTuple::getCheckpoint() const
+{
+ ColumnCheckpoints checkpoints;
+ checkpoints.reserve(columns.size());
+
+ for (const auto & column : columns)
+ checkpoints.push_back(column->getCheckpoint());
+
+ return std::make_shared<ColumnCheckpointWithMultipleNested>(size(), std::move(checkpoints));
+}
+
+void ColumnTuple::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ auto & checkpoints = assert_cast<ColumnCheckpointWithMultipleNested &>(checkpoint).nested;
+ chassert(checkpoints.size() == columns.size());
+
+ checkpoint.size = size();
+ for (size_t i = 0; i < columns.size(); ++i)
+ columns[i]->updateCheckpoint(*checkpoints[i]);
+}
+
+void ColumnTuple::rollback(const ColumnCheckpoint & checkpoint)
+{
+ column_length = checkpoint.size;
+ const auto & checkpoints = assert_cast<const ColumnCheckpointWithMultipleNested &>(checkpoint).nested;
+
+ chassert(columns.size() == checkpoints.size());
+ for (size_t i = 0; i < columns.size(); ++i)
+ columns[i]->rollback(*checkpoints[i]);
+}
+
StringRef ColumnTuple::serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const
{
if (columns.empty())
diff --git a/src/Columns/ColumnTuple.h b/src/Columns/ColumnTuple.h
index 6968294aef9..c73f90f13d9 100644
--- a/src/Columns/ColumnTuple.h
+++ b/src/Columns/ColumnTuple.h
@@ -118,6 +118,9 @@ public:
size_t byteSizeAt(size_t n) const override;
size_t allocatedBytes() const override;
void protect() override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
void forEachSubcolumn(MutableColumnCallback callback) override;
void forEachSubcolumnRecursively(RecursiveMutableColumnCallback callback) override;
bool structureEquals(const IColumn & rhs) const override;
diff --git a/src/Columns/ColumnVariant.cpp b/src/Columns/ColumnVariant.cpp
index 8d7de94a319..564b60e1c1d 100644
--- a/src/Columns/ColumnVariant.cpp
+++ b/src/Columns/ColumnVariant.cpp
@@ -739,6 +739,39 @@ void ColumnVariant::popBack(size_t n)
offsets->popBack(n);
}
+ColumnCheckpointPtr ColumnVariant::getCheckpoint() const
+{
+ ColumnCheckpoints checkpoints;
+ checkpoints.reserve(variants.size());
+
+ for (const auto & column : variants)
+ checkpoints.push_back(column->getCheckpoint());
+
+ return std::make_shared<ColumnCheckpointWithMultipleNested>(size(), std::move(checkpoints));
+}
+
+void ColumnVariant::updateCheckpoint(ColumnCheckpoint & checkpoint) const
+{
+ auto & checkpoints = assert_cast<ColumnCheckpointWithMultipleNested &>(checkpoint).nested;
+ chassert(checkpoints.size() == variants.size());
+
+ checkpoint.size = size();
+ for (size_t i = 0; i < variants.size(); ++i)
+ variants[i]->updateCheckpoint(*checkpoints[i]);
+}
+
+void ColumnVariant::rollback(const ColumnCheckpoint & checkpoint)
+{
+ getOffsets().resize_assume_reserved(checkpoint.size);
+ getLocalDiscriminators().resize_assume_reserved(checkpoint.size);
+
+ const auto & checkpoints = assert_cast<const ColumnCheckpointWithMultipleNested &>(checkpoint).nested;
+ chassert(variants.size() == checkpoints.size());
+
+ for (size_t i = 0; i < variants.size(); ++i)
+ variants[i]->rollback(*checkpoints[i]);
+}
+
StringRef ColumnVariant::serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const
{
/// During any serialization/deserialization we should always use global discriminators.
diff --git a/src/Columns/ColumnVariant.h b/src/Columns/ColumnVariant.h
index a6c45ee27b8..f90a812703d 100644
--- a/src/Columns/ColumnVariant.h
+++ b/src/Columns/ColumnVariant.h
@@ -248,6 +248,9 @@ public:
size_t byteSizeAt(size_t n) const override;
size_t allocatedBytes() const override;
void protect() override;
+ ColumnCheckpointPtr getCheckpoint() const override;
+ void updateCheckpoint(ColumnCheckpoint & checkpoint) const override;
+ void rollback(const ColumnCheckpoint & checkpoint) override;
void forEachSubcolumn(MutableColumnCallback callback) override;
void forEachSubcolumnRecursively(RecursiveMutableColumnCallback callback) override;
bool structureEquals(const IColumn & rhs) const override;
diff --git a/src/Columns/IColumn.h b/src/Columns/IColumn.h
index e4fe233ffdf..95becba3fdb 100644
--- a/src/Columns/IColumn.h
+++ b/src/Columns/IColumn.h
@@ -49,6 +49,40 @@ struct EqualRange
using EqualRanges = std::vector<EqualRange>;
+/// A checkpoint that contains size of column and all its subcolumns.
+/// It can be used to rollback column to the previous state, for example
+/// after failed parsing when column may be in inconsistent state.
+struct ColumnCheckpoint
+{
+ size_t size;
+
+ explicit ColumnCheckpoint(size_t size_) : size(size_) {}
+ virtual ~ColumnCheckpoint() = default;
+};
+
+using ColumnCheckpointPtr = std::shared_ptr<ColumnCheckpoint>;
+using ColumnCheckpoints = std::vector<ColumnCheckpointPtr>;
+
+struct ColumnCheckpointWithNested : public ColumnCheckpoint
+{
+ ColumnCheckpointWithNested(size_t size_, ColumnCheckpointPtr nested_)
+ : ColumnCheckpoint(size_), nested(std::move(nested_))
+ {
+ }
+
+ ColumnCheckpointPtr nested;
+};
+
+struct ColumnCheckpointWithMultipleNested : public ColumnCheckpoint
+{
+ ColumnCheckpointWithMultipleNested(size_t size_, ColumnCheckpoints nested_)
+ : ColumnCheckpoint(size_), nested(std::move(nested_))
+ {
+ }
+
+ ColumnCheckpoints nested;
+};
+
/// Declares interface to store columns in memory.
class IColumn : public COW<IColumn>
{
@@ -509,6 +543,17 @@ public:
/// The operation is slow and performed only for debug builds.
virtual void protect() {}
+ /// Returns checkpoint of current state of column.
+ virtual ColumnCheckpointPtr getCheckpoint() const { return std::make_shared<ColumnCheckpoint>(size()); }
+
+ /// Updates the checkpoint with current state. It is used to avoid extra allocations in 'getCheckpoint'.
+ virtual void updateCheckpoint(ColumnCheckpoint & checkpoint) const { checkpoint.size = size(); }
+
+ /// Rollbacks column to the checkpoint.
+ /// Unlike 'popBack' this method should work correctly even if column has invalid state.
+ /// Sizes of columns in checkpoint must be less or equal than current size.
+ virtual void rollback(const ColumnCheckpoint & checkpoint) { popBack(size() - checkpoint.size); }
+
/// If the column contains subcolumns (such as Array, Nullable, etc), do callback on them.
/// Shallow: doesn't do recursive calls; don't do call for itself.
diff --git a/src/Columns/tests/gtest_column_dynamic.cpp b/src/Columns/tests/gtest_column_dynamic.cpp
index de76261229d..9a435a97a07 100644
--- a/src/Columns/tests/gtest_column_dynamic.cpp
+++ b/src/Columns/tests/gtest_column_dynamic.cpp
@@ -920,3 +920,71 @@ TEST(ColumnDynamic, compare)
ASSERT_EQ(column_from->compareAt(3, 2, *column_from, -1), -1);
ASSERT_EQ(column_from->compareAt(3, 4, *column_from, -1), -1);
}
+
+TEST(ColumnDynamic, rollback)
+{
+ auto check_variant = [](const ColumnVariant & column_variant, std::vector<size_t> sizes)
+ {
+ ASSERT_EQ(column_variant.getNumVariants(), sizes.size());
+ size_t num_rows = 0;
+
+ for (size_t i = 0; i < sizes.size(); ++i)
+ {
+ ASSERT_EQ(column_variant.getVariants()[i]->size(), sizes[i]);
+ num_rows += sizes[i];
+ }
+
+ ASSERT_EQ(num_rows, column_variant.size());
+ };
+
+ auto check_checkpoint = [&](const ColumnCheckpoint & cp, std::vector<size_t> sizes)
+ {
+ const auto & nested = assert_cast<const ColumnCheckpointWithMultipleNested &>(cp).nested;
+ size_t num_rows = 0;
+
+ for (size_t i = 0; i < nested.size(); ++i)
+ {
+ ASSERT_EQ(nested[i]->size, sizes[i]);
+ num_rows += sizes[i];
+ }
+
+ ASSERT_EQ(num_rows, cp.size);
+ };
+
+ std::vector<std::pair<ColumnCheckpointPtr, std::vector<size_t>>> checkpoints;
+
+ auto column = ColumnDynamic::create(2);
+ auto checkpoint = column->getCheckpoint();
+
+ column->insert(Field(42));
+
+ column->updateCheckpoint(*checkpoint);
+ checkpoints.emplace_back(checkpoint, std::vector<size_t>{0, 1, 0});
+
+ column->insert(Field("str1"));
+ column->rollback(*checkpoint);
+
+ check_checkpoint(*checkpoint, checkpoints.back().second);
+ check_variant(column->getVariantColumn(), checkpoints.back().second);
+
+ column->insert("str1");
+ checkpoints.emplace_back(column->getCheckpoint(), std::vector<size_t>{0, 1, 1});
+
+ column->insert("str2");
+ checkpoints.emplace_back(column->getCheckpoint(), std::vector<size_t>{0, 1, 2});
+
+ column->insert(Array({1, 2}));
+ checkpoints.emplace_back(column->getCheckpoint(), std::vector<size_t>{1, 1, 2});
+
+ column->insert(Field(42.42));
+ checkpoints.emplace_back(column->getCheckpoint(), std::vector<size_t>{2, 1, 2});
+
+ for (const auto & [cp, sizes] : checkpoints)
+ {
+ auto column_copy = column->clone();
+ column_copy->rollback(*cp);
+
+ check_checkpoint(*cp, sizes);
+ check_variant(assert_cast<const ColumnDynamic &>(*column_copy).getVariantColumn(), sizes);
+ }
+}
diff --git a/src/Columns/tests/gtest_column_object.cpp b/src/Columns/tests/gtest_column_object.cpp
index f6a1da64ba3..a20bd26fabd 100644
--- a/src/Columns/tests/gtest_column_object.cpp
+++ b/src/Columns/tests/gtest_column_object.cpp
@@ -5,6 +5,7 @@
#include
#include
+#include "Core/Field.h"
#include
using namespace DB;
@@ -349,3 +350,65 @@ TEST(ColumnObject, SkipSerializedInArena)
pos = col2->skipSerializedInArena(pos);
ASSERT_EQ(pos, end);
}
+
+TEST(ColumnObject, rollback)
+{
+ auto type = DataTypeFactory::instance().get("JSON(max_dynamic_types=10, max_dynamic_paths=2, a.a UInt32, a.b UInt32)");
+ auto col = type->createColumn();
+ auto & col_object = assert_cast<ColumnObject &>(*col);
+ const auto & typed_paths = col_object.getTypedPaths();
+ const auto & dynamic_paths = col_object.getDynamicPaths();
+ const auto & shared_data = col_object.getSharedDataColumn();
+
+ auto assert_sizes = [&](size_t size)
+ {
+ for (const auto & [name, column] : typed_paths)
+ ASSERT_EQ(column->size(), size);
+
+ for (const auto & [name, column] : dynamic_paths)
+ ASSERT_EQ(column->size(), size);
+
+ ASSERT_EQ(shared_data.size(), size);
+ };
+
+ auto checkpoint = col_object.getCheckpoint();
+
+ col_object.insert(Object{{"a.a", Field{1u}}});
+ col_object.updateCheckpoint(*checkpoint);
+
+ col_object.insert(Object{{"a.b", Field{2u}}});
+ col_object.insert(Object{{"a.a", Field{3u}}});
+
+ col_object.rollback(*checkpoint);
+
+ assert_sizes(1);
+ ASSERT_EQ(typed_paths.size(), 2);
+ ASSERT_EQ(dynamic_paths.size(), 0);
+
+ ASSERT_EQ((*typed_paths.at("a.a"))[0], Field{1u});
+ ASSERT_EQ((*typed_paths.at("a.b"))[0], Field{0u});
+
+ col_object.insert(Object{{"a.c", Field{"ccc"}}});
+
+ checkpoint = col_object.getCheckpoint();
+
+ col_object.insert(Object{{"a.d", Field{"ddd"}}});
+ col_object.insert(Object{{"a.e", Field{"eee"}}});
+
+ assert_sizes(4);
+ ASSERT_EQ(typed_paths.size(), 2);
+ ASSERT_EQ(dynamic_paths.size(), 2);
+
+ ASSERT_EQ((*typed_paths.at("a.a"))[0], Field{1u});
+ ASSERT_EQ((*dynamic_paths.at("a.c"))[1], Field{"ccc"});
+ ASSERT_EQ((*dynamic_paths.at("a.d"))[2], Field{"ddd"});
+
+ col_object.rollback(*checkpoint);
+
+ assert_sizes(2);
+ ASSERT_EQ(typed_paths.size(), 2);
+ ASSERT_EQ(dynamic_paths.size(), 1);
+
+ ASSERT_EQ((*typed_paths.at("a.a"))[0], Field{1u});
+ ASSERT_EQ((*dynamic_paths.at("a.c"))[1], Field{"ccc"});
+}
diff --git a/src/Common/CurrentMetrics.cpp b/src/Common/CurrentMetrics.cpp
index e9d5e07c914..0c850fd4d36 100644
--- a/src/Common/CurrentMetrics.cpp
+++ b/src/Common/CurrentMetrics.cpp
@@ -27,8 +27,8 @@
M(BackgroundBufferFlushSchedulePoolSize, "Limit on number of tasks in BackgroundBufferFlushSchedulePool") \
M(BackgroundDistributedSchedulePoolTask, "Number of active tasks in BackgroundDistributedSchedulePool. This pool is used for distributed sends that is done in background.") \
M(BackgroundDistributedSchedulePoolSize, "Limit on number of tasks in BackgroundDistributedSchedulePool") \
- M(BackgroundMessageBrokerSchedulePoolTask, "Number of active tasks in BackgroundProcessingPool for message streaming") \
- M(BackgroundMessageBrokerSchedulePoolSize, "Limit on number of tasks in BackgroundProcessingPool for message streaming") \
+ M(BackgroundMessageBrokerSchedulePoolTask, "Number of active tasks in BackgroundMessageBrokerSchedulePool for message streaming") \
+ M(BackgroundMessageBrokerSchedulePoolSize, "Limit on number of tasks in BackgroundMessageBrokerSchedulePool for message streaming") \
M(CacheDictionaryUpdateQueueBatches, "Number of 'batches' (a set of keys) in update queue in CacheDictionaries.") \
M(CacheDictionaryUpdateQueueKeys, "Exact number of keys in update queue in CacheDictionaries.") \
M(DiskSpaceReservedForMerge, "Disk space reserved for currently running background merges. It is slightly more than the total size of currently merging parts.") \
@@ -183,8 +183,14 @@
M(BuildVectorSimilarityIndexThreadsScheduled, "Number of queued or active jobs in the build vector similarity index thread pool.") \
\
M(DiskPlainRewritableAzureDirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for AzureObjectStorage.") \
+ M(DiskPlainRewritableAzureFileCount, "Number of file entries in the 'plain_rewritable' in-memory map for AzureObjectStorage.") \
+ M(DiskPlainRewritableAzureUniqueFileNamesCount, "Number of unique file name entries in the 'plain_rewritable' in-memory map for AzureObjectStorage.") \
M(DiskPlainRewritableLocalDirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for LocalObjectStorage.") \
+ M(DiskPlainRewritableLocalFileCount, "Number of file entries in the 'plain_rewritable' in-memory map for LocalObjectStorage.") \
+ M(DiskPlainRewritableLocalUniqueFileNamesCount, "Number of unique file name entries in the 'plain_rewritable' in-memory map for LocalObjectStorage.") \
M(DiskPlainRewritableS3DirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for S3ObjectStorage.") \
+ M(DiskPlainRewritableS3FileCount, "Number of file entries in the 'plain_rewritable' in-memory map for S3ObjectStorage.") \
+ M(DiskPlainRewritableS3UniqueFileNamesCount, "Number of unique file name entries in the 'plain_rewritable' in-memory map for S3ObjectStorage.") \
\
M(MergeTreePartsLoaderThreads, "Number of threads in the MergeTree parts loader thread pool.") \
M(MergeTreePartsLoaderThreadsActive, "Number of threads in the MergeTree parts loader thread pool running a task.") \
diff --git a/src/Common/CurrentMetrics.h b/src/Common/CurrentMetrics.h
index 2c64fd29bbb..1c0de91a0bf 100644
--- a/src/Common/CurrentMetrics.h
+++ b/src/Common/CurrentMetrics.h
@@ -1,7 +1,6 @@
#pragma once
#include
-#include
#include
#include
#include
diff --git a/src/Common/GWPAsan.cpp b/src/Common/GWPAsan.cpp
index de6991191ea..a210fb3a73a 100644
--- a/src/Common/GWPAsan.cpp
+++ b/src/Common/GWPAsan.cpp
@@ -57,7 +57,7 @@ static bool guarded_alloc_initialized = []
opts.MaxSimultaneousAllocations = 1024;
if (!env_options_raw || !std::string_view{env_options_raw}.contains("SampleRate"))
- opts.SampleRate = 10000;
+ opts.SampleRate = 0;
const char * collect_stacktraces = std::getenv("GWP_ASAN_COLLECT_STACKTRACES"); // NOLINT(concurrency-mt-unsafe)
if (collect_stacktraces && std::string_view{collect_stacktraces} == "1")
diff --git a/src/Common/GWPAsan.h b/src/Common/GWPAsan.h
index 846c3417db4..c01a1130739 100644
--- a/src/Common/GWPAsan.h
+++ b/src/Common/GWPAsan.h
@@ -8,7 +8,6 @@
#include
#include
-#include
namespace GWPAsan
{
@@ -39,14 +38,6 @@ inline bool shouldSample()
return init_finished.load(std::memory_order_relaxed) && GuardedAlloc.shouldSample();
}
-inline bool shouldForceSample()
-{
- if (!init_finished.load(std::memory_order_relaxed))
- return false;
- std::bernoulli_distribution dist(force_sample_probability.load(std::memory_order_relaxed));
- return dist(thread_local_rng);
-}
-
}
#endif
diff --git a/src/Common/LockGuard.h b/src/Common/LockGuard.h
new file mode 100644
index 00000000000..8a98c5f553a
--- /dev/null
+++ b/src/Common/LockGuard.h
@@ -0,0 +1,37 @@
+#pragma once
+
+#include
+#include
+
+namespace DB
+{
+
+/** LockGuard provides RAII-style locking mechanism for a mutex.
+ ** It's intended to be used like std::unique_ptr but with TSA annotations
+ */
+template <typename Mutex>
+class TSA_SCOPED_LOCKABLE LockGuard
+{
+public:
+ explicit LockGuard(Mutex & mutex_) TSA_ACQUIRE(mutex_) : mutex(mutex_) { mutex.lock(); }
+ ~LockGuard() TSA_RELEASE() { mutex.unlock(); }
+
+private:
+ Mutex & mutex;
+};
+
+template <template <typename> typename TLockGuard, typename Mutex>
+class TSA_SCOPED_LOCKABLE LockAndOverCommitTrackerBlocker
+{
+public:
+ explicit LockAndOverCommitTrackerBlocker(Mutex & mutex_) TSA_ACQUIRE(mutex_) : lock(TLockGuard(mutex_)) {}
+ ~LockAndOverCommitTrackerBlocker() TSA_RELEASE() = default;
+
+ TLockGuard<Mutex> & getUnderlyingLock() { return lock; }
+
+private:
+ TLockGuard<Mutex> lock;
+ OvercommitTrackerBlockerInThread blocker = {};
+};
+
+}
diff --git a/src/Common/MemoryTracker.cpp b/src/Common/MemoryTracker.cpp
index 3ed943f217d..f4af019605e 100644
--- a/src/Common/MemoryTracker.cpp
+++ b/src/Common/MemoryTracker.cpp
@@ -68,15 +68,15 @@ inline std::string_view toDescription(OvercommitResult result)
case OvercommitResult::NONE:
return "";
case OvercommitResult::DISABLED:
- return "Memory overcommit isn't used. Waiting time or overcommit denominator are set to zero.";
+ return "Memory overcommit isn't used. Waiting time or overcommit denominator are set to zero";
case OvercommitResult::MEMORY_FREED:
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "OvercommitResult::MEMORY_FREED shouldn't be asked for description");
case OvercommitResult::SELECTED:
- return "Query was selected to stop by OvercommitTracker.";
+ return "Query was selected to stop by OvercommitTracker";
case OvercommitResult::TIMEOUTED:
- return "Waiting timeout for memory to be freed is reached.";
+ return "Waiting timeout for memory to be freed is reached";
case OvercommitResult::NOT_ENOUGH_FREED:
- return "Memory overcommit has freed not enough memory.";
+ return "Memory overcommit has not freed enough memory";
}
}
@@ -150,15 +150,23 @@ void MemoryTracker::logPeakMemoryUsage()
auto peak_bytes = peak.load(std::memory_order::relaxed);
if (peak_bytes < 128 * 1024)
return;
- LOG_DEBUG(getLogger("MemoryTracker"),
- "Peak memory usage{}: {}.", (description ? " " + std::string(description) : ""), ReadableSize(peak_bytes));
+ LOG_DEBUG(
+ getLogger("MemoryTracker"),
+ "{}{} memory usage: {}.",
+ description ? std::string(description) : "",
+ description ? " peak" : "Peak",
+ ReadableSize(peak_bytes));
}
void MemoryTracker::logMemoryUsage(Int64 current) const
{
const auto * description = description_ptr.load(std::memory_order_relaxed);
- LOG_DEBUG(getLogger("MemoryTracker"),
- "Current memory usage{}: {}.", (description ? " " + std::string(description) : ""), ReadableSize(current));
+ LOG_DEBUG(
+ getLogger("MemoryTracker"),
+ "{}{} memory usage: {}.",
+ description ? std::string(description) : "",
+ description ? " current" : "Current",
+ ReadableSize(current));
}
void MemoryTracker::injectFault() const
@@ -178,9 +186,9 @@ void MemoryTracker::injectFault() const
const auto * description = description_ptr.load(std::memory_order_relaxed);
throw DB::Exception(
DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED,
- "Memory tracker{}{}: fault injected (at specific point)",
- description ? " " : "",
- description ? description : "");
+ "{}{}: fault injected (at specific point)",
+ description ? description : "",
+ description ? " memory tracker" : "Memory tracker");
}
void MemoryTracker::debugLogBigAllocationWithoutCheck(Int64 size [[maybe_unused]])
@@ -282,9 +290,9 @@ AllocationTrace MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceed
const auto * description = description_ptr.load(std::memory_order_relaxed);
throw DB::Exception(
DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED,
- "Memory tracker{}{}: fault injected. Would use {} (attempt to allocate chunk of {} bytes), maximum: {}",
- description ? " " : "",
+ "{}{}: fault injected. Would use {} (attempt to allocate chunk of {} bytes), maximum: {}",
description ? description : "",
+ description ? " memory tracker" : "Memory tracker",
formatReadableSizeWithBinarySuffix(will_be),
size,
formatReadableSizeWithBinarySuffix(current_hard_limit));
@@ -305,6 +313,8 @@ AllocationTrace MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceed
if (overcommit_result != OvercommitResult::MEMORY_FREED)
{
+ bool overcommit_result_ignore
+ = overcommit_result == OvercommitResult::NONE || overcommit_result == OvercommitResult::DISABLED;
/// Revert
amount.fetch_sub(size, std::memory_order_relaxed);
rss.fetch_sub(size, std::memory_order_relaxed);
@@ -314,18 +324,18 @@ AllocationTrace MemoryTracker::allocImpl(Int64 size, bool throw_if_memory_exceed
ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded);
const auto * description = description_ptr.load(std::memory_order_relaxed);
throw DB::Exception(
- DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED,
- "Memory limit{}{} exceeded: "
- "would use {} (attempt to allocate chunk of {} bytes), current RSS {}, maximum: {}."
- "{}{}",
- description ? " " : "",
- description ? description : "",
- formatReadableSizeWithBinarySuffix(will_be),
- size,
- formatReadableSizeWithBinarySuffix(rss.load(std::memory_order_relaxed)),
- formatReadableSizeWithBinarySuffix(current_hard_limit),
- overcommit_result == OvercommitResult::NONE ? "" : " OvercommitTracker decision: ",
- toDescription(overcommit_result));
+ DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED,
+ "{}{} exceeded: "
+ "would use {} (attempt to allocate chunk of {} bytes), current RSS {}, maximum: {}."
+ "{}{}",
+ description ? description : "",
+ description ? " memory limit" : "Memory limit",
+ formatReadableSizeWithBinarySuffix(will_be),
+ size,
+ formatReadableSizeWithBinarySuffix(rss.load(std::memory_order_relaxed)),
+ formatReadableSizeWithBinarySuffix(current_hard_limit),
+ overcommit_result_ignore ? "" : " OvercommitTracker decision: ",
+ overcommit_result_ignore ? "" : toDescription(overcommit_result));
}
// If OvercommitTracker::needToStopQuery returned false, it guarantees that enough memory is freed.
diff --git a/src/Common/OvercommitTracker.cpp b/src/Common/OvercommitTracker.cpp
index 2a453596dab..751a61b7a41 100644
--- a/src/Common/OvercommitTracker.cpp
+++ b/src/Common/OvercommitTracker.cpp
@@ -45,7 +45,7 @@ OvercommitResult OvercommitTracker::needToStopQuery(MemoryTracker * tracker, Int
// method OvercommitTracker::onQueryStop(MemoryTracker *) is
// always called with already acquired global mutex in
// ProcessListEntry::~ProcessListEntry().
- auto global_lock = process_list->unsafeLock();
+ DB::ProcessList::Lock global_lock(process_list->getMutex());
std::unique_lock lk(overcommit_m);
size_t id = next_id++;
diff --git a/src/Common/PODArray.h b/src/Common/PODArray.h
index 48f2ffee8ce..2d69b8ac26c 100644
--- a/src/Common/PODArray.h
+++ b/src/Common/PODArray.h
@@ -115,11 +115,6 @@ protected:
template <typename ... TAllocatorParams>
void alloc(size_t bytes, TAllocatorParams &&... allocator_params)
{
-#if USE_GWP_ASAN
- if (unlikely(GWPAsan::shouldForceSample()))
- gwp_asan::getThreadLocals()->NextSampleCounter = 1;
-#endif
-
char * allocated = reinterpret_cast<char *>(TAllocator::alloc(bytes, std::forward<TAllocatorParams>(allocator_params)...));
c_start = allocated + pad_left;
@@ -149,11 +144,6 @@ protected:
return;
}
-#if USE_GWP_ASAN
- if (unlikely(GWPAsan::shouldForceSample()))
- gwp_asan::getThreadLocals()->NextSampleCounter = 1;
-#endif
-
unprotect();
ptrdiff_t end_diff = c_end - c_start;
diff --git a/src/Common/Priority.h b/src/Common/Priority.h
index 8952fe4dd5a..f0e5787ae91 100644
--- a/src/Common/Priority.h
+++ b/src/Common/Priority.h
@@ -6,6 +6,7 @@
/// Separate type (rather than `Int64` is used just to avoid implicit conversion errors and to default-initialize
struct Priority
{
- Int64 value = 0; /// Note that lower value means higher priority.
- constexpr operator Int64() const { return value; } /// NOLINT
+ using Value = Int64;
+ Value value = 0; /// Note that lower value means higher priority.
+ constexpr operator Value() const { return value; } /// NOLINT
};
diff --git a/src/Common/Scheduler/IResourceManager.h b/src/Common/Scheduler/IResourceManager.h
index 8a7077ac3d5..c6f41346e11 100644
--- a/src/Common/Scheduler/IResourceManager.h
+++ b/src/Common/Scheduler/IResourceManager.h
@@ -26,6 +26,9 @@ class IClassifier : private boost::noncopyable
public:
virtual ~IClassifier() = default;
+ /// Returns true iff resource access is allowed by this classifier
+ virtual bool has(const String & resource_name) = 0;
+
/// Returns ResourceLink that should be used to access resource.
/// Returned link is valid until classifier destruction.
virtual ResourceLink get(const String & resource_name) = 0;
@@ -46,12 +49,15 @@ public:
/// Initialize or reconfigure manager.
virtual void updateConfiguration(const Poco::Util::AbstractConfiguration & config) = 0;
+ /// Returns true iff given resource is controlled through this manager.
+ virtual bool hasResource(const String & resource_name) const = 0;
+
/// Obtain a classifier instance required to get access to resources.
/// Note that it holds resource configuration, so should be destructed when query is done.
virtual ClassifierPtr acquire(const String & classifier_name) = 0;
/// For introspection, see `system.scheduler` table
- using VisitorFunc = std::function;
+ using VisitorFunc = std::function;
virtual void forEachNode(VisitorFunc visitor) = 0;
};
diff --git a/src/Common/Scheduler/ISchedulerConstraint.h b/src/Common/Scheduler/ISchedulerConstraint.h
index a976206de74..3bee9c1b424 100644
--- a/src/Common/Scheduler/ISchedulerConstraint.h
+++ b/src/Common/Scheduler/ISchedulerConstraint.h
@@ -15,8 +15,7 @@ namespace DB
* When constraint is again satisfied, scheduleActivation() is called from finishRequest().
*
* Derived class behaviour requirements:
- * - dequeueRequest() must fill `request->constraint` iff it is nullptr;
- * - finishRequest() must be recursive: call to `parent_constraint->finishRequest()`.
+ * - dequeueRequest() must call `request->addConstraint()`.
*/
class ISchedulerConstraint : public ISchedulerNode
{
@@ -25,34 +24,16 @@ public:
: ISchedulerNode(event_queue_, config, config_prefix)
{}
+ ISchedulerConstraint(EventQueue * event_queue_, const SchedulerNodeInfo & info_)
+ : ISchedulerNode(event_queue_, info_)
+ {}
+
/// Resource consumption by `request` is finished.
/// Should be called outside of scheduling subsystem, implementation must be thread-safe.
virtual void finishRequest(ResourceRequest * request) = 0;
- void setParent(ISchedulerNode * parent_) override
- {
- ISchedulerNode::setParent(parent_);
-
- // Assign `parent_constraint` to the nearest parent derived from ISchedulerConstraint
- for (ISchedulerNode * node = parent_; node != nullptr; node = node->parent)
- {
- if (auto * constraint = dynamic_cast<ISchedulerConstraint *>(node))
- {
- parent_constraint = constraint;
- break;
- }
- }
- }
-
/// For introspection of current state (true = satisfied, false = violated)
virtual bool isSatisfied() = 0;
-
-protected:
- // Reference to nearest parent that is also derived from ISchedulerConstraint.
- // Request can traverse through multiple constraints while being dequeue from hierarchy,
- // while finishing request should traverse the same chain in reverse order.
- // NOTE: it must be immutable after initialization, because it is accessed in not thread-safe way from finishRequest()
- ISchedulerConstraint * parent_constraint = nullptr;
};
}
diff --git a/src/Common/Scheduler/ISchedulerNode.h b/src/Common/Scheduler/ISchedulerNode.h
index 0705c4f0a35..5e1239de274 100644
--- a/src/Common/Scheduler/ISchedulerNode.h
+++ b/src/Common/Scheduler/ISchedulerNode.h
@@ -57,7 +57,13 @@ struct SchedulerNodeInfo
SchedulerNodeInfo() = default;
- explicit SchedulerNodeInfo(const Poco::Util::AbstractConfiguration & config = emptyConfig(), const String & config_prefix = {})
+ explicit SchedulerNodeInfo(double weight_, Priority priority_ = {})
+ {
+ setWeight(weight_);
+ setPriority(priority_);
+ }
+
+ explicit SchedulerNodeInfo(const Poco::Util::AbstractConfiguration & config, const String & config_prefix = {})
{
setWeight(config.getDouble(config_prefix + ".weight", weight));
setPriority(config.getInt64(config_prefix + ".priority", priority));
@@ -68,7 +74,7 @@ struct SchedulerNodeInfo
if (value <= 0 || !isfinite(value))
throw Exception(
ErrorCodes::INVALID_SCHEDULER_NODE,
- "Negative and non-finite node weights are not allowed: {}",
+ "Zero, negative and non-finite node weights are not allowed: {}",
value);
weight = value;
}
@@ -78,6 +84,11 @@ struct SchedulerNodeInfo
priority.value = value;
}
+ void setPriority(Priority value)
+ {
+ priority = value;
+ }
+
// To check if configuration update required
bool equals(const SchedulerNodeInfo & o) const
{
@@ -123,7 +134,14 @@ public:
, info(config, config_prefix)
{}
- virtual ~ISchedulerNode() = default;
+ ISchedulerNode(EventQueue * event_queue_, const SchedulerNodeInfo & info_)
+ : event_queue(event_queue_)
+ , info(info_)
+ {}
+
+ virtual ~ISchedulerNode();
+
+ virtual const String & getTypeName() const = 0;
/// Checks if two nodes configuration is equal
virtual bool equals(ISchedulerNode * other)
@@ -134,10 +152,11 @@ public:
/// Attach new child
virtual void attachChild(const std::shared_ptr<ISchedulerNode> & child) = 0;
- /// Detach and destroy child
+ /// Detach child
+ /// NOTE: child might be destroyed if the only reference was stored in parent
virtual void removeChild(ISchedulerNode * child) = 0;
- /// Get attached child by name
+ /// Get attached child by name (for tests only)
virtual ISchedulerNode * getChild(const String & child_name) = 0;
/// Activation of child due to the first pending request
@@ -147,7 +166,7 @@ public:
/// Returns true iff node is active
virtual bool isActive() = 0;
- /// Returns number of active children
+ /// Returns number of active children (for introspection only).
virtual size_t activeChildren() = 0;
/// Returns the first request to be executed as the first component of resulting pair.
@@ -155,10 +174,10 @@ public:
virtual std::pair dequeueRequest() = 0;
/// Returns full path string using names of every parent
- String getPath()
+ String getPath() const
{
String result;
- ISchedulerNode * ptr = this;
+ const ISchedulerNode * ptr = this;
while (ptr->parent)
{
result = "/" + ptr->basename + result;
@@ -168,10 +187,7 @@ public:
}
/// Attach to a parent (used by attachChild)
- virtual void setParent(ISchedulerNode * parent_)
- {
- parent = parent_;
- }
+ void setParent(ISchedulerNode * parent_);
protected:
/// Notify parents about the first pending request or constraint becoming satisfied.
@@ -307,6 +323,15 @@ public:
pending.notify_one();
}
+ /// Removes an activation from queue
+ void cancelActivation(ISchedulerNode * node)
+ {
+ std::unique_lock lock{mutex};
+ if (node->is_linked())
+ activations.erase(activations.iterator_to(*node));
+ node->activation_event_id = 0;
+ }
+
/// Process single event if it exists
/// Note that postponing constraint are ignored, use it to empty the queue including postponed events on shutdown
/// Returns `true` iff event has been processed
@@ -471,6 +496,20 @@ private:
std::atomic manual_time{TimePoint()}; // for tests only
};
+inline ISchedulerNode::~ISchedulerNode()
+{
+ // Make sure there is no dangling reference in activations queue
+ event_queue->cancelActivation(this);
+}
+
+inline void ISchedulerNode::setParent(ISchedulerNode * parent_)
+{
+ parent = parent_;
+ // Avoid activation of a detached node
+ if (parent == nullptr)
+ event_queue->cancelActivation(this);
+}
+
inline void ISchedulerNode::scheduleActivation()
{
if (likely(parent))
diff --git a/src/Common/Scheduler/ISchedulerQueue.h b/src/Common/Scheduler/ISchedulerQueue.h
index b7a51870a24..6c77cee6b9d 100644
--- a/src/Common/Scheduler/ISchedulerQueue.h
+++ b/src/Common/Scheduler/ISchedulerQueue.h
@@ -21,6 +21,10 @@ public:
: ISchedulerNode(event_queue_, config, config_prefix)
{}
+ ISchedulerQueue(EventQueue * event_queue_, const SchedulerNodeInfo & info_)
+ : ISchedulerNode(event_queue_, info_)
+ {}
+
// Wrapper for `enqueueRequest()` that should be used to account for available resource budget
// Returns `estimated_cost` that should be passed later to `adjustBudget()`
[[ nodiscard ]] ResourceCost enqueueRequestUsingBudget(ResourceRequest * request)
@@ -47,6 +51,11 @@ public:
/// Should be called outside of scheduling subsystem, implementation must be thread-safe.
virtual bool cancelRequest(ResourceRequest * request) = 0;
+ /// Fails all the resource requests in queue and marks this queue as not usable.
+ /// Afterwards any new request will be failed on `enqueueRequest()`.
+ /// NOTE: This is done for queues that are about to be destructed.
+ virtual void purgeQueue() = 0;
+
/// For introspection
ResourceCost getBudget() const
{
diff --git a/src/Common/Scheduler/Nodes/ClassifiersConfig.cpp b/src/Common/Scheduler/Nodes/ClassifiersConfig.cpp
index 3be61801149..455d0880aa6 100644
--- a/src/Common/Scheduler/Nodes/ClassifiersConfig.cpp
+++ b/src/Common/Scheduler/Nodes/ClassifiersConfig.cpp
@@ -5,11 +5,6 @@
namespace DB
{
-namespace ErrorCodes
-{
- extern const int RESOURCE_NOT_FOUND;
-}
-
ClassifierDescription::ClassifierDescription(const Poco::Util::AbstractConfiguration & config, const String & config_prefix)
{
Poco::Util::AbstractConfiguration::Keys keys;
@@ -31,9 +26,11 @@ ClassifiersConfig::ClassifiersConfig(const Poco::Util::AbstractConfiguration & c
const ClassifierDescription & ClassifiersConfig::get(const String & classifier_name)
{
+ static ClassifierDescription empty;
if (auto it = classifiers.find(classifier_name); it != classifiers.end())
return it->second;
- throw Exception(ErrorCodes::RESOURCE_NOT_FOUND, "Unknown workload classifier '{}' to access resources", classifier_name);
+ else
+ return empty;
}
}
diff --git a/src/Common/Scheduler/Nodes/ClassifiersConfig.h b/src/Common/Scheduler/Nodes/ClassifiersConfig.h
index 186c49943ad..62db719568b 100644
--- a/src/Common/Scheduler/Nodes/ClassifiersConfig.h
+++ b/src/Common/Scheduler/Nodes/ClassifiersConfig.h
@@ -10,6 +10,7 @@ namespace DB
/// Mapping of resource name into path string (e.g. "disk1" -> "/path/to/class")
struct ClassifierDescription : std::unordered_map
{
+ ClassifierDescription() = default;
ClassifierDescription(const Poco::Util::AbstractConfiguration & config, const String & config_prefix);
};
diff --git a/src/Common/Scheduler/Nodes/DynamicResourceManager.cpp b/src/Common/Scheduler/Nodes/CustomResourceManager.cpp
similarity index 84%
rename from src/Common/Scheduler/Nodes/DynamicResourceManager.cpp
rename to src/Common/Scheduler/Nodes/CustomResourceManager.cpp
index 5bf884fc3df..b9ab89ee2b8 100644
--- a/src/Common/Scheduler/Nodes/DynamicResourceManager.cpp
+++ b/src/Common/Scheduler/Nodes/CustomResourceManager.cpp
@@ -1,7 +1,6 @@
-#include
+#include
#include
-#include
#include
#include
@@ -21,7 +20,7 @@ namespace ErrorCodes
extern const int INVALID_SCHEDULER_NODE;
}
-DynamicResourceManager::State::State(EventQueue * event_queue, const Poco::Util::AbstractConfiguration & config)
+CustomResourceManager::State::State(EventQueue * event_queue, const Poco::Util::AbstractConfiguration & config)
: classifiers(config)
{
Poco::Util::AbstractConfiguration::Keys keys;
@@ -35,7 +34,7 @@ DynamicResourceManager::State::State(EventQueue * event_queue, const Poco::Util:
}
}
-DynamicResourceManager::State::Resource::Resource(
+CustomResourceManager::State::Resource::Resource(
const String & name,
EventQueue * event_queue,
const Poco::Util::AbstractConfiguration & config,
@@ -92,7 +91,7 @@ DynamicResourceManager::State::Resource::Resource(
throw Exception(ErrorCodes::INVALID_SCHEDULER_NODE, "undefined root node path '/' for resource '{}'", name);
}
-DynamicResourceManager::State::Resource::~Resource()
+CustomResourceManager::State::Resource::~Resource()
{
// NOTE: we should rely on `attached_to` and cannot use `parent`,
// NOTE: because `parent` can be `nullptr` in case attachment is still in event queue
@@ -106,14 +105,14 @@ DynamicResourceManager::State::Resource::~Resource()
}
}
-DynamicResourceManager::State::Node::Node(const String & name, EventQueue * event_queue, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
+CustomResourceManager::State::Node::Node(const String & name, EventQueue * event_queue, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix)
: type(config.getString(config_prefix + ".type", "fifo"))
, ptr(SchedulerNodeFactory::instance().get(type, event_queue, config, config_prefix))
{
ptr->basename = name;
}
-bool DynamicResourceManager::State::Resource::equals(const DynamicResourceManager::State::Resource & o) const
+bool CustomResourceManager::State::Resource::equals(const CustomResourceManager::State::Resource & o) const
{
if (nodes.size() != o.nodes.size())
return false;
@@ -130,14 +129,14 @@ bool DynamicResourceManager::State::Resource::equals(const DynamicResourceManage
return true;
}
-bool DynamicResourceManager::State::Node::equals(const DynamicResourceManager::State::Node & o) const
+bool CustomResourceManager::State::Node::equals(const CustomResourceManager::State::Node & o) const
{
if (type != o.type)
return false;
return ptr->equals(o.ptr.get());
}
-DynamicResourceManager::Classifier::Classifier(const DynamicResourceManager::StatePtr & state_, const String & classifier_name)
+CustomResourceManager::Classifier::Classifier(const CustomResourceManager::StatePtr & state_, const String & classifier_name)
: state(state_)
{
// State is immutable, but nodes are mutable and thread-safe
@@ -162,20 +161,25 @@ DynamicResourceManager::Classifier::Classifier(const DynamicResourceManager::Sta
}
}
-ResourceLink DynamicResourceManager::Classifier::get(const String & resource_name)
+bool CustomResourceManager::Classifier::has(const String & resource_name)
+{
+ return resources.contains(resource_name);
+}
+
+ResourceLink CustomResourceManager::Classifier::get(const String & resource_name)
{
if (auto iter = resources.find(resource_name); iter != resources.end())
return iter->second;
throw Exception(ErrorCodes::RESOURCE_ACCESS_DENIED, "Access denied to resource '{}'", resource_name);
}
-DynamicResourceManager::DynamicResourceManager()
+CustomResourceManager::CustomResourceManager()
: state(new State())
{
scheduler.start();
}
-void DynamicResourceManager::updateConfiguration(const Poco::Util::AbstractConfiguration & config)
+void CustomResourceManager::updateConfiguration(const Poco::Util::AbstractConfiguration & config)
{
StatePtr new_state = std::make_shared(scheduler.event_queue, config);
@@ -217,7 +221,13 @@ void DynamicResourceManager::updateConfiguration(const Poco::Util::AbstractConfi
// NOTE: after mutex unlock `state` became available for Classifier(s) and must be immutable
}
-ClassifierPtr DynamicResourceManager::acquire(const String & classifier_name)
+bool CustomResourceManager::hasResource(const String & resource_name) const
+{
+ std::lock_guard lock{mutex};
+ return state->resources.contains(resource_name);
+}
+
+ClassifierPtr CustomResourceManager::acquire(const String & classifier_name)
{
// Acquire a reference to the current state
StatePtr state_ref;
@@ -229,7 +239,7 @@ ClassifierPtr DynamicResourceManager::acquire(const String & classifier_name)
return std::make_shared(state_ref, classifier_name);
}
-void DynamicResourceManager::forEachNode(IResourceManager::VisitorFunc visitor)
+void CustomResourceManager::forEachNode(IResourceManager::VisitorFunc visitor)
{
// Acquire a reference to the current state
StatePtr state_ref;
@@ -244,7 +254,7 @@ void DynamicResourceManager::forEachNode(IResourceManager::VisitorFunc visitor)
{
for (auto & [name, resource] : state_ref->resources)
for (auto & [path, node] : resource->nodes)
- visitor(name, path, node.type, node.ptr);
+ visitor(name, path, node.ptr.get());
promise.set_value();
});
@@ -252,9 +262,4 @@ void DynamicResourceManager::forEachNode(IResourceManager::VisitorFunc visitor)
future.get();
}
-void registerDynamicResourceManager(ResourceManagerFactory & factory)
-{
- factory.registerMethod("dynamic");
-}
-
}
diff --git a/src/Common/Scheduler/Nodes/DynamicResourceManager.h b/src/Common/Scheduler/Nodes/CustomResourceManager.h
similarity index 86%
rename from src/Common/Scheduler/Nodes/DynamicResourceManager.h
rename to src/Common/Scheduler/Nodes/CustomResourceManager.h
index 4b0a3a48b61..900a9c4e50b 100644
--- a/src/Common/Scheduler/Nodes/DynamicResourceManager.h
+++ b/src/Common/Scheduler/Nodes/CustomResourceManager.h
@@ -10,7 +10,9 @@ namespace DB
{
/*
- * Implementation of `IResourceManager` supporting arbitrary dynamic hierarchy of scheduler nodes.
+ * Implementation of `IResourceManager` supporting arbitrary hierarchy of scheduler nodes.
+ * Scheduling hierarchies for every resource are described through server XML or YAML configuration.
+ * Configuration can be changed dynamically without server restart.
* All resources are controlled by single root `SchedulerRoot`.
*
* State of manager is set of resources attached to the scheduler. States are referenced by classifiers.
@@ -24,11 +26,12 @@ namespace DB
* violation will apply to fairness. Old version exists as long as there is at least one classifier
* instance referencing it. Classifiers are typically attached to queries and will be destructed with them.
*/
-class DynamicResourceManager : public IResourceManager
+class CustomResourceManager : public IResourceManager
{
public:
- DynamicResourceManager();
+ CustomResourceManager();
void updateConfiguration(const Poco::Util::AbstractConfiguration & config) override;
+ bool hasResource(const String & resource_name) const override;
ClassifierPtr acquire(const String & classifier_name) override;
void forEachNode(VisitorFunc visitor) override;
@@ -79,6 +82,7 @@ private:
{
public:
Classifier(const StatePtr & state_, const String & classifier_name);
+ bool has(const String & resource_name) override;
ResourceLink get(const String & resource_name) override;
private:
std::unordered_map resources; // accessible resources by names
@@ -86,7 +90,7 @@ private:
};
SchedulerRoot scheduler;
- std::mutex mutex;
+ mutable std::mutex mutex;
StatePtr state;
};
diff --git a/src/Common/Scheduler/Nodes/FairPolicy.h b/src/Common/Scheduler/Nodes/FairPolicy.h
index 246642ff2fd..a865711c460 100644
--- a/src/Common/Scheduler/Nodes/FairPolicy.h
+++ b/src/Common/Scheduler/Nodes/FairPolicy.h
@@ -28,7 +28,7 @@ namespace ErrorCodes
* of a child is set to vruntime of "start" of the last request. This guarantees immediate processing
* of at least single request of newly activated children and thus best isolation and scheduling latency.
*/
-class FairPolicy : public ISchedulerNode
+class FairPolicy final : public ISchedulerNode
{
/// Scheduling state of a child
struct Item
@@ -48,6 +48,23 @@ public:
: ISchedulerNode(event_queue_, config, config_prefix)
{}
+ FairPolicy(EventQueue * event_queue_, const SchedulerNodeInfo & info_)
+ : ISchedulerNode(event_queue_, info_)
+ {}
+
+ ~FairPolicy() override
+ {
+ // We need to clear `parent` in all children to avoid dangling references
+ while (!children.empty())
+ removeChild(children.begin()->second.get());
+ }
+
+ const String & getTypeName() const override
+ {
+ static String type_name("fair");
+ return type_name;
+ }
+
bool equals(ISchedulerNode * other) override
{
if (!ISchedulerNode::equals(other))
diff --git a/src/Common/Scheduler/Nodes/FifoQueue.h b/src/Common/Scheduler/Nodes/FifoQueue.h
index 90f8fffe665..9502fae1a45 100644
--- a/src/Common/Scheduler/Nodes/FifoQueue.h
+++ b/src/Common/Scheduler/Nodes/FifoQueue.h
@@ -23,13 +23,28 @@ namespace ErrorCodes
/*
* FIFO queue to hold pending resource requests
*/
-class FifoQueue : public ISchedulerQueue
+class FifoQueue final : public ISchedulerQueue
{
public:
FifoQueue(EventQueue * event_queue_, const Poco::Util::AbstractConfiguration & config, const String & config_prefix)
: ISchedulerQueue(event_queue_, config, config_prefix)
{}
+ FifoQueue(EventQueue * event_queue_, const SchedulerNodeInfo & info_)
+ : ISchedulerQueue(event_queue_, info_)
+ {}
+
+ ~FifoQueue() override
+ {
+ purgeQueue();
+ }
+
+ const String & getTypeName() const override
+ {
+ static String type_name("fifo");
+ return type_name;
+ }
+
bool equals(ISchedulerNode * other) override
{
if (!ISchedulerNode::equals(other))
@@ -42,6 +57,8 @@ public:
void enqueueRequest(ResourceRequest * request) override
{
std::lock_guard lock(mutex);
+ if (is_not_usable)
+ throw Exception(ErrorCodes::INVALID_SCHEDULER_NODE, "Scheduler queue is about to be destructed");
queue_cost += request->cost;
bool was_empty = requests.empty();
requests.push_back(*request);
@@ -66,6 +83,8 @@ public:
bool cancelRequest(ResourceRequest * request) override
{
std::lock_guard lock(mutex);
+ if (is_not_usable)
+ return false; // Any request should already be failed or executed
if (request->is_linked())
{
// It's impossible to check that `request` is indeed inserted to this queue and not another queue.
@@ -88,6 +107,19 @@ public:
return false;
}
+ void purgeQueue() override
+ {
+ std::lock_guard lock(mutex);
+ is_not_usable = true;
+ while (!requests.empty())
+ {
+ ResourceRequest * request = &requests.front();
+ requests.pop_front();
+ request->failed(std::make_exception_ptr(
+ Exception(ErrorCodes::INVALID_SCHEDULER_NODE, "Scheduler queue with resource request is about to be destructed")));
+ }
+ }
+
bool isActive() override
{
std::lock_guard lock(mutex);
@@ -131,6 +163,7 @@ private:
std::mutex mutex;
Int64 queue_cost = 0;
boost::intrusive::list requests;
+ bool is_not_usable = false;
};
}
diff --git a/src/Common/Scheduler/Nodes/IOResourceManager.cpp b/src/Common/Scheduler/Nodes/IOResourceManager.cpp
new file mode 100644
index 00000000000..e2042a29a80
--- /dev/null
+++ b/src/Common/Scheduler/Nodes/IOResourceManager.cpp
@@ -0,0 +1,532 @@
+#include
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+#include
+#include