Merge remote-tracking branch 'ClickHouse/master' into delete_old_mongodb_integration

Robert Schulze 2024-11-26 12:44:06 +00:00
commit 45e1702f82
No known key found for this signature in database
GPG Key ID: 26703B55FB13728A
155 changed files with 4007 additions and 1854 deletions

@@ -26,7 +26,7 @@
* When retrieving data directly from a dictionary using Dictionary storage, dictionary table function, or direct SELECT from the dictionary itself, it is now enough to have `SELECT` permission or `dictGet` permission for the dictionary. This aligns with previous attempts to prevent ACL bypasses: https://github.com/ClickHouse/ClickHouse/pull/57362 and https://github.com/ClickHouse/ClickHouse/pull/65359. It also makes the latter one backward compatible. [#72051](https://github.com/ClickHouse/ClickHouse/pull/72051) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Experimental feature
* Implement `allowed_feature_tier` as a global switch to disable all experimental / beta features. [#71841](https://github.com/ClickHouse/ClickHouse/pull/71841) ([Raúl Marín](https://github.com/Algunenano)).
* Implement `allowed_feature_tier` as a global switch to disable all experimental / beta features. [#71841](https://github.com/ClickHouse/ClickHouse/pull/71841) [#71145](https://github.com/ClickHouse/ClickHouse/pull/71145) ([Raúl Marín](https://github.com/Algunenano)).
* Fix possible error `No such file or directory` due to unescaped special symbols in files for JSON subcolumns. [#71182](https://github.com/ClickHouse/ClickHouse/pull/71182) ([Pavel Kruglov](https://github.com/Avogar)).
* Support alter from String to JSON. This PR also changes the serialization of JSON and Dynamic types to new version V2. Old version V1 can be still used by enabling setting `merge_tree_use_v1_object_and_dynamic_serialization` (can be used during upgrade to be able to rollback the version without issues). [#70442](https://github.com/ClickHouse/ClickHouse/pull/70442) ([Pavel Kruglov](https://github.com/Avogar)).
* Implement simple CAST from Map/Tuple/Object to new JSON through serialization/deserialization from JSON string. [#71320](https://github.com/ClickHouse/ClickHouse/pull/71320) ([Pavel Kruglov](https://github.com/Avogar)).
@@ -34,74 +34,140 @@
* Forbid Dynamic/Variant types in min/max functions to avoid confusion. [#71761](https://github.com/ClickHouse/ClickHouse/pull/71761) ([Pavel Kruglov](https://github.com/Avogar)).
#### New Feature
* Added SQL syntax to describe workload and resource management. https://clickhouse.com/docs/en/operations/workload-scheduling. [#69187](https://github.com/ClickHouse/ClickHouse/pull/69187) ([Sergei Trifonov](https://github.com/serxa)).
* A new data type, `BFloat16`, represents 16-bit floating point numbers with 8-bit exponent, sign, and 7-bit mantissa. This closes [#44206](https://github.com/ClickHouse/ClickHouse/issues/44206). This closes [#49937](https://github.com/ClickHouse/ClickHouse/issues/49937). [#64712](https://github.com/ClickHouse/ClickHouse/pull/64712) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add `CHECK GRANT` query to check whether the current user/role has been granted the specific privilege and whether the corresponding table/column exists in the memory. [#68885](https://github.com/ClickHouse/ClickHouse/pull/68885) ([Unalian](https://github.com/Unalian)).
* Added SQL syntax to describe workload and resource management. https://clickhouse.com/docs/en/operations/workload-scheduling. [#69187](https://github.com/ClickHouse/ClickHouse/pull/69187) ([Sergei Trifonov](https://github.com/serxa)).
* Added server setting `async_load_system_database` that allows the server to start with not fully loaded system database. This helps to start ClickHouse faster if there are many system tables. [#69847](https://github.com/ClickHouse/ClickHouse/pull/69847) ([Sergei Trifonov](https://github.com/serxa)).
* Allow each authentication method to have its own expiration date, remove from user entity. [#70090](https://github.com/ClickHouse/ClickHouse/pull/70090) ([Arthur Passos](https://github.com/arthurpassos)).
* Push external user roles from query originator to other nodes in cluster. Helpful when only originator has access to the external authenticator (like LDAP). [#70332](https://github.com/ClickHouse/ClickHouse/pull/70332) ([Andrey Zvonov](https://github.com/zvonand)).
* Added a new header type for S3 endpoints for user authentication (`access_header`). This allows to get some access header with the lowest priority, which will be overwritten with `access_key_id` from any other source (for example, a table schema or a named collection). [#71011](https://github.com/ClickHouse/ClickHouse/pull/71011) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Initial implementation of settings tiers. [#71145](https://github.com/ClickHouse/ClickHouse/pull/71145) ([Raúl Marín](https://github.com/Algunenano)).
* Add support for staleness clause in order by with fill operator. [#71151](https://github.com/ClickHouse/ClickHouse/pull/71151) ([Mikhail Artemenko](https://github.com/Michicosun)).
* Added aliases `anyRespectNulls`, `firstValueRespectNulls`, and `anyValueRespectNulls` for aggregation function `any`. Also added aliases `anyLastRespectNulls` and `lastValueRespectNulls` for aggregation function `anyLast`. This allows using more natural camel-case-only syntax rather than mixed camel-case/underscore syntax, for example: `SELECT anyLastRespectNullsStateIf` instead of `anyLast_respect_nullsStateIf`. [#71403](https://github.com/ClickHouse/ClickHouse/pull/71403) ([Peter Nguyen](https://github.com/petern48)).
* Added the configuration `date_time_utc` parameter, enabling JSON log formatting to support UTC date-time in RFC 3339/ISO8601 format. [#71560](https://github.com/ClickHouse/ClickHouse/pull/71560) ([Ali](https://github.com/xogoodnow)).
* Optimized memory usage for values of index granularity if granularity is constant for part. Added an ability to always select constant granularity for part (setting `use_const_adaptive_granularity`), which helps to ensure that it is always optimized in memory. It helps in large workloads (trillions of rows in shared storage) to avoid constantly growing memory usage by metadata (values of index granularity) of data parts. [#71786](https://github.com/ClickHouse/ClickHouse/pull/71786) ([Anton Popov](https://github.com/CurtizJ)).
* Add `iceberg[S3;HDFS;Azure]Cluster`, `deltaLakeCluster`, `hudiCluster` table functions. [#72045](https://github.com/ClickHouse/ClickHouse/pull/72045) ([Mikhail Artemenko](https://github.com/Michicosun)).
* Add ability to set user/password in http_handlers (for `dynamic_query_handler`/`predefined_query_handler`). [#70725](https://github.com/ClickHouse/ClickHouse/pull/70725) ([Azat Khuzhin](https://github.com/azat)).
* Add support for staleness clause in the ORDER BY WITH FILL operator. [#71151](https://github.com/ClickHouse/ClickHouse/pull/71151) ([Mikhail Artemenko](https://github.com/Michicosun)).
* Allow each authentication method to have its own expiration date, remove from user entity. [#70090](https://github.com/ClickHouse/ClickHouse/pull/70090) ([Arthur Passos](https://github.com/arthurpassos)).
* Added new functions `parseDateTime64`, `parseDateTime64OrNull` and `parseDateTime64OrZero`. Compared to the existing function `parseDateTime` (and variants), they return a value of type `DateTime64` instead of `DateTime`. [#71581](https://github.com/ClickHouse/ClickHouse/pull/71581) ([kevinyhzou](https://github.com/KevinyhZou)).
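
The `BFloat16` entry above describes a 16-bit layout with 1 sign bit, 8 exponent bits, and 7 mantissa bits, which is the same as the upper half of an IEEE 754 `Float32`. The sketch below illustrates that layout only. The helper names are hypothetical, and truncation (rather than rounding) on conversion is an assumption made for simplicity, not something the changelog states about ClickHouse's behavior.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the top 16 bits of the float32 encoding of x (truncation, an assumption)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit pattern back to a float by zero-filling the low 16 bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def split_fields(bits16: int) -> tuple[int, int, int]:
    """Split a 16-bit pattern into (sign, 8-bit exponent, 7-bit mantissa)."""
    sign = (bits16 >> 15) & 0x1
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa


if __name__ == "__main__":
    bits = float_to_bfloat16_bits(3.14159)
    print(hex(bits), split_fields(bits), bfloat16_bits_to_float(bits))
```

With only 7 mantissa bits, a round trip keeps roughly two to three significant decimal digits while preserving the full `Float32` exponent range.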
#### Performance Improvement
* Now we won't copy input block columns for `join_algorithm='parallel_hash'` when distributing them between threads for parallel processing. [#67782](https://github.com/ClickHouse/ClickHouse/pull/67782) ([Nikita Taranov](https://github.com/nickitat)).
* Optimized `Replacing` merge algorithm for non intersecting parts. [#70977](https://github.com/ClickHouse/ClickHouse/pull/70977) ([Anton Popov](https://github.com/CurtizJ)).
* Optimized memory usage for values of index granularity if granularity is constant for part. Added an ability to always select constant granularity for part (setting `use_const_adaptive_granularity`), which helps to ensure that it is always optimized in memory. It helps in large workloads (trillions of rows in shared storage) to avoid constantly growing memory usage by metadata (values of index granularity) of data parts. [#71786](https://github.com/ClickHouse/ClickHouse/pull/71786) ([Anton Popov](https://github.com/CurtizJ)).
* Now we don't copy input block columns for `join_algorithm = 'parallel_hash'` when distributing them between threads for parallel processing. [#67782](https://github.com/ClickHouse/ClickHouse/pull/67782) ([Nikita Taranov](https://github.com/nickitat)).
* Optimized `Replacing` merge algorithm for non-intersecting parts. [#70977](https://github.com/ClickHouse/ClickHouse/pull/70977) ([Anton Popov](https://github.com/CurtizJ)).
* Do not list detached parts from readonly and write-once disks for metrics and system.detached_parts. [#71086](https://github.com/ClickHouse/ClickHouse/pull/71086) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Do not calculate heavy asynchronous metrics by default. The feature was introduced in [#40332](https://github.com/ClickHouse/ClickHouse/issues/40332), but it isn't good to have a heavy background job that is needed for only a single customer. [#71087](https://github.com/ClickHouse/ClickHouse/pull/71087) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve the performance and accuracy of system.query_metric_log collection interval by reducing the critical region. [#71473](https://github.com/ClickHouse/ClickHouse/pull/71473) ([Pablo Marcos](https://github.com/pamarcos)).
* For the `plain_rewritable` disks: Do not call the object storage API when listing directories, as this may be cost-inefficient. Instead, store the list of filenames in the memory. The trade-offs are increased initial load time and memory required to store filenames. [#70823](https://github.com/ClickHouse/ClickHouse/pull/70823) ([Julia Kartseva](https://github.com/jkartseva)).
* Improve the performance and accuracy of `system.query_metric_log` collection interval by reducing the critical region. [#71473](https://github.com/ClickHouse/ClickHouse/pull/71473) ([Pablo Marcos](https://github.com/pamarcos)).
* Read-in-order optimization via generating virtual rows, so that less data is read during merge sort. This is especially useful when multiple parts exist. [#62125](https://github.com/ClickHouse/ClickHouse/pull/62125) ([Shichao Jin](https://github.com/jsc0218)).
* Added server setting `async_load_system_database` that allows the server to start with not fully loaded system database. This helps to start ClickHouse faster if there are many system tables. [#69847](https://github.com/ClickHouse/ClickHouse/pull/69847) ([Sergei Trifonov](https://github.com/serxa)).
* Add `--threads` parameter to `clickhouse-compressor`, which allows to compress data in parallel. [#70860](https://github.com/ClickHouse/ClickHouse/pull/70860) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added a setting `prewarm_mark_cache` which enables loading of marks to mark cache on inserts, merges, fetches of parts and on startup of the table. [#71053](https://github.com/ClickHouse/ClickHouse/pull/71053) ([Anton Popov](https://github.com/CurtizJ)).
* Shrink to fit index_granularity array in memory to reduce memory footprint for MergeTree table engines family. [#71595](https://github.com/ClickHouse/ClickHouse/pull/71595) ([alesapin](https://github.com/alesapin)).
* Turn off filesystem cache setting `boundary_alignment` for non-disk read, which improves performance of reading from standalone remote files with caching. [#71827](https://github.com/ClickHouse/ClickHouse/pull/71827) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Queries like `SELECT * FROM table LIMIT ...` used to load part indexes even though they were not used. [#71866](https://github.com/ClickHouse/ClickHouse/pull/71866) ([Alexander Gololobov](https://github.com/davenger)).
* Enable `parallel_replicas_local_plan` by default. Building a full-fledged local plan on the query initiator improves parallel replicas performance with less resource consumption, provides opportunities to apply more query optimizations. [#70171](https://github.com/ClickHouse/ClickHouse/pull/70171) ([Igor Nikonov](https://github.com/devcrafter)).
#### Improvement
* Allow using clickhouse with a file argument as `ch queries.sql`. [#71589](https://github.com/ClickHouse/ClickHouse/pull/71589) ([Raúl Marín](https://github.com/Algunenano)).
* The `Vertical` format (which is also activated when you end your query with `\G`) gets the features of Pretty formats, such as: - highlighting thousand groups in numbers; - printing a readable number tip. [#71630](https://github.com/ClickHouse/ClickHouse/pull/71630) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Push external user roles from query originator to other nodes in cluster. Helpful when only originator has access to the external authenticator (like LDAP). [#70332](https://github.com/ClickHouse/ClickHouse/pull/70332) ([Andrey Zvonov](https://github.com/zvonand)).
* Added aliases `anyRespectNulls`, `firstValueRespectNulls`, and `anyValueRespectNulls` for aggregation function `any`. Also added aliases `anyLastRespectNulls` and `lastValueRespectNulls` for aggregation function `anyLast`. This allows using more natural camel-case-only syntax rather than mixed camel-case/underscore syntax, for example: `SELECT anyLastRespectNullsStateIf` instead of `anyLast_respect_nullsStateIf`. [#71403](https://github.com/ClickHouse/ClickHouse/pull/71403) ([Peter Nguyen](https://github.com/petern48)).
* Added the configuration `date_time_utc` parameter, enabling JSON log formatting to support UTC date-time in RFC 3339/ISO8601 format. [#71560](https://github.com/ClickHouse/ClickHouse/pull/71560) ([Ali](https://github.com/xogoodnow)).
* Added a new header type for S3 endpoints for user authentication (`access_header`). This allows to get some access header with the lowest priority, which will be overwritten with `access_key_id` from any other source (for example, a table schema or a named collection). [#71011](https://github.com/ClickHouse/ClickHouse/pull/71011) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Higher-order functions with constant arrays and constant captured arguments will return constants. [#58400](https://github.com/ClickHouse/ClickHouse/pull/58400) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Read-in-order optimization via generating virtual rows, so that less data is read during merge sort. This is especially useful when multiple parts exist. [#62125](https://github.com/ClickHouse/ClickHouse/pull/62125) ([Shichao Jin](https://github.com/jsc0218)).
* Query plan step names (`EXPLAIN PLAN json=1`) and pipeline processor names (`EXPLAIN PIPELINE compact=0,graph=1`) now have a unique id as a suffix. This allows to match processors profiler output and OpenTelemetry traces with explain output. [#63518](https://github.com/ClickHouse/ClickHouse/pull/63518) ([qhsong](https://github.com/qhsong)).
* Added option to check object exists after writing to Azure Blob Storage, this is controlled by setting `check_objects_after_upload`. [#64847](https://github.com/ClickHouse/ClickHouse/pull/64847) ([Smita Kulkarni](https://github.com/SmitaRKulkarni)).
* Added option to check if the object exists after writing it to Azure Blob Storage, this is controlled by setting `check_objects_after_upload`. [#64847](https://github.com/ClickHouse/ClickHouse/pull/64847) ([Smita Kulkarni](https://github.com/SmitaRKulkarni)).
* Use `Atomic` database by default in `clickhouse-local`. Address items 1 and 5 from [#50647](https://github.com/ClickHouse/ClickHouse/issues/50647). Closes [#44817](https://github.com/ClickHouse/ClickHouse/issues/44817). [#68024](https://github.com/ClickHouse/ClickHouse/pull/68024) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Exceptions break the HTTP protocol in order to alert the client about error. [#68800](https://github.com/ClickHouse/ClickHouse/pull/68800) ([Sema Checherinda](https://github.com/CheSema)).
* Report running DDLWorker hosts by creating replica_dir and mark replicas active in DDLWorker. [#69658](https://github.com/ClickHouse/ClickHouse/pull/69658) ([tuanpach](https://github.com/tuanpach)).
* Report hosts running distributed DDL queries by creating replica_dir and mark replicas active in DDLWorker. [#69658](https://github.com/ClickHouse/ClickHouse/pull/69658) ([tuanpach](https://github.com/tuanpach)).
* Wait only on active replicas for database ON CLUSTER queries if distributed_ddl_output_mode is set to be *_only_active. [#69660](https://github.com/ClickHouse/ClickHouse/pull/69660) ([tuanpach](https://github.com/tuanpach)).
* Better error-handling and cancellation of `ON CLUSTER` backups and restores: - If a backup or restore fails on one host then it'll be cancelled on other hosts automatically - No weird errors must be produced because some hosts failed while other hosts continued their work - If a backup or restore is cancelled on one host then it'll be cancelled on other hosts automatically - Fix issues with `test_disallow_concurrency` - now disabling of concurrency must work better - Backups and restores now are much more resistant to ZooKeeper disconnects. [#70027](https://github.com/ClickHouse/ClickHouse/pull/70027) ([Vitaly Baranov](https://github.com/vitlibar)).
* Enable `parallel_replicas_local_plan` by default. Building a full-fledged local plan on the query initiator improves parallel replicas performance with less resource consumption, provides opportunities to apply more query optimizations. [#70171](https://github.com/ClickHouse/ClickHouse/pull/70171) ([Igor Nikonov](https://github.com/devcrafter)).
* Add ability to set user/password in http_handlers (for `dynamic_query_handler`/`predefined_query_handler`). [#70725](https://github.com/ClickHouse/ClickHouse/pull/70725) ([Azat Khuzhin](https://github.com/azat)).
* Support `ALTER TABLE ... MODIFY/RESET SETTING ...` for certain settings in storage S3Queue. [#70811](https://github.com/ClickHouse/ClickHouse/pull/70811) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not call the object storage API when listing directories, as this may be cost-inefficient. Instead, store the list of filenames in the memory. The trade-offs are increased initial load time and memory required to store filenames. [#70823](https://github.com/ClickHouse/ClickHouse/pull/70823) ([Julia Kartseva](https://github.com/jkartseva)).
* Add `--threads` parameter to `clickhouse-compressor`, which allows to compress data in parallel. [#70860](https://github.com/ClickHouse/ClickHouse/pull/70860) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added the ability to reload client certificates in the same way as the procedure for reloading server certificates. [#70997](https://github.com/ClickHouse/ClickHouse/pull/70997) ([Roman Antonov](https://github.com/Romeo58rus)).
* Refactored internal structure of files which work with DataLake Storages. [#71012](https://github.com/ClickHouse/ClickHouse/pull/71012) ([Daniil Ivanik](https://github.com/divanik)).
* Make the Replxx client history size configurable. [#71014](https://github.com/ClickHouse/ClickHouse/pull/71014) ([Jiří Kozlovský](https://github.com/jirislav)).
* Added a setting `prewarm_mark_cache` which enables loading of marks to mark cache on inserts, merges, fetches of parts and on startup of the table. [#71053](https://github.com/ClickHouse/ClickHouse/pull/71053) ([Anton Popov](https://github.com/CurtizJ)).
* Boolean support for parquet native reader. [#71055](https://github.com/ClickHouse/ClickHouse/pull/71055) ([Arthur Passos](https://github.com/arthurpassos)).
* Make the client history size configurable and increase its default size. [#71014](https://github.com/ClickHouse/ClickHouse/pull/71014) ([Jiří Kozlovský](https://github.com/jirislav)).
* Boolean types support for the parquet native reader. [#71055](https://github.com/ClickHouse/ClickHouse/pull/71055) ([Arthur Passos](https://github.com/arthurpassos)).
* Retry more errors when interacting with S3, such as "Malformed message". [#71088](https://github.com/ClickHouse/ClickHouse/pull/71088) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Lower log level for some messages about S3. [#71090](https://github.com/ClickHouse/ClickHouse/pull/71090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support write hdfs files with space. [#71105](https://github.com/ClickHouse/ClickHouse/pull/71105) ([exmy](https://github.com/exmy)).
* Support writing HDFS files with spaces. [#71105](https://github.com/ClickHouse/ClickHouse/pull/71105) ([exmy](https://github.com/exmy)).
* Added settings limiting the number of replicated tables, dictionaries and views. [#71179](https://github.com/ClickHouse/ClickHouse/pull/71179) ([Kirill](https://github.com/kirillgarbar)).
* Use `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` instead of `AWS_CONTAINER_AUTHORIZATION_TOKEN` if former is available. Fixes [#71074](https://github.com/ClickHouse/ClickHouse/issues/71074). [#71269](https://github.com/ClickHouse/ClickHouse/pull/71269) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Remove the metadata_version ZooKeeper node creation from RMT restarting thread. The only scenario where we need to create this node is when the user updated from a version earlier than 20.4 straight to one later than 24.10. ClickHouse does not support upgrades that span more than a year, so we should throw an exception and ask the user to update gradually, instead of creating the node. [#71385](https://github.com/ClickHouse/ClickHouse/pull/71385) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Remove the metadata_version ZooKeeper node creation from ReplicatedMergeTree restarting thread. The only scenario where we need to create this node is when the user updated from a version earlier than 20.4 straight to one later than 24.10. ClickHouse does not support upgrades that span more than a year, so we should throw an exception and ask the user to update gradually, instead of creating the node. [#71385](https://github.com/ClickHouse/ClickHouse/pull/71385) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Add per host dashboards `Overview (host)` and `Cloud overview (host)` to advanced dashboard. [#71422](https://github.com/ClickHouse/ClickHouse/pull/71422) ([alesapin](https://github.com/alesapin)).
* The methods `removeObject` and `removeObjects` are not idempotent. When retries happen due to network errors, the result could be `object not found` because it has been deleted at previous attempts. [#71529](https://github.com/ClickHouse/ClickHouse/pull/71529) ([Sema Checherinda](https://github.com/CheSema)).
* Added new functions `parseDateTime64`, `parseDateTime64OrNull` and `parseDateTime64OrZero`. Compared to the existing function `parseDateTime` (and variants), they return a value of type `DateTime64` instead of `DateTime`. [#71581](https://github.com/ClickHouse/ClickHouse/pull/71581) ([kevinyhzou](https://github.com/KevinyhZou)).
* Allow using clickhouse with a file argument as --queries-file. [#71589](https://github.com/ClickHouse/ClickHouse/pull/71589) ([Raúl Marín](https://github.com/Algunenano)).
* Shrink to fit index_granularity array in memory to reduce memory footprint for MergeTree table engines family. [#71595](https://github.com/ClickHouse/ClickHouse/pull/71595) ([alesapin](https://github.com/alesapin)).
* `clickhouse-local` uses implicit SELECT by default, which allows to use it as a calculator. Improve the syntax highlighting for the implicit SELECT mode. [#71620](https://github.com/ClickHouse/ClickHouse/pull/71620) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The command line applications will highlight syntax even for multi-statements. [#71622](https://github.com/ClickHouse/ClickHouse/pull/71622) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Command-line applications will return non-zero exit codes on errors. In previous versions, the `disks` application returned zero on errors, and other applications returned zero for errors 256 (`PARTITION_ALREADY_EXISTS`) and 512 (`SET_NON_GRANTED_ROLE`). [#71623](https://github.com/ClickHouse/ClickHouse/pull/71623) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* When user/group is given as ID, the `clickhouse su` fails. This patch fixes it to accept `UID:GID` as well. ### Documentation entry for user-facing changes. [#71626](https://github.com/ClickHouse/ClickHouse/pull/71626) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* The `Vertical` format (which is also activated when you end your query with `\G`) gets the features of Pretty formats, such as: - highlighting thousand groups in numbers; - printing a readable number tip. [#71630](https://github.com/ClickHouse/ClickHouse/pull/71630) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* When user/group is given as ID, the `clickhouse su` fails. This patch fixes it to accept `UID:GID` as well. [#71626](https://github.com/ClickHouse/ClickHouse/pull/71626) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Allow to disable memory buffer increase for filesystem cache via setting `filesystem_cache_prefer_bigger_buffer_size`. [#71640](https://github.com/ClickHouse/ClickHouse/pull/71640) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add a separate setting `background_download_max_file_segment_size` for background download max file segment size in filesystem cache. [#71648](https://github.com/ClickHouse/ClickHouse/pull/71648) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Changes the default value of `enable_http_compression` from 0 to 1. Closes [#71591](https://github.com/ClickHouse/ClickHouse/issues/71591). [#71774](https://github.com/ClickHouse/ClickHouse/pull/71774) ([Peter Nguyen](https://github.com/petern48)).
* Slightly better JSON type parsing: if current block for the JSON path contains values of several types, try to choose the best type by trying types in special best-effort order. [#71785](https://github.com/ClickHouse/ClickHouse/pull/71785) ([Pavel Kruglov](https://github.com/Avogar)).
* Previously reading from `system.asynchronous_metrics` would wait for concurrent update to finish. This can take long time if system is under heavy load. With this change the previously collected values can always be read. [#71798](https://github.com/ClickHouse/ClickHouse/pull/71798) ([Alexander Gololobov](https://github.com/davenger)).
* Set `polling_max_timeout_ms` to 10 minutes, `polling_backoff_ms` to 30 seconds. [#71817](https://github.com/ClickHouse/ClickHouse/pull/71817) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Turn-off filesystem cache setting `boundary_alignment` for non-disk read. [#71827](https://github.com/ClickHouse/ClickHouse/pull/71827) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update `HostResolver` 3 times in a `history` period. [#71863](https://github.com/ClickHouse/ClickHouse/pull/71863) ([Sema Checherinda](https://github.com/CheSema)).
* Queries like 'SELECT * FROM t LIMIT 1' used to load part indexes even though they were not used. [#71866](https://github.com/ClickHouse/ClickHouse/pull/71866) ([Alexander Gololobov](https://github.com/davenger)).
* `allow_reorder_prewhere_conditions` is on by default with old compatibility settings. [#71867](https://github.com/ClickHouse/ClickHouse/pull/71867) ([Raúl Marín](https://github.com/Algunenano)).
* S3Queue and AzureQueue: Set `polling_max_timeout_ms` to 10 minutes, `polling_backoff_ms` to 30 seconds. [#71817](https://github.com/ClickHouse/ClickHouse/pull/71817) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update `HostResolver` three times in a `history` period. [#71863](https://github.com/ClickHouse/ClickHouse/pull/71863) ([Sema Checherinda](https://github.com/CheSema)).
* On the advanced dashboard HTML page added a dropdown selector for the dashboard from `system.dashboards` table. [#72081](https://github.com/ClickHouse/ClickHouse/pull/72081) ([Sergei Trifonov](https://github.com/serxa)).
* Check if default database is present after authorization. Fixes [#71097](https://github.com/ClickHouse/ClickHouse/issues/71097). [#71140](https://github.com/ClickHouse/ClickHouse/pull/71140) ([Konstantin Bogdanov](https://github.com/thevar1able)).
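
The exit-code entry above mentions that errors 256 and 512 used to produce a zero exit status. One plausible explanation, stated here as an assumption rather than taken from the changelog, is that POSIX makes only the low 8 bits of an exit status available to a waiting parent, so any multiple of 256 reads as success. A minimal sketch with a hypothetical helper name:

```python
def shell_visible_status(error_code: int) -> int:
    """Return the low 8 bits of a status, which is what a waiting parent observes on POSIX."""
    return error_code & 0xFF


if __name__ == "__main__":
    # 256 and 512 are the error codes named in the changelog entry;
    # 60 is an arbitrary value below 256 used for contrast.
    for code in (60, 256, 512):
        print(code, "->", shell_visible_status(code))
```

Under this assumption, returning a fixed non-zero status for any error avoids the wrap-around, which matches the behavior the entry describes.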
#### Bug Fix (user-visible misbehavior in an official stable release)
* The parts deduplicated during `ATTACH PART` query don't get stuck with the `attaching_` prefix anymore. [#65636](https://github.com/ClickHouse/ClickHouse/pull/65636) ([Kirill](https://github.com/kirillgarbar)).
* Fix a bug where DateTime64 lost precision in the `IN` function. [#67230](https://github.com/ClickHouse/ClickHouse/pull/67230) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix possible logical error when using functions with `IGNORE/RESPECT NULLS` in `ORDER BY ... WITH FILL`, close [#57609](https://github.com/ClickHouse/ClickHouse/issues/57609). [#68234](https://github.com/ClickHouse/ClickHouse/pull/68234) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fixed rare logical errors in asynchronous inserts with format `Native` in case of reached memory limit. [#68965](https://github.com/ClickHouse/ClickHouse/pull/68965) ([Anton Popov](https://github.com/CurtizJ)).
* Fix COMMENT in CREATE TABLE for EPHEMERAL column. [#70458](https://github.com/ClickHouse/ClickHouse/pull/70458) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix logical error in JSONExtract with LowCardinality(Nullable). [#70549](https://github.com/ClickHouse/ClickHouse/pull/70549) ([Pavel Kruglov](https://github.com/Avogar)).
* Allow system drop replica zkpath when there is another replica with the same zk path. [#70642](https://github.com/ClickHouse/ClickHouse/pull/70642) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix a crash and a leak in AggregateFunctionGroupArraySorted. [#70820](https://github.com/ClickHouse/ClickHouse/pull/70820) ([Michael Kolupaev](https://github.com/al13n321)).
* Add ability to override Content-Type by user headers in the URL engine. [#70859](https://github.com/ClickHouse/ClickHouse/pull/70859) ([Artem Iurin](https://github.com/ortyomka)).
* Fix logical error in `StorageS3Queue` "Cannot create a persistent node in /processed since it already exists". [#70984](https://github.com/ClickHouse/ClickHouse/pull/70984) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixed named sessions not being closed and hanging on forever under certain circumstances. [#70998](https://github.com/ClickHouse/ClickHouse/pull/70998) ([Márcio Martins](https://github.com/marcio-absmartly)).
* Fix a bug where the `_row_exists` column was not considered in the rebuild option of projection lightweight delete. [#71089](https://github.com/ClickHouse/ClickHouse/pull/71089) ([Shichao Jin](https://github.com/jsc0218)).
* Fix `AT_* is out of range` problem when running on Oracle Linux UEK 6.10. [#71109](https://github.com/ClickHouse/ClickHouse/pull/71109) ([Örjan Fors](https://github.com/op)).
* Fix wrong value in system.query_metric_log due to unexpected race condition. [#71124](https://github.com/ClickHouse/ClickHouse/pull/71124) ([Pablo Marcos](https://github.com/pamarcos)).
* Fix mismatched aggregate function name of quantileExactWeightedInterpolated. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/69619. cc @Algunenano. [#71168](https://github.com/ClickHouse/ClickHouse/pull/71168) ([李扬](https://github.com/taiyang-li)).
* Fix bad_weak_ptr exception with Dynamic in functions comparison. [#71183](https://github.com/ClickHouse/ClickHouse/pull/71183) ([Pavel Kruglov](https://github.com/Avogar)).
* Check that a 7z file being read is on the local machine. [#71184](https://github.com/ClickHouse/ClickHouse/pull/71184) ([Daniil Ivanik](https://github.com/divanik)).
* Fix ignoring format settings in Native format via HTTP and Async Inserts. [#71193](https://github.com/ClickHouse/ClickHouse/pull/71193) ([Pavel Kruglov](https://github.com/Avogar)).
* SELECT queries run with setting `use_query_cache = 1` are no longer rejected if the name of a system table appears as a literal, e.g. `SELECT * FROM users WHERE name = 'system.metrics' SETTINGS use_query_cache = true;` now works. [#71254](https://github.com/ClickHouse/ClickHouse/pull/71254) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix increased memory usage when `enable_filesystem_cache=1` but the disk in the storage configuration has no cache configuration. [#71261](https://github.com/ClickHouse/ClickHouse/pull/71261) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix possible "Cannot read all data" errors during deserialization of LowCardinality dictionary from Dynamic column. [#71299](https://github.com/ClickHouse/ClickHouse/pull/71299) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix incomplete cleanup of parallel output format in the client. [#71304](https://github.com/ClickHouse/ClickHouse/pull/71304) ([Raúl Marín](https://github.com/Algunenano)).
* Added missing unescaping in named collections. Without this fix, clickhouse-server could fail to start. [#71308](https://github.com/ClickHouse/ClickHouse/pull/71308) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix async inserts with empty blocks via native protocol. [#71312](https://github.com/ClickHouse/ClickHouse/pull/71312) ([Anton Popov](https://github.com/CurtizJ)).
* Fix inconsistent AST formatting when granting wrong wildcard grants [#71309](https://github.com/ClickHouse/ClickHouse/issues/71309). [#71332](https://github.com/ClickHouse/ClickHouse/pull/71332) ([pufit](https://github.com/pufit)).
* Add try/catch to data parts destructors to avoid std::terminate. [#71364](https://github.com/ClickHouse/ClickHouse/pull/71364) ([alesapin](https://github.com/alesapin)).
* Check suspicious and experimental types in JSON type hints. [#71369](https://github.com/ClickHouse/ClickHouse/pull/71369) ([Pavel Kruglov](https://github.com/Avogar)).
* Start memory worker thread on non-Linux OS too (fixes [#71051](https://github.com/ClickHouse/ClickHouse/issues/71051)). [#71384](https://github.com/ClickHouse/ClickHouse/pull/71384) ([Alexandre Snarskii](https://github.com/snar)).
* Fix error `Invalid number of rows in Chunk` with the Variant column. [#71388](https://github.com/ClickHouse/ClickHouse/pull/71388) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix error `column "attgenerated" does not exist` for older PostgreSQL versions. Fixes [#60651](https://github.com/ClickHouse/ClickHouse/issues/60651). [#71396](https://github.com/ClickHouse/ClickHouse/pull/71396) ([0xMihalich](https://github.com/0xMihalich)).
* To avoid spamming the server logs, failing authentication attempts are now logged at level `DEBUG` instead of `ERROR`. [#71405](https://github.com/ClickHouse/ClickHouse/pull/71405) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix crash in `mongodb` table function when passing wrong arguments (e.g. `NULL`). [#71426](https://github.com/ClickHouse/ClickHouse/pull/71426) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fix crash with optimize_rewrite_array_exists_to_has. [#71432](https://github.com/ClickHouse/ClickHouse/pull/71432) ([Raúl Marín](https://github.com/Algunenano)).
* Fixed the usage of setting `max_insert_delayed_streams_for_parallel_write` in inserts. Previously it worked incorrectly which could lead to high memory usage in inserts which write data into several partitions. [#71474](https://github.com/ClickHouse/ClickHouse/pull/71474) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible error `Argument for function must be constant` (old analyzer) in case when arrayJoin can apparently appear in `WHERE` condition. Regression after https://github.com/ClickHouse/ClickHouse/pull/65414. [#71476](https://github.com/ClickHouse/ClickHouse/pull/71476) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Prevent crash in SortCursor with 0 columns (old analyzer). [#71494](https://github.com/ClickHouse/ClickHouse/pull/71494) ([Raúl Marín](https://github.com/Algunenano)).
* Fix Date32 out of range caused by uninitialized ORC data. For more details, refer to https://github.com/apache/incubator-gluten/issues/7823. [#71500](https://github.com/ClickHouse/ClickHouse/pull/71500) ([李扬](https://github.com/taiyang-li)).
* Fix counting column size in wide part for Dynamic and JSON types. [#71526](https://github.com/ClickHouse/ClickHouse/pull/71526) ([Pavel Kruglov](https://github.com/Avogar)).
* Analyzer fix when query inside materialized view uses IN with CTE. Closes [#65598](https://github.com/ClickHouse/ClickHouse/issues/65598). [#71538](https://github.com/ClickHouse/ClickHouse/pull/71538) ([Maksim Kita](https://github.com/kitaisreal)).
* Avoid crash when using a UDF in a constraint. [#71541](https://github.com/ClickHouse/ClickHouse/pull/71541) ([Raúl Marín](https://github.com/Algunenano)).
* Return 0 or default char instead of throwing an error in bitShift functions in case of out of bounds. [#71580](https://github.com/ClickHouse/ClickHouse/pull/71580) ([Pablo Marcos](https://github.com/pamarcos)).
* Fix server crashes while using materialized view with certain engines. [#71593](https://github.com/ClickHouse/ClickHouse/pull/71593) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)).
* Array join with a nested data structure, which contains an alias to a constant array was leading to a null pointer dereference. This closes [#71677](https://github.com/ClickHouse/ClickHouse/issues/71677). [#71678](https://github.com/ClickHouse/ClickHouse/pull/71678) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix LOGICAL_ERROR when doing ALTER with empty tuple. This fixes [#71647](https://github.com/ClickHouse/ClickHouse/issues/71647). [#71679](https://github.com/ClickHouse/ClickHouse/pull/71679) ([Amos Bird](https://github.com/amosbird)).
* Don't transform constant set in predicates over partition columns in case of NOT IN operator. [#71695](https://github.com/ClickHouse/ClickHouse/pull/71695) ([Eduard Karacharov](https://github.com/korowa)).
* Improve the failure log message of the Docker init script so that it is easier to understand. [#71734](https://github.com/ClickHouse/ClickHouse/pull/71734) ([Андрей](https://github.com/andreineustroev)).
* Fix CAST from LowCardinality(Nullable) to Dynamic. Previously it could lead to error `Bad cast from type DB::ColumnVector<int> to DB::ColumnNullable`. [#71742](https://github.com/ClickHouse/ClickHouse/pull/71742) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix exception for toDayOfWeek on WHERE condition with primary key of DateTime64 type. [#71849](https://github.com/ClickHouse/ClickHouse/pull/71849) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fixed filling of defaults after parsing into sparse columns. [#71854](https://github.com/ClickHouse/ClickHouse/pull/71854) ([Anton Popov](https://github.com/CurtizJ)).
* Fix GROUPING function error when input is ALIAS on distributed table, close [#68602](https://github.com/ClickHouse/ClickHouse/issues/68602). [#71855](https://github.com/ClickHouse/ClickHouse/pull/71855) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fix possible crash when using `allow_experimental_join_condition`, close [#71693](https://github.com/ClickHouse/ClickHouse/issues/71693). [#71857](https://github.com/ClickHouse/ClickHouse/pull/71857) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fixed select statements that use `WITH TIES` clause which might not return enough rows. [#71886](https://github.com/ClickHouse/ClickHouse/pull/71886) ([wxybear](https://github.com/wxybear)).
* Fix the TOO_LARGE_ARRAY_SIZE exception raised when the evaluation of an arrayWithConstant column is mistakenly considered to exceed the array size limit. [#71894](https://github.com/ClickHouse/ClickHouse/pull/71894) ([Udi](https://github.com/udiz)).
* `clickhouse-benchmark` reported wrong metrics for queries taking longer than one second. [#71898](https://github.com/ClickHouse/ClickHouse/pull/71898) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix data race between the progress indicator and the progress table in clickhouse-client. This issue is visible when FROM INFILE is used. Intercept keystrokes during INSERT queries to toggle progress table display. [#71901](https://github.com/ClickHouse/ClickHouse/pull/71901) ([Julia Kartseva](https://github.com/jkartseva)).
* Use auxiliary keepers for cluster autodiscovery. [#71911](https://github.com/ClickHouse/ClickHouse/pull/71911) ([Anton Ivashkin](https://github.com/ianton-ru)).
* Fix rows_processed column in system.s3/azure_queue_log broken in 24.6. Closes [#69975](https://github.com/ClickHouse/ClickHouse/issues/69975). [#71946](https://github.com/ClickHouse/ClickHouse/pull/71946) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixed a case when the `s3`/`s3Cluster` functions could return an incomplete result or throw an exception. It occurred when a glob pattern was used in the S3 URI (like `pattern/*`) and an empty object existed with the key `pattern/` (such objects are created automatically by the S3 Console). Also, the default value of the setting `s3_skip_empty_files` changed from `false` to `true`. [#71947](https://github.com/ClickHouse/ClickHouse/pull/71947) ([Nikita Taranov](https://github.com/nickitat)).
* Fix a crash in clickhouse-client syntax highlighting. Closes [#71864](https://github.com/ClickHouse/ClickHouse/issues/71864). [#71949](https://github.com/ClickHouse/ClickHouse/pull/71949) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix `Illegal type` error for `MergeTree` tables with binary monotonic function in `ORDER BY` when the first argument is constant. Fixes [#71941](https://github.com/ClickHouse/ClickHouse/issues/71941). [#71966](https://github.com/ClickHouse/ClickHouse/pull/71966) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Allow only SELECT queries in EXPLAIN AST used inside subquery. Other types of queries lead to logical error: 'Bad cast from type DB::ASTCreateQuery to DB::ASTSelectWithUnionQuery' or `Inconsistent AST formatting`. [#71982](https://github.com/ClickHouse/ClickHouse/pull/71982) ([Pavel Kruglov](https://github.com/Avogar)).
* When inserting a record with `clickhouse-client`, the client reads column descriptions from the server. There was a bug where the descriptions were written in the wrong order; the correct order is [statistics, ttl, settings]. [#71991](https://github.com/ClickHouse/ClickHouse/pull/71991) ([Han Fei](https://github.com/hanfei1991)).
* Fix formatting of `MOVE PARTITION ... TO TABLE ...` alter commands when `format_alter_commands_with_parentheses` is enabled. [#72080](https://github.com/ClickHouse/ClickHouse/pull/72080) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix RIGHT / FULL joins in queries with parallel replicas. RIGHT joins can now be executed with parallel replicas (reading of the right table is distributed). FULL joins can't be parallelized among nodes and are executed locally. [#71162](https://github.com/ClickHouse/ClickHouse/pull/71162) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix the issue where ClickHouse in Docker containers printed "get_mempolicy: Operation not permitted" into stderr due to restricted syscalls. [#70900](https://github.com/ClickHouse/ClickHouse/pull/70900) ([filimonov](https://github.com/filimonov)).
* Fix the metadata_version record in ZooKeeper in restarting thread rather than in attach thread. [#70297](https://github.com/ClickHouse/ClickHouse/pull/70297) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* This is a fix for "zero-copy" replication, which is unsupported and will be removed entirely. Don't delete a blob when there are nodes using it in ReplicatedMergeTree with zero-copy replication. [#71186](https://github.com/ClickHouse/ClickHouse/pull/71186) ([Antonio Andelic](https://github.com/antonio2368)).
* This is a fix for "zero-copy" replication, which is unsupported and will be removed entirely. Acquiring zero-copy shared lock before moving a part to zero-copy disk to prevent possible data loss if Keeper is unavailable. [#71845](https://github.com/ClickHouse/ClickHouse/pull/71845) ([Aleksei Filatov](https://github.com/aalexfvk)).
### <a id="2410"></a> ClickHouse release 24.10, 2024-10-31

View File

@ -0,0 +1,80 @@
# docker build -t clickhouse/binary-builder .
ARG FROM_TAG=latest
FROM clickhouse/fasttest:$FROM_TAG
ENV CC=clang-${LLVM_VERSION}
ENV CXX=clang++-${LLVM_VERSION}
# If the cctools is updated, then first build it in the CI, then update here in a different commit
COPY --from=clickhouse/cctools:d9e3596e706b /cctools /cctools
# Rust toolchain and libraries
ENV RUSTUP_HOME=/rust/rustup
ENV CARGO_HOME=/rust/cargo
ENV PATH="/rust/cargo/bin:${PATH}"
RUN curl https://sh.rustup.rs -sSf | bash -s -- -y && \
chmod 777 -R /rust && \
rustup toolchain install nightly-2024-04-01 && \
rustup default nightly-2024-04-01 && \
rustup toolchain remove stable && \
rustup component add rust-src && \
rustup target add x86_64-unknown-linux-gnu && \
rustup target add aarch64-unknown-linux-gnu && \
rustup target add x86_64-apple-darwin && \
rustup target add x86_64-unknown-freebsd && \
rustup target add aarch64-apple-darwin && \
rustup target add powerpc64le-unknown-linux-gnu && \
rustup target add x86_64-unknown-linux-musl && \
rustup target add aarch64-unknown-linux-musl && \
rustup target add riscv64gc-unknown-linux-gnu
# A cross-linker for RISC-V 64 (we need it, because LLVM's LLD does not work):
RUN apt-get update \
&& apt-get install software-properties-common --yes --no-install-recommends --verbose-versions
RUN add-apt-repository ppa:ubuntu-toolchain-r/test --yes \
&& apt-get update \
&& apt-get install --yes \
binutils-riscv64-linux-gnu \
build-essential \
python3-boto3 \
yasm \
zstd \
zip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
# Download toolchain and SDK for Darwin
RUN curl -sL -O https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz
# Download and install mold 2.0 for s390x build
RUN curl -Lo /tmp/mold.tar.gz "https://github.com/rui314/mold/releases/download/v2.0.0/mold-2.0.0-x86_64-linux.tar.gz" \
&& mkdir /tmp/mold \
&& tar -xzf /tmp/mold.tar.gz -C /tmp/mold \
&& cp -r /tmp/mold/mold*/* /usr \
&& rm -rf /tmp/mold \
&& rm /tmp/mold.tar.gz
# Architecture of the image when BuildKit/buildx is used
ARG TARGETARCH
ARG NFPM_VERSION=2.20.0
RUN arch=${TARGETARCH:-amd64} \
&& curl -Lo /tmp/nfpm.deb "https://github.com/goreleaser/nfpm/releases/download/v${NFPM_VERSION}/nfpm_${arch}.deb" \
&& dpkg -i /tmp/nfpm.deb \
&& rm /tmp/nfpm.deb
ARG GO_VERSION=1.19.10
# We needed go for clickhouse-diagnostics (it is not used anymore)
RUN arch=${TARGETARCH:-amd64} \
&& curl -Lo /tmp/go.tgz "https://go.dev/dl/go${GO_VERSION}.linux-${arch}.tar.gz" \
&& tar -xzf /tmp/go.tgz -C /usr/local/ \
&& rm /tmp/go.tgz
ENV PATH="$PATH:/usr/local/go/bin"
ENV GOPATH=/workdir/go
ENV GOCACHE=/workdir/
ARG CLANG_TIDY_SHA1=c191254ea00d47ade11d7170ef82fe038c213774
RUN curl -Lo /usr/bin/clang-tidy-cache \
"https://raw.githubusercontent.com/matus-chochlik/ctcache/$CLANG_TIDY_SHA1/clang-tidy-cache" \
&& chmod +x /usr/bin/clang-tidy-cache

View File

@ -0,0 +1,5 @@
# docker build -t clickhouse/test-old-centos .
FROM centos:5
CMD /bin/sh -c "/clickhouse server --config /config/config.xml > /var/log/clickhouse-server/stderr.log 2>&1 & \
sleep 5 && /clickhouse client --query \"select 'OK'\" 2> /var/log/clickhouse-server/clientstderr.log || echo 'FAIL'"

View File

@ -0,0 +1,5 @@
# docker build -t clickhouse/test-old-ubuntu .
FROM ubuntu:12.04
CMD /bin/sh -c "/clickhouse server --config /config/config.xml > /var/log/clickhouse-server/stderr.log 2>&1 & \
sleep 5 && /clickhouse client --query \"select 'OK'\" 2> /var/log/clickhouse-server/clientstderr.log || echo 'FAIL'"

View File

@ -11,7 +11,8 @@ ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/down
RUN mkdir /etc/clickhouse-server /etc/clickhouse-keeper /etc/clickhouse-client && chmod 777 /etc/clickhouse-* \
&& mkdir -p /var/lib/clickhouse /var/log/clickhouse-server && chmod 777 /var/log/clickhouse-server /var/lib/clickhouse
RUN addgroup --gid 1001 clickhouse && adduser --uid 1001 --gid 1001 --disabled-password clickhouse
RUN addgroup --gid 1000 clickhouse && adduser --uid 1000 --gid 1000 --disabled-password clickhouse
RUN addgroup --gid 1001 clickhouse2 && adduser --uid 1001 --gid 1001 --disabled-password clickhouse2
# moreutils - provides ts for FT
# expect, bzip2 - required by FT

View File

@ -6,6 +6,7 @@ RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \
libxml2-utils \
python3-pip \
locales \
ripgrep \
git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*

View File

@ -1,10 +1,13 @@
import argparse
import os
from praktika.result import Result
from praktika.settings import Settings
from praktika.utils import MetaClasses, Shell, Utils
from ci.jobs.scripts.clickhouse_version import CHVersion
from ci.workflows.defs import CIFiles, ToolSet
from ci.workflows.pull_request import S3_BUILDS_BUCKET
class JobStages(metaclass=MetaClasses.WithIter):
@ -13,6 +16,7 @@ class JobStages(metaclass=MetaClasses.WithIter):
UNSHALLOW = "unshallow"
BUILD = "build"
PACKAGE = "package"
UNIT = "unit"
def parse_args():
@ -36,14 +40,22 @@ CMAKE_CMD = """cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA \
-DENABLE_UTILS=0 -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
{AUX_DEFS} \
-DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 \
-DCMAKE_C_COMPILER={COMPILER} -DCMAKE_CXX_COMPILER={COMPILER_CPP} \
-DCOMPILER_CACHE={CACHE_TYPE} -DENABLE_BUILD_PROFILING=1 {DIR}"""
# release: cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=None -DSANITIZE= -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON -DSPLIT_DEBUG_SYMBOLS=ON -DBUILD_STANDALONE_KEEPER=1 -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DCOMPILER_CACHE=sccache -DENABLE_BUILD_PROFILING=1 ..
# binary release: cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=None -DSANITIZE= -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DCOMPILER_CACHE=sccache -DENABLE_BUILD_PROFILING=1 ..
# release coverage: cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=None -DSANITIZE= -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DSANITIZE_COVERAGE=1 -DBUILD_STANDALONE_KEEPER=0 -DCOMPILER_CACHE=sccache -DENABLE_BUILD_PROFILING=1 ..
def main():
args = parse_args()
# # for sccache
# os.environ["SCCACHE_BUCKET"] = S3_BUILDS_BUCKET
# os.environ["SCCACHE_S3_KEY_PREFIX"] = "ccache/sccache"
# TODO: check with SCCACHE_LOG=debug SCCACHE_NO_DAEMON=1
stop_watch = Utils.Stopwatch()
stages = list(JobStages)
@ -65,30 +77,52 @@ def main():
BUILD_TYPE = "RelWithDebInfo"
SANITIZER = ""
AUX_DEFS = " -DENABLE_TESTS=0 "
AUX_DEFS = " -DENABLE_TESTS=1 "
cmake_cmd = None
if "debug" in build_type:
print("Build type set: debug")
BUILD_TYPE = "Debug"
AUX_DEFS = " -DENABLE_TESTS=1 "
AUX_DEFS = " -DENABLE_TESTS=0 "
package_type = "debug"
elif "release" in build_type:
print("Build type set: release")
AUX_DEFS = (
" -DENABLE_TESTS=0 -DSPLIT_DEBUG_SYMBOLS=ON -DBUILD_STANDALONE_KEEPER=1 "
)
package_type = "release"
elif "asan" in build_type:
print("Sanitizer set: address")
SANITIZER = "address"
package_type = "asan"
elif "tsan" in build_type:
print("Sanitizer set: thread")
SANITIZER = "thread"
package_type = "tsan"
elif "msan" in build_type:
print("Sanitizer set: memory")
SANITIZER = "memory"
package_type = "msan"
elif "ubsan" in build_type:
print("Sanitizer set: undefined")
SANITIZER = "undefined"
package_type = "ubsan"
elif "binary" in build_type:
package_type = "binary"
cmake_cmd = f"cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=None -DSANITIZE= -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DCMAKE_C_COMPILER={ToolSet.COMPILER_C} -DCMAKE_CXX_COMPILER={ToolSet.COMPILER_CPP} -DCOMPILER_CACHE=sccache -DENABLE_BUILD_PROFILING=1 {Utils.cwd()}"
else:
assert False
cmake_cmd = CMAKE_CMD.format(
BUILD_TYPE=BUILD_TYPE,
CACHE_TYPE=CACHE_TYPE,
SANITIZER=SANITIZER,
AUX_DEFS=AUX_DEFS,
DIR=Utils.cwd(),
)
if not cmake_cmd:
cmake_cmd = CMAKE_CMD.format(
BUILD_TYPE=BUILD_TYPE,
CACHE_TYPE=CACHE_TYPE,
SANITIZER=SANITIZER,
AUX_DEFS=AUX_DEFS,
DIR=Utils.cwd(),
COMPILER=ToolSet.COMPILER_C,
COMPILER_CPP=ToolSet.COMPILER_CPP,
)
build_dir = f"{Settings.TEMP_DIR}/build"
@ -98,7 +132,7 @@ def main():
if res and JobStages.UNSHALLOW in stages:
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Repo Unshallow",
command="git rev-parse --is-shallow-repository | grep -q true && git fetch --depth 10000 --no-tags --filter=tree:0 origin $(git rev-parse --abbrev-ref HEAD)",
with_log=True,
@ -119,7 +153,7 @@ def main():
if res and JobStages.CHECKOUT_SUBMODULES in stages:
Shell.check(f"rm -rf {build_dir} && mkdir -p {build_dir}")
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Checkout Submodules",
command=f"git submodule sync --recursive && git submodule init && git submodule update --depth 1 --recursive --jobs {min([Utils.cpu_count(), 20])}",
)
@ -128,7 +162,7 @@ def main():
if res and JobStages.CMAKE in stages:
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Cmake configuration",
command=cmake_cmd,
workdir=build_dir,
@ -140,7 +174,7 @@ def main():
if res and JobStages.BUILD in stages:
Shell.check("sccache --show-stats")
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Build ClickHouse",
command="ninja clickhouse-bundle clickhouse-odbc-bridge clickhouse-library-bridge",
workdir=build_dir,
@ -149,18 +183,13 @@ def main():
)
Shell.check("sccache --show-stats")
Shell.check(f"ls -l {build_dir}/programs/")
Shell.check(f"pwd")
Shell.check(f"find {build_dir} -name unit_tests_dbms")
Shell.check(f"find . -name unit_tests_dbms")
res = results[-1].is_ok()
if res and JobStages.PACKAGE in stages:
if "debug" in build_type:
package_type = "debug"
elif "release" in build_type:
package_type = "release"
elif "asan" in build_type:
package_type = "asan"
else:
assert False, "TODO"
if res and JobStages.PACKAGE in stages and "binary" not in build_type:
assert package_type
if "amd" in build_type:
deb_arch = "amd64"
else:
@ -170,7 +199,7 @@ def main():
assert Shell.check(f"rm -f {output_dir}/*.deb")
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Build Packages",
command=[
f"DESTDIR={build_dir}/root ninja programs/install",
@ -183,6 +212,17 @@ def main():
)
res = results[-1].is_ok()
if res and JobStages.UNIT in stages and (SANITIZER or "binary" in build_type):
# TODO: parallel execution
results.append(
Result.from_gtest_run(
name="Unit Tests",
unit_tests_path=CIFiles.UNIT_TESTS_BIN,
with_log=False,
)
)
res = results[-1].is_ok()
Result.create_from(results=results, stopwatch=stop_watch).complete_job()
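
The build-type handling in this job can be read as a small lookup. The following is a minimal sketch of that logic, not the job's actual code: `resolve_build` and `runs_unit_tests` are hypothetical helper names, and build type strings such as `amd_asan` are assumed examples. It mirrors the order of the `elif` chain above and the condition guarding the new unit test stage, leaving out the `JobStages.UNIT in stages` check.

```python
# Sanitizer keys in the same order as the elif chain in the job script.
SANITIZERS = {
    "asan": "address",
    "tsan": "thread",
    "msan": "memory",
    "ubsan": "undefined",
}


def resolve_build(build_type):
    """Return (package_type, sanitizer) for a build type string (hypothetical helper)."""
    if "debug" in build_type:
        return "debug", ""
    if "release" in build_type:
        return "release", ""
    for key, sanitizer in SANITIZERS.items():
        if key in build_type:
            return key, sanitizer
    if "binary" in build_type:
        return "binary", ""
    # The job script uses `assert False` for unknown build types.
    raise ValueError(f"unknown build type: {build_type}")


def runs_unit_tests(build_type):
    """Mirror the stage condition: a sanitizer is set or the build is a binary build."""
    _, sanitizer = resolve_build(build_type)
    return bool(sanitizer) or "binary" in build_type
```

Under these assumptions, sanitizer and binary builds run the unit test stage, while plain debug and release builds skip it.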

View File

@ -1,3 +1,4 @@
import argparse
import math
import multiprocessing
import os
@ -245,8 +246,18 @@ def check_file_names(files):
return ""
def parse_args():
parser = argparse.ArgumentParser(description="ClickHouse Style Check Job")
# parser.add_argument("--param", help="Optional job start stage", default=None)
parser.add_argument("--test", help="Optional test name pattern", default="")
return parser.parse_args()
if __name__ == "__main__":
results = []
args = parse_args()
testpattern = args.test
stop_watch = Utils.Stopwatch()
all_files = Utils.traverse_paths(
@ -296,87 +307,111 @@ if __name__ == "__main__":
)
)
results.append(
run_check_concurrent(
check_name="Whitespace Check",
check_function=check_whitespaces,
files=cpp_files,
testname = "Whitespace Check"
if testpattern.lower() in testname.lower():
results.append(
run_check_concurrent(
check_name=testname,
check_function=check_whitespaces,
files=cpp_files,
)
)
)
results.append(
run_check_concurrent(
check_name="YamlLint Check",
check_function=check_yamllint,
files=yaml_workflow_files,
testname = "YamlLint Check"
if testpattern.lower() in testname.lower():
results.append(
run_check_concurrent(
check_name=testname,
check_function=check_yamllint,
files=yaml_workflow_files,
)
)
)
results.append(
run_check_concurrent(
check_name="XmlLint Check",
check_function=check_xmllint,
files=xml_files,
testname = "XmlLint Check"
if testpattern.lower() in testname.lower():
results.append(
run_check_concurrent(
check_name=testname,
check_function=check_xmllint,
files=xml_files,
)
)
)
results.append(
run_check_concurrent(
check_name="Functional Tests scripts smoke check",
check_function=check_functional_test_cases,
files=functional_test_files,
testname = "Functional Tests scripts smoke check"
if testpattern.lower() in testname.lower():
results.append(
run_check_concurrent(
check_name=testname,
check_function=check_functional_test_cases,
files=functional_test_files,
)
)
)
results.append(
Result.create_from_command_execution(
name="Check Tests Numbers",
command=check_gaps_in_tests_numbers,
command_args=[functional_test_files],
testname = "Check Tests Numbers"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_gaps_in_tests_numbers,
command_args=[functional_test_files],
)
)
)
results.append(
Result.create_from_command_execution(
name="Check Broken Symlinks",
command=check_broken_links,
command_kwargs={
"path": "./",
"exclude_paths": ["contrib/", "metadata/", "programs/server/data"],
},
testname = "Check Broken Symlinks"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_broken_links,
command_kwargs={
"path": "./",
"exclude_paths": ["contrib/", "metadata/", "programs/server/data"],
},
)
)
)
results.append(
Result.create_from_command_execution(
name="Check CPP code",
command=check_cpp_code,
testname = "Check CPP code"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_cpp_code,
)
)
)
results.append(
Result.create_from_command_execution(
name="Check Submodules",
command=check_repo_submodules,
testname = "Check Submodules"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_repo_submodules,
)
)
)
results.append(
Result.create_from_command_execution(
name="Check File Names",
command=check_file_names,
command_args=[all_files],
testname = "Check File Names"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_file_names,
command_args=[all_files],
)
)
)
results.append(
Result.create_from_command_execution(
name="Check Many Different Things",
command=check_other,
testname = "Check Many Different Things"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_other,
)
)
)
results.append(
Result.create_from_command_execution(
name="Check Codespell",
command=check_codespell,
testname = "Check Codespell"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_codespell,
)
)
)
results.append(
Result.create_from_command_execution(
name="Check Aspell",
command=check_aspell,
testname = "Check Aspell"
if testpattern.lower() in testname.lower():
results.append(
Result.from_commands_run(
name=testname,
command=check_aspell,
)
)
)
Result.create_from(results=results, stopwatch=stop_watch).complete_job()
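
The `--test` filter added above is a case-insensitive substring match against each check name, and the empty default pattern matches every check. A minimal sketch of that selection rule follows; `select_checks` is a hypothetical helper and `CHECK_NAMES` lists only a subset of the check names from the script.

```python
# A subset of the check names used in the style check job.
CHECK_NAMES = [
    "Whitespace Check",
    "YamlLint Check",
    "XmlLint Check",
    "Check CPP code",
    "Check Codespell",
    "Check Aspell",
]


def select_checks(test_pattern, names=CHECK_NAMES):
    """Return the check names that a given --test pattern would enable."""
    pattern = test_pattern.lower()
    # An empty string is a substring of every name, so "" selects all checks.
    return [name for name in names if pattern in name.lower()]


if __name__ == "__main__":
    print(select_checks("lint"))
    print(select_checks(""))
```

Because the match is a plain substring test rather than a glob or regular expression, a pattern such as `spell` would select both spelling checks at once.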

View File

@ -6,6 +6,7 @@ from praktika.utils import MetaClasses, Shell, Utils
from ci.jobs.scripts.clickhouse_proc import ClickHouseProc
from ci.jobs.scripts.functional_tests_results import FTResultsProcessor
from ci.workflows.defs import ToolSet
def clone_submodules():
@ -132,7 +133,7 @@ def main():
if res and JobStages.CHECKOUT_SUBMODULES in stages:
Shell.check(f"rm -rf {build_dir} && mkdir -p {build_dir}")
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Checkout Submodules",
command=clone_submodules,
)
@ -141,10 +142,12 @@ def main():
if res and JobStages.CMAKE in stages:
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Cmake configuration",
command=f"cmake {current_directory} -DCMAKE_CXX_COMPILER=clang++-18 -DCMAKE_C_COMPILER=clang-18 \
-DCMAKE_TOOLCHAIN_FILE={current_directory}/cmake/linux/toolchain-x86_64-musl.cmake -DENABLE_LIBRARIES=0 \
command=f"cmake {current_directory} -DCMAKE_CXX_COMPILER={ToolSet.COMPILER_CPP} \
-DCMAKE_C_COMPILER={ToolSet.COMPILER_C} \
-DCMAKE_TOOLCHAIN_FILE={current_directory}/cmake/linux/toolchain-x86_64-musl.cmake \
-DENABLE_LIBRARIES=0 \
-DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_THINLTO=0 -DENABLE_NURAFT=1 -DENABLE_SIMDJSON=1 \
-DENABLE_JEMALLOC=1 -DENABLE_LIBURING=1 -DENABLE_YAML_CPP=1 -DCOMPILER_CACHE=sccache",
workdir=build_dir,
@ -156,7 +159,7 @@ def main():
if res and JobStages.BUILD in stages:
Shell.check("sccache --show-stats")
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Build ClickHouse",
command="ninja clickhouse-bundle clickhouse-stripped",
workdir=build_dir,
@ -176,7 +179,7 @@ def main():
"clickhouse-test --help",
]
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Check and Compress binary",
command=commands,
workdir=build_dir,
@ -195,7 +198,7 @@ def main():
update_path_ch_config,
]
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Install ClickHouse Config",
command=commands,
with_log=True,

View File

@ -1,4 +1,5 @@
import argparse
import os
import time
from pathlib import Path
@ -109,7 +110,7 @@ def main():
f"clickhouse-server --version",
]
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Install ClickHouse", command=commands, with_log=True
)
)
@ -153,6 +154,10 @@ def main():
stop_watch_ = Utils.Stopwatch()
step_name = "Tests"
print(step_name)
# TODO: fix tests dependent on this and remove:
os.environ["CLICKHOUSE_TMP"] = "tests/queries/1_stateful"
# assert Shell.check("clickhouse-client -q \"insert into system.zookeeper (name, path, value) values ('auxiliary_zookeeper2', '/test/chroot/', '')\"", verbose=True)
run_test(
no_parallel=no_parallel,

View File

@ -1,5 +1,4 @@
import argparse
import os
import time
from pathlib import Path
@ -118,7 +117,7 @@ def main():
f"chmod +x /tmp/praktika/input/clickhouse-odbc-bridge",
]
results.append(
Result.create_from_command_execution(
Result.from_commands_run(
name="Install ClickHouse", command=commands, with_log=True
)
)

View File

@ -15,7 +15,7 @@
LC_ALL="en_US.UTF-8"
ROOT_PATH="."
EXCLUDE='build/|integration/|widechar_width/|glibc-compatibility/|poco/|memcpy/|consistent-hashing|benchmark|tests/.*.cpp|utils/keeper-bench/example.yaml'
EXCLUDE_DOCS='Settings\.cpp|FormatFactorySettingsDeclaration\.h'
EXCLUDE_DOCS='Settings\.cpp|FormatFactorySettings\.h'
# From [1]:
# But since array_to_string_internal() in array.c still loops over array
@ -85,6 +85,8 @@ EXTERN_TYPES_EXCLUDES=(
CurrentMetrics::add
CurrentMetrics::sub
CurrentMetrics::get
CurrentMetrics::getDocumentation
CurrentMetrics::getName
CurrentMetrics::set
CurrentMetrics::end
CurrentMetrics::Increment

File diff suppressed because it is too large

View File

@ -0,0 +1,200 @@
#!/bin/bash
set -e +x
CHPC_CHECK_START_TIMESTAMP="$(date +%s)"
export CHPC_CHECK_START_TIMESTAMP
S3_URL=${S3_URL:="https://clickhouse-builds.s3.amazonaws.com"}
BUILD_NAME=${BUILD_NAME:-package_release}
export S3_URL BUILD_NAME
SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
# Sometimes AWS responds with DNS error and it's impossible to retry it with
# current curl version options.
function curl_with_retry
{
for _ in 1 2 3 4 5 6 7 8 9 10; do
if curl --fail --head "$1"
then
return 0
else
sleep 1
fi
done
return 1
}
# Use the packaged repository to find the revision we will compare to.
function find_reference_sha
{
git -C right/ch log -1 origin/master
git -C right/ch log -1 pr
# Go back from the revision to be tested, trying to find the closest published
# testing release. The PR branch may be either pull/*/head which is the
# author's branch, or pull/*/merge, which is head merged with some master
# automatically by Github. We will use a merge base with master as a reference
# for testing (or some older commit). A caveat is that if we're testing the
# master, the merge base is the tested commit itself, so we have to step back
# once.
start_ref=$(git -C right/ch merge-base origin/master pr)
if [ "$PR_TO_TEST" == "0" ]
then
start_ref=$start_ref~
fi
# Loop back to find a commit that actually has a published perf test package.
while :
do
# FIXME the original idea was to compare to a closest testing tag, which
# is a version that is verified to work correctly. However, we're having
# some test stability issues now, and the testing release can't roll out
# for more than a week already because of that. Temporarily switch to
# using just closest master, so that we can go on.
#ref_tag=$(git -C ch describe --match='v*-testing' --abbrev=0 --first-parent "$start_ref")
ref_tag="$start_ref"
echo Reference tag is "$ref_tag"
# We use annotated tags which have their own shas, so we have to further
# dereference the tag to get the commit it points to, hence the '~0' thing.
REF_SHA=$(git -C right/ch rev-parse "$ref_tag~0")
# FIXME sometimes we have testing tags on commits without published builds.
# Normally these are documentation commits. Loop to skip them.
# Historically there were various paths for the performance test package,
# test all of them.
unset found
declare -a urls_to_try=(
"$S3_URL/PRs/0/$REF_SHA/$BUILD_NAME/performance.tar.zst"
"$S3_URL/0/$REF_SHA/$BUILD_NAME/performance.tar.zst"
"$S3_URL/0/$REF_SHA/$BUILD_NAME/performance.tgz"
)
for path in "${urls_to_try[@]}"
do
if curl_with_retry "$path"
then
found="$path"
break
fi
done
if [ -n "$found" ] ; then break; fi
start_ref="$REF_SHA~"
done
REF_PR=0
}
#chown nobody workspace output
#chgrp nogroup workspace output
#chmod 777 workspace output
#[ ! -e "/artifacts/performance.tar.zst" ] && echo "ERROR: performance.tar.zst not found" && exit 1
#mkdir -p right
#tar -xf "/artifacts/performance.tar.zst" -C right --no-same-owner --strip-components=1 --zstd --extract --verbose
## Find reference revision if not specified explicitly
#if [ "$REF_SHA" == "" ]; then find_reference_sha; fi
#if [ "$REF_SHA" == "" ]; then echo Reference SHA is not specified ; exit 1 ; fi
#if [ "$REF_PR" == "" ]; then echo Reference PR is not specified ; exit 1 ; fi
# Show what we're testing
#(
# git -C right/ch log -1 --decorate "$REF_SHA" ||:
#) | tee left-commit.txt
#
#(
# git -C right/ch log -1 --decorate "$SHA_TO_TEST" ||:
# echo
# echo Real tested commit is:
# git -C right/ch log -1 --decorate "pr"
#) | tee right-commit.txt
#if [ "$PR_TO_TEST" != "0" ]
#then
# # If the PR only changes the tests and nothing else, prepare a list of these
# # tests for use by compare.sh. Compare to merge base, because master might be
# # far in the future and have unrelated test changes.
# base=$(git -C right/ch merge-base pr origin/master)
# git -C right/ch diff --name-only "$base" pr -- . | tee all-changed-files.txt
# git -C right/ch diff --name-only --diff-filter=d "$base" pr -- tests/performance/*.xml | tee changed-test-definitions.txt
# git -C right/ch diff --name-only "$base" pr -- :!tests/performance/*.xml :!docker/test/performance-comparison | tee other-changed-files.txt
#fi
# prepare config for the right server
export PATH="/tmp/praktika/input:$PATH"
rm -rf /tmp/praktika/right/config && mkdir -p /tmp/praktika/right/config
cp -r ./tests/config /tmp/praktika/right/config
cp ./programs/server/config.xml /tmp/praktika/right/config/
cd /tmp/praktika/input
chmod +x clickhouse
ln -sf clickhouse clickhouse-local
ln -sf clickhouse clickhouse-client
#for file in /tmp/praktika/right/config/config.d/*.xml; do [ -f $file ] && echo Change config $file && sed -i 's|>/var/log|>/tmp/praktika/right/var/log|g; s|>/etc/|>/tmp/praktika/right/etc/|g' $(readlink -f $file); done
cd -
# prepare config for the left server
left_sha=$(sed -n 's/SET(VERSION_GITHASH \(.*\))/\1/p' cmake/autogenerated_versions.txt)
version_major=$(sed -n 's/SET(VERSION_MAJOR \(.*\))/\1/p' cmake/autogenerated_versions.txt)
version_minor=$(sed -n 's/SET(VERSION_MINOR \(.*\))/\1/p' cmake/autogenerated_versions.txt)
rm -rf /tmp/praktika/left/config && mkdir -p /tmp/praktika/left/config
#git checkout left_sha
#rm -rf /tmp/praktika/left && mkdir -p /tmp/praktika/left
#cp -r ./tests/config /tmp/praktika/left/config
#git checkout -
cd /tmp/praktika/left
[ ! -f clickhouse ] && wget -nv https://clickhouse-builds.s3.us-east-1.amazonaws.com/$version_major.$version_minor/020d843058ae211c43285852e5f4f0e0e9cc1eb6/package_aarch64/clickhouse
chmod +x clickhouse
ln -sf clickhouse clickhouse-local
ln -sf clickhouse clickhouse-client
ln -sf clickhouse clickhouse-server
cd -
# Set python output encoding so that we can print queries with non-ASCII letters.
export PYTHONIOENCODING=utf-8
script_path="tests/performance/scripts/"
## Even if we have some errors, try our best to save the logs.
#set +e
# Use clickhouse-client and clickhouse-local from the right server.
export REF_PR
export REF_SHA
# Try to collect some core dumps.
# At least we remove the ulimit and then try to pack some common file names into output.
ulimit -c unlimited
cat /proc/sys/kernel/core_pattern
# Start the main comparison script.
{
# time $SCRIPT_DIR/download.sh "$REF_PR" "$REF_SHA" "$PR_TO_TEST" "$SHA_TO_TEST" && \
time stage=configure ./ci/jobs/scripts/performance_compare.sh ; \
} 2>&1 | ts "$(printf '%%Y-%%m-%%d %%H:%%M:%%S\t')" | tee -a compare.log
# Stop the servers to free memory. Normally they are restarted before getting
# the profile info, so they shouldn't use much, but if the comparison script
# fails in the middle, this might not be the case.
for _ in {1..30}
do
killall clickhouse || break
sleep 1
done
dmesg -T > dmesg.log
ls -lath
7z a '-x!*/tmp' /output/output.7z ./*.{log,tsv,html,txt,rep,svg,columns} \
{right,left}/{performance,scripts} {{right,left}/db,db0}/preprocessed_configs \
report analyze benchmark metrics \
./*.core.dmp ./*.core
# If the files aren't the same, copy compare.log to the output directory
cmp --silent compare.log /output/compare.log || \
cp compare.log /output
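
The reference-package lookup in the script above (probe a list of candidate URLs, retrying each a fixed number of times) can be sketched in Python, the language of the rest of the CI tooling. This is only an illustration under stated assumptions: `first_available`, `probe`, and the injectable `sleep` are hypothetical names that do not exist in the repository, and `probe` stands in for the `curl --fail --head` call.

```python
import time
from typing import Callable, Iterable, Optional


def first_available(
    candidates: Iterable[str],
    probe: Callable[[str], bool],
    attempts: int = 10,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the first candidate accepted by probe(), or None.

    Each candidate is probed up to `attempts` times with a pause after every
    failure, mirroring the curl_with_retry loop in the shell script.
    """
    for candidate in candidates:
        for _ in range(attempts):
            if probe(candidate):
                return candidate
            sleep(delay_s)  # hypothetical hook; defaults to time.sleep
    return None
```

Passing `probe` and `sleep` as parameters keeps the retry policy testable without network access; a real probe would issue an HTTP HEAD request.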

View File

@ -8,12 +8,12 @@ from praktika.yaml_generator import YamlGenerator
def create_parser():
parser = argparse.ArgumentParser(prog="python3 -m praktika")
parser = argparse.ArgumentParser(prog="praktika")
subparsers = parser.add_subparsers(dest="command", help="Available subcommands")
run_parser = subparsers.add_parser("run", help="Job Runner")
run_parser.add_argument("--job", help="Job Name", type=str, required=True)
run_parser.add_argument("job", help="Job Name", type=str)
run_parser.add_argument(
"--workflow",
help="Workflow Name (required if job name is not unique per config)",
@ -75,7 +75,8 @@ def create_parser():
return parser
if __name__ == "__main__":
def main():
sys.path.append(".")
parser = create_parser()
args = parser.parse_args()
@ -120,3 +121,7 @@ if __name__ == "__main__":
else:
parser.print_help()
sys.exit(1)
if __name__ == "__main__":
main()
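
The change above makes the job name a positional argument and moves the entry logic into `main()` so that a console script can call it. A minimal, self-contained sketch of that shape follows. The `default=""` for `--workflow`, the `argv` parameter, and the body of the `run` branch are assumptions made for illustration; the real runner does considerably more.

```python
import argparse
from typing import List, Optional


def create_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="praktika")
    subparsers = parser.add_subparsers(dest="command", help="Available subcommands")
    run_parser = subparsers.add_parser("run", help="Job Runner")
    # Positional: `praktika run "Style Check"` instead of `--job "Style Check"`.
    run_parser.add_argument("job", help="Job Name", type=str)
    # The default value here is an assumption for this sketch.
    run_parser.add_argument("--workflow", help="Workflow Name", type=str, default="")
    run_parser.add_argument("--ci", action="store_true", help="Run in CI mode")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    parser = create_parser()
    args = parser.parse_args(argv)
    if args.command == "run":
        # Placeholder for the real job runner.
        print(f"job=[{args.job}] workflow=[{args.workflow}] ci=[{args.ci}]")
        return 0
    parser.print_help()
    return 1


if __name__ == "__main__":
    raise SystemExit(main())
```

Accepting `argv` lets the parser be exercised from tests, while `main()` with no arguments still reads `sys.argv` as a console-script entry point would.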

View File

@ -128,9 +128,6 @@ class HtmlRunnerHooks:
for job in _workflow.jobs:
if job.name not in skip_jobs:
result = Result.generate_pending(job.name)
# Preemptively add the general job log to the result directory to ensure
# the post-job handler can upload it, even if the job is terminated unexpectedly
result.set_files([Settings.RUN_LOG])
else:
result = Result.generate_skipped(job.name, job_cache_records[job.name])
results.append(result)

View File

@ -529,7 +529,7 @@
const columnSymbols = {
name: '🗂️',
status: '🧾',
status: '',
start_time: '🕒',
duration: '⏳',
info: '📝',

View File

@ -68,9 +68,7 @@ def _update_workflow_with_native_jobs(workflow):
print(f"Enable native job [{_docker_build_job.name}] for [{workflow.name}]")
aux_job = copy.deepcopy(_docker_build_job)
if workflow.enable_cache:
print(
f"Add automatic digest config for [{aux_job.name}] job since cache is enabled"
)
print(f"Add automatic digest config for [{aux_job.name}] job")
docker_digest_config = Job.CacheDigestConfig()
for docker_config in workflow.dockers:
docker_digest_config.include_paths.append(docker_config.path)

View File

@ -144,7 +144,7 @@ def _config_workflow(workflow: Workflow.Config, job_name):
f"git diff-index HEAD -- {Settings.WORKFLOW_PATH_PREFIX}"
)
info = ""
status = Result.Status.SUCCESS
status = Result.Status.FAILED
if exit_code != 0:
info = f"workspace has uncommitted files unexpectedly [{output}]"
status = Result.Status.ERROR
@ -154,10 +154,14 @@ def _config_workflow(workflow: Workflow.Config, job_name):
exit_code, output, err = Shell.get_res_stdout_stderr(
f"git diff-index HEAD -- {Settings.WORKFLOW_PATH_PREFIX}"
)
if exit_code != 0:
info = f"workspace has outdated workflows [{output}] - regenerate with [python -m praktika --generate]"
status = Result.Status.ERROR
if output:
info = f"workflows are outdated: [{output}]"
status = Result.Status.FAILED
print("ERROR: ", info)
elif exit_code == 0 and not err:
status = Result.Status.SUCCESS
else:
print(f"ERROR: exit code [{exit_code}], err [{err}]")
return (
Result(

View File

@ -1,5 +1,6 @@
import dataclasses
import datetime
import json
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Union
@ -80,12 +81,19 @@ class Result(MetaClasses.Serializable):
infos += info
if results and not status:
for result in results:
if result.status not in (Result.Status.SUCCESS, Result.Status.FAILED):
if result.status not in (
Result.Status.SUCCESS,
Result.Status.FAILED,
Result.Status.ERROR,
):
Utils.raise_with_error(
f"Unexpected result status [{result.status}] for Result.create_from call"
)
if result.status != Result.Status.SUCCESS:
result_status = Result.Status.FAILED
if result.status == Result.Status.ERROR:
result_status = Result.Status.ERROR
break
if results:
for result in results:
if result.info and with_info_from_results:
@ -166,17 +174,14 @@ class Result(MetaClasses.Serializable):
return Result(**obj)
def update_duration(self):
if not self.duration and self.start_time:
if self.duration:
return self
if self.start_time:
self.duration = datetime.datetime.utcnow().timestamp() - self.start_time
else:
if not self.duration:
print(
f"NOTE: duration is set for job [{self.name}] Result - do not update by CI"
)
else:
print(
f"NOTE: start_time is not set for job [{self.name}] Result - do not update duration"
)
print(
f"NOTE: start_time is not set for job [{self.name}] Result - do not update duration"
)
return self
def set_timing(self, stopwatch: Utils.Stopwatch):
@ -250,7 +255,21 @@ class Result(MetaClasses.Serializable):
)
@classmethod
def create_from_command_execution(
def from_gtest_run(cls, name, unit_tests_path, with_log=False):
Shell.check(f"rm {ResultTranslator.GTEST_RESULT_FILE}")
result = Result.from_commands_run(
name=name,
command=[
f"{unit_tests_path} --gtest_output='json:{ResultTranslator.GTEST_RESULT_FILE}'"
],
with_log=with_log,
)
status, results, info = ResultTranslator.from_gtest()
result.set_status(status).set_results(results).set_info(info)
return result
@classmethod
def from_commands_run(
cls,
name,
command,
@ -507,10 +526,11 @@ class _ResultS3:
# return True
@classmethod
def upload_result_files_to_s3(cls, result):
def upload_result_files_to_s3(cls, result, s3_subprefix=""):
s3_subprefix = "/".join([s3_subprefix, Utils.normalize_string(result.name)])
if result.results:
for result_ in result.results:
cls.upload_result_files_to_s3(result_)
cls.upload_result_files_to_s3(result_, s3_subprefix=s3_subprefix)
for file in result.files:
if not Path(file).is_file():
print(f"ERROR: Invalid file [{file}] in [{result.name}] - skip upload")
@ -529,7 +549,7 @@ class _ResultS3:
file,
upload_to_s3=True,
text=is_text,
s3_subprefix=Utils.normalize_string(result.name),
s3_subprefix=s3_subprefix,
)
result.links.append(file_link)
if result.files:
@ -572,3 +592,138 @@ class _ResultS3:
return new_status
else:
return None
class ResultTranslator:
GTEST_RESULT_FILE = "/tmp/praktika/gtest.json"
@classmethod
def from_gtest(cls):
"""The JSON is described by the following proto3 schema:
(It's wrong, but that's a copy/paste from
https://google.github.io/googletest/advanced.html#generating-a-json-report)
syntax = "proto3";
package googletest;
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";
message UnitTest {
int32 tests = 1;
int32 failures = 2;
int32 disabled = 3;
int32 errors = 4;
google.protobuf.Timestamp timestamp = 5;
google.protobuf.Duration time = 6;
string name = 7;
repeated TestCase testsuites = 8;
}
message TestCase {
string name = 1;
int32 tests = 2;
int32 failures = 3;
int32 disabled = 4;
int32 errors = 5;
google.protobuf.Duration time = 6;
repeated TestInfo testsuite = 7;
}
message TestInfo {
string name = 1;
string file = 6;
int32 line = 7;
enum Status {
RUN = 0;
NOTRUN = 1;
}
Status status = 2;
google.protobuf.Duration time = 3;
string classname = 4;
message Failure {
string failures = 1;
string type = 2;
}
repeated Failure failures = 5;
}"""
test_results = [] # type: List[Result]
if not Path(cls.GTEST_RESULT_FILE).exists():
print(f"ERROR: No test result file [{cls.GTEST_RESULT_FILE}]")
return (
Result.Status.ERROR,
test_results,
f"No test result file [{cls.GTEST_RESULT_FILE}]",
)
with open(cls.GTEST_RESULT_FILE, "r", encoding="utf-8") as j:
report = json.load(j)
total_counter = report["tests"]
failed_counter = report["failures"]
error_counter = report["errors"]
description = ""
SEGFAULT = "Segmentation fault. "
SIGNAL = "Exit on signal. "
for suite in report["testsuites"]:
suite_name = suite["name"]
for test_case in suite["testsuite"]:
case_name = test_case["name"]
test_time = float(test_case["time"][:-1])
raw_logs = None
if "failures" in test_case:
raw_logs = ""
for failure in test_case["failures"]:
raw_logs += failure[Result.Status.FAILED]
if (
"Segmentation fault" in raw_logs # type: ignore
and SEGFAULT not in description
):
description += SEGFAULT
if (
"received signal SIG" in raw_logs # type: ignore
and SIGNAL not in description
):
description += SIGNAL
if test_case["status"] == "NOTRUN":
test_status = "SKIPPED"
elif raw_logs is None:
test_status = Result.Status.SUCCESS
else:
test_status = Result.Status.FAILED
test_results.append(
Result(
f"{suite_name}.{case_name}",
test_status,
duration=test_time,
info=raw_logs,
)
)
check_status = Result.Status.SUCCESS
tests_status = Result.Status.SUCCESS
tests_time = float(report["time"][:-1])
if failed_counter:
check_status = Result.Status.FAILED
tests_status = Result.Status.FAILED
if error_counter:
check_status = Result.Status.ERROR
tests_status = Result.Status.ERROR
test_results.append(Result(report["name"], tests_status, duration=tests_time))
if not description:
description += (
f"fail: {failed_counter + error_counter}, "
f"passed: {total_counter - failed_counter - error_counter}"
)
return (
check_status,
test_results,
description,
)
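
The translation performed by `ResultTranslator.from_gtest` can be illustrated with a reduced, standalone sketch that works on an already loaded report dictionary. The status strings and the `summarize_gtest` name are placeholders for this example only and are not praktika's `Result.Status` values; the report keys (`testsuites`, `testsuite`, `status`, `time`, `failures`, `errors`) are the ones the code above reads.

```python
from typing import Any, Dict, List, Tuple

# Placeholder status values for this sketch only.
SUCCESS, FAILED, ERROR, SKIPPED = "success", "failure", "error", "skipped"


def summarize_gtest(report: Dict[str, Any]) -> Tuple[str, List[Tuple[str, str, float]]]:
    """Return (overall status, [(test name, status, seconds), ...])."""
    cases = []
    for suite in report["testsuites"]:
        for case in suite["testsuite"]:
            if case["status"] == "NOTRUN":
                status = SKIPPED
            elif "failures" in case:
                status = FAILED
            else:
                status = SUCCESS
            # Durations are strings such as "0.005s"; drop the trailing unit.
            seconds = float(case["time"][:-1])
            cases.append((f"{suite['name']}.{case['name']}", status, seconds))

    overall = SUCCESS
    if report["failures"]:
        overall = FAILED
    if report["errors"]:
        overall = ERROR  # errors take precedence over plain failures
    return overall, cases
```

Checking `errors` after `failures` gives the same precedence as the aggregation in `Result.create_from`, where an `ERROR` sub-result overrides `FAILED`.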

View File

@ -61,8 +61,6 @@ class Runner:
docker, workflow.dockers
)
# work around for old clickhouse jobs
os.environ["DOCKER_TAG"] = json.dumps(workflow_config.digest_dockers)
workflow_config.dump()
Result.generate_pending(job.name).dump()
@ -86,6 +84,7 @@ class Runner:
print("Read GH Environment")
env = _Environment.from_env()
env.JOB_NAME = job.name
os.environ["JOB_NAME"] = job.name
env.dump()
print(env)
@ -148,6 +147,14 @@ class Runner:
env.JOB_NAME = job.name
env.dump()
# work around for old clickhouse jobs
try:
os.environ["DOCKER_TAG"] = json.dumps(
RunConfig.from_fs(workflow.name).digest_dockers
)
except Exception as e:
print(f"WARNING: Failed to set DOCKER_TAG, ex [{e}]")
if param:
if not isinstance(param, str):
Utils.raise_with_error(
@ -200,13 +207,15 @@ class Runner:
ResultInfo.TIMEOUT
)
elif result.is_running():
info = f"ERROR: Job terminated with an error, exit code [{exit_code}] - set status to [{Result.Status.ERROR}]"
info = f"ERROR: Job killed, exit code [{exit_code}] - set status to [{Result.Status.ERROR}]"
print(info)
result.set_status(Result.Status.ERROR).set_info(info)
result.set_files([Settings.RUN_LOG])
else:
info = f"ERROR: Invalid status [{result.status}] for exit code [{exit_code}] - switch to [{Result.Status.ERROR}]"
print(info)
result.set_status(Result.Status.ERROR).set_info(info)
result.set_files([Settings.RUN_LOG])
result.dump()
return exit_code
@ -257,10 +266,6 @@ class Runner:
info = f"ERROR: {ResultInfo.KILLED}"
print(info)
result.set_info(info).set_status(Result.Status.ERROR).dump()
else:
# TODO: add setting with different ways of storing general praktika log: always, on error, never.
# now let's store it on error only
result.files = [file for file in result.files if file != Settings.RUN_LOG]
result.update_duration().dump()

View File

@ -227,8 +227,8 @@ class Shell:
proc = subprocess.Popen(
command,
shell=True,
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
stdin=subprocess.PIPE if stdin_str else None,
universal_newlines=True,
start_new_session=True, # Start a new process group for signal handling
@ -248,11 +248,24 @@ class Shell:
proc.stdin.write(stdin_str)
proc.stdin.close()
# Process output in real-time
if proc.stdout:
for line in proc.stdout:
# Process both stdout and stderr in real-time
def stream_output(stream, output_fp):
for line in iter(stream.readline, ""):
sys.stdout.write(line)
log_fp.write(line)
output_fp.write(line)
stdout_thread = Thread(
target=stream_output, args=(proc.stdout, log_fp)
)
stderr_thread = Thread(
target=stream_output, args=(proc.stderr, log_fp)
)
stdout_thread.start()
stderr_thread.start()
stdout_thread.join()
stderr_thread.join()
proc.wait() # Wait for the process to finish
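
With `stderr` no longer merged into `stdout`, both pipes have to be drained concurrently: reading one to completion while the child fills the other can block both sides. The following standalone sketch shows the two-thread pattern used above. `run_and_capture` is a hypothetical helper, and collecting lines into a list stands in for writing to the log file.

```python
import subprocess
import sys
from threading import Lock, Thread
from typing import IO, List, Tuple


def run_and_capture(command: str) -> Tuple[int, List[str]]:
    """Run a shell command, echo its output live, and return (exit code, lines)."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    captured: List[str] = []
    lock = Lock()

    def stream_output(stream: IO[str]) -> None:
        # readline() returns "" at EOF, which ends the iteration.
        for line in iter(stream.readline, ""):
            sys.stdout.write(line)
            with lock:
                captured.append(line)
        stream.close()

    threads = [Thread(target=stream_output, args=(s,)) for s in (proc.stdout, proc.stderr)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return proc.wait(), captured
```

Lines from the two streams may interleave in any order, so callers should not rely on the relative ordering of stdout and stderr output.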

View File

@ -105,9 +105,9 @@ jobs:
. /tmp/praktika_setup_env.sh
set -o pipefail
if command -v ts &> /dev/null; then
python3 -m praktika run --job '''{JOB_NAME}''' --workflow "{WORKFLOW_NAME}" --ci |& ts '[%Y-%m-%d %H:%M:%S]' | tee /tmp/praktika/praktika_run.log
python3 -m praktika run '''{JOB_NAME}''' --workflow "{WORKFLOW_NAME}" --ci |& ts '[%Y-%m-%d %H:%M:%S]' | tee /tmp/praktika/praktika_run.log
else
python3 -m praktika run --job '''{JOB_NAME}''' --workflow "{WORKFLOW_NAME}" --ci |& tee /tmp/praktika/praktika_run.log
python3 -m praktika run '''{JOB_NAME}''' --workflow "{WORKFLOW_NAME}" --ci |& tee /tmp/praktika/praktika_run.log
fi
{UPLOADS_GITHUB}\
"""

View File

@ -1,245 +0,0 @@
from praktika import Docker, Secret
S3_BUCKET_NAME = "clickhouse-builds"
S3_BUCKET_HTTP_ENDPOINT = "clickhouse-builds.s3.amazonaws.com"
class RunnerLabels:
CI_SERVICES = "ci_services"
CI_SERVICES_EBS = "ci_services_ebs"
BUILDER_AMD = "builder"
BUILDER_ARM = "builder-aarch64"
FUNC_TESTER_AMD = "func-tester"
FUNC_TESTER_ARM = "func-tester-aarch64"
BASE_BRANCH = "master"
azure_secret = Secret.Config(
name="azure_connection_string",
type=Secret.Type.AWS_SSM_VAR,
)
SECRETS = [
Secret.Config(
name="dockerhub_robot_password",
type=Secret.Type.AWS_SSM_VAR,
),
azure_secret,
# Secret.Config(
# name="woolenwolf_gh_app.clickhouse-app-id",
# type=Secret.Type.AWS_SSM_SECRET,
# ),
# Secret.Config(
# name="woolenwolf_gh_app.clickhouse-app-key",
# type=Secret.Type.AWS_SSM_SECRET,
# ),
]
DOCKERS = [
# Docker.Config(
# name="clickhouse/binary-builder",
# path="./ci/docker/packager/binary-builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/cctools",
# path="./ci/docker/packager/cctools",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-old-centos",
# path="./ci/docker/test/compatibility/centos",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-old-ubuntu",
# path="./ci/docker/test/compatibility/ubuntu",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-util",
# path="./ci/docker/test/util",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/integration-test",
# path="./ci/docker/test/integration/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/fuzzer",
# path="./ci/docker/test/fuzzer",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/performance-comparison",
# path="./ci/docker/test/performance-comparison",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
Docker.Config(
name="clickhouse/fasttest",
path="./ci/docker/fasttest",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/test-base",
# path="./ci/docker/test/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-util"],
# ),
# Docker.Config(
# name="clickhouse/clickbench",
# path="./ci/docker/test/clickbench",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/keeper-jepsen-test",
# path="./ci/docker/test/keeper-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/server-jepsen-test",
# path="./ci/docker/test/server-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqllogic-test",
# path="./ci/docker/test/sqllogic",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqltest",
# path="./ci/docker/test/sqltest",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
Docker.Config(
name="clickhouse/stateless-test",
path="./ci/docker/stateless-test",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
Docker.Config(
name="clickhouse/stateful-test",
path="./ci/docker/stateful-test",
platforms=Docker.Platforms.arm_amd,
depends_on=["clickhouse/stateless-test"],
),
# Docker.Config(
# name="clickhouse/stress-test",
# path="./ci/docker/test/stress",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/stateful-test"],
# ),
# Docker.Config(
# name="clickhouse/unit-test",
# path="./ci/docker/test/unit",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/integration-tests-runner",
# path="./ci/docker/test/integration/runner",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
Docker.Config(
name="clickhouse/style-test",
path="./ci/docker/style-test",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/docs-builder",
# path="./ci/docker/docs/builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
]
# TODO:
# "docker/test/integration/s3_proxy": {
# "name": "clickhouse/s3-proxy",
# "dependent": []
# },
# "docker/test/integration/resolver": {
# "name": "clickhouse/python-bottle",
# "dependent": []
# },
# "docker/test/integration/helper_container": {
# "name": "clickhouse/integration-helper",
# "dependent": []
# },
# "docker/test/integration/mysql_golang_client": {
# "name": "clickhouse/mysql-golang-client",
# "dependent": []
# },
# "docker/test/integration/dotnet_client": {
# "name": "clickhouse/dotnet-client",
# "dependent": []
# },
# "docker/test/integration/mysql_java_client": {
# "name": "clickhouse/mysql-java-client",
# "dependent": []
# },
# "docker/test/integration/mysql_js_client": {
# "name": "clickhouse/mysql-js-client",
# "dependent": []
# },
# "docker/test/integration/mysql_php_client": {
# "name": "clickhouse/mysql-php-client",
# "dependent": []
# },
# "docker/test/integration/postgresql_java_client": {
# "name": "clickhouse/postgresql-java-client",
# "dependent": []
# },
# "docker/test/integration/kerberos_kdc": {
# "only_amd64": true,
# "name": "clickhouse/kerberos-kdc",
# "dependent": []
# },
# "docker/test/integration/kerberized_hadoop": {
# "only_amd64": true,
# "name": "clickhouse/kerberized-hadoop",
# "dependent": []
# },
# "docker/test/sqlancer": {
# "name": "clickhouse/sqlancer-test",
# "dependent": []
# },
# "docker/test/install/deb": {
# "name": "clickhouse/install-deb-test",
# "dependent": []
# },
# "docker/test/install/rpm": {
# "name": "clickhouse/install-rpm-test",
# "dependent": []
# },
# "docker/test/integration/nginx_dav": {
# "name": "clickhouse/nginx-dav",
# "dependent": []
# }
class JobNames:
STYLE_CHECK = "Style Check"
FAST_TEST = "Fast test"
BUILD = "Build"
STATELESS = "Stateless tests"
STATEFUL = "Stateful tests"
STRESS = "Stress tests"

View File

@ -1,14 +1,13 @@
from ci.settings.definitions import (
S3_BUCKET_HTTP_ENDPOINT,
S3_BUCKET_NAME,
RunnerLabels,
)
# aux settings:
S3_BUCKET_NAME = "clickhouse-builds"
S3_BUCKET_HTTP_ENDPOINT = "clickhouse-builds.s3.amazonaws.com"
# praktika settings:
MAIN_BRANCH = "master"
S3_ARTIFACT_PATH = f"{S3_BUCKET_NAME}/artifacts"
CI_CONFIG_RUNS_ON = [RunnerLabels.CI_SERVICES]
DOCKER_BUILD_RUNS_ON = [RunnerLabels.CI_SERVICES_EBS]
CI_CONFIG_RUNS_ON = ["ci_services"]
DOCKER_BUILD_RUNS_ON = ["ci_services_ebs"]
CACHE_S3_PATH = f"{S3_BUCKET_NAME}/ci_ch_cache"
HTML_S3_PATH = f"{S3_BUCKET_NAME}/reports"
S3_BUCKET_TO_HTTP_ENDPOINT = {S3_BUCKET_NAME: S3_BUCKET_HTTP_ENDPOINT}

17
ci/setup.py Normal file
View File

@ -0,0 +1,17 @@
from setuptools import find_packages, setup
setup(
name="praktika",
version="0.1",
packages=find_packages(),
url="https://github.com/ClickHouse/praktika",
license="Apache 2.0",
author="Max Kainov",
author_email="max.kainov@clickhouse.com",
description="CI Infrastructure Toolbox",
entry_points={
"console_scripts": [
"praktika=praktika.__main__:main",
]
},
)

610
ci/workflows/defs.py Normal file
View File

@ -0,0 +1,610 @@
from praktika import Artifact, Docker, Job, Secret
from praktika.settings import Settings
class RunnerLabels:
CI_SERVICES = "ci_services"
CI_SERVICES_EBS = "ci_services_ebs"
BUILDER_AMD = "builder"
BUILDER_ARM = "builder-aarch64"
FUNC_TESTER_AMD = "func-tester"
FUNC_TESTER_ARM = "func-tester-aarch64"
STYLE_CHECK_AMD = "style-checker"
STYLE_CHECK_ARM = "style-checker-aarch64"
class CIFiles:
UNIT_TESTS_RESULTS = "/tmp/praktika/output/unit_tests_result.json"
UNIT_TESTS_BIN = "/tmp/praktika/build/src/unit_tests_dbms"
BASE_BRANCH = "master"
azure_secret = Secret.Config(
name="azure_connection_string",
type=Secret.Type.AWS_SSM_VAR,
)
SECRETS = [
Secret.Config(
name="dockerhub_robot_password",
type=Secret.Type.AWS_SSM_VAR,
),
azure_secret,
# Secret.Config(
# name="woolenwolf_gh_app.clickhouse-app-id",
# type=Secret.Type.AWS_SSM_SECRET,
# ),
# Secret.Config(
# name="woolenwolf_gh_app.clickhouse-app-key",
# type=Secret.Type.AWS_SSM_SECRET,
# ),
]
DOCKERS = [
Docker.Config(
name="clickhouse/binary-builder",
path="./ci/docker/binary-builder",
platforms=Docker.Platforms.arm_amd,
depends_on=["clickhouse/fasttest"],
),
# Docker.Config(
# name="clickhouse/cctools",
# path="./ci/docker/packager/cctools",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
Docker.Config(
name="clickhouse/test-old-centos",
path="./ci/docker/compatibility/centos",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
Docker.Config(
name="clickhouse/test-old-ubuntu",
path="./ci/docker/compatibility/ubuntu",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/test-util",
# path="./ci/docker/test/util",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/integration-test",
# path="./ci/docker/test/integration/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/fuzzer",
# path="./ci/docker/test/fuzzer",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/performance-comparison",
# path="./ci/docker/test/performance-comparison",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
Docker.Config(
name="clickhouse/fasttest",
path="./ci/docker/fasttest",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/test-base",
# path="./ci/docker/test/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-util"],
# ),
# Docker.Config(
# name="clickhouse/clickbench",
# path="./ci/docker/test/clickbench",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/keeper-jepsen-test",
# path="./ci/docker/test/keeper-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/server-jepsen-test",
# path="./ci/docker/test/server-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqllogic-test",
# path="./ci/docker/test/sqllogic",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqltest",
# path="./ci/docker/test/sqltest",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
Docker.Config(
name="clickhouse/stateless-test",
path="./ci/docker/stateless-test",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
Docker.Config(
name="clickhouse/stateful-test",
path="./ci/docker/stateful-test",
platforms=Docker.Platforms.arm_amd,
depends_on=["clickhouse/stateless-test"],
),
# Docker.Config(
# name="clickhouse/stress-test",
# path="./ci/docker/test/stress",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/stateful-test"],
# ),
# Docker.Config(
# name="clickhouse/unit-test",
# path="./ci/docker/test/unit",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/integration-tests-runner",
# path="./ci/docker/test/integration/runner",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
Docker.Config(
name="clickhouse/style-test",
path="./ci/docker/style-test",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/docs-builder",
# path="./ci/docker/docs/builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
]
# TODO:
# "docker/test/integration/s3_proxy": {
# "name": "clickhouse/s3-proxy",
# "dependent": []
# },
# "docker/test/integration/resolver": {
# "name": "clickhouse/python-bottle",
# "dependent": []
# },
# "docker/test/integration/helper_container": {
# "name": "clickhouse/integration-helper",
# "dependent": []
# },
# "docker/test/integration/mysql_golang_client": {
# "name": "clickhouse/mysql-golang-client",
# "dependent": []
# },
# "docker/test/integration/dotnet_client": {
# "name": "clickhouse/dotnet-client",
# "dependent": []
# },
# "docker/test/integration/mysql_java_client": {
# "name": "clickhouse/mysql-java-client",
# "dependent": []
# },
# "docker/test/integration/mysql_js_client": {
# "name": "clickhouse/mysql-js-client",
# "dependent": []
# },
# "docker/test/integration/mysql_php_client": {
# "name": "clickhouse/mysql-php-client",
# "dependent": []
# },
# "docker/test/integration/postgresql_java_client": {
# "name": "clickhouse/postgresql-java-client",
# "dependent": []
# },
# "docker/test/integration/kerberos_kdc": {
# "only_amd64": true,
# "name": "clickhouse/kerberos-kdc",
# "dependent": []
# },
# "docker/test/integration/kerberized_hadoop": {
# "only_amd64": true,
# "name": "clickhouse/kerberized-hadoop",
# "dependent": []
# },
# "docker/test/sqlancer": {
# "name": "clickhouse/sqlancer-test",
# "dependent": []
# },
# "docker/test/install/deb": {
# "name": "clickhouse/install-deb-test",
# "dependent": []
# },
# "docker/test/install/rpm": {
# "name": "clickhouse/install-rpm-test",
# "dependent": []
# },
# "docker/test/integration/nginx_dav": {
# "name": "clickhouse/nginx-dav",
# "dependent": []
# }
class JobNames:
STYLE_CHECK = "Style Check"
FAST_TEST = "Fast test"
BUILD = "Build"
STATELESS = "Stateless tests"
STATEFUL = "Stateful tests"
STRESS = "Stress tests"
PERFORMANCE = "Performance tests"
COMPATIBILITY = "Compatibility check"
class ToolSet:
COMPILER_C = "clang-19"
COMPILER_CPP = "clang++-19"
class ArtifactNames:
CH_AMD_DEBUG = "CH_AMD_DEBUG"
CH_AMD_RELEASE = "CH_AMD_RELEASE"
CH_AMD_ASAN = "CH_AMD_ASAN"
CH_AMD_TSAN = "CH_AMD_TSAN"
CH_AMD_MSAN = "CH_AMD_MSAN"
CH_AMD_UBSAN = "CH_AMD_UBSAN"
CH_AMD_BINARY = "CH_AMD_BINARY"
CH_ARM_RELEASE = "CH_ARM_RELEASE"
CH_ARM_ASAN = "CH_ARM_ASAN"
CH_ODBC_B_AMD_DEBUG = "CH_ODBC_B_AMD_DEBUG"
CH_ODBC_B_AMD_RELEASE = "CH_ODBC_B_AMD_RELEASE"
CH_ODBC_B_AMD_ASAN = "CH_ODBC_B_AMD_ASAN"
CH_ODBC_B_AMD_TSAN = "CH_ODBC_B_AMD_TSAN"
CH_ODBC_B_AMD_MSAN = "CH_ODBC_B_AMD_MSAN"
CH_ODBC_B_AMD_UBSAN = "CH_ODBC_B_AMD_UBSAN"
CH_ODBC_B_ARM_RELEASE = "CH_ODBC_B_ARM_RELEASE"
CH_ODBC_B_ARM_ASAN = "CH_ODBC_B_ARM_ASAN"
UNITTEST_AMD_ASAN = "UNITTEST_AMD_ASAN"
UNITTEST_AMD_TSAN = "UNITTEST_AMD_TSAN"
UNITTEST_AMD_MSAN = "UNITTEST_AMD_MSAN"
UNITTEST_AMD_UBSAN = "UNITTEST_AMD_UBSAN"
UNITTEST_AMD_BINARY = "UNITTEST_AMD_BINARY"
DEB_AMD_DEBUG = "DEB_AMD_DEBUG"
DEB_AMD_RELEASE = "DEB_AMD_RELEASE"
DEB_AMD_ASAN = "DEB_AMD_ASAN"
DEB_AMD_TSAN = "DEB_AMD_TSAN"
DEB_AMD_MSAM = "DEB_AMD_MSAM"
DEB_AMD_UBSAN = "DEB_AMD_UBSAN"
DEB_ARM_RELEASE = "DEB_ARM_RELEASE"
DEB_ARM_ASAN = "DEB_ARM_ASAN"
ARTIFACTS = [
*Artifact.Config(
name="...",
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/build/programs/clickhouse",
).parametrize(
names=[
ArtifactNames.CH_AMD_DEBUG,
ArtifactNames.CH_AMD_RELEASE,
ArtifactNames.CH_AMD_ASAN,
ArtifactNames.CH_AMD_TSAN,
ArtifactNames.CH_AMD_MSAN,
ArtifactNames.CH_AMD_UBSAN,
ArtifactNames.CH_AMD_BINARY,
ArtifactNames.CH_ARM_RELEASE,
ArtifactNames.CH_ARM_ASAN,
]
),
*Artifact.Config(
name="...",
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/build/programs/clickhouse-odbc-bridge",
).parametrize(
names=[
ArtifactNames.CH_ODBC_B_AMD_DEBUG,
ArtifactNames.CH_ODBC_B_AMD_ASAN,
ArtifactNames.CH_ODBC_B_AMD_TSAN,
ArtifactNames.CH_ODBC_B_AMD_MSAN,
ArtifactNames.CH_ODBC_B_AMD_UBSAN,
ArtifactNames.CH_ODBC_B_AMD_RELEASE,
ArtifactNames.CH_ODBC_B_ARM_RELEASE,
ArtifactNames.CH_ODBC_B_ARM_ASAN,
]
),
# *Artifact.Config(
# name="...",
# type=Artifact.Type.S3,
# path=f"{Settings.TEMP_DIR}/build/src/unit_tests_dbms",
# ).parametrize(
# names=[
# ArtifactNames.UNITTEST_AMD_BINARY,
# ArtifactNames.UNITTEST_AMD_ASAN,
# ArtifactNames.UNITTEST_AMD_TSAN,
# ArtifactNames.UNITTEST_AMD_MSAN,
# ArtifactNames.UNITTEST_AMD_UBSAN,
# ]
# ),
*Artifact.Config(
name="*",
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
).parametrize(
names=[
ArtifactNames.DEB_AMD_DEBUG,
ArtifactNames.DEB_AMD_ASAN,
ArtifactNames.DEB_AMD_TSAN,
ArtifactNames.DEB_AMD_MSAM,
ArtifactNames.DEB_AMD_UBSAN,
]
),
Artifact.Config(
name=ArtifactNames.DEB_AMD_RELEASE,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
Artifact.Config(
name=ArtifactNames.DEB_ARM_RELEASE,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
Artifact.Config(
name=ArtifactNames.DEB_ARM_ASAN,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
]
class Jobs:
style_check_job = Job.Config(
name=JobNames.STYLE_CHECK,
runs_on=[RunnerLabels.CI_SERVICES],
command="python3 ./ci/jobs/check_style.py",
run_in_docker="clickhouse/style-test",
)
fast_test_job = Job.Config(
name=JobNames.FAST_TEST,
runs_on=[RunnerLabels.BUILDER_AMD],
command="python3 ./ci/jobs/fast_test.py",
run_in_docker="clickhouse/fasttest",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/fast_test.py",
"./tests/queries/0_stateless/",
"./src",
],
),
)
build_jobs = Job.Config(
name=JobNames.BUILD,
runs_on=["...from params..."],
requires=[],
command="python3 ./ci/jobs/build_clickhouse.py --build-type {PARAMETER}",
run_in_docker="clickhouse/binary-builder",
timeout=3600 * 2,
digest_config=Job.CacheDigestConfig(
include_paths=[
"./src",
"./contrib/",
"./CMakeLists.txt",
"./PreLoad.cmake",
"./cmake",
"./base",
"./programs",
"./docker/packager/packager",
"./rust",
"./tests/ci/version_helper.py",
"./ci/jobs/build_clickhouse.py",
],
),
).parametrize(
parameter=[
"amd_debug",
"amd_release",
"amd_asan",
"amd_tsan",
"amd_msan",
"amd_ubsan",
"amd_binary",
"arm_release",
"arm_asan",
],
provides=[
[
ArtifactNames.CH_AMD_DEBUG,
ArtifactNames.DEB_AMD_DEBUG,
ArtifactNames.CH_ODBC_B_AMD_DEBUG,
],
[
ArtifactNames.CH_AMD_RELEASE,
ArtifactNames.DEB_AMD_RELEASE,
ArtifactNames.CH_ODBC_B_AMD_RELEASE,
],
[
ArtifactNames.CH_AMD_ASAN,
ArtifactNames.DEB_AMD_ASAN,
ArtifactNames.CH_ODBC_B_AMD_ASAN,
# ArtifactNames.UNITTEST_AMD_ASAN,
],
[
ArtifactNames.CH_AMD_TSAN,
ArtifactNames.DEB_AMD_TSAN,
ArtifactNames.CH_ODBC_B_AMD_TSAN,
# ArtifactNames.UNITTEST_AMD_TSAN,
],
[
ArtifactNames.CH_AMD_MSAN,
ArtifactNames.DEB_AMD_MSAM,
ArtifactNames.CH_ODBC_B_AMD_MSAN,
# ArtifactNames.UNITTEST_AMD_MSAN,
],
[
ArtifactNames.CH_AMD_UBSAN,
ArtifactNames.DEB_AMD_UBSAN,
ArtifactNames.CH_ODBC_B_AMD_UBSAN,
# ArtifactNames.UNITTEST_AMD_UBSAN,
],
[
ArtifactNames.CH_AMD_BINARY,
# ArtifactNames.UNITTEST_AMD_BINARY,
],
[
ArtifactNames.CH_ARM_RELEASE,
ArtifactNames.DEB_ARM_RELEASE,
ArtifactNames.CH_ODBC_B_ARM_RELEASE,
],
[
ArtifactNames.CH_ARM_ASAN,
ArtifactNames.DEB_ARM_ASAN,
ArtifactNames.CH_ODBC_B_ARM_ASAN,
],
],
runs_on=[
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_ARM],
[RunnerLabels.BUILDER_ARM],
],
)
stateless_tests_jobs = Job.Config(
name=JobNames.STATELESS,
runs_on=[RunnerLabels.BUILDER_AMD],
command="python3 ./ci/jobs/functional_stateless_tests.py --test-options {PARAMETER}",
# many tests expect to see "/var/lib/clickhouse" in various output lines - add mount for now, consider creating this dir in docker file
run_in_docker="clickhouse/stateless-test+--security-opt seccomp=unconfined",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/functional_stateless_tests.py",
],
),
).parametrize(
parameter=[
"amd_debug,parallel",
"amd_debug,non-parallel",
"amd_release,parallel",
"amd_release,non-parallel",
"arm_asan,parallel",
"arm_asan,non-parallel",
],
runs_on=[
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.FUNC_TESTER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.FUNC_TESTER_AMD],
[RunnerLabels.BUILDER_ARM],
[RunnerLabels.FUNC_TESTER_ARM],
],
requires=[
[ArtifactNames.CH_AMD_DEBUG, ArtifactNames.CH_ODBC_B_AMD_DEBUG],
[ArtifactNames.CH_AMD_DEBUG, ArtifactNames.CH_ODBC_B_AMD_DEBUG],
[ArtifactNames.CH_AMD_RELEASE, ArtifactNames.CH_ODBC_B_AMD_RELEASE],
[ArtifactNames.CH_AMD_RELEASE, ArtifactNames.CH_ODBC_B_AMD_RELEASE],
[ArtifactNames.CH_ARM_ASAN, ArtifactNames.CH_ODBC_B_ARM_ASAN],
[ArtifactNames.CH_ARM_ASAN, ArtifactNames.CH_ODBC_B_ARM_ASAN],
],
)
stateful_tests_jobs = Job.Config(
name=JobNames.STATEFUL,
runs_on=[RunnerLabels.BUILDER_AMD],
command="python3 ./ci/jobs/functional_stateful_tests.py --test-options {PARAMETER}",
# many tests expect to see "/var/lib/clickhouse"
# some tests expect to see "/var/log/clickhouse"
run_in_docker="clickhouse/stateless-test+--security-opt seccomp=unconfined",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/functional_stateful_tests.py",
],
),
).parametrize(
parameter=[
"amd_release,parallel",
],
runs_on=[
[RunnerLabels.BUILDER_AMD],
],
requires=[
[ArtifactNames.CH_AMD_DEBUG],
],
)
# TODO: refactor job to be aligned with praktika style (remove wrappers, run in docker)
stress_test_jobs = Job.Config(
name=JobNames.STRESS,
runs_on=[RunnerLabels.BUILDER_ARM],
command="python3 ./tests/ci/stress_check.py {PARAMETER}",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/functional_stateful_tests.py",
],
),
).parametrize(
parameter=[
"arm_release",
],
runs_on=[
[RunnerLabels.FUNC_TESTER_ARM],
],
requires=[
[ArtifactNames.DEB_ARM_RELEASE],
],
)
performance_test_job = Job.Config(
name=JobNames.PERFORMANCE,
runs_on=[RunnerLabels.FUNC_TESTER_ARM],
command="./ci/jobs/scripts/performance_test.sh",
run_in_docker="clickhouse/stateless-test",
requires=[ArtifactNames.CH_ARM_RELEASE],
# digest_config=Job.CacheDigestConfig(
# include_paths=[
# "./ci/jobs/fast_test.py",
# "./tests/queries/0_stateless/",
# "./src",
# ],
# ),
)
compatibility_test_jobs = Job.Config(
name=JobNames.COMPATIBILITY,
runs_on=["#from param"],
command="python3 ./tests/ci/compatibility_check.py --check-name {PARAMETER}",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./tests/ci/compatibility_check.py",
"./docker/test/compatibility",
],
),
).parametrize(
parameter=["amd_release", "arm_release"],
runs_on=[
[RunnerLabels.STYLE_CHECK_AMD],
[RunnerLabels.STYLE_CHECK_ARM],
],
requires=[[ArtifactNames.DEB_AMD_RELEASE], [ArtifactNames.DEB_ARM_RELEASE]],
)
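
Note on the `parametrize` calls above: the lists passed as `parameter`, `runs_on`, `requires` and `provides` appear to be index-aligned, so entry *i* of each list describes the same job variant. The sketch below illustrates that pattern only. It is not praktika's implementation: `JobTemplate`, the `"Name (param)"` naming scheme, the runner label strings and the length check are assumptions made for this example.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class JobTemplate:
    """Hypothetical stand-in for a job config; not the praktika Job.Config class."""

    name: str
    command: str
    runs_on: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)

    def parametrize(self, parameter, runs_on, requires):
        # The lists are index-aligned, so a length mismatch would silently
        # pair a parameter with the wrong runner or artifact. Fail early.
        if not (len(parameter) == len(runs_on) == len(requires)):
            raise ValueError("parameter, runs_on and requires must have equal length")
        return [
            replace(
                self,
                name=f"{self.name} ({param})",  # naming scheme is an assumption
                command=self.command.format(PARAMETER=param),
                runs_on=list(labels),
                requires=list(artifacts),
            )
            for param, labels, artifacts in zip(parameter, runs_on, requires)
        ]


template = JobTemplate(
    name="Compatibility check",
    command="python3 ./tests/ci/compatibility_check.py --check-name {PARAMETER}",
)
jobs = template.parametrize(
    parameter=["amd_release", "arm_release"],
    runs_on=[["style-check-amd"], ["style-check-arm"]],  # hypothetical label values
    requires=[["DEB_AMD_RELEASE"], ["DEB_ARM_RELEASE"]],
)
for job in jobs:
    print(job.name, "|", job.command, "|", job.runs_on, "|", job.requires)
```

Keeping the lists aligned by position is compact, but it makes off-by-one mistakes easy when a variant is added or removed, which is why the sketch validates the lengths before expanding.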


@ -1,250 +1,24 @@
from praktika import Artifact, Job, Workflow
from praktika.settings import Settings
from praktika import Workflow
from ci.settings.definitions import (
BASE_BRANCH,
DOCKERS,
SECRETS,
JobNames,
RunnerLabels,
)
class ArtifactNames:
CH_AMD_DEBUG = "CH_AMD_DEBUG"
CH_AMD_RELEASE = "CH_AMD_RELEASE"
CH_ARM_RELEASE = "CH_ARM_RELEASE"
CH_ARM_ASAN = "CH_ARM_ASAN"
CH_ODBC_B_AMD_DEBUG = "CH_ODBC_B_AMD_DEBUG"
CH_ODBC_B_AMD_RELEASE = "CH_ODBC_B_AMD_RELEASE"
CH_ODBC_B_ARM_RELEASE = "CH_ODBC_B_ARM_RELEASE"
CH_ODBC_B_ARM_ASAN = "CH_ODBC_B_ARM_ASAN"
DEB_AMD_DEBUG = "DEB_AMD_DEBUG"
DEB_AMD_RELEASE = "DEB_AMD_RELEASE"
DEB_ARM_RELEASE = "DEB_ARM_RELEASE"
DEB_ARM_ASAN = "DEB_ARM_ASAN"
style_check_job = Job.Config(
name=JobNames.STYLE_CHECK,
runs_on=[RunnerLabels.CI_SERVICES],
command="python3 ./ci/jobs/check_style.py",
run_in_docker="clickhouse/style-test",
)
fast_test_job = Job.Config(
name=JobNames.FAST_TEST,
runs_on=[RunnerLabels.BUILDER_AMD],
command="python3 ./ci/jobs/fast_test.py",
run_in_docker="clickhouse/fasttest",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/fast_test.py",
"./tests/queries/0_stateless/",
"./src",
],
),
)
build_jobs = Job.Config(
name=JobNames.BUILD,
runs_on=["...from params..."],
requires=[],
command="python3 ./ci/jobs/build_clickhouse.py --build-type {PARAMETER}",
run_in_docker="clickhouse/fasttest",
timeout=3600 * 2,
digest_config=Job.CacheDigestConfig(
include_paths=[
"./src",
"./contrib/",
"./CMakeLists.txt",
"./PreLoad.cmake",
"./cmake",
"./base",
"./programs",
"./docker/packager/packager",
"./rust",
"./tests/ci/version_helper.py",
"./ci/jobs/build_clickhouse.py",
],
),
).parametrize(
parameter=["amd_debug", "amd_release", "arm_release", "arm_asan"],
provides=[
[
ArtifactNames.CH_AMD_DEBUG,
ArtifactNames.DEB_AMD_DEBUG,
ArtifactNames.CH_ODBC_B_AMD_DEBUG,
],
[
ArtifactNames.CH_AMD_RELEASE,
ArtifactNames.DEB_AMD_RELEASE,
ArtifactNames.CH_ODBC_B_AMD_RELEASE,
],
[
ArtifactNames.CH_ARM_RELEASE,
ArtifactNames.DEB_ARM_RELEASE,
ArtifactNames.CH_ODBC_B_ARM_RELEASE,
],
[
ArtifactNames.CH_ARM_ASAN,
ArtifactNames.DEB_ARM_ASAN,
ArtifactNames.CH_ODBC_B_ARM_ASAN,
],
],
runs_on=[
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.BUILDER_ARM],
[RunnerLabels.BUILDER_ARM],
],
)
stateless_tests_jobs = Job.Config(
name=JobNames.STATELESS,
runs_on=[RunnerLabels.BUILDER_AMD],
command="python3 ./ci/jobs/functional_stateless_tests.py --test-options {PARAMETER}",
# many tests expect to see "/var/lib/clickhouse" in various output lines - add mount for now, consider creating this dir in docker file
run_in_docker="clickhouse/stateless-test+--security-opt seccomp=unconfined",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/functional_stateless_tests.py",
],
),
).parametrize(
parameter=[
"amd_debug,parallel",
"amd_debug,non-parallel",
"amd_release,parallel",
"amd_release,non-parallel",
"arm_asan,parallel",
"arm_asan,non-parallel",
],
runs_on=[
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.FUNC_TESTER_AMD],
[RunnerLabels.BUILDER_AMD],
[RunnerLabels.FUNC_TESTER_AMD],
[RunnerLabels.BUILDER_ARM],
[RunnerLabels.FUNC_TESTER_ARM],
],
requires=[
[ArtifactNames.CH_AMD_DEBUG, ArtifactNames.CH_ODBC_B_AMD_DEBUG],
[ArtifactNames.CH_AMD_DEBUG, ArtifactNames.CH_ODBC_B_AMD_DEBUG],
[ArtifactNames.CH_AMD_RELEASE, ArtifactNames.CH_ODBC_B_AMD_RELEASE],
[ArtifactNames.CH_AMD_RELEASE, ArtifactNames.CH_ODBC_B_AMD_RELEASE],
[ArtifactNames.CH_ARM_ASAN, ArtifactNames.CH_ODBC_B_ARM_ASAN],
[ArtifactNames.CH_ARM_ASAN, ArtifactNames.CH_ODBC_B_ARM_ASAN],
],
)
stateful_tests_jobs = Job.Config(
name=JobNames.STATEFUL,
runs_on=[RunnerLabels.BUILDER_AMD],
command="python3 ./ci/jobs/functional_stateful_tests.py --test-options {PARAMETER}",
# many tests expect to see "/var/lib/clickhouse"
# some tests expect to see "/var/log/clickhouse"
run_in_docker="clickhouse/stateless-test+--security-opt seccomp=unconfined",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/functional_stateful_tests.py",
],
),
).parametrize(
parameter=[
"amd_release,parallel",
],
runs_on=[
[RunnerLabels.BUILDER_AMD],
],
requires=[
[ArtifactNames.CH_AMD_DEBUG],
],
)
# TODO: refactor job to be aligned with praktika style (remove wrappers, run in docker)
stress_test_jobs = Job.Config(
name=JobNames.STRESS,
runs_on=[RunnerLabels.BUILDER_ARM],
command="python3 ./tests/ci/stress_check.py {PARAMETER}",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/functional_stateful_tests.py",
],
),
).parametrize(
parameter=[
"arm_release",
],
runs_on=[
[RunnerLabels.FUNC_TESTER_ARM],
],
requires=[
[ArtifactNames.DEB_ARM_RELEASE],
],
)
from ci.workflows.defs import ARTIFACTS, BASE_BRANCH, DOCKERS, SECRETS, Jobs
S3_BUILDS_BUCKET = "clickhouse-builds"
workflow = Workflow.Config(
name="PR",
event=Workflow.Event.PULL_REQUEST,
base_branches=[BASE_BRANCH],
jobs=[
style_check_job,
fast_test_job,
*build_jobs,
*stateless_tests_jobs,
*stateful_tests_jobs,
*stress_test_jobs,
],
artifacts=[
*Artifact.Config(
name="...",
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/build/programs/clickhouse",
).parametrize(
names=[
ArtifactNames.CH_AMD_DEBUG,
ArtifactNames.CH_AMD_RELEASE,
ArtifactNames.CH_ARM_RELEASE,
ArtifactNames.CH_ARM_ASAN,
]
),
*Artifact.Config(
name="...",
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/build/programs/clickhouse-odbc-bridge",
).parametrize(
names=[
ArtifactNames.CH_ODBC_B_AMD_DEBUG,
ArtifactNames.CH_ODBC_B_AMD_RELEASE,
ArtifactNames.CH_ODBC_B_ARM_RELEASE,
ArtifactNames.CH_ODBC_B_ARM_ASAN,
]
),
Artifact.Config(
name=ArtifactNames.DEB_AMD_DEBUG,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
Artifact.Config(
name=ArtifactNames.DEB_AMD_RELEASE,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
Artifact.Config(
name=ArtifactNames.DEB_ARM_RELEASE,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
Artifact.Config(
name=ArtifactNames.DEB_ARM_ASAN,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/output/*.deb",
),
Jobs.style_check_job,
Jobs.fast_test_job,
*Jobs.build_jobs,
*Jobs.stateless_tests_jobs,
*Jobs.stateful_tests_jobs,
*Jobs.stress_test_jobs,
Jobs.performance_test_job,
*Jobs.compatibility_test_jobs,
],
artifacts=ARTIFACTS,
dockers=DOCKERS,
secrets=SECRETS,
enable_cache=True,
@ -255,13 +29,3 @@ workflow = Workflow.Config(
WORKFLOWS = [
workflow,
]
# if __name__ == "__main__":
# # local job test inside praktika environment
# from praktika.runner import Runner
# from praktika.digest import Digest
#
# print(Digest().calc_job_digest(amd_debug_build_job))
#
# Runner().run(workflow, fast_test_job, docker="fasttest", local_run=True)


@ -4,7 +4,7 @@ FROM ubuntu:22.04
# ARG for quick switch to a given ubuntu mirror
ARG apt_archive="http://archive.ubuntu.com"
RUN sed -i "s|http://archive.ubuntu.com|$apt_archive|g" /etc/apt/sources.list
ARG LLVM_APT_VERSION="1:19.1.4~*"
ARG LLVM_APT_VERSION="1:19.1.4"
ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=19
@ -29,7 +29,7 @@ RUN apt-get update \
&& echo "deb https://apt.llvm.org/${CODENAME}/ llvm-toolchain-${CODENAME}-${LLVM_VERSION} main" >> \
/etc/apt/sources.list \
&& apt-get update \
&& apt-get install --yes --no-install-recommends --verbose-versions llvm-${LLVM_VERSION}>=${LLVM_APT_VERSION} \
&& apt-get satisfy --yes --no-install-recommends "llvm-${LLVM_VERSION} (>= ${LLVM_APT_VERSION})" \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
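
Note on the hunk above: one plausible reason for moving to a quoted `apt-get satisfy` dependency string is that an unquoted `>` inside a shell command is a redirection operator, so `pkg>=version` does not reach the program as a single argument. The sketch below demonstrates that shell behaviour with `echo` in a temporary directory. It assumes a POSIX `sh` is available, uses a simplified version string, and says nothing about how `apt-get` itself parses its arguments.

```python
import pathlib
import subprocess
import tempfile


def run_unquoted(workdir: str) -> None:
    # The shell splits this into the command `echo install llvm-19`
    # plus a redirection of stdout to a file named `=1:19.1.4`.
    subprocess.run(
        ["sh", "-c", "echo install llvm-19>=1:19.1.4"],
        cwd=workdir,
        check=True,
        capture_output=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    run_unquoted(tmp)
    created = sorted(p.name for p in pathlib.Path(tmp).iterdir())
    redirected = pathlib.Path(tmp, created[0]).read_text()

print("files created:", created)
print("redirected output:", repr(redirected))
```

Quoting the whole constraint, as the new line does, keeps the comparison operator inside one argument.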


@ -33,9 +33,9 @@ Input table:
``` text
┌─id─┬─name─┐
1 John│
2 Jane│
3 Bob│
1 │ John
2 │ Jane
3 │ Bob
└────┴──────┘
```


@ -15,8 +15,8 @@ The [rank](./rank.md) function provides the same behaviour, but with gaps in ran
Alias: `denseRank` (case-sensitive)
```sql
dense_rank (column_name)
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
dense_rank ()
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
[ROWS or RANGE expression_to_bound_rows_withing_the_group]] | [window_name])
FROM table_name
WINDOW window_name as ([[PARTITION BY grouping_column] [ORDER BY sorting_column])
@ -55,7 +55,7 @@ INSERT INTO salaries FORMAT Values
```
```sql
SELECT player, salary,
SELECT player, salary,
dense_rank() OVER (ORDER BY salary DESC) AS dense_rank
FROM salaries;
```
@ -72,4 +72,4 @@ Result:
6. │ Scott Harrison │ 150000 │ 3 │
7. │ James Henderson │ 140000 │ 4 │
└─────────────────┴────────┴────────────┘
```
```


@ -13,8 +13,8 @@ returns the relative rank (i.e. percentile) of rows within a window partition.
Alias: `percentRank` (case-sensitive)
```sql
percent_rank (column_name)
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
percent_rank ()
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
[RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]] | [window_name])
FROM table_name
WINDOW window_name as ([PARTITION BY grouping_column] [ORDER BY sorting_column] RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
@ -50,7 +50,7 @@ INSERT INTO salaries FORMAT Values
```
```sql
SELECT player, salary,
SELECT player, salary,
percent_rank() OVER (ORDER BY salary DESC) AS percent_rank
FROM salaries;
```
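
Note on `percent_rank`: the relative rank is conventionally defined as `(rank - 1) / (rows_in_partition - 1)`, where `rank` is the rank with gaps. The sketch below computes it in plain Python for a single partition ordered by salary descending. The salary values are illustrative, not taken from the documentation page, and returning `0.0` for a one-row partition is an assumption of this sketch.

```python
def percent_rank(values):
    """Relative rank of each value, ordered descending, within one partition."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    # rank with gaps: position of the first row that holds the same value
    first_pos = {}
    for pos, value in enumerate(ordered, start=1):
        first_pos.setdefault(value, pos)
    if n == 1:
        return [0.0]  # assumption: a single row gets relative rank 0
    return [(first_pos[value] - 1) / (n - 1) for value in ordered]


salaries = [195000, 195000, 190000, 150000, 150000, 150000, 140000]  # illustrative
relative = percent_rank(salaries)
for salary, pr in zip(sorted(salaries, reverse=True), relative):
    print(salary, round(pr, 3))
```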


@ -9,13 +9,13 @@ sidebar_position: 6
Ranks the current row within its partition with gaps. In other words, if the value of any row it encounters is equal to the value of a previous row then it will receive the same rank as that previous row.
The rank of the next row is then equal to the rank of the previous row plus a gap equal to the number of times the previous rank was given.
The [dense_rank](./dense_rank.md) function provides the same behaviour but without gaps in ranking.
The [dense_rank](./dense_rank.md) function provides the same behaviour but without gaps in ranking.
**Syntax**
```sql
rank (column_name)
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
rank ()
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
[ROWS or RANGE expression_to_bound_rows_withing_the_group]] | [window_name])
FROM table_name
WINDOW window_name as ([[PARTITION BY grouping_column] [ORDER BY sorting_column])
@ -54,7 +54,7 @@ INSERT INTO salaries FORMAT Values
```
```sql
SELECT player, salary,
SELECT player, salary,
rank() OVER (ORDER BY salary DESC) AS rank
FROM salaries;
```
@ -71,4 +71,4 @@ Result:
6. │ Scott Harrison │ 150000 │ 4 │
7. │ James Henderson │ 140000 │ 7 │
└─────────────────┴────────┴──────┘
```
```
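
Note on `rank` versus `dense_rank`: tied rows share a rank in both functions, but `rank` then skips ahead by the number of tied rows while `dense_rank` continues with the next integer. The sketch below reproduces that difference in plain Python. The salary list is illustrative: it was chosen so that the last rows match the result rows visible above (rank 4 and 7, dense rank 3 and 4), and it is not copied from the documentation page.

```python
def rank_with_gaps(values):
    """rank(): ties share a rank, the next distinct value skips ahead."""
    ordered = sorted(values, reverse=True)
    first_pos = {}
    for pos, value in enumerate(ordered, start=1):
        first_pos.setdefault(value, pos)
    return [first_pos[value] for value in ordered]


def dense_rank(values):
    """dense_rank(): ties share a rank, no gaps between distinct values."""
    ordered = sorted(values, reverse=True)
    distinct = sorted(set(values), reverse=True)
    position = {value: i for i, value in enumerate(distinct, start=1)}
    return [position[value] for value in ordered]


salaries = [195000, 195000, 190000, 150000, 150000, 150000, 140000]  # illustrative
ranks = rank_with_gaps(salaries)
dense = dense_rank(salaries)
for salary, r, d in zip(sorted(salaries, reverse=True), ranks, dense):
    print(salary, r, d)
```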


@ -48,18 +48,6 @@ if [ "$1" = configure ] || [ -n "$not_deb_os" ]; then
fi
fi
# /etc/systemd/system/clickhouse-server.service shouldn't be distributed by the package, but it was
# here we delete the service file if it was from our package
if [ -f /etc/systemd/system/clickhouse-server.service ]; then
SHA256=$(sha256sum /etc/systemd/system/clickhouse-server.service | cut -d' ' -f1)
for ref_sum in 7769a14773e811a56f67fd70f7960147217f5e68f746010aec96722e24d289bb 22890012047ea84fbfcebd6e291fe2ef2185cbfdd94a0294e13c8bf9959f58f8 b7790ae57156663c723f92e75ac2508453bf0a7b7e8313bb8081da99e5e88cd3 d1dcc1dbe92dab3ae17baa395f36abf1876b4513df272bf021484923e0111eef ac29ddd32a02eb31670bf5f0018c5d8a3cc006ca7ea572dcf717cb42310dcad7 c62d23052532a70115414833b500b266647d3924eb006a6f3eb673ff0d55f8fa b6b200ffb517afc2b9cf9e25ad8a4afdc0dad5a045bddbfb0174f84cc5a959ed; do
if [ "$SHA256" = "$ref_sum" ]; then
rm /etc/systemd/system/clickhouse-server.service
break
fi
done
fi
# Setup clickhouse-keeper directories
chown -R "${CLICKHOUSE_USER}:${CLICKHOUSE_GROUP}" "${KEEPER_CONFDIR}"
chmod 0755 "${KEEPER_CONFDIR}"


@ -5,6 +5,7 @@ set (CLICKHOUSE_EXTRACT_FROM_CONFIG_LINK
boost::program_options
clickhouse_common_config
clickhouse_common_io
clickhouse_common_zookeeper_base
clickhouse_common_zookeeper
)


@ -7,6 +7,7 @@ set (CLICKHOUSE_KEEPER_LINK
PRIVATE
clickhouse_common_config
clickhouse_common_io
clickhouse_common_zookeeper_base
clickhouse_common_zookeeper
daemon
clickhouse-keeper-converter-lib


@ -8,6 +8,7 @@ set (CLICKHOUSE_SERVER_LINK
clickhouse_aggregate_functions
clickhouse_common_config
clickhouse_common_io
clickhouse_common_zookeeper_base
clickhouse_common_zookeeper
clickhouse_functions
clickhouse_parsers


@ -13,18 +13,39 @@ struct Settings;
namespace ErrorCodes
{
extern const int INCORRECT_DATA;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int LOGICAL_ERROR;
extern const int INCORRECT_DATA;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int LOGICAL_ERROR;
}
namespace
{
template <class ValueType>
template <class ResultType, class ValueType>
struct AggregateFunctionArgMinMaxData
{
private:
ResultType result_data;
ValueType value_data;
public:
ResultType & result() { return result_data; }
const ResultType & result() const { return result_data; }
ValueType & value() { return value_data; }
const ValueType & value() const { return value_data; }
AggregateFunctionArgMinMaxData() = default;
explicit AggregateFunctionArgMinMaxData(TypeIndex) {}
static bool allocatesMemoryInArena(TypeIndex)
{
return ResultType::allocatesMemoryInArena() || ValueType::allocatesMemoryInArena();
}
};
template <class ValueType>
struct AggregateFunctionArgMinMaxDataGeneric
{
private:
SingleValueDataBaseMemoryBlock result_data;
ValueType value_data;
@ -35,27 +56,32 @@ public:
ValueType & value() { return value_data; }
const ValueType & value() const { return value_data; }
[[noreturn]] explicit AggregateFunctionArgMinMaxData()
[[noreturn]] AggregateFunctionArgMinMaxDataGeneric()
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "AggregateFunctionArgMinMaxData initialized empty");
}
explicit AggregateFunctionArgMinMaxData(TypeIndex result_type) : value_data()
explicit AggregateFunctionArgMinMaxDataGeneric(TypeIndex result_type) : value_data()
{
generateSingleValueFromTypeIndex(result_type, result_data);
}
~AggregateFunctionArgMinMaxData() { result().~SingleValueDataBase(); }
static bool allocatesMemoryInArena(TypeIndex result_type_index)
{
return singleValueTypeAllocatesMemoryInArena(result_type_index) || ValueType::allocatesMemoryInArena();
}
~AggregateFunctionArgMinMaxDataGeneric() { result().~SingleValueDataBase(); }
};
static_assert(
sizeof(AggregateFunctionArgMinMaxData<Int8>) <= 2 * SingleValueDataBase::MAX_STORAGE_SIZE,
sizeof(AggregateFunctionArgMinMaxDataGeneric<Int8>) <= 2 * SingleValueDataBase::MAX_STORAGE_SIZE,
"Incorrect size of AggregateFunctionArgMinMaxData struct");
/// Returns the first arg value found for the minimum/maximum value. Example: argMin(arg, value).
template <typename ValueData, bool isMin>
template <typename Data, bool isMin>
class AggregateFunctionArgMinMax final
: public IAggregateFunctionDataHelper<AggregateFunctionArgMinMaxData<ValueData>, AggregateFunctionArgMinMax<ValueData, isMin>>
: public IAggregateFunctionDataHelper<Data, AggregateFunctionArgMinMax<Data, isMin>>
{
private:
const DataTypePtr & type_val;
@ -63,7 +89,8 @@ private:
const SerializationPtr serialization_val;
const TypeIndex result_type_index;
using Base = IAggregateFunctionDataHelper<AggregateFunctionArgMinMaxData<ValueData>, AggregateFunctionArgMinMax<ValueData, isMin>>;
using Base = IAggregateFunctionDataHelper<Data, AggregateFunctionArgMinMax<Data, isMin>>;
public:
explicit AggregateFunctionArgMinMax(const DataTypes & argument_types_)
@ -91,7 +118,7 @@ public:
void create(AggregateDataPtr __restrict place) const override /// NOLINT
{
new (place) AggregateFunctionArgMinMaxData<ValueData>(result_type_index);
new (place) Data(result_type_index);
}
String getName() const override
@ -215,7 +242,7 @@ public:
bool allocatesMemoryInArena() const override
{
return singleValueTypeAllocatesMemoryInArena(result_type_index) || ValueData::allocatesMemoryInArena();
return Data::allocatesMemoryInArena(result_type_index);
}
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override
@ -224,12 +251,125 @@ public:
}
};
template <bool isMin>
AggregateFunctionPtr createAggregateFunctionArgMinMax(
const std::string & name, const DataTypes & argument_types, const Array & parameters, const Settings * settings)
template <bool isMin, typename ResultType>
IAggregateFunction * createWithTwoTypesSecond(const DataTypes & argument_types)
{
return AggregateFunctionPtr(createAggregateFunctionSingleValue<AggregateFunctionArgMinMax, /* unary */ false, isMin>(
name, argument_types, parameters, settings));
const DataTypePtr & value_type = argument_types[1];
WhichDataType which_value(value_type);
if (which_value.idx == TypeIndex::UInt8)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<UInt8>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::UInt16)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<UInt16>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::UInt32)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<UInt32>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::UInt64)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<UInt64>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Int8)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<Int8>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Int16)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<Int16>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Int32)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<Int32>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Int64)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<Int64>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Float32)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<Float32>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Float64)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<Float64>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::Date)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<UInt16>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
if (which_value.idx == TypeIndex::DateTime)
{
using Data = AggregateFunctionArgMinMaxData<SingleValueDataFixed<ResultType>, SingleValueDataFixed<UInt32>>;
return new AggregateFunctionArgMinMax<Data, isMin>(argument_types);
}
return nullptr;
}
template <bool isMin>
IAggregateFunction * createWithTwoTypes(const DataTypes & argument_types)
{
const DataTypePtr & result_type = argument_types[0];
WhichDataType which_result(result_type);
if (which_result.idx == TypeIndex::UInt8) return createWithTwoTypesSecond<isMin, UInt8>(argument_types);
if (which_result.idx == TypeIndex::UInt16) return createWithTwoTypesSecond<isMin, UInt16>(argument_types);
if (which_result.idx == TypeIndex::UInt32) return createWithTwoTypesSecond<isMin, UInt32>(argument_types);
if (which_result.idx == TypeIndex::UInt64) return createWithTwoTypesSecond<isMin, UInt64>(argument_types);
if (which_result.idx == TypeIndex::Int8) return createWithTwoTypesSecond<isMin, Int8>(argument_types);
if (which_result.idx == TypeIndex::Int16) return createWithTwoTypesSecond<isMin, Int16>(argument_types);
if (which_result.idx == TypeIndex::Int32) return createWithTwoTypesSecond<isMin, Int32>(argument_types);
if (which_result.idx == TypeIndex::Int64) return createWithTwoTypesSecond<isMin, Int64>(argument_types);
if (which_result.idx == TypeIndex::Float32) return createWithTwoTypesSecond<isMin, Float32>(argument_types);
if (which_result.idx == TypeIndex::Float64) return createWithTwoTypesSecond<isMin, Float64>(argument_types);
return nullptr;
}
template <bool isMin>
AggregateFunctionPtr createAggregateFunctionArgMinMax(const std::string & name, const DataTypes & argument_types, const Array &, const Settings *)
{
assertBinary(name, argument_types);
AggregateFunctionPtr result = AggregateFunctionPtr(createWithTwoTypes<isMin>(argument_types));
if (!result)
{
const DataTypePtr & value_type = argument_types[1];
WhichDataType which(value_type);
#define DISPATCH(TYPE) \
if (which.idx == TypeIndex::TYPE) \
return AggregateFunctionPtr(new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxDataGeneric<SingleValueDataFixed<TYPE>>, isMin>(argument_types)); /// NOLINT
FOR_SINGLE_VALUE_NUMERIC_TYPES(DISPATCH)
#undef DISPATCH
if (which.idx == TypeIndex::Date)
return AggregateFunctionPtr(new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxDataGeneric<SingleValueDataFixed<DataTypeDate::FieldType>>, isMin>(argument_types));
if (which.idx == TypeIndex::DateTime)
return AggregateFunctionPtr(new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxDataGeneric<SingleValueDataFixed<DataTypeDateTime::FieldType>>, isMin>(argument_types));
if (which.idx == TypeIndex::String)
return AggregateFunctionPtr(new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxDataGeneric<SingleValueDataString>, isMin>(argument_types));
return AggregateFunctionPtr(new AggregateFunctionArgMinMax<AggregateFunctionArgMinMaxDataGeneric<SingleValueDataGeneric>, isMin>(argument_types));
}
return result;
}
}
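
Note on the semantics this file implements: per the comment in the hunk, `argMin(arg, value)` returns the first `arg` found for the minimum `value`, and `argMax` does the same for the maximum. The sketch below shows that behaviour in plain Python, independent of the C++ type dispatch in the diff. The function names and the `None` result for empty input are assumptions of this example, not ClickHouse behaviour.

```python
def arg_min(rows):
    """Return the arg paired with the smallest value; the first one wins on ties."""
    best = None
    for arg, value in rows:
        # strict comparison keeps the first arg seen for an equal value
        if best is None or value < best[1]:
            best = (arg, value)
    return None if best is None else best[0]


def arg_max(rows):
    """Return the arg paired with the largest value; the first one wins on ties."""
    best = None
    for arg, value in rows:
        if best is None or value > best[1]:
            best = (arg, value)
    return None if best is None else best[0]


rows = [("a", 3), ("b", 1), ("c", 1), ("d", 7)]
print(arg_min(rows), arg_max(rows))
```

The state kept per aggregation is just one `(arg, value)` pair, which is what the two-member data structs in the diff hold in typed form.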


@ -2,6 +2,7 @@
#include <Core/ServerSettings.h>
#include <Core/Settings.h>
#include <Common/ShellCommandsHolder.h>
#include <IO/ConnectionTimeouts.h>
namespace DB
@ -29,7 +30,7 @@ LibraryBridgeHelper::LibraryBridgeHelper(ContextPtr context_)
void LibraryBridgeHelper::startBridge(std::unique_ptr<ShellCommand> cmd) const
{
getContext()->addBridgeCommand(std::move(cmd));
ShellCommandsHolder::instance().addCommand(std::move(cmd));
}


@ -11,6 +11,7 @@
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/BridgeProtocolVersion.h>
#include <Common/ShellCommand.h>
#include <Common/ShellCommandsHolder.h>
#include <IO/ConnectionTimeouts.h>
#include <base/range.h>
#include <BridgeHelper/IBridgeHelper.h>
@ -144,7 +145,7 @@ protected:
void startBridge(std::unique_ptr<ShellCommand> cmd) const override
{
getContext()->addBridgeCommand(std::move(cmd));
ShellCommandsHolder::instance().addCommand(std::move(cmd));
}

View File

@ -107,14 +107,8 @@ list (REMOVE_ITEM clickhouse_common_io_sources Common/malloc.cpp Common/new_dele
add_headers_and_sources(clickhouse_compression Compression)
add_headers_and_sources(clickhouse_compression Parsers)
add_headers_and_sources(clickhouse_compression Core)
#Included these specific files to avoid linking grpc
add_glob(clickhouse_compression_headers Server/ServerType.h)
add_glob(clickhouse_compression_sources Server/ServerType.cpp)
add_library(clickhouse_compression ${clickhouse_compression_headers} ${clickhouse_compression_sources})
add_headers_and_sources(dbms Disks/IO)
add_headers_and_sources(dbms Disks/ObjectStorages)
if (TARGET ch_contrib::sqlite)
@ -222,7 +216,6 @@ add_object_library(clickhouse_access Access)
add_object_library(clickhouse_backups Backups)
add_object_library(clickhouse_core Core)
add_object_library(clickhouse_core_mysql Core/MySQL)
add_object_library(clickhouse_compression Compression)
add_object_library(clickhouse_querypipeline QueryPipeline)
add_object_library(clickhouse_datatypes DataTypes)
add_object_library(clickhouse_datatypes_serializations DataTypes/Serializations)
@ -422,6 +415,7 @@ dbms_target_link_libraries (
boost::filesystem
boost::program_options
clickhouse_common_config
clickhouse_common_zookeeper_base
clickhouse_common_zookeeper
clickhouse_dictionaries_embedded
clickhouse_parsers
@ -429,6 +423,7 @@ dbms_target_link_libraries (
Poco::JSON
PUBLIC
boost::system
clickhouse_compression
clickhouse_common_io
Poco::Redis
)
@ -662,6 +657,7 @@ if (ENABLE_TESTS)
clickhouse_parsers
clickhouse_storages_system
dbms
clickhouse_common_zookeeper_base
clickhouse_common_config
clickhouse_common_zookeeper
hilite_comparator)

View File

@ -426,29 +426,90 @@ void ColumnLowCardinality::getPermutation(IColumn::PermutationSortDirection dire
getPermutationImpl(direction, stability, limit, nan_direction_hint, res);
}
namespace
{
/// Comparator for sorting LowCardinality column with the help of sorted dictionary.
/// NOTE: Dictionary itself must be sorted in ASC or DESC order depending on the requested direction.
template <typename IndexColumn, bool stable>
struct LowCardinalityComparator
{
const IndexColumn & real_indexes; /// Indexes column
const PaddedPODArray<UInt64> & position_by_index; /// Maps original dictionary index to position in sorted dictionary
inline bool operator () (size_t lhs, size_t rhs) const
{
int ret;
const UInt64 lhs_index = real_indexes.getUInt(lhs);
const UInt64 rhs_index = real_indexes.getUInt(rhs);
if (lhs_index == rhs_index)
ret = 0;
else
ret = CompareHelper<UInt64>::compare(position_by_index[lhs_index], position_by_index[rhs_index], 0);
if (stable && ret == 0)
return lhs < rhs;
return ret < 0;
}
};
}
template <typename IndexColumn>
void ColumnLowCardinality::updatePermutationWithIndexType(
IColumn::PermutationSortStability stability, size_t limit, const PaddedPODArray<UInt64> & position_by_index,
IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
/// Cast indexes column to the real type so that compareAt and getUInt methods can be inlined.
const IndexColumn * real_indexes = assert_cast<const IndexColumn *>(&getIndexes());
auto equal_comparator = [real_indexes](size_t lhs, size_t rhs)
{
return real_indexes->getUInt(lhs) == real_indexes->getUInt(rhs);
};
const bool stable = (stability == IColumn::PermutationSortStability::Stable);
if (stable)
updatePermutationImpl(limit, res, equal_ranges, LowCardinalityComparator<IndexColumn, true>{*real_indexes, position_by_index}, equal_comparator, DefaultSort(), DefaultPartialSort());
else
updatePermutationImpl(limit, res, equal_ranges, LowCardinalityComparator<IndexColumn, false>{*real_indexes, position_by_index}, equal_comparator, DefaultSort(), DefaultPartialSort());
}
void ColumnLowCardinality::updatePermutation(IColumn::PermutationSortDirection direction, IColumn::PermutationSortStability stability,
size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
bool ascending = direction == IColumn::PermutationSortDirection::Ascending;
IColumn::Permutation dict_perm;
getDictionary().getNestedColumn()->getPermutation(direction, stability, 0, nan_direction_hint, dict_perm);
auto comparator = [this, ascending, stability, nan_direction_hint](size_t lhs, size_t rhs)
/// This is a paranoid check, but in other places in code empty permutation is used to indicate that no sorting is needed.
if (dict_perm.size() != getDictionary().size())
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Dictionary permutation size {} is not equal to dictionary size {}. It is a bug.",
dict_perm.size(), getDictionary().size());
PaddedPODArray<UInt64> position_by_index(dict_perm.size());
for (size_t i = 0; i < dict_perm.size(); ++i)
position_by_index[dict_perm[i]] = i;
/// Dispatch by index column type.
switch (idx.getSizeOfIndexType())
{
int ret = getDictionary().compareAt(getIndexes().getUInt(lhs), getIndexes().getUInt(rhs), getDictionary(), nan_direction_hint);
if (unlikely(stability == IColumn::PermutationSortStability::Stable && ret == 0))
return lhs < rhs;
if (ascending)
return ret < 0;
return ret > 0;
};
auto equal_comparator = [this, nan_direction_hint](size_t lhs, size_t rhs)
{
int ret = getDictionary().compareAt(getIndexes().getUInt(lhs), getIndexes().getUInt(rhs), getDictionary(), nan_direction_hint);
return ret == 0;
};
updatePermutationImpl(limit, res, equal_ranges, comparator, equal_comparator, DefaultSort(), DefaultPartialSort());
case sizeof(UInt8):
updatePermutationWithIndexType<ColumnUInt8>(stability, limit, position_by_index, res, equal_ranges);
return;
case sizeof(UInt16):
updatePermutationWithIndexType<ColumnUInt16>(stability, limit, position_by_index, res, equal_ranges);
return;
case sizeof(UInt32):
updatePermutationWithIndexType<ColumnUInt32>(stability, limit, position_by_index, res, equal_ranges);
return;
case sizeof(UInt64):
updatePermutationWithIndexType<ColumnUInt64>(stability, limit, position_by_index, res, equal_ranges);
return;
default: throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected size of index type for low cardinality column.");
}
}
void ColumnLowCardinality::getPermutationWithCollation(const Collator & collator, IColumn::PermutationSortDirection direction, IColumn::PermutationSortStability stability,
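
Review note: the new `updatePermutation` path sorts the dictionary once, inverts that permutation into `position_by_index`, and then compares rows by integer rank instead of calling `compareAt` on dictionary values. Below is a minimal standalone sketch of that idea. It assumes plain `std::vector` and `std::string` in place of ClickHouse columns, and every name in it (`sortRowsByDictionaryRank`, `dict`, `indexes`) is illustrative rather than part of the ClickHouse API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

/// Hypothetical stand-in for a LowCardinality column: `dict` holds unique values,
/// `indexes[row]` is the dictionary index of the value stored in `row`.
std::vector<size_t> sortRowsByDictionaryRank(
    const std::vector<std::string> & dict, const std::vector<uint32_t> & indexes, bool ascending)
{
    /// 1. Sort the dictionary once: dict_perm[i] is the dictionary index that takes position i.
    std::vector<size_t> dict_perm(dict.size());
    std::iota(dict_perm.begin(), dict_perm.end(), 0);
    std::sort(dict_perm.begin(), dict_perm.end(), [&](size_t a, size_t b)
    {
        return ascending ? dict[a] < dict[b] : dict[b] < dict[a];
    });

    /// 2. Invert the permutation: position_by_index[dictionary index] = rank in sorted order.
    std::vector<uint64_t> position_by_index(dict.size());
    for (size_t i = 0; i < dict_perm.size(); ++i)
        position_by_index[dict_perm[i]] = i;

    /// 3. Sort rows by rank. Ties are broken by row number, which mimics the stable comparator.
    std::vector<size_t> rows(indexes.size());
    std::iota(rows.begin(), rows.end(), 0);
    std::sort(rows.begin(), rows.end(), [&](size_t lhs, size_t rhs)
    {
        const uint64_t l = position_by_index[indexes[lhs]];
        const uint64_t r = position_by_index[indexes[rhs]];
        if (l == r)
            return lhs < rhs;
        return l < r;
    });
    return rows;
}
```

Because the dictionary is already sorted in the requested direction, the row comparator only ever needs `<` on ranks, which matches the NOTE on `LowCardinalityComparator` in the diff.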

View File

@ -190,6 +190,26 @@ public:
callback(dictionary.getColumnUniquePtr());
}
void forEachSubcolumnRecursively(RecursiveColumnCallback callback) const override
{
/** It is important to have both const and non-const versions here.
* The behavior of ColumnUnique::forEachSubcolumnRecursively differs between const and non-const versions.
* The non-const version will update a field in ColumnUnique.
* In the meantime, the default implementation IColumn::forEachSubcolumnRecursively uses const_cast,
* so when the const version is called, the field will still be mutated.
* This can lead to a data race if constness is expected.
*/
callback(*idx.getPositionsPtr());
idx.getPositionsPtr()->forEachSubcolumnRecursively(callback);
/// Column doesn't own dictionary if it's shared.
if (!dictionary.isShared())
{
callback(*dictionary.getColumnUniquePtr());
dictionary.getColumnUniquePtr()->forEachSubcolumnRecursively(callback);
}
}
void forEachSubcolumnRecursively(RecursiveMutableColumnCallback callback) override
{
callback(*idx.getPositionsPtr());
@ -389,6 +409,11 @@ private:
int compareAtImpl(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint, const Collator * collator=nullptr) const;
void getPermutationImpl(IColumn::PermutationSortDirection direction, IColumn::PermutationSortStability stability, size_t limit, int nan_direction_hint, Permutation & res, const Collator * collator = nullptr) const;
template <typename IndexColumn>
void updatePermutationWithIndexType(
IColumn::PermutationSortStability stability, size_t limit, const PaddedPODArray<UInt64> & position_by_index,
IColumn::Permutation & res, EqualRanges & equal_ranges) const;
};
bool isColumnLowCardinalityNullable(const IColumn & column);

View File

@ -11,20 +11,12 @@ set (SRCS
add_library(clickhouse_common_config ${SRCS})
target_link_libraries(clickhouse_common_config
PUBLIC
clickhouse_common_zookeeper_base
clickhouse_common_zookeeper
common
Poco::XML
)
add_library(clickhouse_common_config_no_zookeeper_log ${SRCS})
target_link_libraries(clickhouse_common_config_no_zookeeper_log
PUBLIC
clickhouse_common_zookeeper_no_log
common
Poco::XML
)
if (TARGET ch_contrib::yaml_cpp)
target_link_libraries(clickhouse_common_config PRIVATE ch_contrib::yaml_cpp)
target_link_libraries(clickhouse_common_config_no_zookeeper_log PRIVATE ch_contrib::yaml_cpp)
endif()

View File

@ -79,6 +79,8 @@ static struct InitFiu
REGULAR(zero_copy_lock_zk_fail_after_op) \
REGULAR(plain_object_storage_write_fail_on_directory_create) \
REGULAR(plain_object_storage_write_fail_on_directory_move) \
REGULAR(zero_copy_unlock_zk_fail_before_op) \
REGULAR(zero_copy_unlock_zk_fail_after_op) \
namespace FailPoints

View File

@ -60,6 +60,9 @@ LoggerPtr ShellCommand::getLogger()
ShellCommand::~ShellCommand()
{
if (do_not_terminate)
return;
if (wait_called)
return;
@ -293,11 +296,48 @@ std::unique_ptr<ShellCommand> ShellCommand::executeDirect(const ShellCommand::Co
return executeImpl(path.data(), argv.data(), config);
}
struct ShellCommand::tryWaitResult
{
bool is_process_terminated = false;
int retcode = -1;
};
int ShellCommand::tryWait()
{
return tryWaitImpl(true).retcode;
}
ShellCommand::tryWaitResult ShellCommand::tryWaitImpl(bool blocking)
{
LOG_TRACE(getLogger(), "Will wait for shell command pid {}", pid);
ShellCommand::tryWaitResult result;
int options = ((!blocking) ? WNOHANG : 0);
int status = 0;
int waitpid_retcode = -1;
while (waitpid_retcode < 0)
{
waitpid_retcode = waitpid(pid, &status, options);
if (waitpid_retcode > 0)
{
break;
}
if (!blocking && !waitpid_retcode)
{
result.is_process_terminated = false;
return result;
}
if (errno != EINTR)
throw ErrnoException(ErrorCodes::CANNOT_WAITPID, "Cannot waitpid");
}
LOG_TRACE(getLogger(), "Wait for shell command pid {} completed with status {}", pid, status);
wait_called = true;
result.is_process_terminated = true;
in.close();
out.close();
err.close();
@ -308,19 +348,11 @@ int ShellCommand::tryWait()
for (auto & [_, fd] : read_fds)
fd.close();
LOG_TRACE(getLogger(), "Will wait for shell command pid {}", pid);
int status = 0;
while (waitpid(pid, &status, 0) < 0)
{
if (errno != EINTR)
throw ErrnoException(ErrorCodes::CANNOT_WAITPID, "Cannot waitpid");
}
LOG_TRACE(getLogger(), "Wait for shell command pid {} completed with status {}", pid, status);
if (WIFEXITED(status))
return WEXITSTATUS(status);
{
result.retcode = WEXITSTATUS(status);
return result;
}
if (WIFSIGNALED(status))
throw Exception(ErrorCodes::CHILD_WAS_NOT_EXITED_NORMALLY, "Child process was terminated by signal {}", toString(WTERMSIG(status)));
@ -332,10 +364,8 @@ int ShellCommand::tryWait()
}
void ShellCommand::wait()
void ShellCommand::handleProcessRetcode(int retcode) const
{
int retcode = tryWait();
if (retcode != EXIT_SUCCESS)
{
switch (retcode)
@ -358,5 +388,22 @@ void ShellCommand::wait()
}
}
bool ShellCommand::waitIfProccesTerminated()
{
auto proc_status = tryWaitImpl(false);
if (proc_status.is_process_terminated)
{
handleProcessRetcode(proc_status.retcode);
}
return proc_status.is_process_terminated;
}
void ShellCommand::wait()
{
int retcode = tryWaitImpl(true).retcode;
handleProcessRetcode(retcode);
}
}
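
Review note: `tryWaitImpl(bool blocking)` relies on `waitpid` returning 0 under `WNOHANG` when the child is still running. The sketch below shows that pattern in isolation, assuming a POSIX system. `WaitResult`, `waitForChild` and `spawnBlockedChild` are hypothetical names for illustration, and unlike the real code the sketch does not throw when the child was killed by a signal.

```cpp
#include <cerrno>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct WaitResult
{
    bool is_process_terminated = false;
    int retcode = -1;
};

/// Blocking or non-blocking wait for one child. Simplified: signal deaths leave retcode at -1.
WaitResult waitForChild(pid_t pid, bool blocking)
{
    WaitResult result;
    const int options = blocking ? 0 : WNOHANG;
    int status = 0;

    while (true)
    {
        const pid_t ret = waitpid(pid, &status, options);
        if (ret > 0)
            break;          /// The child terminated and was reaped.
        if (ret == 0)
            return result;  /// Only possible with WNOHANG: the child is still running.
        if (errno != EINTR)
            throw std::runtime_error("Cannot waitpid");
        /// EINTR: interrupted by a signal, retry.
    }

    result.is_process_terminated = true;
    if (WIFEXITED(status))
        result.retcode = WEXITSTATUS(status);
    return result;
}

/// Demo helper: forks a child that blocks until the parent closes `release_fd`,
/// then exits with `exit_code`.
pid_t spawnBlockedChild(int & release_fd, int exit_code)
{
    int fds[2];
    if (pipe(fds) != 0)
        throw std::runtime_error("Cannot create pipe");

    const pid_t pid = fork();
    if (pid < 0)
        throw std::runtime_error("Cannot fork");
    if (pid == 0)
    {
        close(fds[1]);
        char byte;
        while (read(fds[0], &byte, 1) > 0) {} /// Returns 0 (EOF) once the parent closes its end.
        _exit(exit_code);
    }

    close(fds[0]);
    release_fd = fds[1];
    return pid;
}
```

A non-blocking call on a child from `spawnBlockedChild` reports "not terminated" until the parent closes `release_fd`; a blocking call afterwards returns the exit code.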

View File

@ -67,6 +67,21 @@ public:
DestructorStrategy terminate_in_destructor_strategy = DestructorStrategy(false, 0);
};
pid_t getPid() const
{
return pid;
}
bool isWaitCalled() const
{
return wait_called;
}
void setDoNotTerminate()
{
do_not_terminate = true;
}
/// Run the command using /bin/sh -c.
/// If terminate_in_destructor is true, send terminate signal in destructor and don't wait for the process.
static std::unique_ptr<ShellCommand> execute(const Config & config);
@ -81,6 +96,10 @@ public:
/// Wait for the process to finish and return its exit code. Throws an exception if the process did not exit on its own (e.g. it was terminated by a signal).
int tryWait();
/// Returns true if the process has terminated (does not block).
/// If it has terminated, its return code is handled as well.
bool waitIfProccesTerminated();
WriteBufferFromFile in; /// If the command reads from stdin, do not forget to call in.close() after writing all the data there.
ReadBufferFromFile out;
ReadBufferFromFile err;
@ -92,10 +111,16 @@ private:
pid_t pid;
Config config;
bool wait_called = false;
bool do_not_terminate = false;
ShellCommand(pid_t pid_, int & in_fd_, int & out_fd_, int & err_fd_, const Config & config);
bool tryWaitProcessWithTimeout(size_t timeout_in_seconds);
struct tryWaitResult;
tryWaitResult tryWaitImpl(bool blocking);
void handleProcessRetcode(int retcode) const;
static LoggerPtr getLogger();

View File

@ -0,0 +1,53 @@
#include <Common/logger_useful.h>
#include <Common/Exception.h>
#include <Common/ShellCommandsHolder.h>
namespace DB
{
ShellCommandsHolder & ShellCommandsHolder::instance()
{
static ShellCommandsHolder instance;
return instance;
}
void ShellCommandsHolder::removeCommand(pid_t pid)
{
std::lock_guard lock(mutex);
bool is_erased = shell_commands.erase(pid);
LOG_TRACE(log, "Try to erase command with the pid {}, is_erased: {}", pid, is_erased);
}
void ShellCommandsHolder::addCommand(std::unique_ptr<ShellCommand> command)
{
std::lock_guard lock(mutex);
pid_t command_pid = command->getPid();
if (command->waitIfProccesTerminated())
{
LOG_TRACE(log, "Pid {} already finished. Do not insert it.", command_pid);
return;
}
auto [iterator, is_inserted] = shell_commands.try_emplace(command_pid, std::move(command));
if (is_inserted)
{
LOG_TRACE(log, "Inserted the command with pid {}", command_pid);
return;
}
if (iterator->second->isWaitCalled())
{
iterator->second = std::move(command);
LOG_TRACE(log, "Replaced the command with pid {}", command_pid);
return;
}
/// There are two active ShellCommands with the same pid.
/// This is probably a bug; try to replace the old shell command with the new one.
chassert(false);
LOG_WARNING(log, "The PID is already present among active shell commands, will try to replace it with a new one.");
iterator->second->setDoNotTerminate();
iterator->second = std::move(command);
}
}
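
Review note: `addCommand` keeps using `command` after `try_emplace` reports that the key already exists. That is valid because `std::unordered_map::try_emplace` does not move from its arguments when no insertion happens. Here is a small sketch of the same insert-or-replace shape, with a hypothetical `Registry` class and `std::string` payload standing in for the pid to `ShellCommand` map (locking omitted for brevity):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

/// Hypothetical registry keyed by an integer id.
class Registry
{
public:
    /// Returns true if a new entry was inserted, false if an existing one was replaced.
    bool addOrReplace(int id, std::unique_ptr<std::string> value)
    {
        auto [it, inserted] = entries.try_emplace(id, std::move(value));
        if (inserted)
            return true;

        /// try_emplace left `value` untouched because the key already existed,
        /// so it still owns the object here.
        assert(value != nullptr);
        it->second = std::move(value);
        return false;
    }

    const std::string & get(int id) const { return *entries.at(id); }
    size_t size() const { return entries.size(); }

private:
    std::unordered_map<int, std::unique_ptr<std::string>> entries;
};
```

With plain `emplace` or `insert` this guarantee is not given, so the replace branch could end up assigning a moved-from pointer.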

View File

@ -0,0 +1,32 @@
#pragma once
#include <Common/ShellCommand.h>
#include <boost/noncopyable.hpp>
#include <memory>
#include <mutex>
#include <unordered_map>
namespace DB
{
/** The holder class for running background shell processes.
*/
class ShellCommandsHolder final : public boost::noncopyable
{
public:
static ShellCommandsHolder & instance();
void removeCommand(pid_t pid);
void addCommand(std::unique_ptr<ShellCommand> command);
private:
using ShellCommands = std::unordered_map<pid_t, std::unique_ptr<ShellCommand>>;
std::mutex mutex;
ShellCommands shell_commands TSA_GUARDED_BY(mutex);
LoggerPtr log = getLogger("ShellCommandsHolder");
};
}

View File

@ -1,6 +1,7 @@
#include <Common/SignalHandlers.h>
#include <Common/config_version.h>
#include <Common/getHashOfLoadedBinary.h>
#include <Common/ShellCommandsHolder.h>
#include <Common/CurrentThread.h>
#include <Daemon/BaseDaemon.h>
#include <Daemon/SentryWriter.h>
@ -68,6 +69,20 @@ void terminateRequestedSignalHandler(int sig, siginfo_t *, void *)
writeSignalIDtoSignalPipe(sig);
}
void childSignalHandler(int sig, siginfo_t * info, void *)
{
DENY_ALLOCATIONS_IN_SCOPE;
auto saved_errno = errno; /// We must restore previous value of errno in signal handler.
char buf[signal_pipe_buf_size];
auto & signal_pipe = HandledSignals::instance().signal_pipe;
WriteBufferFromFileDescriptor out(signal_pipe.fds_rw[1], signal_pipe_buf_size, buf);
writeBinary(sig, out);
writeBinary(info->si_pid, out);
out.finalize();
errno = saved_errno;
}
void signalHandler(int sig, siginfo_t * info, void * context)
{
@ -294,6 +309,12 @@ void SignalListener::run()
if (daemon)
daemon->handleSignal(sig);
}
else if (sig == SIGCHLD)
{
pid_t child_pid = 0;
readBinary(child_pid, in);
ShellCommandsHolder::instance().removeCommand(child_pid);
}
else
{
siginfo_t info{};
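
Review note: `childSignalHandler` and the `SIGCHLD` branch in `SignalListener::run` form a self-pipe pattern: the handler only writes the child's pid to a pipe, and a regular thread reads it and does the real work. The sketch below shows that pattern on its own, assuming POSIX. All names are hypothetical, and it writes only the pid, whereas the handler in the diff writes the signal number first. Note also that standard signals are not queued, so several children exiting close together may produce fewer notifications than exits.

```cpp
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace
{
int signal_pipe_fds[2] = {-1, -1};

/// Async-signal-safe handler: it only calls write(2) and restores errno.
void onChildSignal(int, siginfo_t * info, void *)
{
    const int saved_errno = errno;
    const pid_t pid = info->si_pid;
    /// sizeof(pid_t) is far below PIPE_BUF, so the write is atomic; errors are ignored in this sketch.
    ssize_t ignored = write(signal_pipe_fds[1], &pid, sizeof(pid));
    (void)ignored;
    errno = saved_errno;
}
}

void installChildHandler()
{
    if (pipe(signal_pipe_fds) != 0)
        throw std::runtime_error("Cannot create pipe");

    struct sigaction sa{};
    sa.sa_sigaction = onChildSignal;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, nullptr) != 0)
        throw std::runtime_error("Cannot install SIGCHLD handler");
}

/// Listener side, runs on a normal thread: blocks until the handler reports a pid.
pid_t readReportedChildPid()
{
    pid_t pid = 0;
    char * dst = reinterpret_cast<char *>(&pid);
    size_t got = 0;
    while (got < sizeof(pid))
    {
        const ssize_t n = read(signal_pipe_fds[0], dst + got, sizeof(pid) - got);
        if (n > 0)
            got += static_cast<size_t>(n);
        else if (n == 0 || errno != EINTR)
            throw std::runtime_error("Cannot read from signal pipe");
    }
    return pid;
}
```

The reader still has to reap the child with `waitpid`; the notification only says which pid changed state.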

View File

@ -33,6 +33,7 @@ void closeLogsSignalHandler(int sig, siginfo_t *, void *);
void terminateRequestedSignalHandler(int sig, siginfo_t *, void *);
void childSignalHandler(int sig, siginfo_t * info, void *);
/** Handler for "fault" or diagnostic signals. Send data about fault to separate thread to write into log.
*/

View File

@ -2,10 +2,20 @@ include("${ClickHouse_SOURCE_DIR}/cmake/dbms_glob_sources.cmake")
add_headers_and_sources(clickhouse_common_zookeeper .)
list(APPEND clickhouse_common_zookeeper_sources ${CMAKE_CURRENT_SOURCE_DIR}/../../../src/Coordination/KeeperFeatureFlags.cpp)
# Needs to be built differently depending on ZOOKEEPER_LOG
list(REMOVE_ITEM clickhouse_common_zookeeper_sources "ZooKeeperImpl.cpp")
add_library(clickhouse_common_zookeeper_base ${clickhouse_common_zookeeper_headers} ${clickhouse_common_zookeeper_sources})
target_link_libraries (clickhouse_common_zookeeper_base
PUBLIC
clickhouse_common_io
clickhouse_compression
common
)
# for clickhouse server
add_library(clickhouse_common_zookeeper ${clickhouse_common_zookeeper_headers} ${clickhouse_common_zookeeper_sources})
add_library(clickhouse_common_zookeeper ZooKeeperImpl.cpp)
target_compile_definitions (clickhouse_common_zookeeper PRIVATE -DZOOKEEPER_LOG)
target_link_libraries (clickhouse_common_zookeeper
PUBLIC
@ -15,7 +25,7 @@ target_link_libraries (clickhouse_common_zookeeper
)
# for examples -- no logging (to avoid extra dependencies)
add_library(clickhouse_common_zookeeper_no_log ${clickhouse_common_zookeeper_headers} ${clickhouse_common_zookeeper_sources})
add_library(clickhouse_common_zookeeper_no_log ZooKeeperImpl.cpp)
target_link_libraries (clickhouse_common_zookeeper_no_log
PUBLIC
clickhouse_common_io

View File

@ -2,7 +2,7 @@
#include <base/types.h>
#include <Common/Exception.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <Poco/Net/SocketAddress.h>
#include <vector>

View File

@ -1,4 +1,4 @@
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <Common/ErrorCodes.h>
#include <Common/Exception.h>
#include <Common/logger_useful.h>

View File

@ -11,7 +11,7 @@
#include <Common/ZooKeeper/ZooKeeperArgs.h>
#include <Common/ThreadPool.h>
#include <Common/ConcurrentBoundedQueue.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
namespace Coordination

View File

@ -1,5 +1,5 @@
#include "ZooKeeper.h"
#include "Coordination/KeeperFeatureFlags.h"
#include "Common/ZooKeeper/KeeperFeatureFlags.h"
#include "ZooKeeperImpl.h"
#include "KeeperException.h"
#include "TestKeeper.h"

View File

@ -14,7 +14,7 @@
#include <Common/ZooKeeper/ZooKeeperConstants.h>
#include <Common/ZooKeeper/ZooKeeperArgs.h>
#include <Common/thread_local_rng.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <unistd.h>

View File

@ -10,7 +10,7 @@
#include <Common/ZooKeeper/ZooKeeperArgs.h>
#include <Common/ZooKeeper/ZooKeeper.h>
#include <Coordination/KeeperConstants.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <IO/ReadBuffer.h>
#include <IO/WriteBuffer.h>

View File

@ -1,15 +1,15 @@
clickhouse_add_executable(zkutil_test_commands zkutil_test_commands.cpp)
target_link_libraries(zkutil_test_commands PRIVATE
clickhouse_common_zookeeper_no_log
clickhouse_common_zookeeper_base clickhouse_common_zookeeper_no_log
dbms)
clickhouse_add_executable(zkutil_test_commands_new_lib zkutil_test_commands_new_lib.cpp)
target_link_libraries(zkutil_test_commands_new_lib PRIVATE
clickhouse_common_zookeeper_no_log
clickhouse_common_zookeeper_base clickhouse_common_zookeeper_no_log
clickhouse_compression
dbms)
clickhouse_add_executable(zkutil_test_async zkutil_test_async.cpp)
target_link_libraries(zkutil_test_async PRIVATE
clickhouse_common_zookeeper_no_log
clickhouse_common_zookeeper_base clickhouse_common_zookeeper_no_log
dbms)

View File

@ -11,7 +11,7 @@
#include <Common/getMaxFileDescriptorCount.h>
#include <Common/StringUtils.h>
#include <Common/config_version.h>
#include "Coordination/KeeperFeatureFlags.h"
#include "Common/ZooKeeper/KeeperFeatureFlags.h"
#include <Coordination/Keeper4LWInfo.h>
#include <IO/WriteHelpers.h>
#include <IO/WriteBufferFromString.h>

View File

@ -13,7 +13,7 @@
#include <Poco/Util/JSONConfiguration.h>
#include <Coordination/KeeperConstants.h>
#include <Server/CloudPlacementInfo.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <Disks/DiskSelector.h>
#include <Common/logger_useful.h>

View File

@ -1,5 +1,5 @@
#pragma once
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/ZooKeeper/ZooKeeperConstants.h>
#include <atomic>

View File

@ -12,7 +12,7 @@
#include <Coordination/InMemoryLogStore.h>
#include <Coordination/KeeperContext.h>
#include <Coordination/KeeperConstants.h>
#include <Coordination/KeeperFeatureFlags.h>
#include <Common/ZooKeeper/KeeperFeatureFlags.h>
#include <Coordination/KeeperLogStore.h>
#include <Coordination/KeeperSnapshotManager.h>
#include <Coordination/KeeperStateMachine.h>

View File

@ -1924,13 +1924,6 @@ See also:
For single JOIN in case of identifier ambiguity prefer left table
)", IMPORTANT) \
\
DECLARE(BoolAuto, query_plan_join_swap_table, Field("auto"), R"(
Determine which side of the join should be the build table (also called inner, the one inserted into the hash table for a hash join) in the query plan. This setting is supported only for `ALL` join strictness with the `JOIN ON` clause. Possible values are:
- 'auto': Let the planner decide which table to use as the build table.
- 'false': Never swap tables (the right table is the build table).
- 'true': Always swap tables (the left table is the build table).
)", 0) \
\
DECLARE(UInt64, preferred_block_size_bytes, 1000000, R"(
This setting adjusts the data block size for query processing and represents additional fine-tuning to the more rough 'max_block_size' setting. If the columns are large and with 'max_block_size' rows the block size is likely to be larger than the specified amount of bytes, its size will be lowered for better CPU cache locality.
)", 0) \

View File

@ -45,7 +45,6 @@ class WriteBuffer;
#define COMMON_SETTINGS_SUPPORTED_TYPES(CLASS_NAME, M) \
M(CLASS_NAME, ArrowCompression) \
M(CLASS_NAME, Bool) \
M(CLASS_NAME, BoolAuto) \
M(CLASS_NAME, CapnProtoEnumComparingMode) \
M(CLASS_NAME, Char) \
M(CLASS_NAME, DateTimeInputFormat) \

View File

@ -60,7 +60,6 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{
{"24.12",
{
{"query_plan_join_swap_table", "false", "auto", "New setting. Right table was always chosen before."},
}
},
{"24.11",

View File

@ -128,8 +128,10 @@ constexpr auto getEnumValues();
DECLARE_SETTING_ENUM(LoadBalancing)
DECLARE_SETTING_ENUM(JoinStrictness)
DECLARE_SETTING_MULTI_ENUM(JoinAlgorithm)
/// Which rows should be included in TOTALS.
enum class TotalsMode : uint8_t
{

View File

@ -2,13 +2,13 @@
#include <chrono>
#include <string_view>
#include <optional>
#include <Core/Field.h>
#include <Core/MultiEnum.h>
#include <base/types.h>
#include <Poco/Timespan.h>
#include <Poco/URI.h>
namespace DB
{
namespace ErrorCodes
@ -125,10 +125,8 @@ struct SettingAutoWrapper
void readBinary(ReadBuffer & in) { changed = true; is_auto = false; base.readBinary(in); }
Type valueOr(Type default_value) const { return is_auto ? default_value : base.value; }
std::optional<Type> get() const { return is_auto ? std::nullopt : std::make_optional(base.value); }
};
using SettingFieldBoolAuto = SettingAutoWrapper<SettingFieldBool>;
using SettingFieldUInt64Auto = SettingAutoWrapper<SettingFieldUInt64>;
using SettingFieldInt64Auto = SettingAutoWrapper<SettingFieldInt64>;
using SettingFieldFloatAuto = SettingAutoWrapper<SettingFieldFloat>;

View File

@ -440,6 +440,7 @@ void BaseDaemon::initializeTerminationAndSignalProcessing()
HandledSignals::instance().setupCommonDeadlySignalHandlers();
HandledSignals::instance().setupCommonTerminateRequestSignalHandlers();
HandledSignals::instance().addSignalHandler({SIGHUP}, closeLogsSignalHandler, true);
HandledSignals::instance().addSignalHandler({SIGCHLD}, childSignalHandler, true);
/// Set up Poco ErrorHandler for Poco Threads.
static KillingErrorHandler killing_error_handler;

View File

@ -768,9 +768,9 @@ std::unique_ptr<WriteBufferFromFileBase> DiskObjectStorageTransaction::writeFile
}
else
{
auto write_operation = std::make_unique<WriteFileObjectStorageOperation>(object_storage, metadata_storage, object);
auto write_operation = std::make_shared<WriteFileObjectStorageOperation>(object_storage, metadata_storage, object);
create_metadata_callback = [object_storage_tx = shared_from_this(), write_op = write_operation.get(), mode, path, key_ = std::move(object_key)](size_t count)
create_metadata_callback = [object_storage_tx = shared_from_this(), write_op = write_operation, mode, path, key_ = std::move(object_key)](size_t count)
{
/// This callback is called in the WriteBuffer finalize method -- only there we actually know
/// how many bytes were written. We don't control when this finalize method will be called
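
Review note: the hunk above switches the operation from `std::unique_ptr` to `std::shared_ptr` and captures the smart pointer itself instead of `write_operation.get()`. Presumably this keeps the operation alive for as long as the callback exists, since the callback's call time is not controlled by the transaction. A minimal sketch of that lifetime effect, using a hypothetical `WriteOperation` type and `makeFinalizeCallback` helper rather than the real ClickHouse classes:

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

/// Hypothetical operation type standing in for WriteFileObjectStorageOperation.
struct WriteOperation
{
    size_t bytes_written = 0;
    void setSize(size_t count) { bytes_written = count; }
};

using OperationPtr = std::shared_ptr<WriteOperation>;

/// Returns a callback that co-owns the operation, so it can run after `operations` is cleared.
std::function<void(size_t)> makeFinalizeCallback(std::vector<OperationPtr> & operations)
{
    auto op = std::make_shared<WriteOperation>();
    operations.push_back(op);
    /// Capturing `op` by value copies the shared_ptr. Capturing a raw pointer obtained from
    /// a unique_ptr would instead dangle once the owning container is destroyed.
    return [op](size_t count) { op->setSize(count); };
}
```

The cost is one extra reference count per operation, in exchange for a callback that is safe to invoke regardless of what happened to the operations list in the meantime.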

View File

@ -1,5 +1,6 @@
#pragma once
#include <memory>
#include <Disks/IDiskTransaction.h>
#include <Disks/ObjectStorages/DiskObjectStorage.h>
#include <Disks/ObjectStorages/IMetadataStorage.h>
@ -34,7 +35,7 @@ public:
virtual std::string getInfoForLog() const = 0;
};
using DiskObjectStorageOperation = std::unique_ptr<IDiskObjectStorageOperation>;
using DiskObjectStorageOperation = std::shared_ptr<IDiskObjectStorageOperation>;
using DiskObjectStorageOperations = std::vector<DiskObjectStorageOperation>;

View File

@ -695,7 +695,7 @@ S3CredentialsProviderChain::S3CredentialsProviderChain(
static const char AWS_ECS_CONTAINER_CREDENTIALS_RELATIVE_URI[] = "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI";
static const char AWS_ECS_CONTAINER_CREDENTIALS_FULL_URI[] = "AWS_CONTAINER_CREDENTIALS_FULL_URI";
static const char AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN[] = "AWS_CONTAINER_AUTHORIZATION_TOKEN";
static const char AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_PATH[] = "AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH";
static const char AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_FILE[] = "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE";
static const char AWS_EC2_METADATA_DISABLED[] = "AWS_EC2_METADATA_DISABLED";
/// The only difference from DefaultAWSCredentialsProviderChain::DefaultAWSCredentialsProviderChain()
@ -754,11 +754,11 @@ S3CredentialsProviderChain::S3CredentialsProviderChain(
else if (!absolute_uri.empty())
{
auto token = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN);
const auto token_path = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_PATH);
const auto token_path = Aws::Environment::GetEnv(AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_FILE);
if (!token_path.empty())
{
LOG_INFO(logger, "The environment variable value {} is {}", AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_PATH, token_path);
LOG_INFO(logger, "The environment variable value {} is {}", AWS_ECS_CONTAINER_AUTHORIZATION_TOKEN_FILE, token_path);
String token_from_file;

View File

@ -63,17 +63,6 @@ public:
IBlocksStreamPtr
getNonJoinedBlocks(const Block & left_sample_block, const Block & result_sample_block, UInt64 max_block_size) const override;
bool isCloneSupported() const override
{
return !getTotals() && getTotalRowCount() == 0;
}
std::shared_ptr<IJoin> clone(const std::shared_ptr<TableJoin> & table_join_, const Block &, const Block & right_sample_block_) const override
{
return std::make_shared<ConcurrentHashJoin>(context, table_join_, slots, right_sample_block_, stats_collecting_params);
}
private:
struct InternalHashJoin
{

View File

@ -540,9 +540,6 @@ struct ContextSharedPart : boost::noncopyable
/// No lock required for application_type modified only during initialization
Context::ApplicationType application_type = Context::ApplicationType::SERVER;
/// vector of xdbc-bridge commands, they will be killed when Context will be destroyed
std::vector<std::unique_ptr<ShellCommand>> bridge_commands TSA_GUARDED_BY(mutex);
/// No lock required for config_reload_callback, start_servers_callback, stop_servers_callback modified only during initialization
Context::ConfigReloadCallback config_reload_callback;
Context::StartStopServersCallback start_servers_callback;
@ -5067,12 +5064,6 @@ void Context::addQueryParameters(const NameToNameMap & parameters)
query_parameters.insert_or_assign(name, value);
}
void Context::addBridgeCommand(std::unique_ptr<ShellCommand> cmd) const
{
std::lock_guard lock(shared->mutex);
shared->bridge_commands.emplace_back(std::move(cmd));
}
IHostContextPtr & Context::getHostContext()
{

View File

@ -1288,8 +1288,6 @@ public:
/// Overrides values of existing parameters.
void addQueryParameters(const NameToNameMap & parameters);
/// Add started bridge command. It will be killed after context destruction
void addBridgeCommand(std::unique_ptr<ShellCommand> cmd) const;
IHostContextPtr & getHostContext();
const IHostContextPtr & getHostContext() const;

View File

@ -36,7 +36,7 @@ public:
bool isCloneSupported() const override
{
return !getTotals();
return true;
}
std::shared_ptr<IJoin> clone(const std::shared_ptr<TableJoin> & table_join_,

View File

@ -431,16 +431,6 @@ size_t HashJoin::getTotalByteCount() const
return res;
}
bool HashJoin::isUsedByAnotherAlgorithm() const
{
return table_join->isEnabledAlgorithm(JoinAlgorithm::AUTO) || table_join->isEnabledAlgorithm(JoinAlgorithm::GRACE_HASH);
}
bool HashJoin::canRemoveColumnsFromLeftBlock() const
{
return table_join->enableEnalyzer() && !table_join->hasUsing() && !isUsedByAnotherAlgorithm() && strictness != JoinStrictness::RightAny;
}
void HashJoin::initRightBlockStructure(Block & saved_block_sample)
{
if (isCrossOrComma(kind))
@ -452,10 +442,8 @@ void HashJoin::initRightBlockStructure(Block & saved_block_sample)
bool multiple_disjuncts = !table_join->oneDisjunct();
/// We could remove key columns for LEFT | INNER HashJoin but we should keep them for JoinSwitcher (if any).
bool save_key_columns = isUsedByAnotherAlgorithm() ||
isRightOrFull(kind) ||
multiple_disjuncts ||
table_join->getMixedJoinExpression();
bool save_key_columns = table_join->isEnabledAlgorithm(JoinAlgorithm::AUTO) || table_join->isEnabledAlgorithm(JoinAlgorithm::GRACE_HASH)
|| isRightOrFull(kind) || multiple_disjuncts || table_join->getMixedJoinExpression();
if (save_key_columns)
{
saved_block_sample = right_table_keys.cloneEmpty();
@ -1368,10 +1356,7 @@ HashJoin::getNonJoinedBlocks(const Block & left_sample_block, const Block & resu
{
if (!JoinCommon::hasNonJoinedBlocks(*table_join))
return {};
size_t left_columns_count = left_sample_block.columns();
if (canRemoveColumnsFromLeftBlock())
left_columns_count = table_join->getOutputColumns(JoinTableSide::Left).size();
bool flag_per_row = needUsedFlagsForPerRightTableRow(table_join);
if (!flag_per_row)
@ -1380,9 +1365,14 @@ HashJoin::getNonJoinedBlocks(const Block & left_sample_block, const Block & resu
size_t expected_columns_count = left_columns_count + required_right_keys.columns() + sample_block_with_columns_to_add.columns();
if (expected_columns_count != result_sample_block.columns())
{
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected number of columns in result sample block: {} expected {} ([{}] + [{}] + [{}])",
result_sample_block.columns(), expected_columns_count,
left_sample_block.dumpNames(), required_right_keys.dumpNames(), sample_block_with_columns_to_add.dumpNames());
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Unexpected number of columns in result sample block: {} instead of {} ({} + {} + {})",
result_sample_block.columns(),
expected_columns_count,
left_columns_count,
required_right_keys.columns(),
sample_block_with_columns_to_add.columns());
}
}

View File

@ -126,7 +126,7 @@ public:
bool isCloneSupported() const override
{
return !getTotals() && getTotalRowCount() == 0;
return true;
}
std::shared_ptr<IJoin> clone(const std::shared_ptr<TableJoin> & table_join_,
@ -484,9 +484,6 @@ private:
bool empty() const;
bool isUsedByAnotherAlgorithm() const;
bool canRemoveColumnsFromLeftBlock() const;
void validateAdditionalFilterExpression(std::shared_ptr<ExpressionActions> additional_filter_expression);
bool needUsedFlagsForPerRightTableRow(std::shared_ptr<TableJoin> table_join_) const;

View File

@ -80,7 +80,6 @@ ScatteredBlock HashJoinMethods<KIND, STRICTNESS, MapsTemplate>::joinBlockImpl(
const auto & key_names = !is_join_get ? onexprs[i].key_names_left : onexprs[i].key_names_right;
join_on_keys.emplace_back(block, key_names, onexprs[i].condColumnNames().first, join.key_sizes[i]);
}
auto & source_block = block.getSourceBlock();
size_t existing_columns = source_block.columns();
@ -122,20 +121,6 @@ ScatteredBlock HashJoinMethods<KIND, STRICTNESS, MapsTemplate>::joinBlockImpl(
block.filterBySelector();
const auto & table_join = join.table_join;
std::set<size_t> block_columns_to_erase;
if (join.canRemoveColumnsFromLeftBlock())
{
std::unordered_set<String> left_output_columns;
for (const auto & out_column : table_join->getOutputColumns(JoinTableSide::Left))
left_output_columns.insert(out_column.name);
for (size_t i = 0; i < source_block.columns(); ++i)
{
if (!left_output_columns.contains(source_block.getByPosition(i).name))
block_columns_to_erase.insert(i);
}
}
for (size_t i = 0; i < added_columns.size(); ++i)
source_block.insert(added_columns.moveColumn(i));
@ -191,7 +176,6 @@ ScatteredBlock HashJoinMethods<KIND, STRICTNESS, MapsTemplate>::joinBlockImpl(
columns[pos] = columns[pos]->replicate(offsets);
block.getSourceBlock().setColumns(columns);
block.getSourceBlock().erase(block_columns_to_erase);
block = ScatteredBlock(std::move(block).getSourceBlock());
}
return remaining_block;

View File

@ -1889,9 +1889,7 @@ void InterpreterSelectQuery::executeImpl(QueryPlan & query_plan, std::optional<P
settings[Setting::max_block_size],
0,
max_streams,
/* required_output_ = */ NameSet{},
analysis_result.optimize_read_in_order,
/* use_new_analyzer_ = */ false);
analysis_result.optimize_read_in_order);
join_step->setStepDescription(fmt::format("JOIN {}", expressions.join->pipelineType()));
std::vector<QueryPlanPtr> plans;

View File

@ -41,7 +41,6 @@ namespace DB
namespace Setting
{
extern const SettingsBool allow_experimental_join_right_table_sorting;
extern const SettingsBool allow_experimental_analyzer;
extern const SettingsUInt64 cross_join_min_bytes_to_compress;
extern const SettingsUInt64 cross_join_min_rows_to_compress;
extern const SettingsUInt64 default_max_bytes_in_join;
@ -144,7 +143,6 @@ TableJoin::TableJoin(const Settings & settings, VolumePtr tmp_volume_, Temporary
, max_memory_usage(settings[Setting::max_memory_usage])
, tmp_volume(tmp_volume_)
, tmp_data(tmp_data_)
, enable_analyzer(settings[Setting::allow_experimental_analyzer])
{
}
@ -163,8 +161,6 @@ void TableJoin::resetCollected()
clauses.clear();
columns_from_joined_table.clear();
columns_added_by_join.clear();
columns_from_left_table.clear();
result_columns_from_left_table.clear();
original_names.clear();
renames.clear();
left_type_map.clear();
@ -207,20 +203,6 @@ size_t TableJoin::rightKeyInclusion(const String & name) const
return count;
}
void TableJoin::setInputColumns(NamesAndTypesList left_output_columns, NamesAndTypesList right_output_columns)
{
columns_from_left_table = std::move(left_output_columns);
columns_from_joined_table = std::move(right_output_columns);
}
const NamesAndTypesList & TableJoin::getOutputColumns(JoinTableSide side)
{
if (side == JoinTableSide::Left)
return result_columns_from_left_table;
return columns_added_by_join;
}
void TableJoin::deduplicateAndQualifyColumnNames(const NameSet & left_table_columns, const String & right_table_prefix)
{
NameSet joined_columns;
@ -369,18 +351,9 @@ bool TableJoin::rightBecomeNullable(const DataTypePtr & column_type) const
return forceNullableRight() && JoinCommon::canBecomeNullable(column_type);
}
void TableJoin::setUsedColumn(const NameAndTypePair & joined_column, JoinTableSide side)
{
if (side == JoinTableSide::Left)
result_columns_from_left_table.push_back(joined_column);
else
columns_added_by_join.push_back(joined_column);
}
void TableJoin::addJoinedColumn(const NameAndTypePair & joined_column)
{
setUsedColumn(joined_column, JoinTableSide::Right);
columns_added_by_join.emplace_back(joined_column);
}
NamesAndTypesList TableJoin::correctedColumnsAddedByJoin() const
@ -1022,32 +995,5 @@ size_t TableJoin::getMaxMemoryUsage() const
return max_memory_usage;
}
void TableJoin::swapSides()
{
assertEnableEnalyzer();
std::swap(key_asts_left, key_asts_right);
std::swap(left_type_map, right_type_map);
for (auto & clause : clauses)
{
std::swap(clause.key_names_left, clause.key_names_right);
std::swap(clause.on_filter_condition_left, clause.on_filter_condition_right);
std::swap(clause.analyzer_left_filter_condition_column_name, clause.analyzer_right_filter_condition_column_name);
}
std::swap(columns_from_left_table, columns_from_joined_table);
std::swap(result_columns_from_left_table, columns_added_by_join);
if (table_join.kind == JoinKind::Left)
table_join.kind = JoinKind::Right;
else if (table_join.kind == JoinKind::Right)
table_join.kind = JoinKind::Left;
}
void TableJoin::assertEnableEnalyzer() const
{
if (!enable_analyzer)
throw DB::Exception(ErrorCodes::NOT_IMPLEMENTED, "TableJoin: analyzer is disabled");
}
}

View File

@ -172,9 +172,6 @@ private:
ASOFJoinInequality asof_inequality = ASOFJoinInequality::GreaterOrEquals;
NamesAndTypesList columns_from_left_table;
NamesAndTypesList result_columns_from_left_table;
/// All columns which can be read from joined table. Duplicating names are qualified.
NamesAndTypesList columns_from_joined_table;
/// Columns will be added to block by JOIN.
@ -210,8 +207,6 @@ private:
bool is_join_with_constant = false;
bool enable_analyzer = false;
Names requiredJoinedNames() const;
/// Create converting actions and change key column names if required
@ -275,8 +270,6 @@ public:
VolumePtr getGlobalTemporaryVolume() { return tmp_volume; }
bool enableEnalyzer() const { return enable_analyzer; }
void assertEnableEnalyzer() const;
TemporaryDataOnDiskScopePtr getTempDataOnDisk() { return tmp_data ? tmp_data->childScope(CurrentMetrics::TemporaryFilesForJoin) : nullptr; }
ActionsDAG createJoinedBlockActions(ContextPtr context) const;
@ -294,7 +287,6 @@ public:
}
bool allowParallelHashJoin() const;
void swapSides();
bool joinUseNulls() const { return join_use_nulls; }
@ -385,9 +377,6 @@ public:
bool leftBecomeNullable(const DataTypePtr & column_type) const;
bool rightBecomeNullable(const DataTypePtr & column_type) const;
void addJoinedColumn(const NameAndTypePair & joined_column);
void setUsedColumn(const NameAndTypePair & joined_column, JoinTableSide side);
void setColumnsAddedByJoin(const NamesAndTypesList & columns_added_by_join_value)
{
columns_added_by_join = columns_added_by_join_value;
@ -413,17 +402,11 @@ public:
ASTPtr leftKeysList() const;
ASTPtr rightKeysList() const; /// For ON syntax only
void setColumnsFromJoinedTable(NamesAndTypesList columns_from_joined_table_value, const NameSet & left_table_columns, const String & right_table_prefix, const NamesAndTypesList & columns_from_left_table_)
void setColumnsFromJoinedTable(NamesAndTypesList columns_from_joined_table_value, const NameSet & left_table_columns, const String & right_table_prefix)
{
columns_from_joined_table = std::move(columns_from_joined_table_value);
deduplicateAndQualifyColumnNames(left_table_columns, right_table_prefix);
result_columns_from_left_table = columns_from_left_table_;
columns_from_left_table = columns_from_left_table_;
}
void setInputColumns(NamesAndTypesList left_output_columns, NamesAndTypesList right_output_columns);
const NamesAndTypesList & getOutputColumns(JoinTableSide side);
const NamesAndTypesList & columnsFromJoinedTable() const { return columns_from_joined_table; }
const NamesAndTypesList & columnsAddedByJoin() const { return columns_added_by_join; }

View File

@ -1353,15 +1353,12 @@ TreeRewriterResultPtr TreeRewriter::analyzeSelect(
if (tables_with_columns.size() > 1)
{
auto columns_from_left_table = tables_with_columns[0].columns;
const auto & right_table = tables_with_columns[1];
auto columns_from_joined_table = right_table.columns;
/// query can use materialized or aliased columns from right joined table,
/// we want to request it for right table
columns_from_joined_table.insert(columns_from_joined_table.end(), right_table.hidden_columns.begin(), right_table.hidden_columns.end());
columns_from_left_table.insert(columns_from_left_table.end(), tables_with_columns[0].hidden_columns.begin(), tables_with_columns[0].hidden_columns.end());
result.analyzed_join->setColumnsFromJoinedTable(
std::move(columns_from_joined_table), source_columns_set, right_table.table.getQualifiedNamePrefix(), columns_from_left_table);
result.analyzed_join->setColumnsFromJoinedTable(std::move(columns_from_joined_table), source_columns_set, right_table.table.getQualifiedNamePrefix());
}
translateQualifiedNames(query, *select_query, source_columns_set, tables_with_columns);

View File

@ -16,6 +16,9 @@ namespace DB
namespace ErrorCodes
{
extern const int BAD_COLLATION;
#ifndef NDEBUG
extern const int LOGICAL_ERROR;
#endif
}
/// Column with description for sort
@ -272,6 +275,60 @@ bool isAlreadySortedImpl(size_t rows, Comparator compare)
return true;
}
#ifndef NDEBUG
template <typename Comparator>
void checkSortedWithPermutationImpl(size_t rows, Comparator compare, UInt64 limit, const IColumn::Permutation & permutation)
{
if (limit && limit < rows)
rows = limit;
const bool no_permutaiton = permutation.empty();
for (size_t i = 1; i < rows; ++i)
{
const size_t current_row = no_permutaiton ? i : permutation[i];
const size_t previous_row = no_permutaiton ? (i - 1) : permutation[i - 1];
if (compare(current_row, previous_row))
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Rows are not sorted with permutation, position {}, previous_row index {}, current_row index {}", i, previous_row, current_row);
}
}
void checkSortedWithPermutation(const Block & block, const SortDescription & description, UInt64 limit, const IColumn::Permutation & permutation)
{
if (!block)
return;
ColumnsWithSortDescriptions columns_with_sort_desc = getColumnsWithSortDescription(block, description);
bool is_collation_required = false;
for (auto & column_with_sort_desc : columns_with_sort_desc)
{
if (isCollationRequired(column_with_sort_desc.description))
{
is_collation_required = true;
break;
}
}
size_t rows = block.rows();
if (is_collation_required)
{
PartialSortingLessWithCollation less(columns_with_sort_desc);
checkSortedWithPermutationImpl(rows, less, limit, permutation);
return;
}
else
{
PartialSortingLess less(columns_with_sort_desc);
checkSortedWithPermutationImpl(rows, less, limit, permutation);
return;
}
}
#endif
}
void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
@ -279,6 +336,10 @@ void sortBlock(Block & block, const SortDescription & description, UInt64 limit)
IColumn::Permutation permutation;
getBlockSortPermutationImpl(block, description, IColumn::PermutationSortStability::Unstable, limit, permutation);
#ifndef NDEBUG
checkSortedWithPermutation(block, description, limit, permutation);
#endif
if (permutation.empty())
return;
@ -303,6 +364,10 @@ void stableGetPermutation(const Block & block, const SortDescription & descripti
return;
getBlockSortPermutationImpl(block, description, IColumn::PermutationSortStability::Stable, 0, out_permutation);
#ifndef NDEBUG
checkSortedWithPermutation(block, description, 0, out_permutation);
#endif
}
bool isAlreadySorted(const Block & block, const SortDescription & description)
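
The debug-only check added in this file can be illustrated outside ClickHouse. The following is a minimal sketch, assuming plain `std::vector<int>` data; the names `isSortedWithPermutation` and `isVectorSortedWithPermutation` are hypothetical and not part of the ClickHouse API. It mirrors the loop in `checkSortedWithPermutationImpl` but returns a `bool` instead of throwing `LOGICAL_ERROR`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-alone analogue of checkSortedWithPermutationImpl.
// An empty permutation is treated as the identity mapping (rows already in order).
// A non-zero limit restricts the check to the first `limit` positions.
template <typename Less>
bool isSortedWithPermutation(std::size_t rows, Less less, std::size_t limit, const std::vector<std::size_t> & permutation)
{
    if (limit && limit < rows)
        rows = limit;

    const bool no_permutation = permutation.empty();
    for (std::size_t i = 1; i < rows; ++i)
    {
        const std::size_t current_row = no_permutation ? i : permutation[i];
        const std::size_t previous_row = no_permutation ? (i - 1) : permutation[i - 1];
        // The order is violated if the current row compares less than the previous one.
        if (less(current_row, previous_row))
            return false;
    }
    return true;
}

// Illustrative wrapper: compares rows of a single integer column in ascending order.
inline bool isVectorSortedWithPermutation(
    const std::vector<int> & data, std::size_t limit, const std::vector<std::size_t> & permutation)
{
    auto less = [&](std::size_t lhs, std::size_t rhs) { return data[lhs] < data[rhs]; };
    return isSortedWithPermutation(data.size(), less, limit, permutation);
}
```

For example, `isVectorSortedWithPermutation({30, 10, 20}, 0, {1, 2, 0})` visits the rows in the order 10, 20, 30 and accepts them, while the same data with an empty permutation is rejected.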

View File

@ -31,7 +31,7 @@ CreateQueryUUIDs::CreateQueryUUIDs(const ASTCreateQuery & query, bool generate_r
/// If we generate random UUIDs for already existing tables then those UUIDs will not be correct making those inner target table inaccessible.
/// Thus it's not safe for example to replace
/// "ATTACH MATERIALIZED VIEW mv AS SELECT a FROM b" with
/// "ATTACH MATERIALIZED VIEW mv TO INNER UUID '123e4567-e89b-12d3-a456-426614174000' AS SELECT a FROM b"
/// "ATTACH MATERIALIZED VIEW mv TO INNER UUID "248372b7-02c4-4c88-a5e1-282a83cc572a" AS SELECT a FROM b"
/// This replacement is safe only for CREATE queries when inner target tables don't exist yet.
if (!query.attach)
{

View File

@ -2,7 +2,6 @@
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/ColumnNode.h>
#include <Analyzer/JoinNode.h>
#include <Planner/PlannerContext.h>

View File

@ -104,7 +104,6 @@ namespace Setting
extern const SettingsBool optimize_move_to_prewhere;
extern const SettingsBool optimize_move_to_prewhere_if_final;
extern const SettingsBool use_concurrency_control;
extern const SettingsBoolAuto query_plan_join_swap_table;
extern const SettingsUInt64 min_joined_block_size_bytes;
}
@ -1269,55 +1268,6 @@ void joinCastPlanColumnsToNullable(QueryPlan & plan_to_add_cast, PlannerContextP
plan_to_add_cast.addStep(std::move(cast_join_columns_step));
}
std::optional<ActionsDAG> createStepToDropColumns(
const Block & header,
const ColumnIdentifierSet & outer_scope_columns,
const PlannerContextPtr & planner_context)
{
ActionsDAG drop_unused_columns_after_join_actions_dag(header.getColumnsWithTypeAndName());
ActionsDAG::NodeRawConstPtrs drop_unused_columns_after_join_actions_dag_updated_outputs;
std::unordered_set<std::string_view> drop_unused_columns_after_join_actions_dag_updated_outputs_names;
std::optional<size_t> first_skipped_column_node_index;
auto & drop_unused_columns_after_join_actions_dag_outputs = drop_unused_columns_after_join_actions_dag.getOutputs();
size_t drop_unused_columns_after_join_actions_dag_outputs_size = drop_unused_columns_after_join_actions_dag_outputs.size();
const auto & global_planner_context = planner_context->getGlobalPlannerContext();
for (size_t i = 0; i < drop_unused_columns_after_join_actions_dag_outputs_size; ++i)
{
const auto & output = drop_unused_columns_after_join_actions_dag_outputs[i];
if (drop_unused_columns_after_join_actions_dag_updated_outputs_names.contains(output->result_name)
|| !global_planner_context->hasColumnIdentifier(output->result_name))
continue;
if (!outer_scope_columns.contains(output->result_name))
{
if (!first_skipped_column_node_index)
first_skipped_column_node_index = i;
continue;
}
drop_unused_columns_after_join_actions_dag_updated_outputs.push_back(output);
drop_unused_columns_after_join_actions_dag_updated_outputs_names.insert(output->result_name);
}
if (!first_skipped_column_node_index)
return {};
/** It is expected that JOIN TREE query plan will contain at least 1 column, even if there are no columns in outer scope.
*
* Example: SELECT count() FROM test_table_1 AS t1, test_table_2 AS t2;
*/
if (drop_unused_columns_after_join_actions_dag_updated_outputs.empty() && first_skipped_column_node_index)
drop_unused_columns_after_join_actions_dag_updated_outputs.push_back(drop_unused_columns_after_join_actions_dag_outputs[*first_skipped_column_node_index]);
drop_unused_columns_after_join_actions_dag_outputs = std::move(drop_unused_columns_after_join_actions_dag_updated_outputs);
return drop_unused_columns_after_join_actions_dag;
}
JoinTreeQueryPlan buildQueryPlanForJoinNode(
const QueryTreeNodePtr & join_table_expression,
JoinTreeQueryPlan left_join_tree_query_plan,
@ -1592,48 +1542,24 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(
}
const Block & left_header = left_plan.getCurrentHeader();
auto left_table_names = left_header.getNames();
NameSet left_table_names_set(left_table_names.begin(), left_table_names.end());
auto columns_from_joined_table = right_plan.getCurrentHeader().getNamesAndTypesList();
table_join->setColumnsFromJoinedTable(columns_from_joined_table, left_table_names_set, "");
for (auto & column_from_joined_table : columns_from_joined_table)
{
/// Add columns from joined table only if they are presented in outer scope, otherwise they can be dropped
if (planner_context->getGlobalPlannerContext()->hasColumnIdentifier(column_from_joined_table.name) &&
outer_scope_columns.contains(column_from_joined_table.name))
table_join->addJoinedColumn(column_from_joined_table);
}
const Block & right_header = right_plan.getCurrentHeader();
auto join_algorithm = chooseJoinAlgorithm(
table_join, join_node.getRightTableExpression(), left_header, right_header, planner_context, select_query_info);
auto columns_from_left_table = left_header.getNamesAndTypesList();
auto columns_from_right_table = right_header.getNamesAndTypesList();
table_join->setInputColumns(columns_from_left_table, columns_from_right_table);
for (auto & column_from_joined_table : columns_from_left_table)
{
/// Add columns to output only if they are presented in outer scope, otherwise they can be dropped
if (planner_context->getGlobalPlannerContext()->hasColumnIdentifier(column_from_joined_table.name) &&
outer_scope_columns.contains(column_from_joined_table.name))
table_join->setUsedColumn(column_from_joined_table, JoinTableSide::Left);
}
for (auto & column_from_joined_table : columns_from_right_table)
{
/// Add columns to output only if they are presented in outer scope, otherwise they can be dropped
if (planner_context->getGlobalPlannerContext()->hasColumnIdentifier(column_from_joined_table.name) &&
outer_scope_columns.contains(column_from_joined_table.name))
table_join->setUsedColumn(column_from_joined_table, JoinTableSide::Right);
}
if (table_join->getOutputColumns(JoinTableSide::Left).empty() && table_join->getOutputColumns(JoinTableSide::Right).empty())
{
/// We should add all duplicated columns, because join algorithm add either all column with specified name or none
auto set_used_column_with_duplicates = [&](const NamesAndTypesList & columns, JoinTableSide join_table_side)
{
const auto & column_name = columns.front().name;
for (const auto & column : columns)
if (column.name == column_name)
table_join->setUsedColumn(column, join_table_side);
};
if (!columns_from_left_table.empty())
set_used_column_with_duplicates(columns_from_left_table, JoinTableSide::Left);
else if (!columns_from_right_table.empty())
set_used_column_with_duplicates(columns_from_right_table, JoinTableSide::Right);
}
auto join_algorithm = chooseJoinAlgorithm(table_join, join_node.getRightTableExpression(), left_header, right_header, planner_context, select_query_info);
auto result_plan = QueryPlan();
bool is_filled_join = join_algorithm->isFilled();
@ -1719,16 +1645,6 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(
}
auto join_pipeline_type = join_algorithm->pipelineType();
ColumnIdentifierSet outer_scope_columns_nonempty;
if (outer_scope_columns.empty())
{
if (left_header.columns() > 1)
outer_scope_columns_nonempty.insert(left_header.getByPosition(0).name);
else if (right_header.columns() > 1)
outer_scope_columns_nonempty.insert(right_header.getByPosition(0).name);
}
auto join_step = std::make_unique<JoinStep>(
left_plan.getCurrentHeader(),
right_plan.getCurrentHeader(),
@ -1736,11 +1652,7 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(
settings[Setting::max_block_size],
settings[Setting::min_joined_block_size_bytes],
settings[Setting::max_threads],
outer_scope_columns.empty() ? outer_scope_columns_nonempty : outer_scope_columns,
false /*optimize_read_in_order*/,
true /*optimize_skip_unused_shards*/);
join_step->swap_join_tables = settings[Setting::query_plan_join_swap_table].get();
false /*optimize_read_in_order*/);
join_step->setStepDescription(fmt::format("JOIN {}", join_pipeline_type));
@ -1751,18 +1663,47 @@ JoinTreeQueryPlan buildQueryPlanForJoinNode(
result_plan.unitePlans(std::move(join_step), {std::move(plans)});
}
const auto & header_after_join = result_plan.getCurrentHeader();
if (header_after_join.columns() > outer_scope_columns.size())
ActionsDAG drop_unused_columns_after_join_actions_dag(result_plan.getCurrentHeader().getColumnsWithTypeAndName());
ActionsDAG::NodeRawConstPtrs drop_unused_columns_after_join_actions_dag_updated_outputs;
std::unordered_set<std::string_view> drop_unused_columns_after_join_actions_dag_updated_outputs_names;
std::optional<size_t> first_skipped_column_node_index;
auto & drop_unused_columns_after_join_actions_dag_outputs = drop_unused_columns_after_join_actions_dag.getOutputs();
size_t drop_unused_columns_after_join_actions_dag_outputs_size = drop_unused_columns_after_join_actions_dag_outputs.size();
for (size_t i = 0; i < drop_unused_columns_after_join_actions_dag_outputs_size; ++i)
{
auto drop_unused_columns_after_join_actions_dag = createStepToDropColumns(header_after_join, outer_scope_columns, planner_context);
if (drop_unused_columns_after_join_actions_dag)
const auto & output = drop_unused_columns_after_join_actions_dag_outputs[i];
const auto & global_planner_context = planner_context->getGlobalPlannerContext();
if (drop_unused_columns_after_join_actions_dag_updated_outputs_names.contains(output->result_name)
|| !global_planner_context->hasColumnIdentifier(output->result_name))
continue;
if (!outer_scope_columns.contains(output->result_name))
{
auto drop_unused_columns_after_join_transform_step = std::make_unique<ExpressionStep>(result_plan.getCurrentHeader(), std::move(*drop_unused_columns_after_join_actions_dag));
drop_unused_columns_after_join_transform_step->setStepDescription("Drop unused columns after JOIN");
result_plan.addStep(std::move(drop_unused_columns_after_join_transform_step));
if (!first_skipped_column_node_index)
first_skipped_column_node_index = i;
continue;
}
drop_unused_columns_after_join_actions_dag_updated_outputs.push_back(output);
drop_unused_columns_after_join_actions_dag_updated_outputs_names.insert(output->result_name);
}
/** It is expected that JOIN TREE query plan will contain at least 1 column, even if there are no columns in outer scope.
*
* Example: SELECT count() FROM test_table_1 AS t1, test_table_2 AS t2;
*/
if (drop_unused_columns_after_join_actions_dag_updated_outputs.empty() && first_skipped_column_node_index)
drop_unused_columns_after_join_actions_dag_updated_outputs.push_back(drop_unused_columns_after_join_actions_dag_outputs[*first_skipped_column_node_index]);
drop_unused_columns_after_join_actions_dag_outputs = std::move(drop_unused_columns_after_join_actions_dag_updated_outputs);
auto drop_unused_columns_after_join_transform_step = std::make_unique<ExpressionStep>(result_plan.getCurrentHeader(), std::move(drop_unused_columns_after_join_actions_dag));
drop_unused_columns_after_join_transform_step->setStepDescription("DROP unused columns after JOIN");
result_plan.addStep(std::move(drop_unused_columns_after_join_transform_step));
for (const auto & right_join_tree_query_plan_row_policy : right_join_tree_query_plan.used_row_policies)
left_join_tree_query_plan.used_row_policies.insert(right_join_tree_query_plan_row_policy);
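
The column-dropping loop that this hunk inlines back into `buildQueryPlanForJoinNode` can be sketched on plain strings. This is a simplified illustration, assuming columns are identified by name only; `selectOutputColumns` and its parameters are hypothetical and stand in for the `ActionsDAG` outputs, the global planner context, and `outer_scope_columns`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical analogue of the output-filtering loop: keep every known column that the
// outer scope requires, once. If nothing is required, keep the first skipped known column
// so that the result still contains at least one column (e.g. for SELECT count()).
inline std::vector<std::string> selectOutputColumns(
    const std::vector<std::string> & header,
    const std::unordered_set<std::string> & known_identifiers,
    const std::unordered_set<std::string> & outer_scope_columns)
{
    std::vector<std::string> kept;
    std::unordered_set<std::string> kept_names;
    std::optional<std::size_t> first_skipped;

    for (std::size_t i = 0; i < header.size(); ++i)
    {
        const auto & name = header[i];
        // Skip duplicates and columns without a registered identifier.
        if (kept_names.count(name) || !known_identifiers.count(name))
            continue;

        if (!outer_scope_columns.count(name))
        {
            if (!first_skipped)
                first_skipped = i;
            continue;
        }

        kept.push_back(name);
        kept_names.insert(name);
    }

    if (kept.empty() && first_skipped)
        kept.push_back(header[*first_skipped]);

    return kept;
}
```

With a header of `a, b, a, c` and an outer scope that needs `a` and `c`, the sketch keeps `a, c`; with an empty outer scope it keeps only the first known column.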

View File

@ -7,7 +7,6 @@
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Common/JSONBuilder.h>
#include <Common/typeid_cast.h>
#include <Processors/Transforms/ColumnPermuteTransform.h>
namespace DB
{
@ -38,37 +37,6 @@ std::vector<std::pair<String, String>> describeJoinActions(const JoinPtr & join)
return description;
}
std::vector<size_t> getPermutationForBlock(
const Block & block,
const Block & lhs_block,
const Block & rhs_block,
const NameSet & name_filter)
{
std::vector<size_t> permutation;
permutation.reserve(block.columns());
Block::NameMap name_map = block.getNamesToIndexesMap();
bool is_trivial = true;
for (const auto & other_block : {lhs_block, rhs_block})
{
for (const auto & col : other_block)
{
if (!name_filter.contains(col.name))
continue;
if (auto it = name_map.find(col.name); it != name_map.end())
{
is_trivial = is_trivial && it->second == permutation.size();
permutation.push_back(it->second);
}
}
}
if (is_trivial && permutation.size() == block.columns())
return {};
return permutation;
}
}
JoinStep::JoinStep(
@ -78,16 +46,12 @@ JoinStep::JoinStep(
size_t max_block_size_,
size_t min_block_size_bytes_,
size_t max_streams_,
NameSet required_output_,
bool keep_left_read_in_order_,
bool use_new_analyzer_)
bool keep_left_read_in_order_)
: join(std::move(join_))
, max_block_size(max_block_size_)
, min_block_size_bytes(min_block_size_bytes_)
, max_streams(max_streams_)
, required_output(std::move(required_output_))
, keep_left_read_in_order(keep_left_read_in_order_)
, use_new_analyzer(use_new_analyzer_)
{
updateInputHeaders({left_header_, right_header_});
}
@ -97,52 +61,32 @@ QueryPipelineBuilderPtr JoinStep::updatePipeline(QueryPipelineBuilders pipelines
if (pipelines.size() != 2)
throw Exception(ErrorCodes::LOGICAL_ERROR, "JoinStep expect two input steps");
Block lhs_header = pipelines[0]->getHeader();
Block rhs_header = pipelines[1]->getHeader();
if (swap_streams)
std::swap(pipelines[0], pipelines[1]);
std::unique_ptr<QueryPipelineBuilder> joined_pipeline;
if (join->pipelineType() == JoinPipelineType::YShaped)
{
joined_pipeline = QueryPipelineBuilder::joinPipelinesYShaped(
std::move(pipelines[0]), std::move(pipelines[1]), join, join_algorithm_header, max_block_size, &processors);
auto joined_pipeline = QueryPipelineBuilder::joinPipelinesYShaped(
std::move(pipelines[0]), std::move(pipelines[1]), join, *output_header, max_block_size, &processors);
joined_pipeline->resize(max_streams);
}
else
{
joined_pipeline = QueryPipelineBuilder::joinPipelinesRightLeft(
std::move(pipelines[0]),
std::move(pipelines[1]),
join,
join_algorithm_header,
max_block_size,
min_block_size_bytes,
max_streams,
keep_left_read_in_order,
&processors);
}
if (!use_new_analyzer)
return joined_pipeline;
auto column_permutation = getPermutationForBlock(joined_pipeline->getHeader(), lhs_header, rhs_header, required_output);
if (!column_permutation.empty())
{
joined_pipeline->addSimpleTransform([&column_permutation](const Block & header)
{
return std::make_shared<ColumnPermuteTransform>(header, column_permutation);
});
}
auto pipeline = QueryPipelineBuilder::joinPipelinesRightLeft(
std::move(pipelines[0]),
std::move(pipelines[1]),
join,
*output_header,
max_block_size,
min_block_size_bytes,
max_streams,
keep_left_read_in_order,
&processors);
if (join->supportParallelJoin())
{
joined_pipeline->addSimpleTransform([&](const Block & header)
pipeline->addSimpleTransform([&](const Block & header)
{ return std::make_shared<SimpleSquashingChunksTransform>(header, 0, min_block_size_bytes); });
}
return joined_pipeline;
return pipeline;
}
bool JoinStep::allowPushDownToRight() const
@ -161,49 +105,17 @@ void JoinStep::describeActions(FormatSettings & settings) const
for (const auto & [name, value] : describeJoinActions(join))
settings.out << prefix << name << ": " << value << '\n';
if (swap_streams)
settings.out << prefix << "Swapped: true\n";
}
void JoinStep::describeActions(JSONBuilder::JSONMap & map) const
{
for (const auto & [name, value] : describeJoinActions(join))
map.add(name, value);
if (swap_streams)
map.add("Swapped", true);
}
void JoinStep::setJoin(JoinPtr join_, bool swap_streams_)
{
join_algorithm_header.clear();
swap_streams = swap_streams_;
join = std::move(join_);
updateOutputHeader();
}
void JoinStep::updateOutputHeader()
{
if (join_algorithm_header)
return;
const auto & header = swap_streams ? input_headers[1] : input_headers[0];
Block result_header = JoiningTransform::transformHeader(header, join);
join_algorithm_header = result_header;
if (!use_new_analyzer)
{
if (swap_streams)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot swap streams without new analyzer");
output_header = result_header;
return;
}
auto column_permutation = getPermutationForBlock(result_header, input_headers[0], input_headers[1], required_output);
if (!column_permutation.empty())
result_header = ColumnPermuteTransform::permute(result_header, column_permutation);
output_header = result_header;
output_header = JoiningTransform::transformHeader(input_headers.front(), join);
}
static ITransformingStep::Traits getStorageJoinTraits()
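
For readers comparing the two versions, the removed `getPermutationForBlock` helper can be approximated as follows. This is a sketch under stated assumptions: blocks are modelled as lists of column names, `permutationForColumns` is a hypothetical name, and the first occurrence of a duplicated name wins, which may differ from the behaviour of `Block::getNamesToIndexesMap`.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical analogue of the removed getPermutationForBlock().
// Returns the positions (within `block`) of the required columns, ordered as they appear
// in `lhs` and then in `rhs`. An empty result means that no reordering is needed.
inline std::vector<std::size_t> permutationForColumns(
    const std::vector<std::string> & block,
    const std::vector<std::string> & lhs,
    const std::vector<std::string> & rhs,
    const std::unordered_set<std::string> & required)
{
    std::unordered_map<std::string, std::size_t> position;
    for (std::size_t i = 0; i < block.size(); ++i)
        position.emplace(block[i], i); // assumption: first occurrence wins for duplicates

    std::vector<std::size_t> permutation;
    permutation.reserve(block.size());

    bool is_trivial = true;
    for (const auto * side : {&lhs, &rhs})
    {
        for (const auto & name : *side)
        {
            if (!required.count(name))
                continue;
            auto it = position.find(name);
            if (it == position.end())
                continue;
            // The mapping stays trivial while each column is already at its target position.
            is_trivial = is_trivial && it->second == permutation.size();
            permutation.push_back(it->second);
        }
    }

    if (is_trivial && permutation.size() == block.size())
        return {};
    return permutation;
}
```

A joined block `b, a` with left column `a` and right column `b` yields the permutation `1, 0`, whereas a block that is already in `a, b` order yields an empty result.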

View File

@ -2,7 +2,6 @@
#include <Processors/QueryPlan/IQueryPlanStep.h>
#include <Processors/QueryPlan/ITransformingStep.h>
#include <Core/Joins.h>
namespace DB
{
@ -21,9 +20,7 @@ public:
size_t max_block_size_,
size_t min_block_size_bytes_,
size_t max_streams_,
NameSet required_output_,
bool keep_left_read_in_order_,
bool use_new_analyzer_);
bool keep_left_read_in_order_);
String getName() const override { return "Join"; }
@ -35,28 +32,17 @@ public:
void describeActions(FormatSettings & settings) const override;
const JoinPtr & getJoin() const { return join; }
void setJoin(JoinPtr join_, bool swap_streams_ = false);
void setJoin(JoinPtr join_) { join = std::move(join_); }
bool allowPushDownToRight() const;
/// Swap automatically if not set, otherwise always or never, depending on the value
std::optional<bool> swap_join_tables = false;
private:
void updateOutputHeader() override;
/// Header that expected to be returned from IJoin
Block join_algorithm_header;
JoinPtr join;
size_t max_block_size;
size_t min_block_size_bytes;
size_t max_streams;
const NameSet required_output;
std::set<size_t> columns_to_remove;
bool keep_left_read_in_order;
bool use_new_analyzer = false;
bool swap_streams = false;
};
/// Special step for the case when Join is already filled.

View File

@ -113,7 +113,6 @@ void optimizePrimaryKeyConditionAndLimit(const Stack & stack);
void optimizePrewhere(Stack & stack, QueryPlan::Nodes & nodes);
void optimizeReadInOrder(QueryPlan::Node & node, QueryPlan::Nodes & nodes);
void optimizeAggregationInOrder(QueryPlan::Node & node, QueryPlan::Nodes &);
void optimizeJoin(QueryPlan::Node & node, QueryPlan::Nodes &);
void optimizeDistinctInOrder(QueryPlan::Node & node, QueryPlan::Nodes &);
/// A separate tree traverse to apply sorting properties after *InOrder optimizations.

View File

@ -1,103 +0,0 @@
#include <Processors/QueryPlan/ExpressionStep.h>
#include <Processors/QueryPlan/FilterStep.h>
#include <Processors/QueryPlan/ITransformingStep.h>
#include <Processors/QueryPlan/JoinStep.h>
#include <Processors/QueryPlan/Optimizations/Optimizations.h>
#include <Processors/QueryPlan/Optimizations/actionsDAGUtils.h>
#include <Processors/QueryPlan/ReadFromMergeTree.h>
#include <Processors/QueryPlan/SortingStep.h>
#include <Storages/StorageMemory.h>
#include <Processors/QueryPlan/ReadFromMemoryStorageStep.h>
#include <Core/Settings.h>
#include <Interpreters/IJoin.h>
#include <Interpreters/HashJoin/HashJoin.h>
#include <Interpreters/TableJoin.h>
#include <Common/logger_useful.h>
#include <Core/Joins.h>
#include <ranges>
namespace DB::QueryPlanOptimizations
{
static std::optional<UInt64> estimateReadRowsCount(QueryPlan::Node & node)
{
IQueryPlanStep * step = node.step.get();
if (const auto * reading = typeid_cast<const ReadFromMergeTree *>(step))
{
if (auto analyzed_result = reading->getAnalyzedResult())
return analyzed_result->selected_rows;
if (auto analyzed_result = reading->selectRangesToRead())
return analyzed_result->selected_rows;
return {};
}
if (const auto * reading = typeid_cast<const ReadFromMemoryStorageStep *>(step))
return reading->getStorage()->totalRows(Settings{});
if (node.children.size() != 1)
return {};
if (typeid_cast<ExpressionStep *>(step) || typeid_cast<FilterStep *>(step))
return estimateReadRowsCount(*node.children.front());
return {};
}
void optimizeJoin(QueryPlan::Node & node, QueryPlan::Nodes &)
{
auto * join_step = typeid_cast<JoinStep *>(node.step.get());
if (!join_step || node.children.size() != 2)
return;
const auto & join = join_step->getJoin();
if (join->pipelineType() != JoinPipelineType::FillRightFirst || !join->isCloneSupported())
return;
const auto & table_join = join->getTableJoin();
/// Algorithms other than HashJoin may not support all JOIN kinds, so changing from LEFT to RIGHT is not always possible
bool allow_outer_join = typeid_cast<const HashJoin *>(join.get());
if (table_join.kind() != JoinKind::Inner && !allow_outer_join)
return;
/// fixme: USING clause handled specially in join algorithm, so swap breaks it
/// fixme: Swapping for SEMI and ANTI joins should be alright, need to try to enable it and test
if (table_join.hasUsing() || table_join.strictness() != JoinStrictness::All)
return;
bool need_swap = false;
if (!join_step->swap_join_tables.has_value())
{
auto lhs_extimation = estimateReadRowsCount(*node.children[0]);
auto rhs_extimation = estimateReadRowsCount(*node.children[1]);
LOG_TRACE(getLogger("optimizeJoin"), "Left table estimation: {}, right table estimation: {}",
lhs_extimation.transform(toString<UInt64>).value_or("unknown"),
rhs_extimation.transform(toString<UInt64>).value_or("unknown"));
if (lhs_extimation && rhs_extimation && *lhs_extimation < *rhs_extimation)
need_swap = true;
}
else if (join_step->swap_join_tables.value())
{
need_swap = true;
}
if (!need_swap)
return;
const auto & headers = join_step->getInputHeaders();
if (headers.size() != 2)
return;
const auto & left_stream_input_header = headers.front();
const auto & right_stream_input_header = headers.back();
auto updated_table_join = std::make_shared<TableJoin>(table_join);
updated_table_join->swapSides();
auto updated_join = join->clone(updated_table_join, right_stream_input_header, left_stream_input_header);
join_step->setJoin(std::move(updated_join), /* swap_streams= */ true);
}
}
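
The swap decision in the removed `optimizeJoin` above can be read in isolation as follows. This is a minimal sketch, not ClickHouse code: `shouldSwapJoinSides` is a hypothetical helper name, and the plain `std::optional` parameters stand in for `join_step->swap_join_tables` and the two results of `estimateReadRowsCount`. An explicit override wins. Otherwise the sides are swapped only when both row estimates are known and the left input is smaller, so that the smaller input ends up on the right side, which a `FillRightFirst` join fills first.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical standalone helper (not part of ClickHouse) mirroring the
// need_swap logic of the removed optimizeJoin.
bool shouldSwapJoinSides(
    std::optional<bool> swap_override,       // stands in for join_step->swap_join_tables
    std::optional<std::uint64_t> lhs_rows,   // estimate for the left input, if known
    std::optional<std::uint64_t> rhs_rows)   // estimate for the right input, if known
{
    // An explicit setting takes precedence over any estimate.
    if (swap_override.has_value())
        return *swap_override;

    // Without an override, swap only when both estimates are available
    // and the left input is strictly smaller than the right one.
    return lhs_rows.has_value() && rhs_rows.has_value() && *lhs_rows < *rhs_rows;
}
```

For example, `shouldSwapJoinSides(std::nullopt, 10, 1000)` asks for a swap, while an unknown estimate on either side leaves the join order unchanged.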

@ -227,9 +227,6 @@ void addStepsToBuildSets(QueryPlan & plan, QueryPlan::Node & root, QueryPlan::No
/// NOTE: frame cannot be safely used after stack was modified.
auto & frame = stack.back();
if (frame.next_child == 0)
optimizeJoin(*frame.node, nodes);
/// Traverse all children first.
if (frame.next_child < frame.node->children.size())
{

@ -35,8 +35,6 @@ public:
void initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) override;
const StoragePtr & getStorage() const { return storage; }
private:
static constexpr auto name = "ReadFromMemoryStorage";

@ -1,49 +0,0 @@
#include <Processors/Transforms/ColumnPermuteTransform.h>
namespace DB
{
namespace
{
template <typename T>
void applyPermutation(std::vector<T> & data, const std::vector<size_t> & permutation)
{
std::vector<T> res;
res.reserve(permutation.size());
for (size_t i : permutation)
res.push_back(data[i]);
data = std::move(res);
}
void permuteChunk(Chunk & chunk, const std::vector<size_t> & permutation)
{
size_t num_rows = chunk.getNumRows();
auto columns = chunk.detachColumns();
applyPermutation(columns, permutation);
chunk.setColumns(std::move(columns), num_rows);
}
}
Block ColumnPermuteTransform::permute(const Block & block, const std::vector<size_t> & permutation)
{
auto columns = block.getColumnsWithTypeAndName();
applyPermutation(columns, permutation);
return Block(columns);
}
ColumnPermuteTransform::ColumnPermuteTransform(const Block & header_, const std::vector<size_t> & permutation_)
: ISimpleTransform(header_, permute(header_, permutation_), false)
, permutation(permutation_)
{
}
void ColumnPermuteTransform::transform(Chunk & chunk)
{
permuteChunk(chunk, permutation);
}
}

@ -1,30 +0,0 @@
#pragma once
#include <atomic>
#include <mutex>
#include <vector>
#include <Processors/ISimpleTransform.h>
#include <Poco/Logger.h>
#include <Interpreters/Set.h>
namespace DB
{
class ColumnPermuteTransform : public ISimpleTransform
{
public:
ColumnPermuteTransform(const Block & header_, const std::vector<size_t> & permutation_);
String getName() const override { return "ColumnPermuteTransform"; }
void transform(Chunk & chunk) override;
static Block permute(const Block & block, const std::vector<size_t> & permutation);
private:
Names column_names;
std::vector<size_t> permutation;
};
}

@ -19,7 +19,6 @@ Block JoiningTransform::transformHeader(Block header, const JoinPtr & join)
join->initialize(header);
ExtraBlockPtr tmp;
join->joinBlock(header, tmp);
materializeBlockInplace(header);
LOG_TEST(getLogger("JoiningTransform"), "After join block: '{}'", header.dumpStructure());
return header;
}

@ -101,7 +101,6 @@ IMergingAlgorithm::Status PasteJoinAlgorithm::merge()
return Status(0);
if (last_used_row[1] >= chunks[1].getNumRows())
return Status(1);
/// We have unused rows from both inputs
size_t result_num_rows = std::min(chunks[0].getNumRows() - last_used_row[0], chunks[1].getNumRows() - last_used_row[1]);
@ -111,7 +110,6 @@ IMergingAlgorithm::Status PasteJoinAlgorithm::merge()
result.addColumn(col->cut(last_used_row[source_num], result_num_rows));
last_used_row[0] += result_num_rows;
last_used_row[1] += result_num_rows;
return Status(std::move(result));
}

@ -74,6 +74,7 @@ void StorageObjectStorageSink::finalizeBuffers()
catch (...)
{
/// Stop ParallelFormattingOutputFormat correctly.
cancelBuffers();
releaseBuffers();
throw;
}

Some files were not shown because too many files have changed in this diff.