Merge remote-tracking branch 'origin/master' into store-analysis-result

Igor Nikonov 2024-05-31 19:41:34 +00:00
commit 0c0f411042
18 changed files with 1182 additions and 148 deletions

View File

@@ -42,40 +42,27 @@ At a minimum, the following information should be added (but add more as needed)
> Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/
<details>
<summary>CI Settings</summary>
**NOTE:** If you merge the PR with modified CI you **MUST KNOW** what you are doing
**NOTE:** Checked options will be applied if set before the CI RunConfig/PrepareRunConfig step
- [ ] <!---ci_include_integration--> Allow: Integration Tests
#### CI Settings (Only check the boxes if you know what you are doing):
- [ ] <!---ci_set_required--> Allow: All Required Checks
- [ ] <!---ci_include_stateless--> Allow: Stateless tests
- [ ] <!---ci_include_stateful--> Allow: Stateful tests
- [ ] <!---ci_include_unit--> Allow: Unit tests
- [ ] <!---ci_include_integration--> Allow: Integration Tests
- [ ] <!---ci_include_performance--> Allow: Performance tests
- [ ] <!---ci_include_aarch64--> Allow: All with aarch64
- [ ] <!---ci_include_asan--> Allow: All with ASAN
- [ ] <!---ci_include_tsan--> Allow: All with TSAN
- [ ] <!---ci_include_analyzer--> Allow: All with Analyzer
- [ ] <!---ci_include_azure --> Allow: All with Azure
- [ ] <!---ci_include_KEYWORD--> Allow: Add your option here
- [ ] <!---ci_set_non_required--> Allow: All NOT Required Checks
- [ ] <!---batch_0_1--> Allow: batch 1, 2 for multi-batch jobs
- [ ] <!---batch_2_3--> Allow: batch 3, 4, 5, 6 for multi-batch jobs
---
- [ ] <!---ci_exclude_style--> Exclude: Style check
- [ ] <!---ci_exclude_fast--> Exclude: Fast test
- [ ] <!---ci_exclude_integration--> Exclude: Integration Tests
- [ ] <!---ci_exclude_stateless--> Exclude: Stateless tests
- [ ] <!---ci_exclude_stateful--> Exclude: Stateful tests
- [ ] <!---ci_exclude_performance--> Exclude: Performance tests
- [ ] <!---ci_exclude_asan--> Exclude: All with ASAN
- [ ] <!---ci_exclude_tsan--> Exclude: All with TSAN
- [ ] <!---ci_exclude_msan--> Exclude: All with MSAN
- [ ] <!---ci_exclude_ubsan--> Exclude: All with UBSAN
- [ ] <!---ci_exclude_coverage--> Exclude: All with Coverage
- [ ] <!---ci_exclude_aarch64--> Exclude: All with Aarch64
- [ ] <!---ci_exclude_tsan|msan|ubsan|coverage--> Exclude: All with TSAN, MSAN, UBSAN, Coverage
---
- [ ] <!---do_not_test--> do not test (only style check)
- [ ] <!---upload_all--> upload all binary artifacts from build jobs
- [ ] <!---no_merge_commit--> disable merge-commit (no merge from master before tests)
- [ ] <!---no_ci_cache--> disable CI cache (job reuse)
- [ ] <!---batch_0_1--> allow: batch 1, 2 for multi-batch jobs
- [ ] <!---batch_2_3--> allow: batch 3, 4
- [ ] <!---batch_4_5--> allow: batch 5, 6
</details>
- [ ] <!---do_not_test--> Do not test
- [ ] <!---upload_all--> Upload binaries for special builds
- [ ] <!---no_merge_commit--> Disable merge-commit
- [ ] <!---no_ci_cache--> Disable CI cache

View File

@@ -80,11 +80,27 @@ jobs:
      run_command: |
        python3 fast_test_check.py
  Builds_1:
    needs: [RunConfig, BuildDockers]
    if: ${{ !failure() && !cancelled() && contains(fromJson(needs.RunConfig.outputs.data).stages_data.stages_to_do, 'Builds_1') }}
    # using callable wf (reusable_stage.yml) allows grouping all nested jobs under a tab
    uses: ./.github/workflows/reusable_build_stage.yml
    with:
      stage: Builds_1
      data: ${{ needs.RunConfig.outputs.data }}
  Tests_1:
    needs: [RunConfig, Builds_1]
    if: ${{ !failure() && !cancelled() && contains(fromJson(needs.RunConfig.outputs.data).stages_data.stages_to_do, 'Tests_1') }}
    uses: ./.github/workflows/reusable_test_stage.yml
    with:
      stage: Tests_1
      data: ${{ needs.RunConfig.outputs.data }}
################################# Stage Final #################################
#
  FinishCheck:
    if: ${{ !failure() && !cancelled() }}
    needs: [RunConfig, BuildDockers, StyleCheck, FastTest]
    needs: [RunConfig, BuildDockers, StyleCheck, FastTest, Builds_1, Tests_1]
    runs-on: [self-hosted, style-checker-aarch64]
    steps:
      - name: Check out repository code

View File

@@ -14,7 +14,6 @@
* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before the upgrade and re-create them after the upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_functions = 1`, or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Queries from `system.columns` will work faster if there is a large number of columns, but many databases or tables lack the `SHOW TABLES` grant. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, the `system.columns` table would show these columns; in the new version, it skips the table entirely. Also removes the trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Setting `replace_long_file_name_to_hash` is enabled by default for `MergeTree` tables. [#64457](https://github.com/ClickHouse/ClickHouse/pull/64457) ([Anton Popov](https://github.com/CurtizJ)). The data written with this setting can be read by server versions since 23.9. After you use ClickHouse with this setting enabled, you cannot downgrade to versions 23.8 and earlier.
#### New Feature
* Adds the `Form` format to read/write a single record in the `application/x-www-form-urlencoded` format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)).
@@ -29,7 +28,6 @@
* Support for conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)).
* Add `NPy` output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)).
* `Raw` format as a synonym for `TSVRaw`. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)).
* Added a new SQL function `generateSnowflakeID` for generating Twitter-style Snowflake IDs. [#63577](https://github.com/ClickHouse/ClickHouse/pull/63577) ([Danila Puzov](https://github.com/kazalika)).
* Added a new SQL function `generateUUIDv7` to generate version 7 UUIDs, i.e. timestamp-based UUIDs with a random component. Also added a new function `UUIDToNum` to extract bytes from a UUID, and a new function `UUIDv7ToDateTime` to extract the timestamp component from a version 7 UUID. [#62852](https://github.com/ClickHouse/ClickHouse/pull/62852) ([Alexey Petrunyaka](https://github.com/pet74alex)).
* On Linux and MacOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)).
* Change warning on high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
@@ -43,26 +41,20 @@
* Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_traked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### Performance Improvement
* Added a native Parquet reader, which can read Parquet binary data directly into ClickHouse columns. The feature can be activated by setting `input_format_parquet_use_native_reader` to true. [#60361](https://github.com/ClickHouse/ClickHouse/pull/60361) ([ZhiHong Zhang](https://github.com/copperybean)).
* Less contention in the filesystem cache (part 4). The cache can now be kept below its size limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This releases pressure from space reservation for queries (in the `tryReserve` method). It is also done in a lock-free way as much as possible, i.e. it should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)).
* Process `...UTF8` string functions through an ASCII fast path when the input strings are all-ASCII. Inspired by https://github.com/apache/doris/pull/29799. Overall speedup of 1.07x–1.62x. Note that peak memory usage decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)).
* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)).
* `HostResolver` no longer stores each IP address several times. Previously, if a remote host had several IPs and, for some reason (firewall rules, for example), access was allowed on some IPs and forbidden on others, only the first record of a forbidden IP was marked as failed, so on each retry the forbidden IPs had a chance to be chosen (and fail) again. Moreover, the DNS cache was dropped every 120 seconds, after which those IPs could be chosen once more. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)).
* Function `splitByRegexp` is now faster when the regular expression argument is a single-character, trivial regular expression (in this case, it now falls back internally to `splitByChar`). [#62696](https://github.com/ClickHouse/ClickHouse/pull/62696) ([Robert Schulze](https://github.com/rschu1ze)).
* Aggregation with 8-bit and 16-bit keys became faster: added min/max in FixedHashTable to limit the array index and reduce the `isZero()` calls during iteration. [#62746](https://github.com/ClickHouse/ClickHouse/pull/62746) ([Jiebin Sun](https://github.com/jiebinn)).
* Add a new configuration `prefer_merge_sort_block_bytes` to control memory usage and speed up sorting by about 2x during merges when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)).
* `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)).
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Optimized vertical merges in tables with sparse columns. [#64311](https://github.com/ClickHouse/ClickHouse/pull/64311) ([Anton Popov](https://github.com/CurtizJ)).
* Improve filtering of sparse columns: reduce redundant calls of `ColumnSparse::filter` to improve performance. [#64426](https://github.com/ClickHouse/ClickHouse/pull/64426) ([Jiebin Sun](https://github.com/jiebinn)).
* Remove copying data when writing to the filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Now backups with azure blob storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)).
* Allow to use native copy for azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)).
* Finally enable native copy for azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)).
* Improve the iteration over sparse columns to reduce the number of `size` calls. [#64497](https://github.com/ClickHouse/ClickHouse/pull/64497) ([Jiebin Sun](https://github.com/jiebinn)).
#### Improvement
* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@@ -92,14 +84,8 @@
* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)).
* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added knob `metadata_storage_type` to keep free space on metadata storage disk. [#64128](https://github.com/ClickHouse/ClickHouse/pull/64128) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Metrics to track the number of directories created and removed by the `plain_rewritable` metadata storage, and the number of entries in the local-to-remote in-memory map. [#64175](https://github.com/ClickHouse/ClickHouse/pull/64175) ([Julia Kartseva](https://github.com/jkartseva)).
* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)).
* The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. `limit` or `additional_table_filters`) would affect the query result. [#64205](https://github.com/ClickHouse/ClickHouse/pull/64205) ([Robert Schulze](https://github.com/rschu1ze)).
* Test that the non-standard error code `QPSLimitExceeded` is supported and is a retryable error. [#64225](https://github.com/ClickHouse/ClickHouse/pull/64225) ([Sema Checherinda](https://github.com/CheSema)).
* Settings from the user config no longer affect merges and mutations for MergeTree on top of object storage. [#64456](https://github.com/ClickHouse/ClickHouse/pull/64456) ([alesapin](https://github.com/alesapin)).
* Test that `totalqpslimitexceeded` is a retryable S3 error. [#64520](https://github.com/ClickHouse/ClickHouse/pull/64520) ([Sema Checherinda](https://github.com/CheSema)).
#### Build/Testing/Packaging Improvement
* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@@ -162,7 +148,6 @@
* Fix analyzer: there's turtles all the way down... [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Allow certain ALTER TABLE commands for `plain_rewritable` disk [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)).
* Recursive CTE distributed fix [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix reading of columns of type `Tuple(Map(LowCardinality(...)))` [#63956](https://github.com/ClickHouse/ClickHouse/pull/63956) ([Anton Popov](https://github.com/CurtizJ)).
* Analyzer: Fix COLUMNS resolve [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
* LIMIT BY and skip_unused_shards with analyzer [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* A fix for some trash (experimental Kusto) [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)).
@@ -176,8 +161,6 @@
* Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
* Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
* Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix ARRAY JOIN with Distributed. [#64226](https://github.com/ClickHouse/ClickHouse/pull/64226) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix: CNF with mutually exclusive atoms reduction [#64256](https://github.com/ClickHouse/ClickHouse/pull/64256) ([Eduard Karacharov](https://github.com/korowa)).
* Fix Logical error: Bad cast for Buffer table with prewhere. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
@@ -680,7 +663,7 @@
* Improve the operation of `sumMapFiltered` with NaN values. NaN values are now placed at the end (instead of randomly) and considered different from any values. `-0` is now also treated as equal to `0`; since 0 values are discarded, `-0` values are discarded too. [#58959](https://github.com/ClickHouse/ClickHouse/pull/58959) ([Raúl Marín](https://github.com/Algunenano)).
* The function `visibleWidth` will behave according to the docs. In previous versions, it simply counted code points after string serialization, like the `lengthUTF8` function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set `function_visible_width_behavior` to `0`, or set `compatibility` to `23.12` or lower. [#59022](https://github.com/ClickHouse/ClickHouse/pull/59022) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The `Kusto` dialect is disabled until these two bugs are fixed: [#59037](https://github.com/ClickHouse/ClickHouse/issues/59037) and [#59036](https://github.com/ClickHouse/ClickHouse/issues/59036). [#59305](https://github.com/ClickHouse/ClickHouse/pull/59305) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Any attempt to use `Kusto` will result in an exception.
* More efficient implementation of the `FINAL` modifier no longer guarantees preserving the order even if `max_threads = 1`. If you counted on the previous behavior, set `enable_vertical_final` to 0 or `compatibility` to `23.12`.
#### New Feature
* Implement Variant data type that represents a union of other data types. Type `Variant(T1, T2, ..., TN)` means that each row of this type has a value of either type `T1` or `T2` or ... or `TN` or none of them (`NULL` value). Variant type is available under a setting `allow_experimental_variant_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#58047](https://github.com/ClickHouse/ClickHouse/pull/58047) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@@ -715,7 +715,7 @@
    <!-- Enables logic that users without permissive row policies can still read rows using a SELECT query.
         For example, if there are two users A and B, and a row policy is defined only for A, then
         if this setting is true, user B will see all rows, and if this setting is false, user B will see no rows.
         By default this setting is false for compatibility with earlier access configurations. -->
         By default this setting is true. -->
    <users_without_row_policies_can_read_rows>true</users_without_row_policies_can_read_rows>
    <!-- By default, for backward compatibility ON CLUSTER queries ignore CLUSTER grant,

View File

@@ -36,6 +36,10 @@ public:
        if (!function_node || !function_node->isAggregateFunction())
            return;

        auto lower_name = Poco::toLower(function_node->getFunctionName());
        if (lower_name.ends_with("if"))
            return;

        auto & function_arguments_nodes = function_node->getArguments().getNodes();
        if (function_arguments_nodes.size() != 1)
            return;
@@ -44,7 +48,6 @@ public:
        if (!if_node || if_node->getFunctionName() != "if")
            return;

        auto lower_name = Poco::toLower(function_node->getFunctionName());
        auto if_arguments_nodes = if_node->getArguments().getNodes();
        auto * first_const_node = if_arguments_nodes[1]->as<ConstantNode>();
        auto * second_const_node = if_arguments_nodes[2]->as<ConstantNode>();

View File

@@ -412,7 +412,21 @@ void validateTreeSize(const QueryTreeNodePtr & node,
        if (processed_children)
        {
            ++tree_size;
            node_to_tree_size.emplace(node_to_process, tree_size);
            size_t subtree_size = 1;

            for (const auto & node_to_process_child : node_to_process->getChildren())
            {
                if (!node_to_process_child)
                    continue;

                subtree_size += node_to_tree_size[node_to_process_child];
            }

            auto * constant_node = node_to_process->as<ConstantNode>();
            if (constant_node && constant_node->hasSourceExpression())
                subtree_size += node_to_tree_size[constant_node->getSourceExpression()];

            node_to_tree_size.emplace(node_to_process, subtree_size);
            continue;
        }
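
As an aside for intuition: the fix above replaces a single shared `tree_size` counter with a per-node subtree size. A minimal standalone Python sketch of the same idea (not ClickHouse code; the node shape is assumed), using an iterative post-order walk so each child is sized before its parent:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    children: tuple = ()


def subtree_sizes(root: Node) -> dict:
    """Iterative post-order traversal: size(node) = 1 + sum(size(child))."""
    sizes = {}
    stack = [(root, False)]
    while stack:
        node, processed_children = stack.pop()
        if processed_children:
            # every child already has an entry in `sizes` at this point
            sizes[node] = 1 + sum(sizes[child] for child in node.children if child)
            continue
        stack.append((node, True))
        for child in node.children:
            if child:
                stack.append((child, False))
    return sizes


root = Node("if", (Node("cond"), Node("then"), Node("else")))
assert subtree_sizes(root)[root] == 4
```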

View File

@@ -80,8 +80,8 @@ void BackupReaderAzureBlobStorage::copyFileToDisk(const String & path_in_backup,
    DiskPtr destination_disk, const String & destination_path, WriteMode write_mode)
{
    auto destination_data_source_description = destination_disk->getDataSourceDescription();
    LOG_TRACE(log, "Source description {}, desctionation description {}", data_source_description.description, destination_data_source_description.description);
    if (destination_data_source_description.sameKind(data_source_description)
    LOG_TRACE(log, "Source description {}, destination description {}", data_source_description.description, destination_data_source_description.description);
    if (destination_data_source_description.object_storage_type == ObjectStorageType::Azure
        && destination_data_source_description.is_encrypted == encrypted_in_backup)
    {
        LOG_TRACE(log, "Copying {} from AzureBlobStorage to disk {}", path_in_backup, destination_disk->getName());
@@ -153,8 +153,8 @@ void BackupWriterAzureBlobStorage::copyFileFromDisk(
{
    /// Use the native copy as a more optimal way to copy a file from AzureBlobStorage to AzureBlobStorage if it's possible.
    auto source_data_source_description = src_disk->getDataSourceDescription();
    LOG_TRACE(log, "Source description {}, desctionation description {}", source_data_source_description.description, data_source_description.description);
    if (source_data_source_description.sameKind(data_source_description)
    LOG_TRACE(log, "Source description {}, destination description {}", source_data_source_description.description, data_source_description.description);
    if (source_data_source_description.object_storage_type == ObjectStorageType::Azure
        && source_data_source_description.is_encrypted == copy_encrypted)
    {
        /// getBlobPath() can return more than 3 elements if the file is stored as multiple objects in AzureBlobStorage container.

View File

@@ -1,5 +1,5 @@
#!/usr/bin/env python3
import json
import logging
import os
import sys
@@ -13,6 +13,8 @@ from env_helper import (
    GITHUB_SERVER_URL,
    REPORT_PATH,
    TEMP_PATH,
    CI_CONFIG_PATH,
    CI,
)
from pr_info import PRInfo
from report import (
@@ -53,6 +55,18 @@ def main():
        release=pr_info.is_release,
        backport=pr_info.head_ref.startswith("backport/"),
    )

    if CI:
        # In CI, only specific builds might be manually selected, or some workflows do not run all builds.
        # Filter @builds_for_check to verify only builds that are present in the current CI workflow.
        with open(CI_CONFIG_PATH, encoding="utf-8") as jfd:
            ci_config = json.load(jfd)
        all_ci_jobs = (
            ci_config["jobs_data"]["jobs_to_skip"]
            + ci_config["jobs_data"]["jobs_to_do"]
        )
        builds_for_check = [job for job in builds_for_check if job in all_ci_jobs]
        print(f"NOTE: following build reports will be accounted: [{builds_for_check}]")

    required_builds = len(builds_for_check)
    missing_builds = 0
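
For illustration, the filtering above boils down to the following (a hypothetical `ci_config.json` payload, shaped like the keys the snippet reads):

```python
import json

# hypothetical content, matching the shape written by CiCache.dump_run_config
ci_config = json.loads(
    '{"jobs_data": {"jobs_to_skip": ["binary_release"], "jobs_to_do": ["package_asan"]}}'
)
builds_for_check = ["package_asan", "package_msan", "binary_release"]
all_ci_jobs = (
    ci_config["jobs_data"]["jobs_to_skip"] + ci_config["jobs_data"]["jobs_to_do"]
)
# drop reports for builds that are not part of the current workflow
builds_for_check = [job for job in builds_for_check if job in all_ci_jobs]
print(builds_for_check)  # ['package_asan', 'binary_release']
```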

View File

@@ -47,6 +47,7 @@ from env_helper import (
    REPORT_PATH,
    S3_BUILDS_BUCKET,
    TEMP_PATH,
    CI_CONFIG_PATH,
)
from get_robot_token import get_best_robot_token
from git_helper import GIT_PREFIX, Git
@@ -275,6 +276,13 @@ class CiCache:
        )
        return self

    @staticmethod
    def dump_run_config(indata: Dict[str, Any]) -> None:
        assert indata
        assert CI_CONFIG_PATH
        with open(CI_CONFIG_PATH, "w", encoding="utf-8") as json_file:
            json.dump(indata, json_file, indent=2)

    def update(self):
        """
        Pulls cache records from s3. Only records name w/o content.
@@ -784,7 +792,9 @@ class CiOptions:
            f"{GIT_PREFIX} log {pr_info.sha} --format=%B -n 1"
        )
        pattern = r"(#|- \[x\] +<!---)(\w+)"
        # CI setting example we need to match with re:
        # - [x] <!---ci_exclude_tsan|msan|ubsan|coverage--> Exclude: All with TSAN, MSAN, UBSAN, Coverage
        pattern = r"(#|- \[x\] +<!---)([|\w]+)"
        matches = [match[-1] for match in re.findall(pattern, message)]
        print(f"CI tags from commit message: [{matches}]")
@@ -818,9 +828,10 @@ class CiOptions:
            elif match.startswith("ci_exclude_"):
                if not res.exclude_keywords:
                    res.exclude_keywords = []
                res.exclude_keywords.append(
                    normalize_check_name(match.removeprefix("ci_exclude_"))
                )
                keywords = match.removeprefix("ci_exclude_").split("|")
                res.exclude_keywords += [
                    normalize_check_name(keyword) for keyword in keywords
                ]
            elif match == CILabels.NO_CI_CACHE:
                res.no_ci_cache = True
                print("NOTE: CI Cache will be disabled")
@@ -903,7 +914,6 @@ class CiOptions:
        # Style check must not be omitted
        jobs_to_do_requested.append(JobNames.STYLE_CHECK)
        # FIXME: to be removed in favor of include/exclude
        # 1. Handle "ci_set_" tags if any
        if self.ci_sets:
            for tag in self.ci_sets:
@@ -912,7 +922,12 @@ class CiOptions:
                print(
                    f"NOTE: CI Set's tag: [{tag}], add jobs: [{label_config.run_jobs}]"
                )
                jobs_to_do_requested += label_config.run_jobs
                # match against @jobs_to_do and @jobs_to_skip to remove non-relevant entries from @label_config.run_jobs
                jobs_to_do_requested += [
                    job
                    for job in label_config.run_jobs
                    if job in jobs_to_do or job in jobs_to_skip
                ]

        # FIXME: to be removed in favor of include/exclude
        # 2. Handle "job_" tags if any
@@ -1200,11 +1215,16 @@ def _pre_action(s3, indata, pr_info):
    ci_cache = CiCache(s3, indata["jobs_data"]["digests"])

    # for release/master branches reports must be from the same branch
    report_prefix = normalize_string(pr_info.head_ref) if pr_info.number == 0 else ""
    report_prefix = ""
    if pr_info.is_master or pr_info.is_release:
        report_prefix = normalize_string(pr_info.head_ref)
    print(
        f"Use report prefix [{report_prefix}], pr_num [{pr_info.number}], head_ref [{pr_info.head_ref}]"
    )
    reports_files = ci_cache.download_build_reports(file_prefix=report_prefix)

    ci_cache.dump_run_config(indata)

    print(f"Pre action done. Report files [{reports_files}] have been downloaded")
@@ -1356,7 +1376,12 @@ def _configure_jobs(
    # FIXME: find better place for these config variables
    DOCS_CHECK_JOBS = [JobNames.DOCS_CHECK, JobNames.STYLE_CHECK]
    MQ_JOBS = [JobNames.STYLE_CHECK, JobNames.FAST_TEST]
    MQ_JOBS = [
        JobNames.STYLE_CHECK,
        JobNames.FAST_TEST,
        Build.BINARY_RELEASE,
        JobNames.UNIT_TEST,
    ]
    # Must always calculate digest for these jobs for CI Cache to function (they define s3 paths where records are stored)
    REQUIRED_DIGESTS = [JobNames.DOCS_CHECK, Build.PACKAGE_RELEASE]
    if pr_info.has_changes_in_documentation_only():
@@ -1373,6 +1398,9 @@ def _configure_jobs(
        ):
            # We still need digest for JobNames.DOCS_CHECK since CiCache depends on it (FIXME)
            continue
        if pr_info.is_master and job in MQ_JOBS:
            # On master - skip jobs that run in MQ
            continue
        if (
            pr_info.has_changes_in_documentation_only()
            and job not in DOCS_CHECK_JOBS
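
The net effect of the new `MQ_JOBS` handling can be sketched in isolation (job names are illustrative, not the real constants):

```python
MQ_JOBS = ["Style check", "Fast test", "binary_release", "Unit tests"]

def jobs_to_run(all_jobs, is_master):
    # on master, skip jobs that already ran in the merge queue
    return [job for job in all_jobs if not (is_master and job in MQ_JOBS)]

print(jobs_to_run(["Style check", "Stateless tests (release)"], is_master=True))
# ['Stateless tests (release)']
```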

View File

@@ -50,14 +50,10 @@ class CILabels(metaclass=WithIter):
    # to upload all binaries from build jobs
    UPLOAD_ALL_ARTIFACTS = "upload_all"
    CI_SET_REDUCED = "ci_set_reduced"
    CI_SET_FAST = "ci_set_fast"
    CI_SET_ARM = "ci_set_arm"
    CI_SET_INTEGRATION = "ci_set_integration"
    CI_SET_REQUIRED = "ci_set_required"
    CI_SET_NON_REQUIRED = "ci_set_non_required"
    CI_SET_OLD_ANALYZER = "ci_set_old_analyzer"
    CI_SET_STATELESS = "ci_set_stateless"
    CI_SET_STATEFUL = "ci_set_stateful"
    CI_SET_STATELESS_ASAN = "ci_set_stateless_asan"
    CI_SET_STATEFUL_ASAN = "ci_set_stateful_asan"

    libFuzzer = "libFuzzer"
@@ -833,15 +829,34 @@ class CIConfig:
        raise KeyError("config contains errors", errors)

# checks required by Mergeable Check
REQUIRED_CHECKS = [
    "PR Check",
    StatusNames.SYNC,
    JobNames.BUILD_CHECK,
    JobNames.BUILD_CHECK_SPECIAL,
    JobNames.DOCS_CHECK,
    JobNames.FAST_TEST,
    JobNames.STATEFUL_TEST_RELEASE,
    JobNames.STATELESS_TEST_RELEASE,
    JobNames.STATELESS_TEST_ASAN,
    JobNames.STATELESS_TEST_FLAKY_ASAN,
    JobNames.STATEFUL_TEST_ASAN,
    JobNames.STYLE_CHECK,
    JobNames.UNIT_TEST_ASAN,
    JobNames.UNIT_TEST_MSAN,
    JobNames.UNIT_TEST,
    JobNames.UNIT_TEST_TSAN,
    JobNames.UNIT_TEST_UBSAN,
    JobNames.INTEGRATION_TEST_ASAN_OLD_ANALYZER,
    JobNames.STATELESS_TEST_OLD_ANALYZER_S3_REPLICATED_RELEASE,
]

BATCH_REGEXP = re.compile(r"\s+\[[0-9/]+\]$")

CI_CONFIG = CIConfig(
    label_configs={
        CILabels.DO_NOT_TEST_LABEL: LabelConfig(run_jobs=[JobNames.STYLE_CHECK]),
        CILabels.CI_SET_FAST: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
                JobNames.FAST_TEST,
            ]
        ),
        CILabels.CI_SET_ARM: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
@@ -849,12 +864,9 @@ CI_CONFIG = CIConfig(
                JobNames.INTEGRATION_TEST_ARM,
            ]
        ),
        CILabels.CI_SET_INTEGRATION: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
                Build.PACKAGE_RELEASE,
                JobNames.INTEGRATION_TEST,
            ]
        CILabels.CI_SET_REQUIRED: LabelConfig(run_jobs=REQUIRED_CHECKS),
        CILabels.CI_SET_NON_REQUIRED: LabelConfig(
            run_jobs=[job for job in JobNames if job not in REQUIRED_CHECKS]
        ),
        CILabels.CI_SET_OLD_ANALYZER: LabelConfig(
            run_jobs=[
@@ -866,38 +878,6 @@ CI_CONFIG = CIConfig(
                JobNames.INTEGRATION_TEST_ASAN_OLD_ANALYZER,
            ]
        ),
        CILabels.CI_SET_STATELESS: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
                JobNames.FAST_TEST,
                Build.PACKAGE_RELEASE,
                JobNames.STATELESS_TEST_RELEASE,
            ]
        ),
        CILabels.CI_SET_STATELESS_ASAN: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
                JobNames.FAST_TEST,
                Build.PACKAGE_ASAN,
                JobNames.STATELESS_TEST_ASAN,
            ]
        ),
        CILabels.CI_SET_STATEFUL: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
                JobNames.FAST_TEST,
                Build.PACKAGE_RELEASE,
                JobNames.STATEFUL_TEST_RELEASE,
            ]
        ),
        CILabels.CI_SET_STATEFUL_ASAN: LabelConfig(
            run_jobs=[
                JobNames.STYLE_CHECK,
                JobNames.FAST_TEST,
                Build.PACKAGE_ASAN,
                JobNames.STATEFUL_TEST_ASAN,
            ]
        ),
        CILabels.CI_SET_REDUCED: LabelConfig(
            run_jobs=[
                job
@@ -1380,32 +1360,6 @@
CI_CONFIG.validate()

# checks required by Mergeable Check
REQUIRED_CHECKS = [
    "PR Check",
    StatusNames.SYNC,
    JobNames.BUILD_CHECK,
    JobNames.BUILD_CHECK_SPECIAL,
    JobNames.DOCS_CHECK,
    JobNames.FAST_TEST,
    JobNames.STATEFUL_TEST_RELEASE,
    JobNames.STATELESS_TEST_RELEASE,
    JobNames.STATELESS_TEST_ASAN,
    JobNames.STATELESS_TEST_FLAKY_ASAN,
    JobNames.STATEFUL_TEST_ASAN,
    JobNames.STYLE_CHECK,
    JobNames.UNIT_TEST_ASAN,
    JobNames.UNIT_TEST_MSAN,
    JobNames.UNIT_TEST,
    JobNames.UNIT_TEST_TSAN,
    JobNames.UNIT_TEST_UBSAN,
    JobNames.INTEGRATION_TEST_ASAN_OLD_ANALYZER,
    JobNames.STATELESS_TEST_OLD_ANALYZER_S3_REPLICATED_RELEASE,
]

BATCH_REGEXP = re.compile(r"\s+\[[0-9/]+\]$")

def is_required(check_name: str) -> bool:
    """Checks if a check_name is in REQUIRED_CHECKS, including batched jobs"""
    if check_name in REQUIRED_CHECKS:
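
The hunk ends mid-function; as a hedged sketch of how the batched-name fallback plausibly works with `BATCH_REGEXP` (the part after the shown `if` is an assumption, not visible in this diff):

```python
import re

BATCH_REGEXP = re.compile(r"\s+\[[0-9/]+\]$")
REQUIRED_CHECKS = ["Style check", "Fast test"]  # illustrative subset

def is_required(check_name: str) -> bool:
    """Checks if check_name is in REQUIRED_CHECKS, including batched jobs."""
    if check_name in REQUIRED_CHECKS:
        return True
    # assumed fallback: strip a trailing batch suffix such as ' [1/2]' and retry
    match = BATCH_REGEXP.search(check_name)
    if match:
        return check_name[: match.start()] in REQUIRED_CHECKS
    return False

assert is_required("Fast test [1/2]")
assert not is_required("Performance Comparison")
```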

View File

@@ -84,7 +84,7 @@ def get_commit(gh: Github, commit_sha: str, retry_count: int = RETRY) -> Commit:
def post_commit_status(
    commit: Commit,
    state: StatusType,
    state: Union[StatusType, str],
    report_url: Optional[str] = None,
    description: Optional[str] = None,
    check_name: Optional[str] = None,

View File

@@ -39,6 +39,7 @@ S3_ARTIFACT_DOWNLOAD_TEMPLATE = (
    f"{S3_DOWNLOAD}/{S3_BUILDS_BUCKET}/"
    "{pr_or_release}/{commit}/{build_name}/{artifact}"
)
CI_CONFIG_PATH = f"{TEMP_PATH}/ci_config.json"

# These parameters are set only on demand, and only once
_GITHUB_JOB_ID = ""

View File

@@ -15,7 +15,7 @@ from commit_status_helper import (
)
from get_robot_token import get_best_robot_token
from pr_info import PRInfo
from report import PENDING
from report import PENDING, SUCCESS, FAILURE
from synchronizer_utils import SYNC_BRANCH_PREFIX
from env_helper import GITHUB_REPOSITORY, GITHUB_UPSTREAM_REPOSITORY
@@ -59,16 +59,40 @@ def main():
        can_set_green_mergeable_status=True,
    )

    statuses = [s for s in statuses if s.context == StatusNames.CI]
    if not statuses:
    ci_running_statuses = [s for s in statuses if s.context == StatusNames.CI]
    if not ci_running_statuses:
        return
    # Take the latest status
    status = statuses[-1]
    if status.state == PENDING:
    ci_status = ci_running_statuses[-1]

    has_failure = False
    has_pending = False
    for status in statuses:
        if status.context in (StatusNames.MERGEABLE, StatusNames.CI):
            # do not account these statuses
            continue
        if status.state == PENDING:
            if status.context == StatusNames.SYNC:
                # do not account sync status if pending - it's a different WF
                continue
            has_pending = True
        elif status.state == SUCCESS:
            continue
        else:
            has_failure = True

    ci_state = SUCCESS
    if has_failure:
        ci_state = FAILURE
    elif has_pending:
        print("ERROR: CI must not have pending jobs by the time of finish check")
        ci_state = FAILURE

    if ci_status.state == PENDING:
        post_commit_status(
            commit,
            state,  # map Mergeable Check status to CI Running
            status.target_url,
            ci_state,
            ci_status.target_url,
            "All checks finished",
            StatusNames.CI,
            pr_info,
View File

@@ -8,7 +8,7 @@ from pr_info import PRInfo
_TEST_BODY_1 = """
#### Run only:
- [x] <!---ci_set_integration--> Integration tests
- [x] <!---ci_set_non_required--> Non required
- [ ] <!---ci_set_arm--> Integration tests (arm64)
- [x] <!---ci_include_foo--> Integration tests
- [x] <!---ci_include_foo_Bar--> Integration tests
@@ -33,7 +33,7 @@ _TEST_BODY_2 = """
- [x] <!---ci_include_azure--> MUST include azure
- [x] <!---ci_include_foo_Bar--> no action must be applied
- [ ] <!---ci_include_bar--> no action must be applied
- [x] <!---ci_exclude_tsan--> MUST exclude tsan
- [x] <!---ci_exclude_tsan|foobar--> MUST exclude tsan
- [x] <!---ci_exclude_aarch64--> MUST exclude aarch64
- [x] <!---ci_exclude_analyzer--> MUST exclude test with analazer
- [ ] <!---ci_exclude_bar--> no action applied
@@ -138,7 +138,7 @@ class TestCIOptions(unittest.TestCase):
        self.assertFalse(ci_options.do_not_test)
        self.assertFalse(ci_options.no_ci_cache)
        self.assertTrue(ci_options.no_merge_commit)
        self.assertEqual(ci_options.ci_sets, ["ci_set_integration"])
        self.assertEqual(ci_options.ci_sets, ["ci_set_non_required"])
        self.assertCountEqual(ci_options.include_keywords, ["foo", "foo_bar"])
        self.assertCountEqual(ci_options.exclude_keywords, ["foo", "foo_bar"])
@@ -153,7 +153,7 @@ class TestCIOptions(unittest.TestCase):
        )
        self.assertCountEqual(
            ci_options.exclude_keywords,
            ["tsan", "aarch64", "analyzer", "s3_storage", "coverage"],
            ["tsan", "foobar", "aarch64", "analyzer", "s3_storage", "coverage"],
        )
        jobs_to_do = list(_TEST_JOB_LIST)
        jobs_to_skip = []

View File

@@ -0,0 +1 @@
SELECT countIf(multiIf(number < 2, NULL, if(number = 4, 1, 0))) FROM numbers(5);

File diff suppressed because it is too large