diff --git a/.github/ISSUE_TEMPLATE/10_question.md b/.github/ISSUE_TEMPLATE/10_question.md index 0992bf06217..08a05a844e0 100644 --- a/.github/ISSUE_TEMPLATE/10_question.md +++ b/.github/ISSUE_TEMPLATE/10_question.md @@ -10,3 +10,11 @@ assignees: '' > Make sure to check documentation https://clickhouse.com/docs/en/ first. If the question is concise and probably has a short answer, asking it in [community Slack](https://join.slack.com/t/clickhousedb/shared_invite/zt-1gh9ds7f4-PgDhJAaF8ad5RbWBAAjzFg) is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse > If you still prefer GitHub issues, remove all this text and ask your question here. + +**Company or project name** + +Put your company name or project description here + +**Question** + +Your question diff --git a/.github/ISSUE_TEMPLATE/20_feature-request.md b/.github/ISSUE_TEMPLATE/20_feature-request.md index f59dbc2c40f..cf5ac000a23 100644 --- a/.github/ISSUE_TEMPLATE/20_feature-request.md +++ b/.github/ISSUE_TEMPLATE/20_feature-request.md @@ -9,6 +9,10 @@ assignees: '' > (you don't have to strictly follow this form) +**Company or project name** + +> Put your company name or project description here + **Use case** > A clear and concise description of what is the intended usage scenario is. diff --git a/.github/ISSUE_TEMPLATE/30_unexpected-behaviour.md b/.github/ISSUE_TEMPLATE/30_unexpected-behaviour.md index 3630d95ba33..73c861886e6 100644 --- a/.github/ISSUE_TEMPLATE/30_unexpected-behaviour.md +++ b/.github/ISSUE_TEMPLATE/30_unexpected-behaviour.md @@ -9,6 +9,10 @@ assignees: '' (you don't have to strictly follow this form) +**Company or project name** + +Put your company name or project description here + **Describe the unexpected behaviour** A clear and concise description of what works not as it is supposed to. diff --git a/.github/ISSUE_TEMPLATE/35_incomplete_implementation.md b/.github/ISSUE_TEMPLATE/35_incomplete_implementation.md index 6a014ce3c29..45f752b53ef 100644 --- a/.github/ISSUE_TEMPLATE/35_incomplete_implementation.md +++ b/.github/ISSUE_TEMPLATE/35_incomplete_implementation.md @@ -9,6 +9,10 @@ assignees: '' (you don't have to strictly follow this form) +**Company or project name** + +Put your company name or project description here + **Describe the unexpected behaviour** A clear and concise description of what works not as it is supposed to. diff --git a/.github/ISSUE_TEMPLATE/45_usability-issue.md b/.github/ISSUE_TEMPLATE/45_usability-issue.md index b03b11606c1..79f23fe0a14 100644 --- a/.github/ISSUE_TEMPLATE/45_usability-issue.md +++ b/.github/ISSUE_TEMPLATE/45_usability-issue.md @@ -9,6 +9,9 @@ assignees: '' (you don't have to strictly follow this form) +**Company or project name** +Put your company name or project description here + **Describe the issue** A clear and concise description of what works not as it is supposed to. diff --git a/.github/ISSUE_TEMPLATE/50_build-issue.md b/.github/ISSUE_TEMPLATE/50_build-issue.md index 9b05fbbdd13..5a58add9ad8 100644 --- a/.github/ISSUE_TEMPLATE/50_build-issue.md +++ b/.github/ISSUE_TEMPLATE/50_build-issue.md @@ -9,6 +9,10 @@ assignees: '' > Make sure that `git diff` result is empty and you've just pulled fresh master. Try cleaning up cmake cache. 
Just in case, official build instructions are published here: https://clickhouse.com/docs/en/development/build/ +**Company or project name** + +> Put your company name or project description here + **Operating system** > OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside virtual machine, please mention it too. diff --git a/.github/ISSUE_TEMPLATE/60_documentation-issue.md b/.github/ISSUE_TEMPLATE/60_documentation-issue.md index 557e5ea43c9..5a941977dac 100644 --- a/.github/ISSUE_TEMPLATE/60_documentation-issue.md +++ b/.github/ISSUE_TEMPLATE/60_documentation-issue.md @@ -8,6 +8,9 @@ labels: comp-documentation (you don't have to strictly follow this form) +**Company or project name** +Put your company name or project description here + **Describe the issue** A clear and concise description of what's wrong in documentation. diff --git a/.github/ISSUE_TEMPLATE/70_performance-issue.md b/.github/ISSUE_TEMPLATE/70_performance-issue.md index d0e549039a6..21eba3f5af1 100644 --- a/.github/ISSUE_TEMPLATE/70_performance-issue.md +++ b/.github/ISSUE_TEMPLATE/70_performance-issue.md @@ -9,6 +9,9 @@ assignees: '' (you don't have to strictly follow this form) +**Company or project name** +Put your company name or project description here + **Describe the situation** What exactly works slower than expected? diff --git a/.github/ISSUE_TEMPLATE/80_backward-compatibility.md b/.github/ISSUE_TEMPLATE/80_backward-compatibility.md index a13e9508f70..8058f5bcc53 100644 --- a/.github/ISSUE_TEMPLATE/80_backward-compatibility.md +++ b/.github/ISSUE_TEMPLATE/80_backward-compatibility.md @@ -9,6 +9,9 @@ assignees: '' (you don't have to strictly follow this form) +**Company or project name** +Put your company name or project description here + **Describe the issue** A clear and concise description of what works not as it is supposed to. diff --git a/.github/ISSUE_TEMPLATE/85_bug-report.md b/.github/ISSUE_TEMPLATE/85_bug-report.md index 6bf265260ac..c43473d63ad 100644 --- a/.github/ISSUE_TEMPLATE/85_bug-report.md +++ b/.github/ISSUE_TEMPLATE/85_bug-report.md @@ -11,6 +11,10 @@ assignees: '' > You have to provide the following information whenever possible. +**Company or project name** + +> Put your company name or project description here + **Describe what's wrong** > A clear and concise description of what works not as it is supposed to. diff --git a/.github/ISSUE_TEMPLATE/96_installation-issues.md b/.github/ISSUE_TEMPLATE/96_installation-issues.md index e4be8af86b6..5f1b6cfd640 100644 --- a/.github/ISSUE_TEMPLATE/96_installation-issues.md +++ b/.github/ISSUE_TEMPLATE/96_installation-issues.md @@ -7,6 +7,10 @@ assignees: '' --- +**Company or project name** + +Put your company name or project description here + **I have tried the following solutions**: https://clickhouse.com/docs/en/faq/troubleshooting/#troubleshooting-installation-errors **Installation type** diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 041024b21db..51a1a6e2df8 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -42,40 +42,27 @@ At a minimum, the following information should be added (but add more as needed) > Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/ -
- CI Settings - -**NOTE:** If your merge the PR with modified CI you **MUST KNOW** what you are doing -**NOTE:** Checked options will be applied if set before CI RunConfig/PrepareRunConfig step -- [ ] Allow: Integration Tests +#### CI Settings (Only check the boxes if you know what you are doing): +- [ ] Allow: All Required Checks - [ ] Allow: Stateless tests - [ ] Allow: Stateful tests -- [ ] Allow: Unit tests +- [ ] Allow: Integration Tests - [ ] Allow: Performance tests -- [ ] Allow: All with aarch64 -- [ ] Allow: All with ASAN -- [ ] Allow: All with TSAN -- [ ] Allow: All with Analyzer -- [ ] Allow: All with Azure -- [ ] Allow: Add your option here +- [ ] Allow: All NOT Required Checks +- [ ] Allow: batch 1, 2 for multi-batch jobs +- [ ] Allow: batch 3, 4, 5, 6 for multi-batch jobs --- +- [ ] Exclude: Style check - [ ] Exclude: Fast test - [ ] Exclude: Integration Tests - [ ] Exclude: Stateless tests - [ ] Exclude: Stateful tests - [ ] Exclude: Performance tests - [ ] Exclude: All with ASAN -- [ ] Exclude: All with TSAN -- [ ] Exclude: All with MSAN -- [ ] Exclude: All with UBSAN -- [ ] Exclude: All with Coverage - [ ] Exclude: All with Aarch64 +- [ ] Exclude: All with TSAN, MSAN, UBSAN, Coverage --- -- [ ] do not test (only style check) -- [ ] disable merge-commit (no merge from master before tests) -- [ ] disable CI cache (job reuse) -- [ ] allow: batch 1 for multi-batch jobs -- [ ] allow: batch 2 -- [ ] allow: batch 3 -- [ ] allow: batch 4, 5 and 6 -
+- [ ] Do not test +- [ ] Upload binaries for special builds +- [ ] Disable merge-commit +- [ ] Disable CI cache diff --git a/.github/workflows/master.yml b/.github/workflows/master.yml index c2a893a8e99..91dcb6a4968 100644 --- a/.github/workflows/master.yml +++ b/.github/workflows/master.yml @@ -106,7 +106,8 @@ jobs: data: ${{ needs.RunConfig.outputs.data }} # stage for jobs that do not prohibit merge Tests_3: - needs: [RunConfig, Builds_1] + # Test_3 should not wait for Test_1/Test_2 and should not be blocked by them on master branch since all jobs need to run there. + needs: [RunConfig, Builds_1, Builds_2] if: ${{ !failure() && !cancelled() && contains(fromJson(needs.RunConfig.outputs.data).stages_data.stages_to_do, 'Tests_3') }} uses: ./.github/workflows/reusable_test_stage.yml with: diff --git a/.github/workflows/merge_queue.yml b/.github/workflows/merge_queue.yml index d1b03198485..c8b2452829b 100644 --- a/.github/workflows/merge_queue.yml +++ b/.github/workflows/merge_queue.yml @@ -80,11 +80,27 @@ jobs: run_command: | python3 fast_test_check.py + Builds_1: + needs: [RunConfig, BuildDockers] + if: ${{ !failure() && !cancelled() && contains(fromJson(needs.RunConfig.outputs.data).stages_data.stages_to_do, 'Builds_1') }} + # using callable wf (reusable_stage.yml) allows grouping all nested jobs under a tab + uses: ./.github/workflows/reusable_build_stage.yml + with: + stage: Builds_1 + data: ${{ needs.RunConfig.outputs.data }} + Tests_1: + needs: [RunConfig, Builds_1] + if: ${{ !failure() && !cancelled() && contains(fromJson(needs.RunConfig.outputs.data).stages_data.stages_to_do, 'Tests_1') }} + uses: ./.github/workflows/reusable_test_stage.yml + with: + stage: Tests_1 + data: ${{ needs.RunConfig.outputs.data }} + ################################# Stage Final ################################# # FinishCheck: if: ${{ !failure() && !cancelled() }} - needs: [RunConfig, BuildDockers, StyleCheck, FastTest] + needs: [RunConfig, BuildDockers, StyleCheck, FastTest, Builds_1, Tests_1] runs-on: [self-hosted, style-checker-aarch64] steps: - name: Check out repository code diff --git a/.github/workflows/pull_request.yml b/.github/workflows/pull_request.yml index 7d22554473e..e4deaf9f35e 100644 --- a/.github/workflows/pull_request.yml +++ b/.github/workflows/pull_request.yml @@ -135,7 +135,7 @@ jobs: data: ${{ needs.RunConfig.outputs.data }} # stage for jobs that do not prohibit merge Tests_3: - needs: [RunConfig, Tests_1, Tests_2] + needs: [RunConfig, Builds_1, Tests_1, Builds_2, Tests_2] if: ${{ !failure() && !cancelled() && contains(fromJson(needs.RunConfig.outputs.data).stages_data.stages_to_do, 'Tests_3') }} uses: ./.github/workflows/reusable_test_stage.yml with: diff --git a/.github/workflows/reusable_test.yml b/.github/workflows/reusable_test.yml index e30ef863a86..c01dd8ca9d4 100644 --- a/.github/workflows/reusable_test.yml +++ b/.github/workflows/reusable_test.yml @@ -58,7 +58,7 @@ jobs: env: GITHUB_JOB_OVERRIDDEN: ${{inputs.test_name}}${{ fromJson(inputs.data).jobs_data.jobs_params[inputs.test_name].num_batches > 1 && format('-{0}',matrix.batch) || '' }} strategy: - fail-fast: false # we always wait for entire matrix + fail-fast: false # we always wait for the entire matrix matrix: batch: ${{ fromJson(inputs.data).jobs_data.jobs_params[inputs.test_name].batches }} steps: diff --git a/.gitignore b/.gitignore index db3f77d7d1e..4bc162c1b0f 100644 --- a/.gitignore +++ b/.gitignore @@ -21,6 +21,9 @@ *.stderr *.stdout +# llvm-xray logs +xray-log.* + /docs/build /docs/publish /docs/edit diff 
--git a/.gitmessage b/.gitmessage deleted file mode 100644 index 89ee7d35d23..00000000000 --- a/.gitmessage +++ /dev/null @@ -1,29 +0,0 @@ - - -### CI modificators (add a leading space to apply) ### - -## To avoid a merge commit in CI: -#no_merge_commit - -## To discard CI cache: -#no_ci_cache - -## To not test (only style check): -#do_not_test - -## To run specified set of tests in CI: -#ci_set_ -#ci_set_reduced -#ci_set_arm -#ci_set_integration -#ci_set_old_analyzer - -## To run specified job in CI: -#job_ -#job_stateless_tests_release -#job_package_debug -#job_integration_tests_asan - -## To run only specified batches for multi-batch job(s) -#batch_2 -#batch_1_2_3 diff --git a/CHANGELOG.md b/CHANGELOG.md index 64ff3b78065..4891b79e4c7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,10 +11,9 @@ ### ClickHouse release 24.5, 2024-05-30 #### Backward Incompatible Change -* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make to drop such indexes before upgrade and re-create them after upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)). -* Usage of functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, `runningDifference` deprecated (because it is error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)). +* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before upgrade and re-create them after upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)). +* Usage of functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, `runningDifference` deprecated (because it is error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_error_prone_window_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)). * Queries from `system.columns` will work faster if there is a large number of columns, but many databases or tables are not granted for `SHOW TABLES`. Note that in previous versions, if you grant `SHOW COLUMNS` to individual columns without granting `SHOW TABLES` to the corresponding tables, the `system.columns` table will show these columns, but in a new version, it will skip the table entirely. Remove trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Setting `replace_long_file_name_to_hash` is enabled by default for `MergeTree` tables. [#64457](https://github.com/ClickHouse/ClickHouse/pull/64457) ([Anton Popov](https://github.com/CurtizJ)). The data written with this setting can be read by server versions since 23.9. After you use ClickHouse with this setting enabled, you cannot downgrade to versions 23.8 and earlier. 
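A minimal sketch of the migration implied by the deprecation note above (for `runningDifference` and the related functions), assuming a hypothetical table `t` with columns `ts` and `value`: either temporarily re-enable the old behaviour with the setting named in the changelog entry, or rewrite the query with an explicit window function.

```sql
-- Option 1: temporarily re-enable the deprecated, error-prone function.
SELECT runningDifference(value) AS diff
FROM t
SETTINGS allow_deprecated_error_prone_window_functions = 1;

-- Option 2 (preferred): equivalent window-function rewrite with explicit ordering;
-- the first row gets 0, mirroring runningDifference.
SELECT value - lagInFrame(value, 1, value)
    OVER (ORDER BY ts ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS diff
FROM t;
```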
#### New Feature * Adds the `Form` format to read/write a single record in the `application/x-www-form-urlencoded` format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)). @@ -29,10 +28,9 @@ * Support for conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)). * Add `NPy` output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)). * `Raw` format as a synonym for `TSVRaw`. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)). -* Added new SQL functions `generateSnowflakeID` for generating Twitter-style Snowflake IDs. [#63577](https://github.com/ClickHouse/ClickHouse/pull/63577) ([Danila Puzov](https://github.com/kazalika)). +* Added a new SQL function `generateUUIDv7` to generate version 7 UUIDs aka. timestamp-based UUIDs with random component. Also added a new function `UUIDToNum` to extract bytes from a UUID and a new function `UUIDv7ToDateTime` to extract timestamp component from a UUID version 7. [#62852](https://github.com/ClickHouse/ClickHouse/pull/62852) ([Alexey Petrunyaka](https://github.com/pet74alex)). * On Linux and MacOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)). * Change warning on high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). -* Added SQL functions `fromReadableSize` (along with `OrNull` and `OrZero` variants). This function performs the opposite operation of functions `formatReadableSize` and `formatReadableDecimalSize,` i.e., the given human-readable byte size; they return the number of bytes. Example: `SELECT fromReadableSize('3.0 MiB')` returns `3145728`. [#64386](https://github.com/ClickHouse/ClickHouse/pull/64386) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). * Provide support for `azureBlobStorage` function in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If `use_workload_identity` parameter is set in config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)). * Add TTL information in the `system.parts_columns` table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)). @@ -43,26 +41,20 @@ * Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_traked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)). #### Performance Improvement -* A native parquet reader, which can read parquet binary to ClickHouse columns directly. Now this feature can be activated by setting `input_format_parquet_use_native_reader` to true. [#60361](https://github.com/ClickHouse/ClickHouse/pull/60361) ([ZhiHong Zhang](https://github.com/copperybean)). * Less contention in filesystem cache (part 4). 
Allow to keep filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This allows to release pressure from space reservation for queries (on `tryReserve` method). Also this is done in a lock free way as much as possible, e.g. should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)). * Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)). * Process string functions `...UTF8` 'asciily' if input strings are all ascii chars. Inspired by https://github.com/apache/doris/pull/29799. Overall speed up by 1.07x~1.62x. Notice that peak memory usage had been decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)). * Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)). * HostResolver has each IP address several times. If remote host has several IPs and by some reason (firewall rules for example) access on some IPs allowed and on others forbidden, than only first record of forbidden IPs marked as failed, and in each try these IPs have a chance to be chosen (and failed again). Even if fix this, every 120 seconds DNS cache dropped, and IPs can be chosen again. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)). -* Function `splitByRegexp` is now faster when the regular expression argument is a single-character, trivial regular expression (in this case, it now falls back internally to `splitByChar`). [#62696](https://github.com/ClickHouse/ClickHouse/pull/62696) ([Robert Schulze](https://github.com/rschu1ze)). -* Aggregation with 8-bit and 16-bit keys became faster: added min/max in FixedHashTable to limit the array index and reduce the `isZero()` calls during iteration. [#62746](https://github.com/ClickHouse/ClickHouse/pull/62746) ([Jiebin Sun](https://github.com/jiebinn)). * Add a new configuration`prefer_merge_sort_block_bytes` to control the memory usage and speed up sorting 2 times when merging when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)). * `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)). * Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)). 
-* Optimized vertical merges in tables with sparse columns. [#64311](https://github.com/ClickHouse/ClickHouse/pull/64311) ([Anton Popov](https://github.com/CurtizJ)). -* Improve filtering of sparse columns: reduce redundant calls of `ColumnSparse::filter` to improve performance. [#64426](https://github.com/ClickHouse/ClickHouse/pull/64426) ([Jiebin Sun](https://github.com/jiebinn)). * Remove copying data when writing to the filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)). * Now backups with azure blob storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)). * Allow to use native copy for azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)). * Finally enable native copy for azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)). -* Improve the iteration over sparse columns to reduce call of `size`. [#64497](https://github.com/ClickHouse/ClickHouse/pull/64497) ([Jiebin Sun](https://github.com/jiebinn)). #### Improvement * Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -92,14 +84,8 @@ * Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Throw `CANNOT_DECOMPRESS` instread of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)). * A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Added knob `metadata_storage_type` to keep free space on metadata storage disk. [#64128](https://github.com/ClickHouse/ClickHouse/pull/64128) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). * Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Metrics to track the number of directories created and removed by the `plain_rewritable` metadata storage, and the number of entries in the local-to-remote in-memory map. [#64175](https://github.com/ClickHouse/ClickHouse/pull/64175) ([Julia Kartseva](https://github.com/jkartseva)). * Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)). -* The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. `limit` or `additional_table_filters`) would affect the query result. 
[#64205](https://github.com/ClickHouse/ClickHouse/pull/64205) ([Robert Schulze](https://github.com/rschu1ze)). -* Test that a non standard error code `QPSLimitExceeded` is supported and it is retryable error. [#64225](https://github.com/ClickHouse/ClickHouse/pull/64225) ([Sema Checherinda](https://github.com/CheSema)). -* Settings from the user config doesn't affect merges and mutations for MergeTree on top of object storage. [#64456](https://github.com/ClickHouse/ClickHouse/pull/64456) ([alesapin](https://github.com/alesapin)). -* Test that `totalqpslimitexceeded` is a retriable s3 error. [#64520](https://github.com/ClickHouse/ClickHouse/pull/64520) ([Sema Checherinda](https://github.com/CheSema)). #### Build/Testing/Packaging Improvement * ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -162,7 +148,6 @@ * Fix analyzer: there's turtles all the way down... [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). * Allow certain ALTER TABLE commands for `plain_rewritable` disk [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)). * Recursive CTE distributed fix [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix reading of columns of type `Tuple(Map(LowCardinality(...)))` [#63956](https://github.com/ClickHouse/ClickHouse/pull/63956) ([Anton Popov](https://github.com/CurtizJ)). * Analyzer: Fix COLUMNS resolve [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)). * LIMIT BY and skip_unused_shards with analyzer [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * A fix for some trash (experimental Kusto) [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)). @@ -176,8 +161,6 @@ * Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)). * Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)). * Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix ARRAY JOIN with Distributed. [#64226](https://github.com/ClickHouse/ClickHouse/pull/64226) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix: CNF with mutually exclusive atoms reduction [#64256](https://github.com/ClickHouse/ClickHouse/pull/64256) ([Eduard Karacharov](https://github.com/korowa)). * Fix Logical error: Bad cast for Buffer table with prewhere. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). @@ -680,7 +663,7 @@ * Improve the operation of `sumMapFiltered` with NaN values. NaN values are now placed at the end (instead of randomly) and considered different from any values. `-0` is now also treated as equal to `0`; since 0 values are discarded, `-0` values are discarded too. [#58959](https://github.com/ClickHouse/ClickHouse/pull/58959) ([Raúl Marín](https://github.com/Algunenano)). 
* The function `visibleWidth` will behave according to the docs. In previous versions, it simply counted code points after string serialization, like the `lengthUTF8` function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set `function_visible_width_behavior` to `0`, or set `compatibility` to `23.12` or lower. [#59022](https://github.com/ClickHouse/ClickHouse/pull/59022) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * `Kusto` dialect is disabled until these two bugs will be fixed: [#59037](https://github.com/ClickHouse/ClickHouse/issues/59037) and [#59036](https://github.com/ClickHouse/ClickHouse/issues/59036). [#59305](https://github.com/ClickHouse/ClickHouse/pull/59305) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Any attempt to use `Kusto` will result in exception. -* More efficient implementation of the `FINAL` modifier no longer guarantees preserving the order even if `max_threads = 1`. If you counted on the previous behavior, set `enable_vertical_final` to 0 or `compatibility` to `23.12`. +* More efficient implementation of the `FINAL` modifier no longer guarantees preserving the order even if `max_threads = 1`. If you counted on the previous behavior, set `enable_vertical_final` to 0 or `compatibility` to `23.12`. #### New Feature * Implement Variant data type that represents a union of other data types. Type `Variant(T1, T2, ..., TN)` means that each row of this type has a value of either type `T1` or `T2` or ... or `TN` or none of them (`NULL` value). Variant type is available under a setting `allow_experimental_variant_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#58047](https://github.com/ClickHouse/ClickHouse/pull/58047) ([Kruglov Pavel](https://github.com/Avogar)). diff --git a/CMakeLists.txt b/CMakeLists.txt index 96ba2961d3a..455adc24182 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -122,6 +122,8 @@ add_library(global-libs INTERFACE) include (cmake/sanitize.cmake) +include (cmake/xray_instrumentation.cmake) + option(ENABLE_COLORED_BUILD "Enable colors in compiler output" ON) set (CMAKE_COLOR_MAKEFILE ${ENABLE_COLORED_BUILD}) # works only for the makefile generator @@ -208,8 +210,6 @@ option(OMIT_HEAVY_DEBUG_SYMBOLS "Do not generate debugger info for heavy modules (ClickHouse functions and dictionaries, some contrib)" ${OMIT_HEAVY_DEBUG_SYMBOLS_DEFAULT}) -option(USE_DEBUG_HELPERS "Enable debug helpers" ${USE_DEBUG_HELPERS}) - option(BUILD_STANDALONE_KEEPER "Build keeper as small standalone binary" OFF) if (NOT BUILD_STANDALONE_KEEPER) option(CREATE_KEEPER_SYMLINK "Create symlink for clickhouse-keeper to main server binary" ON) diff --git a/SECURITY.md b/SECURITY.md index 14c39129db9..8635951dc0e 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -2,22 +2,27 @@ the file is autogenerated by utils/security-generator/generate_security.py --> -# Security Policy +# ClickHouse Security Vulnerability Response Policy -## Security Announcements -Security fixes will be announced by posting them in the [security changelog](https://clickhouse.com/docs/en/whats-new/security-changelog/). +## Security Change Log and Support -## Scope and Supported Versions +Details regarding security fixes are publicly reported in our [security changelog](https://clickhouse.com/docs/en/whats-new/security-changelog/). A summary of known security vulnerabilities is shown at the bottom of this page. 
-The following versions of ClickHouse server are currently being supported with security updates: +Vulnerability notifications pre-release or during embargo periods are available to open source users and support customers registered for vulnerability alerts. Refer to our [Embargo Policy](#embargo-policy) below. + +The following versions of ClickHouse server are currently supported with security updates: | Version | Supported | |:-|:-| +| 24.5 | ✔️ | | 24.4 | ✔️ | | 24.3 | ✔️ | -| 24.2 | ✔️ | +| 24.2 | ❌ | | 24.1 | ❌ | -| 23.* | ❌ | +| 23.12 | ❌ | +| 23.11 | ❌ | +| 23.10 | ❌ | +| 23.9 | ❌ | | 23.8 | ✔️ | | 23.7 | ❌ | | 23.6 | ❌ | @@ -37,7 +42,7 @@ The following versions of ClickHouse server are currently being supported with s We're extremely grateful for security researchers and users that report vulnerabilities to the ClickHouse Open Source Community. All reports are thoroughly investigated by developers. -To report a potential vulnerability in ClickHouse please send the details about it to [security@clickhouse.com](mailto:security@clickhouse.com). We do not offer any financial rewards for reporting issues to us using this method. Alternatively, you can also submit your findings through our public bug bounty program hosted by [Bugcrowd](https://bugcrowd.com/clickhouse) and be rewarded for it as per the program scope and rules of engagement. +To report a potential vulnerability in ClickHouse please send the details about it through our public bug bounty program hosted by [Bugcrowd](https://bugcrowd.com/clickhouse) and be rewarded for it as per the program scope and rules of engagement. ### When Should I Report a Vulnerability? @@ -59,3 +64,21 @@ As the security issue moves from triage, to identified fix, to release planning A public disclosure date is negotiated by the ClickHouse maintainers and the bug submitter. We prefer to fully disclose the bug as soon as possible once a user mitigation is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested, or for vendor coordination. The timeframe for disclosure is from immediate (especially if it's already publicly known) to 90 days. For a vulnerability with a straightforward mitigation, we expect the report date to disclosure date to be on the order of 7 days. +## Embargo Policy + +Open source users and support customers may subscribe to receive alerts during the embargo period by visiting [https://trust.clickhouse.com/?product=clickhouseoss](https://trust.clickhouse.com/?product=clickhouseoss), requesting access and subscribing for alerts. Subscribers agree not to make these notifications public, issue communications, share this information with others, or issue public patches before the disclosure date. Accidental disclosures must be reported immediately to trust@clickhouse.com. Failure to follow this policy or repeated leaks may result in removal from the subscriber list. + +Participation criteria: +1. Be a current open source user or support customer with a valid corporate email domain (no @gmail.com, @azure.com, etc.). +1. Sign up to the ClickHouse OSS Trust Center at [https://trust.clickhouse.com](https://trust.clickhouse.com). +1. Accept the ClickHouse Security Vulnerability Response Policy as outlined above. +1. Subscribe to ClickHouse OSS Trust Center alerts. + +Removal criteria: +1. Members may be removed for failure to follow this policy or repeated leaks. +1. Members may be removed for bounced messages (mail delivery failure). +1. 
Members may unsubscribe at any time. + +Notification process: +ClickHouse will post notifications within our OSS Trust Center and notify subscribers. Subscribers must log in to the Trust Center to download the notification. The notification will include the timeframe for public disclosure. + diff --git a/base/base/CMakeLists.txt b/base/base/CMakeLists.txt index 27aa0bd6baf..159502c9735 100644 --- a/base/base/CMakeLists.txt +++ b/base/base/CMakeLists.txt @@ -34,15 +34,6 @@ set (SRCS throwError.cpp ) -if (USE_DEBUG_HELPERS) - get_target_property(MAGIC_ENUM_INCLUDE_DIR ch_contrib::magic_enum INTERFACE_INCLUDE_DIRECTORIES) - # CMake generator expression will do insane quoting when it encounters special character like quotes, spaces, etc. - # Prefixing "SHELL:" will force it to use the original text. - set (INCLUDE_DEBUG_HELPERS "SHELL:-I\"${MAGIC_ENUM_INCLUDE_DIR}\" -include \"${ClickHouse_SOURCE_DIR}/base/base/iostream_debug_helpers.h\"") - # Use generator expression as we don't want to pollute CMAKE_CXX_FLAGS, which will interfere with CMake check system. - add_compile_options($<$:${INCLUDE_DEBUG_HELPERS}>) -endif () - add_library (common ${SRCS}) if (WITH_COVERAGE) diff --git a/base/base/iostream_debug_helpers.h b/base/base/iostream_debug_helpers.h deleted file mode 100644 index b23d3d9794d..00000000000 --- a/base/base/iostream_debug_helpers.h +++ /dev/null @@ -1,187 +0,0 @@ -#pragma once - -#include "demangle.h" -#include "getThreadId.h" -#include -#include -#include -#include -#include - -/** Usage: - * - * DUMP(variable...) - */ - - -template -Out & dumpValue(Out &, T &&); - - -/// Catch-all case. -template -requires(priority == -1) -Out & dumpImpl(Out & out, T &&) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return out << "{...}"; -} - -/// An object, that could be output with operator <<. -template -requires(priority == 0) -Out & dumpImpl(Out & out, T && x, std::decay_t() << std::declval())> * = nullptr) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return out << x; -} - -/// A pointer-like object. -template -requires(priority == 1 - /// Protect from the case when operator * do effectively nothing (function pointer). - && !std::is_same_v, std::decay_t())>>) -Out & dumpImpl(Out & out, T && x, std::decay_t())> * = nullptr) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - if (!x) - return out << "nullptr"; - return dumpValue(out, *x); -} - -/// Container. -template -requires(priority == 2) -Out & dumpImpl(Out & out, T && x, std::decay_t()))> * = nullptr) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - bool first = true; - out << "{"; - for (const auto & elem : x) - { - if (first) - first = false; - else - out << ", "; - dumpValue(out, elem); - } - return out << "}"; -} - - -template -requires(priority == 3 && std::is_enum_v>) -Out & dumpImpl(Out & out, T && x) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return out << magic_enum::enum_name(x); -} - -/// string and const char * - output not as container or pointer. - -template -requires(priority == 3 && (std::is_same_v, std::string> || std::is_same_v, const char *>)) -Out & dumpImpl(Out & out, T && x) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return out << std::quoted(x); -} - -/// UInt8 - output as number, not char. 
- -template -requires(priority == 3 && std::is_same_v, unsigned char>) -Out & dumpImpl(Out & out, T && x) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return out << int(x); -} - - -/// Tuple, pair -template -Out & dumpTupleImpl(Out & out, T && x) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - if constexpr (N == 0) - out << "{"; - else - out << ", "; - - dumpValue(out, std::get(x)); - - if constexpr (N + 1 == std::tuple_size_v>) - out << "}"; - else - dumpTupleImpl(out, x); - - return out; -} - -template -requires(priority == 4) -Out & dumpImpl(Out & out, T && x, std::decay_t(std::declval()))> * = nullptr) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return dumpTupleImpl<0>(out, x); -} - - -template -Out & dumpDispatchPriorities(Out & out, T && x, std::decay_t(std::declval(), std::declval()))> *) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return dumpImpl(out, x); -} - -// NOLINTNEXTLINE(google-explicit-constructor) -struct LowPriority { LowPriority(void *) {} }; - -template -Out & dumpDispatchPriorities(Out & out, T && x, LowPriority) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return dumpDispatchPriorities(out, x, nullptr); -} - - -template -Out & dumpValue(Out & out, T && x) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - return dumpDispatchPriorities<5>(out, x, nullptr); -} - - -template -Out & dump(Out & out, const char * name, T && x) // NOLINT(cppcoreguidelines-missing-std-forward) -{ - // Dumping string literal, printing name and demangled type is irrelevant. - if constexpr (std::is_same_v>>) - { - const auto name_len = strlen(name); - const auto value_len = strlen(x); - // `name` is the same as quoted `x` - if (name_len > 2 && value_len > 0 && name[0] == '"' && name[name_len - 1] == '"' - && strncmp(name + 1, x, std::min(value_len, name_len) - 1) == 0) - return out << x; - } - - out << demangle(typeid(x).name()) << " " << name << " = "; - return dumpValue(out, x) << "; "; -} - -#pragma clang diagnostic ignored "-Wgnu-zero-variadic-macro-arguments" - -#define DUMPVAR(VAR) ::dump(std::cerr, #VAR, (VAR)); -#define DUMPHEAD std::cerr << __FILE__ << ':' << __LINE__ << " [ " << getThreadId() << " ] "; -#define DUMPTAIL std::cerr << '\n'; - -#define DUMP1(V1) do { DUMPHEAD DUMPVAR(V1) DUMPTAIL } while(0) -#define DUMP2(V1, V2) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPTAIL } while(0) -#define DUMP3(V1, V2, V3) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPTAIL } while(0) -#define DUMP4(V1, V2, V3, V4) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPVAR(V4) DUMPTAIL } while(0) -#define DUMP5(V1, V2, V3, V4, V5) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPVAR(V4) DUMPVAR(V5) DUMPTAIL } while(0) -#define DUMP6(V1, V2, V3, V4, V5, V6) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPVAR(V4) DUMPVAR(V5) DUMPVAR(V6) DUMPTAIL } while(0) -#define DUMP7(V1, V2, V3, V4, V5, V6, V7) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPVAR(V4) DUMPVAR(V5) DUMPVAR(V6) DUMPVAR(V7) DUMPTAIL } while(0) -#define DUMP8(V1, V2, V3, V4, V5, V6, V7, V8) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPVAR(V4) DUMPVAR(V5) DUMPVAR(V6) DUMPVAR(V7) DUMPVAR(V8) DUMPTAIL } while(0) -#define DUMP9(V1, V2, V3, V4, V5, V6, V7, V8, V9) do { DUMPHEAD DUMPVAR(V1) DUMPVAR(V2) DUMPVAR(V3) DUMPVAR(V4) DUMPVAR(V5) DUMPVAR(V6) DUMPVAR(V7) DUMPVAR(V8) DUMPVAR(V9) DUMPTAIL } while(0) - -/// https://groups.google.com/forum/#!searchin/kona-dev/variadic$20macro%7Csort:date/kona-dev/XMA-lDOqtlI/GCzdfZsD41sJ - -#define VA_NUM_ARGS_IMPL(x1, x2, x3, 
x4, x5, x6, x7, x8, x9, N, ...) N -#define VA_NUM_ARGS(...) VA_NUM_ARGS_IMPL(__VA_ARGS__, 9, 8, 7, 6, 5, 4, 3, 2, 1) - -#define MAKE_VAR_MACRO_IMPL_CONCAT(PREFIX, NUM_ARGS) PREFIX ## NUM_ARGS -#define MAKE_VAR_MACRO_IMPL(PREFIX, NUM_ARGS) MAKE_VAR_MACRO_IMPL_CONCAT(PREFIX, NUM_ARGS) -#define MAKE_VAR_MACRO(PREFIX, ...) MAKE_VAR_MACRO_IMPL(PREFIX, VA_NUM_ARGS(__VA_ARGS__)) - -#define DUMP(...) MAKE_VAR_MACRO(DUMP, __VA_ARGS__)(__VA_ARGS__) diff --git a/base/base/tests/CMakeLists.txt b/base/base/tests/CMakeLists.txt index 81db4f3622f..e69de29bb2d 100644 --- a/base/base/tests/CMakeLists.txt +++ b/base/base/tests/CMakeLists.txt @@ -1,2 +0,0 @@ -clickhouse_add_executable (dump_variable dump_variable.cpp) -target_link_libraries (dump_variable PRIVATE clickhouse_common_io) diff --git a/base/base/tests/dump_variable.cpp b/base/base/tests/dump_variable.cpp deleted file mode 100644 index 9addc298ecb..00000000000 --- a/base/base/tests/dump_variable.cpp +++ /dev/null @@ -1,70 +0,0 @@ -#include - -#include -#include -#include -#include -#include -#include -#include -#include - - -struct S1; -struct S2 {}; - -struct S3 -{ - std::set m1; -}; - -std::ostream & operator<<(std::ostream & stream, const S3 & what) -{ - stream << "S3 {m1="; - dumpValue(stream, what.m1) << "}"; - return stream; -} - -int main(int, char **) -{ - int x = 1; - - DUMP(x); - DUMP(x, 1, &x); - - DUMP(std::make_unique(1)); - DUMP(std::make_shared(1)); - - std::vector vec{1, 2, 3}; - DUMP(vec); - - auto pair = std::make_pair(1, 2); - DUMP(pair); - - auto tuple = std::make_tuple(1, 2, 3); - DUMP(tuple); - - std::map map{{1, "hello"}, {2, "world"}}; - DUMP(map); - - std::initializer_list list{"hello", "world"}; - DUMP(list); - - std::array arr{{"hello", "world"}}; - DUMP(arr); - - //DUMP([]{}); - - S1 * s = nullptr; - DUMP(s); - - DUMP(S2()); - - std::set variants = {"hello", "world"}; - DUMP(variants); - - S3 s3 {{"hello", "world"}}; - DUMP(s3); - - return 0; -} diff --git a/cmake/xray_instrumentation.cmake b/cmake/xray_instrumentation.cmake new file mode 100644 index 00000000000..661c0575e54 --- /dev/null +++ b/cmake/xray_instrumentation.cmake @@ -0,0 +1,20 @@ +# https://llvm.org/docs/XRay.html + +option (ENABLE_XRAY "Enable LLVM XRay" OFF) + +if (NOT ENABLE_XRAY) + message (STATUS "Not using LLVM XRay") + return() +endif() + +if (NOT (ARCH_AMD64 AND OS_LINUX)) + message (STATUS "Not using LLVM XRay, only amd64 Linux or FreeBSD are supported") + return() +endif() + +# The target clang must support xray, otherwise it should error on invalid option +set (XRAY_FLAGS "-fxray-instrument -DUSE_XRAY") +set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${XRAY_FLAGS}") +set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${XRAY_FLAGS}") + +message (STATUS "Using LLVM XRay") diff --git a/docker/keeper/Dockerfile b/docker/keeper/Dockerfile index 413ad2dfaed..b3271d94184 100644 --- a/docker/keeper/Dockerfile +++ b/docker/keeper/Dockerfile @@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \ # lts / testing / prestable / etc ARG REPO_CHANNEL="stable" ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}" -ARG VERSION="24.4.1.2088" +ARG VERSION="24.5.1.1763" ARG PACKAGES="clickhouse-keeper" ARG DIRECT_DOWNLOAD_URLS="" diff --git a/docker/server/Dockerfile.alpine b/docker/server/Dockerfile.alpine index 5e224b16764..3f3b880c8f3 100644 --- a/docker/server/Dockerfile.alpine +++ b/docker/server/Dockerfile.alpine @@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \ # lts / testing / prestable / etc ARG REPO_CHANNEL="stable" ARG 
REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}" -ARG VERSION="24.4.1.2088" +ARG VERSION="24.5.1.1763" ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static" ARG DIRECT_DOWNLOAD_URLS="" diff --git a/docker/server/Dockerfile.ubuntu b/docker/server/Dockerfile.ubuntu index d82be0e63f6..5fd22ee9b51 100644 --- a/docker/server/Dockerfile.ubuntu +++ b/docker/server/Dockerfile.ubuntu @@ -28,7 +28,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list ARG REPO_CHANNEL="stable" ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main" -ARG VERSION="24.4.1.2088" +ARG VERSION="24.5.1.1763" ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static" #docker-official-library:off diff --git a/docker/test/style/Dockerfile b/docker/test/style/Dockerfile index 172fbce6406..cb29185f068 100644 --- a/docker/test/style/Dockerfile +++ b/docker/test/style/Dockerfile @@ -15,7 +15,6 @@ RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \ file \ libxml2-utils \ moreutils \ - python3-fuzzywuzzy \ python3-pip \ yamllint \ locales \ @@ -23,8 +22,18 @@ RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \ && rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/* # python-magic is the same version as in Ubuntu 22.04 -RUN pip3 install black==23.12.0 boto3 codespell==2.2.1 mypy==1.8.0 PyGithub unidiff pylint==3.1.0 \ - python-magic==0.4.24 requests types-requests \ +RUN pip3 install \ + PyGithub \ + black==23.12.0 \ + boto3 \ + codespell==2.2.1 \ + mypy==1.8.0 \ + pylint==3.1.0 \ + python-magic==0.4.24 \ + requests \ + thefuzz \ + types-requests \ + unidiff \ && rm -rf /root/.cache/pip RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen en_US.UTF-8 diff --git a/docker/test/upgrade/run.sh b/docker/test/upgrade/run.sh index 29174cc87e6..1f2cc9903b2 100644 --- a/docker/test/upgrade/run.sh +++ b/docker/test/upgrade/run.sh @@ -65,46 +65,22 @@ function save_settings_clean() script -q -c "clickhouse-local -q \"select * from system.settings into outfile '$out'\"" --log-out /dev/null } +# We save the (numeric) version of the old server to compare setting changes between the 2 +# We do this since we are testing against the latest release, not taking into account release candidates, so we might +# be testing current master (24.6) against the latest stable release (24.4) +function save_major_version() +{ + local out=$1 && shift + clickhouse-local -q "SELECT a[1]::UInt64 * 100 + a[2]::UInt64 as v FROM (Select splitByChar('.', version()) as a) into outfile '$out'" +} + save_settings_clean 'old_settings.native' +save_major_version 'old_version.native' # Initial run without S3 to create system.*_log on local file system to make it # available for dump via clickhouse-local configure -function remove_keeper_config() -{ - sudo sed -i "/<$1>$2<\/$1>/d" /etc/clickhouse-server/config.d/keeper_port.xml -} - -# async_replication setting doesn't exist on some older versions -remove_keeper_config "async_replication" "1" - -# create_if_not_exists feature flag doesn't exist on some older versions -remove_keeper_config "create_if_not_exists" "[01]" - -#todo: remove these after 24.3 released. -sudo sed -i "s|azure<|azure_blob_storage<|" /etc/clickhouse-server/config.d/azure_storage_conf.xml - -#todo: remove these after 24.3 released. 
-sudo sed -i "s|local<|local_blob_storage<|" /etc/clickhouse-server/config.d/storage_conf.xml - -# latest_logs_cache_size_threshold setting doesn't exist on some older versions -remove_keeper_config "latest_logs_cache_size_threshold" "[[:digit:]]\+" - -# commit_logs_cache_size_threshold setting doesn't exist on some older versions -remove_keeper_config "commit_logs_cache_size_threshold" "[[:digit:]]\+" - -# it contains some new settings, but we can safely remove it -rm /etc/clickhouse-server/config.d/merge_tree.xml -rm /etc/clickhouse-server/config.d/enable_wait_for_shutdown_replicated_tables.xml -rm /etc/clickhouse-server/config.d/zero_copy_destructive_operations.xml -rm /etc/clickhouse-server/config.d/storage_conf_02963.xml -rm /etc/clickhouse-server/config.d/backoff_failed_mutation.xml -rm /etc/clickhouse-server/config.d/handlers.yaml -rm /etc/clickhouse-server/users.d/nonconst_timezone.xml -rm /etc/clickhouse-server/users.d/s3_cache_new.xml -rm /etc/clickhouse-server/users.d/replicated_ddl_entry.xml - start stop mv /var/log/clickhouse-server/clickhouse-server.log /var/log/clickhouse-server/clickhouse-server.initial.log @@ -116,44 +92,11 @@ export USE_S3_STORAGE_FOR_MERGE_TREE=1 export ZOOKEEPER_FAULT_INJECTION=0 configure -# force_sync=false doesn't work correctly on some older versions -sudo sed -i "s|false|true|" /etc/clickhouse-server/config.d/keeper_port.xml - -#todo: remove these after 24.3 released. -sudo sed -i "s|azure<|azure_blob_storage<|" /etc/clickhouse-server/config.d/azure_storage_conf.xml - -#todo: remove these after 24.3 released. -sudo sed -i "s|local<|local_blob_storage<|" /etc/clickhouse-server/config.d/storage_conf.xml - -# async_replication setting doesn't exist on some older versions -remove_keeper_config "async_replication" "1" - -# create_if_not_exists feature flag doesn't exist on some older versions -remove_keeper_config "create_if_not_exists" "[01]" - -# latest_logs_cache_size_threshold setting doesn't exist on some older versions -remove_keeper_config "latest_logs_cache_size_threshold" "[[:digit:]]\+" - -# commit_logs_cache_size_threshold setting doesn't exist on some older versions -remove_keeper_config "commit_logs_cache_size_threshold" "[[:digit:]]\+" - # But we still need default disk because some tables loaded only into it sudo sed -i "s|
s3
|
s3
default|" /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml sudo chown clickhouse /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml sudo chgrp clickhouse /etc/clickhouse-server/config.d/s3_storage_policy_by_default.xml -# it contains some new settings, but we can safely remove it -rm /etc/clickhouse-server/config.d/merge_tree.xml -rm /etc/clickhouse-server/config.d/enable_wait_for_shutdown_replicated_tables.xml -rm /etc/clickhouse-server/config.d/zero_copy_destructive_operations.xml -rm /etc/clickhouse-server/config.d/storage_conf_02963.xml -rm /etc/clickhouse-server/config.d/backoff_failed_mutation.xml -rm /etc/clickhouse-server/config.d/handlers.yaml -rm /etc/clickhouse-server/config.d/block_number.xml -rm /etc/clickhouse-server/users.d/nonconst_timezone.xml -rm /etc/clickhouse-server/users.d/s3_cache_new.xml -rm /etc/clickhouse-server/users.d/replicated_ddl_entry.xml - start clickhouse-client --query="SELECT 'Server version: ', version()" @@ -192,6 +135,7 @@ then save_settings_clean 'new_settings.native' clickhouse-local -nmq " CREATE TABLE old_settings AS file('old_settings.native'); + CREATE TABLE old_version AS file('old_version.native'); CREATE TABLE new_settings AS file('new_settings.native'); SELECT @@ -202,8 +146,11 @@ then LEFT JOIN old_settings ON new_settings.name = old_settings.name WHERE (new_settings.value != old_settings.value) AND (name NOT IN ( SELECT arrayJoin(tupleElement(changes, 'name')) - FROM system.settings_changes - WHERE version = extract(version(), '^(?:\\d+\\.\\d+)') + FROM + ( + SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes + ) + WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > (SELECT v FROM old_version LIMIT 1) )) SETTINGS join_use_nulls = 1 INTO OUTFILE 'changed_settings.txt' @@ -216,8 +163,11 @@ then FROM old_settings )) AND (name NOT IN ( SELECT arrayJoin(tupleElement(changes, 'name')) - FROM system.settings_changes - WHERE version = extract(version(), '^(?:\\d+\\.\\d+)') + FROM + ( + SELECT *, splitByChar('.', version) AS version_array FROM system.settings_changes + ) + WHERE (version_array[1]::UInt64 * 100 + version_array[2]::UInt64) > (SELECT v FROM old_version LIMIT 1) )) INTO OUTFILE 'new_settings.txt' FORMAT PrettyCompactNoEscapes; diff --git a/docs/changelogs/v23.8.1.2992-lts.md b/docs/changelogs/v23.8.1.2992-lts.md index 05385d9c52b..62326533a79 100644 --- a/docs/changelogs/v23.8.1.2992-lts.md +++ b/docs/changelogs/v23.8.1.2992-lts.md @@ -33,7 +33,7 @@ sidebar_label: 2023 * Add input format One that doesn't read any data and always returns single row with column `dummy` with type `UInt8` and value `0` like `system.one`. It can be used together with `_file/_path` virtual columns to list files in file/s3/url/hdfs/etc table functions without reading any data. [#53209](https://github.com/ClickHouse/ClickHouse/pull/53209) ([Kruglov Pavel](https://github.com/Avogar)). * Add tupleConcat function. Closes [#52759](https://github.com/ClickHouse/ClickHouse/issues/52759). [#53239](https://github.com/ClickHouse/ClickHouse/pull/53239) ([Nikolay Degterinsky](https://github.com/evillique)). * Support `TRUNCATE DATABASE` operation. [#53261](https://github.com/ClickHouse/ClickHouse/pull/53261) ([Bharat Nallan](https://github.com/bharatnc)). -* Add max_threads_for_indexes setting to limit number of threads used for primary key processing. [#53313](https://github.com/ClickHouse/ClickHouse/pull/53313) ([jorisgio](https://github.com/jorisgio)). 
+* Add max_threads_for_indexes setting to limit number of threads used for primary key processing. [#53313](https://github.com/ClickHouse/ClickHouse/pull/53313) ([Joris Giovannangeli](https://github.com/jorisgio)). * Add experimental support for HNSW as approximate neighbor search method. [#53447](https://github.com/ClickHouse/ClickHouse/pull/53447) ([Davit Vardanyan](https://github.com/davvard)). * Re-add SipHash keyed functions. [#53525](https://github.com/ClickHouse/ClickHouse/pull/53525) ([Salvatore Mesoraca](https://github.com/aiven-sal)). * ([#52755](https://github.com/ClickHouse/ClickHouse/issues/52755) , [#52895](https://github.com/ClickHouse/ClickHouse/issues/52895)) Added functions `arrayRotateLeft`, `arrayRotateRight`, `arrayShiftLeft`, `arrayShiftRight`. [#53557](https://github.com/ClickHouse/ClickHouse/pull/53557) ([Mikhail Koviazin](https://github.com/mkmkme)). @@ -72,7 +72,7 @@ sidebar_label: 2023 * Add ability to log when max_partitions_per_insert_block is reached ... [#50948](https://github.com/ClickHouse/ClickHouse/pull/50948) ([Sean Haynes](https://github.com/seandhaynes)). * Added a bunch of custom commands (mostly to make ClickHouse debugging easier). [#51117](https://github.com/ClickHouse/ClickHouse/pull/51117) ([pufit](https://github.com/pufit)). * Updated check for connection_string as connection string with sas does not always begin with DefaultEndPoint and updated connection url to include sas token after adding container to url. [#51141](https://github.com/ClickHouse/ClickHouse/pull/51141) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fix description for filtering sets in full_sorting_merge join. [#51329](https://github.com/ClickHouse/ClickHouse/pull/51329) ([Tanay Tummalapalli](https://github.com/ttanay)). +* Fix description for filtering sets in full_sorting_merge join. [#51329](https://github.com/ClickHouse/ClickHouse/pull/51329) ([ttanay](https://github.com/ttanay)). * The sizes of the (index) uncompressed/mark, mmap and query caches can now be configured dynamically at runtime. [#51446](https://github.com/ClickHouse/ClickHouse/pull/51446) ([Robert Schulze](https://github.com/rschu1ze)). * Fixed memory consumption in `Aggregator` when `max_block_size` is huge. [#51566](https://github.com/ClickHouse/ClickHouse/pull/51566) ([Nikita Taranov](https://github.com/nickitat)). * Add `SYSTEM SYNC FILESYSTEM CACHE` command. It will compare in-memory state of filesystem cache with what it has on disk and fix in-memory state if needed. [#51622](https://github.com/ClickHouse/ClickHouse/pull/51622) ([Kseniia Sumarokova](https://github.com/kssenii)). @@ -80,10 +80,10 @@ sidebar_label: 2023 * Support reading tuple subcolumns from file/s3/hdfs/url/azureBlobStorage table functions. [#51806](https://github.com/ClickHouse/ClickHouse/pull/51806) ([Kruglov Pavel](https://github.com/Avogar)). * Function `arrayIntersect` now returns the values sorted like the first argument. Closes [#27622](https://github.com/ClickHouse/ClickHouse/issues/27622). [#51850](https://github.com/ClickHouse/ClickHouse/pull/51850) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). * Add new queries, which allow to create/drop of access entities in specified access storage or move access entities from one access storage to another. [#51912](https://github.com/ClickHouse/ClickHouse/pull/51912) ([pufit](https://github.com/pufit)). -* ALTER TABLE FREEZE are not replicated in Replicated engine. [#52064](https://github.com/ClickHouse/ClickHouse/pull/52064) ([Mike Kot](https://github.com/myrrc)). 
+* ALTER TABLE FREEZE are not replicated in Replicated engine. [#52064](https://github.com/ClickHouse/ClickHouse/pull/52064) ([Mikhail Kot](https://github.com/myrrc)). * Added possibility to flush logs to the disk on crash - Added logs buffer configuration. [#52174](https://github.com/ClickHouse/ClickHouse/pull/52174) ([Alexey Gerasimchuck](https://github.com/Demilivor)). -* Fix S3 table function does not work for pre-signed URL. close [#50846](https://github.com/ClickHouse/ClickHouse/issues/50846). [#52310](https://github.com/ClickHouse/ClickHouse/pull/52310) ([chen](https://github.com/xiedeyantu)). -* System.events and system.metrics tables add column name as an alias to event and metric. close [#51257](https://github.com/ClickHouse/ClickHouse/issues/51257). [#52315](https://github.com/ClickHouse/ClickHouse/pull/52315) ([chen](https://github.com/xiedeyantu)). +* Fix S3 table function does not work for pre-signed URL. close [#50846](https://github.com/ClickHouse/ClickHouse/issues/50846). [#52310](https://github.com/ClickHouse/ClickHouse/pull/52310) ([Jensen](https://github.com/xiedeyantu)). +* System.events and system.metrics tables add column name as an alias to event and metric. close [#51257](https://github.com/ClickHouse/ClickHouse/issues/51257). [#52315](https://github.com/ClickHouse/ClickHouse/pull/52315) ([Jensen](https://github.com/xiedeyantu)). * Added support of syntax `CREATE UNIQUE INDEX` in parser for better SQL compatibility. `UNIQUE` index is not supported. Set `create_index_ignore_unique=1` to ignore UNIQUE keyword in queries. [#52320](https://github.com/ClickHouse/ClickHouse/pull/52320) ([Ilya Yatsishin](https://github.com/qoega)). * Add support of predefined macro (`{database}` and `{table}`) in some kafka engine settings: topic, consumer, client_id, etc. [#52386](https://github.com/ClickHouse/ClickHouse/pull/52386) ([Yury Bogomolov](https://github.com/ybogo)). * Disable updating fs cache during backup/restore. Filesystem cache must not be updated during backup/restore, it seems it just slows down the process without any profit (because the BACKUP command can read a lot of data and it's no use to put all the data to the filesystem cache and immediately evict it). [#52402](https://github.com/ClickHouse/ClickHouse/pull/52402) ([Vitaly Baranov](https://github.com/vitlibar)). @@ -107,7 +107,7 @@ sidebar_label: 2023 * Use the same default paths for `clickhouse_keeper` (symlink) as for `clickhouse_keeper` (executable). [#52861](https://github.com/ClickHouse/ClickHouse/pull/52861) ([Vitaly Baranov](https://github.com/vitlibar)). * CVE-2016-2183: disable 3DES. [#52893](https://github.com/ClickHouse/ClickHouse/pull/52893) ([Kenji Noguchi](https://github.com/knoguchi)). * Load filesystem cache metadata on startup in parallel. Configured by `load_metadata_threads` (default: 1) cache config setting. Related to [#52037](https://github.com/ClickHouse/ClickHouse/issues/52037). [#52943](https://github.com/ClickHouse/ClickHouse/pull/52943) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Improve error message for table function remote. Closes [#40220](https://github.com/ClickHouse/ClickHouse/issues/40220). [#52959](https://github.com/ClickHouse/ClickHouse/pull/52959) ([jiyoungyoooo](https://github.com/jiyoungyoooo)). +* Improve error message for table function remote. Closes [#40220](https://github.com/ClickHouse/ClickHouse/issues/40220). [#52959](https://github.com/ClickHouse/ClickHouse/pull/52959) ([Jiyoung Yoo](https://github.com/jiyoungyoooo)). 
* Added the possibility to specify custom storage policy in the `SETTINGS` clause of `RESTORE` queries. [#52970](https://github.com/ClickHouse/ClickHouse/pull/52970) ([Victor Krasnov](https://github.com/sirvickr)). * Add the ability to throttle the S3 requests on backup operations (`BACKUP` and `RESTORE` commands now honor `s3_max_[get/put]_[rps/burst]`). [#52974](https://github.com/ClickHouse/ClickHouse/pull/52974) ([Daniel Pozo Escalona](https://github.com/danipozo)). * Add settings to ignore ON CLUSTER clause in queries for management of replicated user-defined functions or access control entities with replicated storage. [#52975](https://github.com/ClickHouse/ClickHouse/pull/52975) ([Aleksei Filatov](https://github.com/aalexfvk)). @@ -127,7 +127,7 @@ sidebar_label: 2023 * Server settings asynchronous_metrics_update_period_s and asynchronous_heavy_metrics_update_period_s configured to 0 now fail gracefully instead of crash the server. [#53428](https://github.com/ClickHouse/ClickHouse/pull/53428) ([Robert Schulze](https://github.com/rschu1ze)). * Previously the caller could register the same watch callback multiple times. In that case each entry was consuming memory and the same callback was called multiple times which didn't make much sense. In order to avoid this the caller could have some logic to not add the same watch multiple times. With this change this deduplication is done internally if the watch callback is passed via shared_ptr. [#53452](https://github.com/ClickHouse/ClickHouse/pull/53452) ([Alexander Gololobov](https://github.com/davenger)). * The ClickHouse server now respects memory limits changed via cgroups when reloading its configuration. [#53455](https://github.com/ClickHouse/ClickHouse/pull/53455) ([Robert Schulze](https://github.com/rschu1ze)). -* Add ability to turn off flush of Distributed tables on `DETACH`/`DROP`/server shutdown. [#53501](https://github.com/ClickHouse/ClickHouse/pull/53501) ([Azat Khuzhin](https://github.com/azat)). +* Add ability to turn off flush of Distributed tables on `DETACH`/`DROP`/server shutdown (`flush_on_detach` setting for `Distributed`). [#53501](https://github.com/ClickHouse/ClickHouse/pull/53501) ([Azat Khuzhin](https://github.com/azat)). * Domainrfc support ipv6(ip literal within square brackets). [#53506](https://github.com/ClickHouse/ClickHouse/pull/53506) ([Chen768959](https://github.com/Chen768959)). * Use filter by file/path before reading in url/file/hdfs table functins. [#53529](https://github.com/ClickHouse/ClickHouse/pull/53529) ([Kruglov Pavel](https://github.com/Avogar)). * Use longer timeout for S3 CopyObject requests. [#53533](https://github.com/ClickHouse/ClickHouse/pull/53533) ([Michael Kolupaev](https://github.com/al13n321)). @@ -186,71 +186,71 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Do not reset Annoy index during build-up with > 1 mark [#51325](https://github.com/ClickHouse/ClickHouse/pull/51325) ([Tian Xinhui](https://github.com/xinhuitian)). -* Fix usage of temporary directories during RESTORE [#51493](https://github.com/ClickHouse/ClickHouse/pull/51493) ([Azat Khuzhin](https://github.com/azat)). -* Fix binary arithmetic for Nullable(IPv4) [#51642](https://github.com/ClickHouse/ClickHouse/pull/51642) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Support IPv4 and IPv6 as dictionary attributes [#51756](https://github.com/ClickHouse/ClickHouse/pull/51756) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). 
-* Bug fix for checksum of compress marks [#51777](https://github.com/ClickHouse/ClickHouse/pull/51777) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fix mistakenly comma parsing as part of datetime in CSV best effort parsing [#51950](https://github.com/ClickHouse/ClickHouse/pull/51950) ([Kruglov Pavel](https://github.com/Avogar)). -* Don't throw exception when exec udf has parameters [#51961](https://github.com/ClickHouse/ClickHouse/pull/51961) ([Nikita Taranov](https://github.com/nickitat)). -* Fix recalculation of skip indexes and projections in `ALTER DELETE` queries [#52530](https://github.com/ClickHouse/ClickHouse/pull/52530) ([Anton Popov](https://github.com/CurtizJ)). -* MaterializedMySQL: Fix the infinite loop in ReadBuffer::read [#52621](https://github.com/ClickHouse/ClickHouse/pull/52621) ([Val Doroshchuk](https://github.com/valbok)). -* Load suggestion only with `clickhouse` dialect [#52628](https://github.com/ClickHouse/ClickHouse/pull/52628) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* init and destroy ares channel on demand.. [#52634](https://github.com/ClickHouse/ClickHouse/pull/52634) ([Arthur Passos](https://github.com/arthurpassos)). -* RFC: Fix filtering by virtual columns with OR expression [#52653](https://github.com/ClickHouse/ClickHouse/pull/52653) ([Azat Khuzhin](https://github.com/azat)). -* Fix crash in function `tuple` with one sparse column argument [#52659](https://github.com/ClickHouse/ClickHouse/pull/52659) ([Anton Popov](https://github.com/CurtizJ)). -* Fix named collections on cluster 23.7 [#52687](https://github.com/ClickHouse/ClickHouse/pull/52687) ([Al Korgun](https://github.com/alkorgun)). -* Fix reading of unnecessary column in case of multistage `PREWHERE` [#52689](https://github.com/ClickHouse/ClickHouse/pull/52689) ([Anton Popov](https://github.com/CurtizJ)). -* Fix unexpected sort result on multi columns with nulls first direction [#52761](https://github.com/ClickHouse/ClickHouse/pull/52761) ([copperybean](https://github.com/copperybean)). -* Fix data race in Keeper reconfiguration [#52804](https://github.com/ClickHouse/ClickHouse/pull/52804) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix sorting of sparse columns with large limit [#52827](https://github.com/ClickHouse/ClickHouse/pull/52827) ([Anton Popov](https://github.com/CurtizJ)). -* clickhouse-keeper: fix implementation of server with poll() [#52833](https://github.com/ClickHouse/ClickHouse/pull/52833) ([Andy Fiddaman](https://github.com/citrus-it)). -* make regexp analyzer recognize named capturing groups [#52840](https://github.com/ClickHouse/ClickHouse/pull/52840) ([Han Fei](https://github.com/hanfei1991)). -* Fix possible assert in ~PushingAsyncPipelineExecutor in clickhouse-local [#52862](https://github.com/ClickHouse/ClickHouse/pull/52862) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix reading of empty `Nested(Array(LowCardinality(...)))` [#52949](https://github.com/ClickHouse/ClickHouse/pull/52949) ([Anton Popov](https://github.com/CurtizJ)). -* Added new tests for session_log and fixed the inconsistency between login and logout. [#52958](https://github.com/ClickHouse/ClickHouse/pull/52958) ([Alexey Gerasimchuck](https://github.com/Demilivor)). -* Fix password leak in show create mysql table [#52962](https://github.com/ClickHouse/ClickHouse/pull/52962) ([Duc Canh Le](https://github.com/canhld94)). 
-* Convert sparse to full in CreateSetAndFilterOnTheFlyStep [#53000](https://github.com/ClickHouse/ClickHouse/pull/53000) ([vdimir](https://github.com/vdimir)). -* Fix rare race condition with empty key prefix directory deletion in fs cache [#53055](https://github.com/ClickHouse/ClickHouse/pull/53055) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix ZstdDeflatingWriteBuffer truncating the output sometimes [#53064](https://github.com/ClickHouse/ClickHouse/pull/53064) ([Michael Kolupaev](https://github.com/al13n321)). -* Fix query_id in part_log with async flush queries [#53103](https://github.com/ClickHouse/ClickHouse/pull/53103) ([Raúl Marín](https://github.com/Algunenano)). -* Fix possible error from cache "Read unexpected size" [#53121](https://github.com/ClickHouse/ClickHouse/pull/53121) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Disable the new parquet encoder [#53130](https://github.com/ClickHouse/ClickHouse/pull/53130) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Not-ready Set [#53162](https://github.com/ClickHouse/ClickHouse/pull/53162) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix character escaping in the PostgreSQL engine [#53250](https://github.com/ClickHouse/ClickHouse/pull/53250) ([Nikolay Degterinsky](https://github.com/evillique)). -* #2 Added new tests for session_log and fixed the inconsistency between login and logout. [#53255](https://github.com/ClickHouse/ClickHouse/pull/53255) ([Alexey Gerasimchuck](https://github.com/Demilivor)). -* #3 Fixed inconsistency between login success and logout [#53302](https://github.com/ClickHouse/ClickHouse/pull/53302) ([Alexey Gerasimchuck](https://github.com/Demilivor)). -* Fix adding sub-second intervals to DateTime [#53309](https://github.com/ClickHouse/ClickHouse/pull/53309) ([Michael Kolupaev](https://github.com/al13n321)). -* Fix "Context has expired" error in dictionaries [#53342](https://github.com/ClickHouse/ClickHouse/pull/53342) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix incorrect normal projection AST format [#53347](https://github.com/ClickHouse/ClickHouse/pull/53347) ([Amos Bird](https://github.com/amosbird)). -* Forbid use_structure_from_insertion_table_in_table_functions when execute Scalar [#53348](https://github.com/ClickHouse/ClickHouse/pull/53348) ([flynn](https://github.com/ucasfl)). -* Fix loading lazy database during system.table select query [#53372](https://github.com/ClickHouse/ClickHouse/pull/53372) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fixed system.data_skipping_indices for MaterializedMySQL [#53381](https://github.com/ClickHouse/ClickHouse/pull/53381) ([Filipp Ozinov](https://github.com/bakwc)). -* Fix processing single carriage return in TSV file segmentation engine [#53407](https://github.com/ClickHouse/ClickHouse/pull/53407) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix 'Context has expired' error properly [#53433](https://github.com/ClickHouse/ClickHouse/pull/53433) ([Michael Kolupaev](https://github.com/al13n321)). -* Fix timeout_overflow_mode when having subquery in the rhs of IN [#53439](https://github.com/ClickHouse/ClickHouse/pull/53439) ([Duc Canh Le](https://github.com/canhld94)). -* Fix an unexpected behavior in [#53152](https://github.com/ClickHouse/ClickHouse/issues/53152) [#53440](https://github.com/ClickHouse/ClickHouse/pull/53440) ([Zhiguo Zhou](https://github.com/ZhiguoZh)). 
-* Fix JSON_QUERY Function parse error while path is all number [#53470](https://github.com/ClickHouse/ClickHouse/pull/53470) ([KevinyhZou](https://github.com/KevinyhZou)). -* Fix wrong columns order for queries with parallel FINAL. [#53489](https://github.com/ClickHouse/ClickHouse/pull/53489) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fixed SELECTing from ReplacingMergeTree with do_not_merge_across_partitions_select_final [#53511](https://github.com/ClickHouse/ClickHouse/pull/53511) ([Vasily Nemkov](https://github.com/Enmk)). -* bugfix: Flush async insert queue first on shutdown [#53547](https://github.com/ClickHouse/ClickHouse/pull/53547) ([joelynch](https://github.com/joelynch)). -* Fix crash in join on sparse column [#53548](https://github.com/ClickHouse/ClickHouse/pull/53548) ([vdimir](https://github.com/vdimir)). -* Fix possible UB in Set skipping index for functions with incorrect args [#53559](https://github.com/ClickHouse/ClickHouse/pull/53559) ([Azat Khuzhin](https://github.com/azat)). -* Fix possible UB in inverted indexes (experimental feature) [#53560](https://github.com/ClickHouse/ClickHouse/pull/53560) ([Azat Khuzhin](https://github.com/azat)). -* Fix: interpolate expression takes source column instead of same name aliased from select expression. [#53572](https://github.com/ClickHouse/ClickHouse/pull/53572) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix number of dropped granules in EXPLAIN PLAN index=1 [#53616](https://github.com/ClickHouse/ClickHouse/pull/53616) ([wangxiaobo](https://github.com/wzb5212)). -* Correctly handle totals and extremes with `DelayedSource` [#53644](https://github.com/ClickHouse/ClickHouse/pull/53644) ([Antonio Andelic](https://github.com/antonio2368)). -* Prepared set cache in mutation pipeline stuck [#53645](https://github.com/ClickHouse/ClickHouse/pull/53645) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix bug on mutations with subcolumns of type JSON in predicates of UPDATE and DELETE queries. [#53677](https://github.com/ClickHouse/ClickHouse/pull/53677) ([VanDarkholme7](https://github.com/VanDarkholme7)). -* Fix filter pushdown for full_sorting_merge join [#53699](https://github.com/ClickHouse/ClickHouse/pull/53699) ([vdimir](https://github.com/vdimir)). -* Try to fix bug with NULL::LowCardinality(Nullable(...)) NOT IN [#53706](https://github.com/ClickHouse/ClickHouse/pull/53706) ([Andrey Zvonov](https://github.com/zvonand)). -* Fix: sorted distinct with sparse columns [#53711](https://github.com/ClickHouse/ClickHouse/pull/53711) ([Igor Nikonov](https://github.com/devcrafter)). -* transform: correctly handle default column with multiple rows [#53742](https://github.com/ClickHouse/ClickHouse/pull/53742) ([Salvatore Mesoraca](https://github.com/aiven-sal)). -* Fix fuzzer crash in parseDateTime() [#53764](https://github.com/ClickHouse/ClickHouse/pull/53764) ([Robert Schulze](https://github.com/rschu1ze)). -* Materialized postgres: fix uncaught exception in getCreateTableQueryImpl [#53832](https://github.com/ClickHouse/ClickHouse/pull/53832) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix possible segfault while using PostgreSQL engine [#53847](https://github.com/ClickHouse/ClickHouse/pull/53847) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix named_collection_admin alias [#54066](https://github.com/ClickHouse/ClickHouse/pull/54066) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix rows_before_limit_at_least for DelayedSource. 
[#54122](https://github.com/ClickHouse/ClickHouse/pull/54122) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix results of queries utilizing the Annoy index when the part has more than one mark. [#51325](https://github.com/ClickHouse/ClickHouse/pull/51325) ([Tian Xinhui](https://github.com/xinhuitian)). +* Fix usage of temporary directories during RESTORE. [#51493](https://github.com/ClickHouse/ClickHouse/pull/51493) ([Azat Khuzhin](https://github.com/azat)). +* Fixed binary arithmetic for Nullable(IPv4). [#51642](https://github.com/ClickHouse/ClickHouse/pull/51642) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Support IPv4 and IPv6 as dictionary attributes. [#51756](https://github.com/ClickHouse/ClickHouse/pull/51756) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Updated checkDataPart to read compress marks as compressed file by checking its extension; resolves [#51337](https://github.com/ClickHouse/ClickHouse/issues/51337). [#51777](https://github.com/ClickHouse/ClickHouse/pull/51777) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Fix mistakenly comma parsing as part of datetime in CSV datetime best effort parsing. Closes [#51059](https://github.com/ClickHouse/ClickHouse/issues/51059). [#51950](https://github.com/ClickHouse/ClickHouse/pull/51950) ([Kruglov Pavel](https://github.com/Avogar)). +* Fixed exception when executable udf was provided with a parameter. [#51961](https://github.com/ClickHouse/ClickHouse/pull/51961) ([Nikita Taranov](https://github.com/nickitat)). +* Fixed recalculation of skip indexes and projections in `ALTER DELETE` queries. [#52530](https://github.com/ClickHouse/ClickHouse/pull/52530) ([Anton Popov](https://github.com/CurtizJ)). +* Fixed the infinite loop in ReadBuffer when the pos overflows the end of the buffer in MaterializedMySQL. [#52621](https://github.com/ClickHouse/ClickHouse/pull/52621) ([Val Doroshchuk](https://github.com/valbok)). +* Do not try to load suggestions in `clickhouse-local` when the dialect is not `clickhouse`. [#52628](https://github.com/ClickHouse/ClickHouse/pull/52628) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Remove mutex from CaresPTRResolver and create `ares_channel` on demand. Trying to fix: https://github.com/ClickHouse/ClickHouse/pull/52327#issuecomment-1643021543. [#52634](https://github.com/ClickHouse/ClickHouse/pull/52634) ([Arthur Passos](https://github.com/arthurpassos)). +* Fix filtering by virtual columns with OR expression (i.e. by `_table` for `Merge` engine). [#52653](https://github.com/ClickHouse/ClickHouse/pull/52653) ([Azat Khuzhin](https://github.com/azat)). +* Fix crash in function `tuple` with one sparse column argument. [#52659](https://github.com/ClickHouse/ClickHouse/pull/52659) ([Anton Popov](https://github.com/CurtizJ)). +* Fix named collections related statements: `if [not] exists`, `on cluster`. Closes [#51609](https://github.com/ClickHouse/ClickHouse/issues/51609). [#52687](https://github.com/ClickHouse/ClickHouse/pull/52687) ([Al Korgun](https://github.com/alkorgun)). +* Fix reading of unnecessary column in case of multistage `PREWHERE`. [#52689](https://github.com/ClickHouse/ClickHouse/pull/52689) ([Anton Popov](https://github.com/CurtizJ)). +* Fix unexpected sort result on multi columns with nulls first direction. [#52761](https://github.com/ClickHouse/ClickHouse/pull/52761) ([ZhiHong Zhang](https://github.com/copperybean)). +* Keeper fix: fix data race during reconfiguration.
[#52804](https://github.com/ClickHouse/ClickHouse/pull/52804) ([Antonio Andelic](https://github.com/antonio2368)). +* Fixed sorting of sparse columns in case of `ORDER BY ... LIMIT n` clause and large values of `n`. [#52827](https://github.com/ClickHouse/ClickHouse/pull/52827) ([Anton Popov](https://github.com/CurtizJ)). +* Keeper fix: platforms that used poll() would delay responding to requests until the client sent a heartbeat. [#52833](https://github.com/ClickHouse/ClickHouse/pull/52833) ([Andy Fiddaman](https://github.com/citrus-it)). +* Make regexp analyzer recognize named capturing groups. [#52840](https://github.com/ClickHouse/ClickHouse/pull/52840) ([Han Fei](https://github.com/hanfei1991)). +* Fix possible assert in ~PushingAsyncPipelineExecutor in clickhouse-local. [#52862](https://github.com/ClickHouse/ClickHouse/pull/52862) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix reading of empty `Nested(Array(LowCardinality(...)))` columns (added by `ALTER TABLE ... ADD COLUMN ...` query and not materialized in parts) from compact parts of `MergeTree` tables. [#52949](https://github.com/ClickHouse/ClickHouse/pull/52949) ([Anton Popov](https://github.com/CurtizJ)). +* Fixed the record inconsistency in session_log between login and logout. [#52958](https://github.com/ClickHouse/ClickHouse/pull/52958) ([Alexey Gerasimchuck](https://github.com/Demilivor)). +* Fix password leak in show create mysql table. [#52962](https://github.com/ClickHouse/ClickHouse/pull/52962) ([Duc Canh Le](https://github.com/canhld94)). +* Fix possible crash in full sorting merge join on sparse columns, close [#52978](https://github.com/ClickHouse/ClickHouse/issues/52978). [#53000](https://github.com/ClickHouse/ClickHouse/pull/53000) ([vdimir](https://github.com/vdimir)). +* Fix very rare race condition with empty key prefix directory deletion in fs cache. [#53055](https://github.com/ClickHouse/ClickHouse/pull/53055) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fixed `output_format_parquet_compression_method='zstd'` producing invalid Parquet files sometimes. In older versions, use setting `output_format_parquet_use_custom_encoder = 0` as a workaround. [#53064](https://github.com/ClickHouse/ClickHouse/pull/53064) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix query_id in part_log with async flush queries. [#53103](https://github.com/ClickHouse/ClickHouse/pull/53103) ([Raúl Marín](https://github.com/Algunenano)). +* Fix possible error from filesystem cache "Read unexpected size". [#53121](https://github.com/ClickHouse/ClickHouse/pull/53121) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Disable the new parquet encoder: it has a bug. [#53130](https://github.com/ClickHouse/ClickHouse/pull/53130) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* `Not-ready Set is passed as the second argument for function 'in'` could happen with limited `max_result_rows` and ` result_overflow_mode = 'break'`. [#53162](https://github.com/ClickHouse/ClickHouse/pull/53162) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix character escaping in the PostgreSQL engine (`\'` -> `''`, `\\` -> `\`). Closes [#49821](https://github.com/ClickHouse/ClickHouse/issues/49821). [#53250](https://github.com/ClickHouse/ClickHouse/pull/53250) ([Nikolay Degterinsky](https://github.com/evillique)). +* Fixed the record inconsistency in session_log between login and logout. [#53255](https://github.com/ClickHouse/ClickHouse/pull/53255) ([Alexey Gerasimchuck](https://github.com/Demilivor)). 
+* Fixed the record inconsistency in session_log between login and logout. [#53302](https://github.com/ClickHouse/ClickHouse/pull/53302) ([Alexey Gerasimchuck](https://github.com/Demilivor)). +* Fixed adding intervals of a fraction of a second to DateTime producing incorrect result. [#53309](https://github.com/ClickHouse/ClickHouse/pull/53309) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix the "Context has expired" error in dictionaries when using subqueries. [#53342](https://github.com/ClickHouse/ClickHouse/pull/53342) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix incorrect normal projection AST format when single function is used in ORDER BY. This fixes [#52607](https://github.com/ClickHouse/ClickHouse/issues/52607). [#53347](https://github.com/ClickHouse/ClickHouse/pull/53347) ([Amos Bird](https://github.com/amosbird)). +* Forbid `use_structure_from_insertion_table_in_table_functions` when executing scalar subqueries. Closes [#52494](https://github.com/ClickHouse/ClickHouse/issues/52494). [#53348](https://github.com/ClickHouse/ClickHouse/pull/53348) ([flynn](https://github.com/ucasfl)). +* Avoid loading tables from lazy database when not needed. Follow up to [#43840](https://github.com/ClickHouse/ClickHouse/issues/43840). [#53372](https://github.com/ClickHouse/ClickHouse/pull/53372) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Fixed `system.data_skipping_indices` columns `data_compressed_bytes` and `data_uncompressed_bytes` for MaterializedMySQL. [#53381](https://github.com/ClickHouse/ClickHouse/pull/53381) ([Filipp Ozinov](https://github.com/bakwc)). +* Fix processing single carriage return in TSV file segmentation engine that could lead to parsing errors. Closes [#53320](https://github.com/ClickHouse/ClickHouse/issues/53320). [#53407](https://github.com/ClickHouse/ClickHouse/pull/53407) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix the "Context has expired" error when using subqueries with functions `file()` (regular function, not table function), `joinGet()`, `joinGetOrNull()`, `connectionId()`. [#53433](https://github.com/ClickHouse/ClickHouse/pull/53433) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix timeout_overflow_mode when having subquery in the rhs of IN. [#53439](https://github.com/ClickHouse/ClickHouse/pull/53439) ([Duc Canh Le](https://github.com/canhld94)). +* This PR fixes [#53152](https://github.com/ClickHouse/ClickHouse/issues/53152). [#53440](https://github.com/ClickHouse/ClickHouse/pull/53440) ([Zhiguo Zhou](https://github.com/ZhiguoZh)). +* Fix the JSON_QUERY function failing to parse the JSON string when the path is numeric, like in the query SELECT JSON_QUERY('{"123":"abcd"}', '$.123'), where we would encounter the exception ``` DB::Exception: Unable to parse JSONPath: While processing JSON_QUERY('{"123":"acd"}', '$.123'). (BAD_ARGUMENTS) ```. [#53470](https://github.com/ClickHouse/ClickHouse/pull/53470) ([KevinyhZou](https://github.com/KevinyhZou)). +* Fix possible crash for queries with parallel `FINAL` where `ORDER BY` and `PRIMARY KEY` are different in table definition. [#53489](https://github.com/ClickHouse/ClickHouse/pull/53489) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fixed ReplacingMergeTree to properly process single-partition cases when `do_not_merge_across_partitions_select_final=1`. Previously `SELECT` could return rows that were marked as deleted. [#53511](https://github.com/ClickHouse/ClickHouse/pull/53511) ([Vasily Nemkov](https://github.com/Enmk)).
+* Fix bug in flushing of async insert queue on graceful shutdown. [#53547](https://github.com/ClickHouse/ClickHouse/pull/53547) ([joelynch](https://github.com/joelynch)). +* Fix crash in join on sparse column. [#53548](https://github.com/ClickHouse/ClickHouse/pull/53548) ([vdimir](https://github.com/vdimir)). +* Fix possible UB in Set skipping index for functions with incorrect args. [#53559](https://github.com/ClickHouse/ClickHouse/pull/53559) ([Azat Khuzhin](https://github.com/azat)). +* Fix possible UB in inverted indexes (experimental feature). [#53560](https://github.com/ClickHouse/ClickHouse/pull/53560) ([Azat Khuzhin](https://github.com/azat)). +* Fixed bug for interpolate when interpolated column is aliased with the same name as a source column. [#53572](https://github.com/ClickHouse/ClickHouse/pull/53572) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fixed a bug in EXPLAIN PLAN index=1 where the number of dropped granules was incorrect. [#53616](https://github.com/ClickHouse/ClickHouse/pull/53616) ([wangxiaobo](https://github.com/wzb5212)). +* Correctly handle totals and extremes when `DelayedSource` is used. [#53644](https://github.com/ClickHouse/ClickHouse/pull/53644) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix `Pipeline stuck` error in mutation with `IN (subquery WITH TOTALS)` where ready set was taken from cache. [#53645](https://github.com/ClickHouse/ClickHouse/pull/53645) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Allow to use JSON subcolumns in predicates of UPDATE and DELETE queries. [#53677](https://github.com/ClickHouse/ClickHouse/pull/53677) ([zps](https://github.com/VanDarkholme7)). +* Fix possible logical error exception during filter pushdown for full_sorting_merge join. [#53699](https://github.com/ClickHouse/ClickHouse/pull/53699) ([vdimir](https://github.com/vdimir)). +* Fix NULL::LowCardinality(Nullable(...)) with IN. [#53706](https://github.com/ClickHouse/ClickHouse/pull/53706) ([Andrey Zvonov](https://github.com/zvonand)). +* Fixes possible crashes in `DISTINCT` queries with enabled `optimize_distinct_in_order` and sparse columns. [#53711](https://github.com/ClickHouse/ClickHouse/pull/53711) ([Igor Nikonov](https://github.com/devcrafter)). +* Correctly handle default column with multiple rows in transform. [#53742](https://github.com/ClickHouse/ClickHouse/pull/53742) ([Salvatore Mesoraca](https://github.com/aiven-sal)). +* Fix crash in SQL function parseDateTime() with non-const timezone argument. [#53764](https://github.com/ClickHouse/ClickHouse/pull/53764) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix uncaught exception in `getCreateTableQueryImpl`. [#53832](https://github.com/ClickHouse/ClickHouse/pull/53832) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix possible segfault while using PostgreSQL engine. Closes [#36919](https://github.com/ClickHouse/ClickHouse/issues/36919). [#53847](https://github.com/ClickHouse/ClickHouse/pull/53847) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix `named_collection_admin` alias to `named_collection_control` not working from config. [#54066](https://github.com/ClickHouse/ClickHouse/pull/54066) ([Kseniia Sumarokova](https://github.com/kssenii)). +* A distributed query could miss `rows_before_limit_at_least` in the query result in case it was executed on a replica with a delay more than `max_replica_delay_for_distributed_queries`. 
[#54122](https://github.com/ClickHouse/ClickHouse/pull/54122) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). #### NO CL ENTRY @@ -272,7 +272,7 @@ sidebar_label: 2023 * Add more checks into ThreadStatus ctor. [#42019](https://github.com/ClickHouse/ClickHouse/pull/42019) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Refactor Query Tree visitor [#46740](https://github.com/ClickHouse/ClickHouse/pull/46740) ([Dmitry Novik](https://github.com/novikd)). * Revert "Revert "Randomize JIT settings in tests"" [#48282](https://github.com/ClickHouse/ClickHouse/pull/48282) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix outdated cache configuration in s3 tests: s3_storage_policy_by_defau... [#48424](https://github.com/ClickHouse/ClickHouse/pull/48424) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix outdated cache configuration in s3 tests: s3_storage_policy_by_defau… [#48424](https://github.com/ClickHouse/ClickHouse/pull/48424) ([Kseniia Sumarokova](https://github.com/kssenii)). * Fix IN with decimal in analyzer [#48754](https://github.com/ClickHouse/ClickHouse/pull/48754) ([vdimir](https://github.com/vdimir)). * Some unclear change in StorageBuffer::reschedule() for something [#49723](https://github.com/ClickHouse/ClickHouse/pull/49723) ([DimasKovas](https://github.com/DimasKovas)). * MergeTree & SipHash checksum big-endian support [#50276](https://github.com/ClickHouse/ClickHouse/pull/50276) ([ltrk2](https://github.com/ltrk2)). @@ -540,7 +540,7 @@ sidebar_label: 2023 * Do not warn about arch_sys_counter clock [#53739](https://github.com/ClickHouse/ClickHouse/pull/53739) ([Artur Malchanau](https://github.com/Hexta)). * Add some profile events [#53741](https://github.com/ClickHouse/ClickHouse/pull/53741) ([Kseniia Sumarokova](https://github.com/kssenii)). * Support clang-18 (Wmissing-field-initializers) [#53751](https://github.com/ClickHouse/ClickHouse/pull/53751) ([Raúl Marín](https://github.com/Algunenano)). -* Upgrade openSSL to v3.0.10 [#53756](https://github.com/ClickHouse/ClickHouse/pull/53756) ([bhavnajindal](https://github.com/bhavnajindal)). +* Upgrade openSSL to v3.0.10 [#53756](https://github.com/ClickHouse/ClickHouse/pull/53756) ([Bhavna Jindal](https://github.com/bhavnajindal)). * Improve JSON-handling on s390x [#53760](https://github.com/ClickHouse/ClickHouse/pull/53760) ([ltrk2](https://github.com/ltrk2)). * Reduce API calls to SSM client [#53762](https://github.com/ClickHouse/ClickHouse/pull/53762) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). * Remove branch references from .gitmodules [#53763](https://github.com/ClickHouse/ClickHouse/pull/53763) ([Robert Schulze](https://github.com/rschu1ze)). @@ -588,3 +588,4 @@ sidebar_label: 2023 * tests: mark 02152_http_external_tables_memory_tracking as no-parallel [#54155](https://github.com/ClickHouse/ClickHouse/pull/54155) ([Azat Khuzhin](https://github.com/azat)). * The external logs have had colliding arguments [#54165](https://github.com/ClickHouse/ClickHouse/pull/54165) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). * Rename macro [#54169](https://github.com/ClickHouse/ClickHouse/pull/54169) ([Kseniia Sumarokova](https://github.com/kssenii)). 
+ diff --git a/docs/changelogs/v23.8.10.43-lts.md b/docs/changelogs/v23.8.10.43-lts.md index 0093467d129..0750901da8a 100644 --- a/docs/changelogs/v23.8.10.43-lts.md +++ b/docs/changelogs/v23.8.10.43-lts.md @@ -16,17 +16,17 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Background merges correctly use temporary data storage in the cache [#57275](https://github.com/ClickHouse/ClickHouse/pull/57275) ([vdimir](https://github.com/vdimir)). -* MergeTree mutations reuse source part index granularity [#57352](https://github.com/ClickHouse/ClickHouse/pull/57352) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix double destroy call on exception throw in addBatchLookupTable8 [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)). -* Fix JSONExtract function for LowCardinality(Nullable) columns [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)). -* Fix: LIMIT BY and LIMIT in distributed query [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)). -* Fix translate() with FixedString input [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). -* Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). -* Fix leftPad / rightPad function with FixedString input [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). -* rabbitmq: fix having neither acked nor nacked messages [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix cosineDistance crash with Nullable [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#57565](https://github.com/ClickHouse/ClickHouse/issues/57565): Background merges correctly use temporary data storage in the cache. [#57275](https://github.com/ClickHouse/ClickHouse/pull/57275) ([vdimir](https://github.com/vdimir)). +* Backported in [#57476](https://github.com/ClickHouse/ClickHouse/issues/57476): Fix possible broken skipping indexes after materialization in MergeTree compact parts. [#57352](https://github.com/ClickHouse/ClickHouse/pull/57352) ([Maksim Kita](https://github.com/kitaisreal)). +* Backported in [#58777](https://github.com/ClickHouse/ClickHouse/issues/58777): Fix double destroy call on exception throw in addBatchLookupTable8. [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#58856](https://github.com/ClickHouse/ClickHouse/issues/58856): Fix possible crash in JSONExtract function extracting `LowCardinality(Nullable(T))` type. [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)). +* Backported in [#59194](https://github.com/ClickHouse/ClickHouse/issues/59194): The combination of LIMIT BY and LIMIT could produce an incorrect result in distributed queries (parallel replicas included). [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)). 
+* Backported in [#59429](https://github.com/ClickHouse/ClickHouse/issues/59429): Fix translate() with FixedString input. Could lead to crashes as it'd return a String column (vs the expected FixedString). This issue was found through the ClickHouse Bug Bounty Program by YohannJardin. [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60128](https://github.com/ClickHouse/ClickHouse/issues/60128): Fix error `Read beyond last offset` for `AsynchronousBoundedReadBuffer`. [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)). +* Backported in [#59836](https://github.com/ClickHouse/ClickHouse/issues/59836): Fix query start time on non initial queries. [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59758](https://github.com/ClickHouse/ClickHouse/issues/59758): Fix leftPad / rightPad function with FixedString input. [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60304](https://github.com/ClickHouse/ClickHouse/issues/60304): Fix having neither acked nor nacked messages. If an exception happens during the read-write phase, messages will be nacked. [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#60171](https://github.com/ClickHouse/ClickHouse/issues/60171): Fix cosineDistance crash with Nullable. [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.11.28-lts.md b/docs/changelogs/v23.8.11.28-lts.md index acc284caa72..3da3d10cfa5 100644 --- a/docs/changelogs/v23.8.11.28-lts.md +++ b/docs/changelogs/v23.8.11.28-lts.md @@ -12,11 +12,11 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). -* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#60983](https://github.com/ClickHouse/ClickHouse/issues/60983): Fix buffer overflow that can happen if the attacker asks the HTTP server to decompress data with a composition of codecs and size triggering numeric overflow. Fix buffer overflow that can happen inside codec NONE on wrong input data. This was submitted by the TIANGONG research team through our [Bug Bounty program](https://github.com/ClickHouse/ClickHouse/issues/38986). [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#60986](https://github.com/ClickHouse/ClickHouse/issues/60986): Functions for SQL/JSON were able to read uninitialized memory.
This closes [#60017](https://github.com/ClickHouse/ClickHouse/issues/60017). Found by Fuzzer. [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#60816](https://github.com/ClickHouse/ClickHouse/issues/60816): Fix crash in arrayEnumerateRanked. [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60837](https://github.com/ClickHouse/ClickHouse/issues/60837): Fix crash when using input() in INSERT SELECT JOIN. Closes [#60035](https://github.com/ClickHouse/ClickHouse/issues/60035). [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#60911](https://github.com/ClickHouse/ClickHouse/issues/60911): Avoid segfault if too many keys are skipped when reading from S3. [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). #### NO CL ENTRY diff --git a/docs/changelogs/v23.8.12.13-lts.md b/docs/changelogs/v23.8.12.13-lts.md index dbb36fdc00e..0329d4349f3 100644 --- a/docs/changelogs/v23.8.12.13-lts.md +++ b/docs/changelogs/v23.8.12.13-lts.md @@ -9,9 +9,9 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Improve isolation of query cache entries under re-created users or role switches [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). +* Backported in [#61439](https://github.com/ClickHouse/ClickHouse/issues/61439): The query cache now denies access to entries when the user is re-created or assumes another role. This prevents attacks where 1. a user with the same name as a dropped user may access the old user's cache entries or 2. a user with a different role may access cache entries of a role with a different row policy. [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)). +* Backported in [#61572](https://github.com/ClickHouse/ClickHouse/issues/61572): Fix string search with constant start position which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61854](https://github.com/ClickHouse/ClickHouse/issues/61854): Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
#### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v23.8.13.25-lts.md b/docs/changelogs/v23.8.13.25-lts.md index 3452621556a..e9c6e2e9f28 100644 --- a/docs/changelogs/v23.8.13.25-lts.md +++ b/docs/changelogs/v23.8.13.25-lts.md @@ -15,11 +15,11 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix REPLACE/MOVE PARTITION with zero-copy replication [#54193](https://github.com/ClickHouse/ClickHouse/pull/54193) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). -* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#62898](https://github.com/ClickHouse/ClickHouse/issues/62898): Fixed a bug in zero-copy replication (an experimental feature) that could cause `The specified key does not exist` errors and data loss after REPLACE/MOVE PARTITION. A similar issue might happen with TTL-moves between disks. [#54193](https://github.com/ClickHouse/ClickHouse/pull/54193) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Backported in [#61964](https://github.com/ClickHouse/ClickHouse/issues/61964): Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#62527](https://github.com/ClickHouse/ClickHouse/issues/62527): Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#62238](https://github.com/ClickHouse/ClickHouse/issues/62238): Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#62673](https://github.com/ClickHouse/ClickHouse/issues/62673): Fix segmentation fault when using Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
#### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v23.8.14.6-lts.md b/docs/changelogs/v23.8.14.6-lts.md index 0053502a9dc..3236c931e51 100644 --- a/docs/changelogs/v23.8.14.6-lts.md +++ b/docs/changelogs/v23.8.14.6-lts.md @@ -9,6 +9,6 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). -* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). +* Backported in [#63172](https://github.com/ClickHouse/ClickHouse/issues/63172): Setting server_name might help with recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). +* Backported in [#63164](https://github.com/ClickHouse/ClickHouse/issues/63164): The wire protocol version check for MongoDB used to try accessing "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by user. [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). diff --git a/docs/changelogs/v23.8.2.7-lts.md b/docs/changelogs/v23.8.2.7-lts.md index 317e2c6d56a..a6f74e7998c 100644 --- a/docs/changelogs/v23.8.2.7-lts.md +++ b/docs/changelogs/v23.8.2.7-lts.md @@ -9,8 +9,8 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix: parallel replicas over distributed don't read from all replicas [#54199](https://github.com/ClickHouse/ClickHouse/pull/54199) ([Igor Nikonov](https://github.com/devcrafter)). -* Fix: allow IPv6 for bloom filter [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Backported in [#54209](https://github.com/ClickHouse/ClickHouse/issues/54209): Parallel reading from replicas over Distributed table was using only one replica per shard. [#54199](https://github.com/ClickHouse/ClickHouse/pull/54199) ([Igor Nikonov](https://github.com/devcrafter)). +* Backported in [#54233](https://github.com/ClickHouse/ClickHouse/issues/54233): Allow IPv6 for bloom filter, backward compatibility issue. [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.3.48-lts.md b/docs/changelogs/v23.8.3.48-lts.md index af669c5adc8..91514f48a25 100644 --- a/docs/changelogs/v23.8.3.48-lts.md +++ b/docs/changelogs/v23.8.3.48-lts.md @@ -18,19 +18,19 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix: moved to prewhere condition actions can lose column [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix: parallel replicas over distributed with prefer_localhost_replica=1 [#54334](https://github.com/ClickHouse/ClickHouse/pull/54334) ([Igor Nikonov](https://github.com/devcrafter)). 
-* Fix possible error 'URI contains invalid characters' in s3 table function [#54373](https://github.com/ClickHouse/ClickHouse/pull/54373) ([Kruglov Pavel](https://github.com/Avogar)). -* Check for overflow before addition in `analysisOfVariance` function [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)). -* reproduce and fix the bug in removeSharedRecursive [#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)). -* Fix aggregate projections with normalized states [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)). -* Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix zero copy garbage [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Fix race in `ColumnUnique` [#54575](https://github.com/ClickHouse/ClickHouse/pull/54575) ([Nikita Taranov](https://github.com/nickitat)). -* Fix serialization of `ColumnDecimal` [#54601](https://github.com/ClickHouse/ClickHouse/pull/54601) ([Nikita Taranov](https://github.com/nickitat)). -* Fix virtual columns having incorrect values after ORDER BY [#54811](https://github.com/ClickHouse/ClickHouse/pull/54811) ([Michael Kolupaev](https://github.com/al13n321)). -* Fix Keeper segfault during shutdown [#54841](https://github.com/ClickHouse/ClickHouse/pull/54841) ([Antonio Andelic](https://github.com/antonio2368)). -* Rebuild minmax_count_projection when partition key gets modified [#54943](https://github.com/ClickHouse/ClickHouse/pull/54943) ([Amos Bird](https://github.com/amosbird)). +* Backported in [#54974](https://github.com/ClickHouse/ClickHouse/issues/54974): Fixed issue when during prewhere optimization compound condition actions DAG can lose output column of intermediate step while this column is required as an input column of some next step. [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Backported in [#54996](https://github.com/ClickHouse/ClickHouse/issues/54996): Parallel replicas either executed completely on the local replica or produce an incorrect result when `prefer_localhost_replica=1`. Fixes [#54276](https://github.com/ClickHouse/ClickHouse/issues/54276). [#54334](https://github.com/ClickHouse/ClickHouse/pull/54334) ([Igor Nikonov](https://github.com/devcrafter)). +* Backported in [#54516](https://github.com/ClickHouse/ClickHouse/issues/54516): Fix possible error 'URI contains invalid characters' in s3 table function. Closes [#54345](https://github.com/ClickHouse/ClickHouse/issues/54345). [#54373](https://github.com/ClickHouse/ClickHouse/pull/54373) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#54418](https://github.com/ClickHouse/ClickHouse/issues/54418): Check for overflow when handling group number argument for `analysisOfVariance` to avoid crashes. Crash found using WINGFUZZ. [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#54527](https://github.com/ClickHouse/ClickHouse/issues/54527): Reproduce the bug described here [#54135](https://github.com/ClickHouse/ClickHouse/issues/54135). 
[#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)). +* Backported in [#54854](https://github.com/ClickHouse/ClickHouse/issues/54854): Fix incorrect aggregation projection optimization when using variant aggregate states. This optimization is accidentally enabled but not properly implemented, because after https://github.com/ClickHouse/ClickHouse/pull/39420 the comparison of DataTypeAggregateFunction is normalized. This fixes [#54406](https://github.com/ClickHouse/ClickHouse/issues/54406). [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)). +* Backported in [#54599](https://github.com/ClickHouse/ClickHouse/issues/54599): Fix parsing error in WithNames formats while reading subset of columns with disabled input_format_with_names_use_header. Closes [#52591](https://github.com/ClickHouse/ClickHouse/issues/52591). [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#54594](https://github.com/ClickHouse/ClickHouse/issues/54594): Starting from version 23.5, zero-copy replication could leave some garbage in ZooKeeper and on S3. It might happen on removal of Outdated parts that were mutated. The issue is indicated by `Failed to get mutation parent on {} for part {}, refusing to remove blobs` log messages. [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Backported in [#54627](https://github.com/ClickHouse/ClickHouse/issues/54627): Fix unsynchronised write to a shared variable in `ColumnUnique`. [#54575](https://github.com/ClickHouse/ClickHouse/pull/54575) ([Nikita Taranov](https://github.com/nickitat)). +* Backported in [#54625](https://github.com/ClickHouse/ClickHouse/issues/54625): Fix serialization of `ColumnDecimal`. [#54601](https://github.com/ClickHouse/ClickHouse/pull/54601) ([Nikita Taranov](https://github.com/nickitat)). +* Backported in [#54945](https://github.com/ClickHouse/ClickHouse/issues/54945): Fixed virtual columns (e.g. _file) showing incorrect values with ORDER BY. [#54811](https://github.com/ClickHouse/ClickHouse/pull/54811) ([Michael Kolupaev](https://github.com/al13n321)). +* Backported in [#54872](https://github.com/ClickHouse/ClickHouse/issues/54872): Keeper fix: correctly capture a variable in callback to avoid segfaults during shutdown. [#54841](https://github.com/ClickHouse/ClickHouse/pull/54841) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#54950](https://github.com/ClickHouse/ClickHouse/issues/54950): Fix projection optimization error if table's partition key was ALTERed by extending its Enum type. The fix is to rebuild `minmax_count_projection` when partition key gets modified. This fixes [#54941](https://github.com/ClickHouse/ClickHouse/issues/54941). [#54943](https://github.com/ClickHouse/ClickHouse/pull/54943) ([Amos Bird](https://github.com/amosbird)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.4.69-lts.md b/docs/changelogs/v23.8.4.69-lts.md index 065a57549be..a6d8d8bb03b 100644 --- a/docs/changelogs/v23.8.4.69-lts.md +++ b/docs/changelogs/v23.8.4.69-lts.md @@ -11,26 +11,26 @@ sidebar_label: 2023 * Backported in [#55673](https://github.com/ClickHouse/ClickHouse/issues/55673): If the database is already initialized, it doesn't need to be initialized again upon subsequent launches. 
This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)). * Backported in [#55293](https://github.com/ClickHouse/ClickHouse/issues/55293): Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checkouting submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)). * Backported in [#55366](https://github.com/ClickHouse/ClickHouse/issues/55366): Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Backported in [#55725](https://github.com/ClickHouse/ClickHouse/issues/55725): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)). +* Backported in [#55725](https://github.com/ClickHouse/ClickHouse/issues/55725): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/maxknv)). #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Move obsolete format settings to separate section [#54855](https://github.com/ClickHouse/ClickHouse/pull/54855) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix: insert quorum w/o keeper retries [#55026](https://github.com/ClickHouse/ClickHouse/pull/55026) ([Igor Nikonov](https://github.com/devcrafter)). -* Prevent attaching parts from tables with different projections or indices [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)). -* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)). -* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)). -* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)). -* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)). 
-* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)). -* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)). -* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)). -* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#55304](https://github.com/ClickHouse/ClickHouse/issues/55304): Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL (which could happen with PostgreSQL version >= 13). [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#55018](https://github.com/ClickHouse/ClickHouse/issues/55018): Move obsolete format settings to a separate section and use them together with all format settings to avoid `Unknown setting` exceptions when obsolete format settings are used. Closes [#54792](https://github.com/ClickHouse/ClickHouse/issues/54792). [#54855](https://github.com/ClickHouse/ClickHouse/pull/54855) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#55097](https://github.com/ClickHouse/ClickHouse/issues/55097): Insert quorum could be marked as satisfied incorrectly in case of keeper retries while waiting for the quorum. Fixes [#54543](https://github.com/ClickHouse/ClickHouse/issues/54543). [#55026](https://github.com/ClickHouse/ClickHouse/pull/55026) ([Igor Nikonov](https://github.com/devcrafter)). +* Backported in [#55473](https://github.com/ClickHouse/ClickHouse/issues/55473): Prevent attaching partitions from tables that don't have the same indices or projections defined. [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#55461](https://github.com/ClickHouse/ClickHouse/issues/55461): If an exception happens in `ShellCommandSource` constructor after some of the `send_data_threads` are started, they need to be join()-ed, otherwise abort() will be triggered in `ThreadFromGlobalPool` destructor. Fixes [#55091](https://github.com/ClickHouse/ClickHouse/issues/55091). [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)). +* Backported in [#55412](https://github.com/ClickHouse/ClickHouse/issues/55412): Fix deadlock in LDAP assigned role update for non-existing ClickHouse roles. [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)). +* Backported in [#55323](https://github.com/ClickHouse/ClickHouse/issues/55323): Fix for background download in fs cache. [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#55349](https://github.com/ClickHouse/ClickHouse/issues/55349): Fix functions execution over sparse columns (fixes `DB::Exception: isDefaultAt is not implemented for Function: while executing 'FUNCTION Capture` error).
[#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#55475](https://github.com/ClickHouse/ClickHouse/issues/55475): Fix an issue with inability to drop detached partition in `ReplicatedMergeTree` engines family on top of S3 (without zero-copy replication). Fixes issue [#55225](https://github.com/ClickHouse/ClickHouse/issues/55225). Fix bug with abandoned blobs on S3 for complex data types like Arrays or Nested columns. Partially fixes [#52393](https://github.com/ClickHouse/ClickHouse/issues/52393). Many kudos to @alifirat for examples. [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)). +* Backported in [#55399](https://github.com/ClickHouse/ClickHouse/issues/55399): An optimization introduced one year ago was wrong. This closes [#55272](https://github.com/ClickHouse/ClickHouse/issues/55272). [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#55437](https://github.com/ClickHouse/ClickHouse/issues/55437): Fix parsing of arrays in cast operator (`::`). [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#55635](https://github.com/ClickHouse/ClickHouse/issues/55635): Fix filtering by virtual columns with OR filter in query (`_part*` filtering for `MergeTree`, `_path`/`_file` for various `File`/`HDFS`/... engines, `_table` for `Merge`). [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#55445](https://github.com/ClickHouse/ClickHouse/issues/55445): Fix connection issues that occurred with some versions of MongoDB. Closes [#55376](https://github.com/ClickHouse/ClickHouse/issues/55376), [#55232](https://github.com/ClickHouse/ClickHouse/issues/55232). [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#55534](https://github.com/ClickHouse/ClickHouse/issues/55534): Fix possible deadlock caused by not destroyed fiber in case of exception in async task cancellation. Closes [#55185](https://github.com/ClickHouse/ClickHouse/issues/55185). [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#55747](https://github.com/ClickHouse/ClickHouse/issues/55747): Fix crash in QueryNormalizer with cyclic aliases. [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)). +* Backported in [#55760](https://github.com/ClickHouse/ClickHouse/issues/55760): Fix filtering by virtual columns with OR filter in query (_part* filtering for MergeTree, _path/_file for various File/HDFS/... engines, _table for Merge). [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)). #### NO CL CATEGORY @@ -46,6 +46,6 @@ sidebar_label: 2023 * Clean data dir and always start an old server version in aggregate functions compatibility test. [#55105](https://github.com/ClickHouse/ClickHouse/pull/55105) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)). 
* MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)). +* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mikhail Kot](https://github.com/myrrc)). * Bump curl to 8.4 [#55492](https://github.com/ClickHouse/ClickHouse/pull/55492) ([Robert Schulze](https://github.com/rschu1ze)). diff --git a/docs/changelogs/v23.8.5.16-lts.md b/docs/changelogs/v23.8.5.16-lts.md index 4a23b8892be..32ddbd6031d 100644 --- a/docs/changelogs/v23.8.5.16-lts.md +++ b/docs/changelogs/v23.8.5.16-lts.md @@ -12,9 +12,9 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#55736](https://github.com/ClickHouse/ClickHouse/issues/55736): Fix iceberg metadata parsing - delete files were not checked. [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#55969](https://github.com/ClickHouse/ClickHouse/issues/55969): Try to fix possible segfault in Native ORC input format. Closes [#55873](https://github.com/ClickHouse/ClickHouse/issues/55873). [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#55907](https://github.com/ClickHouse/ClickHouse/issues/55907): Fix window functions in case of sparse columns. Previously some queries with window functions returned invalid results or made ClickHouse crash when the columns were sparse. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.6.16-lts.md b/docs/changelogs/v23.8.6.16-lts.md index 6eb752e987c..df6c03cd668 100644 --- a/docs/changelogs/v23.8.6.16-lts.md +++ b/docs/changelogs/v23.8.6.16-lts.md @@ -9,11 +9,11 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix rare case of CHECKSUM_DOESNT_MATCH error [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)). -* Fix: avoid using regex match, possibly containing alternation, as a key condition. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix a crash during table loading on startup [#56232](https://github.com/ClickHouse/ClickHouse/pull/56232) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix segfault in signal handler for Keeper [#56266](https://github.com/ClickHouse/ClickHouse/pull/56266) ([Antonio Andelic](https://github.com/antonio2368)). 
-* Fix buffer overflow in T64 [#56434](https://github.com/ClickHouse/ClickHouse/pull/56434) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#54583](https://github.com/ClickHouse/ClickHouse/issues/54583): Fix a rare bug in replicated merge tree which could lead to a self-recovering `CHECKSUM_DOESNT_MATCH` error in logs. [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)). +* Backported in [#56253](https://github.com/ClickHouse/ClickHouse/issues/56253): Fixed a bug where the match() function (regex) with a pattern containing alternation produced an incorrect key condition. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Backported in [#56322](https://github.com/ClickHouse/ClickHouse/issues/56322): Fix a crash during table loading on startup. Closes [#55767](https://github.com/ClickHouse/ClickHouse/issues/55767). [#56232](https://github.com/ClickHouse/ClickHouse/pull/56232) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#56292](https://github.com/ClickHouse/ClickHouse/issues/56292): Fix segfault in signal handler for Keeper. [#56266](https://github.com/ClickHouse/ClickHouse/pull/56266) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#56443](https://github.com/ClickHouse/ClickHouse/issues/56443): Fix crash due to buffer overflow while decompressing malformed data using `T64` codec. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#56434](https://github.com/ClickHouse/ClickHouse/pull/56434) ([Alexey Milovidov](https://github.com/alexey-milovidov)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.7.24-lts.md b/docs/changelogs/v23.8.7.24-lts.md index 37862c17315..042484e2404 100644 --- a/docs/changelogs/v23.8.7.24-lts.md +++ b/docs/changelogs/v23.8.7.24-lts.md @@ -12,12 +12,12 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Select from system tables when table based on table function. [#55540](https://github.com/ClickHouse/ClickHouse/pull/55540) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). -* Fix incomplete query result for UNION in view() function. [#56274](https://github.com/ClickHouse/ClickHouse/pull/56274) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix crash in case of adding a column with type Object(JSON) [#56307](https://github.com/ClickHouse/ClickHouse/pull/56307) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). -* Fix segfault during Kerberos initialization [#56401](https://github.com/ClickHouse/ClickHouse/pull/56401) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix: RabbitMQ OpenSSL dynamic loading issue [#56703](https://github.com/ClickHouse/ClickHouse/pull/56703) ([Igor Nikonov](https://github.com/devcrafter)). -* Fix crash in FPC codec [#56795](https://github.com/ClickHouse/ClickHouse/pull/56795) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#56581](https://github.com/ClickHouse/ClickHouse/issues/56581): Prevent reference to a remote data source for the `data_paths` column in `system.tables` if the table is created with a table function using explicit column description. [#55540](https://github.com/ClickHouse/ClickHouse/pull/55540) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
+* Backported in [#56877](https://github.com/ClickHouse/ClickHouse/issues/56877): Fix incomplete query result for `UNION` in `view()` table function. [#56274](https://github.com/ClickHouse/ClickHouse/pull/56274) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#56409](https://github.com/ClickHouse/ClickHouse/issues/56409): Prohibit adding a column with type `Object(JSON)` to an existing table. This closes: [#56095](https://github.com/ClickHouse/ClickHouse/issues/56095) This closes: [#49944](https://github.com/ClickHouse/ClickHouse/issues/49944). [#56307](https://github.com/ClickHouse/ClickHouse/pull/56307) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Backported in [#56756](https://github.com/ClickHouse/ClickHouse/issues/56756): Fix a segfault caused by a thrown exception in Kerberos initialization during the creation of the Kafka table. Closes [#56073](https://github.com/ClickHouse/ClickHouse/issues/56073). [#56401](https://github.com/ClickHouse/ClickHouse/pull/56401) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#56748](https://github.com/ClickHouse/ClickHouse/issues/56748): Fixed the issue that the RabbitMQ table engine wasn't able to connect to RabbitMQ over a secure connection. [#56703](https://github.com/ClickHouse/ClickHouse/pull/56703) ([Igor Nikonov](https://github.com/devcrafter)). +* Backported in [#56839](https://github.com/ClickHouse/ClickHouse/issues/56839): The server crashed when decompressing malformed data using the `FPC` codec. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#56795](https://github.com/ClickHouse/ClickHouse/pull/56795) ([Alexey Milovidov](https://github.com/alexey-milovidov)). #### NO CL CATEGORY diff --git a/docs/changelogs/v23.8.8.20-lts.md b/docs/changelogs/v23.8.8.20-lts.md index 345cfcccf17..f45498cb61f 100644 --- a/docs/changelogs/v23.8.8.20-lts.md +++ b/docs/changelogs/v23.8.8.20-lts.md @@ -16,9 +16,9 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix ON CLUSTER queries without database on initial node [#56484](https://github.com/ClickHouse/ClickHouse/pull/56484) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix buffer overflow in Gorilla codec [#57107](https://github.com/ClickHouse/ClickHouse/pull/57107) ([Nikolay Degterinsky](https://github.com/evillique)). -* Close interserver connection on any exception before authentication [#57142](https://github.com/ClickHouse/ClickHouse/pull/57142) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#57111](https://github.com/ClickHouse/ClickHouse/issues/57111): Fix ON CLUSTER queries without the database being present on an initial node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#56484](https://github.com/ClickHouse/ClickHouse/pull/56484) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#57169](https://github.com/ClickHouse/ClickHouse/issues/57169): Fix crash due to buffer overflow while decompressing malformed data using `Gorilla` codec. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#57107](https://github.com/ClickHouse/ClickHouse/pull/57107) ([Nikolay Degterinsky](https://github.com/evillique)). 
+* Backported in [#57175](https://github.com/ClickHouse/ClickHouse/issues/57175): Close interserver connection for any exception that happens before the authentication. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#57142](https://github.com/ClickHouse/ClickHouse/pull/57142) ([Antonio Andelic](https://github.com/antonio2368)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.9.54-lts.md b/docs/changelogs/v23.8.9.54-lts.md index 00607c60c39..db13238f4ad 100644 --- a/docs/changelogs/v23.8.9.54-lts.md +++ b/docs/changelogs/v23.8.9.54-lts.md @@ -11,29 +11,29 @@ sidebar_label: 2024 * Backported in [#57668](https://github.com/ClickHouse/ClickHouse/issues/57668): Output valid JSON/XML on exception during HTTP query execution. Add setting `http_write_exception_in_output_format` to enable/disable this behaviour (enabled by default). [#52853](https://github.com/ClickHouse/ClickHouse/pull/52853) ([Kruglov Pavel](https://github.com/Avogar)). * Backported in [#58491](https://github.com/ClickHouse/ClickHouse/issues/58491): Fix transfer query to MySQL compatible query. Fixes [#57253](https://github.com/ClickHouse/ClickHouse/issues/57253). Fixes [#52654](https://github.com/ClickHouse/ClickHouse/issues/52654). Fixes [#56729](https://github.com/ClickHouse/ClickHouse/issues/56729). [#56456](https://github.com/ClickHouse/ClickHouse/pull/56456) ([flynn](https://github.com/ucasfl)). * Backported in [#57238](https://github.com/ClickHouse/ClickHouse/issues/57238): Fetching a part waits until that part is fully committed on the remote replica. It is better not to send a part in the PreActive state. In case of zero copy this is a mandatory restriction. [#56808](https://github.com/ClickHouse/ClickHouse/pull/56808) ([Sema Checherinda](https://github.com/CheSema)). -* Backported in [#57655](https://github.com/ClickHouse/ClickHouse/issues/57655): Handle sigabrt case when getting PostgreSQl table structure with empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mike Kot (Михаил Кот)](https://github.com/myrrc)). +* Backported in [#57655](https://github.com/ClickHouse/ClickHouse/issues/57655): Handle the SIGABRT case when getting the PostgreSQL table structure with an empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mikhail Kot](https://github.com/myrrc)). #### Build/Testing/Packaging Improvement * Backported in [#57582](https://github.com/ClickHouse/ClickHouse/issues/57582): Fix issue caught in https://github.com/docker-library/official-images/pull/15846. [#57571](https://github.com/ClickHouse/ClickHouse/pull/57571) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). #### Bug Fix (user-visible misbehavior in an official stable release) -* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix ALTER COLUMN with ALIAS [#56493](https://github.com/ClickHouse/ClickHouse/pull/56493) ([Nikolay Degterinsky](https://github.com/evillique)). -* Prevent incompatible ALTER of projection columns [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)). -* Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column [#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Fix incorrect JOIN plan optimization with partially materialized normal projection [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)). -* Fix `ReadonlyReplica` metric for all cases [#57267](https://github.com/ClickHouse/ClickHouse/pull/57267) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). -* bugfix: correctly parse SYSTEM STOP LISTEN TCP SECURE [#57483](https://github.com/ClickHouse/ClickHouse/pull/57483) ([joelynch](https://github.com/joelynch)). -* Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. [#57538](https://github.com/ClickHouse/ClickHouse/pull/57538) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). -* Disable system.kafka_consumers by default (due to possible live memory leak) [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)). -* Fix invalid memory access in BLAKE3 (Rust) [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)). -* Normalize function names in CREATE INDEX [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Fix invalid preprocessing on Keeper [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix Integer overflow in Poco::UTF32Encoding [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)). -* Remove parallel parsing for JSONCompactEachRow [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix parallel parsing for JSONCompactEachRow [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#58324](https://github.com/ClickHouse/ClickHouse/issues/58324): Flatten only true Nested type if flatten_nested=1, not all Array(Tuple). [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#57395](https://github.com/ClickHouse/ClickHouse/issues/57395): Fix ALTER COLUMN with ALIAS that previously threw the `NO_SUCH_COLUMN_IN_TABLE` exception. Closes [#50927](https://github.com/ClickHouse/ClickHouse/issues/50927). [#56493](https://github.com/ClickHouse/ClickHouse/pull/56493) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#57449](https://github.com/ClickHouse/ClickHouse/issues/57449): Now ALTER columns which are incompatible with columns used in some projections will be forbidden. Previously it could result in incorrect data. This fixes [#56932](https://github.com/ClickHouse/ClickHouse/issues/56932). This PR also allows RENAME of index columns, and improves the exception message by providing clear information on the affected indices or projections causing the prevention. [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)). +* Backported in [#57281](https://github.com/ClickHouse/ClickHouse/issues/57281): Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column. Closes [#42918](https://github.com/ClickHouse/ClickHouse/issues/42918). 
[#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#57247](https://github.com/ClickHouse/ClickHouse/issues/57247): Fix incorrect JOIN plan optimization with partially materialized normal projection. This fixes [#57194](https://github.com/ClickHouse/ClickHouse/issues/57194). [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)). +* Backported in [#57346](https://github.com/ClickHouse/ClickHouse/issues/57346): Fix `ReadonlyReplica` metric for some cases (e.g. when a table cannot be initialized because of difference in local and Keeper data). [#57267](https://github.com/ClickHouse/ClickHouse/pull/57267) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#58434](https://github.com/ClickHouse/ClickHouse/issues/58434): Fix working with read buffers in StreamingFormatExecutor, previously it could lead to segfaults in Kafka and other streaming engines. [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#57539](https://github.com/ClickHouse/ClickHouse/issues/57539): Fix parsing of `SYSTEM STOP LISTEN TCP SECURE`. [#57483](https://github.com/ClickHouse/ClickHouse/pull/57483) ([joelynch](https://github.com/joelynch)). +* Backported in [#57779](https://github.com/ClickHouse/ClickHouse/issues/57779): Ignore the `ON CLUSTER` clause in GRANT/REVOKE queries for management of replicated access entities (controlled by the `ignore_on_cluster_for_replicated_access_entities_queries` setting). [#57538](https://github.com/ClickHouse/ClickHouse/pull/57538) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Backported in [#58256](https://github.com/ClickHouse/ClickHouse/issues/58256): Disable system.kafka_consumers by default (due to possible live memory leak). [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#57923](https://github.com/ClickHouse/ClickHouse/issues/57923): Fix invalid memory access in BLAKE3. [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#58084](https://github.com/ClickHouse/ClickHouse/issues/58084): Normalize function names in `CREATE INDEX` query. Avoid `Existing table metadata in ZooKeeper differs in skip indexes` errors if an alias was used instead of the canonical function name when creating an index. [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Backported in [#58110](https://github.com/ClickHouse/ClickHouse/issues/58110): Keeper fix: Leader should correctly fail on preprocessing a request if it is not initialized. [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#58155](https://github.com/ClickHouse/ClickHouse/issues/58155): Fix Integer overflow in Poco::UTF32Encoding. [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)). +* Backported in [#58188](https://github.com/ClickHouse/ClickHouse/issues/58188): Parallel parsing for `JSONCompactEachRow` could work incorrectly in previous versions.
This closes [#58180](https://github.com/ClickHouse/ClickHouse/issues/58180). [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#58301](https://github.com/ClickHouse/ClickHouse/issues/58301): Fix parallel parsing for JSONCompactEachRow. [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)). #### NO CL ENTRY diff --git a/docs/changelogs/v24.1.1.2048-stable.md b/docs/changelogs/v24.1.1.2048-stable.md index 8e4647da86e..c509ce0058e 100644 --- a/docs/changelogs/v24.1.1.2048-stable.md +++ b/docs/changelogs/v24.1.1.2048-stable.md @@ -133,56 +133,56 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Add join keys conversion for nested lowcardinality [#51550](https://github.com/ClickHouse/ClickHouse/pull/51550) ([vdimir](https://github.com/vdimir)). -* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix a bug with projections and the aggregate_functions_null_for_empty setting during insertion. [#56944](https://github.com/ClickHouse/ClickHouse/pull/56944) ([Amos Bird](https://github.com/amosbird)). -* Fixed potential exception due to stale profile UUID [#57263](https://github.com/ClickHouse/ClickHouse/pull/57263) ([Vasily Nemkov](https://github.com/Enmk)). -* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). -* Ignore MVs with dropped target table during pushing to views [#57520](https://github.com/ClickHouse/ClickHouse/pull/57520) ([Kruglov Pavel](https://github.com/Avogar)). -* [RFC] Eliminate possible race between ALTER_METADATA and MERGE_PARTS [#57755](https://github.com/ClickHouse/ClickHouse/pull/57755) ([Azat Khuzhin](https://github.com/azat)). -* Fix the exprs order bug in group by with rollup [#57786](https://github.com/ClickHouse/ClickHouse/pull/57786) ([Chen768959](https://github.com/Chen768959)). -* Fix lost blobs after dropping a replica with broken detached parts [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Allow users to work with symlinks in user_files_path (again) [#58447](https://github.com/ClickHouse/ClickHouse/pull/58447) ([Duc Canh Le](https://github.com/canhld94)). -* Fix segfault when graphite table does not have agg function [#58453](https://github.com/ClickHouse/ClickHouse/pull/58453) ([Duc Canh Le](https://github.com/canhld94)). -* Delay reading from StorageKafka to allow multiple reads in materialized views [#58477](https://github.com/ClickHouse/ClickHouse/pull/58477) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix a stupid case of intersecting parts [#58482](https://github.com/ClickHouse/ClickHouse/pull/58482) ([Alexander Tokmakov](https://github.com/tavplubix)). -* MergeTreePrefetchedReadPool disable for LIMIT only queries [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)). -* Enable ordinary databases while restoration [#58520](https://github.com/ClickHouse/ClickHouse/pull/58520) ([Jihyuk Bok](https://github.com/tomahawk28)). -* Fix hive threadpool read ORC/Parquet/... Failed [#58537](https://github.com/ClickHouse/ClickHouse/pull/58537) ([sunny](https://github.com/sunny19930321)). 
-* Hide credentials in system.backup_log base_backup_name column [#58550](https://github.com/ClickHouse/ClickHouse/pull/58550) ([Daniel Pozo Escalona](https://github.com/danipozo)). -* toStartOfInterval for milli- microsencods values rounding [#58557](https://github.com/ClickHouse/ClickHouse/pull/58557) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Disable max_joined_block_rows in ConcurrentHashJoin [#58595](https://github.com/ClickHouse/ClickHouse/pull/58595) ([vdimir](https://github.com/vdimir)). -* Fix join using nullable in old analyzer [#58596](https://github.com/ClickHouse/ClickHouse/pull/58596) ([vdimir](https://github.com/vdimir)). -* `makeDateTime64()`: Allow non-const fraction argument [#58597](https://github.com/ClickHouse/ClickHouse/pull/58597) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix possible NULL dereference during symbolizing inline frames [#58607](https://github.com/ClickHouse/ClickHouse/pull/58607) ([Azat Khuzhin](https://github.com/azat)). -* Improve isolation of query cache entries under re-created users or role switches [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix broken partition key analysis when doing projection optimization [#58638](https://github.com/ClickHouse/ClickHouse/pull/58638) ([Amos Bird](https://github.com/amosbird)). -* Query cache: Fix per-user quota [#58731](https://github.com/ClickHouse/ClickHouse/pull/58731) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix stream partitioning in parallel window functions [#58739](https://github.com/ClickHouse/ClickHouse/pull/58739) ([Dmitry Novik](https://github.com/novikd)). -* Fix double destroy call on exception throw in addBatchLookupTable8 [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)). -* Don't process requests in Keeper during shutdown [#58765](https://github.com/ClickHouse/ClickHouse/pull/58765) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix Segfault in `SlabsPolygonIndex::find` [#58771](https://github.com/ClickHouse/ClickHouse/pull/58771) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix JSONExtract function for LowCardinality(Nullable) columns [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)). -* Table CREATE DROP Poco::Logger memory leak fix [#58831](https://github.com/ClickHouse/ClickHouse/pull/58831) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix HTTP compressors finalization [#58846](https://github.com/ClickHouse/ClickHouse/pull/58846) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Multiple read file log storage in mv [#58877](https://github.com/ClickHouse/ClickHouse/pull/58877) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Restriction for the access key id for s3. [#58900](https://github.com/ClickHouse/ClickHouse/pull/58900) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). -* Fix possible crash in clickhouse-local during loading suggestions [#58907](https://github.com/ClickHouse/ClickHouse/pull/58907) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash when indexHint() is used [#58911](https://github.com/ClickHouse/ClickHouse/pull/58911) ([Dmitry Novik](https://github.com/novikd)). -* Fix StorageURL forgetting headers on server restart [#58933](https://github.com/ClickHouse/ClickHouse/pull/58933) ([Michael Kolupaev](https://github.com/al13n321)). 
-* Analyzer: fix storage replacement with insertion block [#58958](https://github.com/ClickHouse/ClickHouse/pull/58958) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix seek in ReadBufferFromZipArchive [#58966](https://github.com/ClickHouse/ClickHouse/pull/58966) ([Michael Kolupaev](https://github.com/al13n321)). -* `DROP INDEX` of inverted index now removes all relevant files from persistence [#59040](https://github.com/ClickHouse/ClickHouse/pull/59040) ([mochi](https://github.com/MochiXu)). -* Fix data race on query_factories_info [#59049](https://github.com/ClickHouse/ClickHouse/pull/59049) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Disable "Too many redirects" error retry [#59099](https://github.com/ClickHouse/ClickHouse/pull/59099) ([skyoct](https://github.com/skyoct)). -* Fix aggregation issue in mixed x86_64 and ARM clusters [#59132](https://github.com/ClickHouse/ClickHouse/pull/59132) ([Harry Lee](https://github.com/HarryLeeIBM)). -* Fix not started database shutdown deadlock [#59137](https://github.com/ClickHouse/ClickHouse/pull/59137) ([Sergei Trifonov](https://github.com/serxa)). -* Fix: LIMIT BY and LIMIT in distributed query [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)). -* Fix crash with nullable timezone for `toString` [#59190](https://github.com/ClickHouse/ClickHouse/pull/59190) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix abort in iceberg metadata on bad file paths [#59275](https://github.com/ClickHouse/ClickHouse/pull/59275) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix architecture name in select of Rust target [#59307](https://github.com/ClickHouse/ClickHouse/pull/59307) ([p1rattttt](https://github.com/p1rattttt)). -* Fix not-ready set for system.tables [#59351](https://github.com/ClickHouse/ClickHouse/pull/59351) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix lazy initialization in RabbitMQ [#59352](https://github.com/ClickHouse/ClickHouse/pull/59352) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix possible errors when joining sub-types with low cardinality (e.g., Array(LowCardinality(T)) with Array(T)). [#51550](https://github.com/ClickHouse/ClickHouse/pull/51550) ([vdimir](https://github.com/vdimir)). +* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple). [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix a bug with projections and the `aggregate_functions_null_for_empty` setting during insertion. This is an addition to [#42198](https://github.com/ClickHouse/ClickHouse/issues/42198) and [#49873](https://github.com/ClickHouse/ClickHouse/issues/49873). The bug was found by fuzzer in [#56666](https://github.com/ClickHouse/ClickHouse/issues/56666). This PR also fix potential issues with projections and the `transform_null_in` setting. [#56944](https://github.com/ClickHouse/ClickHouse/pull/56944) ([Amos Bird](https://github.com/amosbird)). +* Fixed (a rare) exception in case when user's assigned profiles are updated right after user logging in, which could cause a missing entry in `session_log` or problems with logging in. [#57263](https://github.com/ClickHouse/ClickHouse/pull/57263) ([Vasily Nemkov](https://github.com/Enmk)). +* Fix working with read buffers in StreamingFormatExecutor, previously it could lead to segfaults in Kafka and other streaming engines. 
[#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). +* Ignore MVs with dropped target table during pushing to views in insert to a source table. [#57520](https://github.com/ClickHouse/ClickHouse/pull/57520) ([Kruglov Pavel](https://github.com/Avogar)). +* Eliminate possible race between ALTER_METADATA and MERGE_PARTS (that leads to checksum mismatch - CHECKSUM_DOESNT_MATCH). [#57755](https://github.com/ClickHouse/ClickHouse/pull/57755) ([Azat Khuzhin](https://github.com/azat)). +* Fix the expression order bug in GROUP BY with ROLLUP. [#57786](https://github.com/ClickHouse/ClickHouse/pull/57786) ([Chen768959](https://github.com/Chen768959)). +* Fix a bug in zero-copy-replication (an experimental feature) that could lead to `The specified key does not exist` error and data loss. It could happen when dropping a replica with broken or unexpected/ignored detached parts. Fixes [#57985](https://github.com/ClickHouse/ClickHouse/issues/57985). [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Fix a bug where users could not work with symlinks in user_files_path. [#58447](https://github.com/ClickHouse/ClickHouse/pull/58447) ([Duc Canh Le](https://github.com/canhld94)). +* Fix segfault when a Graphite table does not have an aggregate function. [#58453](https://github.com/ClickHouse/ClickHouse/pull/58453) ([Duc Canh Le](https://github.com/canhld94)). +* Fix reading multiple times from KafkaEngine in materialized views. [#58477](https://github.com/ClickHouse/ClickHouse/pull/58477) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Fix `Part ... intersects part ...` error that might occur in `ReplicatedMergeTree` when the server was restarted just after [automatically] dropping [an empty] part and adjacent parts were merged. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/56282. [#58482](https://github.com/ClickHouse/ClickHouse/pull/58482) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Disable MergeTreePrefetchedReadPool for LIMIT-only queries, because the time spent filling per-thread tasks can be greater than the whole query execution time for big tables with a small LIMIT. [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)). +* While `restore` is underway in ClickHouse, allow restoring databases with the `Ordinary` engine. [#58520](https://github.com/ClickHouse/ClickHouse/pull/58520) ([Jihyuk Bok](https://github.com/tomahawk28)). +* Fix read buffer creation in Hive engine when thread_pool read method is used. Closes [#57978](https://github.com/ClickHouse/ClickHouse/issues/57978). [#58537](https://github.com/ClickHouse/ClickHouse/pull/58537) ([sunny](https://github.com/sunny19930321)). +* Hide credentials in `base_backup_name` column of `system.backup_log`. [#58550](https://github.com/ClickHouse/ClickHouse/pull/58550) ([Daniel Pozo Escalona](https://github.com/danipozo)). +* While executing queries like `SELECT toStartOfInterval(toDateTime64('2023-10-09 10:11:12.000999', 6), toIntervalMillisecond(1));`, the result was previously not rounded to 1 millisecond. This is now fixed; it also solves some problems appearing in https://github.com/ClickHouse/ClickHouse/pull/56738. [#58557](https://github.com/ClickHouse/ClickHouse/pull/58557) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix logical error in `parallel_hash` working with `max_joined_block_size_rows`.
[#58595](https://github.com/ClickHouse/ClickHouse/pull/58595) ([vdimir](https://github.com/vdimir)). +* Fix an error in JOIN with `USING` when one of the tables has a `Nullable` key. [#58596](https://github.com/ClickHouse/ClickHouse/pull/58596) ([vdimir](https://github.com/vdimir)). +* The (optional) `fraction` argument in function `makeDateTime64()` can now be non-const. This was possible already with ClickHouse <= 23.8. [#58597](https://github.com/ClickHouse/ClickHouse/pull/58597) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix possible server crash during symbolizing inline frames. [#58607](https://github.com/ClickHouse/ClickHouse/pull/58607) ([Azat Khuzhin](https://github.com/azat)). +* The query cache now denies access to entries when the user is re-created or assumes another role. This prevents attacks where 1. a user with the same name as a dropped user may access the old user's cache entries or 2. a user with a different role may access cache entries of a role with a different row policy. [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix broken partition key analysis when doing projection optimization with `force_index_by_date = 1`. This fixes [#58620](https://github.com/ClickHouse/ClickHouse/issues/58620). We don't need partition key analysis for projections after https://github.com/ClickHouse/ClickHouse/pull/56502. [#58638](https://github.com/ClickHouse/ClickHouse/pull/58638) ([Amos Bird](https://github.com/amosbird)). +* The query cache now behaves properly when per-user quotas are defined and `SYSTEM DROP QUERY CACHE` was run. [#58731](https://github.com/ClickHouse/ClickHouse/pull/58731) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix data stream partitioning for window functions when there are different window descriptions with similar prefixes but different partitioning. Fixes [#58714](https://github.com/ClickHouse/ClickHouse/issues/58714). [#58739](https://github.com/ClickHouse/ClickHouse/pull/58739) ([Dmitry Novik](https://github.com/novikd)). +* Fix double destroy call on exception throw in addBatchLookupTable8. [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)). +* Keeper fix: don't process requests during shutdown because it would lead to an invalid state. [#58765](https://github.com/ClickHouse/ClickHouse/pull/58765) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix a crash in the polygon dictionary. Fixes [#58612](https://github.com/ClickHouse/ClickHouse/issues/58612). [#58771](https://github.com/ClickHouse/ClickHouse/pull/58771) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix possible crash in JSONExtract function extracting `LowCardinality(Nullable(T))` type. [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)). +* Fix a `Poco::Logger` memory leak on table CREATE and DROP. Closes [#57931](https://github.com/ClickHouse/ClickHouse/issues/57931). Closes [#58496](https://github.com/ClickHouse/ClickHouse/issues/58496). [#58831](https://github.com/ClickHouse/ClickHouse/pull/58831) ([Maksim Kita](https://github.com/kitaisreal)). +* Fix HTTP compressors. Follow-up [#58475](https://github.com/ClickHouse/ClickHouse/issues/58475). [#58846](https://github.com/ClickHouse/ClickHouse/pull/58846) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fix reading multiple times from FileLog engine in materialized views.
[#58877](https://github.com/ClickHouse/ClickHouse/pull/58877) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Prevent specifying an `access_key_id` that does not match the [correct pattern](https://docs.aws.amazon.com/IAM/latest/APIReference/API_AccessKey.html). [#58900](https://github.com/ClickHouse/ClickHouse/pull/58900) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Fix possible crash in clickhouse-local during loading suggestions. Closes [#58825](https://github.com/ClickHouse/ClickHouse/issues/58825). [#58907](https://github.com/ClickHouse/ClickHouse/pull/58907) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix crash when `indexHint` function is used without arguments in the filters. [#58911](https://github.com/ClickHouse/ClickHouse/pull/58911) ([Dmitry Novik](https://github.com/novikd)). +* Fixed URL and S3 engines losing the `headers` argument on server restart. [#58933](https://github.com/ClickHouse/ClickHouse/pull/58933) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix analyzer: insertion from a SELECT with a subquery referencing the insertion table should process only the insertion block for all table expressions. Fixes [#58080](https://github.com/ClickHouse/ClickHouse/issues/58080). Follow-up to [#50857](https://github.com/ClickHouse/ClickHouse/issues/50857). [#58958](https://github.com/ClickHouse/ClickHouse/pull/58958) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fixed reading parquet files from archives. [#58966](https://github.com/ClickHouse/ClickHouse/pull/58966) ([Michael Kolupaev](https://github.com/al13n321)). +* Experimental feature of inverted indices: `ALTER TABLE DROP INDEX` for an inverted index now removes all inverted index files from the new part (issue [#59039](https://github.com/ClickHouse/ClickHouse/issues/59039)). [#59040](https://github.com/ClickHouse/ClickHouse/pull/59040) ([mochi](https://github.com/MochiXu)). +* Fix data race on collecting factories info for system.query_log. [#59049](https://github.com/ClickHouse/ClickHouse/pull/59049) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Disable retries on the "Too many redirects" error. Fixes [#58967](https://github.com/ClickHouse/ClickHouse/issues/58967). [#59099](https://github.com/ClickHouse/ClickHouse/pull/59099) ([skyoct](https://github.com/skyoct)). +* Fixed wrong aggregation results in mixed x86_64 and ARM clusters. [#59132](https://github.com/ClickHouse/ClickHouse/pull/59132) ([Harry Lee](https://github.com/HarryLeeIBM)). +* Fix a deadlock that can happen during the shutdown of the server due to metadata loading failure. [#59137](https://github.com/ClickHouse/ClickHouse/pull/59137) ([Sergei Trifonov](https://github.com/serxa)). +* The combination of LIMIT BY and LIMIT could produce an incorrect result in distributed queries (parallel replicas included). [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)). +* Fixes a crash in `toString()` with a nullable timezone. Fixes [#59126](https://github.com/ClickHouse/ClickHouse/issues/59126). [#59190](https://github.com/ClickHouse/ClickHouse/pull/59190) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix abort in iceberg metadata on bad file paths. [#59275](https://github.com/ClickHouse/ClickHouse/pull/59275) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix architecture name in select of Rust target. [#59307](https://github.com/ClickHouse/ClickHouse/pull/59307) ([p1rattttt](https://github.com/p1rattttt)).
+* Fix `Not-ready Set` for queries from `system.tables` with `table IN (subquery)` filter expression. Fixes [#59342](https://github.com/ClickHouse/ClickHouse/issues/59342). [#59351](https://github.com/ClickHouse/ClickHouse/pull/59351) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix lazy initialization in RabbitMQ that could lead to a logical error and an uninitialized state. [#59352](https://github.com/ClickHouse/ClickHouse/pull/59352) ([Kruglov Pavel](https://github.com/Avogar)). #### NO CL ENTRY diff --git a/docs/changelogs/v24.1.2.5-stable.md b/docs/changelogs/v24.1.2.5-stable.md index bac25c9b9ed..080e24da6f0 100644 --- a/docs/changelogs/v24.1.2.5-stable.md +++ b/docs/changelogs/v24.1.2.5-stable.md @@ -9,6 +9,6 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix translate() with FixedString input [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). -* Fix stacktraces for binaries without debug symbols [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#59425](https://github.com/ClickHouse/ClickHouse/issues/59425): Fix translate() with FixedString input. It could lead to crashes as it would return a String column (instead of the expected FixedString). This issue was found through the ClickHouse Bug Bounty Program by YohannJardin. [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59478](https://github.com/ClickHouse/ClickHouse/issues/59478): Fix stacktraces for binaries without debug symbols. [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). diff --git a/docs/changelogs/v24.1.3.31-stable.md b/docs/changelogs/v24.1.3.31-stable.md index e898fba5c87..ec73672c8d5 100644 --- a/docs/changelogs/v24.1.3.31-stable.md +++ b/docs/changelogs/v24.1.3.31-stable.md @@ -13,13 +13,13 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix `ASTAlterCommand::formatImpl` in case of column specific settings... [#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Make MAX use the same rules as permutation for complex types [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)). -* Fix corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)). -* Fix incorrect result of arrayElement / map[] on empty value [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash in topK when merging empty states [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)). -* Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)). -* Fix leftPad / rightPad function with FixedString input [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59726](https://github.com/ClickHouse/ClickHouse/issues/59726): Fix formatting of alter commands in case of column specific settings.
[#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#59585](https://github.com/ClickHouse/ClickHouse/issues/59585): Make MAX use the same rules as permutation for complex types. [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59579](https://github.com/ClickHouse/ClickHouse/issues/59579): Fix a corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` setting. There is one corner case not covered due to the absence of tables in the path:. [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)). +* Backported in [#59647](https://github.com/ClickHouse/ClickHouse/issues/59647): Fix incorrect result of arrayElement / map[] on empty value. [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59639](https://github.com/ClickHouse/ClickHouse/issues/59639): Fix crash in topK when merging empty states. [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59696](https://github.com/ClickHouse/ClickHouse/issues/59696): Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor. [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59764](https://github.com/ClickHouse/ClickHouse/issues/59764): Fix leftPad / rightPad function with FixedString input. [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). #### NO CL ENTRY diff --git a/docs/changelogs/v24.1.4.20-stable.md b/docs/changelogs/v24.1.4.20-stable.md index 8612a485f12..1baec2178b1 100644 --- a/docs/changelogs/v24.1.4.20-stable.md +++ b/docs/changelogs/v24.1.4.20-stable.md @@ -15,10 +15,10 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix digest calculation in Keeper [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix distributed table with a constant sharding key [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). -* Fix parsing of partition expressions surrounded by parens [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#59457](https://github.com/ClickHouse/ClickHouse/issues/59457): Keeper fix: fix digest calculation for nodes. [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#59682](https://github.com/ClickHouse/ClickHouse/issues/59682): Fix distributed table with a constant sharding key. [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)). +* Backported in [#59842](https://github.com/ClickHouse/ClickHouse/issues/59842): Fix query start time on non initial queries. [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). 
+* Backported in [#59937](https://github.com/ClickHouse/ClickHouse/issues/59937): Fix parsing of partition expressions that are surrounded by parentheses, e.g.: `ALTER TABLE test DROP PARTITION ('2023-10-19')`. [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v24.1.5.6-stable.md b/docs/changelogs/v24.1.5.6-stable.md index ce46c51e2f4..caf246fcab6 100644 --- a/docs/changelogs/v24.1.5.6-stable.md +++ b/docs/changelogs/v24.1.5.6-stable.md @@ -9,7 +9,7 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* UniqExactSet read crash fix [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)). +* Backported in [#59959](https://github.com/ClickHouse/ClickHouse/issues/59959): Fix crash during deserialization of aggregation function states that internally use `UniqExactSet`. Introduced https://github.com/ClickHouse/ClickHouse/pull/59009. [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v24.1.7.18-stable.md b/docs/changelogs/v24.1.7.18-stable.md index 603a83a67be..3bc94538174 100644 --- a/docs/changelogs/v24.1.7.18-stable.md +++ b/docs/changelogs/v24.1.7.18-stable.md @@ -9,10 +9,10 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). -* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). -* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61330](https://github.com/ClickHouse/ClickHouse/issues/61330): Fix deadlock in parallel parsing when lots of rows are skipped due to errors. [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#61008](https://github.com/ClickHouse/ClickHouse/issues/61008): Fix the issue of `max_query_size` for KQL compound operator like mv-expand. Related to [#59626](https://github.com/ClickHouse/ClickHouse/issues/59626). [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). +* Backported in [#61019](https://github.com/ClickHouse/ClickHouse/issues/61019): Fix crash when `allow_experimental_analyzer` setting value is changed in the subqueries. [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#61293](https://github.com/ClickHouse/ClickHouse/issues/61293): Keeper: fix runtime reconfig for standalone binary. [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). 
#### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.1.8.22-stable.md b/docs/changelogs/v24.1.8.22-stable.md index f780de41c40..e615c60a942 100644 --- a/docs/changelogs/v24.1.8.22-stable.md +++ b/docs/changelogs/v24.1.8.22-stable.md @@ -9,12 +9,12 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). -* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). -* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). -* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). +* Backported in [#61451](https://github.com/ClickHouse/ClickHouse/issues/61451): Fix possible incorrect result of aggregate function `uniqExact`. [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#61844](https://github.com/ClickHouse/ClickHouse/issues/61844): Fixed possible wrong result of aggregation with nullable keys. [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#61746](https://github.com/ClickHouse/ClickHouse/issues/61746): Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). +* Backported in [#61696](https://github.com/ClickHouse/ClickHouse/issues/61696): Fix `clickhouse-client -s` argument, it was broken by defining it two times. [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Backported in [#61576](https://github.com/ClickHouse/ClickHouse/issues/61576): Fix string search with constant start position which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61858](https://github.com/ClickHouse/ClickHouse/issues/61858): Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.2.1.2248-stable.md b/docs/changelogs/v24.2.1.2248-stable.md index 02affe12c43..edcd3da3852 100644 --- a/docs/changelogs/v24.2.1.2248-stable.md +++ b/docs/changelogs/v24.2.1.2248-stable.md @@ -60,7 +60,7 @@ sidebar_label: 2024 * Support negative positional arguments. Closes [#57736](https://github.com/ClickHouse/ClickHouse/issues/57736). 
[#58292](https://github.com/ClickHouse/ClickHouse/pull/58292) ([flynn](https://github.com/ucasfl)). * Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. [#58486](https://github.com/ClickHouse/ClickHouse/pull/58486) ([Julia Kartseva](https://github.com/jkartseva)). * Allow to define `volume_priority` in `storage_configuration`. [#58533](https://github.com/ClickHouse/ClickHouse/pull/58533) ([Andrey Zvonov](https://github.com/zvonand)). -* Add support for Date32 type in T64 codec. [#58738](https://github.com/ClickHouse/ClickHouse/pull/58738) ([Hongbin Ma](https://github.com/binmahone)). +* Add support for Date32 type in T64 codec. [#58738](https://github.com/ClickHouse/ClickHouse/pull/58738) ([Hongbin Ma (Mahone)](https://github.com/binmahone)). * Support `LEFT JOIN`, `ALL INNER JOIN`, and simple subqueries for parallel replicas (only with analyzer). New setting `parallel_replicas_prefer_local_join` chooses local `JOIN` execution (by default) vs `GLOBAL JOIN`. All tables should exist on every replica from `cluster_for_parallel_replicas`. New settings `min_external_table_block_size_rows` and `min_external_table_block_size_bytes` are used to squash small blocks that are sent for temporary tables (only with analyzer). [#58916](https://github.com/ClickHouse/ClickHouse/pull/58916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Allow trailing commas in types with several items. [#59119](https://github.com/ClickHouse/ClickHouse/pull/59119) ([Aleksandr Musorin](https://github.com/AVMusorin)). * Allow parallel and distributed processing for `S3Queue` table engine. For distributed processing use setting `s3queue_total_shards_num` (by default `1`). Setting `s3queue_processing_threads_num` previously was not allowed for Ordered processing mode, now it is allowed. Warning: settings `s3queue_processing_threads_num`(processing threads per each shard) and `s3queue_total_shards_num` for ordered mode change how metadata is stored (make the number of `max_processed_file` nodes equal to `s3queue_processing_threads_num * s3queue_total_shards_num`), so they must be the same for all shards and cannot be changed once at least one shard is created. [#59167](https://github.com/ClickHouse/ClickHouse/pull/59167) ([Kseniia Sumarokova](https://github.com/kssenii)). @@ -123,60 +123,60 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Non ready set in TTL WHERE. [#57430](https://github.com/ClickHouse/ClickHouse/pull/57430) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix quantilesGK bug [#58216](https://github.com/ClickHouse/ClickHouse/pull/58216) ([李扬](https://github.com/taiyang-li)). -* Disable parallel replicas JOIN with CTE (not analyzer) [#59239](https://github.com/ClickHouse/ClickHouse/pull/59239) ([Raúl Marín](https://github.com/Algunenano)). -* Fix bug with `intDiv` for decimal arguments [#59243](https://github.com/ClickHouse/ClickHouse/pull/59243) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix translate() with FixedString input [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). 
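A minimal, hypothetical sketch of the `Date32` support in the `T64` codec mentioned above ([#58738](https://github.com/ClickHouse/ClickHouse/pull/58738)); the table name and the particular codec chain are made up for the example.

```sql
-- Declaring T64 on a Date32 column is the newly supported case described in the entry above.
CREATE TABLE t64_date32_demo
(
    d Date32 CODEC(T64, ZSTD)
)
ENGINE = MergeTree
ORDER BY d;
```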
-* Fix digest calculation in Keeper [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix stacktraces for binaries without debug symbols [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). -* Fix `ASTAlterCommand::formatImpl` in case of column specific settings... [#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix `SELECT * FROM [...] ORDER BY ALL` with Analyzer [#59462](https://github.com/ClickHouse/ClickHouse/pull/59462) ([zhongyuankai](https://github.com/zhongyuankai)). -* Fix possible uncaught exception during distributed query cancellation [#59487](https://github.com/ClickHouse/ClickHouse/pull/59487) ([Azat Khuzhin](https://github.com/azat)). -* Make MAX use the same rules as permutation for complex types [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)). -* Fix corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)). -* Fix incorrect result of arrayElement / map[] on empty value [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash in topK when merging empty states [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)). -* Fix distributed table with a constant sharding key [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix_kql_issue_found_by_wingfuzz [#59626](https://github.com/ClickHouse/ClickHouse/pull/59626) ([Yong Wang](https://github.com/kashwy)). -* Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)). -* Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)). -* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). -* Validate types of arguments for `minmax` skipping index [#59733](https://github.com/ClickHouse/ClickHouse/pull/59733) ([Anton Popov](https://github.com/CurtizJ)). -* Fix leftPad / rightPad function with FixedString input [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). -* Fix AST fuzzer issue in function `countMatches` [#59752](https://github.com/ClickHouse/ClickHouse/pull/59752) ([Robert Schulze](https://github.com/rschu1ze)). -* rabbitmq: fix having neither acked nor nacked messages [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix StorageURL doing some of the query execution in single thread [#59833](https://github.com/ClickHouse/ClickHouse/pull/59833) ([Michael Kolupaev](https://github.com/al13n321)). -* s3queue: fix uninitialized value [#59897](https://github.com/ClickHouse/ClickHouse/pull/59897) ([Kseniia Sumarokova](https://github.com/kssenii)). 
-* Fix parsing of partition expressions surrounded by parens [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix crash in JSONColumnsWithMetadata format over http [#59925](https://github.com/ClickHouse/ClickHouse/pull/59925) ([Kruglov Pavel](https://github.com/Avogar)). -* Do not rewrite sum() to count() if return value differs in analyzer [#59926](https://github.com/ClickHouse/ClickHouse/pull/59926) ([Azat Khuzhin](https://github.com/azat)). -* UniqExactSet read crash fix [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)). -* ReplicatedMergeTree invalid metadata_version fix [#59946](https://github.com/ClickHouse/ClickHouse/pull/59946) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix data race in `StorageDistributed` [#59987](https://github.com/ClickHouse/ClickHouse/pull/59987) ([Nikita Taranov](https://github.com/nickitat)). -* Run init scripts when option is enabled rather than disabled [#59991](https://github.com/ClickHouse/ClickHouse/pull/59991) ([jktng](https://github.com/jktng)). -* Fix scale conversion for DateTime64 [#60004](https://github.com/ClickHouse/ClickHouse/pull/60004) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix INSERT into SQLite with single quote (by escaping single quotes with a quote instead of backslash) [#60015](https://github.com/ClickHouse/ClickHouse/pull/60015) ([Azat Khuzhin](https://github.com/azat)). -* Fix several logical errors in arrayFold [#60022](https://github.com/ClickHouse/ClickHouse/pull/60022) ([Raúl Marín](https://github.com/Algunenano)). -* Fix optimize_uniq_to_count removing the column alias [#60026](https://github.com/ClickHouse/ClickHouse/pull/60026) ([Raúl Marín](https://github.com/Algunenano)). -* Fix possible exception from s3queue table on drop [#60036](https://github.com/ClickHouse/ClickHouse/pull/60036) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix formatting of NOT with single literals [#60042](https://github.com/ClickHouse/ClickHouse/pull/60042) ([Raúl Marín](https://github.com/Algunenano)). -* Use max_query_size from context in DDLLogEntry instead of hardcoded 4096 [#60083](https://github.com/ClickHouse/ClickHouse/pull/60083) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix inconsistent formatting of queries [#60095](https://github.com/ClickHouse/ClickHouse/pull/60095) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix inconsistent formatting of explain in subqueries [#60102](https://github.com/ClickHouse/ClickHouse/pull/60102) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix cosineDistance crash with Nullable [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)). -* Allow casting of bools in string representation to to true bools [#60160](https://github.com/ClickHouse/ClickHouse/pull/60160) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix system.s3queue_log [#60166](https://github.com/ClickHouse/ClickHouse/pull/60166) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix arrayReduce with nullable aggregate function name [#60188](https://github.com/ClickHouse/ClickHouse/pull/60188) ([Raúl Marín](https://github.com/Algunenano)). -* Fix actions execution during preliminary filtering (PK, partition pruning) [#60196](https://github.com/ClickHouse/ClickHouse/pull/60196) ([Azat Khuzhin](https://github.com/azat)). 
-* Hide sensitive info for s3queue [#60233](https://github.com/ClickHouse/ClickHouse/pull/60233) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Revert "Replace `ORDER BY ALL` by `ORDER BY *`" [#60248](https://github.com/ClickHouse/ClickHouse/pull/60248) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix http exception codes. [#60252](https://github.com/ClickHouse/ClickHouse/pull/60252) ([Austin Kothig](https://github.com/kothiga)). -* s3queue: fix bug (also fixes flaky test_storage_s3_queue/test.py::test_shards_distributed) [#60282](https://github.com/ClickHouse/ClickHouse/pull/60282) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6 [#60359](https://github.com/ClickHouse/ClickHouse/pull/60359) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments [#60453](https://github.com/ClickHouse/ClickHouse/pull/60453) ([Raúl Marín](https://github.com/Algunenano)). -* Merging [#59674](https://github.com/ClickHouse/ClickHouse/issues/59674). [#60470](https://github.com/ClickHouse/ClickHouse/pull/60470) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Correctly check keys in s3Cluster [#60477](https://github.com/ClickHouse/ClickHouse/pull/60477) ([Antonio Andelic](https://github.com/antonio2368)). +* Support `IN (subquery)` in table TTL expression. Initially, it was allowed to create such a TTL expression, but any TTL merge would fail with `Not-ready Set` error in the background. Now, TTL is correctly applied. Subquery is executed for every TTL merge, and its result is not cached or reused by other merges. Use such configuration with special care, because subqueries in TTL may lead to high memory consumption and, possibly, a non-deterministic result of TTL merge on different replicas (which is correctly handled by replication, however). [#57430](https://github.com/ClickHouse/ClickHouse/pull/57430) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix quantilesGK bug, close [#57683](https://github.com/ClickHouse/ClickHouse/issues/57683). [#58216](https://github.com/ClickHouse/ClickHouse/pull/58216) ([李扬](https://github.com/taiyang-li)). +* Disable parallel replicas JOIN with CTE (not analyzer). [#59239](https://github.com/ClickHouse/ClickHouse/pull/59239) ([Raúl Marín](https://github.com/Algunenano)). +* Fixes bug with for function `intDiv` with decimal arguments. Fixes [#56414](https://github.com/ClickHouse/ClickHouse/issues/56414). [#59243](https://github.com/ClickHouse/ClickHouse/pull/59243) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix translate() with FixedString input. Could lead to crashes as it'd return a String column (vs the expected FixedString). This issue was found through ClickHouse Bug Bounty Program YohannJardin. [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). +* Keeper fix: fix digest calculation for nodes. [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix stacktraces for binaries without debug symbols. [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). +* Fix formatting of alter commands in case of column specific settings. [#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* `SELECT * FROM [...] 
ORDER BY ALL SETTINGS allow_experimental_analyzer = 1` now works. [#59462](https://github.com/ClickHouse/ClickHouse/pull/59462) ([zhongyuankai](https://github.com/zhongyuankai)).
+* Fix possible uncaught exception during distributed query cancellation. Closes [#59169](https://github.com/ClickHouse/ClickHouse/issues/59169). [#59487](https://github.com/ClickHouse/ClickHouse/pull/59487) ([Azat Khuzhin](https://github.com/azat)).
+* Make MAX use the same rules as permutation for complex types. [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix a corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` setting. There is one corner case not covered due to the absence of tables in the path. [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)).
+* Fix incorrect result of arrayElement / map[] on empty value. [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix crash in topK when merging empty states. [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix distributed table with a constant sharding key. [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix segmentation fault in KQL parser when the input query exceeds the `max_query_size`. Also re-enable the KQL dialect. Fixes [#59036](https://github.com/ClickHouse/ClickHouse/issues/59036) and [#59037](https://github.com/ClickHouse/ClickHouse/issues/59037). [#59626](https://github.com/ClickHouse/ClickHouse/pull/59626) ([Yong Wang](https://github.com/kashwy)).
+* Fix error `Read beyond last offset` for `AsynchronousBoundedReadBuffer`. [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor. [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix query start time on non initial queries. [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)).
+* Validate types of arguments for `minmax` skipping index. [#59733](https://github.com/ClickHouse/ClickHouse/pull/59733) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix leftPad / rightPad function with FixedString input. [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)).
+* Fixed an exception in function `countMatches` with non-const `FixedString` haystack arguments, e.g. `SELECT countMatches(materialize(toFixedString('foobarfoo', 9)), 'foo');`. [#59752](https://github.com/ClickHouse/ClickHouse/pull/59752) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fix having neither acked nor nacked messages in RabbitMQ. If an exception happens during the read-write phase, messages will be nacked. [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fixed queries that read a Parquet file over HTTP (url()/URL()) executing in one thread instead of max_threads. [#59833](https://github.com/ClickHouse/ClickHouse/pull/59833) ([Michael Kolupaev](https://github.com/al13n321)).
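To make the TTL `IN (subquery)` support described above ([#57430](https://github.com/ClickHouse/ClickHouse/pull/57430)) concrete, here is a minimal sketch of the kind of table definition that previously made background TTL merges fail with `Not-ready Set`; all table and column names are invented for the example.

```sql
CREATE TABLE events
(
    key UInt64,
    ts  DateTime
)
ENGINE = MergeTree
ORDER BY key
-- Per the entry above, the subquery is executed for every TTL merge and its result is not
-- cached or reused, so keep it cheap to avoid high memory consumption during merges.
TTL ts + INTERVAL 30 DAY DELETE WHERE key IN (SELECT key FROM retired_keys);
```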
+* Fixed uninitialized value in s3 queue, which happened during upgrade to a new version if table had Ordered mode and resulted in an error "Existing table metadata in ZooKeeper differs in s3queue_processing_threads_num setting". [#59897](https://github.com/ClickHouse/ClickHouse/pull/59897) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix parsing of partition expressions that are surrounded by parentheses, e.g.: `ALTER TABLE test DROP PARTITION ('2023-10-19')`. [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Fix crash in JSONColumnsWithMetadata format over HTTP. Closes [#59853](https://github.com/ClickHouse/ClickHouse/issues/59853). [#59925](https://github.com/ClickHouse/ClickHouse/pull/59925) ([Kruglov Pavel](https://github.com/Avogar)).
+* Do not rewrite sum() to count() if return value differs in analyzer. [#59926](https://github.com/ClickHouse/ClickHouse/pull/59926) ([Azat Khuzhin](https://github.com/azat)).
+* Fix crash during deserialization of aggregation function states that internally use `UniqExactSet`. Introduced https://github.com/ClickHouse/ClickHouse/pull/59009. [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)).
+* ReplicatedMergeTree fix invalid `metadata_version` node initialization in ZooKeeper during creation of a non-first replica. Closes [#54902](https://github.com/ClickHouse/ClickHouse/issues/54902). [#59946](https://github.com/ClickHouse/ClickHouse/pull/59946) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fixed data race on cluster object between `StorageDistributed` and `Context::reloadClusterConfig()`. The former held a const reference to its member while the latter destroyed the object (in the process of replacing it with a new one). [#59987](https://github.com/ClickHouse/ClickHouse/pull/59987) ([Nikita Taranov](https://github.com/nickitat)).
+* Fixes [#59989](https://github.com/ClickHouse/ClickHouse/issues/59989): runs init scripts when force-enabled or when no database exists, rather than the inverse. [#59991](https://github.com/ClickHouse/ClickHouse/pull/59991) ([jktng](https://github.com/jktng)).
+* Fix scale conversion for DateTime64 values (for example, DateTime64(6)->DateTime64(3)). Example table: `create table test (result DateTime64(3)) engine=Memory;`. [#60004](https://github.com/ClickHouse/ClickHouse/pull/60004) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
+* Fix INSERT into SQLite with single quote (by properly escaping single quotes with a quote instead of backslash). [#60015](https://github.com/ClickHouse/ClickHouse/pull/60015) ([Azat Khuzhin](https://github.com/azat)).
+* Fix several logical errors in arrayFold. Fixes support for Nullable and LowCardinality. [#60022](https://github.com/ClickHouse/ClickHouse/pull/60022) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix optimize_uniq_to_count removing the column alias. [#60026](https://github.com/ClickHouse/ClickHouse/pull/60026) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix possible error while dropping s3queue table, like "no node shard0". [#60036](https://github.com/ClickHouse/ClickHouse/pull/60036) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix formatting of NOT with single literals. [#60042](https://github.com/ClickHouse/ClickHouse/pull/60042) ([Raúl Marín](https://github.com/Algunenano)).
+* Use max_query_size from context in parsing changed settings in DDLWorker.
Previously with large number of changed settings DDLWorker could fail with `Max query size exceeded` error and don't process log entries. [#60083](https://github.com/ClickHouse/ClickHouse/pull/60083) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix inconsistent formatting of queries containing tables named `table`. Fix wrong formatting of queries with `UNION ALL`, `INTERSECT`, and `EXCEPT` when their structure wasn't linear. This closes [#52349](https://github.com/ClickHouse/ClickHouse/issues/52349). Fix wrong formatting of `SYSTEM` queries, including `SYSTEM ... DROP FILESYSTEM CACHE`, `SYSTEM ... REFRESH/START/STOP/CANCEL/TEST VIEW`, `SYSTEM ENABLE/DISABLE FAILPOINT`. Fix formatting of parameterized DDL queries. Fix the formatting of the `DESCRIBE FILESYSTEM CACHE` query. Fix incorrect formatting of the `SET param_...` (a query setting a parameter). Fix incorrect formatting of `CREATE INDEX` queries. Fix inconsistent formatting of `CREATE USER` and similar queries. Fix inconsistent formatting of `CREATE SETTINGS PROFILE`. Fix incorrect formatting of `ALTER ... MODIFY REFRESH`. Fix inconsistent formatting of window functions if frame offsets were expressions. Fix inconsistent formatting of `RESPECT NULLS` and `IGNORE NULLS` if they were used after a function that implements an operator (such as `plus`). Fix idiotic formatting of `SYSTEM SYNC REPLICA ... LIGHTWEIGHT FROM ...`. Fix inconsistent formatting of invalid queries with `GROUP BY GROUPING SETS ... WITH ROLLUP/CUBE/TOTALS`. Fix inconsistent formatting of `GRANT CURRENT GRANTS`. Fix inconsistent formatting of `CREATE TABLE (... COLLATE)`. Additionally, I fixed the incorrect formatting of `EXPLAIN` in subqueries ([#60102](https://github.com/ClickHouse/ClickHouse/issues/60102)). Fixed incorrect formatting of lambda functions ([#60012](https://github.com/ClickHouse/ClickHouse/issues/60012)). Added a check so there is no way to miss these abominations in the future. [#60095](https://github.com/ClickHouse/ClickHouse/pull/60095) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Queries like `SELECT * FROM (EXPLAIN ...)` were formatted incorrectly. [#60102](https://github.com/ClickHouse/ClickHouse/pull/60102) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix cosineDistance crash with Nullable. [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)). +* Boolean values in string representation now cast to true bools. E.g. this query previously threw an exception but now works: `SELECT true = 'true'`. [#60160](https://github.com/ClickHouse/ClickHouse/pull/60160) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix non-filled column `table_uuid` in `system.s3queue_log`. Added columns `database` and `table`. Renamed `table_uuid` to `uuid`. [#60166](https://github.com/ClickHouse/ClickHouse/pull/60166) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix arrayReduce with nullable aggregate function name. [#60188](https://github.com/ClickHouse/ClickHouse/pull/60188) ([Raúl Marín](https://github.com/Algunenano)). +* Fix actions execution during preliminary filtering (PK, partition pruning). [#60196](https://github.com/ClickHouse/ClickHouse/pull/60196) ([Azat Khuzhin](https://github.com/azat)). +* Hide sensitive info for `S3Queue` table engine. [#60233](https://github.com/ClickHouse/ClickHouse/pull/60233) ([Kseniia Sumarokova](https://github.com/kssenii)). 
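For the DateTime64 scale-conversion fix quoted earlier ([#60004](https://github.com/ClickHouse/ClickHouse/pull/60004)), here is a small sketch of the scenario; the inserted value and the expected output are assumptions based on the entry, not output taken from the PR.

```sql
CREATE TABLE test (result DateTime64(3)) ENGINE = Memory;
-- Converting a DateTime64(6) value into a DateTime64(3) column should rescale the
-- sub-second part rather than produce a distorted timestamp.
INSERT INTO test SELECT toDateTime64('2024-02-01 12:00:00.123456', 6);
SELECT result FROM test;  -- expected: 2024-02-01 12:00:00.123
```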
+* Restore the previous syntax `ORDER BY ALL` which has temporarily (for a few days) been replaced by ORDER BY *. [#60248](https://github.com/ClickHouse/ClickHouse/pull/60248) ([Robert Schulze](https://github.com/rschu1ze)). +* Fixed a minor bug that caused all http return codes to be 200 (success) instead of a relevant code on exception. [#60252](https://github.com/ClickHouse/ClickHouse/pull/60252) ([Austin Kothig](https://github.com/kothiga)). +* Fix bug in `S3Queue` table engine with ordered parallel mode. [#60282](https://github.com/ClickHouse/ClickHouse/pull/60282) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6. [#60359](https://github.com/ClickHouse/ClickHouse/pull/60359) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments. [#60453](https://github.com/ClickHouse/ClickHouse/pull/60453) ([Raúl Marín](https://github.com/Algunenano)). +* Fixed a minor bug that prevented distributed table queries sent from either KQL or PRQL dialect clients to be executed on replicas. [#60470](https://github.com/ClickHouse/ClickHouse/pull/60470) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix incomplete results with s3Cluster when multiple threads are used. [#60477](https://github.com/ClickHouse/ClickHouse/pull/60477) ([Antonio Andelic](https://github.com/antonio2368)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.2.2.71-stable.md b/docs/changelogs/v24.2.2.71-stable.md index b9aa5be626b..e17c22ab176 100644 --- a/docs/changelogs/v24.2.2.71-stable.md +++ b/docs/changelogs/v24.2.2.71-stable.md @@ -12,21 +12,21 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* PartsSplitter invalid ranges for the same part [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)). -* Try to avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). -* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)). -* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)). -* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). -* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). 
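As a reminder of the `ORDER BY ALL` syntax whose restoration is noted above ([#60248](https://github.com/ClickHouse/ClickHouse/pull/60248)), a short hypothetical example; the table name is made up.

```sql
-- ORDER BY ALL sorts by every expression in the SELECT list, in order.
SELECT event, count() AS c
FROM hits
GROUP BY event
ORDER BY ALL;  -- equivalent to ORDER BY event, c
```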
-* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). -* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). -* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#60640](https://github.com/ClickHouse/ClickHouse/issues/60640): Fixed a bug in parallel optimization for queries with `FINAL`, which could give an incorrect result in rare cases. [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)). +* Backported in [#61085](https://github.com/ClickHouse/ClickHouse/issues/61085): Avoid calculation of scalar subqueries for `CREATE TABLE`. Fixes [#59795](https://github.com/ClickHouse/ClickHouse/issues/59795) and [#59930](https://github.com/ClickHouse/ClickHouse/issues/59930). Attempt to re-implement https://github.com/ClickHouse/ClickHouse/pull/57855. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#61332](https://github.com/ClickHouse/ClickHouse/issues/61332): Fix deadlock in parallel parsing when lots of rows are skipped due to errors. [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#61010](https://github.com/ClickHouse/ClickHouse/issues/61010): Fix the issue of `max_query_size` for KQL compound operator like mv-expand. Related to [#59626](https://github.com/ClickHouse/ClickHouse/issues/59626). [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). +* Backported in [#61002](https://github.com/ClickHouse/ClickHouse/issues/61002): Reduce the number of read rows from `system.numbers`. Fixes [#59418](https://github.com/ClickHouse/ClickHouse/issues/59418). [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)). +* Backported in [#60629](https://github.com/ClickHouse/ClickHouse/issues/60629): Don't output number tips for date types. [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60793](https://github.com/ClickHouse/ClickHouse/issues/60793): Fix buffer overflow that can happen if the attacker asks the HTTP server to decompress data with a composition of codecs and size triggering numeric overflow. Fix buffer overflow that can happen inside codec NONE on wrong input data. This was submitted by TIANGONG research team through our [Bug Bounty program](https://github.com/ClickHouse/ClickHouse/issues/38986). [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#60785](https://github.com/ClickHouse/ClickHouse/issues/60785): Functions for SQL/JSON were able to read uninitialized memory. This closes [#60017](https://github.com/ClickHouse/ClickHouse/issues/60017). Found by Fuzzer. 
[#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#60805](https://github.com/ClickHouse/ClickHouse/issues/60805): Do not set aws custom metadata `x-amz-meta-*` headers on UploadPart & CompleteMultipartUpload calls. [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). +* Backported in [#60822](https://github.com/ClickHouse/ClickHouse/issues/60822): Fix crash in arrayEnumerateRanked. [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60843](https://github.com/ClickHouse/ClickHouse/issues/60843): Fix crash when using input() in INSERT SELECT JOIN. Closes [#60035](https://github.com/ClickHouse/ClickHouse/issues/60035). [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#60919](https://github.com/ClickHouse/ClickHouse/issues/60919): Fix crash when `allow_experimental_analyzer` setting value is changed in the subqueries. [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#60906](https://github.com/ClickHouse/ClickHouse/issues/60906): Avoid segfault if too many keys are skipped when reading from S3. [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61307](https://github.com/ClickHouse/ClickHouse/issues/61307): Fix multiple bugs in groupArraySorted. [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#61295](https://github.com/ClickHouse/ClickHouse/issues/61295): Keeper: fix runtime reconfig for standalone binary. [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.2.3.70-stable.md b/docs/changelogs/v24.2.3.70-stable.md index cd88877e254..1a50355e0b9 100644 --- a/docs/changelogs/v24.2.3.70-stable.md +++ b/docs/changelogs/v24.2.3.70-stable.md @@ -15,28 +15,28 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). -* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). -* fix issue of actions dag split [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)). -* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)). -* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). -* Fix ALTER QUERY MODIFY SQL SECURITY [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)). 
-* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). -* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). -* Crash in Engine Merge if Row Policy does not have expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). -* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). -* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix incorrect judgement of of monotonicity of function abs [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). -* Make sanity check of settings worse [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)). -* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). -* Format SQL security option only in `CREATE VIEW` queries. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). +* Backported in [#61453](https://github.com/ClickHouse/ClickHouse/issues/61453): Fix possible incorrect result of aggregate function `uniqExact`. [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#61946](https://github.com/ClickHouse/ClickHouse/issues/61946): Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#61846](https://github.com/ClickHouse/ClickHouse/issues/61846): Fixed possible wrong result of aggregation with nullable keys. [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). 
+* Backported in [#61591](https://github.com/ClickHouse/ClickHouse/issues/61591): Fix `ActionsDAG::split` so that it upholds the invariant that "Execution of first then second parts on block is equivalent to execution of initial DAG". [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#61648](https://github.com/ClickHouse/ClickHouse/issues/61648): Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings. [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#61748](https://github.com/ClickHouse/ClickHouse/issues/61748): Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)).
+* Backported in [#61963](https://github.com/ClickHouse/ClickHouse/issues/61963): Fix the `ALTER QUERY MODIFY SQL SECURITY` queries to override the table's DDL correctly. [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)).
+* Backported in [#61699](https://github.com/ClickHouse/ClickHouse/issues/61699): Fix the `clickhouse-client -s` argument, which was broken by being defined twice. [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
+* Backported in [#61578](https://github.com/ClickHouse/ClickHouse/issues/61578): Fix string search with constant start position which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#62531](https://github.com/ClickHouse/ClickHouse/issues/62531): Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Backported in [#61860](https://github.com/ClickHouse/ClickHouse/issues/61860): Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying an incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
+* Backported in [#62242](https://github.com/ClickHouse/ClickHouse/issues/62242): Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62218](https://github.com/ClickHouse/ClickHouse/issues/62218): Fix a crash in the Merge table engine when a row policy does not have an expression. [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)).
+* Backported in [#62342](https://github.com/ClickHouse/ClickHouse/issues/62342): Fix data race on scalars in Context. [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62677](https://github.com/ClickHouse/ClickHouse/issues/62677): Fix segmentation fault when using Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560).
[#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#62639](https://github.com/ClickHouse/ClickHouse/issues/62639): Fix memory leak in groupArraySorted. Fix [#62536](https://github.com/ClickHouse/ClickHouse/issues/62536). [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#63054](https://github.com/ClickHouse/ClickHouse/issues/63054): Fixed bug in GCD codec implementation that may lead to server crashes. [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). +* Backported in [#63030](https://github.com/ClickHouse/ClickHouse/issues/63030): Fix temporary data in cache incorrect behaviour in case creation of cache key base directory fails with `no space left on device`. [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#63142](https://github.com/ClickHouse/ClickHouse/issues/63142): Fix incorrect judgement of of monotonicity of function `abs`. [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). +* Backported in [#63183](https://github.com/ClickHouse/ClickHouse/issues/63183): Sanity check: Clamp values instead of throwing. [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#63176](https://github.com/ClickHouse/ClickHouse/issues/63176): Setting server_name might help with recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). +* Backported in [#63191](https://github.com/ClickHouse/ClickHouse/issues/63191): Fix a bug when `SQL SECURITY` statement appears in all `CREATE` queries if the server setting `ignore_empty_sql_security_in_create_view_query=true` https://github.com/ClickHouse/ClickHouse/pull/63134. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.3.1.2672-lts.md b/docs/changelogs/v24.3.1.2672-lts.md index 006ab941203..a70a33971c2 100644 --- a/docs/changelogs/v24.3.1.2672-lts.md +++ b/docs/changelogs/v24.3.1.2672-lts.md @@ -20,7 +20,7 @@ sidebar_label: 2024 #### New Feature * Topk/topkweighed support mode, which return count of values and it's error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)). -* Add generate_series as a table function. This function generates table with an arithmetic progression with natural numbers. [#59390](https://github.com/ClickHouse/ClickHouse/pull/59390) ([divanik](https://github.com/divanik)). +* Add generate_series as a table function. This function generates table with an arithmetic progression with natural numbers. [#59390](https://github.com/ClickHouse/ClickHouse/pull/59390) ([Daniil Ivanik](https://github.com/divanik)). * Support reading and writing backups as tar archives. [#59535](https://github.com/ClickHouse/ClickHouse/pull/59535) ([josh-hildred](https://github.com/josh-hildred)). * Implemented support for S3Express buckets. 
[#59965](https://github.com/ClickHouse/ClickHouse/pull/59965) ([Nikita Taranov](https://github.com/nickitat)). * Allow to attach parts from a different disk * attach partition from the table on other disks using copy instead of hard link (such as instant table) * attach partition using copy when the hard link fails even on the same disk. [#60112](https://github.com/ClickHouse/ClickHouse/pull/60112) ([Unalian](https://github.com/Unalian)). @@ -133,75 +133,75 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix function execution over const and LowCardinality with GROUP BY const for analyzer [#59986](https://github.com/ClickHouse/ClickHouse/pull/59986) ([Azat Khuzhin](https://github.com/azat)). -* Fix finished_mutations_to_keep=0 for MergeTree (as docs says 0 is to keep everything) [#60031](https://github.com/ClickHouse/ClickHouse/pull/60031) ([Azat Khuzhin](https://github.com/azat)). -* PartsSplitter invalid ranges for the same part [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)). -* Azure Blob Storage : Fix issues endpoint and prefix [#60251](https://github.com/ClickHouse/ClickHouse/pull/60251) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* fix LRUResource Cache bug (Hive cache) [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)). -* Force reanalysis if parallel replicas changed [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)). -* Fix usage of plain metadata type with new disks configuration option [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)). -* Try to avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). -* Keeper fix: add timeouts when waiting for commit logs [#60544](https://github.com/ClickHouse/ClickHouse/pull/60544) ([Antonio Andelic](https://github.com/antonio2368)). -* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)). -* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)). -* Fix reading from MergeTree with non-deterministic functions in filter [#60586](https://github.com/ClickHouse/ClickHouse/pull/60586) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix logical error on bad compatibility setting value type [#60596](https://github.com/ClickHouse/ClickHouse/pull/60596) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix inconsistent aggregate function states in mixed x86-64 / ARM clusters [#60610](https://github.com/ClickHouse/ClickHouse/pull/60610) ([Harry Lee](https://github.com/HarryLeeIBM)). 
-* fix(prql): Robust panic handler [#60615](https://github.com/ClickHouse/ClickHouse/pull/60615) ([Maximilian Roos](https://github.com/max-sixty)). -* Fix `intDiv` for decimal and date arguments [#60672](https://github.com/ClickHouse/ClickHouse/pull/60672) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix: expand CTE in alter modify query [#60682](https://github.com/ClickHouse/ClickHouse/pull/60682) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory) [#60689](https://github.com/ClickHouse/ClickHouse/pull/60689) ([Azat Khuzhin](https://github.com/azat)). -* Fix "Invalid storage definition in metadata file" for parameterized views [#60708](https://github.com/ClickHouse/ClickHouse/pull/60708) ([Azat Khuzhin](https://github.com/azat)). -* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Remove wrong sanitize checking in aggregate function quantileGK [#60740](https://github.com/ClickHouse/ClickHouse/pull/60740) ([李扬](https://github.com/taiyang-li)). -* Fix insert-select + insert_deduplication_token bug by setting streams to 1 [#60745](https://github.com/ClickHouse/ClickHouse/pull/60745) ([Jordi Villar](https://github.com/jrdi)). -* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). -* Fix toStartOfInterval [#60763](https://github.com/ClickHouse/ClickHouse/pull/60763) ([Andrey Zvonov](https://github.com/zvonand)). -* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). -* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix possible stuck on error in HashedDictionaryParallelLoader [#60926](https://github.com/ClickHouse/ClickHouse/pull/60926) ([vdimir](https://github.com/vdimir)). -* Fix async RESTORE with Replicated database [#60934](https://github.com/ClickHouse/ClickHouse/pull/60934) ([Antonio Andelic](https://github.com/antonio2368)). -* fix csv format not support tuple [#60994](https://github.com/ClickHouse/ClickHouse/pull/60994) ([shuai.xu](https://github.com/shuai-xu)). -* Fix deadlock in async inserts to `Log` tables via native protocol [#61055](https://github.com/ClickHouse/ClickHouse/pull/61055) ([Anton Popov](https://github.com/CurtizJ)). -* Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary [#61196](https://github.com/ClickHouse/ClickHouse/pull/61196) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). 
-* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix usage of session_token in S3 engine [#61234](https://github.com/ClickHouse/ClickHouse/pull/61234) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). -* Fix bugs in show database [#61269](https://github.com/ClickHouse/ClickHouse/pull/61269) ([Raúl Marín](https://github.com/Algunenano)). -* Fix logical error in RabbitMQ storage with MATERIALIZED columns [#61320](https://github.com/ClickHouse/ClickHouse/pull/61320) ([vdimir](https://github.com/vdimir)). -* Fix CREATE OR REPLACE DICTIONARY [#61356](https://github.com/ClickHouse/ClickHouse/pull/61356) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix crash in ObjectJson parsing array with nulls [#61364](https://github.com/ClickHouse/ClickHouse/pull/61364) ([vdimir](https://github.com/vdimir)). -* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). -* fix issue of actions dag split [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)). -* Fix finishing a failed RESTORE [#61466](https://github.com/ClickHouse/ClickHouse/pull/61466) ([Vitaly Baranov](https://github.com/vitlibar)). -* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)). -* Allow queuing in restore pool [#61475](https://github.com/ClickHouse/ClickHouse/pull/61475) ([Nikita Taranov](https://github.com/nickitat)). -* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). -* Fix ALTER QUERY MODIFY SQL SECURITY [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)). -* Fix crash in window view [#61526](https://github.com/ClickHouse/ClickHouse/pull/61526) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix `repeat` with non native integers [#61527](https://github.com/ClickHouse/ClickHouse/pull/61527) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Reset part level upon attach from disk on MergeTree [#61536](https://github.com/ClickHouse/ClickHouse/pull/61536) ([Arthur Passos](https://github.com/arthurpassos)). -* Fix crash in arrayPartialReverseSort [#61539](https://github.com/ClickHouse/ClickHouse/pull/61539) ([Raúl Marín](https://github.com/Algunenano)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix addDays cause an error when used datetime64 [#61561](https://github.com/ClickHouse/ClickHouse/pull/61561) ([Shuai li](https://github.com/loneylee)). 
-* disallow LowCardinality input type for JSONExtract [#61617](https://github.com/ClickHouse/ClickHouse/pull/61617) ([Julia Kartseva](https://github.com/jkartseva)). -* Fix `system.part_log` for async insert with deduplication [#61620](https://github.com/ClickHouse/ClickHouse/pull/61620) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix Non-ready set for system.parts. [#61666](https://github.com/ClickHouse/ClickHouse/pull/61666) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Don't allow the same expression in ORDER BY with and without WITH FILL [#61667](https://github.com/ClickHouse/ClickHouse/pull/61667) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix actual_part_name for REPLACE_RANGE (`Entry actual part isn't empty yet`) [#61675](https://github.com/ClickHouse/ClickHouse/pull/61675) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Fix columns after executing MODIFY QUERY for a materialized view with internal table [#61734](https://github.com/ClickHouse/ClickHouse/pull/61734) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). -* Fix RANGE frame is not supported for Nullable columns. [#61766](https://github.com/ClickHouse/ClickHouse/pull/61766) ([YuanLiu](https://github.com/ditgittube)). -* Revert "Revert "Fix bug when reading system.parts using UUID (issue 61220)."" [#61779](https://github.com/ClickHouse/ClickHouse/pull/61779) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Fix function execution over const and LowCardinality with GROUP BY const for analyzer. [#59986](https://github.com/ClickHouse/ClickHouse/pull/59986) ([Azat Khuzhin](https://github.com/azat)). +* Fix finished_mutations_to_keep=0 for MergeTree (as docs says 0 is to keep everything). [#60031](https://github.com/ClickHouse/ClickHouse/pull/60031) ([Azat Khuzhin](https://github.com/azat)). +* Fixed a bug in parallel optimization for queries with `FINAL`, which could give an incorrect result in rare cases. [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)). +* Updated to not include account_name in endpoint if flag `endpoint_contains_account_name` is set and fixed issue with empty container name. [#60251](https://github.com/ClickHouse/ClickHouse/pull/60251) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Fix LRUResource Cache implementation that can be triggered by incorrect component usage. Error can't be triggered with current ClickHouse usage. close [#60122](https://github.com/ClickHouse/ClickHouse/issues/60122). [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)). +* Force reanalysis of the query if parallel replicas isn't supported in a subquery. [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)). +* Fix usage of plain metadata type for new disks configuration option. [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike. [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)). +* Avoid calculation of scalar subqueries for `CREATE TABLE`. 
Fixes [#59795](https://github.com/ClickHouse/ClickHouse/issues/59795) and [#59930](https://github.com/ClickHouse/ClickHouse/issues/59930). Attempt to re-implement https://github.com/ClickHouse/ClickHouse/pull/57855. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix deadlock in parallel parsing when lots of rows are skipped due to errors. [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix the issue of `max_query_size` for KQL compound operator like mv-expand. Related to [#59626](https://github.com/ClickHouse/ClickHouse/issues/59626). [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). +* Keeper fix: add timeouts when waiting for commit logs. Keeper could get stuck if the log successfully gets replicated but never committed. [#60544](https://github.com/ClickHouse/ClickHouse/pull/60544) ([Antonio Andelic](https://github.com/antonio2368)). +* Reduce the number of read rows from `system.numbers`. Fixes [#59418](https://github.com/ClickHouse/ClickHouse/issues/59418). [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)). +* Don't output number tips for date types. [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)). +* Fix unexpected result during reading from tables with virtual columns when filter contains non-deterministic functions. Closes [#61106](https://github.com/ClickHouse/ClickHouse/issues/61106). [#60586](https://github.com/ClickHouse/ClickHouse/pull/60586) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix logical error on bad compatibility setting value type. Closes [#60590](https://github.com/ClickHouse/ClickHouse/issues/60590). [#60596](https://github.com/ClickHouse/ClickHouse/pull/60596) ([Kruglov Pavel](https://github.com/Avogar)). +* Fixed potentially inconsistent aggregate function states in mixed x86-64 / ARM clusters. [#60610](https://github.com/ClickHouse/ClickHouse/pull/60610) ([Harry Lee](https://github.com/HarryLeeIBM)). +* Isolates the ClickHouse binary from any panics in `prqlc`. [#60615](https://github.com/ClickHouse/ClickHouse/pull/60615) ([Maximilian Roos](https://github.com/max-sixty)). +* Fixing bug where `intDiv` with decimal and date/datetime as arguments leads to crash. Closes [#60653](https://github.com/ClickHouse/ClickHouse/issues/60653). [#60672](https://github.com/ClickHouse/ClickHouse/pull/60672) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix bug when attempt to 'ALTER TABLE ... MODIFY QUERY' with CTE ends up with "Table [CTE] does not exist" exception (Code: 60). [#60682](https://github.com/ClickHouse/ClickHouse/pull/60682) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory - major user is `clickhouse-local`). [#60689](https://github.com/ClickHouse/ClickHouse/pull/60689) ([Azat Khuzhin](https://github.com/azat)). +* Fix "Invalid storage definition in metadata file" for parameterized views. [#60708](https://github.com/ClickHouse/ClickHouse/pull/60708) ([Azat Khuzhin](https://github.com/azat)). +* Fix buffer overflow that can happen if the attacker asks the HTTP server to decompress data with a composition of codecs and size triggering numeric overflow. Fix buffer overflow that can happen inside codec NONE on wrong input data. 
This was submitted by TIANGONG research team through our [Bug Bounty program](https://github.com/ClickHouse/ClickHouse/issues/38986). [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Functions for SQL/JSON were able to read uninitialized memory. This closes [#60017](https://github.com/ClickHouse/ClickHouse/issues/60017). Found by Fuzzer. [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Remove wrong sanitize checking in aggregate function quantileGK: `sampled_len` in `ApproxSampler` is not guaranteed to be less than `default_compress_threshold`. `default_compress_threshold` is just a soft limitation while executing `ApproxSampler::insert`. cc @Algunenano. This issue was reproduced in https://github.com/oap-project/gluten/pull/4829. [#60740](https://github.com/ClickHouse/ClickHouse/pull/60740) ([李扬](https://github.com/taiyang-li)). +* Fix the issue causing undesired deduplication on insert-select queries passing a custom `insert_deduplication_token`. The change sets streams to 1 in those cases to prevent the issue from happening at the expense of ignoring `max_insert_threads > 1`. [#60745](https://github.com/ClickHouse/ClickHouse/pull/60745) ([Jordi Villar](https://github.com/jrdi)). +* Do not set AWS custom metadata `x-amz-meta-*` headers on UploadPart & CompleteMultipartUpload calls. [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). +* One more fix for toStartOfInterval returning a wrong result for intervals smaller than a second. [#60763](https://github.com/ClickHouse/ClickHouse/pull/60763) ([Andrey Zvonov](https://github.com/zvonand)). +* Fix crash in arrayEnumerateRanked. [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). +* Fix crash when using input() in INSERT SELECT JOIN. Closes [#60035](https://github.com/ClickHouse/ClickHouse/issues/60035). [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix crash when `allow_experimental_analyzer` setting value is changed in the subqueries. [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). +* Avoid segfault if too many keys are skipped when reading from S3. [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix possible hang on error while reloading a dictionary with `SHARDS`. [#60926](https://github.com/ClickHouse/ClickHouse/pull/60926) ([vdimir](https://github.com/vdimir)). +* Fix async RESTORE with Replicated database. [#60934](https://github.com/ClickHouse/ClickHouse/pull/60934) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix the CSV format writing tuples in a wrong format that could not be read back. [#60994](https://github.com/ClickHouse/ClickHouse/pull/60994) ([shuai.xu](https://github.com/shuai-xu)). +* Fixed deadlock in async inserts to `Log` tables via native protocol. [#61055](https://github.com/ClickHouse/ClickHouse/pull/61055) ([Anton Popov](https://github.com/CurtizJ)). +* Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary that could lead to nullptr dereference on bad column types in FunctionsConversion. Closes [#56661](https://github.com/ClickHouse/ClickHouse/issues/56661).
[#61196](https://github.com/ClickHouse/ClickHouse/pull/61196) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix multiple bugs in groupArraySorted. [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). +* Keeper: fix runtime reconfig for standalone binary. [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix usage of session_token in S3 engine. Fixes https://github.com/ClickHouse/ClickHouse/pull/57850#issuecomment-1966404710. [#61234](https://github.com/ClickHouse/ClickHouse/pull/61234) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix possible incorrect result of aggregate function `uniqExact`. [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). +* Fix bugs in show database. [#61269](https://github.com/ClickHouse/ClickHouse/pull/61269) ([Raúl Marín](https://github.com/Algunenano)). +* Fix possible `LOGICAL_ERROR` in case storage with `RabbitMQ` engine has unsupported `MATERIALIZED|ALIAS|DEFAULT` columns. [#61320](https://github.com/ClickHouse/ClickHouse/pull/61320) ([vdimir](https://github.com/vdimir)). +* This PR fixes `CREATE OR REPLACE DICTIONARY` with `lazy_load` turned off. [#61356](https://github.com/ClickHouse/ClickHouse/pull/61356) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix possible crash in `Object('json')` data type parsing array with `null`s. [#61364](https://github.com/ClickHouse/ClickHouse/pull/61364) ([vdimir](https://github.com/vdimir)). +* Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). +* Fixed possible wrong result of aggregation with nullable keys. [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). +* ActionsDAG::split can't make sure that "Execution of first then second parts on block is equivalent to execution of initial DAG.". [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)). +* Fix finishing a failed RESTORE. [#61466](https://github.com/ClickHouse/ClickHouse/pull/61466) ([Vitaly Baranov](https://github.com/vitlibar)). +* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings. [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)). +* Fix deadlock during `restore database` execution if `restore_threads` was set to 1. [#61475](https://github.com/ClickHouse/ClickHouse/pull/61475) ([Nikita Taranov](https://github.com/nickitat)). +* Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). +* Fix the `ALTER QUERY MODIFY SQL SECURITY` queries to override the table's DDL correctly. [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)). +* The experimental "window view" feature (it is disabled by default), which should not be used in production, could lead to a crash. Issue was identified by YohannJardin via Bugcrowd program. [#61526](https://github.com/ClickHouse/ClickHouse/pull/61526) ([Alexey Milovidov](https://github.com/alexey-milovidov)). 
+* Fix `repeat` with non-native integers (e.g. `UInt256`). [#61527](https://github.com/ClickHouse/ClickHouse/pull/61527) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix `clickhouse-client -s` argument, it was broken by defining it two times. [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Fix too high part level reported in [#58558](https://github.com/ClickHouse/ClickHouse/issues/58558) by resetting MergeTree part levels upon attach from disk just like `ReplicatedMergeTree` [does](https://github.com/ClickHouse/ClickHouse/blob/9cd7e6155c7027baccd6dc5380d0813db94b03cc/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp#L838). [#61536](https://github.com/ClickHouse/ClickHouse/pull/61536) ([Arthur Passos](https://github.com/arthurpassos)). +* Fix crash in arrayPartialReverseSort. [#61539](https://github.com/ClickHouse/ClickHouse/pull/61539) ([Raúl Marín](https://github.com/Algunenano)). +* Fix string search with constant start position which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix the issue where the function `addDays` (and similar functions) reports an error when the first parameter is `DateTime64`. [#61561](https://github.com/ClickHouse/ClickHouse/pull/61561) ([Shuai li](https://github.com/loneylee)). +* Disallow LowCardinality type for the column containing JSON input in the JSONExtract function. [#61617](https://github.com/ClickHouse/ClickHouse/pull/61617) ([Julia Kartseva](https://github.com/jkartseva)). +* Add parts to `system.part_log` when created using async insert with deduplication. [#61620](https://github.com/ClickHouse/ClickHouse/pull/61620) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix `Not-ready Set` error while reading from `system.parts` (with `IN subquery`). Was introduced in [#60510](https://github.com/ClickHouse/ClickHouse/issues/60510). [#61666](https://github.com/ClickHouse/ClickHouse/pull/61666) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Don't allow the same expression in ORDER BY with and without WITH FILL. Such invalid expression could lead to logical error `Invalid number of rows in Chunk`. [#61667](https://github.com/ClickHouse/ClickHouse/pull/61667) ([Kruglov Pavel](https://github.com/Avogar)). +* Fixed `Entry actual part isn't empty yet. This is a bug. (LOGICAL_ERROR)` that might happen in rare cases after executing `REPLACE PARTITION`, `MOVE PARTITION TO TABLE` or `ATTACH PARTITION FROM`. [#61675](https://github.com/ClickHouse/ClickHouse/pull/61675) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Fix columns after executing `ALTER TABLE MODIFY QUERY` for a materialized view with internal table. A materialized view must have the same columns as its internal table if any, however `MODIFY QUERY` could break that rule before this PR causing the materialized view to be inconsistent. [#61734](https://github.com/ClickHouse/ClickHouse/pull/61734) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). +* Fix RANGE frame is not supported for Nullable columns. 
For example: `SELECT number, sum(number) OVER (ORDER BY number ASC RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING) AS sum FROM values('number Nullable(Int8)', 1, 1, 2, 3, NULL)`. [#61766](https://github.com/ClickHouse/ClickHouse/pull/61766) ([YuanLiu](https://github.com/ditgittube)). +* Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61779](https://github.com/ClickHouse/ClickHouse/pull/61779) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). #### CI Fix or Improvement (changelog entry is not required) @@ -526,7 +526,7 @@ sidebar_label: 2024 * No "please" [#61916](https://github.com/ClickHouse/ClickHouse/pull/61916) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Update version_date.tsv and changelogs after v23.12.6.19-stable [#61917](https://github.com/ClickHouse/ClickHouse/pull/61917) ([robot-clickhouse](https://github.com/robot-clickhouse)). * Update version_date.tsv and changelogs after v24.1.8.22-stable [#61918](https://github.com/ClickHouse/ClickHouse/pull/61918) ([robot-clickhouse](https://github.com/robot-clickhouse)). -* Fix flaky test_broken_projestions/test.py::test_broken_ignored_replic... [#61932](https://github.com/ClickHouse/ClickHouse/pull/61932) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix flaky test_broken_projestions/test.py::test_broken_ignored_replic… [#61932](https://github.com/ClickHouse/ClickHouse/pull/61932) ([Kseniia Sumarokova](https://github.com/kssenii)). * Check is Rust avaiable for build, if not, suggest a way to disable Rust support [#61938](https://github.com/ClickHouse/ClickHouse/pull/61938) ([Azat Khuzhin](https://github.com/azat)). * CI: new ci menu in PR body [#61948](https://github.com/ClickHouse/ClickHouse/pull/61948) ([Max K.](https://github.com/maxknv)). * Remove flaky test `01193_metadata_loading` [#61961](https://github.com/ClickHouse/ClickHouse/pull/61961) ([Nikita Taranov](https://github.com/nickitat)). diff --git a/docs/changelogs/v24.3.2.23-lts.md b/docs/changelogs/v24.3.2.23-lts.md index 4d59a1cedf6..d8adc63c8ac 100644 --- a/docs/changelogs/v24.3.2.23-lts.md +++ b/docs/changelogs/v24.3.2.23-lts.md @@ -9,9 +9,9 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix logical error in group_by_use_nulls + grouping set + analyzer + materialize/constant [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix external table cannot parse data type Bool [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)). -* Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value" [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#62078](https://github.com/ClickHouse/ClickHouse/issues/62078): Fix logical error 'Unexpected return type from materialize. Expected Nullable. Got UInt8' while using group_by_use_nulls with analyzer and materialize/constant in grouping set. Closes [#61531](https://github.com/ClickHouse/ClickHouse/issues/61531). [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#62122](https://github.com/ClickHouse/ClickHouse/issues/62122): Fix external table cannot parse data type Bool. [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)).
+* Backported in [#62147](https://github.com/ClickHouse/ClickHouse/issues/62147): Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value". The feature is broken and can't be disabled individually. [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.3.3.102-lts.md b/docs/changelogs/v24.3.3.102-lts.md index dc89ac24208..1cdbde67031 100644 --- a/docs/changelogs/v24.3.3.102-lts.md +++ b/docs/changelogs/v24.3.3.102-lts.md @@ -17,36 +17,36 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). -* Crash in Engine Merge if Row Policy does not have expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). -* ReadWriteBufferFromHTTP set right header host when redirected [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)). -* Analyzer: Fix query parameter resolution [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)). -* Fixing NULL random seed for generateRandom with analyzer. [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix PartsSplitter [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)). -* Analyzer: Fix alias to parametrized view resolution [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)). -* Analyzer: Fix name resolution from parent scopes [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)). -* Fix argMax with nullable non native numeric column [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)). -* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix analyzer with positional arguments in distributed query [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)). -* Fix filter pushdown from additional_table_filters in Merge engine in analyzer [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix GLOBAL IN table queries with analyzer. [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix scalar subquery in LIMIT [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). 
-* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix argMin/argMax combinator state [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)). -* Fix temporary data in cache failing because of cache lock contention optimization [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix FINAL modifier is not respected in CTE with analyzer [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)). -* Fix crash in function `formatRow` with `JSON` format and HTTP interface [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)). -* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). -* Disable optimize_rewrite_aggregate_function_with_if for sum(nullable) [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)). -* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix optimize_rewrite_aggregate_function_with_if implicit cast [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)). -* Do not remove server constants from GROUP BY key for secondary query. [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix incorrect judgement of of monotonicity of function abs [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). -* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). -* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). -* Format SQL security option only in `CREATE VIEW` queries. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). +* Backported in [#62533](https://github.com/ClickHouse/ClickHouse/issues/62533): Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#62244](https://github.com/ClickHouse/ClickHouse/issues/62244): Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#62220](https://github.com/ClickHouse/ClickHouse/issues/62220): Fix crash in the Merge engine if a row policy does not have an expression. [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). +* Backported in [#62234](https://github.com/ClickHouse/ClickHouse/issues/62234): ReadWriteBufferFromHTTP now sets the correct Host header when redirected.
[#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)). +* Backported in [#62278](https://github.com/ClickHouse/ClickHouse/issues/62278): Fix query parameter resolution with `allow_experimental_analyzer` enabled. Closes [#62113](https://github.com/ClickHouse/ClickHouse/issues/62113). [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#62354](https://github.com/ClickHouse/ClickHouse/issues/62354): Fix `generateRandom` with `NULL` in the seed argument. Fixes [#62092](https://github.com/ClickHouse/ClickHouse/issues/62092). [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#62412](https://github.com/ClickHouse/ClickHouse/issues/62412): When some index columns are not loaded into memory for some parts of a *MergeTree table, queries with `FINAL` might produce wrong results. Now we explicitly choose only the common prefix of index columns for all parts to avoid this issue. [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)). +* Backported in [#62733](https://github.com/ClickHouse/ClickHouse/issues/62733): Fix inability to address parametrized view in SELECT queries via aliases. [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#62407](https://github.com/ClickHouse/ClickHouse/issues/62407): Fix name resolution in case when identifier is resolved to an executed scalar subquery. [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#62331](https://github.com/ClickHouse/ClickHouse/issues/62331): Fix argMax with nullable non native numeric column. [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#62344](https://github.com/ClickHouse/ClickHouse/issues/62344): Fix data race on scalars in Context. [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#62484](https://github.com/ClickHouse/ClickHouse/issues/62484): Resolve positional arguments only on the initiator node. Closes [#62289](https://github.com/ClickHouse/ClickHouse/issues/62289). [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)). +* Backported in [#62442](https://github.com/ClickHouse/ClickHouse/issues/62442): Fix filter pushdown from additional_table_filters in Merge engine in analyzer. Closes [#62229](https://github.com/ClickHouse/ClickHouse/issues/62229). [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#62475](https://github.com/ClickHouse/ClickHouse/issues/62475): Fix `Unknown expression or table expression identifier` error for `GLOBAL IN table` queries (with new analyzer). Fixes [#62286](https://github.com/ClickHouse/ClickHouse/issues/62286). [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#62612](https://github.com/ClickHouse/ClickHouse/issues/62612): Fix an error `LIMIT expression must be constant` in queries with constant expression in `LIMIT`/`OFFSET` which contains scalar subquery. Fixes [#62294](https://github.com/ClickHouse/ClickHouse/issues/62294). 
[#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#62679](https://github.com/ClickHouse/ClickHouse/issues/62679): Fix segmentation fault when using Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#62641](https://github.com/ClickHouse/ClickHouse/issues/62641): Fix memory leak in groupArraySorted. Fix [#62536](https://github.com/ClickHouse/ClickHouse/issues/62536). [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#62770](https://github.com/ClickHouse/ClickHouse/issues/62770): Fix argMin/argMax combinator state. [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#62750](https://github.com/ClickHouse/ClickHouse/issues/62750): Fix temporary data in cache failing because of a small value of setting `filesystem_cache_reserve_space_wait_lock_timeout_milliseconds`. Introduced a separate setting `temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds`. [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#62993](https://github.com/ClickHouse/ClickHouse/issues/62993): Fix an error when `FINAL` is not applied when specified in CTE (new analyzer). Fixes [#62779](https://github.com/ClickHouse/ClickHouse/issues/62779). [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)). +* Backported in [#62859](https://github.com/ClickHouse/ClickHouse/issues/62859): Fixed crash in function `formatRow` with `JSON` format in queries executed via the HTTP interface. [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#63056](https://github.com/ClickHouse/ClickHouse/issues/63056): Fixed bug in GCD codec implementation that may lead to server crashes. [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). +* Backported in [#62960](https://github.com/ClickHouse/ClickHouse/issues/62960): Disable optimize_rewrite_aggregate_function_with_if for sum(nullable). [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#63032](https://github.com/ClickHouse/ClickHouse/issues/63032): Fix temporary data in cache incorrect behaviour in case creation of cache key base directory fails with `no space left on device`. [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#63148](https://github.com/ClickHouse/ClickHouse/issues/63148): Fix optimize_rewrite_aggregate_function_with_if implicit cast. [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#63146](https://github.com/ClickHouse/ClickHouse/issues/63146): Fix `Not found column in block` error for distributed queries with server-side constants in `GROUP BY` key. Fixes [#62682](https://github.com/ClickHouse/ClickHouse/issues/62682). 
[#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#63144](https://github.com/ClickHouse/ClickHouse/issues/63144): Fix incorrect judgement of monotonicity of function `abs`. [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). +* Backported in [#63178](https://github.com/ClickHouse/ClickHouse/issues/63178): Setting server_name might help with recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). +* Backported in [#63170](https://github.com/ClickHouse/ClickHouse/issues/63170): The wire protocol version check for MongoDB used to try accessing "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by the user. [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). +* Backported in [#63193](https://github.com/ClickHouse/ClickHouse/issues/63193): Fix a bug when `SQL SECURITY` statement appears in all `CREATE` queries if the server setting `ignore_empty_sql_security_in_create_view_query=true` https://github.com/ClickHouse/ClickHouse/pull/63134. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.4.1.2088-stable.md b/docs/changelogs/v24.4.1.2088-stable.md index b8d83f1a31f..06e704356d4 100644 --- a/docs/changelogs/v24.4.1.2088-stable.md +++ b/docs/changelogs/v24.4.1.2088-stable.md @@ -106,75 +106,75 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix parser error when using COUNT(*) with FILTER clause [#61357](https://github.com/ClickHouse/ClickHouse/pull/61357) ([Duc Canh Le](https://github.com/canhld94)). -* Fix logical error in group_by_use_nulls + grouping set + analyzer + materialize/constant [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)). -* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Try to fix abort in arrow [#61720](https://github.com/ClickHouse/ClickHouse/pull/61720) ([Kruglov Pavel](https://github.com/Avogar)). -* Search for convert_to_replicated flag at the correct path [#61769](https://github.com/ClickHouse/ClickHouse/pull/61769) ([Kirill](https://github.com/kirillgarbar)). -* Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch [#61867](https://github.com/ClickHouse/ClickHouse/pull/61867) ([Azat Khuzhin](https://github.com/azat)). -* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix writing exception message in output format in HTTP when http_wait_end_of_query is used [#61951](https://github.com/ClickHouse/ClickHouse/pull/61951) ([Kruglov Pavel](https://github.com/Avogar)).
-* Proper fix for LowCardinality together with JSONExtact functions [#61957](https://github.com/ClickHouse/ClickHouse/pull/61957) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). -* Crash in Engine Merge if Row Policy does not have expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). -* Fix WriteBufferAzureBlobStorage destructor uncaught exception [#61988](https://github.com/ClickHouse/ClickHouse/pull/61988) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fix CREATE TABLE w/o columns definition for ReplicatedMergeTree [#62040](https://github.com/ClickHouse/ClickHouse/pull/62040) ([Azat Khuzhin](https://github.com/azat)). -* Fix optimize_skip_unused_shards_rewrite_in for composite sharding key [#62047](https://github.com/ClickHouse/ClickHouse/pull/62047) ([Azat Khuzhin](https://github.com/azat)). -* ReadWriteBufferFromHTTP set right header host when redirected [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)). -* Fix external table cannot parse data type Bool [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)). -* Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value" [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)). -* Add test for [#35215](https://github.com/ClickHouse/ClickHouse/issues/35215) [#62180](https://github.com/ClickHouse/ClickHouse/pull/62180) ([Raúl Marín](https://github.com/Algunenano)). -* Analyzer: Fix query parameter resolution [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)). -* Fix restoring parts while readonly [#62207](https://github.com/ClickHouse/ClickHouse/pull/62207) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix crash in index definition containing sql udf [#62225](https://github.com/ClickHouse/ClickHouse/pull/62225) ([vdimir](https://github.com/vdimir)). -* Fixing NULL random seed for generateRandom with analyzer. [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Correctly handle const columns in DistinctTransfom [#62250](https://github.com/ClickHouse/ClickHouse/pull/62250) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix PartsSplitter [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)). -* Analyzer: Fix alias to parametrized view resolution [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)). -* Analyzer: Fix name resolution from parent scopes [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)). -* Fix argMax with nullable non native numeric column [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)). -* Fix BACKUP and RESTORE of a materialized view in Ordinary database [#62295](https://github.com/ClickHouse/ClickHouse/pull/62295) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). 
-* Fix primary key in materialized view [#62319](https://github.com/ClickHouse/ClickHouse/pull/62319) ([Murat Khairulin](https://github.com/mxwell)). -* Do not build multithread insert pipeline for tables without support [#62333](https://github.com/ClickHouse/ClickHouse/pull/62333) ([vdimir](https://github.com/vdimir)). -* Fix analyzer with positional arguments in distributed query [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)). -* Fix filter pushdown from additional_table_filters in Merge engine in analyzer [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix GLOBAL IN table queries with analyzer. [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write [#62425](https://github.com/ClickHouse/ClickHouse/pull/62425) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix backup restore path for AzureBlobStorage [#62447](https://github.com/ClickHouse/ClickHouse/pull/62447) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fix SimpleSquashingChunksTransform [#62451](https://github.com/ClickHouse/ClickHouse/pull/62451) ([Nikita Taranov](https://github.com/nickitat)). -* Fix capture of nested lambda. [#62462](https://github.com/ClickHouse/ClickHouse/pull/62462) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix validation of special MergeTree columns [#62498](https://github.com/ClickHouse/ClickHouse/pull/62498) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Avoid crash when reading protobuf with recursive types [#62506](https://github.com/ClickHouse/ClickHouse/pull/62506) ([Raúl Marín](https://github.com/Algunenano)). -* Fix a bug moving one partition from one to itself [#62524](https://github.com/ClickHouse/ClickHouse/pull/62524) ([helifu](https://github.com/helifu)). -* Fix scalar subquery in LIMIT [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix crash in largestTriangleThreeBuckets [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) ([Raúl Marín](https://github.com/Algunenano)). -* Fix tumble[Start,End] and hop[Start,End] for bigger resolutions [#62705](https://github.com/ClickHouse/ClickHouse/pull/62705) ([Jordi Villar](https://github.com/jrdi)). -* Fix argMin/argMax combinator state [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)). -* Fix temporary data in cache failing because of cache lock contention optimization [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix crash in function `mergeTreeIndex` [#62762](https://github.com/ClickHouse/ClickHouse/pull/62762) ([Anton Popov](https://github.com/CurtizJ)). -* fix: update: nested materialized columns: size check fixes [#62773](https://github.com/ClickHouse/ClickHouse/pull/62773) ([Eliot Hautefeuille](https://github.com/hileef)). 
-* Fix FINAL modifier is not respected in CTE with analyzer [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)). -* Fix crash in function `formatRow` with `JSON` format and HTTP interface [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)). -* Azure: fix building final url from endpoint object [#62850](https://github.com/ClickHouse/ClickHouse/pull/62850) ([Daniel Pozo Escalona](https://github.com/danipozo)). -* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). -* Fix LowCardinality(Nullable) key in hyperrectangle [#62866](https://github.com/ClickHouse/ClickHouse/pull/62866) ([Amos Bird](https://github.com/amosbird)). -* Fix fromUnixtimestamp in joda syntax while the input value beyond UInt32 [#62901](https://github.com/ClickHouse/ClickHouse/pull/62901) ([KevinyhZou](https://github.com/KevinyhZou)). -* Disable optimize_rewrite_aggregate_function_with_if for sum(nullable) [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)). -* Fix PREWHERE for StorageBuffer with different source table column types. [#62916](https://github.com/ClickHouse/ClickHouse/pull/62916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). -* gRPC: fix crash on IPv6 peer connection [#62978](https://github.com/ClickHouse/ClickHouse/pull/62978) ([Konstantin Bogdanov](https://github.com/thevar1able)). -* Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches [#62987](https://github.com/ClickHouse/ClickHouse/pull/62987) ([Azat Khuzhin](https://github.com/azat)). -* Fix terminate with uncaught exception in temporary data in cache [#62998](https://github.com/ClickHouse/ClickHouse/pull/62998) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix optimize_rewrite_aggregate_function_with_if implicit cast [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)). -* Fix unhandled exception in ~RestorerFromBackup [#63040](https://github.com/ClickHouse/ClickHouse/pull/63040) ([Vitaly Baranov](https://github.com/vitlibar)). -* Do not remove server constants from GROUP BY key for secondary query. [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix incorrect judgement of of monotonicity of function abs [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). -* Make sanity check of settings worse [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)). -* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). -* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). -* Format SQL security option only in `CREATE VIEW` queries. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). +* Fix parser error when using COUNT(*) with FILTER clause. 
[#61357](https://github.com/ClickHouse/ClickHouse/pull/61357) ([Duc Canh Le](https://github.com/canhld94)). +* Fix logical error 'Unexpected return type from materialize. Expected Nullable. Got UInt8' while using group_by_use_nulls with analyzer and materialize/constant in grouping set. Closes [#61531](https://github.com/ClickHouse/ClickHouse/issues/61531). [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* TBD. [#61720](https://github.com/ClickHouse/ClickHouse/pull/61720) ([Kruglov Pavel](https://github.com/Avogar)). +* Search for MergeTree to ReplicatedMergeTree conversion flag at the correct location for tables with custom storage policy. [#61769](https://github.com/ClickHouse/ClickHouse/pull/61769) ([Kirill](https://github.com/kirillgarbar)). +* Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch that leads to crashes. [#61867](https://github.com/ClickHouse/ClickHouse/pull/61867) ([Azat Khuzhin](https://github.com/azat)). +* Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix writing exception message in output format in HTTP when http_wait_end_of_query is used. Closes [#55101](https://github.com/ClickHouse/ClickHouse/issues/55101). [#61951](https://github.com/ClickHouse/ClickHouse/pull/61951) ([Kruglov Pavel](https://github.com/Avogar)). +* This reverts https://github.com/ClickHouse/ClickHouse/pull/61617 and fixes the problem with usage of LowCardinality columns together with the JSONExtract function. Previously the user might receive either an incorrect result or a logical error. [#61957](https://github.com/ClickHouse/ClickHouse/pull/61957) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Fix crash in the Merge engine if a row policy does not have an expression. [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). +* Implemented preFinalize, updated finalizeImpl & destructor of WriteBufferAzureBlobStorage to avoid having an uncaught exception in the destructor. [#61988](https://github.com/ClickHouse/ClickHouse/pull/61988) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Fix CREATE TABLE w/o columns definition for ReplicatedMergeTree (columns will be obtained from replica). [#62040](https://github.com/ClickHouse/ClickHouse/pull/62040) ([Azat Khuzhin](https://github.com/azat)). +* Fix optimize_skip_unused_shards_rewrite_in for composite sharding key (could lead to `NOT_FOUND_COLUMN_IN_BLOCK` and `TYPE_MISMATCH`). [#62047](https://github.com/ClickHouse/ClickHouse/pull/62047) ([Azat Khuzhin](https://github.com/azat)). +* ReadWriteBufferFromHTTP now sets the correct Host header when redirected. [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)). +* Fix external table cannot parse data type Bool. [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)). +* Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value".
The feature is broken and can't be disabled individually. [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)). +* Fix override of MergeTree virtual columns. [#62180](https://github.com/ClickHouse/ClickHouse/pull/62180) ([Raúl Marín](https://github.com/Algunenano)). +* Fix query parameter resolution with `allow_experimental_analyzer` enabled. Closes [#62113](https://github.com/ClickHouse/ClickHouse/issues/62113). [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)). +* Make `RESTORE ON CLUSTER` wait for each `ReplicatedMergeTree` table to stop being readonly before attaching any restored parts to it. Earlier it didn't wait, and it could try to attach some parts at nearly the same time as checking other replicas during the table's startup. In rare cases, some parts could end up not attached at all during `RESTORE ON CLUSTER` because of that issue. [#62207](https://github.com/ClickHouse/ClickHouse/pull/62207) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix crash on `CREATE TABLE` with `INDEX` containing SQL UDF in expression, closes [#62134](https://github.com/ClickHouse/ClickHouse/issues/62134). [#62225](https://github.com/ClickHouse/ClickHouse/pull/62225) ([vdimir](https://github.com/vdimir)). +* Fix `generateRandom` with `NULL` in the seed argument. Fixes [#62092](https://github.com/ClickHouse/ClickHouse/issues/62092). [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix buffer overflow when `DISTINCT` is used with constant values. [#62250](https://github.com/ClickHouse/ClickHouse/pull/62250) ([Antonio Andelic](https://github.com/antonio2368)). +* When some index columns are not loaded into memory for some parts of a *MergeTree table, queries with `FINAL` might produce wrong results. Now we explicitly choose only the common prefix of index columns for all parts to avoid this issue. [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)). +* Fix inability to address parametrized view in SELECT queries via aliases. [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)). +* Fix name resolution in case when identifier is resolved to an executed scalar subquery. [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)). +* Fix argMax with nullable non-native numeric column. [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)). +* Fix BACKUP and RESTORE of a materialized view in Ordinary database. [#62295](https://github.com/ClickHouse/ClickHouse/pull/62295) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix data race on scalars in Context. [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix displaying of materialized_view primary_key in system.tables. Previously it was shown empty even when a CREATE query included PRIMARY KEY. [#62319](https://github.com/ClickHouse/ClickHouse/pull/62319) ([Murat Khairulin](https://github.com/mxwell)). +* Do not build multithread insert pipeline for engines without `max_insert_threads` support. Fix inserted rows order in queries like `INSERT INTO FUNCTION file/s3(...) SELECT * FROM ORDER BY col`. [#62333](https://github.com/ClickHouse/ClickHouse/pull/62333) ([vdimir](https://github.com/vdimir)).
+* Resolve positional arguments only on the initiator node. Closes [#62289](https://github.com/ClickHouse/ClickHouse/issues/62289). [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)). +* Fix filter pushdown from additional_table_filters in Merge engine in analyzer. Closes [#62229](https://github.com/ClickHouse/ClickHouse/issues/62229). [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix `Unknown expression or table expression identifier` error for `GLOBAL IN table` queries (with new analyzer). Fixes [#62286](https://github.com/ClickHouse/ClickHouse/issues/62286). [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write. Closes [#61492](https://github.com/ClickHouse/ClickHouse/issues/61492). [#62425](https://github.com/ClickHouse/ClickHouse/pull/62425) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix backup restore path for AzureBlobStorage to include specified blob path. [#62447](https://github.com/ClickHouse/ClickHouse/pull/62447) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Fixed rare bug in `SimpleSquashingChunksTransform` that may lead to a loss of the last chunk of data in a stream. [#62451](https://github.com/ClickHouse/ClickHouse/pull/62451) ([Nikita Taranov](https://github.com/nickitat)). +* Fix excessive memory usage for queries with nested lambdas. Fixes [#62036](https://github.com/ClickHouse/ClickHouse/issues/62036). [#62462](https://github.com/ClickHouse/ClickHouse/pull/62462) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix validation of special columns (`ver`, `is_deleted`, `sign`) in MergeTree engines on table creation and alter queries. Fixes [#62463](https://github.com/ClickHouse/ClickHouse/issues/62463). [#62498](https://github.com/ClickHouse/ClickHouse/pull/62498) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Avoid crash when reading protobuf with recursive types. [#62506](https://github.com/ClickHouse/ClickHouse/pull/62506) ([Raúl Marín](https://github.com/Algunenano)). +* Fix [62459](https://github.com/ClickHouse/ClickHouse/issues/62459). [#62524](https://github.com/ClickHouse/ClickHouse/pull/62524) ([helifu](https://github.com/helifu)). +* Fix an error `LIMIT expression must be constant` in queries with constant expression in `LIMIT`/`OFFSET` which contains scalar subquery. Fixes [#62294](https://github.com/ClickHouse/ClickHouse/issues/62294). [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix segmentation fault when using Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). +* Fix memory leak in groupArraySorted. Fix [#62536](https://github.com/ClickHouse/ClickHouse/issues/62536). [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix crash in largestTriangleThreeBuckets. [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) ([Raúl Marín](https://github.com/Algunenano)). +* Fix `tumble[Start,End]` and `hop[Start,End]` functions for resolutions bigger than a day. 
[#62705](https://github.com/ClickHouse/ClickHouse/pull/62705) ([Jordi Villar](https://github.com/jrdi)). +* Fix argMin/argMax combinator state. [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)). +* Fix temporary data in cache failing because of a small value of the setting `filesystem_cache_reserve_space_wait_lock_timeout_milliseconds`. Introduced a separate setting `temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds`. [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fixed crash in table function `mergeTreeIndex` after offloading some of the columns from the suffix of the primary key. [#62762](https://github.com/ClickHouse/ClickHouse/pull/62762) ([Anton Popov](https://github.com/CurtizJ)). +* Fix size checks when updating materialized nested columns (fixes [#62731](https://github.com/ClickHouse/ClickHouse/issues/62731)). [#62773](https://github.com/ClickHouse/ClickHouse/pull/62773) ([Eliot Hautefeuille](https://github.com/hileef)). +* Fix an error when `FINAL` is not applied when specified in a CTE (new analyzer). Fixes [#62779](https://github.com/ClickHouse/ClickHouse/issues/62779). [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)). +* Fixed crash in function `formatRow` with the `JSON` format in queries executed via the HTTP interface. [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)). +* Fix failure to start when the storage account URL has a trailing slash. [#62850](https://github.com/ClickHouse/ClickHouse/pull/62850) ([Daniel Pozo Escalona](https://github.com/danipozo)). +* Fixed a bug in the GCD codec implementation that may lead to server crashes. [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). +* Fix incorrect key analysis when LowCardinality(Nullable) keys appear in the middle of a hyperrectangle. This fixes [#62848](https://github.com/ClickHouse/ClickHouse/issues/62848). [#62866](https://github.com/ClickHouse/ClickHouse/pull/62866) ([Amos Bird](https://github.com/amosbird)). +* Fix wrong results from `fromUnixTimestampInJodaSyntax` when converting an `Int64` or `UInt64` value to `DateTime`: the input value may exceed the maximum value of the `UInt32` type, and since the function first converted the input to `UInt32`, this led to a wrong result. For example, for a table `test_tbl(a Int64, b UInt64)` containing the row (`10262736196`, `10262736196`), converting these values with `fromUnixTimestampInJodaSyntax` produced an incorrect result. [#62901](https://github.com/ClickHouse/ClickHouse/pull/62901) ([KevinyhZou](https://github.com/KevinyhZou)). +* Disable optimize_rewrite_aggregate_function_with_if for sum(nullable). [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)). +* Fix the `Unexpected return type` error for queries that read from `StorageBuffer` with `PREWHERE` when the source table has different types. Fixes [#62545](https://github.com/ClickHouse/ClickHouse/issues/62545). [#62916](https://github.com/ClickHouse/ClickHouse/pull/62916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix incorrect behaviour of temporary data in cache when creation of the cache key base directory fails with `no space left on device`. 
[#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fixed server crash on IPv6 gRPC client connection. [#62978](https://github.com/ClickHouse/ClickHouse/pull/62978) ([Konstantin Bogdanov](https://github.com/thevar1able)). +* Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches. [#62987](https://github.com/ClickHouse/ClickHouse/pull/62987) ([Azat Khuzhin](https://github.com/azat)). +* Fix terminate with uncaught exception in temporary data in cache. [#62998](https://github.com/ClickHouse/ClickHouse/pull/62998) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix optimize_rewrite_aggregate_function_with_if implicit cast. [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)). +* Fix possible crash after unsuccessful RESTORE. This PR fixes [#62985](https://github.com/ClickHouse/ClickHouse/issues/62985). [#63040](https://github.com/ClickHouse/ClickHouse/pull/63040) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix `Not found column in block` error for distributed queries with server-side constants in the `GROUP BY` key. Fixes [#62682](https://github.com/ClickHouse/ClickHouse/issues/62682). [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix incorrect judgement of monotonicity of the function `abs`. [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). +* Sanity check: Clamp values instead of throwing. [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)). +* Setting server_name might help with the recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). +* The wire protocol version check for MongoDB used to try accessing the "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by the user. [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). +* Fix a bug when the `SQL SECURITY` statement appears in all `CREATE` queries if the server setting `ignore_empty_sql_security_in_create_view_query=true` https://github.com/ClickHouse/ClickHouse/pull/63134. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.5.1.1763-stable.md b/docs/changelogs/v24.5.1.1763-stable.md new file mode 100644 index 00000000000..384e0395c4d --- /dev/null +++ b/docs/changelogs/v24.5.1.1763-stable.md @@ -0,0 +1,366 @@ +--- +sidebar_position: 1 +sidebar_label: 2024 +--- + +# 2024 Changelog + +### ClickHouse release v24.5.1.1763-stable (647c154a94d) FIXME as compared to v24.4.1.2088-stable (6d4b31322d1) + +#### Backward Incompatible Change +* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before upgrade and re-create them after upgrade. 
[#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)). +* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them back, set `allow_deprecated_functions=1`. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)). +* Queries from `system.columns` will work faster if there is a large number of columns, but many databases or tables are not granted for `SHOW TABLES`. Note that in previous versions, if you grant `SHOW COLUMNS` to individual columns without granting `SHOW TABLES` to the corresponding tables, the `system.columns` table will show these columns, but in the new version, it will skip the table entirely. Remove trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)). + +#### New Feature +* Provide support for the AzureBlobStorage function in ClickHouse server to use Azure Workload Identity to authenticate against Azure Blob Storage. If the `use_workload_identity` parameter is set in the config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)). +* Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on RocksDB's built-in memtable. This helps to increase importing speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce `StorageEmbeddedRocksDB` table settings. [#59163](https://github.com/ClickHouse/ClickHouse/pull/59163) ([Duc Canh Le](https://github.com/canhld94)). +* Users can now parse CRLF line endings in the TSV format using the setting `input_format_tsv_crlf_end_of_line`. Closes [#56257](https://github.com/ClickHouse/ClickHouse/issues/56257). [#59747](https://github.com/ClickHouse/ClickHouse/pull/59747) ([Shaun Struwig](https://github.com/Blargian)). +* Adds the Form format to read/write a single record in the application/x-www-form-urlencoded format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)). +* Added the possibility to compress data in CROSS JOIN. [#60459](https://github.com/ClickHouse/ClickHouse/pull/60459) ([p1rattttt](https://github.com/p1rattttt)). +* New setting `input_format_force_null_for_omitted_fields` that forces NULL values for omitted fields. [#60887](https://github.com/ClickHouse/ClickHouse/pull/60887) ([Constantine Peresypkin](https://github.com/pkit)). +* Support JOIN with inequality conditions which involve columns from both the left and right table, e.g. `t1.y < t2.y`. To enable, `SET allow_experimental_join_condition = 1`. [#60920](https://github.com/ClickHouse/ClickHouse/pull/60920) ([lgbo](https://github.com/lgbo-ustc)). +* Previously the S3 storage and the s3 table function didn't support selecting from archive files. Now it is possible to iterate over files inside archives in S3. [#62259](https://github.com/ClickHouse/ClickHouse/pull/62259) ([Daniil Ivanik](https://github.com/divanik)). +* Support for the conditional function `clamp`. 
[#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)). +* Add npy output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)). +* Added SQL functions `generateUUIDv7`, `generateUUIDv7ThreadMonotonic`, `generateUUIDv7NonMonotonic` (with different monotonicity/performance trade-offs) to generate version 7 UUIDs, i.e. timestamp-based UUIDs with a random component. Also added a new function `UUIDToNum` to extract bytes from a UUID and a new function `UUIDv7ToDateTime` to extract the timestamp component from a version 7 UUID. [#62852](https://github.com/ClickHouse/ClickHouse/pull/62852) ([Alexey Petrunyaka](https://github.com/pet74alex)). +* Backported in [#64307](https://github.com/ClickHouse/ClickHouse/issues/64307): Implement the Dynamic data type that allows storing values of any type inside it without knowing all of them in advance. The Dynamic type is available under the setting `allow_experimental_dynamic_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#63058](https://github.com/ClickHouse/ClickHouse/pull/63058) ([Kruglov Pavel](https://github.com/Avogar)). +* Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on RocksDB's built-in memtable. This helps to increase importing speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce StorageEmbeddedRocksDB table settings. [#63324](https://github.com/ClickHouse/ClickHouse/pull/63324) ([Duc Canh Le](https://github.com/canhld94)). +* Added `Raw` as a synonym for `TSVRaw`. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)). +* Added the possibility to perform a cross join in a temporary file if its size exceeds limits. [#63432](https://github.com/ClickHouse/ClickHouse/pull/63432) ([p1rattttt](https://github.com/p1rattttt)). +* On Linux and MacOS, if the program has STDOUT redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)). +* Change the warning on a high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). + +#### Performance Improvement +* Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)). +* Process string functions XXXUTF8 'asciily' if input strings are all ASCII chars. Inspired by https://github.com/apache/doris/pull/29799. Overall speedup by 1.07x~1.62x. Note that peak memory usage has decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)). +* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)). +* HostResolver kept each IP address several times. If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of the forbidden IPs was marked as failed, and on each try these IPs had a chance to be chosen (and to fail again). 
Even with this fixed, the DNS cache is dropped every 120 seconds, and the IPs can be chosen again. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)). +* Add a new configuration `prefer_merge_sort_block_bytes` to control memory usage and speed up sorting by 2 times when merging, when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)). +* `clickhouse-local` will start faster. In previous versions, it mistakenly did not delete temporary directories. Now it does. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)). +* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Raúl Marín](https://github.com/Algunenano)). +* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)). + +#### Improvement
* Maps can now have `Float32`, `Float64`, `Array(T)`, `Map(K,V)` and `Tuple(T1, T2, ...)` as keys (see the example below). Closes [#54537](https://github.com/ClickHouse/ClickHouse/issues/54537). [#59318](https://github.com/ClickHouse/ClickHouse/pull/59318) ([李扬](https://github.com/taiyang-li)). +* Multiline strings with border preservation and column width change. [#59940](https://github.com/ClickHouse/ClickHouse/pull/59940) ([Volodyachan](https://github.com/Volodyachan)). +* Make RabbitMQ nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes [#60460](https://github.com/ClickHouse/ClickHouse/issues/60460). [#60468](https://github.com/ClickHouse/ClickHouse/pull/60468) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Distinct error messages for the S3 'no key' error for the disk and storage cases. [#61108](https://github.com/ClickHouse/ClickHouse/pull/61108) ([Sema Checherinda](https://github.com/CheSema)). +* Less contention in the filesystem cache (part 4). Allow keeping the filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This allows releasing pressure from space reservation for queries (on the `tryReserve` method). Also, this is done in a lock-free way as much as possible, e.g. it should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)). 
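A minimal, hedged sketch of the new Map key types from the `Maps can now have ...` entry above; the table name, column names, and values are made up for illustration.

```sql
-- Hypothetical table using the newly allowed Map key types.
CREATE TABLE map_key_demo
(
    by_float Map(Float64, String),
    by_tuple Map(Tuple(UInt8, UInt8), UInt64)
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO map_key_demo VALUES (map(3.14, 'pi'), map((1, 2), 42));

-- Element access works with the corresponding key type.
SELECT by_float[3.14], by_tuple[(1, 2)] FROM map_key_demo;
```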
+* The progress bar will work for trivial queries with LIMIT from `system.zeros`, `system.zeros_mt` (it already works for `system.numbers` and `system.numbers_mt`), and the `generateRandom` table function. As a bonus, if the total number of records is greater than the `max_rows_to_read` limit, it will throw an exception earlier. This closes [#58183](https://github.com/ClickHouse/ClickHouse/issues/58183). [#61823](https://github.com/ClickHouse/ClickHouse/pull/61823) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* YAML Merge Key support. [#62685](https://github.com/ClickHouse/ClickHouse/pull/62685) ([Azat Khuzhin](https://github.com/azat)). +* Enhance the error message when a non-deterministic function is used with a Replicated source. [#62896](https://github.com/ClickHouse/ClickHouse/pull/62896) ([Grégoire Pineau](https://github.com/lyrixx)). +* Fix interserver secret for Distributed over Distributed from `remote`. [#63013](https://github.com/ClickHouse/ClickHouse/pull/63013) ([Azat Khuzhin](https://github.com/azat)). +* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Support configuration substitutions from YAML files. [#63106](https://github.com/ClickHouse/ClickHouse/pull/63106) ([Eduard Karacharov](https://github.com/korowa)). +* Add TTL information to the system.parts_columns table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)). +* Keep previous data in the terminal after picking from skim suggestions. [#63261](https://github.com/ClickHouse/ClickHouse/pull/63261) ([FlameFactory](https://github.com/FlameFactory)). +* The width of fields is now correctly calculated, ignoring ANSI escape sequences. [#63270](https://github.com/ClickHouse/ClickHouse/pull/63270) ([Shaun Struwig](https://github.com/Blargian)). +* Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. [#63365](https://github.com/ClickHouse/ClickHouse/pull/63365) ([Julia Kartseva](https://github.com/jkartseva)). +* Support English-style Unicode quotes, e.g. “Hello”, ‘world’. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes [#58634](https://github.com/ClickHouse/ClickHouse/issues/58634). [#63381](https://github.com/ClickHouse/ClickHouse/pull/63381) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Allowed creating a MaterializedMySQL database without a connection to MySQL. [#63397](https://github.com/ClickHouse/ClickHouse/pull/63397) ([Kirill](https://github.com/kirillgarbar)). +* Remove copying of data when writing to the filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Replace the usage of the error code `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` with more accurate error codes when appropriate. [#63406](https://github.com/ClickHouse/ClickHouse/pull/63406) ([Yohann Jardin](https://github.com/yohannj)). +* `os_user` and `client_hostname` are now correctly set up for queries for command line suggestions in clickhouse-client. This closes [#63430](https://github.com/ClickHouse/ClickHouse/issues/63430). 
[#63433](https://github.com/ClickHouse/ClickHouse/pull/63433) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fixed tabulation in line numbering, corrected handling of the length when moving a line if the value has a tab, added tests. [#63493](https://github.com/ClickHouse/ClickHouse/pull/63493) ([Volodyachan](https://github.com/Volodyachan)). +* Add the `aggregate_function_group_array_has_limit_size` setting to support discarding data in some scenarios. [#63516](https://github.com/ClickHouse/ClickHouse/pull/63516) ([zhongyuankai](https://github.com/zhongyuankai)). +* Automatically mark a replica of a Replicated database as lost and start recovery if some DDL task fails more than `max_retries_before_automatic_recovery` (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. [#63549](https://github.com/ClickHouse/ClickHouse/pull/63549) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Automatically correct `max_block_size=0` to the default value. [#63587](https://github.com/ClickHouse/ClickHouse/pull/63587) ([Antonio Andelic](https://github.com/antonio2368)). +* Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_traked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address [#52086](https://github.com/ClickHouse/ClickHouse/issues/52086). [#63656](https://github.com/ClickHouse/ClickHouse/pull/63656) ([Zimu Li](https://github.com/woodlzm)). +* Enable the truncate operation for object storage disks. [#63693](https://github.com/ClickHouse/ClickHouse/pull/63693) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* The loading of the keywords list is now dependent on the server revision and will be disabled for old versions of ClickHouse server. CC @azat. [#63786](https://github.com/ClickHouse/ClickHouse/pull/63786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Allow trailing commas in the columns list in the INSERT query. For example, `INSERT INTO test (a, b, c, ) VALUES ...`. [#63803](https://github.com/ClickHouse/ClickHouse/pull/63803) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Better exception messages for the `Regexp` format. [#63804](https://github.com/ClickHouse/ClickHouse/pull/63804) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Allow trailing commas in the `Values` format. For example, this query is allowed: `INSERT INTO test (a, b, c) VALUES (4, 5, 6,);` (see the combined example below). [#63810](https://github.com/ClickHouse/ClickHouse/pull/63810) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* ClickHouse disks now read the server setting to obtain the actual metadata format version. [#63831](https://github.com/ClickHouse/ClickHouse/pull/63831) ([Sema Checherinda](https://github.com/CheSema)). +* Disable pretty format restrictions (`output_format_pretty_max_rows`/`output_format_pretty_max_value_width`) when stdout is not a TTY. [#63942](https://github.com/ClickHouse/ClickHouse/pull/63942) ([Azat Khuzhin](https://github.com/azat)). +* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)). 
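To make the two trailing-comma entries above concrete, here is a combined sketch; it assumes the same `test` table with columns `a`, `b`, `c` that those entries use in their inline examples.

```sql
-- A trailing comma is now allowed in the column list of an INSERT ...
INSERT INTO test (a, b, c, ) VALUES (1, 2, 3);

-- ... and inside the Values format itself.
INSERT INTO test (a, b, c) VALUES (4, 5, 6,);
```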
+* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)). +* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Now backups with Azure Blob Storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)). +* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Allow using native copy for Azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)). +* Finally enable native copy for Azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)). +* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)). + +#### Build/Testing/Packaging Improvement +* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Re-enable the broken s390x build in CI. [#63135](https://github.com/ClickHouse/ClickHouse/pull/63135) ([Harry Lee](https://github.com/HarryLeeIBM)). +* The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes [#63494](https://github.com/ClickHouse/ClickHouse/issues/63494). [#63495](https://github.com/ClickHouse/ClickHouse/pull/63495) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Experimentally support loongarch64 as a new platform for ClickHouse. [#63733](https://github.com/ClickHouse/ClickHouse/pull/63733) ([qiangxuhui](https://github.com/qiangxuhui)). +* Update the Apache Datasketches library. It resolves [#63858](https://github.com/ClickHouse/ClickHouse/issues/63858). [#63923](https://github.com/ClickHouse/ClickHouse/pull/63923) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Enable GRPC support for aarch64 Linux while cross-compiling the binary. [#64072](https://github.com/ClickHouse/ClickHouse/pull/64072) ([alesapin](https://github.com/alesapin)). + +#### Bug Fix (user-visible misbehavior in an official stable release) + +* Fix making a backup when multiple shards are used. This PR fixes [#56566](https://github.com/ClickHouse/ClickHouse/issues/56566). [#57684](https://github.com/ClickHouse/ClickHouse/pull/57684) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix passing projections/indexes from the CREATE query into the inner table of a MV. [#59183](https://github.com/ClickHouse/ClickHouse/pull/59183) ([Azat Khuzhin](https://github.com/azat)). +* Fix boundRatio incorrect merge. 
[#60532](https://github.com/ClickHouse/ClickHouse/pull/60532) ([Tao Wang](https://github.com/wangtZJU)). +* Fix crash when using some functions with low-cardinality columns. [#61966](https://github.com/ClickHouse/ClickHouse/pull/61966) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix queries with FINAL giving a wrong result when the table does not use adaptive granularity. [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)). +* Improve the detection of the cgroups v2 memory controller in unusual locations. This fixes a warning that the cgroup memory observer was disabled because no cgroups v1 or v2 current memory file could be found. [#62903](https://github.com/ClickHouse/ClickHouse/pull/62903) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix subsequent use of external tables in the client. [#62964](https://github.com/ClickHouse/ClickHouse/pull/62964) ([Azat Khuzhin](https://github.com/azat)). +* Fix crash with untuple and an unresolved lambda. [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)). +* Fix a bug which could lead to the server accepting connections before it is actually loaded. [#63181](https://github.com/ClickHouse/ClickHouse/pull/63181) ([alesapin](https://github.com/alesapin)). +* Fix intersecting parts when restarting after a drop range. [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)). +* Fix a misbehavior when SQL security defaults don't load for old tables during server startup. [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)). +* Fix JOIN filter push down for filled JOIN. Closes [#63228](https://github.com/ClickHouse/ClickHouse/issues/63228). [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)). +* Fix infinite loop while listing objects in Azure Blob Storage. [#63257](https://github.com/ClickHouse/ClickHouse/pull/63257) ([Julia Kartseva](https://github.com/jkartseva)). +* CROSS JOIN can be executed with any value of the `join_algorithm` setting, close [#62431](https://github.com/ClickHouse/ClickHouse/issues/62431). [#63273](https://github.com/ClickHouse/ClickHouse/pull/63273) ([vdimir](https://github.com/vdimir)). +* Fixed a potential crash caused by a `no space left` error when temporary data in the cache is used. [#63346](https://github.com/ClickHouse/ClickHouse/pull/63346) ([vdimir](https://github.com/vdimir)). +* Fix a bug which could potentially lead to a rare LOGICAL_ERROR during a SELECT query with the message: `Unexpected return type from materialize. Expected type_XXX. Got type_YYY.` Introduced in [#59379](https://github.com/ClickHouse/ClickHouse/issues/59379). [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)). +* Fix the `X-ClickHouse-Timezone` header returning a wrong timezone when using `session_timezone` as a query-level setting. [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)). +* Fix debug assert when using grouping WITH ROLLUP and LowCardinality types. [#63398](https://github.com/ClickHouse/ClickHouse/pull/63398) ([Raúl Marín](https://github.com/Algunenano)). +* Fix logical errors in queries with `GROUPING SETS` and `WHERE` and `group_by_use_nulls = true`, close [#60538](https://github.com/ClickHouse/ClickHouse/issues/60538). 
[#63405](https://github.com/ClickHouse/ClickHouse/pull/63405) ([vdimir](https://github.com/vdimir)). +* Fix backup of a projection part in case the projection was removed from the table metadata but the part still has the projection. [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix 'Every derived table must have its own alias' error for the MYSQL dictionary source, close [#63341](https://github.com/ClickHouse/ClickHouse/issues/63341). [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)). +* Insert QueryFinish on AsyncInsertFlush with no data. [#63483](https://github.com/ClickHouse/ClickHouse/pull/63483) ([Raúl Marín](https://github.com/Algunenano)). +* Fix `system.query_log.used_dictionaries` logging. [#63487](https://github.com/ClickHouse/ClickHouse/pull/63487) ([Eduard Karacharov](https://github.com/korowa)). +* Avoid segfault in `MergeTreePrefetchedReadPool` while fetching projection parts. [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix a RabbitMQ heap-use-after-free found by clang-18, which can happen if an error is thrown from RabbitMQ during initialization of exchange and queues. [#63515](https://github.com/ClickHouse/ClickHouse/pull/63515) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix crash on exit with sentry enabled (due to openssl being destroyed before sentry). [#63548](https://github.com/ClickHouse/ClickHouse/pull/63548) ([Azat Khuzhin](https://github.com/azat)). +* Fix support for Array and Map with Keyed hashing functions and materialized keys. [#63628](https://github.com/ClickHouse/ClickHouse/pull/63628) ([Salvatore Mesoraca](https://github.com/aiven-sal)). +* Fixed Parquet filter pushdown not working with the Analyzer. [#63642](https://github.com/ClickHouse/ClickHouse/pull/63642) ([Michael Kolupaev](https://github.com/al13n321)). +* It is now forbidden to convert MergeTree to replicated if the ZooKeeper path for this table already exists. [#63670](https://github.com/ClickHouse/ClickHouse/pull/63670) ([Kirill](https://github.com/kirillgarbar)). +* Read only the necessary columns from a VIEW (new analyzer). Closes [#62594](https://github.com/ClickHouse/ClickHouse/issues/62594). [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)). +* Fix a rare case with missing data in the result of a distributed query. [#63691](https://github.com/ClickHouse/ClickHouse/pull/63691) ([vdimir](https://github.com/vdimir)). +* Fix [#63539](https://github.com/ClickHouse/ClickHouse/issues/63539). Forbid WINDOW redefinition in the new analyzer. [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)). +* Fix `flatten_nested` being broken with Replicated database. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix `SIZES_OF_COLUMNS_DOESNT_MATCH` error for queries with the `arrayJoin` function in `WHERE`. Fixes [#63653](https://github.com/ClickHouse/ClickHouse/issues/63653). [#63722](https://github.com/ClickHouse/ClickHouse/pull/63722) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix `Not found column` and `CAST AS Map from array requires nested tuple of 2 elements` exceptions for distributed queries which use the `Map(Nothing, Nothing)` type. Fixes [#63637](https://github.com/ClickHouse/ClickHouse/issues/63637). 
[#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix possible `ILLEGAL_COLUMN` error in `partial_merge` join, close [#37928](https://github.com/ClickHouse/ClickHouse/issues/37928). [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)). +* Fix `query_plan_remove_redundant_distinct` breaking queries with WINDOW FUNCTIONS (when `allow_experimental_analyzer` is on). Fixes [#62820](https://github.com/ClickHouse/ClickHouse/issues/62820). [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)). +* Fix possible crash with SYSTEM UNLOAD PRIMARY KEY. [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)). +* Fix a query with a duplicating cycling alias. Fixes [#63320](https://github.com/ClickHouse/ClickHouse/issues/63320). [#63791](https://github.com/ClickHouse/ClickHouse/pull/63791) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fixed a performance degradation of parsing data formats in the INSERT query. This closes [#62918](https://github.com/ClickHouse/ClickHouse/issues/62918). This partially reverts [#42284](https://github.com/ClickHouse/ClickHouse/issues/42284), which broke the original design and introduced more problems. [#63801](https://github.com/ClickHouse/ClickHouse/pull/63801) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add the 'endpoint_subpath' S3 URI setting to allow plain_rewritable disks to share the same endpoint. [#63806](https://github.com/ClickHouse/ClickHouse/pull/63806) ([Julia Kartseva](https://github.com/jkartseva)). +* Fix queries using a parallel read buffer (e.g. with max_download_thread > 0) getting stuck when threads cannot be allocated. [#63814](https://github.com/ClickHouse/ClickHouse/pull/63814) ([Antonio Andelic](https://github.com/antonio2368)). +* Allow JOIN filter push down to both streams if only a single equivalent column is used in the query. Closes [#63799](https://github.com/ClickHouse/ClickHouse/issues/63799). [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)). +* Remove the data from all disks after DROP with the Lazy database engines. Without these changes, orphaned data would remain on the disks. [#63848](https://github.com/ClickHouse/ClickHouse/pull/63848) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Fix an incorrect select query result when parallel replicas were used to read from a Materialized View. [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)). +* Fixes in the `find_super_nodes` and `find_big_family` commands of keeper-client: do not fail on ZNONODE errors, find super nodes inside super nodes, and properly calculate the subtree node count. [#63862](https://github.com/ClickHouse/ClickHouse/pull/63862) ([Alexander Gololobov](https://github.com/davenger)). +* Fix an error `Database name is empty` for remote queries with lambdas over the cluster with a modified default database. Fixes [#63471](https://github.com/ClickHouse/ClickHouse/issues/63471). [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix SIGSEGV due to the CPU/real-time (`query_profiler_real_time_period_ns`/`query_profiler_cpu_time_period_ns`) profiler (this has been an issue since 2022 that leads to periodic server crashes, especially if you were using the Distributed engine). 
[#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)). +* Fixed `EXPLAIN CURRENT TRANSACTION` query. [#63926](https://github.com/ClickHouse/ClickHouse/pull/63926) ([Anton Popov](https://github.com/CurtizJ)). +* Fix analyzer: make the IN function with arbitrarily deep sub-selects in a materialized view use the insertion block. [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Allow `ALTER TABLE .. MODIFY|RESET SETTING` and `ALTER TABLE .. MODIFY COMMENT` for plain_rewritable disk. [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)). +* Fix Recursive CTE with distributed queries. Closes [#63790](https://github.com/ClickHouse/ClickHouse/issues/63790). [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)). +* Fix resolution of the unqualified COLUMNS matcher. Preserve the input column order and forbid usage of unknown identifiers (see the example below). [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)). +* Fix the `Not found column` error for queries with `skip_unused_shards = 1`, `LIMIT BY`, and the new analyzer. Fixes [#63943](https://github.com/ClickHouse/ClickHouse/issues/63943). [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* (Low-quality third-party Kusto Query Language). Resolve a client abortion issue when using the KQL table function in interactive mode. [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)). +* Backported in [#64356](https://github.com/ClickHouse/ClickHouse/issues/64356): Fix a `Cyclic aliases` error for cyclic aliases of different types (expression and function). Fixes [#63205](https://github.com/ClickHouse/ClickHouse/issues/63205). [#63993](https://github.com/ClickHouse/ClickHouse/pull/63993) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Deserialize untrusted binary inputs in a safer way. [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)). +* Do not throw a `Storage doesn't support FINAL` error for remote queries over non-MergeTree tables with `final = true` and the new analyzer. Fixes [#63960](https://github.com/ClickHouse/ClickHouse/issues/63960). [#64037](https://github.com/ClickHouse/ClickHouse/pull/64037) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Add missing settings to recoverLostReplica. [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)). +* Fix unwind on SIGSEGV on aarch64 (due to a small stack for the signal). [#64058](https://github.com/ClickHouse/ClickHouse/pull/64058) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#64324](https://github.com/ClickHouse/ClickHouse/issues/64324): This fix will use a proper redefined context with the correct definer for each individual view in the query pipeline. Closes [#63777](https://github.com/ClickHouse/ClickHouse/issues/63777). [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)). +* Backported in [#64384](https://github.com/ClickHouse/ClickHouse/issues/64384): Fix analyzer: the "Not found column" error when using INTERPOLATE. [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). 
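The unqualified `COLUMNS` matcher fix referenced above ("see the example below") can be illustrated with a hedged sketch; the table and column names are invented for this example.

```sql
-- Hypothetical table: the COLUMNS matcher selects columns by a regular expression.
-- After the fix, matched columns keep their original input order,
-- and unknown identifiers are forbidden rather than silently accepted.
CREATE TABLE metrics (cpu_user Float64, cpu_system Float64, mem_used UInt64) ENGINE = Memory;

SELECT COLUMNS('^cpu_') FROM metrics;
```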
+* Fix azure backup writing multipart blocks as 1mb (read buffer size) instead of max_upload_part_size. [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Backported in [#64541](https://github.com/ClickHouse/ClickHouse/issues/64541): Fix creating backups to S3 buckets with different credentials from the disk containing the file. [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)). +* Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#64332](https://github.com/ClickHouse/ClickHouse/issues/64332): The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)). +* Ignore `text_log` config when using Keeper. [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#64692](https://github.com/ClickHouse/ClickHouse/issues/64692): Fix Query Tree size validation. Closes [#63701](https://github.com/ClickHouse/ClickHouse/issues/63701). [#64377](https://github.com/ClickHouse/ClickHouse/pull/64377) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#64411](https://github.com/ClickHouse/ClickHouse/issues/64411): Fix `Logical error: Bad cast` for `Buffer` table with `PREWHERE`. Fixes [#64172](https://github.com/ClickHouse/ClickHouse/issues/64172). [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#64625](https://github.com/ClickHouse/ClickHouse/issues/64625): Fix an error `Cannot find column` in distributed queries with constant CTE in the `GROUP BY` key. [#64519](https://github.com/ClickHouse/ClickHouse/pull/64519) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#64682](https://github.com/ClickHouse/ClickHouse/issues/64682): Fix [#64612](https://github.com/ClickHouse/ClickHouse/issues/64612). Do not rewrite aggregation if `-If` combinator is already used. [#64638](https://github.com/ClickHouse/ClickHouse/pull/64638) ([Dmitry Novik](https://github.com/novikd)). + +#### CI Fix or Improvement (changelog entry is not required) + +* Implement cumulative A Sync status. [#61464](https://github.com/ClickHouse/ClickHouse/pull/61464) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Add ability to run Azure tests in PR with label. [#63196](https://github.com/ClickHouse/ClickHouse/pull/63196) ([alesapin](https://github.com/alesapin)). +* Add azure run with msan. [#63238](https://github.com/ClickHouse/ClickHouse/pull/63238) ([alesapin](https://github.com/alesapin)). +* Improve cloud backport script. [#63282](https://github.com/ClickHouse/ClickHouse/pull/63282) ([Raúl Marín](https://github.com/Algunenano)). 
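A small, hedged sketch of the query-cache change noted above (the databases `db1`/`db2`, the table `events`, and the exact session flow are assumptions): after this fix, the same query text executed with different current databases produces two distinct query cache entries instead of sharing one.

```sql
-- Same query text, different current databases: cached separately now.
USE db1;
SELECT count() FROM events SETTINGS use_query_cache = 1;

USE db2;
SELECT count() FROM events SETTINGS use_query_cache = 1; -- not served from db1's cache entry
```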
+* Use `/commit/` to have the URLs in [reports](https://play.clickhouse.com/play?user=play#c2VsZWN0IGRpc3RpbmN0IGNvbW1pdF91cmwgZnJvbSBjaGVja3Mgd2hlcmUgY2hlY2tfc3RhcnRfdGltZSA+PSBub3coKSAtIGludGVydmFsIDEgbW9udGggYW5kIHB1bGxfcmVxdWVzdF9udW1iZXI9NjA1MzI=) like https://github.com/ClickHouse/ClickHouse/commit/44f8bc5308b53797bec8cccc3bd29fab8a00235d and not like https://github.com/ClickHouse/ClickHouse/commits/44f8bc5308b53797bec8cccc3bd29fab8a00235d. [#63331](https://github.com/ClickHouse/ClickHouse/pull/63331) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Extra constraints for stress and fuzzer tests. [#63470](https://github.com/ClickHouse/ClickHouse/pull/63470) ([Raúl Marín](https://github.com/Algunenano)). +* Fix 02362_part_log_merge_algorithm flaky test. [#63635](https://github.com/ClickHouse/ClickHouse/pull/63635) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)). +* Fix test_odbc_interaction from aarch64 [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63787](https://github.com/ClickHouse/ClickHouse/pull/63787) ([alesapin](https://github.com/alesapin)). +* Fix test `test_catboost_evaluate` for aarch64. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63789](https://github.com/ClickHouse/ClickHouse/pull/63789) ([alesapin](https://github.com/alesapin)). +* Remove HDFS from disks config for one integration test for arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63832](https://github.com/ClickHouse/ClickHouse/pull/63832) ([alesapin](https://github.com/alesapin)). +* Bump version for old image in test_short_strings_aggregation to make it work on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63836](https://github.com/ClickHouse/ClickHouse/pull/63836) ([alesapin](https://github.com/alesapin)). +* Disable test `test_non_default_compression/test.py::test_preconfigured_deflateqpl_codec` on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63839](https://github.com/ClickHouse/ClickHouse/pull/63839) ([alesapin](https://github.com/alesapin)). +* Include checks like `Stateless tests (asan, distributed cache, meta storage in keeper, s3 storage) [2/3]` in `Mergeable Check` and `A Sync`. [#63945](https://github.com/ClickHouse/ClickHouse/pull/63945) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Fix 02124_insert_deduplication_token_multiple_blocks. [#63950](https://github.com/ClickHouse/ClickHouse/pull/63950) ([Han Fei](https://github.com/hanfei1991)). +* Add `ClickHouseVersion.copy` method. Create a branch release in advance without spinning out the release to increase the stability. [#64039](https://github.com/ClickHouse/ClickHouse/pull/64039) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* The mime type is not 100% reliable for Python and shell scripts without shebangs; add a check for file extension. [#64062](https://github.com/ClickHouse/ClickHouse/pull/64062) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Add retries in git submodule update. [#64125](https://github.com/ClickHouse/ClickHouse/pull/64125) ([Alexey Milovidov](https://github.com/alexey-milovidov)). + +#### Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC) + +* Backported in [#64591](https://github.com/ClickHouse/ClickHouse/issues/64591): Disabled `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). 
[#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)). + +#### NO CL ENTRY + +* NO CL ENTRY: 'Revert "Do not remove server constants from GROUP BY key for secondary query."'. [#63297](https://github.com/ClickHouse/ClickHouse/pull/63297) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* NO CL ENTRY: 'Revert "Introduce bulk loading to StorageEmbeddedRocksDB"'. [#63316](https://github.com/ClickHouse/ClickHouse/pull/63316) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* NO CL ENTRY: 'Add tags for the test 03000_traverse_shadow_system_data_paths.sql to make it stable'. [#63366](https://github.com/ClickHouse/ClickHouse/pull/63366) ([Aleksei Filatov](https://github.com/aalexfvk)). +* NO CL ENTRY: 'Revert "Revert "Do not remove server constants from GROUP BY key for secondary query.""'. [#63415](https://github.com/ClickHouse/ClickHouse/pull/63415) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* NO CL ENTRY: 'Revert "Fix index analysis for `DateTime64`"'. [#63525](https://github.com/ClickHouse/ClickHouse/pull/63525) ([Raúl Marín](https://github.com/Algunenano)). +* NO CL ENTRY: 'Add `jwcrypto` to integration tests runner'. [#63551](https://github.com/ClickHouse/ClickHouse/pull/63551) ([Konstantin Bogdanov](https://github.com/thevar1able)). +* NO CL ENTRY: 'Follow-up for the `binary_symbols` table in CI'. [#63802](https://github.com/ClickHouse/ClickHouse/pull/63802) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* NO CL ENTRY: 'chore(ci-workers): remove reusable from tailscale key'. [#63999](https://github.com/ClickHouse/ClickHouse/pull/63999) ([Gabriel Martinez](https://github.com/GMartinez-Sisti)). +* NO CL ENTRY: 'Revert "Update gui.md - Add ch-ui to open-source available tools."'. [#64064](https://github.com/ClickHouse/ClickHouse/pull/64064) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* NO CL ENTRY: 'Prevent stack overflow in Fuzzer and Stress test'. [#64082](https://github.com/ClickHouse/ClickHouse/pull/64082) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* NO CL ENTRY: 'Revert "Prevent conversion to Replicated if zookeeper path already exists"'. [#64214](https://github.com/ClickHouse/ClickHouse/pull/64214) ([Sergei Trifonov](https://github.com/serxa)). + +#### NOT FOR CHANGELOG / INSIGNIFICANT + +* Remove http_max_chunk_size setting (too internal) [#60852](https://github.com/ClickHouse/ClickHouse/pull/60852) ([Azat Khuzhin](https://github.com/azat)). +* Fix race in refreshable materialized views causing SELECT to fail sometimes [#60883](https://github.com/ClickHouse/ClickHouse/pull/60883) ([Michael Kolupaev](https://github.com/al13n321)). +* Parallel replicas: table check failover [#61935](https://github.com/ClickHouse/ClickHouse/pull/61935) ([Igor Nikonov](https://github.com/devcrafter)). +* Avoid crashing on column type mismatch in a few dozen places [#62087](https://github.com/ClickHouse/ClickHouse/pull/62087) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix optimize_if_chain_to_multiif const NULL handling [#62104](https://github.com/ClickHouse/ClickHouse/pull/62104) ([Michael Kolupaev](https://github.com/al13n321)). +* Use intrusive lists for `ResourceRequest` instead of deque [#62165](https://github.com/ClickHouse/ClickHouse/pull/62165) ([Sergei Trifonov](https://github.com/serxa)). 
+* Analyzer: Fix validateAggregates for tables with different aliases [#62346](https://github.com/ClickHouse/ClickHouse/pull/62346) ([vdimir](https://github.com/vdimir)). +* Improve code and tests of `DROP` of multiple tables [#62359](https://github.com/ClickHouse/ClickHouse/pull/62359) ([zhongyuankai](https://github.com/zhongyuankai)). +* Fix exception message during writing to partitioned s3/hdfs/azure path with globs [#62423](https://github.com/ClickHouse/ClickHouse/pull/62423) ([Kruglov Pavel](https://github.com/Avogar)). +* Support UBSan on Clang-19 (master) [#62466](https://github.com/ClickHouse/ClickHouse/pull/62466) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Save the stacktrace of thread waiting on failing AsyncLoader job [#62719](https://github.com/ClickHouse/ClickHouse/pull/62719) ([Sergei Trifonov](https://github.com/serxa)). +* group_by_use_nulls strikes back [#62922](https://github.com/ClickHouse/ClickHouse/pull/62922) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Analyzer: prefer column name to alias from array join [#62995](https://github.com/ClickHouse/ClickHouse/pull/62995) ([vdimir](https://github.com/vdimir)). +* CI: try separate the workflows file for GitHub's Merge Queue [#63123](https://github.com/ClickHouse/ClickHouse/pull/63123) ([Max K.](https://github.com/maxknv)). +* Try to fix coverage tests [#63130](https://github.com/ClickHouse/ClickHouse/pull/63130) ([Raúl Marín](https://github.com/Algunenano)). +* Fix azure backup flaky test [#63158](https://github.com/ClickHouse/ClickHouse/pull/63158) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Merging [#60920](https://github.com/ClickHouse/ClickHouse/issues/60920) [#63159](https://github.com/ClickHouse/ClickHouse/pull/63159) ([vdimir](https://github.com/vdimir)). +* QueryAnalysisPass improve QUALIFY validation [#63162](https://github.com/ClickHouse/ClickHouse/pull/63162) ([Maksim Kita](https://github.com/kitaisreal)). +* Add numpy tests for different endianness [#63189](https://github.com/ClickHouse/ClickHouse/pull/63189) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fallback action-runner to autoupdate when it's unable to start [#63195](https://github.com/ClickHouse/ClickHouse/pull/63195) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Fix possible endless loop while reading from azure [#63197](https://github.com/ClickHouse/ClickHouse/pull/63197) ([Anton Popov](https://github.com/CurtizJ)). +* Add information about materialized view security bug fix into the changelog [#63204](https://github.com/ClickHouse/ClickHouse/pull/63204) ([pufit](https://github.com/pufit)). +* Disable one query from 02994_sanity_check_settings [#63208](https://github.com/ClickHouse/ClickHouse/pull/63208) ([Raúl Marín](https://github.com/Algunenano)). +* Enable custom parquet encoder by default, attempt 2 [#63210](https://github.com/ClickHouse/ClickHouse/pull/63210) ([Michael Kolupaev](https://github.com/al13n321)). +* Update version after release [#63215](https://github.com/ClickHouse/ClickHouse/pull/63215) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Update version_date.tsv and changelogs after v24.4.1.2088-stable [#63217](https://github.com/ClickHouse/ClickHouse/pull/63217) ([robot-clickhouse](https://github.com/robot-clickhouse)). +* Update version_date.tsv and changelogs after v24.3.3.102-lts [#63226](https://github.com/ClickHouse/ClickHouse/pull/63226) ([robot-clickhouse](https://github.com/robot-clickhouse)). 
+* Update version_date.tsv and changelogs after v24.2.3.70-stable [#63227](https://github.com/ClickHouse/ClickHouse/pull/63227) ([robot-clickhouse](https://github.com/robot-clickhouse)). +* Return back [#61551](https://github.com/ClickHouse/ClickHouse/issues/61551) (More optimal loading of marks) [#63233](https://github.com/ClickHouse/ClickHouse/pull/63233) ([Anton Popov](https://github.com/CurtizJ)). +* Hide CI options under a spoiler [#63237](https://github.com/ClickHouse/ClickHouse/pull/63237) ([Konstantin Bogdanov](https://github.com/thevar1able)). +* Add `FROM` keyword to `TRUNCATE ALL TABLES` [#63241](https://github.com/ClickHouse/ClickHouse/pull/63241) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Minor follow-up to a renaming PR [#63260](https://github.com/ClickHouse/ClickHouse/pull/63260) ([Robert Schulze](https://github.com/rschu1ze)). +* More checks for concurrently deleted files and dirs in system.remote_data_paths [#63274](https://github.com/ClickHouse/ClickHouse/pull/63274) ([Alexander Gololobov](https://github.com/davenger)). +* Fix SettingsChangesHistory.h for allow_experimental_join_condition [#63278](https://github.com/ClickHouse/ClickHouse/pull/63278) ([Raúl Marín](https://github.com/Algunenano)). +* Update version_date.tsv and changelogs after v23.8.14.6-lts [#63285](https://github.com/ClickHouse/ClickHouse/pull/63285) ([robot-clickhouse](https://github.com/robot-clickhouse)). +* Fix azure flaky test [#63286](https://github.com/ClickHouse/ClickHouse/pull/63286) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). +* Fix deadlock in `CacheDictionaryUpdateQueue` in case of exception in constructor [#63287](https://github.com/ClickHouse/ClickHouse/pull/63287) ([Nikita Taranov](https://github.com/nickitat)). +* DiskApp: fix 'list --recursive /' and crash on invalid arguments [#63296](https://github.com/ClickHouse/ClickHouse/pull/63296) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix terminate because of unhandled exception in `MergeTreeDeduplicationLog::shutdown` [#63298](https://github.com/ClickHouse/ClickHouse/pull/63298) ([Nikita Taranov](https://github.com/nickitat)). +* Move s3_plain_rewritable unit test to shell [#63317](https://github.com/ClickHouse/ClickHouse/pull/63317) ([Julia Kartseva](https://github.com/jkartseva)). +* Add tests for [#63264](https://github.com/ClickHouse/ClickHouse/issues/63264) [#63321](https://github.com/ClickHouse/ClickHouse/pull/63321) ([Raúl Marín](https://github.com/Algunenano)). +* Try fix segfault in `MergeTreeReadPoolBase::createTask` [#63323](https://github.com/ClickHouse/ClickHouse/pull/63323) ([Antonio Andelic](https://github.com/antonio2368)). +* Update README.md [#63326](https://github.com/ClickHouse/ClickHouse/pull/63326) ([Tyler Hannan](https://github.com/tylerhannan)). +* Skip unaccessible table dirs in system.remote_data_paths [#63330](https://github.com/ClickHouse/ClickHouse/pull/63330) ([Alexander Gololobov](https://github.com/davenger)). +* Add test for [#56287](https://github.com/ClickHouse/ClickHouse/issues/56287) [#63340](https://github.com/ClickHouse/ClickHouse/pull/63340) ([Raúl Marín](https://github.com/Algunenano)). +* Update README.md [#63350](https://github.com/ClickHouse/ClickHouse/pull/63350) ([Tyler Hannan](https://github.com/tylerhannan)). +* Add test for [#48049](https://github.com/ClickHouse/ClickHouse/issues/48049) [#63351](https://github.com/ClickHouse/ClickHouse/pull/63351) ([Raúl Marín](https://github.com/Algunenano)). 
+* Add option `query_id_prefix` to `clickhouse-benchmark` [#63352](https://github.com/ClickHouse/ClickHouse/pull/63352) ([Anton Popov](https://github.com/CurtizJ)). +* Rollback azurite to working version [#63354](https://github.com/ClickHouse/ClickHouse/pull/63354) ([alesapin](https://github.com/alesapin)). +* Randomize setting `enable_block_offset_column` in stress tests [#63355](https://github.com/ClickHouse/ClickHouse/pull/63355) ([Anton Popov](https://github.com/CurtizJ)). +* Fix AST parsing of invalid type names [#63357](https://github.com/ClickHouse/ClickHouse/pull/63357) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix some 00002_log_and_exception_messages_formatting flakiness [#63358](https://github.com/ClickHouse/ClickHouse/pull/63358) ([Michael Kolupaev](https://github.com/al13n321)). +* Add a test for [#55655](https://github.com/ClickHouse/ClickHouse/issues/55655) [#63380](https://github.com/ClickHouse/ClickHouse/pull/63380) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix data race in `reportBrokenPart` [#63396](https://github.com/ClickHouse/ClickHouse/pull/63396) ([Antonio Andelic](https://github.com/antonio2368)). +* Workaround for `oklch()` inside canvas bug for firefox [#63404](https://github.com/ClickHouse/ClickHouse/pull/63404) ([Sergei Trifonov](https://github.com/serxa)). +* Add test for issue [#47862](https://github.com/ClickHouse/ClickHouse/issues/47862) [#63424](https://github.com/ClickHouse/ClickHouse/pull/63424) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix parsing of `CREATE INDEX` query [#63425](https://github.com/ClickHouse/ClickHouse/pull/63425) ([Anton Popov](https://github.com/CurtizJ)). +* We are using Shared Catalog in the CI Logs cluster [#63442](https://github.com/ClickHouse/ClickHouse/pull/63442) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix collection of coverage data in the CI Logs cluster [#63453](https://github.com/ClickHouse/ClickHouse/pull/63453) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix flaky test for rocksdb bulk sink [#63457](https://github.com/ClickHouse/ClickHouse/pull/63457) ([Duc Canh Le](https://github.com/canhld94)). +* io_uring: refactor get reader from context [#63475](https://github.com/ClickHouse/ClickHouse/pull/63475) ([Tomer Shafir](https://github.com/tomershafir)). +* Analyzer setting max_streams_to_max_threads_ratio overflow fix [#63478](https://github.com/ClickHouse/ClickHouse/pull/63478) ([Maksim Kita](https://github.com/kitaisreal)). +* Add setting for better rendering of multiline string for pretty format [#63479](https://github.com/ClickHouse/ClickHouse/pull/63479) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix logical error when reloading config with customly created web disk broken after [#56367](https://github.com/ClickHouse/ClickHouse/issues/56367) [#63484](https://github.com/ClickHouse/ClickHouse/pull/63484) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Add test for [#49307](https://github.com/ClickHouse/ClickHouse/issues/49307) [#63486](https://github.com/ClickHouse/ClickHouse/pull/63486) ([Anton Popov](https://github.com/CurtizJ)). +* Remove leftovers of GCC support in cmake rules [#63488](https://github.com/ClickHouse/ClickHouse/pull/63488) ([Azat Khuzhin](https://github.com/azat)). +* Fix ProfileEventTimeIncrement code [#63489](https://github.com/ClickHouse/ClickHouse/pull/63489) ([Azat Khuzhin](https://github.com/azat)). 
+* MergeTreePrefetchedReadPool: Print parent name when logging projection parts [#63522](https://github.com/ClickHouse/ClickHouse/pull/63522) ([Raúl Marín](https://github.com/Algunenano)). +* Correctly stop `asyncCopy` tasks in all cases [#63523](https://github.com/ClickHouse/ClickHouse/pull/63523) ([Antonio Andelic](https://github.com/antonio2368)). +* Almost everything should work on AArch64 (Part of [#58061](https://github.com/ClickHouse/ClickHouse/issues/58061)) [#63527](https://github.com/ClickHouse/ClickHouse/pull/63527) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Update randomization of `old_parts_lifetime` [#63530](https://github.com/ClickHouse/ClickHouse/pull/63530) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Update 02240_system_filesystem_cache_table.sh [#63531](https://github.com/ClickHouse/ClickHouse/pull/63531) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix data race in `DistributedSink` [#63538](https://github.com/ClickHouse/ClickHouse/pull/63538) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix azure tests run on master [#63540](https://github.com/ClickHouse/ClickHouse/pull/63540) ([alesapin](https://github.com/alesapin)). +* Find a proper commit for cumulative `A Sync` status [#63543](https://github.com/ClickHouse/ClickHouse/pull/63543) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Add `no-s3-storage` tag to local_plain_rewritable ut [#63546](https://github.com/ClickHouse/ClickHouse/pull/63546) ([Julia Kartseva](https://github.com/jkartseva)). +* Go back to upstream lz4 submodule [#63574](https://github.com/ClickHouse/ClickHouse/pull/63574) ([Raúl Marín](https://github.com/Algunenano)). +* Fix logical error in ColumnTuple::tryInsert() [#63583](https://github.com/ClickHouse/ClickHouse/pull/63583) ([Michael Kolupaev](https://github.com/al13n321)). +* harmonize sumMap error messages on ILLEGAL_TYPE_OF_ARGUMENT [#63619](https://github.com/ClickHouse/ClickHouse/pull/63619) ([Yohann Jardin](https://github.com/yohannj)). +* Update README.md [#63631](https://github.com/ClickHouse/ClickHouse/pull/63631) ([Tyler Hannan](https://github.com/tylerhannan)). +* Ignore global profiler if system.trace_log is not enabled and fix really disable it for keeper standalone build [#63632](https://github.com/ClickHouse/ClickHouse/pull/63632) ([Azat Khuzhin](https://github.com/azat)). +* Fixes for 00002_log_and_exception_messages_formatting [#63634](https://github.com/ClickHouse/ClickHouse/pull/63634) ([Azat Khuzhin](https://github.com/azat)). +* Fix tests flakiness due to long SYSTEM FLUSH LOGS (explicitly specify old_parts_lifetime) [#63639](https://github.com/ClickHouse/ClickHouse/pull/63639) ([Azat Khuzhin](https://github.com/azat)). +* Update clickhouse-test help section [#63663](https://github.com/ClickHouse/ClickHouse/pull/63663) ([Ali](https://github.com/xogoodnow)). +* Fix bad test `02950_part_log_bytes_uncompressed` [#63672](https://github.com/ClickHouse/ClickHouse/pull/63672) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Remove leftovers of `optimize_monotonous_functions_in_order_by` [#63674](https://github.com/ClickHouse/ClickHouse/pull/63674) ([Nikita Taranov](https://github.com/nickitat)). +* tests: attempt to fix 02340_parts_refcnt_mergetree flakiness [#63684](https://github.com/ClickHouse/ClickHouse/pull/63684) ([Azat Khuzhin](https://github.com/azat)). +* Parallel replicas: simple cleanup [#63685](https://github.com/ClickHouse/ClickHouse/pull/63685) ([Igor Nikonov](https://github.com/devcrafter)). 
+* Cancel S3 reads properly when parallel reads are used [#63687](https://github.com/ClickHouse/ClickHouse/pull/63687) ([Antonio Andelic](https://github.com/antonio2368)). +* Explain map insertion order [#63690](https://github.com/ClickHouse/ClickHouse/pull/63690) ([Mark Needham](https://github.com/mneedham)). +* selectRangesToRead() simple cleanup [#63692](https://github.com/ClickHouse/ClickHouse/pull/63692) ([Igor Nikonov](https://github.com/devcrafter)). +* Fix fuzzed analyzer_join_with_constant query [#63702](https://github.com/ClickHouse/ClickHouse/pull/63702) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Add missing explicit instantiations of ColumnUnique [#63718](https://github.com/ClickHouse/ClickHouse/pull/63718) ([Raúl Marín](https://github.com/Algunenano)). +* Better asserts in ColumnString.h [#63719](https://github.com/ClickHouse/ClickHouse/pull/63719) ([Raúl Marín](https://github.com/Algunenano)). +* Don't randomize some settings in 02941_variant_type_* tests to avoid timeouts [#63721](https://github.com/ClickHouse/ClickHouse/pull/63721) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix flaky 03145_non_loaded_projection_backup.sh [#63728](https://github.com/ClickHouse/ClickHouse/pull/63728) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Userspace page cache: don't collect stats if cache is unused [#63730](https://github.com/ClickHouse/ClickHouse/pull/63730) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix insignificant UBSAN error in QueryAnalyzer::replaceNodesWithPositionalArguments() [#63734](https://github.com/ClickHouse/ClickHouse/pull/63734) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix a bug in resolving matcher inside lambda inside ARRAY JOIN [#63744](https://github.com/ClickHouse/ClickHouse/pull/63744) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Remove unused CaresPTRResolver::cancel_requests method [#63754](https://github.com/ClickHouse/ClickHouse/pull/63754) ([Arthur Passos](https://github.com/arthurpassos)). +* Do not hide disk name [#63756](https://github.com/ClickHouse/ClickHouse/pull/63756) ([Kseniia Sumarokova](https://github.com/kssenii)). +* CI: remove Cancel and Debug workflows as redundant [#63757](https://github.com/ClickHouse/ClickHouse/pull/63757) ([Max K.](https://github.com/maxknv)). +* Security Policy: Add notification process [#63773](https://github.com/ClickHouse/ClickHouse/pull/63773) ([Leticia Webb](https://github.com/leticiawebb)). +* Fix typo [#63774](https://github.com/ClickHouse/ClickHouse/pull/63774) ([Anton Popov](https://github.com/CurtizJ)). +* Fix fuzzer when only explicit faults are used [#63775](https://github.com/ClickHouse/ClickHouse/pull/63775) ([Raúl Marín](https://github.com/Algunenano)). +* Settings typo [#63782](https://github.com/ClickHouse/ClickHouse/pull/63782) ([Rory Crispin](https://github.com/RoryCrispin)). +* Changed the previous value of `output_format_pretty_preserve_border_for_multiline_string` setting [#63783](https://github.com/ClickHouse/ClickHouse/pull/63783) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* fix antlr insertStmt for issue 63657 [#63811](https://github.com/ClickHouse/ClickHouse/pull/63811) ([GG Bond](https://github.com/zzyReal666)). +* Fix race in `ReplicatedMergeTreeLogEntryData` [#63816](https://github.com/ClickHouse/ClickHouse/pull/63816) ([Antonio Andelic](https://github.com/antonio2368)). 
+* Allow allocation during job destructor in `ThreadPool` [#63829](https://github.com/ClickHouse/ClickHouse/pull/63829) ([Antonio Andelic](https://github.com/antonio2368)). +* io_uring: add basic io_uring clickhouse perf test [#63835](https://github.com/ClickHouse/ClickHouse/pull/63835) ([Tomer Shafir](https://github.com/tomershafir)). +* fix typo [#63838](https://github.com/ClickHouse/ClickHouse/pull/63838) ([Alexander Gololobov](https://github.com/davenger)). +* Remove unnecessary logging statements in MergeJoinTransform.cpp [#63860](https://github.com/ClickHouse/ClickHouse/pull/63860) ([vdimir](https://github.com/vdimir)). +* CI: disable ARM integration test cases with libunwind crash [#63867](https://github.com/ClickHouse/ClickHouse/pull/63867) ([Max K.](https://github.com/maxknv)). +* Fix some settings values in 02455_one_row_from_csv_memory_usage test to make it less flaky [#63874](https://github.com/ClickHouse/ClickHouse/pull/63874) ([Kruglov Pavel](https://github.com/Avogar)). +* Randomise `allow_experimental_parallel_reading_from_replicas` in stress tests [#63899](https://github.com/ClickHouse/ClickHouse/pull/63899) ([Nikita Taranov](https://github.com/nickitat)). +* Fix logs test for binary data by converting it to a valid UTF8 string. [#63909](https://github.com/ClickHouse/ClickHouse/pull/63909) ([Alexey Katsman](https://github.com/alexkats)). +* More sanity checks for parallel replicas [#63910](https://github.com/ClickHouse/ClickHouse/pull/63910) ([Nikita Taranov](https://github.com/nickitat)). +* Insignificant libunwind build fixes [#63946](https://github.com/ClickHouse/ClickHouse/pull/63946) ([Azat Khuzhin](https://github.com/azat)). +* Revert multiline pretty changes due to performance problems [#63947](https://github.com/ClickHouse/ClickHouse/pull/63947) ([Raúl Marín](https://github.com/Algunenano)). +* Some usability improvements for c++expr script [#63948](https://github.com/ClickHouse/ClickHouse/pull/63948) ([Azat Khuzhin](https://github.com/azat)). +* CI: aarch64: disable arm integration tests with kerberaized kafka [#63961](https://github.com/ClickHouse/ClickHouse/pull/63961) ([Max K.](https://github.com/maxknv)). +* Slightly better setting `force_optimize_projection_name` [#63997](https://github.com/ClickHouse/ClickHouse/pull/63997) ([Anton Popov](https://github.com/CurtizJ)). +* Better script to collect symbols statistics [#64013](https://github.com/ClickHouse/ClickHouse/pull/64013) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix a typo in Analyzer [#64022](https://github.com/ClickHouse/ClickHouse/pull/64022) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Fix libbcrypt for FreeBSD build [#64023](https://github.com/ClickHouse/ClickHouse/pull/64023) ([Azat Khuzhin](https://github.com/azat)). +* Fix searching for libclang_rt.builtins.*.a on FreeBSD [#64051](https://github.com/ClickHouse/ClickHouse/pull/64051) ([Azat Khuzhin](https://github.com/azat)). +* Fix waiting for mutations with retriable errors [#64063](https://github.com/ClickHouse/ClickHouse/pull/64063) ([Alexander Tokmakov](https://github.com/tavplubix)). +* harmonize h3PointDist* error messages [#64080](https://github.com/ClickHouse/ClickHouse/pull/64080) ([Yohann Jardin](https://github.com/yohannj)). +* This log message is better in Trace [#64081](https://github.com/ClickHouse/ClickHouse/pull/64081) ([Alexey Milovidov](https://github.com/alexey-milovidov)). 
+* tests: fix expected error for 03036_reading_s3_archives (fixes CI) [#64089](https://github.com/ClickHouse/ClickHouse/pull/64089) ([Azat Khuzhin](https://github.com/azat)). +* Fix sanitizers [#64090](https://github.com/ClickHouse/ClickHouse/pull/64090) ([Azat Khuzhin](https://github.com/azat)). +* Update llvm/clang to 18.1.6 [#64091](https://github.com/ClickHouse/ClickHouse/pull/64091) ([Azat Khuzhin](https://github.com/azat)). +* CI: mergeable check redesign [#64093](https://github.com/ClickHouse/ClickHouse/pull/64093) ([Max K.](https://github.com/maxknv)). +* Move `isAllASCII` from UTFHelper to StringUtils [#64108](https://github.com/ClickHouse/ClickHouse/pull/64108) ([Robert Schulze](https://github.com/rschu1ze)). +* Clean up .clang-tidy after transition to Clang 18 [#64111](https://github.com/ClickHouse/ClickHouse/pull/64111) ([Robert Schulze](https://github.com/rschu1ze)). +* Ignore exception when checking for cgroupsv2 [#64118](https://github.com/ClickHouse/ClickHouse/pull/64118) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix UBSan error in negative positional arguments [#64127](https://github.com/ClickHouse/ClickHouse/pull/64127) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Syncing code [#64135](https://github.com/ClickHouse/ClickHouse/pull/64135) ([Antonio Andelic](https://github.com/antonio2368)). +* Losen build resource limits for unusual architectures [#64152](https://github.com/ClickHouse/ClickHouse/pull/64152) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* fix clang tidy [#64179](https://github.com/ClickHouse/ClickHouse/pull/64179) ([Han Fei](https://github.com/hanfei1991)). +* Fix global query profiler [#64187](https://github.com/ClickHouse/ClickHouse/pull/64187) ([Azat Khuzhin](https://github.com/azat)). +* CI: cancel running PR wf after adding to MQ [#64188](https://github.com/ClickHouse/ClickHouse/pull/64188) ([Max K.](https://github.com/maxknv)). +* Add debug logging to EmbeddedRocksDBBulkSink [#64203](https://github.com/ClickHouse/ClickHouse/pull/64203) ([vdimir](https://github.com/vdimir)). +* Fix special builds (due to excessive resource usage - memory/CPU) [#64204](https://github.com/ClickHouse/ClickHouse/pull/64204) ([Azat Khuzhin](https://github.com/azat)). +* Add gh to style-check dockerfile [#64227](https://github.com/ClickHouse/ClickHouse/pull/64227) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Followup for [#63691](https://github.com/ClickHouse/ClickHouse/issues/63691) [#64285](https://github.com/ClickHouse/ClickHouse/pull/64285) ([vdimir](https://github.com/vdimir)). +* Rename allow_deprecated_functions to allow_deprecated_error_prone_win… [#64358](https://github.com/ClickHouse/ClickHouse/pull/64358) ([Raúl Marín](https://github.com/Algunenano)). +* Update description for settings `cross_join_min_rows_to_compress` and `cross_join_min_bytes_to_compress` [#64360](https://github.com/ClickHouse/ClickHouse/pull/64360) ([Nikita Fomichev](https://github.com/fm4v)). +* Rename aggregate_function_group_array_has_limit_size [#64362](https://github.com/ClickHouse/ClickHouse/pull/64362) ([Raúl Marín](https://github.com/Algunenano)). +* Split tests 03039_dynamic_all_merge_algorithms to avoid timeouts [#64363](https://github.com/ClickHouse/ClickHouse/pull/64363) ([Kruglov Pavel](https://github.com/Avogar)). +* Clean settings in 02943_variant_read_subcolumns test [#64437](https://github.com/ClickHouse/ClickHouse/pull/64437) ([Kruglov Pavel](https://github.com/Avogar)). 
+* CI: Critical bugfix category in PR template [#64480](https://github.com/ClickHouse/ClickHouse/pull/64480) ([Max K.](https://github.com/maxknv)). + diff --git a/docs/en/development/continuous-integration.md b/docs/en/development/continuous-integration.md index 91253ca5e44..c348eb5ca07 100644 --- a/docs/en/development/continuous-integration.md +++ b/docs/en/development/continuous-integration.md @@ -71,7 +71,7 @@ If it fails, fix the style errors following the [code style guide](style.md). ```sh mkdir -p /tmp/test_output # running all checks -docker run --rm --volume=.:/ClickHouse --volume=/tmp/test_output:/test_output -u $(id -u ${USER}):$(id -g ${USER}) --cap-add=SYS_PTRACE clickhouse/style-test +python3 tests/ci/style_check.py --no-push # run specified check script (e.g.: ./check-mypy) docker run --rm --volume=.:/ClickHouse --volume=/tmp/test_output:/test_output -u $(id -u ${USER}):$(id -g ${USER}) --cap-add=SYS_PTRACE --entrypoint= -w/ClickHouse/utils/check-style clickhouse/style-test ./check-mypy diff --git a/docs/en/engines/table-engines/integrations/mongodb.md b/docs/en/engines/table-engines/integrations/mongodb.md index f87e8da8b5b..5bb3bc752f5 100644 --- a/docs/en/engines/table-engines/integrations/mongodb.md +++ b/docs/en/engines/table-engines/integrations/mongodb.md @@ -34,10 +34,11 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name - `options` — MongoDB connection string options (optional parameter). :::tip -If you are using the MongoDB Atlas cloud offering please add these options: +If you are using the MongoDB Atlas cloud offering: ``` -'connectTimeoutMS=10000&ssl=true&authSource=admin' +- connection url can be obtained from 'Atlas SQL' option +- use options: 'connectTimeoutMS=10000&ssl=true&authSource=admin' ``` ::: diff --git a/docs/en/engines/table-engines/mergetree-family/invertedindexes.md b/docs/en/engines/table-engines/mergetree-family/invertedindexes.md index f58a06464b2..ec4c14b6bf1 100644 --- a/docs/en/engines/table-engines/mergetree-family/invertedindexes.md +++ b/docs/en/engines/table-engines/mergetree-family/invertedindexes.md @@ -37,7 +37,7 @@ ways, for example with respect to their DDL/DQL syntax or performance/compressio To use full-text indexes, first enable them in the configuration: ```sql -SET allow_experimental_inverted_index = true; +SET allow_experimental_full_text_index = true; ``` An full-text index can be defined on a string column using the following syntax diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md index a009c4a32f3..689c05a24af 100644 --- a/docs/en/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md @@ -178,6 +178,10 @@ Additional parameters that control the behavior of the `MergeTree` (optional): `max_partitions_to_read` — Limits the maximum number of partitions that can be accessed in one query. You can also specify setting [max_partitions_to_read](/docs/en/operations/settings/merge-tree-settings.md/#max-partitions-to-read) in the global setting. +#### allow_experimental_optimized_row_order + +`allow_experimental_optimized_row_order` - Experimental. Enables the optimization of the row order during inserts to improve the compressability of the data for compression codecs (e.g. LZ4). Analyzes and reorders the data, and thus increases the CPU overhead of inserts. 
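+
+For example, assuming a MergeTree table (the table and column names below are purely illustrative), the setting can be enabled per table via the `SETTINGS` clause:
+
+``` sql
+CREATE TABLE example_events
+(
+    event_type LowCardinality(String),
+    ts DateTime,
+    payload String
+)
+ENGINE = MergeTree
+ORDER BY event_type
+SETTINGS allow_experimental_optimized_row_order = 1;
+```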
+
 **Example of Sections Setting**

``` sql
diff --git a/docs/en/operations/configuration-files.md b/docs/en/operations/configuration-files.md
index 0675e0edcb6..57fea3cca3a 100644
--- a/docs/en/operations/configuration-files.md
+++ b/docs/en/operations/configuration-files.md
@@ -7,6 +7,8 @@ sidebar_label: Configuration Files
 # Configuration Files
 The ClickHouse server can be configured with configuration files in XML or YAML syntax. In most installation types, the ClickHouse server runs with `/etc/clickhouse-server/config.xml` as default configuration file, but it is also possible to specify the location of the configuration file manually at server startup using command line option `--config-file=` or `-C`. Additional configuration files may be placed into directory `config.d/` relative to the main configuration file, for example into directory `/etc/clickhouse-server/config.d/`. Files in this directory and the main configuration are merged in a preprocessing step before the configuration is applied in ClickHouse server. Configuration files are merged in alphabetical order. To simplify updates and improve modularization, it is best practice to keep the default `config.xml` file unmodified and place additional customization into `config.d/`.
+(The ClickHouse Keeper configuration lives in `/etc/clickhouse-keeper/keeper_config.xml`, and thus its additional configuration files need to be placed in `/etc/clickhouse-keeper/keeper_config.d/`.)
+
 It is possible to mix XML and YAML configuration files, for example you could have a main configuration file `config.xml` and additional configuration files `config.d/network.xml`, `config.d/timezone.yaml` and `config.d/keeper.yaml`. Mixing XML and YAML within a single configuration file is not supported. XML configuration files should use `...` as top-level tag. In YAML configuration files, `clickhouse:` is optional, the parser inserts it implicitly if absent.
diff --git a/docs/en/operations/settings/merge-tree-settings.md b/docs/en/operations/settings/merge-tree-settings.md
index 76250b80476..c3f303dcd38 100644
--- a/docs/en/operations/settings/merge-tree-settings.md
+++ b/docs/en/operations/settings/merge-tree-settings.md
@@ -885,3 +885,47 @@ Default value: false
 **See Also**
 - [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge) setting
+
+### allow_experimental_optimized_row_order
+
+Controls if the row order should be optimized during inserts to improve the compressibility of the newly inserted table part.
+
+MergeTree tables are (optionally) compressed using [compression codecs](../../sql-reference/statements/create/table.md#column_compression_codec).
+Generic compression codecs such as LZ4 and ZSTD achieve maximum compression rates if the data exposes patterns.
+Long runs of the same value typically compress very well.
+
+If this setting is enabled, ClickHouse attempts to store the data in newly inserted parts in a row order that minimizes the number of equal-value runs across the columns of the new table part.
+In other words, a small number of equal-value runs means that individual runs are long and compress well.
+
+Finding the optimal row order is computationally infeasible (NP-hard).
+Therefore, ClickHouse uses a heuristic to quickly find a row order which still improves compression rates over the original row order.
+
+**Heuristics for finding a row order**
+
+It is generally possible to shuffle the rows of a table (or table part) freely as SQL considers the same table (table part) in different row orders equivalent.
+
+This freedom of shuffling rows is restricted when a primary key is defined for the table.
+In ClickHouse, a primary key `C1, C2, ..., CN` enforces that the table rows are sorted by columns `C1`, `C2`, ..., `CN` ([clustered index](https://en.wikipedia.org/wiki/Database_index#Clustered)).
+As a result, rows can only be shuffled within "equivalence classes" of rows, i.e. rows which have the same values in their primary key columns.
+The intuition is that high-cardinality primary keys, e.g. primary keys involving a `DateTime64` timestamp column, lead to many small equivalence classes.
+Conversely, tables with a low-cardinality primary key create few and large equivalence classes.
+A table with no primary key represents the extreme case of a single equivalence class which spans all rows.
+
+The fewer and the larger the equivalence classes are, the higher the degree of freedom for re-shuffling rows.
+
+The heuristic applied to find the best row order within each equivalence class was suggested by D. Lemire and O. Kaser in [Reordering columns for smaller indexes](https://doi.org/10.1016/j.ins.2011.02.002) and is based on sorting the rows within each equivalence class by ascending cardinality of the non-primary-key columns.
+It performs three steps:
+1. Find all equivalence classes based on the row values in primary key columns.
+2. For each equivalence class, calculate (usually estimate) the cardinalities of the non-primary-key columns.
+3. For each equivalence class, sort the rows in order of ascending non-primary-key column cardinality.
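+
+To get a rough idea of how much the reordering helps on a given dataset, one illustrative approach (a sketch only; the table names and data below are made up) is to load identical data into two otherwise identical tables, one of them with the setting enabled, and to compare the compressed part sizes reported in `system.parts`:
+
+```sql
+-- Two identical tables with a low-cardinality primary key; only the second one enables the row-order optimization.
+CREATE TABLE tab_plain   (id UInt32, v String) ENGINE = MergeTree ORDER BY id;
+CREATE TABLE tab_ordered (id UInt32, v String) ENGINE = MergeTree ORDER BY id
+    SETTINGS allow_experimental_optimized_row_order = 1;
+
+-- Load the same deterministic data into both tables.
+INSERT INTO tab_plain   SELECT number % 10, toString(cityHash64(number) % 1000) FROM numbers(1000000);
+INSERT INTO tab_ordered SELECT number % 10, toString(cityHash64(number) % 1000) FROM numbers(1000000);
+
+-- Compare the compressed sizes of the active parts.
+SELECT table, formatReadableSize(sum(data_compressed_bytes)) AS compressed
+FROM system.parts
+WHERE database = currentDatabase() AND table IN ('tab_plain', 'tab_ordered') AND active
+GROUP BY table;
+```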
+ +If enabled, insert operations incur additional CPU costs to analyze and optimize the row order of the new data. +INSERTs are expected to take 30-50% longer depending on the data characteristics. +Compression rates of LZ4 or ZSTD improve on average by 20-40%. + +This setting works best for tables with no primary key or a low-cardinality primary key, i.e. a table with only few distinct primary key values. +High-cardinality primary keys, e.g. involving timestamp columns of type `DateTime64`, are not expected to benefit from this setting. diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index 252b041ef6f..0b905df21d4 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -1956,7 +1956,7 @@ Possible values: - Positive integer. - 0 — Asynchronous insertions are disabled. -Default value: `1000000`. +Default value: `10485760`. ### async_insert_max_query_number {#async-insert-max-query-number} diff --git a/docs/en/sql-reference/aggregate-functions/reference/corr.md b/docs/en/sql-reference/aggregate-functions/reference/corr.md index 8fa493c9630..5681c942169 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/corr.md +++ b/docs/en/sql-reference/aggregate-functions/reference/corr.md @@ -5,10 +5,57 @@ sidebar_position: 107 # corr -Syntax: `corr(x, y)` +Calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient): + +$$ +\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{\sqrt{\Sigma{(x - \bar{x})^2} * \Sigma{(y - \bar{y})^2}}} +$$ -Calculates the Pearson correlation coefficient: `Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))`. :::note -This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `corrStable` function. It works slower but provides a lower computational error. -::: \ No newline at end of file +This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`corrStable`](../reference/corrstable.md) function. It is slower but provides a more accurate result. +::: + +**Syntax** + +```sql +corr(x, y) +``` + +**Arguments** + +- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). +- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- The Pearson correlation coefficient. [Float64](../../data-types/float.md). 
+ +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS series; +CREATE TABLE series +( + i UInt32, + x_value Float64, + y_value Float64 +) +ENGINE = Memory; +INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6, -4.4),(2, -9.6, 3),(3, -1.3, -4),(4, 5.3, 9.7),(5, 4.4, 0.037),(6, -8.6, -7.8),(7, 5.1, 9.3),(8, 7.9, -3.6),(9, -8.2, 0.62),(10, -3, 7.3); +``` + +```sql +SELECT corr(x_value, y_value) +FROM series; +``` + +Result: + +```response +┌─corr(x_value, y_value)─┐ +│ 0.1730265755453256 │ +└────────────────────────┘ +``` diff --git a/docs/en/sql-reference/aggregate-functions/reference/corrmatrix.md b/docs/en/sql-reference/aggregate-functions/reference/corrmatrix.md new file mode 100644 index 00000000000..718477b28dd --- /dev/null +++ b/docs/en/sql-reference/aggregate-functions/reference/corrmatrix.md @@ -0,0 +1,55 @@ +--- +slug: /en/sql-reference/aggregate-functions/reference/corrmatrix +sidebar_position: 108 +--- + +# corrMatrix + +Computes the correlation matrix over N variables. + +**Syntax** + +```sql +corrMatrix(x[, ...]) +``` + +**Arguments** + +- `x` — a variable number of parameters. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned value** + +- Correlation matrix. [Array](../../data-types/array.md)([Array](../../data-types/array.md)([Float64](../../data-types/float.md))). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS test; +CREATE TABLE test +( + a UInt32, + b Float64, + c Float64, + d Float64 +) +ENGINE = Memory; +INSERT INTO test(a, b, c, d) VALUES (1, 5.6, -4.4, 2.6), (2, -9.6, 3, 3.3), (3, -1.3, -4, 1.2), (4, 5.3, 9.7, 2.3), (5, 4.4, 0.037, 1.222), (6, -8.6, -7.8, 2.1233), (7, 5.1, 9.3, 8.1222), (8, 7.9, -3.6, 9.837), (9, -8.2, 0.62, 8.43555), (10, -3, 7.3, 6.762); +``` + +```sql +SELECT arrayMap(x -> round(x, 3), arrayJoin(corrMatrix(a, b, c, d))) AS corrMatrix +FROM test; +``` + +Result: + +```response + ┌─corrMatrix─────────────┐ +1. │ [1,-0.096,0.243,0.746] │ +2. │ [-0.096,1,0.173,0.106] │ +3. │ [0.243,0.173,1,0.258] │ +4. │ [0.746,0.106,0.258,1] │ + └────────────────────────┘ +``` \ No newline at end of file diff --git a/docs/en/sql-reference/aggregate-functions/reference/corrstable.md b/docs/en/sql-reference/aggregate-functions/reference/corrstable.md new file mode 100644 index 00000000000..b35442a32b6 --- /dev/null +++ b/docs/en/sql-reference/aggregate-functions/reference/corrstable.md @@ -0,0 +1,58 @@ +--- +slug: /en/sql-reference/aggregate-functions/reference/corrstable +sidebar_position: 107 +--- + +# corrStable + +Calculates the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient): + +$$ +\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{\sqrt{\Sigma{(x - \bar{x})^2} * \Sigma{(y - \bar{y})^2}}} +$$ + +Similar to the [`corr`](../reference/corr.md) function, but uses a numerically stable algorithm. As a result, `corrStable` is slower than `corr` but produces a more accurate result. + +**Syntax** + +```sql +corrStable(x, y) +``` + +**Arguments** + +- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). +- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- The Pearson correlation coefficient. [Float64](../../data-types/float.md). 
+ +***Example** + +Query: + +```sql +DROP TABLE IF EXISTS series; +CREATE TABLE series +( + i UInt32, + x_value Float64, + y_value Float64 +) +ENGINE = Memory; +INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6, -4.4),(2, -9.6, 3),(3, -1.3, -4),(4, 5.3, 9.7),(5, 4.4, 0.037),(6, -8.6, -7.8),(7, 5.1, 9.3),(8, 7.9, -3.6),(9, -8.2, 0.62),(10, -3, 7.3); +``` + +```sql +SELECT corrStable(x_value, y_value) +FROM series; +``` + +Result: + +```response +┌─corrStable(x_value, y_value)─┐ +│ 0.17302657554532558 │ +└──────────────────────────────┘ +``` \ No newline at end of file diff --git a/docs/en/sql-reference/aggregate-functions/reference/covarpop.md b/docs/en/sql-reference/aggregate-functions/reference/covarpop.md index 579035b2fe1..78b9f4cffea 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/covarpop.md +++ b/docs/en/sql-reference/aggregate-functions/reference/covarpop.md @@ -1,14 +1,54 @@ --- slug: /en/sql-reference/aggregate-functions/reference/covarpop -sidebar_position: 36 +sidebar_position: 37 --- # covarPop -Syntax: `covarPop(x, y)` +Calculates the population covariance: -Calculates the value of `Σ((x - x̅)(y - y̅)) / n`. +$$ +\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{n} +$$ :::note -This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarPopStable` function. It works slower but provides a lower computational error. -::: \ No newline at end of file +This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`covarPopStable`](../reference/covarpopstable.md) function. It works slower but provides a lower computational error. +::: + +**Syntax** + +```sql +covarPop(x, y) +``` + +**Arguments** + +- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). +- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- The population covariance between `x` and `y`. [Float64](../../data-types/float.md). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS series; +CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory; +INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6, -4.4),(2, -9.6, 3),(3, -1.3, -4),(4, 5.3, 9.7),(5, 4.4, 0.037),(6, -8.6, -7.8),(7, 5.1, 9.3),(8, 7.9, -3.6),(9, -8.2, 0.62),(10, -3, 7.3); +``` + +```sql +SELECT covarPop(x_value, y_value) +FROM series; +``` + +Result: + +```reference +┌─covarPop(x_value, y_value)─┐ +│ 6.485648 │ +└────────────────────────────┘ +``` diff --git a/docs/en/sql-reference/aggregate-functions/reference/covarpopmatrix.md b/docs/en/sql-reference/aggregate-functions/reference/covarpopmatrix.md new file mode 100644 index 00000000000..d7400599a49 --- /dev/null +++ b/docs/en/sql-reference/aggregate-functions/reference/covarpopmatrix.md @@ -0,0 +1,55 @@ +--- +slug: /en/sql-reference/aggregate-functions/reference/covarpopmatrix +sidebar_position: 36 +--- + +# covarPopMatrix + +Returns the population covariance matrix over N variables. + +**Syntax** + +```sql +covarPopMatrix(x[, ...]) +``` + +**Arguments** + +- `x` — a variable number of parameters. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). 
+ +**Returned Value** + +- Population covariance matrix. [Array](../../data-types/array.md)([Array](../../data-types/array.md)([Float64](../../data-types/float.md))). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS test; +CREATE TABLE test +( + a UInt32, + b Float64, + c Float64, + d Float64 +) +ENGINE = Memory; +INSERT INTO test(a, b, c, d) VALUES (1, 5.6, -4.4, 2.6), (2, -9.6, 3, 3.3), (3, -1.3, -4, 1.2), (4, 5.3, 9.7, 2.3), (5, 4.4, 0.037, 1.222), (6, -8.6, -7.8, 2.1233), (7, 5.1, 9.3, 8.1222), (8, 7.9, -3.6, 9.837), (9, -8.2, 0.62, 8.43555), (10, -3, 7.3, 6.762); +``` + +```sql +SELECT arrayMap(x -> round(x, 3), arrayJoin(covarPopMatrix(a, b, c, d))) AS covarPopMatrix +FROM test; +``` + +Result: + +```reference + ┌─covarPopMatrix────────────┐ +1. │ [8.25,-1.76,4.08,6.748] │ +2. │ [-1.76,41.07,6.486,2.132] │ +3. │ [4.08,6.486,34.21,4.755] │ +4. │ [6.748,2.132,4.755,9.93] │ + └───────────────────────────┘ +``` diff --git a/docs/en/sql-reference/aggregate-functions/reference/covarpopstable.md b/docs/en/sql-reference/aggregate-functions/reference/covarpopstable.md new file mode 100644 index 00000000000..68e78fc3bd8 --- /dev/null +++ b/docs/en/sql-reference/aggregate-functions/reference/covarpopstable.md @@ -0,0 +1,60 @@ +--- +slug: /en/sql-reference/aggregate-functions/reference/covarpopstable +sidebar_position: 36 +--- + +# covarPopStable + +Calculates the value of the population covariance: + +$$ +\frac{\Sigma{(x - \bar{x})(y - \bar{y})}}{n} +$$ + +It is similar to the [covarPop](../reference/covarpop.md) function, but uses a numerically stable algorithm. As a result, `covarPopStable` is slower than `covarPop` but produces a more accurate result. + + +**Syntax** + +```sql +covarPop(x, y) +``` + +**Arguments** + +- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). +- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- The population covariance between `x` and `y`. [Float64](../../data-types/float.md). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS series; +CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory; +INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3); +``` + +```sql +SELECT covarPopStable(x_value, y_value) +FROM +( + SELECT + x_value, + y_value + FROM series +); +``` + +Result: + +```reference +┌─covarPopStable(x_value, y_value)─┐ +│ 6.485648 │ +└──────────────────────────────────┘ +``` + diff --git a/docs/en/sql-reference/aggregate-functions/reference/covarsamp.md b/docs/en/sql-reference/aggregate-functions/reference/covarsamp.md index bdcc6c0e3d0..7d5d5d13f35 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/covarsamp.md +++ b/docs/en/sql-reference/aggregate-functions/reference/covarsamp.md @@ -7,8 +7,74 @@ sidebar_position: 37 Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`. -Returns Float64. When `n <= 1`, returns `nan`. - :::note -This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `covarSampStable` function. It works slower but provides a lower computational error. +This function uses a numerically unstable algorithm. 
If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the [`covarSampStable`](../reference/covarsamp.md) function. It works slower but provides a lower computational error. ::: + +**Syntax** + +```sql +covarSamp(x, y) +``` + +**Arguments** + +- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). +- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- The sample covariance between `x` and `y`. For `n <= 1`, `nan` is returned. [Float64](../../data-types/float.md). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS series; +CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory; +INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3); +``` + +```sql +SELECT covarSamp(x_value, y_value) +FROM +( + SELECT + x_value, + y_value + FROM series +); +``` + +Result: + +```reference +┌─covarSamp(x_value, y_value)─┐ +│ 7.206275555555556 │ +└─────────────────────────────┘ +``` + +Query: + +```sql +SELECT covarSamp(x_value, y_value) +FROM +( + SELECT + x_value, + y_value + FROM series LIMIT 1 +); + +``` + +Result: + +```reference +┌─covarSamp(x_value, y_value)─┐ +│ nan │ +└─────────────────────────────┘ +``` + + diff --git a/docs/en/sql-reference/aggregate-functions/reference/covarsampmatrix.md b/docs/en/sql-reference/aggregate-functions/reference/covarsampmatrix.md new file mode 100644 index 00000000000..b71d753f0be --- /dev/null +++ b/docs/en/sql-reference/aggregate-functions/reference/covarsampmatrix.md @@ -0,0 +1,57 @@ +--- +slug: /en/sql-reference/aggregate-functions/reference/covarsampmatrix +sidebar_position: 38 +--- + +# covarSampMatrix + +Returns the sample covariance matrix over N variables. + +**Syntax** + +```sql +covarSampMatrix(x[, ...]) +``` + +**Arguments** + +- `x` — a variable number of parameters. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- Sample covariance matrix. [Array](../../data-types/array.md)([Array](../../data-types/array.md)([Float64](../../data-types/float.md))). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS test; +CREATE TABLE test +( + a UInt32, + b Float64, + c Float64, + d Float64 +) +ENGINE = Memory; +INSERT INTO test(a, b, c, d) VALUES (1, 5.6, -4.4, 2.6), (2, -9.6, 3, 3.3), (3, -1.3, -4, 1.2), (4, 5.3, 9.7, 2.3), (5, 4.4, 0.037, 1.222), (6, -8.6, -7.8, 2.1233), (7, 5.1, 9.3, 8.1222), (8, 7.9, -3.6, 9.837), (9, -8.2, 0.62, 8.43555), (10, -3, 7.3, 6.762); +``` + +```sql +SELECT arrayMap(x -> round(x, 3), arrayJoin(covarSampMatrix(a, b, c, d))) AS covarSampMatrix +FROM test; +``` + +Result: + +```reference + ┌─covarSampMatrix─────────────┐ +1. │ [9.167,-1.956,4.534,7.498] │ +2. │ [-1.956,45.634,7.206,2.369] │ +3. │ [4.534,7.206,38.011,5.283] │ +4. 
│ [7.498,2.369,5.283,11.034] │ + └─────────────────────────────┘ +``` + + diff --git a/docs/en/sql-reference/aggregate-functions/reference/covarsampstable.md b/docs/en/sql-reference/aggregate-functions/reference/covarsampstable.md new file mode 100644 index 00000000000..3e6867b96d6 --- /dev/null +++ b/docs/en/sql-reference/aggregate-functions/reference/covarsampstable.md @@ -0,0 +1,73 @@ +--- +slug: /en/sql-reference/aggregate-functions/reference/covarsampstable +sidebar_position: 37 +--- + +# covarSampStable + +Calculates the value of `Σ((x - x̅)(y - y̅)) / (n - 1)`. Similar to [covarSamp](../reference/covarsamp.md) but works slower while providing a lower computational error. + +**Syntax** + +```sql +covarSampStable(x, y) +``` + +**Arguments** + +- `x` — first variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). +- `y` — second variable. [(U)Int*](../../data-types/int-uint.md), [Float*](../../data-types/float.md), [Decimal](../../data-types/decimal.md). + +**Returned Value** + +- The sample covariance between `x` and `y`. For `n <= 1`, `inf` is returned. [Float64](../../data-types/float.md). + +**Example** + +Query: + +```sql +DROP TABLE IF EXISTS series; +CREATE TABLE series(i UInt32, x_value Float64, y_value Float64) ENGINE = Memory; +INSERT INTO series(i, x_value, y_value) VALUES (1, 5.6,-4.4),(2, -9.6,3),(3, -1.3,-4),(4, 5.3,9.7),(5, 4.4,0.037),(6, -8.6,-7.8),(7, 5.1,9.3),(8, 7.9,-3.6),(9, -8.2,0.62),(10, -3,7.3); +``` + +```sql +SELECT covarSampStable(x_value, y_value) +FROM +( + SELECT + x_value, + y_value + FROM series +); +``` + +Result: + +```reference +┌─covarSampStable(x_value, y_value)─┐ +│ 7.206275555555556 │ +└───────────────────────────────────┘ +``` + +Query: + +```sql +SELECT covarSampStable(x_value, y_value) +FROM +( + SELECT + x_value, + y_value + FROM series LIMIT 1 +); +``` + +Result: + +```reference +┌─covarSampStable(x_value, y_value)─┐ +│ inf │ +└───────────────────────────────────┘ +``` \ No newline at end of file diff --git a/docs/en/sql-reference/aggregate-functions/reference/index.md b/docs/en/sql-reference/aggregate-functions/reference/index.md index 451ee2aae9d..a56b1c97681 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/index.md +++ b/docs/en/sql-reference/aggregate-functions/reference/index.md @@ -9,110 +9,116 @@ toc_hidden: true Standard aggregate functions: -- [count](/docs/en/sql-reference/aggregate-functions/reference/count.md) -- [min](/docs/en/sql-reference/aggregate-functions/reference/min.md) -- [max](/docs/en/sql-reference/aggregate-functions/reference/max.md) -- [sum](/docs/en/sql-reference/aggregate-functions/reference/sum.md) -- [avg](/docs/en/sql-reference/aggregate-functions/reference/avg.md) -- [any](/docs/en/sql-reference/aggregate-functions/reference/any.md) -- [stddevPop](/docs/en/sql-reference/aggregate-functions/reference/stddevpop.md) -- [stddevPopStable](/docs/en/sql-reference/aggregate-functions/reference/stddevpopstable.md) -- [stddevSamp](/docs/en/sql-reference/aggregate-functions/reference/stddevsamp.md) -- [stddevSampStable](/docs/en/sql-reference/aggregate-functions/reference/stddevsampstable.md) -- [varPop](/docs/en/sql-reference/aggregate-functions/reference/varpop.md) -- [varSamp](/docs/en/sql-reference/aggregate-functions/reference/varsamp.md) -- [corr](./corr.md) -- [covarPop](/docs/en/sql-reference/aggregate-functions/reference/covarpop.md) -- [covarSamp](/docs/en/sql-reference/aggregate-functions/reference/covarsamp.md) -- 
[entropy](./entropy.md)
-- [exponentialMovingAverage](./exponentialmovingaverage.md)
-- [intervalLengthSum](./intervalLengthSum.md)
-- [kolmogorovSmirnovTest](./kolmogorovsmirnovtest.md)
-- [mannwhitneyutest](./mannwhitneyutest.md)
-- [median](./median.md)
-- [rankCorr](./rankCorr.md)
-- [sumKahan](./sumkahan.md)
-- [studentTTest](./studentttest.md)
-- [welchTTest](./welchttest.md)
+- [count](../reference/count.md)
+- [min](../reference/min.md)
+- [max](../reference/max.md)
+- [sum](../reference/sum.md)
+- [avg](../reference/avg.md)
+- [any](../reference/any.md)
+- [stddevPop](../reference/stddevpop.md)
+- [stddevPopStable](../reference/stddevpopstable.md)
+- [stddevSamp](../reference/stddevsamp.md)
+- [stddevSampStable](../reference/stddevsampstable.md)
+- [varPop](../reference/varpop.md)
+- [varSamp](../reference/varsamp.md)
+- [corr](../reference/corr.md)
+- [corrStable](../reference/corrstable.md)
+- [corrMatrix](../reference/corrmatrix.md)
+- [covarPop](../reference/covarpop.md)
+- [covarPopStable](../reference/covarpopstable.md)
+- [covarPopMatrix](../reference/covarpopmatrix.md)
+- [covarSamp](../reference/covarsamp.md)
+- [covarSampStable](../reference/covarsampstable.md)
+- [covarSampMatrix](../reference/covarsampmatrix.md)
+- [entropy](../reference/entropy.md)
+- [exponentialMovingAverage](../reference/exponentialmovingaverage.md)
+- [intervalLengthSum](../reference/intervalLengthSum.md)
+- [kolmogorovSmirnovTest](../reference/kolmogorovsmirnovtest.md)
+- [mannwhitneyutest](../reference/mannwhitneyutest.md)
+- [median](../reference/median.md)
+- [rankCorr](../reference/rankCorr.md)
+- [sumKahan](../reference/sumkahan.md)
+- [studentTTest](../reference/studentttest.md)
+- [welchTTest](../reference/welchttest.md)

 ClickHouse-specific aggregate functions:

-- [analysisOfVariance](/docs/en/sql-reference/aggregate-functions/reference/analysis_of_variance.md)
-- [any](/docs/en/sql-reference/aggregate-functions/reference/any_respect_nulls.md)
-- [anyHeavy](/docs/en/sql-reference/aggregate-functions/reference/anyheavy.md)
-- [anyLast](/docs/en/sql-reference/aggregate-functions/reference/anylast.md)
-- [anyLast](/docs/en/sql-reference/aggregate-functions/reference/anylast_respect_nulls.md)
-- [boundingRatio](/docs/en/sql-reference/aggregate-functions/reference/boundrat.md)
-- [first_value](/docs/en/sql-reference/aggregate-functions/reference/first_value.md)
-- [last_value](/docs/en/sql-reference/aggregate-functions/reference/last_value.md)
-- [argMin](/docs/en/sql-reference/aggregate-functions/reference/argmin.md)
-- [argMax](/docs/en/sql-reference/aggregate-functions/reference/argmax.md)
-- [avgWeighted](/docs/en/sql-reference/aggregate-functions/reference/avgweighted.md)
-- [topK](/docs/en/sql-reference/aggregate-functions/reference/topk.md)
-- [topKWeighted](/docs/en/sql-reference/aggregate-functions/reference/topkweighted.md)
-- [deltaSum](./deltasum.md)
-- [deltaSumTimestamp](./deltasumtimestamp.md)
-- [groupArray](/docs/en/sql-reference/aggregate-functions/reference/grouparray.md)
-- [groupArrayLast](/docs/en/sql-reference/aggregate-functions/reference/grouparraylast.md)
-- [groupUniqArray](/docs/en/sql-reference/aggregate-functions/reference/groupuniqarray.md)
-- [groupArrayInsertAt](/docs/en/sql-reference/aggregate-functions/reference/grouparrayinsertat.md)
-- [groupArrayMovingAvg](/docs/en/sql-reference/aggregate-functions/reference/grouparraymovingavg.md)
-- [groupArrayMovingSum](/docs/en/sql-reference/aggregate-functions/reference/grouparraymovingsum.md)
-- 
[groupArraySample](./grouparraysample.md) -- [groupArraySorted](/docs/en/sql-reference/aggregate-functions/reference/grouparraysorted.md) -- [groupArrayIntersect](./grouparrayintersect.md) -- [groupBitAnd](/docs/en/sql-reference/aggregate-functions/reference/groupbitand.md) -- [groupBitOr](/docs/en/sql-reference/aggregate-functions/reference/groupbitor.md) -- [groupBitXor](/docs/en/sql-reference/aggregate-functions/reference/groupbitxor.md) -- [groupBitmap](/docs/en/sql-reference/aggregate-functions/reference/groupbitmap.md) -- [groupBitmapAnd](/docs/en/sql-reference/aggregate-functions/reference/groupbitmapand.md) -- [groupBitmapOr](/docs/en/sql-reference/aggregate-functions/reference/groupbitmapor.md) -- [groupBitmapXor](/docs/en/sql-reference/aggregate-functions/reference/groupbitmapxor.md) -- [sumWithOverflow](/docs/en/sql-reference/aggregate-functions/reference/sumwithoverflow.md) -- [sumMap](/docs/en/sql-reference/aggregate-functions/reference/summap.md) -- [sumMapWithOverflow](/docs/en/sql-reference/aggregate-functions/reference/summapwithoverflow.md) -- [sumMapFiltered](/docs/en/sql-reference/aggregate-functions/parametric-functions.md/#summapfiltered) -- [sumMapFilteredWithOverflow](/docs/en/sql-reference/aggregate-functions/parametric-functions.md/#summapfilteredwithoverflow) -- [minMap](/docs/en/sql-reference/aggregate-functions/reference/minmap.md) -- [maxMap](/docs/en/sql-reference/aggregate-functions/reference/maxmap.md) -- [skewSamp](/docs/en/sql-reference/aggregate-functions/reference/skewsamp.md) -- [skewPop](/docs/en/sql-reference/aggregate-functions/reference/skewpop.md) -- [kurtSamp](/docs/en/sql-reference/aggregate-functions/reference/kurtsamp.md) -- [kurtPop](/docs/en/sql-reference/aggregate-functions/reference/kurtpop.md) -- [uniq](/docs/en/sql-reference/aggregate-functions/reference/uniq.md) -- [uniqExact](/docs/en/sql-reference/aggregate-functions/reference/uniqexact.md) -- [uniqCombined](/docs/en/sql-reference/aggregate-functions/reference/uniqcombined.md) -- [uniqCombined64](/docs/en/sql-reference/aggregate-functions/reference/uniqcombined64.md) -- [uniqHLL12](/docs/en/sql-reference/aggregate-functions/reference/uniqhll12.md) -- [uniqTheta](/docs/en/sql-reference/aggregate-functions/reference/uniqthetasketch.md) -- [quantile](/docs/en/sql-reference/aggregate-functions/reference/quantile.md) -- [quantiles](/docs/en/sql-reference/aggregate-functions/reference/quantiles.md) -- [quantileExact](/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md) -- [quantileExactLow](/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md#quantileexactlow) -- [quantileExactHigh](/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md#quantileexacthigh) -- [quantileExactWeighted](/docs/en/sql-reference/aggregate-functions/reference/quantileexactweighted.md) -- [quantileTiming](/docs/en/sql-reference/aggregate-functions/reference/quantiletiming.md) -- [quantileTimingWeighted](/docs/en/sql-reference/aggregate-functions/reference/quantiletimingweighted.md) -- [quantileDeterministic](/docs/en/sql-reference/aggregate-functions/reference/quantiledeterministic.md) -- [quantileTDigest](/docs/en/sql-reference/aggregate-functions/reference/quantiletdigest.md) -- [quantileTDigestWeighted](/docs/en/sql-reference/aggregate-functions/reference/quantiletdigestweighted.md) -- [quantileBFloat16](/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md#quantilebfloat16) -- 
[quantileBFloat16Weighted](/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md#quantilebfloat16weighted) -- [quantileDD](/docs/en/sql-reference/aggregate-functions/reference/quantileddsketch.md#quantileddsketch) -- [simpleLinearRegression](/docs/en/sql-reference/aggregate-functions/reference/simplelinearregression.md) -- [singleValueOrNull](/docs/en/sql-reference/aggregate-functions/reference/singlevalueornull.md) -- [stochasticLinearRegression](/docs/en/sql-reference/aggregate-functions/reference/stochasticlinearregression.md) -- [stochasticLogisticRegression](/docs/en/sql-reference/aggregate-functions/reference/stochasticlogisticregression.md) -- [categoricalInformationValue](/docs/en/sql-reference/aggregate-functions/reference/categoricalinformationvalue.md) -- [contingency](./contingency.md) -- [cramersV](./cramersv.md) -- [cramersVBiasCorrected](./cramersvbiascorrected.md) -- [theilsU](./theilsu.md) -- [maxIntersections](./maxintersections.md) -- [maxIntersectionsPosition](./maxintersectionsposition.md) -- [meanZTest](./meanztest.md) -- [quantileGK](./quantileGK.md) -- [quantileInterpolatedWeighted](./quantileinterpolatedweighted.md) -- [sparkBar](./sparkbar.md) -- [sumCount](./sumcount.md) -- [largestTriangleThreeBuckets](./largestTriangleThreeBuckets.md) +- [analysisOfVariance](../reference/analysis_of_variance.md) +- [any](../reference/any_respect_nulls.md) +- [anyHeavy](../reference/anyheavy.md) +- [anyLast](../reference/anylast.md) +- [anyLast](../reference/anylast_respect_nulls.md) +- [boundingRatio](../reference/boundrat.md) +- [first_value](../reference/first_value.md) +- [last_value](../reference/last_value.md) +- [argMin](../reference/argmin.md) +- [argMax](../reference/argmax.md) +- [avgWeighted](../reference/avgweighted.md) +- [topK](../reference/topk.md) +- [topKWeighted](../reference/topkweighted.md) +- [deltaSum](../reference/deltasum.md) +- [deltaSumTimestamp](../reference/deltasumtimestamp.md) +- [groupArray](../reference/grouparray.md) +- [groupArrayLast](../reference/grouparraylast.md) +- [groupUniqArray](../reference/groupuniqarray.md) +- [groupArrayInsertAt](../reference/grouparrayinsertat.md) +- [groupArrayMovingAvg](../reference/grouparraymovingavg.md) +- [groupArrayMovingSum](../reference/grouparraymovingsum.md) +- [groupArraySample](../reference/grouparraysample.md) +- [groupArraySorted](../reference/grouparraysorted.md) +- [groupArrayIntersect](../reference/grouparrayintersect.md) +- [groupBitAnd](../reference/groupbitand.md) +- [groupBitOr](../reference/groupbitor.md) +- [groupBitXor](../reference/groupbitxor.md) +- [groupBitmap](../reference/groupbitmap.md) +- [groupBitmapAnd](../reference/groupbitmapand.md) +- [groupBitmapOr](../reference/groupbitmapor.md) +- [groupBitmapXor](../reference/groupbitmapxor.md) +- [sumWithOverflow](../reference/sumwithoverflow.md) +- [sumMap](../reference/summap.md) +- [sumMapWithOverflow](../reference/summapwithoverflow.md) +- [sumMapFiltered](../parametric-functions.md/#summapfiltered) +- [sumMapFilteredWithOverflow](../parametric-functions.md/#summapfilteredwithoverflow) +- [minMap](../reference/minmap.md) +- [maxMap](../reference/maxmap.md) +- [skewSamp](../reference/skewsamp.md) +- [skewPop](../reference/skewpop.md) +- [kurtSamp](../reference/kurtsamp.md) +- [kurtPop](../reference/kurtpop.md) +- [uniq](../reference/uniq.md) +- [uniqExact](../reference/uniqexact.md) +- [uniqCombined](../reference/uniqcombined.md) +- [uniqCombined64](../reference/uniqcombined64.md) +- 
[uniqHLL12](../reference/uniqhll12.md) +- [uniqTheta](../reference/uniqthetasketch.md) +- [quantile](../reference/quantile.md) +- [quantiles](../reference/quantiles.md) +- [quantileExact](../reference/quantileexact.md) +- [quantileExactLow](../reference/quantileexact.md#quantileexactlow) +- [quantileExactHigh](../reference/quantileexact.md#quantileexacthigh) +- [quantileExactWeighted](../reference/quantileexactweighted.md) +- [quantileTiming](../reference/quantiletiming.md) +- [quantileTimingWeighted](../reference/quantiletimingweighted.md) +- [quantileDeterministic](../reference/quantiledeterministic.md) +- [quantileTDigest](../reference/quantiletdigest.md) +- [quantileTDigestWeighted](../reference/quantiletdigestweighted.md) +- [quantileBFloat16](../reference/quantilebfloat16.md#quantilebfloat16) +- [quantileBFloat16Weighted](../reference/quantilebfloat16.md#quantilebfloat16weighted) +- [quantileDD](../reference/quantileddsketch.md#quantileddsketch) +- [simpleLinearRegression](../reference/simplelinearregression.md) +- [singleValueOrNull](../reference/singlevalueornull.md) +- [stochasticLinearRegression](../reference/stochasticlinearregression.md) +- [stochasticLogisticRegression](../reference/stochasticlogisticregression.md) +- [categoricalInformationValue](../reference/categoricalinformationvalue.md) +- [contingency](../reference/contingency.md) +- [cramersV](../reference/cramersv.md) +- [cramersVBiasCorrected](../reference/cramersvbiascorrected.md) +- [theilsU](../reference/theilsu.md) +- [maxIntersections](../reference/maxintersections.md) +- [maxIntersectionsPosition](../reference/maxintersectionsposition.md) +- [meanZTest](../reference/meanztest.md) +- [quantileGK](../reference/quantileGK.md) +- [quantileInterpolatedWeighted](../reference/quantileinterpolatedweighted.md) +- [sparkBar](../reference/sparkbar.md) +- [sumCount](../reference/sumcount.md) +- [largestTriangleThreeBuckets](../reference/largestTriangleThreeBuckets.md) diff --git a/docs/en/sql-reference/data-types/boolean.md b/docs/en/sql-reference/data-types/boolean.md index 4c59bd947de..6fcbc218c5d 100644 --- a/docs/en/sql-reference/data-types/boolean.md +++ b/docs/en/sql-reference/data-types/boolean.md @@ -1,7 +1,7 @@ --- slug: /en/sql-reference/data-types/boolean sidebar_position: 22 -sidebar_label: Boolean +sidebar_label: Bool --- # Bool diff --git a/docs/en/sql-reference/data-types/ipv4.md b/docs/en/sql-reference/data-types/ipv4.md index 637ed543e08..98ba9f4abac 100644 --- a/docs/en/sql-reference/data-types/ipv4.md +++ b/docs/en/sql-reference/data-types/ipv4.md @@ -57,6 +57,18 @@ SELECT toTypeName(from), hex(from) FROM hits LIMIT 1; └──────────────────┴───────────┘ ``` +IPv4 addresses can be directly compared to IPv6 addresses: + +```sql +SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1'); +``` + +```text +┌─equals(toIPv4('127.0.0.1'), toIPv6('::ffff:127.0.0.1'))─┐ +│ 1 │ +└─────────────────────────────────────────────────────────┘ +``` + **See Also** - [Functions for Working with IPv4 and IPv6 Addresses](../functions/ip-address-functions.md) diff --git a/docs/en/sql-reference/data-types/ipv6.md b/docs/en/sql-reference/data-types/ipv6.md index 642a7db81fc..d3b7cc72a1a 100644 --- a/docs/en/sql-reference/data-types/ipv6.md +++ b/docs/en/sql-reference/data-types/ipv6.md @@ -57,6 +57,19 @@ SELECT toTypeName(from), hex(from) FROM hits LIMIT 1; └──────────────────┴──────────────────────────────────┘ ``` +IPv6 addresses can be directly compared to IPv4 addresses: + +```sql +SELECT toIPv4('127.0.0.1') = 
toIPv6('::ffff:127.0.0.1'); +``` + +```text +┌─equals(toIPv4('127.0.0.1'), toIPv6('::ffff:127.0.0.1'))─┐ +│ 1 │ +└─────────────────────────────────────────────────────────┘ +``` + + **See Also** - [Functions for Working with IPv4 and IPv6 Addresses](../functions/ip-address-functions.md) diff --git a/docs/en/sql-reference/data-types/map.md b/docs/en/sql-reference/data-types/map.md index 18c7816f811..9f82c2f093a 100644 --- a/docs/en/sql-reference/data-types/map.md +++ b/docs/en/sql-reference/data-types/map.md @@ -6,101 +6,106 @@ sidebar_label: Map(K, V) # Map(K, V) -`Map(K, V)` data type stores `key:value` pairs. -The Map datatype is implemented as `Array(Tuple(key T1, value T2))`, which means that the order of keys in each map does not change, i.e., this data type maintains insertion order. +Data type `Map(K, V)` stores key-value pairs. + +Unlike other databases, maps are not unique in ClickHouse, i.e. a map can contain two elements with the same key. +(The reason for that is that maps are internally implemented as `Array(Tuple(K, V))`.) + +You can use use syntax `m[k]` to obtain the value for key `k` in map `m`. +Also, `m[k]` scans the map, i.e. the runtime of the operation is linear in the size of the map. **Parameters** -- `key` — The key part of the pair. Arbitrary type, except [Nullable](../../sql-reference/data-types/nullable.md) and [LowCardinality](../../sql-reference/data-types/lowcardinality.md) nested with [Nullable](../../sql-reference/data-types/nullable.md) types. -- `value` — The value part of the pair. Arbitrary type, including [Map](../../sql-reference/data-types/map.md) and [Array](../../sql-reference/data-types/array.md). - -To get the value from an `a Map('key', 'value')` column, use `a['key']` syntax. This lookup works now with a linear complexity. +- `K` — The type of the Map keys. Arbitrary type except [Nullable](../../sql-reference/data-types/nullable.md) and [LowCardinality](../../sql-reference/data-types/lowcardinality.md) nested with [Nullable](../../sql-reference/data-types/nullable.md) types. +- `V` — The type of the Map values. Arbitrary type. **Examples** -Consider the table: +Create a table with a column of type map: ``` sql -CREATE TABLE table_map (a Map(String, UInt64)) ENGINE=Memory; -INSERT INTO table_map VALUES ({'key1':1, 'key2':10}), ({'key1':2,'key2':20}), ({'key1':3,'key2':30}); +CREATE TABLE tab (m Map(String, UInt64)) ENGINE=Memory; +INSERT INTO tab VALUES ({'key1':1, 'key2':10}), ({'key1':2,'key2':20}), ({'key1':3,'key2':30}); ``` -Select all `key2` values: +To select `key2` values: ```sql -SELECT a['key2'] FROM table_map; +SELECT m['key2'] FROM tab; ``` + Result: ```text -┌─arrayElement(a, 'key2')─┐ +┌─arrayElement(m, 'key2')─┐ │ 10 │ │ 20 │ │ 30 │ └─────────────────────────┘ ``` -If there's no such `key` in the `Map()` column, the query returns zeros for numerical values, empty strings or empty arrays. +If the requested key `k` is not contained in the map, `m[k]` returns the value type's default value, e.g. `0` for integer types and `''` for string types. +To check whether a key exists in a map, you can use function [mapContains](../../sql-reference/functions/tuple-map-functions#mapcontains). 
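+For example, a quick existence check against the table created above might look like this (a minimal sketch reusing the `tab` table and its `m` column):
+
+```sql
+-- returns 1 for rows whose map contains the key 'key2', 0 otherwise
+SELECT m['key2'], mapContains(m, 'key2') FROM tab;
+```
+
+The next example shows the default-value behaviour described above: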
```sql -INSERT INTO table_map VALUES ({'key3':100}), ({}); -SELECT a['key3'] FROM table_map; +CREATE TABLE tab (m Map(String, UInt64)) ENGINE=Memory; +INSERT INTO tab VALUES ({'key1':100}), ({}); +SELECT m['key1'] FROM tab; ``` Result: ```text -┌─arrayElement(a, 'key3')─┐ +┌─arrayElement(m, 'key1')─┐ │ 100 │ │ 0 │ └─────────────────────────┘ -┌─arrayElement(a, 'key3')─┐ -│ 0 │ -│ 0 │ -│ 0 │ -└─────────────────────────┘ ``` -## Convert Tuple to Map Type +## Converting Tuple to Map -You can cast `Tuple()` as `Map()` using [CAST](../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) function: +Values of type `Tuple()` can be casted to values of type `Map()` using function [CAST](../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast): + +**Example** + +Query: ``` sql SELECT CAST(([1, 2, 3], ['Ready', 'Steady', 'Go']), 'Map(UInt8, String)') AS map; ``` +Result: + ``` text ┌─map───────────────────────────┐ │ {1:'Ready',2:'Steady',3:'Go'} │ └───────────────────────────────┘ ``` -## Map.keys and Map.values Subcolumns +## Reading subcolumns of Map -To optimize `Map` column processing, in some cases you can use the `keys` and `values` subcolumns instead of reading the whole column. +To avoid reading the entire map, you can use subcolumns `keys` and `values` in some cases. **Example** Query: ``` sql -CREATE TABLE t_map (`a` Map(String, UInt64)) ENGINE = Memory; +CREATE TABLE tab (m Map(String, UInt64)) ENGINE = Memory; +INSERT INTO tab VALUES (map('key1', 1, 'key2', 2, 'key3', 3)); -INSERT INTO t_map VALUES (map('key1', 1, 'key2', 2, 'key3', 3)); - -SELECT a.keys FROM t_map; - -SELECT a.values FROM t_map; +SELECT m.keys FROM tab; -- same as mapKeys(m) +SELECT m.values FROM tab; -- same as mapValues(m) ``` Result: ``` text -┌─a.keys─────────────────┐ +┌─m.keys─────────────────┐ │ ['key1','key2','key3'] │ └────────────────────────┘ -┌─a.values─┐ +┌─m.values─┐ │ [1,2,3] │ └──────────┘ ``` diff --git a/docs/en/sql-reference/functions/encoding-functions.md b/docs/en/sql-reference/functions/encoding-functions.md index 408b605727d..24a95b0398b 100644 --- a/docs/en/sql-reference/functions/encoding-functions.md +++ b/docs/en/sql-reference/functions/encoding-functions.md @@ -167,7 +167,7 @@ Performs the opposite operation of [hex](#hex). It interprets each pair of hexad If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs<Type>](../../sql-reference/functions/type-conversion-functions.md#type-conversion-functions) functions. -:::note +:::note If `unhex` is invoked from within the `clickhouse-client`, binary strings display using UTF-8. ::: @@ -322,11 +322,11 @@ Alias: `UNBIN`. For a numeric argument `unbin()` does not return the inverse of `bin()`. If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs<Type>](../../sql-reference/functions/type-conversion-functions.md#reinterpretasuint8163264) functions. -:::note +:::note If `unbin` is invoked from within the `clickhouse-client`, binary strings are displayed using UTF-8. ::: -Supports binary digits `0` and `1`. The number of binary digits does not have to be multiples of eight. If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn’t thrown). +Supports binary digits `0` and `1`. 
The number of binary digits does not have to be a multiple of eight. If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn’t thrown).  **Arguments**  @@ -482,7 +482,7 @@ mortonEncode(range_mask, args)  - `range_mask`: 1-8. - `args`: up to 8 [unsigned integers](../data-types/int-uint.md) or columns of the aforementioned type.  -Note: when using columns for `args` the provided `range_mask` tuple should still be a constant. +Note: when using columns for `args` the provided `range_mask` tuple should still be a constant.  **Returned value**  @@ -626,7 +626,7 @@ Result:  Accepts a range mask (tuple) as a first argument and the code as the second argument. Each number in the mask configures the amount of range shrink:<br/>
1 - no shrink
-2 - 2x shrink
+2 - 2x shrink
3 - 3x shrink
...
Up to 8x shrink.
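+A round-trip query may make the mask semantics easier to see (an illustrative sketch, assuming the same mask is used for encoding and decoding, in which case the original coordinates should come back as a tuple):
+
+```sql
+-- encode two coordinates in expanded mode, then decode the code with the same range mask
+SELECT mortonDecode((1, 2), mortonEncode((1, 2), 1024, 16));
+```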
@@ -701,6 +701,267 @@ Result: 1 2 3 4 5 6 7 8 ``` +## hilbertEncode + +Calculates code for Hilbert Curve for a list of unsigned integers. + +The function has two modes of operation: +- Simple +- Expanded + +### Simple mode + +Simple: accepts up to 2 unsigned integers as arguments and produces a UInt64 code. + +**Syntax** + +```sql +hilbertEncode(args) +``` + +**Parameters** + +- `args`: up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type. + +**Returned value** + +- A UInt64 code + +Type: [UInt64](../../sql-reference/data-types/int-uint.md) + +**Example** + +Query: + +```sql +SELECT hilbertEncode(3, 4); +``` +Result: + +```response +31 +``` + +### Expanded mode + +Accepts a range mask ([tuple](../../sql-reference/data-types/tuple.md)) as a first argument and up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) as other arguments. + +Each number in the mask configures the number of bits by which the corresponding argument will be shifted left, effectively scaling the argument within its range. + +**Syntax** + +```sql +hilbertEncode(range_mask, args) +``` + +**Parameters** +- `range_mask`: ([tuple](../../sql-reference/data-types/tuple.md)) +- `args`: up to 2 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type. + +Note: when using columns for `args` the provided `range_mask` tuple should still be a constant. + +**Returned value** + +- A UInt64 code + +Type: [UInt64](../../sql-reference/data-types/int-uint.md) +**Example** +Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality) +For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF). + +Query: + +```sql +SELECT hilbertEncode((10,6), 1024, 16); +``` + +Result: + +```response +4031541586602 +``` + +Note: tuple size must be equal to the number of the other arguments. + +**Example** + +For a single argument without a tuple, the function returns the argument itself as the Hilbert index, since no dimensional mapping is needed. + +Query: + +```sql +SELECT hilbertEncode(1); +``` + +Result: + +```response +1 +``` + +**Example** + +If a single argument is provided with a tuple specifying bit shifts, the function shifts the argument left by the specified number of bits. + +Query: + +```sql +SELECT hilbertEncode(tuple(2), 128); +``` + +Result: + +```response +512 +``` + +**Example** + +The function also accepts columns as arguments: + +Query: + +First create the table and insert some data. + +```sql +create table hilbert_numbers( + n1 UInt32, + n2 UInt32 +) +Engine=MergeTree() +ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi'; +insert into hilbert_numbers (*) values(1,2); +``` +Use column names instead of constants as function arguments to `hilbertEncode` + +Query: + +```sql +SELECT hilbertEncode(n1, n2) FROM hilbert_numbers; +``` + +Result: + +```response +13 +``` + +**implementation details** + +Please note that you can fit only so many bits of information into Hilbert code as [UInt64](../../sql-reference/data-types/int-uint.md) has. Two arguments will have a range of maximum 2^32 (64/2) each. All overflow will be clamped to zero. + +## hilbertDecode + +Decodes a Hilbert curve index back into a tuple of unsigned integers, representing coordinates in multi-dimensional space. 
+ +As with the `hilbertEncode` function, this function has two modes of operation: +- Simple +- Expanded + +### Simple mode + +Accepts up to 2 unsigned integers as arguments and produces a UInt64 code. + +**Syntax** + +```sql +hilbertDecode(tuple_size, code) +``` + +**Parameters** +- `tuple_size`: integer value no more than 2. +- `code`: [UInt64](../../sql-reference/data-types/int-uint.md) code. + +**Returned value** + +- [tuple](../../sql-reference/data-types/tuple.md) of the specified size. + +Type: [UInt64](../../sql-reference/data-types/int-uint.md) + +**Example** + +Query: + +```sql +SELECT hilbertDecode(2, 31); +``` + +Result: + +```response +["3", "4"] +``` + +### Expanded mode + +Accepts a range mask (tuple) as a first argument and up to 2 unsigned integers as other arguments. +Each number in the mask configures the number of bits by which the corresponding argument will be shifted left, effectively scaling the argument within its range. + +Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality) +For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF). +As with the encode function, this is limited to 8 numbers at most. + +**Example** + +Hilbert code for one argument is always the argument itself (as a tuple). + +Query: + +```sql +SELECT hilbertDecode(1, 1); +``` + +Result: + +```response +["1"] +``` + +**Example** + +A single argument with a tuple specifying bit shifts will be right-shifted accordingly. + +Query: + +```sql +SELECT hilbertDecode(tuple(2), 32768); +``` + +Result: + +```response +["128"] +``` + +**Example** + +The function accepts a column of codes as a second argument: + +First create the table and insert some data. + +Query: +```sql +create table hilbert_numbers( + n1 UInt32, + n2 UInt32 +) +Engine=MergeTree() +ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi'; +insert into hilbert_numbers (*) values(1,2); +``` +Use column names instead of constants as function arguments to `hilbertDecode` + +Query: + +```sql +select untuple(hilbertDecode(2, hilbertEncode(n1, n2))) from hilbert_numbers; +``` + +Result: + +```response +1 2 +``` diff --git a/docs/en/sql-reference/functions/math-functions.md b/docs/en/sql-reference/functions/math-functions.md index 7f50fa933b6..12098efc635 100644 --- a/docs/en/sql-reference/functions/math-functions.md +++ b/docs/en/sql-reference/functions/math-functions.md @@ -415,8 +415,8 @@ Alias: `power(x, y)` **Arguments** -- `x` - [(U)Int8/16/32/64](../data-types/int-uint.md) or [Float*](../data-types/float.md) -- `y` - [(U)Int8/16/32/64](../data-types/int-uint.md) or [Float*](../data-types/float.md) +- `x` - [(U)Int8/16/32/64](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md) +- `y` - [(U)Int8/16/32/64](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md) **Returned value** @@ -635,8 +635,8 @@ atan2(y, x) **Arguments** -- `y` — y-coordinate of the point through which the ray passes. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md). -- `x` — x-coordinate of the point through which the ray passes. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md). +- `y` — y-coordinate of the point through which the ray passes. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md). +- `x` — x-coordinate of the point through which the ray passes. 
[(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md).  **Returned value**  @@ -670,8 +670,8 @@ hypot(x, y)  **Arguments**  -- `x` — The first cathetus of a right-angle triangle. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md). -- `y` — The second cathetus of a right-angle triangle. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md). +- `x` — The first cathetus of a right-angle triangle. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md). +- `y` — The second cathetus of a right-angle triangle. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md).  **Returned value**  @@ -838,6 +838,7 @@ degrees(x)  **Arguments**  +- `x` — Input in radians. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md).  - `x` — Input in radians. [(U)Int*](../data-types/int-uint.md), [Float*](../data-types/float.md) or [Decimal*](../data-types/decimal.md).  **Returned value**  diff --git a/docs/en/sql-reference/functions/other-functions.md b/docs/en/sql-reference/functions/other-functions.md index dfe1224f7b8..31df9e5627d 100644 --- a/docs/en/sql-reference/functions/other-functions.md +++ b/docs/en/sql-reference/functions/other-functions.md @@ -735,6 +735,8 @@ LIMIT 10  Given a size (number of bytes), this function returns a readable, rounded size with suffix (KB, MB, etc.) as string.  +The opposite operations of this function are [parseReadableSize](#parseReadableSize), [parseReadableSizeOrZero](#parseReadableSizeOrZero), and [parseReadableSizeOrNull](#parseReadableSizeOrNull). + **Syntax**  ```sql @@ -766,6 +768,8 @@ Result:  Given a size (number of bytes), this function returns a readable, rounded size with suffix (KiB, MiB, etc.) as string.  +The opposite operations of this function are [parseReadableSize](#parseReadableSize), [parseReadableSizeOrZero](#parseReadableSizeOrZero), and [parseReadableSizeOrNull](#parseReadableSizeOrNull). + **Syntax**  ```sql @@ -890,6 +894,122 @@ SELECT └────────────────────┴────────────────────────────────────────────────┘ ```  +## parseReadableSize + +Given a string containing a byte size and `B`, `KiB`, `KB`, `MiB`, `MB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) or decimal byte unit), this function returns the corresponding number of bytes. +If the function is unable to parse the input value, it throws an exception. + +The inverse operations of this function are [formatReadableSize](#formatReadableSize) and [formatReadableDecimalSize](#formatReadableDecimalSize). + +**Syntax** + +```sql +parseReadableSize(x) +``` + +**Arguments** + +- `x` : Readable size with ISO/IEC 80000-13 or decimal byte unit ([String](../../sql-reference/data-types/string.md)). + +**Returned value** + +- Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md)). + +**Example** + +```sql +SELECT + arrayJoin(['1 B', '1 KiB', '3 MB', '5.314 KiB']) AS readable_sizes, + parseReadableSize(readable_sizes) AS sizes; +``` + +```text +┌─readable_sizes─┬───sizes─┐ +│ 1 B │ 1 │ +│ 1 KiB │ 1024 │ +│ 3 MB │ 3000000 │ +│ 5.314 KiB │ 5442 │ +└────────────────┴─────────┘ +``` + +## parseReadableSizeOrNull + +Given a string containing a byte size and `B`, `KiB`, `KB`, `MiB`, `MB`, etc. as a unit (i.e. 
[ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) or decimal byte unit), this function returns the corresponding number of bytes. +If the function is unable to parse the input value, it returns `NULL`. + +The inverse operations of this function are [formatReadableSize](#formatReadableSize) and [formatReadableDecimalSize](#formatReadableDecimalSize). + +**Syntax** + +```sql +parseReadableSizeOrNull(x) +``` + +**Arguments** + +- `x` : Readable size with ISO/IEC 80000-13 or decimal byte unit ([String](../../sql-reference/data-types/string.md)). + +**Returned value** + +- Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md))). + +**Example** + +```sql +SELECT + arrayJoin(['1 B', '1 KiB', '3 MB', '5.314 KiB', 'invalid']) AS readable_sizes, + parseReadableSizeOrNull(readable_sizes) AS sizes; +``` + +```text +┌─readable_sizes─┬───sizes─┐ +│ 1 B │ 1 │ +│ 1 KiB │ 1024 │ +│ 3 MB │ 3000000 │ +│ 5.314 KiB │ 5442 │ +│ invalid │ ᴺᵁᴸᴸ │ +└────────────────┴─────────┘ +``` + +## parseReadableSizeOrZero + +Given a string containing a byte size and `B`, `KiB`, `KB`, `MiB`, `MB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) or decimal byte unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `0`. + +The inverse operations of this function are [formatReadableSize](#formatReadableSize) and [formatReadableDecimalSize](#formatReadableDecimalSize). + + +**Syntax** + +```sql +parseReadableSizeOrZero(x) +``` + +**Arguments** + +- `x` : Readable size with ISO/IEC 80000-13 or decimal byte unit ([String](../../sql-reference/data-types/string.md)). + +**Returned value** + +- Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md)). + +**Example** + +```sql +SELECT + arrayJoin(['1 B', '1 KiB', '3 MB', '5.314 KiB', 'invalid']) AS readable_sizes, + parseReadableSizeOrZero(readable_sizes) AS sizes; +``` + +```text +┌─readable_sizes─┬───sizes─┐ +│ 1 B │ 1 │ +│ 1 KiB │ 1024 │ +│ 3 MB │ 3000000 │ +│ 5.314 KiB │ 5442 │ +│ invalid │ 0 │ +└────────────────┴─────────┘ +``` + ## parseTimeDelta Parse a sequence of numbers followed by something resembling a time unit. diff --git a/docs/en/sql-reference/functions/tuple-map-functions.md b/docs/en/sql-reference/functions/tuple-map-functions.md index d9c18e2a0a2..ad40725d680 100644 --- a/docs/en/sql-reference/functions/tuple-map-functions.md +++ b/docs/en/sql-reference/functions/tuple-map-functions.md @@ -6,7 +6,7 @@ sidebar_label: Maps ## map -Arranges `key:value` pairs into [Map(key, value)](../data-types/map.md) data type. +Creates a value of type [Map(key, value)](../data-types/map.md) from key-value pairs. **Syntax** @@ -16,12 +16,12 @@ map(key1, value1[, key2, value2, ...]) **Arguments** -- `key` — The key part of the pair. Arbitrary type, except [Nullable](../data-types/nullable.md) and [LowCardinality](../data-types/lowcardinality.md) nested with [Nullable](../data-types/nullable.md). -- `value` — The value part of the pair. Arbitrary type, including [Map](../data-types/map.md) and [Array](../data-types/array.md). +- `key_n` — The keys of the map entries. Any type supported as key type of [Map](../data-types/map.md). +- `value_n` — The values of the map entries. Any type supported as value type of [Map](../data-types/map.md). 
**Returned value** -- Data structure as `key:value` pairs. [Map(key, value)](../data-types/map.md). +- A map containing `key:value` pairs. [Map(key, value)](../data-types/map.md). **Examples** @@ -41,35 +41,16 @@ Result: └──────────────────────────────────────────────────┘ ``` -Query: - -```sql -CREATE TABLE table_map (a Map(String, UInt64)) ENGINE = MergeTree() ORDER BY a; -INSERT INTO table_map SELECT map('key1', number, 'key2', number * 2) FROM numbers(3); -SELECT a['key2'] FROM table_map; -``` - -Result: - -```text -┌─arrayElement(a, 'key2')─┐ -│ 0 │ -│ 2 │ -│ 4 │ -└─────────────────────────┘ -``` - -**See Also** - -- [Map(key, value)](../data-types/map.md) data type - ## mapFromArrays -Merges an [Array](../data-types/array.md) of keys and an [Array](../data-types/array.md) of values into a [Map(key, value)](../data-types/map.md). Notice that the second argument could also be a [Map](../data-types/map.md), thus it is casted to an Array when executing. +Creates a map from an array of keys and an array of values. +The function is a convenient alternative to syntax `CAST([...], 'Map(key_type, value_type)')`. +For example, instead of writing +- `CAST((['aa', 'bb'], [4, 5]), 'Map(String, UInt32)')`, or +- `CAST([('aa',4), ('bb',5)], 'Map(String, UInt32)')` -The function is a more convenient alternative to `CAST((key_array, value_array_or_map), 'Map(key_type, value_type)')`. For example, instead of writing `CAST((['aa', 'bb'], [4, 5]), 'Map(String, UInt32)')`, you can write `mapFromArrays(['aa', 'bb'], [4, 5])`. - +you can write `mapFromArrays(['aa', 'bb'], [4, 5])`. **Syntax** @@ -81,12 +62,12 @@ Alias: `MAP_FROM_ARRAYS(keys, values)` **Arguments** -- `keys` — Given key array to create a map from. The nested type of array must be: [String](../data-types/string.md), [Integer](../data-types/int-uint.md), [LowCardinality](../data-types/lowcardinality.md), [FixedString](../data-types/fixedstring.md), [UUID](../data-types/uuid.md), [Date](../data-types/date.md), [DateTime](../data-types/datetime.md), [Date32](../data-types/date32.md), [Enum](../data-types/enum.md) -- `values` - Given value array or map to create a map from. +- `keys` — Array of keys to create the map from. [Array(T)](../data-types/array.md) where `T` can be any type supported by [Map](../data-types/map.md) as key type. +- `values` - Array or map of values to create the map from. [Array](../data-types/array.md) or [Map](../data-types/map.md). **Returned value** -- A map whose keys and values are constructed from the key array and value array/map. +- A map with keys and values constructed from the key array and value array/map. **Example** @@ -94,14 +75,25 @@ Query: ```sql select mapFromArrays(['a', 'b', 'c'], [1, 2, 3]) +``` +Result: +``` ┌─mapFromArrays(['a', 'b', 'c'], [1, 2, 3])─┐ │ {'a':1,'b':2,'c':3} │ └───────────────────────────────────────────┘ +``` +`mapFromArrays` also accepts arguments of type [Map](../data-types/map.md). These are casted to array of tuples during execution. + +```sql SELECT mapFromArrays([1, 2, 3], map('a', 1, 'b', 2, 'c', 3)) +``` +Result: + +``` ┌─mapFromArrays([1, 2, 3], map('a', 1, 'b', 2, 'c', 3))─┐ │ {1:('a',1),2:('b',2),3:('c',3)} │ └───────────────────────────────────────────────────────┘ @@ -109,9 +101,11 @@ SELECT mapFromArrays([1, 2, 3], map('a', 1, 'b', 2, 'c', 3)) ## extractKeyValuePairs -Extracts key-value pairs, i.e. a [Map(String, String)](../data-types/map.md), from a string. Parsing is robust towards noise (e.g. log files). 
- -A key-value pair consists of a key, followed by a `key_value_delimiter` and a value. Key value pairs must be separated by `pair_delimiter`. Quoted keys and values are also supported. +Converts a string of key-value pairs to a [Map(String, String)](../data-types/map.md). +Parsing is tolerant towards noise (e.g. log files). +Key-value pairs in the input string consist of a key, followed by a key-value delimiter, and a value. +Key value pairs are separated by a pair delimiter. +Keys and values can be quoted. **Syntax** @@ -126,17 +120,17 @@ Alias: **Arguments** - `data` - String to extract key-value pairs from. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). -- `key_value_delimiter` - Character to be used as delimiter between the key and the value. Defaults to `:`. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). -- `pair_delimiters` - Set of character to be used as delimiters between pairs. Defaults to ` `, `,` and `;`. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). -- `quoting_character` - Character to be used as quoting character. Defaults to `"`. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). +- `key_value_delimiter` - Single character delimiting keys and values. Defaults to `:`. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). +- `pair_delimiters` - Set of character delimiting pairs. Defaults to ` `, `,` and `;`. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). +- `quoting_character` - Single character used as quoting character. Defaults to `"`. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). **Returned values** -- A [Map(String, String)](../data-types/map.md) of key-value pairs. +- A of key-value pairs. Type: [Map(String, String)](../data-types/map.md) **Examples** -Simple case: +Query ``` sql SELECT extractKeyValuePairs('name:neymar, age:31 team:psg,nationality:brazil') as kv @@ -150,7 +144,7 @@ Result: └─────────────────────────────────────────────────────────────────────────┘ ``` -Single quote as quoting character: +With a single quote `'` as quoting character: ``` sql SELECT extractKeyValuePairs('name:\'neymar\';\'age\':31;team:psg;nationality:brazil,last_key:last_value', ':', ';,', '\'') as kv @@ -178,9 +172,29 @@ Result: └────────────────────────┘ ``` +To restore a map string key-value pairs serialized with `toString`: + +```sql +SELECT + map('John', '33', 'Paula', '31') AS m, + toString(m) as map_serialized, + extractKeyValuePairs(map_serialized, ':', ',', '\'') AS map_restored +FORMAT Vertical; +``` + +Result: + +``` +Row 1: +────── +m: {'John':'33','Paula':'31'} +map_serialized: {'John':'33','Paula':'31'} +map_restored: {'John':'33','Paula':'31'} +``` + ## extractKeyValuePairsWithEscaping -Same as `extractKeyValuePairs` but with escaping support. +Same as `extractKeyValuePairs` but supports escaping. Supported escape sequences: `\x`, `\N`, `\a`, `\b`, `\e`, `\f`, `\n`, `\r`, `\t`, `\v` and `\0`. 
Non standard escape sequences are returned as it is (including the backslash) unless they are one of the following: @@ -229,20 +243,6 @@ Arguments are [maps](../data-types/map.md) or [tuples](../data-types/tuple.md#tu **Example** -Query with a tuple: - -```sql -SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTypeName(res) as type; -``` - -Result: - -```text -┌─res───────────┬─type───────────────────────────────┐ -│ ([1,2],[2,2]) │ Tuple(Array(UInt8), Array(UInt64)) │ -└───────────────┴────────────────────────────────────┘ -``` - Query with `Map` type: ```sql @@ -257,6 +257,20 @@ Result: └──────────────────────────────┘ ``` +Query with a tuple: + +```sql +SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTypeName(res) as type; +``` + +Result: + +```text +┌─res───────────┬─type───────────────────────────────┐ +│ ([1,2],[2,2]) │ Tuple(Array(UInt8), Array(UInt64)) │ +└───────────────┴────────────────────────────────────┘ +``` + ## mapSubtract Collect all the keys and subtract corresponding values. @@ -277,20 +291,6 @@ Arguments are [maps](../data-types/map.md) or [tuples](../data-types/tuple.md#tu **Example** -Query with a tuple map: - -```sql -SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt32(2), 1])) as res, toTypeName(res) as type; -``` - -Result: - -```text -┌─res────────────┬─type──────────────────────────────┐ -│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │ -└────────────────┴───────────────────────────────────┘ -``` - Query with `Map` type: ```sql @@ -305,55 +305,57 @@ Result: └───────────────────────────────────┘ ``` -## mapPopulateSeries - -Fills missing keys in the maps (key and value array pair), where keys are integers. Also, it supports specifying the max key, which is used to extend the keys array. - -**Syntax** +Query with a tuple map: ```sql -mapPopulateSeries(keys, values[, max]) -mapPopulateSeries(map[, max]) -``` - -Generates a map (a tuple with two arrays or a value of `Map` type, depending on the arguments), where keys are a series of numbers, from minimum to maximum keys (or `max` argument if it specified) taken from the map with a step size of one, and corresponding values. If the value is not specified for the key, then it uses the default value in the resulting map. For repeated keys, only the first value (in order of appearing) gets associated with the key. - -For array arguments the number of elements in `keys` and `values` must be the same for each row. - -**Arguments** - -Arguments are [maps](../data-types/map.md) or two [arrays](../data-types/array.md#data-type-array), where the first array represent keys, and the second array contains values for the each key. - -Mapped arrays: - -- `keys` — Array of keys. [Array](../data-types/array.md#data-type-array)([Int](../data-types/int-uint.md#uint-ranges)). -- `values` — Array of values. [Array](../data-types/array.md#data-type-array)([Int](../data-types/int-uint.md#uint-ranges)). -- `max` — Maximum key value. Optional. [Int8, Int16, Int32, Int64, Int128, Int256](../data-types/int-uint.md#int-ranges). - -or - -- `map` — Map with integer keys. [Map](../data-types/map.md). - -**Returned value** - -- Depending on the arguments returns a [map](../data-types/map.md) or a [tuple](../data-types/tuple.md#tuplet1-t2) of two [arrays](../data-types/array.md#data-type-array): keys in sorted order, and values the corresponding keys. 
- -**Example** - -Query with mapped arrays: - -```sql -SELECT mapPopulateSeries([1,2,4], [11,22,44], 5) AS res, toTypeName(res) AS type; +SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt32(2), 1])) as res, toTypeName(res) as type; ``` Result: ```text -┌─res──────────────────────────┬─type──────────────────────────────┐ -│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │ -└──────────────────────────────┴───────────────────────────────────┘ +┌─res────────────┬─type──────────────────────────────┐ +│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │ +└────────────────┴───────────────────────────────────┘ ``` +## mapPopulateSeries + +Fills missing key-value pairs in a map with integer keys. +To support extending the keys beyond the largest value, a maximum key can be specified. +More specifically, the function returns a map in which the the keys form a series from the smallest to the largest key (or `max` argument if it specified) with step size of 1, and corresponding values. +If no value is specified for a key, a default value is used as value. +In case keys repeat, only the first value (in order of appearance) is associated with the key. + +**Syntax** + +```sql +mapPopulateSeries(map[, max]) +mapPopulateSeries(keys, values[, max]) +``` + +For array arguments the number of elements in `keys` and `values` must be the same for each row. + +**Arguments** + +Arguments are [Maps](../data-types/map.md) or two [Arrays](../data-types/array.md#data-type-array), where the first and second array contains keys and values for the each key. + +Mapped arrays: + +- `map` — Map with integer keys. [Map](../data-types/map.md). + +or + +- `keys` — Array of keys. [Array](../data-types/array.md#data-type-array)([Int](../data-types/int-uint.md#uint-ranges)). +- `values` — Array of values. [Array](../data-types/array.md#data-type-array)([Int](../data-types/int-uint.md#uint-ranges)). +- `max` — Maximum key value. Optional. [Int8, Int16, Int32, Int64, Int128, Int256](../data-types/int-uint.md#int-ranges). + +**Returned value** + +- Depending on the arguments a [Map](../data-types/map.md) or a [Tuple](../data-types/tuple.md#tuplet1-t2) of two [Arrays](../data-types/array.md#data-type-array): keys in sorted order, and values the corresponding keys. + +**Example** + Query with `Map` type: ```sql @@ -368,9 +370,23 @@ Result: └─────────────────────────────────────────┘ ``` +Query with mapped arrays: + +```sql +SELECT mapPopulateSeries([1,2,4], [11,22,44], 5) AS res, toTypeName(res) AS type; +``` + +Result: + +```text +┌─res──────────────────────────┬─type──────────────────────────────┐ +│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │ +└──────────────────────────────┴───────────────────────────────────┘ +``` + ## mapContains -Determines whether the `map` contains the `key` parameter. +Returns if a given key is contained in a given map. **Syntax** @@ -381,7 +397,7 @@ mapContains(map, key) **Arguments** - `map` — Map. [Map](../data-types/map.md). -- `key` — Key. Type matches the type of keys of `map` parameter. +- `key` — Key. Type must match the key type of `map`. 
**Returned value** @@ -392,11 +408,11 @@ mapContains(map, key) Query: ```sql -CREATE TABLE test (a Map(String,String)) ENGINE = Memory; +CREATE TABLE tab (a Map(String, String)) ENGINE = Memory; -INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'}); +INSERT INTO tab VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'}); -SELECT mapContains(a, 'name') FROM test; +SELECT mapContains(a, 'name') FROM tab; ``` @@ -411,9 +427,11 @@ Result: ## mapKeys -Returns all keys from the `map` parameter. +Returns the keys of a given map. -Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [keys](../data-types/map.md#map-subcolumns) subcolumn instead of reading and processing the whole column data. The query `SELECT mapKeys(m) FROM table` transforms to `SELECT m.keys FROM table`. +This function can be optimized by enabling setting [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). +With enabled setting, the function only reads the [keys](../data-types/map.md#map-subcolumns) subcolumn instead the whole map. +The query `SELECT mapKeys(m) FROM table` is transformed to `SELECT m.keys FROM table`. **Syntax** @@ -434,11 +452,11 @@ mapKeys(map) Query: ```sql -CREATE TABLE test (a Map(String,String)) ENGINE = Memory; +CREATE TABLE tab (a Map(String, String)) ENGINE = Memory; -INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'}); +INSERT INTO tab VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'}); -SELECT mapKeys(a) FROM test; +SELECT mapKeys(a) FROM tab; ``` Result: @@ -452,9 +470,11 @@ Result: ## mapValues -Returns all values from the `map` parameter. +Returns the values of a given map. -Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [values](../data-types/map.md#map-subcolumns) subcolumn instead of reading and processing the whole column data. The query `SELECT mapValues(m) FROM table` transforms to `SELECT m.values FROM table`. +This function can be optimized by enabling setting [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns). +With enabled setting, the function only reads the [values](../data-types/map.md#map-subcolumns) subcolumn instead the whole map. +The query `SELECT mapValues(m) FROM table` is transformed to `SELECT m.values FROM table`. 
**Syntax** @@ -475,11 +495,11 @@ mapValues(map) Query: ```sql -CREATE TABLE test (a Map(String,String)) ENGINE = Memory; +CREATE TABLE tab (a Map(String, String)) ENGINE = Memory; -INSERT INTO test VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'}); +INSERT INTO tab VALUES ({'name':'eleven','age':'11'}), ({'number':'twelve','position':'6.0'}); -SELECT mapValues(a) FROM test; +SELECT mapValues(a) FROM tab; ``` Result: @@ -512,11 +532,11 @@ mapContainsKeyLike(map, pattern) Query: ```sql -CREATE TABLE test (a Map(String,String)) ENGINE = Memory; +CREATE TABLE tab (a Map(String, String)) ENGINE = Memory; -INSERT INTO test VALUES ({'abc':'abc','def':'def'}), ({'hij':'hij','klm':'klm'}); +INSERT INTO tab VALUES ({'abc':'abc','def':'def'}), ({'hij':'hij','klm':'klm'}); -SELECT mapContainsKeyLike(a, 'a%') FROM test; +SELECT mapContainsKeyLike(a, 'a%') FROM tab; ``` Result: @@ -530,6 +550,8 @@ Result: ## mapExtractKeyLike +Give a map with string keys and a LIKE pattern, this function returns a map with elements where the key matches the pattern. + **Syntax** ```sql @@ -543,18 +565,18 @@ mapExtractKeyLike(map, pattern) **Returned value** -- A map contained elements the key of which matches the specified pattern. If there are no elements matched the pattern, it will return an empty map. +- A map containing elements the key matching the specified pattern. If no elements match the pattern, an empty map is returned. **Example** Query: ```sql -CREATE TABLE test (a Map(String,String)) ENGINE = Memory; +CREATE TABLE tab (a Map(String, String)) ENGINE = Memory; -INSERT INTO test VALUES ({'abc':'abc','def':'def'}), ({'hij':'hij','klm':'klm'}); +INSERT INTO tab VALUES ({'abc':'abc','def':'def'}), ({'hij':'hij','klm':'klm'}); -SELECT mapExtractKeyLike(a, 'a%') FROM test; +SELECT mapExtractKeyLike(a, 'a%') FROM tab; ``` Result: @@ -568,6 +590,8 @@ Result: ## mapApply +Applies a function to each element of a map. + **Syntax** ```sql @@ -608,6 +632,8 @@ Result: ## mapFilter +Filters a map by applying a function to each map element. + **Syntax** ```sql @@ -623,7 +649,6 @@ mapFilter(func, map) - Returns a map containing only the elements in `map` for which `func(map1[i], ..., mapN[i])` returns something other than 0. - **Example** Query: @@ -647,7 +672,6 @@ Result: └─────────────────────┘ ``` - ## mapUpdate **Syntax** @@ -683,6 +707,9 @@ Result: ## mapConcat +Concatenates multiple maps based on the equality of their keys. +If elements with the same key exist in more than one input map, all elements are added to the result map, but only the first one is accessible via operator `[]` + **Syntax** ```sql @@ -691,11 +718,11 @@ mapConcat(maps) **Arguments** -- `maps` – Arbitrary number of arguments of [Map](../data-types/map.md) type. +- `maps` – Arbitrarily many [Maps](../data-types/map.md). **Returned value** -- Returns a map with concatenated maps passed as arguments. If there are same keys in two or more maps, all of them are added to the result map, but only the first one is accessible via operator `[]` +- Returns a map with concatenated maps passed as arguments. **Examples** @@ -729,9 +756,12 @@ Result: ## mapExists(\[func,\], map) -Returns 1 if there is at least one key-value pair in `map` for which `func(key, value)` returns something other than 0. Otherwise, it returns 0. +Returns 1 if at least one key-value pair in `map` exists for which `func(key, value)` returns something other than 0. Otherwise, it returns 0. 
-Note that the `mapExists` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument. +:::note +`mapExists` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). +You can pass a lambda function to it as the first argument. +::: **Example** @@ -743,7 +773,7 @@ SELECT mapExists((k, v) -> (v = 1), map('k1', 1, 'k2', 2)) AS res Result: -```text +``` ┌─res─┐ │ 1 │ └─────┘ @@ -753,7 +783,10 @@ Result: Returns 1 if `func(key, value)` returns something other than 0 for all key-value pairs in `map`. Otherwise, it returns 0. -Note that the `mapAll` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument. +:::note +Note that the `mapAll` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). +You can pass a lambda function to it as the first argument. +::: **Example** @@ -765,7 +798,7 @@ SELECT mapAll((k, v) -> (v = 1), map('k1', 1, 'k2', 2)) AS res Result: -```text +``` ┌─res─┐ │ 0 │ └─────┘ @@ -773,7 +806,8 @@ Result: ## mapSort(\[func,\], map) -Sorts the elements of the `map` in ascending order. If the `func` function is specified, sorting order is determined by the result of the `func` function applied to the keys and values of the map. +Sorts the elements of a map in ascending order. +If the `func` function is specified, the sorting order is determined by the result of the `func` function applied to the keys and values of the map. **Examples** @@ -801,8 +835,8 @@ For more details see the [reference](../../sql-reference/functions/array-functio ## mapReverseSort(\[func,\], map) -Sorts the elements of the `map` in descending order. If the `func` function is specified, sorting order is determined by the result of the `func` function applied to the keys and values of the map. - +Sorts the elements of a map in descending order. +If the `func` function is specified, the sorting order is determined by the result of the `func` function applied to the keys and values of the map. **Examples** @@ -826,4 +860,4 @@ SELECT mapReverseSort((k, v) -> v, map('key2', 2, 'key3', 1, 'key1', 3)) AS map; └──────────────────────────────┘ ``` -For more details see the [reference](../../sql-reference/functions/array-functions.md#array_functions-reverse-sort) for `arrayReverseSort` function. +For more details see function [arrayReverseSort](../../sql-reference/functions/array-functions.md#array_functions-reverse-sort). diff --git a/docs/en/sql-reference/statements/create/table.md b/docs/en/sql-reference/statements/create/table.md index 0edf158e981..628fe1d2875 100644 --- a/docs/en/sql-reference/statements/create/table.md +++ b/docs/en/sql-reference/statements/create/table.md @@ -337,7 +337,7 @@ Then, when executing the query `SELECT name FROM users_a WHERE length(name) < 5; Defines storage time for values. Can be specified only for MergeTree-family tables. For the detailed description, see [TTL for columns and tables](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). -## Column Compression Codecs +## Column Compression Codecs {#column_compression_codec} By default, ClickHouse applies `lz4` compression in the self-managed version, and `zstd` in ClickHouse Cloud. 
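+For example, a codec can be declared per column directly in `CREATE TABLE` (a minimal sketch; the table and column names are illustrative):
+
+```sql
+CREATE TABLE codec_example
+(
+    ts DateTime CODEC(Delta, ZSTD(1)),  -- delta-encode timestamps, then compress with ZSTD level 1
+    value Float64 CODEC(Gorilla),       -- Gorilla suits slowly changing float series
+    message String                      -- no CODEC clause: the default described above applies
+)
+ENGINE = MergeTree
+ORDER BY ts;
+```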
@@ -410,6 +410,10 @@ High compression levels are useful for asymmetric scenarios, like compress once, - For compression, ZSTD_QAT tries to use an Intel® QAT offloading device ([QuickAssist Technology](https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/overview.html)). If no such device was found, it will fallback to ZSTD compression in software. - Decompression is always performed in software. +:::note +ZSTD_QAT is not available in ClickHouse Cloud. +::: + #### DEFLATE_QPL `DEFLATE_QPL` — [Deflate compression algorithm](https://github.com/intel/qpl) implemented by Intel® Query Processing Library. Some limitations apply: diff --git a/docs/en/sql-reference/statements/create/view.md b/docs/en/sql-reference/statements/create/view.md index b526c94e508..1bdf22b35b0 100644 --- a/docs/en/sql-reference/statements/create/view.md +++ b/docs/en/sql-reference/statements/create/view.md @@ -85,6 +85,14 @@ Also note, that `materialized_views_ignore_errors` set to `true` by default for If you specify `POPULATE`, the existing table data is inserted into the view when creating it, as if making a `CREATE TABLE ... AS SELECT ...` . Otherwise, the query contains only the data inserted in the table after creating the view. We **do not recommend** using `POPULATE`, since data inserted in the table during the view creation will not be inserted in it. +:::note +Given that `POPULATE` works like `CREATE TABLE ... AS SELECT ...` it has limitations: +- It is not supported with Replicated database +- It is not supported in ClickHouse cloud + +Instead a separate `INSERT ... SELECT` can be used. +::: + A `SELECT` query can contain `DISTINCT`, `GROUP BY`, `ORDER BY`, `LIMIT`. Note that the corresponding conversions are performed independently on each block of inserted data. For example, if `GROUP BY` is set, data is aggregated during insertion, but only within a single packet of inserted data. The data won’t be further aggregated. The exception is when using an `ENGINE` that independently performs data aggregation, such as `SummingMergeTree`. The execution of [ALTER](/docs/en/sql-reference/statements/alter/view.md) queries on materialized views has limitations, for example, you can not update the `SELECT` query, so this might be inconvenient. If the materialized view uses the construction `TO [db.]name`, you can `DETACH` the view, run `ALTER` for the target table, and then `ATTACH` the previously detached (`DETACH`) view. diff --git a/docs/en/sql-reference/table-functions/loop.md b/docs/en/sql-reference/table-functions/loop.md new file mode 100644 index 00000000000..3a9367b2d10 --- /dev/null +++ b/docs/en/sql-reference/table-functions/loop.md @@ -0,0 +1,55 @@ +# loop + +**Syntax** + +``` sql +SELECT ... FROM loop(database, table); +SELECT ... FROM loop(database.table); +SELECT ... FROM loop(table); +SELECT ... FROM loop(other_table_function(...)); +``` + +**Parameters** + +- `database` — database name. +- `table` — table name. +- `other_table_function(...)` — other table function. + Example: `SELECT * FROM loop(numbers(10));` + `other_table_function(...)` here is `numbers(10)`. + +**Returned Value** + +Infinite loop to return query results. + +**Examples** + +Selecting data from ClickHouse: + +``` sql +SELECT * FROM loop(test_database, test_table); +SELECT * FROM loop(test_database.test_table); +SELECT * FROM loop(test_table); +``` + +Or using other table function: + +``` sql +SELECT * FROM loop(numbers(3)) LIMIT 7; + ┌─number─┐ +1. │ 0 │ +2. │ 1 │ +3. 
│ 2 │ + └────────┘ + ┌─number─┐ +4. │ 0 │ +5. │ 1 │ +6. │ 2 │ + └────────┘ + ┌─number─┐ +7. │ 0 │ + └────────┘ +``` +``` sql +SELECT * FROM loop(mysql('localhost:3306', 'test', 'test', 'user', 'password')); +... +``` \ No newline at end of file diff --git a/docs/ru/sql-reference/functions/math-functions.md b/docs/ru/sql-reference/functions/math-functions.md index 367451a5b32..caacbb216bf 100644 --- a/docs/ru/sql-reference/functions/math-functions.md +++ b/docs/ru/sql-reference/functions/math-functions.md @@ -304,8 +304,8 @@ atan2(y, x) **Аргументы** -- `y` — координата y точки, в которую проведена линия. [Float64](../../sql-reference/data-types/float.md#float32-float64). -- `x` — координата х точки, в которую проведена линия. [Float64](../../sql-reference/data-types/float.md#float32-float64). +- `y` — координата y точки, в которую проведена линия. [Float64](../../sql-reference/data-types/float.md#float32-float64) или [Decimal](../../sql-reference/data-types/decimal.md). +- `x` — координата х точки, в которую проведена линия. [Float64](../../sql-reference/data-types/float.md#float32-float64) или [Decimal](../../sql-reference/data-types/decimal.md). **Возвращаемое значение** @@ -341,8 +341,8 @@ hypot(x, y) **Аргументы** -- `x` — первый катет прямоугольного треугольника. [Float64](../../sql-reference/data-types/float.md#float32-float64). -- `y` — второй катет прямоугольного треугольника. [Float64](../../sql-reference/data-types/float.md#float32-float64). +- `x` — первый катет прямоугольного треугольника. [Float64](../../sql-reference/data-types/float.md#float32-float64) или [Decimal](../../sql-reference/data-types/decimal.md). +- `y` — второй катет прямоугольного треугольника. [Float64](../../sql-reference/data-types/float.md#float32-float64) или [Decimal](../../sql-reference/data-types/decimal.md). **Возвращаемое значение** diff --git a/programs/bash-completion/completions/clickhouse-bootstrap b/programs/bash-completion/completions/clickhouse-bootstrap index 2862140b528..73e2ef07477 100644 --- a/programs/bash-completion/completions/clickhouse-bootstrap +++ b/programs/bash-completion/completions/clickhouse-bootstrap @@ -154,7 +154,8 @@ function _clickhouse_quote() # Extract every option (everything that starts with "-") from the --help dialog. 
function _clickhouse_get_options() { - "$@" --help 2>&1 | awk -F '[ ,=<>.]' '{ for (i=1; i <= NF; ++i) { if (substr($i, 1, 1) == "-" && length($i) > 1) print $i; } }' | sort -u + # By default --help will not print all settings, this is done only under --verbose + "$@" --help --verbose 2>&1 | awk -F '[ ,=<>.]' '{ for (i=1; i <= NF; ++i) { if (substr($i, 1, 1) == "-" && length($i) > 1) print $i; } }' | sort -u } function _complete_for_clickhouse_generic_bin_impl() diff --git a/programs/keeper-client/Commands.cpp b/programs/keeper-client/Commands.cpp index a109912e6e0..df9da8e9613 100644 --- a/programs/keeper-client/Commands.cpp +++ b/programs/keeper-client/Commands.cpp @@ -10,7 +10,7 @@ namespace DB namespace ErrorCodes { - extern const int KEEPER_EXCEPTION; + extern const int LOGICAL_ERROR; } bool LSCommand::parse(IParser::Pos & pos, std::shared_ptr & node, Expected & expected) const @@ -213,6 +213,143 @@ void GetStatCommand::execute(const ASTKeeperQuery * query, KeeperClient * client std::cout << "numChildren = " << stat.numChildren << "\n"; } +namespace +{ + +/// Helper class for parallelized tree traversal +template +struct TraversalTask : public std::enable_shared_from_this> +{ + using TraversalTaskPtr = std::shared_ptr>; + + struct Ctx + { + std::deque new_tasks; /// Tasks for newly discovered children, that hasn't been started yet + std::deque> in_flight_list_requests; /// In-flight getChildren requests + std::deque> finish_callbacks; /// Callbacks to be called + KeeperClient * client; + UserCtx & user_ctx; + + Ctx(KeeperClient * client_, UserCtx & user_ctx_) : client(client_), user_ctx(user_ctx_) {} + }; + +private: + const fs::path path; + const TraversalTaskPtr parent; + + Int64 child_tasks = 0; + Int64 nodes_in_subtree = 1; + +public: + TraversalTask(const fs::path & path_, TraversalTaskPtr parent_) + : path(path_) + , parent(parent_) + { + } + + /// Start traversing the subtree + void onStart(Ctx & ctx) + { + /// tryGetChildren doesn't throw if the node is not found (was deleted in the meantime) + std::shared_ptr> list_request = + std::make_shared>(ctx.client->zookeeper->asyncTryGetChildren(path)); + ctx.in_flight_list_requests.push_back([task = this->shared_from_this(), list_request](Ctx & ctx_) mutable + { + task->onGetChildren(ctx_, list_request->get()); + }); + } + + /// Called when getChildren request returns + void onGetChildren(Ctx & ctx, const Coordination::ListResponse & response) + { + const bool traverse_children = ctx.user_ctx.onListChildren(path, response.names); + + if (traverse_children) + { + /// Schedule traversal of each child + for (const auto & child : response.names) + { + auto task = std::make_shared(path / child, this->shared_from_this()); + ctx.new_tasks.push_back(task); + } + child_tasks = response.names.size(); + } + + if (child_tasks == 0) + finish(ctx); + } + + /// Called when a child subtree has been traversed + void onChildTraversalFinished(Ctx & ctx, Int64 child_nodes_in_subtree) + { + nodes_in_subtree += child_nodes_in_subtree; + + --child_tasks; + + /// Finish if all children have been traversed + if (child_tasks == 0) + finish(ctx); + } + +private: + /// This node and all its children have been traversed + void finish(Ctx & ctx) + { + ctx.user_ctx.onFinishChildrenTraversal(path, nodes_in_subtree); + + if (!parent) + return; + + /// Notify the parent that we have finished traversing the subtree + ctx.finish_callbacks.push_back([p = this->parent, child_nodes_in_subtree = this->nodes_in_subtree](Ctx & ctx_) + { + p->onChildTraversalFinished(ctx_, 
child_nodes_in_subtree); + }); + } +}; + +/// Traverses the tree in parallel and calls user callbacks +/// Parallelization is achieved by sending multiple async getChildren requests to Keeper, but all processing is done in a single thread +template +void parallelized_traverse(const fs::path & path, KeeperClient * client, size_t max_in_flight_requests, UserCtx & ctx_) +{ + typename TraversalTask::Ctx ctx(client, ctx_); + + auto root_task = std::make_shared>(path, nullptr); + + ctx.new_tasks.push_back(root_task); + + /// Until there is something to do + while (!ctx.new_tasks.empty() || !ctx.in_flight_list_requests.empty() || !ctx.finish_callbacks.empty()) + { + /// First process all finish callbacks, they don't wait for anything and allow to free memory + while (!ctx.finish_callbacks.empty()) + { + auto callback = std::move(ctx.finish_callbacks.front()); + ctx.finish_callbacks.pop_front(); + callback(ctx); + } + + /// Make new requests if there are less than max in flight + while (!ctx.new_tasks.empty() && ctx.in_flight_list_requests.size() < max_in_flight_requests) + { + auto task = std::move(ctx.new_tasks.front()); + ctx.new_tasks.pop_front(); + task->onStart(ctx); + } + + /// Wait for first request in the queue to finish + if (!ctx.in_flight_list_requests.empty()) + { + auto request = std::move(ctx.in_flight_list_requests.front()); + ctx.in_flight_list_requests.pop_front(); + request(ctx); + } + } +} + +} /// anonymous namespace + bool FindSuperNodes::parse(IParser::Pos & pos, std::shared_ptr & node, Expected & expected) const { ASTPtr threshold; @@ -236,27 +373,21 @@ void FindSuperNodes::execute(const ASTKeeperQuery * query, KeeperClient * client auto threshold = query->args[0].safeGet(); auto path = client->getAbsolutePath(query->args[1].safeGet()); - Coordination::Stat stat; - if (!client->zookeeper->exists(path, &stat)) - return; /// It is ok if node was deleted meanwhile - - if (stat.numChildren >= static_cast(threshold)) - std::cout << static_cast(path) << "\t" << stat.numChildren << "\n"; - - Strings children; - auto status = client->zookeeper->tryGetChildren(path, children); - if (status == Coordination::Error::ZNONODE) - return; /// It is ok if node was deleted meanwhile - else if (status != Coordination::Error::ZOK) - throw DB::Exception(DB::ErrorCodes::KEEPER_EXCEPTION, "Error {} while getting children of {}", status, path.string()); - - std::sort(children.begin(), children.end()); - auto next_query = *query; - for (const auto & child : children) + struct { - next_query.args[1] = DB::Field(path / child); - execute(&next_query, client); - } + bool onListChildren(const fs::path & path, const Strings & children) const + { + if (children.size() >= threshold) + std::cout << static_cast(path) << "\t" << children.size() << "\n"; + return true; + } + + void onFinishChildrenTraversal(const fs::path &, Int64) const {} + + size_t threshold; + } ctx {.threshold = threshold }; + + parallelized_traverse(path, client, /* max_in_flight_requests */ 50, ctx); } bool DeleteStaleBackups::parse(IParser::Pos & /* pos */, std::shared_ptr & /* node */, Expected & /* expected */) const @@ -321,38 +452,28 @@ bool FindBigFamily::parse(IParser::Pos & pos, std::shared_ptr & return true; } -/// DFS the subtree and return the number of nodes in the subtree -static Int64 traverse(const fs::path & path, KeeperClient * client, std::vector> & result) -{ - Int64 nodes_in_subtree = 1; - - Strings children; - auto status = client->zookeeper->tryGetChildren(path, children); - if (status == 
Coordination::Error::ZNONODE) - return 0; - else if (status != Coordination::Error::ZOK) - throw DB::Exception(DB::ErrorCodes::KEEPER_EXCEPTION, "Error {} while getting children of {}", status, path.string()); - - for (auto & child : children) - nodes_in_subtree += traverse(path / child, client, result); - - result.emplace_back(nodes_in_subtree, path.string()); - - return nodes_in_subtree; -} - void FindBigFamily::execute(const ASTKeeperQuery * query, KeeperClient * client) const { auto path = client->getAbsolutePath(query->args[0].safeGet()); auto n = query->args[1].safeGet(); - std::vector> result; + struct + { + std::vector> result; - traverse(path, client, result); + bool onListChildren(const fs::path &, const Strings &) const { return true; } - std::sort(result.begin(), result.end(), std::greater()); - for (UInt64 i = 0; i < std::min(result.size(), static_cast(n)); ++i) - std::cout << std::get<1>(result[i]) << "\t" << std::get<0>(result[i]) << "\n"; + void onFinishChildrenTraversal(const fs::path & path, Int64 nodes_in_subtree) + { + result.emplace_back(nodes_in_subtree, path.string()); + } + } ctx; + + parallelized_traverse(path, client, /* max_in_flight_requests */ 50, ctx); + + std::sort(ctx.result.begin(), ctx.result.end(), std::greater()); + for (UInt64 i = 0; i < std::min(ctx.result.size(), static_cast(n)); ++i) + std::cout << std::get<1>(ctx.result[i]) << "\t" << std::get<0>(ctx.result[i]) << "\n"; } bool RMCommand::parse(IParser::Pos & pos, std::shared_ptr & node, Expected & expected) const @@ -441,7 +562,7 @@ void ReconfigCommand::execute(const DB::ASTKeeperQuery * query, DB::KeeperClient new_members = query->args[1].safeGet(); break; default: - UNREACHABLE(); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected operation: {}", operation); } auto response = client->zookeeper->reconfig(joining, leaving, new_members); diff --git a/programs/keeper-converter/KeeperConverter.cpp b/programs/keeper-converter/KeeperConverter.cpp index 7518227a070..639e0957f49 100644 --- a/programs/keeper-converter/KeeperConverter.cpp +++ b/programs/keeper-converter/KeeperConverter.cpp @@ -57,7 +57,7 @@ int mainEntryClickHouseKeeperConverter(int argc, char ** argv) DB::KeeperSnapshotManager manager(1, keeper_context); auto snp = manager.serializeSnapshotToBuffer(snapshot); auto file_info = manager.serializeSnapshotBufferToDisk(*snp, storage.getZXID()); - std::cout << "Snapshot serialized to path:" << fs::path(file_info.disk->getPath()) / file_info.path << std::endl; + std::cout << "Snapshot serialized to path:" << fs::path(file_info->disk->getPath()) / file_info->path << std::endl; } catch (...) { diff --git a/programs/keeper/CMakeLists.txt b/programs/keeper/CMakeLists.txt index af360e44ff4..52aa601b1a2 100644 --- a/programs/keeper/CMakeLists.txt +++ b/programs/keeper/CMakeLists.txt @@ -9,8 +9,6 @@ set (CLICKHOUSE_KEEPER_LINK clickhouse_common_zookeeper daemon dbms - - ${LINK_RESOURCE_LIB} ) clickhouse_program_add(keeper) @@ -210,8 +208,6 @@ if (BUILD_STANDALONE_KEEPER) loggers_no_text_log clickhouse_common_io clickhouse_parsers # Otherwise compression will not built. FIXME. 
- - ${LINK_RESOURCE_LIB_STANDALONE_KEEPER} ) set_target_properties(clickhouse-keeper PROPERTIES RUNTIME_OUTPUT_DIRECTORY ../) diff --git a/programs/main.cpp b/programs/main.cpp index bc8476e4ce4..c270388f17f 100644 --- a/programs/main.cpp +++ b/programs/main.cpp @@ -155,8 +155,8 @@ auto instructionFailToString(InstructionFail fail) ret("AVX2"); case InstructionFail::AVX512: ret("AVX512"); +#undef ret } - UNREACHABLE(); } diff --git a/programs/server/CMakeLists.txt b/programs/server/CMakeLists.txt index 76d201cc924..be696ff2afe 100644 --- a/programs/server/CMakeLists.txt +++ b/programs/server/CMakeLists.txt @@ -14,8 +14,6 @@ set (CLICKHOUSE_SERVER_LINK clickhouse_storages_system clickhouse_table_functions - ${LINK_RESOURCE_LIB} - PUBLIC daemon ) diff --git a/programs/server/config.xml b/programs/server/config.xml index 27ed5952fc9..4b3248d9d1c 100644 --- a/programs/server/config.xml +++ b/programs/server/config.xml @@ -715,7 +715,7 @@ + By default this setting is true. --> true 3 equal ranges +/// pk1 pk2 c1 c2 +/// ---------------------- +/// 1 1 a b +/// 1 1 b e +/// -------- +/// 1 2 e a +/// 1 2 d c +/// 1 2 e a +/// -------- +/// 2 1 a 3 +/// ---------------------- +EqualRanges getEqualRanges(const Block & block, const SortDescription & sort_description, const IColumn::Permutation & permutation, const LoggerPtr & log) +{ + LOG_TRACE(log, "Finding equal ranges"); + EqualRanges ranges; + const size_t rows = block.rows(); + if (sort_description.empty()) + { + ranges.push_back({0, rows}); + } + else + { + for (size_t i = 0; i < rows;) + { + size_t j = i; + while (j < rows && haveEqualSortingKeyValues(block, sort_description, permutation[i], permutation[j])) + ++j; + ranges.push_back({i, j}); + i = j; + } + } + return ranges; +} + +std::vector getCardinalitiesInPermutedRange( + const Block & block, + const std::vector & other_column_indexes, + const IColumn::Permutation & permutation, + const EqualRange & equal_range) +{ + std::vector cardinalities(other_column_indexes.size()); + for (size_t i = 0; i < other_column_indexes.size(); ++i) + { + const size_t column_id = other_column_indexes[i]; + const ColumnPtr & column = block.getByPosition(column_id).column; + cardinalities[i] = column->estimateCardinalityInPermutedRange(permutation, equal_range); + } + return cardinalities; +} + +void updatePermutationInEqualRange( + const Block & block, + const std::vector & other_column_indexes, + IColumn::Permutation & permutation, + const EqualRange & equal_range, + const std::vector & cardinalities) +{ + LoggerPtr log = getLogger("RowOrderOptimizer"); + + LOG_TRACE(log, "Starting optimization in equal range"); + + std::vector column_order(other_column_indexes.size()); + iota(column_order.begin(), column_order.end(), 0); + auto cmp = [&](size_t lhs, size_t rhs) -> bool { return cardinalities[lhs] < cardinalities[rhs]; }; + stable_sort(column_order.begin(), column_order.end(), cmp); + + std::vector ranges = {equal_range}; + LOG_TRACE(log, "equal_range: .from: {}, .to: {}", equal_range.from, equal_range.to); + for (size_t i : column_order) + { + const size_t column_id = other_column_indexes[i]; + const ColumnPtr & column = block.getByPosition(column_id).column; + LOG_TRACE(log, "i: {}, column_id: {}, column->getName(): {}, cardinality: {}", i, column_id, column->getName(), cardinalities[i]); + column->updatePermutation( + IColumn::PermutationSortDirection::Ascending, IColumn::PermutationSortStability::Stable, 0, 1, permutation, ranges); + } + + LOG_TRACE(log, "Finish optimization in equal range"); +} + +} 
+ +void RowOrderOptimizer::optimize(const Block & block, const SortDescription & sort_description, IColumn::Permutation & permutation) +{ + LoggerPtr log = getLogger("RowOrderOptimizer"); + + LOG_TRACE(log, "Starting optimization"); + + if (block.columns() == 0) + return; /// a table without columns, this should not happen in the first place ... + + if (permutation.empty()) + { + const size_t rows = block.rows(); + permutation.resize(rows); + iota(permutation.data(), rows, IColumn::Permutation::value_type(0)); + } + + const EqualRanges equal_ranges = getEqualRanges(block, sort_description, permutation, log); + const std::vector other_columns_indexes = getOtherColumnIndexes(block, sort_description); + + LOG_TRACE(log, "block.columns(): {}, block.rows(): {}, sort_description.size(): {}, equal_ranges.size(): {}", block.columns(), block.rows(), sort_description.size(), equal_ranges.size()); + + for (const auto & equal_range : equal_ranges) + { + if (equal_range.size() <= 1) + continue; + const std::vector cardinalities = getCardinalitiesInPermutedRange(block, other_columns_indexes, permutation, equal_range); + updatePermutationInEqualRange(block, other_columns_indexes, permutation, equal_range, cardinalities); + } + + LOG_TRACE(log, "Finished optimization"); +} + +} diff --git a/src/Storages/MergeTree/RowOrderOptimizer.h b/src/Storages/MergeTree/RowOrderOptimizer.h new file mode 100644 index 00000000000..f321345c3e4 --- /dev/null +++ b/src/Storages/MergeTree/RowOrderOptimizer.h @@ -0,0 +1,26 @@ +#pragma once + +#include +#include +#include + +namespace DB +{ + +class RowOrderOptimizer +{ +public: + /// Given the columns in a Block with a sub-set of them as sorting key columns (usually primary key columns --> SortDescription), and a + /// permutation of the rows, this function tries to "improve" the permutation such that the data can be compressed better by generic + /// compression algorithms such as zstd. The heuristics is based on D. Lemire, O. Kaser (2011): Reordering columns for smaller + /// indexes, https://doi.org/10.1016/j.ins.2011.02.002 + /// The algorithm works like this: + /// - Divide the sorting key columns horizontally into "equal ranges". An equal range is defined by the same sorting key values on all + /// of its rows. We can re-shuffle the non-sorting-key values within each equal range freely. + /// - Determine (estimate) for each equal range the cardinality of each non-sorting-key column. + /// - The simple heuristics applied is that non-sorting key columns will be sorted (within each equal range) in order of ascending + /// cardinality. This maximizes the length of equal-value runs within the non-sorting-key columns, leading to better compressability. 
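The comment above already specifies the heuristic in full, so a compact, self-contained illustration may help when reviewing. The sketch below is an editorial aside rather than part of this patch: it uses plain standard-library containers instead of `Block`/`IColumn`/`SortDescription`, computes exact distinct counts where the patch only estimates cardinality, assumes the rows are already sorted by the key column, and approximates the successive stable `updatePermutation` calls with a single lexicographic stable sort over the non-key columns taken in ascending-cardinality order. Every identifier in it is invented for the example.

```cpp
// Editorial illustration only -- NOT part of this patch and not the ClickHouse
// implementation. Plain standard-library containers stand in for Block/IColumn,
// exact distinct counts stand in for cardinality estimates, and rows are assumed
// to be pre-sorted by the key column.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using Row = std::vector<std::string>; // one value per column, kept as strings for simplicity

int main()
{
    // Columns: {pk, c1, c2}; the first column is the (single) sorting key.
    std::vector<Row> rows = {
        {"1", "x", "b"}, {"1", "y", "a"}, {"1", "z", "b"}, {"1", "w", "a"},
        {"2", "q", "d"}, {"2", "q", "c"},
    };
    const size_t key_columns = 1;
    const size_t total_columns = rows.front().size();

    // Step 1: split the (key-sorted) rows into equal ranges, i.e. maximal runs
    // of rows whose sorting-key values are identical.
    std::vector<std::pair<size_t, size_t>> equal_ranges;
    for (size_t i = 0; i < rows.size();)
    {
        size_t j = i;
        while (j < rows.size() && std::equal(rows[i].begin(), rows[i].begin() + key_columns, rows[j].begin()))
            ++j;
        equal_ranges.emplace_back(i, j);
        i = j;
    }

    for (const auto & range : equal_ranges)
    {
        const size_t from = range.first;
        const size_t to = range.second;

        // Step 2: determine the cardinality of every non-key column inside the range.
        std::vector<size_t> cardinality(total_columns, 0);
        for (size_t col = key_columns; col < total_columns; ++col)
        {
            std::unordered_set<std::string> distinct;
            for (size_t r = from; r < to; ++r)
                distinct.insert(rows[r][col]);
            cardinality[col] = distinct.size();
        }

        // Step 3: order the non-key columns by ascending cardinality ...
        std::vector<size_t> non_key_columns(total_columns - key_columns);
        std::iota(non_key_columns.begin(), non_key_columns.end(), key_columns);
        std::stable_sort(non_key_columns.begin(), non_key_columns.end(),
                         [&](size_t a, size_t b) { return cardinality[a] < cardinality[b]; });

        // ... and stable-sort the rows of the range lexicographically in that column
        // order, so the lowest-cardinality columns get the longest equal-value runs.
        std::stable_sort(rows.begin() + from, rows.begin() + to,
                         [&](const Row & l, const Row & r)
                         {
                             for (size_t col : non_key_columns)
                                 if (l[col] != r[col])
                                     return l[col] < r[col];
                             return false;
                         });
    }

    for (const auto & row : rows)
    {
        for (const auto & value : row)
            std::cout << value << ' ';
        std::cout << '\n';
    }
    return 0;
}
```

Running the sketch prints each equal range with its rows grouped by the low-cardinality column first (the runs `a a b b` in the toy data), which is the long-equal-value-run layout that generic compressors such as zstd exploit.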
+ static void optimize(const Block & block, const SortDescription & sort_description, IColumn::Permutation & permutation); +}; + +} diff --git a/src/Storages/NamedCollectionsHelpers.cpp b/src/Storages/NamedCollectionsHelpers.cpp index c1e744e8d79..47b69d79ad8 100644 --- a/src/Storages/NamedCollectionsHelpers.cpp +++ b/src/Storages/NamedCollectionsHelpers.cpp @@ -1,6 +1,7 @@ #include "NamedCollectionsHelpers.h" #include #include +#include #include #include #include diff --git a/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadata.cpp b/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadata.cpp index 277d07d88ef..38bf3112ee2 100644 --- a/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadata.cpp +++ b/src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadata.cpp @@ -269,13 +269,12 @@ struct DeltaLakeMetadata::Impl header.insert({column.type->createColumn(), column.type, column.name}); std::atomic is_stopped{0}; - auto arrow_file = asArrowFile(*buf, format_settings, is_stopped, "Parquet", PARQUET_MAGIC_BYTES); std::unique_ptr reader; THROW_ARROW_NOT_OK( parquet::arrow::OpenFile( asArrowFile(*buf, format_settings, is_stopped, "Parquet", PARQUET_MAGIC_BYTES), - arrow::default_memory_pool(), + ArrowMemoryPool::instance(), &reader)); std::shared_ptr schema; diff --git a/src/Storages/S3Queue/S3QueueFilesMetadata.cpp b/src/Storages/S3Queue/S3QueueFilesMetadata.cpp deleted file mode 100644 index e1583b8329c..00000000000 --- a/src/Storages/S3Queue/S3QueueFilesMetadata.cpp +++ /dev/null @@ -1,1173 +0,0 @@ -#include "config.h" - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include - - -namespace ProfileEvents -{ - extern const Event S3QueueSetFileProcessingMicroseconds; - extern const Event S3QueueSetFileProcessedMicroseconds; - extern const Event S3QueueSetFileFailedMicroseconds; - extern const Event S3QueueFailedFiles; - extern const Event S3QueueProcessedFiles; - extern const Event S3QueueCleanupMaxSetSizeOrTTLMicroseconds; - extern const Event S3QueueLockLocalFileStatusesMicroseconds; - extern const Event CannotRemoveEphemeralNode; -}; - -namespace DB -{ - -namespace ErrorCodes -{ - extern const int LOGICAL_ERROR; - extern const int BAD_ARGUMENTS; -} - -namespace -{ - UInt64 getCurrentTime() - { - return std::chrono::duration_cast(std::chrono::system_clock::now().time_since_epoch()).count(); - } - - size_t generateRescheduleInterval(size_t min, size_t max) - { - /// Use more or less random interval for unordered mode cleanup task. - /// So that distributed processing cleanup tasks would not schedule cleanup at the same time. 
- pcg64 rng(randomSeed()); - return min + rng() % (max - min + 1); - } -} - -std::unique_lock S3QueueFilesMetadata::LocalFileStatuses::lock() const -{ - auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueLockLocalFileStatusesMicroseconds); - return std::unique_lock(mutex); -} - -S3QueueFilesMetadata::FileStatuses S3QueueFilesMetadata::LocalFileStatuses::getAll() const -{ - auto lk = lock(); - return file_statuses; -} - -S3QueueFilesMetadata::FileStatusPtr S3QueueFilesMetadata::LocalFileStatuses::get(const std::string & filename, bool create) -{ - auto lk = lock(); - auto it = file_statuses.find(filename); - if (it == file_statuses.end()) - { - if (create) - it = file_statuses.emplace(filename, std::make_shared()).first; - else - throw Exception(ErrorCodes::BAD_ARGUMENTS, "File status for {} doesn't exist", filename); - } - return it->second; -} - -bool S3QueueFilesMetadata::LocalFileStatuses::remove(const std::string & filename, bool if_exists) -{ - auto lk = lock(); - auto it = file_statuses.find(filename); - if (it == file_statuses.end()) - { - if (if_exists) - return false; - else - throw Exception(ErrorCodes::BAD_ARGUMENTS, "File status for {} doesn't exist", filename); - } - file_statuses.erase(it); - return true; -} - -std::string S3QueueFilesMetadata::NodeMetadata::toString() const -{ - Poco::JSON::Object json; - json.set("file_path", file_path); - json.set("last_processed_timestamp", getCurrentTime()); - json.set("last_exception", last_exception); - json.set("retries", retries); - json.set("processing_id", processing_id); - - std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM - oss.exceptions(std::ios::failbit); - Poco::JSON::Stringifier::stringify(json, oss); - return oss.str(); -} - -S3QueueFilesMetadata::NodeMetadata S3QueueFilesMetadata::NodeMetadata::fromString(const std::string & metadata_str) -{ - Poco::JSON::Parser parser; - auto json = parser.parse(metadata_str).extract(); - - NodeMetadata metadata; - metadata.file_path = json->getValue("file_path"); - metadata.last_processed_timestamp = json->getValue("last_processed_timestamp"); - metadata.last_exception = json->getValue("last_exception"); - metadata.retries = json->getValue("retries"); - metadata.processing_id = json->getValue("processing_id"); - return metadata; -} - -S3QueueFilesMetadata::S3QueueFilesMetadata(const fs::path & zookeeper_path_, const S3QueueSettings & settings_) - : mode(settings_.mode) - , max_set_size(settings_.s3queue_tracked_files_limit.value) - , max_set_age_sec(settings_.s3queue_tracked_file_ttl_sec.value) - , max_loading_retries(settings_.s3queue_loading_retries.value) - , min_cleanup_interval_ms(settings_.s3queue_cleanup_interval_min_ms.value) - , max_cleanup_interval_ms(settings_.s3queue_cleanup_interval_max_ms.value) - , shards_num(settings_.s3queue_total_shards_num) - , threads_per_shard(settings_.s3queue_processing_threads_num) - , zookeeper_processing_path(zookeeper_path_ / "processing") - , zookeeper_processed_path(zookeeper_path_ / "processed") - , zookeeper_failed_path(zookeeper_path_ / "failed") - , zookeeper_shards_path(zookeeper_path_ / "shards") - , zookeeper_cleanup_lock_path(zookeeper_path_ / "cleanup_lock") - , log(getLogger("StorageS3Queue(" + zookeeper_path_.string() + ")")) -{ - if (mode == S3QueueMode::UNORDERED && (max_set_size || max_set_age_sec)) - { - task = Context::getGlobalContextInstance()->getSchedulePool().createTask("S3QueueCleanupFunc", [this] { cleanupThreadFunc(); }); - task->activate(); - 
task->scheduleAfter(generateRescheduleInterval(min_cleanup_interval_ms, max_cleanup_interval_ms)); - } -} - -S3QueueFilesMetadata::~S3QueueFilesMetadata() -{ - deactivateCleanupTask(); -} - -void S3QueueFilesMetadata::deactivateCleanupTask() -{ - shutdown = true; - if (task) - task->deactivate(); -} - -zkutil::ZooKeeperPtr S3QueueFilesMetadata::getZooKeeper() const -{ - return Context::getGlobalContextInstance()->getZooKeeper(); -} - -S3QueueFilesMetadata::FileStatusPtr S3QueueFilesMetadata::getFileStatus(const std::string & path) -{ - /// Return a locally cached file status. - return local_file_statuses.get(path, /* create */false); -} - -std::string S3QueueFilesMetadata::getNodeName(const std::string & path) -{ - /// Since with are dealing with paths in s3 which can have "/", - /// we cannot create a zookeeper node with the name equal to path. - /// Therefore we use a hash of the path as a node name. - - SipHash path_hash; - path_hash.update(path); - return toString(path_hash.get64()); -} - -S3QueueFilesMetadata::NodeMetadata S3QueueFilesMetadata::createNodeMetadata( - const std::string & path, - const std::string & exception, - size_t retries) -{ - /// Create a metadata which will be stored in a node named as getNodeName(path). - - /// Since node name is just a hash we want to know to which file it corresponds, - /// so we keep "file_path" in nodes data. - /// "last_processed_timestamp" is needed for TTL metadata nodes enabled by s3queue_tracked_file_ttl_sec. - /// "last_exception" is kept for introspection, should also be visible in system.s3queue_log if it is enabled. - /// "retries" is kept for retrying the processing enabled by s3queue_loading_retries. - NodeMetadata metadata; - metadata.file_path = path; - metadata.last_processed_timestamp = getCurrentTime(); - metadata.last_exception = exception; - metadata.retries = retries; - return metadata; -} - -bool S3QueueFilesMetadata::isShardedProcessing() const -{ - return getProcessingIdsNum() > 1 && mode == S3QueueMode::ORDERED; -} - -size_t S3QueueFilesMetadata::registerNewShard() -{ - if (!isShardedProcessing()) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Cannot register a new shard, because processing is not sharded"); - } - - const auto zk_client = getZooKeeper(); - zk_client->createIfNotExists(zookeeper_shards_path, ""); - - std::string shard_node_path; - size_t shard_id = 0; - for (size_t i = 0; i < shards_num; ++i) - { - const auto node_path = getZooKeeperPathForShard(i); - auto err = zk_client->tryCreate(node_path, "", zkutil::CreateMode::Persistent); - if (err == Coordination::Error::ZOK) - { - shard_node_path = node_path; - shard_id = i; - break; - } - else if (err == Coordination::Error::ZNODEEXISTS) - continue; - else - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Unexpected error: {}", magic_enum::enum_name(err)); - } - - if (shard_node_path.empty()) - throw Exception(ErrorCodes::LOGICAL_ERROR, "Failed to register a new shard"); - - LOG_TRACE(log, "Using shard {} (zk node: {})", shard_id, shard_node_path); - return shard_id; -} - -std::string S3QueueFilesMetadata::getZooKeeperPathForShard(size_t shard_id) const -{ - return zookeeper_shards_path / ("shard" + toString(shard_id)); -} - -void S3QueueFilesMetadata::registerNewShard(size_t shard_id) -{ - if (!isShardedProcessing()) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Cannot register a new shard, because processing is not sharded"); - } - - const auto zk_client = getZooKeeper(); - const auto node_path = getZooKeeperPathForShard(shard_id); - 
zk_client->createAncestors(node_path); - - auto err = zk_client->tryCreate(node_path, "", zkutil::CreateMode::Persistent); - if (err != Coordination::Error::ZOK) - { - if (err == Coordination::Error::ZNODEEXISTS) - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Cannot register shard {}: already exists", shard_id); - else - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Unexpected error: {}", magic_enum::enum_name(err)); - } -} - -bool S3QueueFilesMetadata::isShardRegistered(size_t shard_id) -{ - const auto zk_client = getZooKeeper(); - const auto node_path = getZooKeeperPathForShard(shard_id); - return zk_client->exists(node_path); -} - -void S3QueueFilesMetadata::unregisterShard(size_t shard_id) -{ - if (!isShardedProcessing()) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Cannot unregister a shard, because processing is not sharded"); - } - - const auto zk_client = getZooKeeper(); - const auto node_path = getZooKeeperPathForShard(shard_id); - auto error_code = zk_client->tryRemove(node_path); - if (error_code != Coordination::Error::ZOK - && error_code != Coordination::Error::ZNONODE) - throw zkutil::KeeperException::fromPath(error_code, node_path); -} - -size_t S3QueueFilesMetadata::getProcessingIdsNum() const -{ - return shards_num * threads_per_shard; -} - -std::vector S3QueueFilesMetadata::getProcessingIdsForShard(size_t shard_id) const -{ - std::vector res(threads_per_shard); - std::iota(res.begin(), res.end(), shard_id * threads_per_shard); - return res; -} - -bool S3QueueFilesMetadata::isProcessingIdBelongsToShard(size_t id, size_t shard_id) const -{ - return shard_id * threads_per_shard <= id && id < (shard_id + 1) * threads_per_shard; -} - -size_t S3QueueFilesMetadata::getIdForProcessingThread(size_t thread_id, size_t shard_id) const -{ - return shard_id * threads_per_shard + thread_id; -} - -size_t S3QueueFilesMetadata::getProcessingIdForPath(const std::string & path) const -{ - return sipHash64(path) % getProcessingIdsNum(); -} - -S3QueueFilesMetadata::ProcessingNodeHolderPtr S3QueueFilesMetadata::trySetFileAsProcessing(const std::string & path) -{ - auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueSetFileProcessingMicroseconds); - auto file_status = local_file_statuses.get(path, /* create */true); - - /// Check locally cached file status. - /// Processed or Failed state is always cached. - /// Processing state is cached only if processing is being done by current clickhouse server - /// (because If another server is doing the processing, - /// we cannot know if state changes without checking with zookeeper so there is no point in cache here). - - { - std::lock_guard lock(file_status->metadata_lock); - switch (file_status->state) - { - case FileStatus::State::Processing: - { - LOG_TEST(log, "File {} is already processing", path); - return {}; - } - case FileStatus::State::Processed: - { - LOG_TEST(log, "File {} is already processed", path); - return {}; - } - case FileStatus::State::Failed: - { - /// If max_loading_retries == 0, file is not retriable. - if (max_loading_retries == 0) - { - LOG_TEST(log, "File {} is failed and processing retries are disabled", path); - return {}; - } - - /// Otherwise file_status->retries is also cached. - /// In case file_status->retries >= max_loading_retries we can fully rely that it is true - /// and will not attempt processing it. 
- /// But in case file_status->retries < max_loading_retries we cannot be sure - /// (another server could have done a try after we cached retries value), - /// so check with zookeeper here. - if (file_status->retries >= max_loading_retries) - { - LOG_TEST(log, "File {} is failed and processing retries are exceeeded", path); - return {}; - } - - break; - } - case FileStatus::State::None: - { - /// The file was not processed by current server and file status was not cached, - /// check metadata in zookeeper. - break; - } - } - } - - /// Another thread could already be trying to set file as processing. - /// So there is no need to attempt the same, better to continue with the next file. - std::unique_lock processing_lock(file_status->processing_lock, std::defer_lock); - if (!processing_lock.try_lock()) - { - return {}; - } - - /// Let's go and check metadata in zookeeper and try to create a /processing ephemeral node. - /// If successful, return result with processing node holder. - SetFileProcessingResult result; - ProcessingNodeHolderPtr processing_node_holder; - - switch (mode) - { - case S3QueueMode::ORDERED: - { - std::tie(result, processing_node_holder) = trySetFileAsProcessingForOrderedMode(path, file_status); - break; - } - case S3QueueMode::UNORDERED: - { - std::tie(result, processing_node_holder) = trySetFileAsProcessingForUnorderedMode(path, file_status); - break; - } - } - - /// Cache file status, save some statistics. - switch (result) - { - case SetFileProcessingResult::Success: - { - std::lock_guard lock(file_status->metadata_lock); - file_status->state = FileStatus::State::Processing; - - file_status->profile_counters.increment(ProfileEvents::S3QueueSetFileProcessingMicroseconds, timer.get()); - timer.cancel(); - - if (!file_status->processing_start_time) - file_status->processing_start_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); - - return processing_node_holder; - } - case SetFileProcessingResult::AlreadyProcessed: - { - std::lock_guard lock(file_status->metadata_lock); - file_status->state = FileStatus::State::Processed; - return {}; - } - case SetFileProcessingResult::AlreadyFailed: - { - std::lock_guard lock(file_status->metadata_lock); - file_status->state = FileStatus::State::Failed; - return {}; - } - case SetFileProcessingResult::ProcessingByOtherNode: - { - /// We cannot save any local state here, see comment above. - return {}; - } - } -} - -std::pair -S3QueueFilesMetadata::trySetFileAsProcessingForUnorderedMode(const std::string & path, const FileStatusPtr & file_status) -{ - /// In one zookeeper transaction do the following: - /// 1. check that corresponding persistent nodes do not exist in processed/ and failed/; - /// 2. create an ephemenral node in /processing if it does not exist; - /// Return corresponding status if any of the step failed. 
- - const auto node_name = getNodeName(path); - const auto zk_client = getZooKeeper(); - auto node_metadata = createNodeMetadata(path); - node_metadata.processing_id = getRandomASCIIString(10); - - Coordination::Requests requests; - - requests.push_back(zkutil::makeCreateRequest(zookeeper_processed_path / node_name, "", zkutil::CreateMode::Persistent)); - requests.push_back(zkutil::makeRemoveRequest(zookeeper_processed_path / node_name, -1)); - - requests.push_back(zkutil::makeCreateRequest(zookeeper_failed_path / node_name, "", zkutil::CreateMode::Persistent)); - requests.push_back(zkutil::makeRemoveRequest(zookeeper_failed_path / node_name, -1)); - - requests.push_back(zkutil::makeCreateRequest(zookeeper_processing_path / node_name, node_metadata.toString(), zkutil::CreateMode::Ephemeral)); - - Coordination::Responses responses; - auto code = zk_client->tryMulti(requests, responses); - - if (code == Coordination::Error::ZOK) - { - auto holder = std::make_unique( - node_metadata.processing_id, path, zookeeper_processing_path / node_name, file_status, zk_client, log); - return std::pair{SetFileProcessingResult::Success, std::move(holder)}; - } - - if (responses[0]->error != Coordination::Error::ZOK) - { - return std::pair{SetFileProcessingResult::AlreadyProcessed, nullptr}; - } - else if (responses[2]->error != Coordination::Error::ZOK) - { - return std::pair{SetFileProcessingResult::AlreadyFailed, nullptr}; - } - else if (responses[4]->error != Coordination::Error::ZOK) - { - return std::pair{SetFileProcessingResult::ProcessingByOtherNode, nullptr}; - } - else - { - throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected state of zookeeper transaction: {}", magic_enum::enum_name(code)); - } -} - -std::pair -S3QueueFilesMetadata::trySetFileAsProcessingForOrderedMode(const std::string & path, const FileStatusPtr & file_status) -{ - /// Same as for Unordered mode. - /// The only difference is the check if the file is already processed. - /// For Ordered mode we do not keep a separate /processed/hash_node for each file - /// but instead we only keep a maximum processed file - /// (since all files are ordered and new files have a lexically bigger name, it makes sense). - - const auto node_name = getNodeName(path); - const auto zk_client = getZooKeeper(); - auto node_metadata = createNodeMetadata(path); - node_metadata.processing_id = getRandomASCIIString(10); - - while (true) - { - /// Get a /processed node content - max_processed path. - /// Compare our path to it. - /// If file is not yet processed, check corresponding /failed node and try create /processing node - /// and in the same zookeeper transaction also check that /processed node did not change - /// in between, e.g. that stat.version remained the same. - /// If the version did change - retry (since we cannot do Get and Create requests - /// in the same zookeeper transaction, so we use a while loop with tries). - - auto processed_node = isShardedProcessing() - ? 
zookeeper_processed_path / toString(getProcessingIdForPath(path)) - : zookeeper_processed_path; - - NodeMetadata processed_node_metadata; - Coordination::Stat processed_node_stat; - std::string data; - auto processed_node_exists = zk_client->tryGet(processed_node, data, &processed_node_stat); - if (processed_node_exists && !data.empty()) - processed_node_metadata = NodeMetadata::fromString(data); - - auto max_processed_file_path = processed_node_metadata.file_path; - if (!max_processed_file_path.empty() && path <= max_processed_file_path) - { - LOG_TEST(log, "File {} is already processed, max processed file: {}", path, max_processed_file_path); - return std::pair{SetFileProcessingResult::AlreadyProcessed, nullptr}; - } - - Coordination::Requests requests; - requests.push_back(zkutil::makeCreateRequest(zookeeper_failed_path / node_name, "", zkutil::CreateMode::Persistent)); - requests.push_back(zkutil::makeRemoveRequest(zookeeper_failed_path / node_name, -1)); - - requests.push_back(zkutil::makeCreateRequest(zookeeper_processing_path / node_name, node_metadata.toString(), zkutil::CreateMode::Ephemeral)); - - if (processed_node_exists) - { - requests.push_back(zkutil::makeCheckRequest(processed_node, processed_node_stat.version)); - } - else - { - requests.push_back(zkutil::makeCreateRequest(processed_node, "", zkutil::CreateMode::Persistent)); - requests.push_back(zkutil::makeRemoveRequest(processed_node, -1)); - } - - Coordination::Responses responses; - auto code = zk_client->tryMulti(requests, responses); - if (code == Coordination::Error::ZOK) - { - auto holder = std::make_unique( - node_metadata.processing_id, path, zookeeper_processing_path / node_name, file_status, zk_client, log); - - LOG_TEST(log, "File {} is ready to be processed", path); - return std::pair{SetFileProcessingResult::Success, std::move(holder)}; - } - - if (responses[0]->error != Coordination::Error::ZOK) - { - LOG_TEST(log, "Skipping file `{}`: failed", path); - return std::pair{SetFileProcessingResult::AlreadyFailed, nullptr}; - } - else if (responses[2]->error != Coordination::Error::ZOK) - { - LOG_TEST(log, "Skipping file `{}`: already processing", path); - return std::pair{SetFileProcessingResult::ProcessingByOtherNode, nullptr}; - } - else - { - LOG_TEST(log, "Version of max processed file changed. Retrying the check for file `{}`", path); - } - } -} - -void S3QueueFilesMetadata::setFileProcessed(ProcessingNodeHolderPtr holder) -{ - auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueSetFileProcessedMicroseconds); - auto file_status = holder->getFileStatus(); - { - std::lock_guard lock(file_status->metadata_lock); - file_status->state = FileStatus::State::Processed; - file_status->processing_end_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); - } - - SCOPE_EXIT({ - file_status->profile_counters.increment(ProfileEvents::S3QueueSetFileProcessedMicroseconds, timer.get()); - timer.cancel(); - }); - - switch (mode) - { - case S3QueueMode::ORDERED: - { - setFileProcessedForOrderedMode(holder); - break; - } - case S3QueueMode::UNORDERED: - { - setFileProcessedForUnorderedMode(holder); - break; - } - } - - ProfileEvents::increment(ProfileEvents::S3QueueProcessedFiles); -} - -void S3QueueFilesMetadata::setFileProcessedForUnorderedMode(ProcessingNodeHolderPtr holder) -{ - /// Create a persistent node in /processed and remove ephemeral node from /processing. 
- - const auto & path = holder->path; - const auto node_name = getNodeName(path); - const auto node_metadata = createNodeMetadata(path).toString(); - const auto zk_client = getZooKeeper(); - - Coordination::Requests requests; - requests.push_back(zkutil::makeCreateRequest(zookeeper_processed_path / node_name, node_metadata, zkutil::CreateMode::Persistent)); - - Coordination::Responses responses; - if (holder->remove(&requests, &responses)) - { - LOG_TRACE(log, "Moved file `{}` to processed", path); - if (max_loading_retries) - zk_client->tryRemove(zookeeper_failed_path / (node_name + ".retriable"), -1); - return; - } - - if (!responses.empty() && responses[0]->error != Coordination::Error::ZOK) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Cannot create a persistent node in /processed since it already exists"); - } - - LOG_WARNING(log, - "Cannot set file ({}) as processed since ephemeral node in /processing" - "does not exist with expected id, " - "this could be a result of expired zookeeper session", path); -} - - -void S3QueueFilesMetadata::setFileProcessedForOrderedMode(ProcessingNodeHolderPtr holder) -{ - auto processed_node_path = isShardedProcessing() - ? zookeeper_processed_path / toString(getProcessingIdForPath(holder->path)) - : zookeeper_processed_path; - - setFileProcessedForOrderedModeImpl(holder->path, holder, processed_node_path); -} - -void S3QueueFilesMetadata::setFileProcessedForOrderedModeImpl( - const std::string & path, ProcessingNodeHolderPtr holder, const std::string & processed_node_path) -{ - /// Update a persistent node in /processed and remove ephemeral node from /processing. - - const auto node_name = getNodeName(path); - const auto node_metadata = createNodeMetadata(path).toString(); - const auto zk_client = getZooKeeper(); - - LOG_TRACE(log, "Setting file `{}` as processed (at {})", path, processed_node_path); - while (true) - { - std::string res; - Coordination::Stat stat; - bool exists = zk_client->tryGet(processed_node_path, res, &stat); - Coordination::Requests requests; - if (exists) - { - if (!res.empty()) - { - auto metadata = NodeMetadata::fromString(res); - if (metadata.file_path >= path) - { - LOG_TRACE(log, "File {} is already processed, current max processed file: {}", path, metadata.file_path); - return; - } - } - requests.push_back(zkutil::makeSetRequest(processed_node_path, node_metadata, stat.version)); - } - else - { - requests.push_back(zkutil::makeCreateRequest(processed_node_path, node_metadata, zkutil::CreateMode::Persistent)); - } - - Coordination::Responses responses; - if (holder) - { - if (holder->remove(&requests, &responses)) - { - LOG_TRACE(log, "Moved file `{}` to processed", path); - if (max_loading_retries) - zk_client->tryRemove(zookeeper_failed_path / (node_name + ".retriable"), -1); - return; - } - } - else - { - auto code = zk_client->tryMulti(requests, responses); - if (code == Coordination::Error::ZOK) - { - LOG_TRACE(log, "Moved file `{}` to processed", path); - return; - } - } - - /// Failed to update max processed node, retry. - if (!responses.empty() && responses[0]->error != Coordination::Error::ZOK) - { - LOG_TRACE(log, "Failed to update processed node for path {} ({}). 
Will retry.", - path, magic_enum::enum_name(responses[0]->error)); - continue; - } - - LOG_WARNING(log, "Cannot set file ({}) as processed since processing node " - "does not exist with expected processing id does not exist, " - "this could be a result of expired zookeeper session", path); - return; - } -} - -void S3QueueFilesMetadata::setFileProcessed(const std::string & path, size_t shard_id) -{ - if (mode != S3QueueMode::ORDERED) - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Can set file as preprocessed only for Ordered mode"); - - if (isShardedProcessing()) - { - for (const auto & processor : getProcessingIdsForShard(shard_id)) - setFileProcessedForOrderedModeImpl(path, nullptr, zookeeper_processed_path / toString(processor)); - } - else - { - setFileProcessedForOrderedModeImpl(path, nullptr, zookeeper_processed_path); - } -} - -void S3QueueFilesMetadata::setFileFailed(ProcessingNodeHolderPtr holder, const String & exception_message) -{ - auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueSetFileFailedMicroseconds); - const auto & path = holder->path; - - auto file_status = holder->getFileStatus(); - { - std::lock_guard lock(file_status->metadata_lock); - file_status->state = FileStatus::State::Failed; - file_status->last_exception = exception_message; - file_status->processing_end_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); - } - - ProfileEvents::increment(ProfileEvents::S3QueueFailedFiles); - - SCOPE_EXIT({ - file_status->profile_counters.increment(ProfileEvents::S3QueueSetFileFailedMicroseconds, timer.get()); - timer.cancel(); - }); - - const auto node_name = getNodeName(path); - auto node_metadata = createNodeMetadata(path, exception_message); - const auto zk_client = getZooKeeper(); - - /// Is file retriable? - if (max_loading_retries == 0) - { - /// File is not retriable, - /// just create a node in /failed and remove a node from /processing. - - Coordination::Requests requests; - requests.push_back(zkutil::makeCreateRequest(zookeeper_failed_path / node_name, - node_metadata.toString(), - zkutil::CreateMode::Persistent)); - Coordination::Responses responses; - if (holder->remove(&requests, &responses)) - { - LOG_TRACE(log, "File `{}` failed to process and will not be retried. " - "Error: {}", path, exception_message); - return; - } - - if (responses[0]->error != Coordination::Error::ZOK) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Cannot create a persistent node in /failed since it already exists"); - } - - LOG_WARNING(log, "Cannot set file ({}) as processed since processing node " - "does not exist with expected processing id does not exist, " - "this could be a result of expired zookeeper session", path); - - return; - } - - /// So file is retriable. - /// Let's do an optimization here. - /// Instead of creating a persistent /failed/node_hash node - /// we create a persistent /failed/node_hash.retriable node. - /// This allows us to make less zookeeper requests as we avoid checking - /// the number of already done retries in trySetFileAsProcessing. - - const auto node_name_with_retriable_suffix = node_name + ".retriable"; - Coordination::Stat stat; - std::string res; - - /// Extract the number of already done retries from node_hash.retriable node if it exists. 
- if (zk_client->tryGet(zookeeper_failed_path / node_name_with_retriable_suffix, res, &stat)) - { - auto failed_node_metadata = NodeMetadata::fromString(res); - node_metadata.retries = failed_node_metadata.retries + 1; - - std::lock_guard lock(file_status->metadata_lock); - file_status->retries = node_metadata.retries; - } - - LOG_TRACE(log, "File `{}` failed to process, try {}/{} (Error: {})", - path, node_metadata.retries, max_loading_retries, exception_message); - - /// Check if file can be retried further or not. - if (node_metadata.retries >= max_loading_retries) - { - /// File is no longer retriable. - /// Make a persistent node /failed/node_hash, remove /failed/node_hash.retriable node and node in /processing. - - Coordination::Requests requests; - requests.push_back(zkutil::makeRemoveRequest(zookeeper_processing_path / node_name, -1)); - requests.push_back(zkutil::makeRemoveRequest(zookeeper_failed_path / node_name_with_retriable_suffix, - stat.version)); - requests.push_back(zkutil::makeCreateRequest(zookeeper_failed_path / node_name, - node_metadata.toString(), - zkutil::CreateMode::Persistent)); - - Coordination::Responses responses; - auto code = zk_client->tryMulti(requests, responses); - if (code == Coordination::Error::ZOK) - return; - - throw Exception(ErrorCodes::LOGICAL_ERROR, "Failed to set file as failed"); - } - else - { - /// File is still retriable, update retries count and remove node from /processing. - - Coordination::Requests requests; - requests.push_back(zkutil::makeRemoveRequest(zookeeper_processing_path / node_name, -1)); - if (node_metadata.retries == 0) - { - requests.push_back(zkutil::makeCreateRequest(zookeeper_failed_path / node_name_with_retriable_suffix, - node_metadata.toString(), - zkutil::CreateMode::Persistent)); - } - else - { - requests.push_back(zkutil::makeSetRequest(zookeeper_failed_path / node_name_with_retriable_suffix, - node_metadata.toString(), - stat.version)); - } - Coordination::Responses responses; - auto code = zk_client->tryMulti(requests, responses); - if (code == Coordination::Error::ZOK) - return; - - throw Exception(ErrorCodes::LOGICAL_ERROR, "Failed to set file as failed"); - } -} - -S3QueueFilesMetadata::ProcessingNodeHolder::ProcessingNodeHolder( - const std::string & processing_id_, - const std::string & path_, - const std::string & zk_node_path_, - FileStatusPtr file_status_, - zkutil::ZooKeeperPtr zk_client_, - LoggerPtr logger_) - : zk_client(zk_client_) - , file_status(file_status_) - , path(path_) - , zk_node_path(zk_node_path_) - , processing_id(processing_id_) - , log(logger_) -{ -} - -S3QueueFilesMetadata::ProcessingNodeHolder::~ProcessingNodeHolder() -{ - if (!removed) - remove(); -} - -bool S3QueueFilesMetadata::ProcessingNodeHolder::remove(Coordination::Requests * requests, Coordination::Responses * responses) -{ - if (removed) - throw Exception(ErrorCodes::LOGICAL_ERROR, "Processing node is already removed"); - - LOG_TEST(log, "Removing processing node {} ({})", zk_node_path, path); - - try - { - if (!zk_client->expired()) - { - /// Is is possible that we created an ephemeral processing node - /// but session expired and someone other created an ephemeral processing node. - /// To avoid deleting this new node, check processing_id. 
- std::string res; - Coordination::Stat stat; - if (zk_client->tryGet(zk_node_path, res, &stat)) - { - auto node_metadata = NodeMetadata::fromString(res); - if (node_metadata.processing_id == processing_id) - { - if (requests) - { - requests->push_back(zkutil::makeRemoveRequest(zk_node_path, stat.version)); - auto code = zk_client->tryMulti(*requests, *responses); - removed = code == Coordination::Error::ZOK; - } - else - { - zk_client->remove(zk_node_path); - removed = true; - } - return removed; - } - else - LOG_WARNING(log, "Cannot remove {} since processing id changed: {} -> {}", - zk_node_path, processing_id, node_metadata.processing_id); - } - else - LOG_DEBUG(log, "Cannot remove {}, node doesn't exist, " - "probably because of session expiration", zk_node_path); - - /// TODO: this actually would mean that we already processed (or partially processed) - /// the data but another thread will try processing it again and data can be duplicated. - /// This can be solved via persistenly saving last processed offset in the file. - } - else - { - ProfileEvents::increment(ProfileEvents::CannotRemoveEphemeralNode); - LOG_DEBUG(log, "Cannot remove {} since session has been expired", zk_node_path); - } - } - catch (...) - { - ProfileEvents::increment(ProfileEvents::CannotRemoveEphemeralNode); - LOG_ERROR(log, "Failed to remove processing node for file {}: {}", path, getCurrentExceptionMessage(true)); - } - return false; -} - -void S3QueueFilesMetadata::cleanupThreadFunc() -{ - /// A background task is responsible for maintaining - /// max_set_size and max_set_age settings for `unordered` processing mode. - - if (shutdown) - return; - - try - { - cleanupThreadFuncImpl(); - } - catch (...) - { - LOG_ERROR(log, "Failed to cleanup nodes in zookeeper: {}", getCurrentExceptionMessage(true)); - } - - if (shutdown) - return; - - task->scheduleAfter(generateRescheduleInterval(min_cleanup_interval_ms, max_cleanup_interval_ms)); -} - -void S3QueueFilesMetadata::cleanupThreadFuncImpl() -{ - auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueCleanupMaxSetSizeOrTTLMicroseconds); - const auto zk_client = getZooKeeper(); - - Strings processed_nodes; - auto code = zk_client->tryGetChildren(zookeeper_processed_path, processed_nodes); - if (code != Coordination::Error::ZOK) - { - if (code == Coordination::Error::ZNONODE) - { - LOG_TEST(log, "Path {} does not exist", zookeeper_processed_path.string()); - } - else - throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected error: {}", magic_enum::enum_name(code)); - } - - Strings failed_nodes; - code = zk_client->tryGetChildren(zookeeper_failed_path, failed_nodes); - if (code != Coordination::Error::ZOK) - { - if (code == Coordination::Error::ZNONODE) - { - LOG_TEST(log, "Path {} does not exist", zookeeper_failed_path.string()); - } - else - throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected error: {}", magic_enum::enum_name(code)); - } - - const size_t nodes_num = processed_nodes.size() + failed_nodes.size(); - if (!nodes_num) - { - LOG_TEST(log, "There are neither processed nor failed nodes"); - return; - } - - chassert(max_set_size || max_set_age_sec); - const bool check_nodes_limit = max_set_size > 0; - const bool check_nodes_ttl = max_set_age_sec > 0; - - const bool nodes_limit_exceeded = nodes_num > max_set_size; - if ((!nodes_limit_exceeded || !check_nodes_limit) && !check_nodes_ttl) - { - LOG_TEST(log, "No limit exceeded"); - return; - } - - LOG_TRACE(log, "Will check limits for {} nodes", nodes_num); - - /// Create a lock so 
that with distributed processing - /// multiple nodes do not execute cleanup in parallel. - auto ephemeral_node = zkutil::EphemeralNodeHolder::tryCreate(zookeeper_cleanup_lock_path, *zk_client, toString(getCurrentTime())); - if (!ephemeral_node) - { - LOG_TEST(log, "Cleanup is already being executed by another node"); - return; - } - /// TODO because of this lock we might not update local file statuses on time on one of the nodes. - - struct Node - { - std::string zk_path; - NodeMetadata metadata; - }; - auto node_cmp = [](const Node & a, const Node & b) - { - return std::tie(a.metadata.last_processed_timestamp, a.metadata.file_path) - < std::tie(b.metadata.last_processed_timestamp, b.metadata.file_path); - }; - - /// Ordered in ascending order of timestamps. - std::set sorted_nodes(node_cmp); - - for (const auto & node : processed_nodes) - { - const std::string path = zookeeper_processed_path / node; - try - { - std::string metadata_str; - if (zk_client->tryGet(path, metadata_str)) - { - sorted_nodes.emplace(path, NodeMetadata::fromString(metadata_str)); - LOG_TEST(log, "Fetched metadata for node {}", path); - } - else - LOG_ERROR(log, "Failed to fetch node metadata {}", path); - } - catch (const zkutil::KeeperException & e) - { - if (e.code != Coordination::Error::ZCONNECTIONLOSS) - { - LOG_WARNING(log, "Unexpected exception: {}", getCurrentExceptionMessage(true)); - chassert(false); - } - - /// Will retry with a new zk connection. - throw; - } - } - - for (const auto & node : failed_nodes) - { - const std::string path = zookeeper_failed_path / node; - try - { - std::string metadata_str; - if (zk_client->tryGet(path, metadata_str)) - { - sorted_nodes.emplace(path, NodeMetadata::fromString(metadata_str)); - LOG_TEST(log, "Fetched metadata for node {}", path); - } - else - LOG_ERROR(log, "Failed to fetch node metadata {}", path); - } - catch (const zkutil::KeeperException & e) - { - if (e.code != Coordination::Error::ZCONNECTIONLOSS) - { - LOG_WARNING(log, "Unexpected exception: {}", getCurrentExceptionMessage(true)); - chassert(false); - } - - /// Will retry with a new zk connection. - throw; - } - } - - auto get_nodes_str = [&]() - { - WriteBufferFromOwnString wb; - for (const auto & [node, metadata] : sorted_nodes) - wb << fmt::format("Node: {}, path: {}, timestamp: {};\n", node, metadata.file_path, metadata.last_processed_timestamp); - return wb.str(); - }; - LOG_TEST(log, "Checking node limits (max size: {}, max age: {}) for {}", max_set_size, max_set_age_sec, get_nodes_str()); - - size_t nodes_to_remove = check_nodes_limit && nodes_limit_exceeded ? 
nodes_num - max_set_size : 0; - for (const auto & node : sorted_nodes) - { - if (nodes_to_remove) - { - LOG_TRACE(log, "Removing node at path {} ({}) because max files limit is reached", - node.metadata.file_path, node.zk_path); - - local_file_statuses.remove(node.metadata.file_path, /* if_exists */true); - - code = zk_client->tryRemove(node.zk_path); - if (code == Coordination::Error::ZOK) - --nodes_to_remove; - else - LOG_ERROR(log, "Failed to remove a node `{}` (code: {})", node.zk_path, code); - } - else if (check_nodes_ttl) - { - UInt64 node_age = getCurrentTime() - node.metadata.last_processed_timestamp; - if (node_age >= max_set_age_sec) - { - LOG_TRACE(log, "Removing node at path {} ({}) because file is reached", - node.metadata.file_path, node.zk_path); - - local_file_statuses.remove(node.metadata.file_path, /* if_exists */true); - - code = zk_client->tryRemove(node.zk_path); - if (code != Coordination::Error::ZOK) - LOG_ERROR(log, "Failed to remove a node `{}` (code: {})", node.zk_path, code); - } - else if (!nodes_to_remove) - { - /// Nodes limit satisfied. - /// Nodes ttl satisfied as well as if current node is under tll, then all remaining as well - /// (because we are iterating in timestamp ascending order). - break; - } - } - else - { - /// Nodes limit and ttl are satisfied. - break; - } - } - - LOG_TRACE(log, "Node limits check finished"); -} - -bool S3QueueFilesMetadata::checkSettings(const S3QueueSettings & settings) const -{ - return mode == settings.mode - && max_set_size == settings.s3queue_tracked_files_limit.value - && max_set_age_sec == settings.s3queue_tracked_file_ttl_sec.value - && max_loading_retries == settings.s3queue_loading_retries.value - && min_cleanup_interval_ms == settings.s3queue_cleanup_interval_min_ms.value - && max_cleanup_interval_ms == settings.s3queue_cleanup_interval_max_ms.value; -} - -} diff --git a/src/Storages/S3Queue/S3QueueFilesMetadata.h b/src/Storages/S3Queue/S3QueueFilesMetadata.h deleted file mode 100644 index e26af1d25c5..00000000000 --- a/src/Storages/S3Queue/S3QueueFilesMetadata.h +++ /dev/null @@ -1,215 +0,0 @@ -#pragma once -#include "config.h" - -#include -#include -#include -#include -#include - -namespace fs = std::filesystem; -namespace Poco { class Logger; } - -namespace DB -{ -struct S3QueueSettings; -class StorageS3Queue; - -/** - * A class for managing S3Queue metadata in zookeeper, e.g. - * the following folders: - * - /processing - * - /processed - * - /failed - * - * Depending on S3Queue processing mode (ordered or unordered) - * we can differently store metadata in /processed node. - * - * Implements caching of zookeeper metadata for faster responses. - * Cached part is located in LocalFileStatuses. - * - * In case of Unordered mode - if files TTL is enabled or maximum tracked files limit is set - * starts a background cleanup thread which is responsible for maintaining them. 
- */ -class S3QueueFilesMetadata -{ -public: - class ProcessingNodeHolder; - using ProcessingNodeHolderPtr = std::shared_ptr; - - S3QueueFilesMetadata(const fs::path & zookeeper_path_, const S3QueueSettings & settings_); - - ~S3QueueFilesMetadata(); - - void setFileProcessed(ProcessingNodeHolderPtr holder); - void setFileProcessed(const std::string & path, size_t shard_id); - - void setFileFailed(ProcessingNodeHolderPtr holder, const std::string & exception_message); - - struct FileStatus - { - enum class State : uint8_t - { - Processing, - Processed, - Failed, - None - }; - State state = State::None; - - std::atomic processed_rows = 0; - time_t processing_start_time = 0; - time_t processing_end_time = 0; - size_t retries = 0; - std::string last_exception; - ProfileEvents::Counters profile_counters; - - std::mutex processing_lock; - std::mutex metadata_lock; - }; - using FileStatusPtr = std::shared_ptr; - using FileStatuses = std::unordered_map; - - /// Set file as processing, if it is not alreaty processed, failed or processing. - ProcessingNodeHolderPtr trySetFileAsProcessing(const std::string & path); - - FileStatusPtr getFileStatus(const std::string & path); - - FileStatuses getFileStateses() const { return local_file_statuses.getAll(); } - - bool checkSettings(const S3QueueSettings & settings) const; - - void deactivateCleanupTask(); - - /// Should the table use sharded processing? - /// We use sharded processing for Ordered mode of S3Queue table. - /// It allows to parallelize processing within a single server - /// and to allow distributed processing. - bool isShardedProcessing() const; - - /// Register a new shard for processing. - /// Return a shard id of registered shard. - size_t registerNewShard(); - /// Register a new shard for processing by given id. - /// Throws exception if shard by this id is already registered. - void registerNewShard(size_t shard_id); - /// Unregister shard from keeper. - void unregisterShard(size_t shard_id); - bool isShardRegistered(size_t shard_id); - - /// Total number of processing ids. - /// A processing id identifies a single processing thread. - /// There might be several processing ids per shard. - size_t getProcessingIdsNum() const; - /// Get processing ids identified with requested shard. - std::vector getProcessingIdsForShard(size_t shard_id) const; - /// Check if given processing id belongs to a given shard. - bool isProcessingIdBelongsToShard(size_t id, size_t shard_id) const; - /// Get a processing id for processing thread by given thread id. - /// thread id is a value in range [0, threads_per_shard]. - size_t getIdForProcessingThread(size_t thread_id, size_t shard_id) const; - - /// Calculate which processing id corresponds to a given file path. - /// The file will be processed by a thread related to this processing id. 
- size_t getProcessingIdForPath(const std::string & path) const; - -private: - const S3QueueMode mode; - const UInt64 max_set_size; - const UInt64 max_set_age_sec; - const UInt64 max_loading_retries; - const size_t min_cleanup_interval_ms; - const size_t max_cleanup_interval_ms; - const size_t shards_num; - const size_t threads_per_shard; - - const fs::path zookeeper_processing_path; - const fs::path zookeeper_processed_path; - const fs::path zookeeper_failed_path; - const fs::path zookeeper_shards_path; - const fs::path zookeeper_cleanup_lock_path; - - LoggerPtr log; - - std::atomic_bool shutdown = false; - BackgroundSchedulePool::TaskHolder task; - - std::string getNodeName(const std::string & path); - - zkutil::ZooKeeperPtr getZooKeeper() const; - - void setFileProcessedForOrderedMode(ProcessingNodeHolderPtr holder); - void setFileProcessedForUnorderedMode(ProcessingNodeHolderPtr holder); - std::string getZooKeeperPathForShard(size_t shard_id) const; - - void setFileProcessedForOrderedModeImpl( - const std::string & path, ProcessingNodeHolderPtr holder, const std::string & processed_node_path); - - enum class SetFileProcessingResult : uint8_t - { - Success, - ProcessingByOtherNode, - AlreadyProcessed, - AlreadyFailed, - }; - std::pair trySetFileAsProcessingForOrderedMode(const std::string & path, const FileStatusPtr & file_status); - std::pair trySetFileAsProcessingForUnorderedMode(const std::string & path, const FileStatusPtr & file_status); - - struct NodeMetadata - { - std::string file_path; UInt64 last_processed_timestamp = 0; - std::string last_exception; - UInt64 retries = 0; - std::string processing_id; /// For ephemeral processing node. - - std::string toString() const; - static NodeMetadata fromString(const std::string & metadata_str); - }; - - NodeMetadata createNodeMetadata(const std::string & path, const std::string & exception = "", size_t retries = 0); - - void cleanupThreadFunc(); - void cleanupThreadFuncImpl(); - - struct LocalFileStatuses - { - FileStatuses file_statuses; - mutable std::mutex mutex; - - FileStatuses getAll() const; - FileStatusPtr get(const std::string & filename, bool create); - bool remove(const std::string & filename, bool if_exists); - std::unique_lock lock() const; - }; - LocalFileStatuses local_file_statuses; -}; - -class S3QueueFilesMetadata::ProcessingNodeHolder -{ - friend class S3QueueFilesMetadata; -public: - ProcessingNodeHolder( - const std::string & processing_id_, - const std::string & path_, - const std::string & zk_node_path_, - FileStatusPtr file_status_, - zkutil::ZooKeeperPtr zk_client_, - LoggerPtr logger_); - - ~ProcessingNodeHolder(); - - FileStatusPtr getFileStatus() { return file_status; } - -private: - bool remove(Coordination::Requests * requests = nullptr, Coordination::Responses * responses = nullptr); - - zkutil::ZooKeeperPtr zk_client; - FileStatusPtr file_status; - std::string path; - std::string zk_node_path; - std::string processing_id; - bool removed = false; - LoggerPtr log; -}; - -} diff --git a/src/Storages/S3Queue/S3QueueIFileMetadata.cpp b/src/Storages/S3Queue/S3QueueIFileMetadata.cpp new file mode 100644 index 00000000000..6c4089115d4 --- /dev/null +++ b/src/Storages/S3Queue/S3QueueIFileMetadata.cpp @@ -0,0 +1,354 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace ProfileEvents +{ + extern const Event S3QueueProcessedFiles; + extern const Event S3QueueFailedFiles; +}; + +namespace DB +{ +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + 
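/// S3QueueIFileMetadata is the per-file metadata base class shared by the Ordered and
/// Unordered implementations. For a single file it owns the three keeper paths
/// (processing / processed / failed), the shared in-memory FileStatus entry and the
/// retry counter, and it drives the state transitions
/// None -> Processing (setProcessing) -> Processed (setProcessed) or Failed (setFailed),
/// while the mode-specific keeper logic lives in setProcessingImpl() / setProcessedImpl().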
+namespace +{ + zkutil::ZooKeeperPtr getZooKeeper() + { + return Context::getGlobalContextInstance()->getZooKeeper(); + } + + time_t now() + { + return std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); + } +} + +void S3QueueIFileMetadata::FileStatus::onProcessing() +{ + state = FileStatus::State::Processing; + processing_start_time = now(); +} + +void S3QueueIFileMetadata::FileStatus::onProcessed() +{ + state = FileStatus::State::Processed; + processing_end_time = now(); +} + +void S3QueueIFileMetadata::FileStatus::onFailed(const std::string & exception) +{ + state = FileStatus::State::Failed; + processing_end_time = now(); + std::lock_guard lock(last_exception_mutex); + last_exception = exception; +} + +std::string S3QueueIFileMetadata::FileStatus::getException() const +{ + std::lock_guard lock(last_exception_mutex); + return last_exception; +} + +std::string S3QueueIFileMetadata::NodeMetadata::toString() const +{ + Poco::JSON::Object json; + json.set("file_path", file_path); + json.set("last_processed_timestamp", now()); + json.set("last_exception", last_exception); + json.set("retries", retries); + json.set("processing_id", processing_id); + + std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM + oss.exceptions(std::ios::failbit); + Poco::JSON::Stringifier::stringify(json, oss); + return oss.str(); +} + +S3QueueIFileMetadata::NodeMetadata S3QueueIFileMetadata::NodeMetadata::fromString(const std::string & metadata_str) +{ + Poco::JSON::Parser parser; + auto json = parser.parse(metadata_str).extract(); + chassert(json); + + NodeMetadata metadata; + metadata.file_path = json->getValue("file_path"); + metadata.last_processed_timestamp = json->getValue("last_processed_timestamp"); + metadata.last_exception = json->getValue("last_exception"); + metadata.retries = json->getValue("retries"); + metadata.processing_id = json->getValue("processing_id"); + return metadata; +} + +S3QueueIFileMetadata::S3QueueIFileMetadata( + const std::string & path_, + const std::string & processing_node_path_, + const std::string & processed_node_path_, + const std::string & failed_node_path_, + FileStatusPtr file_status_, + size_t max_loading_retries_, + LoggerPtr log_) + : path(path_) + , node_name(getNodeName(path_)) + , file_status(file_status_) + , max_loading_retries(max_loading_retries_) + , processing_node_path(processing_node_path_) + , processed_node_path(processed_node_path_) + , failed_node_path(failed_node_path_) + , node_metadata(createNodeMetadata(path)) + , log(log_) + , processing_node_id_path(processing_node_path + "_processing_id") +{ + LOG_TEST(log, "Path: {}, node_name: {}, max_loading_retries: {}, " + "processed_path: {}, processing_path: {}, failed_path: {}", + path, node_name, max_loading_retries, + processed_node_path, processing_node_path, failed_node_path); +} + +S3QueueIFileMetadata::~S3QueueIFileMetadata() +{ + if (processing_id_version.has_value()) + { + file_status->onFailed("Uncaught exception"); + LOG_TEST(log, "Removing processing node in destructor for file: {}", path); + try + { + auto zk_client = getZooKeeper(); + + Coordination::Requests requests; + requests.push_back(zkutil::makeCheckRequest(processing_node_id_path, processing_id_version.value())); + requests.push_back(zkutil::makeRemoveRequest(processing_node_path, -1)); + + Coordination::Responses responses; + const auto code = zk_client->tryMulti(requests, responses); + if (code != Coordination::Error::ZOK + && !Coordination::isHardwareError(code) + && code != 
Coordination::Error::ZBADVERSION + && code != Coordination::Error::ZNONODE) + { + LOG_WARNING(log, "Unexpected error while removing processing node: {}", code); + chassert(false); + } + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + } + } +} + +std::string S3QueueIFileMetadata::getNodeName(const std::string & path) +{ + /// Since with are dealing with paths in s3 which can have "/", + /// we cannot create a zookeeper node with the name equal to path. + /// Therefore we use a hash of the path as a node name. + + SipHash path_hash; + path_hash.update(path); + return toString(path_hash.get64()); +} + +S3QueueIFileMetadata::NodeMetadata S3QueueIFileMetadata::createNodeMetadata( + const std::string & path, + const std::string & exception, + size_t retries) +{ + /// Create a metadata which will be stored in a node named as getNodeName(path). + + /// Since node name is just a hash we want to know to which file it corresponds, + /// so we keep "file_path" in nodes data. + /// "last_processed_timestamp" is needed for TTL metadata nodes enabled by s3queue_tracked_file_ttl_sec. + /// "last_exception" is kept for introspection, should also be visible in system.s3queue_log if it is enabled. + /// "retries" is kept for retrying the processing enabled by s3queue_loading_retries. + NodeMetadata metadata; + metadata.file_path = path; + metadata.last_processed_timestamp = now(); + metadata.last_exception = exception; + metadata.retries = retries; + return metadata; +} + +std::string S3QueueIFileMetadata::getProcessorInfo(const std::string & processor_id) +{ + /// Add information which will be useful for debugging just in case. + Poco::JSON::Object json; + json.set("hostname", DNSResolver::instance().getHostName()); + json.set("processor_id", processor_id); + + std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM + oss.exceptions(std::ios::failbit); + Poco::JSON::Stringifier::stringify(json, oss); + return oss.str(); +} + +bool S3QueueIFileMetadata::setProcessing() +{ + auto state = file_status->state.load(); + if (state == FileStatus::State::Processing + || state == FileStatus::State::Processed + || (state == FileStatus::State::Failed && file_status->retries >= max_loading_retries)) + { + LOG_TEST(log, "File {} has non-processable state `{}`", path, file_status->state.load()); + return false; + } + + /// An optimization for local parallel processing. + std::unique_lock processing_lock(file_status->processing_lock, std::defer_lock); + if (!processing_lock.try_lock()) + return {}; + + auto [success, file_state] = setProcessingImpl(); + if (success) + file_status->onProcessing(); + else + file_status->updateState(file_state); + + LOG_TEST(log, "File {} has state `{}`: will {}process (processing id version: {})", + path, file_state, success ? "" : "not ", + processing_id_version.has_value() ? 
toString(processing_id_version.value()) : "None"); + + return success; +} + +void S3QueueIFileMetadata::setProcessed() +{ + LOG_TRACE(log, "Setting file {} as processed (path: {})", path, processed_node_path); + + ProfileEvents::increment(ProfileEvents::S3QueueProcessedFiles); + file_status->onProcessed(); + setProcessedImpl(); + + processing_id.reset(); + processing_id_version.reset(); + + LOG_TRACE(log, "Set file {} as processed (rows: {})", path, file_status->processed_rows); +} + +void S3QueueIFileMetadata::setFailed(const std::string & exception) +{ + LOG_TRACE(log, "Setting file {} as failed (exception: {}, path: {})", path, exception, failed_node_path); + + ProfileEvents::increment(ProfileEvents::S3QueueFailedFiles); + file_status->onFailed(exception); + node_metadata.last_exception = exception; + + if (max_loading_retries == 0) + setFailedNonRetriable(); + else + setFailedRetriable(); + + processing_id.reset(); + processing_id_version.reset(); + + LOG_TRACE(log, "Set file {} as failed (rows: {})", path, file_status->processed_rows); +} + +void S3QueueIFileMetadata::setFailedNonRetriable() +{ + auto zk_client = getZooKeeper(); + Coordination::Requests requests; + requests.push_back(zkutil::makeCreateRequest(failed_node_path, node_metadata.toString(), zkutil::CreateMode::Persistent)); + requests.push_back(zkutil::makeRemoveRequest(processing_node_path, -1)); + + Coordination::Responses responses; + const auto code = zk_client->tryMulti(requests, responses); + if (code == Coordination::Error::ZOK) + { + LOG_TRACE(log, "File `{}` failed to process and will not be retried. ", path); + return; + } + + if (Coordination::isHardwareError(responses[0]->error)) + { + LOG_WARNING(log, "Cannot set file as failed: lost connection to keeper"); + return; + } + + if (responses[0]->error == Coordination::Error::ZNODEEXISTS) + { + LOG_WARNING(log, "Cannot create a persistent node in /failed since it already exists"); + chassert(false); + return; + } + + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected error while setting file as failed: {}", code); +} + +void S3QueueIFileMetadata::setFailedRetriable() +{ + /// Instead of creating a persistent /failed/node_hash node + /// we create a persistent /failed/node_hash.retriable node. + /// This allows us to make less zookeeper requests as we avoid checking + /// the number of already done retries in trySetFileAsProcessing. + + auto retrieable_failed_node_path = failed_node_path + ".retriable"; + auto zk_client = getZooKeeper(); + + /// Extract the number of already done retries from node_hash.retriable node if it exists. + Coordination::Stat stat; + std::string res; + if (zk_client->tryGet(retrieable_failed_node_path, res, &stat)) + { + auto failed_node_metadata = NodeMetadata::fromString(res); + node_metadata.retries = failed_node_metadata.retries + 1; + file_status->retries = node_metadata.retries; + } + + LOG_TRACE(log, "File `{}` failed to process, try {}/{}", + path, node_metadata.retries, max_loading_retries); + + Coordination::Requests requests; + if (node_metadata.retries >= max_loading_retries) + { + /// File is no longer retriable. + /// Make a persistent node /failed/node_hash, + /// remove /failed/node_hash.retriable node and node in /processing. 
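/// The requests collected here are committed as one keeper transaction (tryMulti below),
/// so the file can never be observed as both still processing and finally failed.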
+ + requests.push_back(zkutil::makeRemoveRequest(processing_node_path, -1)); + requests.push_back(zkutil::makeRemoveRequest(retrieable_failed_node_path, stat.version)); + requests.push_back( + zkutil::makeCreateRequest( + failed_node_path, node_metadata.toString(), zkutil::CreateMode::Persistent)); + + } + else + { + /// File is still retriable, + /// update retries count and remove node from /processing. + + requests.push_back(zkutil::makeRemoveRequest(processing_node_path, -1)); + if (node_metadata.retries == 0) + { + requests.push_back( + zkutil::makeCreateRequest( + retrieable_failed_node_path, node_metadata.toString(), zkutil::CreateMode::Persistent)); + } + else + { + requests.push_back( + zkutil::makeSetRequest( + retrieable_failed_node_path, node_metadata.toString(), stat.version)); + } + } + + Coordination::Responses responses; + auto code = zk_client->tryMulti(requests, responses); + if (code == Coordination::Error::ZOK) + return; + + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Failed to set file {} as failed (code: {})", path, code); +} + +} diff --git a/src/Storages/S3Queue/S3QueueIFileMetadata.h b/src/Storages/S3Queue/S3QueueIFileMetadata.h new file mode 100644 index 00000000000..e0b0d16cbcc --- /dev/null +++ b/src/Storages/S3Queue/S3QueueIFileMetadata.h @@ -0,0 +1,114 @@ +#pragma once +#include +#include +#include + +namespace DB +{ + +class S3QueueIFileMetadata +{ +public: + struct FileStatus + { + enum class State : uint8_t + { + Processing, + Processed, + Failed, + None + }; + + void onProcessing(); + void onProcessed(); + void onFailed(const std::string & exception); + void updateState(State state_) { state = state_; } + + std::string getException() const; + + std::mutex processing_lock; + + std::atomic state = State::None; + std::atomic processed_rows = 0; + std::atomic processing_start_time = 0; + std::atomic processing_end_time = 0; + std::atomic retries = 0; + ProfileEvents::Counters profile_counters; + + private: + mutable std::mutex last_exception_mutex; + std::string last_exception; + }; + using FileStatusPtr = std::shared_ptr; + + explicit S3QueueIFileMetadata( + const std::string & path_, + const std::string & processing_node_path_, + const std::string & processed_node_path_, + const std::string & failed_node_path_, + FileStatusPtr file_status_, + size_t max_loading_retries_, + LoggerPtr log_); + + virtual ~S3QueueIFileMetadata(); + + bool setProcessing(); + void setProcessed(); + void setFailed(const std::string & exception); + + virtual void setProcessedAtStartRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr & zk_client) = 0; + + FileStatusPtr getFileStatus() { return file_status; } + + struct NodeMetadata + { + std::string file_path; UInt64 last_processed_timestamp = 0; + std::string last_exception; + UInt64 retries = 0; + std::string processing_id; /// For ephemeral processing node. 
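/// Serialized to and from a small JSON document, e.g. (field order and values are illustrative):
///   {"file_path":"data/part-0001.csv","last_processed_timestamp":1718000000,
///    "last_exception":"","retries":0,"processing_id":"kqmzwexrvt"}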
+ + std::string toString() const; + static NodeMetadata fromString(const std::string & metadata_str); + }; + +protected: + virtual std::pair setProcessingImpl() = 0; + virtual void setProcessedImpl() = 0; + void setFailedNonRetriable(); + void setFailedRetriable(); + + const std::string path; + const std::string node_name; + const FileStatusPtr file_status; + const size_t max_loading_retries; + + const std::string processing_node_path; + const std::string processed_node_path; + const std::string failed_node_path; + + NodeMetadata node_metadata; + LoggerPtr log; + + /// processing node is ephemeral, so we cannot verify with it if + /// this node was created by a certain processor on a previous s3 queue processing stage, + /// because we could get a session expired in between the stages + /// and someone else could just create this processing node. + /// Therefore we also create a persistent processing node + /// which is updated on each creation of ephemeral processing node. + /// We use the version of this node to verify the version of the processing ephemeral node. + const std::string processing_node_id_path; + /// Id of the processor. + std::optional processing_id; + /// Version of the processing id persistent node. + std::optional processing_id_version; + + static std::string getNodeName(const std::string & path); + + static NodeMetadata createNodeMetadata(const std::string & path, const std::string & exception = {}, size_t retries = 0); + + static std::string getProcessorInfo(const std::string & processor_id); +}; + +} diff --git a/src/Storages/S3Queue/S3QueueMetadata.cpp b/src/Storages/S3Queue/S3QueueMetadata.cpp new file mode 100644 index 00000000000..f4c8c5c5ef2 --- /dev/null +++ b/src/Storages/S3Queue/S3QueueMetadata.cpp @@ -0,0 +1,485 @@ +#include "config.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace ProfileEvents +{ + extern const Event S3QueueSetFileProcessingMicroseconds; + extern const Event S3QueueSetFileProcessedMicroseconds; + extern const Event S3QueueSetFileFailedMicroseconds; + extern const Event S3QueueFailedFiles; + extern const Event S3QueueProcessedFiles; + extern const Event S3QueueCleanupMaxSetSizeOrTTLMicroseconds; + extern const Event S3QueueLockLocalFileStatusesMicroseconds; +}; + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; + extern const int BAD_ARGUMENTS; + extern const int REPLICA_ALREADY_EXISTS; + extern const int INCOMPATIBLE_COLUMNS; +} + +namespace +{ + UInt64 getCurrentTime() + { + return std::chrono::duration_cast(std::chrono::system_clock::now().time_since_epoch()).count(); + } + + size_t generateRescheduleInterval(size_t min, size_t max) + { + /// Use more or less random interval for unordered mode cleanup task. + /// So that distributed processing cleanup tasks would not schedule cleanup at the same time. 
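/// For example, with s3queue_cleanup_interval_min_ms = 10000 and
/// s3queue_cleanup_interval_max_ms = 30000 the next run lands at a (roughly uniformly)
/// random point in the inclusive range [10000, 30000] ms.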
+ pcg64 rng(randomSeed()); + return min + rng() % (max - min + 1); + } + + zkutil::ZooKeeperPtr getZooKeeper() + { + return Context::getGlobalContextInstance()->getZooKeeper(); + } +} + +class S3QueueMetadata::LocalFileStatuses +{ +public: + LocalFileStatuses() = default; + + FileStatuses getAll() const + { + auto lk = lock(); + return file_statuses; + } + + FileStatusPtr get(const std::string & filename, bool create) + { + auto lk = lock(); + auto it = file_statuses.find(filename); + if (it == file_statuses.end()) + { + if (create) + it = file_statuses.emplace(filename, std::make_shared()).first; + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, "File status for {} doesn't exist", filename); + } + return it->second; + } + + bool remove(const std::string & filename, bool if_exists) + { + auto lk = lock(); + auto it = file_statuses.find(filename); + if (it == file_statuses.end()) + { + if (if_exists) + return false; + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, "File status for {} doesn't exist", filename); + } + file_statuses.erase(it); + return true; + } + +private: + FileStatuses file_statuses; + mutable std::mutex mutex; + + std::unique_lock lock() const + { + auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueLockLocalFileStatusesMicroseconds); + return std::unique_lock(mutex); + } +}; + +S3QueueMetadata::S3QueueMetadata(const fs::path & zookeeper_path_, const S3QueueSettings & settings_) + : settings(settings_) + , zookeeper_path(zookeeper_path_) + , buckets_num(getBucketsNum(settings_)) + , log(getLogger("StorageS3Queue(" + zookeeper_path_.string() + ")")) + , local_file_statuses(std::make_shared()) +{ + if (settings.mode == S3QueueMode::UNORDERED + && (settings.s3queue_tracked_files_limit || settings.s3queue_tracked_file_ttl_sec)) + { + task = Context::getGlobalContextInstance()->getSchedulePool().createTask( + "S3QueueCleanupFunc", + [this] { cleanupThreadFunc(); }); + + task->activate(); + task->scheduleAfter( + generateRescheduleInterval( + settings.s3queue_cleanup_interval_min_ms, settings.s3queue_cleanup_interval_max_ms)); + } +} + +S3QueueMetadata::~S3QueueMetadata() +{ + shutdown(); +} + +void S3QueueMetadata::shutdown() +{ + shutdown_called = true; + if (task) + task->deactivate(); +} + +void S3QueueMetadata::checkSettings(const S3QueueSettings & settings_) const +{ + S3QueueTableMetadata::checkEquals(settings, settings_); +} + +S3QueueMetadata::FileStatusPtr S3QueueMetadata::getFileStatus(const std::string & path) +{ + return local_file_statuses->get(path, /* create */false); +} + +S3QueueMetadata::FileStatuses S3QueueMetadata::getFileStatuses() const +{ + return local_file_statuses->getAll(); +} + +S3QueueMetadata::FileMetadataPtr S3QueueMetadata::getFileMetadata( + const std::string & path, + S3QueueOrderedFileMetadata::BucketInfoPtr bucket_info) +{ + auto file_status = local_file_statuses->get(path, /* create */true); + switch (settings.mode) + { + case S3QueueMode::ORDERED: + return std::make_shared( + zookeeper_path, + path, + file_status, + bucket_info, + buckets_num, + settings.s3queue_loading_retries, + log); + case S3QueueMode::UNORDERED: + return std::make_shared( + zookeeper_path, + path, + file_status, + settings.s3queue_loading_retries, + log); + } +} + +size_t S3QueueMetadata::getBucketsNum(const S3QueueSettings & settings) +{ + if (settings.s3queue_buckets) + return settings.s3queue_buckets; + if (settings.s3queue_processing_threads_num) + return settings.s3queue_processing_threads_num; + return 0; +} + +size_t 
S3QueueMetadata::getBucketsNum(const S3QueueTableMetadata & settings) +{ + if (settings.buckets) + return settings.buckets; + if (settings.processing_threads_num) + return settings.processing_threads_num; + return 0; +} + +bool S3QueueMetadata::useBucketsForProcessing() const +{ + return settings.mode == S3QueueMode::ORDERED && (buckets_num > 1); +} + +S3QueueMetadata::Bucket S3QueueMetadata::getBucketForPath(const std::string & path) const +{ + return S3QueueOrderedFileMetadata::getBucketForPath(path, buckets_num); +} + +S3QueueOrderedFileMetadata::BucketHolderPtr +S3QueueMetadata::tryAcquireBucket(const Bucket & bucket, const Processor & processor) +{ + return S3QueueOrderedFileMetadata::tryAcquireBucket(zookeeper_path, bucket, processor); +} + +void S3QueueMetadata::initialize( + const ConfigurationPtr & configuration, + const StorageInMemoryMetadata & storage_metadata) +{ + const auto metadata_from_table = S3QueueTableMetadata(*configuration, settings, storage_metadata); + const auto & columns_from_table = storage_metadata.getColumns(); + const auto table_metadata_path = zookeeper_path / "metadata"; + const auto metadata_paths = settings.mode == S3QueueMode::ORDERED + ? S3QueueOrderedFileMetadata::getMetadataPaths(buckets_num) + : S3QueueUnorderedFileMetadata::getMetadataPaths(); + + auto zookeeper = getZooKeeper(); + zookeeper->createAncestors(zookeeper_path); + + for (size_t i = 0; i < 1000; ++i) + { + if (zookeeper->exists(table_metadata_path)) + { + const auto metadata_from_zk = S3QueueTableMetadata::parse(zookeeper->get(fs::path(zookeeper_path) / "metadata")); + const auto columns_from_zk = ColumnsDescription::parse(metadata_from_zk.columns); + + metadata_from_table.checkEquals(metadata_from_zk); + if (columns_from_zk != columns_from_table) + { + throw Exception( + ErrorCodes::INCOMPATIBLE_COLUMNS, + "Table columns structure in ZooKeeper is different from local table structure. " + "Local columns:\n{}\nZookeeper columns:\n{}", + columns_from_table.toString(), columns_from_zk.toString()); + } + return; + } + + Coordination::Requests requests; + requests.emplace_back(zkutil::makeCreateRequest(zookeeper_path, "", zkutil::CreateMode::Persistent)); + requests.emplace_back(zkutil::makeCreateRequest(table_metadata_path, metadata_from_table.toString(), zkutil::CreateMode::Persistent)); + + for (const auto & path : metadata_paths) + { + const auto zk_path = zookeeper_path / path; + requests.emplace_back(zkutil::makeCreateRequest(zk_path, "", zkutil::CreateMode::Persistent)); + } + + if (!settings.s3queue_last_processed_path.value.empty()) + getFileMetadata(settings.s3queue_last_processed_path)->setProcessedAtStartRequests(requests, zookeeper); + + Coordination::Responses responses; + auto code = zookeeper->tryMulti(requests, responses); + if (code == Coordination::Error::ZNODEEXISTS) + { + auto exception = zkutil::KeeperMultiException(code, requests, responses); + LOG_INFO(log, "Got code `{}` for path: {}. 
" + "It looks like the table {} was created by another server at the same moment, " + "will retry", code, exception.getPathForFirstFailedOp(), zookeeper_path.string()); + continue; + } + else if (code != Coordination::Error::ZOK) + zkutil::KeeperMultiException::check(code, requests, responses); + + return; + } + + throw Exception( + ErrorCodes::REPLICA_ALREADY_EXISTS, + "Cannot create table, because it is created concurrently every time or because " + "of wrong zookeeper path or because of logical error"); +} + +void S3QueueMetadata::cleanupThreadFunc() +{ + /// A background task is responsible for maintaining + /// settings.s3queue_tracked_files_limit and max_set_age settings for `unordered` processing mode. + + if (shutdown_called) + return; + + try + { + cleanupThreadFuncImpl(); + } + catch (...) + { + LOG_ERROR(log, "Failed to cleanup nodes in zookeeper: {}", getCurrentExceptionMessage(true)); + } + + if (shutdown_called) + return; + + task->scheduleAfter( + generateRescheduleInterval( + settings.s3queue_cleanup_interval_min_ms, settings.s3queue_cleanup_interval_max_ms)); +} + +void S3QueueMetadata::cleanupThreadFuncImpl() +{ + auto timer = DB::CurrentThread::getProfileEvents().timer(ProfileEvents::S3QueueCleanupMaxSetSizeOrTTLMicroseconds); + const auto zk_client = getZooKeeper(); + const fs::path zookeeper_processed_path = zookeeper_path / "processed"; + const fs::path zookeeper_failed_path = zookeeper_path / "failed"; + const fs::path zookeeper_cleanup_lock_path = zookeeper_path / "cleanup_lock"; + + Strings processed_nodes; + auto code = zk_client->tryGetChildren(zookeeper_processed_path, processed_nodes); + if (code != Coordination::Error::ZOK) + { + if (code == Coordination::Error::ZNONODE) + { + LOG_TEST(log, "Path {} does not exist", zookeeper_processed_path.string()); + } + else + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected error: {}", magic_enum::enum_name(code)); + } + + Strings failed_nodes; + code = zk_client->tryGetChildren(zookeeper_failed_path, failed_nodes); + if (code != Coordination::Error::ZOK) + { + if (code == Coordination::Error::ZNONODE) + { + LOG_TEST(log, "Path {} does not exist", zookeeper_failed_path.string()); + } + else + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected error: {}", magic_enum::enum_name(code)); + } + + const size_t nodes_num = processed_nodes.size() + failed_nodes.size(); + if (!nodes_num) + { + LOG_TEST(log, "There are neither processed nor failed nodes (in {} and in {})", + zookeeper_processed_path.string(), zookeeper_failed_path.string()); + return; + } + + chassert(settings.s3queue_tracked_files_limit || settings.s3queue_tracked_file_ttl_sec); + const bool check_nodes_limit = settings.s3queue_tracked_files_limit > 0; + const bool check_nodes_ttl = settings.s3queue_tracked_file_ttl_sec > 0; + + const bool nodes_limit_exceeded = nodes_num > settings.s3queue_tracked_files_limit; + if ((!nodes_limit_exceeded || !check_nodes_limit) && !check_nodes_ttl) + { + LOG_TEST(log, "No limit exceeded"); + return; + } + + LOG_TRACE(log, "Will check limits for {} nodes", nodes_num); + + /// Create a lock so that with distributed processing + /// multiple nodes do not execute cleanup in parallel. + auto ephemeral_node = zkutil::EphemeralNodeHolder::tryCreate(zookeeper_cleanup_lock_path, *zk_client, toString(getCurrentTime())); + if (!ephemeral_node) + { + LOG_TEST(log, "Cleanup is already being executed by another node"); + return; + } + /// TODO because of this lock we might not update local file statuses on time on one of the nodes. 
+ + struct Node + { + std::string zk_path; + S3QueueIFileMetadata::NodeMetadata metadata; + }; + auto node_cmp = [](const Node & a, const Node & b) + { + return std::tie(a.metadata.last_processed_timestamp, a.metadata.file_path) + < std::tie(b.metadata.last_processed_timestamp, b.metadata.file_path); + }; + + /// Ordered in ascending order of timestamps. + std::set sorted_nodes(node_cmp); + + auto fetch_nodes = [&](const Strings & nodes, const fs::path & base_path) + { + for (const auto & node : nodes) + { + const std::string path = base_path / node; + try + { + std::string metadata_str; + if (zk_client->tryGet(path, metadata_str)) + { + sorted_nodes.emplace(path, S3QueueIFileMetadata::NodeMetadata::fromString(metadata_str)); + LOG_TEST(log, "Fetched metadata for node {}", path); + } + else + LOG_ERROR(log, "Failed to fetch node metadata {}", path); + } + catch (const zkutil::KeeperException & e) + { + if (!Coordination::isHardwareError(e.code)) + { + LOG_WARNING(log, "Unexpected exception: {}", getCurrentExceptionMessage(true)); + chassert(false); + } + + /// Will retry with a new zk connection. + throw; + } + } + }; + + fetch_nodes(processed_nodes, zookeeper_processed_path); + fetch_nodes(failed_nodes, zookeeper_failed_path); + + auto get_nodes_str = [&]() + { + WriteBufferFromOwnString wb; + for (const auto & [node, metadata] : sorted_nodes) + wb << fmt::format("Node: {}, path: {}, timestamp: {};\n", node, metadata.file_path, metadata.last_processed_timestamp); + return wb.str(); + }; + LOG_TEST(log, "Checking node limits (max size: {}, max age: {}) for {}", settings.s3queue_tracked_files_limit, settings.s3queue_tracked_file_ttl_sec, get_nodes_str()); + + size_t nodes_to_remove = check_nodes_limit && nodes_limit_exceeded ? nodes_num - settings.s3queue_tracked_files_limit : 0; + for (const auto & node : sorted_nodes) + { + if (nodes_to_remove) + { + LOG_TRACE(log, "Removing node at path {} ({}) because max files limit is reached", + node.metadata.file_path, node.zk_path); + + local_file_statuses->remove(node.metadata.file_path, /* if_exists */true); + + code = zk_client->tryRemove(node.zk_path); + if (code == Coordination::Error::ZOK) + --nodes_to_remove; + else + LOG_ERROR(log, "Failed to remove a node `{}` (code: {})", node.zk_path, code); + } + else if (check_nodes_ttl) + { + UInt64 node_age = getCurrentTime() - node.metadata.last_processed_timestamp; + if (node_age >= settings.s3queue_tracked_file_ttl_sec) + { + LOG_TRACE(log, "Removing node at path {} ({}) because file ttl is reached", + node.metadata.file_path, node.zk_path); + + local_file_statuses->remove(node.metadata.file_path, /* if_exists */true); + + code = zk_client->tryRemove(node.zk_path); + if (code != Coordination::Error::ZOK) + LOG_ERROR(log, "Failed to remove a node `{}` (code: {})", node.zk_path, code); + } + else if (!nodes_to_remove) + { + /// Nodes limit satisfied. + /// Nodes ttl satisfied as well as if current node is under tll, then all remaining as well + /// (because we are iterating in timestamp ascending order). + break; + } + } + else + { + /// Nodes limit and ttl are satisfied. 
+ break; + } + } + + LOG_TRACE(log, "Node limits check finished"); +} + +} diff --git a/src/Storages/S3Queue/S3QueueMetadata.h b/src/Storages/S3Queue/S3QueueMetadata.h new file mode 100644 index 00000000000..ef4a9808c68 --- /dev/null +++ b/src/Storages/S3Queue/S3QueueMetadata.h @@ -0,0 +1,95 @@ +#pragma once +#include "config.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace fs = std::filesystem; +namespace Poco { class Logger; } + +namespace DB +{ +struct S3QueueSettings; +class StorageS3Queue; +struct S3QueueTableMetadata; +struct StorageInMemoryMetadata; +using ConfigurationPtr = StorageObjectStorage::ConfigurationPtr; + +/** + * A class for managing S3Queue metadata in zookeeper, e.g. + * the following folders: + * - /processed + * - /processing + * - /failed + * + * In case we use buckets for processing for Ordered mode, the structure looks like: + * - /buckets//processed -- persistent node, information about last processed file. + * - /buckets//lock -- ephemeral node, used for acquiring bucket lock. + * - /processing + * - /failed + * + * Depending on S3Queue processing mode (ordered or unordered) + * we can differently store metadata in /processed node. + * + * Implements caching of zookeeper metadata for faster responses. + * Cached part is located in LocalFileStatuses. + * + * In case of Unordered mode - if files TTL is enabled or maximum tracked files limit is set + * starts a background cleanup thread which is responsible for maintaining them. + */ +class S3QueueMetadata +{ +public: + using FileStatus = S3QueueIFileMetadata::FileStatus; + using FileMetadataPtr = std::shared_ptr; + using FileStatusPtr = std::shared_ptr; + using FileStatuses = std::unordered_map; + using Bucket = size_t; + using Processor = std::string; + + S3QueueMetadata(const fs::path & zookeeper_path_, const S3QueueSettings & settings_); + ~S3QueueMetadata(); + + void initialize(const ConfigurationPtr & configuration, const StorageInMemoryMetadata & storage_metadata); + void checkSettings(const S3QueueSettings & settings) const; + void shutdown(); + + FileMetadataPtr getFileMetadata(const std::string & path, S3QueueOrderedFileMetadata::BucketInfoPtr bucket_info = {}); + + FileStatusPtr getFileStatus(const std::string & path); + FileStatuses getFileStatuses() const; + + /// Method of Ordered mode parallel processing. 
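/// A file is mapped to a bucket as sipHash64(path) % buckets_num; buckets_num is taken
/// from the s3queue_buckets setting and falls back to s3queue_processing_threads_num.
/// Buckets are used only in Ordered mode and only when buckets_num > 1: each processing
/// thread acquires a bucket (an ephemeral lock in keeper) and consumes only files of
/// that bucket, which preserves per-bucket ordering.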
+ bool useBucketsForProcessing() const; + Bucket getBucketForPath(const std::string & path) const; + S3QueueOrderedFileMetadata::BucketHolderPtr tryAcquireBucket(const Bucket & bucket, const Processor & processor); + + static size_t getBucketsNum(const S3QueueSettings & settings); + static size_t getBucketsNum(const S3QueueTableMetadata & settings); + +private: + void cleanupThreadFunc(); + void cleanupThreadFuncImpl(); + + const S3QueueSettings settings; + const fs::path zookeeper_path; + const size_t buckets_num; + + bool initialized = false; + LoggerPtr log; + + std::atomic_bool shutdown_called = false; + BackgroundSchedulePool::TaskHolder task; + + class LocalFileStatuses; + std::shared_ptr local_file_statuses; +}; + +} diff --git a/src/Storages/S3Queue/S3QueueMetadataFactory.cpp b/src/Storages/S3Queue/S3QueueMetadataFactory.cpp index 0c3c26adfe0..a319b21ca3e 100644 --- a/src/Storages/S3Queue/S3QueueMetadataFactory.cpp +++ b/src/Storages/S3Queue/S3QueueMetadataFactory.cpp @@ -21,17 +21,13 @@ S3QueueMetadataFactory::getOrCreate(const std::string & zookeeper_path, const S3 auto it = metadata_by_path.find(zookeeper_path); if (it == metadata_by_path.end()) { - auto files_metadata = std::make_shared(zookeeper_path, settings); + auto files_metadata = std::make_shared(zookeeper_path, settings); it = metadata_by_path.emplace(zookeeper_path, std::move(files_metadata)).first; } - else if (it->second.metadata->checkSettings(settings)) - { - it->second.ref_count += 1; - } else { - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Metadata with the same `s3queue_zookeeper_path` " - "was already created but with different settings"); + it->second.metadata->checkSettings(settings); + it->second.ref_count += 1; } return it->second.metadata; } diff --git a/src/Storages/S3Queue/S3QueueMetadataFactory.h b/src/Storages/S3Queue/S3QueueMetadataFactory.h index c5e94d59050..80e96f8aa7e 100644 --- a/src/Storages/S3Queue/S3QueueMetadataFactory.h +++ b/src/Storages/S3Queue/S3QueueMetadataFactory.h @@ -1,7 +1,7 @@ #pragma once #include #include -#include +#include namespace DB { @@ -9,7 +9,7 @@ namespace DB class S3QueueMetadataFactory final : private boost::noncopyable { public: - using FilesMetadataPtr = std::shared_ptr; + using FilesMetadataPtr = std::shared_ptr; static S3QueueMetadataFactory & instance(); @@ -22,9 +22,9 @@ public: private: struct Metadata { - explicit Metadata(std::shared_ptr metadata_) : metadata(metadata_), ref_count(1) {} + explicit Metadata(std::shared_ptr metadata_) : metadata(metadata_), ref_count(1) {} - std::shared_ptr metadata; + std::shared_ptr metadata; /// TODO: the ref count should be kept in keeper, because of the case with distributed processing. 
size_t ref_count = 0; }; diff --git a/src/Storages/S3Queue/S3QueueOrderedFileMetadata.cpp b/src/Storages/S3Queue/S3QueueOrderedFileMetadata.cpp new file mode 100644 index 00000000000..d1298b8c4fa --- /dev/null +++ b/src/Storages/S3Queue/S3QueueOrderedFileMetadata.cpp @@ -0,0 +1,414 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +namespace +{ + S3QueueOrderedFileMetadata::Bucket getBucketForPathImpl(const std::string & path, size_t buckets_num) + { + return sipHash64(path) % buckets_num; + } + + std::string getProcessedPathForBucket(const std::filesystem::path & zk_path, size_t bucket) + { + return zk_path / "buckets" / toString(bucket) / "processed"; + } + + std::string getProcessedPath(const std::filesystem::path & zk_path, const std::string & path, size_t buckets_num) + { + if (buckets_num > 1) + return getProcessedPathForBucket(zk_path, getBucketForPathImpl(path, buckets_num)); + else + return zk_path / "processed"; + } + + zkutil::ZooKeeperPtr getZooKeeper() + { + return Context::getGlobalContextInstance()->getZooKeeper(); + } +} + +S3QueueOrderedFileMetadata::BucketHolder::BucketHolder( + const Bucket & bucket_, + int bucket_version_, + const std::string & bucket_lock_path_, + const std::string & bucket_lock_id_path_, + zkutil::ZooKeeperPtr zk_client_) + : bucket_info(std::make_shared(BucketInfo{ + .bucket = bucket_, + .bucket_version = bucket_version_, + .bucket_lock_path = bucket_lock_path_, + .bucket_lock_id_path = bucket_lock_id_path_})) + , zk_client(zk_client_) +{ +} + +void S3QueueOrderedFileMetadata::BucketHolder::release() +{ + if (released) + return; + + released = true; + LOG_TEST(getLogger("S3QueueBucketHolder"), "Releasing bucket {}", bucket_info->bucket); + + Coordination::Requests requests; + /// Check that bucket lock version has not changed + /// (which could happen if session had expired as bucket_lock_path is ephemeral node). + requests.push_back(zkutil::makeCheckRequest(bucket_info->bucket_lock_id_path, bucket_info->bucket_version)); + /// Remove bucket lock. + requests.push_back(zkutil::makeRemoveRequest(bucket_info->bucket_lock_path, -1)); + + Coordination::Responses responses; + const auto code = zk_client->tryMulti(requests, responses); + zkutil::KeeperMultiException::check(code, requests, responses); +} + +S3QueueOrderedFileMetadata::BucketHolder::~BucketHolder() +{ + try + { + release(); + } + catch (...) 
+ { + tryLogCurrentException(__PRETTY_FUNCTION__); + } +} + +S3QueueOrderedFileMetadata::S3QueueOrderedFileMetadata( + const std::filesystem::path & zk_path_, + const std::string & path_, + FileStatusPtr file_status_, + BucketInfoPtr bucket_info_, + size_t buckets_num_, + size_t max_loading_retries_, + LoggerPtr log_) + : S3QueueIFileMetadata( + path_, + /* processing_node_path */zk_path_ / "processing" / getNodeName(path_), + /* processed_node_path */getProcessedPath(zk_path_, path_, buckets_num_), + /* failed_node_path */zk_path_ / "failed" / getNodeName(path_), + file_status_, + max_loading_retries_, + log_) + , buckets_num(buckets_num_) + , zk_path(zk_path_) + , bucket_info(bucket_info_) +{ +} + +std::vector S3QueueOrderedFileMetadata::getMetadataPaths(size_t buckets_num) +{ + if (buckets_num > 1) + { + std::vector paths{"buckets", "failed", "processing"}; + for (size_t i = 0; i < buckets_num; ++i) + paths.push_back("buckets/" + toString(i)); + return paths; + } + else + return {"failed", "processing"}; +} + +bool S3QueueOrderedFileMetadata::getMaxProcessedFile( + NodeMetadata & result, + Coordination::Stat * stat, + const zkutil::ZooKeeperPtr & zk_client) +{ + return getMaxProcessedFile(result, stat, processed_node_path, zk_client); +} + +bool S3QueueOrderedFileMetadata::getMaxProcessedFile( + NodeMetadata & result, + Coordination::Stat * stat, + const std::string & processed_node_path_, + const zkutil::ZooKeeperPtr & zk_client) +{ + std::string data; + if (zk_client->tryGet(processed_node_path_, data, stat)) + { + if (!data.empty()) + result = NodeMetadata::fromString(data); + return true; + } + return false; +} + +S3QueueOrderedFileMetadata::Bucket S3QueueOrderedFileMetadata::getBucketForPath(const std::string & path_, size_t buckets_num) +{ + return getBucketForPathImpl(path_, buckets_num); +} + +S3QueueOrderedFileMetadata::BucketHolderPtr S3QueueOrderedFileMetadata::tryAcquireBucket( + const std::filesystem::path & zk_path, + const Bucket & bucket, + const Processor & processor) +{ + const auto zk_client = getZooKeeper(); + const auto bucket_lock_path = zk_path / "buckets" / toString(bucket) / "lock"; + const auto bucket_lock_id_path = zk_path / "buckets" / toString(bucket) / "lock_id"; + const auto processor_info = getProcessorInfo(processor); + + Coordination::Requests requests; + + /// Create bucket lock node as ephemeral node. + requests.push_back(zkutil::makeCreateRequest(bucket_lock_path, "", zkutil::CreateMode::Ephemeral)); + + /// Create bucket lock id node as persistent node if it does not exist yet. + requests.push_back( + zkutil::makeCreateRequest( + bucket_lock_id_path, processor_info, zkutil::CreateMode::Persistent, /* ignore_if_exists */true)); + + /// Update bucket lock id path. We use its version as a version of ephemeral bucket lock node. + /// (See comment near S3QueueIFileMetadata::processing_node_version). 
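/// The whole acquisition is a single multi-request: [0] create the ephemeral "lock"
/// node, [1] create the persistent "lock_id" node if it is missing, [2] set "lock_id".
/// The version returned by [2] becomes bucket_version and is re-checked in release(),
/// so the lock is only removed if it still belongs to this acquisition.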
+ requests.push_back(zkutil::makeSetRequest(bucket_lock_id_path, processor_info, -1)); + + Coordination::Responses responses; + const auto code = zk_client->tryMulti(requests, responses); + if (code == Coordination::Error::ZOK) + { + const auto * set_response = dynamic_cast(responses[2].get()); + const auto bucket_lock_version = set_response->stat.version; + + LOG_TEST( + getLogger("S3QueueOrderedFileMetadata"), + "Processor {} acquired bucket {} for processing (bucket lock version: {})", + processor, bucket, bucket_lock_version); + + return std::make_shared( + bucket, + bucket_lock_version, + bucket_lock_path, + bucket_lock_id_path, + zk_client); + } + + if (code == Coordination::Error::ZNODEEXISTS) + return nullptr; + + if (Coordination::isHardwareError(code)) + return nullptr; + + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected error: {}", code); +} + +std::pair S3QueueOrderedFileMetadata::setProcessingImpl() +{ + /// In one zookeeper transaction do the following: + enum RequestType + { + /// node_name is not within failed persistent nodes + FAILED_PATH_DOESNT_EXIST = 0, + /// node_name ephemeral processing node was successfully created + CREATED_PROCESSING_PATH = 2, + /// update processing id + SET_PROCESSING_ID = 4, + /// bucket version did not change + CHECKED_BUCKET_VERSION = 5, + /// max_processed_node version did not change + CHECKED_MAX_PROCESSED_PATH = 6, + }; + + const auto zk_client = getZooKeeper(); + processing_id = node_metadata.processing_id = getRandomASCIIString(10); + auto processor_info = getProcessorInfo(processing_id.value()); + + while (true) + { + NodeMetadata processed_node; + Coordination::Stat processed_node_stat; + bool has_processed_node = getMaxProcessedFile(processed_node, &processed_node_stat, zk_client); + if (has_processed_node) + { + LOG_TEST(log, "Current max processed file {} from path: {}", + processed_node.file_path, processed_node_path); + + if (!processed_node.file_path.empty() && path <= processed_node.file_path) + { + return {false, FileStatus::State::Processed}; + } + } + + Coordination::Requests requests; + requests.push_back(zkutil::makeCreateRequest(failed_node_path, "", zkutil::CreateMode::Persistent)); + requests.push_back(zkutil::makeRemoveRequest(failed_node_path, -1)); + requests.push_back(zkutil::makeCreateRequest(processing_node_path, node_metadata.toString(), zkutil::CreateMode::Ephemeral)); + + requests.push_back( + zkutil::makeCreateRequest( + processing_node_id_path, processor_info, zkutil::CreateMode::Persistent, /* ignore_if_exists */true)); + requests.push_back(zkutil::makeSetRequest(processing_node_id_path, processor_info, -1)); + + if (bucket_info) + requests.push_back(zkutil::makeCheckRequest(bucket_info->bucket_lock_id_path, bucket_info->bucket_version)); + + /// TODO: for ordered processing with buckets it should be enough to check only bucket lock version, + /// so may be remove creation and check for processing_node_id if bucket_info is set? 
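/// Request indices line up with RequestType above: [0]-[1] prove node_name is not in
/// /failed (create plus immediate remove), [2] creates the ephemeral processing node,
/// [3]-[4] create/update the persistent processing id node, [5] optionally checks the
/// bucket lock version, and the tail either checks that the max processed node is
/// unchanged or, if it does not exist yet, that it still does not (create plus remove).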
+ + if (has_processed_node) + { + requests.push_back(zkutil::makeCheckRequest(processed_node_path, processed_node_stat.version)); + } + else + { + requests.push_back(zkutil::makeCreateRequest(processed_node_path, "", zkutil::CreateMode::Persistent)); + requests.push_back(zkutil::makeRemoveRequest(processed_node_path, -1)); + } + + Coordination::Responses responses; + const auto code = zk_client->tryMulti(requests, responses); + auto is_request_failed = [&](RequestType type) { return responses[type]->error != Coordination::Error::ZOK; }; + + if (code == Coordination::Error::ZOK) + { + const auto * set_response = dynamic_cast(responses[SET_PROCESSING_ID].get()); + processing_id_version = set_response->stat.version; + return {true, FileStatus::State::None}; + } + + if (is_request_failed(FAILED_PATH_DOESNT_EXIST)) + return {false, FileStatus::State::Failed}; + + if (is_request_failed(CREATED_PROCESSING_PATH)) + return {false, FileStatus::State::Processing}; + + if (bucket_info && is_request_failed(CHECKED_BUCKET_VERSION)) + { + LOG_TEST(log, "Version of bucket lock changed: {}. Will retry for file `{}`", code, path); + continue; + } + + if (is_request_failed(bucket_info ? CHECKED_MAX_PROCESSED_PATH : CHECKED_BUCKET_VERSION)) + { + LOG_TEST(log, "Version of max processed file changed: {}. Will retry for file `{}`", code, path); + continue; + } + + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected response state: {}", code); + } +} + +void S3QueueOrderedFileMetadata::setProcessedAtStartRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr & zk_client) +{ + if (buckets_num > 1) + { + for (size_t i = 0; i < buckets_num; ++i) + { + auto path = getProcessedPathForBucket(zk_path, i); + setProcessedRequests(requests, zk_client, path, /* ignore_if_exists */true); + } + } + else + { + setProcessedRequests(requests, zk_client, processed_node_path, /* ignore_if_exists */true); + } +} + +void S3QueueOrderedFileMetadata::setProcessedRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr & zk_client, + const std::string & processed_node_path_, + bool ignore_if_exists) +{ + NodeMetadata processed_node; + Coordination::Stat processed_node_stat; + if (getMaxProcessedFile(processed_node, &processed_node_stat, processed_node_path_, zk_client)) + { + LOG_TEST(log, "Current max processed file: {}, condition less: {}", + processed_node.file_path, bool(path <= processed_node.file_path)); + + if (!processed_node.file_path.empty() && path <= processed_node.file_path) + { + LOG_TRACE(log, "File {} is already processed, current max processed file: {}", path, processed_node.file_path); + + if (ignore_if_exists) + return; + + throw Exception( + ErrorCodes::LOGICAL_ERROR, + "File ({}) is already processed, while expected it not to be (path: {})", + path, processed_node_path_); + } + requests.push_back(zkutil::makeSetRequest(processed_node_path_, node_metadata.toString(), processed_node_stat.version)); + } + else + { + LOG_TEST(log, "Max processed file does not exist, creating at: {}", processed_node_path_); + requests.push_back(zkutil::makeCreateRequest(processed_node_path_, node_metadata.toString(), zkutil::CreateMode::Persistent)); + } + + if (processing_id_version.has_value()) + { + requests.push_back(zkutil::makeCheckRequest(processing_node_id_path, processing_id_version.value())); + requests.push_back(zkutil::makeRemoveRequest(processing_node_id_path, processing_id_version.value())); + requests.push_back(zkutil::makeRemoveRequest(processing_node_path, -1)); + } +} + +void 
S3QueueOrderedFileMetadata::setProcessedImpl() +{ + /// In one zookeeper transaction do the following: + enum RequestType + { + SET_MAX_PROCESSED_PATH = 0, + CHECK_PROCESSING_ID_PATH = 1, /// Optional. + REMOVE_PROCESSING_ID_PATH = 2, /// Optional. + REMOVE_PROCESSING_PATH = 3, /// Optional. + }; + + const auto zk_client = getZooKeeper(); + const auto node_metadata_str = node_metadata.toString(); + std::string failure_reason; + + while (true) + { + Coordination::Requests requests; + setProcessedRequests(requests, zk_client, processed_node_path, /* ignore_if_exists */false); + + Coordination::Responses responses; + auto is_request_failed = [&](RequestType type) { return responses[type]->error != Coordination::Error::ZOK; }; + + auto code = zk_client->tryMulti(requests, responses); + if (code == Coordination::Error::ZOK) + { + if (max_loading_retries) + zk_client->tryRemove(failed_node_path + ".retriable", -1); + return; + } + + if (Coordination::isHardwareError(code)) + failure_reason = "Lost connection to keeper"; + else if (is_request_failed(SET_MAX_PROCESSED_PATH)) + { + LOG_TRACE(log, "Cannot set file {} as processed. " + "Failed to update processed node: {}. " + "Will retry.", path, code); + continue; + } + else if (is_request_failed(CHECK_PROCESSING_ID_PATH)) + failure_reason = "Version of processing id node changed"; + else if (is_request_failed(REMOVE_PROCESSING_PATH)) + failure_reason = "Failed to remove processing path"; + else + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected state of zookeeper transaction: {}", code); + + LOG_WARNING(log, "Cannot set file {} as processed: {}. Reason: {}", path, code, failure_reason); + return; + } +} + +} diff --git a/src/Storages/S3Queue/S3QueueOrderedFileMetadata.h b/src/Storages/S3Queue/S3QueueOrderedFileMetadata.h new file mode 100644 index 00000000000..698ec0f54cc --- /dev/null +++ b/src/Storages/S3Queue/S3QueueOrderedFileMetadata.h @@ -0,0 +1,97 @@ +#pragma once +#include +#include +#include +#include + +namespace DB +{ + +class S3QueueOrderedFileMetadata : public S3QueueIFileMetadata +{ +public: + using Processor = std::string; + using Bucket = size_t; + struct BucketInfo + { + Bucket bucket; + int bucket_version; + std::string bucket_lock_path; + std::string bucket_lock_id_path; + }; + using BucketInfoPtr = std::shared_ptr; + + explicit S3QueueOrderedFileMetadata( + const std::filesystem::path & zk_path_, + const std::string & path_, + FileStatusPtr file_status_, + BucketInfoPtr bucket_info_, + size_t buckets_num_, + size_t max_loading_retries_, + LoggerPtr log_); + + struct BucketHolder; + using BucketHolderPtr = std::shared_ptr; + + static BucketHolderPtr tryAcquireBucket( + const std::filesystem::path & zk_path, + const Bucket & bucket, + const Processor & processor); + + static S3QueueOrderedFileMetadata::Bucket getBucketForPath(const std::string & path, size_t buckets_num); + + static std::vector getMetadataPaths(size_t buckets_num); + + void setProcessedAtStartRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr & zk_client) override; + +private: + const size_t buckets_num; + const std::string zk_path; + const BucketInfoPtr bucket_info; + + std::pair setProcessingImpl() override; + void setProcessedImpl() override; + + bool getMaxProcessedFile( + NodeMetadata & result, + Coordination::Stat * stat, + const zkutil::ZooKeeperPtr & zk_client); + + bool getMaxProcessedFile( + NodeMetadata & result, + Coordination::Stat * stat, + const std::string & processed_node_path_, + const zkutil::ZooKeeperPtr & 
zk_client); + + void setProcessedRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr & zk_client, + const std::string & processed_node_path_, + bool ignore_if_exists); +}; + +struct S3QueueOrderedFileMetadata::BucketHolder +{ + BucketHolder( + const Bucket & bucket_, + int bucket_version_, + const std::string & bucket_lock_path_, + const std::string & bucket_lock_id_path_, + zkutil::ZooKeeperPtr zk_client_); + + ~BucketHolder(); + + Bucket getBucket() const { return bucket_info->bucket; } + BucketInfoPtr getBucketInfo() const { return bucket_info; } + + void release(); + +private: + BucketInfoPtr bucket_info; + const zkutil::ZooKeeperPtr zk_client; + bool released = false; +}; + +} diff --git a/src/Storages/S3Queue/S3QueueSettings.h b/src/Storages/S3Queue/S3QueueSettings.h index c26e973a1c0..c486a7fbb5d 100644 --- a/src/Storages/S3Queue/S3QueueSettings.h +++ b/src/Storages/S3Queue/S3QueueSettings.h @@ -13,7 +13,7 @@ class ASTStorage; #define S3QUEUE_RELATED_SETTINGS(M, ALIAS) \ M(S3QueueMode, \ mode, \ - S3QueueMode::ORDERED, \ + S3QueueMode::UNORDERED, \ "With unordered mode, the set of all already processed files is tracked with persistent nodes in ZooKepeer." \ "With ordered mode, only the max name of the successfully consumed file stored.", \ 0) \ @@ -30,8 +30,7 @@ class ASTStorage; M(UInt32, s3queue_tracked_files_limit, 1000, "For unordered mode. Max set size for tracking processed files in ZooKeeper", 0) \ M(UInt32, s3queue_cleanup_interval_min_ms, 60000, "For unordered mode. Polling backoff min for cleanup", 0) \ M(UInt32, s3queue_cleanup_interval_max_ms, 60000, "For unordered mode. Polling backoff max for cleanup", 0) \ - M(UInt32, s3queue_total_shards_num, 1, "Value 0 means disabled", 0) \ - M(UInt32, s3queue_current_shard_num, 0, "", 0) \ + M(UInt32, s3queue_buckets, 0, "Number of buckets for Ordered mode parallel processing", 0) \ #define LIST_OF_S3QUEUE_SETTINGS(M, ALIAS) \ S3QUEUE_RELATED_SETTINGS(M, ALIAS) \ diff --git a/src/Storages/S3Queue/S3QueueSource.cpp b/src/Storages/S3Queue/S3QueueSource.cpp index c8aaece0711..d8633037ed9 100644 --- a/src/Storages/S3Queue/S3QueueSource.cpp +++ b/src/Storages/S3Queue/S3QueueSource.cpp @@ -1,10 +1,10 @@ #include "config.h" -#if USE_AWS_S3 #include #include #include #include +#include #include #include #include @@ -32,18 +32,16 @@ namespace ErrorCodes } StorageS3QueueSource::S3QueueObjectInfo::S3QueueObjectInfo( - const std::string & key_, - const ObjectMetadata & object_metadata_, - Metadata::ProcessingNodeHolderPtr processing_holder_) - : ObjectInfo(key_, object_metadata_) + const ObjectInfo & object_info, + Metadata::FileMetadataPtr processing_holder_) + : ObjectInfo(object_info.relative_path, object_info.metadata) , processing_holder(processing_holder_) { } StorageS3QueueSource::FileIterator::FileIterator( - std::shared_ptr metadata_, + std::shared_ptr metadata_, std::unique_ptr glob_iterator_, - size_t current_shard_, std::atomic & shutdown_called_, LoggerPtr logger_) : StorageObjectStorageSource::IIterator("S3QueueIterator") @@ -51,109 +49,7 @@ StorageS3QueueSource::FileIterator::FileIterator( , glob_iterator(std::move(glob_iterator_)) , shutdown_called(shutdown_called_) , log(logger_) - , sharded_processing(metadata->isShardedProcessing()) - , current_shard(current_shard_) { - if (sharded_processing) - { - for (const auto & id : metadata->getProcessingIdsForShard(current_shard)) - sharded_keys.emplace(id, std::deque{}); - } -} - -StorageS3QueueSource::ObjectInfoPtr 
StorageS3QueueSource::FileIterator::nextImpl(size_t processor) -{ - while (!shutdown_called) - { - ObjectInfoPtr val{nullptr}; - - { - std::unique_lock lk(sharded_keys_mutex, std::defer_lock); - if (sharded_processing) - { - /// To make sure order on keys in each shard in sharded_keys - /// we need to check sharded_keys and to next() under lock. - lk.lock(); - - if (auto it = sharded_keys.find(processor); it != sharded_keys.end()) - { - auto & keys = it->second; - if (!keys.empty()) - { - val = keys.front(); - keys.pop_front(); - chassert(processor == metadata->getProcessingIdForPath(val->relative_path)); - } - } - else - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Processing id {} does not exist (Expected ids: {})", - processor, fmt::join(metadata->getProcessingIdsForShard(current_shard), ", ")); - } - } - - if (!val) - { - val = glob_iterator->next(processor); - if (val && sharded_processing) - { - const auto processing_id_for_key = metadata->getProcessingIdForPath(val->relative_path); - if (processor != processing_id_for_key) - { - if (metadata->isProcessingIdBelongsToShard(processing_id_for_key, current_shard)) - { - LOG_TEST(log, "Putting key {} into queue of processor {} (total: {})", - val->relative_path, processing_id_for_key, sharded_keys.size()); - - if (auto it = sharded_keys.find(processing_id_for_key); it != sharded_keys.end()) - { - it->second.push_back(val); - } - else - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Processing id {} does not exist (Expected ids: {})", - processing_id_for_key, fmt::join(metadata->getProcessingIdsForShard(current_shard), ", ")); - } - } - continue; - } - } - } - } - - if (!val) - return {}; - - if (shutdown_called) - { - LOG_TEST(log, "Shutdown was called, stopping file iterator"); - return {}; - } - - auto processing_holder = metadata->trySetFileAsProcessing(val->relative_path); - if (shutdown_called) - { - LOG_TEST(log, "Shutdown was called, stopping file iterator"); - return {}; - } - - LOG_TEST(log, "Checking if can process key {} for processing_id {}", val->relative_path, processor); - - if (processing_holder) - { - return std::make_shared(val->relative_path, val->metadata.value(), processing_holder); - } - else if (sharded_processing - && metadata->getFileStatus(val->relative_path)->state == S3QueueFilesMetadata::FileStatus::State::Processing) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "File {} is processing by someone else in sharded processing. 
" - "It is a bug", val->relative_path); - } - } - return {}; } size_t StorageS3QueueSource::FileIterator::estimatedKeysCount() @@ -161,12 +57,242 @@ size_t StorageS3QueueSource::FileIterator::estimatedKeysCount() throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method estimateKeysCount is not implemented"); } +StorageS3QueueSource::ObjectInfoPtr StorageS3QueueSource::FileIterator::nextImpl(size_t processor) +{ + ObjectInfoPtr object_info; + S3QueueOrderedFileMetadata::BucketInfoPtr bucket_info; + + while (!shutdown_called) + { + if (metadata->useBucketsForProcessing()) + std::tie(object_info, bucket_info) = getNextKeyFromAcquiredBucket(processor); + else + object_info = glob_iterator->next(processor); + + if (!object_info) + return {}; + + if (shutdown_called) + { + LOG_TEST(log, "Shutdown was called, stopping file iterator"); + return {}; + } + + auto file_metadata = metadata->getFileMetadata(object_info->relative_path, bucket_info); + if (file_metadata->setProcessing()) + return std::make_shared(*object_info, file_metadata); + } + return {}; +} + +std::pair +StorageS3QueueSource::FileIterator::getNextKeyFromAcquiredBucket(size_t processor) +{ + /// We need this lock to maintain consistency between listing s3 directory + /// and getting/putting result into listed_keys_cache. + std::lock_guard lock(buckets_mutex); + + auto bucket_holder_it = bucket_holders.emplace(processor, nullptr).first; + auto current_processor = toString(processor); + + LOG_TEST( + log, "Current processor: {}, acquired bucket: {}", + processor, bucket_holder_it->second ? toString(bucket_holder_it->second->getBucket()) : "None"); + + while (true) + { + /// Each processing thread gets next path from glob_iterator->next() + /// and checks if corresponding bucket is already acquired by someone. + /// In case it is already acquired, they put the key into listed_keys_cache, + /// so that the thread who acquired the bucket will be able to see + /// those keys without the need to list s3 directory once again. + if (bucket_holder_it->second) + { + const auto bucket = bucket_holder_it->second->getBucket(); + auto it = listed_keys_cache.find(bucket); + if (it != listed_keys_cache.end()) + { + /// `bucket_keys` -- keys we iterated so far and which were not taken for processing. + /// `bucket_processor` -- processor id of the thread which has acquired the bucket. + auto & [bucket_keys, bucket_processor] = it->second; + + /// Check correctness just in case. + if (!bucket_processor.has_value()) + { + bucket_processor = current_processor; + } + else if (bucket_processor.value() != current_processor) + { + throw Exception( + ErrorCodes::LOGICAL_ERROR, + "Expected current processor {} to be equal to {} for bucket {}", + current_processor, + bucket_processor.has_value() ? toString(bucket_processor.value()) : "None", + bucket); + } + + /// Take next key to process + if (!bucket_keys.empty()) + { + /// Take the key from the front, the order is important. + auto object_info = bucket_keys.front(); + bucket_keys.pop_front(); + + LOG_TEST(log, "Current bucket: {}, will process file: {}", + bucket, object_info->getFileName()); + + return std::pair{object_info, bucket_holder_it->second->getBucketInfo()}; + } + + LOG_TEST(log, "Cache of bucket {} is empty", bucket); + + /// No more keys in bucket, remove it from cache. + listed_keys_cache.erase(it); + } + else + { + LOG_TEST(log, "Cache of bucket {} is empty", bucket); + } + + if (iterator_finished) + { + /// Bucket is fully processed - release the bucket. 
+ bucket_holder_it->second->release(); + bucket_holder_it->second.reset(); + } + } + /// If processing thread has already acquired some bucket + /// and while listing s3 directory gets a key which is in a different bucket, + /// it puts the key into listed_keys_cache to allow others to process it, + /// because one processing thread can acquire only one bucket at a time. + /// Once a thread is finished with its acquired bucket, it checks listed_keys_cache + /// to see if there are keys from buckets not acquired by anyone. + if (!bucket_holder_it->second) + { + for (auto it = listed_keys_cache.begin(); it != listed_keys_cache.end();) + { + auto & [bucket, bucket_info] = *it; + auto & [bucket_keys, bucket_processor] = bucket_info; + + LOG_TEST(log, "Bucket: {}, cached keys: {}, processor: {}", + bucket, bucket_keys.size(), bucket_processor.has_value() ? toString(bucket_processor.value()) : "None"); + + if (bucket_processor.has_value()) + { + LOG_TEST(log, "Bucket {} is already locked for processing by {} (keys: {})", + bucket, bucket_processor.value(), bucket_keys.size()); + ++it; + continue; + } + + if (bucket_keys.empty()) + { + /// No more keys in bucket, remove it from cache. + /// We still might add new keys to this bucket if !iterator_finished. + it = listed_keys_cache.erase(it); + continue; + } + + bucket_holder_it->second = metadata->tryAcquireBucket(bucket, current_processor); + if (!bucket_holder_it->second) + { + LOG_TEST(log, "Bucket {} is already locked for processing (keys: {})", + bucket, bucket_keys.size()); + ++it; + continue; + } + + bucket_processor = current_processor; + + /// Take the key from the front, the order is important. + auto object_info = bucket_keys.front(); + bucket_keys.pop_front(); + + LOG_TEST(log, "Acquired bucket: {}, will process file: {}", + bucket, object_info->getFileName()); + + return std::pair{object_info, bucket_holder_it->second->getBucketInfo()}; + } + } + + if (iterator_finished) + { + LOG_TEST(log, "Reached the end of file iterator and nothing left in keys cache"); + return {}; + } + + auto object_info = glob_iterator->next(processor); + if (object_info) + { + const auto bucket = metadata->getBucketForPath(object_info->relative_path); + auto & bucket_cache = listed_keys_cache[bucket]; + + LOG_TEST(log, "Found next file: {}, bucket: {}, current bucket: {}, cached_keys: {}", + object_info->getFileName(), bucket, + bucket_holder_it->second ? toString(bucket_holder_it->second->getBucket()) : "None", + bucket_cache.keys.size()); + + if (bucket_holder_it->second) + { + if (bucket_holder_it->second->getBucket() != bucket) + { + /// Acquired bucket differs from object's bucket, + /// put it into bucket's cache and continue. + bucket_cache.keys.emplace_back(object_info); + continue; + } + /// Bucket is already acquired, process the file. + return std::pair{object_info, bucket_holder_it->second->getBucketInfo()}; + } + else + { + bucket_holder_it->second = metadata->tryAcquireBucket(bucket, current_processor); + if (bucket_holder_it->second) + { + bucket_cache.processor = current_processor; + if (!bucket_cache.keys.empty()) + { + /// We have to maintain ordering between keys, + /// so if some keys are already in cache - start with them. 
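(getBucketForPath above needs to give every server and thread the same answer for a given file, so the natural implementation is a stable hash of the relative path taken modulo the configured bucket count — that detail is an assumption, not shown in this diff. A self-contained sketch of the idea; std::hash is only a stand-in, since a real implementation needs a hash that is stable across binaries and hosts, which std::hash does not guarantee:

    #include <cassert>
    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>

    using Bucket = std::size_t;

    // Illustrative sketch only (hypothetical helper, not part of this PR):
    // map a file path to one of `buckets_num` buckets.
    Bucket getBucketForPathSketch(const std::string & path, std::size_t buckets_num)
    {
        assert(buckets_num > 0);
        return std::hash<std::string>{}(path) % buckets_num;
    }

    int main()
    {
        const std::size_t buckets_num = 4; // e.g. SETTINGS s3queue_buckets = 4
        const std::string path = "data/2024/05/file_0001.csv";

        // The mapping is deterministic, so every processor agrees on which bucket
        // owns a path, and per-bucket ordering of keys stays well defined.
        std::cout << "bucket for " << path << ": "
                  << getBucketForPathSketch(path, buckets_num) << '\n';
    }
)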
+ bucket_cache.keys.emplace_back(object_info); + object_info = bucket_cache.keys.front(); + bucket_cache.keys.pop_front(); + } + return std::pair{object_info, bucket_holder_it->second->getBucketInfo()}; + } + else + { + LOG_TEST(log, "Bucket {} is already locked for processing", bucket); + bucket_cache.keys.emplace_back(object_info); + continue; + } + } + } + else + { + if (bucket_holder_it->second) + { + bucket_holder_it->second->release(); + bucket_holder_it->second.reset(); + } + + LOG_TEST(log, "Reached the end of file iterator"); + iterator_finished = true; + + if (listed_keys_cache.empty()) + return {}; + else + continue; + } + } +} + StorageS3QueueSource::StorageS3QueueSource( String name_, + size_t processor_id_, const Block & header_, std::unique_ptr internal_source_, - std::shared_ptr files_metadata_, - size_t processing_id_, + std::shared_ptr files_metadata_, const S3QueueAction & action_, RemoveFileFunc remove_file_func_, const NamesAndTypesList & requested_virtual_columns_, @@ -179,8 +305,8 @@ StorageS3QueueSource::StorageS3QueueSource( : ISource(header_) , WithContext(context_) , name(std::move(name_)) + , processor_id(processor_id_) , action(action_) - , processing_id(processing_id_) , files_metadata(files_metadata_) , internal_source(std::move(internal_source_)) , requested_virtual_columns(requested_virtual_columns_) @@ -198,12 +324,12 @@ String StorageS3QueueSource::getName() const return name; } -void StorageS3QueueSource::lazyInitialize() +void StorageS3QueueSource::lazyInitialize(size_t processor) { if (initialized) return; - internal_source->lazyInitialize(processing_id); + internal_source->lazyInitialize(processor); reader = std::move(internal_source->reader); if (reader) reader_future = std::move(internal_source->reader_future); @@ -212,15 +338,16 @@ void StorageS3QueueSource::lazyInitialize() Chunk StorageS3QueueSource::generate() { - lazyInitialize(); + lazyInitialize(processor_id); while (true) { if (!reader) break; - const auto * key_with_info = dynamic_cast(&reader.getObjectInfo()); - auto file_status = key_with_info->processing_holder->getFileStatus(); + const auto * object_info = dynamic_cast(&reader.getObjectInfo()); + auto file_metadata = object_info->processing_holder; + auto file_status = file_metadata->getFileStatus(); if (isCancelled()) { @@ -230,12 +357,12 @@ Chunk StorageS3QueueSource::generate() { try { - files_metadata->setFileFailed(key_with_info->processing_holder, "Cancelled"); + file_metadata->setFailed("Cancelled"); } catch (...) { LOG_ERROR(log, "Failed to set file {} as failed: {}", - key_with_info->relative_path, getCurrentExceptionMessage(true)); + object_info->relative_path, getCurrentExceptionMessage(true)); } appendLogElement(reader.getObjectInfo().getPath(), *file_status, processed_rows_from_file, false); @@ -259,12 +386,12 @@ Chunk StorageS3QueueSource::generate() try { - files_metadata->setFileFailed(key_with_info->processing_holder, "Table is dropped"); + file_metadata->setFailed("Table is dropped"); } catch (...) { LOG_ERROR(log, "Failed to set file {} as failed: {}", - key_with_info->relative_path, getCurrentExceptionMessage(true)); + object_info->relative_path, getCurrentExceptionMessage(true)); } appendLogElement(path, *file_status, processed_rows_from_file, false); @@ -304,14 +431,14 @@ Chunk StorageS3QueueSource::generate() const auto message = getCurrentExceptionMessage(true); LOG_ERROR(log, "Got an error while pulling chunk. Will set file {} as failed. 
Error: {} ", path, message); - files_metadata->setFileFailed(key_with_info->processing_holder, message); + file_metadata->setFailed(message); appendLogElement(path, *file_status, processed_rows_from_file, false); throw; } - files_metadata->setFileProcessed(key_with_info->processing_holder); - applyActionAfterProcessing(path); + file_metadata->setProcessed(); + applyActionAfterProcessing(reader.getObjectInfo().relative_path); appendLogElement(path, *file_status, processed_rows_from_file, true); file_status.reset(); @@ -334,7 +461,7 @@ Chunk StorageS3QueueSource::generate() /// Even if task is finished the thread may be not freed in pool. /// So wait until it will be freed before scheduling a new task. internal_source->create_reader_pool->wait(); - reader_future = internal_source->createReaderAsync(processing_id); + reader_future = internal_source->createReaderAsync(processor_id); } return {}; @@ -357,7 +484,7 @@ void StorageS3QueueSource::applyActionAfterProcessing(const String & path) void StorageS3QueueSource::appendLogElement( const std::string & filename, - S3QueueFilesMetadata::FileStatus & file_status_, + S3QueueMetadata::FileStatus & file_status_, size_t processed_rows, bool processed) { @@ -366,7 +493,6 @@ void StorageS3QueueSource::appendLogElement( S3QueueLogElement elem{}; { - std::lock_guard lock(file_status_.metadata_lock); elem = S3QueueLogElement { .event_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()), @@ -379,12 +505,10 @@ void StorageS3QueueSource::appendLogElement( .counters_snapshot = file_status_.profile_counters.getPartiallyAtomicSnapshot(), .processing_start_time = file_status_.processing_start_time, .processing_end_time = file_status_.processing_end_time, - .exception = file_status_.last_exception, + .exception = file_status_.getException(), }; } s3_queue_log->add(std::move(elem)); } } - -#endif diff --git a/src/Storages/S3Queue/S3QueueSource.h b/src/Storages/S3Queue/S3QueueSource.h index 663577e055b..6e098f8cb63 100644 --- a/src/Storages/S3Queue/S3QueueSource.h +++ b/src/Storages/S3Queue/S3QueueSource.h @@ -1,10 +1,9 @@ #pragma once #include "config.h" -#if USE_AWS_S3 #include #include -#include +#include #include #include #include @@ -21,14 +20,13 @@ class StorageS3QueueSource : public ISource, WithContext { public: using Storage = StorageObjectStorage; - using ConfigurationPtr = Storage::ConfigurationPtr; using GlobIterator = StorageObjectStorageSource::GlobIterator; using ZooKeeperGetter = std::function; using RemoveFileFunc = std::function; - using FileStatusPtr = S3QueueFilesMetadata::FileStatusPtr; + using FileStatusPtr = S3QueueMetadata::FileStatusPtr; using ReaderHolder = StorageObjectStorageSource::ReaderHolder; - using Metadata = S3QueueFilesMetadata; + using Metadata = S3QueueMetadata; using ObjectInfo = StorageObjectStorageSource::ObjectInfo; using ObjectInfoPtr = std::shared_ptr; using ObjectInfos = std::vector; @@ -36,20 +34,18 @@ public: struct S3QueueObjectInfo : public ObjectInfo { S3QueueObjectInfo( - const std::string & key_, - const ObjectMetadata & object_metadata_, - Metadata::ProcessingNodeHolderPtr processing_holder_); + const ObjectInfo & object_info, + Metadata::FileMetadataPtr processing_holder_); - Metadata::ProcessingNodeHolderPtr processing_holder; + Metadata::FileMetadataPtr processing_holder; }; class FileIterator : public StorageObjectStorageSource::IIterator { public: FileIterator( - std::shared_ptr metadata_, + std::shared_ptr metadata_, std::unique_ptr glob_iterator_, - size_t current_shard_, 
std::atomic & shutdown_called_, LoggerPtr logger_); @@ -61,24 +57,35 @@ public: size_t estimatedKeysCount() override; private: - const std::shared_ptr metadata; + using Bucket = S3QueueMetadata::Bucket; + using Processor = S3QueueMetadata::Processor; + + const std::shared_ptr metadata; const std::unique_ptr glob_iterator; + std::atomic & shutdown_called; std::mutex mutex; LoggerPtr log; - const bool sharded_processing; - const size_t current_shard; - std::unordered_map> sharded_keys; - std::mutex sharded_keys_mutex; + std::mutex buckets_mutex; + struct ListedKeys + { + std::deque keys; + std::optional processor; + }; + std::unordered_map listed_keys_cache; + bool iterator_finished = false; + std::unordered_map bucket_holders; + + std::pair getNextKeyFromAcquiredBucket(size_t processor); }; StorageS3QueueSource( String name_, + size_t processor_id_, const Block & header_, std::unique_ptr internal_source_, - std::shared_ptr files_metadata_, - size_t processing_id_, + std::shared_ptr files_metadata_, const S3QueueAction & action_, RemoveFileFunc remove_file_func_, const NamesAndTypesList & requested_virtual_columns_, @@ -97,9 +104,9 @@ public: private: const String name; + const size_t processor_id; const S3QueueAction action; - const size_t processing_id; - const std::shared_ptr files_metadata; + const std::shared_ptr files_metadata; const std::shared_ptr internal_source; const NamesAndTypesList requested_virtual_columns; const std::atomic & shutdown_called; @@ -115,10 +122,11 @@ private: std::atomic initialized{false}; size_t processed_rows_from_file = 0; + S3QueueOrderedFileMetadata::BucketHolderPtr current_bucket_holder; + void applyActionAfterProcessing(const String & path); - void appendLogElement(const std::string & filename, S3QueueFilesMetadata::FileStatus & file_status_, size_t processed_rows, bool processed); - void lazyInitialize(); + void appendLogElement(const std::string & filename, S3QueueMetadata::FileStatus & file_status_, size_t processed_rows, bool processed); + void lazyInitialize(size_t processor); }; } -#endif diff --git a/src/Storages/S3Queue/S3QueueTableMetadata.cpp b/src/Storages/S3Queue/S3QueueTableMetadata.cpp index f0b7568ae7f..ecaa7ad57cc 100644 --- a/src/Storages/S3Queue/S3QueueTableMetadata.cpp +++ b/src/Storages/S3Queue/S3QueueTableMetadata.cpp @@ -1,12 +1,11 @@ #include -#if USE_AWS_S3 - #include #include #include #include #include +#include #include @@ -40,10 +39,10 @@ S3QueueTableMetadata::S3QueueTableMetadata( format_name = configuration.format; after_processing = engine_settings.after_processing.toString(); mode = engine_settings.mode.toString(); - s3queue_tracked_files_limit = engine_settings.s3queue_tracked_files_limit; - s3queue_tracked_file_ttl_sec = engine_settings.s3queue_tracked_file_ttl_sec; - s3queue_total_shards_num = engine_settings.s3queue_total_shards_num; - s3queue_processing_threads_num = engine_settings.s3queue_processing_threads_num; + tracked_files_limit = engine_settings.s3queue_tracked_files_limit; + tracked_file_ttl_sec = engine_settings.s3queue_tracked_file_ttl_sec; + buckets = engine_settings.s3queue_buckets; + processing_threads_num = engine_settings.s3queue_processing_threads_num; columns = storage_metadata.getColumns().toString(); } @@ -52,14 +51,15 @@ String S3QueueTableMetadata::toString() const Poco::JSON::Object json; json.set("after_processing", after_processing); json.set("mode", mode); - json.set("s3queue_tracked_files_limit", s3queue_tracked_files_limit); - json.set("s3queue_tracked_file_ttl_sec", 
s3queue_tracked_file_ttl_sec); - json.set("s3queue_total_shards_num", s3queue_total_shards_num); - json.set("s3queue_processing_threads_num", s3queue_processing_threads_num); + json.set("tracked_files_limit", tracked_files_limit); + json.set("tracked_file_ttl_sec", tracked_file_ttl_sec); + json.set("processing_threads_num", processing_threads_num); + json.set("buckets", buckets); json.set("format_name", format_name); json.set("columns", columns); + json.set("last_processed_file", last_processed_path); - std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM + std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM oss.exceptions(std::ios::failbit); Poco::JSON::Stringifier::stringify(json, oss); return oss.str(); @@ -72,20 +72,34 @@ void S3QueueTableMetadata::read(const String & metadata_str) after_processing = json->getValue("after_processing"); mode = json->getValue("mode"); - s3queue_tracked_files_limit = json->getValue("s3queue_tracked_files_limit"); - s3queue_tracked_file_ttl_sec = json->getValue("s3queue_tracked_file_ttl_sec"); + format_name = json->getValue("format_name"); columns = json->getValue("columns"); - if (json->has("s3queue_total_shards_num")) - s3queue_total_shards_num = json->getValue("s3queue_total_shards_num"); - else - s3queue_total_shards_num = 1; + /// Check with "s3queue_" prefix for compatibility. + { + if (json->has("s3queue_tracked_files_limit")) + tracked_files_limit = json->getValue("s3queue_tracked_files_limit"); + if (json->has("s3queue_tracked_file_ttl_sec")) + tracked_file_ttl_sec = json->getValue("s3queue_tracked_file_ttl_sec"); + if (json->has("s3queue_processing_threads_num")) + processing_threads_num = json->getValue("s3queue_processing_threads_num"); + } - if (json->has("s3queue_processing_threads_num")) - s3queue_processing_threads_num = json->getValue("s3queue_processing_threads_num"); - else - s3queue_processing_threads_num = 1; + if (json->has("tracked_files_limit")) + tracked_files_limit = json->getValue("tracked_files_limit"); + + if (json->has("tracked_file_ttl_sec")) + tracked_file_ttl_sec = json->getValue("tracked_file_ttl_sec"); + + if (json->has("last_processed_file")) + last_processed_path = json->getValue("last_processed_file"); + + if (json->has("processing_threads_num")) + processing_threads_num = json->getValue("processing_threads_num"); + + if (json->has("buckets")) + buckets = json->getValue("buckets"); } S3QueueTableMetadata S3QueueTableMetadata::parse(const String & metadata_str) @@ -95,6 +109,11 @@ S3QueueTableMetadata S3QueueTableMetadata::parse(const String & metadata_str) return metadata; } +void S3QueueTableMetadata::checkEquals(const S3QueueTableMetadata & from_zk) const +{ + checkImmutableFieldsEquals(from_zk); +} + void S3QueueTableMetadata::checkImmutableFieldsEquals(const S3QueueTableMetadata & from_zk) const { if (after_processing != from_zk.after_processing) @@ -113,21 +132,21 @@ void S3QueueTableMetadata::checkImmutableFieldsEquals(const S3QueueTableMetadata from_zk.mode, mode); - if (s3queue_tracked_files_limit != from_zk.s3queue_tracked_files_limit) + if (tracked_files_limit != from_zk.tracked_files_limit) throw Exception( ErrorCodes::METADATA_MISMATCH, "Existing table metadata in ZooKeeper differs in max set size. 
" "Stored in ZooKeeper: {}, local: {}", - from_zk.s3queue_tracked_files_limit, - s3queue_tracked_files_limit); + from_zk.tracked_files_limit, + tracked_files_limit); - if (s3queue_tracked_file_ttl_sec != from_zk.s3queue_tracked_file_ttl_sec) + if (tracked_file_ttl_sec != from_zk.tracked_file_ttl_sec) throw Exception( ErrorCodes::METADATA_MISMATCH, "Existing table metadata in ZooKeeper differs in max set age. " "Stored in ZooKeeper: {}, local: {}", - from_zk.s3queue_tracked_file_ttl_sec, - s3queue_tracked_file_ttl_sec); + from_zk.tracked_file_ttl_sec, + tracked_file_ttl_sec); if (format_name != from_zk.format_name) throw Exception( @@ -137,34 +156,97 @@ void S3QueueTableMetadata::checkImmutableFieldsEquals(const S3QueueTableMetadata from_zk.format_name, format_name); + if (last_processed_path != from_zk.last_processed_path) + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in last processed path. " + "Stored in ZooKeeper: {}, local: {}", + from_zk.last_processed_path, + last_processed_path); + if (modeFromString(mode) == S3QueueMode::ORDERED) { - if (s3queue_processing_threads_num != from_zk.s3queue_processing_threads_num) + if (buckets != from_zk.buckets) { throw Exception( ErrorCodes::METADATA_MISMATCH, - "Existing table metadata in ZooKeeper differs in s3queue_processing_threads_num setting. " + "Existing table metadata in ZooKeeper differs in s3queue_buckets setting. " "Stored in ZooKeeper: {}, local: {}", - from_zk.s3queue_processing_threads_num, - s3queue_processing_threads_num); + from_zk.buckets, buckets); } - if (s3queue_total_shards_num != from_zk.s3queue_total_shards_num) + + if (S3QueueMetadata::getBucketsNum(*this) != S3QueueMetadata::getBucketsNum(from_zk)) { throw Exception( ErrorCodes::METADATA_MISMATCH, - "Existing table metadata in ZooKeeper differs in s3queue_total_shards_num setting. " + "Existing table metadata in ZooKeeper differs in processing buckets. " "Stored in ZooKeeper: {}, local: {}", - from_zk.s3queue_total_shards_num, - s3queue_total_shards_num); + S3QueueMetadata::getBucketsNum(*this), S3QueueMetadata::getBucketsNum(from_zk)); } } } -void S3QueueTableMetadata::checkEquals(const S3QueueTableMetadata & from_zk) const +void S3QueueTableMetadata::checkEquals(const S3QueueSettings & current, const S3QueueSettings & expected) { - checkImmutableFieldsEquals(from_zk); -} + if (current.after_processing != expected.after_processing) + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs " + "in action after processing. Stored in ZooKeeper: {}, local: {}", + expected.after_processing.toString(), + current.after_processing.toString()); -} + if (current.mode != expected.mode) + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in engine mode. " + "Stored in ZooKeeper: {}, local: {}", + expected.mode.toString(), + current.mode.toString()); -#endif + if (current.s3queue_tracked_files_limit != expected.s3queue_tracked_files_limit) + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in max set size. " + "Stored in ZooKeeper: {}, local: {}", + expected.s3queue_tracked_files_limit, + current.s3queue_tracked_files_limit); + + if (current.s3queue_tracked_file_ttl_sec != expected.s3queue_tracked_file_ttl_sec) + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in max set age. 
" + "Stored in ZooKeeper: {}, local: {}", + expected.s3queue_tracked_file_ttl_sec, + current.s3queue_tracked_file_ttl_sec); + + if (current.s3queue_last_processed_path.value != expected.s3queue_last_processed_path.value) + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in last_processed_path. " + "Stored in ZooKeeper: {}, local: {}", + expected.s3queue_last_processed_path.value, + current.s3queue_last_processed_path.value); + + if (current.mode == S3QueueMode::ORDERED) + { + if (current.s3queue_buckets != expected.s3queue_buckets) + { + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in s3queue_buckets setting. " + "Stored in ZooKeeper: {}, local: {}", + expected.s3queue_buckets, current.s3queue_buckets); + } + + if (S3QueueMetadata::getBucketsNum(current) != S3QueueMetadata::getBucketsNum(expected)) + { + throw Exception( + ErrorCodes::METADATA_MISMATCH, + "Existing table metadata in ZooKeeper differs in processing buckets. " + "Stored in ZooKeeper: {}, local: {}", + S3QueueMetadata::getBucketsNum(current), S3QueueMetadata::getBucketsNum(expected)); + } + } +} +} diff --git a/src/Storages/S3Queue/S3QueueTableMetadata.h b/src/Storages/S3Queue/S3QueueTableMetadata.h index bb8f8ccf2c4..d53b60570ae 100644 --- a/src/Storages/S3Queue/S3QueueTableMetadata.h +++ b/src/Storages/S3Queue/S3QueueTableMetadata.h @@ -1,7 +1,5 @@ #pragma once -#if USE_AWS_S3 - #include #include #include @@ -22,10 +20,11 @@ struct S3QueueTableMetadata String columns; String after_processing; String mode; - UInt64 s3queue_tracked_files_limit = 0; - UInt64 s3queue_tracked_file_ttl_sec = 0; - UInt64 s3queue_total_shards_num = 1; - UInt64 s3queue_processing_threads_num = 1; + UInt64 tracked_files_limit = 0; + UInt64 tracked_file_ttl_sec = 0; + UInt64 buckets = 0; + UInt64 processing_threads_num = 1; + String last_processed_path; S3QueueTableMetadata() = default; S3QueueTableMetadata( @@ -39,6 +38,7 @@ struct S3QueueTableMetadata String toString() const; void checkEquals(const S3QueueTableMetadata & from_zk) const; + static void checkEquals(const S3QueueSettings & current, const S3QueueSettings & expected); private: void checkImmutableFieldsEquals(const S3QueueTableMetadata & from_zk) const; @@ -46,5 +46,3 @@ private: } - -#endif diff --git a/src/Storages/S3Queue/S3QueueUnorderedFileMetadata.cpp b/src/Storages/S3Queue/S3QueueUnorderedFileMetadata.cpp new file mode 100644 index 00000000000..c61e9557fc2 --- /dev/null +++ b/src/Storages/S3Queue/S3QueueUnorderedFileMetadata.cpp @@ -0,0 +1,155 @@ +#include +#include +#include +#include + +namespace DB +{ +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +namespace +{ + zkutil::ZooKeeperPtr getZooKeeper() + { + return Context::getGlobalContextInstance()->getZooKeeper(); + } +} + +S3QueueUnorderedFileMetadata::S3QueueUnorderedFileMetadata( + const std::filesystem::path & zk_path, + const std::string & path_, + FileStatusPtr file_status_, + size_t max_loading_retries_, + LoggerPtr log_) + : S3QueueIFileMetadata( + path_, + /* processing_node_path */zk_path / "processing" / getNodeName(path_), + /* processed_node_path */zk_path / "processed" / getNodeName(path_), + /* failed_node_path */zk_path / "failed" / getNodeName(path_), + file_status_, + max_loading_retries_, + log_) +{ +} + +std::pair S3QueueUnorderedFileMetadata::setProcessingImpl() +{ + /// In one zookeeper transaction do the following: + enum RequestType + { + /// node_name is not within processed persistent 
nodes + PROCESSED_PATH_DOESNT_EXIST = 0, + /// node_name is not within failed persistent nodes + FAILED_PATH_DOESNT_EXIST = 2, + /// node_name ephemeral processing node was successfully created + CREATED_PROCESSING_PATH = 4, + /// update processing id + SET_PROCESSING_ID = 6, + }; + + const auto zk_client = getZooKeeper(); + processing_id = node_metadata.processing_id = getRandomASCIIString(10); + auto processor_info = getProcessorInfo(processing_id.value()); + + Coordination::Requests requests; + requests.push_back(zkutil::makeCreateRequest(processed_node_path, "", zkutil::CreateMode::Persistent)); + requests.push_back(zkutil::makeRemoveRequest(processed_node_path, -1)); + requests.push_back(zkutil::makeCreateRequest(failed_node_path, "", zkutil::CreateMode::Persistent)); + requests.push_back(zkutil::makeRemoveRequest(failed_node_path, -1)); + requests.push_back(zkutil::makeCreateRequest(processing_node_path, node_metadata.toString(), zkutil::CreateMode::Ephemeral)); + + requests.push_back( + zkutil::makeCreateRequest( + processing_node_id_path, processor_info, zkutil::CreateMode::Persistent, /* ignore_if_exists */true)); + requests.push_back(zkutil::makeSetRequest(processing_node_id_path, processor_info, -1)); + + Coordination::Responses responses; + const auto code = zk_client->tryMulti(requests, responses); + auto is_request_failed = [&](RequestType type) { return responses[type]->error != Coordination::Error::ZOK; }; + + if (code == Coordination::Error::ZOK) + { + const auto * set_response = dynamic_cast(responses[SET_PROCESSING_ID].get()); + processing_id_version = set_response->stat.version; + return std::pair{true, FileStatus::State::None}; + } + + if (is_request_failed(PROCESSED_PATH_DOESNT_EXIST)) + return {false, FileStatus::State::Processed}; + + if (is_request_failed(FAILED_PATH_DOESNT_EXIST)) + return {false, FileStatus::State::Failed}; + + if (is_request_failed(CREATED_PROCESSING_PATH)) + return {false, FileStatus::State::Processing}; + + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected state of zookeeper transaction: {}", magic_enum::enum_name(code)); +} + +void S3QueueUnorderedFileMetadata::setProcessedAtStartRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr &) +{ + requests.push_back( + zkutil::makeCreateRequest( + processed_node_path, node_metadata.toString(), zkutil::CreateMode::Persistent)); +} + +void S3QueueUnorderedFileMetadata::setProcessedImpl() +{ + /// In one zookeeper transaction do the following: + enum RequestType + { + SET_MAX_PROCESSED_PATH = 0, + CHECK_PROCESSING_ID_PATH = 1, /// Optional. + REMOVE_PROCESSING_ID_PATH = 2, /// Optional. + REMOVE_PROCESSING_PATH = 3, /// Optional. 
+ }; + + const auto zk_client = getZooKeeper(); + std::string failure_reason; + + Coordination::Requests requests; + requests.push_back( + zkutil::makeCreateRequest( + processed_node_path, node_metadata.toString(), zkutil::CreateMode::Persistent)); + + if (processing_id_version.has_value()) + { + requests.push_back(zkutil::makeCheckRequest(processing_node_id_path, processing_id_version.value())); + requests.push_back(zkutil::makeRemoveRequest(processing_node_id_path, processing_id_version.value())); + requests.push_back(zkutil::makeRemoveRequest(processing_node_path, -1)); + } + + Coordination::Responses responses; + auto is_request_failed = [&](RequestType type) { return responses[type]->error != Coordination::Error::ZOK; }; + + const auto code = zk_client->tryMulti(requests, responses); + if (code == Coordination::Error::ZOK) + { + if (max_loading_retries) + zk_client->tryRemove(failed_node_path + ".retriable", -1); + + LOG_TRACE(log, "Moved file `{}` to processed (node path: {})", path, processed_node_path); + return; + } + + if (Coordination::isHardwareError(code)) + failure_reason = "Lost connection to keeper"; + else if (is_request_failed(SET_MAX_PROCESSED_PATH)) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Cannot create a persistent node in /processed since it already exists"); + else if (is_request_failed(CHECK_PROCESSING_ID_PATH)) + failure_reason = "Version of processing id node changed"; + else if (is_request_failed(REMOVE_PROCESSING_PATH)) + failure_reason = "Failed to remove processing path"; + else + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected state of zookeeper transaction: {}", code); + + LOG_WARNING(log, "Cannot set file {} as processed: {}. Reason: {}", path, code, failure_reason); +} + +} diff --git a/src/Storages/S3Queue/S3QueueUnorderedFileMetadata.h b/src/Storages/S3Queue/S3QueueUnorderedFileMetadata.h new file mode 100644 index 00000000000..24c2765bf3a --- /dev/null +++ b/src/Storages/S3Queue/S3QueueUnorderedFileMetadata.h @@ -0,0 +1,32 @@ +#pragma once +#include +#include +#include + +namespace DB +{ + +class S3QueueUnorderedFileMetadata : public S3QueueIFileMetadata +{ +public: + using Bucket = size_t; + + explicit S3QueueUnorderedFileMetadata( + const std::filesystem::path & zk_path, + const std::string & path_, + FileStatusPtr file_status_, + size_t max_loading_retries_, + LoggerPtr log_); + + static std::vector getMetadataPaths() { return {"processed", "failed", "processing"}; } + + void setProcessedAtStartRequests( + Coordination::Requests & requests, + const zkutil::ZooKeeperPtr & zk_client) override; + +private: + std::pair setProcessingImpl() override; + void setProcessedImpl() override; +}; + +} diff --git a/src/Storages/S3Queue/StorageS3Queue.cpp b/src/Storages/S3Queue/StorageS3Queue.cpp index f8eb288921c..0844d0a479e 100644 --- a/src/Storages/S3Queue/StorageS3Queue.cpp +++ b/src/Storages/S3Queue/StorageS3Queue.cpp @@ -1,7 +1,6 @@ #include #include "config.h" -#if USE_AWS_S3 #include #include #include @@ -18,7 +17,7 @@ #include #include #include -#include +#include #include #include #include @@ -48,8 +47,6 @@ namespace ErrorCodes extern const int BAD_ARGUMENTS; extern const int S3_ERROR; extern const int QUERY_NOT_ALLOWED; - extern const int REPLICA_ALREADY_EXISTS; - extern const int INCOMPATIBLE_COLUMNS; } namespace @@ -104,13 +101,12 @@ StorageS3Queue::StorageS3Queue( const String & comment, ContextPtr context_, std::optional format_settings_, - ASTStorage * engine_args, + ASTStorage * /* engine_args */, LoadingStrictnessLevel mode) : 
IStorage(table_id_) , WithContext(context_) , s3queue_settings(std::move(s3queue_settings_)) , zk_path(chooseZooKeeperPath(table_id_, context_->getSettingsRef(), *s3queue_settings)) - , after_processing(s3queue_settings->after_processing) , configuration{configuration_} , format_settings(format_settings_) , reschedule_processing_interval_ms(s3queue_settings->s3queue_polling_min_timeout_ms) @@ -132,7 +128,7 @@ StorageS3Queue::StorageS3Queue( if (mode == LoadingStrictnessLevel::CREATE && !context_->getSettingsRef().s3queue_allow_experimental_sharded_mode && s3queue_settings->mode == S3QueueMode::ORDERED - && (s3queue_settings->s3queue_total_shards_num > 1 || s3queue_settings->s3queue_processing_threads_num > 1)) + && (s3queue_settings->s3queue_buckets > 1 || s3queue_settings->s3queue_processing_threads_num > 1)) { throw Exception(ErrorCodes::QUERY_NOT_ALLOWED, "S3Queue sharded mode is not allowed. To enable use `s3queue_allow_experimental_sharded_mode`"); } @@ -157,28 +153,18 @@ StorageS3Queue::StorageS3Queue( LOG_INFO(log, "Using zookeeper path: {}", zk_path.string()); task = getContext()->getSchedulePool().createTask("S3QueueStreamingTask", [this] { threadFunc(); }); - createOrCheckMetadata(storage_metadata); - /// Get metadata manager from S3QueueMetadataFactory, /// it will increase the ref count for the metadata object. /// The ref count is decreased when StorageS3Queue::drop() method is called. files_metadata = S3QueueMetadataFactory::instance().getOrCreate(zk_path, *s3queue_settings); - - if (files_metadata->isShardedProcessing()) + try { - if (!s3queue_settings->s3queue_current_shard_num.changed) - { - s3queue_settings->s3queue_current_shard_num = static_cast(files_metadata->registerNewShard()); - engine_args->settings->changes.setSetting("s3queue_current_shard_num", s3queue_settings->s3queue_current_shard_num.value); - } - else if (!files_metadata->isShardRegistered(s3queue_settings->s3queue_current_shard_num)) - { - files_metadata->registerNewShard(s3queue_settings->s3queue_current_shard_num); - } + files_metadata->initialize(configuration_, storage_metadata); } - if (s3queue_settings->mode == S3QueueMode::ORDERED && !s3queue_settings->s3queue_last_processed_path.value.empty()) + catch (...) 
{ - files_metadata->setFileProcessed(s3queue_settings->s3queue_last_processed_path.value, s3queue_settings->s3queue_current_shard_num); + S3QueueMetadataFactory::instance().remove(zk_path); + throw; } } @@ -201,14 +187,7 @@ void StorageS3Queue::shutdown(bool is_drop) if (files_metadata) { - files_metadata->deactivateCleanupTask(); - - if (is_drop && files_metadata->isShardedProcessing()) - { - files_metadata->unregisterShard(s3queue_settings->s3queue_current_shard_num); - LOG_TRACE(log, "Unregistered shard {} from zookeeper", s3queue_settings->s3queue_current_shard_num); - } - + files_metadata->shutdown(); files_metadata.reset(); } LOG_TRACE(log, "Shut down storage"); @@ -328,9 +307,9 @@ void ReadFromS3Queue::initializePipeline(QueryPipelineBuilder & pipeline, const createIterator(nullptr); for (size_t i = 0; i < adjusted_num_streams; ++i) pipes.emplace_back(storage->createSource( + i, info, iterator, - storage->files_metadata->getIdForProcessingThread(i, storage->s3queue_settings->s3queue_current_shard_num), max_block_size, context)); auto pipe = Pipe::unitePipes(std::move(pipes)); @@ -344,9 +323,9 @@ void ReadFromS3Queue::initializePipeline(QueryPipelineBuilder & pipeline, const } std::shared_ptr StorageS3Queue::createSource( + size_t processor_id, const ReadFromFormatInfo & info, std::shared_ptr file_iterator, - size_t processing_id, size_t max_block_size, ContextPtr local_context) { @@ -368,9 +347,20 @@ std::shared_ptr StorageS3Queue::createSource( }; auto s3_queue_log = s3queue_settings->s3queue_enable_logging_to_s3queue_log ? local_context->getS3QueueLog() : nullptr; return std::make_shared( - getName(), info.source_header, std::move(internal_source), - files_metadata, processing_id, after_processing, file_deleter, info.requested_virtual_columns, - local_context, shutdown_called, table_is_being_dropped, s3_queue_log, getStorageID(), log); + getName(), + processor_id, + info.source_header, + std::move(internal_source), + files_metadata, + s3queue_settings->after_processing, + file_deleter, + info.requested_virtual_columns, + local_context, + shutdown_called, + table_is_being_dropped, + s3_queue_log, + getStorageID(), + log); } bool StorageS3Queue::hasDependencies(const StorageID & table_id) @@ -471,10 +461,7 @@ bool StorageS3Queue::streamToViews() pipes.reserve(s3queue_settings->s3queue_processing_threads_num); for (size_t i = 0; i < s3queue_settings->s3queue_processing_threads_num; ++i) { - auto source = createSource( - read_from_format_info, file_iterator, files_metadata->getIdForProcessingThread(i, s3queue_settings->s3queue_current_shard_num), - DBMS_DEFAULT_BUFFER_SIZE, s3queue_context); - + auto source = createSource(i, read_from_format_info, file_iterator, DBMS_DEFAULT_BUFFER_SIZE, s3queue_context); pipes.emplace_back(std::move(source)); } auto pipe = Pipe::unitePipes(std::move(pipes)); @@ -497,88 +484,16 @@ zkutil::ZooKeeperPtr StorageS3Queue::getZooKeeper() const return getContext()->getZooKeeper(); } -void StorageS3Queue::createOrCheckMetadata(const StorageInMemoryMetadata & storage_metadata) -{ - auto zookeeper = getZooKeeper(); - zookeeper->createAncestors(zk_path); - - for (size_t i = 0; i < 1000; ++i) - { - Coordination::Requests requests; - if (zookeeper->exists(zk_path / "metadata")) - { - checkTableStructure(zk_path, storage_metadata); - } - else - { - std::string metadata = S3QueueTableMetadata(*configuration, *s3queue_settings, storage_metadata).toString(); - requests.emplace_back(zkutil::makeCreateRequest(zk_path, "", zkutil::CreateMode::Persistent)); - 
requests.emplace_back(zkutil::makeCreateRequest(zk_path / "processed", "", zkutil::CreateMode::Persistent)); - requests.emplace_back(zkutil::makeCreateRequest(zk_path / "failed", "", zkutil::CreateMode::Persistent)); - requests.emplace_back(zkutil::makeCreateRequest(zk_path / "processing", "", zkutil::CreateMode::Persistent)); - requests.emplace_back(zkutil::makeCreateRequest(zk_path / "metadata", metadata, zkutil::CreateMode::Persistent)); - } - - if (!requests.empty()) - { - Coordination::Responses responses; - auto code = zookeeper->tryMulti(requests, responses); - if (code == Coordination::Error::ZNODEEXISTS) - { - LOG_INFO(log, "It looks like the table {} was created by another server at the same moment, will retry", zk_path.string()); - continue; - } - else if (code != Coordination::Error::ZOK) - { - zkutil::KeeperMultiException::check(code, requests, responses); - } - } - - return; - } - - throw Exception( - ErrorCodes::REPLICA_ALREADY_EXISTS, - "Cannot create table, because it is created concurrently every time or because " - "of wrong zk_path or because of logical error"); -} - - -void StorageS3Queue::checkTableStructure(const String & zookeeper_prefix, const StorageInMemoryMetadata & storage_metadata) -{ - // Verify that list of columns and table settings match those specified in ZK (/metadata). - // If not, throw an exception. - - auto zookeeper = getZooKeeper(); - String metadata_str = zookeeper->get(fs::path(zookeeper_prefix) / "metadata"); - auto metadata_from_zk = S3QueueTableMetadata::parse(metadata_str); - - S3QueueTableMetadata old_metadata(*configuration, *s3queue_settings, storage_metadata); - old_metadata.checkEquals(metadata_from_zk); - - auto columns_from_zk = ColumnsDescription::parse(metadata_from_zk.columns); - const ColumnsDescription & old_columns = storage_metadata.getColumns(); - if (columns_from_zk != old_columns) - { - throw Exception( - ErrorCodes::INCOMPATIBLE_COLUMNS, - "Table columns structure in ZooKeeper is different from local table structure. 
Local columns:\n" - "{}\nZookeeper columns:\n{}", - old_columns.toString(), - columns_from_zk.toString()); - } -} - std::shared_ptr StorageS3Queue::createFileIterator(ContextPtr local_context, const ActionsDAG::Node * predicate) { auto settings = configuration->getQuerySettings(local_context); auto glob_iterator = std::make_unique( object_storage, configuration, predicate, getVirtualsList(), local_context, nullptr, settings.list_object_keys_size, settings.throw_on_zero_files_match); - return std::make_shared( - files_metadata, std::move(glob_iterator), s3queue_settings->s3queue_current_shard_num, shutdown_called, log); + return std::make_shared(files_metadata, std::move(glob_iterator), shutdown_called, log); } +#if USE_AWS_S3 void registerStorageS3Queue(StorageFactory & factory) { factory.registerStorage( @@ -645,8 +560,6 @@ void registerStorageS3Queue(StorageFactory & factory) .source_access_type = AccessType::S3, }); } +#endif } - - -#endif diff --git a/src/Storages/S3Queue/StorageS3Queue.h b/src/Storages/S3Queue/StorageS3Queue.h index 83b7bc6667b..ef83a1ccc25 100644 --- a/src/Storages/S3Queue/StorageS3Queue.h +++ b/src/Storages/S3Queue/StorageS3Queue.h @@ -1,7 +1,6 @@ #pragma once #include "config.h" -#if USE_AWS_S3 #include #include #include @@ -16,7 +15,7 @@ namespace DB { -class S3QueueFilesMetadata; +class S3QueueMetadata; class StorageS3Queue : public IStorage, WithContext { @@ -59,9 +58,8 @@ private: const std::unique_ptr s3queue_settings; const fs::path zk_path; - const S3QueueAction after_processing; - std::shared_ptr files_metadata; + std::shared_ptr files_metadata; ConfigurationPtr configuration; ObjectStoragePtr object_storage; @@ -86,20 +84,15 @@ private: std::shared_ptr createFileIterator(ContextPtr local_context, const ActionsDAG::Node * predicate); std::shared_ptr createSource( + size_t processor_id, const ReadFromFormatInfo & info, std::shared_ptr file_iterator, - size_t processing_id, size_t max_block_size, ContextPtr local_context); bool hasDependencies(const StorageID & table_id); bool streamToViews(); void threadFunc(); - - void createOrCheckMetadata(const StorageInMemoryMetadata & storage_metadata); - void checkTableStructure(const String & zookeeper_prefix, const StorageInMemoryMetadata & storage_metadata); }; } - -#endif diff --git a/src/Storages/Statistics/Estimator.cpp b/src/Storages/Statistics/Estimator.cpp index 7e0e465c7bf..e272014c1c2 100644 --- a/src/Storages/Statistics/Estimator.cpp +++ b/src/Storages/Statistics/Estimator.cpp @@ -112,7 +112,7 @@ Float64 ConditionEstimator::estimateSelectivity(const RPNBuilderTreeNode & node) auto [op, val] = extractBinaryOp(node, col); if (op == "equals") { - if (val < - threshold || val > threshold) + if (val < -threshold || val > threshold) return default_normal_cond_factor; else return default_good_cond_factor; diff --git a/src/Storages/StatisticsDescription.cpp b/src/Storages/StatisticsDescription.cpp index a427fb6a7cd..7d4226f2fbe 100644 --- a/src/Storages/StatisticsDescription.cpp +++ b/src/Storages/StatisticsDescription.cpp @@ -22,6 +22,31 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; }; +StatisticDescription & StatisticDescription::operator=(const StatisticDescription & other) +{ + if (this == &other) + return *this; + + type = other.type; + column_name = other.column_name; + ast = other.ast ? 
other.ast->clone() : nullptr; + + return *this; +} + +StatisticDescription & StatisticDescription::operator=(StatisticDescription && other) noexcept +{ + if (this == &other) + return *this; + + type = std::exchange(other.type, StatisticType{}); + column_name = std::move(other.column_name); + ast = other.ast ? other.ast->clone() : nullptr; + other.ast.reset(); + + return *this; +} + StatisticType stringToType(String type) { if (type == "tdigest") @@ -55,15 +80,7 @@ std::vector StatisticDescription::getStatisticsFromAST(con const auto & column = columns.getPhysical(column_name); stat.column_name = column.name; - - auto function_node = std::make_shared(); - function_node->name = "STATISTIC"; - function_node->arguments = std::make_shared(); - function_node->arguments->children.push_back(std::make_shared(stat_definition->type)); - function_node->children.push_back(function_node->arguments); - - stat.ast = function_node; - + stat.ast = makeASTFunction("STATISTIC", std::make_shared(stat_definition->type)); stats.push_back(stat); } @@ -80,6 +97,7 @@ StatisticDescription StatisticDescription::getStatisticFromColumnDeclaration(con const auto & stat_type_list_ast = column.stat_type->as().arguments; if (stat_type_list_ast->children.size() != 1) throw Exception(ErrorCodes::INCORRECT_QUERY, "We expect only one statistic type for column {}", queryToString(column)); + const auto & stat_type = stat_type_list_ast->children[0]->as().name; StatisticDescription stat; diff --git a/src/Storages/StatisticsDescription.h b/src/Storages/StatisticsDescription.h index 9a66951ab52..b571fa31e9d 100644 --- a/src/Storages/StatisticsDescription.h +++ b/src/Storages/StatisticsDescription.h @@ -27,6 +27,10 @@ struct StatisticDescription String getTypeName() const; StatisticDescription() = default; + StatisticDescription(const StatisticDescription & other) { *this = other; } + StatisticDescription & operator=(const StatisticDescription & other); + StatisticDescription(StatisticDescription && other) noexcept { *this = std::move(other); } + StatisticDescription & operator=(StatisticDescription && other) noexcept; bool operator==(const StatisticDescription & other) const { diff --git a/src/Storages/StorageDistributed.cpp b/src/Storages/StorageDistributed.cpp index fbb40f8b79f..9c58468c4a4 100644 --- a/src/Storages/StorageDistributed.cpp +++ b/src/Storages/StorageDistributed.cpp @@ -1986,9 +1986,18 @@ void registerStorageDistributed(StorageFactory & factory) bool StorageDistributed::initializeDiskOnConfigChange(const std::set & new_added_disks) { - if (!data_volume) + if (!storage_policy || !data_volume) return true; + auto new_storage_policy = getContext()->getStoragePolicy(storage_policy->getName()); + auto new_data_volume = new_storage_policy->getVolume(0); + if (new_storage_policy->getVolumes().size() > 1) + LOG_WARNING(log, "Storage policy for Distributed table has multiple volumes. " + "Only {} volume will be used to store data. 
Other will be ignored.", data_volume->getName()); + + std::atomic_store(&storage_policy, new_storage_policy); + std::atomic_store(&data_volume, new_data_volume); + for (auto & disk : data_volume->getDisks()) { if (new_added_disks.contains(disk->getName())) diff --git a/src/Storages/StorageFile.cpp b/src/Storages/StorageFile.cpp index 51bcc64bceb..6744159d5dc 100644 --- a/src/Storages/StorageFile.cpp +++ b/src/Storages/StorageFile.cpp @@ -274,7 +274,7 @@ std::unique_ptr selectReadBuffer( if (S_ISREG(file_stat.st_mode) && (read_method == LocalFSReadMethod::pread || read_method == LocalFSReadMethod::mmap)) { if (use_table_fd) - res = std::make_unique(table_fd); + res = std::make_unique(table_fd, context->getSettingsRef().max_read_buffer_size); else res = std::make_unique(current_path, context->getSettingsRef().max_read_buffer_size); @@ -296,7 +296,7 @@ std::unique_ptr selectReadBuffer( else { if (use_table_fd) - res = std::make_unique(table_fd); + res = std::make_unique(table_fd, context->getSettingsRef().max_read_buffer_size); else res = std::make_unique(current_path, context->getSettingsRef().max_read_buffer_size); diff --git a/src/Storages/StorageGenerateRandom.cpp b/src/Storages/StorageGenerateRandom.cpp index cdbade51695..2190e012c5b 100644 --- a/src/Storages/StorageGenerateRandom.cpp +++ b/src/Storages/StorageGenerateRandom.cpp @@ -267,6 +267,9 @@ ColumnPtr fillColumnWithRandomData( case TypeIndex::Tuple: { auto elements = typeid_cast(type.get())->getElements(); + if (elements.empty()) + return ColumnTuple::create(limit); + const size_t tuple_size = elements.size(); Columns tuple_columns(tuple_size); diff --git a/src/Storages/StorageLoop.cpp b/src/Storages/StorageLoop.cpp new file mode 100644 index 00000000000..2062749e60b --- /dev/null +++ b/src/Storages/StorageLoop.cpp @@ -0,0 +1,49 @@ +#include "StorageLoop.h" +#include +#include +#include + + +namespace DB +{ + namespace ErrorCodes + { + + } + StorageLoop::StorageLoop( + const StorageID & table_id_, + StoragePtr inner_storage_) + : IStorage(table_id_) + , inner_storage(std::move(inner_storage_)) + { + StorageInMemoryMetadata storage_metadata = inner_storage->getInMemoryMetadata(); + setInMemoryMetadata(storage_metadata); + } + + + void StorageLoop::read( + QueryPlan & query_plan, + const Names & column_names, + const StorageSnapshotPtr & storage_snapshot, + SelectQueryInfo & query_info, + ContextPtr context, + QueryProcessingStage::Enum processed_stage, + size_t max_block_size, + size_t num_streams) + { + query_info.optimize_trivial_count = false; + + query_plan.addStep(std::make_unique( + column_names, query_info, storage_snapshot, context, processed_stage, inner_storage, max_block_size, num_streams + )); + } + + void registerStorageLoop(StorageFactory & factory) + { + factory.registerStorage("Loop", [](const StorageFactory::Arguments & args) + { + StoragePtr inner_storage; + return std::make_shared(args.table_id, inner_storage); + }); + } +} diff --git a/src/Storages/StorageLoop.h b/src/Storages/StorageLoop.h new file mode 100644 index 00000000000..48760b169c2 --- /dev/null +++ b/src/Storages/StorageLoop.h @@ -0,0 +1,33 @@ +#pragma once +#include "config.h" +#include + + +namespace DB +{ + + class StorageLoop final : public IStorage + { + public: + StorageLoop( + const StorageID & table_id, + StoragePtr inner_storage_); + + std::string getName() const override { return "Loop"; } + + void read( + QueryPlan & query_plan, + const Names & column_names, + const StorageSnapshotPtr & storage_snapshot, + SelectQueryInfo & query_info, + 
ContextPtr context, + QueryProcessingStage::Enum processed_stage, + size_t max_block_size, + size_t num_streams) override; + + bool supportsTrivialCountOptimization(const StorageSnapshotPtr &, ContextPtr) const override { return false; } + + private: + StoragePtr inner_storage; + }; +} diff --git a/src/Storages/StorageMergeTree.cpp b/src/Storages/StorageMergeTree.cpp index ea698775298..27a76f4f21d 100644 --- a/src/Storages/StorageMergeTree.cpp +++ b/src/Storages/StorageMergeTree.cpp @@ -2119,16 +2119,36 @@ void StorageMergeTree::replacePartitionFrom(const StoragePtr & source_table, con MergeTreePartInfo dst_part_info(partition_id, temp_index, temp_index, src_part->info.level); IDataPartStorage::ClonePartParams clone_params{.txn = local_context->getCurrentTransaction()}; - auto [dst_part, part_lock] = cloneAndLoadDataPartOnSameDisk( - src_part, - TMP_PREFIX, - dst_part_info, - my_metadata_snapshot, - clone_params, - local_context->getReadSettings(), - local_context->getWriteSettings()); - dst_parts.emplace_back(std::move(dst_part)); - dst_parts_locks.emplace_back(std::move(part_lock)); + if (replace) + { + /// Replace can only work on the same disk + auto [dst_part, part_lock] = cloneAndLoadDataPart( + src_part, + TMP_PREFIX, + dst_part_info, + my_metadata_snapshot, + clone_params, + local_context->getReadSettings(), + local_context->getWriteSettings(), + true/*must_on_same_disk*/); + dst_parts.emplace_back(std::move(dst_part)); + dst_parts_locks.emplace_back(std::move(part_lock)); + } + else + { + /// Attach can work on another disk + auto [dst_part, part_lock] = cloneAndLoadDataPart( + src_part, + TMP_PREFIX, + dst_part_info, + my_metadata_snapshot, + clone_params, + local_context->getReadSettings(), + local_context->getWriteSettings(), + false/*must_on_same_disk*/); + dst_parts.emplace_back(std::move(dst_part)); + dst_parts_locks.emplace_back(std::move(part_lock)); + } } /// ATTACH empty part set @@ -2233,14 +2253,15 @@ void StorageMergeTree::movePartitionToTable(const StoragePtr & dest_table, const .copy_instead_of_hardlink = getSettings()->always_use_copy_instead_of_hardlinks, }; - auto [dst_part, part_lock] = dest_table_storage->cloneAndLoadDataPartOnSameDisk( + auto [dst_part, part_lock] = dest_table_storage->cloneAndLoadDataPart( src_part, TMP_PREFIX, dst_part_info, dest_metadata_snapshot, clone_params, local_context->getReadSettings(), - local_context->getWriteSettings() + local_context->getWriteSettings(), + true/*must_on_same_disk*/ ); dst_parts.emplace_back(std::move(dst_part)); diff --git a/src/Storages/StorageReplicatedMergeTree.cpp b/src/Storages/StorageReplicatedMergeTree.cpp index 62bfcc223fd..e18e66d7af9 100644 --- a/src/Storages/StorageReplicatedMergeTree.cpp +++ b/src/Storages/StorageReplicatedMergeTree.cpp @@ -2803,7 +2803,7 @@ bool StorageReplicatedMergeTree::executeReplaceRange(LogEntry & entry) auto obtain_part = [&] (PartDescriptionPtr & part_desc) { - /// Fetches with zero-copy-replication are cheap, but cloneAndLoadDataPartOnSameDisk will do full copy. + /// Fetches with zero-copy-replication are cheap, but cloneAndLoadDataPart(must_on_same_disk=true) will do full copy. /// It's okay to check the setting for current table and disk for the source table, because src and dst part are on the same disk. 
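(The hunks above all follow one pattern: cloneAndLoadDataPartOnSameDisk becomes cloneAndLoadDataPart with an explicit must_on_same_disk argument — true for REPLACE PARTITION, fetch and replace-range clones, false for ATTACH PARTITION FROM, which may now clone onto a different disk. A compact stand-alone sketch of how the flag is chosen; the signature is simplified and not the real MergeTree API:

    #include <iostream>
    #include <string>

    // Simplified stand-in for cloneAndLoadDataPart(): only the flag matters here.
    void cloneAndLoadDataPartSketch(const std::string & part_name, bool must_on_same_disk)
    {
        std::cout << part_name << ": "
                  << (must_on_same_disk ? "clone stays on the source disk (hardlinks possible)"
                                        : "clone may land on another disk (full copy)")
                  << '\n';
    }

    int main()
    {
        // REPLACE swaps data in place, so the clone must live on the same disk;
        // ATTACH only adds parts, so it may copy them to whatever disk the destination picks.
        const bool replace = true;
        cloneAndLoadDataPartSketch("202405_1_1_0", /*must_on_same_disk=*/replace);

        const bool attach_from_other_table = true;
        cloneAndLoadDataPartSketch("202405_2_2_0", /*must_on_same_disk=*/!attach_from_other_table);
    }
)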
bool prefer_fetch_from_other_replica = !part_desc->replica.empty() && storage_settings_ptr->allow_remote_fs_zero_copy_replication && part_desc->src_table_part && part_desc->src_table_part->isStoredOnRemoteDiskWithZeroCopySupport(); @@ -2822,14 +2822,15 @@ bool StorageReplicatedMergeTree::executeReplaceRange(LogEntry & entry) .copy_instead_of_hardlink = storage_settings_ptr->always_use_copy_instead_of_hardlinks || ((our_zero_copy_enabled || source_zero_copy_enabled) && part_desc->src_table_part->isStoredOnRemoteDiskWithZeroCopySupport()), .metadata_version_to_write = metadata_snapshot->getMetadataVersion() }; - auto [res_part, temporary_part_lock] = cloneAndLoadDataPartOnSameDisk( + auto [res_part, temporary_part_lock] = cloneAndLoadDataPart( part_desc->src_table_part, TMP_PREFIX + "clone_", part_desc->new_part_info, metadata_snapshot, clone_params, getContext()->getReadSettings(), - getContext()->getWriteSettings()); + getContext()->getWriteSettings(), + true/*must_on_same_disk*/); part_desc->res_part = std::move(res_part); part_desc->temporary_part_lock = std::move(temporary_part_lock); } @@ -4900,14 +4901,15 @@ bool StorageReplicatedMergeTree::fetchPart( .keep_metadata_version = true, }; - auto [cloned_part, lock] = cloneAndLoadDataPartOnSameDisk( + auto [cloned_part, lock] = cloneAndLoadDataPart( part_to_clone, "tmp_clone_", part_info, metadata_snapshot, clone_params, getContext()->getReadSettings(), - getContext()->getWriteSettings()); + getContext()->getWriteSettings(), + true/*must_on_same_disk*/); part_directory_lock = std::move(lock); return cloned_part; @@ -8109,17 +8111,37 @@ void StorageReplicatedMergeTree::replacePartitionFrom( .copy_instead_of_hardlink = storage_settings_ptr->always_use_copy_instead_of_hardlinks || (zero_copy_enabled && src_part->isStoredOnRemoteDiskWithZeroCopySupport()), .metadata_version_to_write = metadata_snapshot->getMetadataVersion() }; - auto [dst_part, part_lock] = cloneAndLoadDataPartOnSameDisk( - src_part, - TMP_PREFIX, - dst_part_info, - metadata_snapshot, - clone_params, - query_context->getReadSettings(), - query_context->getWriteSettings()); + if (replace) + { + /// Replace can only work on the same disk + auto [dst_part, part_lock] = cloneAndLoadDataPart( + src_part, + TMP_PREFIX, + dst_part_info, + metadata_snapshot, + clone_params, + query_context->getReadSettings(), + query_context->getWriteSettings(), + true/*must_on_same_disk*/); + dst_parts.emplace_back(std::move(dst_part)); + dst_parts_locks.emplace_back(std::move(part_lock)); + } + else + { + /// Attach can work on another disk + auto [dst_part, part_lock] = cloneAndLoadDataPart( + src_part, + TMP_PREFIX, + dst_part_info, + metadata_snapshot, + clone_params, + query_context->getReadSettings(), + query_context->getWriteSettings(), + false/*must_on_same_disk*/); + dst_parts.emplace_back(std::move(dst_part)); + dst_parts_locks.emplace_back(std::move(part_lock)); + } src_parts.emplace_back(src_part); - dst_parts.emplace_back(dst_part); - dst_parts_locks.emplace_back(std::move(part_lock)); ephemeral_locks.emplace_back(std::move(*lock)); block_id_paths.emplace_back(block_id_path); part_checksums.emplace_back(hash_hex); @@ -8375,14 +8397,15 @@ void StorageReplicatedMergeTree::movePartitionToTable(const StoragePtr & dest_ta .copy_instead_of_hardlink = storage_settings_ptr->always_use_copy_instead_of_hardlinks || (zero_copy_enabled && src_part->isStoredOnRemoteDiskWithZeroCopySupport()), .metadata_version_to_write = dest_metadata_snapshot->getMetadataVersion() }; - auto [dst_part, dst_part_lock] 
= dest_table_storage->cloneAndLoadDataPartOnSameDisk( + auto [dst_part, dst_part_lock] = dest_table_storage->cloneAndLoadDataPart( src_part, TMP_PREFIX, dst_part_info, dest_metadata_snapshot, clone_params, query_context->getReadSettings(), - query_context->getWriteSettings()); + query_context->getWriteSettings(), + true/*must_on_same_disk*/); src_parts.emplace_back(src_part); dst_parts.emplace_back(dst_part); diff --git a/src/Storages/StorageSet.cpp b/src/Storages/StorageSet.cpp index 205a90423bf..a8c8e81e23d 100644 --- a/src/Storages/StorageSet.cpp +++ b/src/Storages/StorageSet.cpp @@ -130,7 +130,6 @@ StorageSetOrJoinBase::StorageSetOrJoinBase( storage_metadata.setComment(comment); setInMemoryMetadata(storage_metadata); - if (relative_path_.empty()) throw Exception(ErrorCodes::INCORRECT_FILE_NAME, "Join and Set storages require data path"); diff --git a/src/Storages/StorageTableFunction.h b/src/Storages/StorageTableFunction.h index 9d966fb899b..9507eb6ed8a 100644 --- a/src/Storages/StorageTableFunction.h +++ b/src/Storages/StorageTableFunction.h @@ -63,14 +63,6 @@ public: StoragePolicyPtr getStoragePolicy() const override { return nullptr; } bool storesDataOnDisk() const override { return false; } - String getName() const override - { - std::lock_guard lock{nested_mutex}; - if (nested) - return nested->getName(); - return StorageProxy::getName(); - } - void startup() override { } void shutdown(bool is_drop) override { diff --git a/src/Storages/StorageURL.cpp b/src/Storages/StorageURL.cpp index 272f771194d..8d1c6933503 100644 --- a/src/Storages/StorageURL.cpp +++ b/src/Storages/StorageURL.cpp @@ -457,7 +457,7 @@ std::pair> StorageURLSource: const auto settings = context_->getSettings(); - auto proxy_config = getProxyConfiguration(http_method); + auto proxy_config = getProxyConfiguration(request_uri.getScheme()); try { @@ -543,10 +543,11 @@ StorageURLSink::StorageURLSink( std::string content_type = FormatFactory::instance().getContentType(format, context, format_settings); std::string content_encoding = toContentEncodingName(compression_method); - auto proxy_config = getProxyConfiguration(http_method); + auto poco_uri = Poco::URI(uri); + auto proxy_config = getProxyConfiguration(poco_uri.getScheme()); auto write_buffer = std::make_unique( - HTTPConnectionGroupType::STORAGE, Poco::URI(uri), http_method, content_type, content_encoding, headers, timeouts, DBMS_DEFAULT_BUFFER_SIZE, proxy_config + HTTPConnectionGroupType::STORAGE, poco_uri, http_method, content_type, content_encoding, headers, timeouts, DBMS_DEFAULT_BUFFER_SIZE, proxy_config ); const auto & settings = context->getSettingsRef(); @@ -1327,6 +1328,7 @@ std::optional IStorageURLBase::tryGetLastModificationTime( .withBufSize(settings.max_read_buffer_size) .withRedirects(settings.max_http_get_redirects) .withHeaders(headers) + .withProxy(proxy_config) .create(credentials); return buf->tryGetLastModificationTime(); diff --git a/src/Storages/System/StorageSystemNamedCollections.cpp b/src/Storages/System/StorageSystemNamedCollections.cpp index 156fa5e5a9b..0836560dff0 100644 --- a/src/Storages/System/StorageSystemNamedCollections.cpp +++ b/src/Storages/System/StorageSystemNamedCollections.cpp @@ -9,7 +9,7 @@ #include #include #include -#include +#include namespace DB diff --git a/src/Storages/System/StorageSystemS3Queue.cpp b/src/Storages/System/StorageSystemS3Queue.cpp index a6bb7da2b6e..637182067f2 100644 --- a/src/Storages/System/StorageSystemS3Queue.cpp +++ b/src/Storages/System/StorageSystemS3Queue.cpp @@ -11,7 +11,7 @@ #include 
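
The StorageURL change shown above switches getProxyConfiguration from the HTTP method to the request URI's scheme. A minimal Python sketch (not ClickHouse code; it uses the standard library's environment-based proxy lookup as a stand-in) of why the scheme, not the verb, is the right key when resolving a proxy:

```python
# Minimal sketch, not ClickHouse code: proxy resolution should key on the
# URI scheme ("http" vs "https"), not on the HTTP verb (GET/PUT/POST).
from typing import Optional
from urllib.parse import urlparse
from urllib.request import getproxies


def proxy_for(url: str) -> Optional[str]:
    """Return the proxy configured for the URL's scheme, if any."""
    scheme = urlparse(url).scheme      # e.g. "https"
    return getproxies().get(scheme)    # reads http_proxy / https_proxy from the environment


if __name__ == "__main__":
    for url in ("http://example.com/data.csv", "https://example.com/data.csv"):
        # The same GET request may be routed through different proxies depending on scheme.
        print(url, "->", proxy_for(url))
```
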
#include #include -#include +#include #include #include #include @@ -45,29 +45,27 @@ void StorageSystemS3Queue::fillData(MutableColumns & res_columns, ContextPtr, co { for (const auto & [zookeeper_path, metadata] : S3QueueMetadataFactory::instance().getAll()) { - for (const auto & [file_name, file_status] : metadata->getFileStateses()) + for (const auto & [file_name, file_status] : metadata->getFileStatuses()) { size_t i = 0; res_columns[i++]->insert(zookeeper_path); res_columns[i++]->insert(file_name); - std::lock_guard lock(file_status->metadata_lock); - res_columns[i++]->insert(file_status->processed_rows.load()); - res_columns[i++]->insert(magic_enum::enum_name(file_status->state)); + res_columns[i++]->insert(magic_enum::enum_name(file_status->state.load())); if (file_status->processing_start_time) - res_columns[i++]->insert(file_status->processing_start_time); + res_columns[i++]->insert(file_status->processing_start_time.load()); else res_columns[i++]->insertDefault(); if (file_status->processing_end_time) - res_columns[i++]->insert(file_status->processing_end_time); + res_columns[i++]->insert(file_status->processing_end_time.load()); else res_columns[i++]->insertDefault(); ProfileEvents::dumpToMapColumn(file_status->profile_counters.getPartiallyAtomicSnapshot(), res_columns[i++].get(), true); - res_columns[i++]->insert(file_status->last_exception); + res_columns[i++]->insert(file_status->getException()); } } } diff --git a/src/Storages/System/StorageSystemTables.cpp b/src/Storages/System/StorageSystemTables.cpp index 1f900ec623e..783b899c978 100644 --- a/src/Storages/System/StorageSystemTables.cpp +++ b/src/Storages/System/StorageSystemTables.cpp @@ -146,7 +146,7 @@ ColumnPtr getFilteredTables(const ActionsDAG::Node * predicate, const ColumnPtr filter_by_engine = true; if (filter_by_engine) - engine_column= ColumnString::create(); + engine_column = ColumnString::create(); } for (size_t database_idx = 0; database_idx < filtered_databases_column->size(); ++database_idx) diff --git a/src/Storages/WindowView/StorageWindowView.cpp b/src/Storages/WindowView/StorageWindowView.cpp index a9ec1f6c694..8bca1c97aad 100644 --- a/src/Storages/WindowView/StorageWindowView.cpp +++ b/src/Storages/WindowView/StorageWindowView.cpp @@ -297,7 +297,6 @@ namespace CASE_WINDOW_KIND(Year) #undef CASE_WINDOW_KIND } - UNREACHABLE(); } class AddingAggregatedChunkInfoTransform : public ISimpleTransform @@ -920,7 +919,6 @@ UInt32 StorageWindowView::getWindowLowerBound(UInt32 time_sec) CASE_WINDOW_KIND(Year) #undef CASE_WINDOW_KIND } - UNREACHABLE(); } UInt32 StorageWindowView::getWindowUpperBound(UInt32 time_sec) @@ -948,7 +946,6 @@ UInt32 StorageWindowView::getWindowUpperBound(UInt32 time_sec) CASE_WINDOW_KIND(Year) #undef CASE_WINDOW_KIND } - UNREACHABLE(); } void StorageWindowView::addFireSignal(std::set & signals) diff --git a/src/Storages/buildQueryTreeForShard.cpp b/src/Storages/buildQueryTreeForShard.cpp index 4f655f9b5e8..131712e750a 100644 --- a/src/Storages/buildQueryTreeForShard.cpp +++ b/src/Storages/buildQueryTreeForShard.cpp @@ -320,6 +320,8 @@ QueryTreeNodePtr buildQueryTreeForShard(const PlannerContextPtr & planner_contex auto replacement_map = visitor.getReplacementMap(); const auto & global_in_or_join_nodes = visitor.getGlobalInOrJoinNodes(); + QueryTreeNodePtrWithHashMap global_in_temporary_tables; + for (const auto & global_in_or_join_node : global_in_or_join_nodes) { if (auto * join_node = global_in_or_join_node.query_node->as()) @@ -364,15 +366,19 @@ QueryTreeNodePtr 
buildQueryTreeForShard(const PlannerContextPtr & planner_contex if (in_function_node_type != QueryTreeNodeType::QUERY && in_function_node_type != QueryTreeNodeType::UNION && in_function_node_type != QueryTreeNodeType::TABLE) continue; - auto subquery_to_execute = in_function_subquery_node; - if (subquery_to_execute->as()) - subquery_to_execute = buildSubqueryToReadColumnsFromTableExpression(subquery_to_execute, planner_context->getQueryContext()); + auto & temporary_table_expression_node = global_in_temporary_tables[in_function_subquery_node]; + if (!temporary_table_expression_node) + { + auto subquery_to_execute = in_function_subquery_node; + if (subquery_to_execute->as()) + subquery_to_execute = buildSubqueryToReadColumnsFromTableExpression(subquery_to_execute, planner_context->getQueryContext()); - auto temporary_table_expression_node = executeSubqueryNode(subquery_to_execute, - planner_context->getMutableQueryContext(), - global_in_or_join_node.subquery_depth); + temporary_table_expression_node = executeSubqueryNode(subquery_to_execute, + planner_context->getMutableQueryContext(), + global_in_or_join_node.subquery_depth); + } - in_function_subquery_node = std::move(temporary_table_expression_node); + replacement_map.emplace(in_function_subquery_node.get(), temporary_table_expression_node); } else { diff --git a/src/Storages/registerStorages.cpp b/src/Storages/registerStorages.cpp index 0fb00c08acc..47542b7b47e 100644 --- a/src/Storages/registerStorages.cpp +++ b/src/Storages/registerStorages.cpp @@ -25,6 +25,7 @@ void registerStorageLiveView(StorageFactory & factory); void registerStorageGenerateRandom(StorageFactory & factory); void registerStorageExecutable(StorageFactory & factory); void registerStorageWindowView(StorageFactory & factory); +void registerStorageLoop(StorageFactory & factory); #if USE_RAPIDJSON || USE_SIMDJSON void registerStorageFuzzJSON(StorageFactory & factory); #endif @@ -120,6 +121,7 @@ void registerStorages() registerStorageGenerateRandom(factory); registerStorageExecutable(factory); registerStorageWindowView(factory); + registerStorageLoop(factory); #if USE_RAPIDJSON || USE_SIMDJSON registerStorageFuzzJSON(factory); #endif diff --git a/src/TableFunctions/TableFunctionLoop.cpp b/src/TableFunctions/TableFunctionLoop.cpp new file mode 100644 index 00000000000..43f122f6cb3 --- /dev/null +++ b/src/TableFunctions/TableFunctionLoop.cpp @@ -0,0 +1,155 @@ +#include "config.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registerTableFunctions.h" + +namespace DB +{ + namespace ErrorCodes + { + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int UNKNOWN_TABLE; + } + namespace + { + class TableFunctionLoop : public ITableFunction + { + public: + static constexpr auto name = "loop"; + std::string getName() const override { return name; } + private: + StoragePtr executeImpl(const ASTPtr & ast_function, ContextPtr context, const String & table_name, ColumnsDescription cached_columns, bool is_insert_query) const override; + const char * getStorageTypeName() const override { return "Loop"; } + ColumnsDescription getActualTableStructure(ContextPtr context, bool is_insert_query) const override; + void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; + + // save the inner table function AST + ASTPtr inner_table_function_ast; + // save database and table + std::string loop_database_name; + std::string loop_table_name; + }; + + } + 
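
The buildQueryTreeForShard hunk above starts caching the temporary table built for a GLOBAL IN subquery in global_in_temporary_tables, so a subquery that occurs several times in one query is executed only once and every occurrence is redirected through replacement_map. A rough Python sketch of the same memoization pattern, assuming subquery execution is an expensive operation keyed by a hash of its query tree node (the names here are illustrative, not the planner's API):

```python
# Sketch of the memoization pattern used for GLOBAL IN subqueries:
# execute each distinct subquery once, reuse the resulting temporary table.
from typing import Callable, Dict, Hashable, List, Sequence


def build_replacements(
    subqueries: Sequence[str],
    key_of: Callable[[str], Hashable],
    execute: Callable[[str], str],
) -> List[str]:
    """Map every subquery occurrence to a temporary table, executing each distinct one once."""
    temporary_tables: Dict[Hashable, str] = {}
    replacements: List[str] = []
    for node in subqueries:
        key = key_of(node)
        if key not in temporary_tables:           # only the first occurrence is executed
            temporary_tables[key] = execute(node)
        replacements.append(temporary_tables[key])
    return replacements


if __name__ == "__main__":
    calls: List[str] = []
    same_subquery = "SELECT id FROM users WHERE active"
    result = build_replacements(
        [same_subquery, same_subquery],           # the same GLOBAL IN subquery used twice
        key_of=lambda q: q,                       # stand-in for the query tree node hash
        execute=lambda q: (calls.append(q) or f"_tmp_table_{len(calls)}"),
    )
    print(result)                                 # ['_tmp_table_1', '_tmp_table_1']
    assert len(calls) == 1                        # executed only once
```
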
+ void TableFunctionLoop::parseArguments(const ASTPtr & ast_function, ContextPtr context) + { + const auto & args_func = ast_function->as(); + + if (!args_func.arguments) + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Table function 'loop' must have arguments."); + + auto & args = args_func.arguments->children; + if (args.empty()) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "No arguments provided for table function 'loop'"); + + if (args.size() == 1) + { + if (const auto * id = args[0]->as()) + { + String id_name = id->name(); + + size_t dot_pos = id_name.find('.'); + if (id_name.find('.', dot_pos + 1) != String::npos) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "There are more than one dot"); + if (dot_pos != String::npos) + { + loop_database_name = id_name.substr(0, dot_pos); + loop_table_name = id_name.substr(dot_pos + 1); + } + else + { + loop_table_name = id_name; + } + } + else if (const auto * func = args[0]->as()) + { + inner_table_function_ast = args[0]; + } + else + { + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Expected identifier or function for argument 1 of function 'loop', got {}", args[0]->getID()); + } + } + // loop(database, table) + else if (args.size() == 2) + { + args[0] = evaluateConstantExpressionForDatabaseName(args[0], context); + args[1] = evaluateConstantExpressionOrIdentifierAsLiteral(args[1], context); + + loop_database_name = checkAndGetLiteralArgument(args[0], "database"); + loop_table_name = checkAndGetLiteralArgument(args[1], "table"); + } + else + { + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Table function 'loop' must have 1 or 2 arguments."); + } + } + + ColumnsDescription TableFunctionLoop::getActualTableStructure(ContextPtr /*context*/, bool /*is_insert_query*/) const + { + return ColumnsDescription(); + } + + StoragePtr TableFunctionLoop::executeImpl( + const ASTPtr & /*ast_function*/, + ContextPtr context, + const std::string & table_name, + ColumnsDescription cached_columns, + bool is_insert_query) const + { + StoragePtr storage; + if (!inner_table_function_ast) + { + String database_name = loop_database_name; + if (database_name.empty()) + database_name = context->getCurrentDatabase(); + + auto database = DatabaseCatalog::instance().getDatabase(database_name); + storage = database->tryGetTable(loop_table_name, context); + if (!storage) + throw Exception(ErrorCodes::UNKNOWN_TABLE, "Table '{}' not found in database '{}'", loop_table_name, database_name); + } + else + { + auto inner_table_function = TableFunctionFactory::instance().get(inner_table_function_ast, context); + storage = inner_table_function->execute( + inner_table_function_ast, + context, + table_name, + std::move(cached_columns), + is_insert_query); + } + auto res = std::make_shared( + StorageID(getDatabaseName(), table_name), + storage + ); + res->startup(); + return res; + } + + void registerTableFunctionLoop(TableFunctionFactory & factory) + { + factory.registerFunction( + {.documentation + = {.description=R"(The table function can be used to continuously output query results in an infinite loop.)", + .examples{{"loop", "SELECT * FROM loop((numbers(3)) LIMIT 7", "0" + "1" + "2" + "0" + "1" + "2" + "0"}} + }}); + } + +} diff --git a/src/TableFunctions/registerTableFunctions.cpp b/src/TableFunctions/registerTableFunctions.cpp index 26b9a771416..ca4913898f9 100644 --- a/src/TableFunctions/registerTableFunctions.cpp +++ b/src/TableFunctions/registerTableFunctions.cpp @@ -11,6 +11,7 @@ void registerTableFunctions() 
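
The new TableFunctionLoop above registers a `loop` table function that re-reads its inner table or table function endlessly, so callers are expected to bound it with LIMIT. A hedged usage sketch follows; it assumes a running server built with this patch and the third-party `clickhouse-driver` Python package (neither is part of the diff), and the query mirrors the documentation example registered above with its parentheses balanced:

```python
# Hypothetical usage sketch for the new `loop` table function.
# Assumes a local ClickHouse server with this patch and the third-party
# `clickhouse-driver` package; neither is part of the diff above.
from clickhouse_driver import Client


def read_looped_numbers(limit: int = 7):
    client = Client(host="localhost")
    # `loop` repeats the inner source forever, so the LIMIT is what stops the query.
    rows = client.execute(f"SELECT * FROM loop(numbers(3)) LIMIT {limit}")
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(read_looped_numbers())   # expected per the docs example: [0, 1, 2, 0, 1, 2, 0]
```
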
registerTableFunctionMerge(factory); registerTableFunctionRemote(factory); registerTableFunctionNumbers(factory); + registerTableFunctionLoop(factory); registerTableFunctionGenerateSeries(factory); registerTableFunctionNull(factory); registerTableFunctionZeros(factory); diff --git a/src/TableFunctions/registerTableFunctions.h b/src/TableFunctions/registerTableFunctions.h index 4a89b3afbb3..efde4d6dcdc 100644 --- a/src/TableFunctions/registerTableFunctions.h +++ b/src/TableFunctions/registerTableFunctions.h @@ -8,6 +8,7 @@ class TableFunctionFactory; void registerTableFunctionMerge(TableFunctionFactory & factory); void registerTableFunctionRemote(TableFunctionFactory & factory); void registerTableFunctionNumbers(TableFunctionFactory & factory); +void registerTableFunctionLoop(TableFunctionFactory & factory); void registerTableFunctionGenerateSeries(TableFunctionFactory & factory); void registerTableFunctionNull(TableFunctionFactory & factory); void registerTableFunctionZeros(TableFunctionFactory & factory); diff --git a/tests/ci/bugfix_validate_check.py b/tests/ci/bugfix_validate_check.py index 7aaf18e7765..d41fdaf05ff 100644 --- a/tests/ci/bugfix_validate_check.py +++ b/tests/ci/bugfix_validate_check.py @@ -109,12 +109,12 @@ def main(): test_script = jobs_scripts[test_job] if report_file.exists(): report_file.unlink() - extra_timeout_option = "" - if test_job == JobNames.STATELESS_TEST_RELEASE: - extra_timeout_option = str(3600) # "bugfix" must be present in checkname, as integration test runner checks this check_name = f"Validate bugfix: {test_job}" - command = f"python3 {test_script} '{check_name}' {extra_timeout_option} --validate-bugfix --report-to-file {report_file}" + command = ( + f"python3 {test_script} '{check_name}' " + f"--validate-bugfix --report-to-file {report_file}" + ) print(f"Going to validate job [{test_job}], command [{command}]") _ = subprocess.run( command, diff --git a/tests/ci/build_report_check.py b/tests/ci/build_report_check.py index cc8e226e495..1d734fbb3f8 100644 --- a/tests/ci/build_report_check.py +++ b/tests/ci/build_report_check.py @@ -1,5 +1,5 @@ #!/usr/bin/env python3 - +import json import logging import os import sys @@ -13,6 +13,8 @@ from env_helper import ( GITHUB_SERVER_URL, REPORT_PATH, TEMP_PATH, + CI_CONFIG_PATH, + CI, ) from pr_info import PRInfo from report import ( @@ -53,6 +55,18 @@ def main(): release=pr_info.is_release, backport=pr_info.head_ref.startswith("backport/"), ) + if CI: + # In CI only specific builds might be manually selected, or some wf does not build all builds. 
+ # Filtering @builds_for_check to verify only builds that are present in the current CI workflow + with open(CI_CONFIG_PATH, encoding="utf-8") as jfd: + ci_config = json.load(jfd) + all_ci_jobs = ( + ci_config["jobs_data"]["jobs_to_skip"] + + ci_config["jobs_data"]["jobs_to_do"] + ) + builds_for_check = [job for job in builds_for_check if job in all_ci_jobs] + print(f"NOTE: following build reports will be accounted: [{builds_for_check}]") + required_builds = len(builds_for_check) missing_builds = 0 diff --git a/tests/ci/cache_utils.py b/tests/ci/cache_utils.py index a0692f4eff2..5a295fc66ca 100644 --- a/tests/ci/cache_utils.py +++ b/tests/ci/cache_utils.py @@ -197,7 +197,6 @@ class CargoCache(Cache): logging.info("Cache for Cargo.lock md5 %s will be uploaded", self.lock_hash) self._force_upload_cache = True self.directory.mkdir(parents=True, exist_ok=True) - return def upload(self): self._upload(f"{self.PREFIX}/{self.archive_name}", self._force_upload_cache) diff --git a/tests/ci/cherry_pick.py b/tests/ci/cherry_pick.py index 91a03e55d87..e470621e2c5 100644 --- a/tests/ci/cherry_pick.py +++ b/tests/ci/cherry_pick.py @@ -91,7 +91,7 @@ close it. name: str, pr: PullRequest, repo: Repository, - backport_created_label: str = Labels.PR_BACKPORTS_CREATED, + backport_created_label: str, ): self.name = name self.pr = pr @@ -115,11 +115,12 @@ close it. if branch_updated: self._backported = True - def pop_prs(self, prs: PullRequests) -> None: + def pop_prs(self, prs: PullRequests) -> PullRequests: """the method processes all prs and pops the ReleaseBranch related prs""" to_pop = [] # type: List[int] for i, pr in enumerate(prs): if self.name not in pr.head.ref: + # this pr is not for the current branch continue if pr.head.ref.startswith(f"cherrypick/{self.name}"): self.cherrypick_pr = pr @@ -128,19 +129,22 @@ close it. self.backport_pr = pr to_pop.append(i) else: - logging.error( - "head ref of PR #%s isn't starting with known suffix", - pr.number, - ) + assert False, f"BUG! Invalid PR's branch [{pr.head.ref}]" + + # Cherry-pick or backport PR found, set @backported flag for current release branch + self._backported = True + for i in reversed(to_pop): # Going from the tail to keep the order and pop greater index first prs.pop(i) + return prs def process( # pylint: disable=too-many-return-statements self, dry_run: bool ) -> None: if self.backported: return + if not self.cherrypick_pr: if dry_run: logging.info( @@ -148,41 +152,39 @@ close it. ) return self.create_cherrypick() - if self.backported: - return - if self.cherrypick_pr is not None: - # Try to merge cherrypick instantly - if self.cherrypick_pr.mergeable and self.cherrypick_pr.state != "closed": - if dry_run: - logging.info( - "DRY RUN: Would merge cherry-pick PR for #%s", self.pr.number - ) - return - self.cherrypick_pr.merge() - # The PR needs update, since PR.merge doesn't update the object - self.cherrypick_pr.update() - if self.cherrypick_pr.merged: - if dry_run: - logging.info( - "DRY RUN: Would create backport PR for #%s", self.pr.number - ) - return - self.create_backport() - return - if self.cherrypick_pr.state == "closed": + assert self.cherrypick_pr, "BUG!" 
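
The build_report_check.py hunk above (before the cache_utils and cherry_pick changes) now loads CI_CONFIG_PATH and keeps only the build jobs that exist in the current workflow. A small sketch of that filtering against an illustrative config; the JSON shape (jobs_data.jobs_to_do / jobs_to_skip) follows the code above, while the concrete job names and EXAMPLE_CI_CONFIG value are made up for the example:

```python
# Sketch of the build-report filtering added to build_report_check.py.
# The JSON layout mirrors the code above; the job names are illustrative only.
import json

EXAMPLE_CI_CONFIG = json.dumps(
    {
        "jobs_data": {
            "jobs_to_do": ["package_release", "Style check"],
            "jobs_to_skip": ["package_asan"],
        }
    }
)


def filter_builds(builds_for_check, ci_config_json):
    ci_config = json.loads(ci_config_json)
    all_ci_jobs = (
        ci_config["jobs_data"]["jobs_to_skip"] + ci_config["jobs_data"]["jobs_to_do"]
    )
    # Keep only the builds this particular workflow actually runs or skips.
    return [job for job in builds_for_check if job in all_ci_jobs]


if __name__ == "__main__":
    print(filter_builds(["package_release", "package_asan", "package_msan"], EXAMPLE_CI_CONFIG))
    # -> ['package_release', 'package_asan']; package_msan is not in this workflow
```
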
+ + if self.cherrypick_pr.mergeable and self.cherrypick_pr.state != "closed": + if dry_run: logging.info( - "The cherrypick PR #%s for PR #%s is discarded", - self.cherrypick_pr.number, - self.pr.number, + "DRY RUN: Would merge cherry-pick PR for #%s", self.pr.number ) - self._backported = True return + self.cherrypick_pr.merge() + # The PR needs update, since PR.merge doesn't update the object + self.cherrypick_pr.update() + if self.cherrypick_pr.merged: + if dry_run: + logging.info( + "DRY RUN: Would create backport PR for #%s", self.pr.number + ) + return + self.create_backport() + return + if self.cherrypick_pr.state == "closed": logging.info( - "Cherrypick PR #%s for PR #%s have conflicts and unable to be merged", + "The cherry-pick PR #%s for PR #%s is discarded", self.cherrypick_pr.number, self.pr.number, ) - self.ping_cherry_pick_assignees(dry_run) + self._backported = True + return + logging.info( + "Cherry-pick PR #%s for PR #%s has conflicts and unable to be merged", + self.cherrypick_pr.number, + self.pr.number, + ) + self.ping_cherry_pick_assignees(dry_run) def create_cherrypick(self): # First, create backport branch: @@ -216,7 +218,6 @@ close it. self.name, self.pr.number, ) - self._backported = True return except CalledProcessError: # There are most probably conflicts, they'll be resolved in PR @@ -225,7 +226,7 @@ close it. # There are changes to apply, so continue git_runner(f"{GIT_PREFIX} reset --merge") - # Push, create the cherrypick PR, lable and assign it + # Push, create the cherry-pick PR, label and assign it for branch in [self.cherrypick_branch, self.backport_branch]: git_runner(f"{GIT_PREFIX} push -f {self.REMOTE} {branch}:{branch}") @@ -246,6 +247,7 @@ close it. self.cherrypick_pr.add_to_labels(Labels.PR_CRITICAL_BUGFIX) elif Labels.PR_BUGFIX in [label.name for label in self.pr.labels]: self.cherrypick_pr.add_to_labels(Labels.PR_BUGFIX) + self._backported = True self._assign_new_pr(self.cherrypick_pr) # update cherrypick PR to get the state for PR.mergable self.cherrypick_pr.update() @@ -338,7 +340,7 @@ close it. 
@property def backported(self) -> bool: - return self._backported or self.backport_pr is not None + return self._backported def __repr__(self): return self.name @@ -351,16 +353,22 @@ class Backport: repo: str, fetch_from: Optional[str], dry_run: bool, - must_create_backport_labels: List[str], - backport_created_label: str, ): self.gh = gh self._repo_name = repo self._fetch_from = fetch_from self.dry_run = dry_run - self.must_create_backport_labels = must_create_backport_labels - self.backport_created_label = backport_created_label + self.must_create_backport_label = ( + Labels.MUST_BACKPORT + if self._repo_name == self._fetch_from + else Labels.MUST_BACKPORT_CLOUD + ) + self.backport_created_label = ( + Labels.PR_BACKPORTS_CREATED + if self._repo_name == self._fetch_from + else Labels.PR_BACKPORTS_CREATED_CLOUD + ) self._remote = "" self._remote_line = "" @@ -460,7 +468,7 @@ class Backport: query_args = { "query": f"type:pr repo:{self._fetch_from} -label:{self.backport_created_label}", "label": ",".join( - self.labels_to_backport + self.must_create_backport_labels + self.labels_to_backport + [self.must_create_backport_label] ), "merged": [since_date, tomorrow], } @@ -477,23 +485,19 @@ class Backport: self.process_pr(pr) except Exception as e: logging.error( - "During processing the PR #%s error occured: %s", pr.number, e + "During processing the PR #%s error occurred: %s", pr.number, e ) self.error = e def process_pr(self, pr: PullRequest) -> None: pr_labels = [label.name for label in pr.labels] - for label in self.must_create_backport_labels: - # We backport any vXXX-must-backport to all branches of the fetch repo (better than no backport) - if label in pr_labels or self._fetch_from: - branches = [ - ReleaseBranch(br, pr, self.repo, self.backport_created_label) - for br in self.release_branches - ] # type: List[ReleaseBranch] - break - - if not branches: + if self.must_create_backport_label in pr_labels: + branches = [ + ReleaseBranch(br, pr, self.repo, self.backport_created_label) + for br in self.release_branches + ] # type: List[ReleaseBranch] + else: branches = [ ReleaseBranch(br, pr, self.repo, self.backport_created_label) for br in [ @@ -502,20 +506,14 @@ class Backport: if label in self.labels_to_backport ] ] - if not branches: - # This is definitely some error. There must be at least one branch - # It also make the whole program exit code non-zero - self.error = Exception( - f"There are no branches to backport PR #{pr.number}, logical error" - ) - raise self.error + assert branches, "BUG!" logging.info( " PR #%s is supposed to be backported to %s", pr.number, ", ".join(map(str, branches)), ) - # All PRs for cherrypick and backport branches as heads + # All PRs for cherry-pick and backport branches as heads query_suffix = " ".join( [ f"head:{branch.backport_branch} head:{branch.cherrypick_branch}" @@ -527,29 +525,15 @@ class Backport: label=f"{Labels.PR_BACKPORT},{Labels.PR_CHERRYPICK}", ) for br in branches: - br.pop_prs(bp_cp_prs) - - if bp_cp_prs: - # This is definitely some error. All prs must be consumed by - # branches with ReleaseBranch.pop_prs. It also makes the whole - # program exit code non-zero - self.error = Exception( - "The following PRs are not filtered by release branches:\n" - "\n".join(map(str, bp_cp_prs)) - ) - raise self.error - - if all(br.backported for br in branches): - # Let's check if the PR is already backported - self.mark_pr_backported(pr) - return + bp_cp_prs = br.pop_prs(bp_cp_prs) + assert not bp_cp_prs, "BUG!" 
for br in branches: br.process(self.dry_run) - if all(br.backported for br in branches): - # And check it after the running - self.mark_pr_backported(pr) + for br in branches: + assert br.backported, f"BUG! backport to branch [{br}] failed" + self.mark_pr_backported(pr) def mark_pr_backported(self, pr: PullRequest) -> None: if self.dry_run: @@ -586,19 +570,6 @@ def parse_args(): ) parser.add_argument("--dry-run", action="store_true", help="do not create anything") - parser.add_argument( - "--must-create-backport-label", - default=Labels.MUST_BACKPORT, - choices=(Labels.MUST_BACKPORT, Labels.MUST_BACKPORT_CLOUD), - help="label to filter PRs to backport", - nargs="+", - ) - parser.add_argument( - "--backport-created-label", - default=Labels.PR_BACKPORTS_CREATED, - choices=(Labels.PR_BACKPORTS_CREATED, Labels.PR_BACKPORTS_CREATED_CLOUD), - help="label to mark PRs as backported", - ) parser.add_argument( "--reserve-search-days", default=0, @@ -663,12 +634,6 @@ def main(): args.repo, args.from_repo, args.dry_run, - ( - args.must_create_backport_label - if isinstance(args.must_create_backport_label, list) - else [args.must_create_backport_label] - ), - args.backport_created_label, ) # https://github.com/python/mypy/issues/3004 bp.gh.cache_path = temp_path / "gh_cache" diff --git a/tests/ci/ci.py b/tests/ci/ci.py index c4e06ccd79a..55a18a2f335 100644 --- a/tests/ci/ci.py +++ b/tests/ci/ci.py @@ -3,22 +3,26 @@ import concurrent.futures import json import logging import os -import random import re import subprocess import sys -import time -from copy import deepcopy -from dataclasses import asdict, dataclass -from enum import Enum +from dataclasses import dataclass from pathlib import Path -from typing import Any, Dict, List, Optional, Sequence, Set, Tuple, Union +from typing import Any, Dict, List, Optional import docker_images_helper import upload_result_helper from build_check import get_release_or_pr -from ci_config import CI_CONFIG, Build, CILabels, CIStages, JobNames, StatusNames -from ci_utils import GHActions, is_hex, normalize_string +from ci_config import ( + CI_CONFIG, + Build, + CILabels, + CIStages, + JobNames, + StatusNames, +) +from ci_metadata import CiMetadata +from ci_utils import GHActions, normalize_string from clickhouse_helper import ( CiLogsCredentials, ClickHouseHelper, @@ -35,947 +39,31 @@ from commit_status_helper import ( post_commit_status, set_status_comment, ) -from digest_helper import DockerDigester, JobDigester +from digest_helper import DockerDigester from env_helper import ( CI, GITHUB_JOB_API_URL, - GITHUB_RUN_URL, - REPO_COPY, - REPORT_PATH, - S3_BUILDS_BUCKET, - TEMP_PATH, - GITHUB_RUN_ID, GITHUB_REPOSITORY, + GITHUB_RUN_ID, + REPO_COPY, + TEMP_PATH, ) from get_robot_token import get_best_robot_token from git_helper import GIT_PREFIX, Git from git_helper import Runner as GitRunner from github_helper import GitHub from pr_info import PRInfo -from report import ERROR, SUCCESS, BuildResult, JobReport, PENDING +from report import ERROR, FAILURE, PENDING, SUCCESS, BuildResult, JobReport, TestResult from s3_helper import S3Helper -from ci_metadata import CiMetadata +from stopwatch import Stopwatch +from tee_popen import TeePopen +from ci_cache import CiCache +from ci_settings import CiSettings from version_helper import get_version_from_repo # pylint: disable=too-many-lines -@dataclass -class PendingState: - updated_at: float - run_url: str - - -class CiCache: - """ - CI cache is a bunch of records. Record is a file stored under special location on s3. 
- The file name has a format: - - _[]--___.ci - - RECORD_TYPE: - SUCCESSFUL - for successful jobs - PENDING - for pending jobs - - ATTRIBUTES: - release - for jobs being executed on the release branch including master branch (not a PR branch) - """ - - _S3_CACHE_PREFIX = "CI_cache_v1" - _CACHE_BUILD_REPORT_PREFIX = "build_report" - _RECORD_FILE_EXTENSION = ".ci" - _LOCAL_CACHE_PATH = Path(TEMP_PATH) / "ci_cache" - _ATTRIBUTE_RELEASE = "release" - # divider symbol 1 - _DIV1 = "--" - # divider symbol 2 - _DIV2 = "_" - assert _DIV1 != _DIV2 - - class RecordType(Enum): - SUCCESSFUL = "successful" - PENDING = "pending" - FAILED = "failed" - - @dataclass - class Record: - record_type: "CiCache.RecordType" - job_name: str - job_digest: str - batch: int - num_batches: int - release_branch: bool - file: str = "" - - def to_str_key(self): - """other fields must not be included in the hash str""" - return "_".join( - [self.job_name, self.job_digest, str(self.batch), str(self.num_batches)] - ) - - class JobType(Enum): - DOCS = "DOCS" - SRCS = "SRCS" - - @classmethod - def is_docs_job(cls, job_name: str) -> bool: - return job_name == JobNames.DOCS_CHECK - - @classmethod - def is_srcs_job(cls, job_name: str) -> bool: - return not cls.is_docs_job(job_name) - - @classmethod - def get_type_by_name(cls, job_name: str) -> "CiCache.JobType": - res = cls.SRCS - if cls.is_docs_job(job_name): - res = cls.DOCS - elif cls.is_srcs_job(job_name): - res = cls.SRCS - else: - assert False - return res - - def __init__( - self, - s3: S3Helper, - job_digests: Dict[str, str], - ): - self.s3 = s3 - self.job_digests = job_digests - self.cache_s3_paths = { - job_type: f"{self._S3_CACHE_PREFIX}/{job_type.value}-{self._get_digest_for_job_type(self.job_digests, job_type)}/" - for job_type in self.JobType - } - self.s3_record_prefixes = { - record_type: record_type.value for record_type in self.RecordType - } - self.records: Dict["CiCache.RecordType", Dict[str, "CiCache.Record"]] = { - record_type: {} for record_type in self.RecordType - } - - self.cache_updated = False - self.cache_data_fetched = True - if not self._LOCAL_CACHE_PATH.exists(): - self._LOCAL_CACHE_PATH.mkdir(parents=True, exist_ok=True) - - def _get_digest_for_job_type( - self, job_digests: Dict[str, str], job_type: JobType - ) -> str: - if job_type == self.JobType.DOCS: - res = job_digests[JobNames.DOCS_CHECK] - elif job_type == self.JobType.SRCS: - # any build type job has the same digest - pick up Build.PACKAGE_RELEASE or Build.PACKAGE_ASAN as a failover - # Build.PACKAGE_RELEASE may not exist in the list if we have reduced CI pipeline - if Build.PACKAGE_RELEASE in job_digests: - res = job_digests[Build.PACKAGE_RELEASE] - elif Build.PACKAGE_ASAN in job_digests: - # failover, if failover does not work - fix it! - res = job_digests[Build.PACKAGE_ASAN] - else: - assert False, "BUG, no build job in digest' list" - else: - assert False, "BUG, New JobType? 
- please update func" - return res - - def _get_record_file_name( - self, - record_type: RecordType, - job_name: str, - batch: int, - num_batches: int, - release_branch: bool, - ) -> str: - prefix = self.s3_record_prefixes[record_type] - prefix_extended = ( - self._DIV2.join([prefix, self._ATTRIBUTE_RELEASE]) - if release_branch - else prefix - ) - assert self._DIV1 not in job_name, f"Invalid job name {job_name}" - job_name = self._DIV2.join( - [job_name, self.job_digests[job_name], str(batch), str(num_batches)] - ) - file_name = self._DIV1.join([prefix_extended, job_name]) - file_name += self._RECORD_FILE_EXTENSION - return file_name - - def _get_record_s3_path(self, job_name: str) -> str: - return self.cache_s3_paths[self.JobType.get_type_by_name(job_name)] - - def _parse_record_file_name( - self, record_type: RecordType, file_name: str - ) -> Optional["CiCache.Record"]: - # validate filename - if ( - not file_name.endswith(self._RECORD_FILE_EXTENSION) - or not len(file_name.split(self._DIV1)) == 2 - ): - print("ERROR: wrong file name format") - return None - - file_name = file_name.removesuffix(self._RECORD_FILE_EXTENSION) - release_branch = False - - prefix_extended, job_suffix = file_name.split(self._DIV1) - record_type_and_attribute = prefix_extended.split(self._DIV2) - - # validate filename prefix - failure = False - if not 0 < len(record_type_and_attribute) <= 2: - print("ERROR: wrong file name prefix") - failure = True - if ( - len(record_type_and_attribute) > 1 - and record_type_and_attribute[1] != self._ATTRIBUTE_RELEASE - ): - print("ERROR: wrong record attribute") - failure = True - if record_type_and_attribute[0] != self.s3_record_prefixes[record_type]: - print("ERROR: wrong record type") - failure = True - if failure: - return None - - if ( - len(record_type_and_attribute) > 1 - and record_type_and_attribute[1] == self._ATTRIBUTE_RELEASE - ): - release_branch = True - - job_properties = job_suffix.split(self._DIV2) - job_name, job_digest, batch, num_batches = ( - self._DIV2.join(job_properties[:-3]), - job_properties[-3], - int(job_properties[-2]), - int(job_properties[-1]), - ) - - if not is_hex(job_digest): - print("ERROR: wrong record job digest") - return None - - record = self.Record( - record_type, - job_name, - job_digest, - batch, - num_batches, - release_branch, - file="", - ) - return record - - def print_status(self): - for record_type in self.RecordType: - GHActions.print_in_group( - f"Cache records: [{record_type}]", list(self.records[record_type]) - ) - return self - - def update(self): - """ - Pulls cache records from s3. Only records name w/o content. 
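
The _get_record_file_name and _parse_record_file_name helpers shown above (this CiCache block is being removed from ci.py in favour of the imported ci_cache module) encode a cache record as `<record_type>[_release]--<job_name>_<digest>_<batch>_<num_batches>.ci`; the placeholders in the class docstring were lost to formatting in this copy. A rough standalone sketch of the same naming scheme, simplified and not the actual ci_cache.py code:

```python
# Rough sketch of the CI cache record file-name scheme used by the helpers above:
# prefix [+ "_release"] + "--" + job + "_" + digest + "_" + batch + "_" + num_batches + ".ci"
# Simplified; not the real ci_cache.py implementation.
DIV1, DIV2, EXT = "--", "_", ".ci"


def make_record_name(record_type, job, digest, batch, num_batches, release_branch):
    prefix = record_type + (DIV2 + "release" if release_branch else "")
    job_part = DIV2.join([job, digest, str(batch), str(num_batches)])
    return DIV1.join([prefix, job_part]) + EXT


def parse_record_name(file_name):
    assert file_name.endswith(EXT)
    prefix, job_part = file_name[: -len(EXT)].split(DIV1)
    release_branch = prefix.endswith(DIV2 + "release")
    parts = job_part.split(DIV2)
    # The job name itself may contain "_", so peel off the last three fields first.
    job = DIV2.join(parts[:-3])
    digest, batch, num_batches = parts[-3], int(parts[-2]), int(parts[-1])
    return job, digest, batch, num_batches, release_branch


if __name__ == "__main__":
    name = make_record_name("successful", "Stateless tests (asan)", "abc123", 0, 3, True)
    print(name)                      # successful_release--Stateless tests (asan)_abc123_0_3.ci
    print(parse_record_name(name))   # round-trips back to the original fields
```
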
- """ - for record_type in self.RecordType: - prefix = self.s3_record_prefixes[record_type] - cache_list = self.records[record_type] - for job_type in self.JobType: - path = self.cache_s3_paths[job_type] - records = self.s3.list_prefix(f"{path}{prefix}", S3_BUILDS_BUCKET) - records = [record.split("/")[-1] for record in records] - for file in records: - record = self._parse_record_file_name( - record_type=record_type, file_name=file - ) - if not record: - print(f"ERROR: failed to parse cache record [{file}]") - continue - if ( - record.job_name not in self.job_digests - or self.job_digests[record.job_name] != record.job_digest - ): - # skip records we are not interested in - continue - - if record.to_str_key() not in cache_list: - cache_list[record.to_str_key()] = record - self.cache_data_fetched = False - elif ( - not cache_list[record.to_str_key()].release_branch - and record.release_branch - ): - # replace a non-release record with a release one - cache_list[record.to_str_key()] = record - self.cache_data_fetched = False - - self.cache_updated = True - return self - - def fetch_records_data(self): - """ - Pulls CommitStatusData for all cached jobs from s3 - """ - if not self.cache_updated: - self.update() - - if self.cache_data_fetched: - # there are no records without fetched data - no need to fetch - return self - - # clean up - for file in self._LOCAL_CACHE_PATH.glob("*.ci"): - file.unlink() - - # download all record files - for job_type in self.JobType: - path = self.cache_s3_paths[job_type] - for record_type in self.RecordType: - prefix = self.s3_record_prefixes[record_type] - _ = self.s3.download_files( - bucket=S3_BUILDS_BUCKET, - s3_path=f"{path}{prefix}", - file_suffix=self._RECORD_FILE_EXTENSION, - local_directory=self._LOCAL_CACHE_PATH, - ) - - # validate we have files for all records and save file names meanwhile - for record_type in self.RecordType: - record_list = self.records[record_type] - for _, record in record_list.items(): - record_file_name = self._get_record_file_name( - record_type, - record.job_name, - record.batch, - record.num_batches, - record.release_branch, - ) - assert ( - self._LOCAL_CACHE_PATH / record_file_name - ).is_file(), f"BUG. 
Record file must be present: {self._LOCAL_CACHE_PATH / record_file_name}" - record.file = record_file_name - - self.cache_data_fetched = True - return self - - def exist( - self, - record_type: "CiCache.RecordType", - job: str, - batch: int, - num_batches: int, - release_branch: bool, - ) -> bool: - if not self.cache_updated: - self.update() - record_key = self.Record( - record_type, - job, - self.job_digests[job], - batch, - num_batches, - release_branch, - ).to_str_key() - res = record_key in self.records[record_type] - if release_branch: - return res and self.records[record_type][record_key].release_branch - else: - return res - - def push( - self, - record_type: "CiCache.RecordType", - job: str, - batches: Union[int, Sequence[int]], - num_batches: int, - status: Union[CommitStatusData, PendingState], - release_branch: bool = False, - ) -> None: - """ - Pushes a cache record (CommitStatusData) - @release_branch adds "release" attribute to a record - """ - if isinstance(batches, int): - batches = [batches] - for batch in batches: - record_file = self._LOCAL_CACHE_PATH / self._get_record_file_name( - record_type, job, batch, num_batches, release_branch - ) - record_s3_path = self._get_record_s3_path(job) - if record_type == self.RecordType.SUCCESSFUL: - assert isinstance(status, CommitStatusData) - status.dump_to_file(record_file) - elif record_type == self.RecordType.FAILED: - assert isinstance(status, CommitStatusData) - status.dump_to_file(record_file) - elif record_type == self.RecordType.PENDING: - assert isinstance(status, PendingState) - with open(record_file, "w", encoding="utf-8") as json_file: - json.dump(asdict(status), json_file) - else: - assert False - - _ = self.s3.upload_file( - bucket=S3_BUILDS_BUCKET, - file_path=record_file, - s3_path=record_s3_path + record_file.name, - ) - record = self.Record( - record_type, - job, - self.job_digests[job], - batch, - num_batches, - release_branch, - file=record_file.name, - ) - if ( - record.release_branch - or record.to_str_key() not in self.records[record_type] - ): - self.records[record_type][record.to_str_key()] = record - - def get( - self, record_type: "CiCache.RecordType", job: str, batch: int, num_batches: int - ) -> Optional[Union[CommitStatusData, PendingState]]: - """ - Gets a cache record data for a job, or None if a cache miss - """ - - if not self.cache_data_fetched: - self.fetch_records_data() - - record_key = self.Record( - record_type, - job, - self.job_digests[job], - batch, - num_batches, - release_branch=False, - ).to_str_key() - - if record_key not in self.records[record_type]: - return None - - record_file_name = self.records[record_type][record_key].file - - res = CommitStatusData.load_from_file( - self._LOCAL_CACHE_PATH / record_file_name - ) # type: CommitStatusData - - return res - - def delete( - self, - record_type: "CiCache.RecordType", - job: str, - batch: int, - num_batches: int, - release_branch: bool, - ) -> None: - """ - deletes record from the cache - """ - raise NotImplementedError("Let's try make cache push-and-read-only") - # assert ( - # record_type == self.RecordType.PENDING - # ), "FIXME: delete is supported for pending records only" - # record_file_name = self._get_record_file_name( - # self.RecordType.PENDING, - # job, - # batch, - # num_batches, - # release_branch=release_branch, - # ) - # record_s3_path = self._get_record_s3_path(job) - # self.s3.delete_file_from_s3(S3_BUILDS_BUCKET, record_s3_path + record_file_name) - - # record_key = self.Record( - # record_type, - # job, - # 
self.job_digests[job], - # batch, - # num_batches, - # release_branch=False, - # ).to_str_key() - - # if record_key in self.records[record_type]: - # del self.records[record_type][record_key] - - def is_successful( - self, job: str, batch: int, num_batches: int, release_branch: bool - ) -> bool: - """ - checks if a given job have already been done successfully - """ - return self.exist( - self.RecordType.SUCCESSFUL, job, batch, num_batches, release_branch - ) - - def is_failed( - self, job: str, batch: int, num_batches: int, release_branch: bool - ) -> bool: - """ - checks if a given job have already been done with failure - """ - return self.exist( - self.RecordType.FAILED, job, batch, num_batches, release_branch - ) - - def is_pending( - self, job: str, batch: int, num_batches: int, release_branch: bool - ) -> bool: - """ - check pending record in the cache for a given job - @release_branch - checks that "release" attribute is set for a record - """ - if self.is_successful( - job, batch, num_batches, release_branch - ) or self.is_failed(job, batch, num_batches, release_branch): - return False - - return self.exist( - self.RecordType.PENDING, job, batch, num_batches, release_branch - ) - - def push_successful( - self, - job: str, - batch: int, - num_batches: int, - job_status: CommitStatusData, - release_branch: bool = False, - ) -> None: - """ - Pushes a cache record (CommitStatusData) - @release_branch adds "release" attribute to a record - """ - self.push( - self.RecordType.SUCCESSFUL, - job, - [batch], - num_batches, - job_status, - release_branch, - ) - - def push_failed( - self, - job: str, - batch: int, - num_batches: int, - job_status: CommitStatusData, - release_branch: bool = False, - ) -> None: - """ - Pushes a cache record of type Failed (CommitStatusData) - @release_branch adds "release" attribute to a record - """ - self.push( - self.RecordType.FAILED, - job, - [batch], - num_batches, - job_status, - release_branch, - ) - - def push_pending( - self, job: str, batches: List[int], num_batches: int, release_branch: bool - ) -> None: - """ - pushes pending record for a job to the cache - """ - pending_state = PendingState(time.time(), run_url=GITHUB_RUN_URL) - self.push( - self.RecordType.PENDING, - job, - batches, - num_batches, - pending_state, - release_branch, - ) - - def get_successful( - self, job: str, batch: int, num_batches: int - ) -> Optional[CommitStatusData]: - """ - Gets a cache record (CommitStatusData) for a job, or None if a cache miss - """ - res = self.get(self.RecordType.SUCCESSFUL, job, batch, num_batches) - assert res is None or isinstance(res, CommitStatusData) - return res - - def delete_pending( - self, job: str, batch: int, num_batches: int, release_branch: bool - ) -> None: - """ - deletes pending record from the cache - """ - self.delete(self.RecordType.PENDING, job, batch, num_batches, release_branch) - - def download_build_reports(self, file_prefix: str = "") -> List[str]: - """ - not ideal class for this method, - but let it be as we store build reports in CI cache directory on s3 - and CiCache knows where exactly - - @file_prefix allows to filter out reports by git head_ref - """ - report_path = Path(REPORT_PATH) - report_path.mkdir(exist_ok=True, parents=True) - path = ( - self._get_record_s3_path(Build.PACKAGE_RELEASE) - + self._CACHE_BUILD_REPORT_PREFIX - ) - if file_prefix: - path += "_" + file_prefix - reports_files = self.s3.download_files( - bucket=S3_BUILDS_BUCKET, - s3_path=path, - file_suffix=".json", - local_directory=report_path, - ) - 
return reports_files - - def upload_build_report(self, build_result: BuildResult) -> str: - result_json_path = build_result.write_json(Path(TEMP_PATH)) - s3_path = ( - self._get_record_s3_path(Build.PACKAGE_RELEASE) + result_json_path.name - ) - return self.s3.upload_file( - bucket=S3_BUILDS_BUCKET, file_path=result_json_path, s3_path=s3_path - ) - - def await_jobs( - self, jobs_with_params: Dict[str, Dict[str, Any]], is_release_branch: bool - ) -> Dict[str, List[int]]: - """ - await pending jobs to be finished - @jobs_with_params - jobs to await. {JOB_NAME: {"batches": [BATCHES...], "num_batches": NUM_BATCHES}} - returns successfully finished jobs: {JOB_NAME: [BATCHES...]} - """ - if not jobs_with_params: - return {} - poll_interval_sec = 300 - # TIMEOUT * MAX_ROUNDS_TO_WAIT must be less than 6h (GH job timeout) with a room for rest RunConfig work - TIMEOUT = 3000 # 50 min - MAX_ROUNDS_TO_WAIT = 6 - MAX_JOB_NUM_TO_WAIT = 3 - await_finished: Dict[str, List[int]] = {} - round_cnt = 0 - while ( - len(jobs_with_params) > MAX_JOB_NUM_TO_WAIT - and round_cnt < MAX_ROUNDS_TO_WAIT - ): - round_cnt += 1 - GHActions.print_in_group( - f"Wait pending jobs, round [{round_cnt}/{MAX_ROUNDS_TO_WAIT}]:", - list(jobs_with_params), - ) - # this is initial approach to wait pending jobs: - # start waiting for the next TIMEOUT seconds if there are more than X(=4) jobs to wait - # wait TIMEOUT seconds in rounds. Y(=5) is the max number of rounds - expired_sec = 0 - start_at = int(time.time()) - while expired_sec < TIMEOUT and jobs_with_params: - time.sleep(poll_interval_sec) - self.update() - jobs_with_params_copy = deepcopy(jobs_with_params) - for job_name in jobs_with_params: - num_batches = jobs_with_params[job_name]["num_batches"] - job_config = CI_CONFIG.get_job_config(job_name) - for batch in jobs_with_params[job_name]["batches"]: - if self.is_pending( - job_name, - batch, - num_batches, - release_branch=is_release_branch - and job_config.required_on_release_branch, - ): - continue - print( - f"Job [{job_name}_[{batch}/{num_batches}]] is not pending anymore" - ) - - # some_job_ready = True - jobs_with_params_copy[job_name]["batches"].remove(batch) - if not jobs_with_params_copy[job_name]["batches"]: - del jobs_with_params_copy[job_name] - - if not self.is_successful( - job_name, - batch, - num_batches, - release_branch=is_release_branch - and job_config.required_on_release_branch, - ): - print( - f"NOTE: Job [{job_name}:{batch}] finished but no success - remove from awaiting list, do not add to ready" - ) - continue - if job_name in await_finished: - await_finished[job_name].append(batch) - else: - await_finished[job_name] = [batch] - jobs_with_params = jobs_with_params_copy - expired_sec = int(time.time()) - start_at - print( - f"...awaiting continues... 
seconds left [{TIMEOUT - expired_sec}]" - ) - if await_finished: - GHActions.print_in_group( - f"Finished jobs, round [{round_cnt}]:", - [f"{job}:{batches}" for job, batches in await_finished.items()], - ) - GHActions.print_in_group( - "Remaining jobs:", - [f"{job}:{params['batches']}" for job, params in jobs_with_params.items()], - ) - return await_finished - - -@dataclass -class CiOptions: - # job will be included in the run if any keyword from the list matches job name - include_keywords: Optional[List[str]] = None - # job will be excluded in the run if any keyword from the list matches job name - exclude_keywords: Optional[List[str]] = None - - # list of specified preconfigured ci sets to run - ci_sets: Optional[List[str]] = None - # list of specified jobs to run - ci_jobs: Optional[List[str]] = None - - # batches to run for all multi-batch jobs - job_batches: Optional[List[int]] = None - - do_not_test: bool = False - no_ci_cache: bool = False - no_merge_commit: bool = False - - def as_dict(self) -> Dict[str, Any]: - return asdict(self) - - @staticmethod - def create_from_run_config(run_config: Dict[str, Any]) -> "CiOptions": - return CiOptions(**run_config["ci_options"]) - - @staticmethod - def create_from_pr_message( - debug_message: Optional[str], update_from_api: bool - ) -> "CiOptions": - """ - Creates CiOptions instance based on tags found in PR body and/or commit message - @commit_message - may be provided directly for debugging purposes, otherwise it will be retrieved from git. - """ - res = CiOptions() - pr_info = PRInfo() - if ( - not pr_info.is_pr and not debug_message - ): # if commit_message is provided it's test/debug scenario - do not return - # CI options can be configured in PRs only - # if debug_message is provided - it's a test - return res - message = debug_message or GitRunner(set_cwd_to_git_root=True).run( - f"{GIT_PREFIX} log {pr_info.sha} --format=%B -n 1" - ) - - pattern = r"(#|- \[x\] + Exclude: All with TSAN, MSAN, UBSAN, Coverage + pattern = r"(#|- \[x\] + Integration tests +- [x] Non required - [ ] Integration tests (arm64) - [x] Integration tests - [x] Integration tests @@ -33,7 +33,7 @@ _TEST_BODY_2 = """ - [x] MUST include azure - [x] no action must be applied - [ ] no action must be applied -- [x] MUST exclude tsan +- [x] MUST exclude tsan - [x] MUST exclude aarch64 - [x] MUST exclude test with analazer - [ ] no action applied @@ -54,6 +54,14 @@ _TEST_JOB_LIST = [ "Fast test", "package_release", "package_asan", + "package_aarch64", + "package_release_coverage", + "package_debug", + "package_tsan", + "package_msan", + "package_ubsan", + "binary_release", + "fuzzers", "Docker server image", "Docker keeper image", "Install packages (amd64)", @@ -129,22 +137,24 @@ _TEST_JOB_LIST = [ "Bugfix validation", ] +_TEST_JOB_LIST_2 = ["Style check", "Fast test", "fuzzers"] + class TestCIOptions(unittest.TestCase): def test_pr_body_parsing(self): - ci_options = CiOptions.create_from_pr_message( + ci_options = CiSettings.create_from_pr_message( _TEST_BODY_1, update_from_api=False ) self.assertFalse(ci_options.do_not_test) self.assertFalse(ci_options.no_ci_cache) self.assertTrue(ci_options.no_merge_commit) - self.assertEqual(ci_options.ci_sets, ["ci_set_integration"]) + self.assertEqual(ci_options.ci_sets, ["ci_set_non_required"]) self.assertCountEqual(ci_options.include_keywords, ["foo", "foo_bar"]) self.assertCountEqual(ci_options.exclude_keywords, ["foo", "foo_bar"]) def test_options_applied(self): self.maxDiff = None - ci_options = CiOptions.create_from_pr_message( 
+ ci_options = CiSettings.create_from_pr_message( _TEST_BODY_2, update_from_api=False ) self.assertCountEqual( @@ -153,26 +163,35 @@ class TestCIOptions(unittest.TestCase): ) self.assertCountEqual( ci_options.exclude_keywords, - ["tsan", "aarch64", "analyzer", "s3_storage", "coverage"], + ["tsan", "foobar", "aarch64", "analyzer", "s3_storage", "coverage"], ) - jobs_to_do = list(_TEST_JOB_LIST) - jobs_to_skip = [] - job_params = { - "Stateless tests (azure, asan)": { - "batches": list(range(3)), - "num_batches": 3, - "run_by_ci_option": True, - } - } - jobs_to_do, jobs_to_skip, job_params = ci_options.apply( - jobs_to_do, jobs_to_skip, job_params, PRInfo() + + jobs_configs = {job: JobConfig() for job in _TEST_JOB_LIST} + jobs_configs[ + "fuzzers" + ].run_by_label = ( + "TEST_LABEL" # check "fuzzers" appears in the result due to the label + ) + jobs_configs[ + "Integration tests (asan)" + ].release_only = ( + True # still must be included as it's set with include keywords + ) + filtered_jobs = list( + ci_options.apply( + jobs_configs, is_release=False, is_pr=True, labels=["TEST_LABEL"] + ) ) self.assertCountEqual( - jobs_to_do, + filtered_jobs, [ "Style check", + "fuzzers", "package_release", "package_asan", + "package_debug", + "package_msan", + "package_ubsan", "Stateless tests (asan)", "Stateless tests (azure, asan)", "Stateless tests flaky check (asan)", @@ -187,54 +206,88 @@ class TestCIOptions(unittest.TestCase): ) def test_options_applied_2(self): + jobs_configs = {job: JobConfig() for job in _TEST_JOB_LIST_2} + jobs_configs["Style check"].release_only = True + jobs_configs["Fast test"].pr_only = True + jobs_configs["fuzzers"].run_by_label = "TEST_LABEL" + # no settings are set + filtered_jobs = list( + CiSettings().apply(jobs_configs, is_release=False, is_pr=True, labels=[]) + ) + self.assertCountEqual( + filtered_jobs, + [ + "Fast test", + ], + ) + + filtered_jobs = list( + CiSettings().apply(jobs_configs, is_release=True, is_pr=False, labels=[]) + ) + self.assertCountEqual( + filtered_jobs, + [ + "Style check", + ], + ) + + def test_options_applied_3(self): + ci_settings = CiSettings() + ci_settings.include_keywords = ["Style"] + jobs_configs = {job: JobConfig() for job in _TEST_JOB_LIST_2} + jobs_configs["Style check"].release_only = True + jobs_configs["Fast test"].pr_only = True + # no settings are set + filtered_jobs = list( + ci_settings.apply( + jobs_configs, is_release=False, is_pr=True, labels=["TEST_LABEL"] + ) + ) + self.assertCountEqual( + filtered_jobs, + [ + "Style check", + ], + ) + + ci_settings.include_keywords = ["Fast"] + filtered_jobs = list( + ci_settings.apply( + jobs_configs, is_release=True, is_pr=False, labels=["TEST_LABEL"] + ) + ) + self.assertCountEqual( + filtered_jobs, + [ + "Style check", + ], + ) + + def test_options_applied_4(self): self.maxDiff = None - ci_options = CiOptions.create_from_pr_message( + ci_options = CiSettings.create_from_pr_message( _TEST_BODY_3, update_from_api=False ) self.assertCountEqual(ci_options.include_keywords, ["analyzer"]) self.assertIsNone(ci_options.exclude_keywords) - jobs_to_do = list(_TEST_JOB_LIST) - jobs_to_skip = [] - job_params = {} - jobs_to_do, jobs_to_skip, job_params = ci_options.apply( - jobs_to_do, jobs_to_skip, job_params, PRInfo() + jobs_configs = {job: JobConfig() for job in _TEST_JOB_LIST} + jobs_configs[ + "fuzzers" + ].run_by_label = "TEST_LABEL" # check "fuzzers" does not appears in the result + jobs_configs["Integration tests (asan)"].release_only = True + filtered_jobs = list( + ci_options.apply( + 
jobs_configs, is_release=False, is_pr=True, labels=["TEST_LABEL"] + ) ) self.assertCountEqual( - jobs_to_do, + filtered_jobs, [ "Style check", "Integration tests (asan, old analyzer)", "package_release", "Stateless tests (release, old analyzer, s3, DatabaseReplicated)", "package_asan", + "fuzzers", ], ) - - def test_options_applied_3(self): - self.maxDiff = None - ci_options = CiOptions.create_from_pr_message( - _TEST_BODY_4, update_from_api=False - ) - self.assertIsNone(ci_options.include_keywords, None) - self.assertIsNone(ci_options.exclude_keywords, None) - jobs_to_do = list(_TEST_JOB_LIST) - jobs_to_skip = [] - job_params = {} - - for job in _TEST_JOB_LIST: - if "Stateless" in job: - job_params[job] = { - "batches": list(range(3)), - "num_batches": 3, - "run_by_ci_option": "azure" in job, - } - else: - job_params[job] = {"run_by_ci_option": False} - - jobs_to_do, jobs_to_skip, job_params = ci_options.apply( - jobs_to_do, jobs_to_skip, job_params, PRInfo() - ) - self.assertNotIn( - "Stateless tests (azure, asan)", - jobs_to_do, - ) diff --git a/tests/clickhouse-test b/tests/clickhouse-test index 133d635f8a0..af203563d58 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -1223,12 +1223,9 @@ class TestCase: return FailureReason.S3_STORAGE elif ( tags - and ("no-s3-storage-with-slow-build" in tags) + and "no-s3-storage-with-slow-build" in tags and args.s3_storage - and ( - BuildFlags.THREAD in args.build_flags - or BuildFlags.DEBUG in args.build_flags - ) + and BuildFlags.RELEASE not in args.build_flags ): return FailureReason.S3_STORAGE @@ -2411,6 +2408,17 @@ def do_run_tests(jobs, test_suite: TestSuite, parallel): for _ in range(jobs): parallel_tests_array.append((None, batch_size, test_suite)) + # If we don't do random shuffling then there will be always + # nearly the same groups of test suites running concurrently. + # Thus, if there is a test within group which appears to be broken + # then it will affect all other tests in a non-random form. + # So each time a bad test fails - other tests from the group will also fail + # and this process will be more or less stable. + # It makes it more difficult to detect real flaky tests, + # because the distribution and the amount + # of failures will be nearly the same for all tests from the group. 
+ random.shuffle(test_suite.parallel_tests) + try: with closing(multiprocessing.Pool(processes=jobs)) as pool: pool.map_async(run_tests_array, parallel_tests_array) diff --git a/tests/integration/helpers/s3_url_proxy_tests_util.py b/tests/integration/helpers/s3_url_proxy_tests_util.py index 9059fda08ae..c67d00769c5 100644 --- a/tests/integration/helpers/s3_url_proxy_tests_util.py +++ b/tests/integration/helpers/s3_url_proxy_tests_util.py @@ -2,21 +2,35 @@ import os import time +ALL_HTTP_METHODS = {"POST", "PUT", "GET", "HEAD", "CONNECT"} + + def check_proxy_logs( - cluster, proxy_instance, protocol, bucket, http_methods={"POST", "PUT", "GET"} + cluster, proxy_instances, protocol, bucket, requested_http_methods ): for i in range(10): - logs = cluster.get_container_logs(proxy_instance) # Check with retry that all possible interactions with Minio are present - for http_method in http_methods: - if ( - logs.find(http_method + f" {protocol}://minio1:9001/root/data/{bucket}") - >= 0 - ): - return + for http_method in ALL_HTTP_METHODS: + for proxy_instance in proxy_instances: + logs = cluster.get_container_logs(proxy_instance) + if ( + logs.find( + http_method + f" {protocol}://minio1:9001/root/data/{bucket}" + ) + >= 0 + ): + if http_method not in requested_http_methods: + assert ( + False + ), f"Found http method {http_method} for bucket {bucket} that should not be found in {proxy_instance} logs" + break + else: + if http_method in requested_http_methods: + assert ( + False + ), f"{http_method} method not found in logs of {proxy_instance} for bucket {bucket}" + time.sleep(1) - else: - assert False, f"{http_methods} method not found in logs of {proxy_instance}" def wait_resolver(cluster): @@ -33,8 +47,8 @@ def wait_resolver(cluster): if response == "proxy1" or response == "proxy2": return time.sleep(i) - else: - assert False, "Resolver is not up" + + assert False, "Resolver is not up" # Runs simple proxy resolver in python env container. 
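
The clickhouse-test hunk above explains why the parallel test order is shuffled before the suites are split into concurrent groups: without shuffling, the same tests always run next to each other, so one genuinely broken test produces a stable-looking cluster of "flaky" failures. A tiny illustration of the shuffle-then-batch idea; the batching helper here is illustrative, not the runner's actual code:

```python
# Illustration of why clickhouse-test shuffles parallel tests before grouping:
# with shuffling, the set of tests that run side by side changes per run, so a
# single broken test no longer drags the same neighbours down every time.
import random


def into_batches(tests, batch_size):
    # Illustrative helper, not the actual test runner's batching logic.
    return [tests[i : i + batch_size] for i in range(0, len(tests), batch_size)]


if __name__ == "__main__":
    tests = [f"0{i:04d}_some_test" for i in range(12)]
    for run in range(2):
        shuffled = tests[:]
        random.shuffle(shuffled)          # the same call the diff adds to do_run_tests
        print(f"run {run}:", into_batches(shuffled, 4))
```
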
@@ -80,9 +94,33 @@ def perform_simple_queries(node, minio_endpoint): def simple_test(cluster, proxies, protocol, bucket): minio_endpoint = build_s3_endpoint(protocol, bucket) - node = cluster.instances[f"{bucket}"] + node = cluster.instances[bucket] perform_simple_queries(node, minio_endpoint) - for proxy in proxies: - check_proxy_logs(cluster, proxy, protocol, bucket) + check_proxy_logs(cluster, proxies, protocol, bucket, ["PUT", "GET", "HEAD"]) + + +def simple_storage_test(cluster, node, proxies, policy): + node.query( + """ + CREATE TABLE s3_test ( + id Int64, + data String + ) ENGINE=MergeTree() + ORDER BY id + SETTINGS storage_policy='{}' + """.format( + policy + ) + ) + node.query("INSERT INTO s3_test VALUES (0,'data'),(1,'data')") + assert ( + node.query("SELECT * FROM s3_test order by id FORMAT Values") + == "(0,'data'),(1,'data')" + ) + + node.query("DROP TABLE IF EXISTS s3_test SYNC") + + # not checking for POST because it is in a different format + check_proxy_logs(cluster, proxies, "http", policy, ["PUT", "GET"]) diff --git a/tests/integration/test_attach_partition_using_copy/__init__.py b/tests/integration/test_attach_partition_using_copy/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_attach_partition_using_copy/configs/remote_servers.xml b/tests/integration/test_attach_partition_using_copy/configs/remote_servers.xml new file mode 100644 index 00000000000..b40730e9f7d --- /dev/null +++ b/tests/integration/test_attach_partition_using_copy/configs/remote_servers.xml @@ -0,0 +1,17 @@ + + + + + true + + replica1 + 9000 + + + replica2 + 9000 + + + + + diff --git a/tests/integration/test_attach_partition_using_copy/test.py b/tests/integration/test_attach_partition_using_copy/test.py new file mode 100644 index 00000000000..e7163b1eb32 --- /dev/null +++ b/tests/integration/test_attach_partition_using_copy/test.py @@ -0,0 +1,201 @@ +import pytest +from helpers.cluster import ClickHouseCluster +from helpers.test_tools import assert_eq_with_retry + +cluster = ClickHouseCluster(__file__) + +replica1 = cluster.add_instance( + "replica1", with_zookeeper=True, main_configs=["configs/remote_servers.xml"] +) +replica2 = cluster.add_instance( + "replica2", with_zookeeper=True, main_configs=["configs/remote_servers.xml"] +) + + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + yield cluster + except Exception as ex: + print(ex) + finally: + cluster.shutdown() + + +def cleanup(nodes): + for node in nodes: + node.query("DROP TABLE IF EXISTS source SYNC") + node.query("DROP TABLE IF EXISTS destination SYNC") + + +def create_source_table(node, table_name, replicated): + replica = node.name + engine = ( + f"ReplicatedMergeTree('/clickhouse/tables/1/{table_name}', '{replica}')" + if replicated + else "MergeTree()" + ) + node.query_with_retry( + """ + ATTACH TABLE {table_name} UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7' + ( + price UInt32, + date Date, + postcode1 LowCardinality(String), + postcode2 LowCardinality(String), + type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4), + is_new UInt8, + duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2), + addr1 String, + addr2 String, + street LowCardinality(String), + locality LowCardinality(String), + town LowCardinality(String), + district LowCardinality(String), + county LowCardinality(String) + ) + ENGINE = {engine} + ORDER BY (postcode1, postcode2, addr1, addr2) + SETTINGS disk = disk(type = web, endpoint = 
'https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/') + """.format( + table_name=table_name, engine=engine + ) + ) + + +def create_destination_table(node, table_name, replicated): + replica = node.name + engine = ( + f"ReplicatedMergeTree('/clickhouse/tables/1/{table_name}', '{replica}')" + if replicated + else "MergeTree()" + ) + node.query_with_retry( + """ + CREATE TABLE {table_name} + ( + price UInt32, + date Date, + postcode1 LowCardinality(String), + postcode2 LowCardinality(String), + type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4), + is_new UInt8, + duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2), + addr1 String, + addr2 String, + street LowCardinality(String), + locality LowCardinality(String), + town LowCardinality(String), + district LowCardinality(String), + county LowCardinality(String) + ) + ENGINE = {engine} + ORDER BY (postcode1, postcode2, addr1, addr2) + """.format( + table_name=table_name, engine=engine + ) + ) + + +def test_both_mergtree(start_cluster): + create_source_table(replica1, "source", False) + create_destination_table(replica1, "destination", False) + + replica1.query(f"ALTER TABLE destination ATTACH PARTITION tuple() FROM source") + + assert_eq_with_retry( + replica1, + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM destination GROUP BY year ORDER BY year ASC", + replica1.query( + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM source GROUP BY year ORDER BY year ASC" + ), + ) + + assert_eq_with_retry( + replica1, f"SELECT town from destination LIMIT 1", "SCARBOROUGH" + ) + + cleanup([replica1]) + + +def test_all_replicated(start_cluster): + create_source_table(replica1, "source", True) + create_destination_table(replica1, "destination", True) + create_destination_table(replica2, "destination", True) + + replica1.query("SYSTEM SYNC REPLICA destination") + replica1.query(f"ALTER TABLE destination ATTACH PARTITION tuple() FROM source") + + assert_eq_with_retry( + replica1, + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM destination GROUP BY year ORDER BY year ASC", + replica1.query( + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM source GROUP BY year ORDER BY year ASC" + ), + ) + assert_eq_with_retry( + replica1, + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM source GROUP BY year ORDER BY year ASC", + replica2.query( + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM destination GROUP BY year ORDER BY year ASC" + ), + ) + + assert_eq_with_retry( + replica1, f"SELECT town from destination LIMIT 1", "SCARBOROUGH" + ) + + assert_eq_with_retry( + replica2, f"SELECT town from destination LIMIT 1", "SCARBOROUGH" + ) + + cleanup([replica1, replica2]) + + +def test_only_destination_replicated(start_cluster): + create_source_table(replica1, "source", False) + create_destination_table(replica1, "destination", True) + create_destination_table(replica2, "destination", True) + + replica1.query("SYSTEM SYNC REPLICA destination") + replica1.query(f"ALTER TABLE destination ATTACH PARTITION tuple() FROM source") + + assert_eq_with_retry( + replica1, + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM destination GROUP BY year ORDER BY year ASC", + replica1.query( + f"SELECT toYear(date) AS year,round(avg(price)) AS 
price,bar(price, 0, 1000000, 80) FROM source GROUP BY year ORDER BY year ASC" + ), + ) + assert_eq_with_retry( + replica1, + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM source GROUP BY year ORDER BY year ASC", + replica2.query( + f"SELECT toYear(date) AS year,round(avg(price)) AS price,bar(price, 0, 1000000, 80) FROM destination GROUP BY year ORDER BY year ASC" + ), + ) + + assert_eq_with_retry( + replica1, f"SELECT town from destination LIMIT 1", "SCARBOROUGH" + ) + + assert_eq_with_retry( + replica2, f"SELECT town from destination LIMIT 1", "SCARBOROUGH" + ) + + cleanup([replica1, replica2]) + + +def test_not_work_on_different_disk(start_cluster): + # Replace and move should not work on replace + create_source_table(replica1, "source", False) + create_destination_table(replica2, "destination", False) + + replica1.query_and_get_error( + f"ALTER TABLE destination REPLACE PARTITION tuple() FROM source" + ) + replica1.query_and_get_error( + f"ALTER TABLE destination MOVE PARTITION tuple() FROM source" + ) + cleanup([replica1, replica2]) diff --git a/tests/integration/test_disk_over_web_server/test.py b/tests/integration/test_disk_over_web_server/test.py index dd5163082ef..9f43ab73fa3 100644 --- a/tests/integration/test_disk_over_web_server/test.py +++ b/tests/integration/test_disk_over_web_server/test.py @@ -358,7 +358,6 @@ def test_page_cache(cluster): node.query("SYSTEM FLUSH LOGS") def get_profile_events(query_name): - print(f"asdqwe {query_name}") text = node.query( f"SELECT ProfileEvents.Names, ProfileEvents.Values FROM system.query_log ARRAY JOIN ProfileEvents WHERE query LIKE '% -- {query_name}' AND type = 'QueryFinish'" ) @@ -367,7 +366,6 @@ def test_page_cache(cluster): if line == "": continue name, value = line.split("\t") - print(f"asdqwe {name} = {int(value)}") res[name] = int(value) return res diff --git a/tests/integration/test_hot_reload_storage_policy/configs/storage_configuration.xml b/tests/integration/test_hot_reload_storage_policy/configs/config.d/storage_configuration.xml similarity index 56% rename from tests/integration/test_hot_reload_storage_policy/configs/storage_configuration.xml rename to tests/integration/test_hot_reload_storage_policy/configs/config.d/storage_configuration.xml index 466ecde137d..8940efb3301 100644 --- a/tests/integration/test_hot_reload_storage_policy/configs/storage_configuration.xml +++ b/tests/integration/test_hot_reload_storage_policy/configs/config.d/storage_configuration.xml @@ -4,18 +4,25 @@ /var/lib/clickhouse/disk0/ - - /var/lib/clickhouse/disk1/ - - + disk0 - + + + + + + localhost + 9000 + + + + \ No newline at end of file diff --git a/tests/integration/test_hot_reload_storage_policy/test.py b/tests/integration/test_hot_reload_storage_policy/test.py index 8654b0462e4..1d38f39d72c 100644 --- a/tests/integration/test_hot_reload_storage_policy/test.py +++ b/tests/integration/test_hot_reload_storage_policy/test.py @@ -10,11 +10,8 @@ from helpers.cluster import ClickHouseCluster from helpers.test_tools import TSV cluster = ClickHouseCluster(__file__) -node0 = cluster.add_instance( - "node0", with_zookeeper=True, main_configs=["configs/storage_configuration.xml"] -) -node1 = cluster.add_instance( - "node1", with_zookeeper=True, main_configs=["configs/storage_configuration.xml"] +node = cluster.add_instance( + "node", main_configs=["configs/config.d/storage_configuration.xml"], stay_alive=True ) @@ -28,6 +25,37 @@ def started_cluster(): cluster.shutdown() +old_disk_config = """ + + + + + 
/var/lib/clickhouse/disk0/ + + + + + + + disk0 + + + + + + + + + + localhost + 9000 + + + + + +""" + new_disk_config = """ @@ -38,49 +66,120 @@ new_disk_config = """ /var/lib/clickhouse/disk1/ - - /var/lib/clickhouse/disk2/ - - - disk2 + disk1 + + disk0 - + + + + + + localhost + 9000 + + + + """ def set_config(node, config): - node.replace_config( - "/etc/clickhouse-server/config.d/storage_configuration.xml", config - ) + node.replace_config("/etc/clickhouse-server/config.d/config.xml", config) node.query("SYSTEM RELOAD CONFIG") + # to give ClickHouse time to refresh disks + time.sleep(1) def test_hot_reload_policy(started_cluster): - node0.query( - "CREATE TABLE t (d Int32, s String) ENGINE = ReplicatedMergeTree('/clickhouse/tables/t', '0') PARTITION BY d ORDER BY tuple() SETTINGS storage_policy = 'default_policy'" + node.query( + "CREATE TABLE t (d Int32, s String) ENGINE = MergeTree() PARTITION BY d ORDER BY tuple() SETTINGS storage_policy = 'default_policy'" ) - node0.query("INSERT INTO TABLE t VALUES (1, 'foo') (1, 'bar')") + node.query("SYSTEM STOP MERGES t") + node.query("INSERT INTO TABLE t VALUES (1, 'foo')") - node1.query( - "CREATE TABLE t (d Int32, s String) ENGINE = ReplicatedMergeTree('/clickhouse/tables/t_mirror', '1') PARTITION BY d ORDER BY tuple() SETTINGS storage_policy = 'default_policy'" + set_config(node, new_disk_config) + + # After reloading new policy with new disk, merge tree tables should reinitialize the new disk (create relative path, 'detached' folder...) + # and as default policy is `least_used`, at least one insertion should come to the new disk + node.query("INSERT INTO TABLE t VALUES (1, 'foo')") + node.query("INSERT INTO TABLE t VALUES (1, 'bar')") + + num_disks = int( + node.query( + "SELECT uniqExact(disk_name) FROM system.parts WHERE database = 'default' AND table = 't'" + ) ) - set_config(node1, new_disk_config) - time.sleep(1) - node1.query("ALTER TABLE t FETCH PARTITION 1 FROM '/clickhouse/tables/t'") - result = int(node1.query("SELECT count() FROM t")) + assert ( - result == 4, - "Node should have 2 x full data (4 rows) after reloading storage configuration and fetch new partition, but get {} rows".format( - result - ), + num_disks == 2 + ), "Node should write data to 2 disks after reloading disks, but got {}".format( + num_disks ) + + # If `detached` is not created this query will throw exception + node.query("ALTER TABLE t DETACH PARTITION 1") + + node.query("DROP TABLE t") + + +def test_hot_reload_policy_distributed_table(started_cluster): + # Same test for distributed table, it should reinitialize the storage policy and data volume + # We check it by trying an insert and the distribution queue must be on new disk + + # Restart node first + set_config(node, old_disk_config) + node.restart_clickhouse() + + node.query( + "CREATE TABLE t (d Int32, s String) ENGINE = MergeTree PARTITION BY d ORDER BY tuple()" + ) + node.query( + "CREATE TABLE t_d (d Int32, s String) ENGINE = Distributed('default', 'default', 't', d%20, 'default_policy')" + ) + + node.query("SYSTEM STOP DISTRIBUTED SENDS t_d") + node.query( + "INSERT INTO TABLE t_d SETTINGS prefer_localhost_replica = 0 VALUES (2, 'bar') (12, 'bar')" + ) + # t_d should create queue on disk0 + queue_path = node.query("SELECT data_path FROM system.distribution_queue") + + assert ( + "disk0" in queue_path + ), "Distributed table should create distributed queue on disk0 (disk1), but the queue path is {}".format( + queue_path + ) + + node.query("SYSTEM START DISTRIBUTED SENDS t_d") + + node.query("SYSTEM 
FLUSH DISTRIBUTED t_d") + + set_config(node, new_disk_config) + + node.query("SYSTEM STOP DISTRIBUTED SENDS t_d") + node.query( + "INSERT INTO TABLE t_d SETTINGS prefer_localhost_replica = 0 VALUES (2, 'bar') (12, 'bar')" + ) + + # t_d should create queue on disk1 + queue_path = node.query("SELECT data_path FROM system.distribution_queue") + + assert ( + "disk1" in queue_path + ), "Distributed table should be using new disk (disk1), but the queue paths are {}".format( + queue_path + ) + + node.query("DROP TABLE t") + node.query("DROP TABLE t_d") diff --git a/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/proxy-resolver/resolver.py b/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/proxy-resolver/resolver.py index 8c7611303b8..eaea4c1dab2 100644 --- a/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/proxy-resolver/resolver.py +++ b/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/proxy-resolver/resolver.py @@ -5,7 +5,10 @@ import bottle @bottle.route("/hostname") def index(): - return "proxy1" + if random.randrange(2) == 0: + return "proxy1" + else: + return "proxy2" bottle.run(host="0.0.0.0", port=8080) diff --git a/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/test.py b/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/test.py index ae872a33cd4..3c8a5de8691 100644 --- a/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/test.py +++ b/tests/integration/test_https_s3_table_function_with_http_proxy_no_tunneling/test.py @@ -56,7 +56,7 @@ def test_s3_with_https_proxy_list(cluster): def test_s3_with_https_remote_proxy(cluster): - proxy_util.simple_test(cluster, ["proxy1"], "https", "remote_proxy_node") + proxy_util.simple_test(cluster, ["proxy1", "proxy2"], "https", "remote_proxy_node") def test_s3_with_https_env_proxy(cluster): diff --git a/tests/integration/test_keeper_client/test.py b/tests/integration/test_keeper_client/test.py index fbfc38ca35c..ca22c119281 100644 --- a/tests/integration/test_keeper_client/test.py +++ b/tests/integration/test_keeper_client/test.py @@ -61,7 +61,6 @@ def test_big_family(client: KeeperClient): ) response = client.find_big_family("/test_big_family", 2) - assert response == TSV( [ ["/test_big_family", "11"], @@ -87,7 +86,12 @@ def test_find_super_nodes(client: KeeperClient): client.cd("/test_find_super_nodes") response = client.find_super_nodes(4) - assert response == TSV( + + # The order of the response is not guaranteed, so we need to sort it + normalized_response = response.strip().split("\n") + normalized_response.sort() + + assert TSV(normalized_response) == TSV( [ ["/test_find_super_nodes/1", "5"], ["/test_find_super_nodes/2", "4"], diff --git a/tests/integration/test_keeper_snapshots/test.py b/tests/integration/test_keeper_snapshots/test.py index f6f746c892e..6dfb2078559 100644 --- a/tests/integration/test_keeper_snapshots/test.py +++ b/tests/integration/test_keeper_snapshots/test.py @@ -17,7 +17,6 @@ node = cluster.add_instance( "node", main_configs=["configs/enable_keeper.xml"], stay_alive=True, - with_zookeeper=True, ) @@ -211,3 +210,46 @@ def test_invalid_snapshot(started_cluster): node_zk.close() except: pass + + +def test_snapshot_size(started_cluster): + keeper_utils.wait_until_connected(started_cluster, node) + node_zk = None + try: + node_zk = get_connection_zk("node") + + node_zk.create("/test_state_size", b"somevalue") + strs = [] + for i in range(100): + 
strs.append(random_string(123).encode()) + node_zk.create("/test_state_size/node" + str(i), strs[i]) + + node_zk.stop() + node_zk.close() + + keeper_utils.send_4lw_cmd(started_cluster, node, "csnp") + node.wait_for_log_line("Created persistent snapshot") + + def get_snapshot_size(): + return int( + next( + filter( + lambda line: "zk_latest_snapshot_size" in line, + keeper_utils.send_4lw_cmd(started_cluster, node, "mntr").split( + "\n" + ), + ) + ).split("\t")[1] + ) + + assert get_snapshot_size() != 0 + restart_clickhouse() + assert get_snapshot_size() != 0 + finally: + try: + if node_zk is not None: + node_zk.stop() + node_zk.close() + + except: + pass diff --git a/tests/integration/test_move_ttl_broken_compatibility/__init__.py b/tests/integration/test_move_ttl_broken_compatibility/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_move_ttl_broken_compatibility/configs/storage_conf.xml b/tests/integration/test_move_ttl_broken_compatibility/configs/storage_conf.xml new file mode 100644 index 00000000000..1b2177d0392 --- /dev/null +++ b/tests/integration/test_move_ttl_broken_compatibility/configs/storage_conf.xml @@ -0,0 +1,36 @@ + + + test + + + + + + s3 + http://minio1:9001/root/data/ + minio + minio123 + + + + + + default + + + + + + default + False + +
+ s3 + False +
+
+ 0.0 +
+
+
+
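The test added in the next file polls system.parts after the server upgrade until a part appears on the 's3' disk. A minimal sketch of that poll-with-timeout pattern (the helper name and timings below are illustrative, not taken from the test):

    import time

    def wait_until(predicate, timeout_s=60, interval_s=1):
        # Re-evaluate the predicate once per interval until it holds or we time out.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if predicate():
                return True
            time.sleep(interval_s)
        return False

    # e.g. wait_until(lambda: "s3" in node1.query("SELECT distinct disk_name FROM system.parts WHERE table = 'test_ttl_table'"))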
diff --git a/tests/integration/test_move_ttl_broken_compatibility/test.py b/tests/integration/test_move_ttl_broken_compatibility/test.py new file mode 100644 index 00000000000..f9eab8b5ebb --- /dev/null +++ b/tests/integration/test_move_ttl_broken_compatibility/test.py @@ -0,0 +1,105 @@ +#!/usr/bin/env python3 + +import logging +import random +import string +import time + +import pytest +from helpers.cluster import ClickHouseCluster +import minio + + +cluster = ClickHouseCluster(__file__) + + +@pytest.fixture(scope="module") +def started_cluster(): + try: + cluster.add_instance( + "node1", + main_configs=["configs/storage_conf.xml"], + image="clickhouse/clickhouse-server", + with_minio=True, + tag="24.1", + stay_alive=True, + with_installed_binary=True, + ) + cluster.start() + + yield cluster + finally: + cluster.shutdown() + + +def test_bc_compatibility(started_cluster): + node1 = cluster.instances["node1"] + node1.query( + """ + CREATE TABLE test_ttl_table ( + generation UInt64, + date_key DateTime, + number UInt64, + text String, + expired DateTime DEFAULT now() + ) + ENGINE=MergeTree + ORDER BY (generation, date_key) + PARTITION BY toMonth(date_key) + TTL expired + INTERVAL 20 SECONDS TO DISK 's3' + SETTINGS storage_policy = 's3'; + """ + ) + + node1.query( + """ + INSERT INTO test_ttl_table ( + generation, + date_key, + number, + text + ) + SELECT + 1, + toDateTime('2000-01-01 00:00:00') + rand(number) % 365 * 86400, + number, + toString(number) + FROM numbers(10000); + """ + ) + + disks = ( + node1.query( + """ + SELECT distinct disk_name + FROM system.parts + WHERE table = 'test_ttl_table' + """ + ) + .strip() + .split("\n") + ) + print("Disks before", disks) + + assert len(disks) == 1 + assert disks[0] == "default" + + node1.restart_with_latest_version() + + for _ in range(60): + disks = ( + node1.query( + """ + SELECT distinct disk_name + FROM system.parts + WHERE table = 'test_ttl_table' + """ + ) + .strip() + .split("\n") + ) + print("Disks after", disks) + if "s3" in disks: + break + time.sleep(1) + assert "s3" in disks diff --git a/tests/integration/test_multiple_disks/test.py b/tests/integration/test_multiple_disks/test.py index fdd81284b2a..e97ffeb4cc3 100644 --- a/tests/integration/test_multiple_disks/test.py +++ b/tests/integration/test_multiple_disks/test.py @@ -1783,15 +1783,12 @@ def test_move_across_policies_does_not_work(start_cluster): except QueryRuntimeException: """All parts of partition 'all' are already on disk 'jbod2'.""" - with pytest.raises( - QueryRuntimeException, - match=".*because disk does not belong to storage policy.*", - ): - node1.query( - """ALTER TABLE {name}2 ATTACH PARTITION tuple() FROM {name}""".format( - name=name - ) + # works when attach + node1.query( + """ALTER TABLE {name}2 ATTACH PARTITION tuple() FROM {name}""".format( + name=name ) + ) with pytest.raises( QueryRuntimeException, @@ -1814,7 +1811,7 @@ def test_move_across_policies_does_not_work(start_cluster): ) assert node1.query( - """SELECT * FROM {name}""".format(name=name) + """SELECT * FROM {name}2""".format(name=name) ).splitlines() == ["1"] finally: diff --git a/tests/integration/test_s3_plain_rewritable/configs/storage_conf.xml b/tests/integration/test_s3_plain_rewritable/configs/storage_conf.xml index 560e6b6eca4..23368394494 100644 --- a/tests/integration/test_s3_plain_rewritable/configs/storage_conf.xml +++ b/tests/integration/test_s3_plain_rewritable/configs/storage_conf.xml @@ -8,6 +8,13 @@ minio minio123 + + cache + disk_s3_plain_rewritable + 
/var/lib/clickhouse/disks/s3_plain_rewritable_cache/ + 1000000000 + 1 + @@ -17,6 +24,13 @@ + + +
+ disk_cache_s3_plain_rewritable +
+
+
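The next file folds the previously order-dependent insert/restart/drop tests into a single test parametrized over the storage policy. For reference, a minimal pytest parametrization sketch (the toy test body is mine; the policy names are the ones used in the test below):

    import pytest

    @pytest.mark.parametrize(
        "storage_policy",
        ["s3_plain_rewritable", "cache_s3_plain_rewritable"],
    )
    def test_storage_policy_name(storage_policy):
        # pytest runs this once per listed policy, as independent test cases.
        assert "s3_plain_rewritable" in storage_policy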
diff --git a/tests/integration/test_s3_plain_rewritable/test.py b/tests/integration/test_s3_plain_rewritable/test.py index 06967958631..4b1aaafc814 100644 --- a/tests/integration/test_s3_plain_rewritable/test.py +++ b/tests/integration/test_s3_plain_rewritable/test.py @@ -8,11 +8,8 @@ from helpers.cluster import ClickHouseCluster cluster = ClickHouseCluster(__file__) NUM_WORKERS = 5 - MAX_ROWS = 1000 -dirs_created = [] - def gen_insert_values(size): return ",".join( @@ -46,8 +43,14 @@ def start_cluster(): cluster.shutdown() -@pytest.mark.order(0) -def test_insert(): +@pytest.mark.parametrize( + "storage_policy", + [ + pytest.param("s3_plain_rewritable"), + pytest.param("cache_s3_plain_rewritable"), + ], +) +def test(storage_policy): def create_insert(node, insert_values): node.query( """ @@ -56,8 +59,10 @@ def test_insert(): data String ) ENGINE=MergeTree() ORDER BY id - SETTINGS storage_policy='s3_plain_rewritable' - """ + SETTINGS storage_policy='{}' + """.format( + storage_policy + ) ) node.query("INSERT INTO test VALUES {}".format(insert_values)) @@ -107,25 +112,6 @@ def test_insert(): != -1 ) - created = int( - node.query( - "SELECT value FROM system.events WHERE event = 'DiskPlainRewritableS3DirectoryCreated'" - ) - ) - assert created > 0 - dirs_created.append(created) - assert ( - int( - node.query( - "SELECT value FROM system.metrics WHERE metric = 'DiskPlainRewritableS3DirectoryMapSize'" - ) - ) - == created - ) - - -@pytest.mark.order(1) -def test_restart(): insert_values_arr = [] for i in range(NUM_WORKERS): node = cluster.instances[f"node{i + 1}"] @@ -138,6 +124,7 @@ def test_restart(): threads = [] for i in range(NUM_WORKERS): + node = cluster.instances[f"node{i + 1}"] t = threading.Thread(target=restart, args=(node,)) threads.append(t) t.start() @@ -152,21 +139,10 @@ def test_restart(): == insert_values_arr[i] ) - -@pytest.mark.order(2) -def test_drop(): for i in range(NUM_WORKERS): node = cluster.instances[f"node{i + 1}"] node.query("DROP TABLE IF EXISTS test SYNC") - removed = int( - node.query( - "SELECT value FROM system.events WHERE event = 'DiskPlainRewritableS3DirectoryRemoved'" - ) - ) - - assert dirs_created[i] == removed - it = cluster.minio_client.list_objects( cluster.minio_bucket, "data/", recursive=True ) diff --git a/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/proxy_list.xml b/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/proxy_list.xml index 24c1eb29fbc..84e91495304 100644 --- a/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/proxy_list.xml +++ b/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/proxy_list.xml @@ -2,7 +2,6 @@ http://proxy1 - http://proxy2 diff --git a/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/storage_conf.xml b/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/storage_conf.xml index 94ac83b32ac..1d31272a395 100644 --- a/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/storage_conf.xml +++ b/tests/integration/test_s3_storage_conf_new_proxy/configs/config.d/storage_conf.xml @@ -3,7 +3,7 @@ s3 - http://minio1:9001/root/data/ + http://minio1:9001/root/data/s3 minio minio123 diff --git a/tests/integration/test_s3_storage_conf_new_proxy/test.py b/tests/integration/test_s3_storage_conf_new_proxy/test.py index c98eb05a217..3b3b07aaa09 100644 --- a/tests/integration/test_s3_storage_conf_new_proxy/test.py +++ b/tests/integration/test_s3_storage_conf_new_proxy/test.py @@ -3,6 +3,7 @@ import time import pytest from 
helpers.cluster import ClickHouseCluster +import helpers.s3_url_proxy_tests_util as proxy_util @pytest.fixture(scope="module") @@ -26,41 +27,8 @@ def cluster(): cluster.shutdown() -def check_proxy_logs(cluster, proxy_instance, http_methods={"POST", "PUT", "GET"}): - for i in range(10): - logs = cluster.get_container_logs(proxy_instance) - # Check with retry that all possible interactions with Minio are present - for http_method in http_methods: - if logs.find(http_method + " http://minio1") >= 0: - return - time.sleep(1) - else: - assert False, f"{http_methods} method not found in logs of {proxy_instance}" - - @pytest.mark.parametrize("policy", ["s3"]) def test_s3_with_proxy_list(cluster, policy): - node = cluster.instances["node"] - - node.query( - """ - CREATE TABLE s3_test ( - id Int64, - data String - ) ENGINE=MergeTree() - ORDER BY id - SETTINGS storage_policy='{}' - """.format( - policy - ) + proxy_util.simple_storage_test( + cluster, cluster.instances["node"], ["proxy1"], policy ) - node.query("INSERT INTO s3_test VALUES (0,'data'),(1,'data')") - assert ( - node.query("SELECT * FROM s3_test order by id FORMAT Values") - == "(0,'data'),(1,'data')" - ) - - node.query("DROP TABLE IF EXISTS s3_test SYNC") - - for proxy in ["proxy1", "proxy2"]: - check_proxy_logs(cluster, proxy, ["PUT", "GET"]) diff --git a/tests/integration/test_s3_storage_conf_proxy/configs/config.d/storage_conf.xml b/tests/integration/test_s3_storage_conf_proxy/configs/config.d/storage_conf.xml index 132eac4a2a6..73e7e8175c5 100644 --- a/tests/integration/test_s3_storage_conf_proxy/configs/config.d/storage_conf.xml +++ b/tests/integration/test_s3_storage_conf_proxy/configs/config.d/storage_conf.xml @@ -3,7 +3,7 @@ s3 - http://minio1:9001/root/data/ + http://minio1:9001/root/data/s3 minio minio123 @@ -13,9 +13,10 @@ s3 - http://minio1:9001/root/data/ + http://minio1:9001/root/data/s3_with_resolver minio minio123 + true not usable as primary key +But Map(Nothing, ...) can be a non-primary-key, it is quite useless though ... +Map(Float32, ...) and Map(LC(String)) are okay as primary key +{1:'a'} {'b':'b'} +{2:'aa'} {'bb':'bb'} +Map(Float32, ...) and Map(LC(String)) as non-primary-key +{1:'a'} {'b':'b'} +{3:'aaa'} {'bb':'bb'} diff --git a/tests/queries/0_stateless/03034_ddls_and_merges_with_unusual_maps.sql b/tests/queries/0_stateless/03034_ddls_and_merges_with_unusual_maps.sql new file mode 100644 index 00000000000..a3cd59df1cd --- /dev/null +++ b/tests/queries/0_stateless/03034_ddls_and_merges_with_unusual_maps.sql @@ -0,0 +1,33 @@ +-- Tests maps with "unusual" key types (Float32, Nothing, LowCardinality(String)) + +SET mutations_sync = 2; + +DROP TABLE IF EXISTS tab; + +SELECT 'Map(Nothing, ...) is non-comparable --> not usable as primary key'; +CREATE TABLE tab (m1 Map(Nothing, String)) ENGINE = MergeTree ORDER BY m1; -- { serverError DATA_TYPE_CANNOT_BE_USED_IN_KEY } + +SELECT 'But Map(Nothing, ...) can be a non-primary-key, it is quite useless though ...'; +CREATE TABLE tab (m3 Map(Nothing, String)) ENGINE = MergeTree ORDER BY tuple(); +-- INSERT INTO tab VALUES (map('', 'd')); -- { serverError NOT_IMPLEMENTED } -- The client can't serialize the data and fails. The query + -- doesn't reach the server and we can't check via 'serverError' :-/ +DROP TABLE tab; + +SELECT 'Map(Float32, ...) 
and Map(LC(String)) are okay as primary key'; +CREATE TABLE tab (m1 Map(Float32, String), m2 Map(LowCardinality(String), String)) ENGINE = MergeTree ORDER BY (m1, m2); +INSERT INTO tab VALUES (map(1.0, 'a'), map('b', 'b')); +INSERT INTO tab VALUES (map(2.0, 'aa'), map('bb', 'bb')); + +-- Test merge +OPTIMIZE TABLE tab FINAL; +SELECT * FROM tab ORDER BY m1, m2; + +DROP TABLE tab; + +SELECT 'Map(Float32, ...) and Map(LC(String)) as non-primary-key'; +CREATE TABLE tab (m1 Map(Float32, String), m2 Map(LowCardinality(String), String)) ENGINE = MergeTree ORDER BY tuple(); +INSERT INTO tab VALUES (map(1.0, 'a'), map('b', 'b')), (map(2.0, 'aa'), map('bb', 'bb')); +ALTER TABLE tab UPDATE m1 = map(3.0, 'aaa') WHERE m1 = map(2.0, 'aa'); +SELECT * FROM tab ORDER BY m1, m2; + +DROP TABLE tab; diff --git a/tests/queries/0_stateless/03095_window_functions_qualify.sql b/tests/queries/0_stateless/03095_window_functions_qualify.sql index 35e203a2ffc..adedff2e2cf 100644 --- a/tests/queries/0_stateless/03095_window_functions_qualify.sql +++ b/tests/queries/0_stateless/03095_window_functions_qualify.sql @@ -27,10 +27,10 @@ SELECT '--'; EXPLAIN header = 1, actions = 1 SELECT number, COUNT() OVER (PARTITION BY number % 3) AS partition_count FROM numbers(10) QUALIFY COUNT() OVER (PARTITION BY number % 3) = 4 ORDER BY number; -SELECT number % toUInt256(2) AS key, count() FROM numbers(10) GROUP BY key WITH CUBE WITH TOTALS QUALIFY key = toNullable(toNullable(0)); -- { serverError 48 } +SELECT number % toUInt256(2) AS key, count() FROM numbers(10) GROUP BY key WITH CUBE WITH TOTALS QUALIFY key = toNullable(toNullable(0)); -- { serverError NOT_IMPLEMENTED } -SELECT number % 2 AS key, count(materialize(5)) IGNORE NULLS FROM numbers(10) WHERE toLowCardinality(toLowCardinality(materialize(2))) GROUP BY key WITH CUBE WITH TOTALS QUALIFY key = 0; -- { serverError 48 } +SELECT number % 2 AS key, count(materialize(5)) IGNORE NULLS FROM numbers(10) WHERE toLowCardinality(toLowCardinality(materialize(2))) GROUP BY key WITH CUBE WITH TOTALS QUALIFY key = 0; -- { serverError NOT_IMPLEMENTED } -SELECT 4, count(4) IGNORE NULLS, number % 2 AS key FROM numbers(10) GROUP BY key WITH ROLLUP WITH TOTALS QUALIFY key = materialize(0); -- { serverError 48 } +SELECT 4, count(4) IGNORE NULLS, number % 2 AS key FROM numbers(10) GROUP BY key WITH ROLLUP WITH TOTALS QUALIFY key = materialize(0); -- { serverError NOT_IMPLEMENTED } -SELECT 3, number % toLowCardinality(2) AS key, count() IGNORE NULLS FROM numbers(10) GROUP BY key WITH ROLLUP WITH TOTALS QUALIFY key = 0; -- { serverError 48 } +SELECT 3, number % toLowCardinality(2) AS key, count() IGNORE NULLS FROM numbers(10) GROUP BY key WITH ROLLUP WITH TOTALS QUALIFY key = 0; -- { serverError NOT_IMPLEMENTED } diff --git a/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql b/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql index cffc8a49c67..b1ddd141e04 100644 --- a/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql +++ b/tests/queries/0_stateless/03096_text_log_format_string_args_not_empty.sql @@ -1,8 +1,8 @@ set allow_experimental_analyzer = true; -select count; -- { serverError 47 } +select count; -- { serverError UNKNOWN_IDENTIFIER } -select conut(); -- { serverError 46 } +select conut(); -- { serverError UNKNOWN_FUNCTION } system flush logs; diff --git a/tests/queries/0_stateless/03131_deprecated_functions.sql b/tests/queries/0_stateless/03131_deprecated_functions.sql index 9247db15fd3..acdf36a50da 100644 --- 
a/tests/queries/0_stateless/03131_deprecated_functions.sql +++ b/tests/queries/0_stateless/03131_deprecated_functions.sql @@ -1,8 +1,8 @@ -SELECT number, neighbor(number, 2) FROM system.numbers LIMIT 10; -- { serverError 721 } +SELECT number, neighbor(number, 2) FROM system.numbers LIMIT 10; -- { serverError DEPRECATED_FUNCTION } -SELECT runningDifference(number) FROM system.numbers LIMIT 10; -- { serverError 721 } +SELECT runningDifference(number) FROM system.numbers LIMIT 10; -- { serverError DEPRECATED_FUNCTION } -SELECT k, runningAccumulate(sum_k) AS res FROM (SELECT number as k, sumState(k) AS sum_k FROM numbers(10) GROUP BY k ORDER BY k); -- { serverError 721 } +SELECT k, runningAccumulate(sum_k) AS res FROM (SELECT number as k, sumState(k) AS sum_k FROM numbers(10) GROUP BY k ORDER BY k); -- { serverError DEPRECATED_FUNCTION } SET allow_deprecated_error_prone_window_functions=1; diff --git a/tests/queries/0_stateless/03131_hilbert_coding.reference b/tests/queries/0_stateless/03131_hilbert_coding.reference new file mode 100644 index 00000000000..bdb578483fa --- /dev/null +++ b/tests/queries/0_stateless/03131_hilbert_coding.reference @@ -0,0 +1,8 @@ +----- START ----- +----- CONST ----- +133 +31 +(3,4) +----- 4294967296, 2 ----- +----- ERRORS ----- +----- END ----- diff --git a/tests/queries/0_stateless/03131_hilbert_coding.sql b/tests/queries/0_stateless/03131_hilbert_coding.sql new file mode 100644 index 00000000000..ed293dc6910 --- /dev/null +++ b/tests/queries/0_stateless/03131_hilbert_coding.sql @@ -0,0 +1,55 @@ +SELECT '----- START -----'; +drop table if exists hilbert_numbers_03131; +create table hilbert_numbers_03131( + n1 UInt32, + n2 UInt32 +) + Engine=MergeTree() + ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi'; + +SELECT '----- CONST -----'; +select hilbertEncode(133); +select hilbertEncode(3, 4); +select hilbertDecode(2, 31); + +SELECT '----- 4294967296, 2 -----'; +insert into hilbert_numbers_03131 +select n1.number, n2.number +from numbers(pow(2, 32)-8,8) n1 + cross join numbers(pow(2, 32)-8, 8) n2 +; + +drop table if exists hilbert_numbers_1_03131; +create table hilbert_numbers_1_03131( + n1 UInt64, + n2 UInt64 +) + Engine=MergeTree() + ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi'; + +insert into hilbert_numbers_1_03131 +select untuple(hilbertDecode(2, hilbertEncode(n1, n2))) +from hilbert_numbers_03131; + +( + select n1, n2 from hilbert_numbers_03131 + union distinct + select n1, n2 from hilbert_numbers_1_03131 +) +except +( + select n1, n2 from hilbert_numbers_03131 + intersect + select n1, n2 from hilbert_numbers_1_03131 +); +drop table if exists hilbert_numbers_1_03131; + +select '----- ERRORS -----'; +select hilbertEncode(); -- { serverError TOO_FEW_ARGUMENTS_FOR_FUNCTION } +select hilbertDecode(); -- { serverError NUMBER_OF_ARGUMENTS_DOESNT_MATCH } +select hilbertEncode('text'); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } +select hilbertDecode('text', 'text'); -- { serverError ILLEGAL_COLUMN } +select hilbertEncode((1, 2), 3); -- { serverError ARGUMENT_OUT_OF_BOUND } + +SELECT '----- END -----'; +drop table if exists hilbert_numbers_03131; diff --git a/tests/queries/0_stateless/03135_keeper_client_find_commands.sh b/tests/queries/0_stateless/03135_keeper_client_find_commands.sh index 0acc4014f1f..0f57694028d 100755 --- a/tests/queries/0_stateless/03135_keeper_client_find_commands.sh +++ b/tests/queries/0_stateless/03135_keeper_client_find_commands.sh @@ -21,7 +21,7 @@ $CLICKHOUSE_KEEPER_CLIENT -q 
"create $path/1/d/c 'foobar'" echo 'find_super_nodes' $CLICKHOUSE_KEEPER_CLIENT -q "find_super_nodes 1000000000" -$CLICKHOUSE_KEEPER_CLIENT -q "find_super_nodes 3 $path" +$CLICKHOUSE_KEEPER_CLIENT -q "find_super_nodes 3 $path" | sort echo 'find_big_family' $CLICKHOUSE_KEEPER_CLIENT -q "find_big_family $path 3" diff --git a/tests/queries/0_stateless/03143_prewhere_profile_events.reference b/tests/queries/0_stateless/03143_prewhere_profile_events.reference new file mode 100644 index 00000000000..32c93b89dc5 --- /dev/null +++ b/tests/queries/0_stateless/03143_prewhere_profile_events.reference @@ -0,0 +1,4 @@ +52503 10000000 +52503 10052503 +26273 10000000 +0 10052503 diff --git a/tests/queries/0_stateless/03143_prewhere_profile_events.sh b/tests/queries/0_stateless/03143_prewhere_profile_events.sh new file mode 100755 index 00000000000..863fcc1fe01 --- /dev/null +++ b/tests/queries/0_stateless/03143_prewhere_profile_events.sh @@ -0,0 +1,84 @@ +#!/usr/bin/env bash +# Tags: no-random-merge-tree-settings + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +${CLICKHOUSE_CLIENT} -nq " + DROP TABLE IF EXISTS t; + + CREATE TABLE t(a UInt32, b UInt32, c UInt32, d UInt32) ENGINE=MergeTree ORDER BY a SETTINGS min_bytes_for_wide_part=1, min_rows_for_wide_part=1; + + INSERT INTO t SELECT number, number, number, number FROM numbers_mt(1e7); + + OPTIMIZE TABLE t FINAL; +" + +query_id_1=$RANDOM$RANDOM +query_id_2=$RANDOM$RANDOM +query_id_3=$RANDOM$RANDOM +query_id_4=$RANDOM$RANDOM + +client_opts=( + --max_block_size 65409 + --max_threads 8 +) + +${CLICKHOUSE_CLIENT} "${client_opts[@]}" --query_id "$query_id_1" -nq " + SELECT * + FROM t +PREWHERE (b % 8192) = 42 + WHERE c = 42 + FORMAT Null +" + +${CLICKHOUSE_CLIENT} "${client_opts[@]}" --query_id "$query_id_2" -nq " + SELECT * + FROM t +PREWHERE (b % 8192) = 42 AND (c % 8192) = 42 + WHERE d = 42 + FORMAT Null +settings enable_multiple_prewhere_read_steps=1; +" + +${CLICKHOUSE_CLIENT} "${client_opts[@]}" --query_id "$query_id_3" -nq " + SELECT * + FROM t +PREWHERE (b % 8192) = 42 AND (c % 16384) = 42 + WHERE d = 42 + FORMAT Null +settings enable_multiple_prewhere_read_steps=0; +" + +${CLICKHOUSE_CLIENT} "${client_opts[@]}" --query_id "$query_id_4" -nq " + SELECT b, c + FROM t +PREWHERE (b % 8192) = 42 AND (c % 8192) = 42 + FORMAT Null +settings enable_multiple_prewhere_read_steps=1; +" + +${CLICKHOUSE_CLIENT} -nq " + SYSTEM FLUSH LOGS; + + -- 52503 which is 43 * number of granules, 10000000 + SELECT ProfileEvents['RowsReadByMainReader'], ProfileEvents['RowsReadByPrewhereReaders'] + FROM system.query_log + WHERE current_database=currentDatabase() AND query_id = '$query_id_1' and type = 'QueryFinish'; + + -- 52503, 10052503 which is the sum of 10000000 from the first prewhere step plus 52503 from the second + SELECT ProfileEvents['RowsReadByMainReader'], ProfileEvents['RowsReadByPrewhereReaders'] + FROM system.query_log + WHERE current_database=currentDatabase() AND query_id = '$query_id_2' and type = 'QueryFinish'; + + -- 26273 the same as query #1 but twice less data (43 * ceil((52503 / 43) / 2)), 10000000 + SELECT ProfileEvents['RowsReadByMainReader'], ProfileEvents['RowsReadByPrewhereReaders'] + FROM system.query_log + WHERE current_database=currentDatabase() AND query_id = '$query_id_3' and type = 'QueryFinish'; + + -- 0, 10052503 + SELECT ProfileEvents['RowsReadByMainReader'], ProfileEvents['RowsReadByPrewhereReaders'] + FROM system.query_log + WHERE 
current_database=currentDatabase() AND query_id = '$query_id_4' and type = 'QueryFinish'; +" diff --git a/tests/queries/0_stateless/03143_window_functions_qualify_validation.sql b/tests/queries/0_stateless/03143_window_functions_qualify_validation.sql index 2b6d1820b00..5adbe7ff2a7 100644 --- a/tests/queries/0_stateless/03143_window_functions_qualify_validation.sql +++ b/tests/queries/0_stateless/03143_window_functions_qualify_validation.sql @@ -19,8 +19,8 @@ CREATE TABLE uk_price_paid ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2); -SELECT count(), (quantile(0.9)(price) OVER ()) AS price_quantile FROM uk_price_paid WHERE toYear(date) = 2023 QUALIFY price > price_quantile; -- { serverError 215 } +SELECT count(), (quantile(0.9)(price) OVER ()) AS price_quantile FROM uk_price_paid WHERE toYear(date) = 2023 QUALIFY price > price_quantile; -- { serverError NOT_AN_AGGREGATE } -SELECT count() FROM uk_price_paid WHERE toYear(date) = 2023 QUALIFY price > (quantile(0.9)(price) OVER ()); -- { serverError 215 } +SELECT count() FROM uk_price_paid WHERE toYear(date) = 2023 QUALIFY price > (quantile(0.9)(price) OVER ()); -- { serverError NOT_AN_AGGREGATE } DROP TABLE uk_price_paid; diff --git a/tests/queries/0_stateless/03147_parquet_memory_tracking.reference b/tests/queries/0_stateless/03147_parquet_memory_tracking.reference new file mode 100644 index 00000000000..573541ac970 --- /dev/null +++ b/tests/queries/0_stateless/03147_parquet_memory_tracking.reference @@ -0,0 +1 @@ +0 diff --git a/tests/queries/0_stateless/03147_parquet_memory_tracking.sql b/tests/queries/0_stateless/03147_parquet_memory_tracking.sql new file mode 100644 index 00000000000..aeca04ffb9d --- /dev/null +++ b/tests/queries/0_stateless/03147_parquet_memory_tracking.sql @@ -0,0 +1,13 @@ +-- Tags: no-fasttest, no-parallel + +-- Create an ~80 MB parquet file with one row group and one column. +insert into function file('03147_parquet_memory_tracking.parquet') select number from numbers(10000000) settings output_format_parquet_compression_method='none', output_format_parquet_row_group_size=1000000000000, engine_file_truncate_on_insert=1; + +-- Try to read it with 60 MB memory limit. Should fail because we read the 80 MB column all at once. +select sum(ignore(*)) from file('03147_parquet_memory_tracking.parquet') settings max_memory_usage=60000000; -- { serverError CANNOT_ALLOCATE_MEMORY } + +-- Try to read it with 500 MB memory limit, just in case. +select sum(ignore(*)) from file('03147_parquet_memory_tracking.parquet') settings max_memory_usage=500000000; + +-- Truncate the file to avoid leaving too much garbage behind. 
+insert into function file('03147_parquet_memory_tracking.parquet') select number from numbers(1) settings engine_file_truncate_on_insert=1; diff --git a/tests/queries/0_stateless/03147_table_function_loop.reference b/tests/queries/0_stateless/03147_table_function_loop.reference new file mode 100644 index 00000000000..46a2310b65f --- /dev/null +++ b/tests/queries/0_stateless/03147_table_function_loop.reference @@ -0,0 +1,65 @@ +0 +1 +2 +0 +1 +2 +0 +1 +2 +0 +0 +1 +2 +0 +1 +2 +0 +1 +2 +0 +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +0 +1 +2 +3 +4 +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +0 +1 +2 +3 +4 +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +0 +1 +2 +3 +4 diff --git a/tests/queries/0_stateless/03147_table_function_loop.sql b/tests/queries/0_stateless/03147_table_function_loop.sql new file mode 100644 index 00000000000..aa3c8e2def5 --- /dev/null +++ b/tests/queries/0_stateless/03147_table_function_loop.sql @@ -0,0 +1,16 @@ +-- Tags: no-parallel + +SELECT * FROM loop(numbers(3)) LIMIT 10; +SELECT * FROM loop (numbers(3)) LIMIT 10 settings max_block_size = 1; + +DROP DATABASE IF EXISTS 03147_db; +CREATE DATABASE IF NOT EXISTS 03147_db; +CREATE TABLE 03147_db.t (n Int8) ENGINE=MergeTree ORDER BY n; +INSERT INTO 03147_db.t SELECT * FROM numbers(10); +USE 03147_db; + +SELECT * FROM loop(03147_db.t) LIMIT 15; +SELECT * FROM loop(t) LIMIT 15; +SELECT * FROM loop(03147_db, t) LIMIT 15; + +SELECT * FROM loop('', '') -- { serverError UNKNOWN_TABLE } diff --git a/tests/queries/0_stateless/03161_decimal_binary_math.reference b/tests/queries/0_stateless/03161_decimal_binary_math.reference new file mode 100644 index 00000000000..f7d9761c7c5 --- /dev/null +++ b/tests/queries/0_stateless/03161_decimal_binary_math.reference @@ -0,0 +1,75 @@ +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +42.4242 2.42 8686.104718 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +0.4242 0.24 0.514871 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 2.42 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +42.4242 2.42 42.4242 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 +0.4242 0.4242 0.599909 diff --git a/tests/queries/0_stateless/03161_decimal_binary_math.sql b/tests/queries/0_stateless/03161_decimal_binary_math.sql new file mode 
100644 index 00000000000..5484cc6a9bb --- /dev/null +++ b/tests/queries/0_stateless/03161_decimal_binary_math.sql @@ -0,0 +1,79 @@ +SELECT toDecimal32('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(pow(x, y), 6); +SELECT toDecimal64('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(pow(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, toDecimal64('2.42', 2) AS y, round(pow(x, y), 6); +SELECT toDecimal64('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(pow(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(pow(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, toDecimal32('2.42', 2) AS y, round(pow(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(pow(x, y), 6); +SELECT 42.4242 AS x, toDecimal32('2.42', 2) AS y, round(pow(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, 2.42 AS y, round(pow(x, y), 6); +SELECT materialize(42.4242) AS x, toDecimal32('2.42', 2) AS y, round(pow(x, y), 6); +SELECT 42.4242 AS x, materialize(toDecimal32('2.42', 2)) AS y, round(pow(x, y), 6); +SELECT materialize(42.4242) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(pow(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, 2.42 AS y, round(pow(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, materialize(2.42) AS y, round(pow(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, materialize(2.42) AS y, round(pow(x, y), 6); + +SELECT toDecimal32('0.4242', 4) AS x, toDecimal32('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT toDecimal64('0.4242', 4) AS x, toDecimal32('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT toDecimal32('0.4242', 4) AS x, toDecimal64('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT toDecimal64('0.4242', 4) AS x, toDecimal64('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT toDecimal32('0.4242', 4) AS x, materialize(toDecimal32('0.24', 2)) AS y, round(atan2(y, x), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, toDecimal32('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, materialize(toDecimal32('0.24', 2)) AS y, round(atan2(y, x), 6); +SELECT 0.4242 AS x, toDecimal32('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT toDecimal32('0.4242', 4) AS x, 0.24 AS y, round(atan2(y, x), 6); +SELECT materialize(0.4242) AS x, toDecimal32('0.24', 2) AS y, round(atan2(y, x), 6); +SELECT 0.4242 AS x, materialize(toDecimal32('0.24', 2)) AS y, round(atan2(y, x), 6); +SELECT materialize(0.4242) AS x, materialize(toDecimal32('0.24', 2)) AS y, round(atan2(y, x), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, 0.24 AS y, round(atan2(y, x), 6); +SELECT toDecimal32('0.4242', 4) AS x, materialize(0.24) AS y, round(atan2(y, x), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, materialize(0.24) AS y, round(atan2(y, x), 6); + +SELECT toDecimal32('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(min2(x, y), 6); +SELECT toDecimal64('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(min2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, toDecimal64('2.42', 2) AS y, round(min2(x, y), 6); +SELECT toDecimal64('42.4242', 4) AS x, toDecimal64('2.42', 2) AS y, round(min2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(min2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, toDecimal32('2.42', 2) AS y, round(min2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(min2(x, 
y), 6); +SELECT 42.4242 AS x, toDecimal32('2.42', 2) AS y, round(min2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, 2.42 AS y, round(min2(x, y), 6); +SELECT materialize(42.4242) AS x, toDecimal32('2.42', 2) AS y, round(min2(x, y), 6); +SELECT 42.4242 AS x, materialize(toDecimal32('2.42', 2)) AS y, round(min2(x, y), 6); +SELECT materialize(42.4242) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(min2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, 2.42 AS y, round(min2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, materialize(2.42) AS y, round(min2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, materialize(2.42) AS y, round(min2(x, y), 6); + +SELECT toDecimal32('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(max2(x, y), 6); +SELECT toDecimal64('42.4242', 4) AS x, toDecimal32('2.42', 2) AS y, round(max2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, toDecimal64('2.42', 2) AS y, round(max2(x, y), 6); +SELECT toDecimal64('42.4242', 4) AS x, toDecimal64('2.42', 2) AS y, round(max2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(max2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, toDecimal32('2.42', 2) AS y, round(max2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(max2(x, y), 6); +SELECT 42.4242 AS x, toDecimal32('2.42', 2) AS y, round(max2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, 2.42 AS y, round(max2(x, y), 6); +SELECT materialize(42.4242) AS x, toDecimal32('2.42', 2) AS y, round(max2(x, y), 6); +SELECT 42.4242 AS x, materialize(toDecimal32('2.42', 2)) AS y, round(max2(x, y), 6); +SELECT materialize(42.4242) AS x, materialize(toDecimal32('2.42', 2)) AS y, round(max2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, 2.42 AS y, round(max2(x, y), 6); +SELECT toDecimal32('42.4242', 4) AS x, materialize(2.42) AS y, round(max2(x, y), 6); +SELECT materialize(toDecimal32('42.4242', 4)) AS x, materialize(2.42) AS y, round(max2(x, y), 6); + +SELECT toDecimal32('0.4242', 4) AS x, toDecimal32('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT toDecimal64('0.4242', 4) AS x, toDecimal32('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT toDecimal32('0.4242', 4) AS x, toDecimal64('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT toDecimal64('0.4242', 4) AS x, toDecimal64('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT toDecimal32('0.4242', 4) AS x, materialize(toDecimal32('0.4242', 4)) AS y, round(hypot(x, y), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, toDecimal32('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, materialize(toDecimal32('0.4242', 4)) AS y, round(hypot(x, y), 6); +SELECT 0.4242 AS x, toDecimal32('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT toDecimal32('0.4242', 4) AS x, 0.4242 AS y, round(hypot(x, y), 6); +SELECT materialize(0.4242) AS x, toDecimal32('0.4242', 4) AS y, round(hypot(x, y), 6); +SELECT 0.4242 AS x, materialize(toDecimal32('0.4242', 4)) AS y, round(hypot(x, y), 6); +SELECT materialize(0.4242) AS x, materialize(toDecimal32('0.4242', 4)) AS y, round(hypot(x, y), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, 0.4242 AS y, round(hypot(x, y), 6); +SELECT toDecimal32('0.4242', 4) AS x, materialize(0.4242) AS y, round(hypot(x, y), 6); +SELECT materialize(toDecimal32('0.4242', 4)) AS x, materialize(0.4242) AS y, round(hypot(x, y), 6); diff --git 
a/tests/queries/0_stateless/03161_ipv4_ipv6_equality.reference b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.reference new file mode 100644 index 00000000000..2a4cb2e658f --- /dev/null +++ b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.reference @@ -0,0 +1,8 @@ +1 +1 +0 +0 +0 +0 +0 +0 diff --git a/tests/queries/0_stateless/03161_ipv4_ipv6_equality.sql b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.sql new file mode 100644 index 00000000000..da2a660977a --- /dev/null +++ b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.sql @@ -0,0 +1,11 @@ +-- Equal +SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1'); +SELECT toIPv6('::ffff:127.0.0.1') = toIPv4('127.0.0.1'); + +-- Not equal +SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.2'); +SELECT toIPv4('127.0.0.2') = toIPv6('::ffff:127.0.0.1'); +SELECT toIPv6('::ffff:127.0.0.1') = toIPv4('127.0.0.2'); +SELECT toIPv6('::ffff:127.0.0.2') = toIPv4('127.0.0.1'); +SELECT toIPv4('127.0.0.1') = toIPv6('::ffef:127.0.0.1'); +SELECT toIPv6('::ffef:127.0.0.1') = toIPv4('127.0.0.1'); \ No newline at end of file diff --git a/tests/queries/0_stateless/03164_analyzer_global_in_alias.reference b/tests/queries/0_stateless/03164_analyzer_global_in_alias.reference new file mode 100644 index 00000000000..459605fc1db --- /dev/null +++ b/tests/queries/0_stateless/03164_analyzer_global_in_alias.reference @@ -0,0 +1,4 @@ +1 1 +1 +1 1 +1 diff --git a/tests/queries/0_stateless/03164_analyzer_global_in_alias.sql b/tests/queries/0_stateless/03164_analyzer_global_in_alias.sql new file mode 100644 index 00000000000..00c293334ee --- /dev/null +++ b/tests/queries/0_stateless/03164_analyzer_global_in_alias.sql @@ -0,0 +1,6 @@ +SET allow_experimental_analyzer=1; +SELECT 1 GLOBAL IN (SELECT 1) AS s, s FROM remote('127.0.0.{2,3}', system.one) GROUP BY 1; +SELECT 1 GLOBAL IN (SELECT 1) AS s FROM remote('127.0.0.{2,3}', system.one) GROUP BY 1; + +SELECT 1 GLOBAL IN (SELECT 1) AS s, s FROM remote('127.0.0.{1,3}', system.one) GROUP BY 1; +SELECT 1 GLOBAL IN (SELECT 1) AS s FROM remote('127.0.0.{1,3}', system.one) GROUP BY 1; diff --git a/tests/queries/0_stateless/03164_analyzer_rewrite_aggregate_function_with_if.reference b/tests/queries/0_stateless/03164_analyzer_rewrite_aggregate_function_with_if.reference new file mode 100644 index 00000000000..d00491fd7e5 --- /dev/null +++ b/tests/queries/0_stateless/03164_analyzer_rewrite_aggregate_function_with_if.reference @@ -0,0 +1 @@ +1 diff --git a/tests/queries/0_stateless/03164_analyzer_rewrite_aggregate_function_with_if.sql b/tests/queries/0_stateless/03164_analyzer_rewrite_aggregate_function_with_if.sql new file mode 100644 index 00000000000..52f767d8aae --- /dev/null +++ b/tests/queries/0_stateless/03164_analyzer_rewrite_aggregate_function_with_if.sql @@ -0,0 +1 @@ +SELECT countIf(multiIf(number < 2, NULL, if(number = 4, 1, 0))) FROM numbers(5); diff --git a/tests/queries/0_stateless/03164_analyzer_validate_tree_size.reference b/tests/queries/0_stateless/03164_analyzer_validate_tree_size.reference new file mode 100644 index 00000000000..d00491fd7e5 --- /dev/null +++ b/tests/queries/0_stateless/03164_analyzer_validate_tree_size.reference @@ -0,0 +1 @@ +1 diff --git a/tests/queries/0_stateless/03164_analyzer_validate_tree_size.sql b/tests/queries/0_stateless/03164_analyzer_validate_tree_size.sql new file mode 100644 index 00000000000..0e581592aef --- /dev/null +++ b/tests/queries/0_stateless/03164_analyzer_validate_tree_size.sql @@ -0,0 +1,1007 @@ +CREATE TABLE t +( +c1 Int64 , +c2 Int64 , +c3 Int64 , +c4 Int64 
, +c5 Int64 , +c6 Int64 , +c7 Int64 , +c8 Int64 , +c9 Int64 , +c10 Int64 , +c11 Int64 , +c12 Int64 , +c13 Int64 , +c14 Int64 , +c15 Int64 , +c16 Int64 , +c17 Int64 , +c18 Int64 , +c19 Int64 , +c20 Int64 , +c21 Int64 , +c22 Int64 , +c23 Int64 , +c24 Int64 , +c25 Int64 , +c26 Int64 , +c27 Int64 , +c28 Int64 , +c29 Int64 , +c30 Int64 , +c31 Int64 , +c32 Int64 , +c33 Int64 , +c34 Int64 , +c35 Int64 , +c36 Int64 , +c37 Int64 , +c38 Int64 , +c39 Int64 , +c40 Int64 , +c41 Int64 , +c42 Int64 , +c43 Int64 , +c44 Int64 , +c45 Int64 , +c46 Int64 , +c47 Int64 , +c48 Int64 , +c49 Int64 , +c50 Int64 , +c51 Int64 , +c52 Int64 , +c53 Int64 , +c54 Int64 , +c55 Int64 , +c56 Int64 , +c57 Int64 , +c58 Int64 , +c59 Int64 , +c60 Int64 , +c61 Int64 , +c62 Int64 , +c63 Int64 , +c64 Int64 , +c65 Int64 , +c66 Int64 , +c67 Int64 , +c68 Int64 , +c69 Int64 , +c70 Int64 , +c71 Int64 , +c72 Int64 , +c73 Int64 , +c74 Int64 , +c75 Int64 , +c76 Int64 , +c77 Int64 , +c78 Int64 , +c79 Int64 , +c80 Int64 , +c81 Int64 , +c82 Int64 , +c83 Int64 , +c84 Int64 , +c85 Int64 , +c86 Int64 , +c87 Int64 , +c88 Int64 , +c89 Int64 , +c90 Int64 , +c91 Int64 , +c92 Int64 , +c93 Int64 , +c94 Int64 , +c95 Int64 , +c96 Int64 , +c97 Int64 , +c98 Int64 , +c99 Int64 , +c100 Int64 , +c101 Int64 , +c102 Int64 , +c103 Int64 , +c104 Int64 , +c105 Int64 , +c106 Int64 , +c107 Int64 , +c108 Int64 , +c109 Int64 , +c110 Int64 , +c111 Int64 , +c112 Int64 , +c113 Int64 , +c114 Int64 , +c115 Int64 , +c116 Int64 , +c117 Int64 , +c118 Int64 , +c119 Int64 , +c120 Int64 , +c121 Int64 , +c122 Int64 , +c123 Int64 , +c124 Int64 , +c125 Int64 , +c126 Int64 , +c127 Int64 , +c128 Int64 , +c129 Int64 , +c130 Int64 , +c131 Int64 , +c132 Int64 , +c133 Int64 , +c134 Int64 , +c135 Int64 , +c136 Int64 , +c137 Int64 , +c138 Int64 , +c139 Int64 , +c140 Int64 , +c141 Int64 , +c142 Int64 , +c143 Int64 , +c144 Int64 , +c145 Int64 , +c146 Int64 , +c147 Int64 , +c148 Int64 , +c149 Int64 , +c150 Int64 , +c151 Int64 , +c152 Int64 , +c153 Int64 , +c154 Int64 , +c155 Int64 , +c156 Int64 , +c157 Int64 , +c158 Int64 , +c159 Int64 , +c160 Int64 , +c161 Int64 , +c162 Int64 , +c163 Int64 , +c164 Int64 , +c165 Int64 , +c166 Int64 , +c167 Int64 , +c168 Int64 , +c169 Int64 , +c170 Int64 , +c171 Int64 , +c172 Int64 , +c173 Int64 , +c174 Int64 , +c175 Int64 , +c176 Int64 , +c177 Int64 , +c178 Int64 , +c179 Int64 , +c180 Int64 , +c181 Int64 , +c182 Int64 , +c183 Int64 , +c184 Int64 , +c185 Int64 , +c186 Int64 , +c187 Int64 , +c188 Int64 , +c189 Int64 , +c190 Int64 , +c191 Int64 , +c192 Int64 , +c193 Int64 , +c194 Int64 , +c195 Int64 , +c196 Int64 , +c197 Int64 , +c198 Int64 , +c199 Int64 , +c200 Int64 , +c201 Int64 , +c202 Int64 , +c203 Int64 , +c204 Int64 , +c205 Int64 , +c206 Int64 , +c207 Int64 , +c208 Int64 , +c209 Int64 , +c210 Int64 , +c211 Int64 , +c212 Int64 , +c213 Int64 , +c214 Int64 , +c215 Int64 , +c216 Int64 , +c217 Int64 , +c218 Int64 , +c219 Int64 , +c220 Int64 , +c221 Int64 , +c222 Int64 , +c223 Int64 , +c224 Int64 , +c225 Int64 , +c226 Int64 , +c227 Int64 , +c228 Int64 , +c229 Int64 , +c230 Int64 , +c231 Int64 , +c232 Int64 , +c233 Int64 , +c234 Int64 , +c235 Int64 , +c236 Int64 , +c237 Int64 , +c238 Int64 , +c239 Int64 , +c240 Int64 , +c241 Int64 , +c242 Int64 , +c243 Int64 , +c244 Int64 , +c245 Int64 , +c246 Int64 , +c247 Int64 , +c248 Int64 , +c249 Int64 , +c250 Int64 , +c251 Int64 , +c252 Int64 , +c253 Int64 , +c254 Int64 , +c255 Int64 , +c256 Int64 , +c257 Int64 , +c258 Int64 , +c259 Int64 , +c260 Int64 , +c261 Int64 , +c262 Int64 , +c263 Int64 , +c264 Int64 , +c265 Int64 
, +c266 Int64 , +c267 Int64 , +c268 Int64 , +c269 Int64 , +c270 Int64 , +c271 Int64 , +c272 Int64 , +c273 Int64 , +c274 Int64 , +c275 Int64 , +c276 Int64 , +c277 Int64 , +c278 Int64 , +c279 Int64 , +c280 Int64 , +c281 Int64 , +c282 Int64 , +c283 Int64 , +c284 Int64 , +c285 Int64 , +c286 Int64 , +c287 Int64 , +c288 Int64 , +c289 Int64 , +c290 Int64 , +c291 Int64 , +c292 Int64 , +c293 Int64 , +c294 Int64 , +c295 Int64 , +c296 Int64 , +c297 Int64 , +c298 Int64 , +c299 Int64 , +c300 Int64 , +c301 Int64 , +c302 Int64 , +c303 Int64 , +c304 Int64 , +c305 Int64 , +c306 Int64 , +c307 Int64 , +c308 Int64 , +c309 Int64 , +c310 Int64 , +c311 Int64 , +c312 Int64 , +c313 Int64 , +c314 Int64 , +c315 Int64 , +c316 Int64 , +c317 Int64 , +c318 Int64 , +c319 Int64 , +c320 Int64 , +c321 Int64 , +c322 Int64 , +c323 Int64 , +c324 Int64 , +c325 Int64 , +c326 Int64 , +c327 Int64 , +c328 Int64 , +c329 Int64 , +c330 Int64 , +c331 Int64 , +c332 Int64 , +c333 Int64 , +c334 Int64 , +c335 Int64 , +c336 Int64 , +c337 Int64 , +c338 Int64 , +c339 Int64 , +c340 Int64 , +c341 Int64 , +c342 Int64 , +c343 Int64 , +c344 Int64 , +c345 Int64 , +c346 Int64 , +c347 Int64 , +c348 Int64 , +c349 Int64 , +c350 Int64 , +c351 Int64 , +c352 Int64 , +c353 Int64 , +c354 Int64 , +c355 Int64 , +c356 Int64 , +c357 Int64 , +c358 Int64 , +c359 Int64 , +c360 Int64 , +c361 Int64 , +c362 Int64 , +c363 Int64 , +c364 Int64 , +c365 Int64 , +c366 Int64 , +c367 Int64 , +c368 Int64 , +c369 Int64 , +c370 Int64 , +c371 Int64 , +c372 Int64 , +c373 Int64 , +c374 Int64 , +c375 Int64 , +c376 Int64 , +c377 Int64 , +c378 Int64 , +c379 Int64 , +c380 Int64 , +c381 Int64 , +c382 Int64 , +c383 Int64 , +c384 Int64 , +c385 Int64 , +c386 Int64 , +c387 Int64 , +c388 Int64 , +c389 Int64 , +c390 Int64 , +c391 Int64 , +c392 Int64 , +c393 Int64 , +c394 Int64 , +c395 Int64 , +c396 Int64 , +c397 Int64 , +c398 Int64 , +c399 Int64 , +c400 Int64 , +c401 Int64 , +c402 Int64 , +c403 Int64 , +c404 Int64 , +c405 Int64 , +c406 Int64 , +c407 Int64 , +c408 Int64 , +c409 Int64 , +c410 Int64 , +c411 Int64 , +c412 Int64 , +c413 Int64 , +c414 Int64 , +c415 Int64 , +c416 Int64 , +c417 Int64 , +c418 Int64 , +c419 Int64 , +c420 Int64 , +c421 Int64 , +c422 Int64 , +c423 Int64 , +c424 Int64 , +c425 Int64 , +c426 Int64 , +c427 Int64 , +c428 Int64 , +c429 Int64 , +c430 Int64 , +c431 Int64 , +c432 Int64 , +c433 Int64 , +c434 Int64 , +c435 Int64 , +c436 Int64 , +c437 Int64 , +c438 Int64 , +c439 Int64 , +c440 Int64 , +c441 Int64 , +c442 Int64 , +c443 Int64 , +c444 Int64 , +c445 Int64 , +c446 Int64 , +c447 Int64 , +c448 Int64 , +c449 Int64 , +c450 Int64 , +c451 Int64 , +c452 Int64 , +c453 Int64 , +c454 Int64 , +c455 Int64 , +c456 Int64 , +c457 Int64 , +c458 Int64 , +c459 Int64 , +c460 Int64 , +c461 Int64 , +c462 Int64 , +c463 Int64 , +c464 Int64 , +c465 Int64 , +c466 Int64 , +c467 Int64 , +c468 Int64 , +c469 Int64 , +c470 Int64 , +c471 Int64 , +c472 Int64 , +c473 Int64 , +c474 Int64 , +c475 Int64 , +c476 Int64 , +c477 Int64 , +c478 Int64 , +c479 Int64 , +c480 Int64 , +c481 Int64 , +c482 Int64 , +c483 Int64 , +c484 Int64 , +c485 Int64 , +c486 Int64 , +c487 Int64 , +c488 Int64 , +c489 Int64 , +c490 Int64 , +c491 Int64 , +c492 Int64 , +c493 Int64 , +c494 Int64 , +c495 Int64 , +c496 Int64 , +c497 Int64 , +c498 Int64 , +c499 Int64 , +c500 Int64 , +b1 Int64 , +b2 Int64 , +b3 Int64 , +b4 Int64 , +b5 Int64 , +b6 Int64 , +b7 Int64 , +b8 Int64 , +b9 Int64 , +b10 Int64 , +b11 Int64 , +b12 Int64 , +b13 Int64 , +b14 Int64 , +b15 Int64 , +b16 Int64 , +b17 Int64 , +b18 Int64 , +b19 Int64 , +b20 Int64 , +b21 Int64 
, +b22 Int64 , +b23 Int64 , +b24 Int64 , +b25 Int64 , +b26 Int64 , +b27 Int64 , +b28 Int64 , +b29 Int64 , +b30 Int64 , +b31 Int64 , +b32 Int64 , +b33 Int64 , +b34 Int64 , +b35 Int64 , +b36 Int64 , +b37 Int64 , +b38 Int64 , +b39 Int64 , +b40 Int64 , +b41 Int64 , +b42 Int64 , +b43 Int64 , +b44 Int64 , +b45 Int64 , +b46 Int64 , +b47 Int64 , +b48 Int64 , +b49 Int64 , +b50 Int64 , +b51 Int64 , +b52 Int64 , +b53 Int64 , +b54 Int64 , +b55 Int64 , +b56 Int64 , +b57 Int64 , +b58 Int64 , +b59 Int64 , +b60 Int64 , +b61 Int64 , +b62 Int64 , +b63 Int64 , +b64 Int64 , +b65 Int64 , +b66 Int64 , +b67 Int64 , +b68 Int64 , +b69 Int64 , +b70 Int64 , +b71 Int64 , +b72 Int64 , +b73 Int64 , +b74 Int64 , +b75 Int64 , +b76 Int64 , +b77 Int64 , +b78 Int64 , +b79 Int64 , +b80 Int64 , +b81 Int64 , +b82 Int64 , +b83 Int64 , +b84 Int64 , +b85 Int64 , +b86 Int64 , +b87 Int64 , +b88 Int64 , +b89 Int64 , +b90 Int64 , +b91 Int64 , +b92 Int64 , +b93 Int64 , +b94 Int64 , +b95 Int64 , +b96 Int64 , +b97 Int64 , +b98 Int64 , +b99 Int64 , +b100 Int64 , +b101 Int64 , +b102 Int64 , +b103 Int64 , +b104 Int64 , +b105 Int64 , +b106 Int64 , +b107 Int64 , +b108 Int64 , +b109 Int64 , +b110 Int64 , +b111 Int64 , +b112 Int64 , +b113 Int64 , +b114 Int64 , +b115 Int64 , +b116 Int64 , +b117 Int64 , +b118 Int64 , +b119 Int64 , +b120 Int64 , +b121 Int64 , +b122 Int64 , +b123 Int64 , +b124 Int64 , +b125 Int64 , +b126 Int64 , +b127 Int64 , +b128 Int64 , +b129 Int64 , +b130 Int64 , +b131 Int64 , +b132 Int64 , +b133 Int64 , +b134 Int64 , +b135 Int64 , +b136 Int64 , +b137 Int64 , +b138 Int64 , +b139 Int64 , +b140 Int64 , +b141 Int64 , +b142 Int64 , +b143 Int64 , +b144 Int64 , +b145 Int64 , +b146 Int64 , +b147 Int64 , +b148 Int64 , +b149 Int64 , +b150 Int64 , +b151 Int64 , +b152 Int64 , +b153 Int64 , +b154 Int64 , +b155 Int64 , +b156 Int64 , +b157 Int64 , +b158 Int64 , +b159 Int64 , +b160 Int64 , +b161 Int64 , +b162 Int64 , +b163 Int64 , +b164 Int64 , +b165 Int64 , +b166 Int64 , +b167 Int64 , +b168 Int64 , +b169 Int64 , +b170 Int64 , +b171 Int64 , +b172 Int64 , +b173 Int64 , +b174 Int64 , +b175 Int64 , +b176 Int64 , +b177 Int64 , +b178 Int64 , +b179 Int64 , +b180 Int64 , +b181 Int64 , +b182 Int64 , +b183 Int64 , +b184 Int64 , +b185 Int64 , +b186 Int64 , +b187 Int64 , +b188 Int64 , +b189 Int64 , +b190 Int64 , +b191 Int64 , +b192 Int64 , +b193 Int64 , +b194 Int64 , +b195 Int64 , +b196 Int64 , +b197 Int64 , +b198 Int64 , +b199 Int64 , +b200 Int64 , +b201 Int64 , +b202 Int64 , +b203 Int64 , +b204 Int64 , +b205 Int64 , +b206 Int64 , +b207 Int64 , +b208 Int64 , +b209 Int64 , +b210 Int64 , +b211 Int64 , +b212 Int64 , +b213 Int64 , +b214 Int64 , +b215 Int64 , +b216 Int64 , +b217 Int64 , +b218 Int64 , +b219 Int64 , +b220 Int64 , +b221 Int64 , +b222 Int64 , +b223 Int64 , +b224 Int64 , +b225 Int64 , +b226 Int64 , +b227 Int64 , +b228 Int64 , +b229 Int64 , +b230 Int64 , +b231 Int64 , +b232 Int64 , +b233 Int64 , +b234 Int64 , +b235 Int64 , +b236 Int64 , +b237 Int64 , +b238 Int64 , +b239 Int64 , +b240 Int64 , +b241 Int64 , +b242 Int64 , +b243 Int64 , +b244 Int64 , +b245 Int64 , +b246 Int64 , +b247 Int64 , +b248 Int64 , +b249 Int64 , +b250 Int64 , +b251 Int64 , +b252 Int64 , +b253 Int64 , +b254 Int64 , +b255 Int64 , +b256 Int64 , +b257 Int64 , +b258 Int64 , +b259 Int64 , +b260 Int64 , +b261 Int64 , +b262 Int64 , +b263 Int64 , +b264 Int64 , +b265 Int64 , +b266 Int64 , +b267 Int64 , +b268 Int64 , +b269 Int64 , +b270 Int64 , +b271 Int64 , +b272 Int64 , +b273 Int64 , +b274 Int64 , +b275 Int64 , +b276 Int64 , +b277 Int64 , +b278 Int64 , +b279 Int64 , +b280 Int64 , 
+b281 Int64 , +b282 Int64 , +b283 Int64 , +b284 Int64 , +b285 Int64 , +b286 Int64 , +b287 Int64 , +b288 Int64 , +b289 Int64 , +b290 Int64 , +b291 Int64 , +b292 Int64 , +b293 Int64 , +b294 Int64 , +b295 Int64 , +b296 Int64 , +b297 Int64 , +b298 Int64 , +b299 Int64 , +b300 Int64 , +b301 Int64 , +b302 Int64 , +b303 Int64 , +b304 Int64 , +b305 Int64 , +b306 Int64 , +b307 Int64 , +b308 Int64 , +b309 Int64 , +b310 Int64 , +b311 Int64 , +b312 Int64 , +b313 Int64 , +b314 Int64 , +b315 Int64 , +b316 Int64 , +b317 Int64 , +b318 Int64 , +b319 Int64 , +b320 Int64 , +b321 Int64 , +b322 Int64 , +b323 Int64 , +b324 Int64 , +b325 Int64 , +b326 Int64 , +b327 Int64 , +b328 Int64 , +b329 Int64 , +b330 Int64 , +b331 Int64 , +b332 Int64 , +b333 Int64 , +b334 Int64 , +b335 Int64 , +b336 Int64 , +b337 Int64 , +b338 Int64 , +b339 Int64 , +b340 Int64 , +b341 Int64 , +b342 Int64 , +b343 Int64 , +b344 Int64 , +b345 Int64 , +b346 Int64 , +b347 Int64 , +b348 Int64 , +b349 Int64 , +b350 Int64 , +b351 Int64 , +b352 Int64 , +b353 Int64 , +b354 Int64 , +b355 Int64 , +b356 Int64 , +b357 Int64 , +b358 Int64 , +b359 Int64 , +b360 Int64 , +b361 Int64 , +b362 Int64 , +b363 Int64 , +b364 Int64 , +b365 Int64 , +b366 Int64 , +b367 Int64 , +b368 Int64 , +b369 Int64 , +b370 Int64 , +b371 Int64 , +b372 Int64 , +b373 Int64 , +b374 Int64 , +b375 Int64 , +b376 Int64 , +b377 Int64 , +b378 Int64 , +b379 Int64 , +b380 Int64 , +b381 Int64 , +b382 Int64 , +b383 Int64 , +b384 Int64 , +b385 Int64 , +b386 Int64 , +b387 Int64 , +b388 Int64 , +b389 Int64 , +b390 Int64 , +b391 Int64 , +b392 Int64 , +b393 Int64 , +b394 Int64 , +b395 Int64 , +b396 Int64 , +b397 Int64 , +b398 Int64 , +b399 Int64 , +b400 Int64 , +b401 Int64 , +b402 Int64 , +b403 Int64 , +b404 Int64 , +b405 Int64 , +b406 Int64 , +b407 Int64 , +b408 Int64 , +b409 Int64 , +b410 Int64 , +b411 Int64 , +b412 Int64 , +b413 Int64 , +b414 Int64 , +b415 Int64 , +b416 Int64 , +b417 Int64 , +b418 Int64 , +b419 Int64 , +b420 Int64 , +b421 Int64 , +b422 Int64 , +b423 Int64 , +b424 Int64 , +b425 Int64 , +b426 Int64 , +b427 Int64 , +b428 Int64 , +b429 Int64 , +b430 Int64 , +b431 Int64 , +b432 Int64 , +b433 Int64 , +b434 Int64 , +b435 Int64 , +b436 Int64 , +b437 Int64 , +b438 Int64 , +b439 Int64 , +b440 Int64 , +b441 Int64 , +b442 Int64 , +b443 Int64 , +b444 Int64 , +b445 Int64 , +b446 Int64 , +b447 Int64 , +b448 Int64 , +b449 Int64 , +b450 Int64 , +b451 Int64 , +b452 Int64 , +b453 Int64 , +b454 Int64 , +b455 Int64 , +b456 Int64 , +b457 Int64 , +b458 Int64 , +b459 Int64 , +b460 Int64 , +b461 Int64 , +b462 Int64 , +b463 Int64 , +b464 Int64 , +b465 Int64 , +b466 Int64 , +b467 Int64 , +b468 Int64 , +b469 Int64 , +b470 Int64 , +b471 Int64 , +b472 Int64 , +b473 Int64 , +b474 Int64 , +b475 Int64 , +b476 Int64 , +b477 Int64 , +b478 Int64 , +b479 Int64 , +b480 Int64 , +b481 Int64 , +b482 Int64 , +b483 Int64 , +b484 Int64 , +b485 Int64 , +b486 Int64 , +b487 Int64 , +b488 Int64 , +b489 Int64 , +b490 Int64 , +b491 Int64 , +b492 Int64 , +b493 Int64 , +b494 Int64 , +b495 Int64 , +b496 Int64 , +b497 Int64 , +b498 Int64 , +b499 Int64 , +b500 Int64 +) ENGINE = Memory; + +insert into t(c1) values(1); + +SELECT count() FROM (SELECT tuple(*) FROM t); diff --git a/tests/queries/0_stateless/03164_create_as_default.reference b/tests/queries/0_stateless/03164_create_as_default.reference new file mode 100644 index 00000000000..aceba23beaf --- /dev/null +++ b/tests/queries/0_stateless/03164_create_as_default.reference @@ -0,0 +1,5 @@ +CREATE TABLE default.src_table\n(\n `time` DateTime(\'UTC\') DEFAULT 
fromUnixTimestamp(sipTimestamp),\n `sipTimestamp` UInt64\n)\nENGINE = MergeTree\nORDER BY time\nSETTINGS index_granularity = 8192 +sipTimestamp +time fromUnixTimestamp(sipTimestamp) +{"time":"2024-05-20 09:00:00","sipTimestamp":"1716195600"} +{"time":"2024-05-20 09:00:00","sipTimestamp":"1716195600"} diff --git a/tests/queries/0_stateless/03164_create_as_default.sql b/tests/queries/0_stateless/03164_create_as_default.sql new file mode 100644 index 00000000000..e9fd7c1e35a --- /dev/null +++ b/tests/queries/0_stateless/03164_create_as_default.sql @@ -0,0 +1,27 @@ +DROP TABLE IF EXISTS src_table; +DROP TABLE IF EXISTS copied_table; + +CREATE TABLE src_table +( + time DateTime('UTC') DEFAULT fromUnixTimestamp(sipTimestamp), + sipTimestamp UInt64 +) +ENGINE = MergeTree +ORDER BY time; + +INSERT INTO src_table(sipTimestamp) VALUES (toUnixTimestamp(toDateTime('2024-05-20 09:00:00', 'UTC'))); + +CREATE TABLE copied_table AS src_table; + +ALTER TABLE copied_table RENAME COLUMN `sipTimestamp` TO `timestamp`; + +SHOW CREATE TABLE src_table; + +SELECT name, default_expression FROM system.columns WHERE database = currentDatabase() AND table = 'src_table' ORDER BY name; +INSERT INTO src_table(sipTimestamp) VALUES (toUnixTimestamp(toDateTime('2024-05-20 09:00:00', 'UTC'))); + +SELECT * FROM src_table ORDER BY time FORMAT JSONEachRow; +SELECT * FROM copied_table ORDER BY time FORMAT JSONEachRow; + +DROP TABLE src_table; +DROP TABLE copied_table; diff --git a/tests/queries/0_stateless/03164_materialize_skip_index.reference b/tests/queries/0_stateless/03164_materialize_skip_index.reference new file mode 100644 index 00000000000..34251101e89 --- /dev/null +++ b/tests/queries/0_stateless/03164_materialize_skip_index.reference @@ -0,0 +1,52 @@ +20 +Expression ((Project names + Projection)) + Aggregating + Expression (Before GROUP BY) + Expression + ReadFromMergeTree (default.t_skip_index_insert) + Indexes: + Skip + Name: idx_a + Description: minmax GRANULARITY 1 + Parts: 2/2 + Granules: 50/50 + Skip + Name: idx_b + Description: set GRANULARITY 1 + Parts: 2/2 + Granules: 50/50 +20 +Expression ((Project names + Projection)) + Aggregating + Expression (Before GROUP BY) + Expression + ReadFromMergeTree (default.t_skip_index_insert) + Indexes: + Skip + Name: idx_a + Description: minmax GRANULARITY 1 + Parts: 1/1 + Granules: 6/50 + Skip + Name: idx_b + Description: set GRANULARITY 1 + Parts: 1/1 + Granules: 6/6 +20 +Expression ((Project names + Projection)) + Aggregating + Expression (Before GROUP BY) + Expression + ReadFromMergeTree (default.t_skip_index_insert) + Indexes: + Skip + Name: idx_a + Description: minmax GRANULARITY 1 + Parts: 1/2 + Granules: 6/50 + Skip + Name: idx_b + Description: set GRANULARITY 1 + Parts: 1/1 + Granules: 6/6 +4 0 diff --git a/tests/queries/0_stateless/03164_materialize_skip_index.sql b/tests/queries/0_stateless/03164_materialize_skip_index.sql new file mode 100644 index 00000000000..4e59ef6b6cd --- /dev/null +++ b/tests/queries/0_stateless/03164_materialize_skip_index.sql @@ -0,0 +1,50 @@ +DROP TABLE IF EXISTS t_skip_index_insert; + +CREATE TABLE t_skip_index_insert +( + a UInt64, + b UInt64, + INDEX idx_a a TYPE minmax, + INDEX idx_b b TYPE set(3) +) +ENGINE = MergeTree ORDER BY tuple() SETTINGS index_granularity = 4; + +SET allow_experimental_analyzer = 1; +SET materialize_skip_indexes_on_insert = 0; + +SYSTEM STOP MERGES t_skip_index_insert; + +INSERT INTO t_skip_index_insert SELECT number, number / 50 FROM numbers(100); +INSERT INTO t_skip_index_insert SELECT number, number / 50 
FROM numbers(100, 100); + +SELECT count() FROM t_skip_index_insert WHERE a >= 110 AND a < 130 AND b = 2; +EXPLAIN indexes = 1 SELECT count() FROM t_skip_index_insert WHERE a >= 110 AND a < 130 AND b = 2; + +SYSTEM START MERGES t_skip_index_insert; +OPTIMIZE TABLE t_skip_index_insert FINAL; + +SELECT count() FROM t_skip_index_insert WHERE a >= 110 AND a < 130 AND b = 2; +EXPLAIN indexes = 1 SELECT count() FROM t_skip_index_insert WHERE a >= 110 AND a < 130 AND b = 2; + +TRUNCATE TABLE t_skip_index_insert; + +INSERT INTO t_skip_index_insert SELECT number, number / 50 FROM numbers(100); +INSERT INTO t_skip_index_insert SELECT number, number / 50 FROM numbers(100, 100); + +SET mutations_sync = 2; + +ALTER TABLE t_skip_index_insert MATERIALIZE INDEX idx_a; +ALTER TABLE t_skip_index_insert MATERIALIZE INDEX idx_b; + +SELECT count() FROM t_skip_index_insert WHERE a >= 110 AND a < 130 AND b = 2; +EXPLAIN indexes = 1 SELECT count() FROM t_skip_index_insert WHERE a >= 110 AND a < 130 AND b = 2; + +DROP TABLE IF EXISTS t_skip_index_insert; + +SYSTEM FLUSH LOGS; + +SELECT count(), sum(ProfileEvents['MergeTreeDataWriterSkipIndicesCalculationMicroseconds']) +FROM system.query_log +WHERE current_database = currentDatabase() + AND query LIKE 'INSERT INTO t_skip_index_insert SELECT%' + AND type = 'QueryFinish'; diff --git a/tests/queries/0_stateless/03164_materialize_statistics.reference b/tests/queries/0_stateless/03164_materialize_statistics.reference new file mode 100644 index 00000000000..c209d2e8b63 --- /dev/null +++ b/tests/queries/0_stateless/03164_materialize_statistics.reference @@ -0,0 +1,10 @@ +10 +10 +10 +statistic not used Condition less(b, 10_UInt8) moved to PREWHERE +statistic not used Condition less(a, 10_UInt8) moved to PREWHERE +statistic used after merge Condition less(a, 10_UInt8) moved to PREWHERE +statistic used after merge Condition less(b, 10_UInt8) moved to PREWHERE +statistic used after materialize Condition less(a, 10_UInt8) moved to PREWHERE +statistic used after materialize Condition less(b, 10_UInt8) moved to PREWHERE +2 0 diff --git a/tests/queries/0_stateless/03164_materialize_statistics.sql b/tests/queries/0_stateless/03164_materialize_statistics.sql new file mode 100644 index 00000000000..763644d16ab --- /dev/null +++ b/tests/queries/0_stateless/03164_materialize_statistics.sql @@ -0,0 +1,49 @@ +DROP TABLE IF EXISTS t_statistic_materialize; + +SET allow_experimental_analyzer = 1; +SET allow_experimental_statistic = 1; +SET allow_statistic_optimize = 1; +SET materialize_statistics_on_insert = 0; + +CREATE TABLE t_statistic_materialize +( + a Int64 STATISTIC(tdigest), + b Int16 STATISTIC(tdigest), +) ENGINE = MergeTree() ORDER BY tuple() +SETTINGS min_bytes_for_wide_part = 0, enable_vertical_merge_algorithm = 0; -- TODO: there is a bug in vertical merge with statistics. 
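As a side note to the skip-index test above (03164_materialize_skip_index.sql): it disables materialize_skip_indexes_on_insert and then rebuilds idx_a and idx_b via ALTER TABLE ... MATERIALIZE INDEX. One lightweight way to confirm that the materialized indices actually ended up on disk is to query system.data_skipping_indices. This is only a sketch, not part of the test itself; it assumes the t_skip_index_insert table from that test still exists and the column layout of recent ClickHouse releases:

```sql
-- Sanity check (illustration only): list the skipping indices that exist for the test table.
SELECT name, type, granularity, data_compressed_bytes
FROM system.data_skipping_indices
WHERE database = currentDatabase() AND table = 't_skip_index_insert'
ORDER BY name;
```

The same check should also pass after the OPTIMIZE ... FINAL step, since the merge rebuilds the skipping indices for the merged part, which is exactly what the EXPLAIN in the test relies on.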
+ +INSERT INTO t_statistic_materialize SELECT number, -number FROM system.numbers LIMIT 10000; + +SELECT count(*) FROM t_statistic_materialize WHERE b < 10 and a < 10 SETTINGS log_comment = 'statistic not used'; + +OPTIMIZE TABLE t_statistic_materialize FINAL; + +SELECT count(*) FROM t_statistic_materialize WHERE b < 10 and a < 10 SETTINGS log_comment = 'statistic used after merge'; + +TRUNCATE TABLE t_statistic_materialize; +SET mutations_sync = 2; + +INSERT INTO t_statistic_materialize SELECT number, -number FROM system.numbers LIMIT 10000; +ALTER TABLE t_statistic_materialize MATERIALIZE STATISTIC a, b TYPE tdigest; + +SELECT count(*) FROM t_statistic_materialize WHERE b < 10 and a < 10 SETTINGS log_comment = 'statistic used after materialize'; + +DROP TABLE t_statistic_materialize; + +SYSTEM FLUSH LOGS; + +SELECT log_comment, message FROM system.text_log JOIN +( + SELECT Settings['log_comment'] AS log_comment, query_id FROM system.query_log + WHERE current_database = currentDatabase() + AND query LIKE 'SELECT count(*) FROM t_statistic_materialize%' + AND type = 'QueryFinish' +) AS query_log USING (query_id) +WHERE message LIKE '%moved to PREWHERE%' +ORDER BY event_time_microseconds; + +SELECT count(), sum(ProfileEvents['MergeTreeDataWriterStatisticsCalculationMicroseconds']) +FROM system.query_log +WHERE current_database = currentDatabase() + AND query LIKE 'INSERT INTO t_statistic_materialize SELECT%' + AND type = 'QueryFinish'; diff --git a/tests/queries/0_stateless/03164_optimize_read_in_order_nullable.reference b/tests/queries/0_stateless/03164_optimize_read_in_order_nullable.reference new file mode 100644 index 00000000000..7e866b496a8 --- /dev/null +++ b/tests/queries/0_stateless/03164_optimize_read_in_order_nullable.reference @@ -0,0 +1,32 @@ +-- Reproducer result: +\N Mark 50 +1 John 33 +2 Ksenia 48 + +-- Read in order, no sort required: +0 0 +1 \N +4 4 +\N 2 +\N \N + +-- Read in order, partial sort for second key: +0 0 +1 \N +4 4 +\N \N +\N 2 + +-- No reading in order, sort for first key: +\N 2 +\N \N +0 0 +1 \N +4 4 + +-- Reverse order, partial sort for the second key: +\N 2 +\N \N +4 4 +1 \N +0 0 diff --git a/tests/queries/0_stateless/03164_optimize_read_in_order_nullable.sql b/tests/queries/0_stateless/03164_optimize_read_in_order_nullable.sql new file mode 100644 index 00000000000..7af6e55bf98 --- /dev/null +++ b/tests/queries/0_stateless/03164_optimize_read_in_order_nullable.sql @@ -0,0 +1,55 @@ +-- Reproducer from https://github.com/ClickHouse/ClickHouse/issues/63460 +DROP TABLE IF EXISTS 03164_users; +CREATE TABLE 03164_users (uid Nullable(Int16), name String, age Int16) ENGINE=MergeTree ORDER BY (uid) SETTINGS allow_nullable_key=1; + +INSERT INTO 03164_users VALUES (1, 'John', 33); +INSERT INTO 03164_users VALUES (2, 'Ksenia', 48); +INSERT INTO 03164_users VALUES (NULL, 'Mark', 50); +OPTIMIZE TABLE 03164_users FINAL; + +SELECT '-- Reproducer result:'; + +SELECT * FROM 03164_users ORDER BY uid ASC NULLS FIRST LIMIT 10 SETTINGS optimize_read_in_order = 1; + +DROP TABLE IF EXISTS 03164_users; + +DROP TABLE IF EXISTS 03164_multi_key; +CREATE TABLE 03164_multi_key (c1 Nullable(UInt32), c2 Nullable(UInt32)) ENGINE = MergeTree ORDER BY (c1, c2) SETTINGS allow_nullable_key=1; + +INSERT INTO 03164_multi_key VALUES (0, 0), (1, NULL), (NULL, 2), (NULL, NULL), (4, 4); +-- Just in case +OPTIMIZE TABLE 03164_multi_key FINAL; + +SELECT ''; +SELECT '-- Read in order, no sort required:'; + +SELECT c1, c2 +FROM 03164_multi_key +ORDER BY c1 ASC NULLS LAST, c2 ASC NULLS LAST +SETTINGS 
optimize_read_in_order = 1; + +SELECT ''; +SELECT '-- Read in order, partial sort for second key:'; + +SELECT c1, c2 +FROM 03164_multi_key +ORDER BY c1 ASC NULLS LAST, c2 ASC NULLS FIRST +SETTINGS optimize_read_in_order = 1; + +SELECT ''; +SELECT '-- No reading in order, sort for first key:'; + +SELECT c1, c2 +FROM 03164_multi_key +ORDER BY c1 ASC NULLS FIRST, c2 ASC NULLS LAST +SETTINGS optimize_read_in_order = 1; + +SELECT ''; +SELECT '-- Reverse order, partial sort for the second key:'; + +SELECT c1, c2 +FROM 03164_multi_key +ORDER BY c1 DESC NULLS FIRST, c2 DESC NULLS LAST +SETTINGS optimize_read_in_order = 1; + +DROP TABLE IF EXISTS 03164_multi_key; diff --git a/tests/queries/0_stateless/03165_distinct_with_window_func_crash.reference b/tests/queries/0_stateless/03165_distinct_with_window_func_crash.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/03165_distinct_with_window_func_crash.sql b/tests/queries/0_stateless/03165_distinct_with_window_func_crash.sql new file mode 100644 index 00000000000..e2e87fde35d --- /dev/null +++ b/tests/queries/0_stateless/03165_distinct_with_window_func_crash.sql @@ -0,0 +1,31 @@ +DROP TABLE IF EXISTS atable; + +CREATE TABLE atable +( + cdu_date Int16, + loanx_id String, + rating_sp String +) +ENGINE = MergeTree +ORDER BY tuple(); + +-- disable parallelization after window function otherwise +-- generated pipeline contains enormous number of transformers (should be fixed separately) +SET query_plan_enable_multithreading_after_window_functions=0; +-- max_threads is randomized, and can significantly increase number of parallel transformers after window func, so set to small value explicitly +SET max_threads=3; + +SELECT DISTINCT + loanx_id, + rating_sp, + cdu_date, + row_number() OVER (PARTITION BY cdu_date) AS row_number, + last_value(cdu_date) OVER (PARTITION BY loanx_id ORDER BY cdu_date ASC) AS last_cdu_date +FROM atable +GROUP BY + cdu_date, + loanx_id, + rating_sp +SETTINGS query_plan_remove_redundant_distinct = 1; + +DROP TABLE atable; diff --git a/tests/queries/0_stateless/03165_parseReadableSize.reference b/tests/queries/0_stateless/03165_parseReadableSize.reference new file mode 100644 index 00000000000..57f17ecc5d3 --- /dev/null +++ b/tests/queries/0_stateless/03165_parseReadableSize.reference @@ -0,0 +1,60 @@ +1.00 B +1.00 KiB +1.00 MiB +1.00 GiB +1.00 TiB +1.00 PiB +1.00 EiB +1.00 B +1.00 KB +1.00 MB +1.00 GB +1.00 TB +1.00 PB +1.00 EB +1.00 MiB +1024 +3072 +1024 +1024 +1024 +1024 +1024 +\N +3217 +3217 +1000 +5 +2048 +8192 +0 0 0 +1 B 1 +1 KiB 1024 +1 MiB 1048576 +1 GiB 1073741824 +1 TiB 1099511627776 +1 PiB 1125899906842624 +1 EiB 1152921504606846976 +invalid \N +1 Joe \N +1KB 1000 + 1 GiB \N +1 TiB with fries \N +NaN KiB \N +Inf KiB \N +0xa123 KiB \N +1 B 1 +1 KiB 1024 +1 MiB 1048576 +1 GiB 1073741824 +1 TiB 1099511627776 +1 PiB 1125899906842624 +1 EiB 1152921504606846976 +invalid 0 +1 Joe 0 +1KB 1000 + 1 GiB 0 +1 TiB with fries 0 +NaN KiB 0 +Inf KiB 0 +0xa123 KiB 0 diff --git a/tests/queries/0_stateless/03165_parseReadableSize.sql b/tests/queries/0_stateless/03165_parseReadableSize.sql new file mode 100644 index 00000000000..33386268aa4 --- /dev/null +++ b/tests/queries/0_stateless/03165_parseReadableSize.sql @@ -0,0 +1,121 @@ +-- Should be the inverse of formatReadableSize +SELECT formatReadableSize(parseReadableSize('1 B')); +SELECT formatReadableSize(parseReadableSize('1 KiB')); +SELECT formatReadableSize(parseReadableSize('1 MiB')); +SELECT formatReadableSize(parseReadableSize('1 
GiB')); +SELECT formatReadableSize(parseReadableSize('1 TiB')); +SELECT formatReadableSize(parseReadableSize('1 PiB')); +SELECT formatReadableSize(parseReadableSize('1 EiB')); + +-- Should be the inverse of formatReadableDecimalSize +SELECT formatReadableDecimalSize(parseReadableSize('1 B')); +SELECT formatReadableDecimalSize(parseReadableSize('1 KB')); +SELECT formatReadableDecimalSize(parseReadableSize('1 MB')); +SELECT formatReadableDecimalSize(parseReadableSize('1 GB')); +SELECT formatReadableDecimalSize(parseReadableSize('1 TB')); +SELECT formatReadableDecimalSize(parseReadableSize('1 PB')); +SELECT formatReadableDecimalSize(parseReadableSize('1 EB')); + +-- Is case-insensitive +SELECT formatReadableSize(parseReadableSize('1 mIb')); + +-- Should be able to parse decimals +SELECT parseReadableSize('1.00 KiB'); -- 1024 +SELECT parseReadableSize('3.00 KiB'); -- 3072 + +-- Infix whitespace is ignored +SELECT parseReadableSize('1 KiB'); +SELECT parseReadableSize('1KiB'); + +-- Can parse LowCardinality +SELECT parseReadableSize(toLowCardinality('1 KiB')); + +-- Can parse nullable fields +SELECT parseReadableSize(toNullable('1 KiB')); + +-- Can parse non-const columns fields +SELECT parseReadableSize(materialize('1 KiB')); + +-- Output is NULL if NULL arg is passed +SELECT parseReadableSize(NULL); + +-- Can parse more decimal places than Float64's precision +SELECT parseReadableSize('3.14159265358979323846264338327950288419716939937510 KiB'); + +-- Can parse sizes prefixed with a plus sign +SELECT parseReadableSize('+3.1415 KiB'); + +-- Can parse amounts in scientific notation +SELECT parseReadableSize('10e2 B'); + +-- Can parse floats with no decimal points +SELECT parseReadableSize('5. B'); + +-- Can parse numbers with leading zeroes +SELECT parseReadableSize('002 KiB'); + +-- Can parse octal-like +SELECT parseReadableSize('08 KiB'); + +-- Can parse various flavours of zero +SELECT parseReadableSize('0 KiB'), parseReadableSize('+0 KiB'), parseReadableSize('-0 KiB'); + +-- ERRORS +-- No arguments +SELECT parseReadableSize(); -- { serverError NUMBER_OF_ARGUMENTS_DOESNT_MATCH } +-- Too many arguments +SELECT parseReadableSize('1 B', '2 B'); -- { serverError NUMBER_OF_ARGUMENTS_DOESNT_MATCH } +-- Wrong Type +SELECT parseReadableSize(12); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } +-- Invalid input - overall garbage +SELECT parseReadableSize('oh no'); -- { serverError CANNOT_PARSE_NUMBER } +-- Invalid input - unknown unit +SELECT parseReadableSize('12.3 rb'); -- { serverError CANNOT_PARSE_TEXT } +-- Invalid input - Leading whitespace +SELECT parseReadableSize(' 1 B'); -- { serverError CANNOT_PARSE_INPUT_ASSERTION_FAILED } +-- Invalid input - Trailing characters +SELECT parseReadableSize('1 B leftovers'); -- { serverError UNEXPECTED_DATA_AFTER_PARSED_VALUE } +-- Invalid input - Negative sizes are not allowed +SELECT parseReadableSize('-1 KiB'); -- { serverError BAD_ARGUMENTS } +-- Invalid input - Input too large to fit in UInt64 +SELECT parseReadableSize('1000 EiB'); -- { serverError BAD_ARGUMENTS } +-- Invalid input - Hexadecimal is not supported +SELECT parseReadableSize('0xa123 KiB'); -- { serverError CANNOT_PARSE_TEXT } +-- Invalid input - NaN is not supported, with or without sign and with different capitalizations +SELECT parseReadableSize('nan KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('+nan KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('-nan KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('NaN KiB'); -- { serverError 
BAD_ARGUMENTS } +-- Invalid input - Infinite is not supported, with or without sign, in all its forms +SELECT parseReadableSize('inf KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('+inf KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('-inf KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('infinite KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('+infinite KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('-infinite KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('Inf KiB'); -- { serverError BAD_ARGUMENTS } +SELECT parseReadableSize('Infinite KiB'); -- { serverError BAD_ARGUMENTS } + + +-- OR NULL +-- Works as the regular version when inputs are correct +SELECT + arrayJoin(['1 B', '1 KiB', '1 MiB', '1 GiB', '1 TiB', '1 PiB', '1 EiB']) AS readable_sizes, + parseReadableSizeOrNull(readable_sizes) AS filesize; + +-- Returns NULL on invalid values +SELECT + arrayJoin(['invalid', '1 Joe', '1KB', ' 1 GiB', '1 TiB with fries', 'NaN KiB', 'Inf KiB', '0xa123 KiB']) AS readable_sizes, + parseReadableSizeOrNull(readable_sizes) AS filesize; + + +-- OR ZERO +-- Works as the regular version when inputs are correct +SELECT + arrayJoin(['1 B', '1 KiB', '1 MiB', '1 GiB', '1 TiB', '1 PiB', '1 EiB']) AS readable_sizes, + parseReadableSizeOrZero(readable_sizes) AS filesize; + +-- Returns NULL on invalid values +SELECT + arrayJoin(['invalid', '1 Joe', '1KB', ' 1 GiB', '1 TiB with fries', 'NaN KiB', 'Inf KiB', '0xa123 KiB']) AS readable_sizes, + parseReadableSizeOrZero(readable_sizes) AS filesize; \ No newline at end of file diff --git a/tests/queries/0_stateless/03166_optimize_row_order_during_insert.reference b/tests/queries/0_stateless/03166_optimize_row_order_during_insert.reference new file mode 100644 index 00000000000..bbd87fb450c --- /dev/null +++ b/tests/queries/0_stateless/03166_optimize_row_order_during_insert.reference @@ -0,0 +1,78 @@ +Simple test +Egor 1 +Egor 2 +Igor 1 +Igor 2 +Igor 3 +Cardinalities test +Alex 1 63 0 +Alex 1 65 0 +Alex 1 239 0 +Alex 2 224 0 +Alex 4 83 0 +Alex 4 134 0 +Alex 4 192 0 +Bob 2 53 0 +Bob 4 100 0 +Bob 4 177 0 +Bob 4 177 0 +Nikita 1 173 0 +Nikita 1 228 0 +Nikita 2 148 0 +Nikita 2 148 0 +Nikita 2 208 0 +Alex 1 63 1 +Alex 1 65 1 +Alex 1 239 1 +Alex 2 128 1 +Alex 2 128 1 +Alex 2 224 1 +Alex 4 83 1 +Alex 4 83 1 +Alex 4 134 1 +Alex 4 134 1 +Alex 4 192 1 +Bob 2 53 1 +Bob 2 53 1 +Bob 2 187 1 +Bob 2 187 1 +Bob 4 100 1 +Nikita 1 173 1 +Nikita 1 228 1 +Nikita 2 54 1 +Nikita 2 54 1 +Nikita 2 148 1 +Nikita 2 208 1 +Equivalence classes test +AB 1 9.81 0 +A\0 0 2.7 1 +A\0 1 2.7 1 +B\0 0 2.7 1 +B\0 1 2.7 1 +A\0 1 42 1 +B\0 0 42 1 +A\0 0 3.14 \N +B\0 -1 3.14 \N +B\0 2 3.14 \N +AB 0 42 \N +AB 0 42 \N +B\0 0 42 \N +A\0 1 42 \N +A\0 1 42 \N +B\0 1 42 \N +Many types test +A\0\0\0\0\0 2020-01-01 [0,1.1] 10 some string {'key':'value'} (123) +A\0\0\0\0\0 2020-01-01 [0,1.1] \N example {} (26) +A\0\0\0\0\0 2020-01-01 [2.2,1.1] 1 some other string {'key2':'value2'} (5) +A\0\0\0\0\0 2020-01-02 [2.2,1.1] 1 some other string {'key2':'value2'} (5) +A\0\0\0\0\0 2020-01-02 [0,1.1] 10 some string {'key':'value'} (123) +A\0\0\0\0\0 2020-01-02 [0,2.2] 10 example {} (26) +B\0\0\0\0\0 2020-01-04 [0,2.2] \N example {} (26) +B\0\0\0\0\0 2020-01-04 [0,1.1] 10 some string {'key':'value'} (123) +B\0\0\0\0\0 2020-01-04 [2.2,1.1] 1 some string {'key2':'value2'} (5) +B\0\0\0\0\0 2020-01-05 [0,1.1] 10 some string {'key':'value'} (123) +B\0\0\0\0\0 2020-01-05 [0,2.2] \N example {} (26) +B\0\0\0\0\0 2020-01-05 
[2.2,1.1] 1 some other string {'key':'value'} (5) +C\0\0\0\0\0 2020-01-04 [0,1.1] 10 some string {'key':'value'} (5) +C\0\0\0\0\0 2020-01-04 [0,2.2] \N example {} (26) +C\0\0\0\0\0 2020-01-04 [2.2,1.1] 1 some other string {'key2':'value2'} (5) diff --git a/tests/queries/0_stateless/03166_optimize_row_order_during_insert.sql b/tests/queries/0_stateless/03166_optimize_row_order_during_insert.sql new file mode 100644 index 00000000000..bb2f5e94d05 --- /dev/null +++ b/tests/queries/0_stateless/03166_optimize_row_order_during_insert.sql @@ -0,0 +1,98 @@ +-- Checks that no bad things happen when the table optimizes the row order to improve compressibility during insert. + + +-- Below SELECTs intentionally only ORDER BY the table primary key and rely on read-in-order optimization +SET optimize_read_in_order = 1; + +-- Just a simple check that the optimization works correctly for a table with 2 columns and 2 equivalence classes. +SELECT 'Simple test'; + +DROP TABLE IF EXISTS tab; + +CREATE TABLE tab ( + name String, + event Int8 +) ENGINE = MergeTree +ORDER BY name +SETTINGS allow_experimental_optimized_row_order = true; +INSERT INTO tab VALUES ('Igor', 3), ('Egor', 1), ('Egor', 2), ('Igor', 2), ('Igor', 1); + +SELECT * FROM tab ORDER BY name SETTINGS max_threads=1; + +DROP TABLE tab; + +-- Checks that RowOptimizer correctly selects the order for columns according to cardinality, with an empty ORDER BY. +-- There are 4 columns with cardinalities {name : 3, timestamp : 3, money : 17, flag : 2}, so the column order must be {flag, name, timestamp, money}. +SELECT 'Cardinalities test'; + +DROP TABLE IF EXISTS tab; + +CREATE TABLE tab ( + name String, + timestamp Int64, + money UInt8, + flag String +) ENGINE = MergeTree +ORDER BY () +SETTINGS allow_experimental_optimized_row_order = True; +INSERT INTO tab VALUES ('Bob', 4, 100, '1'), ('Nikita', 2, 54, '1'), ('Nikita', 1, 228, '1'), ('Alex', 4, 83, '1'), ('Alex', 4, 134, '1'), ('Alex', 1, 65, '0'), ('Alex', 4, 134, '1'), ('Bob', 2, 53, '0'), ('Alex', 4, 83, '0'), ('Alex', 1, 63, '1'), ('Bob', 2, 53, '1'), ('Alex', 4, 192, '1'), ('Alex', 2, 128, '1'), ('Nikita', 2, 148, '0'), ('Bob', 4, 177, '0'), ('Nikita', 1, 173, '0'), ('Alex', 1, 239, '0'), ('Alex', 1, 63, '0'), ('Alex', 2, 224, '1'), ('Bob', 4, 177, '0'), ('Alex', 2, 128, '1'), ('Alex', 4, 134, '0'), ('Alex', 4, 83, '1'), ('Bob', 4, 100, '0'), ('Nikita', 2, 54, '1'), ('Alex', 1, 239, '1'), ('Bob', 2, 187, '1'), ('Alex', 1, 65, '1'), ('Bob', 2, 53, '1'), ('Alex', 2, 224, '0'), ('Alex', 4, 192, '0'), ('Nikita', 1, 173, '1'), ('Nikita', 2, 148, '1'), ('Bob', 2, 187, '1'), ('Nikita', 2, 208, '1'), ('Nikita', 2, 208, '0'), ('Nikita', 1, 228, '0'), ('Nikita', 2, 148, '0'); + +SELECT * FROM tab SETTINGS max_threads=1; + +DROP TABLE tab; + +-- Checks that RowOptimizer correctly selects the order for columns according to cardinality in each equivalence class obtained using SortDescription. +-- There are two columns in the SortDescription: {flag, money} in this order. +-- So there are 5 equivalence classes: {9.81, 0}, {2.7, 1}, {42, 1}, {3.14, Null}, {42, Null}. +-- For the first three of them, the cardinalities of the other 2 columns are equal, so they are sorted in the order {0, 1} in these classes. +-- In the fourth class the cardinalities are {name : 2, timestamp : 3}, so they are sorted in the order {name, timestamp} in this class. +-- In the fifth class the cardinalities are {name : 3, timestamp : 2}, so they are sorted in the order {timestamp, name} in this class. 
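The comments above and below reason about per-column cardinalities inside equivalence classes. As a small illustration (not part of the test; the temporary table and its three rows are hypothetical), uniqExact is one way to inspect such cardinalities by hand; the equivalence-classes test itself continues right after this sketch:

```sql
-- Illustration only: inspect per-column cardinalities, the quantity the row-order
-- optimization reasons about. The table name and rows here are made up.
CREATE TEMPORARY TABLE t_cardinality_example (name String, timestamp Int64, money UInt8, flag String);
INSERT INTO t_cardinality_example VALUES ('Bob', 4, 100, '1'), ('Alex', 1, 63, '0'), ('Nikita', 2, 54, '1');
SELECT
    uniqExact(name)      AS name_cardinality,       -- 3
    uniqExact(timestamp) AS timestamp_cardinality,  -- 3
    uniqExact(money)     AS money_cardinality,      -- 3
    uniqExact(flag)      AS flag_cardinality        -- 2
FROM t_cardinality_example;
```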
+SELECT 'Equivalence classes test'; + +DROP TABLE IF EXISTS tab; + +CREATE TABLE tab ( + name FixedString(2), + timestamp Float32, + money Float64, + flag Nullable(Int32) +) ENGINE = MergeTree +ORDER BY (flag, money) +SETTINGS allow_experimental_optimized_row_order = True, allow_nullable_key = True; +INSERT INTO tab VALUES ('AB', 0, 42, Null), ('AB', 0, 42, Null), ('A', 1, 42, Null), ('AB', 1, 9.81, 0), ('B', 0, 42, Null), ('B', -1, 3.14, Null), ('B', 1, 2.7, 1), ('B', 0, 42, 1), ('A', 1, 42, 1), ('B', 1, 42, Null), ('B', 0, 2.7, 1), ('A', 0, 2.7, 1), ('B', 2, 3.14, Null), ('A', 0, 3.14, Null), ('A', 1, 2.7, 1), ('A', 1, 42, Null); + +SELECT * FROM tab ORDER BY (flag, money) SETTINGS max_threads=1; + +DROP TABLE tab; + +-- Checks that no bad things happen when the table optimizes the row order to improve compressibility during insert for many different column types. +-- For some of these types estimateCardinalityInPermutedRange returns just the size of the current equal range. +-- There are 5 equivalence classes, each of size 3. +-- In the first of them the cardinality of the vector_array column equals 2, the other cardinalities equal 3. +-- In the second of them the cardinality of the nullable_int column equals 2, the other cardinalities equal 3. +-- ... +-- In the fifth of them the cardinality of the tuple_column column equals 2, the other cardinalities equal 3. +-- So, for each of these classes, the column with cardinality 2 whose type implements the estimateCardinalityInPermutedRange method +-- must come first in the column order, and all other columns must keep the stable order. +-- For all other classes the columns must keep the stable order. +SELECT 'Many types test'; + +DROP TABLE IF EXISTS tab; + +CREATE TABLE tab ( + fixed_str FixedString(6), + event_date Date, + vector_array Array(Float32), + nullable_int Nullable(Int128), + low_card_string LowCardinality(String), + map_column Map(String, String), + tuple_column Tuple(UInt256) +) ENGINE = MergeTree() +ORDER BY (fixed_str, event_date) +SETTINGS allow_experimental_optimized_row_order = True; + +INSERT INTO tab VALUES ('A', '2020-01-01', [0.0, 1.1], 10, 'some string', {'key':'value'}, (123)), ('A', '2020-01-01', [0.0, 1.1], NULL, 'example', {}, (26)), ('A', '2020-01-01', [2.2, 1.1], 1, 'some other string', {'key2':'value2'}, (5)), ('A', '2020-01-02', [0.0, 1.1], 10, 'some string', {'key':'value'}, (123)), ('A', '2020-01-02', [0.0, 2.2], 10, 'example', {}, (26)), ('A', '2020-01-02', [2.2, 1.1], 1, 'some other string', {'key2':'value2'}, (5)), ('B', '2020-01-04', [0.0, 1.1], 10, 'some string', {'key':'value'}, (123)), ('B', '2020-01-04', [0.0, 2.2], Null, 'example', {}, (26)), ('B', '2020-01-04', [2.2, 1.1], 1, 'some string', {'key2':'value2'}, (5)), ('B', '2020-01-05', [0.0, 1.1], 10, 'some string', {'key':'value'}, (123)), ('B', '2020-01-05', [0.0, 2.2], Null, 'example', {}, (26)), ('B', '2020-01-05', [2.2, 1.1], 1, 'some other string', {'key':'value'}, (5)), ('C', '2020-01-04', [0.0, 1.1], 10, 'some string', {'key':'value'}, (123)), ('C', '2020-01-04', [0.0, 2.2], Null, 'example', {}, (26)), ('C', '2020-01-04', [2.2, 1.1], 1, 'some other string', {'key2':'value2'}, (5)); + +SELECT * FROM tab ORDER BY (fixed_str, event_date) SETTINGS max_threads=1; + +DROP TABLE tab; diff --git a/tests/queries/1_stateful/00091_prewhere_two_conditions.sql b/tests/queries/1_stateful/00091_prewhere_two_conditions.sql index cbfbbaa2662..cd88743160c 100644 --- a/tests/queries/1_stateful/00091_prewhere_two_conditions.sql +++ 
b/tests/queries/1_stateful/00091_prewhere_two_conditions.sql @@ -14,6 +14,6 @@ WITH toTimeZone(EventTime, 'Asia/Dubai') AS xyz SELECT uniq(*) FROM test.hits WH SET optimize_move_to_prewhere = 0; SET enable_multiple_prewhere_read_steps = 0; -SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Asia/Dubai') >= '2014-03-20 00:00:00' AND toTimeZone(EventTime, 'Asia/Dubai') < '2014-03-21 00:00:00'; -- { serverError 307 } -SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Asia/Dubai') >= '2014-03-20 00:00:00' AND URL != '' AND toTimeZone(EventTime, 'Asia/Dubai') < '2014-03-21 00:00:00'; -- { serverError 307 } -SELECT uniq(URL) FROM test.hits PREWHERE toTimeZone(EventTime, 'Asia/Dubai') >= '2014-03-20 00:00:00' AND URL != '' AND toTimeZone(EventTime, 'Asia/Dubai') < '2014-03-21 00:00:00'; -- { serverError 307 } +SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Asia/Dubai') >= '2014-03-20 00:00:00' AND toTimeZone(EventTime, 'Asia/Dubai') < '2014-03-21 00:00:00'; -- { serverError TOO_MANY_BYTES } +SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Asia/Dubai') >= '2014-03-20 00:00:00' AND URL != '' AND toTimeZone(EventTime, 'Asia/Dubai') < '2014-03-21 00:00:00'; -- { serverError TOO_MANY_BYTES } +SELECT uniq(URL) FROM test.hits PREWHERE toTimeZone(EventTime, 'Asia/Dubai') >= '2014-03-20 00:00:00' AND URL != '' AND toTimeZone(EventTime, 'Asia/Dubai') < '2014-03-21 00:00:00'; -- { serverError TOO_MANY_BYTES } diff --git a/tests/queries/1_stateful/00175_counting_resources_in_subqueries.sql b/tests/queries/1_stateful/00175_counting_resources_in_subqueries.sql index fe7837d7ff1..63eca96414f 100644 --- a/tests/queries/1_stateful/00175_counting_resources_in_subqueries.sql +++ b/tests/queries/1_stateful/00175_counting_resources_in_subqueries.sql @@ -1,20 +1,20 @@ -- the work for scalar subquery is properly accounted: SET max_rows_to_read = 1000000; -SELECT 1 = (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError 158 } +SELECT 1 = (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError TOO_MANY_ROWS } -- the work for subquery in IN is properly accounted: SET max_rows_to_read = 1000000; -SELECT 1 IN (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError 158 } +SELECT 1 IN (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError TOO_MANY_ROWS } -- this query reads from the table twice: SET max_rows_to_read = 15000000; -SELECT count() IN (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)) FROM test.hits WHERE NOT ignore(AdvEngineID); -- { serverError 158 } +SELECT count() IN (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)) FROM test.hits WHERE NOT ignore(AdvEngineID); -- { serverError TOO_MANY_ROWS } -- the resources are properly accounted even if the subquery is evaluated in advance to facilitate the index analysis. -- this query is using index and filter out the second reading pass. SET max_rows_to_read = 1000000; -SELECT count() FROM test.hits WHERE CounterID > (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError 158 } +SELECT count() FROM test.hits WHERE CounterID > (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError TOO_MANY_ROWS } -- this query is using index but have to read all the data twice. 
SET max_rows_to_read = 10000000; -SELECT count() FROM test.hits WHERE CounterID < (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError 158 } +SELECT count() FROM test.hits WHERE CounterID < (SELECT count() FROM test.hits WHERE NOT ignore(AdvEngineID)); -- { serverError TOO_MANY_ROWS } diff --git a/utils/changelog/README.md b/utils/changelog/README.md index ccc235c4990..4b16c39a3fe 100644 --- a/utils/changelog/README.md +++ b/utils/changelog/README.md @@ -6,7 +6,7 @@ Generate github token: Dependencies: ``` sudo apt-get update -sudo apt-get install git python3 python3-fuzzywuzzy python3-github +sudo apt-get install git python3 python3-thefuzz python3-github python3 changelog.py -h ``` @@ -15,10 +15,7 @@ Usage example: Note: The working directory is ClickHouse/utils/changelog ```bash -export GITHUB_TOKEN="" - -git fetch --tags # changelog.py depends on having the tags available, this will fetch them. - # If you are working from a branch in your personal fork, then you may need `git fetch --all` +GITHUB_TOKEN="" python3 changelog.py --output=changelog-v22.4.1.2305-prestable.md --gh-user-or-token="$GITHUB_TOKEN" v21.6.2.7-prestable python3 changelog.py --output=changelog-v22.4.1.2305-prestable.md --gh-user-or-token="$USER" --gh-password="$PASSWORD" v21.6.2.7-prestable diff --git a/utils/changelog/changelog.py b/utils/changelog/changelog.py index acc7293473d..314461a6b3a 100755 --- a/utils/changelog/changelog.py +++ b/utils/changelog/changelog.py @@ -3,18 +3,20 @@ import argparse import logging -import os.path as p import os +import os.path as p import re from datetime import date, timedelta -from subprocess import CalledProcessError, DEVNULL +from subprocess import DEVNULL, CalledProcessError from typing import Dict, List, Optional, TextIO -from fuzzywuzzy.fuzz import ratio # type: ignore -from github_helper import GitHub, PullRequest, PullRequests, Repository from github.GithubException import RateLimitExceededException, UnknownObjectException from github.NamedUser import NamedUser -from git_helper import is_shallow, git_runner as runner +from thefuzz.fuzz import ratio # type: ignore + +from git_helper import git_runner as runner +from git_helper import is_shallow +from github_helper import GitHub, PullRequest, PullRequests, Repository # This array gives the preferred category order, and is also used to # normalize category names. @@ -58,9 +60,10 @@ class Description: self.entry, ) # 2) issue URL w/o markdown link + # including #issuecomment-1 or #event-12 entry = re.sub( - r"([^(])https://github.com/ClickHouse/ClickHouse/issues/([0-9]{4,})", - r"\1[#\2](https://github.com/ClickHouse/ClickHouse/issues/\2)", + r"([^(])(https://github.com/ClickHouse/ClickHouse/issues/([0-9]{4,})[-#a-z0-9]*)", + r"\1[#\3](\2)", entry, ) # It's possible that we face a secondary rate limit. 
@@ -97,17 +100,14 @@ def get_descriptions(prs: PullRequests) -> Dict[str, List[Description]]: # obj._rawData doesn't spend additional API requests # We'll save some requests # pylint: disable=protected-access - repo_name = pr._rawData["base"]["repo"]["full_name"] # type: ignore + repo_name = pr._rawData["base"]["repo"]["full_name"] # pylint: enable=protected-access if repo_name not in repos: repos[repo_name] = pr.base.repo in_changelog = False merge_commit = pr.merge_commit_sha - try: - runner.run(f"git rev-parse '{merge_commit}'") - except CalledProcessError: - # It's possible that commit not in the repo, just continue - logging.info("PR %s does not belong to the repo", pr.number) + if merge_commit is None: + logging.warning("PR %s does not have merge-commit, skipping", pr.number) continue in_changelog = merge_commit in SHA_IN_CHANGELOG @@ -271,7 +271,6 @@ def generate_description(item: PullRequest, repo: Repository) -> Optional[Descri category, ): category = "Bug Fix (user-visible misbehavior in an official stable release)" - return Description(item.number, item.user, item.html_url, item.title, category) # Filter out documentations changelog if re.match( @@ -300,8 +299,9 @@ def generate_description(item: PullRequest, repo: Repository) -> Optional[Descri return Description(item.number, item.user, item.html_url, entry, category) -def write_changelog(fd: TextIO, descriptions: Dict[str, List[Description]]): - year = date.today().year +def write_changelog( + fd: TextIO, descriptions: Dict[str, List[Description]], year: int +) -> None: to_commit = runner(f"git rev-parse {TO_REF}^{{}}")[:11] from_commit = runner(f"git rev-parse {FROM_REF}^{{}}")[:11] fd.write( @@ -359,6 +359,12 @@ def set_sha_in_changelog(): ).split("\n") +def get_year(prs: PullRequests) -> int: + if not prs: + return date.today().year + return max(pr.created_at.year for pr in prs) + + def main(): log_levels = [logging.WARN, logging.INFO, logging.DEBUG] args = parse_args() @@ -412,8 +418,9 @@ def main(): prs = gh.get_pulls_from_search(query=query, merged=merged, sort="created") descriptions = get_descriptions(prs) + changelog_year = get_year(prs) - write_changelog(args.output, descriptions) + write_changelog(args.output, descriptions, changelog_year) if __name__ == "__main__": diff --git a/utils/changelog/requirements.txt b/utils/changelog/requirements.txt index 106e9e2c72d..53c3bf3206e 100644 --- a/utils/changelog/requirements.txt +++ b/utils/changelog/requirements.txt @@ -1,3 +1,2 @@ -fuzzywuzzy +thefuzz PyGitHub -python-Levenshtein diff --git a/utils/check-style/aspell-ignore/en/aspell-dict.txt b/utils/check-style/aspell-ignore/en/aspell-dict.txt index 244f2ad98ff..c35e860a5d7 100644 --- a/utils/check-style/aspell-ignore/en/aspell-dict.txt +++ b/utils/check-style/aspell-ignore/en/aspell-dict.txt @@ -446,6 +446,7 @@ KafkaLibrdkafkaThreads KafkaProducers KafkaWrites Kahan +Kaser KeeperAliveConnections KeeperMap KeeperOutstandingRequets @@ -466,6 +467,7 @@ LOCALTIME LOCALTIMESTAMP LONGLONG LOONGARCH +Lemir Levenshtein Liao LibFuzzer @@ -989,6 +991,7 @@ URLHash URLHierarchy URLPathHierarchy USearch +UTCTimestamp UUIDNumToString UUIDStringToNum UUIDToNum @@ -1366,6 +1369,10 @@ const contrib convertCharset coroutines +corrMatrix +corrStable +corrmatrix +corrstable cosineDistance countDigits countEqual @@ -1375,10 +1382,19 @@ countSubstrings countSubstringsCaseInsensitive countSubstringsCaseInsensitiveUTF covarPop +covarPopMatrix +covarPopStable covarSamp +covarSampMatrix +covarSampStable +covarStable covariates covarpop 
+covarpopmatrix +covarpopstable covarsamp +covarsampmatrix +covarsampstable covid cpp cppkafka @@ -1609,6 +1625,7 @@ formated formatschema formatter formatters +frac freezed fromDaysSinceYearZero fromModifiedJulianDay @@ -1735,6 +1752,8 @@ hdfs hdfsCluster heredoc heredocs +hilbertDecode +hilbertEncode hiveHash holistics homebrew @@ -2133,6 +2152,9 @@ parseDateTimeInJodaSyntaxOrNull parseDateTimeInJodaSyntaxOrZero parseDateTimeOrNull parseDateTimeOrZero +parseReadableSize +parseReadableSizeOrNull +parseReadableSizeOrZero parseTimeDelta parseable parsers @@ -2664,16 +2686,16 @@ toStartOfFiveMinutes toStartOfHour toStartOfISOYear toStartOfInterval +toStartOfMicrosecond +toStartOfMillisecond toStartOfMinute toStartOfMonth +toStartOfNanosecond toStartOfQuarter toStartOfSecond toStartOfTenMinutes toStartOfWeek toStartOfYear -toStartOfMicrosecond -toStartOfMillisecond -toStartOfNanosecond toString toStringCutToZero toTime @@ -2803,7 +2825,6 @@ urls usearch userspace userver -UTCTimestamp utils uuid uuidv diff --git a/utils/check-style/check-style b/utils/check-style/check-style index 23e8b6b2bc4..5c05907e9dd 100755 --- a/utils/check-style/check-style +++ b/utils/check-style/check-style @@ -290,8 +290,6 @@ std_cerr_cout_excludes=( /examples/ /tests/ _fuzzer - # DUMP() - base/base/iostream_debug_helpers.h # OK src/Common/ProgressIndication.cpp # only under #ifdef DBMS_HASH_MAP_DEBUG_RESIZES, that is used only in tests diff --git a/utils/list-versions/version_date.tsv b/utils/list-versions/version_date.tsv index 1f47a999162..f7d84cce4b1 100644 --- a/utils/list-versions/version_date.tsv +++ b/utils/list-versions/version_date.tsv @@ -1,3 +1,4 @@ +v24.5.1.1763-stable 2024-06-01 v24.4.1.2088-stable 2024-05-01 v24.3.3.102-lts 2024-05-01 v24.3.2.23-lts 2024-04-03 diff --git a/utils/security-generator/generate_security.py b/utils/security-generator/generate_security.py index 2b37e28257a..21c6b72e476 100755 --- a/utils/security-generator/generate_security.py +++ b/utils/security-generator/generate_security.py @@ -98,7 +98,7 @@ def generate_supported_versions() -> str: lts.append(version) to_append = f"| {version} | ✔️ |" if to_append: - if len(regular) == max_regular or len(lts) == max_lts: + if len(regular) == max_regular and len(lts) == max_lts: supported_year = year table.append(to_append) continue
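The stateful test updates above replace the numeric serverError codes 158 and 307 with their symbolic names TOO_MANY_ROWS and TOO_MANY_BYTES. A minimal, self-contained sketch of that convention, with a made-up limit and query (reading 10 rows under a 5-row cap should trip the same TOO_MANY_ROWS error):

```sql
-- Illustration of the symbolic serverError-name convention used in the updated tests above.
-- The limit and query are made up; exceeding max_rows_to_read should fail with TOO_MANY_ROWS
-- (previously written as the numeric code 158).
SET max_rows_to_read = 5;
SELECT count() FROM numbers(10); -- { serverError TOO_MANY_ROWS }
```

Using the symbolic name keeps the expectation readable and robust: the test still documents which failure is intended even if the reader does not have the numeric code table at hand.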