diff --git a/CHANGELOG.md b/CHANGELOG.md
index 207b88f7860..64ff3b78065 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,5 @@
 ### Table of Contents
+**[ClickHouse release v24.5, 2024-05-30](#245)**
 **[ClickHouse release v24.4, 2024-04-30](#244)**
 **[ClickHouse release v24.3 LTS, 2024-03-26](#243)**
 **[ClickHouse release v24.2, 2024-02-29](#242)**
@@ -7,6 +8,179 @@
 # 2024 Changelog
+### ClickHouse release 24.5, 2024-05-30
+
+#### Backward Incompatible Change
+* Renamed "inverted indexes" to "full-text indexes", which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before the upgrade and re-create them after the upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
+* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because it is error-prone). Proper window functions should be used instead. To re-enable them, set `allow_deprecated_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
+* Queries from `system.columns` will work faster if there is a large number of columns, but many databases or tables lack the `SHOW TABLES` grant. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, the `system.columns` table would show these columns, but in the new version, it will skip the table entirely. Removed the trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Setting `replace_long_file_name_to_hash` is enabled by default for `MergeTree` tables. [#64457](https://github.com/ClickHouse/ClickHouse/pull/64457) ([Anton Popov](https://github.com/CurtizJ)). The data written with this setting can be read by server versions since 23.9. After you use ClickHouse with this setting enabled, you cannot downgrade to versions 23.8 and earlier.
+
+#### New Feature
+* Adds the `Form` format to read/write a single record in the `application/x-www-form-urlencoded` format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)).
+* Added the possibility to compress data in `CROSS JOIN`. [#60459](https://github.com/ClickHouse/ClickHouse/pull/60459) ([p1rattttt](https://github.com/p1rattttt)).
+* Added the possibility to spill `CROSS JOIN` to temporary files if its size exceeds limits. [#63432](https://github.com/ClickHouse/ClickHouse/pull/63432) ([p1rattttt](https://github.com/p1rattttt)).
+* Support JOIN with inequality conditions that involve columns from both the left and right tables, e.g. `t1.y < t2.y`. To enable, `SET allow_experimental_join_condition = 1`. [#60920](https://github.com/ClickHouse/ClickHouse/pull/60920) ([lgbo](https://github.com/lgbo-ustc)).
+* Maps can now have `Float32`, `Float64`, `Array(T)`, `Map(K, V)` and `Tuple(T1, T2, ...)` as keys. Closes [#54537](https://github.com/ClickHouse/ClickHouse/issues/54537). [#59318](https://github.com/ClickHouse/ClickHouse/pull/59318) ([李扬](https://github.com/taiyang-li)).
+* Introduce bulk loading to `EmbeddedRocksDB` by creating and ingesting SST files instead of relying on the RocksDB built-in memtable. This helps to increase import speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also introduces `EmbeddedRocksDB` table settings. [#59163](https://github.com/ClickHouse/ClickHouse/pull/59163) [#63324](https://github.com/ClickHouse/ClickHouse/pull/63324) ([Duc Canh Le](https://github.com/canhld94)).
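+
+A minimal sketch of the bulk-load path; the `optimize_for_bulk_insert` setting name is an assumption based on the linked PRs, so check them for the exact names:
+
+```sql
+-- Table-level settings attach via SETTINGS; the setting name below is assumed from #59163.
+CREATE TABLE rocks (key UInt64, value String)
+ENGINE = EmbeddedRocksDB
+PRIMARY KEY key
+SETTINGS optimize_for_bulk_insert = 1;
+
+-- Long-running inserts like this are what SST-based bulk loading speeds up:
+INSERT INTO rocks SELECT number, toString(number) FROM numbers(10000000);
+```
+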
+* Users can now parse CRLF line endings in the TSV format using the setting `input_format_tsv_crlf_end_of_line`. Closes [#56257](https://github.com/ClickHouse/ClickHouse/issues/56257). [#59747](https://github.com/ClickHouse/ClickHouse/pull/59747) ([Shaun Struwig](https://github.com/Blargian)).
+* Added a new setting `input_format_force_null_for_omitted_fields` that forces NULL values for omitted fields. [#60887](https://github.com/ClickHouse/ClickHouse/pull/60887) ([Constantine Peresypkin](https://github.com/pkit)).
+* Earlier, our S3 storage and the s3 table function didn't support selecting from archive files such as tarballs, zip, and 7z. Now they allow iterating over files inside archives in S3. [#62259](https://github.com/ClickHouse/ClickHouse/pull/62259) ([Daniil Ivanik](https://github.com/divanik)).
+* Support for the conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)).
+* Add `Npy` output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)).
+* Added the `Raw` format as a synonym for `TSVRaw`. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)).
+* Added a new SQL function `generateSnowflakeID` for generating Twitter-style Snowflake IDs. [#63577](https://github.com/ClickHouse/ClickHouse/pull/63577) ([Danila Puzov](https://github.com/kazalika)).
+* On Linux and macOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)).
+* Changed the warning on a high number of attached tables to differentiate between tables, views, and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
+* Added the SQL function `fromReadableSize` (along with `OrNull` and `OrZero` variants). It performs the opposite operation of the functions `formatReadableSize` and `formatReadableDecimalSize`: given a human-readable byte size, it returns the number of bytes. Example: `SELECT fromReadableSize('3.0 MiB')` returns `3145728`. [#64386](https://github.com/ClickHouse/ClickHouse/pull/64386) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
+* Provide support for the `azureBlobStorage` function in ClickHouse server to use Azure workload identity to authenticate against Azure Blob Storage. If the `use_workload_identity` parameter is set in the config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)).
+* Add TTL information to the `system.parts_columns` table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)).
+
+#### Experimental Features
+* Implement the `Dynamic` data type, which allows storing values of any type inside it without knowing all of them in advance. The `Dynamic` type is available under the setting `allow_experimental_dynamic_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#63058](https://github.com/ClickHouse/ClickHouse/pull/63058) ([Kruglov Pavel](https://github.com/Avogar)).
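+
+A minimal sketch of the `Dynamic` type (assuming a 24.5 server with the experimental setting enabled; `dynamicType` reports the type stored in each row):
+
+```sql
+SET allow_experimental_dynamic_type = 1;
+
+CREATE TABLE dyn_test (d Dynamic) ENGINE = Memory;
+
+-- Values of different types can land in the same column.
+INSERT INTO dyn_test VALUES (42), ('Hello'), ([1, 2, 3]);
+
+SELECT d, dynamicType(d) FROM dyn_test;
+```
+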
+* Allowed creating a `MaterializedMySQL` database without a connection to MySQL. [#63397](https://github.com/ClickHouse/ClickHouse/pull/63397) ([Kirill](https://github.com/kirillgarbar)).
+* Automatically mark a replica of a Replicated database as lost and start recovery if some DDL task fails more than `max_retries_before_automatic_recovery` (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. [#63549](https://github.com/ClickHouse/ClickHouse/pull/63549) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_tracked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)).
+
+#### Performance Improvement
+* Added a native Parquet reader, which can read Parquet binary data into ClickHouse columns directly. This feature can be activated by setting `input_format_parquet_use_native_reader` to true. [#60361](https://github.com/ClickHouse/ClickHouse/pull/60361) ([ZhiHong Zhang](https://github.com/copperybean)).
+* Less contention in the filesystem cache (part 4). Allow keeping the filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This releases pressure from space reservation for queries (on the `tryReserve` method). Also, this is done in a lock-free way as much as possible, i.e. it should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Skip merging of newly created projection blocks during `INSERT`s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)).
+* Process `...UTF8` string functions on an ASCII-only fast path if the input strings are all ASCII characters. Inspired by https://github.com/apache/doris/pull/29799. Overall speedup of 1.07x~1.62x. Note that peak memory usage has decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)).
+* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)).
+* `HostResolver` no longer stores each IP address several times. Previously, if a remote host had several IPs and, for some reason (firewall rules, for example), access to some IPs was allowed while access to others was forbidden, only the first record of a forbidden IP was marked as failed, so on each retry those IPs had a chance to be chosen (and to fail again). Moreover, the DNS cache was dropped every 120 seconds, after which the IPs could be chosen again. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)).
+* Function `splitByRegexp` is now faster when the regular expression argument is a single-character, trivial regular expression (in this case, it now falls back internally to `splitByChar`). [#62696](https://github.com/ClickHouse/ClickHouse/pull/62696) ([Robert Schulze](https://github.com/rschu1ze)).
+* Aggregation with 8-bit and 16-bit keys became faster: added min/max in FixedHashTable to limit the array index and reduce `isZero()` calls during iteration. [#62746](https://github.com/ClickHouse/ClickHouse/pull/62746) ([Jiebin Sun](https://github.com/jiebinn)).
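+
+To illustrate the `splitByRegexp` item above: a trivial single-character pattern now takes the same route as `splitByChar`, so the two calls below should return the same result (a sketch with made-up data):
+
+```sql
+SELECT splitByRegexp(',', 'a,b,c');  -- trivial pattern: falls back to splitByChar internally
+SELECT splitByChar(',', 'a,b,c');    -- ['a', 'b', 'c'] in both cases
+```
+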
+* Add a new configuration `prefer_merge_sort_block_bytes` to control memory usage and speed up sorting by 2x when merging and there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)).
+* `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)).
+* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Optimized vertical merges in tables with sparse columns. [#64311](https://github.com/ClickHouse/ClickHouse/pull/64311) ([Anton Popov](https://github.com/CurtizJ)).
+* Improve filtering of sparse columns: reduce redundant calls of `ColumnSparse::filter` to improve performance. [#64426](https://github.com/ClickHouse/ClickHouse/pull/64426) ([Jiebin Sun](https://github.com/jiebinn)).
+* Remove copying data when writing to the filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Now backups with Azure Blob Storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)).
+* Allow using native copy for Azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)).
+* Finally enable native copy for Azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)).
+* Improve the iteration over sparse columns to reduce calls of `size`. [#64497](https://github.com/ClickHouse/ClickHouse/pull/64497) ([Jiebin Sun](https://github.com/jiebinn)).
+
+#### Improvement
+* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Enable `plain_rewritable` metadata for local and Azure (`azure_blob_storage`) object storages. [#63365](https://github.com/ClickHouse/ClickHouse/pull/63365) ([Julia Kartseva](https://github.com/jkartseva)).
+* Support English-style Unicode quotes, e.g. “Hello”, ‘world’. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes [#58634](https://github.com/ClickHouse/ClickHouse/issues/58634). [#63381](https://github.com/ClickHouse/ClickHouse/pull/63381) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
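+
+For example, a query pasted from a word processor with curly quotes should now parse (a sketch; our assumption is that the Unicode quotes are treated like their ASCII counterparts):
+
+```sql
+SELECT ‘Hello, world!’ AS greeting;  -- same as SELECT 'Hello, world!' AS greeting;
+```
+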
+* Allow trailing commas in the columns list in the INSERT query. For example, `INSERT INTO test (a, b, c, ) VALUES ...`. [#63803](https://github.com/ClickHouse/ClickHouse/pull/63803) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Better exception messages for the `Regexp` format. [#63804](https://github.com/ClickHouse/ClickHouse/pull/63804) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Allow trailing commas in the `Values` format. For example, this query is allowed: `INSERT INTO test (a, b, c) VALUES (4, 5, 6,);`. [#63810](https://github.com/ClickHouse/ClickHouse/pull/63810) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Make RabbitMQ nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes [#60460](https://github.com/ClickHouse/ClickHouse/issues/60460). [#60468](https://github.com/ClickHouse/ClickHouse/pull/60468) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Distinct messages for the S3 error 'no key' for the disk and storage cases. [#61108](https://github.com/ClickHouse/ClickHouse/pull/61108) ([Sema Checherinda](https://github.com/CheSema)).
+* The progress bar will work for trivial queries with LIMIT from `system.zeros`, `system.zeros_mt` (it already works for `system.numbers` and `system.numbers_mt`), and the `generateRandom` table function. As a bonus, if the total number of records is greater than the `max_rows_to_read` limit, it will throw an exception earlier. This closes [#58183](https://github.com/ClickHouse/ClickHouse/issues/58183). [#61823](https://github.com/ClickHouse/ClickHouse/pull/61823) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Support for "Merge Key" in YAML configurations (this is a weird feature of YAML, please never mind). [#62685](https://github.com/ClickHouse/ClickHouse/pull/62685) ([Azat Khuzhin](https://github.com/azat)).
+* Enhance the error message when a non-deterministic function is used with a Replicated source. [#62896](https://github.com/ClickHouse/ClickHouse/pull/62896) ([Grégoire Pineau](https://github.com/lyrixx)).
+* Fix the interserver secret for Distributed over Distributed from `remote`. [#63013](https://github.com/ClickHouse/ClickHouse/pull/63013) ([Azat Khuzhin](https://github.com/azat)).
+* Support `include_from` for YAML files. However, you are better off using `config.d`. [#63106](https://github.com/ClickHouse/ClickHouse/pull/63106) ([Eduard Karacharov](https://github.com/korowa)).
+* Keep the previous data in the terminal after picking from skim suggestions. [#63261](https://github.com/ClickHouse/ClickHouse/pull/63261) ([FlameFactory](https://github.com/FlameFactory)).
+* Width of fields (in Pretty formats or the `visibleWidth` function) now correctly ignores ANSI escape sequences. [#63270](https://github.com/ClickHouse/ClickHouse/pull/63270) ([Shaun Struwig](https://github.com/Blargian)).
+* Replace the error code `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` with more accurate error codes when appropriate. [#63406](https://github.com/ClickHouse/ClickHouse/pull/63406) ([Yohann Jardin](https://github.com/yohannj)).
+* `os_user` and `client_hostname` are now correctly set up for queries for command line suggestions in clickhouse-client. This closes [#63430](https://github.com/ClickHouse/ClickHouse/issues/63430). [#63433](https://github.com/ClickHouse/ClickHouse/pull/63433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Automatically correct `max_block_size` to the default value if it is zero. [#63587](https://github.com/ClickHouse/ClickHouse/pull/63587) ([Antonio Andelic](https://github.com/antonio2368)).
+* Add a `build_id` ALIAS column to `trace_log` to facilitate auto renaming upon detecting binary changes. This is to address [#52086](https://github.com/ClickHouse/ClickHouse/issues/52086). [#63656](https://github.com/ClickHouse/ClickHouse/pull/63656) ([Zimu Li](https://github.com/woodlzm)).
+* Enable the truncate operation for object storage disks. [#63693](https://github.com/ClickHouse/ClickHouse/pull/63693) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
+* The loading of the keywords list is now dependent on the server revision and will be disabled for old versions of the ClickHouse server. [#63786](https://github.com/ClickHouse/ClickHouse/pull/63786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* ClickHouse disks now read a server setting to obtain the actual metadata format version. [#63831](https://github.com/ClickHouse/ClickHouse/pull/63831) ([Sema Checherinda](https://github.com/CheSema)).
+* Disable pretty format restrictions (`output_format_pretty_max_rows`/`output_format_pretty_max_value_width`) when stdout is not a TTY. [#63942](https://github.com/ClickHouse/ClickHouse/pull/63942) ([Azat Khuzhin](https://github.com/azat)).
+* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)).
+* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Added the knob `metadata_storage_type` to keep free space on the metadata storage disk. [#64128](https://github.com/ClickHouse/ClickHouse/pull/64128) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
+* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Added metrics to track the number of directories created and removed by the `plain_rewritable` metadata storage, and the number of entries in the local-to-remote in-memory map. [#64175](https://github.com/ClickHouse/ClickHouse/pull/64175) ([Julia Kartseva](https://github.com/jkartseva)).
+* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)).
+* The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. `limit` or `additional_table_filters`) would affect the query result. [#64205](https://github.com/ClickHouse/ClickHouse/pull/64205) ([Robert Schulze](https://github.com/rschu1ze)).
+* Test that the non-standard error code `QPSLimitExceeded` is supported and is a retryable error. [#64225](https://github.com/ClickHouse/ClickHouse/pull/64225) ([Sema Checherinda](https://github.com/CheSema)).
+* Settings from the user config no longer affect merges and mutations for MergeTree on top of object storage. [#64456](https://github.com/ClickHouse/ClickHouse/pull/64456) ([alesapin](https://github.com/alesapin)).
+* Test that `totalqpslimitexceeded` is a retryable S3 error. [#64520](https://github.com/ClickHouse/ClickHouse/pull/64520) ([Sema Checherinda](https://github.com/CheSema)).
+
+#### Build/Testing/Packaging Improvement
+* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Experimentally support loongarch64 as a new platform for ClickHouse. [#63733](https://github.com/ClickHouse/ClickHouse/pull/63733) ([qiangxuhui](https://github.com/qiangxuhui)).
+* The Dockerfile is reviewed by the Docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
+* Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes [#63494](https://github.com/ClickHouse/ClickHouse/issues/63494). [#63495](https://github.com/ClickHouse/ClickHouse/pull/63495) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Update the Apache Datasketches library. It resolves [#63858](https://github.com/ClickHouse/ClickHouse/issues/63858). [#63923](https://github.com/ClickHouse/ClickHouse/pull/63923) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Enable GRPC support for aarch64 Linux while cross-compiling the binary. [#64072](https://github.com/ClickHouse/ClickHouse/pull/64072) ([alesapin](https://github.com/alesapin)).
+* Fix unwind on SIGSEGV on aarch64 (due to a small signal stack) [#64058](https://github.com/ClickHouse/ClickHouse/pull/64058) ([Azat Khuzhin](https://github.com/azat)).
+
+#### Bug Fix
+* Disabled the `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). [#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Fix making backups when multiple shards are used [#57684](https://github.com/ClickHouse/ClickHouse/pull/57684) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix passing projections/indexes/primary key from the columns list of a CREATE query into the inner table of a materialized view [#59183](https://github.com/ClickHouse/ClickHouse/pull/59183) ([Azat Khuzhin](https://github.com/azat)).
+* Fix incorrect merging of `boundRatio` aggregate function states [#60532](https://github.com/ClickHouse/ClickHouse/pull/60532) ([Tao Wang](https://github.com/wangtZJU)).
+* Fix crash when calling some functions on const low-cardinality columns [#61966](https://github.com/ClickHouse/ClickHouse/pull/61966) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix queries with FINAL giving wrong results when the table does not use adaptive granularity [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)).
+* Improve detection of cgroups v2 support for memory controllers [#62903](https://github.com/ClickHouse/ClickHouse/pull/62903) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fix subsequent use of external tables in client [#62964](https://github.com/ClickHouse/ClickHouse/pull/62964) ([Azat Khuzhin](https://github.com/azat)).
+* Fix crash with untuple and unresolved lambda [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix premature server listening for connections [#63181](https://github.com/ClickHouse/ClickHouse/pull/63181) ([alesapin](https://github.com/alesapin)).
+* Fix intersecting parts when restarting after a DROP PART command [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)).
+* Correctly load SQL security defaults during startup [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)).
+* Fix JOIN filter push down (filter join) [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix infinite loop in AzureObjectStorage::listObjects [#63257](https://github.com/ClickHouse/ClickHouse/pull/63257) ([Julia Kartseva](https://github.com/jkartseva)).
+* Make CROSS JOIN ignore the `join_algorithm` setting [#63273](https://github.com/ClickHouse/ClickHouse/pull/63273) ([vdimir](https://github.com/vdimir)).
+* Fix finalization of WriteBufferToFileSegment and StatusFile [#63346](https://github.com/ClickHouse/ClickHouse/pull/63346) ([vdimir](https://github.com/vdimir)).
+* Fix logical error during SELECT query after ALTER in a rare case [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)).
+* Fix `X-ClickHouse-Timezone` header with `session_timezone` [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)).
+* Fix debug assert when using grouping WITH ROLLUP and LowCardinality types [#63398](https://github.com/ClickHouse/ClickHouse/pull/63398) ([Raúl Marín](https://github.com/Algunenano)).
+* Small fixes for `group_by_use_nulls` [#63405](https://github.com/ClickHouse/ClickHouse/pull/63405) ([vdimir](https://github.com/vdimir)).
+* Fix backup/restore of a projection part in case the projection was removed from the table metadata but the part still has the projection [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix the MySQL dictionary source [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)).
+* Insert QueryFinish on AsyncInsertFlush with no data [#63483](https://github.com/ClickHouse/ClickHouse/pull/63483) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix empty `used_dictionaries` in `system.query_log` [#63487](https://github.com/ClickHouse/ClickHouse/pull/63487) ([Eduard Karacharov](https://github.com/korowa)).
+* Make `MergeTreePrefetchedReadPool` safer [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix crash on exit with Sentry enabled (due to OpenSSL being destroyed before Sentry) [#63548](https://github.com/ClickHouse/ClickHouse/pull/63548) ([Azat Khuzhin](https://github.com/azat)).
+* Fix Array and Map support with keyed hashing [#63628](https://github.com/ClickHouse/ClickHouse/pull/63628) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
+* Fix filter pushdown for Parquet and maybe StorageMerge [#63642](https://github.com/ClickHouse/ClickHouse/pull/63642) ([Michael Kolupaev](https://github.com/al13n321)).
+* Prevent conversion to Replicated if the ZooKeeper path already exists [#63670](https://github.com/ClickHouse/ClickHouse/pull/63670) ([Kirill](https://github.com/kirillgarbar)).
+* Analyzer: views read only necessary columns [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)).
+* Analyzer: Forbid WINDOW redefinition [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)).
+* flatten_nested was broken with the experimental Replicated database. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix [#63653](https://github.com/ClickHouse/ClickHouse/issues/63653) [#63722](https://github.com/ClickHouse/ClickHouse/pull/63722) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Allow cast from Array(Nothing) to Map(Nothing, Nothing) [#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix ILLEGAL_COLUMN in partial_merge join [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)).
+* Fix: remove redundant DISTINCT with window functions [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)).
+* Fix possible crash with SYSTEM UNLOAD PRIMARY KEY [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix a query with a duplicating cyclic alias. [#63791](https://github.com/ClickHouse/ClickHouse/pull/63791) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Make `TokenIterator` lazy as it should be [#63801](https://github.com/ClickHouse/ClickHouse/pull/63801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Add `endpoint_subpath` S3 URI setting [#63806](https://github.com/ClickHouse/ClickHouse/pull/63806) ([Julia Kartseva](https://github.com/jkartseva)).
+* Fix deadlock in `ParallelReadBuffer` [#63814](https://github.com/ClickHouse/ClickHouse/pull/63814) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix JOIN filter push down for equivalent columns [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)).
+* Remove data from all disks after DROP with the Lazy database. [#63848](https://github.com/ClickHouse/ClickHouse/pull/63848) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
+* Fix incorrect result when reading from an MV with parallel replicas and the new analyzer [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)).
+* Fixes in the `find_super_nodes` and `find_big_family` commands of keeper-client [#63862](https://github.com/ClickHouse/ClickHouse/pull/63862) ([Alexander Gololobov](https://github.com/davenger)).
+* Update lambda execution name [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix SIGSEGV due to CPU/Real profiler [#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)).
+* Fix `EXPLAIN CURRENT TRANSACTION` query [#63926](https://github.com/ClickHouse/ClickHouse/pull/63926) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix analyzer: there's turtles all the way down... [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Allow certain ALTER TABLE commands for `plain_rewritable` disk [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)).
+* Fix recursive CTEs with Distributed tables [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix reading of columns of type `Tuple(Map(LowCardinality(...)))` [#63956](https://github.com/ClickHouse/ClickHouse/pull/63956) ([Anton Popov](https://github.com/CurtizJ)).
+* Analyzer: Fix COLUMNS resolve [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
+* Fix LIMIT BY and `skip_unused_shards` with the analyzer [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* A fix for some trash (experimental Kusto) [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)).
+* Deserialize untrusted binary inputs in a safer way [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fix query analysis for queries with the setting `final` = 1 for Distributed tables over tables from other than the MergeTree family. [#64037](https://github.com/ClickHouse/ClickHouse/pull/64037) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Add missing settings to recoverLostReplica [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix SQL security access checks with the analyzer [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)).
+* Fix analyzer: only the interpolate expression should be used for the DAG [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fix Azure backup writing multipart blocks of 1 MiB (read buffer size) instead of `max_upload_part_size` (in the non-native copy case) [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Correctly fall back during backup copy [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)).
+* Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
+* Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
+* Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix ARRAY JOIN with Distributed. [#64226](https://github.com/ClickHouse/ClickHouse/pull/64226) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix: CNF with mutually exclusive atoms reduction [#64256](https://github.com/ClickHouse/ClickHouse/pull/64256) ([Eduard Karacharov](https://github.com/korowa)).
+* Fix logical error "Bad cast" for Buffer table with PREWHERE. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+
+
 ### ClickHouse release 24.4, 2024-04-30
 
 #### Upgrade Notes
diff --git a/contrib/aws b/contrib/aws
index eb96e740453..deeaa9e7c5f 160000
--- a/contrib/aws
+++ b/contrib/aws
@@ -1 +1 @@
-Subproject commit eb96e740453ae27afa1f367ba19f99bdcb38484d
+Subproject commit deeaa9e7c5fe690e3dacc4005d7ecfa7a66a32bb
diff --git a/docs/_includes/install/deb_repo.sh b/docs/_includes/install/deb_repo.sh
deleted file mode 100644
index 21106e9fc47..00000000000
--- a/docs/_includes/install/deb_repo.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-sudo apt-get install apt-transport-https ca-certificates dirmngr
-sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
-
-echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
-    /etc/apt/sources.list.d/clickhouse.list
-sudo apt-get update
-
-sudo apt-get install -y clickhouse-server clickhouse-client
-
-sudo service clickhouse-server start
-clickhouse-client # or "clickhouse-client --password" if you set up a password.
diff --git a/docs/_includes/install/rpm_repo.sh b/docs/_includes/install/rpm_repo.sh
deleted file mode 100644
index e3fd1232047..00000000000
--- a/docs/_includes/install/rpm_repo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-sudo yum install yum-utils
-sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
-sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
-sudo yum install clickhouse-server clickhouse-client
-
-sudo /etc/init.d/clickhouse-server start
-clickhouse-client # or "clickhouse-client --password" if you set up a password.
diff --git a/docs/_includes/install/tgz_repo.sh b/docs/_includes/install/tgz_repo.sh
deleted file mode 100644
index 0994510755b..00000000000
--- a/docs/_includes/install/tgz_repo.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \
-    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
-curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
-curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
-curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
-curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
-
-tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
-sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
-
-tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
-sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
-
-tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
-sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
-sudo /etc/init.d/clickhouse-server start
-
-tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
-sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
diff --git a/docs/changelogs/v23.8.1.2992-lts.md b/docs/changelogs/v23.8.1.2992-lts.md
index 05385d9c52b..62326533a79 100644
--- a/docs/changelogs/v23.8.1.2992-lts.md
+++ b/docs/changelogs/v23.8.1.2992-lts.md
@@ -33,7 +33,7 @@ sidebar_label: 2023
 * Add input format One that doesn't read any data and always returns single row with column `dummy` with type `UInt8` and value `0` like `system.one`. It can be used together with `_file/_path` virtual columns to list files in file/s3/url/hdfs/etc table functions without reading any data. [#53209](https://github.com/ClickHouse/ClickHouse/pull/53209) ([Kruglov Pavel](https://github.com/Avogar)).
 * Add tupleConcat function. Closes [#52759](https://github.com/ClickHouse/ClickHouse/issues/52759). [#53239](https://github.com/ClickHouse/ClickHouse/pull/53239) ([Nikolay Degterinsky](https://github.com/evillique)).
 * Support `TRUNCATE DATABASE` operation. [#53261](https://github.com/ClickHouse/ClickHouse/pull/53261) ([Bharat Nallan](https://github.com/bharatnc)).
-* Add max_threads_for_indexes setting to limit number of threads used for primary key processing. [#53313](https://github.com/ClickHouse/ClickHouse/pull/53313) ([jorisgio](https://github.com/jorisgio)).
+* Add max_threads_for_indexes setting to limit number of threads used for primary key processing. [#53313](https://github.com/ClickHouse/ClickHouse/pull/53313) ([Joris Giovannangeli](https://github.com/jorisgio)).
 * Add experimental support for HNSW as approximate neighbor search method. [#53447](https://github.com/ClickHouse/ClickHouse/pull/53447) ([Davit Vardanyan](https://github.com/davvard)).
 * Re-add SipHash keyed functions. [#53525](https://github.com/ClickHouse/ClickHouse/pull/53525) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
 * ([#52755](https://github.com/ClickHouse/ClickHouse/issues/52755) , [#52895](https://github.com/ClickHouse/ClickHouse/issues/52895)) Added functions `arrayRotateLeft`, `arrayRotateRight`, `arrayShiftLeft`, `arrayShiftRight`. [#53557](https://github.com/ClickHouse/ClickHouse/pull/53557) ([Mikhail Koviazin](https://github.com/mkmkme)).
@@ -72,7 +72,7 @@ sidebar_label: 2023
 * Add ability to log when max_partitions_per_insert_block is reached ... [#50948](https://github.com/ClickHouse/ClickHouse/pull/50948) ([Sean Haynes](https://github.com/seandhaynes)).
 * Added a bunch of custom commands (mostly to make ClickHouse debugging easier). [#51117](https://github.com/ClickHouse/ClickHouse/pull/51117) ([pufit](https://github.com/pufit)).
 * Updated check for connection_string as connection string with sas does not always begin with DefaultEndPoint and updated connection url to include sas token after adding container to url. [#51141](https://github.com/ClickHouse/ClickHouse/pull/51141) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
-* Fix description for filtering sets in full_sorting_merge join. [#51329](https://github.com/ClickHouse/ClickHouse/pull/51329) ([Tanay Tummalapalli](https://github.com/ttanay)).
+* Fix description for filtering sets in full_sorting_merge join. [#51329](https://github.com/ClickHouse/ClickHouse/pull/51329) ([ttanay](https://github.com/ttanay)).
 * The sizes of the (index) uncompressed/mark, mmap and query caches can now be configured dynamically at runtime. [#51446](https://github.com/ClickHouse/ClickHouse/pull/51446) ([Robert Schulze](https://github.com/rschu1ze)).
 * Fixed memory consumption in `Aggregator` when `max_block_size` is huge. [#51566](https://github.com/ClickHouse/ClickHouse/pull/51566) ([Nikita Taranov](https://github.com/nickitat)).
 * Add `SYSTEM SYNC FILESYSTEM CACHE` command. It will compare in-memory state of filesystem cache with what it has on disk and fix in-memory state if needed. [#51622](https://github.com/ClickHouse/ClickHouse/pull/51622) ([Kseniia Sumarokova](https://github.com/kssenii)).
@@ -80,10 +80,10 @@ sidebar_label: 2023
 * Support reading tuple subcolumns from file/s3/hdfs/url/azureBlobStorage table functions. [#51806](https://github.com/ClickHouse/ClickHouse/pull/51806) ([Kruglov Pavel](https://github.com/Avogar)).
 * Function `arrayIntersect` now returns the values sorted like the first argument. Closes [#27622](https://github.com/ClickHouse/ClickHouse/issues/27622). [#51850](https://github.com/ClickHouse/ClickHouse/pull/51850) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
 * Add new queries, which allow to create/drop of access entities in specified access storage or move access entities from one access storage to another. [#51912](https://github.com/ClickHouse/ClickHouse/pull/51912) ([pufit](https://github.com/pufit)).
-* ALTER TABLE FREEZE are not replicated in Replicated engine. [#52064](https://github.com/ClickHouse/ClickHouse/pull/52064) ([Mike Kot](https://github.com/myrrc)).
+* ALTER TABLE FREEZE are not replicated in Replicated engine. [#52064](https://github.com/ClickHouse/ClickHouse/pull/52064) ([Mikhail Kot](https://github.com/myrrc)).
 * Added possibility to flush logs to the disk on crash - Added logs buffer configuration. [#52174](https://github.com/ClickHouse/ClickHouse/pull/52174) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
-* Fix S3 table function does not work for pre-signed URL. close [#50846](https://github.com/ClickHouse/ClickHouse/issues/50846). [#52310](https://github.com/ClickHouse/ClickHouse/pull/52310) ([chen](https://github.com/xiedeyantu)).
-* System.events and system.metrics tables add column name as an alias to event and metric. close [#51257](https://github.com/ClickHouse/ClickHouse/issues/51257). [#52315](https://github.com/ClickHouse/ClickHouse/pull/52315) ([chen](https://github.com/xiedeyantu)).
+* Fix S3 table function does not work for pre-signed URL. close [#50846](https://github.com/ClickHouse/ClickHouse/issues/50846). [#52310](https://github.com/ClickHouse/ClickHouse/pull/52310) ([Jensen](https://github.com/xiedeyantu)).
+* System.events and system.metrics tables add column name as an alias to event and metric. close [#51257](https://github.com/ClickHouse/ClickHouse/issues/51257). [#52315](https://github.com/ClickHouse/ClickHouse/pull/52315) ([Jensen](https://github.com/xiedeyantu)).
 * Added support of syntax `CREATE UNIQUE INDEX` in parser for better SQL compatibility. `UNIQUE` index is not supported. Set `create_index_ignore_unique=1` to ignore UNIQUE keyword in queries. [#52320](https://github.com/ClickHouse/ClickHouse/pull/52320) ([Ilya Yatsishin](https://github.com/qoega)).
 * Add support of predefined macro (`{database}` and `{table}`) in some kafka engine settings: topic, consumer, client_id, etc. [#52386](https://github.com/ClickHouse/ClickHouse/pull/52386) ([Yury Bogomolov](https://github.com/ybogo)).
 * Disable updating fs cache during backup/restore. Filesystem cache must not be updated during backup/restore, it seems it just slows down the process without any profit (because the BACKUP command can read a lot of data and it's no use to put all the data to the filesystem cache and immediately evict it). [#52402](https://github.com/ClickHouse/ClickHouse/pull/52402) ([Vitaly Baranov](https://github.com/vitlibar)).
@@ -107,7 +107,7 @@ sidebar_label: 2023
 * Use the same default paths for `clickhouse_keeper` (symlink) as for `clickhouse_keeper` (executable). [#52861](https://github.com/ClickHouse/ClickHouse/pull/52861) ([Vitaly Baranov](https://github.com/vitlibar)).
 * CVE-2016-2183: disable 3DES. [#52893](https://github.com/ClickHouse/ClickHouse/pull/52893) ([Kenji Noguchi](https://github.com/knoguchi)).
 * Load filesystem cache metadata on startup in parallel. Configured by `load_metadata_threads` (default: 1) cache config setting. Related to [#52037](https://github.com/ClickHouse/ClickHouse/issues/52037). [#52943](https://github.com/ClickHouse/ClickHouse/pull/52943) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Improve error message for table function remote. Closes [#40220](https://github.com/ClickHouse/ClickHouse/issues/40220). [#52959](https://github.com/ClickHouse/ClickHouse/pull/52959) ([jiyoungyoooo](https://github.com/jiyoungyoooo)).
+* Improve error message for table function remote. Closes [#40220](https://github.com/ClickHouse/ClickHouse/issues/40220). [#52959](https://github.com/ClickHouse/ClickHouse/pull/52959) ([Jiyoung Yoo](https://github.com/jiyoungyoooo)).
 * Added the possibility to specify custom storage policy in the `SETTINGS` clause of `RESTORE` queries. [#52970](https://github.com/ClickHouse/ClickHouse/pull/52970) ([Victor Krasnov](https://github.com/sirvickr)).
 * Add the ability to throttle the S3 requests on backup operations (`BACKUP` and `RESTORE` commands now honor `s3_max_[get/put]_[rps/burst]`). [#52974](https://github.com/ClickHouse/ClickHouse/pull/52974) ([Daniel Pozo Escalona](https://github.com/danipozo)).
 * Add settings to ignore ON CLUSTER clause in queries for management of replicated user-defined functions or access control entities with replicated storage. [#52975](https://github.com/ClickHouse/ClickHouse/pull/52975) ([Aleksei Filatov](https://github.com/aalexfvk)).
@@ -127,7 +127,7 @@ sidebar_label: 2023
 * Server settings asynchronous_metrics_update_period_s and asynchronous_heavy_metrics_update_period_s configured to 0 now fail gracefully instead of crash the server. [#53428](https://github.com/ClickHouse/ClickHouse/pull/53428) ([Robert Schulze](https://github.com/rschu1ze)).
 * Previously the caller could register the same watch callback multiple times. In that case each entry was consuming memory and the same callback was called multiple times which didn't make much sense. In order to avoid this the caller could have some logic to not add the same watch multiple times. With this change this deduplication is done internally if the watch callback is passed via shared_ptr. [#53452](https://github.com/ClickHouse/ClickHouse/pull/53452) ([Alexander Gololobov](https://github.com/davenger)).
 * The ClickHouse server now respects memory limits changed via cgroups when reloading its configuration. [#53455](https://github.com/ClickHouse/ClickHouse/pull/53455) ([Robert Schulze](https://github.com/rschu1ze)).
-* Add ability to turn off flush of Distributed tables on `DETACH`/`DROP`/server shutdown. [#53501](https://github.com/ClickHouse/ClickHouse/pull/53501) ([Azat Khuzhin](https://github.com/azat)).
+* Add ability to turn off flush of Distributed tables on `DETACH`/`DROP`/server shutdown (`flush_on_detach` setting for `Distributed`). [#53501](https://github.com/ClickHouse/ClickHouse/pull/53501) ([Azat Khuzhin](https://github.com/azat)).
 * Domainrfc support ipv6(ip literal within square brackets). [#53506](https://github.com/ClickHouse/ClickHouse/pull/53506) ([Chen768959](https://github.com/Chen768959)).
 * Use filter by file/path before reading in url/file/hdfs table functions. [#53529](https://github.com/ClickHouse/ClickHouse/pull/53529) ([Kruglov Pavel](https://github.com/Avogar)).
 * Use longer timeout for S3 CopyObject requests. [#53533](https://github.com/ClickHouse/ClickHouse/pull/53533) ([Michael Kolupaev](https://github.com/al13n321)).
@@ -186,71 +186,71 @@ sidebar_label: 2023
 #### Bug Fix (user-visible misbehavior in an official stable release)
-* Do not reset Annoy index during build-up with > 1 mark [#51325](https://github.com/ClickHouse/ClickHouse/pull/51325) ([Tian Xinhui](https://github.com/xinhuitian)).
-* Fix usage of temporary directories during RESTORE [#51493](https://github.com/ClickHouse/ClickHouse/pull/51493) ([Azat Khuzhin](https://github.com/azat)).
-* Fix binary arithmetic for Nullable(IPv4) [#51642](https://github.com/ClickHouse/ClickHouse/pull/51642) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
-* Support IPv4 and IPv6 as dictionary attributes [#51756](https://github.com/ClickHouse/ClickHouse/pull/51756) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
-* Bug fix for checksum of compress marks [#51777](https://github.com/ClickHouse/ClickHouse/pull/51777) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
-* Fix mistakenly comma parsing as part of datetime in CSV best effort parsing [#51950](https://github.com/ClickHouse/ClickHouse/pull/51950) ([Kruglov Pavel](https://github.com/Avogar)).
-* Don't throw exception when exec udf has parameters [#51961](https://github.com/ClickHouse/ClickHouse/pull/51961) ([Nikita Taranov](https://github.com/nickitat)).
-* Fix recalculation of skip indexes and projections in `ALTER DELETE` queries [#52530](https://github.com/ClickHouse/ClickHouse/pull/52530) ([Anton Popov](https://github.com/CurtizJ)).
-* MaterializedMySQL: Fix the infinite loop in ReadBuffer::read [#52621](https://github.com/ClickHouse/ClickHouse/pull/52621) ([Val Doroshchuk](https://github.com/valbok)).
-* Load suggestion only with `clickhouse` dialect [#52628](https://github.com/ClickHouse/ClickHouse/pull/52628) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
-* init and destroy ares channel on demand.. [#52634](https://github.com/ClickHouse/ClickHouse/pull/52634) ([Arthur Passos](https://github.com/arthurpassos)).
-* RFC: Fix filtering by virtual columns with OR expression [#52653](https://github.com/ClickHouse/ClickHouse/pull/52653) ([Azat Khuzhin](https://github.com/azat)).
-* Fix crash in function `tuple` with one sparse column argument [#52659](https://github.com/ClickHouse/ClickHouse/pull/52659) ([Anton Popov](https://github.com/CurtizJ)).
-* Fix named collections on cluster 23.7 [#52687](https://github.com/ClickHouse/ClickHouse/pull/52687) ([Al Korgun](https://github.com/alkorgun)).
-* Fix reading of unnecessary column in case of multistage `PREWHERE` [#52689](https://github.com/ClickHouse/ClickHouse/pull/52689) ([Anton Popov](https://github.com/CurtizJ)).
-* Fix unexpected sort result on multi columns with nulls first direction [#52761](https://github.com/ClickHouse/ClickHouse/pull/52761) ([copperybean](https://github.com/copperybean)).
-* Fix data race in Keeper reconfiguration [#52804](https://github.com/ClickHouse/ClickHouse/pull/52804) ([Antonio Andelic](https://github.com/antonio2368)).
-* Fix sorting of sparse columns with large limit [#52827](https://github.com/ClickHouse/ClickHouse/pull/52827) ([Anton Popov](https://github.com/CurtizJ)).
-* clickhouse-keeper: fix implementation of server with poll() [#52833](https://github.com/ClickHouse/ClickHouse/pull/52833) ([Andy Fiddaman](https://github.com/citrus-it)).
-* make regexp analyzer recognize named capturing groups [#52840](https://github.com/ClickHouse/ClickHouse/pull/52840) ([Han Fei](https://github.com/hanfei1991)).
-* Fix possible assert in ~PushingAsyncPipelineExecutor in clickhouse-local [#52862](https://github.com/ClickHouse/ClickHouse/pull/52862) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix reading of empty `Nested(Array(LowCardinality(...)))` [#52949](https://github.com/ClickHouse/ClickHouse/pull/52949) ([Anton Popov](https://github.com/CurtizJ)).
-* Added new tests for session_log and fixed the inconsistency between login and logout. [#52958](https://github.com/ClickHouse/ClickHouse/pull/52958) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
-* Fix password leak in show create mysql table [#52962](https://github.com/ClickHouse/ClickHouse/pull/52962) ([Duc Canh Le](https://github.com/canhld94)).
-* Convert sparse to full in CreateSetAndFilterOnTheFlyStep [#53000](https://github.com/ClickHouse/ClickHouse/pull/53000) ([vdimir](https://github.com/vdimir)).
-* Fix rare race condition with empty key prefix directory deletion in fs cache [#53055](https://github.com/ClickHouse/ClickHouse/pull/53055) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix ZstdDeflatingWriteBuffer truncating the output sometimes [#53064](https://github.com/ClickHouse/ClickHouse/pull/53064) ([Michael Kolupaev](https://github.com/al13n321)).
-* Fix query_id in part_log with async flush queries [#53103](https://github.com/ClickHouse/ClickHouse/pull/53103) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix possible error from cache "Read unexpected size" [#53121](https://github.com/ClickHouse/ClickHouse/pull/53121) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Disable the new parquet encoder [#53130](https://github.com/ClickHouse/ClickHouse/pull/53130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Not-ready Set [#53162](https://github.com/ClickHouse/ClickHouse/pull/53162) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix character escaping in the PostgreSQL engine [#53250](https://github.com/ClickHouse/ClickHouse/pull/53250) ([Nikolay Degterinsky](https://github.com/evillique)).
-* #2 Added new tests for session_log and fixed the inconsistency between login and logout. [#53255](https://github.com/ClickHouse/ClickHouse/pull/53255) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
-* #3 Fixed inconsistency between login success and logout [#53302](https://github.com/ClickHouse/ClickHouse/pull/53302) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
-* Fix adding sub-second intervals to DateTime [#53309](https://github.com/ClickHouse/ClickHouse/pull/53309) ([Michael Kolupaev](https://github.com/al13n321)).
-* Fix "Context has expired" error in dictionaries [#53342](https://github.com/ClickHouse/ClickHouse/pull/53342) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Fix incorrect normal projection AST format [#53347](https://github.com/ClickHouse/ClickHouse/pull/53347) ([Amos Bird](https://github.com/amosbird)).
-* Forbid use_structure_from_insertion_table_in_table_functions when execute Scalar [#53348](https://github.com/ClickHouse/ClickHouse/pull/53348) ([flynn](https://github.com/ucasfl)).
-* Fix loading lazy database during system.table select query [#53372](https://github.com/ClickHouse/ClickHouse/pull/53372) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
-* Fixed system.data_skipping_indices for MaterializedMySQL [#53381](https://github.com/ClickHouse/ClickHouse/pull/53381) ([Filipp Ozinov](https://github.com/bakwc)).
-* Fix processing single carriage return in TSV file segmentation engine [#53407](https://github.com/ClickHouse/ClickHouse/pull/53407) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix 'Context has expired' error properly [#53433](https://github.com/ClickHouse/ClickHouse/pull/53433) ([Michael Kolupaev](https://github.com/al13n321)).
-* Fix timeout_overflow_mode when having subquery in the rhs of IN [#53439](https://github.com/ClickHouse/ClickHouse/pull/53439) ([Duc Canh Le](https://github.com/canhld94)).
-* Fix an unexpected behavior in [#53152](https://github.com/ClickHouse/ClickHouse/issues/53152) [#53440](https://github.com/ClickHouse/ClickHouse/pull/53440) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
-* Fix JSON_QUERY Function parse error while path is all number [#53470](https://github.com/ClickHouse/ClickHouse/pull/53470) ([KevinyhZou](https://github.com/KevinyhZou)).
-* Fix wrong columns order for queries with parallel FINAL. [#53489](https://github.com/ClickHouse/ClickHouse/pull/53489) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fixed SELECTing from ReplacingMergeTree with do_not_merge_across_partitions_select_final [#53511](https://github.com/ClickHouse/ClickHouse/pull/53511) ([Vasily Nemkov](https://github.com/Enmk)).
-* bugfix: Flush async insert queue first on shutdown [#53547](https://github.com/ClickHouse/ClickHouse/pull/53547) ([joelynch](https://github.com/joelynch)).
-* Fix crash in join on sparse column [#53548](https://github.com/ClickHouse/ClickHouse/pull/53548) ([vdimir](https://github.com/vdimir)).
-* Fix possible UB in Set skipping index for functions with incorrect args [#53559](https://github.com/ClickHouse/ClickHouse/pull/53559) ([Azat Khuzhin](https://github.com/azat)).
-* Fix possible UB in inverted indexes (experimental feature) [#53560](https://github.com/ClickHouse/ClickHouse/pull/53560) ([Azat Khuzhin](https://github.com/azat)).
-* Fix: interpolate expression takes source column instead of same name aliased from select expression. [#53572](https://github.com/ClickHouse/ClickHouse/pull/53572) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
-* Fix number of dropped granules in EXPLAIN PLAN index=1 [#53616](https://github.com/ClickHouse/ClickHouse/pull/53616) ([wangxiaobo](https://github.com/wzb5212)).
-* Correctly handle totals and extremes with `DelayedSource` [#53644](https://github.com/ClickHouse/ClickHouse/pull/53644) ([Antonio Andelic](https://github.com/antonio2368)).
-* Prepared set cache in mutation pipeline stuck [#53645](https://github.com/ClickHouse/ClickHouse/pull/53645) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix bug on mutations with subcolumns of type JSON in predicates of UPDATE and DELETE queries. [#53677](https://github.com/ClickHouse/ClickHouse/pull/53677) ([VanDarkholme7](https://github.com/VanDarkholme7)).
-* Fix filter pushdown for full_sorting_merge join [#53699](https://github.com/ClickHouse/ClickHouse/pull/53699) ([vdimir](https://github.com/vdimir)).
-* Try to fix bug with NULL::LowCardinality(Nullable(...)) NOT IN [#53706](https://github.com/ClickHouse/ClickHouse/pull/53706) ([Andrey Zvonov](https://github.com/zvonand)).
-* Fix: sorted distinct with sparse columns [#53711](https://github.com/ClickHouse/ClickHouse/pull/53711) ([Igor Nikonov](https://github.com/devcrafter)).
-* transform: correctly handle default column with multiple rows [#53742](https://github.com/ClickHouse/ClickHouse/pull/53742) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
-* Fix fuzzer crash in parseDateTime() [#53764](https://github.com/ClickHouse/ClickHouse/pull/53764) ([Robert Schulze](https://github.com/rschu1ze)).
-* Materialized postgres: fix uncaught exception in getCreateTableQueryImpl [#53832](https://github.com/ClickHouse/ClickHouse/pull/53832) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix possible segfault while using PostgreSQL engine [#53847](https://github.com/ClickHouse/ClickHouse/pull/53847) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix named_collection_admin alias [#54066](https://github.com/ClickHouse/ClickHouse/pull/54066) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix rows_before_limit_at_least for DelayedSource. [#54122](https://github.com/ClickHouse/ClickHouse/pull/54122) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix results of queries utilizing the Annoy index when the part has more than one mark. [#51325](https://github.com/ClickHouse/ClickHouse/pull/51325) ([Tian Xinhui](https://github.com/xinhuitian)).
+* Fix usage of temporary directories during RESTORE. [#51493](https://github.com/ClickHouse/ClickHouse/pull/51493) ([Azat Khuzhin](https://github.com/azat)).
+* Fixed binary arithmetic for Nullable(IPv4). [#51642](https://github.com/ClickHouse/ClickHouse/pull/51642) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Support IPv4 and IPv6 as dictionary attributes. [#51756](https://github.com/ClickHouse/ClickHouse/pull/51756) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Updated `checkDataPart` to read compressed marks as a compressed file by checking its extension. Resolves [#51337](https://github.com/ClickHouse/ClickHouse/issues/51337). [#51777](https://github.com/ClickHouse/ClickHouse/pull/51777) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Fix a comma being mistakenly parsed as part of a datetime in CSV best-effort datetime parsing. Closes [#51059](https://github.com/ClickHouse/ClickHouse/issues/51059). [#51950](https://github.com/ClickHouse/ClickHouse/pull/51950) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fixed an exception when an executable UDF was provided with a parameter. [#51961](https://github.com/ClickHouse/ClickHouse/pull/51961) ([Nikita Taranov](https://github.com/nickitat)).
+* Fixed recalculation of skip indexes and projections in `ALTER DELETE` queries. [#52530](https://github.com/ClickHouse/ClickHouse/pull/52530) ([Anton Popov](https://github.com/CurtizJ)).
+* Fixed an infinite loop in ReadBuffer when the position overflows the end of the buffer in MaterializedMySQL. [#52621](https://github.com/ClickHouse/ClickHouse/pull/52621) ([Val Doroshchuk](https://github.com/valbok)).
+* Do not try to load suggestions in `clickhouse-local` when the dialect is not `clickhouse`. [#52628](https://github.com/ClickHouse/ClickHouse/pull/52628) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Remove mutex from CaresPTRResolver and create `ares_channel` on demand. Trying to fix: https://github.com/ClickHouse/ClickHouse/pull/52327#issuecomment-1643021543. [#52634](https://github.com/ClickHouse/ClickHouse/pull/52634) ([Arthur Passos](https://github.com/arthurpassos)).
+* Fix filtering by virtual columns with an OR expression (e.g. by `_table` for the `Merge` engine). [#52653](https://github.com/ClickHouse/ClickHouse/pull/52653) ([Azat Khuzhin](https://github.com/azat)).
+* Fix crash in function `tuple` with one sparse column argument. [#52659](https://github.com/ClickHouse/ClickHouse/pull/52659) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix named-collection-related statements: `if [not] exists`, `on cluster`. Closes [#51609](https://github.com/ClickHouse/ClickHouse/issues/51609). [#52687](https://github.com/ClickHouse/ClickHouse/pull/52687) ([Al Korgun](https://github.com/alkorgun)).
+* Fix reading of unnecessary column in case of multistage `PREWHERE`. [#52689](https://github.com/ClickHouse/ClickHouse/pull/52689) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix unexpected sort results when sorting by multiple columns with the nulls-first direction. [#52761](https://github.com/ClickHouse/ClickHouse/pull/52761) ([ZhiHong Zhang](https://github.com/copperybean)).
+* Keeper fix: fix data race during reconfiguration. [#52804](https://github.com/ClickHouse/ClickHouse/pull/52804) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fixed sorting of sparse columns in case of `ORDER BY ... LIMIT n` clause and large values of `n`. [#52827](https://github.com/ClickHouse/ClickHouse/pull/52827) ([Anton Popov](https://github.com/CurtizJ)).
+* Keeper fix: platforms that used poll() would delay responding to requests until the client sent a heartbeat. [#52833](https://github.com/ClickHouse/ClickHouse/pull/52833) ([Andy Fiddaman](https://github.com/citrus-it)).
+* Make regexp analyzer recognize named capturing groups. [#52840](https://github.com/ClickHouse/ClickHouse/pull/52840) ([Han Fei](https://github.com/hanfei1991)).
+* Fix possible assert in ~PushingAsyncPipelineExecutor in clickhouse-local. [#52862](https://github.com/ClickHouse/ClickHouse/pull/52862) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix reading of empty `Nested(Array(LowCardinality(...)))` columns (added by `ALTER TABLE ... ADD COLUMN ...` query and not materialized in parts) from compact parts of `MergeTree` tables. [#52949](https://github.com/ClickHouse/ClickHouse/pull/52949) ([Anton Popov](https://github.com/CurtizJ)).
+* Fixed the record inconsistency in session_log between login and logout. [#52958](https://github.com/ClickHouse/ClickHouse/pull/52958) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
+* Fix a password leak in `SHOW CREATE` of MySQL tables. [#52962](https://github.com/ClickHouse/ClickHouse/pull/52962) ([Duc Canh Le](https://github.com/canhld94)).
+* Fix possible crash in full sorting merge join on sparse columns, closes [#52978](https://github.com/ClickHouse/ClickHouse/issues/52978). [#53000](https://github.com/ClickHouse/ClickHouse/pull/53000) ([vdimir](https://github.com/vdimir)).
+* Fix very rare race condition with empty key prefix directory deletion in fs cache. [#53055](https://github.com/ClickHouse/ClickHouse/pull/53055) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fixed `output_format_parquet_compression_method='zstd'` sometimes producing invalid Parquet files. In older versions, use setting `output_format_parquet_use_custom_encoder = 0` as a workaround. [#53064](https://github.com/ClickHouse/ClickHouse/pull/53064) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix query_id in part_log with async flush queries. [#53103](https://github.com/ClickHouse/ClickHouse/pull/53103) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix possible error from filesystem cache "Read unexpected size". [#53121](https://github.com/ClickHouse/ClickHouse/pull/53121) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Disable the new parquet encoder: it has a bug. [#53130](https://github.com/ClickHouse/ClickHouse/pull/53130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Fix the error `Not-ready Set is passed as the second argument for function 'in'` that could happen with a limited `max_result_rows` and `result_overflow_mode = 'break'`. [#53162](https://github.com/ClickHouse/ClickHouse/pull/53162) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix character escaping in the PostgreSQL engine (`\'` -> `''`, `\\` -> `\`). Closes [#49821](https://github.com/ClickHouse/ClickHouse/issues/49821). [#53250](https://github.com/ClickHouse/ClickHouse/pull/53250) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Fixed the record inconsistency in session_log between login and logout. [#53255](https://github.com/ClickHouse/ClickHouse/pull/53255) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
+* Fixed the record inconsistency in session_log between login and logout. [#53302](https://github.com/ClickHouse/ClickHouse/pull/53302) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
+* Fixed adding intervals of a fraction of a second to DateTime producing an incorrect result. [#53309](https://github.com/ClickHouse/ClickHouse/pull/53309) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix the "Context has expired" error in dictionaries when using subqueries. [#53342](https://github.com/ClickHouse/ClickHouse/pull/53342) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Fix incorrect normal projection AST format when a single function is used in ORDER BY. This fixes [#52607](https://github.com/ClickHouse/ClickHouse/issues/52607). [#53347](https://github.com/ClickHouse/ClickHouse/pull/53347) ([Amos Bird](https://github.com/amosbird)).
+* Forbid `use_structure_from_insertion_table_in_table_functions` when executing scalar subqueries. Closes [#52494](https://github.com/ClickHouse/ClickHouse/issues/52494). [#53348](https://github.com/ClickHouse/ClickHouse/pull/53348) ([flynn](https://github.com/ucasfl)).
+* Avoid loading tables from lazy database when not needed. Follow-up to [#43840](https://github.com/ClickHouse/ClickHouse/issues/43840). [#53372](https://github.com/ClickHouse/ClickHouse/pull/53372) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Fixed `system.data_skipping_indices` columns `data_compressed_bytes` and `data_uncompressed_bytes` for MaterializedMySQL. [#53381](https://github.com/ClickHouse/ClickHouse/pull/53381) ([Filipp Ozinov](https://github.com/bakwc)).
+* Fix processing single carriage return in TSV file segmentation engine that could lead to parsing errors. Closes [#53320](https://github.com/ClickHouse/ClickHouse/issues/53320). [#53407](https://github.com/ClickHouse/ClickHouse/pull/53407) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix the "Context has expired" error when using subqueries with functions `file()` (regular function, not table function), `joinGet()`, `joinGetOrNull()`, `connectionId()`. [#53433](https://github.com/ClickHouse/ClickHouse/pull/53433) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix `timeout_overflow_mode` when there is a subquery on the right-hand side of IN. [#53439](https://github.com/ClickHouse/ClickHouse/pull/53439) ([Duc Canh Le](https://github.com/canhld94)).
+* Fix an unexpected behavior, described in [#53152](https://github.com/ClickHouse/ClickHouse/issues/53152). [#53440](https://github.com/ClickHouse/ClickHouse/pull/53440) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
+* Fix the `JSON_QUERY` function failing to parse a JSON string when the path is numeric. For example, the query `SELECT JSON_QUERY('{"123":"abcd"}', '$.123')` would fail with the exception ``` DB::Exception: Unable to parse JSONPath: While processing JSON_QUERY('{"123":"acd"}', '$.123'). (BAD_ARGUMENTS) ```. [#53470](https://github.com/ClickHouse/ClickHouse/pull/53470) ([KevinyhZou](https://github.com/KevinyhZou)).
+* Fix possible crash for queries with parallel `FINAL` where `ORDER BY` and `PRIMARY KEY` are different in table definition. [#53489](https://github.com/ClickHouse/ClickHouse/pull/53489) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fixed ReplacingMergeTree to properly process single-partition cases when `do_not_merge_across_partitions_select_final=1`. Previously `SELECT` could return rows that were marked as deleted. [#53511](https://github.com/ClickHouse/ClickHouse/pull/53511) ([Vasily Nemkov](https://github.com/Enmk)).
+* Fix bug in flushing of async insert queue on graceful shutdown. [#53547](https://github.com/ClickHouse/ClickHouse/pull/53547) ([joelynch](https://github.com/joelynch)).
+* Fix crash in join on sparse column. [#53548](https://github.com/ClickHouse/ClickHouse/pull/53548) ([vdimir](https://github.com/vdimir)).
+* Fix possible UB in Set skipping index for functions with incorrect args. [#53559](https://github.com/ClickHouse/ClickHouse/pull/53559) ([Azat Khuzhin](https://github.com/azat)).
+* Fix possible UB in inverted indexes (experimental feature). [#53560](https://github.com/ClickHouse/ClickHouse/pull/53560) ([Azat Khuzhin](https://github.com/azat)).
+* Fixed a bug in `INTERPOLATE` when the interpolated column is aliased with the same name as a source column. [#53572](https://github.com/ClickHouse/ClickHouse/pull/53572) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fixed a bug in EXPLAIN PLAN index=1 where the number of dropped granules was incorrect. [#53616](https://github.com/ClickHouse/ClickHouse/pull/53616) ([wangxiaobo](https://github.com/wzb5212)).
+* Correctly handle totals and extremes when `DelayedSource` is used. [#53644](https://github.com/ClickHouse/ClickHouse/pull/53644) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix `Pipeline stuck` error in mutation with `IN (subquery WITH TOTALS)` where ready set was taken from cache. [#53645](https://github.com/ClickHouse/ClickHouse/pull/53645) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Allow using JSON subcolumns in predicates of UPDATE and DELETE queries. [#53677](https://github.com/ClickHouse/ClickHouse/pull/53677) ([zps](https://github.com/VanDarkholme7)).
+* Fix possible logical error exception during filter pushdown for full_sorting_merge join. [#53699](https://github.com/ClickHouse/ClickHouse/pull/53699) ([vdimir](https://github.com/vdimir)).
+* Fix NULL::LowCardinality(Nullable(...)) with IN. [#53706](https://github.com/ClickHouse/ClickHouse/pull/53706) ([Andrey Zvonov](https://github.com/zvonand)).
+* Fix possible crashes in `DISTINCT` queries with enabled `optimize_distinct_in_order` and sparse columns. [#53711](https://github.com/ClickHouse/ClickHouse/pull/53711) ([Igor Nikonov](https://github.com/devcrafter)).
+* Correctly handle a default column with multiple rows in `transform`. [#53742](https://github.com/ClickHouse/ClickHouse/pull/53742) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
+* Fix crash in SQL function parseDateTime() with non-const timezone argument. [#53764](https://github.com/ClickHouse/ClickHouse/pull/53764) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fix uncaught exception in `getCreateTableQueryImpl`. [#53832](https://github.com/ClickHouse/ClickHouse/pull/53832) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix possible segfault while using PostgreSQL engine. Closes [#36919](https://github.com/ClickHouse/ClickHouse/issues/36919). [#53847](https://github.com/ClickHouse/ClickHouse/pull/53847) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix `named_collection_admin` alias to `named_collection_control` not working from config. [#54066](https://github.com/ClickHouse/ClickHouse/pull/54066) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* A distributed query could miss `rows_before_limit_at_least` in the query result if it was executed on a replica with a delay greater than `max_replica_delay_for_distributed_queries`. [#54122](https://github.com/ClickHouse/ClickHouse/pull/54122) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).

#### NO CL ENTRY

@@ -272,7 +272,7 @@ sidebar_label: 2023
 * Add more checks into ThreadStatus ctor. [#42019](https://github.com/ClickHouse/ClickHouse/pull/42019) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
 * Refactor Query Tree visitor [#46740](https://github.com/ClickHouse/ClickHouse/pull/46740) ([Dmitry Novik](https://github.com/novikd)).
 * Revert "Revert "Randomize JIT settings in tests"" [#48282](https://github.com/ClickHouse/ClickHouse/pull/48282) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Fix outdated cache configuration in s3 tests: s3_storage_policy_by_defau... [#48424](https://github.com/ClickHouse/ClickHouse/pull/48424) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix outdated cache configuration in s3 tests: s3_storage_policy_by_defau… [#48424](https://github.com/ClickHouse/ClickHouse/pull/48424) ([Kseniia Sumarokova](https://github.com/kssenii)).
 * Fix IN with decimal in analyzer [#48754](https://github.com/ClickHouse/ClickHouse/pull/48754) ([vdimir](https://github.com/vdimir)).
 * Some unclear change in StorageBuffer::reschedule() for something [#49723](https://github.com/ClickHouse/ClickHouse/pull/49723) ([DimasKovas](https://github.com/DimasKovas)).
 * MergeTree & SipHash checksum big-endian support [#50276](https://github.com/ClickHouse/ClickHouse/pull/50276) ([ltrk2](https://github.com/ltrk2)).
@@ -540,7 +540,7 @@ sidebar_label: 2023
 * Do not warn about arch_sys_counter clock [#53739](https://github.com/ClickHouse/ClickHouse/pull/53739) ([Artur Malchanau](https://github.com/Hexta)).
 * Add some profile events [#53741](https://github.com/ClickHouse/ClickHouse/pull/53741) ([Kseniia Sumarokova](https://github.com/kssenii)).
 * Support clang-18 (Wmissing-field-initializers) [#53751](https://github.com/ClickHouse/ClickHouse/pull/53751) ([Raúl Marín](https://github.com/Algunenano)).
-* Upgrade openSSL to v3.0.10 [#53756](https://github.com/ClickHouse/ClickHouse/pull/53756) ([bhavnajindal](https://github.com/bhavnajindal)).
+* Upgrade openSSL to v3.0.10 [#53756](https://github.com/ClickHouse/ClickHouse/pull/53756) ([Bhavna Jindal](https://github.com/bhavnajindal)).
 * Improve JSON-handling on s390x [#53760](https://github.com/ClickHouse/ClickHouse/pull/53760) ([ltrk2](https://github.com/ltrk2)).
 * Reduce API calls to SSM client [#53762](https://github.com/ClickHouse/ClickHouse/pull/53762) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
 * Remove branch references from .gitmodules [#53763](https://github.com/ClickHouse/ClickHouse/pull/53763) ([Robert Schulze](https://github.com/rschu1ze)).
@@ -588,3 +588,4 @@ sidebar_label: 2023
 * tests: mark 02152_http_external_tables_memory_tracking as no-parallel [#54155](https://github.com/ClickHouse/ClickHouse/pull/54155) ([Azat Khuzhin](https://github.com/azat)).
 * The external logs have had colliding arguments [#54165](https://github.com/ClickHouse/ClickHouse/pull/54165) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
 * Rename macro [#54169](https://github.com/ClickHouse/ClickHouse/pull/54169) ([Kseniia Sumarokova](https://github.com/kssenii)).
+

diff --git a/docs/changelogs/v23.8.10.43-lts.md b/docs/changelogs/v23.8.10.43-lts.md
index 0093467d129..0750901da8a 100644
--- a/docs/changelogs/v23.8.10.43-lts.md
+++ b/docs/changelogs/v23.8.10.43-lts.md
@@ -16,17 +16,17 @@ sidebar_label: 2024

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Background merges correctly use temporary data storage in the cache [#57275](https://github.com/ClickHouse/ClickHouse/pull/57275) ([vdimir](https://github.com/vdimir)).
-* MergeTree mutations reuse source part index granularity [#57352](https://github.com/ClickHouse/ClickHouse/pull/57352) ([Maksim Kita](https://github.com/kitaisreal)).
-* Fix double destroy call on exception throw in addBatchLookupTable8 [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix JSONExtract function for LowCardinality(Nullable) columns [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)).
-* Fix: LIMIT BY and LIMIT in distributed query [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)).
-* Fix translate() with FixedString input [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)).
-* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix leftPad / rightPad function with FixedString input [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)).
-* rabbitmq: fix having neither acked nor nacked messages [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix cosineDistance crash with Nullable [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#57565](https://github.com/ClickHouse/ClickHouse/issues/57565): Background merges correctly use temporary data storage in the cache. [#57275](https://github.com/ClickHouse/ClickHouse/pull/57275) ([vdimir](https://github.com/vdimir)).
+* Backported in [#57476](https://github.com/ClickHouse/ClickHouse/issues/57476): Fix possible broken skipping indexes after materialization in MergeTree compact parts. [#57352](https://github.com/ClickHouse/ClickHouse/pull/57352) ([Maksim Kita](https://github.com/kitaisreal)).
+* Backported in [#58777](https://github.com/ClickHouse/ClickHouse/issues/58777): Fix double destroy call on exception throw in addBatchLookupTable8. [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#58856](https://github.com/ClickHouse/ClickHouse/issues/58856): Fix possible crash in JSONExtract function extracting `LowCardinality(Nullable(T))` type. [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)).
+* Backported in [#59194](https://github.com/ClickHouse/ClickHouse/issues/59194): The combination of LIMIT BY and LIMIT could produce an incorrect result in distributed queries (parallel replicas included). [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)).
+* Backported in [#59429](https://github.com/ClickHouse/ClickHouse/issues/59429): Fix translate() with FixedString input. Could lead to crashes as it'd return a String column (vs the expected FixedString). This issue was found through the ClickHouse Bug Bounty Program by YohannJardin. [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#60128](https://github.com/ClickHouse/ClickHouse/issues/60128): Fix error `Read beyond last offset` for `AsynchronousBoundedReadBuffer`. [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Backported in [#59836](https://github.com/ClickHouse/ClickHouse/issues/59836): Fix query start time on non-initial queries. [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#59758](https://github.com/ClickHouse/ClickHouse/issues/59758): Fix leftPad / rightPad function with FixedString input. [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#60304](https://github.com/ClickHouse/ClickHouse/issues/60304): Fix messages being neither acked nor nacked. If an exception happens during the read-write phase, messages will be nacked. [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#60171](https://github.com/ClickHouse/ClickHouse/issues/60171): Fix cosineDistance crash with Nullable. [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)).

#### NOT FOR CHANGELOG / INSIGNIFICANT

diff --git a/docs/changelogs/v23.8.11.28-lts.md b/docs/changelogs/v23.8.11.28-lts.md
index acc284caa72..3da3d10cfa5 100644
--- a/docs/changelogs/v23.8.11.28-lts.md
+++ b/docs/changelogs/v23.8.11.28-lts.md
@@ -12,11 +12,11 @@ sidebar_label: 2024

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
-* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#60983](https://github.com/ClickHouse/ClickHouse/issues/60983): Fix buffer overflow that can happen if the attacker asks the HTTP server to decompress data with a composition of codecs and a size that triggers a numeric overflow. Fix buffer overflow that can happen inside codec NONE on wrong input data. This was submitted by the TIANGONG research team through our [Bug Bounty program](https://github.com/ClickHouse/ClickHouse/issues/38986). [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Backported in [#60986](https://github.com/ClickHouse/ClickHouse/issues/60986): Functions for SQL/JSON were able to read uninitialized memory. This closes [#60017](https://github.com/ClickHouse/ClickHouse/issues/60017). Found by Fuzzer. [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Backported in [#60816](https://github.com/ClickHouse/ClickHouse/issues/60816): Fix crash in arrayEnumerateRanked. [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#60837](https://github.com/ClickHouse/ClickHouse/issues/60837): Fix crash when using input() in INSERT SELECT JOIN. Closes [#60035](https://github.com/ClickHouse/ClickHouse/issues/60035). [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#60911](https://github.com/ClickHouse/ClickHouse/issues/60911): Avoid segfault if too many keys are skipped when reading from S3. [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).

#### NO CL ENTRY

diff --git a/docs/changelogs/v23.8.12.13-lts.md b/docs/changelogs/v23.8.12.13-lts.md
index dbb36fdc00e..0329d4349f3 100644
--- a/docs/changelogs/v23.8.12.13-lts.md
+++ b/docs/changelogs/v23.8.12.13-lts.md
@@ -9,9 +9,9 @@ sidebar_label: 2024

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Improve isolation of query cache entries under re-created users or role switches [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)).
-* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
-* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
+* Backported in [#61439](https://github.com/ClickHouse/ClickHouse/issues/61439): The query cache now denies access to entries when the user is re-created or assumes another role. This prevents attacks where 1. a user with the same name as a dropped user may access the old user's cache entries or 2. a user with a different role may access cache entries of a role with a different row policy. [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)).
+* Backported in [#61572](https://github.com/ClickHouse/ClickHouse/issues/61572): Fix string search with a constant start position, which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#61854](https://github.com/ClickHouse/ClickHouse/issues/61854): Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying an incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).

#### CI Fix or Improvement (changelog entry is not required)

diff --git a/docs/changelogs/v23.8.13.25-lts.md b/docs/changelogs/v23.8.13.25-lts.md
index 3452621556a..e9c6e2e9f28 100644
--- a/docs/changelogs/v23.8.13.25-lts.md
+++ b/docs/changelogs/v23.8.13.25-lts.md
@@ -15,11 +15,11 @@ sidebar_label: 2024

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix REPLACE/MOVE PARTITION with zero-copy replication [#54193](https://github.com/ClickHouse/ClickHouse/pull/54193) ([Alexander Tokmakov](https://github.com/tavplubix)).
-* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
-* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
-* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#62898](https://github.com/ClickHouse/ClickHouse/issues/62898): Fixed a bug in zero-copy replication (an experimental feature) that could cause `The specified key does not exist` errors and data loss after REPLACE/MOVE PARTITION. A similar issue might happen with TTL-moves between disks. [#54193](https://github.com/ClickHouse/ClickHouse/pull/54193) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Backported in [#61964](https://github.com/ClickHouse/ClickHouse/issues/61964): Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#62527](https://github.com/ClickHouse/ClickHouse/issues/62527): Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Backported in [#62238](https://github.com/ClickHouse/ClickHouse/issues/62238): Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62673](https://github.com/ClickHouse/ClickHouse/issues/62673): Fix segmentation fault when using Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
#### CI Fix or Improvement (changelog entry is not required)

diff --git a/docs/changelogs/v23.8.14.6-lts.md b/docs/changelogs/v23.8.14.6-lts.md
index 0053502a9dc..3236c931e51 100644
--- a/docs/changelogs/v23.8.14.6-lts.md
+++ b/docs/changelogs/v23.8.14.6-lts.md
@@ -9,6 +9,6 @@ sidebar_label: 2024

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
-* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)).
+* Backported in [#63172](https://github.com/ClickHouse/ClickHouse/issues/63172): Setting server_name might help with a recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
+* Backported in [#63164](https://github.com/ClickHouse/ClickHouse/issues/63164): The wire protocol version check for MongoDB used to try accessing the "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by the user. [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)).

diff --git a/docs/changelogs/v23.8.2.7-lts.md b/docs/changelogs/v23.8.2.7-lts.md
index 317e2c6d56a..a6f74e7998c 100644
--- a/docs/changelogs/v23.8.2.7-lts.md
+++ b/docs/changelogs/v23.8.2.7-lts.md
@@ -9,8 +9,8 @@ sidebar_label: 2023

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix: parallel replicas over distributed don't read from all replicas [#54199](https://github.com/ClickHouse/ClickHouse/pull/54199) ([Igor Nikonov](https://github.com/devcrafter)).
-* Fix: allow IPv6 for bloom filter [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Backported in [#54209](https://github.com/ClickHouse/ClickHouse/issues/54209): Parallel reading from replicas over a Distributed table was using only one replica per shard. [#54199](https://github.com/ClickHouse/ClickHouse/pull/54199) ([Igor Nikonov](https://github.com/devcrafter)).
+* Backported in [#54233](https://github.com/ClickHouse/ClickHouse/issues/54233): Allow IPv6 for bloom filter; this fixes a backward compatibility issue. [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).

#### NOT FOR CHANGELOG / INSIGNIFICANT

diff --git a/docs/changelogs/v23.8.3.48-lts.md b/docs/changelogs/v23.8.3.48-lts.md
index af669c5adc8..91514f48a25 100644
--- a/docs/changelogs/v23.8.3.48-lts.md
+++ b/docs/changelogs/v23.8.3.48-lts.md
@@ -18,19 +18,19 @@ sidebar_label: 2023

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix: moved to prewhere condition actions can lose column [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
-* Fix: parallel replicas over distributed with prefer_localhost_replica=1 [#54334](https://github.com/ClickHouse/ClickHouse/pull/54334) ([Igor Nikonov](https://github.com/devcrafter)).
-* Fix possible error 'URI contains invalid characters' in s3 table function [#54373](https://github.com/ClickHouse/ClickHouse/pull/54373) ([Kruglov Pavel](https://github.com/Avogar)).
-* Check for overflow before addition in `analysisOfVariance` function [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)).
-* reproduce and fix the bug in removeSharedRecursive [#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)).
-* Fix aggregate projections with normalized states [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)).
-* Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix zero copy garbage [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)).
-* Fix race in `ColumnUnique` [#54575](https://github.com/ClickHouse/ClickHouse/pull/54575) ([Nikita Taranov](https://github.com/nickitat)).
-* Fix serialization of `ColumnDecimal` [#54601](https://github.com/ClickHouse/ClickHouse/pull/54601) ([Nikita Taranov](https://github.com/nickitat)).
-* Fix virtual columns having incorrect values after ORDER BY [#54811](https://github.com/ClickHouse/ClickHouse/pull/54811) ([Michael Kolupaev](https://github.com/al13n321)).
-* Fix Keeper segfault during shutdown [#54841](https://github.com/ClickHouse/ClickHouse/pull/54841) ([Antonio Andelic](https://github.com/antonio2368)).
-* Rebuild minmax_count_projection when partition key gets modified [#54943](https://github.com/ClickHouse/ClickHouse/pull/54943) ([Amos Bird](https://github.com/amosbird)).
+* Backported in [#54974](https://github.com/ClickHouse/ClickHouse/issues/54974): Fixed an issue where, during PREWHERE optimization, a compound condition's actions DAG could lose an output column of an intermediate step while that column was required as an input column of a later step. [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Backported in [#54996](https://github.com/ClickHouse/ClickHouse/issues/54996): Parallel replicas were either executed completely on the local replica or produced an incorrect result when `prefer_localhost_replica=1`. Fixes [#54276](https://github.com/ClickHouse/ClickHouse/issues/54276). [#54334](https://github.com/ClickHouse/ClickHouse/pull/54334) ([Igor Nikonov](https://github.com/devcrafter)).
+* Backported in [#54516](https://github.com/ClickHouse/ClickHouse/issues/54516): Fix possible error 'URI contains invalid characters' in s3 table function. Closes [#54345](https://github.com/ClickHouse/ClickHouse/issues/54345). [#54373](https://github.com/ClickHouse/ClickHouse/pull/54373) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#54418](https://github.com/ClickHouse/ClickHouse/issues/54418): Check for overflow when handling the group number argument for `analysisOfVariance` to avoid crashes. Crash found using WINGFUZZ. [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#54527](https://github.com/ClickHouse/ClickHouse/issues/54527): Reproduce and fix the bug in `removeSharedRecursive` described in [#54135](https://github.com/ClickHouse/ClickHouse/issues/54135).
[#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)).
+* Backported in [#54854](https://github.com/ClickHouse/ClickHouse/issues/54854): Fix incorrect aggregation projection optimization when using variant aggregate states. This optimization was accidentally enabled but not properly implemented, because after https://github.com/ClickHouse/ClickHouse/pull/39420 the comparison of DataTypeAggregateFunction is normalized. This fixes [#54406](https://github.com/ClickHouse/ClickHouse/issues/54406). [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)).
+* Backported in [#54599](https://github.com/ClickHouse/ClickHouse/issues/54599): Fix a parsing error in WithNames formats while reading a subset of columns with `input_format_with_names_use_header` disabled. Closes [#52591](https://github.com/ClickHouse/ClickHouse/issues/52591). [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#54594](https://github.com/ClickHouse/ClickHouse/issues/54594): Starting from version 23.5, zero-copy replication could leave some garbage in ZooKeeper and on S3. It might happen on removal of Outdated parts that were mutated. The issue is indicated by `Failed to get mutation parent on {} for part {}, refusing to remove blobs` log messages. [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Backported in [#54627](https://github.com/ClickHouse/ClickHouse/issues/54627): Fix unsynchronised write to a shared variable in `ColumnUnique`. [#54575](https://github.com/ClickHouse/ClickHouse/pull/54575) ([Nikita Taranov](https://github.com/nickitat)).
+* Backported in [#54625](https://github.com/ClickHouse/ClickHouse/issues/54625): Fix serialization of `ColumnDecimal`. [#54601](https://github.com/ClickHouse/ClickHouse/pull/54601) ([Nikita Taranov](https://github.com/nickitat)).
+* Backported in [#54945](https://github.com/ClickHouse/ClickHouse/issues/54945): Fixed virtual columns (e.g. _file) showing incorrect values with ORDER BY. [#54811](https://github.com/ClickHouse/ClickHouse/pull/54811) ([Michael Kolupaev](https://github.com/al13n321)).
+* Backported in [#54872](https://github.com/ClickHouse/ClickHouse/issues/54872): Keeper fix: correctly capture a variable in a callback to avoid segfaults during shutdown. [#54841](https://github.com/ClickHouse/ClickHouse/pull/54841) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#54950](https://github.com/ClickHouse/ClickHouse/issues/54950): Fix a projection optimization error if the table's partition key was ALTERed by extending its Enum type. The fix is to rebuild `minmax_count_projection` when the partition key gets modified. This fixes [#54941](https://github.com/ClickHouse/ClickHouse/issues/54941). [#54943](https://github.com/ClickHouse/ClickHouse/pull/54943) ([Amos Bird](https://github.com/amosbird)).

#### NOT FOR CHANGELOG / INSIGNIFICANT

diff --git a/docs/changelogs/v23.8.4.69-lts.md b/docs/changelogs/v23.8.4.69-lts.md
index 065a57549be..a6d8d8bb03b 100644
--- a/docs/changelogs/v23.8.4.69-lts.md
+++ b/docs/changelogs/v23.8.4.69-lts.md
@@ -11,26 +11,26 @@ sidebar_label: 2023

 * Backported in [#55673](https://github.com/ClickHouse/ClickHouse/issues/55673): If the database is already initialized, it doesn't need to be initialized again upon subsequent launches.
This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
 * Backported in [#55293](https://github.com/ClickHouse/ClickHouse/issues/55293): Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checking out submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
 * Backported in [#55366](https://github.com/ClickHouse/ClickHouse/issues/55366): Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
-* Backported in [#55725](https://github.com/ClickHouse/ClickHouse/issues/55725): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
+* Backported in [#55725](https://github.com/ClickHouse/ClickHouse/issues/55725): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/maxknv)).

#### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Move obsolete format settings to separate section [#54855](https://github.com/ClickHouse/ClickHouse/pull/54855) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix: insert quorum w/o keeper retries [#55026](https://github.com/ClickHouse/ClickHouse/pull/55026) ([Igor Nikonov](https://github.com/devcrafter)).
-* Prevent attaching parts from tables with different projections or indices [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
-* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
-* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
-* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
-* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
-* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
-* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
-* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
-* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
+* Backported in [#55304](https://github.com/ClickHouse/ClickHouse/issues/55304): Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL (which could happen with PostgreSQL version >= 13). [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#55018](https://github.com/ClickHouse/ClickHouse/issues/55018): Move obsolete format settings to a separate section and use them together with all format settings to avoid `Unknown setting` exceptions during use of obsolete format settings. Closes [#54792](https://github.com/ClickHouse/ClickHouse/issues/54792). [#54855](https://github.com/ClickHouse/ClickHouse/pull/54855) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#55097](https://github.com/ClickHouse/ClickHouse/issues/55097): Insert quorum could be marked as satisfied incorrectly in case of keeper retries while waiting for the quorum. Fixes [#54543](https://github.com/ClickHouse/ClickHouse/issues/54543). [#55026](https://github.com/ClickHouse/ClickHouse/pull/55026) ([Igor Nikonov](https://github.com/devcrafter)).
+* Backported in [#55473](https://github.com/ClickHouse/ClickHouse/issues/55473): Prevent attaching partitions from tables that don't have the same indices or projections defined. [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Backported in [#55461](https://github.com/ClickHouse/ClickHouse/issues/55461): If an exception happens in `ShellCommandSource` constructor after some of the `send_data_threads` are started, they need to be join()-ed, otherwise abort() will be triggered in `ThreadFromGlobalPool` destructor. Fixes [#55091](https://github.com/ClickHouse/ClickHouse/issues/55091). [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
+* Backported in [#55412](https://github.com/ClickHouse/ClickHouse/issues/55412): Fix deadlock in LDAP assigned role update for non-existing ClickHouse roles. [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
+* Backported in [#55323](https://github.com/ClickHouse/ClickHouse/issues/55323): Fix for background download in fs cache. [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#55349](https://github.com/ClickHouse/ClickHouse/issues/55349): Fix execution of functions over sparse columns (fixes `DB::Exception: isDefaultAt is not implemented for Function: while executing 'FUNCTION Capture` error).
[#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
+* Backported in [#55475](https://github.com/ClickHouse/ClickHouse/issues/55475): Fix an issue with the inability to drop a detached partition in the `ReplicatedMergeTree` engines family on top of S3 (without zero-copy replication). Fixes issue [#55225](https://github.com/ClickHouse/ClickHouse/issues/55225). Fix a bug with abandoned blobs on S3 for complex data types like Arrays or Nested columns. Partially fixes [#52393](https://github.com/ClickHouse/ClickHouse/issues/52393). Many kudos to @alifirat for examples. [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
+* Backported in [#55399](https://github.com/ClickHouse/ClickHouse/issues/55399): An optimization introduced one year ago was wrong. This closes [#55272](https://github.com/ClickHouse/ClickHouse/issues/55272). [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Backported in [#55437](https://github.com/ClickHouse/ClickHouse/issues/55437): Fix parsing of arrays in cast operator (`::`). [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
+* Backported in [#55635](https://github.com/ClickHouse/ClickHouse/issues/55635): Fix filtering by virtual columns with OR filter in query (`_part*` filtering for `MergeTree`, `_path`/`_file` for various `File`/`HDFS`/... engines, `_table` for `Merge`). [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
+* Backported in [#55445](https://github.com/ClickHouse/ClickHouse/issues/55445): Fix connection issues that occurred with some versions of MongoDB. Closes [#55376](https://github.com/ClickHouse/ClickHouse/issues/55376), [#55232](https://github.com/ClickHouse/ClickHouse/issues/55232). [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#55534](https://github.com/ClickHouse/ClickHouse/issues/55534): Fix a possible deadlock caused by a fiber not being destroyed in case of an exception in async task cancellation. Closes [#55185](https://github.com/ClickHouse/ClickHouse/issues/55185). [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#55747](https://github.com/ClickHouse/ClickHouse/issues/55747): Fix crash in QueryNormalizer with cyclic aliases. [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
+* Backported in [#55760](https://github.com/ClickHouse/ClickHouse/issues/55760): Fix filtering by virtual columns with OR filter in query (_part* filtering for MergeTree, _path/_file for various File/HDFS/... engines, _table for Merge). [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).

#### NO CL CATEGORY

@@ -46,6 +46,6 @@ sidebar_label: 2023
 * Clean data dir and always start an old server version in aggregate functions compatibility test. [#55105](https://github.com/ClickHouse/ClickHouse/pull/55105) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
 * check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)).
 * MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)).
+* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mikhail Kot](https://github.com/myrrc)).
 * Bump curl to 8.4 [#55492](https://github.com/ClickHouse/ClickHouse/pull/55492) ([Robert Schulze](https://github.com/rschu1ze)).

diff --git a/docs/changelogs/v23.8.5.16-lts.md b/docs/changelogs/v23.8.5.16-lts.md
index 4a23b8892be..32ddbd6031d 100644
--- a/docs/changelogs/v23.8.5.16-lts.md
+++ b/docs/changelogs/v23.8.5.16-lts.md
@@ -12,9 +12,9 @@ sidebar_label: 2023

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Backported in [#55736](https://github.com/ClickHouse/ClickHouse/issues/55736): Fix Iceberg metadata parsing: delete files were not checked. [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#55969](https://github.com/ClickHouse/ClickHouse/issues/55969): Try to fix possible segfault in Native ORC input format. Closes [#55873](https://github.com/ClickHouse/ClickHouse/issues/55873). [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#55907](https://github.com/ClickHouse/ClickHouse/issues/55907): Fix window functions in case of sparse columns. Previously some queries with window functions returned invalid results or made ClickHouse crash when the columns were sparse. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).

#### NOT FOR CHANGELOG / INSIGNIFICANT

diff --git a/docs/changelogs/v23.8.6.16-lts.md b/docs/changelogs/v23.8.6.16-lts.md
index 6eb752e987c..df6c03cd668 100644
--- a/docs/changelogs/v23.8.6.16-lts.md
+++ b/docs/changelogs/v23.8.6.16-lts.md
@@ -9,11 +9,11 @@ sidebar_label: 2023

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix rare case of CHECKSUM_DOESNT_MATCH error [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)).
-* Fix: avoid using regex match, possibly containing alternation, as a key condition. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
-* Fix a crash during table loading on startup [#56232](https://github.com/ClickHouse/ClickHouse/pull/56232) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Fix segfault in signal handler for Keeper [#56266](https://github.com/ClickHouse/ClickHouse/pull/56266) ([Antonio Andelic](https://github.com/antonio2368)).
-* Fix buffer overflow in T64 [#56434](https://github.com/ClickHouse/ClickHouse/pull/56434) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Backported in [#54583](https://github.com/ClickHouse/ClickHouse/issues/54583): Fix a rare bug in replicated merge tree which could lead to a self-recovering `CHECKSUM_DOESNT_MATCH` error in logs. [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)).
+* Backported in [#56253](https://github.com/ClickHouse/ClickHouse/issues/56253): Fixed a bug where the match() function (regex) with a pattern containing alternation produced an incorrect key condition. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Backported in [#56322](https://github.com/ClickHouse/ClickHouse/issues/56322): Fix a crash during table loading on startup. Closes [#55767](https://github.com/ClickHouse/ClickHouse/issues/55767). [#56232](https://github.com/ClickHouse/ClickHouse/pull/56232) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#56292](https://github.com/ClickHouse/ClickHouse/issues/56292): Fix segfault in signal handler for Keeper. [#56266](https://github.com/ClickHouse/ClickHouse/pull/56266) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#56443](https://github.com/ClickHouse/ClickHouse/issues/56443): Fix crash due to buffer overflow while decompressing malformed data using `T64` codec. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#56434](https://github.com/ClickHouse/ClickHouse/pull/56434) ([Alexey Milovidov](https://github.com/alexey-milovidov)).

#### NOT FOR CHANGELOG / INSIGNIFICANT

diff --git a/docs/changelogs/v23.8.7.24-lts.md b/docs/changelogs/v23.8.7.24-lts.md
index 37862c17315..042484e2404 100644
--- a/docs/changelogs/v23.8.7.24-lts.md
+++ b/docs/changelogs/v23.8.7.24-lts.md
@@ -12,12 +12,12 @@ sidebar_label: 2023

 #### Bug Fix (user-visible misbehavior in an official stable release)

-* Select from system tables when table based on table function. [#55540](https://github.com/ClickHouse/ClickHouse/pull/55540) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
-* Fix incomplete query result for UNION in view() function. [#56274](https://github.com/ClickHouse/ClickHouse/pull/56274) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix crash in case of adding a column with type Object(JSON) [#56307](https://github.com/ClickHouse/ClickHouse/pull/56307) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
-* Fix segfault during Kerberos initialization [#56401](https://github.com/ClickHouse/ClickHouse/pull/56401) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Fix: RabbitMQ OpenSSL dynamic loading issue [#56703](https://github.com/ClickHouse/ClickHouse/pull/56703) ([Igor Nikonov](https://github.com/devcrafter)).
-* Fix crash in FPC codec [#56795](https://github.com/ClickHouse/ClickHouse/pull/56795) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Backported in [#56581](https://github.com/ClickHouse/ClickHouse/issues/56581): Prevent reference to a remote data source for the `data_paths` column in `system.tables` if the table is created with a table function using explicit column description. [#55540](https://github.com/ClickHouse/ClickHouse/pull/55540) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
+* Backported in [#56877](https://github.com/ClickHouse/ClickHouse/issues/56877): Fix incomplete query result for `UNION` in `view()` table function. [#56274](https://github.com/ClickHouse/ClickHouse/pull/56274) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#56409](https://github.com/ClickHouse/ClickHouse/issues/56409): Prohibit adding a column with type `Object(JSON)` to an existing table. This closes: [#56095](https://github.com/ClickHouse/ClickHouse/issues/56095). This closes: [#49944](https://github.com/ClickHouse/ClickHouse/issues/49944). [#56307](https://github.com/ClickHouse/ClickHouse/pull/56307) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Backported in [#56756](https://github.com/ClickHouse/ClickHouse/issues/56756): Fix a segfault caused by a thrown exception in Kerberos initialization during the creation of the Kafka table. Closes [#56073](https://github.com/ClickHouse/ClickHouse/issues/56073). [#56401](https://github.com/ClickHouse/ClickHouse/pull/56401) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#56748](https://github.com/ClickHouse/ClickHouse/issues/56748): Fixed the issue that the RabbitMQ table engine wasn't able to connect to RabbitMQ over a secure connection. [#56703](https://github.com/ClickHouse/ClickHouse/pull/56703) ([Igor Nikonov](https://github.com/devcrafter)). +* Backported in [#56839](https://github.com/ClickHouse/ClickHouse/issues/56839): The server crashed when decompressing malformed data using the `FPC` codec. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#56795](https://github.com/ClickHouse/ClickHouse/pull/56795) ([Alexey Milovidov](https://github.com/alexey-milovidov)). #### NO CL CATEGORY diff --git a/docs/changelogs/v23.8.8.20-lts.md b/docs/changelogs/v23.8.8.20-lts.md index 345cfcccf17..f45498cb61f 100644 --- a/docs/changelogs/v23.8.8.20-lts.md +++ b/docs/changelogs/v23.8.8.20-lts.md @@ -16,9 +16,9 @@ sidebar_label: 2023 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix ON CLUSTER queries without database on initial node [#56484](https://github.com/ClickHouse/ClickHouse/pull/56484) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix buffer overflow in Gorilla codec [#57107](https://github.com/ClickHouse/ClickHouse/pull/57107) ([Nikolay Degterinsky](https://github.com/evillique)). -* Close interserver connection on any exception before authentication [#57142](https://github.com/ClickHouse/ClickHouse/pull/57142) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#57111](https://github.com/ClickHouse/ClickHouse/issues/57111): Fix ON CLUSTER queries without the database being present on an initial node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#56484](https://github.com/ClickHouse/ClickHouse/pull/56484) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#57169](https://github.com/ClickHouse/ClickHouse/issues/57169): Fix crash due to buffer overflow while decompressing malformed data using `Gorilla` codec. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#57107](https://github.com/ClickHouse/ClickHouse/pull/57107) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#57175](https://github.com/ClickHouse/ClickHouse/issues/57175): Close interserver connection for any exception that happens before the authentication. This issue was found with [ClickHouse Bug Bounty Program](https://github.com/ClickHouse/ClickHouse/issues/38986) by https://twitter.com/malacupa. [#57142](https://github.com/ClickHouse/ClickHouse/pull/57142) ([Antonio Andelic](https://github.com/antonio2368)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v23.8.9.54-lts.md b/docs/changelogs/v23.8.9.54-lts.md index 00607c60c39..db13238f4ad 100644 --- a/docs/changelogs/v23.8.9.54-lts.md +++ b/docs/changelogs/v23.8.9.54-lts.md @@ -11,29 +11,29 @@ sidebar_label: 2024 * Backported in [#57668](https://github.com/ClickHouse/ClickHouse/issues/57668): Output valid JSON/XML on exception during HTTP query execution. Add setting `http_write_exception_in_output_format` to enable/disable this behaviour (enabled by default). [#52853](https://github.com/ClickHouse/ClickHouse/pull/52853) ([Kruglov Pavel](https://github.com/Avogar)). * Backported in [#58491](https://github.com/ClickHouse/ClickHouse/issues/58491): Fix transfer query to MySQL compatible query. Fixes [#57253](https://github.com/ClickHouse/ClickHouse/issues/57253). Fixes [#52654](https://github.com/ClickHouse/ClickHouse/issues/52654). Fixes [#56729](https://github.com/ClickHouse/ClickHouse/issues/56729). [#56456](https://github.com/ClickHouse/ClickHouse/pull/56456) ([flynn](https://github.com/ucasfl)). * Backported in [#57238](https://github.com/ClickHouse/ClickHouse/issues/57238): Fetching a part waits when that part is fully committed on remote replica. It is better not to send a part in PreActive state. In case of zero copy this is a mandatory restriction. [#56808](https://github.com/ClickHouse/ClickHouse/pull/56808) ([Sema Checherinda](https://github.com/CheSema)). -* Backported in [#57655](https://github.com/ClickHouse/ClickHouse/issues/57655): Handle sigabrt case when getting PostgreSQl table structure with empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mike Kot (Михаил Кот)](https://github.com/myrrc)). +* Backported in [#57655](https://github.com/ClickHouse/ClickHouse/issues/57655): Handle the SIGABRT case when getting the PostgreSQL table structure with an empty array. [#57618](https://github.com/ClickHouse/ClickHouse/pull/57618) ([Mikhail Kot](https://github.com/myrrc)). #### Build/Testing/Packaging Improvement * Backported in [#57582](https://github.com/ClickHouse/ClickHouse/issues/57582): Fix issue caught in https://github.com/docker-library/official-images/pull/15846. [#57571](https://github.com/ClickHouse/ClickHouse/pull/57571) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). #### Bug Fix (user-visible misbehavior in an official stable release) -* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix ALTER COLUMN with ALIAS [#56493](https://github.com/ClickHouse/ClickHouse/pull/56493) ([Nikolay Degterinsky](https://github.com/evillique)). -* Prevent incompatible ALTER of projection columns [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)). -* Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column [#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)).
-* Fix incorrect JOIN plan optimization with partially materialized normal projection [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)). -* Fix `ReadonlyReplica` metric for all cases [#57267](https://github.com/ClickHouse/ClickHouse/pull/57267) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). -* bugfix: correctly parse SYSTEM STOP LISTEN TCP SECURE [#57483](https://github.com/ClickHouse/ClickHouse/pull/57483) ([joelynch](https://github.com/joelynch)). -* Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. [#57538](https://github.com/ClickHouse/ClickHouse/pull/57538) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). -* Disable system.kafka_consumers by default (due to possible live memory leak) [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)). -* Fix invalid memory access in BLAKE3 (Rust) [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)). -* Normalize function names in CREATE INDEX [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Fix invalid preprocessing on Keeper [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix Integer overflow in Poco::UTF32Encoding [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)). -* Remove parallel parsing for JSONCompactEachRow [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix parallel parsing for JSONCompactEachRow [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#58324](https://github.com/ClickHouse/ClickHouse/issues/58324): Flatten only true Nested type if flatten_nested=1, not all Array(Tuple). [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#57395](https://github.com/ClickHouse/ClickHouse/issues/57395): Fix ALTER COLUMN with ALIAS that previously threw the `NO_SUCH_COLUMN_IN_TABLE` exception. Closes [#50927](https://github.com/ClickHouse/ClickHouse/issues/50927). [#56493](https://github.com/ClickHouse/ClickHouse/pull/56493) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#57449](https://github.com/ClickHouse/ClickHouse/issues/57449): `ALTER` queries on columns which are incompatible with columns used in some projections are now forbidden. Previously it could result in incorrect data. This fixes [#56932](https://github.com/ClickHouse/ClickHouse/issues/56932). This PR also allows RENAME of index columns, and improves the exception message by providing clear information on the affected indices or projections causing the prevention. [#56948](https://github.com/ClickHouse/ClickHouse/pull/56948) ([Amos Bird](https://github.com/amosbird)). +* Backported in [#57281](https://github.com/ClickHouse/ClickHouse/issues/57281): Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column. Closes [#42918](https://github.com/ClickHouse/ClickHouse/issues/42918).
[#57147](https://github.com/ClickHouse/ClickHouse/pull/57147) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#57247](https://github.com/ClickHouse/ClickHouse/issues/57247): Fix incorrect JOIN plan optimization with partially materialized normal projection. This fixes [#57194](https://github.com/ClickHouse/ClickHouse/issues/57194). [#57196](https://github.com/ClickHouse/ClickHouse/pull/57196) ([Amos Bird](https://github.com/amosbird)). +* Backported in [#57346](https://github.com/ClickHouse/ClickHouse/issues/57346): Fix `ReadonlyReplica` metric for some cases (e.g. when a table cannot be initialized because of a difference between local and Keeper data). [#57267](https://github.com/ClickHouse/ClickHouse/pull/57267) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#58434](https://github.com/ClickHouse/ClickHouse/issues/58434): Fix working with read buffers in StreamingFormatExecutor; previously it could lead to segfaults in Kafka and other streaming engines. [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#57539](https://github.com/ClickHouse/ClickHouse/issues/57539): Fix parsing of `SYSTEM STOP LISTEN TCP SECURE`. [#57483](https://github.com/ClickHouse/ClickHouse/pull/57483) ([joelynch](https://github.com/joelynch)). +* Backported in [#57779](https://github.com/ClickHouse/ClickHouse/issues/57779): Ignore the `ON CLUSTER` clause in grant/revoke queries for management of replicated access entities when the setting `ignore_on_cluster_for_replicated_access_entities_queries` is enabled (check it with `SHOW SETTINGS LIKE 'ignore_on_cluster_for_replicated_access_entities_queries'`). [#57538](https://github.com/ClickHouse/ClickHouse/pull/57538) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Backported in [#58256](https://github.com/ClickHouse/ClickHouse/issues/58256): Disable system.kafka_consumers by default (due to a possible live memory leak). [#57822](https://github.com/ClickHouse/ClickHouse/pull/57822) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#57923](https://github.com/ClickHouse/ClickHouse/issues/57923): Fix invalid memory access in BLAKE3. [#57876](https://github.com/ClickHouse/ClickHouse/pull/57876) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#58084](https://github.com/ClickHouse/ClickHouse/issues/58084): Normalize function names in `CREATE INDEX` query. Avoid `Existing table metadata in ZooKeeper differs in skip indexes` errors if an alias was used instead of the canonical function name when creating an index. [#57906](https://github.com/ClickHouse/ClickHouse/pull/57906) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Backported in [#58110](https://github.com/ClickHouse/ClickHouse/issues/58110): Keeper fix: Leader should correctly fail on preprocessing a request if it is not initialized. [#58069](https://github.com/ClickHouse/ClickHouse/pull/58069) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#58155](https://github.com/ClickHouse/ClickHouse/issues/58155): Fix Integer overflow in Poco::UTF32Encoding. [#58073](https://github.com/ClickHouse/ClickHouse/pull/58073) ([Andrey Fedotov](https://github.com/anfedotoff)). +* Backported in [#58188](https://github.com/ClickHouse/ClickHouse/issues/58188): Parallel parsing for `JSONCompactEachRow` could work incorrectly in previous versions.
This closes [#58180](https://github.com/ClickHouse/ClickHouse/issues/58180). [#58181](https://github.com/ClickHouse/ClickHouse/pull/58181) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#58301](https://github.com/ClickHouse/ClickHouse/issues/58301): Fix parallel parsing for JSONCompactEachRow. [#58250](https://github.com/ClickHouse/ClickHouse/pull/58250) ([Kruglov Pavel](https://github.com/Avogar)). #### NO CL ENTRY diff --git a/docs/changelogs/v24.1.1.2048-stable.md b/docs/changelogs/v24.1.1.2048-stable.md index 8e4647da86e..c509ce0058e 100644 --- a/docs/changelogs/v24.1.1.2048-stable.md +++ b/docs/changelogs/v24.1.1.2048-stable.md @@ -133,56 +133,56 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Add join keys conversion for nested lowcardinality [#51550](https://github.com/ClickHouse/ClickHouse/pull/51550) ([vdimir](https://github.com/vdimir)). -* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix a bug with projections and the aggregate_functions_null_for_empty setting during insertion. [#56944](https://github.com/ClickHouse/ClickHouse/pull/56944) ([Amos Bird](https://github.com/amosbird)). -* Fixed potential exception due to stale profile UUID [#57263](https://github.com/ClickHouse/ClickHouse/pull/57263) ([Vasily Nemkov](https://github.com/Enmk)). -* Fix working with read buffers in StreamingFormatExecutor [#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). -* Ignore MVs with dropped target table during pushing to views [#57520](https://github.com/ClickHouse/ClickHouse/pull/57520) ([Kruglov Pavel](https://github.com/Avogar)). -* [RFC] Eliminate possible race between ALTER_METADATA and MERGE_PARTS [#57755](https://github.com/ClickHouse/ClickHouse/pull/57755) ([Azat Khuzhin](https://github.com/azat)). -* Fix the exprs order bug in group by with rollup [#57786](https://github.com/ClickHouse/ClickHouse/pull/57786) ([Chen768959](https://github.com/Chen768959)). -* Fix lost blobs after dropping a replica with broken detached parts [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)). -* Allow users to work with symlinks in user_files_path (again) [#58447](https://github.com/ClickHouse/ClickHouse/pull/58447) ([Duc Canh Le](https://github.com/canhld94)). -* Fix segfault when graphite table does not have agg function [#58453](https://github.com/ClickHouse/ClickHouse/pull/58453) ([Duc Canh Le](https://github.com/canhld94)). -* Delay reading from StorageKafka to allow multiple reads in materialized views [#58477](https://github.com/ClickHouse/ClickHouse/pull/58477) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix a stupid case of intersecting parts [#58482](https://github.com/ClickHouse/ClickHouse/pull/58482) ([Alexander Tokmakov](https://github.com/tavplubix)). -* MergeTreePrefetchedReadPool disable for LIMIT only queries [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)). -* Enable ordinary databases while restoration [#58520](https://github.com/ClickHouse/ClickHouse/pull/58520) ([Jihyuk Bok](https://github.com/tomahawk28)). -* Fix hive threadpool read ORC/Parquet/... Failed [#58537](https://github.com/ClickHouse/ClickHouse/pull/58537) ([sunny](https://github.com/sunny19930321)). 
-* Hide credentials in system.backup_log base_backup_name column [#58550](https://github.com/ClickHouse/ClickHouse/pull/58550) ([Daniel Pozo Escalona](https://github.com/danipozo)). -* toStartOfInterval for milli- microsencods values rounding [#58557](https://github.com/ClickHouse/ClickHouse/pull/58557) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Disable max_joined_block_rows in ConcurrentHashJoin [#58595](https://github.com/ClickHouse/ClickHouse/pull/58595) ([vdimir](https://github.com/vdimir)). -* Fix join using nullable in old analyzer [#58596](https://github.com/ClickHouse/ClickHouse/pull/58596) ([vdimir](https://github.com/vdimir)). -* `makeDateTime64()`: Allow non-const fraction argument [#58597](https://github.com/ClickHouse/ClickHouse/pull/58597) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix possible NULL dereference during symbolizing inline frames [#58607](https://github.com/ClickHouse/ClickHouse/pull/58607) ([Azat Khuzhin](https://github.com/azat)). -* Improve isolation of query cache entries under re-created users or role switches [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix broken partition key analysis when doing projection optimization [#58638](https://github.com/ClickHouse/ClickHouse/pull/58638) ([Amos Bird](https://github.com/amosbird)). -* Query cache: Fix per-user quota [#58731](https://github.com/ClickHouse/ClickHouse/pull/58731) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix stream partitioning in parallel window functions [#58739](https://github.com/ClickHouse/ClickHouse/pull/58739) ([Dmitry Novik](https://github.com/novikd)). -* Fix double destroy call on exception throw in addBatchLookupTable8 [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)). -* Don't process requests in Keeper during shutdown [#58765](https://github.com/ClickHouse/ClickHouse/pull/58765) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix Segfault in `SlabsPolygonIndex::find` [#58771](https://github.com/ClickHouse/ClickHouse/pull/58771) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix JSONExtract function for LowCardinality(Nullable) columns [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)). -* Table CREATE DROP Poco::Logger memory leak fix [#58831](https://github.com/ClickHouse/ClickHouse/pull/58831) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix HTTP compressors finalization [#58846](https://github.com/ClickHouse/ClickHouse/pull/58846) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Multiple read file log storage in mv [#58877](https://github.com/ClickHouse/ClickHouse/pull/58877) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Restriction for the access key id for s3. [#58900](https://github.com/ClickHouse/ClickHouse/pull/58900) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). -* Fix possible crash in clickhouse-local during loading suggestions [#58907](https://github.com/ClickHouse/ClickHouse/pull/58907) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash when indexHint() is used [#58911](https://github.com/ClickHouse/ClickHouse/pull/58911) ([Dmitry Novik](https://github.com/novikd)). -* Fix StorageURL forgetting headers on server restart [#58933](https://github.com/ClickHouse/ClickHouse/pull/58933) ([Michael Kolupaev](https://github.com/al13n321)). 
-* Analyzer: fix storage replacement with insertion block [#58958](https://github.com/ClickHouse/ClickHouse/pull/58958) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix seek in ReadBufferFromZipArchive [#58966](https://github.com/ClickHouse/ClickHouse/pull/58966) ([Michael Kolupaev](https://github.com/al13n321)). -* `DROP INDEX` of inverted index now removes all relevant files from persistence [#59040](https://github.com/ClickHouse/ClickHouse/pull/59040) ([mochi](https://github.com/MochiXu)). -* Fix data race on query_factories_info [#59049](https://github.com/ClickHouse/ClickHouse/pull/59049) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Disable "Too many redirects" error retry [#59099](https://github.com/ClickHouse/ClickHouse/pull/59099) ([skyoct](https://github.com/skyoct)). -* Fix aggregation issue in mixed x86_64 and ARM clusters [#59132](https://github.com/ClickHouse/ClickHouse/pull/59132) ([Harry Lee](https://github.com/HarryLeeIBM)). -* Fix not started database shutdown deadlock [#59137](https://github.com/ClickHouse/ClickHouse/pull/59137) ([Sergei Trifonov](https://github.com/serxa)). -* Fix: LIMIT BY and LIMIT in distributed query [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)). -* Fix crash with nullable timezone for `toString` [#59190](https://github.com/ClickHouse/ClickHouse/pull/59190) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix abort in iceberg metadata on bad file paths [#59275](https://github.com/ClickHouse/ClickHouse/pull/59275) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix architecture name in select of Rust target [#59307](https://github.com/ClickHouse/ClickHouse/pull/59307) ([p1rattttt](https://github.com/p1rattttt)). -* Fix not-ready set for system.tables [#59351](https://github.com/ClickHouse/ClickHouse/pull/59351) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix lazy initialization in RabbitMQ [#59352](https://github.com/ClickHouse/ClickHouse/pull/59352) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix possible errors when joining sub-types with low cardinality (e.g., Array(LowCardinality(T)) with Array(T)). [#51550](https://github.com/ClickHouse/ClickHouse/pull/51550) ([vdimir](https://github.com/vdimir)). +* Flatten only true Nested type if flatten_nested=1, not all Array(Tuple). [#56132](https://github.com/ClickHouse/ClickHouse/pull/56132) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix a bug with projections and the `aggregate_functions_null_for_empty` setting during insertion. This is an addition to [#42198](https://github.com/ClickHouse/ClickHouse/issues/42198) and [#49873](https://github.com/ClickHouse/ClickHouse/issues/49873). The bug was found by fuzzer in [#56666](https://github.com/ClickHouse/ClickHouse/issues/56666). This PR also fixes potential issues with projections and the `transform_null_in` setting. [#56944](https://github.com/ClickHouse/ClickHouse/pull/56944) ([Amos Bird](https://github.com/amosbird)). +* Fixed a (rare) exception in the case when a user's assigned profiles are updated right after the user logs in, which could cause a missing entry in `session_log` or problems with logging in. [#57263](https://github.com/ClickHouse/ClickHouse/pull/57263) ([Vasily Nemkov](https://github.com/Enmk)). +* Fix working with read buffers in StreamingFormatExecutor; previously it could lead to segfaults in Kafka and other streaming engines.
[#57438](https://github.com/ClickHouse/ClickHouse/pull/57438) ([Kruglov Pavel](https://github.com/Avogar)). +* Ignore MVs with a dropped target table when pushing to views during an insert to the source table. [#57520](https://github.com/ClickHouse/ClickHouse/pull/57520) ([Kruglov Pavel](https://github.com/Avogar)). +* Eliminate possible race between ALTER_METADATA and MERGE_PARTS (which could lead to a checksum mismatch: `CHECKSUM_DOESNT_MATCH`). [#57755](https://github.com/ClickHouse/ClickHouse/pull/57755) ([Azat Khuzhin](https://github.com/azat)). +* Fix the expression order bug in `GROUP BY` with `ROLLUP`. [#57786](https://github.com/ClickHouse/ClickHouse/pull/57786) ([Chen768959](https://github.com/Chen768959)). +* Fix a bug in zero-copy-replication (an experimental feature) that could lead to `The specified key does not exist` error and data loss. It could happen when dropping a replica with broken or unexpected/ignored detached parts. Fixes [#57985](https://github.com/ClickHouse/ClickHouse/issues/57985). [#58333](https://github.com/ClickHouse/ClickHouse/pull/58333) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Fix a bug where users could not work with symlinks in `user_files_path`. [#58447](https://github.com/ClickHouse/ClickHouse/pull/58447) ([Duc Canh Le](https://github.com/canhld94)). +* Fix segfault when graphite table does not have agg function. [#58453](https://github.com/ClickHouse/ClickHouse/pull/58453) ([Duc Canh Le](https://github.com/canhld94)). +* Fix reading multiple times from KafkaEngine in materialized views. [#58477](https://github.com/ClickHouse/ClickHouse/pull/58477) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Fix `Part ... intersects part ...` error that might occur in `ReplicatedMergeTree` when the server was restarted just after [automatically] dropping [an empty] part and adjacent parts were merged. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/56282. [#58482](https://github.com/ClickHouse/ClickHouse/pull/58482) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Disable `MergeTreePrefetchedReadPool` for LIMIT-only queries, because the time spent filling per-thread tasks can be greater than the whole query execution time for big tables with a small limit. [#58505](https://github.com/ClickHouse/ClickHouse/pull/58505) ([Maksim Kita](https://github.com/kitaisreal)). +* While a `RESTORE` is underway in ClickHouse, allow restoring databases with the `Ordinary` engine. [#58520](https://github.com/ClickHouse/ClickHouse/pull/58520) ([Jihyuk Bok](https://github.com/tomahawk28)). +* Fix read buffer creation in Hive engine when the `thread_pool` read method is used. Closes [#57978](https://github.com/ClickHouse/ClickHouse/issues/57978). [#58537](https://github.com/ClickHouse/ClickHouse/pull/58537) ([sunny](https://github.com/sunny19930321)). +* Hide credentials in `base_backup_name` column of `system.backup_log`. [#58550](https://github.com/ClickHouse/ClickHouse/pull/58550) ([Daniel Pozo Escalona](https://github.com/danipozo)). +* While executing queries like `SELECT toStartOfInterval(toDateTime64('2023-10-09 10:11:12.000999', 6), toIntervalMillisecond(1));`, the result was previously not rounded to 1 millisecond. This is now fixed; the fix also solves some problems appearing in https://github.com/ClickHouse/ClickHouse/pull/56738. [#58557](https://github.com/ClickHouse/ClickHouse/pull/58557) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix logical error in `parallel_hash` working with `max_joined_block_size_rows`.
[#58595](https://github.com/ClickHouse/ClickHouse/pull/58595) ([vdimir](https://github.com/vdimir)). +* Fix error in join with `USING` when one of the tables has a `Nullable` key. [#58596](https://github.com/ClickHouse/ClickHouse/pull/58596) ([vdimir](https://github.com/vdimir)). +* The (optional) `fraction` argument in function `makeDateTime64()` can now be non-const. This was possible already with ClickHouse <= 23.8. [#58597](https://github.com/ClickHouse/ClickHouse/pull/58597) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix possible server crash during symbolizing inline frames. [#58607](https://github.com/ClickHouse/ClickHouse/pull/58607) ([Azat Khuzhin](https://github.com/azat)). +* The query cache now denies access to entries when the user is re-created or assumes another role. This prevents attacks where 1. a user with the same name as a dropped user may access the old user's cache entries or 2. a user with a different role may access cache entries of a role with a different row policy. [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix broken partition key analysis when doing projection optimization with `force_index_by_date = 1`. This fixes [#58620](https://github.com/ClickHouse/ClickHouse/issues/58620). We don't need partition key analysis for projections after https://github.com/ClickHouse/ClickHouse/pull/56502. [#58638](https://github.com/ClickHouse/ClickHouse/pull/58638) ([Amos Bird](https://github.com/amosbird)). +* The query cache now behaves properly when per-user quotas are defined and `SYSTEM DROP QUERY CACHE` was run. [#58731](https://github.com/ClickHouse/ClickHouse/pull/58731) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix data stream partitioning for window functions when there are different window descriptions with similar prefixes but different partitioning. Fixes [#58714](https://github.com/ClickHouse/ClickHouse/issues/58714). [#58739](https://github.com/ClickHouse/ClickHouse/pull/58739) ([Dmitry Novik](https://github.com/novikd)). +* Fix double destroy call on exception throw in addBatchLookupTable8. [#58745](https://github.com/ClickHouse/ClickHouse/pull/58745) ([Raúl Marín](https://github.com/Algunenano)). +* Keeper fix: don't process requests during shutdown because it would lead to an invalid state. [#58765](https://github.com/ClickHouse/ClickHouse/pull/58765) ([Antonio Andelic](https://github.com/antonio2368)). +* Fix a crash in the polygon dictionary. Fixes [#58612](https://github.com/ClickHouse/ClickHouse/issues/58612). [#58771](https://github.com/ClickHouse/ClickHouse/pull/58771) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix possible crash in JSONExtract function extracting `LowCardinality(Nullable(T))` type. [#58808](https://github.com/ClickHouse/ClickHouse/pull/58808) ([vdimir](https://github.com/vdimir)). +* Fix a `Poco::Logger` memory leak on table CREATE/DROP. Closes [#57931](https://github.com/ClickHouse/ClickHouse/issues/57931). Closes [#58496](https://github.com/ClickHouse/ClickHouse/issues/58496). [#58831](https://github.com/ClickHouse/ClickHouse/pull/58831) ([Maksim Kita](https://github.com/kitaisreal)). +* Fix HTTP compressors. Follow-up to [#58475](https://github.com/ClickHouse/ClickHouse/issues/58475). [#58846](https://github.com/ClickHouse/ClickHouse/pull/58846) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fix reading multiple times from FileLog engine in materialized views.
[#58877](https://github.com/ClickHouse/ClickHouse/pull/58877) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Prevent specifying an `access_key_id` that does not match the [correct pattern](https://docs.aws.amazon.com/IAM/latest/APIReference/API_AccessKey.html). [#58900](https://github.com/ClickHouse/ClickHouse/pull/58900) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Fix possible crash in clickhouse-local during loading suggestions. Closes [#58825](https://github.com/ClickHouse/ClickHouse/issues/58825). [#58907](https://github.com/ClickHouse/ClickHouse/pull/58907) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix crash when `indexHint` function is used without arguments in the filters. [#58911](https://github.com/ClickHouse/ClickHouse/pull/58911) ([Dmitry Novik](https://github.com/novikd)). +* Fixed URL and S3 engines losing the `headers` argument on server restart. [#58933](https://github.com/ClickHouse/ClickHouse/pull/58933) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix analyzer: an insertion from a SELECT with a subquery referencing the insertion table should process only the insertion block for all table expressions. Fixes [#58080](https://github.com/ClickHouse/ClickHouse/issues/58080). Follow-up to [#50857](https://github.com/ClickHouse/ClickHouse/issues/50857). [#58958](https://github.com/ClickHouse/ClickHouse/pull/58958) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Fixed reading parquet files from archives. [#58966](https://github.com/ClickHouse/ClickHouse/pull/58966) ([Michael Kolupaev](https://github.com/al13n321)). +* Experimental feature of inverted indices: `ALTER TABLE DROP INDEX` for an inverted index now removes all inverted index files from the new part (issue [#59039](https://github.com/ClickHouse/ClickHouse/issues/59039)). [#59040](https://github.com/ClickHouse/ClickHouse/pull/59040) ([mochi](https://github.com/MochiXu)). +* Fix data race on collecting factories info for system.query_log. [#59049](https://github.com/ClickHouse/ClickHouse/pull/59049) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Disable "Too many redirects" error retry. Fixes [#58967](https://github.com/ClickHouse/ClickHouse/issues/58967). [#59099](https://github.com/ClickHouse/ClickHouse/pull/59099) ([skyoct](https://github.com/skyoct)). +* Fixed wrong aggregation results in mixed x86_64 and ARM clusters. [#59132](https://github.com/ClickHouse/ClickHouse/pull/59132) ([Harry Lee](https://github.com/HarryLeeIBM)). +* Fix a deadlock that can happen during the shutdown of the server due to metadata loading failure. [#59137](https://github.com/ClickHouse/ClickHouse/pull/59137) ([Sergei Trifonov](https://github.com/serxa)). +* The combination of LIMIT BY and LIMIT could produce an incorrect result in distributed queries (parallel replicas included). [#59153](https://github.com/ClickHouse/ClickHouse/pull/59153) ([Igor Nikonov](https://github.com/devcrafter)). +* Fix a crash in `toString()` with a nullable timezone argument. Fixes [#59126](https://github.com/ClickHouse/ClickHouse/issues/59126). [#59190](https://github.com/ClickHouse/ClickHouse/pull/59190) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix abort in iceberg metadata on bad file paths. [#59275](https://github.com/ClickHouse/ClickHouse/pull/59275) ([Kruglov Pavel](https://github.com/Avogar)). +* Fix architecture name in select of Rust target. [#59307](https://github.com/ClickHouse/ClickHouse/pull/59307) ([p1rattttt](https://github.com/p1rattttt)).
+* Fix `Not-ready Set` for queries from `system.tables` with `table IN (subquery)` filter expression. Fixes [#59342](https://github.com/ClickHouse/ClickHouse/issues/59342). [#59351](https://github.com/ClickHouse/ClickHouse/pull/59351) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix lazy initialization in RabbitMQ that could lead to a logical error and an uninitialized state. [#59352](https://github.com/ClickHouse/ClickHouse/pull/59352) ([Kruglov Pavel](https://github.com/Avogar)). #### NO CL ENTRY diff --git a/docs/changelogs/v24.1.2.5-stable.md b/docs/changelogs/v24.1.2.5-stable.md index bac25c9b9ed..080e24da6f0 100644 --- a/docs/changelogs/v24.1.2.5-stable.md +++ b/docs/changelogs/v24.1.2.5-stable.md @@ -9,6 +9,6 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix translate() with FixedString input [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). -* Fix stacktraces for binaries without debug symbols [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). +* Backported in [#59425](https://github.com/ClickHouse/ClickHouse/issues/59425): Fix translate() with FixedString input. Could lead to crashes as it'd return a String column (vs the expected FixedString). This issue was found through the ClickHouse Bug Bounty Program by YohannJardin. [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59478](https://github.com/ClickHouse/ClickHouse/issues/59478): Fix stacktraces for binaries without debug symbols. [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). diff --git a/docs/changelogs/v24.1.3.31-stable.md b/docs/changelogs/v24.1.3.31-stable.md index e898fba5c87..ec73672c8d5 100644 --- a/docs/changelogs/v24.1.3.31-stable.md +++ b/docs/changelogs/v24.1.3.31-stable.md @@ -13,13 +13,13 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix `ASTAlterCommand::formatImpl` in case of column specific settings... [#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Make MAX use the same rules as permutation for complex types [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)). -* Fix corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)). -* Fix incorrect result of arrayElement / map[] on empty value [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash in topK when merging empty states [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)). -* Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)). -* Fix leftPad / rightPad function with FixedString input [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59726](https://github.com/ClickHouse/ClickHouse/issues/59726): Fix formatting of alter commands in case of column specific settings.
[#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#59585](https://github.com/ClickHouse/ClickHouse/issues/59585): Make MAX use the same rules as permutation for complex types. [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59579](https://github.com/ClickHouse/ClickHouse/issues/59579): Fix a corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` setting. There is one corner case not covered due to the absence of tables in the path. [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)). +* Backported in [#59647](https://github.com/ClickHouse/ClickHouse/issues/59647): Fix incorrect result of arrayElement / map[] on empty value. [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59639](https://github.com/ClickHouse/ClickHouse/issues/59639): Fix crash in topK when merging empty states. [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59696](https://github.com/ClickHouse/ClickHouse/issues/59696): Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor. [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#59764](https://github.com/ClickHouse/ClickHouse/issues/59764): Fix leftPad / rightPad function with FixedString input. [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v24.1.4.20-stable.md b/docs/changelogs/v24.1.4.20-stable.md index 8612a485f12..1baec2178b1 100644 --- a/docs/changelogs/v24.1.4.20-stable.md +++ b/docs/changelogs/v24.1.4.20-stable.md @@ -15,10 +15,10 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix digest calculation in Keeper [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix distributed table with a constant sharding key [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). -* Fix parsing of partition expressions surrounded by parens [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Backported in [#59457](https://github.com/ClickHouse/ClickHouse/issues/59457): Keeper fix: fix digest calculation for nodes. [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#59682](https://github.com/ClickHouse/ClickHouse/issues/59682): Fix distributed table with a constant sharding key. [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)). +* Backported in [#59842](https://github.com/ClickHouse/ClickHouse/issues/59842): Fix query start time on non-initial queries. [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#59937](https://github.com/ClickHouse/ClickHouse/issues/59937): Fix parsing of partition expressions that are surrounded by parentheses, e.g.: `ALTER TABLE test DROP PARTITION ('2023-10-19')`. [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v24.1.5.6-stable.md b/docs/changelogs/v24.1.5.6-stable.md index ce46c51e2f4..caf246fcab6 100644 --- a/docs/changelogs/v24.1.5.6-stable.md +++ b/docs/changelogs/v24.1.5.6-stable.md @@ -9,7 +9,7 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* UniqExactSet read crash fix [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)). +* Backported in [#59959](https://github.com/ClickHouse/ClickHouse/issues/59959): Fix crash during deserialization of aggregation function states that internally use `UniqExactSet`. Introduced in https://github.com/ClickHouse/ClickHouse/pull/59009. [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)). #### NOT FOR CHANGELOG / INSIGNIFICANT diff --git a/docs/changelogs/v24.1.7.18-stable.md b/docs/changelogs/v24.1.7.18-stable.md index 603a83a67be..3bc94538174 100644 --- a/docs/changelogs/v24.1.7.18-stable.md +++ b/docs/changelogs/v24.1.7.18-stable.md @@ -9,10 +9,10 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). -* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). -* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61330](https://github.com/ClickHouse/ClickHouse/issues/61330): Fix deadlock in parallel parsing when lots of rows are skipped due to errors. [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#61008](https://github.com/ClickHouse/ClickHouse/issues/61008): Fix the issue of `max_query_size` for KQL compound operators like `mv-expand`. Related to [#59626](https://github.com/ClickHouse/ClickHouse/issues/59626). [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). +* Backported in [#61019](https://github.com/ClickHouse/ClickHouse/issues/61019): Fix crash when the `allow_experimental_analyzer` setting value is changed in subqueries. [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#61293](https://github.com/ClickHouse/ClickHouse/issues/61293): Keeper: fix runtime reconfig for standalone binary. [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
#### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.1.8.22-stable.md b/docs/changelogs/v24.1.8.22-stable.md index f780de41c40..e615c60a942 100644 --- a/docs/changelogs/v24.1.8.22-stable.md +++ b/docs/changelogs/v24.1.8.22-stable.md @@ -9,12 +9,12 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). -* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). -* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). -* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). +* Backported in [#61451](https://github.com/ClickHouse/ClickHouse/issues/61451): Fix possible incorrect result of aggregate function `uniqExact`. [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#61844](https://github.com/ClickHouse/ClickHouse/issues/61844): Fixed possible wrong result of aggregation with nullable keys. [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#61746](https://github.com/ClickHouse/ClickHouse/issues/61746): Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). +* Backported in [#61696](https://github.com/ClickHouse/ClickHouse/issues/61696): Fix `clickhouse-client -s` argument; it was broken by being defined twice. [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). +* Backported in [#61576](https://github.com/ClickHouse/ClickHouse/issues/61576): Fix string search with constant start position, which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61858](https://github.com/ClickHouse/ClickHouse/issues/61858): Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying an incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.2.1.2248-stable.md b/docs/changelogs/v24.2.1.2248-stable.md index 02affe12c43..edcd3da3852 100644 --- a/docs/changelogs/v24.2.1.2248-stable.md +++ b/docs/changelogs/v24.2.1.2248-stable.md @@ -60,7 +60,7 @@ sidebar_label: 2024 * Support negative positional arguments. Closes [#57736](https://github.com/ClickHouse/ClickHouse/issues/57736).
[#58292](https://github.com/ClickHouse/ClickHouse/pull/58292) ([flynn](https://github.com/ucasfl)). * Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. [#58486](https://github.com/ClickHouse/ClickHouse/pull/58486) ([Julia Kartseva](https://github.com/jkartseva)). * Allow to define `volume_priority` in `storage_configuration`. [#58533](https://github.com/ClickHouse/ClickHouse/pull/58533) ([Andrey Zvonov](https://github.com/zvonand)). -* Add support for Date32 type in T64 codec. [#58738](https://github.com/ClickHouse/ClickHouse/pull/58738) ([Hongbin Ma](https://github.com/binmahone)). +* Add support for Date32 type in T64 codec. [#58738](https://github.com/ClickHouse/ClickHouse/pull/58738) ([Hongbin Ma (Mahone)](https://github.com/binmahone)). * Support `LEFT JOIN`, `ALL INNER JOIN`, and simple subqueries for parallel replicas (only with analyzer). New setting `parallel_replicas_prefer_local_join` chooses local `JOIN` execution (by default) vs `GLOBAL JOIN`. All tables should exist on every replica from `cluster_for_parallel_replicas`. New settings `min_external_table_block_size_rows` and `min_external_table_block_size_bytes` are used to squash small blocks that are sent for temporary tables (only with analyzer). [#58916](https://github.com/ClickHouse/ClickHouse/pull/58916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Allow trailing commas in types with several items. [#59119](https://github.com/ClickHouse/ClickHouse/pull/59119) ([Aleksandr Musorin](https://github.com/AVMusorin)). * Allow parallel and distributed processing for `S3Queue` table engine. For distributed processing use setting `s3queue_total_shards_num` (by default `1`). Setting `s3queue_processing_threads_num` previously was not allowed for Ordered processing mode, now it is allowed. Warning: settings `s3queue_processing_threads_num`(processing threads per each shard) and `s3queue_total_shards_num` for ordered mode change how metadata is stored (make the number of `max_processed_file` nodes equal to `s3queue_processing_threads_num * s3queue_total_shards_num`), so they must be the same for all shards and cannot be changed once at least one shard is created. [#59167](https://github.com/ClickHouse/ClickHouse/pull/59167) ([Kseniia Sumarokova](https://github.com/kssenii)). @@ -123,60 +123,60 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Non ready set in TTL WHERE. [#57430](https://github.com/ClickHouse/ClickHouse/pull/57430) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix quantilesGK bug [#58216](https://github.com/ClickHouse/ClickHouse/pull/58216) ([李扬](https://github.com/taiyang-li)). -* Disable parallel replicas JOIN with CTE (not analyzer) [#59239](https://github.com/ClickHouse/ClickHouse/pull/59239) ([Raúl Marín](https://github.com/Algunenano)). -* Fix bug with `intDiv` for decimal arguments [#59243](https://github.com/ClickHouse/ClickHouse/pull/59243) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix translate() with FixedString input [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)). 
-* Fix digest calculation in Keeper [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix stacktraces for binaries without debug symbols [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)). -* Fix `ASTAlterCommand::formatImpl` in case of column specific settings... [#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix `SELECT * FROM [...] ORDER BY ALL` with Analyzer [#59462](https://github.com/ClickHouse/ClickHouse/pull/59462) ([zhongyuankai](https://github.com/zhongyuankai)). -* Fix possible uncaught exception during distributed query cancellation [#59487](https://github.com/ClickHouse/ClickHouse/pull/59487) ([Azat Khuzhin](https://github.com/azat)). -* Make MAX use the same rules as permutation for complex types [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)). -* Fix corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)). -* Fix incorrect result of arrayElement / map[] on empty value [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash in topK when merging empty states [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)). -* Fix distributed table with a constant sharding key [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix_kql_issue_found_by_wingfuzz [#59626](https://github.com/ClickHouse/ClickHouse/pull/59626) ([Yong Wang](https://github.com/kashwy)). -* Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)). -* Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)). -* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)). -* Validate types of arguments for `minmax` skipping index [#59733](https://github.com/ClickHouse/ClickHouse/pull/59733) ([Anton Popov](https://github.com/CurtizJ)). -* Fix leftPad / rightPad function with FixedString input [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)). -* Fix AST fuzzer issue in function `countMatches` [#59752](https://github.com/ClickHouse/ClickHouse/pull/59752) ([Robert Schulze](https://github.com/rschu1ze)). -* rabbitmq: fix having neither acked nor nacked messages [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix StorageURL doing some of the query execution in single thread [#59833](https://github.com/ClickHouse/ClickHouse/pull/59833) ([Michael Kolupaev](https://github.com/al13n321)). -* s3queue: fix uninitialized value [#59897](https://github.com/ClickHouse/ClickHouse/pull/59897) ([Kseniia Sumarokova](https://github.com/kssenii)). 
-* Fix parsing of partition expressions surrounded by parens [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix crash in JSONColumnsWithMetadata format over http [#59925](https://github.com/ClickHouse/ClickHouse/pull/59925) ([Kruglov Pavel](https://github.com/Avogar)). -* Do not rewrite sum() to count() if return value differs in analyzer [#59926](https://github.com/ClickHouse/ClickHouse/pull/59926) ([Azat Khuzhin](https://github.com/azat)). -* UniqExactSet read crash fix [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)). -* ReplicatedMergeTree invalid metadata_version fix [#59946](https://github.com/ClickHouse/ClickHouse/pull/59946) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix data race in `StorageDistributed` [#59987](https://github.com/ClickHouse/ClickHouse/pull/59987) ([Nikita Taranov](https://github.com/nickitat)). -* Run init scripts when option is enabled rather than disabled [#59991](https://github.com/ClickHouse/ClickHouse/pull/59991) ([jktng](https://github.com/jktng)). -* Fix scale conversion for DateTime64 [#60004](https://github.com/ClickHouse/ClickHouse/pull/60004) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix INSERT into SQLite with single quote (by escaping single quotes with a quote instead of backslash) [#60015](https://github.com/ClickHouse/ClickHouse/pull/60015) ([Azat Khuzhin](https://github.com/azat)). -* Fix several logical errors in arrayFold [#60022](https://github.com/ClickHouse/ClickHouse/pull/60022) ([Raúl Marín](https://github.com/Algunenano)). -* Fix optimize_uniq_to_count removing the column alias [#60026](https://github.com/ClickHouse/ClickHouse/pull/60026) ([Raúl Marín](https://github.com/Algunenano)). -* Fix possible exception from s3queue table on drop [#60036](https://github.com/ClickHouse/ClickHouse/pull/60036) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix formatting of NOT with single literals [#60042](https://github.com/ClickHouse/ClickHouse/pull/60042) ([Raúl Marín](https://github.com/Algunenano)). -* Use max_query_size from context in DDLLogEntry instead of hardcoded 4096 [#60083](https://github.com/ClickHouse/ClickHouse/pull/60083) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix inconsistent formatting of queries [#60095](https://github.com/ClickHouse/ClickHouse/pull/60095) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix inconsistent formatting of explain in subqueries [#60102](https://github.com/ClickHouse/ClickHouse/pull/60102) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix cosineDistance crash with Nullable [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)). -* Allow casting of bools in string representation to to true bools [#60160](https://github.com/ClickHouse/ClickHouse/pull/60160) ([Robert Schulze](https://github.com/rschu1ze)). -* Fix system.s3queue_log [#60166](https://github.com/ClickHouse/ClickHouse/pull/60166) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix arrayReduce with nullable aggregate function name [#60188](https://github.com/ClickHouse/ClickHouse/pull/60188) ([Raúl Marín](https://github.com/Algunenano)). -* Fix actions execution during preliminary filtering (PK, partition pruning) [#60196](https://github.com/ClickHouse/ClickHouse/pull/60196) ([Azat Khuzhin](https://github.com/azat)). 
-* Hide sensitive info for s3queue [#60233](https://github.com/ClickHouse/ClickHouse/pull/60233) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Revert "Replace `ORDER BY ALL` by `ORDER BY *`" [#60248](https://github.com/ClickHouse/ClickHouse/pull/60248) ([Robert Schulze](https://github.com/rschu1ze)).
-* Fix http exception codes. [#60252](https://github.com/ClickHouse/ClickHouse/pull/60252) ([Austin Kothig](https://github.com/kothiga)).
-* s3queue: fix bug (also fixes flaky test_storage_s3_queue/test.py::test_shards_distributed) [#60282](https://github.com/ClickHouse/ClickHouse/pull/60282) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6 [#60359](https://github.com/ClickHouse/ClickHouse/pull/60359) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments [#60453](https://github.com/ClickHouse/ClickHouse/pull/60453) ([Raúl Marín](https://github.com/Algunenano)).
-* Merging [#59674](https://github.com/ClickHouse/ClickHouse/issues/59674). [#60470](https://github.com/ClickHouse/ClickHouse/pull/60470) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Correctly check keys in s3Cluster [#60477](https://github.com/ClickHouse/ClickHouse/pull/60477) ([Antonio Andelic](https://github.com/antonio2368)).
+* Support `IN (subquery)` in table TTL expression. Initially, it was allowed to create such a TTL expression, but any TTL merge would fail with a `Not-ready Set` error in the background. Now, TTL is correctly applied. The subquery is executed for every TTL merge, and its result is not cached or reused by other merges. Use such a configuration with special care, because subqueries in TTL may lead to high memory consumption and, possibly, a non-deterministic result of TTL merge on different replicas (which is correctly handled by replication, however). [#57430](https://github.com/ClickHouse/ClickHouse/pull/57430) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix quantilesGK bug, closes [#57683](https://github.com/ClickHouse/ClickHouse/issues/57683). [#58216](https://github.com/ClickHouse/ClickHouse/pull/58216) ([李扬](https://github.com/taiyang-li)).
+* Disable parallel replicas JOIN with CTE (not analyzer). [#59239](https://github.com/ClickHouse/ClickHouse/pull/59239) ([Raúl Marín](https://github.com/Algunenano)).
+* Fixes a bug in function `intDiv` with decimal arguments. Fixes [#56414](https://github.com/ClickHouse/ClickHouse/issues/56414). [#59243](https://github.com/ClickHouse/ClickHouse/pull/59243) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
+* Fix translate() with FixedString input. Could lead to crashes as it'd return a String column (vs the expected FixedString). This issue was found by YohannJardin through the ClickHouse Bug Bounty Program. [#59356](https://github.com/ClickHouse/ClickHouse/pull/59356) ([Raúl Marín](https://github.com/Algunenano)).
+* Keeper fix: fix digest calculation for nodes. [#59439](https://github.com/ClickHouse/ClickHouse/pull/59439) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix stacktraces for binaries without debug symbols. [#59444](https://github.com/ClickHouse/ClickHouse/pull/59444) ([Azat Khuzhin](https://github.com/azat)).
+* Fix formatting of alter commands in case of column specific settings. [#59445](https://github.com/ClickHouse/ClickHouse/pull/59445) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* `SELECT * FROM [...] 
ORDER BY ALL SETTINGS allow_experimental_analyzer = 1` now works. [#59462](https://github.com/ClickHouse/ClickHouse/pull/59462) ([zhongyuankai](https://github.com/zhongyuankai)).
+* Fix possible uncaught exception during distributed query cancellation. Closes [#59169](https://github.com/ClickHouse/ClickHouse/issues/59169). [#59487](https://github.com/ClickHouse/ClickHouse/pull/59487) ([Azat Khuzhin](https://github.com/azat)).
+* Make MAX use the same rules as permutation for complex types. [#59498](https://github.com/ClickHouse/ClickHouse/pull/59498) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix a corner case when passing the `update_insert_deduplication_token_in_dependent_materialized_views` setting. One corner case remains uncovered, due to the absence of tables in the path. [#59544](https://github.com/ClickHouse/ClickHouse/pull/59544) ([Jordi Villar](https://github.com/jrdi)).
+* Fix incorrect result of arrayElement / map[] on empty value. [#59594](https://github.com/ClickHouse/ClickHouse/pull/59594) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix crash in topK when merging empty states. [#59603](https://github.com/ClickHouse/ClickHouse/pull/59603) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix distributed table with a constant sharding key. [#59606](https://github.com/ClickHouse/ClickHouse/pull/59606) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix segmentation fault in the KQL parser when the input query exceeds `max_query_size`. Also re-enable the KQL dialect. Fixes [#59036](https://github.com/ClickHouse/ClickHouse/issues/59036) and [#59037](https://github.com/ClickHouse/ClickHouse/issues/59037). [#59626](https://github.com/ClickHouse/ClickHouse/pull/59626) ([Yong Wang](https://github.com/kashwy)).
+* Fix error `Read beyond last offset` for `AsynchronousBoundedReadBuffer`. [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor. [#59658](https://github.com/ClickHouse/ClickHouse/pull/59658) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix query start time on non-initial queries. [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)).
+* Validate types of arguments for `minmax` skipping index. [#59733](https://github.com/ClickHouse/ClickHouse/pull/59733) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix leftPad / rightPad function with FixedString input. [#59739](https://github.com/ClickHouse/ClickHouse/pull/59739) ([Raúl Marín](https://github.com/Algunenano)).
+* Fixed an exception in function `countMatches` with non-const `FixedString` haystack arguments, e.g. `SELECT countMatches(materialize(toFixedString('foobarfoo', 9)), 'foo');`. [#59752](https://github.com/ClickHouse/ClickHouse/pull/59752) ([Robert Schulze](https://github.com/rschu1ze)).
+* RabbitMQ: fix having neither acked nor nacked messages. If an exception happens during the read-write phase, messages will be nacked. [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fixed queries that read a Parquet file over HTTP (url()/URL()) executing in one thread instead of max_threads. [#59833](https://github.com/ClickHouse/ClickHouse/pull/59833) ([Michael Kolupaev](https://github.com/al13n321)).
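A hedged sketch of the last fix above, with an invented URL; after the fix, such a read can use up to `max_threads` threads instead of one:

```sql
-- Reads a remote Parquet file via the url() table function (hypothetical URL).
SELECT count()
FROM url('https://example.com/data/events.parquet', 'Parquet')
SETTINGS max_threads = 8;
```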
+* Fixed an uninitialized value in S3 queue, which happened during an upgrade to a new version if the table had Ordered mode and resulted in the error "Existing table metadata in ZooKeeper differs in s3queue_processing_threads_num setting". [#59897](https://github.com/ClickHouse/ClickHouse/pull/59897) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix parsing of partition expressions that are surrounded by parentheses, e.g.: `ALTER TABLE test DROP PARTITION ('2023-10-19')`. [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Fix crash in JSONColumnsWithMetadata format over HTTP. Closes [#59853](https://github.com/ClickHouse/ClickHouse/issues/59853). [#59925](https://github.com/ClickHouse/ClickHouse/pull/59925) ([Kruglov Pavel](https://github.com/Avogar)).
+* Do not rewrite sum() to count() if return value differs in analyzer. [#59926](https://github.com/ClickHouse/ClickHouse/pull/59926) ([Azat Khuzhin](https://github.com/azat)).
+* Fix crash during deserialization of aggregation function states that internally use `UniqExactSet`. Introduced in https://github.com/ClickHouse/ClickHouse/pull/59009. [#59928](https://github.com/ClickHouse/ClickHouse/pull/59928) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix invalid `metadata_version` node initialization in ZooKeeper during creation of a non-first replica of `ReplicatedMergeTree`. Closes [#54902](https://github.com/ClickHouse/ClickHouse/issues/54902). [#59946](https://github.com/ClickHouse/ClickHouse/pull/59946) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fixed a data race on the cluster object between `StorageDistributed` and `Context::reloadClusterConfig()`: the former held a const reference to its member while the latter destroyed the object (in the process of replacing it with a new one). [#59987](https://github.com/ClickHouse/ClickHouse/pull/59987) ([Nikita Taranov](https://github.com/nickitat)).
+* Fixes [#59989](https://github.com/ClickHouse/ClickHouse/issues/59989): runs init scripts when force-enabled or when no database exists, rather than the inverse. [#59991](https://github.com/ClickHouse/ClickHouse/pull/59991) ([jktng](https://github.com/jktng)).
+* Fix scale conversion for DateTime64 values, for example when inserting a `DateTime64(6)` value into a `DateTime64(3)` column (such as `CREATE TABLE test (result DateTime64(3)) ENGINE = Memory`). [#60004](https://github.com/ClickHouse/ClickHouse/pull/60004) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
+* Fix INSERT into SQLite with single quote (by properly escaping single quotes with a quote instead of backslash). [#60015](https://github.com/ClickHouse/ClickHouse/pull/60015) ([Azat Khuzhin](https://github.com/azat)).
+* Fix several logical errors in arrayFold. Fixes support for Nullable and LowCardinality. [#60022](https://github.com/ClickHouse/ClickHouse/pull/60022) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix optimize_uniq_to_count removing the column alias. [#60026](https://github.com/ClickHouse/ClickHouse/pull/60026) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix possible error while dropping an s3queue table, like "no node shard0". [#60036](https://github.com/ClickHouse/ClickHouse/pull/60036) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix formatting of NOT with single literals. [#60042](https://github.com/ClickHouse/ClickHouse/pull/60042) ([Raúl Marín](https://github.com/Algunenano)).
+* Use max_query_size from context when parsing changed settings in DDLWorker. 
Previously, with a large number of changed settings, DDLWorker could fail with a `Max query size exceeded` error and stop processing log entries. [#60083](https://github.com/ClickHouse/ClickHouse/pull/60083) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix inconsistent formatting of queries containing tables named `table`. Fix wrong formatting of queries with `UNION ALL`, `INTERSECT`, and `EXCEPT` when their structure wasn't linear. This closes [#52349](https://github.com/ClickHouse/ClickHouse/issues/52349). Fix wrong formatting of `SYSTEM` queries, including `SYSTEM ... DROP FILESYSTEM CACHE`, `SYSTEM ... REFRESH/START/STOP/CANCEL/TEST VIEW`, `SYSTEM ENABLE/DISABLE FAILPOINT`. Fix formatting of parameterized DDL queries. Fix the formatting of the `DESCRIBE FILESYSTEM CACHE` query. Fix incorrect formatting of the `SET param_...` (a query setting a parameter). Fix incorrect formatting of `CREATE INDEX` queries. Fix inconsistent formatting of `CREATE USER` and similar queries. Fix inconsistent formatting of `CREATE SETTINGS PROFILE`. Fix incorrect formatting of `ALTER ... MODIFY REFRESH`. Fix inconsistent formatting of window functions if frame offsets were expressions. Fix inconsistent formatting of `RESPECT NULLS` and `IGNORE NULLS` if they were used after a function that implements an operator (such as `plus`). Fix idiotic formatting of `SYSTEM SYNC REPLICA ... LIGHTWEIGHT FROM ...`. Fix inconsistent formatting of invalid queries with `GROUP BY GROUPING SETS ... WITH ROLLUP/CUBE/TOTALS`. Fix inconsistent formatting of `GRANT CURRENT GRANTS`. Fix inconsistent formatting of `CREATE TABLE (... COLLATE)`. Additionally, I fixed the incorrect formatting of `EXPLAIN` in subqueries ([#60102](https://github.com/ClickHouse/ClickHouse/issues/60102)). Fixed incorrect formatting of lambda functions ([#60012](https://github.com/ClickHouse/ClickHouse/issues/60012)). Added a check so there is no way to miss these abominations in the future. [#60095](https://github.com/ClickHouse/ClickHouse/pull/60095) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Queries like `SELECT * FROM (EXPLAIN ...)` were formatted incorrectly. [#60102](https://github.com/ClickHouse/ClickHouse/pull/60102) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Fix cosineDistance crash with Nullable. [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)).
+* Boolean values in string representation now cast to true bools. E.g. this query previously threw an exception but now works: `SELECT true = 'true'`. [#60160](https://github.com/ClickHouse/ClickHouse/pull/60160) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fix non-filled column `table_uuid` in `system.s3queue_log`. Added columns `database` and `table`. Renamed `table_uuid` to `uuid`. [#60166](https://github.com/ClickHouse/ClickHouse/pull/60166) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix arrayReduce with nullable aggregate function name. [#60188](https://github.com/ClickHouse/ClickHouse/pull/60188) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix actions execution during preliminary filtering (PK, partition pruning). [#60196](https://github.com/ClickHouse/ClickHouse/pull/60196) ([Azat Khuzhin](https://github.com/azat)).
+* Hide sensitive info for `S3Queue` table engine. [#60233](https://github.com/ClickHouse/ClickHouse/pull/60233) ([Kseniia Sumarokova](https://github.com/kssenii)).
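A hedged sketch of where the `S3Queue` masking above shows up; the bucket, key, and secret below are invented, and the engine signature is abbreviated:

```sql
-- Credentials are passed in the engine definition...
CREATE TABLE queue_demo (data String)
ENGINE = S3Queue('https://bucket.s3.amazonaws.com/in/*.csv', 'KEY_ID', 'SECRET', 'CSV')
SETTINGS mode = 'unordered';

-- ...and with the fix they are shown as '[HIDDEN]' instead of verbatim.
SHOW CREATE TABLE queue_demo;
```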
+* Restore the previous syntax `ORDER BY ALL`, which had temporarily (for a few days) been replaced by `ORDER BY *`. [#60248](https://github.com/ClickHouse/ClickHouse/pull/60248) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fixed a minor bug that caused all HTTP return codes to be 200 (success) instead of a relevant code on exception. [#60252](https://github.com/ClickHouse/ClickHouse/pull/60252) ([Austin Kothig](https://github.com/kothiga)).
+* Fix bug in `S3Queue` table engine with ordered parallel mode. [#60282](https://github.com/ClickHouse/ClickHouse/pull/60282) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6. [#60359](https://github.com/ClickHouse/ClickHouse/pull/60359) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments. [#60453](https://github.com/ClickHouse/ClickHouse/pull/60453) ([Raúl Marín](https://github.com/Algunenano)).
+* Fixed a minor bug that prevented distributed table queries sent from either KQL or PRQL dialect clients from being executed on replicas. [#60470](https://github.com/ClickHouse/ClickHouse/pull/60470) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Fix incomplete results with s3Cluster when multiple threads are used. [#60477](https://github.com/ClickHouse/ClickHouse/pull/60477) ([Antonio Andelic](https://github.com/antonio2368)).
#### CI Fix or Improvement (changelog entry is not required)
diff --git a/docs/changelogs/v24.2.2.71-stable.md b/docs/changelogs/v24.2.2.71-stable.md
index b9aa5be626b..e17c22ab176 100644
--- a/docs/changelogs/v24.2.2.71-stable.md
+++ b/docs/changelogs/v24.2.2.71-stable.md
@@ -12,21 +12,21 @@ sidebar_label: 2024
#### Bug Fix (user-visible misbehavior in an official stable release)
-* PartsSplitter invalid ranges for the same part [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)).
-* Try to avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
-* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)).
-* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
-* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
-* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). -* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). -* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#60640](https://github.com/ClickHouse/ClickHouse/issues/60640): Fixed a bug in parallel optimization for queries with `FINAL`, which could give an incorrect result in rare cases. [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)). +* Backported in [#61085](https://github.com/ClickHouse/ClickHouse/issues/61085): Avoid calculation of scalar subqueries for `CREATE TABLE`. Fixes [#59795](https://github.com/ClickHouse/ClickHouse/issues/59795) and [#59930](https://github.com/ClickHouse/ClickHouse/issues/59930). Attempt to re-implement https://github.com/ClickHouse/ClickHouse/pull/57855. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Backported in [#61332](https://github.com/ClickHouse/ClickHouse/issues/61332): Fix deadlock in parallel parsing when lots of rows are skipped due to errors. [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#61010](https://github.com/ClickHouse/ClickHouse/issues/61010): Fix the issue of `max_query_size` for KQL compound operator like mv-expand. Related to [#59626](https://github.com/ClickHouse/ClickHouse/issues/59626). [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). +* Backported in [#61002](https://github.com/ClickHouse/ClickHouse/issues/61002): Reduce the number of read rows from `system.numbers`. Fixes [#59418](https://github.com/ClickHouse/ClickHouse/issues/59418). [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)). +* Backported in [#60629](https://github.com/ClickHouse/ClickHouse/issues/60629): Don't output number tips for date types. [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60793](https://github.com/ClickHouse/ClickHouse/issues/60793): Fix buffer overflow that can happen if the attacker asks the HTTP server to decompress data with a composition of codecs and size triggering numeric overflow. Fix buffer overflow that can happen inside codec NONE on wrong input data. This was submitted by TIANGONG research team through our [Bug Bounty program](https://github.com/ClickHouse/ClickHouse/issues/38986). [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#60785](https://github.com/ClickHouse/ClickHouse/issues/60785): Functions for SQL/JSON were able to read uninitialized memory. This closes [#60017](https://github.com/ClickHouse/ClickHouse/issues/60017). Found by Fuzzer. 
[#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Backported in [#60805](https://github.com/ClickHouse/ClickHouse/issues/60805): Do not set aws custom metadata `x-amz-meta-*` headers on UploadPart & CompleteMultipartUpload calls. [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). +* Backported in [#60822](https://github.com/ClickHouse/ClickHouse/issues/60822): Fix crash in arrayEnumerateRanked. [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#60843](https://github.com/ClickHouse/ClickHouse/issues/60843): Fix crash when using input() in INSERT SELECT JOIN. Closes [#60035](https://github.com/ClickHouse/ClickHouse/issues/60035). [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). +* Backported in [#60919](https://github.com/ClickHouse/ClickHouse/issues/60919): Fix crash when `allow_experimental_analyzer` setting value is changed in the subqueries. [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). +* Backported in [#60906](https://github.com/ClickHouse/ClickHouse/issues/60906): Avoid segfault if too many keys are skipped when reading from S3. [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). +* Backported in [#61307](https://github.com/ClickHouse/ClickHouse/issues/61307): Fix multiple bugs in groupArraySorted. [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). +* Backported in [#61295](https://github.com/ClickHouse/ClickHouse/issues/61295): Keeper: fix runtime reconfig for standalone binary. [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.2.3.70-stable.md b/docs/changelogs/v24.2.3.70-stable.md index cd88877e254..1a50355e0b9 100644 --- a/docs/changelogs/v24.2.3.70-stable.md +++ b/docs/changelogs/v24.2.3.70-stable.md @@ -15,28 +15,28 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). -* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). -* fix issue of actions dag split [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)). -* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)). -* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). -* Fix ALTER QUERY MODIFY SQL SECURITY [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)). 
-* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)). -* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). -* Crash in Engine Merge if Row Policy does not have expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). -* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). -* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix incorrect judgement of of monotonicity of function abs [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). -* Make sanity check of settings worse [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)). -* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). -* Format SQL security option only in `CREATE VIEW` queries. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). +* Backported in [#61453](https://github.com/ClickHouse/ClickHouse/issues/61453): Fix possible incorrect result of aggregate function `uniqExact`. [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). +* Backported in [#61946](https://github.com/ClickHouse/ClickHouse/issues/61946): Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). +* Backported in [#61846](https://github.com/ClickHouse/ClickHouse/issues/61846): Fixed possible wrong result of aggregation with nullable keys. [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). 
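A hedged sketch of the nullable-keys aggregation covered by the last entry above, using the `values` table function with invented sample data:

```sql
-- GROUP BY over a Nullable key; the NULLs form their own group.
SELECT k, count() AS c
FROM values('k Nullable(UInt8)', 1, 1, NULL, 2, NULL)
GROUP BY k
ORDER BY k;
```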
+* Backported in [#61591](https://github.com/ClickHouse/ClickHouse/issues/61591): Fix `ActionsDAG::split` so that execution of the first and then the second part on a block is equivalent to execution of the initial DAG. [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#61648](https://github.com/ClickHouse/ClickHouse/issues/61648): Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings. [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#61748](https://github.com/ClickHouse/ClickHouse/issues/61748): Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)).
+* Backported in [#61963](https://github.com/ClickHouse/ClickHouse/issues/61963): Fix the `ALTER QUERY MODIFY SQL SECURITY` queries to override the table's DDL correctly. [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)).
+* Backported in [#61699](https://github.com/ClickHouse/ClickHouse/issues/61699): Fix the `clickhouse-client -s` argument, which was broken by defining it twice. [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
+* Backported in [#61578](https://github.com/ClickHouse/ClickHouse/issues/61578): Fix string search with constant start position which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#62531](https://github.com/ClickHouse/ClickHouse/issues/62531): Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Backported in [#61860](https://github.com/ClickHouse/ClickHouse/issues/61860): Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying an incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
+* Backported in [#62242](https://github.com/ClickHouse/ClickHouse/issues/62242): Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62218](https://github.com/ClickHouse/ClickHouse/issues/62218): Fix a crash in the Merge engine if a row policy does not have an expression. [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)).
+* Backported in [#62342](https://github.com/ClickHouse/ClickHouse/issues/62342): Fix data race on scalars in Context. [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62677](https://github.com/ClickHouse/ClickHouse/issues/62677): Fix segmentation fault when using the Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). 
[#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#62639](https://github.com/ClickHouse/ClickHouse/issues/62639): Fix memory leak in groupArraySorted. Fix [#62536](https://github.com/ClickHouse/ClickHouse/issues/62536). [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#63054](https://github.com/ClickHouse/ClickHouse/issues/63054): Fixed a bug in the GCD codec implementation that may lead to server crashes. [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)).
+* Backported in [#63030](https://github.com/ClickHouse/ClickHouse/issues/63030): Fix incorrect behaviour of temporary data in cache when creation of the cache key base directory fails with `no space left on device`. [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#63142](https://github.com/ClickHouse/ClickHouse/issues/63142): Fix incorrect judgement of monotonicity of function `abs`. [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)).
+* Backported in [#63183](https://github.com/ClickHouse/ClickHouse/issues/63183): Sanity check: Clamp values instead of throwing. [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#63176](https://github.com/ClickHouse/ClickHouse/issues/63176): Setting server_name might help with the recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
+* Backported in [#63191](https://github.com/ClickHouse/ClickHouse/issues/63191): Fix a bug where the `SQL SECURITY` statement appeared in all `CREATE` queries if the server setting `ignore_empty_sql_security_in_create_view_query=true` was set. See https://github.com/ClickHouse/ClickHouse/pull/63134. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)).
#### CI Fix or Improvement (changelog entry is not required)
diff --git a/docs/changelogs/v24.3.1.2672-lts.md b/docs/changelogs/v24.3.1.2672-lts.md
index 006ab941203..a70a33971c2 100644
--- a/docs/changelogs/v24.3.1.2672-lts.md
+++ b/docs/changelogs/v24.3.1.2672-lts.md
@@ -20,7 +20,7 @@ sidebar_label: 2024
#### New Feature
* topK/topKWeighted support a mode which returns the count of values and its error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)).
-* Add generate_series as a table function. This function generates table with an arithmetic progression with natural numbers. [#59390](https://github.com/ClickHouse/ClickHouse/pull/59390) ([divanik](https://github.com/divanik)).
+* Add generate_series as a table function. This function generates a table with an arithmetic progression of natural numbers. [#59390](https://github.com/ClickHouse/ClickHouse/pull/59390) ([Daniil Ivanik](https://github.com/divanik)).
* Support reading and writing backups as tar archives. [#59535](https://github.com/ClickHouse/ClickHouse/pull/59535) ([josh-hildred](https://github.com/josh-hildred)).
* Implemented support for S3Express buckets. 
[#59965](https://github.com/ClickHouse/ClickHouse/pull/59965) ([Nikita Taranov](https://github.com/nickitat)). * Allow to attach parts from a different disk * attach partition from the table on other disks using copy instead of hard link (such as instant table) * attach partition using copy when the hard link fails even on the same disk. [#60112](https://github.com/ClickHouse/ClickHouse/pull/60112) ([Unalian](https://github.com/Unalian)). @@ -133,75 +133,75 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Fix function execution over const and LowCardinality with GROUP BY const for analyzer [#59986](https://github.com/ClickHouse/ClickHouse/pull/59986) ([Azat Khuzhin](https://github.com/azat)). -* Fix finished_mutations_to_keep=0 for MergeTree (as docs says 0 is to keep everything) [#60031](https://github.com/ClickHouse/ClickHouse/pull/60031) ([Azat Khuzhin](https://github.com/azat)). -* PartsSplitter invalid ranges for the same part [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)). -* Azure Blob Storage : Fix issues endpoint and prefix [#60251](https://github.com/ClickHouse/ClickHouse/pull/60251) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* fix LRUResource Cache bug (Hive cache) [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)). -* Force reanalysis if parallel replicas changed [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)). -* Fix usage of plain metadata type with new disks configuration option [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)). -* Try to avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)). -* Keeper fix: add timeouts when waiting for commit logs [#60544](https://github.com/ClickHouse/ClickHouse/pull/60544) ([Antonio Andelic](https://github.com/antonio2368)). -* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)). -* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)). -* Fix reading from MergeTree with non-deterministic functions in filter [#60586](https://github.com/ClickHouse/ClickHouse/pull/60586) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix logical error on bad compatibility setting value type [#60596](https://github.com/ClickHouse/ClickHouse/pull/60596) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix inconsistent aggregate function states in mixed x86-64 / ARM clusters [#60610](https://github.com/ClickHouse/ClickHouse/pull/60610) ([Harry Lee](https://github.com/HarryLeeIBM)). 
-* fix(prql): Robust panic handler [#60615](https://github.com/ClickHouse/ClickHouse/pull/60615) ([Maximilian Roos](https://github.com/max-sixty)). -* Fix `intDiv` for decimal and date arguments [#60672](https://github.com/ClickHouse/ClickHouse/pull/60672) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). -* Fix: expand CTE in alter modify query [#60682](https://github.com/ClickHouse/ClickHouse/pull/60682) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). -* Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory) [#60689](https://github.com/ClickHouse/ClickHouse/pull/60689) ([Azat Khuzhin](https://github.com/azat)). -* Fix "Invalid storage definition in metadata file" for parameterized views [#60708](https://github.com/ClickHouse/ClickHouse/pull/60708) ([Azat Khuzhin](https://github.com/azat)). -* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Remove wrong sanitize checking in aggregate function quantileGK [#60740](https://github.com/ClickHouse/ClickHouse/pull/60740) ([李扬](https://github.com/taiyang-li)). -* Fix insert-select + insert_deduplication_token bug by setting streams to 1 [#60745](https://github.com/ClickHouse/ClickHouse/pull/60745) ([Jordi Villar](https://github.com/jrdi)). -* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)). -* Fix toStartOfInterval [#60763](https://github.com/ClickHouse/ClickHouse/pull/60763) ([Andrey Zvonov](https://github.com/zvonand)). -* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)). -* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)). -* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix possible stuck on error in HashedDictionaryParallelLoader [#60926](https://github.com/ClickHouse/ClickHouse/pull/60926) ([vdimir](https://github.com/vdimir)). -* Fix async RESTORE with Replicated database [#60934](https://github.com/ClickHouse/ClickHouse/pull/60934) ([Antonio Andelic](https://github.com/antonio2368)). -* fix csv format not support tuple [#60994](https://github.com/ClickHouse/ClickHouse/pull/60994) ([shuai.xu](https://github.com/shuai-xu)). -* Fix deadlock in async inserts to `Log` tables via native protocol [#61055](https://github.com/ClickHouse/ClickHouse/pull/61055) ([Anton Popov](https://github.com/CurtizJ)). -* Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary [#61196](https://github.com/ClickHouse/ClickHouse/pull/61196) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)). 
-* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix usage of session_token in S3 engine [#61234](https://github.com/ClickHouse/ClickHouse/pull/61234) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)). -* Fix bugs in show database [#61269](https://github.com/ClickHouse/ClickHouse/pull/61269) ([Raúl Marín](https://github.com/Algunenano)). -* Fix logical error in RabbitMQ storage with MATERIALIZED columns [#61320](https://github.com/ClickHouse/ClickHouse/pull/61320) ([vdimir](https://github.com/vdimir)). -* Fix CREATE OR REPLACE DICTIONARY [#61356](https://github.com/ClickHouse/ClickHouse/pull/61356) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix crash in ObjectJson parsing array with nulls [#61364](https://github.com/ClickHouse/ClickHouse/pull/61364) ([vdimir](https://github.com/vdimir)). -* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)). -* fix issue of actions dag split [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)). -* Fix finishing a failed RESTORE [#61466](https://github.com/ClickHouse/ClickHouse/pull/61466) ([Vitaly Baranov](https://github.com/vitlibar)). -* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)). -* Allow queuing in restore pool [#61475](https://github.com/ClickHouse/ClickHouse/pull/61475) ([Nikita Taranov](https://github.com/nickitat)). -* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)). -* Fix ALTER QUERY MODIFY SQL SECURITY [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)). -* Fix crash in window view [#61526](https://github.com/ClickHouse/ClickHouse/pull/61526) ([Alexey Milovidov](https://github.com/alexey-milovidov)). -* Fix `repeat` with non native integers [#61527](https://github.com/ClickHouse/ClickHouse/pull/61527) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). -* Reset part level upon attach from disk on MergeTree [#61536](https://github.com/ClickHouse/ClickHouse/pull/61536) ([Arthur Passos](https://github.com/arthurpassos)). -* Fix crash in arrayPartialReverseSort [#61539](https://github.com/ClickHouse/ClickHouse/pull/61539) ([Raúl Marín](https://github.com/Algunenano)). -* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix addDays cause an error when used datetime64 [#61561](https://github.com/ClickHouse/ClickHouse/pull/61561) ([Shuai li](https://github.com/loneylee)). 
-* disallow LowCardinality input type for JSONExtract [#61617](https://github.com/ClickHouse/ClickHouse/pull/61617) ([Julia Kartseva](https://github.com/jkartseva)).
-* Fix `system.part_log` for async insert with deduplication [#61620](https://github.com/ClickHouse/ClickHouse/pull/61620) ([Antonio Andelic](https://github.com/antonio2368)).
-* Fix Non-ready set for system.parts. [#61666](https://github.com/ClickHouse/ClickHouse/pull/61666) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Don't allow the same expression in ORDER BY with and without WITH FILL [#61667](https://github.com/ClickHouse/ClickHouse/pull/61667) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix actual_part_name for REPLACE_RANGE (`Entry actual part isn't empty yet`) [#61675](https://github.com/ClickHouse/ClickHouse/pull/61675) ([Alexander Tokmakov](https://github.com/tavplubix)).
-* Fix columns after executing MODIFY QUERY for a materialized view with internal table [#61734](https://github.com/ClickHouse/ClickHouse/pull/61734) ([Vitaly Baranov](https://github.com/vitlibar)).
-* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
-* Fix RANGE frame is not supported for Nullable columns. [#61766](https://github.com/ClickHouse/ClickHouse/pull/61766) ([YuanLiu](https://github.com/ditgittube)).
-* Revert "Revert "Fix bug when reading system.parts using UUID (issue 61220)."" [#61779](https://github.com/ClickHouse/ClickHouse/pull/61779) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Fix function execution over const and LowCardinality with GROUP BY const for analyzer. [#59986](https://github.com/ClickHouse/ClickHouse/pull/59986) ([Azat Khuzhin](https://github.com/azat)).
+* Fix finished_mutations_to_keep=0 for MergeTree (as the docs say, 0 is to keep everything). [#60031](https://github.com/ClickHouse/ClickHouse/pull/60031) ([Azat Khuzhin](https://github.com/azat)).
+* Fixed a bug in parallel optimization for queries with `FINAL`, which could give an incorrect result in rare cases. [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)).
+* Azure Blob Storage: do not include `account_name` in the endpoint if the flag `endpoint_contains_account_name` is set, and fix an issue with an empty container name. [#60251](https://github.com/ClickHouse/ClickHouse/pull/60251) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Fix a bug in the LRUResource Cache implementation (Hive cache) that could be triggered by incorrect component usage; the error can't be triggered with current ClickHouse usage. Closes [#60122](https://github.com/ClickHouse/ClickHouse/issues/60122). [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)).
+* Force reanalysis of the query if parallel replicas aren't supported in a subquery. [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix usage of plain metadata type for the new disks configuration option. [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike. [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)).
+* Avoid calculation of scalar subqueries for `CREATE TABLE`. 
Fixes [#59795](https://github.com/ClickHouse/ClickHouse/issues/59795) and [#59930](https://github.com/ClickHouse/ClickHouse/issues/59930). Attempt to re-implement https://github.com/ClickHouse/ClickHouse/pull/57855. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix deadlock in parallel parsing when lots of rows are skipped due to errors. [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix the issue of `max_query_size` for KQL compound operators like mv-expand. Related to [#59626](https://github.com/ClickHouse/ClickHouse/issues/59626). [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
+* Keeper fix: add timeouts when waiting for commit logs. Keeper could get stuck if the log successfully gets replicated but never committed. [#60544](https://github.com/ClickHouse/ClickHouse/pull/60544) ([Antonio Andelic](https://github.com/antonio2368)).
+* Reduce the number of read rows from `system.numbers`. Fixes [#59418](https://github.com/ClickHouse/ClickHouse/issues/59418). [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)).
+* Don't output number tips for date types. [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix unexpected result during reading from tables with virtual columns when the filter contains non-deterministic functions. Closes [#61106](https://github.com/ClickHouse/ClickHouse/issues/61106). [#60586](https://github.com/ClickHouse/ClickHouse/pull/60586) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix logical error on bad compatibility setting value type. Closes [#60590](https://github.com/ClickHouse/ClickHouse/issues/60590). [#60596](https://github.com/ClickHouse/ClickHouse/pull/60596) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fixed potentially inconsistent aggregate function states in mixed x86-64 / ARM clusters. [#60610](https://github.com/ClickHouse/ClickHouse/pull/60610) ([Harry Lee](https://github.com/HarryLeeIBM)).
+* Isolates the ClickHouse binary from any panics in `prqlc`. [#60615](https://github.com/ClickHouse/ClickHouse/pull/60615) ([Maximilian Roos](https://github.com/max-sixty)).
+* Fix a bug where `intDiv` with decimal and date/datetime arguments led to a crash. Closes [#60653](https://github.com/ClickHouse/ClickHouse/issues/60653). [#60672](https://github.com/ClickHouse/ClickHouse/pull/60672) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
+* Fix a bug where an attempt to run `ALTER TABLE ... MODIFY QUERY` with a CTE ended up with a "Table [CTE] does not exist" exception (Code: 60). [#60682](https://github.com/ClickHouse/ClickHouse/pull/60682) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fix system.parts for non-Atomic/Ordinary database engines (i.e. Memory - the major user is `clickhouse-local`). [#60689](https://github.com/ClickHouse/ClickHouse/pull/60689) ([Azat Khuzhin](https://github.com/azat)).
+* Fix "Invalid storage definition in metadata file" for parameterized views. [#60708](https://github.com/ClickHouse/ClickHouse/pull/60708) ([Azat Khuzhin](https://github.com/azat)).
+* Fix buffer overflow that can happen if an attacker asks the HTTP server to decompress data with a composition of codecs and size triggering numeric overflow. Fix buffer overflow that can happen inside codec NONE on wrong input data. 
This was submitted by the TIANGONG research team through our [Bug Bounty program](https://github.com/ClickHouse/ClickHouse/issues/38986). [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Functions for SQL/JSON were able to read uninitialized memory. This closes [#60017](https://github.com/ClickHouse/ClickHouse/issues/60017). Found by Fuzzer. [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
+* Remove wrong sanitize checking in aggregate function quantileGK: `sampled_len` in `ApproxSampler` is not guaranteed to be less than `default_compress_threshold`; `default_compress_threshold` is just a soft limit while executing `ApproxSampler::insert`. This issue was reproduced in https://github.com/oap-project/gluten/pull/4829. [#60740](https://github.com/ClickHouse/ClickHouse/pull/60740) ([李扬](https://github.com/taiyang-li)).
+* Fix the issue causing undesired deduplication on insert-select queries passing a custom `insert_deduplication_token`. The change sets streams to 1 in those cases to prevent the issue from happening at the expense of ignoring `max_insert_threads > 1`. [#60745](https://github.com/ClickHouse/ClickHouse/pull/60745) ([Jordi Villar](https://github.com/jrdi)).
+* Do not set AWS custom metadata `x-amz-meta-*` headers on UploadPart & CompleteMultipartUpload calls. [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
+* One more fix for toStartOfInterval returning a wrong result for intervals smaller than a second. [#60763](https://github.com/ClickHouse/ClickHouse/pull/60763) ([Andrey Zvonov](https://github.com/zvonand)).
+* Fix crash in arrayEnumerateRanked. [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix crash when using input() in INSERT SELECT JOIN. Closes [#60035](https://github.com/ClickHouse/ClickHouse/issues/60035). [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix crash when the `allow_experimental_analyzer` setting value is changed in subqueries. [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)).
+* Avoid segfault if too many keys are skipped when reading from S3. [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix possibly getting stuck on an error while reloading a dictionary with `SHARDS`. [#60926](https://github.com/ClickHouse/ClickHouse/pull/60926) ([vdimir](https://github.com/vdimir)).
+* Fix async RESTORE with Replicated database. [#60934](https://github.com/ClickHouse/ClickHouse/pull/60934) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix CSV writing tuples in a wrong format that could not be read back. [#60994](https://github.com/ClickHouse/ClickHouse/pull/60994) ([shuai.xu](https://github.com/shuai-xu)).
+* Fixed deadlock in async inserts to `Log` tables via native protocol. [#61055](https://github.com/ClickHouse/ClickHouse/pull/61055) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary that could lead to nullptr dereference on bad column types in FunctionsConversion. Closes [#56661](https://github.com/ClickHouse/ClickHouse/issues/56661). 
[#61196](https://github.com/ClickHouse/ClickHouse/pull/61196) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix multiple bugs in groupArraySorted. [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)).
+* Keeper: fix runtime reconfig for standalone binary. [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix usage of session_token in S3 engine. Fixes https://github.com/ClickHouse/ClickHouse/pull/57850#issuecomment-1966404710. [#61234](https://github.com/ClickHouse/ClickHouse/pull/61234) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix possible incorrect result of aggregate function `uniqExact`. [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix bugs in `SHOW DATABASE`. [#61269](https://github.com/ClickHouse/ClickHouse/pull/61269) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix possible `LOGICAL_ERROR` in case a storage with the `RabbitMQ` engine has unsupported `MATERIALIZED|ALIAS|DEFAULT` columns. [#61320](https://github.com/ClickHouse/ClickHouse/pull/61320) ([vdimir](https://github.com/vdimir)).
+* Fix `CREATE OR REPLACE DICTIONARY` with `lazy_load` turned off. [#61356](https://github.com/ClickHouse/ClickHouse/pull/61356) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix possible crash in the `Object('json')` data type when parsing an array with `null`s (see the sketch below). [#61364](https://github.com/ClickHouse/ClickHouse/pull/61364) ([vdimir](https://github.com/vdimir)).
+* Fix the ATTACH query with the ON CLUSTER clause when the database does not exist on the initiator node. Closes [#55009](https://github.com/ClickHouse/ClickHouse/issues/55009). [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Fixed possible wrong result of aggregation with nullable keys. [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix `ActionsDAG::split` so that execution of the first and then the second part on a block is equivalent to execution of the initial DAG. [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix finishing a failed RESTORE. [#61466](https://github.com/ClickHouse/ClickHouse/pull/61466) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings. [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix deadlock during `restore database` execution if `restore_threads` was set to 1. [#61475](https://github.com/ClickHouse/ClickHouse/pull/61475) ([Nikita Taranov](https://github.com/nickitat)).
+* Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)).
+* Fix the `ALTER QUERY MODIFY SQL SECURITY` queries to override the table's DDL correctly. [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)).
+* The experimental "window view" feature (it is disabled by default), which should not be used in production, could lead to a crash. The issue was identified by YohannJardin via the Bugcrowd program. [#61526](https://github.com/ClickHouse/ClickHouse/pull/61526) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
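A hedged sketch of the `Object('json')` case flagged above; the type and setting are real, while the table name and sample row are invented:

```sql
-- Object('json') is experimental and must be enabled explicitly.
SET allow_experimental_object_type = 1;

CREATE TABLE json_demo (o Object('json')) ENGINE = Memory;

-- An array containing nulls, which could previously crash during parsing (#61364).
INSERT INTO json_demo VALUES ('{"a": [1, null, 3]}');

SELECT * FROM json_demo;
```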
+* Fix `repeat` with non-native integers (e.g. `UInt256`). [#61527](https://github.com/ClickHouse/ClickHouse/pull/61527) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix the `clickhouse-client -s` argument, which was broken by being defined twice. [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
+* Fix too high part level reported in [#58558](https://github.com/ClickHouse/ClickHouse/issues/58558) by resetting MergeTree part levels upon attach from disk just like `ReplicatedMergeTree` [does](https://github.com/ClickHouse/ClickHouse/blob/9cd7e6155c7027baccd6dc5380d0813db94b03cc/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp#L838). [#61536](https://github.com/ClickHouse/ClickHouse/pull/61536) ([Arthur Passos](https://github.com/arthurpassos)).
+* Fix crash in `arrayPartialReverseSort`. [#61539](https://github.com/ClickHouse/ClickHouse/pull/61539) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix string search with a constant start position, which previously could lead to memory corruption. [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix the issue where the function `addDays` (and similar functions) reported an error when the first parameter was `DateTime64`. [#61561](https://github.com/ClickHouse/ClickHouse/pull/61561) ([Shuai li](https://github.com/loneylee)).
+* Disallow the LowCardinality type for the column containing JSON input in the JSONExtract function. [#61617](https://github.com/ClickHouse/ClickHouse/pull/61617) ([Julia Kartseva](https://github.com/jkartseva)).
+* Add parts to `system.part_log` when created using async insert with deduplication. [#61620](https://github.com/ClickHouse/ClickHouse/pull/61620) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix `Not-ready Set` error while reading from `system.parts` (with `IN subquery`). It was introduced in [#60510](https://github.com/ClickHouse/ClickHouse/issues/60510). [#61666](https://github.com/ClickHouse/ClickHouse/pull/61666) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Don't allow the same expression in ORDER BY with and without WITH FILL. Such an invalid expression could lead to the logical error `Invalid number of rows in Chunk`. [#61667](https://github.com/ClickHouse/ClickHouse/pull/61667) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fixed `Entry actual part isn't empty yet. This is a bug. (LOGICAL_ERROR)` that might happen in rare cases after executing `REPLACE PARTITION`, `MOVE PARTITION TO TABLE` or `ATTACH PARTITION FROM`. [#61675](https://github.com/ClickHouse/ClickHouse/pull/61675) ([Alexander Tokmakov](https://github.com/tavplubix)).
+* Fix columns after executing `ALTER TABLE MODIFY QUERY` for a materialized view with an internal table. A materialized view must have the same columns as its internal table, if any; however, `MODIFY QUERY` could previously break that rule, causing the materialized view to become inconsistent. [#61734](https://github.com/ClickHouse/ClickHouse/pull/61734) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` when specifying an incorrect UTF-8 sequence. Example: [#61714](https://github.com/ClickHouse/ClickHouse/issues/61714#issuecomment-2012768202). [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
+* Fix `RANGE` frame not being supported for Nullable columns:
```sql
SELECT number, sum(number) OVER (ORDER BY number ASC RANGE BETWEEN CURRENT ROW AND 1 FOLLOWING) AS sum
FROM values('number Nullable(Int8)', 1, 1, 2, 3, NULL)
```
[#61766](https://github.com/ClickHouse/ClickHouse/pull/61766) ([YuanLiu](https://github.com/ditgittube)).
+* Fix incorrect results when filtering `system.parts` or `system.parts_columns` using UUID. [#61779](https://github.com/ClickHouse/ClickHouse/pull/61779) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).

#### CI Fix or Improvement (changelog entry is not required)

@@ -526,7 +526,7 @@ sidebar_label: 2024
* No "please" [#61916](https://github.com/ClickHouse/ClickHouse/pull/61916) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v23.12.6.19-stable [#61917](https://github.com/ClickHouse/ClickHouse/pull/61917) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v24.1.8.22-stable [#61918](https://github.com/ClickHouse/ClickHouse/pull/61918) ([robot-clickhouse](https://github.com/robot-clickhouse)).
-* Fix flaky test_broken_projestions/test.py::test_broken_ignored_replic... [#61932](https://github.com/ClickHouse/ClickHouse/pull/61932) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix flaky test_broken_projestions/test.py::test_broken_ignored_replic… [#61932](https://github.com/ClickHouse/ClickHouse/pull/61932) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Check if Rust is available for the build; if not, suggest a way to disable Rust support [#61938](https://github.com/ClickHouse/ClickHouse/pull/61938) ([Azat Khuzhin](https://github.com/azat)).
* CI: new ci menu in PR body [#61948](https://github.com/ClickHouse/ClickHouse/pull/61948) ([Max K.](https://github.com/maxknv)).
* Remove flaky test `01193_metadata_loading` [#61961](https://github.com/ClickHouse/ClickHouse/pull/61961) ([Nikita Taranov](https://github.com/nickitat)).

diff --git a/docs/changelogs/v24.3.2.23-lts.md b/docs/changelogs/v24.3.2.23-lts.md
index 4d59a1cedf6..d8adc63c8ac 100644
--- a/docs/changelogs/v24.3.2.23-lts.md
+++ b/docs/changelogs/v24.3.2.23-lts.md
@@ -9,9 +9,9 @@ sidebar_label: 2024

#### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix logical error in group_by_use_nulls + grouping set + analyzer + materialize/constant [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix external table cannot parse data type Bool [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)).
-* Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value" [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#62078](https://github.com/ClickHouse/ClickHouse/issues/62078): Fix logical error 'Unexpected return type from materialize. Expected Nullable. Got UInt8' while using group_by_use_nulls with analyzer and materialize/constant in grouping set. Closes [#61531](https://github.com/ClickHouse/ClickHouse/issues/61531). [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62122](https://github.com/ClickHouse/ClickHouse/issues/62122): Fix external tables not being able to parse the `Bool` data type. [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)).
+* Backported in [#62147](https://github.com/ClickHouse/ClickHouse/issues/62147): Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value". The feature is broken and can't be disabled individually. [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)). #### CI Fix or Improvement (changelog entry is not required) diff --git a/docs/changelogs/v24.3.3.102-lts.md b/docs/changelogs/v24.3.3.102-lts.md index dc89ac24208..1cdbde67031 100644 --- a/docs/changelogs/v24.3.3.102-lts.md +++ b/docs/changelogs/v24.3.3.102-lts.md @@ -17,36 +17,36 @@ sidebar_label: 2024 #### Bug Fix (user-visible misbehavior in an official stable release) -* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)). -* Crash in Engine Merge if Row Policy does not have expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). -* ReadWriteBufferFromHTTP set right header host when redirected [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)). -* Analyzer: Fix query parameter resolution [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)). -* Fixing NULL random seed for generateRandom with analyzer. [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix PartsSplitter [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)). -* Analyzer: Fix alias to parametrized view resolution [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)). -* Analyzer: Fix name resolution from parent scopes [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)). -* Fix argMax with nullable non native numeric column [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)). -* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix analyzer with positional arguments in distributed query [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)). -* Fix filter pushdown from additional_table_filters in Merge engine in analyzer [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix GLOBAL IN table queries with analyzer. [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix scalar subquery in LIMIT [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). 
-* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)).
-* Fix argMin/argMax combinator state [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix temporary data in cache failing because of cache lock contention optimization [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix FINAL modifier is not respected in CTE with analyzer [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)).
-* Fix crash in function `formatRow` with `JSON` format and HTTP interface [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)).
-* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)).
-* Disable optimize_rewrite_aggregate_function_with_if for sum(nullable) [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)).
-* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)).
-* Fix optimize_rewrite_aggregate_function_with_if implicit cast [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)).
-* Do not remove server constants from GROUP BY key for secondary query. [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
-* Fix incorrect judgement of of monotonicity of function abs [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)).
-* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
-* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)).
-* Format SQL security option only in `CREATE VIEW` queries. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)).
+* Backported in [#62533](https://github.com/ClickHouse/ClickHouse/issues/62533): Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Backported in [#62244](https://github.com/ClickHouse/ClickHouse/issues/62244): Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62220](https://github.com/ClickHouse/ClickHouse/issues/62220): Fix crash in the Merge engine if a row policy does not have an expression. [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)).
+* Backported in [#62234](https://github.com/ClickHouse/ClickHouse/issues/62234): ReadWriteBufferFromHTTP set the right `Host` header when redirected.
[#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)).
+* Backported in [#62278](https://github.com/ClickHouse/ClickHouse/issues/62278): Fix query parameter resolution with `allow_experimental_analyzer` enabled. Closes [#62113](https://github.com/ClickHouse/ClickHouse/issues/62113). [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)).
+* Backported in [#62354](https://github.com/ClickHouse/ClickHouse/issues/62354): Fix `generateRandom` with `NULL` in the seed argument. Fixes [#62092](https://github.com/ClickHouse/ClickHouse/issues/62092). [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Backported in [#62412](https://github.com/ClickHouse/ClickHouse/issues/62412): When some index columns are not loaded into memory for some parts of a *MergeTree table, queries with `FINAL` might produce wrong results. Now we explicitly choose only the common prefix of index columns for all parts to avoid this issue. [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)).
+* Backported in [#62733](https://github.com/ClickHouse/ClickHouse/issues/62733): Fix inability to address a parametrized view in SELECT queries via aliases. [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)).
+* Backported in [#62407](https://github.com/ClickHouse/ClickHouse/issues/62407): Fix name resolution when an identifier is resolved to an executed scalar subquery. [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)).
+* Backported in [#62331](https://github.com/ClickHouse/ClickHouse/issues/62331): Fix `argMax` with a nullable non-native numeric column. [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#62344](https://github.com/ClickHouse/ClickHouse/issues/62344): Fix data race on scalars in Context. [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62484](https://github.com/ClickHouse/ClickHouse/issues/62484): Resolve positional arguments only on the initiator node. Closes [#62289](https://github.com/ClickHouse/ClickHouse/issues/62289). [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)).
+* Backported in [#62442](https://github.com/ClickHouse/ClickHouse/issues/62442): Fix filter pushdown from additional_table_filters in Merge engine in analyzer. Closes [#62229](https://github.com/ClickHouse/ClickHouse/issues/62229). [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)).
+* Backported in [#62475](https://github.com/ClickHouse/ClickHouse/issues/62475): Fix `Unknown expression or table expression identifier` error for `GLOBAL IN table` queries (with new analyzer). Fixes [#62286](https://github.com/ClickHouse/ClickHouse/issues/62286). [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Backported in [#62612](https://github.com/ClickHouse/ClickHouse/issues/62612): Fix an error `LIMIT expression must be constant` in queries with a constant expression in `LIMIT`/`OFFSET` which contains a scalar subquery. Fixes [#62294](https://github.com/ClickHouse/ClickHouse/issues/62294).
[#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Backported in [#62679](https://github.com/ClickHouse/ClickHouse/issues/62679): Fix segmentation fault when using the Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Backported in [#62641](https://github.com/ClickHouse/ClickHouse/issues/62641): Fix memory leak in `groupArraySorted`. Fixes [#62536](https://github.com/ClickHouse/ClickHouse/issues/62536). [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)).
+* Backported in [#62770](https://github.com/ClickHouse/ClickHouse/issues/62770): Fix `argMin`/`argMax` combinator state. [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#62750](https://github.com/ClickHouse/ClickHouse/issues/62750): Fix temporary data in cache failing because of a small value of setting `filesystem_cache_reserve_space_wait_lock_timeout_milliseconds`. Introduced a separate setting `temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds`. [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#62993](https://github.com/ClickHouse/ClickHouse/issues/62993): Fix an error when `FINAL` is not applied when specified in a CTE (new analyzer). Fixes [#62779](https://github.com/ClickHouse/ClickHouse/issues/62779). [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)).
+* Backported in [#62859](https://github.com/ClickHouse/ClickHouse/issues/62859): Fixed crash in function `formatRow` with `JSON` format in queries executed via the HTTP interface. [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)).
+* Backported in [#63056](https://github.com/ClickHouse/ClickHouse/issues/63056): Fixed a bug in the GCD codec implementation that could lead to server crashes. [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)).
+* Backported in [#62960](https://github.com/ClickHouse/ClickHouse/issues/62960): Disable `optimize_rewrite_aggregate_function_with_if` for `sum(nullable)`. [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#63032](https://github.com/ClickHouse/ClickHouse/issues/63032): Fix incorrect behaviour of temporary data in cache when creation of the cache key base directory fails with `no space left on device`. [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Backported in [#63148](https://github.com/ClickHouse/ClickHouse/issues/63148): Fix `optimize_rewrite_aggregate_function_with_if` implicit cast. [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)).
+* Backported in [#63146](https://github.com/ClickHouse/ClickHouse/issues/63146): Fix `Not found column in block` error for distributed queries with server-side constants in `GROUP BY` key. Fixes [#62682](https://github.com/ClickHouse/ClickHouse/issues/62682).
[#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Backported in [#63144](https://github.com/ClickHouse/ClickHouse/issues/63144): Fix incorrect judgement of the monotonicity of function `abs`. [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)).
+* Backported in [#63178](https://github.com/ClickHouse/ClickHouse/issues/63178): Setting server_name might help with the recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
+* Backported in [#63170](https://github.com/ClickHouse/ClickHouse/issues/63170): The wire protocol version check for MongoDB used to try accessing the "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by the user. [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)).
+* Backported in [#63193](https://github.com/ClickHouse/ClickHouse/issues/63193): Fix a bug where the `SQL SECURITY` statement appeared in all `CREATE` queries if the server setting `ignore_empty_sql_security_in_create_view_query=true` was set. See https://github.com/ClickHouse/ClickHouse/pull/63134. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)).

#### CI Fix or Improvement (changelog entry is not required)

diff --git a/docs/changelogs/v24.4.1.2088-stable.md b/docs/changelogs/v24.4.1.2088-stable.md
index b8d83f1a31f..06e704356d4 100644
--- a/docs/changelogs/v24.4.1.2088-stable.md
+++ b/docs/changelogs/v24.4.1.2088-stable.md
@@ -106,75 +106,75 @@ sidebar_label: 2024

#### Bug Fix (user-visible misbehavior in an official stable release)

-* Fix parser error when using COUNT(*) with FILTER clause [#61357](https://github.com/ClickHouse/ClickHouse/pull/61357) ([Duc Canh Le](https://github.com/canhld94)).
-* Fix logical error in group_by_use_nulls + grouping set + analyzer + materialize/constant [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)).
-* Cancel merges before removing moved parts [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
-* Try to fix abort in arrow [#61720](https://github.com/ClickHouse/ClickHouse/pull/61720) ([Kruglov Pavel](https://github.com/Avogar)).
-* Search for convert_to_replicated flag at the correct path [#61769](https://github.com/ClickHouse/ClickHouse/pull/61769) ([Kirill](https://github.com/kirillgarbar)).
-* Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch [#61867](https://github.com/ClickHouse/ClickHouse/pull/61867) ([Azat Khuzhin](https://github.com/azat)).
-* Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
-* Fix writing exception message in output format in HTTP when http_wait_end_of_query is used [#61951](https://github.com/ClickHouse/ClickHouse/pull/61951) ([Kruglov Pavel](https://github.com/Avogar)).
-* Proper fix for LowCardinality together with JSONExtact functions [#61957](https://github.com/ClickHouse/ClickHouse/pull/61957) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). -* Crash in Engine Merge if Row Policy does not have expression [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)). -* Fix WriteBufferAzureBlobStorage destructor uncaught exception [#61988](https://github.com/ClickHouse/ClickHouse/pull/61988) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fix CREATE TABLE w/o columns definition for ReplicatedMergeTree [#62040](https://github.com/ClickHouse/ClickHouse/pull/62040) ([Azat Khuzhin](https://github.com/azat)). -* Fix optimize_skip_unused_shards_rewrite_in for composite sharding key [#62047](https://github.com/ClickHouse/ClickHouse/pull/62047) ([Azat Khuzhin](https://github.com/azat)). -* ReadWriteBufferFromHTTP set right header host when redirected [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)). -* Fix external table cannot parse data type Bool [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)). -* Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value" [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)). -* Add test for [#35215](https://github.com/ClickHouse/ClickHouse/issues/35215) [#62180](https://github.com/ClickHouse/ClickHouse/pull/62180) ([Raúl Marín](https://github.com/Algunenano)). -* Analyzer: Fix query parameter resolution [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)). -* Fix restoring parts while readonly [#62207](https://github.com/ClickHouse/ClickHouse/pull/62207) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix crash in index definition containing sql udf [#62225](https://github.com/ClickHouse/ClickHouse/pull/62225) ([vdimir](https://github.com/vdimir)). -* Fixing NULL random seed for generateRandom with analyzer. [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Correctly handle const columns in DistinctTransfom [#62250](https://github.com/ClickHouse/ClickHouse/pull/62250) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix PartsSplitter [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)). -* Analyzer: Fix alias to parametrized view resolution [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)). -* Analyzer: Fix name resolution from parent scopes [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)). -* Fix argMax with nullable non native numeric column [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)). -* Fix BACKUP and RESTORE of a materialized view in Ordinary database [#62295](https://github.com/ClickHouse/ClickHouse/pull/62295) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fix data race on scalars in Context [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)). 
-* Fix primary key in materialized view [#62319](https://github.com/ClickHouse/ClickHouse/pull/62319) ([Murat Khairulin](https://github.com/mxwell)). -* Do not build multithread insert pipeline for tables without support [#62333](https://github.com/ClickHouse/ClickHouse/pull/62333) ([vdimir](https://github.com/vdimir)). -* Fix analyzer with positional arguments in distributed query [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)). -* Fix filter pushdown from additional_table_filters in Merge engine in analyzer [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix GLOBAL IN table queries with analyzer. [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write [#62425](https://github.com/ClickHouse/ClickHouse/pull/62425) ([Kruglov Pavel](https://github.com/Avogar)). -* Fix backup restore path for AzureBlobStorage [#62447](https://github.com/ClickHouse/ClickHouse/pull/62447) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)). -* Fix SimpleSquashingChunksTransform [#62451](https://github.com/ClickHouse/ClickHouse/pull/62451) ([Nikita Taranov](https://github.com/nickitat)). -* Fix capture of nested lambda. [#62462](https://github.com/ClickHouse/ClickHouse/pull/62462) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix validation of special MergeTree columns [#62498](https://github.com/ClickHouse/ClickHouse/pull/62498) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). -* Avoid crash when reading protobuf with recursive types [#62506](https://github.com/ClickHouse/ClickHouse/pull/62506) ([Raúl Marín](https://github.com/Algunenano)). -* Fix a bug moving one partition from one to itself [#62524](https://github.com/ClickHouse/ClickHouse/pull/62524) ([helifu](https://github.com/helifu)). -* Fix scalar subquery in LIMIT [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Try to fix segfault in Hive engine [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)). -* Fix memory leak in groupArraySorted [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)). -* Fix crash in largestTriangleThreeBuckets [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) ([Raúl Marín](https://github.com/Algunenano)). -* Fix tumble[Start,End] and hop[Start,End] for bigger resolutions [#62705](https://github.com/ClickHouse/ClickHouse/pull/62705) ([Jordi Villar](https://github.com/jrdi)). -* Fix argMin/argMax combinator state [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)). -* Fix temporary data in cache failing because of cache lock contention optimization [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix crash in function `mergeTreeIndex` [#62762](https://github.com/ClickHouse/ClickHouse/pull/62762) ([Anton Popov](https://github.com/CurtizJ)). -* fix: update: nested materialized columns: size check fixes [#62773](https://github.com/ClickHouse/ClickHouse/pull/62773) ([Eliot Hautefeuille](https://github.com/hileef)). 
-* Fix FINAL modifier is not respected in CTE with analyzer [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)). -* Fix crash in function `formatRow` with `JSON` format and HTTP interface [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)). -* Azure: fix building final url from endpoint object [#62850](https://github.com/ClickHouse/ClickHouse/pull/62850) ([Daniel Pozo Escalona](https://github.com/danipozo)). -* Fix GCD codec [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)). -* Fix LowCardinality(Nullable) key in hyperrectangle [#62866](https://github.com/ClickHouse/ClickHouse/pull/62866) ([Amos Bird](https://github.com/amosbird)). -* Fix fromUnixtimestamp in joda syntax while the input value beyond UInt32 [#62901](https://github.com/ClickHouse/ClickHouse/pull/62901) ([KevinyhZou](https://github.com/KevinyhZou)). -* Disable optimize_rewrite_aggregate_function_with_if for sum(nullable) [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)). -* Fix PREWHERE for StorageBuffer with different source table column types. [#62916](https://github.com/ClickHouse/ClickHouse/pull/62916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix temporary data in cache incorrectly processing failure of cache key directory creation [#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)). -* gRPC: fix crash on IPv6 peer connection [#62978](https://github.com/ClickHouse/ClickHouse/pull/62978) ([Konstantin Bogdanov](https://github.com/thevar1able)). -* Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches [#62987](https://github.com/ClickHouse/ClickHouse/pull/62987) ([Azat Khuzhin](https://github.com/azat)). -* Fix terminate with uncaught exception in temporary data in cache [#62998](https://github.com/ClickHouse/ClickHouse/pull/62998) ([Kseniia Sumarokova](https://github.com/kssenii)). -* Fix optimize_rewrite_aggregate_function_with_if implicit cast [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)). -* Fix unhandled exception in ~RestorerFromBackup [#63040](https://github.com/ClickHouse/ClickHouse/pull/63040) ([Vitaly Baranov](https://github.com/vitlibar)). -* Do not remove server constants from GROUP BY key for secondary query. [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). -* Fix incorrect judgement of of monotonicity of function abs [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)). -* Make sanity check of settings worse [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)). -* Set server name for SSL handshake in MongoDB engine [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)). -* Use user specified db instead of "config" for MongoDB wire protocol version check [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)). -* Format SQL security option only in `CREATE VIEW` queries. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)). +* Fix parser error when using COUNT(*) with FILTER clause. 
[#61357](https://github.com/ClickHouse/ClickHouse/pull/61357) ([Duc Canh Le](https://github.com/canhld94)).
+* Fix logical error 'Unexpected return type from materialize. Expected Nullable. Got UInt8' while using group_by_use_nulls with analyzer and materialize/constant in grouping set. Closes [#61531](https://github.com/ClickHouse/ClickHouse/issues/61531). [#61567](https://github.com/ClickHouse/ClickHouse/pull/61567) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix data race between `MOVE PARTITION` query and merges resulting in intersecting parts. [#61610](https://github.com/ClickHouse/ClickHouse/pull/61610) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* TBD. [#61720](https://github.com/ClickHouse/ClickHouse/pull/61720) ([Kruglov Pavel](https://github.com/Avogar)).
+* Search for the MergeTree-to-ReplicatedMergeTree conversion flag at the correct location for tables with a custom storage policy. [#61769](https://github.com/ClickHouse/ClickHouse/pull/61769) ([Kirill](https://github.com/kirillgarbar)).
+* Fix a possible data race on connections for distributed_foreground_insert/distributed_background_insert_batch that could lead to crashes. [#61867](https://github.com/ClickHouse/ClickHouse/pull/61867) ([Azat Khuzhin](https://github.com/azat)).
+* Fix skipping escape sequence parsing errors during JSON data parsing while using `input_format_allow_errors_num/ratio` settings. [#61883](https://github.com/ClickHouse/ClickHouse/pull/61883) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix writing exception message in output format in HTTP when http_wait_end_of_query is used. Closes [#55101](https://github.com/ClickHouse/ClickHouse/issues/55101). [#61951](https://github.com/ClickHouse/ClickHouse/pull/61951) ([Kruglov Pavel](https://github.com/Avogar)).
+* Revert https://github.com/ClickHouse/ClickHouse/pull/61617 and fix the problem with usage of LowCardinality columns together with the JSONExtract function. Previously the user could receive either an incorrect result or a logical error. [#61957](https://github.com/ClickHouse/ClickHouse/pull/61957) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Fix crash in the Merge engine if a row policy does not have an expression. [#61971](https://github.com/ClickHouse/ClickHouse/pull/61971) ([Ilya Golshtein](https://github.com/ilejn)).
+* Implement preFinalize and update finalizeImpl and the destructor of WriteBufferAzureBlobStorage to avoid an uncaught exception in the destructor. [#61988](https://github.com/ClickHouse/ClickHouse/pull/61988) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Fix CREATE TABLE w/o columns definition for ReplicatedMergeTree (columns will be obtained from the replica). [#62040](https://github.com/ClickHouse/ClickHouse/pull/62040) ([Azat Khuzhin](https://github.com/azat)).
+* Fix optimize_skip_unused_shards_rewrite_in for composite sharding key (could lead to `NOT_FOUND_COLUMN_IN_BLOCK` and `TYPE_MISMATCH`). [#62047](https://github.com/ClickHouse/ClickHouse/pull/62047) ([Azat Khuzhin](https://github.com/azat)).
+* ReadWriteBufferFromHTTP set the right `Host` header when redirected. [#62068](https://github.com/ClickHouse/ClickHouse/pull/62068) ([Sema Checherinda](https://github.com/CheSema)).
+* Fix external tables not being able to parse the `Bool` data type. [#62115](https://github.com/ClickHouse/ClickHouse/pull/62115) ([Duc Canh Le](https://github.com/canhld94)).
+* Revert "Merge pull request [#61564](https://github.com/ClickHouse/ClickHouse/issues/61564) from liuneng1994/optimize_in_single_value".
The feature is broken and can't be disabled individually. [#62135](https://github.com/ClickHouse/ClickHouse/pull/62135) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix override of MergeTree virtual columns. [#62180](https://github.com/ClickHouse/ClickHouse/pull/62180) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix query parameter resolution with `allow_experimental_analyzer` enabled. Closes [#62113](https://github.com/ClickHouse/ClickHouse/issues/62113). [#62186](https://github.com/ClickHouse/ClickHouse/pull/62186) ([Dmitry Novik](https://github.com/novikd)).
+* Make `RESTORE ON CLUSTER` wait for each `ReplicatedMergeTree` table to stop being readonly before attaching any restored parts to it. Earlier it didn't wait, and it could try to attach some parts at nearly the same time as checking other replicas during the table's startup; in rare cases some parts could not be attached at all during `RESTORE ON CLUSTER` because of that issue. [#62207](https://github.com/ClickHouse/ClickHouse/pull/62207) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix crash on `CREATE TABLE` with `INDEX` containing an SQL UDF in its expression, closes [#62134](https://github.com/ClickHouse/ClickHouse/issues/62134). [#62225](https://github.com/ClickHouse/ClickHouse/pull/62225) ([vdimir](https://github.com/vdimir)).
+* Fix `generateRandom` with `NULL` in the seed argument. Fixes [#62092](https://github.com/ClickHouse/ClickHouse/issues/62092). [#62248](https://github.com/ClickHouse/ClickHouse/pull/62248) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix buffer overflow when `DISTINCT` is used with constant values. [#62250](https://github.com/ClickHouse/ClickHouse/pull/62250) ([Antonio Andelic](https://github.com/antonio2368)).
+* When some index columns are not loaded into memory for some parts of a *MergeTree table, queries with `FINAL` might produce wrong results. Now we explicitly choose only the common prefix of index columns for all parts to avoid this issue. [#62268](https://github.com/ClickHouse/ClickHouse/pull/62268) ([Nikita Taranov](https://github.com/nickitat)).
+* Fix inability to address a parametrized view in SELECT queries via aliases. [#62274](https://github.com/ClickHouse/ClickHouse/pull/62274) ([Dmitry Novik](https://github.com/novikd)).
+* Fix name resolution when an identifier is resolved to an executed scalar subquery. [#62281](https://github.com/ClickHouse/ClickHouse/pull/62281) ([Dmitry Novik](https://github.com/novikd)).
+* Fix `argMax` with a nullable non-native numeric column. [#62285](https://github.com/ClickHouse/ClickHouse/pull/62285) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix BACKUP and RESTORE of a materialized view in an Ordinary database. [#62295](https://github.com/ClickHouse/ClickHouse/pull/62295) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix data race on scalars in Context. [#62305](https://github.com/ClickHouse/ClickHouse/pull/62305) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix displaying of a materialized view's primary_key in system.tables. Previously it was shown empty even when the CREATE query included PRIMARY KEY. [#62319](https://github.com/ClickHouse/ClickHouse/pull/62319) ([Murat Khairulin](https://github.com/mxwell)).
+* Do not build a multithreaded insert pipeline for engines without `max_insert_threads` support. Fix inserted rows order in queries like `INSERT INTO FUNCTION file/s3(...) SELECT * FROM ORDER BY col` (see the sketch below). [#62333](https://github.com/ClickHouse/ClickHouse/pull/62333) ([vdimir](https://github.com/vdimir)).
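For the multithreaded-insert entry above, a sketch of the affected query pattern (the file name and schema are illustrative, not taken from the PR): with the fix, engines without `max_insert_threads` support receive rows in the order produced by `ORDER BY`.

```sql
-- Rows should land in the output file in descending order.
-- file() writes relative to the server's user_files_path.
INSERT INTO FUNCTION file('numbers.csv', 'CSV', 'n UInt64')
SELECT number AS n FROM numbers(10) ORDER BY n DESC;
```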
+* Resolve positional arguments only on the initiator node. Closes [#62289](https://github.com/ClickHouse/ClickHouse/issues/62289). [#62362](https://github.com/ClickHouse/ClickHouse/pull/62362) ([flynn](https://github.com/ucasfl)).
+* Fix filter pushdown from additional_table_filters in Merge engine in analyzer. Closes [#62229](https://github.com/ClickHouse/ClickHouse/issues/62229). [#62398](https://github.com/ClickHouse/ClickHouse/pull/62398) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix `Unknown expression or table expression identifier` error for `GLOBAL IN table` queries (with new analyzer). Fixes [#62286](https://github.com/ClickHouse/ClickHouse/issues/62286). [#62409](https://github.com/ClickHouse/ClickHouse/pull/62409) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write. Closes [#61492](https://github.com/ClickHouse/ClickHouse/issues/61492). [#62425](https://github.com/ClickHouse/ClickHouse/pull/62425) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fix backup restore path for AzureBlobStorage to include the specified blob path. [#62447](https://github.com/ClickHouse/ClickHouse/pull/62447) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
+* Fixed a rare bug in `SimpleSquashingChunksTransform` that could lead to a loss of the last chunk of data in a stream. [#62451](https://github.com/ClickHouse/ClickHouse/pull/62451) ([Nikita Taranov](https://github.com/nickitat)).
+* Fix excessive memory usage for queries with nested lambdas. Fixes [#62036](https://github.com/ClickHouse/ClickHouse/issues/62036). [#62462](https://github.com/ClickHouse/ClickHouse/pull/62462) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix validation of special columns (`ver`, `is_deleted`, `sign`) in MergeTree engines on table creation and alter queries. Fixes [#62463](https://github.com/ClickHouse/ClickHouse/issues/62463). [#62498](https://github.com/ClickHouse/ClickHouse/pull/62498) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Avoid crash when reading protobuf with recursive types. [#62506](https://github.com/ClickHouse/ClickHouse/pull/62506) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix [#62459](https://github.com/ClickHouse/ClickHouse/issues/62459). [#62524](https://github.com/ClickHouse/ClickHouse/pull/62524) ([helifu](https://github.com/helifu)).
+* Fix an error `LIMIT expression must be constant` in queries with a constant expression in `LIMIT`/`OFFSET` which contains a scalar subquery. Fixes [#62294](https://github.com/ClickHouse/ClickHouse/issues/62294). [#62567](https://github.com/ClickHouse/ClickHouse/pull/62567) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix segmentation fault when using the Hive table engine. Reference [#62154](https://github.com/ClickHouse/ClickHouse/issues/62154), [#62560](https://github.com/ClickHouse/ClickHouse/issues/62560). [#62578](https://github.com/ClickHouse/ClickHouse/pull/62578) ([Nikolay Degterinsky](https://github.com/evillique)).
+* Fix memory leak in `groupArraySorted`. Fixes [#62536](https://github.com/ClickHouse/ClickHouse/issues/62536). [#62597](https://github.com/ClickHouse/ClickHouse/pull/62597) ([Antonio Andelic](https://github.com/antonio2368)).
+* Fix crash in `largestTriangleThreeBuckets`. [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix `tumble[Start,End]` and `hop[Start,End]` functions for resolutions bigger than a day.
[#62705](https://github.com/ClickHouse/ClickHouse/pull/62705) ([Jordi Villar](https://github.com/jrdi)).
+* Fix `argMin`/`argMax` combinator state. [#62708](https://github.com/ClickHouse/ClickHouse/pull/62708) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix temporary data in cache failing because of a small value of setting `filesystem_cache_reserve_space_wait_lock_timeout_milliseconds`. Introduced a separate setting `temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds`. [#62715](https://github.com/ClickHouse/ClickHouse/pull/62715) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fixed crash in table function `mergeTreeIndex` after offloading some of the columns from the suffix of the primary key. [#62762](https://github.com/ClickHouse/ClickHouse/pull/62762) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix size checks when updating materialized nested columns (fixes [#62731](https://github.com/ClickHouse/ClickHouse/issues/62731)). [#62773](https://github.com/ClickHouse/ClickHouse/pull/62773) ([Eliot Hautefeuille](https://github.com/hileef)).
+* Fix an error when `FINAL` is not applied when specified in a CTE (new analyzer). Fixes [#62779](https://github.com/ClickHouse/ClickHouse/issues/62779). [#62811](https://github.com/ClickHouse/ClickHouse/pull/62811) ([Duc Canh Le](https://github.com/canhld94)).
+* Fixed crash in function `formatRow` with `JSON` format in queries executed via the HTTP interface. [#62840](https://github.com/ClickHouse/ClickHouse/pull/62840) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix failure to start when the storage account URL has a trailing slash. [#62850](https://github.com/ClickHouse/ClickHouse/pull/62850) ([Daniel Pozo Escalona](https://github.com/danipozo)).
+* Fixed a bug in the GCD codec implementation that could lead to server crashes. [#62853](https://github.com/ClickHouse/ClickHouse/pull/62853) ([Nikita Taranov](https://github.com/nickitat)).
+* Fix incorrect key analysis when LowCardinality(Nullable) keys appear in the middle of a hyperrectangle. This fixes [#62848](https://github.com/ClickHouse/ClickHouse/issues/62848). [#62866](https://github.com/ClickHouse/ClickHouse/pull/62866) ([Amos Bird](https://github.com/amosbird)).
+* Fix `fromUnixTimestampInJodaSyntax` returning a wrong result when the input `Int64` or `UInt64` value exceeds the maximum value of the `UInt32` type: the function first converted the input value to `UInt32`, which truncated it. For example, for a table `test_tbl(a Int64, b UInt64)` with a row (`10262736196`, `10262736196`), the conversion produced a wrong result. [#62901](https://github.com/ClickHouse/ClickHouse/pull/62901) ([KevinyhZou](https://github.com/KevinyhZou)).
+* Disable `optimize_rewrite_aggregate_function_with_if` for `sum(nullable)`. [#62912](https://github.com/ClickHouse/ClickHouse/pull/62912) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix the `Unexpected return type` error for queries that read from `StorageBuffer` with `PREWHERE` when the source table has different types. Fixes [#62545](https://github.com/ClickHouse/ClickHouse/issues/62545). [#62916](https://github.com/ClickHouse/ClickHouse/pull/62916) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix incorrect behaviour of temporary data in cache when creation of the cache key base directory fails with `no space left on device`.
[#62925](https://github.com/ClickHouse/ClickHouse/pull/62925) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fixed server crash on IPv6 gRPC client connection. [#62978](https://github.com/ClickHouse/ClickHouse/pull/62978) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches. [#62987](https://github.com/ClickHouse/ClickHouse/pull/62987) ([Azat Khuzhin](https://github.com/azat)).
+* Fix termination with an uncaught exception in temporary data in cache. [#62998](https://github.com/ClickHouse/ClickHouse/pull/62998) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix `optimize_rewrite_aggregate_function_with_if` implicit cast. [#62999](https://github.com/ClickHouse/ClickHouse/pull/62999) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix possible crash after an unsuccessful RESTORE. This fixes [#62985](https://github.com/ClickHouse/ClickHouse/issues/62985). [#63040](https://github.com/ClickHouse/ClickHouse/pull/63040) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix `Not found column in block` error for distributed queries with server-side constants in `GROUP BY` key. Fixes [#62682](https://github.com/ClickHouse/ClickHouse/issues/62682). [#63047](https://github.com/ClickHouse/ClickHouse/pull/63047) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix incorrect judgement of the monotonicity of function `abs`. [#63097](https://github.com/ClickHouse/ClickHouse/pull/63097) ([Duc Canh Le](https://github.com/canhld94)).
+* Sanity check: Clamp values instead of throwing. [#63119](https://github.com/ClickHouse/ClickHouse/pull/63119) ([Raúl Marín](https://github.com/Algunenano)).
+* Setting server_name might help with the recently reported SSL handshake error when connecting to MongoDB Atlas: `Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR`. [#63122](https://github.com/ClickHouse/ClickHouse/pull/63122) ([Alexander Gololobov](https://github.com/davenger)).
+* The wire protocol version check for MongoDB used to try accessing the "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by the user. [#63126](https://github.com/ClickHouse/ClickHouse/pull/63126) ([Alexander Gololobov](https://github.com/davenger)).
+* Fix a bug where the `SQL SECURITY` statement appeared in all `CREATE` queries if the server setting `ignore_empty_sql_security_in_create_view_query=true` was set. See https://github.com/ClickHouse/ClickHouse/pull/63134. [#63136](https://github.com/ClickHouse/ClickHouse/pull/63136) ([pufit](https://github.com/pufit)).

#### CI Fix or Improvement (changelog entry is not required)

diff --git a/docs/en/development/contrib.md b/docs/en/development/contrib.md
index 5f96466bbec..db3eabaecfc 100644
--- a/docs/en/development/contrib.md
+++ b/docs/en/development/contrib.md
@@ -7,21 +7,43 @@ description: A list of third-party libraries used

# Third-Party Libraries Used

-ClickHouse utilizes third-party libraries for different purposes, e.g., to connect to other databases, to decode (encode) data during load (save) from (to) disk or to implement certain specialized SQL functions. To be independent of the available libraries in the target system, each third-party library is imported as a Git submodule into ClickHouse's source tree and compiled and linked with ClickHouse.
A list of third-party libraries and their licenses can be obtained by the following query:
+ClickHouse utilizes third-party libraries for different purposes, e.g., to connect to other databases, to decode/encode data during load/save from/to disk, or to implement certain specialized SQL functions.
+To be independent of the available libraries in the target system, each third-party library is imported as a Git submodule into ClickHouse's source tree and compiled and linked with ClickHouse.
+A list of third-party libraries and their licenses can be obtained by the following query:

``` sql
SELECT library_name, license_type, license_path FROM system.licenses ORDER BY library_name COLLATE 'en';
```

-Note that the listed libraries are the ones located in the `contrib/` directory of the ClickHouse repository. Depending on the build options, some of the libraries may have not been compiled, and as a result, their functionality may not be available at runtime.
+Note that the listed libraries are the ones located in the `contrib/` directory of the ClickHouse repository.
+Depending on the build options, some of the libraries may not have been compiled, and, as a result, their functionality may not be available at runtime.

[Example](https://play.clickhouse.com/play?user=play#U0VMRUNUIGxpYnJhcnlfbmFtZSwgbGljZW5zZV90eXBlLCBsaWNlbnNlX3BhdGggRlJPTSBzeXN0ZW0ubGljZW5zZXMgT1JERVIgQlkgbGlicmFyeV9uYW1lIENPTExBVEUgJ2VuJw==)

-## Adding new third-party libraries and maintaining patches in third-party libraries {#adding-third-party-libraries}
+## Adding and maintaining third-party libraries

-1. Each third-party library must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository. Avoid dumps/copies of external code, instead use Git submodule feature to pull third-party code from an external upstream repository.
-2. Submodules are listed in `.gitmodule`. If the external library can be used as-is, you may reference the upstream repository directly. Otherwise, i.e. the external library requires patching/customization, create a fork of the official repository in the [ClickHouse organization in GitHub](https://github.com/ClickHouse).
-3. In the latter case, create a branch with `clickhouse/` prefix from the branch you want to integrate, e.g. `clickhouse/master` (for `master`) or `clickhouse/release/vX.Y.Z` (for a `release/vX.Y.Z` tag). The purpose of this branch is to isolate customization of the library from upstream work. For example, pulls from the upstream repository into the fork will leave all `clickhouse/` branches unaffected. Submodules in `contrib/` must only track `clickhouse/` branches of forked third-party repositories.
-4. To patch a fork of a third-party library, create a dedicated branch with `clickhouse/` prefix in the fork, e.g. `clickhouse/fix-some-desaster`. Finally, merge the patch branch into the custom tracking branch (e.g. `clickhouse/master` or `clickhouse/release/vX.Y.Z`) using a PR.
-5. Always create patches of third-party libraries with the official repository in mind. Once a PR of a patch branch to the `clickhouse/` branch in the fork repository is done and the submodule version in ClickHouse official repository is bumped, consider opening another PR from the patch branch to the upstream library repository. This ensures, that 1) the contribution has more than a single use case and importance, 2) others will also benefit from it, 3) the change will not remain a maintenance burden solely on ClickHouse developers.
-9.
To update a submodule with changes in the upstream repository, first merge upstream `master` (or a new `versionX.Y.Z` tag) into the `clickhouse`-tracking branch in the fork repository. Conflicts with patches/customization will need to be resolved in this merge (see Step 4.). Once the merge is done, bump the submodule in ClickHouse to point to the new hash in the fork.
+Each third-party library must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository.
+Avoid dumping copies of external code into the library directory.
+Instead create a Git submodule to pull third-party code from an external upstream repository.
+
+All submodules used by ClickHouse are listed in the `.gitmodules` file.
+If the library can be used as-is (the default case), you can reference the upstream repository directly.
+If the library needs patching, create a fork of the upstream repository in the [ClickHouse organization on GitHub](https://github.com/ClickHouse).
+
+In the latter case, we aim to isolate custom patches as much as possible from upstream commits.
+To that end, create a branch with prefix `clickhouse/` from the branch or tag you want to integrate, e.g. `clickhouse/master` (for branch `master`) or `clickhouse/release/vX.Y.Z` (for tag `release/vX.Y.Z`).
+This ensures that pulls from the upstream repository into the fork will leave custom `clickhouse/` branches unaffected.
+Submodules in `contrib/` must only track `clickhouse/` branches of forked third-party repositories.
+
+Patches are only applied against `clickhouse/` branches of external libraries.
+For that, push the patch as a branch with the `clickhouse/` prefix, e.g. `clickhouse/fix-some-disaster`.
+Then create a PR from the new branch against the custom tracking branch with the `clickhouse/` prefix (e.g. `clickhouse/master` or `clickhouse/release/vX.Y.Z`) and merge the patch.
+
+Create patches of third-party libraries with the official repository in mind and consider contributing the patch back to the upstream repository.
+This makes sure that others will also benefit from the patch and that it will not become a maintenance burden for the ClickHouse team.
+
+To pull upstream changes into the submodule, you can use two methods:
+- (less work but less clean): merge upstream `master` into the corresponding `clickhouse/` tracking branch in the forked repository. You will need to resolve merge conflicts with previous custom patches. This method can be used when the `clickhouse/` branch tracks an upstream development branch like `master`, `main`, `dev`, etc.
+- (more work but cleaner): create a new branch with the `clickhouse/` prefix from the upstream commit or tag you'd like to integrate. Then re-apply all existing patches using new PRs (or squash them into a single PR). This method can be used when the `clickhouse/` branch tracks a specific upstream version branch or tag. It is cleaner in the sense that custom patches and upstream changes are better isolated from each other.
+
+Once the submodule has been updated, bump the submodule in ClickHouse to point to the new hash in the fork.

diff --git a/docs/en/getting-started/install.md b/docs/en/getting-started/install.md
index 6525c29306a..67752f223ce 100644
--- a/docs/en/getting-started/install.md
+++ b/docs/en/getting-started/install.md
@@ -111,29 +111,10 @@ clickhouse-client # or "clickhouse-client --password" if you've set up a passwor
```
-Deprecated Method for installing deb-packages - -``` bash -sudo apt-get install apt-transport-https ca-certificates dirmngr -sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 - -echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \ - /etc/apt/sources.list.d/clickhouse.list -sudo apt-get update - -sudo apt-get install -y clickhouse-server clickhouse-client - -sudo service clickhouse-server start -clickhouse-client # or "clickhouse-client --password" if you set up a password. -``` - -
- -
-Migration Method for installing the deb-packages
+Method for installing the deb-packages on old distributions

```bash
-sudo apt-key del E0C56BD4
+sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
@@ -240,22 +221,6 @@ sudo systemctl start clickhouse-keeper
sudo systemctl status clickhouse-keeper
```

-
- -Deprecated Method for installing rpm-packages - -``` bash -sudo yum install yum-utils -sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG -sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo -sudo yum install clickhouse-server clickhouse-client - -sudo /etc/init.d/clickhouse-server start -clickhouse-client # or "clickhouse-client --password" if you set up a password. -``` - -
- You can replace `stable` with `lts` to use different [release kinds](/knowledgebase/production) based on your needs. Then run these commands to install packages: @@ -308,33 +273,6 @@ tar -xzvf "clickhouse-client-$LATEST_VERSION-${ARCH}.tgz" \ sudo "clickhouse-client-$LATEST_VERSION/install/doinst.sh" ``` -
- -Deprecated Method for installing tgz archives - -``` bash -export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \ - grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1) -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz - -tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz -sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh - -tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz -sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh - -tar -xzvf clickhouse-server-$LATEST_VERSION.tgz -sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh -sudo /etc/init.d/clickhouse-server start - -tar -xzvf clickhouse-client-$LATEST_VERSION.tgz -sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh -``` -
- For production environments, it’s recommended to use the latest `stable`-version. You can find its number on GitHub page https://github.com/ClickHouse/ClickHouse/tags with postfix `-stable`. ### From Docker Image {#from-docker-image} diff --git a/docs/en/sql-reference/data-types/ipv4.md b/docs/en/sql-reference/data-types/ipv4.md index 637ed543e08..98ba9f4abac 100644 --- a/docs/en/sql-reference/data-types/ipv4.md +++ b/docs/en/sql-reference/data-types/ipv4.md @@ -57,6 +57,18 @@ SELECT toTypeName(from), hex(from) FROM hits LIMIT 1; └──────────────────┴───────────┘ ``` +IPv4 addresses can be directly compared to IPv6 addresses: + +```sql +SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1'); +``` + +```text +┌─equals(toIPv4('127.0.0.1'), toIPv6('::ffff:127.0.0.1'))─┐ +│ 1 │ +└─────────────────────────────────────────────────────────┘ +``` + **See Also** - [Functions for Working with IPv4 and IPv6 Addresses](../functions/ip-address-functions.md) diff --git a/docs/en/sql-reference/data-types/ipv6.md b/docs/en/sql-reference/data-types/ipv6.md index 642a7db81fc..d3b7cc72a1a 100644 --- a/docs/en/sql-reference/data-types/ipv6.md +++ b/docs/en/sql-reference/data-types/ipv6.md @@ -57,6 +57,19 @@ SELECT toTypeName(from), hex(from) FROM hits LIMIT 1; └──────────────────┴──────────────────────────────────┘ ``` +IPv6 addresses can be directly compared to IPv4 addresses: + +```sql +SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1'); +``` + +```text +┌─equals(toIPv4('127.0.0.1'), toIPv6('::ffff:127.0.0.1'))─┐ +│ 1 │ +└─────────────────────────────────────────────────────────┘ +``` + + **See Also** - [Functions for Working with IPv4 and IPv6 Addresses](../functions/ip-address-functions.md) diff --git a/docs/en/sql-reference/functions/date-time-functions.md b/docs/en/sql-reference/functions/date-time-functions.md index 6ad26f452ad..4092c83954a 100644 --- a/docs/en/sql-reference/functions/date-time-functions.md +++ b/docs/en/sql-reference/functions/date-time-functions.md @@ -1235,6 +1235,168 @@ Result: - [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter. +## toStartOfMillisecond + +Rounds down a date with time to the start of the milliseconds. + +**Syntax** + +``` sql +toStartOfMillisecond(value, [timezone]) +``` + +**Arguments** + +- `value` — Date and time. [DateTime64](../../sql-reference/data-types/datetime64.md). +- `timezone` — [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). If not specified, the function uses the timezone of the `value` parameter. [String](../../sql-reference/data-types/string.md). + +**Returned value** + +- Input value with sub-milliseconds. [DateTime64](../../sql-reference/data-types/datetime64.md). 
+
+**Examples**
+
+Query without timezone:
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT toStartOfMillisecond(dt64);
+```
+
+Result:
+
+``` text
+┌────toStartOfMillisecond(dt64)─┐
+│ 2020-01-01 10:20:30.999000000 │
+└───────────────────────────────┘
+```
+
+Query with timezone:
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT toStartOfMillisecond(dt64, 'Asia/Istanbul');
+```
+
+Result:
+
+``` text
+┌─toStartOfMillisecond(dt64, 'Asia/Istanbul')─┐
+│               2020-01-01 12:20:30.999000000 │
+└─────────────────────────────────────────────┘
+```
+
+**See also**
+
+- [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter.
+
+## toStartOfMicrosecond
+
+Rounds down a date with time to the start of the microseconds.
+
+**Syntax**
+
+``` sql
+toStartOfMicrosecond(value, [timezone])
+```
+
+**Arguments**
+
+- `value` — Date and time. [DateTime64](../../sql-reference/data-types/datetime64.md).
+- `timezone` — [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). If not specified, the function uses the timezone of the `value` parameter. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- Input value with sub-microseconds. [DateTime64](../../sql-reference/data-types/datetime64.md).
+
+**Examples**
+
+Query without timezone:
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT toStartOfMicrosecond(dt64);
+```
+
+Result:
+
+``` text
+┌────toStartOfMicrosecond(dt64)─┐
+│ 2020-01-01 10:20:30.999999000 │
+└───────────────────────────────┘
+```
+
+Query with timezone:
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT toStartOfMicrosecond(dt64, 'Asia/Istanbul');
+```
+
+Result:
+
+``` text
+┌─toStartOfMicrosecond(dt64, 'Asia/Istanbul')─┐
+│               2020-01-01 12:20:30.999999000 │
+└─────────────────────────────────────────────┘
+```
+
+**See also**
+
+- [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter.
+
+## toStartOfNanosecond
+
+Rounds down a date with time to the start of the nanoseconds.
+
+**Syntax**
+
+``` sql
+toStartOfNanosecond(value, [timezone])
+```
+
+**Arguments**
+
+- `value` — Date and time. [DateTime64](../../sql-reference/data-types/datetime64.md).
+- `timezone` — [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). If not specified, the function uses the timezone of the `value` parameter. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- Input value with nanoseconds. [DateTime64](../../sql-reference/data-types/datetime64.md).
+
+**Examples**
+
+Query without timezone:
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT toStartOfNanosecond(dt64);
+```
+
+Result:
+
+``` text
+┌─────toStartOfNanosecond(dt64)─┐
+│ 2020-01-01 10:20:30.999999999 │
+└───────────────────────────────┘
+```
+
+Query with timezone:
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT toStartOfNanosecond(dt64, 'Asia/Istanbul');
+```
+
+Result:
+
+``` text
+┌─toStartOfNanosecond(dt64, 'Asia/Istanbul')─┐
+│              2020-01-01 12:20:30.999999999 │
+└────────────────────────────────────────────┘
+```
+
+**See also**
+
+- [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter.
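+
+The three functions differ only in the boundary they truncate to. As a quick side-by-side sketch (the expected results, shown as comments, follow from the examples above):
+
+``` sql
+WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
+SELECT
+    toStartOfMillisecond(dt64), -- 2020-01-01 10:20:30.999000000
+    toStartOfMicrosecond(dt64), -- 2020-01-01 10:20:30.999999000
+    toStartOfNanosecond(dt64);  -- 2020-01-01 10:20:30.999999999
+```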
+ ## toStartOfFiveMinutes Rounds down a date with time to the start of the five-minute interval. @@ -3953,6 +4115,43 @@ Result: │ 2023-03-16 18:00:00.000 │ └─────────────────────────────────────────────────────────────────────────┘ ``` + +## UTCTimestamp + +Returns the current date and time at the moment of query analysis. The function is a constant expression. + +:::note +This function gives the same result that `now('UTC')` would. It was added only for MySQL support and [`now`](#now-now) is the preferred usage. +::: + +**Syntax** + +```sql +UTCTimestamp() +``` + +Alias: `UTC_timestamp`. + +**Returned value** + +- Returns the current date and time at the moment of query analysis. [DateTime](../data-types/datetime.md). + +**Example** + +Query: + +```sql +SELECT UTCTimestamp(); +``` + +Result: + +```response +┌──────UTCTimestamp()─┐ +│ 2024-05-28 08:32:09 │ +└─────────────────────┘ +``` + ## timeDiff Returns the difference between two dates or dates with time values. The difference is calculated in units of seconds. It is same as `dateDiff` and was added only for MySQL support. `dateDiff` is preferred. diff --git a/docs/en/sql-reference/functions/json-functions.md b/docs/en/sql-reference/functions/json-functions.md index 8359d5f9fbc..5d73c9a83b3 100644 --- a/docs/en/sql-reference/functions/json-functions.md +++ b/docs/en/sql-reference/functions/json-functions.md @@ -4,13 +4,13 @@ sidebar_position: 105 sidebar_label: JSON --- -There are two sets of functions to parse JSON. - - `simpleJSON*` (`visitParam*`) is made to parse a special very limited subset of a JSON, but these functions are extremely fast. - - `JSONExtract*` is made to parse normal JSON. +There are two sets of functions to parse JSON: + - [`simpleJSON*` (`visitParam*`)](#simplejson--visitparam-functions) which is made for parsing a limited subset of JSON extremely fast. + - [`JSONExtract*`](#jsonextract-functions) which is made for parsing ordinary JSON. -# simpleJSON/visitParam functions +## simpleJSON / visitParam functions -ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done. +ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be. They try to do as little as possible to get the job done as quickly as possible. The following assumptions are made: @@ -19,7 +19,7 @@ The following assumptions are made: 3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used. 4. The JSON does not have space characters outside of string literals. -## simpleJSONHas +### simpleJSONHas Checks whether there is a field named `field_name`. The result is `UInt8`. @@ -29,14 +29,16 @@ Checks whether there is a field named `field_name`. The result is `UInt8`. simpleJSONHas(json, field_name) ``` +Alias: `visitParamHas`. + **Parameters** -- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** -It returns `1` if the field exists, `0` otherwise. +- Returns `1` if the field exists, `0` otherwise. 
[UInt8](../data-types/int-uint.md). **Example** @@ -55,11 +57,13 @@ SELECT simpleJSONHas(json, 'foo') FROM jsons; SELECT simpleJSONHas(json, 'bar') FROM jsons; ``` +Result: + ```response 1 0 ``` -## simpleJSONExtractUInt +### simpleJSONExtractUInt Parses `UInt64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`. @@ -69,14 +73,16 @@ Parses `UInt64` from the value of the field named `field_name`. If this is a str simpleJSONExtractUInt(json, field_name) ``` +Alias: `visitParamExtractUInt`. + **Parameters** -- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** -It returns the number parsed from the field if the field exists and contains a number, `0` otherwise. +- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [UInt64](../data-types/int-uint.md). **Example** @@ -98,6 +104,8 @@ INSERT INTO jsons VALUES ('{"baz":2}'); SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json; ``` +Result: + ```response 0 4 @@ -106,7 +114,7 @@ SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json; 5 ``` -## simpleJSONExtractInt +### simpleJSONExtractInt Parses `Int64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`. @@ -116,14 +124,16 @@ Parses `Int64` from the value of the field named `field_name`. If this is a stri simpleJSONExtractInt(json, field_name) ``` +Alias: `visitParamExtractInt`. + **Parameters** -- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** -It returns the number parsed from the field if the field exists and contains a number, `0` otherwise. +- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [Int64](../data-types/int-uint.md). **Example** @@ -145,6 +155,8 @@ INSERT INTO jsons VALUES ('{"baz":2}'); SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json; ``` +Result: + ```response 0 -4 @@ -153,7 +165,7 @@ SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json; 5 ``` -## simpleJSONExtractFloat +### simpleJSONExtractFloat Parses `Float64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`. @@ -163,14 +175,16 @@ Parses `Float64` from the value of the field named `field_name`. If this is a st simpleJSONExtractFloat(json, field_name) ``` +Alias: `visitParamExtractFloat`. + **Parameters** -- `json`: The JSON in which the field is searched for. 
[String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** -It returns the number parsed from the field if the field exists and contains a number, `0` otherwise. +- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [Float64](../data-types/float.md/#float32-float64). **Example** @@ -192,6 +206,8 @@ INSERT INTO jsons VALUES ('{"baz":2}'); SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json; ``` +Result: + ```response 0 -4000 @@ -200,7 +216,7 @@ SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json; 5 ``` -## simpleJSONExtractBool +### simpleJSONExtractBool Parses a true/false value from the value of the field named `field_name`. The result is `UInt8`. @@ -210,10 +226,12 @@ Parses a true/false value from the value of the field named `field_name`. The re simpleJSONExtractBool(json, field_name) ``` +Alias: `visitParamExtractBool`. + **Parameters** -- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** @@ -240,6 +258,8 @@ SELECT simpleJSONExtractBool(json, 'bar') FROM jsons ORDER BY json; SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json; ``` +Result: + ```response 0 1 @@ -247,7 +267,7 @@ SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json; 0 ``` -## simpleJSONExtractRaw +### simpleJSONExtractRaw Returns the value of the field named `field_name` as a `String`, including separators. @@ -257,14 +277,16 @@ Returns the value of the field named `field_name` as a `String`, including separ simpleJSONExtractRaw(json, field_name) ``` +Alias: `visitParamExtractRaw`. + **Parameters** -- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** -It returns the value of the field as a [`String`](../data-types/string.md#string), including separators if the field exists, or an empty `String` otherwise. +- Returns the value of the field as a string, including separators if the field exists, or an empty string otherwise. [`String`](../data-types/string.md#string) **Example** @@ -286,6 +308,8 @@ INSERT INTO jsons VALUES ('{"baz":2}'); SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json; ``` +Result: + ```response "-4e3" @@ -294,7 +318,7 @@ SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json; {"def":[1,2,3]} ``` -## simpleJSONExtractString +### simpleJSONExtractString Parses `String` in double quotes from the value of the field named `field_name`. @@ -304,14 +328,16 @@ Parses `String` in double quotes from the value of the field named `field_name`. simpleJSONExtractString(json, field_name) ``` +Alias: `visitParamExtractString`. 
+ **Parameters** -- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string) -- `field_name`: The name of the field to search for. [String literal](../syntax#string) +- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string) +- `field_name` — The name of the field to search for. [String literal](../syntax#string) **Returned value** -It returns the value of a field as a [`String`](../data-types/string.md#string), including separators. The value is unescaped. It returns an empty `String`: if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist. +- Returns the unescaped value of a field as a string, including separators. An empty string is returned if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist. [String](../data-types/string.md). **Implementation details** @@ -336,6 +362,8 @@ INSERT INTO jsons VALUES ('{"foo":"hello}'); SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json; ``` +Result: + ```response \n\0 @@ -343,73 +371,61 @@ SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json; ``` -## visitParamHas +## JSONExtract functions -This function is [an alias of `simpleJSONHas`](./json-functions#simplejsonhas). +The following functions are based on [simdjson](https://github.com/lemire/simdjson), and designed for more complex JSON parsing requirements. -## visitParamExtractUInt +### isValidJSON -This function is [an alias of `simpleJSONExtractUInt`](./json-functions#simplejsonextractuint). +Checks that passed string is valid JSON. -## visitParamExtractInt +**Syntax** -This function is [an alias of `simpleJSONExtractInt`](./json-functions#simplejsonextractint). +```sql +isValidJSON(json) +``` -## visitParamExtractFloat - -This function is [an alias of `simpleJSONExtractFloat`](./json-functions#simplejsonextractfloat). - -## visitParamExtractBool - -This function is [an alias of `simpleJSONExtractBool`](./json-functions#simplejsonextractbool). - -## visitParamExtractRaw - -This function is [an alias of `simpleJSONExtractRaw`](./json-functions#simplejsonextractraw). - -## visitParamExtractString - -This function is [an alias of `simpleJSONExtractString`](./json-functions#simplejsonextractstring). - -# JSONExtract functions - -The following functions are based on [simdjson](https://github.com/lemire/simdjson) designed for more complex JSON parsing requirements. - -## isValidJSON(json) - -Checks that passed string is a valid json. - -Examples: +**Examples** ``` sql SELECT isValidJSON('{"a": "hello", "b": [-100, 200.0, 300]}') = 1 SELECT isValidJSON('not a json') = 0 ``` -## JSONHas(json\[, indices_or_keys\]...) +### JSONHas -If the value exists in the JSON document, `1` will be returned. +If the value exists in the JSON document, `1` will be returned. If the value does not exist, `0` will be returned. -If the value does not exist, `0` will be returned. +**Syntax** -Examples: +```sql +JSONHas(json [, indices_or_keys]...) +``` + +**Parameters** + +- `json` — JSON string to parse. [String](../data-types/string.md). +- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md). + +`indices_or_keys` type: +- String = access object member by key. +- Positive integer = access the n-th member/key from the beginning. +- Negative integer = access the n-th member/key from the end. 
+
+**Returned value**
+
+- Returns `1` if the value exists in `json`, otherwise `0`. [UInt8](../data-types/int-uint.md).
+
+**Examples**
+
+Query:

``` sql
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 1
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4) = 0
```

-`indices_or_keys` is a list of zero or more arguments each of them can be either string or integer.
-
-- String = access object member by key.
-- Positive integer = access the n-th member/key from the beginning.
-- Negative integer = access the n-th member/key from the end.
-
-Minimum index of the element is 1. Thus the element 0 does not exist.
-
-You may use integers to access both JSON arrays and JSON objects.
-
-So, for example:
+The minimum index of the element is 1. Thus the element 0 does not exist. You may use integers to access both JSON arrays and JSON objects. For example:

``` sql
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'a'
@@ -419,26 +435,62 @@ SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -2) = 'a'
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'hello'
```

-## JSONLength(json\[, indices_or_keys\]...)
+### JSONLength

-Return the length of a JSON array or a JSON object.
+Returns the length of a JSON array or a JSON object. If the value does not exist or has the wrong type, `0` will be returned.

-If the value does not exist or has a wrong type, `0` will be returned.
+**Syntax**

-Examples:
+```sql
+JSONLength(json [, indices_or_keys]...)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns the length of the JSON array or JSON object. Returns `0` if the value does not exist or has the wrong type. [UInt64](../data-types/int-uint.md).
+
+**Examples**

``` sql
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 3
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}') = 2
```

-## JSONType(json\[, indices_or_keys\]...)
+### JSONType

-Return the type of a JSON value.
+Returns the type of a JSON value. If the value does not exist, `Null` will be returned.

-If the value does not exist, `Null` will be returned.
+**Syntax**

-Examples:
+```sql
+JSONType(json [, indices_or_keys]...)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns the type of a JSON value as a string; if the value does not exist, it returns `Null`. [String](../data-types/string.md).
+
+**Examples**

``` sql
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}') = 'Object'
@@ -446,35 +498,191 @@ SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'String'
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 'Array'
```

-## JSONExtractUInt(json\[, indices_or_keys\]...)
+### JSONExtractUInt

-## JSONExtractInt(json\[, indices_or_keys\]...)
+Parses JSON and extracts a value of UInt type.

-## JSONExtractFloat(json\[, indices_or_keys\]...)
+**Syntax**

-## JSONExtractBool(json\[, indices_or_keys\]...)
-
-Parses a JSON and extract a value. These functions are similar to `visitParam` functions.
-
-If the value does not exist or has a wrong type, `0` will be returned.
-
-Examples:
-
-``` sql
-SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 1) = -100
-SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) = 200.0
-SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) = 300
+```sql
+JSONExtractUInt(json [, indices_or_keys]...)
```

-## JSONExtractString(json\[, indices_or_keys\]...)
+**Parameters**

-Parses a JSON and extract a string. This function is similar to `visitParamExtractString` functions.
+- `json` — JSON string to parse. [String](../data-types/string.md).
-If the value does not exist or has a wrong type, an empty string will be returned.
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).

-The value is unescaped. If unescaping failed, it returns an empty string.
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.

-Examples:
+**Returned value**
+
+- Returns a UInt value if it exists, otherwise it returns `Null`. [UInt64](../data-types/int-uint.md).
+
+**Examples**
+
+Query:
+
+``` sql
+SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) as x, toTypeName(x);
+```
+
+Result:
+
+```response
+┌───x─┬─toTypeName(x)─┐
+│ 300 │ UInt64        │
+└─────┴───────────────┘
+```
+
+### JSONExtractInt
+
+Parses JSON and extracts a value of Int type.
+
+**Syntax**
+
+```sql
+JSONExtractInt(json [, indices_or_keys]...)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns an Int value if it exists, otherwise it returns `Null`. [Int64](../data-types/int-uint.md).
+
+**Examples**
+
+Query:
+
+``` sql
+SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) as x, toTypeName(x);
+```
+
+Result:
+
+```response
+┌───x─┬─toTypeName(x)─┐
+│ 300 │ Int64         │
+└─────┴───────────────┘
+```
+
+### JSONExtractFloat
+
+Parses JSON and extracts a value of Float type.
+
+**Syntax**
+
+```sql
+JSONExtractFloat(json [, indices_or_keys]...)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. 
[String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns a Float value if it exists, otherwise it returns `Null`. [Float64](../data-types/float.md).
+
+**Examples**
+
+Query:
+
+``` sql
+SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) as x, toTypeName(x);
+```
+
+Result:
+
+```response
+┌───x─┬─toTypeName(x)─┐
+│ 200 │ Float64       │
+└─────┴───────────────┘
+```
+
+### JSONExtractBool
+
+Parses JSON and extracts a boolean value. If the value does not exist or has a wrong type, `0` will be returned.
+
+**Syntax**
+
+```sql
+JSONExtractBool(json [, indices_or_keys]...)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns a Boolean value if it exists, otherwise it returns `0`. [Bool](../data-types/boolean.md).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT JSONExtractBool('{"passed": true}', 'passed');
+```
+
+Result:
+
+```response
+┌─JSONExtractBool('{"passed": true}', 'passed')─┐
+│                                             1 │
+└───────────────────────────────────────────────┘
+```
+
+### JSONExtractString
+
+Parses JSON and extracts a string. This function is similar to the [`visitParamExtractString`](#simplejsonextractstring) function. If the value does not exist or has a wrong type, an empty string will be returned.
+
+**Syntax**
+
+```sql
+JSONExtractString(json [, indices_or_keys]...)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns an unescaped string from `json`. If unescaping failed, if the value does not exist or if it has a wrong type then it returns an empty string. [String](../data-types/string.md).
+
+**Examples**

``` sql
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'hello'
@@ -484,16 +692,35 @@ SELECT JSONExtractString('{"abc":"\\u263"}', 'abc') = ''
SELECT JSONExtractString('{"abc":"hello}', 'abc') = ''
```

-## JSONExtract(json\[, indices_or_keys...\], Return_type)
+### JSONExtract

-Parses a JSON and extract a value of the given ClickHouse data type.
+Parses JSON and extracts a value of the given ClickHouse data type. This function is a generalized version of the previous `JSONExtract` functions, meaning that `JSONExtract(..., 'String')` returns exactly the same as `JSONExtractString()`, and `JSONExtract(..., 'Float64')` returns exactly the same as `JSONExtractFloat()`.

-This is a generalization of the previous `JSONExtract` functions.
-This means `JSONExtract(..., 'String')` returns exactly the same as `JSONExtractString()`, `JSONExtract(..., 'Float64')` returns exactly the same as `JSONExtractFloat()`.
-Examples:
+**Syntax**
+
+```sql
+JSONExtract(json [, indices_or_keys...], return_type)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+- `return_type` — A string specifying the type of the value to extract. [String](../data-types/string.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns a value of the specified return type if it exists, otherwise it returns `0`, `Null`, or an empty string, depending on the specified return type. [UInt64](../data-types/int-uint.md), [Int64](../data-types/int-uint.md), [Float64](../data-types/float.md), [Bool](../data-types/boolean.md) or [String](../data-types/string.md).
+
+**Examples**

``` sql
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(String, Array(Float64))') = ('hello',[-100,200,300])
@@ -506,17 +733,38 @@ SELECT JSONExtract('{"day": "Thursday"}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday
SELECT JSONExtract('{"day": 5}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Friday'
```

-## JSONExtractKeysAndValues(json\[, indices_or_keys...\], Value_type)
+### JSONExtractKeysAndValues

-Parses key-value pairs from a JSON where the values are of the given ClickHouse data type.
+Parses key-value pairs from JSON where the values are of the given ClickHouse data type.

-Example:
+**Syntax**
+
+```sql
+JSONExtractKeysAndValues(json [, indices_or_keys...], value_type)
+```
+
+**Parameters**
+
+- `json` — JSON string to parse. [String](../data-types/string.md).
+- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
+- `value_type` — A string specifying the type of the value to extract. [String](../data-types/string.md).
+
+`indices_or_keys` type:
+- String = access object member by key.
+- Positive integer = access the n-th member/key from the beginning.
+- Negative integer = access the n-th member/key from the end.
+
+**Returned value**
+
+- Returns an array of parsed key-value pairs. [Array](../data-types/array.md)([Tuple](../data-types/tuple.md)(`value_type`)).
+
+**Example**

``` sql
SELECT JSONExtractKeysAndValues('{"x": {"a": 5, "b": 7, "c": 11}}', 'x', 'Int8') = [('a',5),('b',7),('c',11)];
```

-## JSONExtractKeys
+### JSONExtractKeys

Parses a JSON string and extracts the keys.

@@ -526,14 +774,14 @@ JSONExtractKeys(json[, a, b, c...])
```

-**Arguments**
+**Parameters**

- `json` — [String](../data-types/string.md) with valid JSON.
- `a, b, c...` — Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a [String](../data-types/string.md) to get the field by the key or an [Integer](../data-types/int-uint.md) to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.

**Returned value**

-Array with the keys of the JSON. [Array](../data-types/array.md)([String](../data-types/string.md)).
+- Returns an array with the keys of the JSON. [Array](../data-types/array.md)([String](../data-types/string.md)). **Example** @@ -552,31 +800,67 @@ text └────────────────────────────────────────────────────────────┘ ``` -## JSONExtractRaw(json\[, indices_or_keys\]...) +### JSONExtractRaw -Returns a part of JSON as unparsed string. +Returns part of the JSON as an unparsed string. If the part does not exist or has the wrong type, an empty string will be returned. -If the part does not exist or has a wrong type, an empty string will be returned. +**Syntax** -Example: +```sql +JSONExtractRaw(json [, indices_or_keys]...) +``` + +**Parameters** + +- `json` — JSON string to parse. [String](../data-types/string.md). +- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md). + +`indices_or_keys` type: +- String = access object member by key. +- Positive integer = access the n-th member/key from the beginning. +- Negative integer = access the n-th member/key from the end. + +**Returned value** + +- Returns part of the JSON as an unparsed string. If the part does not exist or has the wrong type, an empty string is returned. [String](../data-types/string.md). + +**Example** ``` sql SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = '[-100, 200.0, 300]'; ``` -## JSONExtractArrayRaw(json\[, indices_or_keys...\]) +### JSONExtractArrayRaw -Returns an array with elements of JSON array, each represented as unparsed string. +Returns an array with elements of JSON array, each represented as unparsed string. If the part does not exist or isn’t an array, then an empty array will be returned. -If the part does not exist or isn’t array, an empty array will be returned. +**Syntax** -Example: +```sql +JSONExtractArrayRaw(json [, indices_or_keys...]) +``` -``` sql +**Parameters** + +- `json` — JSON string to parse. [String](../data-types/string.md). +- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md). + +`indices_or_keys` type: +- String = access object member by key. +- Positive integer = access the n-th member/key from the beginning. +- Negative integer = access the n-th member/key from the end. + +**Returned value** + +- Returns an array with elements of JSON array, each represented as unparsed string. Otherwise, an empty array is returned if the part does not exist or is not an array. [Array](../data-types/array.md)([String](../data-types/string.md)). + +**Example** + +```sql SELECT JSONExtractArrayRaw('{"a": "hello", "b": [-100, 200.0, "hello"]}', 'b') = ['-100', '200.0', '"hello"']; ``` -## JSONExtractKeysAndValuesRaw +### JSONExtractKeysAndValuesRaw Extracts raw data from a JSON object. @@ -640,13 +924,30 @@ Result: └───────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` -## JSON_EXISTS(json, path) +### JSON_EXISTS -If the value exists in the JSON document, `1` will be returned. +If the value exists in the JSON document, `1` will be returned. If the value does not exist, `0` will be returned. -If the value does not exist, `0` will be returned. +**Syntax** -Examples: +```sql +JSON_EXISTS(json, path) +``` + +**Parameters** + +- `json` — A string with valid JSON. [String](../data-types/string.md). +- `path` — A string representing the path. [String](../data-types/string.md). 
+
+:::note
+Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)
+:::
+
+**Returned value**
+
+- Returns `1` if the value exists in the JSON document, otherwise `0`.
+
+**Examples**

``` sql
SELECT JSON_EXISTS('{"hello":1}', '$.hello');
@@ -655,17 +956,32 @@ SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[*]');
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[0]');
```

+### JSON_QUERY
+
+Parses a JSON and extracts a value as a JSON array or JSON object. If the value does not exist, an empty string will be returned.
+
+**Syntax**
+
+```sql
+JSON_QUERY(json, path)
+```
+
+**Parameters**
+
+- `json` — A string with valid JSON. [String](../data-types/string.md).
+- `path` — A string representing the path. [String](../data-types/string.md).
+
:::note
Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)
:::

-## JSON_QUERY(json, path)
+**Returned value**

-Parses a JSON and extract a value as JSON array or JSON object.
+- Returns the extracted value as a JSON array or JSON object if it exists, otherwise an empty string is returned. [String](../data-types/string.md).

-If the value does not exist, an empty string will be returned.
+**Example**

-Example:
+Query:

``` sql
SELECT JSON_QUERY('{"hello":"world"}', '$.hello');
@@ -682,17 +998,38 @@ Result:

[2]
String
```
+
+### JSON_VALUE
+
+Parses a JSON and extracts a value as a JSON scalar. If the value does not exist, an empty string will be returned by default.
+
+This function is controlled by the following settings:
+
+- if `function_json_value_return_type_allow_nullable` is set to `true`, `NULL` will be returned instead. If the value is a complex type (such as a struct, array, or map), an empty string will be returned by default.
+- if `function_json_value_return_type_allow_complex` is set to `true`, the complex value will be returned.
+
+**Syntax**
+
+```sql
+JSON_VALUE(json, path)
+```
+
+**Parameters**
+
+- `json` — A string with valid JSON. [String](../data-types/string.md).
+- `path` — A string representing the path. [String](../data-types/string.md).
+
:::note
-Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
+Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
:::

-## JSON_VALUE(json, path)
+**Returned value**

-Parses a JSON and extract a value as JSON scalar.
+- Returns the extracted value as a JSON scalar if it exists, otherwise an empty string is returned. [String](../data-types/string.md).

-If the value does not exist, an empty string will be returned by default, and by SET `function_json_value_return_type_allow_nullable` = `true`, `NULL` will be returned. If the value is complex type (such as: struct, array, map), an empty string will be returned by default, and by SET `function_json_value_return_type_allow_complex` = `true`, the complex value will be returned.
+**Example**

-Example:
+Query:

``` sql
SELECT JSON_VALUE('{"hello":"world"}', '$.hello');
@@ -712,11 +1049,7 @@
world
String
```

-:::note
-Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
-:::
-
-## toJSONString
+### toJSONString

Serializes a value to its JSON representation. Various data types and nested structures are supported. 64-bit [integers](../data-types/int-uint.md) or bigger (like `UInt64` or `Int128`) are enclosed in quotes by default. [output_format_json_quote_64bit_integers](../../operations/settings/settings.md#session_settings-output_format_json_quote_64bit_integers) controls this behavior.
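+
+As a quick sketch of the quoting behavior described above (expected results shown as comments; the quoted form is the default because `output_format_json_quote_64bit_integers = 1`):
+
+``` sql
+SELECT
+    toJSONString(toUInt64(42)), -- "42" (64-bit integers are quoted by default)
+    toJSONString([1, 2, 3]);    -- [1,2,3]
+```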
@@ -762,7 +1095,7 @@ Result: - [output_format_json_quote_denormals](../../operations/settings/settings.md#settings-output_format_json_quote_denormals) -## JSONArrayLength +### JSONArrayLength Returns the number of elements in the outermost JSON array. The function returns NULL if input JSON string is invalid. @@ -795,7 +1128,7 @@ SELECT ``` -## jsonMergePatch +### jsonMergePatch Returns the merged JSON object string which is formed by merging multiple JSON objects. diff --git a/docs/en/sql-reference/functions/url-functions.md b/docs/en/sql-reference/functions/url-functions.md index 47890e0b271..8b3e4f44840 100644 --- a/docs/en/sql-reference/functions/url-functions.md +++ b/docs/en/sql-reference/functions/url-functions.md @@ -6,7 +6,33 @@ sidebar_label: URLs # Functions for Working with URLs -All these functions do not follow the RFC. They are maximally simplified for improved performance. +:::note +The functions mentioned in this section are optimized for maximum performance and for the most part do not follow the RFC-3986 standard. Functions which implement RFC-3986 have `RFC` appended to their function name and are generally slower. +::: + +You can generally use the non-`RFC` function variants when working with publicly registered domains that contain neither user strings nor `@` symbols. +The table below details which symbols in a URL can (`✔`) or cannot (`✗`) be parsed by the respective `RFC` and non-`RFC` variants: + +|Symbol | non-`RFC`| `RFC` | +|-------|----------|-------| +| ' ' | ✗ |✗ | +| \t | ✗ |✗ | +| < | ✗ |✗ | +| > | ✗ |✗ | +| % | ✗ |✔* | +| { | ✗ |✗ | +| } | ✗ |✗ | +| \| | ✗ |✗ | +| \\\ | ✗ |✗ | +| ^ | ✗ |✗ | +| ~ | ✗ |✔* | +| [ | ✗ |✗ | +| ] | ✗ |✔ | +| ; | ✗ |✔* | +| = | ✗ |✔* | +| & | ✗ |✔* | + +symbols marked `*` are sub-delimiters in RFC 3986 and allowed for user info following the `@` symbol. ## Functions that Extract Parts of a URL @@ -16,21 +42,23 @@ If the relevant part isn’t present in a URL, an empty string is returned. Extracts the protocol from a URL. -Examples of typical returned values: http, https, ftp, mailto, tel, magnet... +Examples of typical returned values: http, https, ftp, mailto, tel, magnet. ### domain Extracts the hostname from a URL. +**Syntax** + ``` sql domain(url) ``` **Arguments** -- `url` — URL. [String](../data-types/string.md). +- `url` — URL. [String](../../sql-reference/data-types/string.md). -The URL can be specified with or without a scheme. Examples: +The URL can be specified with or without a protocol. Examples: ``` text svn+ssh://some.svn-hosting.com:80/repo/trunk @@ -48,7 +76,7 @@ clickhouse.com **Returned values** -- Host name if ClickHouse can parse the input string as a URL, otherwise an empty string. [String](../data-types/string.md). +- Host name if the input string can be parsed as a URL, otherwise an empty string. [String](../data-types/string.md). **Example** @@ -62,9 +90,103 @@ SELECT domain('svn+ssh://some.svn-hosting.com:80/repo/trunk'); └────────────────────────────────────────────────────────┘ ``` +### domainRFC + +Extracts the hostname from a URL. Similar to [domain](#domain), but RFC 3986 conformant. + +**Syntax** + +``` sql +domainRFC(url) +``` + +**Arguments** + +- `url` — URL. [String](../data-types/string.md). + +**Returned values** + +- Host name if the input string can be parsed as a URL, otherwise an empty string. [String](../data-types/string.md). 
+
+**Example**
+
+``` sql
+SELECT
+    domain('http://user:password@example.com:8080/path?query=value#fragment'),
+    domainRFC('http://user:password@example.com:8080/path?query=value#fragment');
+```
+
+``` text
+┌─domain('http://user:password@example.com:8080/path?query=value#fragment')─┬─domainRFC('http://user:password@example.com:8080/path?query=value#fragment')─┐
+│                                                                           │ example.com                                                                  │
+└───────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
+```
+
### domainWithoutWWW

-Returns the domain and removes no more than one ‘www.’ from the beginning of it, if present.
+Returns the domain without leading `www.` if present.
+
+**Syntax**
+
+```sql
+domainWithoutWWW(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../data-types/string.md).
+
+**Returned values**
+
+- Domain name if the input string can be parsed as a URL (without leading `www.`), otherwise an empty string. [String](../data-types/string.md).
+
+**Example**
+
+``` sql
+SELECT domainWithoutWWW('http://paul@www.example.com:80/');
+```
+
+``` text
+┌─domainWithoutWWW('http://paul@www.example.com:80/')─┐
+│ example.com                                         │
+└─────────────────────────────────────────────────────┘
+```
+
+### domainWithoutWWWRFC
+
+Returns the domain without leading `www.` if present. Similar to [domainWithoutWWW](#domainwithoutwww) but conforms to RFC 3986.
+
+**Syntax**
+
+```sql
+domainWithoutWWWRFC(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../data-types/string.md).
+
+**Returned values**
+
+- Domain name if the input string can be parsed as a URL (without leading `www.`), otherwise an empty string. [String](../data-types/string.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT
+    domainWithoutWWW('http://user:password@www.example.com:8080/path?query=value#fragment'),
+    domainWithoutWWWRFC('http://user:password@www.example.com:8080/path?query=value#fragment');
+```
+
+Result:
+
+```response
+┌─domainWithoutWWW('http://user:password@www.example.com:8080/path?query=value#fragment')─┬─domainWithoutWWWRFC('http://user:password@www.example.com:8080/path?query=value#fragment')─┐
+│                                                                                         │ example.com                                                                                │
+└─────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────┘
+```

### topLevelDomain

@@ -76,63 +198,314 @@
topLevelDomain(url)
```

**Arguments**

-- `url` — URL. [String](../data-types/string.md).
+- `url` — URL. [String](../../sql-reference/data-types/string.md).

-The URL can be specified with or without a scheme. Examples:
+:::note
+The URL can be specified with or without a protocol. Examples:

``` text
svn+ssh://some.svn-hosting.com:80/repo/trunk
some.svn-hosting.com:80/repo/trunk
https://clickhouse.com/time/
```
+:::

**Returned values**

-- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../data-types/string.md).
+- Domain name if the input string can be parsed as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).

**Example**

+Query:
+
``` sql
SELECT topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk');
```

+Result:
+
``` text
┌─topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk')─┐
│ com                                                                │
└────────────────────────────────────────────────────────────────────┘
```

+### topLevelDomainRFC
+
+Extracts the top-level domain from a URL.
+Similar to [topLevelDomain](#topleveldomain), but conforms to RFC 3986.
+
+``` sql
+topLevelDomainRFC(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+
+:::note
+The URL can be specified with or without a protocol. Examples:
+
+``` text
+svn+ssh://some.svn-hosting.com:80/repo/trunk
+some.svn-hosting.com:80/repo/trunk
+https://clickhouse.com/time/
+```
+:::
+
+**Returned values**
+
+- Domain name if the input string can be parsed as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT topLevelDomain('http://foo:foo%41bar@foo.com'), topLevelDomainRFC('http://foo:foo%41bar@foo.com');
+```
+
+Result:
+
+``` text
+┌─topLevelDomain('http://foo:foo%41bar@foo.com')─┬─topLevelDomainRFC('http://foo:foo%41bar@foo.com')─┐
+│                                                │ com                                               │
+└────────────────────────────────────────────────┴───────────────────────────────────────────────────┘
+```
+
### firstSignificantSubdomain

-Returns the “first significant subdomain”. The first significant subdomain is a second-level domain if it is ‘com’, ‘net’, ‘org’, or ‘co’. Otherwise, it is a third-level domain. For example, `firstSignificantSubdomain (‘https://news.clickhouse.com/’) = ‘clickhouse’, firstSignificantSubdomain (‘https://news.clickhouse.com.tr/’) = ‘clickhouse’`. The list of “insignificant” second-level domains and other implementation details may change in the future.
+Returns the “first significant subdomain”.
+The first significant subdomain is a second-level domain for `com`, `net`, `org`, or `co`, otherwise it is a third-level domain.
+For example, `firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse'` and `firstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse'`.
+The list of "insignificant" second-level domains and other implementation details may change in the future.
+
+**Syntax**
+
+```sql
+firstSignificantSubdomain(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- The first significant subdomain. [String](../data-types/string.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT firstSignificantSubdomain('http://www.example.com/a/b/c?a=b')
+```
+
+Result:
+
+```reference
+┌─firstSignificantSubdomain('http://www.example.com/a/b/c?a=b')─┐
+│ example                                                       │
+└───────────────────────────────────────────────────────────────┘
+```
+
+### firstSignificantSubdomainRFC
+
+Returns the “first significant subdomain”.
+The first significant subdomain is a second-level domain for `com`, `net`, `org`, or `co`, otherwise it is a third-level domain.
+For example, `firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse'` and `firstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse'`.
+The list of "insignificant" second-level domains and other implementation details may change in the future.
+Similar to [firstSignificantSubdomain](#firstsignificantsubdomain) but conforms to RFC 1034.
+
+**Syntax**
+
+```sql
+firstSignificantSubdomainRFC(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- The first significant subdomain. [String](../data-types/string.md).
+ +**Example** + +Query: + +```sql +SELECT + firstSignificantSubdomain('http://user:password@example.com:8080/path?query=value#fragment'), + firstSignificantSubdomainRFC('http://user:password@example.com:8080/path?query=value#fragment'); +``` + +Result: + +```reference +┌─firstSignificantSubdomain('http://user:password@example.com:8080/path?query=value#fragment')─┬─firstSignificantSubdomainRFC('http://user:password@example.com:8080/path?query=value#fragment')─┐ +│ │ example │ +└──────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` ### cutToFirstSignificantSubdomain -Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain” (see the explanation above). +Returns the part of the domain that includes top-level subdomains up to the [“first significant subdomain”](#firstsignificantsubdomain). -For example: +**Syntax** + +```sql +cutToFirstSignificantSubdomain(url) +``` + +**Arguments** + +- `url` — URL. [String](../../sql-reference/data-types/string.md). + +**Returned value** + +- Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. [String](../data-types/string.md). + +**Example** + +Query: + +```sql +SELECT + cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/'), + cutToFirstSignificantSubdomain('www.tr'), + cutToFirstSignificantSubdomain('tr'); +``` + +Result: + +```response +┌─cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/')─┬─cutToFirstSignificantSubdomain('www.tr')─┬─cutToFirstSignificantSubdomain('tr')─┐ +│ clickhouse.com.tr │ tr │ │ +└───────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┴──────────────────────────────────────┘ +``` + +### cutToFirstSignificantSubdomainRFC + +Returns the part of the domain that includes top-level subdomains up to the [“first significant subdomain”](#firstsignificantsubdomain). +Similar to [cutToFirstSignificantSubdomain](#cuttofirstsignificantsubdomain) but conforms to RFC 3986. + +**Syntax** + +```sql +cutToFirstSignificantSubdomainRFC(url) +``` + +**Arguments** + +- `url` — URL. [String](../../sql-reference/data-types/string.md). + +**Returned value** + +- Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. [String](../data-types/string.md). + +**Example** + +Query: + +```sql +SELECT + cutToFirstSignificantSubdomain('http://user:password@example.com:8080'), + cutToFirstSignificantSubdomainRFC('http://user:password@example.com:8080'); +``` + +Result: + +```response +┌─cutToFirstSignificantSubdomain('http://user:password@example.com:8080')─┬─cutToFirstSignificantSubdomainRFC('http://user:password@example.com:8080')─┐ +│ │ example.com │ +└─────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┘ +``` -- `cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'`. -- `cutToFirstSignificantSubdomain('www.tr') = 'tr'`. -- `cutToFirstSignificantSubdomain('tr') = ''`. ### cutToFirstSignificantSubdomainWithWWW -Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping "www". 
+Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping `www`.
 
-For example:
+**Syntax**
 
-- `cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'`.
-- `cutToFirstSignificantSubdomainWithWWW('www.tr') = 'www.tr'`.
-- `cutToFirstSignificantSubdomainWithWWW('tr') = ''`.
+```sql
+cutToFirstSignificantSubdomainWithWWW(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- Part of the domain that includes top-level subdomains up to the first significant subdomain (with `www`) if possible, otherwise returns an empty string. [String](../data-types/string.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT
+    cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/'),
+    cutToFirstSignificantSubdomainWithWWW('www.tr'),
+    cutToFirstSignificantSubdomainWithWWW('tr');
+```
+
+Result:
+
+```response
+┌─cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/')─┬─cutToFirstSignificantSubdomainWithWWW('www.tr')─┬─cutToFirstSignificantSubdomainWithWWW('tr')─┐
+│ clickhouse.com.tr                                                        │ www.tr                                          │                                             │
+└──────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────┘
+```
+
+### cutToFirstSignificantSubdomainWithWWWRFC
+
+Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping `www`.
+Similar to [cutToFirstSignificantSubdomainWithWWW](#cuttofirstsignificantsubdomainwithwww) but conforms to RFC 3986.
+
+**Syntax**
+
+```sql
+cutToFirstSignificantSubdomainWithWWWRFC(url)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- Part of the domain that includes top-level subdomains up to the first significant subdomain (with `www`) if possible, otherwise returns an empty string. [String](../data-types/string.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT
+    cutToFirstSignificantSubdomainWithWWW('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy'),
+    cutToFirstSignificantSubdomainWithWWWRFC('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy');
+```
+
+Result:
+
+```response
+┌─cutToFirstSignificantSubdomainWithWWW('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy')─┬─cutToFirstSignificantSubdomainWithWWWRFC('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy')─┐
+│                                                                                       │ mail.ru                                                                                  │
+└───────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
+```
 
 ### cutToFirstSignificantSubdomainCustom
 
-Returns the part of the domain that includes top-level subdomains up to the first significant subdomain. Accepts custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
+Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.
+Accepts custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
+This function can be useful if you need a fresh TLD list or if you have a custom list.
 
-Can be useful if you need fresh TLD list or you have custom.
-
-Configuration example:
+**Configuration example**
 
 ```xml
 
@@ -146,17 +519,17 @@ Configuration example:
 **Syntax**
 
 ``` sql
-cutToFirstSignificantSubdomainCustom(URL, TLD)
+cutToFirstSignificantSubdomainCustom(url, tld)
 ```
 
 **Arguments**
 
-- `URL` — URL. [String](../data-types/string.md).
-- `TLD` — Custom TLD list name. [String](../data-types/string.md).
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
 
 **Returned value**
 
-- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../data-types/string.md).
+- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).
 
 **Example**
 
@@ -178,13 +551,39 @@ Result:
 
 - [firstSignificantSubdomain](#firstsignificantsubdomain).
 
+### cutToFirstSignificantSubdomainCustomRFC
+
+Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.
+Accepts custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
+This function can be useful if you need a fresh TLD list or if you have a custom list.
+Similar to [cutToFirstSignificantSubdomainCustom](#cuttofirstsignificantsubdomaincustom) but conforms to RFC 3986.
+
+**Syntax**
+
+``` sql
+cutToFirstSignificantSubdomainCustomRFC(url, tld)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).
+
+**See Also**
+
+- [firstSignificantSubdomain](#firstsignificantsubdomain).
+
 ### cutToFirstSignificantSubdomainCustomWithWWW
 
-Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. Accepts custom TLD list name.
+Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
+Accepts custom TLD list name.
+It can be useful if you need a fresh TLD list or if you have a custom list.
 
-Can be useful if you need fresh TLD list or you have custom.
-
-Configuration example:
+**Configuration example**
 
 ```xml
 
@@ -198,13 +597,13 @@ Configuration example:
 **Syntax**
 
 ```sql
-cutToFirstSignificantSubdomainCustomWithWWW(URL, TLD)
+cutToFirstSignificantSubdomainCustomWithWWW(url, tld)
 ```
 
 **Arguments**
 
-- `URL` — URL. [String](../data-types/string.md).
-- `TLD` — Custom TLD list name. [String](../data-types/string.md).
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
 
 **Returned value**
 
@@ -230,10 +629,36 @@ Result:
 
 - [firstSignificantSubdomain](#firstsignificantsubdomain).
 
+### cutToFirstSignificantSubdomainCustomWithWWWRFC
+
+Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
+Accepts custom TLD list name.
+It can be useful if you need a fresh TLD list or if you have a custom list.
+Similar to [cutToFirstSignificantSubdomainCustomWithWWW](#cuttofirstsignificantsubdomaincustomwithwww) but conforms to RFC 3986.
+
+**Syntax**
+
+```sql
+cutToFirstSignificantSubdomainCustomWithWWWRFC(url, tld)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
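+
+:::note
+The `tld` argument must match the name of a custom TLD list declared in the server configuration, as in the configuration example above.
+:::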
+
+**Returned value**
+
+- Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. [String](../../sql-reference/data-types/string.md).
+
+**See Also**
+
+- [firstSignificantSubdomain](#firstsignificantsubdomain).
+
 ### firstSignificantSubdomainCustom
 
-Returns the first significant subdomain. Accepts customs TLD list name.
-
+Returns the first significant subdomain.
+Accepts custom TLD list name.
 Can be useful if you need fresh TLD list or you have custom.
 
 Configuration example:
 
@@ -250,17 +675,17 @@ Configuration example:
 
 **Syntax**
 
 ```sql
-firstSignificantSubdomainCustom(URL, TLD)
+firstSignificantSubdomainCustom(url, tld)
 ```
 
 **Arguments**
 
-- `URL` — URL. [String](../data-types/string.md).
-- `TLD` — Custom TLD list name. [String](../data-types/string.md).
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
 
 **Returned value**
 
-- First significant subdomain. [String](../data-types/string.md).
+- First significant subdomain. [String](../../sql-reference/data-types/string.md).
 
 **Example**
 
@@ -282,47 +707,156 @@ Result:
 
 - [firstSignificantSubdomain](#firstsignificantsubdomain).
 
-### port(URL\[, default_port = 0\])
+### firstSignificantSubdomainCustomRFC
 
-Returns the port or `default_port` if there is no port in the URL (or in case of validation error).
+Returns the first significant subdomain.
+Accepts custom TLD list name.
+It can be useful if you need a fresh TLD list or if you have a custom list.
+Similar to [firstSignificantSubdomainCustom](#firstsignificantsubdomaincustom) but conforms to RFC 3986.
+
+**Syntax**
+
+```sql
+firstSignificantSubdomainCustomRFC(url, tld)
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
+
+**Returned value**
+
+- First significant subdomain. [String](../../sql-reference/data-types/string.md).
+
+**See Also**
+
+- [firstSignificantSubdomain](#firstsignificantsubdomain).
+
+### port
+
+Returns the port or `default_port` if the URL contains no port or cannot be parsed.
+
+**Syntax**
+
+```sql
+port(url [, default_port = 0])
+```
+
+**Arguments**
+
+- `url` — URL. [String](../data-types/string.md).
+- `default_port` — The default port number to be returned. [UInt16](../data-types/int-uint.md).
+
+**Returned value**
+
+- Port or the default port if there is no port in the URL or in case of a validation error. [UInt16](../data-types/int-uint.md).
+
+**Example**
+
+Query:
+
+```sql
+SELECT port('http://paul@www.example.com:80/');
+```
+
+Result:
+
+```response
+┌─port('http://paul@www.example.com:80/')─┐
+│                                      80 │
+└─────────────────────────────────────────┘
+```
+
+### portRFC
+
+Returns the port or `default_port` if the URL contains no port or cannot be parsed.
+Similar to [port](#port) but conforms to RFC 3986.
+
+**Syntax**
+
+```sql
+portRFC(url [, default_port = 0])
+```
+
+**Arguments**
+
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `default_port` — The default port number to be returned. [UInt16](../data-types/int-uint.md).
+
+**Returned value**
+
+- Port or the default port if there is no port in the URL or in case of a validation error. [UInt16](../data-types/int-uint.md).
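+
+The difference shows up for URLs with a `user:password@` prefix: `port` fails to parse them and returns the default, while `portRFC` extracts the real port, as the example below shows.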
+
+**Example**
+
+Query:
+
+```sql
+SELECT
+    port('http://user:password@example.com:8080'),
+    portRFC('http://user:password@example.com:8080');
+```
+
+Result:
+
+```response
+┌─port('http://user:password@example.com:8080')─┬─portRFC('http://user:password@example.com:8080')─┐
+│                                             0 │                                              8080 │
+└───────────────────────────────────────────────┴──────────────────────────────────────────────────┘
+```
 
 ### path
 
-Returns the path. Example: `/top/news.html` The path does not include the query string.
+Returns the path without the query string.
+
+Example: `/top/news.html`.
 
 ### pathFull
 
-The same as above, but including query string and fragment. Example: /top/news.html?page=2#comments
+The same as above, but including the query string and fragment.
+
+Example: `/top/news.html?page=2#comments`.
 
 ### queryString
 
-Returns the query string. Example: page=1&lr=213. query-string does not include the initial question mark, as well as # and everything after #.
+Returns the query string without the initial question mark, and without `#` and everything after `#`.
+
+Example: `page=1&lr=213`.
 
 ### fragment
 
-Returns the fragment identifier. fragment does not include the initial hash symbol.
+Returns the fragment identifier without the initial hash symbol.
 
 ### queryStringAndFragment
 
-Returns the query string and fragment identifier. Example: page=1#29390.
+Returns the query string and fragment identifier.
 
-### extractURLParameter(URL, name)
+Example: `page=1#29390`.
 
-Returns the value of the ‘name’ parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.
+### extractURLParameter(url, name)
 
-### extractURLParameters(URL)
+Returns the value of the `name` parameter in the URL, if present, otherwise an empty string is returned.
+If there are multiple parameters with this name, the first occurrence is returned.
+The function assumes that the parameter name is encoded in the URL in exactly the same way as in the `name` argument.
 
-Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.
+### extractURLParameters(url)
 
-### extractURLParameterNames(URL)
+Returns an array of `name=value` strings corresponding to the URL parameters.
+The values are not decoded.
 
-Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.
+### extractURLParameterNames(url)
 
-### URLHierarchy(URL)
+Returns an array of name strings corresponding to the names of URL parameters.
+The values are not decoded.
 
-Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.
+### URLHierarchy(url)
 
-### URLPathHierarchy(URL)
+Returns an array containing the URL, truncated at the end by the separators `/` and `?` in the path and query string.
+Consecutive separator characters are counted as one.
+The cut is made in the position after all the consecutive separator characters.
+
+### URLPathHierarchy(url)
 
 The same as above, but without the protocol and host in the result. The / element (root) is not included.
 
@@ -334,9 +868,10 @@ URLPathHierarchy('https://example.com/browse/CONV-6788') =
 ]
 ```
 
-### encodeURLComponent(URL)
+### encodeURLComponent(url)
 
 Returns the encoded URL.
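+Reserved characters such as `:`, `/`, `?` and spaces are percent-encoded, so the result can be safely embedded in another URL, as the example below shows.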
+
 Example:
 
 ``` sql
@@ -349,9 +884,10 @@ SELECT encodeURLComponent('http://127.0.0.1:8123/?query=SELECT 1;') AS EncodedUR
 └──────────────────────────────────────────────────────────┘
 ```
 
-### decodeURLComponent(URL)
+### decodeURLComponent(url)
 
 Returns the decoded URL.
+
 Example:
 
 ``` sql
@@ -364,9 +900,10 @@ SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS Decod
 └────────────────────────────────────────┘
 ```
 
-### encodeURLFormComponent(URL)
+### encodeURLFormComponent(url)
 
 Returns the encoded URL. Follows rfc-1866, space(` `) is encoded as plus(`+`).
+
 Example:
 
 ``` sql
@@ -379,9 +916,10 @@ SELECT encodeURLFormComponent('http://127.0.0.1:8123/?query=SELECT 1 2+3') AS En
 └───────────────────────────────────────────────────────────┘
 ```
 
-### decodeURLFormComponent(URL)
+### decodeURLFormComponent(url)
 
 Returns the decoded URL. Follows rfc-1866, plain plus(`+`) is decoded as space(` `).
+
 Example:
 
 ``` sql
@@ -401,12 +939,12 @@ Extracts network locality (`username:password@host:port`) from a URL.
 
 **Syntax**
 
 ``` sql
-netloc(URL)
+netloc(url)
 ```
 
 **Arguments**
 
-- `url` — URL. [String](../data-types/string.md).
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
 
 **Returned value**
 
@@ -428,44 +966,45 @@ Result:
 └───────────────────────────────────────────┘
 ```
 
-## Functions that Remove Part of a URL
+## Functions that remove part of a URL
 
 If the URL does not have anything similar, the URL remains unchanged.
 
 ### cutWWW
 
-Removes no more than one ‘www.’ from the beginning of the URL’s domain, if present.
+Removes leading `www.` (if present) from the URL’s domain.
 
 ### cutQueryString
 
-Removes query string. The question mark is also removed.
+Removes the query string, including the question mark.
 
 ### cutFragment
 
-Removes the fragment identifier. The number sign is also removed.
+Removes the fragment identifier, including the number sign.
 
 ### cutQueryStringAndFragment
 
-Removes the query string and fragment identifier. The question mark and number sign are also removed.
+Removes the query string and fragment identifier, including the question mark and number sign.
 
-### cutURLParameter(URL, name)
+### cutURLParameter(url, name)
 
-Removes the `name` parameter from URL, if present. This function does not encode or decode characters in parameter names, e.g. `Client ID` and `Client%20ID` are treated as different parameter names.
+Removes the `name` parameter from a URL, if present.
+This function does not encode or decode characters in parameter names, e.g. `Client ID` and `Client%20ID` are treated as different parameter names.
 
 **Syntax**
 
 ``` sql
-cutURLParameter(URL, name)
+cutURLParameter(url, name)
 ```
 
 **Arguments**
 
-- `url` — URL. [String](../data-types/string.md).
-- `name` — name of URL parameter. [String](../data-types/string.md) or [Array](../data-types/array.md) of Strings.
+- `url` — URL. [String](../../sql-reference/data-types/string.md).
+- `name` — name of the URL parameter. [String](../../sql-reference/data-types/string.md) or [Array](../../sql-reference/data-types/array.md) of Strings.
 
 **Returned value**
 
-- URL with `name` URL parameter removed. [String](../data-types/string.md).
+- URL with the `name` parameter removed. [String](../data-types/string.md).
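+
+:::note
+When `name` is an [Array](../data-types/array.md) of Strings, every listed parameter is removed in a single call.
+:::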
**Example** diff --git a/docs/en/sql-reference/functions/uuid-functions.md b/docs/en/sql-reference/functions/uuid-functions.md index df081f1065b..0323ae728a9 100644 --- a/docs/en/sql-reference/functions/uuid-functions.md +++ b/docs/en/sql-reference/functions/uuid-functions.md @@ -126,149 +126,6 @@ SELECT generateUUIDv7(1), generateUUIDv7(2); └──────────────────────────────────────┴──────────────────────────────────────┘ ``` -## generateUUIDv7ThreadMonotonic - -Generates a [UUID](../data-types/uuid.md) of [version 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). - -The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit) to distinguish UUIDs within a millisecond (including a variant field "2", 2 bit), and a random field (32 bits). -For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. -In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. - -This function behaves like [generateUUIDv7](#generateUUIDv7) but gives no guarantee on counter monotony across different simultaneous requests. -Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate UUIDs. - -``` - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| unix_ts_ms | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| unix_ts_ms | ver | counter_high_bits | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -|var| counter_low_bits | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| rand_b | -└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘ -``` - -:::note -As of April 2024, version 7 UUIDs are in draft status and their layout may change in future. -::: - -**Syntax** - -``` sql -generateUUIDv7ThreadMonotonic([expr]) -``` - -**Arguments** - -- `expr` — An arbitrary [expression](../syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned UUID. Optional. - -**Returned value** - -A value of type UUIDv7. - -**Usage example** - -First, create a table with a column of type UUID, then insert a generated UUIDv7 into the table. - -``` sql -CREATE TABLE tab (uuid UUID) ENGINE = Memory; - -INSERT INTO tab SELECT generateUUIDv7ThreadMonotonic(); - -SELECT * FROM tab; -``` - -Result: - -```response -┌─────────────────────────────────uuid─┐ -│ 018f05e2-e3b2-70cb-b8be-64b09b626d32 │ -└──────────────────────────────────────┘ -``` - -**Example with multiple UUIDs generated per row** - -```sql -SELECT generateUUIDv7ThreadMonotonic(1), generateUUIDv7ThreadMonotonic(2); - -┌─generateUUIDv7ThreadMonotonic(1)─────┬─generateUUIDv7ThreadMonotonic(2)─────┐ -│ 018f05e1-14ee-7bc5-9906-207153b400b1 │ 018f05e1-14ee-7bc5-9906-2072b8e96758 │ -└──────────────────────────────────────┴──────────────────────────────────────┘ -``` - -## generateUUIDv7NonMonotonic - -Generates a [UUID](../data-types/uuid.md) of [version 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). 
- -The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits) and a random field (76 bits, including a 2-bit variant field "2"). - -This function is the fastest `generateUUIDv7*` function but it gives no monotonicity guarantees within a timestamp. - -``` - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| unix_ts_ms | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| unix_ts_ms | ver | rand_a | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -|var| rand_b | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| rand_b | -└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘ -``` - -:::note -As of April 2024, version 7 UUIDs are in draft status and their layout may change in future. -::: - -**Syntax** - -``` sql -generateUUIDv7NonMonotonic([expr]) -``` - -**Arguments** - -- `expr` — An arbitrary [expression](../syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned UUID. Optional. - -**Returned value** - -A value of type UUIDv7. - -**Example** - -First, create a table with a column of type UUID, then insert a generated UUIDv7 into the table. - -``` sql -CREATE TABLE tab (uuid UUID) ENGINE = Memory; - -INSERT INTO tab SELECT generateUUIDv7NonMonotonic(); - -SELECT * FROM tab; -``` - -Result: - -```response -┌─────────────────────────────────uuid─┐ -│ 018f05af-f4a8-778f-beee-1bedbc95c93b │ -└──────────────────────────────────────┘ -``` - -**Example with multiple UUIDs generated per row** - -```sql -SELECT generateUUIDv7NonMonotonic(1), generateUUIDv7NonMonotonic(2); - -┌─generateUUIDv7NonMonotonic(1) ───────┬─generateUUIDv7(2)NonMonotonic────────┐ -│ 018f05b1-8c2e-7567-a988-48d09606ae8c │ 018f05b1-8c2e-7946-895b-fcd7635da9a0 │ -└──────────────────────────────────────┴──────────────────────────────────────┘ -``` - ## empty Checks whether the input UUID is empty. diff --git a/docs/en/sql-reference/statements/alter/view.md b/docs/en/sql-reference/statements/alter/view.md index 83e8e9311b4..fb7a5bd7c03 100644 --- a/docs/en/sql-reference/statements/alter/view.md +++ b/docs/en/sql-reference/statements/alter/view.md @@ -79,8 +79,6 @@ ORDER BY ts, event_type; │ 2020-01-03 00:00:00 │ imp │ │ 2 │ 0 │ └─────────────────────┴────────────┴─────────┴────────────┴──────┘ -SET allow_experimental_alter_materialized_view_structure=1; - ALTER TABLE mv MODIFY QUERY SELECT toStartOfDay(ts) ts, event_type, browser, count() events_cnt, @@ -178,7 +176,6 @@ SELECT * FROM mv; └───┘ ``` ```sql -set allow_experimental_alter_materialized_view_structure=1; ALTER TABLE mv MODIFY QUERY SELECT a * 2 as a FROM src_table; INSERT INTO src_table (a) VALUES (3), (4); SELECT * FROM mv; diff --git a/docs/en/sql-reference/statements/system.md b/docs/en/sql-reference/statements/system.md index 9fec5420f97..7efbff1b42b 100644 --- a/docs/en/sql-reference/statements/system.md +++ b/docs/en/sql-reference/statements/system.md @@ -206,6 +206,32 @@ Enables background data distribution when inserting data into distributed tables SYSTEM START DISTRIBUTED SENDS [db.] 
[ON CLUSTER cluster_name] ``` +### STOP LISTEN + +Closes the socket and gracefully terminates the existing connections to the server on the specified port with the specified protocol. + +However, if the corresponding protocol settings were not specified in the clickhouse-server configuration, this command will have no effect. + +```sql +SYSTEM STOP LISTEN [ON CLUSTER cluster_name] [QUERIES ALL | QUERIES DEFAULT | QUERIES CUSTOM | TCP | TCP WITH PROXY | TCP SECURE | HTTP | HTTPS | MYSQL | GRPC | POSTGRESQL | PROMETHEUS | CUSTOM 'protocol'] +``` + +- If `CUSTOM 'protocol'` modifier is specified, the custom protocol with the specified name defined in the protocols section of the server configuration will be stopped. +- If `QUERIES ALL [EXCEPT .. [,..]]` modifier is specified, all protocols are stopped, unless specified with `EXCEPT` clause. +- If `QUERIES DEFAULT [EXCEPT .. [,..]]` modifier is specified, all default protocols are stopped, unless specified with `EXCEPT` clause. +- If `QUERIES CUSTOM [EXCEPT .. [,..]]` modifier is specified, all custom protocols are stopped, unless specified with `EXCEPT` clause. + +### START LISTEN + +Allows new connections to be established on the specified protocols. + +However, if the server on the specified port and protocol was not stopped using the SYSTEM STOP LISTEN command, this command will have no effect. + +```sql +SYSTEM START LISTEN [ON CLUSTER cluster_name] [QUERIES ALL | QUERIES DEFAULT | QUERIES CUSTOM | TCP | TCP WITH PROXY | TCP SECURE | HTTP | HTTPS | MYSQL | GRPC | POSTGRESQL | PROMETHEUS | CUSTOM 'protocol'] +``` + + ## Managing MergeTree Tables ClickHouse can manage background processes in [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) tables. @@ -463,30 +489,16 @@ Will do sync syscall. SYSTEM SYNC FILE CACHE [ON CLUSTER cluster_name] ``` +### UNLOAD PRIMARY KEY -## SYSTEM STOP LISTEN - -Closes the socket and gracefully terminates the existing connections to the server on the specified port with the specified protocol. - -However, if the corresponding protocol settings were not specified in the clickhouse-server configuration, this command will have no effect. +Unload the primary keys for the given table or for all tables. ```sql -SYSTEM STOP LISTEN [ON CLUSTER cluster_name] [QUERIES ALL | QUERIES DEFAULT | QUERIES CUSTOM | TCP | TCP WITH PROXY | TCP SECURE | HTTP | HTTPS | MYSQL | GRPC | POSTGRESQL | PROMETHEUS | CUSTOM 'protocol'] +SYSTEM UNLOAD PRIMARY KEY [db.]name ``` -- If `CUSTOM 'protocol'` modifier is specified, the custom protocol with the specified name defined in the protocols section of the server configuration will be stopped. -- If `QUERIES ALL [EXCEPT .. [,..]]` modifier is specified, all protocols are stopped, unless specified with `EXCEPT` clause. -- If `QUERIES DEFAULT [EXCEPT .. [,..]]` modifier is specified, all default protocols are stopped, unless specified with `EXCEPT` clause. -- If `QUERIES CUSTOM [EXCEPT .. [,..]]` modifier is specified, all custom protocols are stopped, unless specified with `EXCEPT` clause. - -## SYSTEM START LISTEN - -Allows new connections to be established on the specified protocols. - -However, if the server on the specified port and protocol was not stopped using the SYSTEM STOP LISTEN command, this command will have no effect. 
- ```sql -SYSTEM START LISTEN [ON CLUSTER cluster_name] [QUERIES ALL | QUERIES DEFAULT | QUERIES CUSTOM | TCP | TCP WITH PROXY | TCP SECURE | HTTP | HTTPS | MYSQL | GRPC | POSTGRESQL | PROMETHEUS | CUSTOM 'protocol'] +SYSTEM UNLOAD PRIMARY KEY ``` ## Managing Refreshable Materialized Views {#refreshable-materialized-views} @@ -495,7 +507,7 @@ Commands to control background tasks performed by [Refreshable Materialized View Keep an eye on [`system.view_refreshes`](../../operations/system-tables/view_refreshes.md) while using them. -### SYSTEM REFRESH VIEW +### REFRESH VIEW Trigger an immediate out-of-schedule refresh of a given view. @@ -503,7 +515,7 @@ Trigger an immediate out-of-schedule refresh of a given view. SYSTEM REFRESH VIEW [db.]name ``` -### SYSTEM STOP VIEW, SYSTEM STOP VIEWS +### STOP VIEW, STOP VIEWS Disable periodic refreshing of the given view or all refreshable views. If a refresh is in progress, cancel it too. @@ -514,7 +526,7 @@ SYSTEM STOP VIEW [db.]name SYSTEM STOP VIEWS ``` -### SYSTEM START VIEW, SYSTEM START VIEWS +### START VIEW, START VIEWS Enable periodic refreshing for the given view or all refreshable views. No immediate refresh is triggered. @@ -525,22 +537,10 @@ SYSTEM START VIEW [db.]name SYSTEM START VIEWS ``` -### SYSTEM CANCEL VIEW +### CANCEL VIEW If there's a refresh in progress for the given view, interrupt and cancel it. Otherwise do nothing. ```sql SYSTEM CANCEL VIEW [db.]name ``` - -### SYSTEM UNLOAD PRIMARY KEY - -Unload the primary keys for the given table or for all tables. - -```sql -SYSTEM UNLOAD PRIMARY KEY [db.]name -``` - -```sql -SYSTEM UNLOAD PRIMARY KEY -``` \ No newline at end of file diff --git a/docs/en/sql-reference/table-functions/loop.md b/docs/en/sql-reference/table-functions/loop.md new file mode 100644 index 00000000000..3a9367b2d10 --- /dev/null +++ b/docs/en/sql-reference/table-functions/loop.md @@ -0,0 +1,55 @@ +# loop + +**Syntax** + +``` sql +SELECT ... FROM loop(database, table); +SELECT ... FROM loop(database.table); +SELECT ... FROM loop(table); +SELECT ... FROM loop(other_table_function(...)); +``` + +**Parameters** + +- `database` — database name. +- `table` — table name. +- `other_table_function(...)` — other table function. + Example: `SELECT * FROM loop(numbers(10));` + `other_table_function(...)` here is `numbers(10)`. + +**Returned Value** + +Infinite loop to return query results. + +**Examples** + +Selecting data from ClickHouse: + +``` sql +SELECT * FROM loop(test_database, test_table); +SELECT * FROM loop(test_database.test_table); +SELECT * FROM loop(test_table); +``` + +Or using other table function: + +``` sql +SELECT * FROM loop(numbers(3)) LIMIT 7; + ┌─number─┐ +1. │ 0 │ +2. │ 1 │ +3. │ 2 │ + └────────┘ + ┌─number─┐ +4. │ 0 │ +5. │ 1 │ +6. │ 2 │ + └────────┘ + ┌─number─┐ +7. │ 0 │ + └────────┘ +``` +``` sql +SELECT * FROM loop(mysql('localhost:3306', 'test', 'test', 'user', 'password')); +... +``` \ No newline at end of file diff --git a/docs/ru/getting-started/install.md b/docs/ru/getting-started/install.md index 59650826659..aee445da843 100644 --- a/docs/ru/getting-started/install.md +++ b/docs/ru/getting-started/install.md @@ -38,26 +38,6 @@ sudo service clickhouse-server start clickhouse-client # or "clickhouse-client --password" if you've set up a password. ``` -
- -Устаревший способ установки deb-пакетов - -``` bash -sudo apt-get install apt-transport-https ca-certificates dirmngr -sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 - -echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \ - /etc/apt/sources.list.d/clickhouse.list -sudo apt-get update - -sudo apt-get install -y clickhouse-server clickhouse-client - -sudo service clickhouse-server start -clickhouse-client # or "clickhouse-client --password" if you set up a password. -``` - -
- Чтобы использовать различные [версии ClickHouse](../faq/operations/production.md) в зависимости от ваших потребностей, вы можете заменить `stable` на `lts` или `testing`. Также вы можете вручную скачать и установить пакеты из [репозитория](https://packages.clickhouse.com/deb/pool/stable). @@ -110,22 +90,6 @@ sudo systemctl status clickhouse-server clickhouse-client # илм "clickhouse-client --password" если установлен пароль ``` -
- -Устаревший способ установки rpm-пакетов - -``` bash -sudo yum install yum-utils -sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG -sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo -sudo yum install clickhouse-server clickhouse-client - -sudo /etc/init.d/clickhouse-server start -clickhouse-client # or "clickhouse-client --password" if you set up a password. -``` - -
- Для использования наиболее свежих версий нужно заменить `stable` на `testing` (рекомендуется для тестовых окружений). Также иногда доступен `prestable`. Для непосредственной установки пакетов необходимо выполнить следующие команды: @@ -178,33 +142,6 @@ tar -xzvf "clickhouse-client-$LATEST_VERSION-${ARCH}.tgz" \ sudo "clickhouse-client-$LATEST_VERSION/install/doinst.sh" ``` -
- -Устаревший способ установки из архивов tgz - -``` bash -export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \ - grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1) -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz - -tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz -sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh - -tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz -sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh - -tar -xzvf clickhouse-server-$LATEST_VERSION.tgz -sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh -sudo /etc/init.d/clickhouse-server start - -tar -xzvf clickhouse-client-$LATEST_VERSION.tgz -sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh -``` -
- Для продуктивных окружений рекомендуется использовать последнюю `stable`-версию. Её номер также можно найти на github с на вкладке https://github.com/ClickHouse/ClickHouse/tags c постфиксом `-stable`. ### Из Docker образа {#from-docker-image} diff --git a/docs/ru/sql-reference/functions/uuid-functions.md b/docs/ru/sql-reference/functions/uuid-functions.md index a7fe6592338..7fe90263599 100644 --- a/docs/ru/sql-reference/functions/uuid-functions.md +++ b/docs/ru/sql-reference/functions/uuid-functions.md @@ -112,113 +112,6 @@ SELECT generateUUIDv7(1), generateUUIDv7(2) └──────────────────────────────────────┴──────────────────────────────────────┘ ``` -## generateUUIDv7ThreadMonotonic {#uuidv7threadmonotonic-function-generate} - -Генерирует идентификатор [UUID версии 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). Генерируемый UUID состоит из 48-битной временной метки (Unix time в миллисекундах), маркеров версии 7 и варианта 2, монотонно возрастающего счётчика для данной временной метки и случайных данных в указанной ниже последовательности. Для каждой новой временной метки счётчик стартует с нового случайного значения, а для следующих UUIDv7 он увеличивается на единицу. В случае переполнения счётчика временная метка принудительно увеличивается на 1, и счётчик снова стартует со случайного значения. Данная функция является ускоренным аналогом функции `generateUUIDv7` за счёт потери гарантии монотонности счётчика при одной и той же метке времени между одновременно исполняемыми разными запросами. Монотонность счётчика гарантируется только в пределах одного треда, исполняющего данную функцию для генерации нескольких UUID. - -**Синтаксис** - -``` sql -generateUUIDv7ThreadMonotonic([x]) -``` - -**Аргументы** - -- `x` — [выражение](../syntax.md#syntax-expressions), возвращающее значение одного из [поддерживаемых типов данных](../data-types/index.md#data_types). Значение используется, чтобы избежать [склейки одинаковых выражений](index.md#common-subexpression-elimination), если функция вызывается несколько раз в одном запросе. Необязательный параметр. - -**Возвращаемое значение** - -Значение типа [UUID](../../sql-reference/functions/uuid-functions.md). - -**Пример использования** - -Этот пример демонстрирует, как создать таблицу с UUID-колонкой и добавить в нее сгенерированный UUIDv7. - -``` sql -CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog - -INSERT INTO t_uuid SELECT generateUUIDv7ThreadMonotonic() - -SELECT * FROM t_uuid -``` - -``` text -┌────────────────────────────────────x─┐ -│ 018f05e2-e3b2-70cb-b8be-64b09b626d32 │ -└──────────────────────────────────────┘ -``` - -**Пример использования, для генерации нескольких значений в одной строке** - -```sql -SELECT generateUUIDv7ThreadMonotonic(1), generateUUIDv7ThreadMonotonic(7) - -┌─generateUUIDv7ThreadMonotonic(1)─────┬─generateUUIDv7ThreadMonotonic(2)─────┐ -│ 018f05e1-14ee-7bc5-9906-207153b400b1 │ 018f05e1-14ee-7bc5-9906-2072b8e96758 │ -└──────────────────────────────────────┴──────────────────────────────────────┘ -``` - -## generateUUIDv7NonMonotonic {#uuidv7nonmonotonic-function-generate} - -Генерирует идентификатор [UUID версии 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). 
Генерируемый UUID состоит из 48-битной временной метки (Unix time в миллисекундах), маркеров версии 7 и варианта 2, и случайных данных в следующей последовательности: -``` - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| unix_ts_ms | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| unix_ts_ms | ver | rand_a | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -|var| rand_b | -├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ -| rand_b | -└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘ -``` -::::note -На апрель 2024 года UUIDv7 находится в статусе черновика и его раскладка по битам может в итоге измениться. -:::: - -**Синтаксис** - -``` sql -generateUUIDv7NonMonotonic([x]) -``` - -**Аргументы** - -- `x` — [выражение](../syntax.md#syntax-expressions), возвращающее значение одного из [поддерживаемых типов данных](../data-types/index.md#data_types). Значение используется, чтобы избежать [склейки одинаковых выражений](index.md#common-subexpression-elimination), если функция вызывается несколько раз в одном запросе. Необязательный параметр. - -**Возвращаемое значение** - -Значение типа [UUID](../../sql-reference/functions/uuid-functions.md). - -**Пример использования** - -Этот пример демонстрирует, как создать таблицу с UUID-колонкой и добавить в нее сгенерированный UUIDv7. - -``` sql -CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog - -INSERT INTO t_uuid SELECT generateUUIDv7NonMonotonic() - -SELECT * FROM t_uuid -``` - -``` text -┌────────────────────────────────────x─┐ -│ 018f05af-f4a8-778f-beee-1bedbc95c93b │ -└──────────────────────────────────────┘ -``` - -**Пример использования, для генерации нескольких значений в одной строке** - -```sql -SELECT generateUUIDv7NonMonotonic(1), generateUUIDv7NonMonotonic(7) -┌─generateUUIDv7NonMonotonic(1)────────┬─generateUUIDv7NonMonotonic(2)────────┐ -│ 018f05b1-8c2e-7567-a988-48d09606ae8c │ 018f05b1-8c2e-7946-895b-fcd7635da9a0 │ -└──────────────────────────────────────┴──────────────────────────────────────┘ -``` - ## empty {#empty} Проверяет, является ли входной UUID пустым. diff --git a/docs/zh/getting-started/install.md b/docs/zh/getting-started/install.md index e65cfea62cd..7e4fb6826e4 100644 --- a/docs/zh/getting-started/install.md +++ b/docs/zh/getting-started/install.md @@ -38,26 +38,6 @@ sudo service clickhouse-server start clickhouse-client # or "clickhouse-client --password" if you've set up a password. ``` -
- -Deprecated Method for installing deb-packages - -``` bash -sudo apt-get install apt-transport-https ca-certificates dirmngr -sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 - -echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \ - /etc/apt/sources.list.d/clickhouse.list -sudo apt-get update - -sudo apt-get install -y clickhouse-server clickhouse-client - -sudo service clickhouse-server start -clickhouse-client # or "clickhouse-client --password" if you set up a password. -``` - -
- 如果您想使用最新的版本,请用`testing`替代`stable`(我们只推荐您用于测试环境)。 你也可以从这里手动下载安装包:[下载](https://packages.clickhouse.com/deb/pool/stable)。 @@ -95,22 +75,6 @@ sudo /etc/init.d/clickhouse-server start clickhouse-client # or "clickhouse-client --password" if you set up a password. ``` -
- -Deprecated Method for installing rpm-packages - -``` bash -sudo yum install yum-utils -sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG -sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo -sudo yum install clickhouse-server clickhouse-client - -sudo /etc/init.d/clickhouse-server start -clickhouse-client # or "clickhouse-client --password" if you set up a password. -``` - -
- 如果您想使用最新的版本,请用`testing`替代`stable`(我们只推荐您用于测试环境)。`prestable`有时也可用。 然后运行命令安装: @@ -164,34 +128,6 @@ tar -xzvf "clickhouse-client-$LATEST_VERSION-${ARCH}.tgz" \ sudo "clickhouse-client-$LATEST_VERSION/install/doinst.sh" ``` -
- -Deprecated Method for installing tgz archives - -``` bash -export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \ - grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1) -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz -curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz - -tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz -sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh - -tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz -sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh - -tar -xzvf clickhouse-server-$LATEST_VERSION.tgz -sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh -sudo /etc/init.d/clickhouse-server start - -tar -xzvf clickhouse-client-$LATEST_VERSION.tgz -sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh -``` - -
- 对于生产环境,建议使用最新的`stable`版本。你可以在GitHub页面https://github.com/ClickHouse/ClickHouse/tags找到它,它以后缀`-stable`标志。 ### `Docker`安装包 {#from-docker-image} diff --git a/packages/clickhouse-server.init b/packages/clickhouse-server.init index f215e52b6f3..0ac9cf7ae1f 100755 --- a/packages/clickhouse-server.init +++ b/packages/clickhouse-server.init @@ -1,10 +1,11 @@ #!/bin/sh ### BEGIN INIT INFO # Provides: clickhouse-server +# Required-Start: $network +# Required-Stop: $network +# Should-Start: $time # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 -# Should-Start: $time $network -# Should-Stop: $network # Short-Description: clickhouse-server daemon ### END INIT INFO # diff --git a/programs/keeper-client/Commands.cpp b/programs/keeper-client/Commands.cpp index a109912e6e0..860840a2d06 100644 --- a/programs/keeper-client/Commands.cpp +++ b/programs/keeper-client/Commands.cpp @@ -10,6 +10,7 @@ namespace DB namespace ErrorCodes { + extern const int LOGICAL_ERROR; extern const int KEEPER_EXCEPTION; } @@ -441,7 +442,7 @@ void ReconfigCommand::execute(const DB::ASTKeeperQuery * query, DB::KeeperClient new_members = query->args[1].safeGet(); break; default: - UNREACHABLE(); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected operation: {}", operation); } auto response = client->zookeeper->reconfig(joining, leaving, new_members); diff --git a/programs/main.cpp b/programs/main.cpp index bc8476e4ce4..c270388f17f 100644 --- a/programs/main.cpp +++ b/programs/main.cpp @@ -155,8 +155,8 @@ auto instructionFailToString(InstructionFail fail) ret("AVX2"); case InstructionFail::AVX512: ret("AVX512"); +#undef ret } - UNREACHABLE(); } diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index 1ffff5e64e2..69aefd50048 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -792,9 +792,32 @@ try LOG_INFO(log, "Background threads finished in {} ms", watch.elapsedMilliseconds()); }); + /// This object will periodically calculate some metrics. + ServerAsynchronousMetrics async_metrics( + global_context, + server_settings.asynchronous_metrics_update_period_s, + server_settings.asynchronous_heavy_metrics_update_period_s, + [&]() -> std::vector + { + std::vector metrics; + + std::lock_guard lock(servers_lock); + metrics.reserve(servers_to_start_before_tables.size() + servers.size()); + + for (const auto & server : servers_to_start_before_tables) + metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()}); + + for (const auto & server : servers) + metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()}); + return metrics; + } + ); + /// NOTE: global context should be destroyed *before* GlobalThreadPool::shutdown() /// Otherwise GlobalThreadPool::shutdown() will hang, since Context holds some threads. SCOPE_EXIT({ + async_metrics.stop(); + /** Ask to cancel background jobs all table engines, * and also query_log. * It is important to do early, not in destructor of Context, because @@ -921,27 +944,6 @@ try } } - /// This object will periodically calculate some metrics. 
- ServerAsynchronousMetrics async_metrics( - global_context, - server_settings.asynchronous_metrics_update_period_s, - server_settings.asynchronous_heavy_metrics_update_period_s, - [&]() -> std::vector - { - std::vector metrics; - - std::lock_guard lock(servers_lock); - metrics.reserve(servers_to_start_before_tables.size() + servers.size()); - - for (const auto & server : servers_to_start_before_tables) - metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()}); - - for (const auto & server : servers) - metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()}); - return metrics; - } - ); - zkutil::validateZooKeeperConfig(config()); bool has_zookeeper = zkutil::hasZooKeeperConfig(config()); @@ -1748,6 +1750,11 @@ try } + if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX)) + { + PlacementInfo::PlacementInfo::instance().initialize(config()); + } + { std::lock_guard lock(servers_lock); /// We should start interserver communications before (and more important shutdown after) tables. @@ -2096,11 +2103,6 @@ try load_metadata_tasks); } - if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX)) - { - PlacementInfo::PlacementInfo::instance().initialize(config()); - } - /// Do not keep tasks in server, they should be kept inside databases. Used here to make dependent tasks only. load_metadata_tasks.clear(); load_metadata_tasks.shrink_to_fit(); diff --git a/src/Access/AccessEntityIO.cpp b/src/Access/AccessEntityIO.cpp index b0dfd74c53b..1b073329296 100644 --- a/src/Access/AccessEntityIO.cpp +++ b/src/Access/AccessEntityIO.cpp @@ -144,8 +144,7 @@ AccessEntityPtr deserializeAccessEntity(const String & definition, const String catch (Exception & e) { e.addMessage("Could not parse " + file_path); - e.rethrow(); - UNREACHABLE(); + throw; } } diff --git a/src/Access/AccessRights.cpp b/src/Access/AccessRights.cpp index c10931f554c..2127f4ada70 100644 --- a/src/Access/AccessRights.cpp +++ b/src/Access/AccessRights.cpp @@ -258,7 +258,7 @@ namespace case TABLE_LEVEL: return AccessFlags::allFlagsGrantableOnTableLevel(); case COLUMN_LEVEL: return AccessFlags::allFlagsGrantableOnColumnLevel(); } - UNREACHABLE(); + chassert(false); } } diff --git a/src/Access/IAccessStorage.cpp b/src/Access/IAccessStorage.cpp index 8e51481e415..8d4e7d3073e 100644 --- a/src/Access/IAccessStorage.cpp +++ b/src/Access/IAccessStorage.cpp @@ -257,8 +257,7 @@ std::vector IAccessStorage::insert(const std::vector & mu } e.addMessage("After successfully inserting {}/{}: {}", successfully_inserted.size(), multiple_entities.size(), successfully_inserted_str); } - e.rethrow(); - UNREACHABLE(); + throw; } } @@ -361,8 +360,7 @@ std::vector IAccessStorage::remove(const std::vector & ids, bool thr } e.addMessage("After successfully removing {}/{}: {}", removed_names.size(), ids.size(), removed_names_str); } - e.rethrow(); - UNREACHABLE(); + throw; } } @@ -458,8 +456,7 @@ std::vector IAccessStorage::update(const std::vector & ids, const Up } e.addMessage("After successfully updating {}/{}: {}", names_of_updated.size(), ids.size(), names_of_updated_str); } - e.rethrow(); - UNREACHABLE(); + throw; } } diff --git a/src/AggregateFunctions/AggregateFunctionGroupArray.cpp b/src/AggregateFunctions/AggregateFunctionGroupArray.cpp index c21b1d376d9..16907e0f24f 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupArray.cpp +++ b/src/AggregateFunctions/AggregateFunctionGroupArray.cpp @@ -60,14 +60,13 @@ struct GroupArrayTrait template constexpr const char * 
getNameByTrait() { - if (Trait::last) + if constexpr (Trait::last) return "groupArrayLast"; - if (Trait::sampler == Sampler::NONE) - return "groupArray"; - else if (Trait::sampler == Sampler::RNG) - return "groupArraySample"; - - UNREACHABLE(); + switch (Trait::sampler) + { + case Sampler::NONE: return "groupArray"; + case Sampler::RNG: return "groupArraySample"; + } } template diff --git a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp index b3824720b04..b0240225138 100644 --- a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp +++ b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp @@ -414,7 +414,6 @@ public: break; return (i == events_size) ? base - i : unmatched_idx; } - UNREACHABLE(); } void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionSum.h b/src/AggregateFunctions/AggregateFunctionSum.h index 58aaddf357a..2ce03c530c2 100644 --- a/src/AggregateFunctions/AggregateFunctionSum.h +++ b/src/AggregateFunctions/AggregateFunctionSum.h @@ -463,7 +463,6 @@ public: return "sumWithOverflow"; else if constexpr (Type == AggregateFunctionTypeSumKahan) return "sumKahan"; - UNREACHABLE(); } explicit AggregateFunctionSum(const DataTypes & argument_types_) diff --git a/src/Analyzer/Resolve/ExpressionsStack.h b/src/Analyzer/Resolve/ExpressionsStack.h new file mode 100644 index 00000000000..82a27aa8b83 --- /dev/null +++ b/src/Analyzer/Resolve/ExpressionsStack.h @@ -0,0 +1,124 @@ +#pragma once + +#include +#include +#include + +namespace DB +{ + +class ExpressionsStack +{ +public: + void push(const QueryTreeNodePtr & node) + { + if (node->hasAlias()) + { + const auto & node_alias = node->getAlias(); + alias_name_to_expressions[node_alias].push_back(node); + } + + if (const auto * function = node->as()) + { + if (AggregateFunctionFactory::instance().isAggregateFunctionName(function->getFunctionName())) + ++aggregate_functions_counter; + } + + expressions.emplace_back(node); + } + + void pop() + { + const auto & top_expression = expressions.back(); + const auto & top_expression_alias = top_expression->getAlias(); + + if (!top_expression_alias.empty()) + { + auto it = alias_name_to_expressions.find(top_expression_alias); + auto & alias_expressions = it->second; + alias_expressions.pop_back(); + + if (alias_expressions.empty()) + alias_name_to_expressions.erase(it); + } + + if (const auto * function = top_expression->as()) + { + if (AggregateFunctionFactory::instance().isAggregateFunctionName(function->getFunctionName())) + --aggregate_functions_counter; + } + + expressions.pop_back(); + } + + [[maybe_unused]] const QueryTreeNodePtr & getRoot() const + { + return expressions.front(); + } + + const QueryTreeNodePtr & getTop() const + { + return expressions.back(); + } + + [[maybe_unused]] bool hasExpressionWithAlias(const std::string & alias) const + { + return alias_name_to_expressions.contains(alias); + } + + bool hasAggregateFunction() const + { + return aggregate_functions_counter > 0; + } + + QueryTreeNodePtr getExpressionWithAlias(const std::string & alias) const + { + auto expression_it = alias_name_to_expressions.find(alias); + if (expression_it == alias_name_to_expressions.end()) + return {}; + + return expression_it->second.front(); + } + + [[maybe_unused]] size_t size() const + { + return expressions.size(); + } + + bool empty() const + { + return expressions.empty(); + } + + void dump(WriteBuffer & 
buffer) const + { + buffer << expressions.size() << '\n'; + + for (const auto & expression : expressions) + { + buffer << "Expression "; + buffer << expression->formatASTForErrorMessage(); + + const auto & alias = expression->getAlias(); + if (!alias.empty()) + buffer << " alias " << alias; + + buffer << '\n'; + } + } + + [[maybe_unused]] String dump() const + { + WriteBufferFromOwnString buffer; + dump(buffer); + + return buffer.str(); + } + +private: + QueryTreeNodes expressions; + size_t aggregate_functions_counter = 0; + std::unordered_map alias_name_to_expressions; +}; + +} diff --git a/src/Analyzer/Resolve/IdentifierLookup.h b/src/Analyzer/Resolve/IdentifierLookup.h new file mode 100644 index 00000000000..8dd70c188e9 --- /dev/null +++ b/src/Analyzer/Resolve/IdentifierLookup.h @@ -0,0 +1,195 @@ +#pragma once + +#include +#include +#include + +#include +#include + +namespace DB +{ + +/// Identifier lookup context +enum class IdentifierLookupContext : uint8_t +{ + EXPRESSION = 0, + FUNCTION, + TABLE_EXPRESSION, +}; + +inline const char * toString(IdentifierLookupContext identifier_lookup_context) +{ + switch (identifier_lookup_context) + { + case IdentifierLookupContext::EXPRESSION: return "EXPRESSION"; + case IdentifierLookupContext::FUNCTION: return "FUNCTION"; + case IdentifierLookupContext::TABLE_EXPRESSION: return "TABLE_EXPRESSION"; + } +} + +inline const char * toStringLowercase(IdentifierLookupContext identifier_lookup_context) +{ + switch (identifier_lookup_context) + { + case IdentifierLookupContext::EXPRESSION: return "expression"; + case IdentifierLookupContext::FUNCTION: return "function"; + case IdentifierLookupContext::TABLE_EXPRESSION: return "table expression"; + } +} + +/** Structure that represent identifier lookup during query analysis. + * Lookup can be in query expression, function, table context. 
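+  * For example, while resolving `SELECT count(id) FROM t`, the identifier `id` is looked up in
+  * EXPRESSION context, `count` in FUNCTION context, and `t` in TABLE_EXPRESSION context.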
+ */ +struct IdentifierLookup +{ + Identifier identifier; + IdentifierLookupContext lookup_context; + + bool isExpressionLookup() const + { + return lookup_context == IdentifierLookupContext::EXPRESSION; + } + + bool isFunctionLookup() const + { + return lookup_context == IdentifierLookupContext::FUNCTION; + } + + bool isTableExpressionLookup() const + { + return lookup_context == IdentifierLookupContext::TABLE_EXPRESSION; + } + + String dump() const + { + return identifier.getFullName() + ' ' + toString(lookup_context); + } +}; + +inline bool operator==(const IdentifierLookup & lhs, const IdentifierLookup & rhs) +{ + return lhs.identifier.getFullName() == rhs.identifier.getFullName() && lhs.lookup_context == rhs.lookup_context; +} + +[[maybe_unused]] inline bool operator!=(const IdentifierLookup & lhs, const IdentifierLookup & rhs) +{ + return !(lhs == rhs); +} + +struct IdentifierLookupHash +{ + size_t operator()(const IdentifierLookup & identifier_lookup) const + { + return std::hash()(identifier_lookup.identifier.getFullName()) ^ static_cast(identifier_lookup.lookup_context); + } +}; + +enum class IdentifierResolvePlace : UInt8 +{ + NONE = 0, + EXPRESSION_ARGUMENTS, + ALIASES, + JOIN_TREE, + /// Valid only for table lookup + CTE, + /// Valid only for table lookup + DATABASE_CATALOG +}; + +inline const char * toString(IdentifierResolvePlace resolved_identifier_place) +{ + switch (resolved_identifier_place) + { + case IdentifierResolvePlace::NONE: return "NONE"; + case IdentifierResolvePlace::EXPRESSION_ARGUMENTS: return "EXPRESSION_ARGUMENTS"; + case IdentifierResolvePlace::ALIASES: return "ALIASES"; + case IdentifierResolvePlace::JOIN_TREE: return "JOIN_TREE"; + case IdentifierResolvePlace::CTE: return "CTE"; + case IdentifierResolvePlace::DATABASE_CATALOG: return "DATABASE_CATALOG"; + } +} + +struct IdentifierResolveResult +{ + IdentifierResolveResult() = default; + + QueryTreeNodePtr resolved_identifier; + IdentifierResolvePlace resolve_place = IdentifierResolvePlace::NONE; + bool resolved_from_parent_scopes = false; + + [[maybe_unused]] bool isResolved() const + { + return resolve_place != IdentifierResolvePlace::NONE; + } + + [[maybe_unused]] bool isResolvedFromParentScopes() const + { + return resolved_from_parent_scopes; + } + + [[maybe_unused]] bool isResolvedFromExpressionArguments() const + { + return resolve_place == IdentifierResolvePlace::EXPRESSION_ARGUMENTS; + } + + [[maybe_unused]] bool isResolvedFromAliases() const + { + return resolve_place == IdentifierResolvePlace::ALIASES; + } + + [[maybe_unused]] bool isResolvedFromJoinTree() const + { + return resolve_place == IdentifierResolvePlace::JOIN_TREE; + } + + [[maybe_unused]] bool isResolvedFromCTEs() const + { + return resolve_place == IdentifierResolvePlace::CTE; + } + + void dump(WriteBuffer & buffer) const + { + if (!resolved_identifier) + { + buffer << "unresolved"; + return; + } + + buffer << resolved_identifier->formatASTForErrorMessage() << " place " << toString(resolve_place) << " resolved from parent scopes " << resolved_from_parent_scopes; + } + + [[maybe_unused]] String dump() const + { + WriteBufferFromOwnString buffer; + dump(buffer); + + return buffer.str(); + } +}; + +struct IdentifierResolveState +{ + IdentifierResolveResult resolve_result; + bool cyclic_identifier_resolve = false; +}; + +struct IdentifierResolveSettings +{ + /// Allow to check join tree during identifier resolution + bool allow_to_check_join_tree = true; + + /// Allow to check CTEs during table identifier resolution + bool 
allow_to_check_cte = true; + + /// Allow to check parent scopes during identifier resolution + bool allow_to_check_parent_scopes = true; + + /// Allow to check database catalog during table identifier resolution + bool allow_to_check_database_catalog = true; + + /// Allow to resolve subquery during identifier resolution + bool allow_to_resolve_subquery_during_identifier_resolution = true; +}; + +} diff --git a/src/Analyzer/Resolve/IdentifierResolveScope.cpp b/src/Analyzer/Resolve/IdentifierResolveScope.cpp new file mode 100644 index 00000000000..ae363b57047 --- /dev/null +++ b/src/Analyzer/Resolve/IdentifierResolveScope.cpp @@ -0,0 +1,184 @@ +#include + +#include +#include +#include + +namespace DB +{ +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +IdentifierResolveScope::IdentifierResolveScope(QueryTreeNodePtr scope_node_, IdentifierResolveScope * parent_scope_) + : scope_node(std::move(scope_node_)) + , parent_scope(parent_scope_) +{ + if (parent_scope) + { + subquery_depth = parent_scope->subquery_depth; + context = parent_scope->context; + projection_mask_map = parent_scope->projection_mask_map; + } + else + projection_mask_map = std::make_shared>(); + + if (auto * union_node = scope_node->as()) + { + context = union_node->getContext(); + } + else if (auto * query_node = scope_node->as()) + { + context = query_node->getContext(); + group_by_use_nulls = context->getSettingsRef().group_by_use_nulls && + (query_node->isGroupByWithGroupingSets() || query_node->isGroupByWithRollup() || query_node->isGroupByWithCube()); + } + + if (context) + join_use_nulls = context->getSettingsRef().join_use_nulls; + else if (parent_scope) + join_use_nulls = parent_scope->join_use_nulls; + + aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_before_group_by; +} + +[[maybe_unused]] const IdentifierResolveScope * IdentifierResolveScope::getNearestQueryScope() const +{ + const IdentifierResolveScope * scope_to_check = this; + while (scope_to_check != nullptr) + { + if (scope_to_check->scope_node->getNodeType() == QueryTreeNodeType::QUERY) + break; + + scope_to_check = scope_to_check->parent_scope; + } + + return scope_to_check; +} + +IdentifierResolveScope * IdentifierResolveScope::getNearestQueryScope() +{ + IdentifierResolveScope * scope_to_check = this; + while (scope_to_check != nullptr) + { + if (scope_to_check->scope_node->getNodeType() == QueryTreeNodeType::QUERY) + break; + + scope_to_check = scope_to_check->parent_scope; + } + + return scope_to_check; +} + +AnalysisTableExpressionData & IdentifierResolveScope::getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) +{ + auto it = table_expression_node_to_data.find(table_expression_node); + if (it == table_expression_node_to_data.end()) + { + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Table expression {} data must be initialized. In scope {}", + table_expression_node->formatASTForErrorMessage(), + scope_node->formatASTForErrorMessage()); + } + + return it->second; +} + +const AnalysisTableExpressionData & IdentifierResolveScope::getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) const +{ + auto it = table_expression_node_to_data.find(table_expression_node); + if (it == table_expression_node_to_data.end()) + { + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Table expression {} data must be initialized. 
In scope {}", + table_expression_node->formatASTForErrorMessage(), + scope_node->formatASTForErrorMessage()); + } + + return it->second; +} + +void IdentifierResolveScope::pushExpressionNode(const QueryTreeNodePtr & node) +{ + bool had_aggregate_function = expressions_in_resolve_process_stack.hasAggregateFunction(); + expressions_in_resolve_process_stack.push(node); + if (group_by_use_nulls && had_aggregate_function != expressions_in_resolve_process_stack.hasAggregateFunction()) + aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_before_group_by; +} + +void IdentifierResolveScope::popExpressionNode() +{ + bool had_aggregate_function = expressions_in_resolve_process_stack.hasAggregateFunction(); + expressions_in_resolve_process_stack.pop(); + if (group_by_use_nulls && had_aggregate_function != expressions_in_resolve_process_stack.hasAggregateFunction()) + aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_after_group_by; +} + +/// Dump identifier resolve scope +[[maybe_unused]] void IdentifierResolveScope::dump(WriteBuffer & buffer) const +{ + buffer << "Scope node " << scope_node->formatASTForErrorMessage() << '\n'; + buffer << "Identifier lookup to resolve state " << identifier_lookup_to_resolve_state.size() << '\n'; + for (const auto & [identifier, state] : identifier_lookup_to_resolve_state) + { + buffer << "Identifier " << identifier.dump() << " resolve result "; + state.resolve_result.dump(buffer); + buffer << '\n'; + } + + buffer << "Expression argument name to node " << expression_argument_name_to_node.size() << '\n'; + for (const auto & [alias_name, node] : expression_argument_name_to_node) + buffer << "Alias name " << alias_name << " node " << node->formatASTForErrorMessage() << '\n'; + + buffer << "Alias name to expression node table size " << aliases.alias_name_to_expression_node->size() << '\n'; + for (const auto & [alias_name, node] : *aliases.alias_name_to_expression_node) + buffer << "Alias name " << alias_name << " expression node " << node->dumpTree() << '\n'; + + buffer << "Alias name to function node table size " << aliases.alias_name_to_lambda_node.size() << '\n'; + for (const auto & [alias_name, node] : aliases.alias_name_to_lambda_node) + buffer << "Alias name " << alias_name << " lambda node " << node->formatASTForErrorMessage() << '\n'; + + buffer << "Alias name to table expression node table size " << aliases.alias_name_to_table_expression_node.size() << '\n'; + for (const auto & [alias_name, node] : aliases.alias_name_to_table_expression_node) + buffer << "Alias name " << alias_name << " node " << node->formatASTForErrorMessage() << '\n'; + + buffer << "CTE name to query node table size " << cte_name_to_query_node.size() << '\n'; + for (const auto & [cte_name, node] : cte_name_to_query_node) + buffer << "CTE name " << cte_name << " node " << node->formatASTForErrorMessage() << '\n'; + + buffer << "WINDOW name to window node table size " << window_name_to_window_node.size() << '\n'; + for (const auto & [window_name, node] : window_name_to_window_node) + buffer << "CTE name " << window_name << " node " << node->formatASTForErrorMessage() << '\n'; + + buffer << "Nodes with duplicated aliases size " << aliases.nodes_with_duplicated_aliases.size() << '\n'; + for (const auto & node : aliases.nodes_with_duplicated_aliases) + buffer << "Alias name " << node->getAlias() << " node " << node->formatASTForErrorMessage() << '\n'; + + buffer << "Expression resolve process stack " << '\n'; + 
+    expressions_in_resolve_process_stack.dump(buffer);
+
+    buffer << "Table expressions in resolve process size " << table_expressions_in_resolve_process.size() << '\n';
+    for (const auto & node : table_expressions_in_resolve_process)
+        buffer << "Table expression " << node->formatASTForErrorMessage() << '\n';
+
+    buffer << "Non cached identifier lookups during expression resolve " << non_cached_identifier_lookups_during_expression_resolve.size() << '\n';
+    for (const auto & identifier_lookup : non_cached_identifier_lookups_during_expression_resolve)
+        buffer << "Identifier lookup " << identifier_lookup.dump() << '\n';
+
+    buffer << "Table expression node to data " << table_expression_node_to_data.size() << '\n';
+    for (const auto & [table_expression_node, table_expression_data] : table_expression_node_to_data)
+        buffer << "Table expression node " << table_expression_node->formatASTForErrorMessage() << " data " << table_expression_data.dump() << '\n';
+
+    buffer << "Use identifier lookup to result cache " << use_identifier_lookup_to_result_cache << '\n';
+    buffer << "Subquery depth " << subquery_depth << '\n';
+}
+
+[[maybe_unused]] String IdentifierResolveScope::dump() const
+{
+    WriteBufferFromOwnString buffer;
+    dump(buffer);
+
+    return buffer.str();
+}
+}
diff --git a/src/Analyzer/Resolve/IdentifierResolveScope.h b/src/Analyzer/Resolve/IdentifierResolveScope.h
new file mode 100644
index 00000000000..ab2e27cc14d
--- /dev/null
+++ b/src/Analyzer/Resolve/IdentifierResolveScope.h
@@ -0,0 +1,231 @@
+#pragma once
+
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+namespace DB
+{
+
+/** Projection name is the name of a query tree node that is used in the projection part of a query node.
+  * Example: SELECT id FROM test_table;
+  * `id` is projection name of column node
+  *
+  * Example: SELECT id AS id_alias FROM test_table;
+  * `id_alias` is projection name of column node
+  *
+  * Calculation of projection names is done during expression nodes resolution. This is done this way
+  * because after identifier node is resolved we lose information about identifier name. We could
+  * potentially save this information in query tree node itself, but that would require cloning it in some cases.
+  * Example: SELECT big_scalar_subquery AS a, a AS b, b AS c;
+  * All 3 nodes in projection are the same big_scalar_subquery, but they have different projection names.
+  * If we want to save it in query tree node, we would have to clone the subquery node, which could lead to performance degradation.
+  *
+  * Possible solution is to separate query node metadata and query node content. So only node metadata could be cloned
+  * if we want to change projection name. This solution does not seem to be easy for clients of the query tree because projection
+  * name will be part of the interface. If we could hide projection names calculation in the analyzer without introducing additional
+  * changes in query tree structure, that would be preferable.
+  *
+  * Currently each resolve method returns projection names array. Resolve method must compute projection names of node.
+  * If node is resolved as a list node (this is the case for the `untuple` function or a `matcher`), the result projection names array must contain projection names
+  * for all result nodes.
+  * If node is not resolved as a list node, the projection names array contains a single projection name for the node.
+  *
+  * Rules for projection names:
+  * 1. If node has an alias, it is the node projection name.
+  * Except scenario where `untuple` function has alias.
+  * Example: SELECT untuple(expr) AS alias, alias.
+  *
+  * 2. For constant it is constant value string representation.
+  *
+  * 3. For identifier:
+  * If identifier is resolved from JOIN TREE, we want to remove additional identifier qualifications.
+  * Example: SELECT default.test_table.id FROM test_table.
+  * Result projection name is `id`.
+  *
+  * Example: SELECT t1.id FROM test_table_1 AS t1, test_table_2 AS t2
+  * In example both test_table_1, test_table_2 have `id` column.
+  * In such case projection name is `t1.id` because if additional qualification is removed then column projection name `id` will be ambiguous.
+  *
+  * Example: SELECT default.test_table_1.id FROM test_table_1 AS t1, test_table_2 AS t2
+  * In such case projection name is `test_table_1.id` because we remove unnecessary database qualification, but table name qualification cannot be removed
+  * because otherwise column projection name `id` will be ambiguous.
+  *
+  * If identifier is not resolved from JOIN TREE, identifier name is projection name.
+  * Except scenario where `untuple` function resolved using identifier. Example: SELECT untuple(expr) AS alias, alias.
+  * Example: SELECT sum(1, 1) AS value, value.
+  * In such case both nodes have `value` projection names.
+  *
+  * Example: SELECT id AS value, value FROM test_table.
+  * In such case both nodes have `value` projection names.
+  *
+  * Special case is `untuple` function. If `untuple` function specified with alias, then result nodes will have alias.tuple_column_name projection names.
+  * Example: SELECT cast(tuple(1), 'Tuple(id UInt64)') AS value, untuple(value) AS a;
+  * Result projection names are `value`, `a.id`.
+  *
+  * If `untuple` function does not have alias then result nodes will have `tupleElement(untuple_expression_projection_name, 'tuple_column_name')` projection names.
+  *
+  * Example: SELECT cast(tuple(1), 'Tuple(id UInt64)') AS value, untuple(value);
+  * Result projection names are `value`, `tupleElement(value, 'id')`;
+  *
+  * 4. For function:
+  * Projection name consists of function_name(parameters_projection_names)(arguments_projection_names).
+  * Additionally, if the function is a window function, the window node projection name is used with the OVER clause.
+  * Example: function_name (parameters_names)(argument_projection_names) OVER window_name;
+  * Example: function_name (parameters_names)(argument_projection_names) OVER (PARTITION BY id ORDER BY id).
+  * Example: function_name (parameters_names)(argument_projection_names) OVER (window_name ORDER BY id).
+  *
+  * 5. For lambda:
+  * If it is a standalone lambda that returns a single expression, function projection name is used.
+  * Example: WITH (x -> x + 1) AS lambda SELECT lambda(1).
+  * Projection name is `lambda(1)`.
+  *
+  * If it is a standalone lambda that returns a list, projection names of list nodes are used.
+  * Example: WITH (x -> *) AS lambda SELECT lambda(1) FROM test_table;
+  * If test_table has two columns `id`, `value`, then result projection names are `id`, `value`.
+  *
+  * If lambda is an argument of a function,
+  * then projection name consists of lambda(tuple(lambda_arguments)(lambda_body_projection_name));
+  *
+  * 6. For matcher:
+  * Matched nodes projection names are used as matcher projection names.
+  *
+  * Matched nodes must be qualified if needed.
+  * Example: SELECT * FROM test_table_1 AS t1, test_table_2 AS t2.
+  * In example table test_table_1 and test_table_2 both have `id`, `value` columns.
+  * Matched nodes after unqualified matcher resolve must be qualified to avoid ambiguous projection names.
+  * Result projection names must be `t1.id`, `t1.value`, `t2.id`, `t2.value`.
+  *
+  * There are special cases:
+  * 1. For lambda inside APPLY matcher transformer:
+  * Example: SELECT * APPLY x -> toString(x) FROM test_table.
+  * In such case lambda argument projection name `x` will be replaced by matched node projection name.
+  * If table has two columns `id` and `value`, then result projection names are `toString(id)`, `toString(value)`;
+  *
+  * 2. For unqualified matcher when JOIN tree contains JOIN with USING.
+  * Example: SELECT * FROM test_table_1 AS t1 INNER JOIN test_table_2 AS t2 USING(id);
+  * Result projection names must be `id`, `t1.value`, `t2.value`.
+  *
+  * 7. For subquery:
+  * For subquery projection name consists of `_subquery_` prefix and implementation specific unique number suffix.
+  * Example: SELECT (SELECT 1), (SELECT 1 UNION DISTINCT SELECT 1);
+  * Result projection names can be `_subquery_1`, `_subquery_2`;
+  *
+  * 8. For table:
+  * Table node can be used in expression context only as right argument of IN function. In that case identifier is used
+  * as table node projection name.
+  * Example: SELECT id IN test_table FROM test_table;
+  * Result projection name is `in(id, test_table)`.
+  */
+using ProjectionName = String;
+using ProjectionNames = std::vector;
+constexpr auto PROJECTION_NAME_PLACEHOLDER = "__projection_name_placeholder";
+
+struct IdentifierResolveScope
+{
+    /// Construct identifier resolve scope using scope node, and parent scope
+    IdentifierResolveScope(QueryTreeNodePtr scope_node_, IdentifierResolveScope * parent_scope_);
+
+    QueryTreeNodePtr scope_node;
+
+    IdentifierResolveScope * parent_scope = nullptr;
+
+    ContextPtr context;
+
+    /// Identifier lookup to result
+    std::unordered_map identifier_lookup_to_resolve_state;
+
+    /// Argument can be expression like constant, column, function or table expression
+    std::unordered_map expression_argument_name_to_node;
+
+    ScopeAliases aliases;
+
+    /// Table column name to column node. Valid only during table ALIAS columns resolve.
+    ColumnNameToColumnNodeMap column_name_to_column_node;
+
+    /// CTE name to query node
+    std::unordered_map cte_name_to_query_node;
+
+    /// Window name to window node
+    std::unordered_map window_name_to_window_node;
+
+    /// Current scope expression in resolve process stack
+    ExpressionsStack expressions_in_resolve_process_stack;
+
+    /// Table expressions in resolve process
+    std::unordered_set table_expressions_in_resolve_process;
+
+    /// Non cached identifier lookups during expression resolve
+    std::unordered_set non_cached_identifier_lookups_during_expression_resolve;
+
+    /// Table expression node to data
+    std::unordered_map table_expression_node_to_data;
+
+    QueryTreeNodePtrWithHashIgnoreTypesSet nullable_group_by_keys;
+    /// Here we count the number of nullable GROUP BY keys we met resolving expression.
+    /// E.g. for a query `SELECT tuple(tuple(number)) FROM numbers(10) GROUP BY (number, tuple(number)) with cube`
+    /// both `number` and `tuple(number)` would be in nullable_group_by_keys.
+    /// But when we resolve `tuple(tuple(number))` we should figure out that `tuple(number)` is already a key,
+    /// and we should not convert `number` to nullable.
+    size_t found_nullable_group_by_key_in_scope = 0;
+
+    /** It's possible that after a JOIN, a column in the projection has a type different from the column in the source table.
+      * (For example, after join_use_nulls or USING column cast to supertype)
+      * However, the column in the projection still refers to the table as its source.
+      * This map is used to revert these columns back to their original columns in the source table.
+      */
+    QueryTreeNodePtrWithHashMap join_columns_with_changed_types;
+
+    /// Use identifier lookup to result cache
+    bool use_identifier_lookup_to_result_cache = true;
+
+    /// Apply nullability to aggregation keys
+    bool group_by_use_nulls = false;
+    /// Join returns NULLs instead of default values
+    bool join_use_nulls = false;
+
+    /// JOINs count
+    size_t joins_count = 0;
+
+    /// Subquery depth
+    size_t subquery_depth = 0;
+
+    /** Scope join tree node for expression.
+      * Valid only during analysis construction for single expression.
+      */
+    QueryTreeNodePtr expression_join_tree_node;
+
+    /// Node hash to mask id map
+    std::shared_ptr> projection_mask_map;
+
+    struct ResolvedFunctionsCache
+    {
+        FunctionOverloadResolverPtr resolver;
+        FunctionBasePtr function_base;
+    };
+
+    std::map functions_cache;
+
+    [[maybe_unused]] const IdentifierResolveScope * getNearestQueryScope() const;
+
+    IdentifierResolveScope * getNearestQueryScope();
+
+    AnalysisTableExpressionData & getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node);
+
+    const AnalysisTableExpressionData & getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) const;
+
+    void pushExpressionNode(const QueryTreeNodePtr & node);
+
+    void popExpressionNode();
+
+    /// Dump identifier resolve scope
+    [[maybe_unused]] void dump(WriteBuffer & buffer) const;
+
+    [[maybe_unused]] String dump() const;
+};
+
+}
diff --git a/src/Analyzer/Resolve/QueryAnalysisPass.cpp b/src/Analyzer/Resolve/QueryAnalysisPass.cpp
new file mode 100644
index 00000000000..36c747555fc
--- /dev/null
+++ b/src/Analyzer/Resolve/QueryAnalysisPass.cpp
@@ -0,0 +1,22 @@
+#include
+#include
+#include
+
+namespace DB
+{
+
+QueryAnalysisPass::QueryAnalysisPass(QueryTreeNodePtr table_expression_, bool only_analyze_)
+    : table_expression(std::move(table_expression_))
+    , only_analyze(only_analyze_)
+{}
+
+QueryAnalysisPass::QueryAnalysisPass(bool only_analyze_) : only_analyze(only_analyze_) {}
+
+void QueryAnalysisPass::run(QueryTreeNodePtr & query_tree_node, ContextPtr context)
+{
+    QueryAnalyzer analyzer(only_analyze);
+    analyzer.resolve(query_tree_node, table_expression, context);
+    createUniqueTableAliases(query_tree_node, table_expression, context);
+}
+
+}
diff --git a/src/Analyzer/Passes/QueryAnalysisPass.cpp b/src/Analyzer/Resolve/QueryAnalyzer.cpp
similarity index 84%
rename from src/Analyzer/Passes/QueryAnalysisPass.cpp
rename to src/Analyzer/Resolve/QueryAnalyzer.cpp
index 43edaaa53fd..d84626c4be6 100644
--- a/src/Analyzer/Passes/QueryAnalysisPass.cpp
+++ b/src/Analyzer/Resolve/QueryAnalyzer.cpp
@@ -1,16 +1,3 @@
-#include
-
-#include
-
-#include
-#include
-#include
-
-#include
-#include
-#include
-
-#include
 #include
 #include
 #include
@@ -20,42 +7,26 @@
 #include
 #include
 #include
-#include
 #include
-#include
-#include
-#include
-
 #include
 #include
 #include
 #include
-#include
-
 #include
 #include
-#include
-
 #include
-#include
 #include
 #include
 #include
-#include
-#include
-#include
-#include
-#include
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -85,6 +56,11 @@
 #include
 #include
+#include
+#include
+#include
+#include
+
 namespace ProfileEvents
 {
     extern const Event ScalarSubqueriesGlobalCacheHit;
@@ -94,7 +70,6 @@ namespace ProfileEvents
 namespace DB
 {
-
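// A minimal sketch of how the QueryAnalysisPass defined above is typically driven.
// `ast` and `buildQueryTree` are assumptions for illustration, not part of this file:
//
//     ContextPtr context = /* session or query context */;
//     QueryTreeNodePtr query_tree = buildQueryTree(ast, context);
//     QueryAnalysisPass pass(/*only_analyze_=*/ false);
//     pass.run(query_tree, context);  // resolves identifiers, functions and subqueries in place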
namespace ErrorCodes { extern const int UNSUPPORTED_METHOD; @@ -130,1470 +105,146 @@ namespace ErrorCodes extern const int INVALID_IDENTIFIER; } -/** Query analyzer implementation overview. Please check documentation in QueryAnalysisPass.h first. - * And additional documentation for each method, where special cases are described in detail. - * - * Each node in query must be resolved. For each query tree node resolved state is specific. - * - * For constant node no resolve process exists, it is resolved during construction. - * - * For table node no resolve process exists, it is resolved during construction. - * - * For function node to be resolved parameters and arguments must be resolved, function node must be initialized with concrete aggregate or - * non aggregate function and with result type. - * - * For lambda node there can be 2 different cases. - * 1. Standalone: WITH (x -> x + 1) AS lambda SELECT lambda(1); Such lambdas are inlined in query tree during query analysis pass. - * 2. Function arguments: WITH (x -> x + 1) AS lambda SELECT arrayMap(lambda, [1, 2, 3]); For such lambda resolution must - * set concrete lambda arguments (initially they are identifier nodes) and resolve lambda expression body. - * - * For query node resolve process must resolve all its inner nodes. - * - * For matcher node resolve process must replace it with matched nodes. - * - * For identifier node resolve process must replace it with concrete non identifier node. This part is most complex because - * for identifier resolution scopes and identifier lookup context play important part. - * - * ClickHouse SQL support lexical scoping for identifier resolution. Scope can be defined by query node or by expression node. - * Expression nodes that can define scope are lambdas and table ALIAS columns. - * - * Identifier lookup context can be expression, function, table. - * - * Examples: WITH (x -> x + 1) as func SELECT func() FROM func; During function `func` resolution identifier lookup is performed - * in function context. - * - * If there are no information of identifier context rules are following: - * 1. Try to resolve identifier in expression context. - * 2. Try to resolve identifier in function context, if it is allowed. Example: SELECT func(arguments); Here func identifier cannot be resolved in function context - * because query projection does not support that. - * 3. Try to resolve identifier in table context, if it is allowed. Example: SELECT table; Here table identifier cannot be resolved in function context - * because query projection does not support that. - * - * TODO: This does not supported properly before, because matchers could not be resolved from aliases. - * - * Identifiers are resolved with following rules: - * Resolution starts with current scope. - * 1. Try to resolve identifier from expression scope arguments. Lambda expression arguments are greatest priority. - * 2. Try to resolve identifier from aliases. - * 3. Try to resolve identifier from join tree if scope is query, or if there are registered table columns in scope. - * Steps 2 and 3 can be changed using prefer_column_name_to_alias setting. - * 4. If it is table lookup, try to resolve identifier from CTE. - * If identifier could not be resolved in current scope, resolution must be continued in parent scopes. - * 5. Try to resolve identifier from parent scopes. - * - * Additional rules about aliases and scopes. - * 1. Parent scope cannot refer alias from child scope. - * 2. Child scope can refer to alias in parent scope. 
- * - * Example: SELECT arrayMap(x -> x + 1 AS a, [1,2,3]), a; Identifier a is unknown in parent scope. - * Example: SELECT a FROM (SELECT 1 as a); Here we do not refer to alias a from child query scope. But we query it projection result, similar to tables. - * Example: WITH 1 as a SELECT (SELECT a) as b; Here in child scope identifier a is resolved using alias from parent scope. - * - * Additional rules about identifier binding. - * Bind for identifier to entity means that identifier first part match some node during analysis. - * If other parts of identifier cannot be resolved in that node, exception must be thrown. - * - * Example: - * CREATE TABLE test_table (id UInt64, compound_value Tuple(value UInt64)) ENGINE=TinyLog; - * SELECT compound_value.value, 1 AS compound_value FROM test_table; - * Identifier first part compound_value bound to entity with alias compound_value, but nested identifier part cannot be resolved from entity, - * lookup should not be continued, and exception must be thrown because if lookup continues that way identifier can be resolved from join tree. - * - * TODO: This was not supported properly before analyzer because nested identifier could not be resolved from alias. - * - * More complex example: - * CREATE TABLE test_table (id UInt64, value UInt64) ENGINE=TinyLog; - * WITH cast(('Value'), 'Tuple (value UInt64') AS value SELECT (SELECT value FROM test_table); - * Identifier first part value bound to test_table column value, but nested identifier part cannot be resolved from it, - * lookup should not be continued, and exception must be thrown because if lookup continues identifier can be resolved from parent scope. - * - * TODO: Update exception messages - * TODO: Table identifiers with optional UUID. - * TODO: Lookup functions arrayReduce(sum, [1, 2, 3]); - * TODO: Support function identifier resolve from parent query scope, if lambda in parent scope does not capture any columns. 
- */ +QueryAnalyzer::QueryAnalyzer(bool only_analyze_) : only_analyze(only_analyze_) {} +QueryAnalyzer::~QueryAnalyzer() = default; -namespace +void QueryAnalyzer::resolve(QueryTreeNodePtr & node, const QueryTreeNodePtr & table_expression, ContextPtr context) { + IdentifierResolveScope scope(node, nullptr /*parent_scope*/); -/// Identifier lookup context -enum class IdentifierLookupContext : uint8_t -{ - EXPRESSION = 0, - FUNCTION, - TABLE_EXPRESSION, -}; + if (!scope.context) + scope.context = context; -const char * toString(IdentifierLookupContext identifier_lookup_context) -{ - switch (identifier_lookup_context) + auto node_type = node->getNodeType(); + + switch (node_type) { - case IdentifierLookupContext::EXPRESSION: return "EXPRESSION"; - case IdentifierLookupContext::FUNCTION: return "FUNCTION"; - case IdentifierLookupContext::TABLE_EXPRESSION: return "TABLE_EXPRESSION"; + case QueryTreeNodeType::QUERY: + { + if (table_expression) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "For query analysis table expression must be empty"); + + resolveQuery(node, scope); + break; + } + case QueryTreeNodeType::UNION: + { + if (table_expression) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "For union analysis table expression must be empty"); + + resolveUnion(node, scope); + break; + } + case QueryTreeNodeType::IDENTIFIER: + [[fallthrough]]; + case QueryTreeNodeType::CONSTANT: + [[fallthrough]]; + case QueryTreeNodeType::COLUMN: + [[fallthrough]]; + case QueryTreeNodeType::FUNCTION: + [[fallthrough]]; + case QueryTreeNodeType::LIST: + { + if (table_expression) + { + scope.expression_join_tree_node = table_expression; + validateTableExpressionModifiers(scope.expression_join_tree_node, scope); + initializeTableExpressionData(scope.expression_join_tree_node, scope); + } + + if (node_type == QueryTreeNodeType::LIST) + resolveExpressionNodeList(node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/); + else + resolveExpressionNode(node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/); + + break; + } + case QueryTreeNodeType::TABLE_FUNCTION: + { + QueryExpressionsAliasVisitor expressions_alias_visitor(scope.aliases); + resolveTableFunction(node, scope, expressions_alias_visitor, false /*nested_table_function*/); + break; + } + default: + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Node {} with type {} is not supported by query analyzer. " + "Supported nodes are query, union, identifier, constant, column, function, list.", + node->formatASTForErrorMessage(), + node->getNodeTypeName()); + } } } -const char * toStringLowercase(IdentifierLookupContext identifier_lookup_context) +std::optional QueryAnalyzer::getColumnSideFromJoinTree(const QueryTreeNodePtr & resolved_identifier, const JoinNode & join_node) { - switch (identifier_lookup_context) - { - case IdentifierLookupContext::EXPRESSION: return "expression"; - case IdentifierLookupContext::FUNCTION: return "function"; - case IdentifierLookupContext::TABLE_EXPRESSION: return "table expression"; - } -} - -/** Structure that represent identifier lookup during query analysis. - * Lookup can be in query expression, function, table context. 
- */ -struct IdentifierLookup -{ - Identifier identifier; - IdentifierLookupContext lookup_context; - - bool isExpressionLookup() const - { - return lookup_context == IdentifierLookupContext::EXPRESSION; - } - - bool isFunctionLookup() const - { - return lookup_context == IdentifierLookupContext::FUNCTION; - } - - bool isTableExpressionLookup() const - { - return lookup_context == IdentifierLookupContext::TABLE_EXPRESSION; - } - - String dump() const - { - return identifier.getFullName() + ' ' + toString(lookup_context); - } -}; - -inline bool operator==(const IdentifierLookup & lhs, const IdentifierLookup & rhs) -{ - return lhs.identifier.getFullName() == rhs.identifier.getFullName() && lhs.lookup_context == rhs.lookup_context; -} - -[[maybe_unused]] inline bool operator!=(const IdentifierLookup & lhs, const IdentifierLookup & rhs) -{ - return !(lhs == rhs); -} - -struct IdentifierLookupHash -{ - size_t operator()(const IdentifierLookup & identifier_lookup) const - { - return std::hash()(identifier_lookup.identifier.getFullName()) ^ static_cast(identifier_lookup.lookup_context); - } -}; - -enum class IdentifierResolvePlace : UInt8 -{ - NONE = 0, - EXPRESSION_ARGUMENTS, - ALIASES, - JOIN_TREE, - /// Valid only for table lookup - CTE, - /// Valid only for table lookup - DATABASE_CATALOG -}; - -const char * toString(IdentifierResolvePlace resolved_identifier_place) -{ - switch (resolved_identifier_place) - { - case IdentifierResolvePlace::NONE: return "NONE"; - case IdentifierResolvePlace::EXPRESSION_ARGUMENTS: return "EXPRESSION_ARGUMENTS"; - case IdentifierResolvePlace::ALIASES: return "ALIASES"; - case IdentifierResolvePlace::JOIN_TREE: return "JOIN_TREE"; - case IdentifierResolvePlace::CTE: return "CTE"; - case IdentifierResolvePlace::DATABASE_CATALOG: return "DATABASE_CATALOG"; - } -} - -struct IdentifierResolveResult -{ - IdentifierResolveResult() = default; - - QueryTreeNodePtr resolved_identifier; - IdentifierResolvePlace resolve_place = IdentifierResolvePlace::NONE; - bool resolved_from_parent_scopes = false; - - [[maybe_unused]] bool isResolved() const - { - return resolve_place != IdentifierResolvePlace::NONE; - } - - [[maybe_unused]] bool isResolvedFromParentScopes() const - { - return resolved_from_parent_scopes; - } - - [[maybe_unused]] bool isResolvedFromExpressionArguments() const - { - return resolve_place == IdentifierResolvePlace::EXPRESSION_ARGUMENTS; - } - - [[maybe_unused]] bool isResolvedFromAliases() const - { - return resolve_place == IdentifierResolvePlace::ALIASES; - } - - [[maybe_unused]] bool isResolvedFromJoinTree() const - { - return resolve_place == IdentifierResolvePlace::JOIN_TREE; - } - - [[maybe_unused]] bool isResolvedFromCTEs() const - { - return resolve_place == IdentifierResolvePlace::CTE; - } - - void dump(WriteBuffer & buffer) const - { - if (!resolved_identifier) - { - buffer << "unresolved"; - return; - } - - buffer << resolved_identifier->formatASTForErrorMessage() << " place " << toString(resolve_place) << " resolved from parent scopes " << resolved_from_parent_scopes; - } - - [[maybe_unused]] String dump() const - { - WriteBufferFromOwnString buffer; - dump(buffer); - - return buffer.str(); - } -}; - -struct IdentifierResolveState -{ - IdentifierResolveResult resolve_result; - bool cyclic_identifier_resolve = false; -}; - -struct IdentifierResolveSettings -{ - /// Allow to check join tree during identifier resolution - bool allow_to_check_join_tree = true; - - /// Allow to check CTEs during table identifier resolution - bool allow_to_check_cte = 
true; - - /// Allow to check parent scopes during identifier resolution - bool allow_to_check_parent_scopes = true; - - /// Allow to check database catalog during table identifier resolution - bool allow_to_check_database_catalog = true; - - /// Allow to resolve subquery during identifier resolution - bool allow_to_resolve_subquery_during_identifier_resolution = true; -}; - -struct StringTransparentHash -{ - using is_transparent = void; - using hash = std::hash; - - [[maybe_unused]] size_t operator()(const char * data) const - { - return hash()(data); - } - - size_t operator()(std::string_view data) const - { - return hash()(data); - } - - size_t operator()(const std::string & data) const - { - return hash()(data); - } -}; - -using ColumnNameToColumnNodeMap = std::unordered_map>; - -struct TableExpressionData -{ - std::string table_expression_name; - std::string table_expression_description; - std::string database_name; - std::string table_name; - bool should_qualify_columns = true; - NamesAndTypes column_names_and_types; - ColumnNameToColumnNodeMap column_name_to_column_node; - std::unordered_set subcolumn_names; /// Subset columns that are subcolumns of other columns - std::unordered_set> column_identifier_first_parts; - - bool hasFullIdentifierName(IdentifierView identifier_view) const - { - return column_name_to_column_node.contains(identifier_view.getFullName()); - } - - bool canBindIdentifier(IdentifierView identifier_view) const - { - return column_identifier_first_parts.contains(identifier_view.at(0)); - } - - [[maybe_unused]] void dump(WriteBuffer & buffer) const - { - buffer << "Table expression name " << table_expression_name; - - if (!table_expression_description.empty()) - buffer << " table expression description " << table_expression_description; - - if (!database_name.empty()) - buffer << " database name " << database_name; - - if (!table_name.empty()) - buffer << " table name " << table_name; - - buffer << " should qualify columns " << should_qualify_columns; - buffer << " columns size " << column_name_to_column_node.size() << '\n'; - - for (const auto & [column_name, column_node] : column_name_to_column_node) - buffer << "Column name " << column_name << " column node " << column_node->dumpTree() << '\n'; - } - - [[maybe_unused]] String dump() const - { - WriteBufferFromOwnString buffer; - dump(buffer); - - return buffer.str(); - } -}; - -class ExpressionsStack -{ -public: - void push(const QueryTreeNodePtr & node) - { - if (node->hasAlias()) - { - const auto & node_alias = node->getAlias(); - alias_name_to_expressions[node_alias].push_back(node); - } - - if (const auto * function = node->as()) - { - if (AggregateFunctionFactory::instance().isAggregateFunctionName(function->getFunctionName())) - ++aggregate_functions_counter; - } - - expressions.emplace_back(node); - } - - void pop() - { - const auto & top_expression = expressions.back(); - const auto & top_expression_alias = top_expression->getAlias(); - - if (!top_expression_alias.empty()) - { - auto it = alias_name_to_expressions.find(top_expression_alias); - auto & alias_expressions = it->second; - alias_expressions.pop_back(); - - if (alias_expressions.empty()) - alias_name_to_expressions.erase(it); - } - - if (const auto * function = top_expression->as()) - { - if (AggregateFunctionFactory::instance().isAggregateFunctionName(function->getFunctionName())) - --aggregate_functions_counter; - } - - expressions.pop_back(); - } - - [[maybe_unused]] const QueryTreeNodePtr & getRoot() const - { - return expressions.front(); - 
} - - const QueryTreeNodePtr & getTop() const - { - return expressions.back(); - } - - [[maybe_unused]] bool hasExpressionWithAlias(const std::string & alias) const - { - return alias_name_to_expressions.contains(alias); - } - - bool hasAggregateFunction() const - { - return aggregate_functions_counter > 0; - } - - QueryTreeNodePtr getExpressionWithAlias(const std::string & alias) const - { - auto expression_it = alias_name_to_expressions.find(alias); - if (expression_it == alias_name_to_expressions.end()) - return {}; - - return expression_it->second.front(); - } - - [[maybe_unused]] size_t size() const - { - return expressions.size(); - } - - bool empty() const - { - return expressions.empty(); - } - - void dump(WriteBuffer & buffer) const - { - buffer << expressions.size() << '\n'; - - for (const auto & expression : expressions) - { - buffer << "Expression "; - buffer << expression->formatASTForErrorMessage(); - - const auto & alias = expression->getAlias(); - if (!alias.empty()) - buffer << " alias " << alias; - - buffer << '\n'; - } - } - - [[maybe_unused]] String dump() const - { - WriteBufferFromOwnString buffer; - dump(buffer); - - return buffer.str(); - } - -private: - QueryTreeNodes expressions; - size_t aggregate_functions_counter = 0; - std::unordered_map alias_name_to_expressions; -}; - -struct ScopeAliases -{ - /// Alias name to query expression node - std::unordered_map alias_name_to_expression_node_before_group_by; - std::unordered_map alias_name_to_expression_node_after_group_by; - - std::unordered_map * alias_name_to_expression_node = nullptr; - - /// Alias name to lambda node - std::unordered_map alias_name_to_lambda_node; - - /// Alias name to table expression node - std::unordered_map alias_name_to_table_expression_node; - - /// Expressions like `x as y` where we can't say whether it's a function, expression or table. - std::unordered_map transitive_aliases; - - /// Nodes with duplicated aliases - std::unordered_set nodes_with_duplicated_aliases; - std::vector cloned_nodes_with_duplicated_aliases; - - /// Names which are aliases from ARRAY JOIN. - /// This is needed to properly qualify columns from matchers and avoid name collision. 
- std::unordered_set array_join_aliases; - - std::unordered_map & getAliasMap(IdentifierLookupContext lookup_context) - { - switch (lookup_context) - { - case IdentifierLookupContext::EXPRESSION: return *alias_name_to_expression_node; - case IdentifierLookupContext::FUNCTION: return alias_name_to_lambda_node; - case IdentifierLookupContext::TABLE_EXPRESSION: return alias_name_to_table_expression_node; - } - } - - enum class FindOption - { - FIRST_NAME, - FULL_NAME, - }; - - const std::string & getKey(const Identifier & identifier, FindOption find_option) - { - switch (find_option) - { - case FindOption::FIRST_NAME: return identifier.front(); - case FindOption::FULL_NAME: return identifier.getFullName(); - } - } - - QueryTreeNodePtr * find(IdentifierLookup lookup, FindOption find_option) - { - auto & alias_map = getAliasMap(lookup.lookup_context); - const std::string * key = &getKey(lookup.identifier, find_option); - - auto it = alias_map.find(*key); - - if (it != alias_map.end()) - return &it->second; - - if (lookup.lookup_context == IdentifierLookupContext::TABLE_EXPRESSION) - return {}; - - while (it == alias_map.end()) - { - auto jt = transitive_aliases.find(*key); - if (jt == transitive_aliases.end()) - return {}; - - key = &(getKey(jt->second, find_option)); - it = alias_map.find(*key); - } - - return &it->second; - } - - const QueryTreeNodePtr * find(IdentifierLookup lookup, FindOption find_option) const - { - return const_cast(this)->find(lookup, find_option); - } -}; - - -/** Projection names is name of query tree node that is used in projection part of query node. - * Example: SELECT id FROM test_table; - * `id` is projection name of column node - * - * Example: SELECT id AS id_alias FROM test_table; - * `id_alias` is projection name of column node - * - * Calculation of projection names is done during expression nodes resolution. This is done this way - * because after identifier node is resolved we lose information about identifier name. We could - * potentially save this information in query tree node itself, but that would require to clone it in some cases. - * Example: SELECT big_scalar_subquery AS a, a AS b, b AS c; - * All 3 nodes in projection are the same big_scalar_subquery, but they have different projection names. - * If we want to save it in query tree node, we have to clone subquery node that could lead to performance degradation. - * - * Possible solution is to separate query node metadata and query node content. So only node metadata could be cloned - * if we want to change projection name. This solution does not seem to be easy for client of query tree because projection - * name will be part of interface. If we potentially could hide projection names calculation in analyzer without introducing additional - * changes in query tree structure that would be preferable. - * - * Currently each resolve method returns projection names array. Resolve method must compute projection names of node. - * If node is resolved as list node this is case for `untuple` function or `matcher` result projection names array must contain projection names - * for result nodes. - * If node is not resolved as list node, projection names array contain single projection name for node. - * - * Rules for projection names: - * 1. If node has alias. It is node projection name. - * Except scenario where `untuple` function has alias. Example: SELECT untuple(expr) AS alias, alias. - * - * 2. For constant it is constant value string representation. - * - * 3. 
For identifier: - * If identifier is resolved from JOIN TREE, we want to remove additional identifier qualifications. - * Example: SELECT default.test_table.id FROM test_table. - * Result projection name is `id`. - * - * Example: SELECT t1.id FROM test_table_1 AS t1, test_table_2 AS t2 - * In example both test_table_1, test_table_2 have `id` column. - * In such case projection name is `t1.id` because if additional qualification is removed then column projection name `id` will be ambiguous. - * - * Example: SELECT default.test_table_1.id FROM test_table_1 AS t1, test_table_2 AS t2 - * In such case projection name is `test_table_1.id` because we remove unnecessary database qualification, but table name qualification cannot be removed - * because otherwise column projection name `id` will be ambiguous. - * - * If identifier is not resolved from JOIN TREE. Identifier name is projection name. - * Except scenario where `untuple` function resolved using identifier. Example: SELECT untuple(expr) AS alias, alias. - * Example: SELECT sum(1, 1) AS value, value. - * In such case both nodes have `value` projection names. - * - * Example: SELECT id AS value, value FROM test_table. - * In such case both nodes have have `value` projection names. - * - * Special case is `untuple` function. If `untuple` function specified with alias, then result nodes will have alias.tuple_column_name projection names. - * Example: SELECT cast(tuple(1), 'Tuple(id UInt64)') AS value, untuple(value) AS a; - * Result projection names are `value`, `a.id`. - * - * If `untuple` function does not have alias then result nodes will have `tupleElement(untuple_expression_projection_name, 'tuple_column_name') projection names. - * - * Example: SELECT cast(tuple(1), 'Tuple(id UInt64)') AS value, untuple(value); - * Result projection names are `value`, `tupleElement(value, 'id')`; - * - * 4. For function: - * Projection name consists from function_name(parameters_projection_names)(arguments_projection_names). - * Additionally if function is window function. Window node projection name is used with OVER clause. - * Example: function_name (parameters_names)(argument_projection_names) OVER window_name; - * Example: function_name (parameters_names)(argument_projection_names) OVER (PARTITION BY id ORDER BY id). - * Example: function_name (parameters_names)(argument_projection_names) OVER (window_name ORDER BY id). - * - * 5. For lambda: - * If it is standalone lambda that returns single expression, function projection name is used. - * Example: WITH (x -> x + 1) AS lambda SELECT lambda(1). - * Projection name is `lambda(1)`. - * - * If is it standalone lambda that returns list, projection names of list nodes are used. - * Example: WITH (x -> *) AS lambda SELECT lambda(1) FROM test_table; - * If test_table has two columns `id`, `value`. Then result projection names are `id`, `value`. - * - * If lambda is argument of function. - * Then projection name consists from lambda(tuple(lambda_arguments)(lambda_body_projection_name)); - * - * 6. For matcher: - * Matched nodes projection names are used as matcher projection names. - * - * Matched nodes must be qualified if needed. - * Example: SELECT * FROM test_table_1 AS t1, test_table_2 AS t2. - * In example table test_table_1 and test_table_2 both have `id`, `value` columns. - * Matched nodes after unqualified matcher resolve must be qualified to avoid ambiguous projection names. - * Result projection names must be `t1.id`, `t1.value`, `t2.id`, `t2.value`. - * - * There are special cases - * 1. 
For lambda inside APPLY matcher transformer: - * Example: SELECT * APPLY x -> toString(x) FROM test_table. - * In such case lambda argument projection name `x` will be replaced by matched node projection name. - * If table has two columns `id` and `value`. Then result projection names are `toString(id)`, `toString(value)`; - * - * 2. For unqualified matcher when JOIN tree contains JOIN with USING. - * Example: SELECT * FROM test_table_1 AS t1 INNER JOIN test_table_2 AS t2 USING(id); - * Result projection names must be `id`, `t1.value`, `t2.value`. - * - * 7. For subquery: - * For subquery projection name consists of `_subquery_` prefix and implementation specific unique number suffix. - * Example: SELECT (SELECT 1), (SELECT 1 UNION DISTINCT SELECT 1); - * Result projection name can be `_subquery_1`, `subquery_2`; - * - * 8. For table: - * Table node can be used in expression context only as right argument of IN function. In that case identifier is used - * as table node projection name. - * Example: SELECT id IN test_table FROM test_table; - * Result projection name is `in(id, test_table)`. - */ -using ProjectionName = String; -using ProjectionNames = std::vector; -constexpr auto PROJECTION_NAME_PLACEHOLDER = "__projection_name_placeholder"; - -struct IdentifierResolveScope -{ - /// Construct identifier resolve scope using scope node, and parent scope - IdentifierResolveScope(QueryTreeNodePtr scope_node_, IdentifierResolveScope * parent_scope_) - : scope_node(std::move(scope_node_)) - , parent_scope(parent_scope_) - { - if (parent_scope) - { - subquery_depth = parent_scope->subquery_depth; - context = parent_scope->context; - projection_mask_map = parent_scope->projection_mask_map; - } - else - projection_mask_map = std::make_shared>(); - - if (auto * union_node = scope_node->as()) - { - context = union_node->getContext(); - } - else if (auto * query_node = scope_node->as()) - { - context = query_node->getContext(); - group_by_use_nulls = context->getSettingsRef().group_by_use_nulls && - (query_node->isGroupByWithGroupingSets() || query_node->isGroupByWithRollup() || query_node->isGroupByWithCube()); - } - - if (context) - join_use_nulls = context->getSettingsRef().join_use_nulls; - else if (parent_scope) - join_use_nulls = parent_scope->join_use_nulls; - - aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_before_group_by; - } - - QueryTreeNodePtr scope_node; - - IdentifierResolveScope * parent_scope = nullptr; - - ContextPtr context; - - /// Identifier lookup to result - std::unordered_map identifier_lookup_to_resolve_state; - - /// Argument can be expression like constant, column, function or table expression - std::unordered_map expression_argument_name_to_node; - - ScopeAliases aliases; - - /// Table column name to column node. Valid only during table ALIAS columns resolve. 
- ColumnNameToColumnNodeMap column_name_to_column_node; - - /// CTE name to query node - std::unordered_map cte_name_to_query_node; - - /// Window name to window node - std::unordered_map window_name_to_window_node; - - /// Current scope expression in resolve process stack - ExpressionsStack expressions_in_resolve_process_stack; - - /// Table expressions in resolve process - std::unordered_set table_expressions_in_resolve_process; - - /// Current scope expression - std::unordered_set non_cached_identifier_lookups_during_expression_resolve; - - /// Table expression node to data - std::unordered_map table_expression_node_to_data; - - QueryTreeNodePtrWithHashIgnoreTypesSet nullable_group_by_keys; - /// Here we count the number of nullable GROUP BY keys we met resolving expression. - /// E.g. for a query `SELECT tuple(tuple(number)) FROM numbers(10) GROUP BY (number, tuple(number)) with cube` - /// both `number` and `tuple(number)` would be in nullable_group_by_keys. - /// But when we resolve `tuple(tuple(number))` we should figure out that `tuple(number)` is already a key, - /// and we should not convert `number` to nullable. - size_t found_nullable_group_by_key_in_scope = 0; - - /** It's possible that after a JOIN, a column in the projection has a type different from the column in the source table. - * (For example, after join_use_nulls or USING column casted to supertype) - * However, the column in the projection still refers to the table as its source. - * This map is used to revert these columns back to their original columns in the source table. - */ - QueryTreeNodePtrWithHashMap join_columns_with_changed_types; - - /// Use identifier lookup to result cache - bool use_identifier_lookup_to_result_cache = true; - - /// Apply nullability to aggregation keys - bool group_by_use_nulls = false; - /// Join retutns NULLs instead of default values - bool join_use_nulls = false; - - /// JOINs count - size_t joins_count = 0; - - /// Subquery depth - size_t subquery_depth = 0; - - /** Scope join tree node for expression. - * Valid only during analysis construction for single expression. - */ - QueryTreeNodePtr expression_join_tree_node; - - /// Node hash to mask id map - std::shared_ptr> projection_mask_map; - - struct ResolvedFunctionsCache - { - FunctionOverloadResolverPtr resolver; - FunctionBasePtr function_base; - }; - - std::map functions_cache; - - [[maybe_unused]] const IdentifierResolveScope * getNearestQueryScope() const - { - const IdentifierResolveScope * scope_to_check = this; - while (scope_to_check != nullptr) - { - if (scope_to_check->scope_node->getNodeType() == QueryTreeNodeType::QUERY) - break; - - scope_to_check = scope_to_check->parent_scope; - } - - return scope_to_check; - } - - IdentifierResolveScope * getNearestQueryScope() - { - IdentifierResolveScope * scope_to_check = this; - while (scope_to_check != nullptr) - { - if (scope_to_check->scope_node->getNodeType() == QueryTreeNodeType::QUERY) - break; - - scope_to_check = scope_to_check->parent_scope; - } - - return scope_to_check; - } - - TableExpressionData & getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) - { - auto it = table_expression_node_to_data.find(table_expression_node); - if (it == table_expression_node_to_data.end()) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Table expression {} data must be initialized. 
In scope {}", - table_expression_node->formatASTForErrorMessage(), - scope_node->formatASTForErrorMessage()); - } - - return it->second; - } - - const TableExpressionData & getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) const - { - auto it = table_expression_node_to_data.find(table_expression_node); - if (it == table_expression_node_to_data.end()) - { - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Table expression {} data must be initialized. In scope {}", - table_expression_node->formatASTForErrorMessage(), - scope_node->formatASTForErrorMessage()); - } - - return it->second; - } - - void pushExpressionNode(const QueryTreeNodePtr & node) - { - bool had_aggregate_function = expressions_in_resolve_process_stack.hasAggregateFunction(); - expressions_in_resolve_process_stack.push(node); - if (group_by_use_nulls && had_aggregate_function != expressions_in_resolve_process_stack.hasAggregateFunction()) - aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_before_group_by; - } - - void popExpressionNode() - { - bool had_aggregate_function = expressions_in_resolve_process_stack.hasAggregateFunction(); - expressions_in_resolve_process_stack.pop(); - if (group_by_use_nulls && had_aggregate_function != expressions_in_resolve_process_stack.hasAggregateFunction()) - aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_after_group_by; - } - - /// Dump identifier resolve scope - [[maybe_unused]] void dump(WriteBuffer & buffer) const - { - buffer << "Scope node " << scope_node->formatASTForErrorMessage() << '\n'; - buffer << "Identifier lookup to resolve state " << identifier_lookup_to_resolve_state.size() << '\n'; - for (const auto & [identifier, state] : identifier_lookup_to_resolve_state) - { - buffer << "Identifier " << identifier.dump() << " resolve result "; - state.resolve_result.dump(buffer); - buffer << '\n'; - } - - buffer << "Expression argument name to node " << expression_argument_name_to_node.size() << '\n'; - for (const auto & [alias_name, node] : expression_argument_name_to_node) - buffer << "Alias name " << alias_name << " node " << node->formatASTForErrorMessage() << '\n'; - - buffer << "Alias name to expression node table size " << aliases.alias_name_to_expression_node->size() << '\n'; - for (const auto & [alias_name, node] : *aliases.alias_name_to_expression_node) - buffer << "Alias name " << alias_name << " expression node " << node->dumpTree() << '\n'; - - buffer << "Alias name to function node table size " << aliases.alias_name_to_lambda_node.size() << '\n'; - for (const auto & [alias_name, node] : aliases.alias_name_to_lambda_node) - buffer << "Alias name " << alias_name << " lambda node " << node->formatASTForErrorMessage() << '\n'; - - buffer << "Alias name to table expression node table size " << aliases.alias_name_to_table_expression_node.size() << '\n'; - for (const auto & [alias_name, node] : aliases.alias_name_to_table_expression_node) - buffer << "Alias name " << alias_name << " node " << node->formatASTForErrorMessage() << '\n'; - - buffer << "CTE name to query node table size " << cte_name_to_query_node.size() << '\n'; - for (const auto & [cte_name, node] : cte_name_to_query_node) - buffer << "CTE name " << cte_name << " node " << node->formatASTForErrorMessage() << '\n'; - - buffer << "WINDOW name to window node table size " << window_name_to_window_node.size() << '\n'; - for (const auto & [window_name, node] : window_name_to_window_node) - buffer << "CTE name " << window_name << " node 
" << node->formatASTForErrorMessage() << '\n'; - - buffer << "Nodes with duplicated aliases size " << aliases.nodes_with_duplicated_aliases.size() << '\n'; - for (const auto & node : aliases.nodes_with_duplicated_aliases) - buffer << "Alias name " << node->getAlias() << " node " << node->formatASTForErrorMessage() << '\n'; - - buffer << "Expression resolve process stack " << '\n'; - expressions_in_resolve_process_stack.dump(buffer); - - buffer << "Table expressions in resolve process size " << table_expressions_in_resolve_process.size() << '\n'; - for (const auto & node : table_expressions_in_resolve_process) - buffer << "Table expression " << node->formatASTForErrorMessage() << '\n'; - - buffer << "Non cached identifier lookups during expression resolve " << non_cached_identifier_lookups_during_expression_resolve.size() << '\n'; - for (const auto & identifier_lookup : non_cached_identifier_lookups_during_expression_resolve) - buffer << "Identifier lookup " << identifier_lookup.dump() << '\n'; - - buffer << "Table expression node to data " << table_expression_node_to_data.size() << '\n'; - for (const auto & [table_expression_node, table_expression_data] : table_expression_node_to_data) - buffer << "Table expression node " << table_expression_node->formatASTForErrorMessage() << " data " << table_expression_data.dump() << '\n'; - - buffer << "Use identifier lookup to result cache " << use_identifier_lookup_to_result_cache << '\n'; - buffer << "Subquery depth " << subquery_depth << '\n'; - } - - [[maybe_unused]] String dump() const - { - WriteBufferFromOwnString buffer; - dump(buffer); - - return buffer.str(); - } -}; - - -/** Visitor that extracts expression and function aliases from node and initialize scope tables with it. - * Does not go into child lambdas and queries. - * - * Important: - * Identifier nodes with aliases are added both in alias to expression and alias to function map. - * - * These is necessary because identifier with alias can give alias name to any query tree node. - * - * Example: - * WITH (x -> x + 1) AS id, id AS value SELECT value(1); - * In this example id as value is identifier node that has alias, during scope initialization we cannot derive - * that id is actually lambda or expression. - * - * There are no easy solution here, without trying to make full featured expression resolution at this stage. - * Example: - * WITH (x -> x + 1) AS id, id AS id_1, id_1 AS id_2 SELECT id_2(1); - * Example: SELECT a, b AS a, b AS c, 1 AS c; - * - * It is client responsibility after resolving identifier node with alias, make following actions: - * 1. If identifier node was resolved in function scope, remove alias from scope expression map. - * 2. If identifier node was resolved in expression scope, remove alias from scope function map. - * - * That way we separate alias map initialization and expressions resolution. 
- */ -class QueryExpressionsAliasVisitor : public InDepthQueryTreeVisitor -{ -public: - explicit QueryExpressionsAliasVisitor(ScopeAliases & aliases_) - : aliases(aliases_) - {} - - void visitImpl(QueryTreeNodePtr & node) - { - updateAliasesIfNeeded(node, false /*is_lambda_node*/); - } - - bool needChildVisit(const QueryTreeNodePtr &, const QueryTreeNodePtr & child) - { - if (auto * lambda_node = child->as()) - { - updateAliasesIfNeeded(child, true /*is_lambda_node*/); - return false; - } - else if (auto * query_tree_node = child->as()) - { - if (query_tree_node->isCTE()) - return false; - - updateAliasesIfNeeded(child, false /*is_lambda_node*/); - return false; - } - else if (auto * union_node = child->as()) - { - if (union_node->isCTE()) - return false; - - updateAliasesIfNeeded(child, false /*is_lambda_node*/); - return false; - } - - return true; - } -private: - void addDuplicatingAlias(const QueryTreeNodePtr & node) - { - aliases.nodes_with_duplicated_aliases.emplace(node); - auto cloned_node = node->clone(); - aliases.cloned_nodes_with_duplicated_aliases.emplace_back(cloned_node); - aliases.nodes_with_duplicated_aliases.emplace(cloned_node); - } - - void updateAliasesIfNeeded(const QueryTreeNodePtr & node, bool is_lambda_node) - { - if (!node->hasAlias()) - return; - - // We should not resolve expressions to WindowNode - if (node->getNodeType() == QueryTreeNodeType::WINDOW) - return; - - const auto & alias = node->getAlias(); - - if (is_lambda_node) - { - if (aliases.alias_name_to_expression_node->contains(alias)) - addDuplicatingAlias(node); - - auto [_, inserted] = aliases.alias_name_to_lambda_node.insert(std::make_pair(alias, node)); - if (!inserted) - addDuplicatingAlias(node); - - return; - } - - if (aliases.alias_name_to_lambda_node.contains(alias)) - addDuplicatingAlias(node); - - auto [_, inserted] = aliases.alias_name_to_expression_node->insert(std::make_pair(alias, node)); - if (!inserted) - addDuplicatingAlias(node); - - /// If node is identifier put it into transitive aliases map. - if (const auto * identifier = typeid_cast(node.get())) - aliases.transitive_aliases.insert(std::make_pair(alias, identifier->getIdentifier())); - } - - ScopeAliases & aliases; -}; - -class TableExpressionsAliasVisitor : public InDepthQueryTreeVisitor -{ -public: - explicit TableExpressionsAliasVisitor(IdentifierResolveScope & scope_) - : scope(scope_) - {} - - void visitImpl(QueryTreeNodePtr & node) - { - updateAliasesIfNeeded(node); - } - - static bool needChildVisit(const QueryTreeNodePtr & node, const QueryTreeNodePtr & child) - { - auto node_type = node->getNodeType(); - - switch (node_type) - { - case QueryTreeNodeType::ARRAY_JOIN: - { - const auto & array_join_node = node->as(); - return child.get() == array_join_node.getTableExpression().get(); - } - case QueryTreeNodeType::JOIN: - { - const auto & join_node = node->as(); - return child.get() == join_node.getLeftTableExpression().get() || child.get() == join_node.getRightTableExpression().get(); - } - default: - { - break; - } - } - - return false; - } - -private: - void updateAliasesIfNeeded(const QueryTreeNodePtr & node) - { - if (!node->hasAlias()) - return; - - const auto & node_alias = node->getAlias(); - auto [_, inserted] = scope.aliases.alias_name_to_table_expression_node.emplace(node_alias, node); - if (!inserted) - throw Exception(ErrorCodes::MULTIPLE_EXPRESSIONS_FOR_ALIAS, - "Multiple table expressions with same alias {}. 
In scope {}", - node_alias, - scope.scope_node->formatASTForErrorMessage()); - } - - IdentifierResolveScope & scope; -}; - -class QueryAnalyzer -{ -public: - explicit QueryAnalyzer(bool only_analyze_) : only_analyze(only_analyze_) {} - - void resolve(QueryTreeNodePtr & node, const QueryTreeNodePtr & table_expression, ContextPtr context) - { - IdentifierResolveScope scope(node, nullptr /*parent_scope*/); - - if (!scope.context) - scope.context = context; - - auto node_type = node->getNodeType(); - - switch (node_type) - { - case QueryTreeNodeType::QUERY: - { - if (table_expression) - throw Exception(ErrorCodes::LOGICAL_ERROR, - "For query analysis table expression must be empty"); - - resolveQuery(node, scope); - break; - } - case QueryTreeNodeType::UNION: - { - if (table_expression) - throw Exception(ErrorCodes::LOGICAL_ERROR, - "For union analysis table expression must be empty"); - - resolveUnion(node, scope); - break; - } - case QueryTreeNodeType::IDENTIFIER: - [[fallthrough]]; - case QueryTreeNodeType::CONSTANT: - [[fallthrough]]; - case QueryTreeNodeType::COLUMN: - [[fallthrough]]; - case QueryTreeNodeType::FUNCTION: - [[fallthrough]]; - case QueryTreeNodeType::LIST: - { - if (table_expression) - { - scope.expression_join_tree_node = table_expression; - validateTableExpressionModifiers(scope.expression_join_tree_node, scope); - initializeTableExpressionData(scope.expression_join_tree_node, scope); - } - - if (node_type == QueryTreeNodeType::LIST) - resolveExpressionNodeList(node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/); - else - resolveExpressionNode(node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/); - - break; - } - case QueryTreeNodeType::TABLE_FUNCTION: - { - QueryExpressionsAliasVisitor expressions_alias_visitor(scope.aliases); - resolveTableFunction(node, scope, expressions_alias_visitor, false /*nested_table_function*/); - break; - } - default: - { - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "Node {} with type {} is not supported by query analyzer. 
" - "Supported nodes are query, union, identifier, constant, column, function, list.", - node->formatASTForErrorMessage(), - node->getNodeTypeName()); - } - } - } - -private: - /// Utility functions - - static bool isExpressionNodeType(QueryTreeNodeType node_type); - - static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type); - - static bool isSubqueryNodeType(QueryTreeNodeType node_type); - - static bool isTableExpressionNodeType(QueryTreeNodeType node_type); - - static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node); - - static ProjectionName calculateFunctionProjectionName(const QueryTreeNodePtr & function_node, - const ProjectionNames & parameters_projection_names, - const ProjectionNames & arguments_projection_names); - - static ProjectionName calculateWindowProjectionName(const QueryTreeNodePtr & window_node, - const QueryTreeNodePtr & parent_window_node, - const String & parent_window_name, - const ProjectionNames & partition_by_projection_names, - const ProjectionNames & order_by_projection_names, - const ProjectionName & frame_begin_offset_projection_name, - const ProjectionName & frame_end_offset_projection_name); - - static ProjectionName calculateSortColumnProjectionName(const QueryTreeNodePtr & sort_column_node, - const ProjectionName & sort_expression_projection_name, - const ProjectionName & fill_from_expression_projection_name, - const ProjectionName & fill_to_expression_projection_name, - const ProjectionName & fill_step_expression_projection_name); - - static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier, - const DataTypePtr & compound_expression_type, - const Identifier & valid_identifier_prefix, - std::unordered_set & valid_identifiers_result); - - static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier, - const QueryTreeNodePtr & table_expression, - const TableExpressionData & table_expression_data, - std::unordered_set & valid_identifiers_result); - - static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier, - const IdentifierResolveScope & scope, - bool allow_expression_identifiers, - bool allow_function_identifiers, - bool allow_table_expression_identifiers, - std::unordered_set & valid_identifiers_result); - - static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier, - const IdentifierResolveScope & scope, - bool allow_expression_identifiers, - bool allow_function_identifiers, - bool allow_table_expression_identifiers, - std::unordered_set & valid_identifiers_result); - - static std::vector collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set & valid_identifiers); - - static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path); - - QueryTreeNodePtr tryGetLambdaFromSQLUserDefinedFunctions(const std::string & function_name, ContextPtr context); - - void evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & query_tree_node, IdentifierResolveScope & scope); - - static void mergeWindowWithParentWindow(const QueryTreeNodePtr & window_node, const QueryTreeNodePtr & parent_window_node, IdentifierResolveScope & scope); - - void replaceNodesWithPositionalArguments(QueryTreeNodePtr & node_list, const QueryTreeNodes & projection_nodes, IdentifierResolveScope & scope); - - static void 
convertLimitOffsetExpression(QueryTreeNodePtr & expression_node, const String & expression_description, IdentifierResolveScope & scope); - - static void validateTableExpressionModifiers(const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope); - - static void validateJoinTableExpressionWithoutAlias(const QueryTreeNodePtr & join_node, const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope); - - static void checkDuplicateTableNamesOrAlias(const QueryTreeNodePtr & join_node, QueryTreeNodePtr & left_table_expr, QueryTreeNodePtr & right_table_expr, IdentifierResolveScope & scope); - - static std::pair recursivelyCollectMaxOrdinaryExpressions(QueryTreeNodePtr & node, QueryTreeNodes & into); - - static void expandGroupByAll(QueryNode & query_tree_node_typed); - - void expandOrderByAll(QueryNode & query_tree_node_typed, const Settings & settings); - - static std::string - rewriteAggregateFunctionNameIfNeeded(const std::string & aggregate_function_name, NullsAction action, const ContextPtr & context); - - static std::optional getColumnSideFromJoinTree(const QueryTreeNodePtr & resolved_identifier, const JoinNode & join_node) - { - if (resolved_identifier->getNodeType() == QueryTreeNodeType::CONSTANT) - return {}; - - if (resolved_identifier->getNodeType() == QueryTreeNodeType::FUNCTION) - { - const auto & resolved_function = resolved_identifier->as(); - - const auto & argument_nodes = resolved_function.getArguments().getNodes(); - - std::optional result; - for (const auto & argument_node : argument_nodes) - { - auto table_side = getColumnSideFromJoinTree(argument_node, join_node); - if (table_side && result && *table_side != *result) - { - throw Exception(ErrorCodes::AMBIGUOUS_IDENTIFIER, - "Ambiguous identifier {}. 
In scope {}", - resolved_identifier->formatASTForErrorMessage(), - join_node.formatASTForErrorMessage()); - } - result = table_side; - } - return result; - } - - const auto * column_src = resolved_identifier->as().getColumnSource().get(); - - if (join_node.getLeftTableExpression().get() == column_src) - return JoinTableSide::Left; - if (join_node.getRightTableExpression().get() == column_src) - return JoinTableSide::Right; + if (resolved_identifier->getNodeType() == QueryTreeNodeType::CONSTANT) return {}; - } - static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded( - const QueryTreeNodePtr & resolved_identifier, - const JoinKind & join_kind, - std::optional resolved_side, - IdentifierResolveScope & scope) + if (resolved_identifier->getNodeType() == QueryTreeNodeType::FUNCTION) { - if (resolved_identifier->getNodeType() == QueryTreeNodeType::COLUMN && - JoinCommon::canBecomeNullable(resolved_identifier->getResultType()) && - (isFull(join_kind) || - (isLeft(join_kind) && resolved_side && *resolved_side == JoinTableSide::Right) || - (isRight(join_kind) && resolved_side && *resolved_side == JoinTableSide::Left))) + const auto & resolved_function = resolved_identifier->as(); + + const auto & argument_nodes = resolved_function.getArguments().getNodes(); + + std::optional result; + for (const auto & argument_node : argument_nodes) { - auto nullable_resolved_identifier = resolved_identifier->clone(); - auto & resolved_column = nullable_resolved_identifier->as(); - auto new_result_type = makeNullableOrLowCardinalityNullable(resolved_column.getColumnType()); - resolved_column.setColumnType(new_result_type); - if (resolved_column.hasExpression()) + auto table_side = getColumnSideFromJoinTree(argument_node, join_node); + if (table_side && result && *table_side != *result) { - auto & resolved_expression = resolved_column.getExpression(); - if (!resolved_expression->getResultType()->equals(*new_result_type)) - resolved_expression = buildCastFunction(resolved_expression, new_result_type, scope.context, true); + throw Exception(ErrorCodes::AMBIGUOUS_IDENTIFIER, + "Ambiguous identifier {}. 
In scope {}", + resolved_identifier->formatASTForErrorMessage(), + join_node.formatASTForErrorMessage()); } - if (!nullable_resolved_identifier->isEqual(*resolved_identifier)) - scope.join_columns_with_changed_types[nullable_resolved_identifier] = resolved_identifier; - return nullable_resolved_identifier; + result = table_side; } - return nullptr; + return result; } - /// Resolve identifier functions + const auto * column_src = resolved_identifier->as().getColumnSource().get(); - static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context); + if (join_node.getLeftTableExpression().get() == column_src) + return JoinTableSide::Left; + if (join_node.getRightTableExpression().get() == column_src) + return JoinTableSide::Right; + return {}; +} - QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier, - size_t identifier_bind_size, - const QueryTreeNodePtr & compound_expression, - String compound_expression_source, - IdentifierResolveScope & scope, - bool can_be_not_found = false); - - QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope); - - static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveIdentifierFromAliases(const IdentifierLookup & identifier_lookup, - IdentifierResolveScope & scope, - IdentifierResolveSettings identifier_resolve_settings); - - QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope); - - static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup, - const QueryTreeNodePtr & table_expression_node, - const IdentifierResolveScope & scope); - - static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup, - const QueryTreeNodePtr & table_expression_node, - const IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup, - const QueryTreeNodePtr & table_expression_node, - IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup, - const QueryTreeNodePtr & table_expression_node, - IdentifierResolveScope & scope); - - QueryTreeNodePtr matchArrayJoinSubcolumns( - const QueryTreeNodePtr & array_join_column_inner_expression, - const ColumnNode & array_join_column_expression_typed, - const QueryTreeNodePtr & resolved_expression, - IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression, - const QueryTreeNodePtr & table_expression_node, - IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup, - const QueryTreeNodePtr & table_expression_node, - IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup, - const QueryTreeNodePtr & join_tree_node, - IdentifierResolveScope & scope); - - QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup, - IdentifierResolveScope & scope); - - IdentifierResolveResult tryResolveIdentifierInParentScopes(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope); - - IdentifierResolveResult 
tryResolveIdentifier(const IdentifierLookup & identifier_lookup, - IdentifierResolveScope & scope, - IdentifierResolveSettings identifier_resolve_settings = {}); - - QueryTreeNodePtr tryResolveIdentifierFromStorage( - const Identifier & identifier, - const QueryTreeNodePtr & table_expression_node, - const TableExpressionData & table_expression_data, - IdentifierResolveScope & scope, - size_t identifier_column_qualifier_parts, - bool can_be_not_found = false); - - /// Resolve query tree nodes functions - - void qualifyColumnNodesWithProjectionNames(const QueryTreeNodes & column_nodes, - const QueryTreeNodePtr & table_expression_node, - const IdentifierResolveScope & scope); - - static GetColumnsOptions buildGetColumnsOptions(QueryTreeNodePtr & matcher_node, const ContextPtr & context); - - using QueryTreeNodesWithNames = std::vector>; - - QueryTreeNodesWithNames getMatchedColumnNodesWithNames(const QueryTreeNodePtr & matcher_node, - const QueryTreeNodePtr & table_expression_node, - const NamesAndTypes & matched_columns, - const IdentifierResolveScope & scope); - - void updateMatchedColumnsFromJoinUsing(QueryTreeNodesWithNames & result_matched_column_nodes_with_names, const QueryTreeNodePtr & source_table_expression, IdentifierResolveScope & scope); - - QueryTreeNodesWithNames resolveQualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope); - - QueryTreeNodesWithNames resolveUnqualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope); - - ProjectionNames resolveMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope); - - ProjectionName resolveWindow(QueryTreeNodePtr & window_node, IdentifierResolveScope & scope); - - ProjectionNames resolveLambda(const QueryTreeNodePtr & lambda_node, - const QueryTreeNodePtr & lambda_node_to_resolve, - const QueryTreeNodes & lambda_arguments, - IdentifierResolveScope & scope); - - ProjectionNames resolveFunction(QueryTreeNodePtr & function_node, IdentifierResolveScope & scope); - - ProjectionNames resolveExpressionNode(QueryTreeNodePtr & node, IdentifierResolveScope & scope, bool allow_lambda_expression, bool allow_table_expression, bool ignore_alias = false); - - ProjectionNames resolveExpressionNodeList(QueryTreeNodePtr & node_list, IdentifierResolveScope & scope, bool allow_lambda_expression, bool allow_table_expression); - - ProjectionNames resolveSortNodeList(QueryTreeNodePtr & sort_node_list, IdentifierResolveScope & scope); - - void resolveGroupByNode(QueryNode & query_node_typed, IdentifierResolveScope & scope); - - void resolveInterpolateColumnsNodeList(QueryTreeNodePtr & interpolate_node_list, IdentifierResolveScope & scope); - - void resolveWindowNodeList(QueryTreeNodePtr & window_node_list, IdentifierResolveScope & scope); - - NamesAndTypes resolveProjectionExpressionNodeList(QueryTreeNodePtr & projection_node_list, IdentifierResolveScope & scope); - - void initializeQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node, IdentifierResolveScope & scope); - - void initializeTableExpressionData(const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope); - - void resolveTableFunction(QueryTreeNodePtr & table_function_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor, bool nested_table_function); - - void resolveArrayJoin(QueryTreeNodePtr & array_join_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor); - - void resolveJoin(QueryTreeNodePtr & join_node, IdentifierResolveScope & scope, 
QueryExpressionsAliasVisitor & expressions_visitor); - - void resolveQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor); - - void resolveQuery(const QueryTreeNodePtr & query_node, IdentifierResolveScope & scope); - - void resolveUnion(const QueryTreeNodePtr & union_node, IdentifierResolveScope & scope); - - /// Lambdas that are currently in resolve process - std::unordered_set lambdas_in_resolve_process; - - /// CTEs that are currently in resolve process - std::unordered_set ctes_in_resolve_process; - - /// Function name to user defined lambda map - std::unordered_map function_name_to_user_defined_lambda; - - /// Array join expressions counter - size_t array_join_expressions_counter = 1; - - /// Subquery counter - size_t subquery_counter = 1; - - /// Global expression node to projection name map - std::unordered_map node_to_projection_name; - - /// Global resolve expression node to projection names map - std::unordered_map resolved_expressions; - - /// Global resolve expression node to tree size - std::unordered_map node_to_tree_size; - - /// Global scalar subquery to scalar value map - std::unordered_map scalar_subquery_to_scalar_value_local; - std::unordered_map scalar_subquery_to_scalar_value_global; - - const bool only_analyze; -}; +QueryTreeNodePtr QueryAnalyzer::convertJoinedColumnTypeToNullIfNeeded( + const QueryTreeNodePtr & resolved_identifier, + const JoinKind & join_kind, + std::optional resolved_side, + IdentifierResolveScope & scope) +{ + if (resolved_identifier->getNodeType() == QueryTreeNodeType::COLUMN && + JoinCommon::canBecomeNullable(resolved_identifier->getResultType()) && + (isFull(join_kind) || + (isLeft(join_kind) && resolved_side && *resolved_side == JoinTableSide::Right) || + (isRight(join_kind) && resolved_side && *resolved_side == JoinTableSide::Left))) + { + auto nullable_resolved_identifier = resolved_identifier->clone(); + auto & resolved_column = nullable_resolved_identifier->as(); + auto new_result_type = makeNullableOrLowCardinalityNullable(resolved_column.getColumnType()); + resolved_column.setColumnType(new_result_type); + if (resolved_column.hasExpression()) + { + auto & resolved_expression = resolved_column.getExpression(); + if (!resolved_expression->getResultType()->equals(*new_result_type)) + resolved_expression = buildCastFunction(resolved_expression, new_result_type, scope.context, true); + } + if (!nullable_resolved_identifier->isEqual(*resolved_identifier)) + scope.join_columns_with_changed_types[nullable_resolved_identifier] = resolved_identifier; + return nullable_resolved_identifier; + } + return nullptr; +} /// Utility functions implementation - bool QueryAnalyzer::isExpressionNodeType(QueryTreeNodeType node_type) { return node_type == QueryTreeNodeType::CONSTANT || node_type == QueryTreeNodeType::COLUMN || node_type == QueryTreeNodeType::FUNCTION @@ -1862,7 +513,7 @@ void QueryAnalyzer::collectCompoundExpressionValidIdentifiersForTypoCorrection( void QueryAnalyzer::collectTableExpressionValidIdentifiersForTypoCorrection( const Identifier & unresolved_identifier, const QueryTreeNodePtr & table_expression, - const TableExpressionData & table_expression_data, + const AnalysisTableExpressionData & table_expression_data, std::unordered_set & valid_identifiers_result) { for (const auto & [column_name, column_node] : table_expression_data.column_name_to_column_node) @@ -3118,7 +1769,7 @@ bool QueryAnalyzer::tryBindIdentifierToTableExpressions(const 
IdentifierLookup & QueryTreeNodePtr QueryAnalyzer::tryResolveIdentifierFromStorage( const Identifier & identifier, const QueryTreeNodePtr & table_expression_node, - const TableExpressionData & table_expression_data, + const AnalysisTableExpressionData & table_expression_data, IdentifierResolveScope & scope, size_t identifier_column_qualifier_parts, bool can_be_not_found) @@ -4389,7 +3040,7 @@ QueryAnalyzer::QueryTreeNodesWithNames QueryAnalyzer::getMatchedColumnNodesWithN /** Use resolved columns from table expression data in nearest query scope if available. * It is important for ALIAS columns to use column nodes with resolved ALIAS expression. */ - const TableExpressionData * table_expression_data = nullptr; + const AnalysisTableExpressionData * table_expression_data = nullptr; const auto * nearest_query_scope = scope.getNearestQueryScope(); if (nearest_query_scope) table_expression_data = &nearest_query_scope->getTableExpressionDataOrThrow(table_expression_node); @@ -7142,7 +5793,7 @@ void QueryAnalyzer::initializeTableExpressionData(const QueryTreeNodePtr & table if (table_expression_data_it != scope.table_expression_node_to_data.end()) return; - TableExpressionData table_expression_data; + AnalysisTableExpressionData table_expression_data; if (table_node) { @@ -8462,19 +7113,3 @@ void QueryAnalyzer::resolveUnion(const QueryTreeNodePtr & union_node, Identifier } } - -QueryAnalysisPass::QueryAnalysisPass(QueryTreeNodePtr table_expression_, bool only_analyze_) - : table_expression(std::move(table_expression_)) - , only_analyze(only_analyze_) -{} - -QueryAnalysisPass::QueryAnalysisPass(bool only_analyze_) : only_analyze(only_analyze_) {} - -void QueryAnalysisPass::run(QueryTreeNodePtr & query_tree_node, ContextPtr context) -{ - QueryAnalyzer analyzer(only_analyze); - analyzer.resolve(query_tree_node, table_expression, context); - createUniqueTableAliases(query_tree_node, table_expression, context); -} - -} diff --git a/src/Analyzer/Resolve/QueryAnalyzer.h b/src/Analyzer/Resolve/QueryAnalyzer.h new file mode 100644 index 00000000000..e2c4c8df46b --- /dev/null +++ b/src/Analyzer/Resolve/QueryAnalyzer.h @@ -0,0 +1,378 @@ +#pragma once + +#include +#include +#include +#include + +#include +#include + +#include + +namespace DB +{ + +struct GetColumnsOptions; +struct IdentifierResolveScope; +struct AnalysisTableExpressionData; +class QueryExpressionsAliasVisitor ; + +class QueryNode; +class JoinNode; +class ColumnNode; + +using ProjectionName = String; +using ProjectionNames = std::vector; + +struct Settings; + +/** Query analyzer implementation overview. Please check documentation in QueryAnalysisPass.h first. + * And additional documentation for each method, where special cases are described in detail. + * + * Each node in query must be resolved. For each query tree node resolved state is specific. + * + * For constant node no resolve process exists, it is resolved during construction. + * + * For table node no resolve process exists, it is resolved during construction. + * + * For function node to be resolved parameters and arguments must be resolved, function node must be initialized with concrete aggregate or + * non aggregate function and with result type. + * + * For lambda node there can be 2 different cases. + * 1. Standalone: WITH (x -> x + 1) AS lambda SELECT lambda(1); Such lambdas are inlined in query tree during query analysis pass. + * 2. 
Function arguments: WITH (x -> x + 1) AS lambda SELECT arrayMap(lambda, [1, 2, 3]); For such lambdas, resolution must
+  * set concrete lambda arguments (initially they are identifier nodes) and resolve the lambda expression body.
+  *
+  * For a query node, the resolve process must resolve all its inner nodes.
+  *
+  * For a matcher node, the resolve process must replace it with the matched nodes.
+  *
+  * For an identifier node, the resolve process must replace it with a concrete non-identifier node. This part is the most complex,
+  * because for identifier resolution, scopes and the identifier lookup context play an important part.
+  *
+  * ClickHouse SQL supports lexical scoping for identifier resolution. A scope can be defined by a query node or by an expression node.
+  * Expression nodes that can define a scope are lambdas and table ALIAS columns.
+  *
+  * The identifier lookup context can be expression, function, or table.
+  *
+  * Example: WITH (x -> x + 1) as func SELECT func() FROM func; During resolution of the function `func`, identifier lookup is performed
+  * in function context.
+  *
+  * If there is no information about the identifier context, the rules are as follows:
+  * 1. Try to resolve the identifier in expression context.
+  * 2. Try to resolve the identifier in function context, if it is allowed. Example: SELECT func(arguments); Here the func identifier cannot be resolved in expression context,
+  * because the query projection does not support that.
+  * 3. Try to resolve the identifier in table context, if it is allowed. Example: SELECT table; Here the table identifier cannot be resolved in expression or function context,
+  * because the query projection does not support that.
+  *
+  * TODO: This was not supported properly before, because matchers could not be resolved from aliases.
+  *
+  * Identifiers are resolved with the following rules:
+  * Resolution starts with the current scope.
+  * 1. Try to resolve the identifier from expression scope arguments. Lambda expression arguments have the greatest priority.
+  * 2. Try to resolve the identifier from aliases.
+  * 3. Try to resolve the identifier from the join tree, if the scope is a query, or if there are registered table columns in the scope.
+  * Steps 2 and 3 can be swapped using the prefer_column_name_to_alias setting.
+  * 4. If it is a table lookup, try to resolve the identifier from CTEs.
+  * If the identifier could not be resolved in the current scope, resolution must be continued in parent scopes:
+  * 5. Try to resolve the identifier from parent scopes.
+  *
+  * Additional rules about aliases and scopes:
+  * 1. A parent scope cannot refer to an alias from a child scope.
+  * 2. A child scope can refer to an alias in a parent scope.
+  *
+  * Example: SELECT arrayMap(x -> x + 1 AS a, [1,2,3]), a; Identifier a is unknown in the parent scope.
+  * Example: SELECT a FROM (SELECT 1 as a); Here we do not refer to alias a from the child query scope. But we query its projection result, similar to tables.
+  * Example: WITH 1 as a SELECT (SELECT a) as b; Here, in the child scope, identifier a is resolved using the alias from the parent scope.
+  *
+  * Additional rules about identifier binding.
+  * Binding an identifier to an entity means that the identifier's first part matches some node during analysis.
+  * If the other parts of the identifier cannot be resolved in that node, an exception must be thrown.
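+  * That is, binding is decisive: once the first part of an identifier binds to some entity, resolution must not fall back
+  * to other identifier sources, even if the remaining parts fail to resolve from that entity.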
+
+  * Example:
+  * CREATE TABLE test_table (id UInt64, compound_value Tuple(value UInt64)) ENGINE=TinyLog;
+  * SELECT compound_value.value, 1 AS compound_value FROM test_table;
+  * The identifier's first part, compound_value, is bound to the entity with alias compound_value, but the nested identifier part cannot be resolved from that entity;
+  * lookup must not be continued, and an exception must be thrown, because if lookup continued, the identifier could be resolved from the join tree.
+  *
+  * TODO: This was not supported properly before the analyzer, because a nested identifier could not be resolved from an alias.
+  *
+  * More complex example:
+  * CREATE TABLE test_table (id UInt64, value UInt64) ENGINE=TinyLog;
+  * WITH cast(('Value'), 'Tuple(value UInt64)') AS value SELECT (SELECT value FROM test_table);
+  * The identifier's first part, value, is bound to the test_table column value, but the nested identifier part cannot be resolved from it;
+  * lookup must not be continued, and an exception must be thrown, because if lookup continued, the identifier could be resolved from the parent scope.
+  *
+  * TODO: Update exception messages
+  * TODO: Table identifiers with optional UUID.
+  * TODO: Lookup functions arrayReduce(sum, [1, 2, 3]);
+  * TODO: Support function identifier resolution from the parent query scope, if the lambda in the parent scope does not capture any columns.
+  */
+
+class QueryAnalyzer
+{
+public:
+    explicit QueryAnalyzer(bool only_analyze_);
+    ~QueryAnalyzer();
+
+    void resolve(QueryTreeNodePtr & node, const QueryTreeNodePtr & table_expression, ContextPtr context);
+
+private:
+    /// Utility functions
+
+    static bool isExpressionNodeType(QueryTreeNodeType node_type);
+
+    static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type);
+
+    static bool isSubqueryNodeType(QueryTreeNodeType node_type);
+
+    static bool isTableExpressionNodeType(QueryTreeNodeType node_type);
+
+    static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node);
+
+    static ProjectionName calculateFunctionProjectionName(const QueryTreeNodePtr & function_node,
+        const ProjectionNames & parameters_projection_names,
+        const ProjectionNames & arguments_projection_names);
+
+    static ProjectionName calculateWindowProjectionName(const QueryTreeNodePtr & window_node,
+        const QueryTreeNodePtr & parent_window_node,
+        const String & parent_window_name,
+        const ProjectionNames & partition_by_projection_names,
+        const ProjectionNames & order_by_projection_names,
+        const ProjectionName & frame_begin_offset_projection_name,
+        const ProjectionName & frame_end_offset_projection_name);
+
+    static ProjectionName calculateSortColumnProjectionName(const QueryTreeNodePtr & sort_column_node,
+        const ProjectionName & sort_expression_projection_name,
+        const ProjectionName & fill_from_expression_projection_name,
+        const ProjectionName & fill_to_expression_projection_name,
+        const ProjectionName & fill_step_expression_projection_name);
+
+    static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
+        const DataTypePtr & compound_expression_type,
+        const Identifier & valid_identifier_prefix,
+        std::unordered_set<Identifier> & valid_identifiers_result);
+
+    static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
+        const QueryTreeNodePtr & table_expression,
+        const AnalysisTableExpressionData & table_expression_data,
+        std::unordered_set<Identifier> & valid_identifiers_result);
+
+    static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
+        const
IdentifierResolveScope & scope, + bool allow_expression_identifiers, + bool allow_function_identifiers, + bool allow_table_expression_identifiers, + std::unordered_set & valid_identifiers_result); + + static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier, + const IdentifierResolveScope & scope, + bool allow_expression_identifiers, + bool allow_function_identifiers, + bool allow_table_expression_identifiers, + std::unordered_set & valid_identifiers_result); + + static std::vector collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set & valid_identifiers); + + static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path); + + QueryTreeNodePtr tryGetLambdaFromSQLUserDefinedFunctions(const std::string & function_name, ContextPtr context); + + void evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & query_tree_node, IdentifierResolveScope & scope); + + static void mergeWindowWithParentWindow(const QueryTreeNodePtr & window_node, const QueryTreeNodePtr & parent_window_node, IdentifierResolveScope & scope); + + void replaceNodesWithPositionalArguments(QueryTreeNodePtr & node_list, const QueryTreeNodes & projection_nodes, IdentifierResolveScope & scope); + + static void convertLimitOffsetExpression(QueryTreeNodePtr & expression_node, const String & expression_description, IdentifierResolveScope & scope); + + static void validateTableExpressionModifiers(const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope); + + static void validateJoinTableExpressionWithoutAlias(const QueryTreeNodePtr & join_node, const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope); + + static void checkDuplicateTableNamesOrAlias(const QueryTreeNodePtr & join_node, QueryTreeNodePtr & left_table_expr, QueryTreeNodePtr & right_table_expr, IdentifierResolveScope & scope); + + static std::pair recursivelyCollectMaxOrdinaryExpressions(QueryTreeNodePtr & node, QueryTreeNodes & into); + + static void expandGroupByAll(QueryNode & query_tree_node_typed); + + void expandOrderByAll(QueryNode & query_tree_node_typed, const Settings & settings); + + static std::string + rewriteAggregateFunctionNameIfNeeded(const std::string & aggregate_function_name, NullsAction action, const ContextPtr & context); + + static std::optional getColumnSideFromJoinTree(const QueryTreeNodePtr & resolved_identifier, const JoinNode & join_node); + + static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded( + const QueryTreeNodePtr & resolved_identifier, + const JoinKind & join_kind, + std::optional resolved_side, + IdentifierResolveScope & scope); + + /// Resolve identifier functions + + static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context); + + QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier, + size_t identifier_bind_size, + const QueryTreeNodePtr & compound_expression, + String compound_expression_source, + IdentifierResolveScope & scope, + bool can_be_not_found = false); + + QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope); + + static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveIdentifierFromAliases(const IdentifierLookup & identifier_lookup, + 
IdentifierResolveScope & scope, + IdentifierResolveSettings identifier_resolve_settings); + + QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope); + + static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup, + const QueryTreeNodePtr & table_expression_node, + const IdentifierResolveScope & scope); + + static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup, + const QueryTreeNodePtr & table_expression_node, + const IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup, + const QueryTreeNodePtr & table_expression_node, + IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup, + const QueryTreeNodePtr & table_expression_node, + IdentifierResolveScope & scope); + + QueryTreeNodePtr matchArrayJoinSubcolumns( + const QueryTreeNodePtr & array_join_column_inner_expression, + const ColumnNode & array_join_column_expression_typed, + const QueryTreeNodePtr & resolved_expression, + IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression, + const QueryTreeNodePtr & table_expression_node, + IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup, + const QueryTreeNodePtr & table_expression_node, + IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup, + const QueryTreeNodePtr & join_tree_node, + IdentifierResolveScope & scope); + + QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup, + IdentifierResolveScope & scope); + + IdentifierResolveResult tryResolveIdentifierInParentScopes(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope); + + IdentifierResolveResult tryResolveIdentifier(const IdentifierLookup & identifier_lookup, + IdentifierResolveScope & scope, + IdentifierResolveSettings identifier_resolve_settings = {}); + + QueryTreeNodePtr tryResolveIdentifierFromStorage( + const Identifier & identifier, + const QueryTreeNodePtr & table_expression_node, + const AnalysisTableExpressionData & table_expression_data, + IdentifierResolveScope & scope, + size_t identifier_column_qualifier_parts, + bool can_be_not_found = false); + + /// Resolve query tree nodes functions + + void qualifyColumnNodesWithProjectionNames(const QueryTreeNodes & column_nodes, + const QueryTreeNodePtr & table_expression_node, + const IdentifierResolveScope & scope); + + static GetColumnsOptions buildGetColumnsOptions(QueryTreeNodePtr & matcher_node, const ContextPtr & context); + + using QueryTreeNodesWithNames = std::vector>; + + QueryTreeNodesWithNames getMatchedColumnNodesWithNames(const QueryTreeNodePtr & matcher_node, + const QueryTreeNodePtr & table_expression_node, + const NamesAndTypes & matched_columns, + const IdentifierResolveScope & scope); + + void updateMatchedColumnsFromJoinUsing(QueryTreeNodesWithNames & result_matched_column_nodes_with_names, const QueryTreeNodePtr & source_table_expression, IdentifierResolveScope & scope); + + QueryTreeNodesWithNames resolveQualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope); + + QueryTreeNodesWithNames resolveUnqualifiedMatcher(QueryTreeNodePtr & 
matcher_node, IdentifierResolveScope & scope); + + ProjectionNames resolveMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope); + + ProjectionName resolveWindow(QueryTreeNodePtr & window_node, IdentifierResolveScope & scope); + + ProjectionNames resolveLambda(const QueryTreeNodePtr & lambda_node, + const QueryTreeNodePtr & lambda_node_to_resolve, + const QueryTreeNodes & lambda_arguments, + IdentifierResolveScope & scope); + + ProjectionNames resolveFunction(QueryTreeNodePtr & function_node, IdentifierResolveScope & scope); + + ProjectionNames resolveExpressionNode(QueryTreeNodePtr & node, IdentifierResolveScope & scope, bool allow_lambda_expression, bool allow_table_expression, bool ignore_alias = false); + + ProjectionNames resolveExpressionNodeList(QueryTreeNodePtr & node_list, IdentifierResolveScope & scope, bool allow_lambda_expression, bool allow_table_expression); + + ProjectionNames resolveSortNodeList(QueryTreeNodePtr & sort_node_list, IdentifierResolveScope & scope); + + void resolveGroupByNode(QueryNode & query_node_typed, IdentifierResolveScope & scope); + + void resolveInterpolateColumnsNodeList(QueryTreeNodePtr & interpolate_node_list, IdentifierResolveScope & scope); + + void resolveWindowNodeList(QueryTreeNodePtr & window_node_list, IdentifierResolveScope & scope); + + NamesAndTypes resolveProjectionExpressionNodeList(QueryTreeNodePtr & projection_node_list, IdentifierResolveScope & scope); + + void initializeQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node, IdentifierResolveScope & scope); + + void initializeTableExpressionData(const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope); + + void resolveTableFunction(QueryTreeNodePtr & table_function_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor, bool nested_table_function); + + void resolveArrayJoin(QueryTreeNodePtr & array_join_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor); + + void resolveJoin(QueryTreeNodePtr & join_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor); + + void resolveQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor); + + void resolveQuery(const QueryTreeNodePtr & query_node, IdentifierResolveScope & scope); + + void resolveUnion(const QueryTreeNodePtr & union_node, IdentifierResolveScope & scope); + + /// Lambdas that are currently in resolve process + std::unordered_set lambdas_in_resolve_process; + + /// CTEs that are currently in resolve process + std::unordered_set ctes_in_resolve_process; + + /// Function name to user defined lambda map + std::unordered_map function_name_to_user_defined_lambda; + + /// Array join expressions counter + size_t array_join_expressions_counter = 1; + + /// Subquery counter + size_t subquery_counter = 1; + + /// Global expression node to projection name map + std::unordered_map node_to_projection_name; + + /// Global resolve expression node to projection names map + std::unordered_map resolved_expressions; + + /// Global resolve expression node to tree size + std::unordered_map node_to_tree_size; + + /// Global scalar subquery to scalar value map + std::unordered_map scalar_subquery_to_scalar_value_local; + std::unordered_map scalar_subquery_to_scalar_value_global; + + const bool only_analyze; +}; + +} diff --git a/src/Analyzer/Resolve/QueryExpressionsAliasVisitor.h b/src/Analyzer/Resolve/QueryExpressionsAliasVisitor.h 
new file mode 100644
index 00000000000..45d081e34ea
--- /dev/null
+++ b/src/Analyzer/Resolve/QueryExpressionsAliasVisitor.h
@@ -0,0 +1,119 @@
+#pragma once
+
+#include
+#include
+#include
+
+namespace DB
+{
+
+/** Visitor that extracts expression and function aliases from a node and initializes the scope tables with them.
+  * Does not go into child lambdas and queries.
+  *
+  * Important:
+  * Identifier nodes with aliases are added both to the alias-to-expression and the alias-to-function map.
+  *
+  * This is necessary because an identifier with an alias can give its alias name to any query tree node.
+  *
+  * Example:
+  * WITH (x -> x + 1) AS id, id AS value SELECT value(1);
+  * In this example, id AS value is an identifier node that has an alias; during scope initialization we cannot derive
+  * whether id is actually a lambda or an expression.
+  *
+  * There is no easy solution here without trying to do full-featured expression resolution at this stage.
+  * Example:
+  * WITH (x -> x + 1) AS id, id AS id_1, id_1 AS id_2 SELECT id_2(1);
+  * Example: SELECT a, b AS a, b AS c, 1 AS c;
+  *
+  * It is the client's responsibility, after resolving an identifier node with an alias, to take the following actions:
+  * 1. If the identifier node was resolved in function scope, remove the alias from the scope expression map.
+  * 2. If the identifier node was resolved in expression scope, remove the alias from the scope function map.
+  *
+  * That way we separate alias map initialization and expression resolution.
+  */
+class QueryExpressionsAliasVisitor : public InDepthQueryTreeVisitor<QueryExpressionsAliasVisitor>
+{
+public:
+    explicit QueryExpressionsAliasVisitor(ScopeAliases & aliases_)
+        : aliases(aliases_)
+    {}
+
+    void visitImpl(QueryTreeNodePtr & node)
+    {
+        updateAliasesIfNeeded(node, false /*is_lambda_node*/);
+    }
+
+    bool needChildVisit(const QueryTreeNodePtr &, const QueryTreeNodePtr & child)
+    {
+        if (auto * lambda_node = child->as<LambdaNode>())
+        {
+            updateAliasesIfNeeded(child, true /*is_lambda_node*/);
+            return false;
+        }
+        else if (auto * query_tree_node = child->as<QueryNode>())
+        {
+            if (query_tree_node->isCTE())
+                return false;
+
+            updateAliasesIfNeeded(child, false /*is_lambda_node*/);
+            return false;
+        }
+        else if (auto * union_node = child->as<UnionNode>())
+        {
+            if (union_node->isCTE())
+                return false;
+
+            updateAliasesIfNeeded(child, false /*is_lambda_node*/);
+            return false;
+        }
+
+        return true;
+    }
+private:
+    void addDuplicatingAlias(const QueryTreeNodePtr & node)
+    {
+        aliases.nodes_with_duplicated_aliases.emplace(node);
+        auto cloned_node = node->clone();
+        aliases.cloned_nodes_with_duplicated_aliases.emplace_back(cloned_node);
+        aliases.nodes_with_duplicated_aliases.emplace(cloned_node);
+    }
+
+    void updateAliasesIfNeeded(const QueryTreeNodePtr & node, bool is_lambda_node)
+    {
+        if (!node->hasAlias())
+            return;
+
+        // We should not resolve expressions to WindowNode
+        if (node->getNodeType() == QueryTreeNodeType::WINDOW)
+            return;
+
+        const auto & alias = node->getAlias();
+
+        if (is_lambda_node)
+        {
+            if (aliases.alias_name_to_expression_node->contains(alias))
+                addDuplicatingAlias(node);
+
+            auto [_, inserted] = aliases.alias_name_to_lambda_node.insert(std::make_pair(alias, node));
+            if (!inserted)
+                addDuplicatingAlias(node);
+
+            return;
+        }
+
+        if (aliases.alias_name_to_lambda_node.contains(alias))
+            addDuplicatingAlias(node);
+
+        auto [_, inserted] = aliases.alias_name_to_expression_node->insert(std::make_pair(alias, node));
+        if (!inserted)
+            addDuplicatingAlias(node);
+
+        /// If the node is an identifier, put it into the transitive aliases map.
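+        /// Transitive aliases let ScopeAliases::find follow chains of identifier aliases
+        /// (e.g. WITH ... AS id, id AS id_1, id_1 AS id_2) when the direct alias map lookup misses.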
+        if (const auto * identifier = typeid_cast<const IdentifierNode *>(node.get()))
+            aliases.transitive_aliases.insert(std::make_pair(alias, identifier->getIdentifier()));
+    }
+
+    ScopeAliases & aliases;
+};
+
+}
diff --git a/src/Analyzer/Resolve/ScopeAliases.h b/src/Analyzer/Resolve/ScopeAliases.h
new file mode 100644
index 00000000000..baab843988b
--- /dev/null
+++ b/src/Analyzer/Resolve/ScopeAliases.h
@@ -0,0 +1,91 @@
+#pragma once
+
+#include
+#include
+
+namespace DB
+{
+
+struct ScopeAliases
+{
+    /// Alias name to query expression node
+    std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_expression_node_before_group_by;
+    std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_expression_node_after_group_by;
+
+    std::unordered_map<std::string, QueryTreeNodePtr> * alias_name_to_expression_node = nullptr;
+
+    /// Alias name to lambda node
+    std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_lambda_node;
+
+    /// Alias name to table expression node
+    std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_table_expression_node;
+
+    /// Expressions like `x AS y`, where we can't say whether it's a function, expression or table.
+    std::unordered_map<std::string, Identifier> transitive_aliases;
+
+    /// Nodes with duplicated aliases
+    std::unordered_set<QueryTreeNodePtr> nodes_with_duplicated_aliases;
+    std::vector<QueryTreeNodePtr> cloned_nodes_with_duplicated_aliases;
+
+    /// Names which are aliases from ARRAY JOIN.
+    /// This is needed to properly qualify columns from matchers and avoid name collisions.
+    std::unordered_set<std::string> array_join_aliases;
+
+    std::unordered_map<std::string, QueryTreeNodePtr> & getAliasMap(IdentifierLookupContext lookup_context)
+    {
+        switch (lookup_context)
+        {
+            case IdentifierLookupContext::EXPRESSION: return *alias_name_to_expression_node;
+            case IdentifierLookupContext::FUNCTION: return alias_name_to_lambda_node;
+            case IdentifierLookupContext::TABLE_EXPRESSION: return alias_name_to_table_expression_node;
+        }
+    }
+
+    enum class FindOption
+    {
+        FIRST_NAME,
+        FULL_NAME,
+    };
+
+    const std::string & getKey(const Identifier & identifier, FindOption find_option)
+    {
+        switch (find_option)
+        {
+            case FindOption::FIRST_NAME: return identifier.front();
+            case FindOption::FULL_NAME: return identifier.getFullName();
+        }
+    }
+
+    QueryTreeNodePtr * find(IdentifierLookup lookup, FindOption find_option)
+    {
+        auto & alias_map = getAliasMap(lookup.lookup_context);
+        const std::string * key = &getKey(lookup.identifier, find_option);
+
+        auto it = alias_map.find(*key);
+
+        if (it != alias_map.end())
+            return &it->second;
+
+        if (lookup.lookup_context == IdentifierLookupContext::TABLE_EXPRESSION)
+            return {};
+
+        /// Follow transitive aliases (alias -> identifier) until the lookup hits the alias map or the chain ends.
+        while (it == alias_map.end())
+        {
+            auto jt = transitive_aliases.find(*key);
+            if (jt == transitive_aliases.end())
+                return {};
+
+            key = &(getKey(jt->second, find_option));
+            it = alias_map.find(*key);
+        }
+
+        return &it->second;
+    }
+
+    const QueryTreeNodePtr * find(IdentifierLookup lookup, FindOption find_option) const
+    {
+        return const_cast<ScopeAliases *>(this)->find(lookup, find_option);
+    }
+};
+
+}
diff --git a/src/Analyzer/Resolve/TableExpressionData.h b/src/Analyzer/Resolve/TableExpressionData.h
new file mode 100644
index 00000000000..18cbfa32366
--- /dev/null
+++ b/src/Analyzer/Resolve/TableExpressionData.h
@@ -0,0 +1,83 @@
+#pragma once
+
+#include
+#include
+
+namespace DB
+{
+
+struct StringTransparentHash
+{
+    using is_transparent = void;
+    using hash = std::hash<std::string_view>;
+
+    [[maybe_unused]] size_t operator()(const char * data) const
+    {
+        return hash()(data);
+    }
+
+    size_t operator()(std::string_view data) const
+    {
+        return hash()(data);
+    }
+
+    size_t operator()(const std::string & data) const
+    {
+        return hash()(data);
+    }
+};
+
+using ColumnNameToColumnNodeMap = std::unordered_map<std::string, ColumnNodePtr, StringTransparentHash, std::equal_to<>>;
+
+struct
AnalysisTableExpressionData
+{
+    std::string table_expression_name;
+    std::string table_expression_description;
+    std::string database_name;
+    std::string table_name;
+    bool should_qualify_columns = true;
+    NamesAndTypes column_names_and_types;
+    ColumnNameToColumnNodeMap column_name_to_column_node;
+    std::unordered_set<std::string> subcolumn_names; /// Subset of columns that are subcolumns of other columns
+    std::unordered_set<std::string, StringTransparentHash, std::equal_to<>> column_identifier_first_parts;
+
+    bool hasFullIdentifierName(IdentifierView identifier_view) const
+    {
+        return column_name_to_column_node.contains(identifier_view.getFullName());
+    }
+
+    bool canBindIdentifier(IdentifierView identifier_view) const
+    {
+        return column_identifier_first_parts.contains(identifier_view.at(0));
+    }
+
+    [[maybe_unused]] void dump(WriteBuffer & buffer) const
+    {
+        buffer << "Table expression name " << table_expression_name;
+
+        if (!table_expression_description.empty())
+            buffer << " table expression description " << table_expression_description;
+
+        if (!database_name.empty())
+            buffer << " database name " << database_name;
+
+        if (!table_name.empty())
+            buffer << " table name " << table_name;
+
+        buffer << " should qualify columns " << should_qualify_columns;
+        buffer << " columns size " << column_name_to_column_node.size() << '\n';
+
+        for (const auto & [column_name, column_node] : column_name_to_column_node)
+            buffer << "Column name " << column_name << " column node " << column_node->dumpTree() << '\n';
+    }
+
+    [[maybe_unused]] String dump() const
+    {
+        WriteBufferFromOwnString buffer;
+        dump(buffer);
+
+        return buffer.str();
+    }
+};
+
+}
diff --git a/src/Analyzer/Resolve/TableExpressionsAliasVisitor.h b/src/Analyzer/Resolve/TableExpressionsAliasVisitor.h
new file mode 100644
index 00000000000..cab79806465
--- /dev/null
+++ b/src/Analyzer/Resolve/TableExpressionsAliasVisitor.h
@@ -0,0 +1,71 @@
+#pragma once
+
+#include
+#include
+#include
+#include
+
+namespace DB
+{
+
+namespace ErrorCodes
+{
+    extern const int MULTIPLE_EXPRESSIONS_FOR_ALIAS;
+}
+
+class TableExpressionsAliasVisitor : public InDepthQueryTreeVisitor<TableExpressionsAliasVisitor>
+{
+public:
+    explicit TableExpressionsAliasVisitor(IdentifierResolveScope & scope_)
+        : scope(scope_)
+    {}
+
+    void visitImpl(QueryTreeNodePtr & node)
+    {
+        updateAliasesIfNeeded(node);
+    }
+
+    static bool needChildVisit(const QueryTreeNodePtr & node, const QueryTreeNodePtr & child)
+    {
+        auto node_type = node->getNodeType();
+
+        switch (node_type)
+        {
+            case QueryTreeNodeType::ARRAY_JOIN:
+            {
+                const auto & array_join_node = node->as<ArrayJoinNode &>();
+                return child.get() == array_join_node.getTableExpression().get();
+            }
+            case QueryTreeNodeType::JOIN:
+            {
+                const auto & join_node = node->as<JoinNode &>();
+                return child.get() == join_node.getLeftTableExpression().get() || child.get() == join_node.getRightTableExpression().get();
+            }
+            default:
+            {
+                break;
+            }
+        }
+
+        return false;
+    }
+
+private:
+    void updateAliasesIfNeeded(const QueryTreeNodePtr & node)
+    {
+        if (!node->hasAlias())
+            return;
+
+        const auto & node_alias = node->getAlias();
+        auto [_, inserted] = scope.aliases.alias_name_to_table_expression_node.emplace(node_alias, node);
+        if (!inserted)
+            throw Exception(ErrorCodes::MULTIPLE_EXPRESSIONS_FOR_ALIAS,
+                "Multiple table expressions with same alias {}.
In scope {}", + node_alias, + scope.scope_node->formatASTForErrorMessage()); + } + + IdentifierResolveScope & scope; +}; + +} diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index f2e10a27b75..2b5078111ee 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -215,6 +215,7 @@ add_object_library(clickhouse_databases_mysql Databases/MySQL) add_object_library(clickhouse_disks Disks) add_object_library(clickhouse_analyzer Analyzer) add_object_library(clickhouse_analyzer_passes Analyzer/Passes) +add_object_library(clickhouse_analyzer_passes Analyzer/Resolve) add_object_library(clickhouse_planner Planner) add_object_library(clickhouse_interpreters Interpreters) add_object_library(clickhouse_interpreters_cache Interpreters/Cache) diff --git a/src/Columns/ColumnSparse.cpp b/src/Columns/ColumnSparse.cpp index 2e75a2fd4ab..cecd956fb95 100644 --- a/src/Columns/ColumnSparse.cpp +++ b/src/Columns/ColumnSparse.cpp @@ -322,7 +322,9 @@ ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const size_t res_offset = 0; auto offset_it = begin(); - for (size_t i = 0; i < _size; ++i, ++offset_it) + /// Replace the `++offset_it` with `offset_it.increaseCurrentRow()` and `offset_it.increaseCurrentOffset()`, + /// to remove the redundant `isDefault()` in `++` of `Interator` and reuse the following `isDefault()`. + for (size_t i = 0; i < _size; ++i, offset_it.increaseCurrentRow()) { if (!offset_it.isDefault()) { @@ -337,6 +339,7 @@ ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const { values_filter.push_back(0); } + offset_it.increaseCurrentOffset(); } else { diff --git a/src/Columns/ColumnSparse.h b/src/Columns/ColumnSparse.h index 7d3200da35f..12b2def7cf1 100644 --- a/src/Columns/ColumnSparse.h +++ b/src/Columns/ColumnSparse.h @@ -181,14 +181,16 @@ public: { public: Iterator(const PaddedPODArray & offsets_, size_t size_, size_t current_offset_, size_t current_row_) - : offsets(offsets_), size(size_), current_offset(current_offset_), current_row(current_row_) + : offsets(offsets_), offsets_size(offsets.size()), size(size_), current_offset(current_offset_), current_row(current_row_) { } - bool ALWAYS_INLINE isDefault() const { return current_offset == offsets.size() || current_row != offsets[current_offset]; } + bool ALWAYS_INLINE isDefault() const { return current_offset == offsets_size || current_row != offsets[current_offset]; } size_t ALWAYS_INLINE getValueIndex() const { return isDefault() ? 
0 : current_offset + 1; } size_t ALWAYS_INLINE getCurrentRow() const { return current_row; } size_t ALWAYS_INLINE getCurrentOffset() const { return current_offset; } + size_t ALWAYS_INLINE increaseCurrentRow() { return ++current_row; } + size_t ALWAYS_INLINE increaseCurrentOffset() { return ++current_offset; } bool operator==(const Iterator & other) const { @@ -209,6 +211,7 @@ public: private: const PaddedPODArray & offsets; + const size_t offsets_size; const size_t size; size_t current_offset; size_t current_row; diff --git a/src/Common/CurrentMetrics.cpp b/src/Common/CurrentMetrics.cpp index e73ac307a35..731c72d65f2 100644 --- a/src/Common/CurrentMetrics.cpp +++ b/src/Common/CurrentMetrics.cpp @@ -127,6 +127,9 @@ M(DestroyAggregatesThreads, "Number of threads in the thread pool for destroy aggregate states.") \ M(DestroyAggregatesThreadsActive, "Number of threads in the thread pool for destroy aggregate states running a task.") \ M(DestroyAggregatesThreadsScheduled, "Number of queued or active jobs in the thread pool for destroy aggregate states.") \ + M(ConcurrentHashJoinPoolThreads, "Number of threads in the thread pool for concurrent hash join.") \ + M(ConcurrentHashJoinPoolThreadsActive, "Number of threads in the thread pool for concurrent hash join running a task.") \ + M(ConcurrentHashJoinPoolThreadsScheduled, "Number of queued or active jobs in the thread pool for concurrent hash join.") \ M(HashedDictionaryThreads, "Number of threads in the HashedDictionary thread pool.") \ M(HashedDictionaryThreadsActive, "Number of threads in the HashedDictionary thread pool running a task.") \ M(HashedDictionaryThreadsScheduled, "Number of queued or active jobs in the HashedDictionary thread pool.") \ @@ -174,6 +177,11 @@ M(ObjectStorageAzureThreads, "Number of threads in the AzureObjectStorage thread pool.") \ M(ObjectStorageAzureThreadsActive, "Number of threads in the AzureObjectStorage thread pool running a task.") \ M(ObjectStorageAzureThreadsScheduled, "Number of queued or active jobs in the AzureObjectStorage thread pool.") \ + \ + M(DiskPlainRewritableAzureDirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for AzureObjectStorage.") \ + M(DiskPlainRewritableLocalDirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for LocalObjectStorage.") \ + M(DiskPlainRewritableS3DirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for S3ObjectStorage.") \ + \ M(MergeTreePartsLoaderThreads, "Number of threads in the MergeTree parts loader thread pool.") \ M(MergeTreePartsLoaderThreadsActive, "Number of threads in the MergeTree parts loader thread pool running a task.") \ M(MergeTreePartsLoaderThreadsScheduled, "Number of queued or active jobs in the MergeTree parts loader thread pool.") \ diff --git a/src/Common/DateLUTImpl.cpp b/src/Common/DateLUTImpl.cpp index 392ee64dcbf..c87d44a4b95 100644 --- a/src/Common/DateLUTImpl.cpp +++ b/src/Common/DateLUTImpl.cpp @@ -41,7 +41,6 @@ UInt8 getDayOfWeek(const cctz::civil_day & date) case cctz::weekday::saturday: return 6; case cctz::weekday::sunday: return 7; } - UNREACHABLE(); } inline cctz::time_point lookupTz(const cctz::time_zone & cctz_time_zone, const cctz::civil_day & date) diff --git a/src/Common/IntervalKind.cpp b/src/Common/IntervalKind.cpp index 22c7db504c3..1548d5cf9a5 100644 --- a/src/Common/IntervalKind.cpp +++ b/src/Common/IntervalKind.cpp @@ -34,8 +34,6 @@ Int64 IntervalKind::toAvgNanoseconds() const default: 
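// The UNREACHABLE() removals in this and the following files share one rationale: when a
// switch over an enum class covers every enumerator and every case returns, the statement
// after the switch is dead code. Under Clang (which ClickHouse builds with), such a switch
// already counts as returning on all paths, and -Wswitch turns a newly added enumerator into
// a compile-time warning instead of a runtime abort. A minimal sketch with a hypothetical enum:
//
//     enum class Unit : uint8_t { Second, Minute };
//
//     int toSeconds(Unit u)
//     {
//         switch (u)  // no default: adding Unit::Hour without a case triggers -Wswitch
//         {
//             case Unit::Second: return 1;
//             case Unit::Minute: return 60;
//         }
//     }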
return toAvgSeconds() * NANOSECONDS_PER_SECOND; } - - UNREACHABLE(); } Int32 IntervalKind::toAvgSeconds() const @@ -54,7 +52,6 @@ Int32 IntervalKind::toAvgSeconds() const case IntervalKind::Kind::Quarter: return 7889238; /// Exactly 1/4 of a year. case IntervalKind::Kind::Year: return 31556952; /// The average length of a Gregorian year is equal to 365.2425 days } - UNREACHABLE(); } Float64 IntervalKind::toSeconds() const @@ -80,7 +77,6 @@ Float64 IntervalKind::toSeconds() const default: throw Exception(ErrorCodes::BAD_ARGUMENTS, "Not possible to get precise number of seconds in non-precise interval"); } - UNREACHABLE(); } bool IntervalKind::isFixedLength() const @@ -99,7 +95,6 @@ bool IntervalKind::isFixedLength() const case IntervalKind::Kind::Quarter: case IntervalKind::Kind::Year: return false; } - UNREACHABLE(); } IntervalKind IntervalKind::fromAvgSeconds(Int64 num_seconds) @@ -141,7 +136,6 @@ const char * IntervalKind::toKeyword() const case IntervalKind::Kind::Quarter: return "QUARTER"; case IntervalKind::Kind::Year: return "YEAR"; } - UNREACHABLE(); } @@ -161,7 +155,6 @@ const char * IntervalKind::toLowercasedKeyword() const case IntervalKind::Kind::Quarter: return "quarter"; case IntervalKind::Kind::Year: return "year"; } - UNREACHABLE(); } @@ -192,7 +185,6 @@ const char * IntervalKind::toDateDiffUnit() const case IntervalKind::Kind::Year: return "year"; } - UNREACHABLE(); } @@ -223,7 +215,6 @@ const char * IntervalKind::toNameOfFunctionToIntervalDataType() const case IntervalKind::Kind::Year: return "toIntervalYear"; } - UNREACHABLE(); } @@ -257,7 +248,6 @@ const char * IntervalKind::toNameOfFunctionExtractTimePart() const case IntervalKind::Kind::Year: return "toYear"; } - UNREACHABLE(); } diff --git a/src/Common/NamedCollections/NamedCollections.cpp b/src/Common/NamedCollections/NamedCollections.cpp index 2fe5ced5b36..74ce405f71d 100644 --- a/src/Common/NamedCollections/NamedCollections.cpp +++ b/src/Common/NamedCollections/NamedCollections.cpp @@ -12,8 +12,6 @@ namespace DB namespace ErrorCodes { - extern const int NAMED_COLLECTION_DOESNT_EXIST; - extern const int NAMED_COLLECTION_ALREADY_EXISTS; extern const int NAMED_COLLECTION_IS_IMMUTABLE; extern const int BAD_ARGUMENTS; } diff --git a/src/Common/NamedCollections/NamedCollectionsFactory.cpp b/src/Common/NamedCollections/NamedCollectionsFactory.cpp index d308b300eea..5b16a076811 100644 --- a/src/Common/NamedCollections/NamedCollectionsFactory.cpp +++ b/src/Common/NamedCollections/NamedCollectionsFactory.cpp @@ -313,4 +313,5 @@ void NamedCollectionFactory::updateFromSQL(const ASTAlterNamedCollectionQuery & for (const auto & key : query.delete_keys) collection->remove(key); } + } diff --git a/src/Common/ProfileEvents.cpp b/src/Common/ProfileEvents.cpp index 8c8e2163aad..9bb7bece0f0 100644 --- a/src/Common/ProfileEvents.cpp +++ b/src/Common/ProfileEvents.cpp @@ -417,6 +417,13 @@ The server successfully detected this situation and will download merged part fr M(DiskS3PutObject, "Number of DiskS3 API PutObject calls.") \ M(DiskS3GetObject, "Number of DiskS3 API GetObject calls.") \ \ + M(DiskPlainRewritableAzureDirectoryCreated, "Number of directories created by the 'plain_rewritable' metadata storage for AzureObjectStorage.") \ + M(DiskPlainRewritableAzureDirectoryRemoved, "Number of directories removed by the 'plain_rewritable' metadata storage for AzureObjectStorage.") \ + M(DiskPlainRewritableLocalDirectoryCreated, "Number of directories created by the 'plain_rewritable' metadata storage for LocalObjectStorage.") \ + 
M(DiskPlainRewritableLocalDirectoryRemoved, "Number of directories removed by the 'plain_rewritable' metadata storage for LocalObjectStorage.") \ + M(DiskPlainRewritableS3DirectoryCreated, "Number of directories created by the 'plain_rewritable' metadata storage for S3ObjectStorage.") \ + M(DiskPlainRewritableS3DirectoryRemoved, "Number of directories removed by the 'plain_rewritable' metadata storage for S3ObjectStorage.") \ + \ M(S3Clients, "Number of created S3 clients.") \ M(TinyS3Clients, "Number of S3 clients copies which reuse an existing auth provider from another client.") \ \ diff --git a/src/Common/TargetSpecific.cpp b/src/Common/TargetSpecific.cpp index 49f396c0926..8540c9a9986 100644 --- a/src/Common/TargetSpecific.cpp +++ b/src/Common/TargetSpecific.cpp @@ -54,8 +54,6 @@ String toString(TargetArch arch) case TargetArch::AMXTILE: return "amxtile"; case TargetArch::AMXINT8: return "amxint8"; } - - UNREACHABLE(); } } diff --git a/src/Common/ThreadProfileEvents.cpp b/src/Common/ThreadProfileEvents.cpp index 6a63d484cd9..23b41f23bde 100644 --- a/src/Common/ThreadProfileEvents.cpp +++ b/src/Common/ThreadProfileEvents.cpp @@ -75,7 +75,6 @@ const char * TasksStatsCounters::metricsProviderString(MetricsProvider provider) case MetricsProvider::Netlink: return "netlink"; } - UNREACHABLE(); } bool TasksStatsCounters::checkIfAvailable() diff --git a/src/Common/ZooKeeper/IKeeper.cpp b/src/Common/ZooKeeper/IKeeper.cpp index 7d2602bde1e..7cca262baca 100644 --- a/src/Common/ZooKeeper/IKeeper.cpp +++ b/src/Common/ZooKeeper/IKeeper.cpp @@ -146,8 +146,6 @@ const char * errorMessage(Error code) case Error::ZSESSIONMOVED: return "Session moved to another server, so operation is ignored"; case Error::ZNOTREADONLY: return "State-changing request is passed to read-only server"; } - - UNREACHABLE(); } bool isHardwareError(Error zk_return_code) diff --git a/src/Compression/CompressionCodecDeflateQpl.cpp b/src/Compression/CompressionCodecDeflateQpl.cpp index 7e0653c69f8..f1b5b24e866 100644 --- a/src/Compression/CompressionCodecDeflateQpl.cpp +++ b/src/Compression/CompressionCodecDeflateQpl.cpp @@ -466,7 +466,6 @@ void CompressionCodecDeflateQpl::doDecompressData(const char * source, UInt32 so sw_codec->doDecompressData(source, source_size, dest, uncompressed_size); return; } - UNREACHABLE(); } void CompressionCodecDeflateQpl::flushAsynchronousDecompressRequests() diff --git a/src/Compression/CompressionCodecDoubleDelta.cpp b/src/Compression/CompressionCodecDoubleDelta.cpp index e6e8db4c699..cbd8cd57a62 100644 --- a/src/Compression/CompressionCodecDoubleDelta.cpp +++ b/src/Compression/CompressionCodecDoubleDelta.cpp @@ -21,6 +21,11 @@ namespace DB { +namespace ErrorCodes +{ + extern const int BAD_ARGUMENTS; +} + /** NOTE DoubleDelta is surprisingly bad name. The only excuse is that it comes from an academic paper. * Most people will think that "double delta" is just applying delta transform twice. * But in fact it is something more than applying delta transform twice. 
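 * A worked example (illustrative only; the precise bit-level format is specified further
 * below in this file): for the series 60, 120, 180, 250 the first-order deltas are
 * 60, 60, 70 and the deltas-of-deltas are 0, 10. Regularly spaced sequences such as
 * timestamps therefore produce long runs of zero deltas-of-deltas, each of which the
 * codec can store in a single bit.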
@@ -142,9 +147,9 @@ namespace ErrorCodes { extern const int CANNOT_COMPRESS; extern const int CANNOT_DECOMPRESS; - extern const int BAD_ARGUMENTS; extern const int ILLEGAL_SYNTAX_FOR_CODEC_TYPE; extern const int ILLEGAL_CODEC_PARAMETER; + extern const int LOGICAL_ERROR; } namespace @@ -163,9 +168,8 @@ inline Int64 getMaxValueForByteSize(Int8 byte_size) case sizeof(UInt64): return std::numeric_limits::max(); default: - assert(false && "only 1, 2, 4 and 8 data sizes are supported"); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Only 1, 2, 4 and 8 data sizes are supported"); } - UNREACHABLE(); } struct WriteSpec diff --git a/src/Coordination/KeeperReconfiguration.cpp b/src/Coordination/KeeperReconfiguration.cpp index e3642913a7a..05211af6704 100644 --- a/src/Coordination/KeeperReconfiguration.cpp +++ b/src/Coordination/KeeperReconfiguration.cpp @@ -5,6 +5,12 @@ namespace DB { + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + ClusterUpdateActions joiningToClusterUpdates(const ClusterConfigPtr & cfg, std::string_view joining) { ClusterUpdateActions out; @@ -79,7 +85,7 @@ String serializeClusterConfig(const ClusterConfigPtr & cfg, const ClusterUpdateA new_config.emplace_back(RaftServerConfig{*cfg->get_server(priority->id)}); } else - UNREACHABLE(); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected update"); } for (const auto & item : cfg->get_servers()) diff --git a/src/Coordination/KeeperServer.cpp b/src/Coordination/KeeperServer.cpp index 8d21ce2ab01..736a01443ce 100644 --- a/src/Coordination/KeeperServer.cpp +++ b/src/Coordination/KeeperServer.cpp @@ -990,7 +990,7 @@ KeeperServer::ConfigUpdateState KeeperServer::applyConfigUpdate( raft_instance->set_priority(update->id, update->priority, /*broadcast on live leader*/true); return Accepted; } - UNREACHABLE(); + std::unreachable(); } ClusterUpdateActions KeeperServer::getRaftConfigurationDiff(const Poco::Util::AbstractConfiguration & config) diff --git a/src/Coordination/Standalone/Context.cpp b/src/Coordination/Standalone/Context.cpp index 4b14b038852..2af8a015c2d 100644 --- a/src/Coordination/Standalone/Context.cpp +++ b/src/Coordination/Standalone/Context.cpp @@ -478,4 +478,9 @@ bool Context::hasTraceCollector() const return false; } +bool Context::isBackgroundOperationContext() const +{ + return false; +} + } diff --git a/src/Coordination/Standalone/Context.h b/src/Coordination/Standalone/Context.h index 7e4d1794f7d..79a3e32a72d 100644 --- a/src/Coordination/Standalone/Context.h +++ b/src/Coordination/Standalone/Context.h @@ -170,6 +170,8 @@ public: const ServerSettings & getServerSettings() const; bool hasTraceCollector() const; + + bool isBackgroundOperationContext() const; }; } diff --git a/src/Core/Field.h b/src/Core/Field.h index 73d3f4ec44e..a78b589c883 100644 --- a/src/Core/Field.h +++ b/src/Core/Field.h @@ -667,8 +667,6 @@ public: case Types::AggregateFunctionState: return f(field.template get()); case Types::CustomType: return f(field.template get()); } - - UNREACHABLE(); } String dump() const; diff --git a/src/Core/Settings.h b/src/Core/Settings.h index f0389e7e2d5..dc61a049de8 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -394,7 +394,7 @@ class IColumn; M(Bool, allow_experimental_analyzer, true, "Allow experimental analyzer.", 0) \ M(Bool, analyzer_compatibility_join_using_top_level_identifier, false, "Force to resolve identifier in JOIN USING from projection (for example, in `SELECT a + 1 AS b FROM t1 JOIN t2 USING (b)` join will be performed by `t1.a + 1 = t2.b`, rather than `t1.b =
t2.b`).", 0) \ M(Bool, prefer_global_in_and_join, false, "If enabled, all IN/JOIN operators will be rewritten as GLOBAL IN/JOIN. It's useful when the to-be-joined tables are only available on the initiator and we need to always scatter their data on-the-fly during distributed processing with the GLOBAL keyword. It's also useful to reduce the need to access the external sources joining external tables.", 0) \ - M(Bool, enable_vertical_final, true, "If enable, remove duplicated rows during FINAL by marking rows as deleted and filtering them later instead of merging rows", 0) \ + M(Bool, enable_vertical_final, false, "Not recommended. If enabled, remove duplicated rows during FINAL by marking rows as deleted and filtering them later instead of merging rows", 0) \ \ \ /** Limits during query execution are part of the settings. \ @@ -924,7 +924,7 @@ class IColumn; M(Int64, ignore_cold_parts_seconds, 0, "Only available in ClickHouse Cloud. Exclude new data parts from SELECT queries until they're either pre-warmed (see cache_populated_by_fetch) or this many seconds old. Only for Replicated-/SharedMergeTree.", 0) \ M(Int64, prefer_warmed_unmerged_parts_seconds, 0, "Only available in ClickHouse Cloud. If a merged part is less than this many seconds old and is not pre-warmed (see cache_populated_by_fetch), but all its source parts are available and pre-warmed, SELECT queries will read from those parts instead. Only for ReplicatedMergeTree. Note that this only checks whether CacheWarmer processed the part; if the part was fetched into cache by something else, it'll still be considered cold until CacheWarmer gets to it; if it was warmed, then evicted from cache, it'll still be considered warm.", 0) \ M(Bool, iceberg_engine_ignore_schema_evolution, false, "Ignore schema evolution in Iceberg table engine and read all data using latest schema saved on table creation. Note that it can lead to incorrect result", 0) \ - M(Bool, allow_deprecated_functions, false, "Allow usage of deprecated functions", 0) \ + M(Bool, allow_deprecated_error_prone_window_functions, false, "Allow usage of deprecated error-prone window functions (neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference)", 0) \ // End of COMMON_SETTINGS // Please add settings related to formats into the FORMAT_FACTORY_SETTINGS, move obsolete settings to OBSOLETE_SETTINGS and obsolete format settings to OBSOLETE_FORMAT_SETTINGS. diff --git a/src/Core/SettingsChangesHistory.h b/src/Core/SettingsChangesHistory.h index 66341876912..3a0f2ca1e27 100644 --- a/src/Core/SettingsChangesHistory.h +++ b/src/Core/SettingsChangesHistory.h @@ -94,7 +94,7 @@ static std::map sett {"azure_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in AzureBlobStorage table engine"}, {"s3_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in S3 table engine"}, }}, - {"24.5", {{"allow_deprecated_functions", true, false, "Allow usage of deprecated functions"}, + {"24.5", {{"allow_deprecated_error_prone_window_functions", true, false, "Allow usage of deprecated error-prone window functions (neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference)"}, {"allow_experimental_join_condition", false, false, "Support join with inequal conditions which involve columns from both left and right table. e.g.
t1.y < t2.y."}, {"input_format_tsv_crlf_end_of_line", false, false, "Enables reading of CRLF line endings with TSV formats"}, {"output_format_parquet_use_custom_encoder", false, true, "Enable custom Parquet encoder."}, diff --git a/src/DataTypes/Serializations/ISerialization.cpp b/src/DataTypes/Serializations/ISerialization.cpp index dbe27a5f3f6..bbb1d1a6cd1 100644 --- a/src/DataTypes/Serializations/ISerialization.cpp +++ b/src/DataTypes/Serializations/ISerialization.cpp @@ -36,7 +36,6 @@ String ISerialization::kindToString(Kind kind) case Kind::SPARSE: return "Sparse"; } - UNREACHABLE(); } ISerialization::Kind ISerialization::stringToKind(const String & str) diff --git a/src/Databases/DatabaseReplicated.cpp b/src/Databases/DatabaseReplicated.cpp index cc946fc22c4..f5aff604dcb 100644 --- a/src/Databases/DatabaseReplicated.cpp +++ b/src/Databases/DatabaseReplicated.cpp @@ -936,7 +936,7 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep query_context->setSetting("allow_experimental_window_functions", 1); query_context->setSetting("allow_experimental_geo_types", 1); query_context->setSetting("allow_experimental_map_type", 1); - query_context->setSetting("allow_deprecated_functions", 1); + query_context->setSetting("allow_deprecated_error_prone_window_functions", 1); query_context->setSetting("allow_suspicious_low_cardinality_types", 1); query_context->setSetting("allow_suspicious_fixed_string_types", 1); diff --git a/src/Disks/IO/CachedOnDiskReadBufferFromFile.cpp b/src/Disks/IO/CachedOnDiskReadBufferFromFile.cpp index 1fe369832ac..e9c642666d3 100644 --- a/src/Disks/IO/CachedOnDiskReadBufferFromFile.cpp +++ b/src/Disks/IO/CachedOnDiskReadBufferFromFile.cpp @@ -274,6 +274,11 @@ bool CachedOnDiskReadBufferFromFile::canStartFromCache(size_t current_offset, co return current_write_offset > current_offset; } +String CachedOnDiskReadBufferFromFile::toString(ReadType type) +{ + return String(magic_enum::enum_name(type)); +} + CachedOnDiskReadBufferFromFile::ImplementationBufferPtr CachedOnDiskReadBufferFromFile::getReadBufferForFileSegment(FileSegment & file_segment) { diff --git a/src/Disks/IO/CachedOnDiskReadBufferFromFile.h b/src/Disks/IO/CachedOnDiskReadBufferFromFile.h index 3433698a162..119fa166214 100644 --- a/src/Disks/IO/CachedOnDiskReadBufferFromFile.h +++ b/src/Disks/IO/CachedOnDiskReadBufferFromFile.h @@ -129,19 +129,7 @@ private: ReadType read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE; - static String toString(ReadType type) - { - switch (type) - { - case ReadType::CACHED: - return "CACHED"; - case ReadType::REMOTE_FS_READ_BYPASS_CACHE: - return "REMOTE_FS_READ_BYPASS_CACHE"; - case ReadType::REMOTE_FS_READ_AND_PUT_IN_CACHE: - return "REMOTE_FS_READ_AND_PUT_IN_CACHE"; - } - UNREACHABLE(); - } + static String toString(ReadType type); size_t first_offset = 0; String nextimpl_step_log_info; diff --git a/src/Disks/ObjectStorages/IObjectStorage.cpp b/src/Disks/ObjectStorages/IObjectStorage.cpp index fd1269df79b..ce5f06e8f25 100644 --- a/src/Disks/ObjectStorages/IObjectStorage.cpp +++ b/src/Disks/ObjectStorages/IObjectStorage.cpp @@ -18,6 +18,11 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } +const MetadataStorageMetrics & IObjectStorage::getMetadataStorageMetrics() const +{ + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method 'getMetadataStorageMetrics' is not implemented"); +} + bool IObjectStorage::existsOrHasAnyChild(const std::string & path) const { RelativePathsWithMetadata files; diff --git a/src/Disks/ObjectStorages/IObjectStorage.h 
b/src/Disks/ObjectStorages/IObjectStorage.h index b49dc839561..7bc9e4073db 100644 --- a/src/Disks/ObjectStorages/IObjectStorage.h +++ b/src/Disks/ObjectStorages/IObjectStorage.h @@ -13,17 +13,18 @@ #include #include -#include -#include -#include -#include -#include -#include #include #include -#include -#include +#include +#include +#include +#include +#include #include +#include +#include +#include +#include #include "config.h" #if USE_AZURE_BLOB_STORAGE @@ -115,6 +116,8 @@ public: virtual std::string getDescription() const = 0; + virtual const MetadataStorageMetrics & getMetadataStorageMetrics() const; + /// Object exists or not virtual bool exists(const StoredObject & object) const = 0; diff --git a/src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp b/src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp index a28f4e7a882..7e4b1f69962 100644 --- a/src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp +++ b/src/Disks/ObjectStorages/MetadataStorageFromPlainObjectStorageOperations.cpp @@ -52,11 +52,16 @@ void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::execute(std: [[maybe_unused]] auto result = path_map.emplace(path, std::move(key_prefix)); chassert(result.second); + auto metric = object_storage->getMetadataStorageMetrics().directory_map_size; + CurrentMetrics::add(metric, 1); writeString(path.string(), *buf); buf->finalize(); write_finalized = true; + + auto event = object_storage->getMetadataStorageMetrics().directory_created; + ProfileEvents::increment(event); } void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::undo(std::unique_lock &) @@ -65,6 +70,9 @@ void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::undo(std::un if (write_finalized) { path_map.erase(path); + auto metric = object_storage->getMetadataStorageMetrics().directory_map_size; + CurrentMetrics::sub(metric, 1); + object_storage->removeObject(StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME)); } else if (write_created) @@ -165,7 +173,15 @@ void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::execute(std: auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME); auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME); object_storage->removeObject(object); + path_map.erase(path_it); + auto metric = object_storage->getMetadataStorageMetrics().directory_map_size; + CurrentMetrics::sub(metric, 1); + + removed = true; + + auto event = object_storage->getMetadataStorageMetrics().directory_removed; + ProfileEvents::increment(event); } void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::undo(std::unique_lock &) @@ -185,6 +201,8 @@ void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::undo(std::un buf->finalize(); path_map.emplace(path, std::move(key_prefix)); + auto metric = object_storage->getMetadataStorageMetrics().directory_map_size; + CurrentMetrics::add(metric, 1); } } diff --git a/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp b/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp index 3e772271b99..cc77ca5364b 100644 --- a/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp +++ b/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.cpp @@ -50,6 +50,8 @@ MetadataStorageFromPlainObjectStorage::PathMap loadPathPrefixMap(const std::stri res.first->second, remote_path.parent_path().string()); } 
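/// Once wired up, the new counters are observable through the ordinary system tables; a
/// sketch (the metric and event names come from this patch, the queries are plain ClickHouse SQL):
///
///     SELECT metric, value FROM system.metrics WHERE metric LIKE 'DiskPlainRewritable%';
///     SELECT event, value FROM system.events WHERE event LIKE 'DiskPlainRewritable%';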
+ auto metric = object_storage->getMetadataStorageMetrics().directory_map_size; + CurrentMetrics::add(metric, result.size()); return result; } @@ -134,6 +136,12 @@ MetadataStorageFromPlainRewritableObjectStorage::MetadataStorageFromPlainRewrita object_storage->setKeysGenerator(keys_gen); } +MetadataStorageFromPlainRewritableObjectStorage::~MetadataStorageFromPlainRewritableObjectStorage() +{ + auto metric = object_storage->getMetadataStorageMetrics().directory_map_size; + CurrentMetrics::sub(metric, path_map->size()); +} + std::vector MetadataStorageFromPlainRewritableObjectStorage::getDirectChildrenOnDisk( const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const { diff --git a/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h b/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h index 4415a68c24e..661968d7044 100644 --- a/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h +++ b/src/Disks/ObjectStorages/MetadataStorageFromPlainRewritableObjectStorage.h @@ -14,6 +14,7 @@ private: public: MetadataStorageFromPlainRewritableObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_); + ~MetadataStorageFromPlainRewritableObjectStorage() override; MetadataStorageType getType() const override { return MetadataStorageType::PlainRewritable; } diff --git a/src/Disks/ObjectStorages/MetadataStorageMetrics.h b/src/Disks/ObjectStorages/MetadataStorageMetrics.h new file mode 100644 index 00000000000..365fd3c8145 --- /dev/null +++ b/src/Disks/ObjectStorages/MetadataStorageMetrics.h @@ -0,0 +1,24 @@ +#pragma once + +#include +#include +#include + +namespace DB +{ + +struct MetadataStorageMetrics +{ + const ProfileEvents::Event directory_created = ProfileEvents::end(); + const ProfileEvents::Event directory_removed = ProfileEvents::end(); + + CurrentMetrics::Metric directory_map_size = CurrentMetrics::end(); + + template + static MetadataStorageMetrics create() + { + return MetadataStorageMetrics{}; + } +}; + +} diff --git a/src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp b/src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp index 245578b5d9e..a37f4ce7e65 100644 --- a/src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp +++ b/src/Disks/ObjectStorages/MetadataStorageTransactionState.cpp @@ -17,7 +17,6 @@ std::string toString(MetadataStorageTransactionState state) case MetadataStorageTransactionState::PARTIALLY_ROLLED_BACK: return "PARTIALLY_ROLLED_BACK"; } - UNREACHABLE(); } } diff --git a/src/Disks/ObjectStorages/ObjectStorageFactory.cpp b/src/Disks/ObjectStorages/ObjectStorageFactory.cpp index d7884c2911b..8210255decb 100644 --- a/src/Disks/ObjectStorages/ObjectStorageFactory.cpp +++ b/src/Disks/ObjectStorages/ObjectStorageFactory.cpp @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -85,7 +86,9 @@ ObjectStoragePtr createObjectStorage( DataSourceDescription{DataSourceType::ObjectStorage, type, MetadataStorageType::PlainRewritable, /*description*/ ""} .toString()); - return std::make_shared>(std::forward(args)...); + auto metadata_storage_metrics = DB::MetadataStorageMetrics::create(); + return std::make_shared>( + std::move(metadata_storage_metrics), std::forward(args)...); } else return std::make_shared(std::forward(args)...); @@ -256,8 +259,9 @@ void registerS3PlainRewritableObjectStorage(ObjectStorageFactory & factory) auto client = getClient(config, config_prefix, context, *settings, true); auto 
key_generator = getKeyGenerator(uri, config, config_prefix); + auto metadata_storage_metrics = DB::MetadataStorageMetrics::create(); auto object_storage = std::make_shared>( - std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name); + std::move(metadata_storage_metrics), std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name); /// NOTE: should we still perform this check for clickhouse-disks? if (!skip_access_check) diff --git a/src/Disks/ObjectStorages/PlainRewritableObjectStorage.h b/src/Disks/ObjectStorages/PlainRewritableObjectStorage.h index 2b116cff443..5f000afe625 100644 --- a/src/Disks/ObjectStorages/PlainRewritableObjectStorage.h +++ b/src/Disks/ObjectStorages/PlainRewritableObjectStorage.h @@ -16,8 +16,9 @@ class PlainRewritableObjectStorage : public BaseObjectStorage { public: template - explicit PlainRewritableObjectStorage(Args &&... args) + explicit PlainRewritableObjectStorage(MetadataStorageMetrics && metadata_storage_metrics_, Args &&... args) : BaseObjectStorage(std::forward(args)...) + , metadata_storage_metrics(std::move(metadata_storage_metrics_)) /// A basic key generator is required for checking S3 capabilities, /// it will be reset later by metadata storage. , key_generator(createObjectStorageKeysGeneratorAsIsWithPrefix(BaseObjectStorage::getCommonKeyPrefix())) @@ -26,6 +27,8 @@ public: std::string getName() const override { return "PlainRewritable" + BaseObjectStorage::getName(); } + const MetadataStorageMetrics & getMetadataStorageMetrics() const override { return metadata_storage_metrics; } + bool isWriteOnce() const override { return false; } bool isPlain() const override { return true; } @@ -37,6 +40,7 @@ public: void setKeysGenerator(ObjectStorageKeysGeneratorPtr gen) override { key_generator = gen; } private: + MetadataStorageMetrics metadata_storage_metrics; ObjectStorageKeysGeneratorPtr key_generator; }; diff --git a/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp b/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp index 7694337dc55..ae719f5cde4 100644 --- a/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp +++ b/src/Disks/ObjectStorages/S3/S3ObjectStorage.cpp @@ -259,7 +259,10 @@ std::unique_ptr S3ObjectStorage::writeObject( /// NOLIN throw Exception(ErrorCodes::BAD_ARGUMENTS, "S3 doesn't support append to files"); S3Settings::RequestSettings request_settings = s3_settings.get()->request_settings; - if (auto query_context = CurrentThread::getQueryContext()) + /// NOTE: For background operations settings are not propagated from session or query. They are taken from + /// default user's .xml config. It's obscure and unclear behavior. For them it's always better + /// to rely on settings from disk. 
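/// The producer side of this check, sketched with a hypothetical call site (the setter and
/// the MERGE/MUTATION enum values are introduced elsewhere in this patch):
///
///     auto task_context = Context::createCopy(storage_context);
///     task_context->makeQueryContext();
///     task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE);
///     /// Threads attached to task_context now report isBackgroundOperationContext() == true,
///     /// so the branch below keeps the disk's own request settings for merges and mutations.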
+ if (auto query_context = CurrentThread::getQueryContext(); query_context && !query_context->isBackgroundOperationContext()) { request_settings.updateFromSettingsIfChanged(query_context->getSettingsRef()); } diff --git a/src/Disks/ObjectStorages/createMetadataStorageMetrics.h b/src/Disks/ObjectStorages/createMetadataStorageMetrics.h new file mode 100644 index 00000000000..6dddc227ade --- /dev/null +++ b/src/Disks/ObjectStorages/createMetadataStorageMetrics.h @@ -0,0 +1,67 @@ +#pragma once + +#if USE_AWS_S3 +# include +#endif +#if USE_AZURE_BLOB_STORAGE && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD) +# include +#endif +#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD +# include +#endif +#include + +namespace ProfileEvents +{ +extern const Event DiskPlainRewritableAzureDirectoryCreated; +extern const Event DiskPlainRewritableAzureDirectoryRemoved; +extern const Event DiskPlainRewritableLocalDirectoryCreated; +extern const Event DiskPlainRewritableLocalDirectoryRemoved; +extern const Event DiskPlainRewritableS3DirectoryCreated; +extern const Event DiskPlainRewritableS3DirectoryRemoved; +} + +namespace CurrentMetrics +{ +extern const Metric DiskPlainRewritableAzureDirectoryMapSize; +extern const Metric DiskPlainRewritableLocalDirectoryMapSize; +extern const Metric DiskPlainRewritableS3DirectoryMapSize; +} + +namespace DB +{ + +#if USE_AWS_S3 +template <> +inline MetadataStorageMetrics MetadataStorageMetrics::create() +{ + return MetadataStorageMetrics{ + .directory_created = ProfileEvents::DiskPlainRewritableS3DirectoryCreated, + .directory_removed = ProfileEvents::DiskPlainRewritableS3DirectoryRemoved, + .directory_map_size = CurrentMetrics::DiskPlainRewritableS3DirectoryMapSize}; +} +#endif + +#if USE_AZURE_BLOB_STORAGE && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD) +template <> +inline MetadataStorageMetrics MetadataStorageMetrics::create() +{ + return MetadataStorageMetrics{ + .directory_created = ProfileEvents::DiskPlainRewritableAzureDirectoryCreated, + .directory_removed = ProfileEvents::DiskPlainRewritableAzureDirectoryRemoved, + .directory_map_size = CurrentMetrics::DiskPlainRewritableAzureDirectoryMapSize}; +} +#endif + +#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD +template <> +inline MetadataStorageMetrics MetadataStorageMetrics::create() +{ + return MetadataStorageMetrics{ + .directory_created = ProfileEvents::DiskPlainRewritableLocalDirectoryCreated, + .directory_removed = ProfileEvents::DiskPlainRewritableLocalDirectoryRemoved, + .directory_map_size = CurrentMetrics::DiskPlainRewritableLocalDirectoryMapSize}; +} +#endif + +} diff --git a/src/Disks/VolumeJBOD.cpp b/src/Disks/VolumeJBOD.cpp index d0e9d32ff5e..f8b9a57affe 100644 --- a/src/Disks/VolumeJBOD.cpp +++ b/src/Disks/VolumeJBOD.cpp @@ -112,7 +112,6 @@ DiskPtr VolumeJBOD::getDisk(size_t /* index */) const return disks_by_size.top().disk; } } - UNREACHABLE(); } ReservationPtr VolumeJBOD::reserve(UInt64 bytes) @@ -164,7 +163,6 @@ ReservationPtr VolumeJBOD::reserve(UInt64 bytes) return reservation; } } - UNREACHABLE(); } bool VolumeJBOD::areMergesAvoided() const diff --git a/src/Formats/EscapingRuleUtils.cpp b/src/Formats/EscapingRuleUtils.cpp index 89a7a31d033..9577ca2a8df 100644 --- a/src/Formats/EscapingRuleUtils.cpp +++ b/src/Formats/EscapingRuleUtils.cpp @@ -62,7 +62,6 @@ String escapingRuleToString(FormatSettings::EscapingRule escaping_rule) case FormatSettings::EscapingRule::Raw: return "Raw"; } - UNREACHABLE(); } void skipFieldByEscapingRule(ReadBuffer & buf, FormatSettings::EscapingRule escaping_rule, const FormatSettings & 
format_settings) diff --git a/src/Functions/FunctionsComparison.h b/src/Functions/FunctionsComparison.h index 57aebc11da0..4bee19ba87a 100644 --- a/src/Functions/FunctionsComparison.h +++ b/src/Functions/FunctionsComparison.h @@ -1176,8 +1176,7 @@ public: /// You can compare the date, datetime, or datetime64 and an enumeration with a constant string. || ((left.isDate() || left.isDate32() || left.isDateTime() || left.isDateTime64()) && (right.isDate() || right.isDate32() || right.isDateTime() || right.isDateTime64()) && left.idx == right.idx) /// only date vs date, or datetime vs datetime || (left.isUUID() && right.isUUID()) - || (left.isIPv4() && right.isIPv4()) - || (left.isIPv6() && right.isIPv6()) + || ((left.isIPv4() || left.isIPv6()) && (right.isIPv4() || right.isIPv6())) || (left.isEnum() && right.isEnum() && arguments[0]->getName() == arguments[1]->getName()) /// only equivalent enum type values can be compared against || (left_tuple && right_tuple && left_tuple->getElements().size() == right_tuple->getElements().size()) || (arguments[0]->equals(*arguments[1])))) @@ -1266,6 +1265,8 @@ public: const bool left_is_float = which_left.isFloat(); const bool right_is_float = which_right.isFloat(); + const bool left_is_ipv4 = which_left.isIPv4(); + const bool right_is_ipv4 = which_right.isIPv4(); const bool left_is_ipv6 = which_left.isIPv6(); const bool right_is_ipv6 = which_right.isIPv6(); const bool left_is_fixed_string = which_left.isFixedString(); @@ -1323,10 +1324,13 @@ { return res; } - else if (((left_is_ipv6 && right_is_fixed_string) || (right_is_ipv6 && left_is_fixed_string)) && fixed_string_size == IPV6_BINARY_LENGTH) + else if ( + (((left_is_ipv6 && right_is_fixed_string) || (right_is_ipv6 && left_is_fixed_string)) && fixed_string_size == IPV6_BINARY_LENGTH) + || ((left_is_ipv4 || left_is_ipv6) && (right_is_ipv4 || right_is_ipv6)) + ) { - /// Special treatment for FixedString(16) as a binary representation of IPv6 - - /// CAST is customized for this case + /// Special treatment for FixedString(16) as a binary representation of IPv6 & for comparing IPv4 & IPv6 values - + /// CAST is customized for these cases ColumnPtr left_column = left_is_ipv6 ? col_with_type_and_name_left.column : castColumn(col_with_type_and_name_left, right_type); ColumnPtr right_column = right_is_ipv6 ?
diff --git a/src/Functions/FunctionsRound.h b/src/Functions/FunctionsRound.h index 1f20fbff24e..d2dac467bff 100644 --- a/src/Functions/FunctionsRound.h +++ b/src/Functions/FunctionsRound.h @@ -149,8 +149,6 @@ struct IntegerRoundingComputation return x; } } - - UNREACHABLE(); } static ALWAYS_INLINE T compute(T x, T scale) @@ -163,8 +161,6 @@ struct IntegerRoundingComputation case ScaleMode::Negative: return computeImpl(x, scale); } - - UNREACHABLE(); } static ALWAYS_INLINE void compute(const T * __restrict in, size_t scale, T * __restrict out) requires std::integral @@ -247,8 +243,6 @@ inline float roundWithMode(float x, RoundingMode mode) case RoundingMode::Ceil: return ceilf(x); case RoundingMode::Trunc: return truncf(x); } - - UNREACHABLE(); } inline double roundWithMode(double x, RoundingMode mode) @@ -260,8 +254,6 @@ inline double roundWithMode(double x, RoundingMode mode) case RoundingMode::Ceil: return ceil(x); case RoundingMode::Trunc: return trunc(x); } - - UNREACHABLE(); } template diff --git a/src/Functions/FunctionsTimeWindow.cpp b/src/Functions/FunctionsTimeWindow.cpp index 1c9f28c9724..f93a885ee65 100644 --- a/src/Functions/FunctionsTimeWindow.cpp +++ b/src/Functions/FunctionsTimeWindow.cpp @@ -232,7 +232,6 @@ struct TimeWindowImpl default: throw Exception(ErrorCodes::SYNTAX_ERROR, "Fraction seconds are unsupported by windows yet"); } - UNREACHABLE(); } template @@ -422,7 +421,6 @@ struct TimeWindowImpl default: throw Exception(ErrorCodes::SYNTAX_ERROR, "Fraction seconds are unsupported by windows yet"); } - UNREACHABLE(); } template diff --git a/src/Functions/PolygonUtils.h b/src/Functions/PolygonUtils.h index c4851718da6..57f1243537d 100644 --- a/src/Functions/PolygonUtils.h +++ b/src/Functions/PolygonUtils.h @@ -381,8 +381,6 @@ bool PointInPolygonWithGrid::contains(CoordinateType x, Coordina case CellType::complexPolygon: return boost::geometry::within(Point(x, y), polygons[cell.index_of_inner_polygon]); } - - UNREACHABLE(); } diff --git a/src/Functions/UserDefined/UserDefinedSQLObjectsZooKeeperStorage.cpp b/src/Functions/UserDefined/UserDefinedSQLObjectsZooKeeperStorage.cpp index 568e0b9b5d2..766d63eafb0 100644 --- a/src/Functions/UserDefined/UserDefinedSQLObjectsZooKeeperStorage.cpp +++ b/src/Functions/UserDefined/UserDefinedSQLObjectsZooKeeperStorage.cpp @@ -35,7 +35,6 @@ namespace case UserDefinedSQLObjectType::Function: return "function_"; } - UNREACHABLE(); } constexpr std::string_view sql_extension = ".sql"; diff --git a/src/Functions/generateUUIDv7.cpp b/src/Functions/generateUUIDv7.cpp index f2a82431c0a..b226c0840f4 100644 --- a/src/Functions/generateUUIDv7.cpp +++ b/src/Functions/generateUUIDv7.cpp @@ -73,20 +73,6 @@ void setVariant(UUID & uuid) UUIDHelpers::getLowBytes(uuid) = (UUIDHelpers::getLowBytes(uuid) & rand_b_bits_mask) | variant_2_mask; } -struct FillAllRandomPolicy -{ - static constexpr auto name = "generateUUIDv7NonMonotonic"; - static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), and a random field (74 bit, including a 2-bit variant field "2") to distinguish UUIDs within a millisecond. 
This function is the fastest generateUUIDv7* function but it gives no monotonicity guarantees within a timestamp.)"; - struct Data - { - void generate(UUID & uuid, uint64_t ts) - { - setTimestampAndVersion(uuid, ts); - setVariant(uuid); - } - }; -}; - struct CounterFields { uint64_t last_timestamp = 0; @@ -133,44 +119,21 @@ struct CounterFields }; -struct GlobalCounterPolicy +struct Data { - static constexpr auto name = "generateUUIDv7"; - static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit, including a variant field "2", 2 bit) to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. Function generateUUIDv7 guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)"; - /// Guarantee counter monotonicity within one timestamp across all threads generating UUIDv7 simultaneously. - struct Data + static inline CounterFields fields; + static inline SharedMutex mutex; /// works a little bit faster than std::mutex here + std::lock_guard guard; + + Data() + : guard(mutex) + {} + + void generate(UUID & uuid, uint64_t timestamp) { - static inline CounterFields fields; - static inline SharedMutex mutex; /// works a little bit faster than std::mutex here - std::lock_guard guard; - - Data() - : guard(mutex) - {} - - void generate(UUID & uuid, uint64_t timestamp) - { - fields.generate(uuid, timestamp); - } - }; -}; - -struct ThreadLocalCounterPolicy -{ - static constexpr auto name = "generateUUIDv7ThreadMonotonic"; - static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit, including a variant field "2", 2 bit) to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. This function behaves like generateUUIDv7 but gives no guarantee on counter monotony across different simultaneous requests. Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate UUIDs.)"; - - /// Guarantee counter monotonicity within one timestamp within the same thread. Faster than GlobalCounterPolicy if a query uses multiple threads. 
- struct Data - { - static inline thread_local CounterFields fields; - - void generate(UUID & uuid, uint64_t timestamp) - { - fields.generate(uuid, timestamp); - } - }; + fields.generate(uuid, timestamp); + } }; } @@ -181,11 +144,12 @@ DECLARE_AVX2_SPECIFIC_CODE(__VA_ARGS__) DECLARE_SEVERAL_IMPLEMENTATIONS( -template -class FunctionGenerateUUIDv7Base : public IFunction, public FillPolicy +class FunctionGenerateUUIDv7Base : public IFunction { public: - String getName() const final { return FillPolicy::name; } + static constexpr auto name = "generateUUIDv7"; + + String getName() const final { return name; } size_t getNumberOfArguments() const final { return 0; } bool isDeterministic() const override { return false; } bool isDeterministicInScopeOfQuery() const final { return false; } @@ -221,7 +185,7 @@ public: uint64_t timestamp = getTimestampMillisecond(); for (UUID & uuid : vec_to) { - typename FillPolicy::Data data; + Data data; data.generate(uuid, timestamp); } } @@ -231,19 +195,18 @@ public: ) // DECLARE_SEVERAL_IMPLEMENTATIONS #undef DECLARE_SEVERAL_IMPLEMENTATIONS -template -class FunctionGenerateUUIDv7Base : public TargetSpecific::Default::FunctionGenerateUUIDv7Base +class FunctionGenerateUUIDv7Base : public TargetSpecific::Default::FunctionGenerateUUIDv7Base { public: - using Self = FunctionGenerateUUIDv7Base; - using Parent = TargetSpecific::Default::FunctionGenerateUUIDv7Base; + using Self = FunctionGenerateUUIDv7Base; + using Parent = TargetSpecific::Default::FunctionGenerateUUIDv7Base; explicit FunctionGenerateUUIDv7Base(ContextPtr context) : selector(context) { selector.registerImplementation(); #if USE_MULTITARGET_CODE - using ParentAVX2 = TargetSpecific::AVX2::FunctionGenerateUUIDv7Base; + using ParentAVX2 = TargetSpecific::AVX2::FunctionGenerateUUIDv7Base; selector.registerImplementation(); #endif } @@ -262,27 +225,16 @@ private: ImplementationSelector selector; }; -template -void registerUUIDv7Generator(auto & factory) -{ - static constexpr auto doc_syntax_format = "{}([expression])"; - static constexpr auto example_format = "SELECT {}()"; - static constexpr auto multiple_example_format = "SELECT {f}(1), {f}(2)"; - - FunctionDocumentation::Description description = FillPolicy::description; - FunctionDocumentation::Syntax syntax = fmt::format(doc_syntax_format, FillPolicy::name); - FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}}; - FunctionDocumentation::ReturnedValue returned_value = "A value of type UUID version 7."; - FunctionDocumentation::Examples examples = {{"single", fmt::format(example_format, FillPolicy::name), ""}, {"multiple", fmt::format(multiple_example_format, fmt::arg("f", FillPolicy::name)), ""}}; - FunctionDocumentation::Categories categories = {"UUID"}; - - factory.template registerFunction>({description, syntax, arguments, returned_value, examples, categories}, FunctionFactory::CaseInsensitive); -} - REGISTER_FUNCTION(GenerateUUIDv7) { - registerUUIDv7Generator(factory); - registerUUIDv7Generator(factory); - registerUUIDv7Generator(factory); + FunctionDocumentation::Description description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit, including a variant field "2", 2 bit) to distinguish UUIDs within a millisecond, and a random field (32 bits). 
For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. Function generateUUIDv7 guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)"; + FunctionDocumentation::Syntax syntax = "SELECT generateUUIDv7()"; + FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}}; + FunctionDocumentation::ReturnedValue returned_value = "A value of type UUID version 7."; + FunctionDocumentation::Examples examples = {{"single", "SELECT generateUUIDv7()", ""}, {"multiple", "SELECT generateUUIDv7(1), generateUUIDv7(2)", ""}}; + FunctionDocumentation::Categories categories = {"UUID"}; + + factory.registerFunction({description, syntax, arguments, returned_value, examples, categories}); } + } diff --git a/src/Functions/neighbor.cpp b/src/Functions/neighbor.cpp index abe6d39422d..62f129109f9 100644 --- a/src/Functions/neighbor.cpp +++ b/src/Functions/neighbor.cpp @@ -36,11 +36,11 @@ public: static FunctionPtr create(ContextPtr context) { - if (!context->getSettingsRef().allow_deprecated_functions) + if (!context->getSettingsRef().allow_deprecated_error_prone_window_functions) throw Exception( ErrorCodes::DEPRECATED_FUNCTION, "Function {} is deprecated since its usage is error-prone (see docs)." - "Please use proper window function or set `allow_deprecated_functions` setting to enable it", + "Please use proper window function or set `allow_deprecated_error_prone_window_functions` setting to enable it", name); return std::make_shared(); diff --git a/src/Functions/runningAccumulate.cpp b/src/Functions/runningAccumulate.cpp index 9bf387d3357..d585affd91b 100644 --- a/src/Functions/runningAccumulate.cpp +++ b/src/Functions/runningAccumulate.cpp @@ -39,11 +39,11 @@ public: static FunctionPtr create(ContextPtr context) { - if (!context->getSettingsRef().allow_deprecated_functions) + if (!context->getSettingsRef().allow_deprecated_error_prone_window_functions) throw Exception( ErrorCodes::DEPRECATED_FUNCTION, "Function {} is deprecated since its usage is error-prone (see docs)." - "Please use proper window function or set `allow_deprecated_functions` setting to enable it", + "Please use proper window function or set `allow_deprecated_error_prone_window_functions` setting to enable it", name); return std::make_shared(); diff --git a/src/Functions/runningDifference.h b/src/Functions/runningDifference.h index d3704aa97ca..fe477d13744 100644 --- a/src/Functions/runningDifference.h +++ b/src/Functions/runningDifference.h @@ -139,11 +139,11 @@ public: static FunctionPtr create(ContextPtr context) { - if (!context->getSettingsRef().allow_deprecated_functions) + if (!context->getSettingsRef().allow_deprecated_error_prone_window_functions) throw Exception( ErrorCodes::DEPRECATED_FUNCTION, "Function {} is deprecated since its usage is error-prone (see docs)." 
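/// A sketch of the suggested migration off these deprecated functions, using window
/// functions available in current ClickHouse (table and column names are placeholders):
///
///     -- before: SELECT neighbor(x, -1) FROM t;
///     -- after:  SELECT lagInFrame(x) OVER (ORDER BY k ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t;
///
///     -- before: SELECT runningDifference(x) FROM t;
///     -- after:  SELECT x - lagInFrame(x, 1, x) OVER (ORDER BY k ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t;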
- "Please use proper window function or set `allow_deprecated_functions` setting to enable it", + "Please use proper window function or set `allow_deprecated_error_prone_window_functions` setting to enable it", name); return std::make_shared>(); diff --git a/src/IO/CompressionMethod.cpp b/src/IO/CompressionMethod.cpp index b8e1134d422..22913125e99 100644 --- a/src/IO/CompressionMethod.cpp +++ b/src/IO/CompressionMethod.cpp @@ -52,7 +52,6 @@ std::string toContentEncodingName(CompressionMethod method) case CompressionMethod::None: return ""; } - UNREACHABLE(); } CompressionMethod chooseHTTPCompressionMethod(const std::string & list) diff --git a/src/IO/HadoopSnappyReadBuffer.h b/src/IO/HadoopSnappyReadBuffer.h index eba614d9d0a..7d6e6db2fa7 100644 --- a/src/IO/HadoopSnappyReadBuffer.h +++ b/src/IO/HadoopSnappyReadBuffer.h @@ -88,7 +88,6 @@ public: case Status::TOO_LARGE_COMPRESSED_BLOCK: return "TOO_LARGE_COMPRESSED_BLOCK"; } - UNREACHABLE(); } explicit HadoopSnappyReadBuffer( diff --git a/src/Interpreters/AggregatedDataVariants.cpp b/src/Interpreters/AggregatedDataVariants.cpp index 87cfdda5948..8f82f15248f 100644 --- a/src/Interpreters/AggregatedDataVariants.cpp +++ b/src/Interpreters/AggregatedDataVariants.cpp @@ -117,8 +117,6 @@ size_t AggregatedDataVariants::size() const APPLY_FOR_AGGREGATED_VARIANTS(M) #undef M } - - UNREACHABLE(); } size_t AggregatedDataVariants::sizeWithoutOverflowRow() const @@ -136,8 +134,6 @@ size_t AggregatedDataVariants::sizeWithoutOverflowRow() const APPLY_FOR_AGGREGATED_VARIANTS(M) #undef M } - - UNREACHABLE(); } const char * AggregatedDataVariants::getMethodName() const @@ -155,8 +151,6 @@ const char * AggregatedDataVariants::getMethodName() const APPLY_FOR_AGGREGATED_VARIANTS(M) #undef M } - - UNREACHABLE(); } bool AggregatedDataVariants::isTwoLevel() const @@ -174,8 +168,6 @@ bool AggregatedDataVariants::isTwoLevel() const APPLY_FOR_AGGREGATED_VARIANTS(M) #undef M } - - UNREACHABLE(); } bool AggregatedDataVariants::isConvertibleToTwoLevel() const diff --git a/src/Interpreters/Cache/FileSegment.cpp b/src/Interpreters/Cache/FileSegment.cpp index 9459029dc4c..61a356fa3c3 100644 --- a/src/Interpreters/Cache/FileSegment.cpp +++ b/src/Interpreters/Cache/FileSegment.cpp @@ -799,7 +799,6 @@ String FileSegment::stateToString(FileSegment::State state) case FileSegment::State::DETACHED: return "DETACHED"; } - UNREACHABLE(); } bool FileSegment::assertCorrectness() const diff --git a/src/Interpreters/ClientInfo.h b/src/Interpreters/ClientInfo.h index c2ed9f7ffa4..3054667e264 100644 --- a/src/Interpreters/ClientInfo.h +++ b/src/Interpreters/ClientInfo.h @@ -130,6 +130,16 @@ public: UInt64 count_participating_replicas{0}; UInt64 number_of_current_replica{0}; + enum class BackgroundOperationType : uint8_t + { + NOT_A_BACKGROUND_OPERATION = 0, + MERGE = 1, + MUTATION = 2, + }; + + /// It's ClientInfo and context created for background operation (not real query) + BackgroundOperationType background_operation_type{BackgroundOperationType::NOT_A_BACKGROUND_OPERATION}; + bool empty() const { return query_kind == QueryKind::NO_QUERY; } /** Serialization and deserialization. diff --git a/src/Interpreters/ComparisonGraph.cpp b/src/Interpreters/ComparisonGraph.cpp index 4eacbae7a30..d53ff4b0227 100644 --- a/src/Interpreters/ComparisonGraph.cpp +++ b/src/Interpreters/ComparisonGraph.cpp @@ -309,7 +309,6 @@ ComparisonGraphCompareResult ComparisonGraph::pathToCompareResult(Path pat case Path::GREATER: return inverse ? 
ComparisonGraphCompareResult::LESS : ComparisonGraphCompareResult::GREATER; case Path::GREATER_OR_EQUAL: return inverse ? ComparisonGraphCompareResult::LESS_OR_EQUAL : ComparisonGraphCompareResult::GREATER_OR_EQUAL; } - UNREACHABLE(); } template diff --git a/src/Interpreters/ConcurrentHashJoin.cpp b/src/Interpreters/ConcurrentHashJoin.cpp index 96be70c5527..53987694e46 100644 --- a/src/Interpreters/ConcurrentHashJoin.cpp +++ b/src/Interpreters/ConcurrentHashJoin.cpp @@ -1,10 +1,9 @@ -#include -#include #include #include #include #include #include +#include #include #include #include @@ -15,10 +14,20 @@ #include #include #include +#include #include +#include #include +#include +#include #include -#include + +namespace CurrentMetrics +{ +extern const Metric ConcurrentHashJoinPoolThreads; +extern const Metric ConcurrentHashJoinPoolThreadsActive; +extern const Metric ConcurrentHashJoinPoolThreadsScheduled; +} namespace DB { @@ -36,20 +45,82 @@ static UInt32 toPowerOfTwo(UInt32 x) return static_cast(1) << (32 - std::countl_zero(x - 1)); } -ConcurrentHashJoin::ConcurrentHashJoin(ContextPtr context_, std::shared_ptr table_join_, size_t slots_, const Block & right_sample_block, bool any_take_last_row_) +ConcurrentHashJoin::ConcurrentHashJoin( + ContextPtr context_, std::shared_ptr table_join_, size_t slots_, const Block & right_sample_block, bool any_take_last_row_) : context(context_) , table_join(table_join_) , slots(toPowerOfTwo(std::min(static_cast(slots_), 256))) + , pool(std::make_unique( + CurrentMetrics::ConcurrentHashJoinPoolThreads, + CurrentMetrics::ConcurrentHashJoinPoolThreadsActive, + CurrentMetrics::ConcurrentHashJoinPoolThreadsScheduled, + slots)) { - for (size_t i = 0; i < slots; ++i) - { - auto inner_hash_join = std::make_shared(); + hash_joins.resize(slots); - inner_hash_join->data = std::make_unique(table_join_, right_sample_block, any_take_last_row_, 0, fmt::format("concurrent{}", i)); - /// Non zero `max_joined_block_rows` allows to process block partially and return not processed part. - /// TODO: It's not handled properly in ConcurrentHashJoin case, so we set it to 0 to disable this feature. - inner_hash_join->data->setMaxJoinedBlockRows(0); - hash_joins.emplace_back(std::move(inner_hash_join)); + try + { + for (size_t i = 0; i < slots; ++i) + { + pool->scheduleOrThrow( + [&, idx = i, thread_group = CurrentThread::getGroup()]() + { + SCOPE_EXIT_SAFE({ + if (thread_group) + CurrentThread::detachFromGroupIfNotDetached(); + }); + + if (thread_group) + CurrentThread::attachToGroupIfDetached(thread_group); + setThreadName("ConcurrentJoin"); + + auto inner_hash_join = std::make_shared(); + inner_hash_join->data = std::make_unique( + table_join_, right_sample_block, any_take_last_row_, 0, fmt::format("concurrent{}", idx)); + /// Non zero `max_joined_block_rows` allows to process block partially and return not processed part. + /// TODO: It's not handled properly in ConcurrentHashJoin case, so we set it to 0 to disable this feature. + inner_hash_join->data->setMaxJoinedBlockRows(0); + hash_joins[idx] = std::move(inner_hash_join); + }); + } + pool->wait(); + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + pool->wait(); + throw; + } +} + +ConcurrentHashJoin::~ConcurrentHashJoin() +{ + try + { + for (size_t i = 0; i < slots; ++i) + { + // Hash tables destruction may be very time-consuming. + // Without the following code, they would be destroyed in the current thread (i.e. sequentially). 
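// In miniature, the trick used below (a sketch with a generic thread pool, not the exact
// committed code): capturing the owning pointer by move makes the lambda the last owner,
// so the expensive destructor runs on a pool thread when the lambda itself is destroyed:
//
//     pool->scheduleOrThrow([join = std::move(hash_joins[i])]() mutable
//     {
//         // no body needed: `join` dies here, on the pool thread
//     });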
+ // `InternalHashJoin` is moved here and will be destroyed in the destructor of the lambda function. + pool->scheduleOrThrow( + [join = std::move(hash_joins[i]), thread_group = CurrentThread::getGroup()]() + { + SCOPE_EXIT_SAFE({ + if (thread_group) + CurrentThread::detachFromGroupIfNotDetached(); + }); + + if (thread_group) + CurrentThread::attachToGroupIfDetached(thread_group); + setThreadName("ConcurrentJoin"); + }); + } + pool->wait(); + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + pool->wait(); } } diff --git a/src/Interpreters/ConcurrentHashJoin.h b/src/Interpreters/ConcurrentHashJoin.h index 40796376d23..c797ff27ece 100644 --- a/src/Interpreters/ConcurrentHashJoin.h +++ b/src/Interpreters/ConcurrentHashJoin.h @@ -10,6 +10,7 @@ #include #include #include +#include namespace DB { @@ -39,7 +40,7 @@ public: const Block & right_sample_block, bool any_take_last_row_ = false); - ~ConcurrentHashJoin() override = default; + ~ConcurrentHashJoin() override; std::string getName() const override { return "ConcurrentHashJoin"; } const TableJoin & getTableJoin() const override { return *table_join; } @@ -66,6 +67,7 @@ private: ContextPtr context; std::shared_ptr table_join; size_t slots; + std::unique_ptr pool; std::vector> hash_joins; std::mutex totals_mutex; diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index e1d82a8f604..5c9ae4716b9 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -2386,6 +2386,17 @@ void Context::setCurrentQueryId(const String & query_id) client_info.initial_query_id = client_info.current_query_id; } +void Context::setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType background_operation) +{ + chassert(background_operation != ClientInfo::BackgroundOperationType::NOT_A_BACKGROUND_OPERATION); + client_info.background_operation_type = background_operation; +} + +bool Context::isBackgroundOperationContext() const +{ + return client_info.background_operation_type != ClientInfo::BackgroundOperationType::NOT_A_BACKGROUND_OPERATION; +} + void Context::killCurrentQuery() const { if (auto elem = getProcessListElement()) diff --git a/src/Interpreters/Context.h b/src/Interpreters/Context.h index 814534f7035..87a7baa0469 100644 --- a/src/Interpreters/Context.h +++ b/src/Interpreters/Context.h @@ -760,6 +760,12 @@ public: void setCurrentDatabaseNameInGlobalContext(const String & name); void setCurrentQueryId(const String & query_id); + /// FIXME: for background operations (like Merge and Mutation) we also use the same Context object and even setup + /// query_id for it (table_uuid::result_part_name). 
We can distinguish queries from background operations in some way, like + /// bool is_background = query_id.contains("::"), but that is much worse than an explicit enum check with a clear purpose + void setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType background_operation); + bool isBackgroundOperationContext() const; + void killCurrentQuery() const; bool isCurrentQueryKilled() const; diff --git a/src/Interpreters/FilesystemCacheLog.cpp b/src/Interpreters/FilesystemCacheLog.cpp index 80fe1c3a8ef..90756f1c84a 100644 --- a/src/Interpreters/FilesystemCacheLog.cpp +++ b/src/Interpreters/FilesystemCacheLog.cpp @@ -15,18 +15,7 @@ namespace DB static String typeToString(FilesystemCacheLogElement::CacheType type) { - switch (type) - { - case FilesystemCacheLogElement::CacheType::READ_FROM_CACHE: - return "READ_FROM_CACHE"; - case FilesystemCacheLogElement::CacheType::READ_FROM_FS_AND_DOWNLOADED_TO_CACHE: - return "READ_FROM_FS_AND_DOWNLOADED_TO_CACHE"; - case FilesystemCacheLogElement::CacheType::READ_FROM_FS_BYPASSING_CACHE: - return "READ_FROM_FS_BYPASSING_CACHE"; - case FilesystemCacheLogElement::CacheType::WRITE_THROUGH_CACHE: - return "WRITE_THROUGH_CACHE"; - } - UNREACHABLE(); + return String(magic_enum::enum_name(type)); } ColumnsDescription FilesystemCacheLogElement::getColumnsDescription() diff --git a/src/Interpreters/HashJoin.cpp b/src/Interpreters/HashJoin.cpp index 3a21c13db5e..75da8bbc3e7 100644 --- a/src/Interpreters/HashJoin.cpp +++ b/src/Interpreters/HashJoin.cpp @@ -705,7 +705,6 @@ namespace APPLY_FOR_JOIN_VARIANTS(M) #undef M } - UNREACHABLE(); } } @@ -2641,8 +2640,6 @@ private: default: throw Exception(ErrorCodes::UNSUPPORTED_JOIN_KEYS, "Unsupported JOIN keys (type: {})", parent.data->type); } - - UNREACHABLE(); } template diff --git a/src/Interpreters/HashJoin.h b/src/Interpreters/HashJoin.h index 86db8943926..a0996556f9a 100644 --- a/src/Interpreters/HashJoin.h +++ b/src/Interpreters/HashJoin.h @@ -322,8 +322,6 @@ public: APPLY_FOR_JOIN_VARIANTS(M) #undef M } - - UNREACHABLE(); } size_t getTotalByteCountImpl(Type which) const @@ -338,8 +336,6 @@ public: APPLY_FOR_JOIN_VARIANTS(M) #undef M } - - UNREACHABLE(); } size_t getBufferSizeInCells(Type which) const @@ -354,8 +350,6 @@ public: APPLY_FOR_JOIN_VARIANTS(M) #undef M } - - UNREACHABLE(); } /// NOLINTEND(bugprone-macro-parentheses) }; diff --git a/src/Interpreters/InterpreterTransactionControlQuery.cpp b/src/Interpreters/InterpreterTransactionControlQuery.cpp index d31ace758c4..13872fbe3f5 100644 --- a/src/Interpreters/InterpreterTransactionControlQuery.cpp +++ b/src/Interpreters/InterpreterTransactionControlQuery.cpp @@ -33,7 +33,6 @@ BlockIO InterpreterTransactionControlQuery::execute() case ASTTransactionControl::SET_SNAPSHOT: return executeSetSnapshot(session_context, tcl.snapshot); } - UNREACHABLE(); } BlockIO InterpreterTransactionControlQuery::executeBegin(ContextMutablePtr session_context) diff --git a/src/Interpreters/SetVariants.cpp b/src/Interpreters/SetVariants.cpp index 64796a013f1..c600d096160 100644 --- a/src/Interpreters/SetVariants.cpp +++ b/src/Interpreters/SetVariants.cpp @@ -41,8 +41,6 @@ size_t SetVariantsTemplate::getTotalRowCount() const APPLY_FOR_SET_VARIANTS(M) #undef M } - - UNREACHABLE(); } template @@ -57,8 +55,6 @@ size_t SetVariantsTemplate::getTotalByteCount() const APPLY_FOR_SET_VARIANTS(M) #undef M } - - UNREACHABLE(); } template diff --git a/src/Parsers/ASTExplainQuery.h b/src/Parsers/ASTExplainQuery.h index 701bde8cebd..eb095b5dbbc 100644
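Two cleanups recur throughout this patch: `UNREACHABLE()` is deleted after `switch` statements that return in every case (the marker is dead code once the switch is exhaustive), and `FilesystemCacheLog`'s hand-written switch is replaced by enum reflection. A sketch of the latter, assuming the third-party `magic_enum` header is available:

```cpp
// enum_name derives the string from the enumerator itself, so the
// hand-written switch (and its trailing UNREACHABLE()) can be deleted.
#include <magic_enum.hpp>
#include <iostream>
#include <string>

enum class CacheType { READ_FROM_CACHE, WRITE_THROUGH_CACHE };

static std::string typeToString(CacheType type)
{
    return std::string(magic_enum::enum_name(type));
}

int main()
{
    std::cout << typeToString(CacheType::WRITE_THROUGH_CACHE) << '\n'; // WRITE_THROUGH_CACHE
}
```

Since `enum_name` reflects the enumerator identifier at compile time, adding a new `CacheType` value no longer requires touching `typeToString`.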
--- a/src/Parsers/ASTExplainQuery.h +++ b/src/Parsers/ASTExplainQuery.h @@ -40,8 +40,6 @@ public: case TableOverride: return "EXPLAIN TABLE OVERRIDE"; case CurrentTransaction: return "EXPLAIN CURRENT TRANSACTION"; } - - UNREACHABLE(); } static ExplainKind fromString(const String & str) diff --git a/src/Parsers/Lexer.cpp b/src/Parsers/Lexer.cpp index 34855a7ce20..5f2bd50524c 100644 --- a/src/Parsers/Lexer.cpp +++ b/src/Parsers/Lexer.cpp @@ -42,7 +42,7 @@ Token quotedString(const char *& pos, const char * const token_begin, const char continue; } - UNREACHABLE(); + chassert(false); } } @@ -538,8 +538,6 @@ const char * getTokenName(TokenType type) APPLY_FOR_TOKENS(M) #undef M } - - UNREACHABLE(); } diff --git a/src/Planner/PlannerExpressionAnalysis.cpp b/src/Planner/PlannerExpressionAnalysis.cpp index 2a95234057c..f0a2845c3e8 100644 --- a/src/Planner/PlannerExpressionAnalysis.cpp +++ b/src/Planner/PlannerExpressionAnalysis.cpp @@ -51,7 +51,7 @@ FilterAnalysisResult analyzeFilter(const QueryTreeNodePtr & filter_expression_no return result; } -bool isDeterministicConstant(const ConstantNode & root) +bool canRemoveConstantFromGroupByKey(const ConstantNode & root) { const auto & source_expression = root.getSourceExpression(); if (!source_expression) @@ -64,15 +64,20 @@ bool isDeterministicConstant(const ConstantNode & root) const auto * node = nodes.top(); nodes.pop(); + if (node->getNodeType() == QueryTreeNodeType::QUERY) + /// Allow removing constants from scalar subqueries. We send them to all the shards. + continue; + const auto * constant_node = node->as(); const auto * function_node = node->as(); if (constant_node) { - if (!isDeterministicConstant(*constant_node)) + if (!canRemoveConstantFromGroupByKey(*constant_node)) return false; } else if (function_node) { + /// Do not allow removing constants like `hostName()` if (!function_node->getFunctionOrThrow()->isDeterministic()) return false; @@ -122,7 +127,7 @@ std::optional analyzeAggregation(const QueryTreeNodeP bool is_secondary_query = planner_context->getQueryContext()->getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY; bool is_distributed_query = planner_context->getQueryContext()->isDistributed(); - bool check_deterministic_constants = is_secondary_query || is_distributed_query; + bool check_constants_for_group_by_key = is_secondary_query || is_distributed_query; if (query_node.hasGroupBy()) { @@ -139,7 +144,7 @@ std::optional analyzeAggregation(const QueryTreeNodeP const auto * constant_key = grouping_set_key_node->as(); group_by_with_constant_keys |= (constant_key != nullptr); - if (constant_key && !aggregates_descriptions.empty() && (!check_deterministic_constants || isDeterministicConstant(*constant_key))) + if (constant_key && !aggregates_descriptions.empty() && (!check_constants_for_group_by_key || canRemoveConstantFromGroupByKey(*constant_key))) continue; auto expression_dag_nodes = actions_visitor.visit(before_aggregation_actions, grouping_set_key_node); @@ -191,7 +196,7 @@ std::optional analyzeAggregation(const QueryTreeNodeP const auto * constant_key = group_by_key_node->as(); group_by_with_constant_keys |= (constant_key != nullptr); - if (constant_key && !aggregates_descriptions.empty() && (!check_deterministic_constants || isDeterministicConstant(*constant_key))) + if (constant_key && !aggregates_descriptions.empty() && (!check_constants_for_group_by_key || canRemoveConstantFromGroupByKey(*constant_key))) continue; auto expression_dag_nodes = actions_visitor.visit(before_aggregation_actions, 
group_by_key_node); diff --git a/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp b/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp index 98cbdeaaa4b..6b7f1f5206c 100644 --- a/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp @@ -657,7 +657,6 @@ DataTypePtr MsgPackSchemaReader::getDataType(const msgpack::object & object) throw Exception(ErrorCodes::BAD_ARGUMENTS, "Msgpack extension type {:x} is not supported", object_ext.type()); } } - UNREACHABLE(); } std::optional MsgPackSchemaReader::readRowAndGetDataTypes() diff --git a/src/Processors/IProcessor.cpp b/src/Processors/IProcessor.cpp index 8b160153733..5ab5e5277aa 100644 --- a/src/Processors/IProcessor.cpp +++ b/src/Processors/IProcessor.cpp @@ -36,8 +36,6 @@ std::string IProcessor::statusToName(Status status) case Status::ExpandPipeline: return "ExpandPipeline"; } - - UNREACHABLE(); } } diff --git a/src/Processors/QueryPlan/ReadFromLoopStep.cpp b/src/Processors/QueryPlan/ReadFromLoopStep.cpp new file mode 100644 index 00000000000..10436490a2a --- /dev/null +++ b/src/Processors/QueryPlan/ReadFromLoopStep.cpp @@ -0,0 +1,156 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + namespace ErrorCodes + { + extern const int TOO_MANY_RETRIES_TO_FETCH_PARTS; + } + class PullingPipelineExecutor; + + class LoopSource : public ISource + { + public: + + LoopSource( + const Names & column_names_, + const SelectQueryInfo & query_info_, + const StorageSnapshotPtr & storage_snapshot_, + ContextPtr & context_, + QueryProcessingStage::Enum processed_stage_, + StoragePtr inner_storage_, + size_t max_block_size_, + size_t num_streams_) + : ISource(storage_snapshot_->getSampleBlockForColumns(column_names_)) + , column_names(column_names_) + , query_info(query_info_) + , storage_snapshot(storage_snapshot_) + , processed_stage(processed_stage_) + , context(context_) + , inner_storage(std::move(inner_storage_)) + , max_block_size(max_block_size_) + , num_streams(num_streams_) + { + } + + String getName() const override { return "Loop"; } + + Chunk generate() override + { + while (true) + { + if (!loop) + { + QueryPlan plan; + auto storage_snapshot_ = inner_storage->getStorageSnapshotForQuery(inner_storage->getInMemoryMetadataPtr(), nullptr, context); + inner_storage->read( + plan, + column_names, + storage_snapshot_, + query_info, + context, + processed_stage, + max_block_size, + num_streams); + auto builder = plan.buildQueryPipeline( + QueryPlanOptimizationSettings::fromContext(context), + BuildQueryPipelineSettings::fromContext(context)); + QueryPlanResourceHolder resources; + auto pipe = QueryPipelineBuilder::getPipe(std::move(*builder), resources); + query_pipeline = QueryPipeline(std::move(pipe)); + executor = std::make_unique(query_pipeline); + loop = true; + } + Chunk chunk; + if (executor->pull(chunk)) + { + if (chunk) + { + retries_count = 0; + return chunk; + } + + } + else + { + ++retries_count; + if (retries_count > max_retries_count) + throw Exception(ErrorCodes::TOO_MANY_RETRIES_TO_FETCH_PARTS, "Too many retries to pull from storage"); + loop = false; + executor.reset(); + query_pipeline.reset(); + } + } + } + + private: + + const Names column_names; + SelectQueryInfo query_info; + const StorageSnapshotPtr storage_snapshot; + QueryProcessingStage::Enum processed_stage; + ContextPtr context; + StoragePtr inner_storage; + size_t max_block_size; + size_t num_streams; + // add retries. 
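The retry fields declared below implement the control flow of `LoopSource::generate` above: rebuild the inner pipeline whenever it is exhausted, and give up after a few consecutive failed pulls rather than spin forever. A standalone sketch of that loop with toy types (the real code rebuilds a `QueryPipeline` and pulls `Chunk`s):

```cpp
// (Re)build the inner pipeline on demand, reset it when exhausted, and fail
// after max_retries_count consecutive unproductive pulls instead of hanging.
#include <cstddef>
#include <iostream>
#include <optional>
#include <stdexcept>

struct InnerPipeline
{
    int remaining = 3;
    std::optional<int> pull() { return remaining > 0 ? std::optional<int>(remaining--) : std::nullopt; }
};

class LoopSource
{
public:
    int generate()
    {
        while (true)
        {
            if (!pipeline)
                pipeline.emplace();            // rebuild the inner pipeline
            if (auto chunk = pipeline->pull())
            {
                retries_count = 0;             // a successful pull resets the counter
                return *chunk;
            }
            if (++retries_count > max_retries_count)
                throw std::runtime_error("Too many retries to pull from storage");
            pipeline.reset();                  // start the next loop iteration
        }
    }
private:
    std::optional<InnerPipeline> pipeline;
    size_t retries_count = 0;
    size_t max_retries_count = 3;
};

int main()
{
    LoopSource source;
    for (int i = 0; i < 7; ++i)
        std::cout << source.generate() << ' '; // prints 3 2 1 3 2 1 3
    std::cout << '\n';
}
```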
// If inner_storage fails to pull X times in a row, we'd better fail here than hang + size_t retries_count = 0; + size_t max_retries_count = 3; + bool loop = false; + QueryPipeline query_pipeline; + std::unique_ptr executor; + }; + + ReadFromLoopStep::ReadFromLoopStep( + const Names & column_names_, + const SelectQueryInfo & query_info_, + const StorageSnapshotPtr & storage_snapshot_, + const ContextPtr & context_, + QueryProcessingStage::Enum processed_stage_, + StoragePtr inner_storage_, + size_t max_block_size_, + size_t num_streams_) + : SourceStepWithFilter( + DataStream{.header = storage_snapshot_->getSampleBlockForColumns(column_names_)}, + column_names_, + query_info_, + storage_snapshot_, + context_) + , column_names(column_names_) + , processed_stage(processed_stage_) + , inner_storage(std::move(inner_storage_)) + , max_block_size(max_block_size_) + , num_streams(num_streams_) + { + } + + Pipe ReadFromLoopStep::makePipe() + { + return Pipe(std::make_shared( + column_names, query_info, storage_snapshot, context, processed_stage, inner_storage, max_block_size, num_streams)); + } + + void ReadFromLoopStep::initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) + { + auto pipe = makePipe(); + + if (pipe.empty()) + { + assert(output_stream != std::nullopt); + pipe = Pipe(std::make_shared(output_stream->header)); + } + + pipeline.init(std::move(pipe)); + } + +} diff --git a/src/Processors/QueryPlan/ReadFromLoopStep.h b/src/Processors/QueryPlan/ReadFromLoopStep.h new file mode 100644 index 00000000000..4eee0ca5605 --- /dev/null +++ b/src/Processors/QueryPlan/ReadFromLoopStep.h @@ -0,0 +1,37 @@ +#pragma once +#include +#include +#include +#include + +namespace DB +{ + + class ReadFromLoopStep final : public SourceStepWithFilter + { + public: + ReadFromLoopStep( + const Names & column_names_, + const SelectQueryInfo & query_info_, + const StorageSnapshotPtr & storage_snapshot_, + const ContextPtr & context_, + QueryProcessingStage::Enum processed_stage_, + StoragePtr inner_storage_, + size_t max_block_size_, + size_t num_streams_); + + String getName() const override { return "ReadFromLoop"; } + + void initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) override; + + private: + + Pipe makePipe(); + + const Names column_names; + QueryProcessingStage::Enum processed_stage; + StoragePtr inner_storage; + size_t max_block_size; + size_t num_streams; + }; +} diff --git a/src/Processors/QueryPlan/ReadFromMergeTree.cpp b/src/Processors/QueryPlan/ReadFromMergeTree.cpp index 503031eb04b..caba1d32988 100644 --- a/src/Processors/QueryPlan/ReadFromMergeTree.cpp +++ b/src/Processors/QueryPlan/ReadFromMergeTree.cpp @@ -1136,8 +1136,6 @@ static void addMergingFinal( return std::make_shared(header, num_outputs, sort_description, max_block_size_rows, /*max_block_size_bytes=*/0, merging_params.graphite_params, now); } - - UNREACHABLE(); }; pipe.addTransform(get_merging_processor()); @@ -2125,8 +2123,6 @@ static const char * indexTypeToString(ReadFromMergeTree::IndexType type) case ReadFromMergeTree::IndexType::Skip: return "Skip"; } - - UNREACHABLE(); } static const char * readTypeToString(ReadFromMergeTree::ReadType type) @@ -2142,8 +2138,6 @@ static const char * readTypeToString(ReadFromMergeTree::ReadType type) case ReadFromMergeTree::ReadType::ParallelReplicas: return "Parallel"; } - - UNREACHABLE(); } void ReadFromMergeTree::describeActions(FormatSettings & format_settings) const diff --git
a/src/Processors/QueryPlan/TotalsHavingStep.cpp b/src/Processors/QueryPlan/TotalsHavingStep.cpp index d1bd70fd0b2..ac5e144bf4a 100644 --- a/src/Processors/QueryPlan/TotalsHavingStep.cpp +++ b/src/Processors/QueryPlan/TotalsHavingStep.cpp @@ -86,8 +86,6 @@ static String totalsModeToString(TotalsMode totals_mode, double auto_include_thr case TotalsMode::AFTER_HAVING_AUTO: return "after_having_auto threshold " + std::to_string(auto_include_threshold); } - - UNREACHABLE(); } void TotalsHavingStep::describeActions(FormatSettings & settings) const diff --git a/src/Processors/Transforms/FillingTransform.cpp b/src/Processors/Transforms/FillingTransform.cpp index 05fd2a7254f..bb38c3e1dc5 100644 --- a/src/Processors/Transforms/FillingTransform.cpp +++ b/src/Processors/Transforms/FillingTransform.cpp @@ -67,7 +67,6 @@ static FillColumnDescription::StepFunction getStepFunction( FOR_EACH_INTERVAL_KIND(DECLARE_CASE) #undef DECLARE_CASE } - UNREACHABLE(); } static bool tryConvertFields(FillColumnDescription & descr, const DataTypePtr & type) diff --git a/src/Processors/Transforms/FilterTransform.cpp b/src/Processors/Transforms/FilterTransform.cpp index 0793bb3db5b..e8e7f99ce53 100644 --- a/src/Processors/Transforms/FilterTransform.cpp +++ b/src/Processors/Transforms/FilterTransform.cpp @@ -14,6 +14,7 @@ namespace DB namespace ErrorCodes { extern const int ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER; + extern const int LOGICAL_ERROR; } static void replaceFilterToConstant(Block & block, const String & filter_column_name) @@ -81,7 +82,11 @@ static std::unique_ptr combineFilterAndIndices( auto mutable_holder = ColumnUInt8::create(num_rows, 0); auto & data = mutable_holder->getData(); for (auto idx : selected_by_indices) + { + if (idx >= num_rows) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Index {} out of range {}", idx, num_rows); data[idx] = 1; + } /// AND two filters auto * begin = data.data(); diff --git a/src/Processors/Transforms/buildPushingToViewsChain.cpp b/src/Processors/Transforms/buildPushingToViewsChain.cpp index cdcfad4442c..a1a886fb4f7 100644 --- a/src/Processors/Transforms/buildPushingToViewsChain.cpp +++ b/src/Processors/Transforms/buildPushingToViewsChain.cpp @@ -898,8 +898,6 @@ static std::exception_ptr addStorageToException(std::exception_ptr ptr, const St { return std::current_exception(); } - - UNREACHABLE(); } void FinalizingViewsTransform::work() diff --git a/src/Storages/MergeTree/BackgroundJobsAssignee.cpp b/src/Storages/MergeTree/BackgroundJobsAssignee.cpp index 56a4378cf9a..0a69bf1109f 100644 --- a/src/Storages/MergeTree/BackgroundJobsAssignee.cpp +++ b/src/Storages/MergeTree/BackgroundJobsAssignee.cpp @@ -93,7 +93,6 @@ String BackgroundJobsAssignee::toString(Type type) case Type::Moving: return "Moving"; } - UNREACHABLE(); } void BackgroundJobsAssignee::start() diff --git a/src/Storages/MergeTree/KeyCondition.cpp b/src/Storages/MergeTree/KeyCondition.cpp index bd8642b9f66..9666da574fb 100644 --- a/src/Storages/MergeTree/KeyCondition.cpp +++ b/src/Storages/MergeTree/KeyCondition.cpp @@ -2964,8 +2964,6 @@ String KeyCondition::RPNElement::toString(std::string_view column_name, bool pri case ALWAYS_TRUE: return "true"; } - - UNREACHABLE(); } diff --git a/src/Storages/MergeTree/MergeFromLogEntryTask.cpp b/src/Storages/MergeTree/MergeFromLogEntryTask.cpp index e8d55f75b08..2db0c0af3d7 100644 --- a/src/Storages/MergeTree/MergeFromLogEntryTask.cpp +++ b/src/Storages/MergeTree/MergeFromLogEntryTask.cpp @@ -312,6 +312,7 @@ ReplicatedMergeMutateTaskBase::PrepareResult 
MergeFromLogEntryTask::prepare() task_context = Context::createCopy(storage.getContext()); task_context->makeQueryContext(); task_context->setCurrentQueryId(getQueryId()); + task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE); /// Add merge to list merge_mutate_entry = storage.getContext()->getMergeList().insert( diff --git a/src/Storages/MergeTree/MergePlainMergeTreeTask.cpp b/src/Storages/MergeTree/MergePlainMergeTreeTask.cpp index 866a63911c3..a7070c80df9 100644 --- a/src/Storages/MergeTree/MergePlainMergeTreeTask.cpp +++ b/src/Storages/MergeTree/MergePlainMergeTreeTask.cpp @@ -168,6 +168,7 @@ ContextMutablePtr MergePlainMergeTreeTask::createTaskContext() const context->makeQueryContext(); auto queryId = getQueryId(); context->setCurrentQueryId(queryId); + context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE); return context; } diff --git a/src/Storages/MergeTree/MergeTask.cpp b/src/Storages/MergeTree/MergeTask.cpp index be7e8874b30..f1f856da3a2 100644 --- a/src/Storages/MergeTree/MergeTask.cpp +++ b/src/Storages/MergeTree/MergeTask.cpp @@ -536,6 +536,7 @@ bool MergeTask::VerticalMergeStage::prepareVerticalMergeForAllColumns() const std::unique_ptr reread_buf = wbuf_readable ? wbuf_readable->tryGetReadBuffer() : nullptr; if (!reread_buf) throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot read temporary file {}", ctx->rows_sources_uncompressed_write_buf->getFileName()); + auto * reread_buffer_raw = dynamic_cast(reread_buf.get()); if (!reread_buffer_raw) { @@ -556,6 +557,7 @@ bool MergeTask::VerticalMergeStage::prepareVerticalMergeForAllColumns() const ctx->it_name_and_type = global_ctx->gathering_columns.cbegin(); const auto & settings = global_ctx->context->getSettingsRef(); + size_t max_delayed_streams = 0; if (global_ctx->new_data_part->getDataPartStorage().supportParallelWrite()) { @@ -564,20 +566,20 @@ bool MergeTask::VerticalMergeStage::prepareVerticalMergeForAllColumns() const else max_delayed_streams = DEFAULT_DELAYED_STREAMS_FOR_PARALLEL_WRITE; } + ctx->max_delayed_streams = max_delayed_streams; + bool all_parts_on_remote_disks = std::ranges::all_of(global_ctx->future_part->parts, [](const auto & part) { return part->isStoredOnRemoteDisk(); }); + ctx->use_prefetch = all_parts_on_remote_disks && global_ctx->data->getSettings()->vertical_merge_remote_filesystem_prefetch; + + if (ctx->use_prefetch && ctx->it_name_and_type != global_ctx->gathering_columns.end()) + ctx->prepared_pipe = createPipeForReadingOneColumn(ctx->it_name_and_type->name); + return false; } -void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const +Pipe MergeTask::VerticalMergeStage::createPipeForReadingOneColumn(const String & column_name) const { - const auto & [column_name, column_type] = *ctx->it_name_and_type; - Names column_names{column_name}; - - ctx->progress_before = global_ctx->merge_list_element_ptr->progress.load(std::memory_order_relaxed); - - global_ctx->column_progress = std::make_unique(ctx->progress_before, ctx->column_sizes->columnWeight(column_name)); - Pipes pipes; for (size_t part_num = 0; part_num < global_ctx->future_part->parts.size(); ++part_num) { @@ -586,21 +588,42 @@ void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const *global_ctx->data, global_ctx->storage_snapshot, global_ctx->future_part->parts[part_num], - column_names, + Names{column_name}, /*mark_ranges=*/ {}, + global_ctx->input_rows_filtered, /*apply_deleted_mask=*/ true, ctx->read_with_direct_io, - 
/*take_column_types_from_storage=*/ true, - /*quiet=*/ false, - global_ctx->input_rows_filtered); + ctx->use_prefetch); pipes.emplace_back(std::move(pipe)); } - bool is_result_sparse = global_ctx->new_data_part->getSerialization(column_name)->getKind() == ISerialization::Kind::SPARSE; + return Pipe::unitePipes(std::move(pipes)); +} + +void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const +{ + const auto & column_name = ctx->it_name_and_type->name; + + ctx->progress_before = global_ctx->merge_list_element_ptr->progress.load(std::memory_order_relaxed); + global_ctx->column_progress = std::make_unique(ctx->progress_before, ctx->column_sizes->columnWeight(column_name)); + + Pipe pipe; + if (ctx->prepared_pipe) + { + pipe = std::move(*ctx->prepared_pipe); + + auto next_column_it = std::next(ctx->it_name_and_type); + if (next_column_it != global_ctx->gathering_columns.end()) + ctx->prepared_pipe = createPipeForReadingOneColumn(next_column_it->name); + } + else + { + pipe = createPipeForReadingOneColumn(column_name); + } - auto pipe = Pipe::unitePipes(std::move(pipes)); ctx->rows_sources_read_buf->seek(0, 0); + bool is_result_sparse = global_ctx->new_data_part->getSerialization(column_name)->getKind() == ISerialization::Kind::SPARSE; const auto data_settings = global_ctx->data->getSettings(); auto transform = std::make_unique( @@ -955,11 +978,10 @@ void MergeTask::ExecuteAndFinalizeHorizontalPart::createMergedStream() part, global_ctx->merging_column_names, /*mark_ranges=*/ {}, + global_ctx->input_rows_filtered, /*apply_deleted_mask=*/ true, ctx->read_with_direct_io, - /*take_column_types_from_storage=*/ true, - /*quiet=*/ false, - global_ctx->input_rows_filtered); + /*prefetch=*/ false); if (global_ctx->metadata_snapshot->hasSortingKey()) { diff --git a/src/Storages/MergeTree/MergeTask.h b/src/Storages/MergeTree/MergeTask.h index c8b0662e3eb..1294fa30449 100644 --- a/src/Storages/MergeTree/MergeTask.h +++ b/src/Storages/MergeTree/MergeTask.h @@ -299,7 +299,9 @@ private: Float64 progress_before = 0; std::unique_ptr column_to{nullptr}; + std::optional prepared_pipe; size_t max_delayed_streams = 0; + bool use_prefetch = false; std::list> delayed_streams; size_t column_elems_written{0}; QueryPipeline column_parts_pipeline; @@ -340,6 +342,8 @@ private: bool executeVerticalMergeForOneColumn() const; void finalizeVerticalMergeForOneColumn() const; + Pipe createPipeForReadingOneColumn(const String & column_name) const; + VerticalMergeRuntimeContextPtr ctx; GlobalRuntimeContextPtr global_ctx; }; diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index 4b3093eeaac..b6373a22d9c 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -1177,8 +1177,6 @@ String MergeTreeData::MergingParams::getModeName() const case Graphite: return "Graphite"; case VersionedCollapsing: return "VersionedCollapsing"; } - - UNREACHABLE(); } Int64 MergeTreeData::getMaxBlockNumber() const diff --git a/src/Storages/MergeTree/MergeTreeDataWriter.cpp b/src/Storages/MergeTree/MergeTreeDataWriter.cpp index 426e36ce9a9..df4087b8546 100644 --- a/src/Storages/MergeTree/MergeTreeDataWriter.cpp +++ b/src/Storages/MergeTree/MergeTreeDataWriter.cpp @@ -360,8 +360,6 @@ Block MergeTreeDataWriter::mergeBlock( return std::make_shared( block, 1, sort_description, block_size + 1, /*block_size_bytes=*/0, merging_params.graphite_params, time(nullptr)); } - - UNREACHABLE(); }; auto merging_algorithm = get_merging_algorithm(); diff --git 
a/src/Storages/MergeTree/MergeTreeSequentialSource.cpp b/src/Storages/MergeTree/MergeTreeSequentialSource.cpp index fbb48b37482..865371b7d2c 100644 --- a/src/Storages/MergeTree/MergeTreeSequentialSource.cpp +++ b/src/Storages/MergeTree/MergeTreeSequentialSource.cpp @@ -42,8 +42,7 @@ public: std::optional mark_ranges_, bool apply_deleted_mask, bool read_with_direct_io_, - bool take_column_types_from_storage, - bool quiet = false); + bool prefetch); ~MergeTreeSequentialSource() override; @@ -96,8 +95,7 @@ MergeTreeSequentialSource::MergeTreeSequentialSource( std::optional mark_ranges_, bool apply_deleted_mask, bool read_with_direct_io_, - bool take_column_types_from_storage, - bool quiet) + bool prefetch) : ISource(storage_snapshot_->getSampleBlockForColumns(columns_to_read_)) , storage(storage_) , storage_snapshot(storage_snapshot_) @@ -107,16 +105,13 @@ MergeTreeSequentialSource::MergeTreeSequentialSource( , mark_ranges(std::move(mark_ranges_)) , mark_cache(storage.getContext()->getMarkCache()) { - if (!quiet) - { - /// Print column name but don't pollute logs in case of many columns. - if (columns_to_read.size() == 1) - LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part, column {}", - data_part->getMarksCount(), data_part->name, data_part->rows_count, columns_to_read.front()); - else - LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part", - data_part->getMarksCount(), data_part->name, data_part->rows_count); - } + /// Print column name but don't pollute logs in case of many columns. + if (columns_to_read.size() == 1) + LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part, column {}", + data_part->getMarksCount(), data_part->name, data_part->rows_count, columns_to_read.front()); + else + LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part", + data_part->getMarksCount(), data_part->name, data_part->rows_count); auto alter_conversions = storage.getAlterConversionsForPart(data_part); @@ -131,21 +126,12 @@ MergeTreeSequentialSource::MergeTreeSequentialSource( storage.supportsSubcolumns(), columns_to_read); - NamesAndTypesList columns_for_reader; - if (take_column_types_from_storage) - { - auto options = GetColumnsOptions(GetColumnsOptions::AllPhysical) - .withExtendedObjects() - .withVirtuals() - .withSubcolumns(storage.supportsSubcolumns()); + auto options = GetColumnsOptions(GetColumnsOptions::AllPhysical) + .withExtendedObjects() + .withVirtuals() + .withSubcolumns(storage.supportsSubcolumns()); - columns_for_reader = storage_snapshot->getColumnsByNames(options, columns_to_read); - } - else - { - /// take columns from data_part - columns_for_reader = data_part->getColumns().addTypes(columns_to_read); - } + auto columns_for_reader = storage_snapshot->getColumnsByNames(options, columns_to_read); const auto & context = storage.getContext(); ReadSettings read_settings = context->getReadSettings(); @@ -191,6 +177,9 @@ MergeTreeSequentialSource::MergeTreeSequentialSource( reader_settings, /*avg_value_size_hints=*/ {}, /*profile_callback=*/ {}); + + if (prefetch) + reader->prefetchBeginOfRange(Priority{}); } static void fillBlockNumberColumns( @@ -313,11 +302,10 @@ Pipe createMergeTreeSequentialSource( MergeTreeData::DataPartPtr data_part, Names columns_to_read, std::optional mark_ranges, + std::shared_ptr> filtered_rows_count, bool apply_deleted_mask, bool read_with_direct_io, - bool 
take_column_types_from_storage, - bool quiet, - std::shared_ptr> filtered_rows_count) + bool prefetch) { /// The part might have some rows masked by lightweight deletes @@ -329,7 +317,7 @@ Pipe createMergeTreeSequentialSource( auto column_part_source = std::make_shared(type, storage, storage_snapshot, data_part, columns_to_read, std::move(mark_ranges), - /*apply_deleted_mask=*/ false, read_with_direct_io, take_column_types_from_storage, quiet); + /*apply_deleted_mask=*/ false, read_with_direct_io, prefetch); Pipe pipe(std::move(column_part_source)); @@ -408,11 +396,10 @@ public: data_part, columns_to_read, std::move(mark_ranges), + /*filtered_rows_count=*/ nullptr, apply_deleted_mask, /*read_with_direct_io=*/ false, - /*take_column_types_from_storage=*/ true, - /*quiet=*/ false, - /*filtered_rows_count=*/ nullptr); + /*prefetch=*/ false); pipeline.init(Pipe(std::move(source))); } diff --git a/src/Storages/MergeTree/MergeTreeSequentialSource.h b/src/Storages/MergeTree/MergeTreeSequentialSource.h index a5e36a7726f..e6f055f776c 100644 --- a/src/Storages/MergeTree/MergeTreeSequentialSource.h +++ b/src/Storages/MergeTree/MergeTreeSequentialSource.h @@ -23,11 +23,10 @@ Pipe createMergeTreeSequentialSource( MergeTreeData::DataPartPtr data_part, Names columns_to_read, std::optional mark_ranges, + std::shared_ptr> filtered_rows_count, bool apply_deleted_mask, bool read_with_direct_io, - bool take_column_types_from_storage, - bool quiet, - std::shared_ptr> filtered_rows_count); + bool prefetch); class QueryPlan; diff --git a/src/Storages/MergeTree/MergeTreeSettings.h b/src/Storages/MergeTree/MergeTreeSettings.h index a00508fd1c1..cea999d0d98 100644 --- a/src/Storages/MergeTree/MergeTreeSettings.h +++ b/src/Storages/MergeTree/MergeTreeSettings.h @@ -35,7 +35,7 @@ struct Settings; M(UInt64, min_bytes_for_wide_part, 10485760, "Minimal uncompressed size in bytes to create part in wide format instead of compact", 0) \ M(UInt64, min_rows_for_wide_part, 0, "Minimal number of rows to create part in wide format instead of compact", 0) \ M(Float, ratio_of_defaults_for_sparse_serialization, 0.9375f, "Minimal ratio of number of default values to number of all values in column to store it in sparse serializations. 
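For context on the `ratio_of_defaults_for_sparse_serialization` setting whose description continues below, here is a toy decision function matching that description (illustrative only; the real check works on serialized column statistics):

```cpp
// Count default values in a column and pick sparse serialization when the
// ratio of defaults reaches the configured threshold.
#include <cstddef>
#include <iostream>
#include <vector>

bool useSparseSerialization(const std::vector<int> & column, double ratio_threshold)
{
    size_t defaults = 0;
    for (int v : column)
        defaults += (v == 0);             // 0 is the default value for numeric columns
    return double(defaults) / double(column.size()) >= ratio_threshold;
}

int main()
{
    std::vector<int> column{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7}; // 15/16 defaults
    std::cout << std::boolalpha << useSparseSerialization(column, 0.9375) << '\n'; // true
}
```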
If >= 1, columns will be always written in full serialization.", 0) \ - M(Bool, replace_long_file_name_to_hash, false, "If the file name for column is too long (more than 'max_file_name_length' bytes) replace it to SipHash128", 0) \ + M(Bool, replace_long_file_name_to_hash, true, "If the file name for column is too long (more than 'max_file_name_length' bytes) replace it to SipHash128", 0) \ M(UInt64, max_file_name_length, 127, "The maximal length of the file name to keep it as is without hashing", 0) \ M(UInt64, min_bytes_for_full_part_storage, 0, "Only available in ClickHouse Cloud", 0) \ M(UInt64, min_rows_for_full_part_storage, 0, "Only available in ClickHouse Cloud", 0) \ @@ -148,6 +148,7 @@ struct Settings; M(UInt64, vertical_merge_algorithm_min_rows_to_activate, 16 * 8192, "Minimal (approximate) sum of rows in merging parts to activate Vertical merge algorithm.", 0) \ M(UInt64, vertical_merge_algorithm_min_bytes_to_activate, 0, "Minimal (approximate) uncompressed size in bytes in merging parts to activate Vertical merge algorithm.", 0) \ M(UInt64, vertical_merge_algorithm_min_columns_to_activate, 11, "Minimal amount of non-PK columns to activate Vertical merge algorithm.", 0) \ + M(Bool, vertical_merge_remote_filesystem_prefetch, true, "If true prefetching of data from remote filesystem is used for the next column during merge", 0) \ M(UInt64, max_postpone_time_for_failed_mutations_ms, 5ULL * 60 * 1000, "The maximum postpone time for failed mutations.", 0) \ \ /** Compatibility settings */ \ diff --git a/src/Storages/MergeTree/MutateFromLogEntryTask.cpp b/src/Storages/MergeTree/MutateFromLogEntryTask.cpp index 3415b08cebb..8d40658bb2c 100644 --- a/src/Storages/MergeTree/MutateFromLogEntryTask.cpp +++ b/src/Storages/MergeTree/MutateFromLogEntryTask.cpp @@ -206,6 +206,7 @@ ReplicatedMergeMutateTaskBase::PrepareResult MutateFromLogEntryTask::prepare() task_context = Context::createCopy(storage.getContext()); task_context->makeQueryContext(); task_context->setCurrentQueryId(getQueryId()); + task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MUTATION); merge_mutate_entry = storage.getContext()->getMergeList().insert( storage.getStorageID(), diff --git a/src/Storages/MergeTree/MutatePlainMergeTreeTask.cpp b/src/Storages/MergeTree/MutatePlainMergeTreeTask.cpp index 0b19aebe36d..2fd02708421 100644 --- a/src/Storages/MergeTree/MutatePlainMergeTreeTask.cpp +++ b/src/Storages/MergeTree/MutatePlainMergeTreeTask.cpp @@ -139,6 +139,7 @@ ContextMutablePtr MutatePlainMergeTreeTask::createTaskContext() const context->makeQueryContext(); auto queryId = getQueryId(); context->setCurrentQueryId(queryId); + context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MUTATION); return context; } diff --git a/src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp b/src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp index 78fcfabb704..4228d7b70b6 100644 --- a/src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp +++ b/src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp @@ -616,8 +616,6 @@ PartMovesBetweenShardsOrchestrator::Entry PartMovesBetweenShardsOrchestrator::st } } } - - UNREACHABLE(); } void PartMovesBetweenShardsOrchestrator::removePins(const Entry & entry, zkutil::ZooKeeperPtr zk) diff --git a/src/Storages/StorageLoop.cpp b/src/Storages/StorageLoop.cpp new file mode 100644 index 00000000000..2062749e60b --- /dev/null +++ b/src/Storages/StorageLoop.cpp @@ -0,0 +1,49 @@ +#include "StorageLoop.h" 
+#include +#include +#include + + +namespace DB +{ + namespace ErrorCodes + { + + } + StorageLoop::StorageLoop( + const StorageID & table_id_, + StoragePtr inner_storage_) + : IStorage(table_id_) + , inner_storage(std::move(inner_storage_)) + { + StorageInMemoryMetadata storage_metadata = inner_storage->getInMemoryMetadata(); + setInMemoryMetadata(storage_metadata); + } + + + void StorageLoop::read( + QueryPlan & query_plan, + const Names & column_names, + const StorageSnapshotPtr & storage_snapshot, + SelectQueryInfo & query_info, + ContextPtr context, + QueryProcessingStage::Enum processed_stage, + size_t max_block_size, + size_t num_streams) + { + query_info.optimize_trivial_count = false; + + query_plan.addStep(std::make_unique( + column_names, query_info, storage_snapshot, context, processed_stage, inner_storage, max_block_size, num_streams + )); + } + + void registerStorageLoop(StorageFactory & factory) + { + factory.registerStorage("Loop", [](const StorageFactory::Arguments & args) + { + StoragePtr inner_storage; + return std::make_shared(args.table_id, inner_storage); + }); + } +} diff --git a/src/Storages/StorageLoop.h b/src/Storages/StorageLoop.h new file mode 100644 index 00000000000..48760b169c2 --- /dev/null +++ b/src/Storages/StorageLoop.h @@ -0,0 +1,33 @@ +#pragma once +#include "config.h" +#include + + +namespace DB +{ + + class StorageLoop final : public IStorage + { + public: + StorageLoop( + const StorageID & table_id, + StoragePtr inner_storage_); + + std::string getName() const override { return "Loop"; } + + void read( + QueryPlan & query_plan, + const Names & column_names, + const StorageSnapshotPtr & storage_snapshot, + SelectQueryInfo & query_info, + ContextPtr context, + QueryProcessingStage::Enum processed_stage, + size_t max_block_size, + size_t num_streams) override; + + bool supportsTrivialCountOptimization(const StorageSnapshotPtr &, ContextPtr) const override { return false; } + + private: + StoragePtr inner_storage; + }; +} diff --git a/src/Storages/WindowView/StorageWindowView.cpp b/src/Storages/WindowView/StorageWindowView.cpp index a9ec1f6c694..8bca1c97aad 100644 --- a/src/Storages/WindowView/StorageWindowView.cpp +++ b/src/Storages/WindowView/StorageWindowView.cpp @@ -297,7 +297,6 @@ namespace CASE_WINDOW_KIND(Year) #undef CASE_WINDOW_KIND } - UNREACHABLE(); } class AddingAggregatedChunkInfoTransform : public ISimpleTransform @@ -920,7 +919,6 @@ UInt32 StorageWindowView::getWindowLowerBound(UInt32 time_sec) CASE_WINDOW_KIND(Year) #undef CASE_WINDOW_KIND } - UNREACHABLE(); } UInt32 StorageWindowView::getWindowUpperBound(UInt32 time_sec) @@ -948,7 +946,6 @@ UInt32 StorageWindowView::getWindowUpperBound(UInt32 time_sec) CASE_WINDOW_KIND(Year) #undef CASE_WINDOW_KIND } - UNREACHABLE(); } void StorageWindowView::addFireSignal(std::set & signals) diff --git a/src/Storages/registerStorages.cpp b/src/Storages/registerStorages.cpp index 0fb00c08acc..47542b7b47e 100644 --- a/src/Storages/registerStorages.cpp +++ b/src/Storages/registerStorages.cpp @@ -25,6 +25,7 @@ void registerStorageLiveView(StorageFactory & factory); void registerStorageGenerateRandom(StorageFactory & factory); void registerStorageExecutable(StorageFactory & factory); void registerStorageWindowView(StorageFactory & factory); +void registerStorageLoop(StorageFactory & factory); #if USE_RAPIDJSON || USE_SIMDJSON void registerStorageFuzzJSON(StorageFactory & factory); #endif @@ -120,6 +121,7 @@ void registerStorages() registerStorageGenerateRandom(factory); registerStorageExecutable(factory); 
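`registerStorageLoop`, added to the registration list just below, follows ClickHouse's usual factory pattern: each `registerStorageX` function adds a named creator callback to a shared factory. A generic standard-library sketch of that pattern (types here are stand-ins, not the real `StorageFactory`):

```cpp
// A factory maps engine names to creator callbacks; register functions
// contribute one entry each, and lookup instantiates the storage by name.
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct IStorage { virtual ~IStorage() = default; virtual std::string name() const = 0; };
struct StorageLoop : IStorage { std::string name() const override { return "Loop"; } };

class StorageFactory
{
public:
    using Creator = std::function<std::shared_ptr<IStorage>()>;
    void registerStorage(const std::string & name, Creator creator) { creators[name] = std::move(creator); }
    std::shared_ptr<IStorage> get(const std::string & name) const { return creators.at(name)(); }
private:
    std::map<std::string, Creator> creators;
};

void registerStorageLoop(StorageFactory & factory)
{
    factory.registerStorage("Loop", [] { return std::make_shared<StorageLoop>(); });
}

int main()
{
    StorageFactory factory;
    registerStorageLoop(factory);
    std::cout << factory.get("Loop")->name() << '\n'; // Loop
}
```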
registerStorageWindowView(factory); + registerStorageLoop(factory); #if USE_RAPIDJSON || USE_SIMDJSON registerStorageFuzzJSON(factory); #endif diff --git a/src/TableFunctions/TableFunctionLoop.cpp b/src/TableFunctions/TableFunctionLoop.cpp new file mode 100644 index 00000000000..0281002e50f --- /dev/null +++ b/src/TableFunctions/TableFunctionLoop.cpp @@ -0,0 +1,156 @@ +#include "config.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registerTableFunctions.h" + +namespace DB +{ + namespace ErrorCodes + { + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int UNKNOWN_TABLE; + } + namespace + { + class TableFunctionLoop : public ITableFunction + { + public: + static constexpr auto name = "loop"; + std::string getName() const override { return name; } + private: + StoragePtr executeImpl(const ASTPtr & ast_function, ContextPtr context, const String & table_name, ColumnsDescription cached_columns, bool is_insert_query) const override; + const char * getStorageTypeName() const override { return "Loop"; } + ColumnsDescription getActualTableStructure(ContextPtr context, bool is_insert_query) const override; + void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; + + // save the inner table function AST + ASTPtr inner_table_function_ast; + // save database and table + std::string loop_database_name; + std::string loop_table_name; + }; + + } + + void TableFunctionLoop::parseArguments(const ASTPtr & ast_function, ContextPtr context) + { + const auto & args_func = ast_function->as(); + + if (!args_func.arguments) + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Table function 'loop' must have arguments."); + + auto & args = args_func.arguments->children; + if (args.empty()) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "No arguments provided for table function 'loop'"); + + if (args.size() == 1) + { + if (const auto * id = args[0]->as()) + { + String id_name = id->name(); + + size_t dot_pos = id_name.find('.'); + if (id_name.find('.', dot_pos + 1) != String::npos) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "There are more than one dot"); + if (dot_pos != String::npos) + { + loop_database_name = id_name.substr(0, dot_pos); + loop_table_name = id_name.substr(dot_pos + 1); + } + else + { + loop_table_name = id_name; + } + } + else if (const auto * func = args[0]->as()) + { + inner_table_function_ast = args[0]; + } + else + { + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Expected identifier or function for argument 1 of function 'loop', got {}", args[0]->getID()); + } + } + // loop(database, table) + else if (args.size() == 2) + { + args[0] = evaluateConstantExpressionForDatabaseName(args[0], context); + args[1] = evaluateConstantExpressionOrIdentifierAsLiteral(args[1], context); + + loop_database_name = checkAndGetLiteralArgument(args[0], "database"); + loop_table_name = checkAndGetLiteralArgument(args[1], "table"); + } + else + { + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Table function 'loop' must have 1 or 2 arguments."); + } + } + + ColumnsDescription TableFunctionLoop::getActualTableStructure(ContextPtr /*context*/, bool /*is_insert_query*/) const + { + return ColumnsDescription(); + } + + StoragePtr TableFunctionLoop::executeImpl( + const ASTPtr & /*ast_function*/, + ContextPtr context, + const std::string & table_name, + ColumnsDescription cached_columns, + bool 
is_insert_query) const + { + StoragePtr storage; + if (!loop_table_name.empty()) + { + String database_name = loop_database_name; + if (database_name.empty()) + database_name = context->getCurrentDatabase(); + + auto database = DatabaseCatalog::instance().getDatabase(database_name); + storage = database->tryGetTable(loop_table_name, context); + if (!storage) + throw Exception(ErrorCodes::UNKNOWN_TABLE, "Table '{}' not found in database '{}'", loop_table_name, database_name); + } + + else + { + auto inner_table_function = TableFunctionFactory::instance().get(inner_table_function_ast, context); + storage = inner_table_function->execute( + inner_table_function_ast, + context, + table_name, + std::move(cached_columns), + is_insert_query); + } + auto res = std::make_shared( + StorageID(getDatabaseName(), table_name), + storage + ); + res->startup(); + return res; + } + + void registerTableFunctionLoop(TableFunctionFactory & factory) + { + factory.registerFunction( + {.documentation + = {.description=R"(The table function can be used to continuously output query results in an infinite loop.)", + .examples{{"loop", "SELECT * FROM loop(numbers(3)) LIMIT 7", "0" + "1" + "2" + "0" + "1" + "2" + "0"}} + }}); + } + +} diff --git a/src/TableFunctions/registerTableFunctions.cpp b/src/TableFunctions/registerTableFunctions.cpp index 26b9a771416..ca4913898f9 100644 --- a/src/TableFunctions/registerTableFunctions.cpp +++ b/src/TableFunctions/registerTableFunctions.cpp @@ -11,6 +11,7 @@ void registerTableFunctions() registerTableFunctionMerge(factory); registerTableFunctionRemote(factory); registerTableFunctionNumbers(factory); + registerTableFunctionLoop(factory); registerTableFunctionGenerateSeries(factory); registerTableFunctionNull(factory); registerTableFunctionZeros(factory); diff --git a/src/TableFunctions/registerTableFunctions.h b/src/TableFunctions/registerTableFunctions.h index 4a89b3afbb3..efde4d6dcdc 100644 --- a/src/TableFunctions/registerTableFunctions.h +++ b/src/TableFunctions/registerTableFunctions.h @@ -8,6 +8,7 @@ class TableFunctionFactory; void registerTableFunctionMerge(TableFunctionFactory & factory); void registerTableFunctionRemote(TableFunctionFactory & factory); void registerTableFunctionNumbers(TableFunctionFactory & factory); +void registerTableFunctionLoop(TableFunctionFactory & factory); void registerTableFunctionGenerateSeries(TableFunctionFactory & factory); void registerTableFunctionNull(TableFunctionFactory & factory); void registerTableFunctionZeros(TableFunctionFactory & factory); diff --git a/tests/ci/bugfix_validate_check.py b/tests/ci/bugfix_validate_check.py index 7aaf18e7765..d41fdaf05ff 100644 --- a/tests/ci/bugfix_validate_check.py +++ b/tests/ci/bugfix_validate_check.py @@ -109,12 +109,12 @@ def main(): test_script = jobs_scripts[test_job] if report_file.exists(): report_file.unlink() - extra_timeout_option = "" - if test_job == JobNames.STATELESS_TEST_RELEASE: - extra_timeout_option = str(3600) # "bugfix" must be present in checkname, as integration test runner checks this check_name = f"Validate bugfix: {test_job}" - command = f"python3 {test_script} '{check_name}' {extra_timeout_option} --validate-bugfix --report-to-file {report_file}" + command = ( + f"python3 {test_script} '{check_name}' " + f"--validate-bugfix --report-to-file {report_file}" + ) print(f"Going to validate job [{test_job}], command [{command}]") _ = subprocess.run( command, diff --git a/tests/ci/cherry_pick.py b/tests/ci/cherry_pick.py index 353b1461b93..e470621e2c5 100644 ---
a/tests/ci/cherry_pick.py +++ b/tests/ci/cherry_pick.py @@ -33,9 +33,10 @@ from subprocess import CalledProcessError from typing import List, Optional import __main__ + from env_helper import TEMP_PATH from get_robot_token import get_best_robot_token -from git_helper import git_runner, is_shallow +from git_helper import GIT_PREFIX, git_runner, is_shallow from github_helper import GitHub, PullRequest, PullRequests, Repository from lambda_shared_package.lambda_shared.pr import Labels from ssh import SSHKey @@ -90,7 +91,7 @@ close it. name: str, pr: PullRequest, repo: Repository, - backport_created_label: str = Labels.PR_BACKPORTS_CREATED, + backport_created_label: str, ): self.name = name self.pr = pr @@ -104,10 +105,6 @@ close it. self.backport_created_label = backport_created_label - self.git_prefix = ( # All commits to cherrypick are done as robot-clickhouse - "git -c user.email=robot-clickhouse@users.noreply.github.com " - "-c user.name=robot-clickhouse -c commit.gpgsign=false" - ) self.pre_check() def pre_check(self): @@ -118,11 +115,12 @@ close it. if branch_updated: self._backported = True - def pop_prs(self, prs: PullRequests) -> None: + def pop_prs(self, prs: PullRequests) -> PullRequests: """the method processes all prs and pops the ReleaseBranch related prs""" to_pop = [] # type: List[int] for i, pr in enumerate(prs): if self.name not in pr.head.ref: + # this pr is not for the current branch continue if pr.head.ref.startswith(f"cherrypick/{self.name}"): self.cherrypick_pr = pr @@ -131,19 +129,22 @@ close it. self.backport_pr = pr to_pop.append(i) else: - logging.error( - "head ref of PR #%s isn't starting with known suffix", - pr.number, - ) + assert False, f"BUG! Invalid PR's branch [{pr.head.ref}]" + + # Cherry-pick or backport PR found, set @backported flag for current release branch + self._backported = True + for i in reversed(to_pop): # Going from the tail to keep the order and pop greater index first prs.pop(i) + return prs def process( # pylint: disable=too-many-return-statements self, dry_run: bool ) -> None: if self.backported: return + if not self.cherrypick_pr: if dry_run: logging.info( @@ -151,56 +152,54 @@ close it. ) return self.create_cherrypick() - if self.backported: - return - if self.cherrypick_pr is not None: - # Try to merge cherrypick instantly - if self.cherrypick_pr.mergeable and self.cherrypick_pr.state != "closed": - if dry_run: - logging.info( - "DRY RUN: Would merge cherry-pick PR for #%s", self.pr.number - ) - return - self.cherrypick_pr.merge() - # The PR needs update, since PR.merge doesn't update the object - self.cherrypick_pr.update() - if self.cherrypick_pr.merged: - if dry_run: - logging.info( - "DRY RUN: Would create backport PR for #%s", self.pr.number - ) - return - self.create_backport() - return - if self.cherrypick_pr.state == "closed": + assert self.cherrypick_pr, "BUG!" 
+ + if self.cherrypick_pr.mergeable and self.cherrypick_pr.state != "closed": + if dry_run: logging.info( - "The cherrypick PR #%s for PR #%s is discarded", - self.cherrypick_pr.number, - self.pr.number, + "DRY RUN: Would merge cherry-pick PR for #%s", self.pr.number ) - self._backported = True return + self.cherrypick_pr.merge() + # The PR needs update, since PR.merge doesn't update the object + self.cherrypick_pr.update() + if self.cherrypick_pr.merged: + if dry_run: + logging.info( + "DRY RUN: Would create backport PR for #%s", self.pr.number + ) + return + self.create_backport() + return + if self.cherrypick_pr.state == "closed": logging.info( - "Cherrypick PR #%s for PR #%s have conflicts and unable to be merged", + "The cherry-pick PR #%s for PR #%s is discarded", self.cherrypick_pr.number, self.pr.number, ) - self.ping_cherry_pick_assignees(dry_run) + self._backported = True + return + logging.info( + "Cherry-pick PR #%s for PR #%s has conflicts and unable to be merged", + self.cherrypick_pr.number, + self.pr.number, + ) + self.ping_cherry_pick_assignees(dry_run) def create_cherrypick(self): # First, create backport branch: # Checkout release branch with discarding every change - git_runner(f"{self.git_prefix} checkout -f {self.name}") + git_runner(f"{GIT_PREFIX} checkout -f {self.name}") # Create or reset backport branch - git_runner(f"{self.git_prefix} checkout -B {self.backport_branch}") + git_runner(f"{GIT_PREFIX} checkout -B {self.backport_branch}") # Merge all changes from PR's the first parent commit w/o applying anything # It will allow to create a merge commit like it would be a cherry-pick first_parent = git_runner(f"git rev-parse {self.pr.merge_commit_sha}^1") - git_runner(f"{self.git_prefix} merge -s ours --no-edit {first_parent}") + git_runner(f"{GIT_PREFIX} merge -s ours --no-edit {first_parent}") # Second step, create cherrypick branch git_runner( - f"{self.git_prefix} branch -f " + f"{GIT_PREFIX} branch -f " f"{self.cherrypick_branch} {self.pr.merge_commit_sha}" ) @@ -209,7 +208,7 @@ close it. # manually to the release branch already try: output = git_runner( - f"{self.git_prefix} merge --no-commit --no-ff {self.cherrypick_branch}" + f"{GIT_PREFIX} merge --no-commit --no-ff {self.cherrypick_branch}" ) # 'up-to-date', 'up to date', who knows what else (╯°v°)╯ ^┻━┻ if output.startswith("Already up") and output.endswith("date."): @@ -219,18 +218,17 @@ close it. self.name, self.pr.number, ) - self._backported = True return except CalledProcessError: # There are most probably conflicts, they'll be resolved in PR - git_runner(f"{self.git_prefix} reset --merge") + git_runner(f"{GIT_PREFIX} reset --merge") else: # There are changes to apply, so continue - git_runner(f"{self.git_prefix} reset --merge") + git_runner(f"{GIT_PREFIX} reset --merge") - # Push, create the cherrypick PR, lable and assign it + # Push, create the cherry-pick PR, label and assign it for branch in [self.cherrypick_branch, self.backport_branch]: - git_runner(f"{self.git_prefix} push -f {self.REMOTE} {branch}:{branch}") + git_runner(f"{GIT_PREFIX} push -f {self.REMOTE} {branch}:{branch}") self.cherrypick_pr = self.repo.create_pull( title=f"Cherry pick #{self.pr.number} to {self.name}: {self.pr.title}", @@ -249,6 +247,7 @@ close it. 
self.cherrypick_pr.add_to_labels(Labels.PR_CRITICAL_BUGFIX) elif Labels.PR_BUGFIX in [label.name for label in self.pr.labels]: self.cherrypick_pr.add_to_labels(Labels.PR_BUGFIX) + self._backported = True self._assign_new_pr(self.cherrypick_pr) # update cherrypick PR to get the state for PR.mergable self.cherrypick_pr.update() @@ -258,21 +257,19 @@ close it. # Checkout the backport branch from the remote and make all changes to # apply like they are only one cherry-pick commit on top of release logging.info("Creating backport for PR #%s", self.pr.number) - git_runner(f"{self.git_prefix} checkout -f {self.backport_branch}") - git_runner( - f"{self.git_prefix} pull --ff-only {self.REMOTE} {self.backport_branch}" - ) + git_runner(f"{GIT_PREFIX} checkout -f {self.backport_branch}") + git_runner(f"{GIT_PREFIX} pull --ff-only {self.REMOTE} {self.backport_branch}") merge_base = git_runner( - f"{self.git_prefix} merge-base " + f"{GIT_PREFIX} merge-base " f"{self.REMOTE}/{self.name} {self.backport_branch}" ) - git_runner(f"{self.git_prefix} reset --soft {merge_base}") + git_runner(f"{GIT_PREFIX} reset --soft {merge_base}") title = f"Backport #{self.pr.number} to {self.name}: {self.pr.title}" - git_runner(f"{self.git_prefix} commit --allow-empty -F -", input=title) + git_runner(f"{GIT_PREFIX} commit --allow-empty -F -", input=title) # Push with force, create the backport PR, lable and assign it git_runner( - f"{self.git_prefix} push -f {self.REMOTE} " + f"{GIT_PREFIX} push -f {self.REMOTE} " f"{self.backport_branch}:{self.backport_branch}" ) self.backport_pr = self.repo.create_pull( @@ -343,7 +340,7 @@ close it. @property def backported(self) -> bool: - return self._backported or self.backport_pr is not None + return self._backported def __repr__(self): return self.name @@ -356,16 +353,22 @@ class Backport: repo: str, fetch_from: Optional[str], dry_run: bool, - must_create_backport_labels: List[str], - backport_created_label: str, ): self.gh = gh self._repo_name = repo self._fetch_from = fetch_from self.dry_run = dry_run - self.must_create_backport_labels = must_create_backport_labels - self.backport_created_label = backport_created_label + self.must_create_backport_label = ( + Labels.MUST_BACKPORT + if self._repo_name == self._fetch_from + else Labels.MUST_BACKPORT_CLOUD + ) + self.backport_created_label = ( + Labels.PR_BACKPORTS_CREATED + if self._repo_name == self._fetch_from + else Labels.PR_BACKPORTS_CREATED_CLOUD + ) self._remote = "" self._remote_line = "" @@ -465,7 +468,7 @@ class Backport: query_args = { "query": f"type:pr repo:{self._fetch_from} -label:{self.backport_created_label}", "label": ",".join( - self.labels_to_backport + self.must_create_backport_labels + self.labels_to_backport + [self.must_create_backport_label] ), "merged": [since_date, tomorrow], } @@ -482,23 +485,19 @@ class Backport: self.process_pr(pr) except Exception as e: logging.error( - "During processing the PR #%s error occured: %s", pr.number, e + "During processing the PR #%s error occurred: %s", pr.number, e ) self.error = e def process_pr(self, pr: PullRequest) -> None: pr_labels = [label.name for label in pr.labels] - for label in self.must_create_backport_labels: - # We backport any vXXX-must-backport to all branches of the fetch repo (better than no backport) - if label in pr_labels or self._fetch_from: - branches = [ - ReleaseBranch(br, pr, self.repo, self.backport_created_label) - for br in self.release_branches - ] # type: List[ReleaseBranch] - break - - if not branches: + if self.must_create_backport_label 
in pr_labels: + branches = [ + ReleaseBranch(br, pr, self.repo, self.backport_created_label) + for br in self.release_branches + ] # type: List[ReleaseBranch] + else: branches = [ ReleaseBranch(br, pr, self.repo, self.backport_created_label) for br in [ @@ -507,20 +506,14 @@ class Backport: if label in self.labels_to_backport ] ] - if not branches: - # This is definitely some error. There must be at least one branch - # It also make the whole program exit code non-zero - self.error = Exception( - f"There are no branches to backport PR #{pr.number}, logical error" - ) - raise self.error + assert branches, "BUG!" logging.info( " PR #%s is supposed to be backported to %s", pr.number, ", ".join(map(str, branches)), ) - # All PRs for cherrypick and backport branches as heads + # All PRs for cherry-pick and backport branches as heads query_suffix = " ".join( [ f"head:{branch.backport_branch} head:{branch.cherrypick_branch}" @@ -532,29 +525,15 @@ class Backport: label=f"{Labels.PR_BACKPORT},{Labels.PR_CHERRYPICK}", ) for br in branches: - br.pop_prs(bp_cp_prs) - - if bp_cp_prs: - # This is definitely some error. All prs must be consumed by - # branches with ReleaseBranch.pop_prs. It also makes the whole - # program exit code non-zero - self.error = Exception( - "The following PRs are not filtered by release branches:\n" - "\n".join(map(str, bp_cp_prs)) - ) - raise self.error - - if all(br.backported for br in branches): - # Let's check if the PR is already backported - self.mark_pr_backported(pr) - return + bp_cp_prs = br.pop_prs(bp_cp_prs) + assert not bp_cp_prs, "BUG!" for br in branches: br.process(self.dry_run) - if all(br.backported for br in branches): - # And check it after the running - self.mark_pr_backported(pr) + for br in branches: + assert br.backported, f"BUG! 
backport to branch [{br}] failed" + self.mark_pr_backported(pr) def mark_pr_backported(self, pr: PullRequest) -> None: if self.dry_run: @@ -591,19 +570,6 @@ def parse_args(): ) parser.add_argument("--dry-run", action="store_true", help="do not create anything") - parser.add_argument( - "--must-create-backport-label", - default=Labels.MUST_BACKPORT, - choices=(Labels.MUST_BACKPORT, Labels.MUST_BACKPORT_CLOUD), - help="label to filter PRs to backport", - nargs="+", - ) - parser.add_argument( - "--backport-created-label", - default=Labels.PR_BACKPORTS_CREATED, - choices=(Labels.PR_BACKPORTS_CREATED, Labels.PR_BACKPORTS_CREATED_CLOUD), - help="label to mark PRs as backported", - ) parser.add_argument( "--reserve-search-days", default=0, @@ -668,10 +634,6 @@ def main(): args.repo, args.from_repo, args.dry_run, - args.must_create_backport_label - if isinstance(args.must_create_backport_label, list) - else [args.must_create_backport_label], - args.backport_created_label, ) # https://github.com/python/mypy/issues/3004 bp.gh.cache_path = temp_path / "gh_cache" diff --git a/tests/ci/ci.py b/tests/ci/ci.py index c4e06ccd79a..c5271a945c0 100644 --- a/tests/ci/ci.py +++ b/tests/ci/ci.py @@ -18,6 +18,7 @@ import docker_images_helper import upload_result_helper from build_check import get_release_or_pr from ci_config import CI_CONFIG, Build, CILabels, CIStages, JobNames, StatusNames +from ci_metadata import CiMetadata from ci_utils import GHActions, is_hex, normalize_string from clickhouse_helper import ( CiLogsCredentials, @@ -39,22 +40,23 @@ from digest_helper import DockerDigester, JobDigester from env_helper import ( CI, GITHUB_JOB_API_URL, + GITHUB_REPOSITORY, + GITHUB_RUN_ID, GITHUB_RUN_URL, REPO_COPY, REPORT_PATH, S3_BUILDS_BUCKET, TEMP_PATH, - GITHUB_RUN_ID, - GITHUB_REPOSITORY, ) from get_robot_token import get_best_robot_token from git_helper import GIT_PREFIX, Git from git_helper import Runner as GitRunner from github_helper import GitHub from pr_info import PRInfo -from report import ERROR, SUCCESS, BuildResult, JobReport, PENDING +from report import ERROR, FAILURE, PENDING, SUCCESS, BuildResult, JobReport, TestResult from s3_helper import S3Helper -from ci_metadata import CiMetadata +from stopwatch import Stopwatch +from tee_popen import TeePopen from version_helper import get_version_from_repo # pylint: disable=too-many-lines @@ -1867,8 +1869,8 @@ def _run_test(job_name: str, run_command: str) -> int: run_command or CI_CONFIG.get_job_config(job_name).run_command ), "Run command must be provided as input argument or be configured in job config" - if CI_CONFIG.get_job_config(job_name).timeout: - os.environ["KILL_TIMEOUT"] = str(CI_CONFIG.get_job_config(job_name).timeout) + env = os.environ.copy() + timeout = CI_CONFIG.get_job_config(job_name).timeout or None if not run_command: run_command = "/".join( @@ -1879,26 +1881,27 @@ def _run_test(job_name: str, run_command: str) -> int: print("Use run command from a job config") else: print("Use run command from the workflow") - os.environ["CHECK_NAME"] = job_name + env["CHECK_NAME"] = job_name print(f"Going to start run command [{run_command}]") - process = subprocess.run( - run_command, - stdout=sys.stdout, - stderr=sys.stderr, - text=True, - check=False, - shell=True, - ) + stopwatch = Stopwatch() + job_log = Path(TEMP_PATH) / "job_log.txt" + with TeePopen(run_command, job_log, env, timeout) as process: + retcode = process.wait() + if retcode != 0: + print(f"Run action failed for: [{job_name}] with exit code [{retcode}]") + if timeout and 
process.timeout_exceeded: + print(f"Timeout {timeout} exceeded, dumping the job report") + JobReport( + status=FAILURE, + description=f"Timeout {timeout} exceeded", + test_results=[TestResult.create_check_timeout_expired(timeout)], + start_time=stopwatch.start_time_str, + duration=stopwatch.duration_seconds, + additional_files=[job_log], + ).dump() - if process.returncode == 0: - print(f"Run action done for: [{job_name}]") - exit_code = 0 - else: - print( - f"Run action failed for: [{job_name}] with exit code [{process.returncode}]" - ) - exit_code = process.returncode - return exit_code + print(f"Run action done for: [{job_name}]") + return retcode def _get_ext_check_name(check_name: str) -> str: diff --git a/tests/ci/ci_config.py b/tests/ci/ci_config.py index 68fa6f1cf10..a494f7cf712 100644 --- a/tests/ci/ci_config.py +++ b/tests/ci/ci_config.py @@ -175,8 +175,8 @@ class JobNames(metaclass=WithIter): COMPATIBILITY_TEST = "Compatibility check (amd64)" COMPATIBILITY_TEST_ARM = "Compatibility check (aarch64)" - CLCIKBENCH_TEST = "ClickBench (amd64)" - CLCIKBENCH_TEST_ARM = "ClickBench (aarch64)" + CLICKBENCH_TEST = "ClickBench (amd64)" + CLICKBENCH_TEST_ARM = "ClickBench (aarch64)" LIBFUZZER_TEST = "libFuzzer tests" @@ -472,17 +472,18 @@ compatibility_test_common_params = { } stateless_test_common_params = { "digest": stateless_check_digest, - "run_command": 'functional_test_check.py "$CHECK_NAME" $KILL_TIMEOUT', + "run_command": 'functional_test_check.py "$CHECK_NAME"', "timeout": 10800, } stateful_test_common_params = { "digest": stateful_check_digest, - "run_command": 'functional_test_check.py "$CHECK_NAME" $KILL_TIMEOUT', + "run_command": 'functional_test_check.py "$CHECK_NAME"', "timeout": 3600, } stress_test_common_params = { "digest": stress_check_digest, "run_command": "stress_check.py", + "timeout": 9000, } upgrade_test_common_params = { "digest": upgrade_check_digest, @@ -531,6 +532,7 @@ clickbench_test_params = { docker=["clickhouse/clickbench"], ), "run_command": 'clickbench.py "$CHECK_NAME"', + "timeout": 900, } install_test_params = JobConfig( digest=install_check_digest, @@ -1067,6 +1069,7 @@ CI_CONFIG = CIConfig( Build.PACKAGE_TSAN, Build.PACKAGE_MSAN, Build.PACKAGE_DEBUG, + Build.BINARY_RELEASE, ] ), JobNames.BUILD_CHECK_SPECIAL: BuildReportConfig( @@ -1084,7 +1087,6 @@ CI_CONFIG = CIConfig( Build.BINARY_AMD64_COMPAT, Build.BINARY_AMD64_MUSL, Build.PACKAGE_RELEASE_COVERAGE, - Build.BINARY_RELEASE, Build.FUZZERS, ] ), @@ -1111,6 +1113,7 @@ CI_CONFIG = CIConfig( exclude_files=[".md"], docker=["clickhouse/fasttest"], ), + timeout=2400, ), ), JobNames.STYLE_CHECK: TestConfig( @@ -1123,7 +1126,9 @@ CI_CONFIG = CIConfig( "", # we run this check by label - no digest required job_config=JobConfig( - run_by_label="pr-bugfix", run_command="bugfix_validate_check.py" + run_by_label="pr-bugfix", + run_command="bugfix_validate_check.py", + timeout=900, ), ), }, @@ -1357,10 +1362,10 @@ CI_CONFIG = CIConfig( Build.PACKAGE_RELEASE, job_config=sqllogic_test_params ), JobNames.SQLTEST: TestConfig(Build.PACKAGE_RELEASE, job_config=sql_test_params), - JobNames.CLCIKBENCH_TEST: TestConfig( + JobNames.CLICKBENCH_TEST: TestConfig( Build.PACKAGE_RELEASE, job_config=JobConfig(**clickbench_test_params) # type: ignore ), - JobNames.CLCIKBENCH_TEST_ARM: TestConfig( + JobNames.CLICKBENCH_TEST_ARM: TestConfig( Build.PACKAGE_AARCH64, job_config=JobConfig(**clickbench_test_params) # type: ignore ), JobNames.LIBFUZZER_TEST: TestConfig( @@ -1368,7 +1373,7 @@ CI_CONFIG = CIConfig( job_config=JobConfig( 
run_by_label=CILabels.libFuzzer, timeout=10800, - run_command='libfuzzer_test_check.py "$CHECK_NAME" 10800', + run_command='libfuzzer_test_check.py "$CHECK_NAME"', ), ), # type: ignore }, @@ -1386,6 +1391,9 @@ REQUIRED_CHECKS = [ JobNames.FAST_TEST, JobNames.STATEFUL_TEST_RELEASE, JobNames.STATELESS_TEST_RELEASE, + JobNames.STATELESS_TEST_ASAN, + JobNames.STATELESS_TEST_FLAKY_ASAN, + JobNames.STATEFUL_TEST_ASAN, JobNames.STYLE_CHECK, JobNames.UNIT_TEST_ASAN, JobNames.UNIT_TEST_MSAN, @@ -1419,6 +1427,11 @@ class CheckDescription: CHECK_DESCRIPTIONS = [ + CheckDescription( + StatusNames.SYNC, + "If it fails, ask a maintainer for help", + lambda x: x == StatusNames.SYNC, + ), CheckDescription( "AST fuzzer", "Runs randomly generated queries to catch program errors. " diff --git a/tests/ci/ci_utils.py b/tests/ci/ci_utils.py index 97d42f9845b..2bc0a4fef14 100644 --- a/tests/ci/ci_utils.py +++ b/tests/ci/ci_utils.py @@ -1,8 +1,7 @@ -from contextlib import contextmanager import os -import signal -from typing import Any, List, Union, Iterator +from contextlib import contextmanager from pathlib import Path +from typing import Any, Iterator, List, Union class WithIter(type): @@ -49,14 +48,3 @@ class GHActions: for line in lines: print(line) print("::endgroup::") - - -def set_job_timeout(): - def timeout_handler(_signum, _frame): - print("Timeout expired") - raise TimeoutError("Job's KILL_TIMEOUT expired") - - kill_timeout = int(os.getenv("KILL_TIMEOUT", "0")) - assert kill_timeout > 0, "kill timeout must be provided in KILL_TIMEOUT env" - signal.signal(signal.SIGALRM, timeout_handler) - signal.alarm(kill_timeout) diff --git a/tests/ci/fast_test_check.py b/tests/ci/fast_test_check.py index 383f5b340c7..ed727dd3659 100644 --- a/tests/ci/fast_test_check.py +++ b/tests/ci/fast_test_check.py @@ -1,5 +1,4 @@ #!/usr/bin/env python3 -import argparse import csv import logging import os @@ -11,15 +10,7 @@ from typing import Tuple from docker_images_helper import DockerImage, get_docker_image, pull_image from env_helper import REPO_COPY, S3_BUILDS_BUCKET, TEMP_PATH from pr_info import PRInfo -from report import ( - ERROR, - FAILURE, - SUCCESS, - JobReport, - TestResult, - TestResults, - read_test_results, -) +from report import ERROR, FAILURE, SUCCESS, JobReport, TestResults, read_test_results from stopwatch import Stopwatch from tee_popen import TeePopen @@ -80,30 +71,9 @@ def process_results(result_directory: Path) -> Tuple[str, str, TestResults]: return state, description, test_results -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser( - formatter_class=argparse.ArgumentDefaultsHelpFormatter, - description="FastTest script", - ) - - parser.add_argument( - "--timeout", - type=int, - # Fast tests in most cases done within 10 min and 40 min timout should be sufficient, - # though due to cold cache build time can be much longer - # https://pastila.nl/?146195b6/9bb99293535e3817a9ea82c3f0f7538d.link#5xtClOjkaPLEjSuZ92L2/g== - default=40, - help="Timeout in minutes", - ) - args = parser.parse_args() - args.timeout = args.timeout * 60 - return args - - def main(): logging.basicConfig(level=logging.INFO) stopwatch = Stopwatch() - args = parse_args() temp_path = Path(TEMP_PATH) temp_path.mkdir(parents=True, exist_ok=True) @@ -134,14 +104,10 @@ def main(): logs_path.mkdir(parents=True, exist_ok=True) run_log_path = logs_path / "run.log" - timeout_expired = False - with TeePopen(run_cmd, run_log_path, timeout=args.timeout) as process: + with TeePopen(run_cmd, run_log_path) as process: retcode = 
process.wait() - if process.timeout_exceeded: - logging.info("Timeout expired for command: %s", run_cmd) - timeout_expired = True - elif retcode == 0: + if retcode == 0: logging.info("Run successfully") else: logging.info("Run failed") @@ -175,11 +141,6 @@ def main(): else: state, description, test_results = process_results(output_path) - if timeout_expired: - test_results.append(TestResult.create_check_timeout_expired(args.timeout)) - state = FAILURE - description = test_results[-1].name - JobReport( description=description, test_results=test_results, diff --git a/tests/ci/functional_test_check.py b/tests/ci/functional_test_check.py index e898138fb3a..5bb46d7ec2f 100644 --- a/tests/ci/functional_test_check.py +++ b/tests/ci/functional_test_check.py @@ -68,7 +68,6 @@ def get_run_command( repo_path: Path, result_path: Path, server_log_path: Path, - kill_timeout: int, additional_envs: List[str], ci_logs_args: str, image: DockerImage, @@ -86,7 +85,6 @@ def get_run_command( ) envs = [ - f"-e MAX_RUN_TIME={int(0.9 * kill_timeout)}", # a static link, don't use S3_URL or S3_DOWNLOAD '-e S3_URL="https://s3.amazonaws.com/clickhouse-datasets"', ] @@ -192,7 +190,6 @@ def process_results( def parse_args(): parser = argparse.ArgumentParser() parser.add_argument("check_name") - parser.add_argument("kill_timeout", type=int) parser.add_argument( "--validate-bugfix", action="store_true", @@ -224,12 +221,7 @@ def main(): assert ( check_name ), "Check name must be provided as an input arg or in CHECK_NAME env" - kill_timeout = args.kill_timeout or int(os.getenv("KILL_TIMEOUT", "0")) - assert ( - kill_timeout > 0 - ), "kill timeout must be provided as an input arg or in KILL_TIMEOUT env" validate_bugfix_check = args.validate_bugfix - print(f"Runnin check [{check_name}] with timeout [{kill_timeout}]") flaky_check = "flaky" in check_name.lower() @@ -288,7 +280,6 @@ def main(): repo_path, result_path, server_log_path, - kill_timeout, additional_envs, ci_logs_args, docker_image, diff --git a/tests/ci/git_helper.py b/tests/ci/git_helper.py index f15f1273bb9..8ec90dd7b2d 100644 --- a/tests/ci/git_helper.py +++ b/tests/ci/git_helper.py @@ -1,9 +1,12 @@ #!/usr/bin/env python import argparse +import atexit import logging +import os import os.path as p import re import subprocess +import tempfile from typing import Any, List, Optional logger = logging.getLogger(__name__) @@ -19,12 +22,16 @@ SHA_REGEXP = re.compile(r"\A([0-9]|[a-f]){40}\Z") CWD = p.dirname(p.realpath(__file__)) TWEAK = 1 -GIT_PREFIX = ( # All commits to remote are done as robot-clickhouse - "git -c user.email=robot-clickhouse@users.noreply.github.com " - "-c user.name=robot-clickhouse -c commit.gpgsign=false " - "-c core.sshCommand=" - "'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'" -) +with tempfile.NamedTemporaryFile("w", delete=False) as f: + GIT_KNOWN_HOSTS_FILE = f.name + GIT_PREFIX = ( # All commits to remote are done as robot-clickhouse + "git -c user.email=robot-clickhouse@users.noreply.github.com " + "-c user.name=robot-clickhouse -c commit.gpgsign=false " + "-c core.sshCommand=" + f"'ssh -o UserKnownHostsFile={GIT_KNOWN_HOSTS_FILE} " + "-o StrictHostKeyChecking=accept-new'" + ) + atexit.register(os.remove, f.name) # Py 3.8 removeprefix and removesuffix diff --git a/tests/ci/install_check.py b/tests/ci/install_check.py index 54a18c7e26c..6c33b1f2066 100644 --- a/tests/ci/install_check.py +++ b/tests/ci/install_check.py @@ -1,25 +1,21 @@ #!/usr/bin/env python3 import argparse - import logging -import sys import subprocess 
+import sys
 from pathlib import Path
 from shutil import copy2
 from typing import Dict
-
 from build_download_helper import download_builds_filter
-
 from compress_files import compress_fast
-from docker_images_helper import DockerImage, pull_image, get_docker_image
-from env_helper import CI, REPORT_PATH, TEMP_PATH as TEMP
-from report import JobReport, TestResults, TestResult, FAILURE, FAIL, OK, SUCCESS
+from docker_images_helper import DockerImage, get_docker_image, pull_image
+from env_helper import REPORT_PATH
+from env_helper import TEMP_PATH as TEMP
+from report import FAIL, FAILURE, OK, SUCCESS, JobReport, TestResult, TestResults
 from stopwatch import Stopwatch
 from tee_popen import TeePopen
-from ci_utils import set_job_timeout
-
 RPM_IMAGE = "clickhouse/install-rpm-test"
 DEB_IMAGE = "clickhouse/install-deb-test"
@@ -256,9 +252,6 @@ def main():
     args = parse_args()
-    if CI:
-        set_job_timeout()
-
     TEMP_PATH.mkdir(parents=True, exist_ok=True)
     LOGS_PATH.mkdir(parents=True, exist_ok=True)
diff --git a/tests/ci/jepsen_check.py b/tests/ci/jepsen_check.py
index 6ed411a11ef..1e61fd9fab7 100644
--- a/tests/ci/jepsen_check.py
+++ b/tests/ci/jepsen_check.py
@@ -10,6 +10,7 @@ from typing import Any, List
 import boto3  # type: ignore
 import requests
+
 from build_download_helper import (
     download_build_with_progress,
     get_build_name_for_check,
@@ -201,7 +202,7 @@ def main():
     docker_image = KEEPER_IMAGE_NAME if args.program == "keeper" else SERVER_IMAGE_NAME
     if pr_info.is_scheduled or pr_info.is_dispatched:
-        # get latest clcikhouse by the static link for latest master buit - get its version and provide permanent url for this version to the jepsen
+        # get the latest clickhouse by the static link for the latest master build - get its version and provide a permanent URL for this version to Jepsen
         build_url = f"{S3_URL}/{S3_BUILDS_BUCKET}/master/amd64/clickhouse"
         download_build_with_progress(build_url, Path(TEMP_PATH) / "clickhouse")
         git_runner.run(f"chmod +x {TEMP_PATH}/clickhouse")
diff --git a/tests/ci/libfuzzer_test_check.py b/tests/ci/libfuzzer_test_check.py
index 4bb39010978..1f5936c3fec 100644
--- a/tests/ci/libfuzzer_test_check.py
+++ b/tests/ci/libfuzzer_test_check.py
@@ -46,7 +46,6 @@ def get_run_command(
     fuzzers_path: Path,
     repo_path: Path,
     result_path: Path,
-    kill_timeout: int,
     additional_envs: List[str],
     ci_logs_args: str,
     image: DockerImage,
@@ -59,7 +58,6 @@ def get_run_command(
     )
     envs = [
-        f"-e MAX_RUN_TIME={int(0.9 * kill_timeout)}",
         # a static link, don't use S3_URL or S3_DOWNLOAD
         '-e S3_URL="https://s3.amazonaws.com/clickhouse-datasets"',
     ]
@@ -83,7 +81,6 @@ def get_run_command(
 def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument("check_name")
-    parser.add_argument("kill_timeout", type=int)
     return parser.parse_args()
@@ -99,7 +96,6 @@ def main():
     args = parse_args()
     check_name = args.check_name
-    kill_timeout = args.kill_timeout
     pr_info = PRInfo()
@@ -145,7 +141,6 @@ def main():
         fuzzers_path,
         repo_path,
         result_path,
-        kill_timeout,
         additional_envs,
         ci_logs_args,
         docker_image,
diff --git a/tests/ci/report.py b/tests/ci/report.py
index 8676c998afb..ee58efdba52 100644
--- a/tests/ci/report.py
+++ b/tests/ci/report.py
@@ -288,7 +288,7 @@ class JobReport:
     start_time: str
     duration: float
     additional_files: Union[Sequence[str], Sequence[Path]]
-    # clcikhouse version, build job only
+    # clickhouse version, build job only
     version: str = ""
     # check name to set in the commit status; set it if it differs from the job name
     check_name: str = ""
@@ -401,30 +401,40 @@ class BuildResult:
     @classmethod
     def load_any(cls, build_name: str, pr_number: int, head_ref: str):  # type: ignore
         """
-        loads report from suitable report file with the following priority:
-        1. report from PR with the same @pr_number
-        2. report from branch with the same @head_ref
-        3. report from the master
-        4. any other report
+        loads the build report from the available report files (matching the job digest)
+        with the following priority:
+        1. the report for the current PR @pr_number (may exist in the PR's workflow, with or without job reuse)
+        2. the report for the current branch @head_ref (may exist in a release/master workflow, with or without job reuse)
+        3. the report for the master branch (may exist in any workflow in case of job reuse)
+        4. any other report (job reuse from another PR, if the master report is not available yet)
         """
-        reports = []
+        pr_report = None
+        ref_report = None
+        master_report = None
+        any_report = None
         for file in Path(REPORT_PATH).iterdir():
             if f"{build_name}.json" in file.name:
-                reports.append(file)
-        if not reports:
+                any_report = file
+                if "_master_" in file.name:
+                    master_report = file
+                elif f"_{head_ref}_" in file.name:
+                    ref_report = file
+                elif pr_number and f"_{pr_number}_" in file.name:
+                    pr_report = file
+
+        if not any_report:
             return None
-        file_path = None
-        for file in reports:
-            if pr_number and f"_{pr_number}_" in file.name:
-                file_path = file
-                break
-            if f"_{head_ref}_" in file.name:
-                file_path = file
-                break
-            if "_master_" in file.name:
-                file_path = file
-                break
-        return cls.load_from_file(file_path or reports[-1])
+
+        if pr_report:
+            file_path = pr_report
+        elif ref_report:
+            file_path = ref_report
+        elif master_report:
+            file_path = master_report
+        else:
+            file_path = any_report
+
+        return cls.load_from_file(file_path)

     @classmethod
     def load_from_file(cls, file: Union[Path, str]):  # type: ignore
diff --git a/tests/ci/sqllogic_test.py b/tests/ci/sqllogic_test.py
index 6ea6fa19d91..63880f07e92 100755
--- a/tests/ci/sqllogic_test.py
+++ b/tests/ci/sqllogic_test.py
@@ -9,8 +9,8 @@ from pathlib import Path
 from typing import Tuple
 from build_download_helper import download_all_deb_packages
-from docker_images_helper import DockerImage, pull_image, get_docker_image
-from env_helper import REPORT_PATH, TEMP_PATH, REPO_COPY
+from docker_images_helper import DockerImage, get_docker_image, pull_image
+from env_helper import REPO_COPY, REPORT_PATH, TEMP_PATH
 from report import (
     ERROR,
     FAIL,
@@ -72,11 +72,6 @@ def parse_args() -> argparse.Namespace:
         required=False,
         default="",
     )
-    parser.add_argument(
-        "--kill-timeout",
-        required=False,
-        default=0,
-    )
     return parser.parse_args()
@@ -96,10 +91,6 @@ def main():
     assert (
         check_name
     ), "Check name must be provided as an input arg or in CHECK_NAME env"
-    kill_timeout = args.kill_timeout or int(os.getenv("KILL_TIMEOUT", "0"))
-    assert (
-        kill_timeout > 0
-    ), "kill timeout must be provided as an input arg or in KILL_TIMEOUT env"
     docker_image = pull_image(get_docker_image(IMAGE_NAME))
@@ -127,7 +118,7 @@ def main():
     )
     logging.info("Going to run func tests: %s", run_command)
-    with TeePopen(run_command, run_log_path, timeout=kill_timeout) as process:
+    with TeePopen(run_command, run_log_path) as process:
         retcode = process.wait()
         if retcode == 0:
             logging.info("Run successfully")
diff --git a/tests/ci/stress_check.py b/tests/ci/stress_check.py
index 027d7316e23..bf0281cae68 100644
--- a/tests/ci/stress_check.py
+++ b/tests/ci/stress_check.py
@@ -14,7 +14,7 @@ from docker_images_helper import DockerImage, get_docker_image, pull_image
 from env_helper import REPO_COPY, REPORT_PATH, TEMP_PATH
 from get_robot_token import get_parameter_from_ssm
 from pr_info import PRInfo
-from report import ERROR, JobReport, TestResult, TestResults, read_test_results
+from report import ERROR, JobReport, TestResults, read_test_results
 from stopwatch import Stopwatch
 from tee_popen import TeePopen
@@ -161,14 +161,9 @@ def run_stress_test(docker_image_name: str) -> None:
     )
     logging.info("Going to run stress test: %s", run_command)
-    timeout_expired = False
-    timeout = 60 * 150
-    with TeePopen(run_command, run_log_path, timeout=timeout) as process:
+    with TeePopen(run_command, run_log_path) as process:
         retcode = process.wait()
-        if process.timeout_exceeded:
-            logging.info("Timeout expired for command: %s", run_command)
-            timeout_expired = True
-        elif retcode == 0:
+        if retcode == 0:
             logging.info("Run successfully")
         else:
             logging.info("Run failed")
@@ -180,11 +175,6 @@ def run_stress_test(docker_image_name: str) -> None:
         result_path, server_log_path, run_log_path
     )
-    if timeout_expired:
-        test_results.append(TestResult.create_check_timeout_expired(timeout))
-        state = "failure"
-        description = test_results[-1].name
-
     JobReport(
         description=description,
         test_results=test_results,
diff --git a/tests/clickhouse-test b/tests/clickhouse-test
index 133d635f8a0..07e86fbfecc 100755
--- a/tests/clickhouse-test
+++ b/tests/clickhouse-test
@@ -1223,12 +1223,9 @@ class TestCase:
             return FailureReason.S3_STORAGE
         elif (
             tags
-            and ("no-s3-storage-with-slow-build" in tags)
+            and "no-s3-storage-with-slow-build" in tags
             and args.s3_storage
-            and (
-                BuildFlags.THREAD in args.build_flags
-                or BuildFlags.DEBUG in args.build_flags
-            )
+            and BuildFlags.RELEASE not in args.build_flags
         ):
             return FailureReason.S3_STORAGE
diff --git a/tests/config/config.d/storage_conf.xml b/tests/config/config.d/storage_conf.xml
index 0e6cd4b0e03..7a9b579c00a 100644
--- a/tests/config/config.d/storage_conf.xml
+++ b/tests/config/config.d/storage_conf.xml
@@ -92,6 +92,13 @@
             22548578304
             100
+        <!-- The XML tags in this hunk were lost during extraction (as were the tags
+             around the two context values above). The element names below are
+             reconstructed from ClickHouse's standard S3 disk schema; the disk name
+             is inferred from the policy added further down and is an assumption. -->
+        <s3_no_cache>
+            <type>s3</type>
+            <endpoint>http://localhost:11111/test/special/</endpoint>
+            <access_key_id>clickhouse</access_key_id>
+            <secret_access_key>clickhouse</secret_access_key>
+            <skip_access_check>0</skip_access_check>  <!-- element name assumed; only the value 0 survived -->
+        </s3_no_cache>
@@ -107,6 +114,13 @@
+        <!-- Tags lost in extraction here too; only the disk reference "s3_no_cache"
+             survived. The policy name is assumed to match the disk name. -->
+            <s3_no_cache>
+                <volumes>
+                    <main>
+                        <disk>s3_no_cache</disk>
+                    </main>
+                </volumes>
+            </s3_no_cache>
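For reference, a minimal sketch of how a stateless-style test could exercise the cache-less S3 policy configured above. It assumes the reconstructed policy name `s3_no_cache`; the table name is illustrative.

# Hedged sketch: round-trip some MergeTree data through the cache-less S3 policy.
$CLICKHOUSE_CLIENT -nm -q "
    DROP TABLE IF EXISTS t_s3_no_cache;
    CREATE TABLE t_s3_no_cache (x UInt64) ENGINE = MergeTree ORDER BY x
    SETTINGS storage_policy = 's3_no_cache';
    INSERT INTO t_s3_no_cache SELECT number FROM numbers(1000);
    SELECT count() FROM t_s3_no_cache;  -- expect 1000
    DROP TABLE t_s3_no_cache SYNC;
"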
diff --git a/tests/integration/helpers/s3_mocks/broken_s3.py b/tests/integration/helpers/s3_mocks/broken_s3.py
index 7d0127bc1c4..686abc76bdf 100644
--- a/tests/integration/helpers/s3_mocks/broken_s3.py
+++ b/tests/integration/helpers/s3_mocks/broken_s3.py
@@ -183,6 +183,9 @@ class _ServerRuntime:
             )
             request_handler.write_error(429, data)
+    # make sure that Alibaba errors (QpsLimitExceeded, TotalQpsLimitExceeded) are retriable
+    # we patched contrib/aws to achieve it: https://github.com/ClickHouse/aws-sdk-cpp/pull/22 https://github.com/ClickHouse/aws-sdk-cpp/pull/23
+    # https://www.alibabacloud.com/help/en/oss/support/http-status-code-503
     class QpsLimitExceededAction:
         def inject_error(self, request_handler):
             data = (
@@ -195,6 +198,18 @@ class _ServerRuntime:
             )
             request_handler.write_error(429, data)
+    class TotalQpsLimitExceededAction:
+        def inject_error(self, request_handler):
+            # (the XML markup of this payload was stripped during extraction and is
+            # reconstructed here from the OSS error schema referenced above)
+            data = (
+                '<?xml version="1.0" encoding="UTF-8"?>'
+                "<Error>"
+                "<Code>TotalQpsLimitExceeded</Code>"
+                "<Message>Please reduce your request rate.</Message>"
+                "<RequestId>txfbd566d03042474888193-00608d7537</RequestId>"
+                "</Error>"
+            )
+            request_handler.write_error(429, data)
+
     class RedirectAction:
         def __init__(self, host="localhost", port=1):
             self.dst_host = _and_then(host, str)
@@ -269,6 +284,10 @@ class _ServerRuntime:
                 self.error_handler = _ServerRuntime.QpsLimitExceededAction(
                     *self.action_args
                 )
+            elif self.action == "total_qps_limit_exceeded":
+                self.error_handler = _ServerRuntime.TotalQpsLimitExceededAction(
+                    *self.action_args
+                )
             else:
                 self.error_handler = _ServerRuntime.Expected500ErrorAction()
diff --git a/tests/integration/test_checking_s3_blobs_paranoid/test.py b/tests/integration/test_checking_s3_blobs_paranoid/test.py
index a7fe02b16de..476f7c61b28 100644
--- a/tests/integration/test_checking_s3_blobs_paranoid/test.py
+++ b/tests/integration/test_checking_s3_blobs_paranoid/test.py
@@ -205,6 +205,7 @@ def test_upload_s3_fail_upload_part_when_multi_part_upload(
     [
         ("slow_down", "DB::Exception: Slow Down."),
         ("qps_limit_exceeded", "DB::Exception: Please reduce your request rate."),
+        ("total_qps_limit_exceeded", "DB::Exception: Please reduce your request rate."),
         (
             "connection_refused",
             "Poco::Exception. Code: 1000, e.code() = 111, Connection refused",
diff --git a/tests/integration/test_lost_part/test.py b/tests/integration/test_lost_part/test.py
index 382539df7de..b8e67551d79 100644
--- a/tests/integration/test_lost_part/test.py
+++ b/tests/integration/test_lost_part/test.py
@@ -90,7 +90,7 @@ def test_lost_part_same_replica(start_cluster):
     )
     assert node1.contains_in_log(
-        "Created empty part"
+        f"Created empty part {victim_part_from_the_middle}"
     ), f"Seems like empty part {victim_part_from_the_middle} is not created or log message changed"
     assert node1.query("SELECT COUNT() FROM mt0") == "4\n"
@@ -143,7 +143,10 @@ def test_lost_part_other_replica(start_cluster):
     node1.query("CHECK TABLE mt1")
     node2.query("SYSTEM START REPLICATION QUEUES")
-    res, err = node1.query_and_get_answer_with_error("SYSTEM SYNC REPLICA mt1")
+    # Reduce the timeout in SYNC REPLICA since it might never finish with merges stopped, and we don't want to wait 300s
+    res, err = node1.query_and_get_answer_with_error(
+        "SYSTEM SYNC REPLICA mt1", settings={"receive_timeout": 30}
+    )
     print("result: ", res)
     print("error: ", err)
@@ -158,10 +161,10 @@ def test_lost_part_other_replica(start_cluster):
     )
     assert node1.contains_in_log(
-        "Created empty part"
-    ), "Seems like empty part {} is not created or log message changed".format(
-        victim_part_from_the_middle
-    )
+        f"Created empty part {victim_part_from_the_middle}"
+    ) or node1.contains_in_log(
+        f"Part {victim_part_from_the_middle} looks broken. Removing it and will try to fetch."
+    ), f"Seems like empty part {victim_part_from_the_middle} is not created or log message changed"
     assert_eq_with_retry(node2, "SELECT COUNT() FROM mt1", "4")
     assert_eq_with_retry(node2, "SELECT COUNT() FROM system.replication_queue", "0")
diff --git a/tests/integration/test_s3_plain_rewritable/test.py b/tests/integration/test_s3_plain_rewritable/test.py
index 67e3ec987a9..06967958631 100644
--- a/tests/integration/test_s3_plain_rewritable/test.py
+++ b/tests/integration/test_s3_plain_rewritable/test.py
@@ -9,20 +9,10 @@ cluster = ClickHouseCluster(__file__)
 NUM_WORKERS = 5
-nodes = []
-for i in range(NUM_WORKERS):
-    name = "node{}".format(i + 1)
-    node = cluster.add_instance(
-        name,
-        main_configs=["configs/storage_conf.xml"],
-        env_variables={"ENDPOINT_SUBPATH": name},
-        with_minio=True,
-        stay_alive=True,
-    )
-    nodes.append(node)
-
 MAX_ROWS = 1000
+dirs_created = []
+
 def gen_insert_values(size):
     return ",".join(
@@ -38,6 +28,17 @@ insert_values = ",".join(
 @pytest.fixture(scope="module", autouse=True)
 def start_cluster():
+    for i in range(NUM_WORKERS):
+        cluster.add_instance(
+            f"node{i + 1}",
+            main_configs=["configs/storage_conf.xml"],
+            with_minio=True,
+            env_variables={"ENDPOINT_SUBPATH": f"node{i + 1}"},
+            stay_alive=True,
+            # Override ENDPOINT_SUBPATH.
+            instance_env_variables=i > 0,
+        )
+
     try:
         cluster.start()
         yield cluster
@@ -64,10 +65,10 @@ def test_insert():
         gen_insert_values(random.randint(1, MAX_ROWS)) for _ in range(0, NUM_WORKERS)
     ]
     threads = []
+    assert len(cluster.instances) == NUM_WORKERS
     for i in range(NUM_WORKERS):
-        t = threading.Thread(
-            target=create_insert, args=(nodes[i], insert_values_arr[i])
-        )
+        node = cluster.instances[f"node{i + 1}"]
+        t = threading.Thread(target=create_insert, args=(node, insert_values_arr[i]))
         threads.append(t)
         t.start()
@@ -75,48 +76,61 @@ def test_insert():
         t.join()
     for i in range(NUM_WORKERS):
+        node = cluster.instances[f"node{i + 1}"]
         assert (
-            nodes[i].query("SELECT * FROM test ORDER BY id FORMAT Values")
+            node.query("SELECT * FROM test ORDER BY id FORMAT Values")
             == insert_values_arr[i]
         )
     for i in range(NUM_WORKERS):
-        nodes[i].query("ALTER TABLE test MODIFY SETTING old_parts_lifetime = 59")
+        node = cluster.instances[f"node{i + 1}"]
+        node.query("ALTER TABLE test MODIFY SETTING old_parts_lifetime = 59")
         assert (
-            nodes[i]
-            .query(
+            node.query(
                 "SELECT engine_full from system.tables WHERE database = currentDatabase() AND name = 'test'"
-            )
-            .find("old_parts_lifetime = 59")
+            ).find("old_parts_lifetime = 59")
             != -1
         )
-        nodes[i].query("ALTER TABLE test RESET SETTING old_parts_lifetime")
+        node.query("ALTER TABLE test RESET SETTING old_parts_lifetime")
         assert (
-            nodes[i]
-            .query(
+            node.query(
                 "SELECT engine_full from system.tables WHERE database = currentDatabase() AND name = 'test'"
-            )
-            .find("old_parts_lifetime")
+            ).find("old_parts_lifetime")
             == -1
         )
-        nodes[i].query("ALTER TABLE test MODIFY COMMENT 'new description'")
+        node.query("ALTER TABLE test MODIFY COMMENT 'new description'")
         assert (
-            nodes[i]
-            .query(
+            node.query(
                 "SELECT comment from system.tables WHERE database = currentDatabase() AND name = 'test'"
-            )
-            .find("new description")
+            ).find("new description")
             != -1
         )
+        created = int(
+            node.query(
+                "SELECT value FROM system.events WHERE event = 'DiskPlainRewritableS3DirectoryCreated'"
+            )
+        )
+        assert created > 0
+        dirs_created.append(created)
+        assert (
+            int(
+                node.query(
+                    "SELECT value FROM system.metrics WHERE metric = 'DiskPlainRewritableS3DirectoryMapSize'"
+                )
+            )
+            == created
+        )
+
 @pytest.mark.order(1)
 def test_restart():
     insert_values_arr = []
     for i in range(NUM_WORKERS):
+        node = cluster.instances[f"node{i + 1}"]
         insert_values_arr.append(
-            nodes[i].query("SELECT * FROM test ORDER BY id FORMAT Values")
+            node.query("SELECT * FROM test ORDER BY id FORMAT Values")
         )
     def restart(node):
@@ -124,7 +138,8 @@ def test_restart():
     threads = []
     for i in range(NUM_WORKERS):
-        t = threading.Thread(target=restart, args=(nodes[i],))
+        node = cluster.instances[f"node{i + 1}"]
+        t = threading.Thread(target=restart, args=(node,))
         threads.append(t)
         t.start()
@@ -132,8 +146,9 @@ def test_restart():
         t.join()
     for i in range(NUM_WORKERS):
+        node = cluster.instances[f"node{i + 1}"]
         assert (
-            nodes[i].query("SELECT * FROM test ORDER BY id FORMAT Values")
+            node.query("SELECT * FROM test ORDER BY id FORMAT Values")
             == insert_values_arr[i]
         )
@@ -141,7 +156,16 @@ def test_restart():
@@ -141,7 +156,16 @@ def test_drop():
     for i in range(NUM_WORKERS):
-        nodes[i].query("DROP TABLE IF EXISTS test SYNC")
+        node = cluster.instances[f"node{i + 1}"]
+        node.query("DROP TABLE IF EXISTS test SYNC")
+
+        removed = int(
+            node.query(
+                "SELECT value FROM system.events WHERE event = 'DiskPlainRewritableS3DirectoryRemoved'"
+            )
+        )
+
+        assert dirs_created[i] == removed
     it = cluster.minio_client.list_objects(
         cluster.minio_bucket,
"data/", recursive=True diff --git a/tests/performance/sparse_column_filter.xml b/tests/performance/sparse_column_filter.xml new file mode 100644 index 00000000000..bc6a94a1cc4 --- /dev/null +++ b/tests/performance/sparse_column_filter.xml @@ -0,0 +1,42 @@ + + + + serialization + + sparse + + + + ratio + + 10 + 100 + 1000 + + + + + + CREATE TABLE test_{serialization}_{ratio} (id UInt64, u8 UInt8, u64 UInt64, str String) + ENGINE = MergeTree ORDER BY id + SETTINGS ratio_of_defaults_for_sparse_serialization = 0.8 + + + SYSTEM STOP MERGES test_{serialization}_{ratio} + + + INSERT INTO test_{serialization}_{ratio} SELECT + number, + number % {ratio} = 0 ? rand(1) : 0, + number % {ratio} = 0 ? rand(2) : 0, + number % {ratio} = 0 ? randomPrintableASCII(64, 3) : '' + FROM numbers(100000000) + + + SELECT str, COUNT(DISTINCT id) as i FROM test_{serialization}_{ratio} WHERE notEmpty(str) GROUP BY str ORDER BY i DESC LIMIT 10 + SELECT str, COUNT(DISTINCT u8) as u FROM test_{serialization}_{ratio} WHERE notEmpty(str) GROUP BY str ORDER BY u DESC LIMIT 10 + SELECT str, COUNT(DISTINCT u64) as u FROM test_{serialization}_{ratio} WHERE notEmpty(str) GROUP BY str ORDER BY u DESC LIMIT 10 + + + DROP TABLE IF EXISTS test_{serialization}_{ratio} + diff --git a/tests/queries/0_stateless/00166_functions_of_aggregation_states.sql b/tests/queries/0_stateless/00166_functions_of_aggregation_states.sql index 85f26d4e206..62297e4076e 100644 --- a/tests/queries/0_stateless/00166_functions_of_aggregation_states.sql +++ b/tests/queries/0_stateless/00166_functions_of_aggregation_states.sql @@ -1,5 +1,5 @@ -- Disable external aggregation because the state is reset for each new block of data in 'runningAccumulate' function. SET max_bytes_before_external_group_by = 0; -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SELECT k, finalizeAggregation(sum_state), runningAccumulate(sum_state) FROM (SELECT intDiv(number, 50000) AS k, sumState(number) AS sum_state FROM (SELECT number FROM system.numbers LIMIT 1000000) GROUP BY k ORDER BY k); diff --git a/tests/queries/0_stateless/00410_aggregation_combinators_with_arenas.sql b/tests/queries/0_stateless/00410_aggregation_combinators_with_arenas.sql index 99091878d90..3eb4c2b1b4a 100644 --- a/tests/queries/0_stateless/00410_aggregation_combinators_with_arenas.sql +++ b/tests/queries/0_stateless/00410_aggregation_combinators_with_arenas.sql @@ -1,4 +1,4 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; DROP TABLE IF EXISTS arena; CREATE TABLE arena (k UInt8, d String) ENGINE = Memory; INSERT INTO arena SELECT number % 10 AS k, hex(intDiv(number, 10) % 1000) AS d FROM system.numbers LIMIT 10000000; diff --git a/tests/queries/0_stateless/00653_running_difference.sql b/tests/queries/0_stateless/00653_running_difference.sql index d210e04a3a4..d2858a938cd 100644 --- a/tests/queries/0_stateless/00653_running_difference.sql +++ b/tests/queries/0_stateless/00653_running_difference.sql @@ -1,4 +1,4 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; select runningDifference(x) from (select arrayJoin([0, 1, 5, 10]) as x); select '-'; select runningDifference(x) from (select arrayJoin([2, Null, 3, Null, 10]) as x); diff --git a/tests/queries/0_stateless/00719_parallel_ddl_db.sh b/tests/queries/0_stateless/00719_parallel_ddl_db.sh index 004590c21df..b7dea25c182 100755 --- a/tests/queries/0_stateless/00719_parallel_ddl_db.sh +++ 
b/tests/queries/0_stateless/00719_parallel_ddl_db.sh @@ -11,7 +11,11 @@ ${CLICKHOUSE_CLIENT} --query "DROP DATABASE IF EXISTS parallel_ddl" function query() { - for _ in {1..50}; do + local it=0 + TIMELIMIT=30 + while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt 50 ]; + do + it=$((it+1)) ${CLICKHOUSE_CLIENT} --query "CREATE DATABASE IF NOT EXISTS parallel_ddl" ${CLICKHOUSE_CLIENT} --query "DROP DATABASE IF EXISTS parallel_ddl" done diff --git a/tests/queries/0_stateless/00719_parallel_ddl_table.sh b/tests/queries/0_stateless/00719_parallel_ddl_table.sh index 57a7e228341..fefe12ae656 100755 --- a/tests/queries/0_stateless/00719_parallel_ddl_table.sh +++ b/tests/queries/0_stateless/00719_parallel_ddl_table.sh @@ -10,7 +10,11 @@ ${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS parallel_ddl" function query() { - for _ in {1..50}; do + local it=0 + TIMELIMIT=30 + while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt 50 ]; + do + it=$((it+1)) ${CLICKHOUSE_CLIENT} --query "CREATE TABLE IF NOT EXISTS parallel_ddl(a Int) ENGINE = Memory" ${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS parallel_ddl" done diff --git a/tests/queries/0_stateless/00808_not_optimize_predicate.sql b/tests/queries/0_stateless/00808_not_optimize_predicate.sql index c39f1ff2ad1..d2527477dbd 100644 --- a/tests/queries/0_stateless/00808_not_optimize_predicate.sql +++ b/tests/queries/0_stateless/00808_not_optimize_predicate.sql @@ -1,6 +1,6 @@ SET send_logs_level = 'fatal'; SET convert_query_to_cnf = 0; -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; DROP TABLE IF EXISTS test_00808; CREATE TABLE test_00808(date Date, id Int8, name String, value Int64, sign Int8) ENGINE = CollapsingMergeTree(sign) ORDER BY (id, date); diff --git a/tests/queries/0_stateless/00840_long_concurrent_select_and_drop_deadlock.sh b/tests/queries/0_stateless/00840_long_concurrent_select_and_drop_deadlock.sh index cbe37de6651..238cdcea547 100755 --- a/tests/queries/0_stateless/00840_long_concurrent_select_and_drop_deadlock.sh +++ b/tests/queries/0_stateless/00840_long_concurrent_select_and_drop_deadlock.sh @@ -19,15 +19,39 @@ trap cleanup EXIT $CLICKHOUSE_CLIENT -q "create view view_00840 as select count(*),database,table from system.columns group by database,table" -for _ in {1..100}; do - $CLICKHOUSE_CLIENT -nm -q " - drop table if exists view_00840; - create view view_00840 as select count(*),database,table from system.columns group by database,table; - " -done & -for _ in {1..250}; do - $CLICKHOUSE_CLIENT -q "select * from view_00840 order by table" >/dev/null 2>&1 || true -done & + +function thread_drop_create() +{ + local TIMELIMIT=$((SECONDS+$1)) + local it=0 + while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt 100 ]; + do + it=$((it+1)) + $CLICKHOUSE_CLIENT -nm -q " + drop table if exists view_00840; + create view view_00840 as select count(*),database,table from system.columns group by database,table; + " + done +} + +function thread_select() +{ + local TIMELIMIT=$((SECONDS+$1)) + local it=0 + while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt 250 ]; + do + it=$((it+1)) + $CLICKHOUSE_CLIENT -q "select * from view_00840 order by table" >/dev/null 2>&1 || true + done +} + + +export -f thread_drop_create +export -f thread_select + +TIMEOUT=60 +thread_drop_create $TIMEOUT & +thread_select $TIMEOUT & wait trap '' EXIT diff --git a/tests/queries/0_stateless/00957_neighbor.sql b/tests/queries/0_stateless/00957_neighbor.sql index 8c40f0aab47..ac26fe0eae7 100644 --- a/tests/queries/0_stateless/00957_neighbor.sql +++ 
b/tests/queries/0_stateless/00957_neighbor.sql @@ -1,4 +1,4 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; -- no arguments select neighbor(); -- { serverError 42 } -- single argument diff --git a/tests/queries/0_stateless/00996_neighbor.sql b/tests/queries/0_stateless/00996_neighbor.sql index 50b07242eac..f9cbf69a836 100644 --- a/tests/queries/0_stateless/00996_neighbor.sql +++ b/tests/queries/0_stateless/00996_neighbor.sql @@ -1,4 +1,4 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SELECT number, neighbor(toString(number), 0) FROM numbers(10); SELECT number, neighbor(toString(number), 5) FROM numbers(10); diff --git a/tests/queries/0_stateless/01012_reset_running_accumulate.sql b/tests/queries/0_stateless/01012_reset_running_accumulate.sql index eed653cc629..09bd29de185 100644 --- a/tests/queries/0_stateless/01012_reset_running_accumulate.sql +++ b/tests/queries/0_stateless/01012_reset_running_accumulate.sql @@ -1,6 +1,6 @@ -- Disable external aggregation because the state is reset for each new block of data in 'runningAccumulate' function. SET max_bytes_before_external_group_by = 0; -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SELECT grouping, item, diff --git a/tests/queries/0_stateless/01051_aggregate_function_crash.sql b/tests/queries/0_stateless/01051_aggregate_function_crash.sql index c50c275d834..a55ead8a2d7 100644 --- a/tests/queries/0_stateless/01051_aggregate_function_crash.sql +++ b/tests/queries/0_stateless/01051_aggregate_function_crash.sql @@ -1,4 +1,4 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SELECT runningAccumulate(string_state) FROM ( diff --git a/tests/queries/0_stateless/01056_predicate_optimizer_bugs.sql b/tests/queries/0_stateless/01056_predicate_optimizer_bugs.sql index 6ea42ec32b0..07f94c03e10 100644 --- a/tests/queries/0_stateless/01056_predicate_optimizer_bugs.sql +++ b/tests/queries/0_stateless/01056_predicate_optimizer_bugs.sql @@ -1,7 +1,7 @@ SET enable_optimize_predicate_expression = 1; SET joined_subquery_requires_alias = 0; SET convert_query_to_cnf = 0; -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; -- https://github.com/ClickHouse/ClickHouse/issues/3885 -- https://github.com/ClickHouse/ClickHouse/issues/5485 diff --git a/tests/queries/0_stateless/01293_optimize_final_force.reference b/tests/queries/0_stateless/01293_optimize_final_force.reference index b0b9422adf0..e69de29bb2d 100644 --- a/tests/queries/0_stateless/01293_optimize_final_force.reference +++ b/tests/queries/0_stateless/01293_optimize_final_force.reference @@ -1,100 +0,0 @@ -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 -55 0 diff --git a/tests/queries/0_stateless/01293_optimize_final_force.sh b/tests/queries/0_stateless/01293_optimize_final_force.sh index 9b9ed6272a1..d3d3d3e1ac5 100755 --- 
a/tests/queries/0_stateless/01293_optimize_final_force.sh +++ b/tests/queries/0_stateless/01293_optimize_final_force.sh @@ -6,23 +6,33 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -for _ in {1..100}; do $CLICKHOUSE_CLIENT --multiquery --query " -DROP TABLE IF EXISTS mt; -CREATE TABLE mt (x UInt8, k UInt8 DEFAULT 0) ENGINE = SummingMergeTree ORDER BY k; +it=0 +TIMELIMIT=31 +while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt 100 ]; +do + it=$((it+1)) + $CLICKHOUSE_CLIENT --multiquery --query " + DROP TABLE IF EXISTS mt; + CREATE TABLE mt (x UInt8, k UInt8 DEFAULT 0) ENGINE = SummingMergeTree ORDER BY k; -INSERT INTO mt (x) VALUES (1); -INSERT INTO mt (x) VALUES (2); -INSERT INTO mt (x) VALUES (3); -INSERT INTO mt (x) VALUES (4); -INSERT INTO mt (x) VALUES (5); -INSERT INTO mt (x) VALUES (6); -INSERT INTO mt (x) VALUES (7); -INSERT INTO mt (x) VALUES (8); -INSERT INTO mt (x) VALUES (9); -INSERT INTO mt (x) VALUES (10); + INSERT INTO mt (x) VALUES (1); + INSERT INTO mt (x) VALUES (2); + INSERT INTO mt (x) VALUES (3); + INSERT INTO mt (x) VALUES (4); + INSERT INTO mt (x) VALUES (5); + INSERT INTO mt (x) VALUES (6); + INSERT INTO mt (x) VALUES (7); + INSERT INTO mt (x) VALUES (8); + INSERT INTO mt (x) VALUES (9); + INSERT INTO mt (x) VALUES (10); -OPTIMIZE TABLE mt FINAL; -SELECT * FROM mt; + OPTIMIZE TABLE mt FINAL; + "; -DROP TABLE mt; -"; done + RES=$($CLICKHOUSE_CLIENT --query "SELECT * FROM mt;") + if [ "$RES" != "55 0" ]; then + echo "FAIL. Got: $RES" + fi + + $CLICKHOUSE_CLIENT --query "DROP TABLE mt;" +done diff --git a/tests/queries/0_stateless/01353_neighbor_overflow.sql b/tests/queries/0_stateless/01353_neighbor_overflow.sql index ac168cb3305..8844cf514b1 100644 --- a/tests/queries/0_stateless/01353_neighbor_overflow.sql +++ b/tests/queries/0_stateless/01353_neighbor_overflow.sql @@ -1,3 +1,3 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SELECT neighbor(toString(number), -9223372036854775808) FROM numbers(100); -- { serverError 69 } WITH neighbor(toString(number), toInt64(rand64())) AS x SELECT * FROM system.numbers WHERE NOT ignore(x); -- { serverError 69 } diff --git a/tests/queries/0_stateless/01455_optimize_trivial_insert_select.sql b/tests/queries/0_stateless/01455_optimize_trivial_insert_select.sql index 466c9aa3707..09a93d94dc3 100644 --- a/tests/queries/0_stateless/01455_optimize_trivial_insert_select.sql +++ b/tests/queries/0_stateless/01455_optimize_trivial_insert_select.sql @@ -1,5 +1,5 @@ SET max_insert_threads = 1, max_threads = 100, min_insert_block_size_rows = 1048576, max_block_size = 65536; -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; DROP TABLE IF EXISTS t; CREATE TABLE t (x UInt64) ENGINE = StripeLog; -- For trivial INSERT SELECT, max_threads is lowered to max_insert_threads and max_block_size is changed to min_insert_block_size_rows. 
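The shell-test hunks above (00719_parallel_ddl_*.sh, 00840_long_concurrent_select_and_drop_deadlock.sh, 01293_optimize_final_force.sh) all apply the same conversion: a fixed `for _ in {1..N}` loop becomes a loop bounded both by an iteration cap and by wall-clock time, so slow sanitizer or debug builds cannot keep the loop running for minutes. A generic sketch of the pattern, with illustrative names:

# Hedged sketch of the deadline-loop pattern used in the tests above.
function repeat_with_deadline()
{
    # Run the workload at most $1 times, but stop once $2 seconds have elapsed.
    # $SECONDS is the bash builtin counting seconds since the script started.
    local max_iters=$1
    local TIMELIMIT=$((SECONDS + $2))
    local it=0
    while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt "$max_iters" ]
    do
        it=$((it + 1))
        $CLICKHOUSE_CLIENT --query "SELECT 1 FORMAT Null"  # placeholder workload
    done
}

repeat_with_deadline 100 30

Whichever limit is hit first ends the loop, so the test keeps its original coverage on fast builds while staying within the CI timeout on slow ones.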
diff --git a/tests/queries/0_stateless/01665_running_difference_ubsan.sql b/tests/queries/0_stateless/01665_running_difference_ubsan.sql index 504cb0269f8..19947b6ad84 100644 --- a/tests/queries/0_stateless/01665_running_difference_ubsan.sql +++ b/tests/queries/0_stateless/01665_running_difference_ubsan.sql @@ -1,2 +1,2 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SELECT k, d, i FROM (SELECT t.1 AS k, t.2 AS v, runningDifference(v) AS d, runningDifference(cityHash64(t.1)) AS i FROM (SELECT arrayJoin([(NULL, 65535), ('a', 7), ('a', 3), ('b', 11), ('b', 2), ('', -9223372036854775808)]) AS t)) WHERE i = 9223372036854775807; diff --git a/tests/queries/0_stateless/01670_neighbor_lc_bug.sql b/tests/queries/0_stateless/01670_neighbor_lc_bug.sql index b665c0b48fd..599a1f49063 100644 --- a/tests/queries/0_stateless/01670_neighbor_lc_bug.sql +++ b/tests/queries/0_stateless/01670_neighbor_lc_bug.sql @@ -1,4 +1,4 @@ -SET allow_deprecated_functions = 1; +SET allow_deprecated_error_prone_window_functions = 1; SET output_format_pretty_row_numbers = 0; SELECT diff --git a/tests/queries/0_stateless/01701_parallel_parsing_infinite_segmentation.sh b/tests/queries/0_stateless/01701_parallel_parsing_infinite_segmentation.sh index 0fe04fb95fd..9284348dd62 100755 --- a/tests/queries/0_stateless/01701_parallel_parsing_infinite_segmentation.sh +++ b/tests/queries/0_stateless/01701_parallel_parsing_infinite_segmentation.sh @@ -1,4 +1,5 @@ #!/usr/bin/env bash +# Tags: no-debug, no-asan, no-tsan, no-msan CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh diff --git a/tests/queries/0_stateless/02151_hash_table_sizes_stats.reference b/tests/queries/0_stateless/02151_hash_table_sizes_stats.reference new file mode 100644 index 00000000000..712e2b058a4 --- /dev/null +++ b/tests/queries/0_stateless/02151_hash_table_sizes_stats.reference @@ -0,0 +1,21 @@ +1 +-- +1 +-- +1 +-- +1 +-- +1 +1 +-- +1 +-- +1 +1 +-- +1 +-- +1 +1 +-- diff --git a/tests/queries/0_stateless/02151_hash_table_sizes_stats.sh b/tests/queries/0_stateless/02151_hash_table_sizes_stats.sh new file mode 100755 index 00000000000..f99dbdacec2 --- /dev/null +++ b/tests/queries/0_stateless/02151_hash_table_sizes_stats.sh @@ -0,0 +1,96 @@ +#!/usr/bin/env bash +# Tags: long, no-debug, no-tsan, no-msan, no-ubsan, no-asan, no-random-settings, no-random-merge-tree-settings + +# shellcheck disable=SC2154,SC2162 + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + + +# tests rely on that all the rows are unique and max_threads divides table_size +table_size=1000005 +max_threads=5 + + +prepare_table() { + table_name="t_hash_table_sizes_stats_$RANDOM$RANDOM" + $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS $table_name;" + if [ -z "$1" ]; then + $CLICKHOUSE_CLIENT -q "CREATE TABLE $table_name(number UInt64) Engine=MergeTree() ORDER BY tuple() SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';" + else + $CLICKHOUSE_CLIENT -q "CREATE TABLE $table_name(number UInt64) Engine=MergeTree() ORDER BY $1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';" + fi + $CLICKHOUSE_CLIENT -q "SYSTEM STOP MERGES $table_name;" + for ((i = 1; i <= max_threads; i++)); do + cnt=$((table_size / max_threads)) + from=$(((i - 1) * cnt)) + $CLICKHOUSE_CLIENT -q "INSERT INTO $table_name SELECT * FROM numbers($from, $cnt);" + done +} + +prepare_table_with_sorting_key() { + prepare_table "$1" +} + +run_query() { + query_id="${CLICKHOUSE_DATABASE}_hash_table_sizes_stats_$RANDOM$RANDOM" + $CLICKHOUSE_CLIENT --query_id="$query_id" --multiquery -q " + SET max_block_size = $((table_size / 10)); + SET merge_tree_min_rows_for_concurrent_read = 1; + SET max_untracked_memory = 0; + SET max_size_to_preallocate_for_aggregation = 1e12; + $query" +} + +check_preallocated_elements() { + # rows may be distributed in any way including "everything goes to the one particular thread" + $CLICKHOUSE_CLIENT --param_query_id="$1" -q " + SELECT COUNT(*) + FROM system.query_log + WHERE event_date >= yesterday() AND query_id = {query_id:String} AND current_database = currentDatabase() + AND ProfileEvents['AggregationPreallocatedElementsInHashTables'] BETWEEN $2 AND $3" +} + +check_convertion_to_two_level() { + # rows may be distributed in any way including "everything goes to the one particular thread" + $CLICKHOUSE_CLIENT --param_query_id="$1" -q " + SELECT SUM(ProfileEvents['AggregationHashTablesInitializedAsTwoLevel']) BETWEEN 1 AND $max_threads + FROM system.query_log + WHERE event_date >= yesterday() AND query_id = {query_id:String} AND current_database = currentDatabase()" +} + +print_border() { + echo "--" +} + +# each test case appends to this array +expected_results=() + +check_expectations() { + $CLICKHOUSE_CLIENT -q "SYSTEM FLUSH LOGS" + + for i in "${!expected_results[@]}"; do + read -a args <<< "${expected_results[$i]}" + if [ ${#args[@]} -eq 4 ]; then + check_convertion_to_two_level "${args[0]}" + fi + check_preallocated_elements "${args[@]}" + print_border + done +} + +# shellcheck source=./02151_hash_table_sizes_stats.testcases +source "$CURDIR"/02151_hash_table_sizes_stats.testcases + +test_one_thread_simple_group_by +test_one_thread_simple_group_by_with_limit +test_one_thread_simple_group_by_with_join_and_subquery +test_several_threads_simple_group_by_with_limit_single_level_ht +test_several_threads_simple_group_by_with_limit_two_level_ht +test_several_threads_simple_group_by_with_limit_and_rollup_single_level_ht +test_several_threads_simple_group_by_with_limit_and_rollup_two_level_ht +test_several_threads_simple_group_by_with_limit_and_cube_single_level_ht +test_several_threads_simple_group_by_with_limit_and_cube_two_level_ht + +check_expectations diff --git a/tests/queries/0_stateless/02151_hash_table_sizes_stats.testcases b/tests/queries/0_stateless/02151_hash_table_sizes_stats.testcases new file mode 100644 index 00000000000..7612108a700 --- /dev/null +++ b/tests/queries/0_stateless/02151_hash_table_sizes_stats.testcases @@ 
-0,0 +1,183 @@ +test_one_thread_simple_group_by() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint -- + SELECT number + FROM $table_name + GROUP BY number + SETTINGS max_threads = 1 + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $expected_size_hint $expected_size_hint") +} + +test_one_thread_simple_group_by_with_limit() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + LIMIT 5 + SETTINGS max_threads = 1 + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $expected_size_hint $expected_size_hint") +} + +test_one_thread_simple_group_by_with_join_and_subquery() { + expected_size_hint=$((table_size + table_size / 2)) + prepare_table + + query=" + -- expected two size_hints for different keys: for the inner ($table_size) and the outer aggregation ($((table_size / 2))) + SELECT number + FROM $table_name AS t1 + JOIN + ( + SELECT number + FROM $table_name AS t2 + GROUP BY number + LIMIT $((table_size / 2)) + ) AS t3 USING(number) + GROUP BY number + SETTINGS max_threads = 1, + distributed_product_mode = 'local' + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $expected_size_hint $expected_size_hint") +} + +test_several_threads_simple_group_by_with_limit_single_level_ht() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + LIMIT 5 + SETTINGS max_threads = $max_threads, + group_by_two_level_threshold = $((expected_size_hint + 1)), + group_by_two_level_threshold_bytes = $((table_size * 1000)) + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $((expected_size_hint / max_threads)) $((expected_size_hint * max_threads))") +} + +test_several_threads_simple_group_by_with_limit_two_level_ht() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + LIMIT 5 + SETTINGS max_threads = $max_threads, + group_by_two_level_threshold = $expected_size_hint, + group_by_two_level_threshold_bytes = $((table_size * 1000)) + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $((expected_size_hint / max_threads)) $((expected_size_hint * max_threads)) check_two_level") +} + +test_several_threads_simple_group_by_with_limit_and_rollup_single_level_ht() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + WITH ROLLUP + LIMIT 5 + SETTINGS max_threads = $max_threads, + group_by_two_level_threshold = $((expected_size_hint + 1)), + group_by_two_level_threshold_bytes = $((table_size * 1000)) + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $((expected_size_hint / max_threads)) $((expected_size_hint * max_threads))") +} + +test_several_threads_simple_group_by_with_limit_and_rollup_two_level_ht() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + WITH ROLLUP + LIMIT 5 + SETTINGS max_threads = $max_threads, + group_by_two_level_threshold = $expected_size_hint, + 
group_by_two_level_threshold_bytes = $((table_size * 1000)) + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $((expected_size_hint / max_threads)) $((expected_size_hint * max_threads)) check_two_level") +} + +test_several_threads_simple_group_by_with_limit_and_cube_single_level_ht() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + WITH CUBE + LIMIT 5 + SETTINGS max_threads = $max_threads, + group_by_two_level_threshold = $((expected_size_hint + 1)), + group_by_two_level_threshold_bytes = $((table_size * 1000)) + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $((expected_size_hint / max_threads)) $((expected_size_hint * max_threads))") +} + +test_several_threads_simple_group_by_with_limit_and_cube_two_level_ht() { + expected_size_hint=$table_size + prepare_table + + query=" + -- size_hint = $expected_size_hint despite the presence of limit -- + SELECT number + FROM $table_name + GROUP BY number + WITH CUBE + LIMIT 5 + SETTINGS max_threads = $max_threads, + group_by_two_level_threshold = $expected_size_hint, + group_by_two_level_threshold_bytes = $((table_size * 1000)) + FORMAT Null;" + + run_query + run_query + expected_results+=("$query_id $((expected_size_hint / max_threads)) $((expected_size_hint * max_threads)) check_two_level") +} diff --git a/tests/queries/0_stateless/02151_hash_table_sizes_stats_distributed.reference b/tests/queries/0_stateless/02151_hash_table_sizes_stats_distributed.reference new file mode 100644 index 00000000000..0d10114f4ff --- /dev/null +++ b/tests/queries/0_stateless/02151_hash_table_sizes_stats_distributed.reference @@ -0,0 +1,33 @@ +1 +1 +-- +1 +1 +-- +1 +1 +-- +1 +1 +-- +1 +1 +1 +1 +-- +1 +1 +-- +1 +1 +1 +1 +-- +1 +1 +-- +1 +1 +1 +1 +-- diff --git a/tests/queries/0_stateless/02151_hash_table_sizes_stats_distributed.sh b/tests/queries/0_stateless/02151_hash_table_sizes_stats_distributed.sh new file mode 100755 index 00000000000..056c176c1ff --- /dev/null +++ b/tests/queries/0_stateless/02151_hash_table_sizes_stats_distributed.sh @@ -0,0 +1,103 @@ +#!/usr/bin/env bash +# Tags: long, distributed, no-debug, no-tsan, no-msan, no-ubsan, no-asan, no-random-settings, no-random-merge-tree-settings + +# These tests don't use `current_database = currentDatabase()` condition, because database name isn't propagated during remote queries. + +# shellcheck disable=SC2154,SC2162 + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + + +# tests rely on that all the rows are unique and max_threads divides table_size +table_size=1000005 +max_threads=5 + + +prepare_table() { + table_name="t_hash_table_sizes_stats_$RANDOM$RANDOM" + $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS $table_name;" + if [ -z "$1" ]; then + $CLICKHOUSE_CLIENT -q "CREATE TABLE $table_name(number UInt64) Engine=MergeTree() ORDER BY tuple() SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';" + else + $CLICKHOUSE_CLIENT -q "CREATE TABLE $table_name(number UInt64) Engine=MergeTree() ORDER BY $1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';" + fi + $CLICKHOUSE_CLIENT -q "SYSTEM STOP MERGES $table_name;" + for ((i = 1; i <= max_threads; i++)); do + cnt=$((table_size / max_threads)) + from=$(((i - 1) * cnt)) + $CLICKHOUSE_CLIENT -q "INSERT INTO $table_name SELECT * FROM numbers($from, $cnt);" + done + $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS ${table_name}_d;" + $CLICKHOUSE_CLIENT -q "CREATE TABLE ${table_name}_d AS $table_name ENGINE = Distributed(test_cluster_two_shards, currentDatabase(), $table_name);" + table_name="${table_name}_d" +} + +prepare_table_with_sorting_key() { + prepare_table "$1" +} + +run_query() { + query_id="${CLICKHOUSE_DATABASE}_hash_table_sizes_stats_$RANDOM$RANDOM" + $CLICKHOUSE_CLIENT --query_id="$query_id" --multiquery -q " + SET max_block_size = $((table_size / 10)); + SET merge_tree_min_rows_for_concurrent_read = 1; + SET max_untracked_memory = 0; + SET prefer_localhost_replica = 1; + $query" +} + +check_preallocated_elements() { + # rows may be distributed in any way including "everything goes to the one particular thread" + $CLICKHOUSE_CLIENT --param_query_id="$1" -q " + SELECT COUNT(*) + FROM system.query_log + WHERE event_date >= yesterday() AND (query_id = {query_id:String} OR initial_query_id = {query_id:String}) + AND ProfileEvents['AggregationPreallocatedElementsInHashTables'] BETWEEN $2 AND $3 + GROUP BY query_id" +} + +check_convertion_to_two_level() { + # rows may be distributed in any way including "everything goes to the one particular thread" + $CLICKHOUSE_CLIENT --param_query_id="$1" -q " + SELECT SUM(ProfileEvents['AggregationHashTablesInitializedAsTwoLevel']) BETWEEN 1 AND $max_threads + FROM system.query_log + WHERE event_date >= yesterday() AND (query_id = {query_id:String} OR initial_query_id = {query_id:String}) + GROUP BY query_id" +} + +print_border() { + echo "--" +} + +# each test case appends to this array +expected_results=() + +check_expectations() { + $CLICKHOUSE_CLIENT -q "SYSTEM FLUSH LOGS" + + for i in "${!expected_results[@]}"; do + read -a args <<< "${expected_results[$i]}" + if [ ${#args[@]} -eq 4 ]; then + check_convertion_to_two_level "${args[0]}" + fi + check_preallocated_elements "${args[@]}" + print_border + done +} + +# shellcheck source=./02151_hash_table_sizes_stats.testcases +source "$CURDIR"/02151_hash_table_sizes_stats.testcases + +test_one_thread_simple_group_by +test_one_thread_simple_group_by_with_limit +test_one_thread_simple_group_by_with_join_and_subquery +test_several_threads_simple_group_by_with_limit_single_level_ht +test_several_threads_simple_group_by_with_limit_two_level_ht +test_several_threads_simple_group_by_with_limit_and_rollup_single_level_ht +test_several_threads_simple_group_by_with_limit_and_rollup_two_level_ht +test_several_threads_simple_group_by_with_limit_and_cube_single_level_ht +test_several_threads_simple_group_by_with_limit_and_cube_two_level_ht + +check_expectations diff --git 
diff --git a/tests/queries/0_stateless/02232_dist_insert_send_logs_level_hung.sh b/tests/queries/0_stateless/02232_dist_insert_send_logs_level_hung.sh
index 734cef06214..618dc83c223 100755
--- a/tests/queries/0_stateless/02232_dist_insert_send_logs_level_hung.sh
+++ b/tests/queries/0_stateless/02232_dist_insert_send_logs_level_hung.sh
@@ -1,7 +1,8 @@
 #!/usr/bin/env bash
-# Tags: long, no-parallel
-# Tag: no-parallel - to heavy
-# Tag: long - to heavy
+# Tags: long, no-parallel, disabled
+# Tag: no-parallel - too heavy
+# Tag: long - too heavy
+# Tag: disabled - always takes 4+ minutes even in serial mode, which is too much to run in CI every time
 
 # This is the regression test when remote peer send some logs for INSERT,
 # it is easy to archive using materialized views, with small block size.
@@ -49,10 +50,10 @@ insert_client_opts=(
 timeout 250s $CLICKHOUSE_CLIENT "${client_opts[@]}" "${insert_client_opts[@]}" -q "insert into function remote('127.2', currentDatabase(), in_02232) select * from numbers(1e6)"
 
 # Kill underlying query of remote() to make KILL faster
-# This test is reproducing very interesting bahaviour.
+# This test is reproducing very interesting behaviour.
 # The block size is 1, so the secondary query creates InterpreterSelectQuery for each row due to pushing to the MV.
 # It works extremely slow, and the initial query produces new blocks and writes them to the socket much faster
-# then the secondary query can read and process them. Therefore, it fills network buffers in the kernel.
+# than the secondary query can read and process them. Therefore, it fills network buffers in the kernel.
 # Once a buffer in the kernel is full, send(...) blocks until the secondary query will finish processing data
 # that it already has in ReadBufferFromPocoSocket and call recv.
 # Or until the kernel will decide to resize the buffer (seems like it has non-trivial rules for that).
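The backpressure scenario that comment describes is easy to sketch at a small scale: a materialized view attached to the target table forces the receiving server to run the view pipeline for every incoming block, and with `max_block_size = 1` each row becomes its own block. A minimal sketch only; the engines and view body here are assumptions, not the test's exact DDL:

```sql
-- Hypothetical reproduction skeleton (names follow the test; engines are assumed).
CREATE TABLE in_02232 (key Int32) ENGINE = Null;
CREATE TABLE out_02232 (key Int32) ENGINE = Null;
CREATE MATERIALIZED VIEW mv_02232 TO out_02232 AS SELECT key FROM in_02232;
-- A remote INSERT into in_02232 with max_block_size = 1 then triggers the
-- per-row processing that fills the kernel's socket buffers.
```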
diff --git a/tests/queries/0_stateless/02310_uuid_v7.reference b/tests/queries/0_stateless/02310_uuid_v7.reference
index ca4150bded2..1fa98ca522a 100644
--- a/tests/queries/0_stateless/02310_uuid_v7.reference
+++ b/tests/queries/0_stateless/02310_uuid_v7.reference
@@ -1,18 +1,3 @@
--- generateUUIDv7 --
-UUID
-7
-2
-0
-0
-1
--- generateUUIDv7ThreadMonotonic --
-UUID
-7
-2
-0
-0
-1
--- generateUUIDv7NonMonotonic --
 UUID
 7
 2
diff --git a/tests/queries/0_stateless/02310_uuid_v7.sql b/tests/queries/0_stateless/02310_uuid_v7.sql
index 0f12de07d20..e1aa3189d93 100644
--- a/tests/queries/0_stateless/02310_uuid_v7.sql
+++ b/tests/queries/0_stateless/02310_uuid_v7.sql
@@ -1,23 +1,8 @@
-SELECT '-- generateUUIDv7 --';
+-- Tests function generateUUIDv7
+
 SELECT toTypeName(generateUUIDv7());
 SELECT substring(hex(generateUUIDv7()), 13, 1); -- check version bits
 SELECT bitAnd(bitShiftRight(toUInt128(generateUUIDv7()), 62), 3); -- check variant bits
 SELECT generateUUIDv7(1) = generateUUIDv7(2);
 SELECT generateUUIDv7() = generateUUIDv7(1);
 SELECT generateUUIDv7(1) = generateUUIDv7(1);
-
-SELECT '-- generateUUIDv7ThreadMonotonic --';
-SELECT toTypeName(generateUUIDv7ThreadMonotonic());
-SELECT substring(hex(generateUUIDv7ThreadMonotonic()), 13, 1); -- check version bits
-SELECT bitAnd(bitShiftRight(toUInt128(generateUUIDv7ThreadMonotonic()), 62), 3); -- check variant bits
-SELECT generateUUIDv7ThreadMonotonic(1) = generateUUIDv7ThreadMonotonic(2);
-SELECT generateUUIDv7ThreadMonotonic() = generateUUIDv7ThreadMonotonic(1);
-SELECT generateUUIDv7ThreadMonotonic(1) = generateUUIDv7ThreadMonotonic(1);
-
-SELECT '-- generateUUIDv7NonMonotonic --';
-SELECT toTypeName(generateUUIDv7NonMonotonic());
-SELECT substring(hex(generateUUIDv7NonMonotonic()), 13, 1); -- check version bits
-SELECT bitAnd(bitShiftRight(toUInt128(generateUUIDv7NonMonotonic()), 62), 3); -- check variant bits
-SELECT generateUUIDv7NonMonotonic(1) = generateUUIDv7NonMonotonic(2);
-SELECT generateUUIDv7NonMonotonic() = generateUUIDv7NonMonotonic(1);
-SELECT generateUUIDv7NonMonotonic(1) = generateUUIDv7NonMonotonic(1);
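The next several hunks mechanically rename `allow_deprecated_functions` to `allow_deprecated_error_prone_window_functions`. When migrating real queries rather than flipping the compatibility setting, the deprecated `neighbor` can usually be expressed as a window function; a hedged sketch of one idiomatic rewrite of `neighbor(number, 2)`:

```sql
-- any(...) over a frame pinned to "2 rows ahead" reads the same value
-- that neighbor(number, 2) used to return.
SELECT
    number,
    any(number) OVER (ORDER BY number ASC ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING) AS two_ahead
FROM numbers(10);
```

Unlike `neighbor`, the window-function form has well-defined semantics regardless of block boundaries, which is the reason for the deprecation.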
diff --git a/tests/queries/0_stateless/02496_remove_redundant_sorting.reference b/tests/queries/0_stateless/02496_remove_redundant_sorting.reference
index dbb8ad02293..77ef213b36d 100644
--- a/tests/queries/0_stateless/02496_remove_redundant_sorting.reference
+++ b/tests/queries/0_stateless/02496_remove_redundant_sorting.reference
@@ -478,7 +478,7 @@ FROM
     ORDER BY number DESC
 )
 ORDER BY number ASC
-SETTINGS allow_deprecated_functions = 1
+SETTINGS allow_deprecated_error_prone_window_functions = 1
 -- explain
 Expression (Projection)
   Sorting (Sorting for ORDER BY)
diff --git a/tests/queries/0_stateless/02496_remove_redundant_sorting.sh b/tests/queries/0_stateless/02496_remove_redundant_sorting.sh
index 31d2936628b..661b32fce72 100755
--- a/tests/queries/0_stateless/02496_remove_redundant_sorting.sh
+++ b/tests/queries/0_stateless/02496_remove_redundant_sorting.sh
@@ -315,7 +315,7 @@ FROM
     ORDER BY number DESC
 )
 ORDER BY number ASC
-SETTINGS allow_deprecated_functions = 1"
+SETTINGS allow_deprecated_error_prone_window_functions = 1"
 
 run_query "$query"
 
 echo "-- non-stateful function does _not_ prevent removing inner ORDER BY"
diff --git a/tests/queries/0_stateless/02496_remove_redundant_sorting_analyzer.reference b/tests/queries/0_stateless/02496_remove_redundant_sorting_analyzer.reference
index d74ef70a23f..b6a2e3182df 100644
--- a/tests/queries/0_stateless/02496_remove_redundant_sorting_analyzer.reference
+++ b/tests/queries/0_stateless/02496_remove_redundant_sorting_analyzer.reference
@@ -477,7 +477,7 @@ FROM
     ORDER BY number DESC
 )
 ORDER BY number ASC
-SETTINGS allow_deprecated_functions = 1
+SETTINGS allow_deprecated_error_prone_window_functions = 1
 -- explain
 Expression (Project names)
   Sorting (Sorting for ORDER BY)
diff --git a/tests/queries/0_stateless/02788_fix_logical_error_in_sorting.sql b/tests/queries/0_stateless/02788_fix_logical_error_in_sorting.sql
index 6964d8cf47d..97741e6fcc9 100644
--- a/tests/queries/0_stateless/02788_fix_logical_error_in_sorting.sql
+++ b/tests/queries/0_stateless/02788_fix_logical_error_in_sorting.sql
@@ -1,4 +1,4 @@
-SET allow_deprecated_functions = 1;
+SET allow_deprecated_error_prone_window_functions = 1;
 
 DROP TABLE IF EXISTS session_events;
 DROP TABLE IF EXISTS event_types;
diff --git a/tests/queries/0_stateless/02842_largestTriangleThreeBuckets_aggregate_function.sql b/tests/queries/0_stateless/02842_largestTriangleThreeBuckets_aggregate_function.sql
index 254875ba041..d5ef564469e 100644
--- a/tests/queries/0_stateless/02842_largestTriangleThreeBuckets_aggregate_function.sql
+++ b/tests/queries/0_stateless/02842_largestTriangleThreeBuckets_aggregate_function.sql
@@ -1,4 +1,4 @@
-SET allow_deprecated_functions = 1;
+SET allow_deprecated_error_prone_window_functions = 1;
 
 drop table if exists largestTriangleThreeBucketsTestFloat64Float64;
 CREATE TABLE largestTriangleThreeBucketsTestFloat64Float64
@@ -55,10 +55,10 @@ CREATE TABLE largestTriangleTreeBucketsBucketSizeTest
 
 INSERT INTO largestTriangleTreeBucketsBucketSizeTest (x, y) SELECT (number + 1) AS x, (x % 1000) AS y FROM numbers(9999);
 
-SELECT 
-    arrayJoin(lttb(1000)(x, y)) AS point, 
-    tupleElement(point, 1) AS point_x, 
-    point_x - neighbor(point_x, -1) AS point_x_diff_with_previous_row 
+SELECT
+    arrayJoin(lttb(1000)(x, y)) AS point,
+    tupleElement(point, 1) AS point_x,
+    point_x - neighbor(point_x, -1) AS point_x_diff_with_previous_row
 FROM largestTriangleTreeBucketsBucketSizeTest LIMIT 990, 10;
 
 DROP TABLE largestTriangleTreeBucketsBucketSizeTest;
diff --git a/tests/queries/0_stateless/02901_predicate_pushdown_cte_stateful.sql b/tests/queries/0_stateless/02901_predicate_pushdown_cte_stateful.sql
index a208519b655..d65b0da42a4 100644
--- a/tests/queries/0_stateless/02901_predicate_pushdown_cte_stateful.sql
+++ b/tests/queries/0_stateless/02901_predicate_pushdown_cte_stateful.sql
@@ -1,4 +1,4 @@
-SET allow_deprecated_functions = 1;
+SET allow_deprecated_error_prone_window_functions = 1;
 
 CREATE TABLE t
 (
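The rename also covers `runningAccumulate`; its window-function replacement is a running aggregate over an ordered frame, which needs no compatibility setting. A small sketch of the equivalence:

```sql
-- Deprecated style (requires allow_deprecated_error_prone_window_functions = 1):
--   SELECT k, runningAccumulate(s)
--   FROM (SELECT number AS k, sumState(number) AS s FROM numbers(5) GROUP BY k ORDER BY k);
-- Window-function style, same running totals:
SELECT number AS k, sum(number) OVER (ORDER BY number) AS running_sum
FROM numbers(5);
```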
diff --git a/tests/queries/0_stateless/02972_parallel_replicas_cte.reference b/tests/queries/0_stateless/02972_parallel_replicas_cte.reference
index bbb5a960463..d3a06db1745 100644
--- a/tests/queries/0_stateless/02972_parallel_replicas_cte.reference
+++ b/tests/queries/0_stateless/02972_parallel_replicas_cte.reference
@@ -1,6 +1,6 @@
-990000
-990000
+900
+900
 10
-990000
+900
 1
-1000000
+1000
diff --git a/tests/queries/0_stateless/02972_parallel_replicas_cte.sql b/tests/queries/0_stateless/02972_parallel_replicas_cte.sql
index c9ab83ff9ad..083b0ecc5c9 100644
--- a/tests/queries/0_stateless/02972_parallel_replicas_cte.sql
+++ b/tests/queries/0_stateless/02972_parallel_replicas_cte.sql
@@ -3,25 +3,25 @@ DROP TABLE IF EXISTS pr_2;
 DROP TABLE IF EXISTS numbers_1e6;
 
 CREATE TABLE pr_1 (`a` UInt32) ENGINE = MergeTree ORDER BY a PARTITION BY a % 10 AS
-SELECT 10 * intDiv(number, 10) + 1 FROM numbers(1_000_000);
+SELECT 10 * intDiv(number, 10) + 1 FROM numbers(1_000);
 
 CREATE TABLE pr_2 (`a` UInt32) ENGINE = MergeTree ORDER BY a AS
-SELECT * FROM numbers(1_000_000);
+SELECT * FROM numbers(1_000);
 
-WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 10000)
+WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 100)
 SELECT count() FROM pr_2 INNER JOIN filtered_groups ON pr_2.a = filtered_groups.a;
 
-WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 10000)
+WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 100)
 SELECT count() FROM pr_2 INNER JOIN filtered_groups ON pr_2.a = filtered_groups.a
 SETTINGS allow_experimental_parallel_reading_from_replicas = 1, parallel_replicas_for_non_replicated_merge_tree = 1, cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost', max_parallel_replicas = 3;
 
 -- Testing that it is disabled for allow_experimental_analyzer=0. With analyzer it will be supported (with correct result)
-WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 10000)
+WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 100)
 SELECT count() FROM pr_2 INNER JOIN filtered_groups ON pr_2.a = filtered_groups.a
 SETTINGS allow_experimental_analyzer = 0, allow_experimental_parallel_reading_from_replicas = 2, parallel_replicas_for_non_replicated_merge_tree = 1, cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost', max_parallel_replicas = 3; -- { serverError SUPPORT_IS_DISABLED }
 
 -- Disabled for any value of allow_experimental_parallel_reading_from_replicas != 1, not just 2
-WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 10000)
+WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 100)
 SELECT count() FROM pr_2 INNER JOIN filtered_groups ON pr_2.a = filtered_groups.a
 SETTINGS allow_experimental_analyzer = 0, allow_experimental_parallel_reading_from_replicas = 512, parallel_replicas_for_non_replicated_merge_tree = 1, cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost', max_parallel_replicas = 3; -- { serverError SUPPORT_IS_DISABLED }
 
@@ -33,7 +33,7 @@ SETTINGS allow_experimental_parallel_reading_from_replicas = 1, parallel_replica
 SELECT *
 FROM
 (
-    WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 10000)
+    WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 100)
     SELECT count() FROM pr_2 INNER JOIN filtered_groups ON pr_2.a = filtered_groups.a
 )
 SETTINGS allow_experimental_parallel_reading_from_replicas = 1, parallel_replicas_for_non_replicated_merge_tree = 1, cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost', max_parallel_replicas = 3;
@@ -45,31 +45,31 @@ FROM
     SELECT c + 1
     FROM
     (
-        WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 10000)
+        WITH filtered_groups AS (SELECT a FROM pr_1 WHERE a >= 100)
         SELECT count() as c FROM pr_2 INNER JOIN filtered_groups ON pr_2.a = filtered_groups.a
     )
 )
 SETTINGS allow_experimental_parallel_reading_from_replicas = 1, parallel_replicas_for_non_replicated_merge_tree = 1, cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost', max_parallel_replicas = 3;
 
-CREATE TABLE numbers_1e6
+CREATE TABLE numbers_1e3
 (
     `n` UInt64
 )
 ENGINE = MergeTree
 ORDER BY n
-AS SELECT * FROM numbers(1_000_000);
+AS SELECT * FROM numbers(1_000);
 
 -- Same but nested CTE's
 WITH
     cte1 AS
     (
        SELECT n
-        FROM numbers_1e6
+        FROM numbers_1e3
     ),
     cte2 AS
     (
        SELECT n
-        FROM numbers_1e6
+        FROM numbers_1e3
        WHERE n IN (cte1)
     )
 SELECT count()
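For readers unfamiliar with the feature under test: parallel replicas let a single query read a non-replicated MergeTree table through several replicas of a cluster. A minimal sketch against a scratch table, reusing the settings and cluster name from the test above (the cluster only exists in the test environment):

```sql
CREATE TABLE pr_demo (a UInt32) ENGINE = MergeTree ORDER BY a
AS SELECT number FROM numbers(1000);

SELECT count()
FROM pr_demo
SETTINGS allow_experimental_parallel_reading_from_replicas = 1,
         parallel_replicas_for_non_replicated_merge_tree = 1,
         cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost',
         max_parallel_replicas = 3;
```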
diff --git a/tests/queries/0_stateless/02992_analyzer_group_by_const.reference b/tests/queries/0_stateless/02992_analyzer_group_by_const.reference
index ff61ab0a515..ea9492581c9 100644
--- a/tests/queries/0_stateless/02992_analyzer_group_by_const.reference
+++ b/tests/queries/0_stateless/02992_analyzer_group_by_const.reference
@@ -4,3 +4,5 @@ a|x
 String, Const(size = 1, String(size = 1))
 String, Const(size = 1, String(size = 1))
 5128475243952187658
+0 0
+0 0
diff --git a/tests/queries/0_stateless/02992_analyzer_group_by_const.sql b/tests/queries/0_stateless/02992_analyzer_group_by_const.sql
index f30a49887c7..ede6e0deed9 100644
--- a/tests/queries/0_stateless/02992_analyzer_group_by_const.sql
+++ b/tests/queries/0_stateless/02992_analyzer_group_by_const.sql
@@ -10,3 +10,23 @@ select dumpColumnStructure('x') GROUP BY 'x';
 select dumpColumnStructure('x');
 -- from https://github.com/ClickHouse/ClickHouse/pull/60046
 SELECT cityHash64('limit', _CAST(materialize('World'), 'LowCardinality(String)')) FROM system.one GROUP BY GROUPING SETS ('limit');
+
+WITH (
+        SELECT dummy AS x
+        FROM system.one
+    ) AS y
+SELECT
+    y,
+    min(dummy)
+FROM remote('127.0.0.{1,2}', system.one)
+GROUP BY y;
+
+WITH (
+        SELECT dummy AS x
+        FROM system.one
+    ) AS y
+SELECT
+    y,
+    min(dummy)
+FROM remote('127.0.0.{2,3}', system.one)
+GROUP BY y;
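The `0 0` rows added to the reference file follow from scalar-subquery semantics: a scalar in `WITH (...) AS y` acts as a constant for the whole query, so grouping by it collapses everything, even across shards, into a single row. A small illustration, with the expected output noted as a comment (assumes a server where the fix above applies):

```sql
-- One row is expected: c = 0 and count() = 2 (system.one contributes one row per shard).
WITH (SELECT 0) AS c
SELECT c, count()
FROM remote('127.0.0.{1,2}', system.one)
GROUP BY c;
```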
diff --git a/tests/queries/0_stateless/03008_local_plain_rewritable.sh b/tests/queries/0_stateless/03008_local_plain_rewritable.sh
index 1761c7d79b1..5fac964a219 100755
--- a/tests/queries/0_stateless/03008_local_plain_rewritable.sh
+++ b/tests/queries/0_stateless/03008_local_plain_rewritable.sh
@@ -6,13 +6,13 @@ CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
 # shellcheck source=../shell_config.sh
 . "$CUR_DIR"/../shell_config.sh
 
-${CLICKHOUSE_CLIENT} --query "drop table if exists test_mt sync"
+${CLICKHOUSE_CLIENT} --query "drop table if exists 03008_test_local_mt sync"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-create table test_mt (a Int32, b Int64, c Int64)
+create table 03008_test_local_mt (a Int32, b Int64, c Int64)
 engine = MergeTree() partition by intDiv(a, 1000) order by tuple(a, b)
 settings disk = disk(
-    name = disk_s3_plain,
+    name = 03008_local_plain_rewritable,
     type = object_storage,
     object_storage_type = local,
     metadata_type = plain_rewritable,
@@ -20,34 +20,36 @@ settings disk = disk(
 "
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-insert into test_mt (*) values (1, 2, 0), (2, 2, 2), (3, 1, 9), (4, 7, 7), (5, 10, 2), (6, 12, 5);
-insert into test_mt (*) select number, number, number from numbers_mt(10000);
+insert into 03008_test_local_mt (*) values (1, 2, 0), (2, 2, 2), (3, 1, 9), (4, 7, 7), (5, 10, 2), (6, 12, 5);
+insert into 03008_test_local_mt (*) select number, number, number from numbers_mt(10000);
 "
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-select count(*) from test_mt;
-select (*) from test_mt order by tuple(a, b) limit 10;
+select count(*) from 03008_test_local_mt;
+select (*) from 03008_test_local_mt order by tuple(a, b) limit 10;
 "
 
-${CLICKHOUSE_CLIENT} --query "optimize table test_mt final"
+${CLICKHOUSE_CLIENT} --query "optimize table 03008_test_local_mt final;"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-alter table test_mt modify setting disk = 'disk_s3_plain', old_parts_lifetime = 3600;
-select engine_full from system.tables WHERE database = currentDatabase() AND name = 'test_mt';
+alter table 03008_test_local_mt modify setting disk = '03008_local_plain_rewritable', old_parts_lifetime = 3600;
+select engine_full from system.tables WHERE database = currentDatabase() AND name = '03008_test_local_mt';
 " | grep -c "old_parts_lifetime = 3600"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-select count(*) from test_mt;
-select (*) from test_mt order by tuple(a, b) limit 10;
+select count(*) from 03008_test_local_mt;
+select (*) from 03008_test_local_mt order by tuple(a, b) limit 10;
 "
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-alter table test_mt update c = 0 where a % 2 = 1;
-alter table test_mt add column d Int64 after c;
-alter table test_mt drop column c;
+alter table 03008_test_local_mt update c = 0 where a % 2 = 1;
+alter table 03008_test_local_mt add column d Int64 after c;
+alter table 03008_test_local_mt drop column c;
 " 2>&1 | grep -Fq "SUPPORT_IS_DISABLED"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-truncate table test_mt;
-select count(*) from test_mt;
+truncate table 03008_test_local_mt;
+select count(*) from 03008_test_local_mt;
 "
+
+${CLICKHOUSE_CLIENT} --query "drop table 03008_test_local_mt sync"
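One convenient way to confirm that a disk declared inline with `settings disk = disk(...)` was actually registered is to query `system.disks` after the table is created. A hedged sketch; the exact column set of `system.disks` varies between versions, so treat the selected columns as an assumption:

```sql
-- Expect one row for the ad-hoc disk created by the test above.
SELECT name, path, type
FROM system.disks
WHERE name = '03008_local_plain_rewritable';
```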
diff --git a/tests/queries/0_stateless/03008_s3_plain_rewritable.sh b/tests/queries/0_stateless/03008_s3_plain_rewritable.sh
index d72fc47f689..4d5989f6f12 100755
--- a/tests/queries/0_stateless/03008_s3_plain_rewritable.sh
+++ b/tests/queries/0_stateless/03008_s3_plain_rewritable.sh
@@ -7,47 +7,49 @@ CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
 # shellcheck source=../shell_config.sh
 . "$CUR_DIR"/../shell_config.sh
 
-${CLICKHOUSE_CLIENT} --query "drop table if exists test_mt"
+${CLICKHOUSE_CLIENT} --query "drop table if exists test_s3_mt"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-create table test_mt (a Int32, b Int64, c Int64) engine = MergeTree() partition by intDiv(a, 1000) order by tuple(a, b)
+create table test_s3_mt (a Int32, b Int64, c Int64) engine = MergeTree() partition by intDiv(a, 1000) order by tuple(a, b)
 settings disk = disk(
-    name = s3_plain_rewritable,
+    name = 03008_s3_plain_rewritable,
     type = s3_plain_rewritable,
-    endpoint = 'http://localhost:11111/test/test_mt/',
+    endpoint = 'http://localhost:11111/test/03008_test_s3_mt/',
     access_key_id = clickhouse,
     secret_access_key = clickhouse);
 "
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-insert into test_mt (*) values (1, 2, 0), (2, 2, 2), (3, 1, 9), (4, 7, 7), (5, 10, 2), (6, 12, 5);
-insert into test_mt (*) select number, number, number from numbers_mt(10000);
-select count(*) from test_mt;
-select (*) from test_mt order by tuple(a, b) limit 10;
+insert into test_s3_mt (*) values (1, 2, 0), (2, 2, 2), (3, 1, 9), (4, 7, 7), (5, 10, 2), (6, 12, 5);
+insert into test_s3_mt (*) select number, number, number from numbers_mt(10000);
+select count(*) from test_s3_mt;
+select (*) from test_s3_mt order by tuple(a, b) limit 10;
 "
 
-${CLICKHOUSE_CLIENT} --query "optimize table test_mt final"
+${CLICKHOUSE_CLIENT} --query "optimize table test_s3_mt final"
 
 ${CLICKHOUSE_CLIENT} -m --query "
-alter table test_mt add projection test_mt_projection (select * order by b)" 2>&1 | grep -Fq "SUPPORT_IS_DISABLED"
+alter table test_s3_mt add projection test_s3_mt_projection (select * order by b)" 2>&1 | grep -Fq "SUPPORT_IS_DISABLED"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-alter table test_mt update c = 0 where a % 2 = 1;
-alter table test_mt add column d Int64 after c;
-alter table test_mt drop column c;
+alter table test_s3_mt update c = 0 where a % 2 = 1;
+alter table test_s3_mt add column d Int64 after c;
+alter table test_s3_mt drop column c;
 " 2>&1 | grep -Fq "SUPPORT_IS_DISABLED"
 
 ${CLICKHOUSE_CLIENT} -nm --query "
-detach table test_mt;
-attach table test_mt;
+detach table test_s3_mt;
+attach table test_s3_mt;
 "
 
-${CLICKHOUSE_CLIENT} --query "drop table if exists test_mt_dst"
+${CLICKHOUSE_CLIENT} --query "drop table if exists test_s3_mt_dst"
 
 ${CLICKHOUSE_CLIENT} -m --query "
-create table test_mt_dst (a Int32, b Int64, c Int64) engine = MergeTree() partition by intDiv(a, 1000) order by tuple(a, b)
-settings disk = 's3_plain_rewritable'
+create table test_s3_mt_dst (a Int32, b Int64, c Int64) engine = MergeTree() partition by intDiv(a, 1000) order by tuple(a, b)
+settings disk = '03008_s3_plain_rewritable'
 "
 
 ${CLICKHOUSE_CLIENT} -m --query "
-alter table test_mt move partition 0 to table test_mt_dst" 2>&1 | grep -Fq "SUPPORT_IS_DISABLED"
+alter table test_s3_mt move partition 0 to table test_s3_mt_dst" 2>&1 | grep -Fq "SUPPORT_IS_DISABLED"
+
+${CLICKHOUSE_CLIENT} --query "drop table test_s3_mt sync"
diff --git a/tests/queries/0_stateless/03131_deprecated_functions.sql b/tests/queries/0_stateless/03131_deprecated_functions.sql
index 35cfe648c00..9247db15fd3 100644
--- a/tests/queries/0_stateless/03131_deprecated_functions.sql
+++ b/tests/queries/0_stateless/03131_deprecated_functions.sql
@@ -4,7 +4,7 @@ SELECT runningDifference(number) FROM system.numbers LIMIT 10; -- { serverError
 
 SELECT k, runningAccumulate(sum_k) AS res FROM (SELECT number as k, sumState(k) AS sum_k FROM numbers(10) GROUP BY k ORDER BY k); -- { serverError 721 }
 
-SET allow_deprecated_functions=1;
+SET allow_deprecated_error_prone_window_functions=1;
 
 SELECT number, neighbor(number, 2) FROM system.numbers LIMIT 10 FORMAT Null;
diff --git a/tests/queries/0_stateless/03147_table_function_loop.reference b/tests/queries/0_stateless/03147_table_function_loop.reference
new file mode 100644
index 00000000000..46a2310b65f
--- /dev/null
+++ b/tests/queries/0_stateless/03147_table_function_loop.reference
@@ -0,0 +1,65 @@
+0
+1
+2
+0
+1
+2
+0
+1
+2
+0
+0
+1
+2
+0
+1
+2
+0
+1
+2
+0
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+0
+1
+2
+3
+4
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+0
+1
+2
+3
+4
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+0
+1
+2
+3
+4
diff --git a/tests/queries/0_stateless/03147_table_function_loop.sql b/tests/queries/0_stateless/03147_table_function_loop.sql
new file mode 100644
index 00000000000..af48e4b11e3
--- /dev/null
+++ b/tests/queries/0_stateless/03147_table_function_loop.sql
@@ -0,0 +1,14 @@
+-- Tags: no-parallel
+
+SELECT * FROM loop(numbers(3)) LIMIT 10;
+SELECT * FROM loop (numbers(3)) LIMIT 10 settings max_block_size = 1;
+
+DROP DATABASE IF EXISTS 03147_db;
+CREATE DATABASE IF NOT EXISTS 03147_db;
+CREATE TABLE 03147_db.t (n Int8) ENGINE=MergeTree ORDER BY n;
+INSERT INTO 03147_db.t SELECT * FROM numbers(10);
+USE 03147_db;
+
+SELECT * FROM loop(03147_db.t) LIMIT 15;
+SELECT * FROM loop(t) LIMIT 15;
+SELECT * FROM loop(03147_db, t) LIMIT 15;
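The new `loop` table function repeats its inner source until an outer LIMIT stops it, which is exactly what the reference output above shows: `numbers(3)` cycled and then cut off mid-cycle. A small usage sketch with the expected result as a comment:

```sql
-- Cycle 0,1,2 and keep the first seven values: 0 1 2 0 1 2 0.
SELECT * FROM loop(numbers(3)) LIMIT 7;
```

Note that without a LIMIT (or another terminating clause) such a query never finishes, which is presumably why the test always bounds it.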
diff --git a/tests/queries/0_stateless/03161_ipv4_ipv6_equality.reference b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.reference
new file mode 100644
index 00000000000..2a4cb2e658f
--- /dev/null
+++ b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.reference
@@ -0,0 +1,8 @@
+1
+1
+0
+0
+0
+0
+0
+0
diff --git a/tests/queries/0_stateless/03161_ipv4_ipv6_equality.sql b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.sql
new file mode 100644
index 00000000000..da2a660977a
--- /dev/null
+++ b/tests/queries/0_stateless/03161_ipv4_ipv6_equality.sql
@@ -0,0 +1,11 @@
+-- Equal
+SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1');
+SELECT toIPv6('::ffff:127.0.0.1') = toIPv4('127.0.0.1');
+
+-- Not equal
+SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.2');
+SELECT toIPv4('127.0.0.2') = toIPv6('::ffff:127.0.0.1');
+SELECT toIPv6('::ffff:127.0.0.1') = toIPv4('127.0.0.2');
+SELECT toIPv6('::ffff:127.0.0.2') = toIPv4('127.0.0.1');
+SELECT toIPv4('127.0.0.1') = toIPv6('::ffef:127.0.0.1');
+SELECT toIPv6('::ffef:127.0.0.1') = toIPv4('127.0.0.1');
\ No newline at end of file
diff --git a/tests/queries/0_stateless/03164_s3_settings_for_queries_and_merges.reference b/tests/queries/0_stateless/03164_s3_settings_for_queries_and_merges.reference
new file mode 100644
index 00000000000..a2aef9837d3
--- /dev/null
+++ b/tests/queries/0_stateless/03164_s3_settings_for_queries_and_merges.reference
@@ -0,0 +1,3 @@
+655360
+18 0
+2 1
diff --git a/tests/queries/0_stateless/03164_s3_settings_for_queries_and_merges.sql b/tests/queries/0_stateless/03164_s3_settings_for_queries_and_merges.sql
new file mode 100644
index 00000000000..652b27b8a67
--- /dev/null
+++ b/tests/queries/0_stateless/03164_s3_settings_for_queries_and_merges.sql
@@ -0,0 +1,40 @@
+-- Tags: no-random-settings, no-fasttest
+
+SET allow_prefetched_read_pool_for_remote_filesystem=0;
+SET allow_prefetched_read_pool_for_local_filesystem=0;
+SET max_threads = 1;
+SET remote_read_min_bytes_for_seek = 100000;
+-- Will affect INSERT, but not merge
+SET s3_check_objects_after_upload=1;
+
+DROP TABLE IF EXISTS t_compact_bytes_s3;
+CREATE TABLE t_compact_bytes_s3(c1 UInt32, c2 UInt32, c3 UInt32, c4 UInt32, c5 UInt32)
+ENGINE = MergeTree ORDER BY c1
+SETTINGS index_granularity = 512, min_bytes_for_wide_part = '10G', storage_policy = 's3_no_cache';
+
+INSERT INTO t_compact_bytes_s3 SELECT number, number, number, number, number FROM numbers(512 * 32 * 40);
+
+SYSTEM DROP MARK CACHE;
+OPTIMIZE TABLE t_compact_bytes_s3 FINAL;
+
+SYSTEM DROP MARK CACHE;
+SELECT count() FROM t_compact_bytes_s3 WHERE NOT ignore(c2, c4);
+SYSTEM FLUSH LOGS;
+
+SELECT
+    ProfileEvents['S3ReadRequestsCount'],
+    ProfileEvents['ReadBufferFromS3Bytes'] < ProfileEvents['ReadCompressedBytes'] * 1.1
+FROM system.query_log
+WHERE event_date >= yesterday() AND type = 'QueryFinish'
+    AND current_database = currentDatabase()
+    AND query ilike '%INSERT INTO t_compact_bytes_s3 SELECT number, number, number%';
+
+SELECT
+    ProfileEvents['S3ReadRequestsCount'],
+    ProfileEvents['ReadBufferFromS3Bytes'] < ProfileEvents['ReadCompressedBytes'] * 1.1
+FROM system.query_log
+WHERE event_date >= yesterday() AND type = 'QueryFinish'
+    AND current_database = currentDatabase()
+    AND query ilike '%OPTIMIZE TABLE t_compact_bytes_s3 FINAL%';
+
+DROP TABLE IF EXISTS t_compact_bytes_s3;
diff --git a/tests/queries/0_stateless/data_bson/comments.bson b/tests/queries/0_stateless/data_bson/comments.bson
index 9aa4b6e6562..06681c51976 100644
Binary files a/tests/queries/0_stateless/data_bson/comments.bson and b/tests/queries/0_stateless/data_bson/comments.bson differ
diff --git a/tests/queries/0_stateless/data_bson/comments_new.bson b/tests/queries/0_stateless/data_bson/comments_new.bson
new file mode 100644
index 00000000000..aa9ee9bdbb4
Binary files /dev/null and b/tests/queries/0_stateless/data_bson/comments_new.bson differ
diff --git a/tests/queries/1_stateful/00144_functions_of_aggregation_states.sql b/tests/queries/1_stateful/00144_functions_of_aggregation_states.sql
index c5cd45d68b3..e30c132d242 100644
--- a/tests/queries/1_stateful/00144_functions_of_aggregation_states.sql
+++ b/tests/queries/1_stateful/00144_functions_of_aggregation_states.sql
@@ -1,3 +1,3 @@
-SET allow_deprecated_functions = 1;
+SET allow_deprecated_error_prone_window_functions = 1;
 
 SELECT EventDate, finalizeAggregation(state), runningAccumulate(state) FROM (SELECT EventDate, uniqState(UserID) AS state FROM test.hits GROUP BY EventDate ORDER BY EventDate);
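The stateful test above depends on the `test.hits` dataset; for readers without it, the same state-manipulation pattern can be tried on synthetic data. A self-contained sketch whose output is easy to verify by hand (per-key uniq is 3, and the running merge accumulates to 3, 6, 9):

```sql
SET allow_deprecated_error_prone_window_functions = 1;

SELECT k, finalizeAggregation(state) AS per_key, runningAccumulate(state) AS cumulative
FROM (SELECT number % 3 AS k, uniqState(number) AS state FROM numbers(9) GROUP BY k ORDER BY k);
```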
diff --git a/utils/changelog/changelog.py b/utils/changelog/changelog.py
index acc7293473d..304ab568e3c 100755
--- a/utils/changelog/changelog.py
+++ b/utils/changelog/changelog.py
@@ -3,18 +3,20 @@
 import argparse
 import logging
-import os.path as p
 import os
+import os.path as p
 import re
 from datetime import date, timedelta
-from subprocess import CalledProcessError, DEVNULL
+from subprocess import DEVNULL, CalledProcessError
 from typing import Dict, List, Optional, TextIO
 
 from fuzzywuzzy.fuzz import ratio  # type: ignore
-from github_helper import GitHub, PullRequest, PullRequests, Repository
 from github.GithubException import RateLimitExceededException, UnknownObjectException
 from github.NamedUser import NamedUser
-from git_helper import is_shallow, git_runner as runner
+
+from git_helper import git_runner as runner
+from git_helper import is_shallow
+from github_helper import GitHub, PullRequest, PullRequests, Repository
 
 # This array gives the preferred category order, and is also used to
 # normalize category names.
@@ -58,9 +60,10 @@ class Description:
             self.entry,
         )
         # 2) issue URL w/o markdown link
+        # including #issuecomment-1 or #event-12
         entry = re.sub(
-            r"([^(])https://github.com/ClickHouse/ClickHouse/issues/([0-9]{4,})",
-            r"\1[#\2](https://github.com/ClickHouse/ClickHouse/issues/\2)",
+            r"([^(])(https://github.com/ClickHouse/ClickHouse/issues/([0-9]{4,})[-#a-z0-9]*)",
+            r"\1[#\3](\2)",
             entry,
         )
         # It's possible that we face a secondary rate limit.
@@ -271,7 +274,6 @@ def generate_description(item: PullRequest, repo: Repository) -> Optional[Descri
             category,
         ):
             category = "Bug Fix (user-visible misbehavior in an official stable release)"
-            return Description(item.number, item.user, item.html_url, item.title, category)
 
     # Filter out documentations changelog
     if re.match(
@@ -300,8 +302,9 @@ def generate_description(item: PullRequest, repo: Repository) -> Optional[Descri
     return Description(item.number, item.user, item.html_url, entry, category)
 
 
-def write_changelog(fd: TextIO, descriptions: Dict[str, List[Description]]):
-    year = date.today().year
+def write_changelog(
+    fd: TextIO, descriptions: Dict[str, List[Description]], year: int
+) -> None:
     to_commit = runner(f"git rev-parse {TO_REF}^{{}}")[:11]
     from_commit = runner(f"git rev-parse {FROM_REF}^{{}}")[:11]
     fd.write(
@@ -359,6 +362,12 @@ def set_sha_in_changelog():
     ).split("\n")
 
 
+def get_year(prs: PullRequests) -> int:
+    if not prs:
+        return date.today().year
+    return max(pr.created_at.year for pr in prs)
+
+
 def main():
     log_levels = [logging.WARN, logging.INFO, logging.DEBUG]
     args = parse_args()
@@ -412,8 +421,9 @@ def main():
 
     prs = gh.get_pulls_from_search(query=query, merged=merged, sort="created")
     descriptions = get_descriptions(prs)
+    changelog_year = get_year(prs)
 
-    write_changelog(args.output, descriptions)
+    write_changelog(args.output, descriptions, changelog_year)
 
 
 if __name__ == "__main__":
diff --git a/utils/check-style/aspell-ignore/en/aspell-dict.txt b/utils/check-style/aspell-ignore/en/aspell-dict.txt
index e61db3236b2..244f2ad98ff 100644
--- a/utils/check-style/aspell-ignore/en/aspell-dict.txt
+++ b/utils/check-style/aspell-ignore/en/aspell-dict.txt
@@ -246,7 +246,6 @@ DockerHub
 DoubleDelta
 Doxygen
 Durre
-doesnt
 ECMA
 Ecto
 EdgeAngle
@@ -1358,6 +1357,7 @@ cond
 conf
 config
 configs
+conformant
 congruential
 conjuction
 conjuctive
@@ -1414,8 +1414,12 @@ cutQueryString
 cutQueryStringAndFragment
 cutToFirstSignificantSubdomain
 cutToFirstSignificantSubdomainCustom
+cutToFirstSignificantSubdomainCustomRFC
 cutToFirstSignificantSubdomainCustomWithWWW
+cutToFirstSignificantSubdomainCustomWithWWWRFC
+cutToFirstSignificantSubdomainRFC
 cutToFirstSignificantSubdomainWithWWW
+cutToFirstSignificantSubdomainWithWWWRFC
 cutURLParameter
 cutWWW
 cyrus
@@ -1502,7 +1506,10 @@ displaySecretsInShowAndSelect
 distro
 divideDecimal
 dmesg
+doesnt
+domainRFC
 domainWithoutWWW
+domainWithoutWWWRFC
 dont
 dotProduct
 downsampling
@@ -1575,8 +1582,11 @@ filesystems
 finalizeAggregation
 fips
 firstLine
+firstSignficantSubdomain
 firstSignificantSubdomain
 firstSignificantSubdomainCustom
+firstSignificantSubdomainCustomRFC
+firstSignificantSubdomainRFC
 fixedstring
 flamegraph
 flatbuffers
@@ -2156,6 +2166,7 @@ polygonsUnionSpherical
 polygonsWithinCartesian
 polygonsWithinSpherical
 popcnt
+portRFC
 porthttps
 positionCaseInsensitive
 positionCaseInsensitiveUTF
@@ -2660,6 +2671,9 @@ toStartOfSecond
 toStartOfTenMinutes
 toStartOfWeek
 toStartOfYear
+toStartOfMicrosecond
+toStartOfMillisecond
+toStartOfNanosecond
 toString
 toStringCutToZero
 toTime
@@ -2690,6 +2704,7 @@ toolset
 topK
 topKWeighted
 topLevelDomain
+topLevelDomainRFC
 topk
 topkweighted
 transactional
@@ -2788,6 +2803,7 @@ urls
 usearch
 userspace
 userver
+UTCTimestamp
 utils
 uuid
 uuidv