Merge branch 'master' into fix-gwp-asan

This commit is contained in:
Antonio Andelic 2024-06-11 09:12:20 +02:00
commit 5e907a5fb9
61 changed files with 506 additions and 201 deletions

contrib/cld2 vendored

@ -1 +1 @@
Subproject commit bc6d493a2f64ed1fc1c4c4b4294a542a04e04217
Subproject commit 217ba8b8805b41557faadaa47bb6e99f2242eea3

View File

@ -0,0 +1,101 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.4.2.141-stable (9e23d27bd11) FIXME as compared to v24.4.1.2088-stable (6d4b31322d1)
#### Improvement
* Backported in [#63467](https://github.com/ClickHouse/ClickHouse/issues/63467): Make rabbitmq nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### Build/Testing/Packaging Improvement
* Backported in [#63612](https://github.com/ClickHouse/ClickHouse/issues/63612): The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#64279](https://github.com/ClickHouse/ClickHouse/issues/64279): Fix queries with FINAL giving wrong results when the table does not use adaptive granularity. [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)).
* Backported in [#63295](https://github.com/ClickHouse/ClickHouse/issues/63295): Fix crash with untuple and unresolved lambda. [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#63978](https://github.com/ClickHouse/ClickHouse/issues/63978): Fix intersecting parts when restarting after a drop range. [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)).
* Backported in [#63413](https://github.com/ClickHouse/ClickHouse/issues/63413): Fix a misbehavior when SQL security defaults don't load for old tables during server startup. [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)).
* Backported in [#63388](https://github.com/ClickHouse/ClickHouse/issues/63388): Fix JOIN filter push-down for filled JOIN. Closes [#63228](https://github.com/ClickHouse/ClickHouse/issues/63228). [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)).
* Backported in [#63618](https://github.com/ClickHouse/ClickHouse/issues/63618): Fix bug which could potentially lead to rare LOGICAL_ERROR during SELECT query with message: `Unexpected return type from materialize. Expected type_XXX. Got type_YYY.` Introduced in [#59379](https://github.com/ClickHouse/ClickHouse/issues/59379). [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)).
* Backported in [#63451](https://github.com/ClickHouse/ClickHouse/issues/63451): Fix `X-ClickHouse-Timezone` header returning wrong timezone when using `session_timezone` as query level setting. [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)).
* Backported in [#63605](https://github.com/ClickHouse/ClickHouse/issues/63605): Fix backup of projection part in case projection was removed from table metadata, but part still has projection. [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#63510](https://github.com/ClickHouse/ClickHouse/issues/63510): Fix 'Every derived table must have its own alias' error for MySQL dictionary source, close [#63341](https://github.com/ClickHouse/ClickHouse/issues/63341). [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)).
* Backported in [#63592](https://github.com/ClickHouse/ClickHouse/issues/63592): Avoid segfault in `MergeTreePrefetchedReadPool` while fetching projection parts. [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#63750](https://github.com/ClickHouse/ClickHouse/issues/63750): Read only the necessary columns from VIEW (new analyzer). Closes [#62594](https://github.com/ClickHouse/ClickHouse/issues/62594). [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)).
* Backported in [#63772](https://github.com/ClickHouse/ClickHouse/issues/63772): Fix [#63539](https://github.com/ClickHouse/ClickHouse/issues/63539). Forbid WINDOW redefinition in new analyzer. [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#63872](https://github.com/ClickHouse/ClickHouse/issues/63872): Fix `flatten_nested` being broken with Replicated database. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#63854](https://github.com/ClickHouse/ClickHouse/issues/63854): Fix `Not found column` and `CAST AS Map from array requires nested tuple of 2 elements` exceptions for distributed queries which use `Map(Nothing, Nothing)` type. Fixes [#63637](https://github.com/ClickHouse/ClickHouse/issues/63637). [#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#63847](https://github.com/ClickHouse/ClickHouse/issues/63847): Fix possible `ILLEGAL_COLUMN` error in `partial_merge` join, close [#37928](https://github.com/ClickHouse/ClickHouse/issues/37928). [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)).
* Backported in [#63908](https://github.com/ClickHouse/ClickHouse/issues/63908): Fix `query_plan_remove_redundant_distinct` breaking queries with window functions (when `allow_experimental_analyzer` is on). Fixes [#62820](https://github.com/ClickHouse/ClickHouse/issues/62820). [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)).
* Backported in [#63955](https://github.com/ClickHouse/ClickHouse/issues/63955): Fix possible crash with SYSTEM UNLOAD PRIMARY KEY. [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#63938](https://github.com/ClickHouse/ClickHouse/issues/63938): Allow JOIN filter push-down to both streams if only a single equivalent column is used in the query. Closes [#63799](https://github.com/ClickHouse/ClickHouse/issues/63799). [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)).
* Backported in [#63991](https://github.com/ClickHouse/ClickHouse/issues/63991): Fix incorrect select query result when parallel replicas were used to read from a Materialized View. [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#64033](https://github.com/ClickHouse/ClickHouse/issues/64033): Fix an error `Database name is empty` for remote queries with lambdas over a cluster with a modified default database. Fixes [#63471](https://github.com/ClickHouse/ClickHouse/issues/63471). [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64561](https://github.com/ClickHouse/ClickHouse/issues/64561): Fix SIGSEGV due to the CPU/Real (`query_profiler_real_time_period_ns`/`query_profiler_cpu_time_period_ns`) profiler (an issue since 2022 that led to periodic server crashes, especially when using the Distributed engine). [#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#64011](https://github.com/ClickHouse/ClickHouse/issues/64011): Fix analyzer: IN function with arbitrarily deep sub-selects in a materialized view now uses the insertion block. [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Backported in [#64238](https://github.com/ClickHouse/ClickHouse/issues/64238): Fix resolve of unqualified COLUMNS matcher. Preserve the input columns order and forbid usage of unknown identifiers. [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#64103](https://github.com/ClickHouse/ClickHouse/issues/64103): Deserialize untrusted binary inputs in a safer way. [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#64170](https://github.com/ClickHouse/ClickHouse/issues/64170): Add missing settings to recoverLostReplica. [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#64322](https://github.com/ClickHouse/ClickHouse/issues/64322): Use a properly redefined context with the correct definer for each individual view in the query pipeline. Closes [#63777](https://github.com/ClickHouse/ClickHouse/issues/63777). [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)).
* Backported in [#64382](https://github.com/ClickHouse/ClickHouse/issues/64382): Fix analyzer: "Not found column" error when using INTERPOLATE. [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Backported in [#64568](https://github.com/ClickHouse/ClickHouse/issues/64568): Fix creating backups to S3 buckets with different credentials from the disk containing the file. [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#64272](https://github.com/ClickHouse/ClickHouse/issues/64272): Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#64330](https://github.com/ClickHouse/ClickHouse/issues/64330): The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#64254](https://github.com/ClickHouse/ClickHouse/issues/64254): Ignore `text_log` config when using Keeper. [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#64690](https://github.com/ClickHouse/ClickHouse/issues/64690): Fix Query Tree size validation. Closes [#63701](https://github.com/ClickHouse/ClickHouse/issues/63701). [#64377](https://github.com/ClickHouse/ClickHouse/pull/64377) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#64409](https://github.com/ClickHouse/ClickHouse/issues/64409): Fix `Logical error: Bad cast` for `Buffer` table with `PREWHERE`. Fixes [#64172](https://github.com/ClickHouse/ClickHouse/issues/64172). [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64727](https://github.com/ClickHouse/ClickHouse/issues/64727): Fixed `CREATE TABLE AS` queries for tables with default expressions. [#64455](https://github.com/ClickHouse/ClickHouse/pull/64455) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#64623](https://github.com/ClickHouse/ClickHouse/issues/64623): Fix an error `Cannot find column` in distributed queries with constant CTE in the `GROUP BY` key. [#64519](https://github.com/ClickHouse/ClickHouse/pull/64519) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#64680](https://github.com/ClickHouse/ClickHouse/issues/64680): Fix [#64612](https://github.com/ClickHouse/ClickHouse/issues/64612). Do not rewrite aggregation if `-If` combinator is already used. [#64638](https://github.com/ClickHouse/ClickHouse/pull/64638) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#64942](https://github.com/ClickHouse/ClickHouse/issues/64942): Fix OrderByLimitByDuplicateEliminationVisitor across subqueries. [#64766](https://github.com/ClickHouse/ClickHouse/pull/64766) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#64871](https://github.com/ClickHouse/ClickHouse/issues/64871): Fixed possible incorrect memory tracking in several kinds of queries: queries that read any data from S3, queries via the HTTP protocol, and asynchronous inserts. [#64844](https://github.com/ClickHouse/ClickHouse/pull/64844) ([Anton Popov](https://github.com/CurtizJ)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#63364](https://github.com/ClickHouse/ClickHouse/issues/63364): Implement cumulative A Sync status. [#61464](https://github.com/ClickHouse/ClickHouse/pull/61464) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#63338](https://github.com/ClickHouse/ClickHouse/issues/63338): Use `/commit/` to have the URLs in [reports](https://play.clickhouse.com/play?user=play#c2VsZWN0IGRpc3RpbmN0IGNvbW1pdF91cmwgZnJvbSBjaGVja3Mgd2hlcmUgY2hlY2tfc3RhcnRfdGltZSA+PSBub3coKSAtIGludGVydmFsIDEgbW9udGggYW5kIHB1bGxfcmVxdWVzdF9udW1iZXI9NjA1MzI=) like https://github.com/ClickHouse/ClickHouse/commit/44f8bc5308b53797bec8cccc3bd29fab8a00235d and not like https://github.com/ClickHouse/ClickHouse/commits/44f8bc5308b53797bec8cccc3bd29fab8a00235d. [#63331](https://github.com/ClickHouse/ClickHouse/pull/63331) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#63376](https://github.com/ClickHouse/ClickHouse/issues/63376):. [#63366](https://github.com/ClickHouse/ClickHouse/pull/63366) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Backported in [#63571](https://github.com/ClickHouse/ClickHouse/issues/63571):. [#63551](https://github.com/ClickHouse/ClickHouse/pull/63551) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Backported in [#63651](https://github.com/ClickHouse/ClickHouse/issues/63651): Fix 02362_part_log_merge_algorithm flaky test. [#63635](https://github.com/ClickHouse/ClickHouse/pull/63635) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Backported in [#63828](https://github.com/ClickHouse/ClickHouse/issues/63828): Fix test_odbc_interaction from aarch64 [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63787](https://github.com/ClickHouse/ClickHouse/pull/63787) ([alesapin](https://github.com/alesapin)).
* Backported in [#63897](https://github.com/ClickHouse/ClickHouse/issues/63897): Fix test `test_catboost_evaluate` for aarch64. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63789](https://github.com/ClickHouse/ClickHouse/pull/63789) ([alesapin](https://github.com/alesapin)).
* Backported in [#63889](https://github.com/ClickHouse/ClickHouse/issues/63889): Remove HDFS from disks config for one integration test for arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63832](https://github.com/ClickHouse/ClickHouse/pull/63832) ([alesapin](https://github.com/alesapin)).
* Backported in [#63881](https://github.com/ClickHouse/ClickHouse/issues/63881): Bump version for old image in test_short_strings_aggregation to make it work on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63836](https://github.com/ClickHouse/ClickHouse/pull/63836) ([alesapin](https://github.com/alesapin)).
* Backported in [#63919](https://github.com/ClickHouse/ClickHouse/issues/63919): Disable test `test_non_default_compression/test.py::test_preconfigured_deflateqpl_codec` on arm. [#61457](https://github.com/ClickHouse/ClickHouse/issues/61457). [#63839](https://github.com/ClickHouse/ClickHouse/pull/63839) ([alesapin](https://github.com/alesapin)).
* Backported in [#63971](https://github.com/ClickHouse/ClickHouse/issues/63971): Fix 02124_insert_deduplication_token_multiple_blocks. [#63950](https://github.com/ClickHouse/ClickHouse/pull/63950) ([Han Fei](https://github.com/hanfei1991)).
* Backported in [#64049](https://github.com/ClickHouse/ClickHouse/issues/64049): Add `ClickHouseVersion.copy` method. Create a branch release in advance without spinning out the release to increase the stability. [#64039](https://github.com/ClickHouse/ClickHouse/pull/64039) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#64078](https://github.com/ClickHouse/ClickHouse/issues/64078): The mime type is not 100% reliable for Python and shell scripts without shebangs; add a check for file extension. [#64062](https://github.com/ClickHouse/ClickHouse/pull/64062) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#64161](https://github.com/ClickHouse/ClickHouse/issues/64161): Add retries in git submodule update. [#64125](https://github.com/ClickHouse/ClickHouse/pull/64125) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)
* Backported in [#64589](https://github.com/ClickHouse/ClickHouse/issues/64589): Disabled `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). [#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#64880](https://github.com/ClickHouse/ClickHouse/issues/64880): Fix an error where a user in a specific situation could escalate their privileges on the default database without the necessary grants. [#64769](https://github.com/ClickHouse/ClickHouse/pull/64769) ([pufit](https://github.com/pufit)).
#### NO CL CATEGORY
* Backported in [#63306](https://github.com/ClickHouse/ClickHouse/issues/63306):. [#63297](https://github.com/ClickHouse/ClickHouse/pull/63297) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#63710](https://github.com/ClickHouse/ClickHouse/issues/63710):. [#63415](https://github.com/ClickHouse/ClickHouse/pull/63415) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Backport [#64363](https://github.com/ClickHouse/ClickHouse/issues/64363) to 24.4: Split tests 03039_dynamic_all_merge_algorithms to avoid timeouts"'. [#64905](https://github.com/ClickHouse/ClickHouse/pull/64905) ([Raúl Marín](https://github.com/Algunenano)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* group_by_use_nulls strikes back [#62922](https://github.com/ClickHouse/ClickHouse/pull/62922) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add `FROM` keyword to `TRUNCATE ALL TABLES` [#63241](https://github.com/ClickHouse/ClickHouse/pull/63241) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* More checks for concurrently deleted files and dirs in system.remote_data_paths [#63274](https://github.com/ClickHouse/ClickHouse/pull/63274) ([Alexander Gololobov](https://github.com/davenger)).
* Try fix segfault in `MergeTreeReadPoolBase::createTask` [#63323](https://github.com/ClickHouse/ClickHouse/pull/63323) ([Antonio Andelic](https://github.com/antonio2368)).
* Skip unaccessible table dirs in system.remote_data_paths [#63330](https://github.com/ClickHouse/ClickHouse/pull/63330) ([Alexander Gololobov](https://github.com/davenger)).
* Workaround for `oklch()` inside canvas bug for firefox [#63404](https://github.com/ClickHouse/ClickHouse/pull/63404) ([Sergei Trifonov](https://github.com/serxa)).
* Cancel S3 reads properly when parallel reads are used [#63687](https://github.com/ClickHouse/ClickHouse/pull/63687) ([Antonio Andelic](https://github.com/antonio2368)).
* Userspace page cache: don't collect stats if cache is unused [#63730](https://github.com/ClickHouse/ClickHouse/pull/63730) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix sanitizers [#64090](https://github.com/ClickHouse/ClickHouse/pull/64090) ([Azat Khuzhin](https://github.com/azat)).
* Split tests 03039_dynamic_all_merge_algorithms to avoid timeouts [#64363](https://github.com/ClickHouse/ClickHouse/pull/64363) ([Kruglov Pavel](https://github.com/Avogar)).
* CI: Critical bugfix category in PR template [#64480](https://github.com/ClickHouse/ClickHouse/pull/64480) ([Max K.](https://github.com/maxknv)).

View File

@ -54,6 +54,7 @@ SELECT * FROM test_table;
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
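For illustration, a minimal sketch that selects these virtual columns from the `test_table` used earlier on this page; `_size` and `_time` return `NULL` when the value is unknown:

```sql
-- Sketch: read the virtual columns alongside the table data.
-- _size and _time are Nullable and yield NULL when unknown.
SELECT _path, _file, _size, _time
FROM test_table
LIMIT 5;
```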
## See also

View File

@ -235,6 +235,7 @@ libhdfs3 support HDFS namenode HA.
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Storage Settings {#storage-settings}

View File

@ -53,14 +53,14 @@ For partitioning by month, use the `toYYYYMM(date_column)` expression, where `da
This example uses the [docker compose recipe](https://github.com/ClickHouse/examples/tree/5fdc6ff72f4e5137e23ea075c88d3f44b0202490/docker-compose-recipes/recipes/ch-and-minio-S3), which integrates ClickHouse and MinIO. You should be able to reproduce the same queries using S3 by replacing the endpoint and authentication values.
Notice that the S3 endpoint in the `ENGINE` configuration uses the parameter token `{_partition_id}` as part of the S3 object (filename), and that the SELECT queries select against those resulting object names (e.g., `test_3.csv`).
:::note
As shown in the example, querying from S3 tables that are partitioned is
not directly supported at this time, but can be accomplished by querying the individual partitions
using the S3 table function.
The primary use-case for writing
partitioned data in S3 is to enable transferring that data into another
ClickHouse system (for example, moving from on-prem systems to ClickHouse
Cloud). Because ClickHouse datasets are often very large, and network
@ -78,9 +78,9 @@ CREATE TABLE p
)
ENGINE = S3(
# highlight-next-line
'http://minio:10000/clickhouse//test_{_partition_id}.csv',
'minioadmin',
'minioadminpassword',
'CSV')
PARTITION BY column3
```
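The note above mentions that individual partitions can be read back with the S3 table function; a hedged sketch, reusing the MinIO endpoint, credentials, and the `test_3.csv` object name from the examples on this page:

```sql
-- Sketch: read one partitioned object directly via the s3 table function.
-- Endpoint, credentials and object name follow the MinIO example above.
SELECT *
FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV');
```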
@ -145,6 +145,7 @@ Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: Reading fr
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
For more information about virtual columns see [here](../../../engines/table-engines/index.md#table_engines-virtual_columns).

View File

@ -102,6 +102,7 @@ For partitioning by month, use the `toYYYYMM(date_column)` expression, where `da
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Settings {#settings}

View File

@ -108,6 +108,7 @@ For partitioning by month, use the `toYYYYMM(date_column)` expression, where `da
- `_path` — Path to the `URL`. Type: `LowCardinality(String)`.
- `_file` — Resource name of the `URL`. Type: `LowCardinality(String)`.
- `_size` — Size of the resource in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Storage Settings {#storage-settings}

View File

@ -72,6 +72,7 @@ SELECT count(*) FROM azureBlobStorage('DefaultEndpointsProtocol=https;AccountNam
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the file size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
**See Also**

View File

@ -196,6 +196,7 @@ SELECT count(*) FROM file('big_dir/**/file002', 'CSV', 'name String, value UInt3
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the file size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Settings {#settings}

View File

@ -97,6 +97,7 @@ FROM hdfs('hdfs://hdfs1:9000/big_dir/file{0..9}{0..9}{0..9}', 'CSV', 'name Strin
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Storage Settings {#storage-settings}

View File

@ -272,6 +272,7 @@ FROM s3(
- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the file size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Storage Settings {#storage-settings}

View File

@ -53,6 +53,7 @@ Character `|` inside patterns is used to specify failover addresses. They are it
- `_path` — Path to the `URL`. Type: `LowCardinality(String)`.
- `_file` — Resource name of the `URL`. Type: `LowCardinality(String)`.
- `_size` — Size of the resource in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
## Storage Settings {#storage-settings}

View File

@ -54,9 +54,9 @@ namespace
S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration(
settings.auth_settings.region,
context->getRemoteHostFilter(),
static_cast<unsigned>(global_settings.s3_max_redirects),
static_cast<unsigned>(global_settings.s3_retry_attempts),
global_settings.enable_s3_requests_logging,
static_cast<unsigned>(local_settings.s3_max_redirects),
static_cast<unsigned>(local_settings.backup_restore_s3_retry_attempts),
local_settings.enable_s3_requests_logging,
/* for_disk_s3 = */ false,
request_settings.get_request_throttler,
request_settings.put_request_throttler,

View File

@ -289,10 +289,14 @@ void executeColumnIfNeeded(ColumnWithTypeAndName & column, bool empty)
if (!column_function)
return;
size_t original_size = column.column->size();
if (!empty)
column = column_function->reduce();
else
column.column = column_function->getResultType()->createColumn();
column.column = column_function->getResultType()->createColumnConstWithDefaultValue(original_size)->convertToFullColumnIfConst();
chassert(column.column->size() == original_size);
}
int checkShortCircuitArguments(const ColumnsWithTypeAndName & arguments)

View File

@ -517,6 +517,7 @@ class IColumn;
M(UInt64, backup_restore_keeper_value_max_size, 1048576, "Maximum size of data of a [Zoo]Keeper's node during backup", 0) \
M(UInt64, backup_restore_batch_size_for_keeper_multiread, 10000, "Maximum size of batch for multiread request to [Zoo]Keeper during backup or restore", 0) \
M(UInt64, backup_restore_batch_size_for_keeper_multi, 1000, "Maximum size of batch for multi request to [Zoo]Keeper during backup or restore", 0) \
M(UInt64, backup_restore_s3_retry_attempts, 1000, "Setting for Aws::Client::RetryStrategy, Aws::Client does retries itself, 0 means no retries. It takes place only for backup/restore.", 0) \
M(UInt64, max_backup_bandwidth, 0, "The maximum read speed in bytes per second for particular backup on server. Zero means unlimited.", 0) \
\
M(Bool, log_profile_events, true, "Log query performance statistics into the query_log, query_thread_log and query_views_log.", 0) \
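For context, a hedged sketch of how the new `backup_restore_s3_retry_attempts` setting might be applied in practice; the table name, destination bucket, path, and credentials below are placeholders, not values from this commit:

```sql
-- Illustrative only: raise S3 retry attempts for backup/restore in the current session.
SET backup_restore_s3_retry_attempts = 1000;

-- Placeholder destination and credentials.
BACKUP TABLE default.my_table
    TO S3('https://my-bucket.s3.amazonaws.com/backups/my_table/', '<ACCESS_KEY_ID>', '<SECRET_ACCESS_KEY>');
```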

View File

@ -113,6 +113,7 @@ static const std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges
{"http_max_chunk_size", 0, 0, "Internal limitation"},
{"prefer_external_sort_block_bytes", 0, DEFAULT_BLOCK_SIZE * 256, "Prefer maximum block bytes for external sort, reduce the memory usage during merging."},
{"input_format_force_null_for_omitted_fields", false, false, "Disable type-defaults for omitted fields when needed"},
{"backup_restore_s3_retry_attempts", 0, 1000, "A new setting."},
{"cast_string_to_dynamic_use_inference", false, false, "Add setting to allow converting String to Dynamic through parsing"},
{"allow_experimental_dynamic_type", false, false, "Add new experimental Dynamic type"},
{"azure_max_blocks_in_multipart_upload", 50000, 50000, "Maximum number of blocks in multipart upload for Azure."},

View File

@ -122,6 +122,13 @@ DatabaseReplicated::DatabaseReplicated(
fillClusterAuthInfo(db_settings.collection_name.value, context_->getConfigRef());
replica_group_name = context_->getConfigRef().getString("replica_group_name", "");
if (!replica_group_name.empty() && database_name.starts_with(DatabaseReplicated::ALL_GROUPS_CLUSTER_PREFIX))
{
context_->addWarningMessage(fmt::format("There's a Replicated database with a name starting from '{}', "
"and replica_group_name is configured. It may cause collisions in cluster names.",
ALL_GROUPS_CLUSTER_PREFIX));
}
}
String DatabaseReplicated::getFullReplicaName(const String & shard, const String & replica)
@ -173,13 +180,40 @@ ClusterPtr DatabaseReplicated::tryGetCluster() const
return cluster;
}
void DatabaseReplicated::setCluster(ClusterPtr && new_cluster)
ClusterPtr DatabaseReplicated::tryGetAllGroupsCluster() const
{
std::lock_guard lock{mutex};
cluster = std::move(new_cluster);
if (replica_group_name.empty())
return nullptr;
if (cluster_all_groups)
return cluster_all_groups;
/// Database is probably not created or not initialized yet, it's ok to return nullptr
if (is_readonly)
return cluster_all_groups;
try
{
cluster_all_groups = getClusterImpl(/*all_groups*/ true);
}
catch (...)
{
tryLogCurrentException(log);
}
return cluster_all_groups;
}
ClusterPtr DatabaseReplicated::getClusterImpl() const
void DatabaseReplicated::setCluster(ClusterPtr && new_cluster, bool all_groups)
{
std::lock_guard lock{mutex};
if (all_groups)
cluster_all_groups = std::move(new_cluster);
else
cluster = std::move(new_cluster);
}
ClusterPtr DatabaseReplicated::getClusterImpl(bool all_groups) const
{
Strings unfiltered_hosts;
Strings hosts;
@ -199,17 +233,24 @@ ClusterPtr DatabaseReplicated::getClusterImpl() const
"It's possible if the first replica is not fully created yet "
"or if the last replica was just dropped or due to logical error", zookeeper_path);
hosts.clear();
std::vector<String> paths;
for (const auto & host : unfiltered_hosts)
paths.push_back(zookeeper_path + "/replicas/" + host + "/replica_group");
auto replica_groups = zookeeper->tryGet(paths);
for (size_t i = 0; i < paths.size(); ++i)
if (all_groups)
{
if (replica_groups[i].data == replica_group_name)
hosts.push_back(unfiltered_hosts[i]);
hosts = unfiltered_hosts;
}
else
{
hosts.clear();
std::vector<String> paths;
for (const auto & host : unfiltered_hosts)
paths.push_back(zookeeper_path + "/replicas/" + host + "/replica_group");
auto replica_groups = zookeeper->tryGet(paths);
for (size_t i = 0; i < paths.size(); ++i)
{
if (replica_groups[i].data == replica_group_name)
hosts.push_back(unfiltered_hosts[i]);
}
}
Int32 cversion = stat.cversion;
@ -274,6 +315,11 @@ ClusterPtr DatabaseReplicated::getClusterImpl() const
bool treat_local_as_remote = false;
bool treat_local_port_as_remote = getContext()->getApplicationType() == Context::ApplicationType::LOCAL;
String cluster_name = TSA_SUPPRESS_WARNING_FOR_READ(database_name); /// FIXME
if (all_groups)
cluster_name = ALL_GROUPS_CLUSTER_PREFIX + cluster_name;
ClusterConnectionParameters params{
cluster_auth_info.cluster_username,
cluster_auth_info.cluster_password,
@ -282,7 +328,7 @@ ClusterPtr DatabaseReplicated::getClusterImpl() const
treat_local_port_as_remote,
cluster_auth_info.cluster_secure_connection,
Priority{1},
TSA_SUPPRESS_WARNING_FOR_READ(database_name), /// FIXME
cluster_name,
cluster_auth_info.cluster_secret};
return std::make_shared<Cluster>(getContext()->getSettingsRef(), shards, params);

View File

@ -20,6 +20,8 @@ using ClusterPtr = std::shared_ptr<Cluster>;
class DatabaseReplicated : public DatabaseAtomic
{
public:
static constexpr auto ALL_GROUPS_CLUSTER_PREFIX = "all_groups.";
DatabaseReplicated(const String & name_, const String & metadata_path_, UUID uuid,
const String & zookeeper_path_, const String & shard_name_, const String & replica_name_,
DatabaseReplicatedSettings db_settings_,
@ -65,6 +67,7 @@ public:
/// Returns cluster consisting of database replicas
ClusterPtr tryGetCluster() const;
ClusterPtr tryGetAllGroupsCluster() const;
void drop(ContextPtr /*context*/) override;
@ -113,8 +116,8 @@ private:
ASTPtr parseQueryFromMetadataInZooKeeper(const String & node_name, const String & query);
String readMetadataFile(const String & table_name) const;
ClusterPtr getClusterImpl() const;
void setCluster(ClusterPtr && new_cluster);
ClusterPtr getClusterImpl(bool all_groups = false) const;
void setCluster(ClusterPtr && new_cluster, bool all_groups = false);
void createEmptyLogEntry(const ZooKeeperPtr & current_zookeeper);
@ -155,6 +158,7 @@ private:
UInt64 tables_metadata_digest TSA_GUARDED_BY(metadata_mutex);
mutable ClusterPtr cluster;
mutable ClusterPtr cluster_all_groups;
LoadTaskPtr startup_replicated_database_task TSA_GUARDED_BY(mutex);
};

View File

@ -421,6 +421,8 @@ DDLTaskPtr DatabaseReplicatedDDLWorker::initAndCheckTask(const String & entry_na
{
/// Some replica is added or removed, let's update cached cluster
database->setCluster(database->getClusterImpl());
if (!database->replica_group_name.empty())
database->setCluster(database->getClusterImpl(/*all_groups*/ true), /*all_groups*/ true);
out_reason = fmt::format("Entry {} is a dummy task", entry_name);
return {};
}

View File

@ -41,11 +41,11 @@ void applyMetadataChangesToCreateQuery(const ASTPtr & query, const StorageInMemo
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Cannot alter table {} because it was created AS table function"
" and doesn't have structure in metadata", backQuote(ast_create_query.getTable()));
if (!has_structure && !ast_create_query.is_dictionary)
if (!has_structure && !ast_create_query.is_dictionary && !ast_create_query.isParameterizedView())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot alter table {} metadata doesn't have structure",
backQuote(ast_create_query.getTable()));
if (!ast_create_query.is_dictionary)
if (!ast_create_query.is_dictionary && !ast_create_query.isParameterizedView())
{
ASTPtr new_columns = InterpreterCreateQuery::formatColumns(metadata.columns);
ASTPtr new_indices = InterpreterCreateQuery::formatIndices(metadata.secondary_indices);

View File

@ -382,7 +382,6 @@ void S3ObjectStorage::removeObjectsImpl(const StoredObjects & objects, bool if_e
{
std::vector<Aws::S3::Model::ObjectIdentifier> current_chunk;
String keys;
size_t first_position = current_position;
for (; current_position < objects.size() && current_chunk.size() < chunk_size_limit; ++current_position)
{
Aws::S3::Model::ObjectIdentifier obj;
@ -408,9 +407,9 @@ void S3ObjectStorage::removeObjectsImpl(const StoredObjects & objects, bool if_e
{
const auto * outcome_error = outcome.IsSuccess() ? nullptr : &outcome.GetError();
auto time_now = std::chrono::system_clock::now();
for (size_t i = first_position; i < current_position; ++i)
for (const auto & object : objects)
blob_storage_log->addEvent(BlobStorageLogElement::EventType::Delete,
uri.bucket, objects[i].remote_path, objects[i].local_path, objects[i].bytes_size,
uri.bucket, object.remote_path, object.local_path, object.bytes_size,
outcome_error, time_now);
}

View File

@ -5,6 +5,7 @@
#include <functional>
#include <memory>
#include <Poco/Timestamp.h>
namespace DB
{
@ -25,6 +26,7 @@ public:
{
UInt64 uncompressed_size;
UInt64 compressed_size;
Poco::Timestamp last_modified;
bool is_encrypted;
};

View File

@ -157,6 +157,7 @@ public:
file_info.emplace();
file_info->uncompressed_size = archive_entry_size(current_entry);
file_info->compressed_size = archive_entry_size(current_entry);
file_info->last_modified = archive_entry_mtime(current_entry);
file_info->is_encrypted = false;
}

View File

@ -162,7 +162,7 @@ public:
class RetryStrategy : public Aws::Client::RetryStrategy
{
public:
explicit RetryStrategy(uint32_t maxRetries_ = 10, uint32_t scaleFactor_ = 25, uint32_t maxDelayMs_ = 90000);
explicit RetryStrategy(uint32_t maxRetries_ = 10, uint32_t scaleFactor_ = 25, uint32_t maxDelayMs_ = 5000);
/// NOLINTNEXTLINE(google-runtime-int)
bool ShouldRetry(const Aws::Client::AWSError<Aws::Client::CoreErrors>& error, long attemptedRetries) const override;

View File

@ -568,8 +568,21 @@ void ZooKeeperMetadataTransaction::commit()
ClusterPtr tryGetReplicatedDatabaseCluster(const String & cluster_name)
{
if (const auto * replicated_db = dynamic_cast<const DatabaseReplicated *>(DatabaseCatalog::instance().tryGetDatabase(cluster_name).get()))
return replicated_db->tryGetCluster();
String name = cluster_name;
bool all_groups = false;
if (name.starts_with(DatabaseReplicated::ALL_GROUPS_CLUSTER_PREFIX))
{
name = name.substr(strlen(DatabaseReplicated::ALL_GROUPS_CLUSTER_PREFIX));
all_groups = true;
}
if (const auto * replicated_db = dynamic_cast<const DatabaseReplicated *>(DatabaseCatalog::instance().tryGetDatabase(name).get()))
{
if (all_groups)
return replicated_db->tryGetAllGroupsCluster();
else
return replicated_db->tryGetCluster();
}
return {};
}

View File

@ -504,10 +504,6 @@ void SystemLog<LogElement>::flushImpl(const std::vector<LogElement> & to_flush,
Block block(std::move(log_element_columns));
MutableColumns columns = block.mutateColumns();
for (auto & column : columns)
column->reserve(to_flush.size());
for (const auto & elem : to_flush)
elem.appendToBlock(columns);
@ -536,8 +532,7 @@ void SystemLog<LogElement>::flushImpl(const std::vector<LogElement> & to_flush,
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__, fmt::format("Failed to flush system log {} with {} entries up to offset {}",
table_id.getNameForLogs(), to_flush.size(), to_flush_end));
tryLogCurrentException(__PRETTY_FUNCTION__);
}
queue->confirm(to_flush_end);

View File

@ -18,6 +18,7 @@
#include <Interpreters/inplaceBlockConversions.h>
#include <Interpreters/InterpreterSelectWithUnionQuery.h>
#include <Interpreters/InterpreterSelectQueryAnalyzer.h>
#include <Storages/StorageView.h>
#include <Parsers/ASTAlterQuery.h>
#include <Parsers/ASTColumnDeclaration.h>
#include <Parsers/ASTConstraintDeclaration.h>
@ -1613,7 +1614,10 @@ void AlterCommands::validate(const StoragePtr & table, ContextPtr context) const
}
}
if (all_columns.empty())
/// Parameterized views do not have 'columns' in their metadata
bool is_parameterized_view = table->as<StorageView>() && table->as<StorageView>()->isParameterizedView();
if (!is_parameterized_view && all_columns.empty())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Cannot DROP or CLEAR all columns");
validateColumnsDefaultsAndGetSampleBlock(default_expr_list, all_columns.getAll(), context);

View File

@ -195,12 +195,14 @@ Chunk StorageObjectStorageSource::generate()
const auto & object_info = reader.getObjectInfo();
const auto & filename = object_info.getFileName();
chassert(object_info.metadata);
VirtualColumnUtils::addRequestedPathFileAndSizeVirtualsToChunk(
chunk,
read_from_format_info.requested_virtual_columns,
getUniqueStoragePathIdentifier(*configuration, reader.getObjectInfo(), false),
object_info.metadata->size_bytes, &filename);
VirtualColumnUtils::addRequestedFileLikeStorageVirtualsToChunk(
chunk, read_from_format_info.requested_virtual_columns,
{
.path = getUniqueStoragePathIdentifier(*configuration, reader.getObjectInfo(), false),
.size = object_info.metadata->size_bytes,
.filename = &filename,
.last_modified = object_info.metadata->last_modified
});
return chunk;
}

View File

@ -421,8 +421,14 @@ Chunk StorageS3QueueSource::generate()
file_status->processed_rows += chunk.getNumRows();
processed_rows_from_file += chunk.getNumRows();
VirtualColumnUtils::addRequestedPathFileAndSizeVirtualsToChunk(
chunk, requested_virtual_columns, path, reader.getObjectInfo().metadata->size_bytes);
VirtualColumnUtils::addRequestedFileLikeStorageVirtualsToChunk(
chunk, requested_virtual_columns,
{
.path = path,
.size = reader.getObjectInfo().metadata->size_bytes
});
return chunk;
}
}

View File

@ -1341,6 +1341,7 @@ Chunk StorageFileSource::generate()
chassert(file_enumerator);
current_path = fmt::format("{}::{}", archive_reader->getPath(), *filename_override);
current_file_size = file_enumerator->getFileInfo().uncompressed_size;
current_file_last_modified = file_enumerator->getFileInfo().last_modified;
if (need_only_count && tryGetCountFromCache(current_archive_stat))
continue;
@ -1370,6 +1371,7 @@ Chunk StorageFileSource::generate()
struct stat file_stat;
file_stat = getFileStat(current_path, storage->use_table_fd, storage->table_fd, storage->getName());
current_file_size = file_stat.st_size;
current_file_last_modified = Poco::Timestamp::fromEpochTime(file_stat.st_mtime);
if (getContext()->getSettingsRef().engine_file_skip_empty_files && file_stat.st_size == 0)
continue;
@ -1436,8 +1438,15 @@ Chunk StorageFileSource::generate()
progress(num_rows, chunk_size ? chunk_size : chunk.bytes());
/// Enrich with virtual columns.
VirtualColumnUtils::addRequestedPathFileAndSizeVirtualsToChunk(
chunk, requested_virtual_columns, current_path, current_file_size, filename_override.has_value() ? &filename_override.value() : nullptr);
VirtualColumnUtils::addRequestedFileLikeStorageVirtualsToChunk(
chunk, requested_virtual_columns,
{
.path = current_path,
.size = current_file_size,
.filename = (filename_override.has_value() ? &filename_override.value() : nullptr),
.last_modified = current_file_last_modified
});
return chunk;
}

View File

@ -279,6 +279,7 @@ private:
FilesIteratorPtr files_iterator;
String current_path;
std::optional<size_t> current_file_size;
std::optional<Poco::Timestamp> current_file_last_modified;
struct stat current_archive_stat;
std::optional<String> filename_override;
Block sample_block;

View File

@ -50,6 +50,12 @@ namespace ErrorCodes
namespace
{
struct GenerateRandomState
{
std::atomic<UInt64> add_total_rows = 0;
};
using GenerateRandomStatePtr = std::shared_ptr<GenerateRandomState>;
void fillBufferWithRandomData(char * __restrict data, size_t limit, size_t size_of_type, pcg64 & rng, [[maybe_unused]] bool flip_bytes = false)
{
size_t size = limit * size_of_type;
@ -532,10 +538,24 @@ ColumnPtr fillColumnWithRandomData(
class GenerateSource : public ISource
{
public:
GenerateSource(UInt64 block_size_, UInt64 max_array_length_, UInt64 max_string_length_, UInt64 random_seed_, Block block_header_, ContextPtr context_)
GenerateSource(
UInt64 block_size_,
UInt64 max_array_length_,
UInt64 max_string_length_,
UInt64 random_seed_,
Block block_header_,
ContextPtr context_,
GenerateRandomStatePtr state_)
: ISource(Nested::flattenNested(prepareBlockToFill(block_header_)))
, block_size(block_size_), max_array_length(max_array_length_), max_string_length(max_string_length_)
, block_to_fill(std::move(block_header_)), rng(random_seed_), context(context_) {}
, block_size(block_size_)
, max_array_length(max_array_length_)
, max_string_length(max_string_length_)
, block_to_fill(std::move(block_header_))
, rng(random_seed_)
, context(context_)
, shared_state(state_)
{
}
String getName() const override { return "GenerateRandom"; }
@ -549,7 +569,15 @@ protected:
columns.emplace_back(fillColumnWithRandomData(elem.type, block_size, max_array_length, max_string_length, rng, context));
columns = Nested::flattenNested(block_to_fill.cloneWithColumns(columns)).getColumns();
return {std::move(columns), block_size};
UInt64 total_rows = shared_state->add_total_rows.fetch_and(0);
if (total_rows)
addTotalRowsApprox(total_rows);
auto chunk = Chunk{std::move(columns), block_size};
progress(chunk.getNumRows(), chunk.bytes());
return chunk;
}
private:
@ -561,6 +589,7 @@ private:
pcg64 rng;
ContextPtr context;
GenerateRandomStatePtr shared_state;
static Block & prepareBlockToFill(Block & block)
{
@ -648,9 +677,6 @@ Pipe StorageGenerateRandom::read(
{
storage_snapshot->check(column_names);
Pipes pipes;
pipes.reserve(num_streams);
const ColumnsDescription & our_columns = storage_snapshot->metadata->getColumns();
Block block_header;
for (const auto & name : column_names)
@ -679,16 +705,24 @@ Pipe StorageGenerateRandom::read(
}
}
UInt64 query_limit = query_info.limit;
if (query_limit && num_streams * max_block_size > query_limit)
{
/// We want to avoid spawning more streams than necessary
num_streams = std::min(num_streams, static_cast<size_t>(((query_limit + max_block_size - 1) / max_block_size)));
}
Pipes pipes;
pipes.reserve(num_streams);
/// Will create more seed values for each source from initial seed.
pcg64 generate(random_seed);
auto shared_state = std::make_shared<GenerateRandomState>(query_info.limit);
for (UInt64 i = 0; i < num_streams; ++i)
{
auto source = std::make_shared<GenerateSource>(max_block_size, max_array_length, max_string_length, generate(), block_header, context);
if (i == 0 && query_info.limit)
source->addTotalRowsApprox(query_info.limit);
auto source = std::make_shared<GenerateSource>(
max_block_size, max_array_length, max_string_length, generate(), block_header, context, shared_state);
pipes.emplace_back(std::move(source));
}
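A hedged sketch of the user-visible effect of this `StorageGenerateRandom` change: with a small `LIMIT`, the table no longer spawns more source streams than needed, and the limit is surfaced as the approximate total row count (the structure and seed below are illustrative):

```sql
-- Illustrative: a 10-row limit now caps the number of GenerateRandom source
-- streams and feeds the approximate total rows used for progress reporting.
SELECT *
FROM generateRandom('id UInt64, s String', 1)
LIMIT 10;
```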

View File

@ -411,7 +411,12 @@ Chunk StorageURLSource::generate()
if (input_format)
chunk_size = input_format->getApproxBytesReadForChunk();
progress(num_rows, chunk_size ? chunk_size : chunk.bytes());
VirtualColumnUtils::addRequestedPathFileAndSizeVirtualsToChunk(chunk, requested_virtual_columns, curr_uri.getPath(), current_file_size);
VirtualColumnUtils::addRequestedFileLikeStorageVirtualsToChunk(
chunk, requested_virtual_columns,
{
.path = curr_uri.getPath(),
.size = current_file_size
});
return chunk;
}

View File

@ -54,6 +54,10 @@ void StorageSystemClusters::fillData(MutableColumns & res_columns, ContextPtr co
if (auto database_cluster = replicated->tryGetCluster())
writeCluster(res_columns, {name_and_database.first, database_cluster},
replicated->tryGetAreReplicasActive(database_cluster));
if (auto database_cluster = replicated->tryGetAllGroupsCluster())
writeCluster(res_columns, {DatabaseReplicated::ALL_GROUPS_CLUSTER_PREFIX + name_and_database.first, database_cluster},
replicated->tryGetAreReplicasActive(database_cluster));
}
}
}
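A hedged illustration of how the new cluster surfaces to users: a Replicated database (here named `my_db`, a hypothetical name) with `replica_group_name` configured now also appears in `system.clusters` under the `all_groups.` prefix, covering replicas from every group:

```sql
-- 'my_db' is a hypothetical Replicated database name.
-- The 'all_groups.' prefixed cluster includes replicas from all replica groups.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster IN ('my_db', 'all_groups.my_db');
```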

View File

@ -16,7 +16,9 @@ namespace
struct ZerosState
{
explicit ZerosState(UInt64 limit) : add_total_rows(limit) { }
std::atomic<UInt64> num_generated_rows = 0;
std::atomic<UInt64> add_total_rows = 0;
};
using ZerosStatePtr = std::shared_ptr<ZerosState>;
@ -42,10 +44,13 @@ protected:
auto column_ptr = column;
size_t column_size = column_ptr->size();
if (state)
UInt64 total_rows = state->add_total_rows.fetch_and(0);
if (total_rows)
addTotalRowsApprox(total_rows);
if (limit)
{
auto generated_rows = state->num_generated_rows.fetch_add(column_size, std::memory_order_acquire);
if (generated_rows >= limit)
return {};
@ -103,36 +108,25 @@ Pipe StorageSystemZeros::read(
{
storage_snapshot->check(column_names);
bool use_multiple_streams = multithreaded;
UInt64 query_limit = limit ? *limit : 0;
if (query_info.limit)
query_limit = query_limit ? std::min(query_limit, query_info.limit) : query_info.limit;
if (limit && *limit < max_block_size)
{
max_block_size = static_cast<size_t>(*limit);
use_multiple_streams = false;
}
if (query_limit && query_limit < max_block_size)
max_block_size = query_limit;
if (!use_multiple_streams)
if (!multithreaded)
num_streams = 1;
else if (query_limit && num_streams * max_block_size > query_limit)
/// We want to avoid spawning more streams than necessary
num_streams = std::min(num_streams, static_cast<size_t>(((query_limit + max_block_size - 1) / max_block_size)));
ZerosStatePtr state = std::make_shared<ZerosState>(query_limit);
Pipe res;
ZerosStatePtr state;
if (limit)
state = std::make_shared<ZerosState>();
for (size_t i = 0; i < num_streams; ++i)
{
auto source = std::make_shared<ZerosSource>(max_block_size, limit ? *limit : 0, state);
if (i == 0)
{
if (limit)
source->addTotalRowsApprox(*limit);
else if (query_info.limit)
source->addTotalRowsApprox(query_info.limit);
}
auto source = std::make_shared<ZerosSource>(max_block_size, query_limit, state);
res.addSource(std::move(source));
}

View File

@ -26,6 +26,7 @@
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <DataTypes/DataTypeDateTime.h>
#include <Processors/QueryPlan/QueryPlan.h>
#include <Processors/QueryPlan/BuildQueryPipelineSettings.h>
@ -111,7 +112,7 @@ void filterBlockWithDAG(ActionsDAGPtr dag, Block & block, ContextPtr context)
NameSet getVirtualNamesForFileLikeStorage()
{
return {"_path", "_file", "_size"};
return {"_path", "_file", "_size", "_time"};
}
VirtualColumnsDescription getVirtualsForFileLikeStorage(const ColumnsDescription & storage_columns)
@ -129,6 +130,7 @@ VirtualColumnsDescription getVirtualsForFileLikeStorage(const ColumnsDescription
add_virtual("_path", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>()));
add_virtual("_file", std::make_shared<DataTypeLowCardinality>(std::make_shared<DataTypeString>()));
add_virtual("_size", makeNullable(std::make_shared<DataTypeUInt64>()));
add_virtual("_time", makeNullable(std::make_shared<DataTypeDateTime>()));
return desc;
}
@ -187,32 +189,40 @@ ColumnPtr getFilterByPathAndFileIndexes(const std::vector<String> & paths, const
return block.getByName("_idx").column;
}
void addRequestedPathFileAndSizeVirtualsToChunk(
Chunk & chunk, const NamesAndTypesList & requested_virtual_columns, const String & path, std::optional<size_t> size, const String * filename)
void addRequestedFileLikeStorageVirtualsToChunk(
Chunk & chunk, const NamesAndTypesList & requested_virtual_columns,
VirtualsForFileLikeStorage virtual_values)
{
for (const auto & virtual_column : requested_virtual_columns)
{
if (virtual_column.name == "_path")
{
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), path)->convertToFullColumnIfConst());
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), virtual_values.path)->convertToFullColumnIfConst());
}
else if (virtual_column.name == "_file")
{
if (filename)
if (virtual_values.filename)
{
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), *filename)->convertToFullColumnIfConst());
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), (*virtual_values.filename))->convertToFullColumnIfConst());
}
else
{
size_t last_slash_pos = path.find_last_of('/');
auto filename_from_path = path.substr(last_slash_pos + 1);
size_t last_slash_pos = virtual_values.path.find_last_of('/');
auto filename_from_path = virtual_values.path.substr(last_slash_pos + 1);
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), filename_from_path)->convertToFullColumnIfConst());
}
}
else if (virtual_column.name == "_size")
{
if (size)
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), *size)->convertToFullColumnIfConst());
if (virtual_values.size)
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), *virtual_values.size)->convertToFullColumnIfConst());
else
chunk.addColumn(virtual_column.type->createColumnConstWithDefaultValue(chunk.getNumRows())->convertToFullColumnIfConst());
}
else if (virtual_column.name == "_time")
{
if (virtual_values.last_modified)
chunk.addColumn(virtual_column.type->createColumnConst(chunk.getNumRows(), virtual_values.last_modified->epochTime())->convertToFullColumnIfConst());
else
chunk.addColumn(virtual_column.type->createColumnConstWithDefaultValue(chunk.getNumRows())->convertToFullColumnIfConst());
}

View File

@ -68,8 +68,18 @@ void filterByPathOrFile(std::vector<T> & sources, const std::vector<String> & pa
sources = std::move(filtered_sources);
}
void addRequestedPathFileAndSizeVirtualsToChunk(
Chunk & chunk, const NamesAndTypesList & requested_virtual_columns, const String & path, std::optional<size_t> size, const String * filename = nullptr);
struct VirtualsForFileLikeStorage
{
const String & path;
std::optional<size_t> size { std::nullopt };
const String * filename { nullptr };
std::optional<Poco::Timestamp> last_modified { std::nullopt };
};
void addRequestedFileLikeStorageVirtualsToChunk(
Chunk & chunk, const NamesAndTypesList & requested_virtual_columns,
VirtualsForFileLikeStorage virtual_values);
}
}

View File

@ -27,6 +27,8 @@ except ImportError:
DOWNLOAD_RETRIES_COUNT = 5
logger = logging.getLogger(__name__)
class DownloadException(Exception):
pass
@ -42,7 +44,7 @@ def get_with_retries(
sleep: int = 3,
**kwargs: Any,
) -> requests.Response:
logging.info(
logger.info(
"Getting URL with %i tries and sleep %i in between: %s", retries, sleep, url
)
exc = Exception("A placeholder to satisfy typing and avoid nesting")
@ -54,7 +56,7 @@ def get_with_retries(
return response
except Exception as e:
if i + 1 < retries:
logging.info("Exception '%s' while getting, retry %i", e, i + 1)
logger.info("Exception '%s' while getting, retry %i", e, i + 1)
time.sleep(sleep)
exc = e
@ -103,7 +105,7 @@ def get_gh_api(
)
try_auth = e.response.status_code == 404
if (ratelimit_exceeded or try_auth) and not token_is_set:
logging.warning(
logger.warning(
"Received rate limit exception, setting the auth header and retry"
)
set_auth_header()
@ -114,10 +116,10 @@ def get_gh_api(
exc = e
if try_cnt < retries:
logging.info("Exception '%s' while getting, retry %i", exc, try_cnt)
logger.info("Exception '%s' while getting, retry %i", exc, try_cnt)
time.sleep(sleep)
raise APIException("Unable to request data from GH API") from exc
raise APIException(f"Unable to request data from GH API: {url}") from exc
def get_build_name_for_check(check_name: str) -> str:
@ -128,25 +130,25 @@ def read_build_urls(build_name: str, reports_path: Union[Path, str]) -> List[str
for root, _, files in os.walk(reports_path):
for file in files:
if file.endswith(f"_{build_name}.json"):
logging.info("Found build report json %s for %s", file, build_name)
logger.info("Found build report json %s for %s", file, build_name)
with open(
os.path.join(root, file), "r", encoding="utf-8"
) as file_handler:
build_report = json.load(file_handler)
return build_report["build_urls"] # type: ignore
logging.info("A build report is not found for %s", build_name)
logger.info("A build report is not found for %s", build_name)
return []
def download_build_with_progress(url: str, path: Path) -> None:
logging.info("Downloading from %s to temp path %s", url, path)
logger.info("Downloading from %s to temp path %s", url, path)
for i in range(DOWNLOAD_RETRIES_COUNT):
try:
response = get_with_retries(url, retries=1, stream=True)
total_length = int(response.headers.get("content-length", 0))
if path.is_file() and total_length and path.stat().st_size == total_length:
logging.info(
logger.info(
"The file %s already exists and have a proper size %s",
path,
total_length,
@ -155,14 +157,14 @@ def download_build_with_progress(url: str, path: Path) -> None:
with open(path, "wb") as f:
if total_length == 0:
logging.info(
logger.info(
"No content-length, will download file without progress"
)
f.write(response.content)
else:
dl = 0
logging.info("Content length is %ld bytes", total_length)
logger.info("Content length is %ld bytes", total_length)
for data in response.iter_content(chunk_size=4096):
dl += len(data)
f.write(data)
@ -177,8 +179,8 @@ def download_build_with_progress(url: str, path: Path) -> None:
except Exception as e:
if sys.stdout.isatty():
sys.stdout.write("\n")
if os.path.exists(path):
os.remove(path)
if path.exists():
path.unlink()
if i + 1 < DOWNLOAD_RETRIES_COUNT:
time.sleep(3)
@ -189,7 +191,7 @@ def download_build_with_progress(url: str, path: Path) -> None:
if sys.stdout.isatty():
sys.stdout.write("\n")
logging.info("Downloading finished")
logger.info("Downloading finished")
def download_builds(
@ -198,7 +200,7 @@ def download_builds(
for url in build_urls:
if filter_fn(url):
fname = os.path.basename(url.replace("%2B", "+").replace("%20", " "))
logging.info("Will download %s to %s", fname, result_path)
logger.info("Will download %s to %s", fname, result_path)
download_build_with_progress(url, result_path / fname)
@ -210,7 +212,7 @@ def download_builds_filter(
) -> None:
build_name = get_build_name_for_check(check_name)
urls = read_build_urls(build_name, reports_path)
logging.info("The build report for %s contains the next URLs: %s", build_name, urls)
logger.info("The build report for %s contains the next URLs: %s", build_name, urls)
if not urls:
raise DownloadException("No build URLs found")
@ -247,7 +249,7 @@ def get_clickhouse_binary_url(
) -> Optional[str]:
build_name = get_build_name_for_check(check_name)
urls = read_build_urls(build_name, reports_path)
logging.info("The build report for %s contains the next URLs: %s", build_name, urls)
logger.info("The build report for %s contains the next URLs: %s", build_name, urls)
for url in urls:
check_url = url
if "?" in check_url:

View File

@ -59,7 +59,7 @@ def get_pr_for_commit(sha, ref):
data = response.json()
our_prs = [] # type: List[Dict]
if len(data) > 1:
print("Got more than one pr for commit", sha)
logging.warning("Got more than one pr for commit %s", sha)
for pr in data:
# We need to check if the PR is created in our repo, because
# https://github.com/kaynewu/ClickHouse/pull/2
@ -71,13 +71,20 @@ def get_pr_for_commit(sha, ref):
if pr["head"]["ref"] in ref:
return pr
our_prs.append(pr)
print(
f"Cannot find PR with required ref {ref}, sha {sha} - returning first one"
logging.warning(
"Cannot find PR with required ref %s, sha %s - returning first one",
ref,
sha,
)
first_pr = our_prs[0]
return first_pr
except Exception as ex:
print(f"Cannot fetch PR info from commit {ref}, {sha}", ex)
logging.error(
"Cannot fetch PR info from commit ref %s, sha %s, exception: %s",
ref,
sha,
ex,
)
return None
@ -259,12 +266,12 @@ class PRInfo:
self.diff_urls.append(
self.compare_url(
pull_request["base"]["repo"]["default_branch"],
pull_request["head"]["label"],
pull_request["head"]["sha"],
)
)
self.diff_urls.append(
self.compare_url(
pull_request["head"]["label"],
pull_request["head"]["sha"],
pull_request["base"]["repo"]["default_branch"],
)
)
@ -279,7 +286,7 @@ class PRInfo:
# itself, but as well files changed since we branched out
self.diff_urls.append(
self.compare_url(
pull_request["head"]["label"],
pull_request["head"]["sha"],
pull_request["base"]["repo"]["default_branch"],
)
)
@ -289,8 +296,10 @@ class PRInfo:
else:
# assume this is a dispatch
self.event_type = EventType.DISPATCH
print("event.json does not match pull_request or push:")
print(json.dumps(github_event, sort_keys=True, indent=4))
logging.warning(
"event.json does not match pull_request or push:\n%s",
json.dumps(github_event, sort_keys=True, indent=4),
)
self.sha = os.getenv(
"GITHUB_SHA", "0000000000000000000000000000000000000000"
)
@ -330,7 +339,7 @@ class PRInfo:
return self.event_type == EventType.DISPATCH
def compare_pr_url(self, pr_object: dict) -> str:
return self.compare_url(pr_object["base"]["label"], pr_object["head"]["label"])
return self.compare_url(pr_object["base"]["sha"], pr_object["head"]["sha"])
@staticmethod
def compare_url(first: str, second: str) -> str:
@ -357,7 +366,7 @@ class PRInfo:
diff_object = PatchSet(response.text)
self.changed_files.update({f.path for f in diff_object})
self.changed_files_requested = True
print(f"Fetched info about {len(self.changed_files)} changed files")
logging.info("Fetched info about %s changed files", len(self.changed_files))
def get_dict(self):
return {
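Besides swapping `print` for `logging`, this diff switches the GitHub compare URLs from mutable branch labels to commit SHAs, so the comparison stays pinned even if a branch is force-pushed. A rough sketch of the idea; the repository name and SHAs below are placeholders, not values from the diff:

```python
# Rough sketch: compare by immutable commit SHAs rather than branch labels.
GITHUB_SERVER_URL = "https://github.com"


def compare_url(repo: str, first: str, second: str) -> str:
    # Passing commit SHAs pins the comparison to the exact commits the CI run
    # observed, even if the underlying branch is later rewritten.
    return f"{GITHUB_SERVER_URL}/{repo}/compare/{first}...{second}"


print(compare_url("ClickHouse/ClickHouse", "1111111deadbeef", "2222222cafebabe"))
```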

View File

@ -2,6 +2,7 @@
<profiles>
<default>
<s3_retry_attempts>5</s3_retry_attempts>
<backup_restore_s3_retry_attempts>5</backup_restore_s3_retry_attempts>
</default>
</profiles>
<users>

View File

@ -0,0 +1,10 @@
<clickhouse>
<database_atomic_delay_before_drop_table_sec>10</database_atomic_delay_before_drop_table_sec>
<allow_moving_table_directory_to_trash>1</allow_moving_table_directory_to_trash>
<merge_tree>
<initialization_retry_period>10</initialization_retry_period>
</merge_tree>
<max_database_replicated_create_table_thread_pool_size>50</max_database_replicated_create_table_thread_pool_size>
<allow_experimental_transactions>42</allow_experimental_transactions>
<replica_group_name>group</replica_group_name>
</clickhouse>

View File

@ -61,7 +61,7 @@ all_nodes = [
bad_settings_node = cluster.add_instance(
"bad_settings_node",
main_configs=["configs/config.xml"],
main_configs=["configs/config2.xml"],
user_configs=["configs/inconsistent_settings.xml"],
with_zookeeper=True,
macros={"shard": 1, "replica": 4},
@ -1522,3 +1522,24 @@ def test_auto_recovery(started_cluster):
assert "42\n" == bad_settings_node.query("SELECT * FROM auto_recovery.t2")
assert "137\n" == bad_settings_node.query("SELECT * FROM auto_recovery.t1")
def test_all_groups_cluster(started_cluster):
dummy_node.query("DROP DATABASE IF EXISTS db_cluster")
bad_settings_node.query("DROP DATABASE IF EXISTS db_cluster")
dummy_node.query(
"CREATE DATABASE db_cluster ENGINE = Replicated('/clickhouse/databases/all_groups_cluster', 'shard1', 'replica1');"
)
bad_settings_node.query(
"CREATE DATABASE db_cluster ENGINE = Replicated('/clickhouse/databases/all_groups_cluster', 'shard1', 'replica2');"
)
assert "dummy_node\n" == dummy_node.query(
"select host_name from system.clusters where name='db_cluster' order by host_name"
)
assert "bad_settings_node\n" == bad_settings_node.query(
"select host_name from system.clusters where name='db_cluster' order by host_name"
)
assert "bad_settings_node\ndummy_node\n" == bad_settings_node.query(
"select host_name from system.clusters where name='all_groups.db_cluster' order by host_name"
)

View File

@ -758,12 +758,12 @@ def test_read_subcolumns(cluster):
)
res = node.query(
f"select a.b.d, _path, a.b, _file, a.e from azureBlobStorage('{storage_account_url}', 'cont', 'test_subcolumns.tsv',"
f"select a.b.d, _path, a.b, _file, dateDiff('minute', _time, now()), a.e from azureBlobStorage('{storage_account_url}', 'cont', 'test_subcolumns.tsv',"
f" 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'auto', 'auto',"
f" 'a Tuple(b Tuple(c UInt32, d UInt32), e UInt32)')"
)
assert res == "2\tcont/test_subcolumns.tsv\t(1,2)\ttest_subcolumns.tsv\t3\n"
assert res == "2\tcont/test_subcolumns.tsv\t(1,2)\ttest_subcolumns.tsv\t0\t3\n"
res = node.query(
f"select a.b.d, _path, a.b, _file, a.e from azureBlobStorage('{storage_account_url}', 'cont', 'test_subcolumns.jsonl',"

View File

@ -987,10 +987,10 @@ def test_read_subcolumns(started_cluster):
assert res == "2\ttest_subcolumns.jsonl\t(1,2)\ttest_subcolumns.jsonl\t3\n"
res = node.query(
f"select x.b.d, _path, x.b, _file, x.e from hdfs('hdfs://hdfs1:9000/test_subcolumns.jsonl', auto, 'x Tuple(b Tuple(c UInt32, d UInt32), e UInt32)')"
f"select x.b.d, _path, x.b, _file, dateDiff('minute', _time, now()), x.e from hdfs('hdfs://hdfs1:9000/test_subcolumns.jsonl', auto, 'x Tuple(b Tuple(c UInt32, d UInt32), e UInt32)')"
)
assert res == "0\ttest_subcolumns.jsonl\t(0,0)\ttest_subcolumns.jsonl\t0\n"
assert res == "0\ttest_subcolumns.jsonl\t(0,0)\ttest_subcolumns.jsonl\t0\t0\n"
res = node.query(
f"select x.b.d, _path, x.b, _file, x.e from hdfs('hdfs://hdfs1:9000/test_subcolumns.jsonl', auto, 'x Tuple(b Tuple(c UInt32, d UInt32), e UInt32) default ((42, 42), 42)')"

View File

@ -2117,10 +2117,12 @@ def test_read_subcolumns(started_cluster):
assert res == "0\troot/test_subcolumns.jsonl\t(0,0)\ttest_subcolumns.jsonl\t0\n"
res = instance.query(
f"select x.b.d, _path, x.b, _file, x.e from s3('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/test_subcolumns.jsonl', auto, 'x Tuple(b Tuple(c UInt32, d UInt32), e UInt32) default ((42, 42), 42)')"
f"select x.b.d, _path, x.b, _file, dateDiff('minute', _time, now()), x.e from s3('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/test_subcolumns.jsonl', auto, 'x Tuple(b Tuple(c UInt32, d UInt32), e UInt32) default ((42, 42), 42)')"
)
assert res == "42\troot/test_subcolumns.jsonl\t(42,42)\ttest_subcolumns.jsonl\t42\n"
assert (
res == "42\troot/test_subcolumns.jsonl\t(42,42)\ttest_subcolumns.jsonl\t0\t42\n"
)
res = instance.query(
f"select a.b.d, _path, a.b, _file, a.e from url('http://{started_cluster.minio_host}:{started_cluster.minio_port}/{bucket}/test_subcolumns.tsv', auto, 'a Tuple(b Tuple(c UInt32, d UInt32), e UInt32)')"
@ -2148,6 +2150,8 @@ def test_read_subcolumns(started_cluster):
res == "42\t/root/test_subcolumns.jsonl\t(42,42)\ttest_subcolumns.jsonl\t42\n"
)
logging.info("Some custom logging")
def test_filtering_by_file_or_path(started_cluster):
bucket = started_cluster.minio_bucket

View File

@ -1,5 +1,5 @@
#!/usr/bin/env bash
# Tags: long, zookeeper, no-parallel, no-fasttest
# Tags: long, zookeeper, no-parallel, no-fasttest, no-asan
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh

View File

@ -1,13 +0,0 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
echo "1,2" > $CLICKHOUSE_TEST_UNIQUE_NAME.csv
$CLICKHOUSE_LOCAL -nm -q "
create table test (x UInt64, y UInt32, size UInt64) engine=Memory;
insert into test select c1, c2, _size from file('$CLICKHOUSE_TEST_UNIQUE_NAME.csv') settings use_structure_from_insertion_table_in_table_functions=1;
select * from test;
"
rm $CLICKHOUSE_TEST_UNIQUE_NAME.csv

View File

@ -0,0 +1,14 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
echo "1,2" > $CLICKHOUSE_TEST_UNIQUE_NAME.csv
sleep 1
$CLICKHOUSE_LOCAL -nm -q "
create table test (x UInt64, y UInt32, size UInt64, d32 DateTime32, d64 DateTime64) engine=Memory;
insert into test select c1, c2, _size, _time, _time from file('$CLICKHOUSE_TEST_UNIQUE_NAME.csv') settings use_structure_from_insertion_table_in_table_functions=1;
select x, y, size, (dateDiff('millisecond', d32, now()) < 4000 AND dateDiff('millisecond', d32, now()) > 0), (dateDiff('second', d64, now()) < 4 AND dateDiff('second', d64, now()) > 0) from test;
"
rm $CLICKHOUSE_TEST_UNIQUE_NAME.csv

View File

@ -1,49 +0,0 @@
#!/usr/bin/expect -f
set basedir [file dirname $argv0]
set basename [file tail $argv0]
if {[info exists env(CLICKHOUSE_TMP)]} {
set CLICKHOUSE_TMP $env(CLICKHOUSE_TMP)
} else {
set CLICKHOUSE_TMP "."
}
exp_internal -f $CLICKHOUSE_TMP/$basename.debuglog 0
log_user 0
set timeout 60
match_max 100000
set stty_init "rows 25 cols 120"
expect_after {
-i $any_spawn_id eof { exp_continue }
-i $any_spawn_id timeout { exit 1 }
}
spawn clickhouse-local
expect ":) "
# Trivial SELECT with LIMIT from system.zeros shows progress bar.
send "SELECT * FROM system.zeros LIMIT 10000000 FORMAT Null SETTINGS max_execution_speed = 1000000, timeout_before_checking_execution_speed = 0, max_block_size = 128\r"
expect "Progress: "
expect "█"
send "\3"
expect "Query was cancelled."
expect ":) "
send "SELECT * FROM system.zeros_mt LIMIT 10000000 FORMAT Null SETTINGS max_execution_speed = 1000000, timeout_before_checking_execution_speed = 0, max_block_size = 128\r"
expect "Progress: "
expect "█"
send "\3"
expect "Query was cancelled."
expect ":) "
# As well as from generateRandom
send "SELECT * FROM generateRandom() LIMIT 10000000 FORMAT Null SETTINGS max_execution_speed = 1000000, timeout_before_checking_execution_speed = 0, max_block_size = 128\r"
expect "Progress: "
expect "█"
send "\3"
expect "Query was cancelled."
expect ":) "
send "exit\r"
expect eof

View File

@ -0,0 +1,18 @@
#!/usr/bin/env bash
# Tags: no-random-settings
CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CUR_DIR"/../shell_config.sh
function run_with_progress_and_match_total_rows()
{
CURL_RESPONSE=$(echo "$1" | \
${CLICKHOUSE_CURL} -vsS "${CLICKHOUSE_URL}&wait_end_of_query=1&max_block_size=1&send_progress_in_http_headers=1&http_headers_progress_interval_ms=0&output_format_parallel_formatting=0" --data-binary @- 2>&1)
echo "$CURL_RESPONSE" | grep -q '"total_rows_to_read":"100"' && echo "Matched" || echo "Expected total_rows_to_read not found: ${CURL_RESPONSE}"
}
run_with_progress_and_match_total_rows 'SELECT * FROM system.zeros LIMIT 100'
run_with_progress_and_match_total_rows 'SELECT * FROM system.zeros_mt LIMIT 100'
run_with_progress_and_match_total_rows "SELECT * FROM generateRandom('number UInt64') LIMIT 100"

View File

@ -0,0 +1 @@
CREATE VIEW default.test_table_comment AS (SELECT toString({date_from:String})) COMMENT \'test comment\'

View File

@ -0,0 +1,5 @@
DROP TABLE IF EXISTS test_table_comment;
CREATE VIEW test_table_comment AS SELECT toString({date_from:String});
ALTER TABLE test_table_comment MODIFY COMMENT 'test comment';
SELECT create_table_query FROM system.tables WHERE name = 'test_table_comment' AND database = currentDatabase();
DROP TABLE test_table_comment;

View File

@ -127,7 +127,9 @@ CREATE TABLE 03165_token_ft
INDEX idx_message message TYPE full_text() GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;
ORDER BY id
-- Full text index works only with full parts.
SETTINGS min_bytes_for_full_part_storage=0;
INSERT INTO 03165_token_ft VALUES(1, 'Service is not ready');

View File

@ -0,0 +1,2 @@
{'ja':0.62,'fr':0.36}
{'ja':0.62,'fr':0.36}

View File

@ -0,0 +1,10 @@
-- Tags: no-fasttest
-- Tag no-fasttest: depends on cld2
-- https://github.com/ClickHouse/ClickHouse/issues/64931
SELECT detectLanguageMixed(materialize('二兎を追う者は一兎をも得ず二兎を追う者は一兎をも得ず A vaincre sans peril, on triomphe sans gloire.'))
GROUP BY
GROUPING SETS (
('a', toUInt256(1)),
(stringToH3(toFixedString(toFixedString('85283473ffffff', 14), 14))))
SETTINGS allow_experimental_nlp_functions = 1;

View File

@ -0,0 +1,6 @@
-- https://github.com/ClickHouse/ClickHouse/issues/64946
SELECT
multiIf((number % toLowCardinality(toNullable(toUInt128(2)))) = (number % toNullable(2)), toInt8(1), (number % materialize(toLowCardinality(3))) = toUInt128(toNullable(0)), toInt8(materialize(materialize(2))), toInt64(toUInt128(3)))
FROM system.numbers
LIMIT 44857
FORMAT Null;

View File

@ -1,4 +1,5 @@
v24.5.1.1763-stable 2024-06-01
v24.4.2.141-stable 2024-06-07
v24.4.1.2088-stable 2024-05-01
v24.3.3.102-lts 2024-05-01
v24.3.2.23-lts 2024-04-03
