mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-08 07:22:25 +00:00
Merge remote-tracking branch 'upstream/master' into HEAD
commit
7047147db5
174
CHANGELOG.md
@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v24.5, 2024-05-30](#245)**<br/>
**[ClickHouse release v24.4, 2024-04-30](#244)**<br/>
**[ClickHouse release v24.3 LTS, 2024-03-26](#243)**<br/>
**[ClickHouse release v24.2, 2024-02-29](#242)**<br/>
@ -7,6 +8,179 @@
# 2024 Changelog
### <a id="245"></a> ClickHouse release 24.5, 2024-05-30
#### Backward Incompatible Change
* Renamed "inverted indexes" to "full-text indexes", which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before the upgrade and re-create them after the upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To re-enable them, set `allow_deprecated_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
* Queries from `system.columns` will work faster if there is a large number of columns, but many databases or tables are not granted for `SHOW TABLES`. Note that in previous versions, if you granted `SHOW COLUMNS` to individual columns without granting `SHOW TABLES` to the corresponding tables, the `system.columns` table showed these columns, but in the new version, it will skip the table entirely. Removed the trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Setting `replace_long_file_name_to_hash` is enabled by default for `MergeTree` tables. [#64457](https://github.com/ClickHouse/ClickHouse/pull/64457) ([Anton Popov](https://github.com/CurtizJ)). The data written with this setting can be read by server versions since 23.9. After you use ClickHouse with this setting enabled, you cannot downgrade to versions 23.8 and earlier.
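
The deprecation note above calls the `running*` functions error-prone without saying why. One plausible reason, stated here as an assumption rather than taken from the changelog, is that such functions evaluate within a single data block, so the result depends on how rows happen to be split into blocks. The following Python sketch only models that difference; the helper names are hypothetical and this is not ClickHouse code.

```python
from typing import List, Sequence


def running_difference_per_block(blocks: Sequence[Sequence[int]]) -> List[int]:
    """Difference to the previous row, restarting at every block boundary.

    Models the assumed block-local behaviour of the deprecated function.
    """
    result: List[int] = []
    for block in blocks:
        previous = None  # state is reset for each new block
        for value in block:
            result.append(0 if previous is None else value - previous)
            previous = value
    return result


def lag_difference(values: Sequence[int]) -> List[int]:
    """Difference to the previous row over the whole ordered sequence.

    Models what a window function with a one-row lag would compute.
    """
    return [0 if i == 0 else values[i] - values[i - 1] for i in range(len(values))]


if __name__ == "__main__":
    values = [1, 3, 6, 10]
    # The same rows, split into two blocks of two rows each.
    print(running_difference_per_block([[1, 3], [6, 10]]))  # [0, 2, 0, 4]
    print(lag_difference(values))                           # [0, 2, 3, 4]
```

In this model the third row differs between the two approaches purely because of where the block boundary falls, which is the kind of surprise a window function avoids.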
#### New Feature
* Adds the `Form` format to read/write a single record in the `application/x-www-form-urlencoded` format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)).
* Added the possibility to compress data in `CROSS JOIN`. [#60459](https://github.com/ClickHouse/ClickHouse/pull/60459) ([p1rattttt](https://github.com/p1rattttt)).
* Added the possibility to perform `CROSS JOIN` using temporary files if the size exceeds limits. [#63432](https://github.com/ClickHouse/ClickHouse/pull/63432) ([p1rattttt](https://github.com/p1rattttt)).
* Support joins with inequality conditions that involve columns from both the left and right tables, e.g. `t1.y < t2.y`. To enable, `SET allow_experimental_join_condition = 1`. [#60920](https://github.com/ClickHouse/ClickHouse/pull/60920) ([lgbo](https://github.com/lgbo-ustc)).
* Maps can now have `Float32`, `Float64`, `Array(T)`, `Map(K, V)` and `Tuple(T1, T2, ...)` as keys. Closes [#54537](https://github.com/ClickHouse/ClickHouse/issues/54537). [#59318](https://github.com/ClickHouse/ClickHouse/pull/59318) ([李扬](https://github.com/taiyang-li)).
* Introduce bulk loading to `EmbeddedRocksDB` by creating and ingesting SST files instead of relying on the RocksDB built-in memtable. This helps to increase importing speed, especially for long-running insert queries to `StorageEmbeddedRocksDB` tables. Also, introduce `EmbeddedRocksDB` table settings. [#59163](https://github.com/ClickHouse/ClickHouse/pull/59163) [#63324](https://github.com/ClickHouse/ClickHouse/pull/63324) ([Duc Canh Le](https://github.com/canhld94)).
* Users can now parse CRLF line endings in the TSV format using the setting `input_format_tsv_crlf_end_of_line`. Closes [#56257](https://github.com/ClickHouse/ClickHouse/issues/56257). [#59747](https://github.com/ClickHouse/ClickHouse/pull/59747) ([Shaun Struwig](https://github.com/Blargian)).
* A new setting `input_format_force_null_for_omitted_fields` that forces NULL values for omitted fields. [#60887](https://github.com/ClickHouse/ClickHouse/pull/60887) ([Constantine Peresypkin](https://github.com/pkit)).
* Previously, our S3 storage and `s3` table function didn't support selecting from archive container files, such as tarballs, zip, 7z. Now they allow iterating over files inside archives in S3. [#62259](https://github.com/ClickHouse/ClickHouse/pull/62259) ([Daniil Ivanik](https://github.com/divanik)).
* Support for conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)).
* Add `NPy` output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)).
* `Raw` format as a synonym for `TSVRaw`. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)).
* Added a new SQL function `generateSnowflakeID` for generating Twitter-style Snowflake IDs. [#63577](https://github.com/ClickHouse/ClickHouse/pull/63577) ([Danila Puzov](https://github.com/kazalika)).
* On Linux and MacOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)).
* Change warning on high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
* Added SQL functions `fromReadableSize` (along with `OrNull` and `OrZero` variants). These functions perform the opposite operation of the functions `formatReadableSize` and `formatReadableDecimalSize`, i.e., given a human-readable byte size, they return the number of bytes. Example: `SELECT fromReadableSize('3.0 MiB')` returns `3145728`. [#64386](https://github.com/ClickHouse/ClickHouse/pull/64386) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
* Provide support for `azureBlobStorage` function in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If `use_workload_identity` parameter is set in config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)).
* Add TTL information in the `system.parts_columns` table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)).
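
To make the `fromReadableSize` entry above concrete, here is a minimal Python sketch of the conversion it describes, including `OrNull`- and `OrZero`-style variants. It is an illustration under assumptions, not the ClickHouse implementation: the set of accepted units (binary units only), the case-sensitive unit matching, and rounding up of fractional byte counts are all choices made for this sketch.

```python
import math
import re
from typing import Optional

# Binary (power-of-two) units only; this restriction is an assumption of the sketch.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def from_readable_size(text: str) -> int:
    """Convert a string such as '3.0 MiB' into a number of bytes."""
    match = _PATTERN.fullmatch(text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"cannot parse readable size: {text!r}")
    number, unit = match.groups()
    # Rounding up is an assumption; the changelog does not specify rounding.
    return math.ceil(float(number) * _UNITS[unit])


def from_readable_size_or_null(text: str) -> Optional[int]:
    """Like from_readable_size, but return None instead of raising."""
    try:
        return from_readable_size(text)
    except ValueError:
        return None


def from_readable_size_or_zero(text: str) -> int:
    """Like from_readable_size, but return 0 instead of raising."""
    try:
        return from_readable_size(text)
    except ValueError:
        return 0


if __name__ == "__main__":
    print(from_readable_size("3.0 MiB"))           # 3145728
    print(from_readable_size_or_null("3 parsecs"))  # None
    print(from_readable_size_or_zero("3 parsecs"))  # 0
```

The first call reproduces the value quoted in the changelog entry, since 3 × 1024 × 1024 = 3145728.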
#### Experimental Features
* Implement the `Dynamic` data type, which allows storing values of any type inside it without knowing all of them in advance. The `Dynamic` type is available under the setting `allow_experimental_dynamic_type`. Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#63058](https://github.com/ClickHouse/ClickHouse/pull/63058) ([Kruglov Pavel](https://github.com/Avogar)).
* Allowed to create `MaterializedMySQL` database without connection to MySQL. [#63397](https://github.com/ClickHouse/ClickHouse/pull/63397) ([Kirill](https://github.com/kirillgarbar)).
* Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than `max_retries_before_automatic_recovery` (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. [#63549](https://github.com/ClickHouse/ClickHouse/pull/63549) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Account for failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_tracked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)).
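
The automatic-recovery rule for Replicated databases described above (a DDL task failing more than `max_retries_before_automatic_recovery` times in a row with the same error) can be sketched as a small counter. This Python model is only an illustration of that rule as worded in the changelog; the class and method names are hypothetical, and the real bookkeeping in ClickHouse may differ.

```python
from typing import Optional


class DdlFailureTracker:
    """Counts consecutive failures of a DDL task that share the same error."""

    def __init__(self, max_retries_before_automatic_recovery: int = 100) -> None:
        self.max_retries = max_retries_before_automatic_recovery
        self.last_error: Optional[str] = None
        self.consecutive_failures = 0

    def record_success(self) -> None:
        """A successful attempt breaks the streak."""
        self.last_error = None
        self.consecutive_failures = 0

    def record_failure(self, error: str) -> bool:
        """Register a failure; return True if the replica should be marked lost."""
        if error == self.last_error:
            self.consecutive_failures += 1
        else:
            # A different error starts a new streak.
            self.last_error = error
            self.consecutive_failures = 1
        # "More than N times in a row" means strictly greater than N.
        return self.consecutive_failures > self.max_retries


if __name__ == "__main__":
    tracker = DdlFailureTracker(max_retries_before_automatic_recovery=3)
    print([tracker.record_failure("TIMEOUT") for _ in range(4)])
    # [False, False, False, True]
```

With the default of 100, recovery would start on the 101st consecutive failure with an identical error under this reading.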
#### Performance Improvement
* A native Parquet reader, which can read Parquet binary data into ClickHouse columns directly. This feature can now be activated by setting `input_format_parquet_use_native_reader` to true. [#60361](https://github.com/ClickHouse/ClickHouse/pull/60361) ([ZhiHong Zhang](https://github.com/copperybean)).
* Less contention in filesystem cache (part 4). Allow keeping the filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This releases pressure from space reservation for queries (on the `tryReserve` method). Also, this is done in a lock-free way as much as possible, e.g. it should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)).
* Process string functions `...UTF8` 'asciily' if input strings are all ASCII chars. Inspired by https://github.com/apache/doris/pull/29799. Overall speed up by 1.07x~1.62x. Note that peak memory usage also decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)).
* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)).
* `HostResolver` has each IP address several times. If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of a forbidden IP is marked as failed, and on each try these IPs have a chance to be chosen (and to fail again). Even if this is fixed, the DNS cache is dropped every 120 seconds, and the IPs can be chosen again. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)).
* Function `splitByRegexp` is now faster when the regular expression argument is a single-character, trivial regular expression (in this case, it now falls back internally to `splitByChar`). [#62696](https://github.com/ClickHouse/ClickHouse/pull/62696) ([Robert Schulze](https://github.com/rschu1ze)).
* Aggregation with 8-bit and 16-bit keys became faster: added min/max in FixedHashTable to limit the array index and reduce the `isZero()` calls during iteration. [#62746](https://github.com/ClickHouse/ClickHouse/pull/62746) ([Jiebin Sun](https://github.com/jiebinn)).
* Add a new configuration `prefer_merge_sort_block_bytes` to control memory usage and speed up sorting by 2 times during merging when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)).
* `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)).
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Optimized vertical merges in tables with sparse columns. [#64311](https://github.com/ClickHouse/ClickHouse/pull/64311) ([Anton Popov](https://github.com/CurtizJ)).
* Improve filtering of sparse columns: reduce redundant calls of `ColumnSparse::filter` to improve performance. [#64426](https://github.com/ClickHouse/ClickHouse/pull/64426) ([Jiebin Sun](https://github.com/jiebinn)).
* Remove copying data when writing to the filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backups with Azure Blob Storage will now use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)).
* Allow using native copy for Azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)).
* Finally enable native copy for Azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)).
* Improve the iteration over sparse columns to reduce calls to `size`. [#64497](https://github.com/ClickHouse/ClickHouse/pull/64497) ([Jiebin Sun](https://github.com/jiebinn)).
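
The `splitByRegexp` entry above describes a fast path: when the pattern is a single, trivial character, the work is delegated to `splitByChar`. The Python sketch below illustrates that dispatch idea only. The function names mirror the SQL functions but are hypothetical Python helpers, and the test for a "trivial" pattern (one character that is not a regex metacharacter) is an assumption of this sketch.

```python
import re
from typing import List

# Characters with a special meaning in regular expressions.
_REGEX_METACHARACTERS = set(".^$*+?{}[]\\|()")


def split_by_char(separator: str, text: str) -> List[str]:
    """Plain substring split; no regex engine involved."""
    return text.split(separator)


def split_by_regexp(pattern: str, text: str) -> List[str]:
    """Split by a regular expression, with a fast path for trivial patterns."""
    if len(pattern) == 1 and pattern not in _REGEX_METACHARACTERS:
        # A single literal character matches exactly itself, so the cheaper
        # character split gives the same result.
        return split_by_char(pattern, text)
    return re.split(pattern, text)


if __name__ == "__main__":
    print(split_by_regexp(",", "a,b,c"))    # fast path: ['a', 'b', 'c']
    print(split_by_regexp(r"\d", "a1b2c"))  # regex path: ['a', 'b', 'c']
```

The fast path is safe in this model because a one-character literal pattern cannot match anything other than that character.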
#### Improvement
* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. [#63365](https://github.com/ClickHouse/ClickHouse/pull/63365) ([Julia Kartseva](https://github.com/jkartseva)).
* Support English-style Unicode quotes, e.g. “Hello”, ‘world’. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes [#58634](https://github.com/ClickHouse/ClickHouse/issues/58634). [#63381](https://github.com/ClickHouse/ClickHouse/pull/63381) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow trailing commas in the columns list in the INSERT query. For example, `INSERT INTO test (a, b, c, ) VALUES ...`. [#63803](https://github.com/ClickHouse/ClickHouse/pull/63803) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better exception messages for the `Regexp` format. [#63804](https://github.com/ClickHouse/ClickHouse/pull/63804) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow trailing commas in the `Values` format. For example, this query is allowed: `INSERT INTO test (a, b, c) VALUES (4, 5, 6,);`. [#63810](https://github.com/ClickHouse/ClickHouse/pull/63810) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make RabbitMQ nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes [#60460](https://github.com/ClickHouse/ClickHouse/issues/60460). [#60468](https://github.com/ClickHouse/ClickHouse/pull/60468) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Distinct messages for the S3 'no key' error for the disk and storage cases. [#61108](https://github.com/ClickHouse/ClickHouse/pull/61108) ([Sema Checherinda](https://github.com/CheSema)).
* The progress bar will work for trivial queries with LIMIT from `system.zeros`, `system.zeros_mt` (it already works for `system.numbers` and `system.numbers_mt`), and the `generateRandom` table function. As a bonus, if the total number of records is greater than the `max_rows_to_read` limit, it will throw an exception earlier. This closes [#58183](https://github.com/ClickHouse/ClickHouse/issues/58183). [#61823](https://github.com/ClickHouse/ClickHouse/pull/61823) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support for "Merge Key" in YAML configurations (this is a weird feature of YAML, please never mind). [#62685](https://github.com/ClickHouse/ClickHouse/pull/62685) ([Azat Khuzhin](https://github.com/azat)).
* Enhance error message when non-deterministic function is used with Replicated source. [#62896](https://github.com/ClickHouse/ClickHouse/pull/62896) ([Grégoire Pineau](https://github.com/lyrixx)).
* Fix interserver secret for Distributed over Distributed from `remote`. [#63013](https://github.com/ClickHouse/ClickHouse/pull/63013) ([Azat Khuzhin](https://github.com/azat)).
* Support `include_from` for YAML files. However, you should prefer `config.d`. [#63106](https://github.com/ClickHouse/ClickHouse/pull/63106) ([Eduard Karacharov](https://github.com/korowa)).
* Keep previous data in terminal after picking from skim suggestions. [#63261](https://github.com/ClickHouse/ClickHouse/pull/63261) ([FlameFactory](https://github.com/FlameFactory)).
* Width of fields (in Pretty formats or the `visibleWidth` function) now correctly ignores ANSI escape sequences. [#63270](https://github.com/ClickHouse/ClickHouse/pull/63270) ([Shaun Struwig](https://github.com/Blargian)).
* Replace usages of the error code `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` with more accurate error codes when appropriate. [#63406](https://github.com/ClickHouse/ClickHouse/pull/63406) ([Yohann Jardin](https://github.com/yohannj)).
* `os_user` and `client_hostname` are now correctly set up for queries for command line suggestions in clickhouse-client. This closes [#63430](https://github.com/ClickHouse/ClickHouse/issues/63430). [#63433](https://github.com/ClickHouse/ClickHouse/pull/63433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Automatically correct `max_block_size` to the default value if it is zero. [#63587](https://github.com/ClickHouse/ClickHouse/pull/63587) ([Antonio Andelic](https://github.com/antonio2368)).
* Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address [#52086](https://github.com/ClickHouse/ClickHouse/issues/52086). [#63656](https://github.com/ClickHouse/ClickHouse/pull/63656) ([Zimu Li](https://github.com/woodlzm)).
* Enable truncate operation for object storage disks. [#63693](https://github.com/ClickHouse/ClickHouse/pull/63693) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. [#63786](https://github.com/ClickHouse/ClickHouse/pull/63786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* ClickHouse disks have to read the server setting to obtain the actual metadata format version. [#63831](https://github.com/ClickHouse/ClickHouse/pull/63831) ([Sema Checherinda](https://github.com/CheSema)).
* Disable pretty format restrictions (`output_format_pretty_max_rows`/`output_format_pretty_max_value_width`) when stdout is not TTY. [#63942](https://github.com/ClickHouse/ClickHouse/pull/63942) ([Azat Khuzhin](https://github.com/azat)).
* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)).
* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added knob `metadata_storage_type` to keep free space on metadata storage disk. [#64128](https://github.com/ClickHouse/ClickHouse/pull/64128) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Metrics to track the number of directories created and removed by the `plain_rewritable` metadata storage, and the number of entries in the local-to-remote in-memory map. [#64175](https://github.com/ClickHouse/ClickHouse/pull/64175) ([Julia Kartseva](https://github.com/jkartseva)).
* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)).
* The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. `limit` or `additional_table_filters`) would affect the query result. [#64205](https://github.com/ClickHouse/ClickHouse/pull/64205) ([Robert Schulze](https://github.com/rschu1ze)).
* Test that the non-standard error code `QPSLimitExceeded` is supported and that it is a retryable error. [#64225](https://github.com/ClickHouse/ClickHouse/pull/64225) ([Sema Checherinda](https://github.com/CheSema)).
* Settings from the user config don't affect merges and mutations for MergeTree on top of object storage. [#64456](https://github.com/ClickHouse/ClickHouse/pull/64456) ([alesapin](https://github.com/alesapin)).
* Test that `totalqpslimitexceeded` is a retriable s3 error. [#64520](https://github.com/ClickHouse/ClickHouse/pull/64520) ([Sema Checherinda](https://github.com/CheSema)).
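
The query cache entry above says that identical queries with different settings are now treated as different cache entries. A minimal Python sketch of that idea follows: the cache key is derived from both the query text and the settings. The key format, the hashing choice, and all names are assumptions made for illustration; they are not ClickHouse's actual cache key.

```python
import hashlib
import json
from typing import Dict, Mapping, Optional


def cache_key(query: str, settings: Mapping[str, object]) -> str:
    """Build a key from the query text and its settings.

    Settings are serialized with sorted keys so that their order does not matter.
    """
    payload = json.dumps(
        {"query": query, "settings": dict(settings)},
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryCache:
    """A toy in-memory cache keyed by (query, settings)."""

    def __init__(self) -> None:
        self._entries: Dict[str, object] = {}

    def put(self, query: str, settings: Mapping[str, object], result: object) -> None:
        self._entries[cache_key(query, settings)] = result

    def get(self, query: str, settings: Mapping[str, object]) -> Optional[object]:
        return self._entries.get(cache_key(query, settings))


if __name__ == "__main__":
    cache = QueryCache()
    cache.put("SELECT * FROM t", {"limit": 10}, ["ten rows"])
    print(cache.get("SELECT * FROM t", {"limit": 10}))  # ['ten rows']
    print(cache.get("SELECT * FROM t", {"limit": 20}))  # None
```

Keying on the settings trades a lower hit rate for correctness: a result computed under `limit = 10` is never served to a query running with `limit = 20`.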
#### Build/Testing/Packaging Improvement
* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Experimentally support loongarch64 as a new platform for ClickHouse. [#63733](https://github.com/ClickHouse/ClickHouse/pull/63733) ([qiangxuhui](https://github.com/qiangxuhui)).
* The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes [#63494](https://github.com/ClickHouse/ClickHouse/issues/63494). [#63495](https://github.com/ClickHouse/ClickHouse/pull/63495) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update Apache Datasketches library. It resolves [#63858](https://github.com/ClickHouse/ClickHouse/issues/63858). [#63923](https://github.com/ClickHouse/ClickHouse/pull/63923) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable GRPC support for aarch64 linux while cross-compiling binary. [#64072](https://github.com/ClickHouse/ClickHouse/pull/64072) ([alesapin](https://github.com/alesapin)).
* Fix unwind on SIGSEGV on aarch64 (due to small stack for signal) [#64058](https://github.com/ClickHouse/ClickHouse/pull/64058) ([Azat Khuzhin](https://github.com/azat)).
#### Bug Fix
* Disabled `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). [#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix making backup when multiple shards are used [#57684](https://github.com/ClickHouse/ClickHouse/pull/57684) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix passing projections/indexes/primary key from columns list from CREATE query into inner table of MV [#59183](https://github.com/ClickHouse/ClickHouse/pull/59183) ([Azat Khuzhin](https://github.com/azat)).
* Fix boundRatio incorrect merge [#60532](https://github.com/ClickHouse/ClickHouse/pull/60532) ([Tao Wang](https://github.com/wangtZJU)).
* Fix crash when calling some functions on const low-cardinality columns [#61966](https://github.com/ClickHouse/ClickHouse/pull/61966) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix queries with FINAL giving wrong results when the table does not use adaptive granularity [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)).
* Improve detection of cgroups v2 support for memory controllers [#62903](https://github.com/ClickHouse/ClickHouse/pull/62903) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix subsequent use of external tables in client [#62964](https://github.com/ClickHouse/ClickHouse/pull/62964) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash with untuple and unresolved lambda [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)).
* Fix premature server listen for connections [#63181](https://github.com/ClickHouse/ClickHouse/pull/63181) ([alesapin](https://github.com/alesapin)).
* Fix intersecting parts when restarting after a DROP PART command [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)).
* Correctly load SQL security defaults during startup [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)).
* JOIN filter push down filter join fix [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix infinite loop in AzureObjectStorage::listObjects [#63257](https://github.com/ClickHouse/ClickHouse/pull/63257) ([Julia Kartseva](https://github.com/jkartseva)).
* CROSS JOIN ignores the `join_algorithm` setting [#63273](https://github.com/ClickHouse/ClickHouse/pull/63273) ([vdimir](https://github.com/vdimir)).
* Fix finalize WriteBufferToFileSegment and StatusFile [#63346](https://github.com/ClickHouse/ClickHouse/pull/63346) ([vdimir](https://github.com/vdimir)).
* Fix logical error during SELECT query after ALTER in rare case [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)).
* Fix `X-ClickHouse-Timezone` header with `session_timezone` [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix debug assert when using grouping WITH ROLLUP and LowCardinality types [#63398](https://github.com/ClickHouse/ClickHouse/pull/63398) ([Raúl Marín](https://github.com/Algunenano)).
* Small fixes for group_by_use_nulls [#63405](https://github.com/ClickHouse/ClickHouse/pull/63405) ([vdimir](https://github.com/vdimir)).
* Fix backup/restore of projection part in case projection was removed from table metadata, but part still has projection [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix mysql dictionary source [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)).
* Insert QueryFinish on AsyncInsertFlush with no data [#63483](https://github.com/ClickHouse/ClickHouse/pull/63483) ([Raúl Marín](https://github.com/Algunenano)).
* Fix: empty used_dictionaries in system.query_log [#63487](https://github.com/ClickHouse/ClickHouse/pull/63487) ([Eduard Karacharov](https://github.com/korowa)).
* Make `MergeTreePrefetchedReadPool` safer [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix crash on exit with sentry enabled (due to openssl destroyed before sentry) [#63548](https://github.com/ClickHouse/ClickHouse/pull/63548) ([Azat Khuzhin](https://github.com/azat)).
* Fix Array and Map support with Keyed hashing [#63628](https://github.com/ClickHouse/ClickHouse/pull/63628) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix filter pushdown for Parquet and maybe StorageMerge [#63642](https://github.com/ClickHouse/ClickHouse/pull/63642) ([Michael Kolupaev](https://github.com/al13n321)).
* Prevent conversion to Replicated if zookeeper path already exists [#63670](https://github.com/ClickHouse/ClickHouse/pull/63670) ([Kirill](https://github.com/kirillgarbar)).
* Analyzer: views read only necessary columns [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)).
* Analyzer: Forbid WINDOW redefinition [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)).
* flatten_nested was broken with the experimental Replicated database. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix [#63653](https://github.com/ClickHouse/ClickHouse/issues/63653) [#63722](https://github.com/ClickHouse/ClickHouse/pull/63722) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Allow cast from Array(Nothing) to Map(Nothing, Nothing) [#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix ILLEGAL_COLUMN in partial_merge join [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)).
* Fix: remove redundant distinct with window functions [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix possible crash with SYSTEM UNLOAD PRIMARY KEY [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)).
* Fix a query with duplicating cycling alias. [#63791](https://github.com/ClickHouse/ClickHouse/pull/63791) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Make `TokenIterator` lazy as it should be [#63801](https://github.com/ClickHouse/ClickHouse/pull/63801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add `endpoint_subpath` S3 URI setting [#63806](https://github.com/ClickHouse/ClickHouse/pull/63806) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix deadlock in `ParallelReadBuffer` [#63814](https://github.com/ClickHouse/ClickHouse/pull/63814) ([Antonio Andelic](https://github.com/antonio2368)).
* JOIN filter push down equivalent columns fix [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)).
* Remove data from all disks after DROP with Lazy database. [#63848](https://github.com/ClickHouse/ClickHouse/pull/63848) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix incorrect result when reading from MV with parallel replicas and new analyzer [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Fixes in `find_super_nodes` and `find_big_family` commands of keeper-client [#63862](https://github.com/ClickHouse/ClickHouse/pull/63862) ([Alexander Gololobov](https://github.com/davenger)).
|
||||
* Update lambda execution name [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix SIGSEGV due to CPU/Real profiler [#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix `EXPLAIN CURRENT TRANSACTION` query [#63926](https://github.com/ClickHouse/ClickHouse/pull/63926) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fix analyzer: there's turtles all the way down... [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Allow certain ALTER TABLE commands for `plain_rewritable` disk [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)).
|
||||
* Recursive CTE distributed fix [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fix reading of columns of type `Tuple(Map(LowCardinality(...)))` [#63956](https://github.com/ClickHouse/ClickHouse/pull/63956) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Analyzer: Fix COLUMNS resolve [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* LIMIT BY and skip_unused_shards with analyzer [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* A fix for some trash (experimental Kusto) [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)).
|
||||
* Deserialize untrusted binary inputs in a safer way [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Fix query analysis for queries with the setting `final` = 1 for Distributed tables over tables from engine families other than MergeTree. [#64037](https://github.com/ClickHouse/ClickHouse/pull/64037) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Add missing settings to recoverLostReplica [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Fix SQL security access checks with analyzer [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)).
|
||||
* Fix analyzer: only interpolate expression should be used for DAG [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Fix azure backup writing multipart blocks by 1 MiB (read buffer size) instead of `max_upload_part_size` (in non-native copy case) [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Correctly fallback during backup copy [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Fix ARRAY JOIN with Distributed. [#64226](https://github.com/ClickHouse/ClickHouse/pull/64226) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix: CNF with mutually exclusive atoms reduction [#64256](https://github.com/ClickHouse/ClickHouse/pull/64256) ([Eduard Karacharov](https://github.com/korowa)).
|
||||
* Fix Logical error: Bad cast for Buffer table with prewhere. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
|
||||
### <a id="244"></a> ClickHouse release 24.4, 2024-04-30
|
||||
|
||||
#### Upgrade Notes
|
||||
|
@ -57,6 +57,18 @@ SELECT toTypeName(from), hex(from) FROM hits LIMIT 1;
|
||||
└──────────────────┴───────────┘
|
||||
```
|
||||
|
||||
IPv4 addresses can be directly compared to IPv6 addresses:
|
||||
|
||||
```sql
|
||||
SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1');
|
||||
```
|
||||
|
||||
```text
|
||||
┌─equals(toIPv4('127.0.0.1'), toIPv6('::ffff:127.0.0.1'))─┐
|
||||
│ 1 │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
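As a small sketch building on the comparison above, the query below puts a matching and a non-matching pair side by side. The second address (`::ffff:127.0.0.2`) and the column aliases are illustrative choices, not taken from the original page, and the comment about IPv4-mapped addresses is an assumption inferred from the example above.

```sql
-- Assumption: the IPv4 value is compared with the IPv4-mapped form (::ffff:a.b.c.d) of the IPv6 value.
SELECT
    toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1') AS same_address,
    toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.2') AS different_address;
```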
|
||||
|
||||
**See Also**
|
||||
|
||||
- [Functions for Working with IPv4 and IPv6 Addresses](../functions/ip-address-functions.md)
|
||||
|
@ -57,6 +57,19 @@ SELECT toTypeName(from), hex(from) FROM hits LIMIT 1;
|
||||
└──────────────────┴──────────────────────────────────┘
|
||||
```
|
||||
|
||||
IPv6 addresses can be directly compared to IPv4 addresses:
|
||||
|
||||
```sql
|
||||
SELECT toIPv4('127.0.0.1') = toIPv6('::ffff:127.0.0.1');
|
||||
```
|
||||
|
||||
```text
|
||||
┌─equals(toIPv4('127.0.0.1'), toIPv6('::ffff:127.0.0.1'))─┐
|
||||
│ 1 │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
|
||||
**See Also**
|
||||
|
||||
- [Functions for Working with IPv4 and IPv6 Addresses](../functions/ip-address-functions.md)
|
||||
|
@ -4,13 +4,13 @@ sidebar_position: 105
|
||||
sidebar_label: JSON
|
||||
---
|
||||
|
||||
There are two sets of functions to parse JSON.
|
||||
- `simpleJSON*` (`visitParam*`) is made to parse a special very limited subset of a JSON, but these functions are extremely fast.
|
||||
- `JSONExtract*` is made to parse normal JSON.
|
||||
There are two sets of functions to parse JSON:
|
||||
- [`simpleJSON*` (`visitParam*`)](#simplejson--visitparam-functions) which is made for parsing a limited subset of JSON extremely fast.
|
||||
- [`JSONExtract*`](#jsonextract-functions) which is made for parsing ordinary JSON.
|
||||
|
||||
# simpleJSON/visitParam functions
|
||||
## simpleJSON / visitParam functions
|
||||
|
||||
ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
|
||||
ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be. They try to do as little as possible to get the job done as quickly as possible.
|
||||
|
||||
The following assumptions are made:
|
||||
|
||||
@ -19,7 +19,7 @@ The following assumptions are made:
|
||||
3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
|
||||
4. The JSON does not have space characters outside of string literals.
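The assumptions above can be explored with a short sketch. The JSON literals and column aliases here are made up for illustration. The first query relies on fields being searched on any nesting level; the second feeds in a document with spaces outside string literals, which violates the assumptions, so its `simpleJSONHas` result should not be relied on and `JSONHas` is shown next to it for comparison.

```sql
-- Compact JSON that satisfies the assumptions: the nested field "bar" is searched for on any level.
SELECT simpleJSONHas('{"foo":{"bar":1}}', 'bar') AS nested_field_found;

-- The same kind of document with spaces around the colon breaks assumption 4.
-- JSONHas uses a full JSON parser and does not depend on these assumptions.
SELECT
    simpleJSONHas('{"foo" : 1}', 'foo') AS simple_result,
    JSONHas('{"foo" : 1}', 'foo') AS full_parser_result;
```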
|
||||
|
||||
## simpleJSONHas
|
||||
### simpleJSONHas
|
||||
|
||||
Checks whether there is a field named `field_name`. The result is `UInt8`.
|
||||
|
||||
@ -29,14 +29,16 @@ Checks whether there is a field named `field_name`. The result is `UInt8`.
|
||||
simpleJSONHas(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamHas`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns `1` if the field exists, `0` otherwise.
|
||||
- Returns `1` if the field exists, `0` otherwise. [UInt8](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -55,11 +57,13 @@ SELECT simpleJSONHas(json, 'foo') FROM jsons;
|
||||
SELECT simpleJSONHas(json, 'bar') FROM jsons;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
1
|
||||
0
|
||||
```
|
||||
## simpleJSONExtractUInt
|
||||
### simpleJSONExtractUInt
|
||||
|
||||
Parses `UInt64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
|
||||
|
||||
@ -69,14 +73,16 @@ Parses `UInt64` from the value of the field named `field_name`. If this is a str
|
||||
simpleJSONExtractUInt(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractUInt`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
|
||||
- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [UInt64](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -98,6 +104,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
4
|
||||
@ -106,7 +114,7 @@ SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
5
|
||||
```
|
||||
|
||||
## simpleJSONExtractInt
|
||||
### simpleJSONExtractInt
|
||||
|
||||
Parses `Int64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
|
||||
|
||||
@ -116,14 +124,16 @@ Parses `Int64` from the value of the field named `field_name`. If this is a stri
|
||||
simpleJSONExtractInt(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractInt`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
|
||||
- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [Int64](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -145,6 +155,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
-4
|
||||
@ -153,7 +165,7 @@ SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
5
|
||||
```
|
||||
|
||||
## simpleJSONExtractFloat
|
||||
### simpleJSONExtractFloat
|
||||
|
||||
Parses `Float64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
|
||||
|
||||
@ -163,14 +175,16 @@ Parses `Float64` from the value of the field named `field_name`. If this is a st
|
||||
simpleJSONExtractFloat(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractFloat`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
|
||||
- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [Float64](../data-types/float.md/#float32-float64).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -192,6 +206,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
-4000
|
||||
@ -200,7 +216,7 @@ SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json;
|
||||
5
|
||||
```
|
||||
|
||||
## simpleJSONExtractBool
|
||||
### simpleJSONExtractBool
|
||||
|
||||
Parses a true/false value from the value of the field named `field_name`. The result is `UInt8`.
|
||||
|
||||
@ -210,10 +226,12 @@ Parses a true/false value from the value of the field named `field_name`. The re
|
||||
simpleJSONExtractBool(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractBool`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
@ -240,6 +258,8 @@ SELECT simpleJSONExtractBool(json, 'bar') FROM jsons ORDER BY json;
|
||||
SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
1
|
||||
@ -247,7 +267,7 @@ SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json;
|
||||
0
|
||||
```
|
||||
|
||||
## simpleJSONExtractRaw
|
||||
### simpleJSONExtractRaw
|
||||
|
||||
Returns the value of the field named `field_name` as a `String`, including separators.
|
||||
|
||||
@ -257,14 +277,16 @@ Returns the value of the field named `field_name` as a `String`, including separ
|
||||
simpleJSONExtractRaw(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractRaw`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the value of the field as a [`String`](../data-types/string.md#string), including separators if the field exists, or an empty `String` otherwise.
|
||||
- Returns the value of the field as a string, including separators if the field exists, or an empty string otherwise. [`String`](../data-types/string.md#string)
|
||||
|
||||
**Example**
|
||||
|
||||
@ -286,6 +308,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
|
||||
"-4e3"
|
||||
@ -294,7 +318,7 @@ SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json;
|
||||
{"def":[1,2,3]}
|
||||
```
|
||||
|
||||
## simpleJSONExtractString
|
||||
### simpleJSONExtractString
|
||||
|
||||
Parses `String` in double quotes from the value of the field named `field_name`.
|
||||
|
||||
@ -304,14 +328,16 @@ Parses `String` in double quotes from the value of the field named `field_name`.
|
||||
simpleJSONExtractString(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractString`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the value of a field as a [`String`](../data-types/string.md#string), including separators. The value is unescaped. It returns an empty `String`: if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist.
|
||||
- Returns the unescaped value of a field as a string, including separators. An empty string is returned if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist. [String](../data-types/string.md).
|
||||
|
||||
**Implementation details**
|
||||
|
||||
@ -336,6 +362,8 @@ INSERT INTO jsons VALUES ('{"foo":"hello}');
|
||||
SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
\n\0
|
||||
|
||||
@ -343,73 +371,61 @@ SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json;
|
||||
|
||||
```
|
||||
|
||||
## visitParamHas
|
||||
## JSONExtract functions
|
||||
|
||||
This function is [an alias of `simpleJSONHas`](./json-functions#simplejsonhas).
|
||||
The following functions are based on [simdjson](https://github.com/lemire/simdjson), and designed for more complex JSON parsing requirements.
|
||||
|
||||
## visitParamExtractUInt
|
||||
### isValidJSON
|
||||
|
||||
This function is [an alias of `simpleJSONExtractUInt`](./json-functions#simplejsonextractuint).
|
||||
Checks that the passed string is valid JSON.
|
||||
|
||||
## visitParamExtractInt
|
||||
**Syntax**
|
||||
|
||||
This function is [an alias of `simpleJSONExtractInt`](./json-functions#simplejsonextractint).
|
||||
```sql
|
||||
isValidJSON(json)
|
||||
```
|
||||
|
||||
## visitParamExtractFloat
|
||||
|
||||
This function is [an alias of `simpleJSONExtractFloat`](./json-functions#simplejsonextractfloat).
|
||||
|
||||
## visitParamExtractBool
|
||||
|
||||
This function is [an alias of `simpleJSONExtractBool`](./json-functions#simplejsonextractbool).
|
||||
|
||||
## visitParamExtractRaw
|
||||
|
||||
This function is [an alias of `simpleJSONExtractRaw`](./json-functions#simplejsonextractraw).
|
||||
|
||||
## visitParamExtractString
|
||||
|
||||
This function is [an alias of `simpleJSONExtractString`](./json-functions#simplejsonextractstring).
|
||||
|
||||
# JSONExtract functions
|
||||
|
||||
The following functions are based on [simdjson](https://github.com/lemire/simdjson) designed for more complex JSON parsing requirements.
|
||||
|
||||
## isValidJSON(json)
|
||||
|
||||
Checks that passed string is a valid json.
|
||||
|
||||
Examples:
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT isValidJSON('{"a": "hello", "b": [-100, 200.0, 300]}') = 1
|
||||
SELECT isValidJSON('not a json') = 0
|
||||
```
|
||||
|
||||
## JSONHas(json\[, indices_or_keys\]...)
|
||||
### JSONHas
|
||||
|
||||
If the value exists in the JSON document, `1` will be returned.
|
||||
If the value exists in the JSON document, `1` will be returned. If the value does not exist, `0` will be returned.
|
||||
|
||||
If the value does not exist, `0` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSONHas(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns `1` if the value exists in `json`, otherwise `0`. [UInt8](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 1
|
||||
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4) = 0
|
||||
```
|
||||
|
||||
`indices_or_keys` is a list of zero or more arguments each of them can be either string or integer.
|
||||
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
Minimum index of the element is 1. Thus the element 0 does not exist.
|
||||
|
||||
You may use integers to access both JSON arrays and JSON objects.
|
||||
|
||||
So, for example:
|
||||
The minimum index of the element is 1. Thus the element 0 does not exist. You may use integers to access both JSON arrays and JSON objects. For example:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'a'
|
||||
@ -419,26 +435,62 @@ SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -2) = 'a'
|
||||
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'hello'
|
||||
```
|
||||
|
||||
## JSONLength(json\[, indices_or_keys\]...)
|
||||
### JSONLength
|
||||
|
||||
Return the length of a JSON array or a JSON object.
|
||||
Returns the length of a JSON array or a JSON object. If the value does not exist or has the wrong type, `0` will be returned.
|
||||
|
||||
If the value does not exist or has a wrong type, `0` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSONLength(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns the length of the JSON array or JSON object. Returns `0` if the value does not exist or has the wrong type. [UInt64](../data-types/int-uint.md).
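A brief sketch of the cases described above, reusing the JSON document from this page. The aliases are illustrative. Going by the description, `-1` should select the last member (`b`), while a string value or a missing key should yield `0`.

```sql
SELECT
    -- Negative integer: the n-th member from the end, here the array under "b".
    JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', -1) AS last_member_length,
    -- "a" holds a string, which is neither an array nor an object.
    JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') AS wrong_type,
    -- "c" does not exist in the document.
    JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'c') AS missing_key;
```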
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 3
|
||||
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}') = 2
|
||||
```
|
||||
|
||||
## JSONType(json\[, indices_or_keys\]...)
|
||||
### JSONType
|
||||
|
||||
Return the type of a JSON value.
|
||||
Returns the type of a JSON value. If the value does not exist, `Null` will be returned.
|
||||
|
||||
If the value does not exist, `Null` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSONType(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns the type of a JSON value as a string. If the value doesn't exist, it returns `Null`. [String](../data-types/string.md).
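As a sketch, `JSONType` can also be pointed at a nested element or at a key that is absent. The aliases are illustrative and the results are deliberately not stated here; the query is simply a way to inspect what is reported for an array element and for a missing value.

```sql
SELECT
    -- Type of the first element of the array stored under "b".
    JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 1) AS first_array_element,
    -- "c" does not exist in the document.
    JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'c') AS missing_key;
```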
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}') = 'Object'
|
||||
@ -446,35 +498,191 @@ SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'String'
|
||||
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 'Array'
|
||||
```
|
||||
|
||||
## JSONExtractUInt(json\[, indices_or_keys\]...)
|
||||
### JSONExtractUInt
|
||||
|
||||
## JSONExtractInt(json\[, indices_or_keys\]...)
|
||||
Parses JSON and extracts a value of UInt type.
|
||||
|
||||
## JSONExtractFloat(json\[, indices_or_keys\]...)
|
||||
**Syntax**
|
||||
|
||||
## JSONExtractBool(json\[, indices_or_keys\]...)
|
||||
|
||||
Parses a JSON and extract a value. These functions are similar to `visitParam` functions.
|
||||
|
||||
If the value does not exist or has a wrong type, `0` will be returned.
|
||||
|
||||
Examples:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 1) = -100
|
||||
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) = 200.0
|
||||
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) = 300
|
||||
```sql
|
||||
JSONExtractUInt(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
## JSONExtractString(json\[, indices_or_keys\]...)
|
||||
**Parameters**
|
||||
|
||||
Parses a JSON and extract a string. This function is similar to `visitParamExtractString` functions.
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
If the value does not exist or has a wrong type, an empty string will be returned.
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
The value is unescaped. If unescaping failed, it returns an empty string.
|
||||
**Returned value**
|
||||
|
||||
Examples:
|
||||
- Returns a UInt value if it exists, otherwise it returns `0`. [UInt64](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) as x, toTypeName(x);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌───x─┬─toTypeName(x)─┐
│ 300 │ UInt64        │
└─────┴───────────────┘
|
||||
```
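The example above covers the case where the value exists. A minimal sketch for the other cases, with illustrative aliases, is shown below; it can be used to check what the function returns for a missing key and for a value of a non-numeric type.

```sql
SELECT
    -- "c" does not exist in the document.
    JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'c') AS missing_key,
    -- "a" holds a string rather than a number.
    JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') AS wrong_type,
    toTypeName(missing_key) AS result_type;
```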
|
||||
|
||||
### JSONExtractInt
|
||||
|
||||
Parses JSON and extracts a value of Int type.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractInt(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an Int value if it exists, otherwise it returns `0`. [Int64](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) as x, toTypeName(x);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌───x─┬─toTypeName(x)─┐
│ 300 │ Int64         │
└─────┴───────────────┘
|
||||
```
|
||||
|
||||
### JSONExtractFloat
|
||||
|
||||
Parses JSON and extracts a value of Float type.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractFloat(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns a Float value if it exists, otherwise it returns `0`. [Float64](../data-types/float.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) as x, toTypeName(x);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌───x─┬─toTypeName(x)─┐
│ 200 │ Float64       │
└─────┴───────────────┘
|
||||
```
|
||||
|
||||
### JSONExtractBool
|
||||
|
||||
Parses JSON and extracts a boolean value. If the value does not exist or has a wrong type, `0` will be returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractBool(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns a Boolean value if it exists, otherwise it returns `0`. [Bool](../data-types/boolean.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractBool('{"passed": true}', 'passed');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─JSONExtractBool('{"passed": true}', 'passed')─┐
│                                             1 │
└───────────────────────────────────────────────┘
|
||||
```
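To complement the example above, the following sketch exercises the fallback described for `JSONExtractBool`. The key `failed`, the string value `"yes"`, and the aliases are illustrative. Per the description, a missing value or a value of the wrong type should give `0`.

```sql
SELECT
    JSONExtractBool('{"passed": true}', 'passed') AS present,
    -- "failed" does not exist in the document.
    JSONExtractBool('{"passed": true}', 'failed') AS missing_key,
    -- The value is a string, not a JSON boolean.
    JSONExtractBool('{"passed": "yes"}', 'passed') AS wrong_type;
```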
|
||||
|
||||
### JSONExtractString
|
||||
|
||||
Parses JSON and extracts a string. This function is similar to the [`visitParamExtractString`](#simplejsonextractstring) function. If the value does not exist or has the wrong type, an empty string will be returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractString(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either a string or an integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an unescaped string from `json`. If unescaping fails, the value does not exist, or the value has the wrong type, it returns an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'hello'
|
||||
@ -484,16 +692,35 @@ SELECT JSONExtractString('{"abc":"\\u263"}', 'abc') = ''
|
||||
SELECT JSONExtractString('{"abc":"hello}', 'abc') = ''
|
||||
```
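Members can also be addressed by position instead of by key. A short sketch, assuming an illustrative two-member object; positions are counted from 1, and negative values count from the end:

```sql
SELECT
    -- Positive integer: n-th member counted from the beginning.
    JSONExtractString('{"a": "hello", "b": "world"}', 1) AS first_member,
    -- Negative integer: n-th member counted from the end.
    JSONExtractString('{"a": "hello", "b": "world"}', -1) AS last_member;
```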
|
||||
|
||||
## JSONExtract(json\[, indices_or_keys...\], Return_type)
|
||||
### JSONExtract
|
||||
|
||||
Parses a JSON and extract a value of the given ClickHouse data type.
|
||||
Parses JSON and extracts a value of the given ClickHouse data type. This function is a generalized version of the previous `JSONExtract<type>` functions. Meaning:
|
||||
|
||||
This is a generalization of the previous `JSONExtract<type>` functions.
|
||||
This means
|
||||
`JSONExtract(..., 'String')` returns exactly the same as `JSONExtractString()`,
|
||||
`JSONExtract(..., 'Float64')` returns exactly the same as `JSONExtractFloat()`.
|
||||
|
||||
Examples:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtract(json [, indices_or_keys...], return_type)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
- `return_type` — A string specifying the type of the value to extract. [String](../data-types/string.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns the value as the specified return type if it exists; otherwise returns `0`, `NULL`, or an empty string, depending on the specified return type. [UInt64](../data-types/int-uint.md), [Int64](../data-types/int-uint.md), [Float64](../data-types/float.md), [Bool](../data-types/boolean.md) or [String](../data-types/string.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(String, Array(Float64))') = ('hello',[-100,200,300])
|
||||
@ -506,17 +733,38 @@ SELECT JSONExtract('{"day": "Thursday"}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday
|
||||
SELECT JSONExtract('{"day": 5}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Friday'
|
||||
```
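The stated equivalence with the typed `JSONExtract<type>` functions can be checked directly. This is a sketch with an illustrative JSON document; if the equivalence holds as described, both comparisons should evaluate to `1`:

```sql
SELECT
    -- Generic form with 'String' compared against the typed function.
    JSONExtract('{"a": "hello", "b": 1.5}', 'a', 'String')
        = JSONExtractString('{"a": "hello", "b": 1.5}', 'a') AS same_string,
    -- Generic form with 'Float64' compared against the typed function.
    JSONExtract('{"a": "hello", "b": 1.5}', 'b', 'Float64')
        = JSONExtractFloat('{"a": "hello", "b": 1.5}', 'b') AS same_float;
```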
|
||||
|
||||
## JSONExtractKeysAndValues(json\[, indices_or_keys...\], Value_type)
|
||||
### JSONExtractKeysAndValues
|
||||
|
||||
Parses key-value pairs from a JSON where the values are of the given ClickHouse data type.
|
||||
Parses key-value pairs from JSON where the values are of the given ClickHouse data type.
|
||||
|
||||
Example:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractKeysAndValues(json [, indices_or_keys...], value_type)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
- `value_type` — A string specifying the type of the value to extract. [String](../data-types/string.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
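- Returns an array of parsed key-value pairs, where each key is a string and each value has type `value_type`. [Array](../data-types/array.md)([Tuple](../data-types/tuple.md)([String](../data-types/string.md), `value_type`)).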
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractKeysAndValues('{"x": {"a": 5, "b": 7, "c": 11}}', 'x', 'Int8') = [('a',5),('b',7),('c',11)];
|
||||
```
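To work with the pairs as rows rather than as a single array, the result can be unfolded. A possible sketch, reusing the JSON from the example; the aliases `kv`, `key`, and `value` are illustrative:

```sql
SELECT
    -- arrayJoin produces one row per (key, value) tuple.
    arrayJoin(JSONExtractKeysAndValues('{"x": {"a": 5, "b": 7, "c": 11}}', 'x', 'Int8')) AS kv,
    -- Tuple elements are numbered from 1.
    tupleElement(kv, 1) AS key,
    tupleElement(kv, 2) AS value;
```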
|
||||
|
||||
## JSONExtractKeys
|
||||
### JSONExtractKeys
|
||||
|
||||
Parses a JSON string and extracts the keys.
|
||||
|
||||
@ -526,14 +774,14 @@ Parses a JSON string and extracts the keys.
|
||||
JSONExtractKeys(json[, a, b, c...])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
**Parameters**
|
||||
|
||||
- `json` — [String](../data-types/string.md) with valid JSON.
|
||||
- `a, b, c...` — Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a [String](../data-types/string.md) to get the field by the key or an [Integer](../data-types/int-uint.md) to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.
|
||||
|
||||
**Returned value**
|
||||
|
||||
Array with the keys of the JSON. [Array](../data-types/array.md)([String](../data-types/string.md)).
|
||||
- Returns an array with the keys of the JSON. [Array](../data-types/array.md)([String](../data-types/string.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -552,31 +800,67 @@ text
|
||||
└────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## JSONExtractRaw(json\[, indices_or_keys\]...)
|
||||
### JSONExtractRaw
|
||||
|
||||
Returns a part of JSON as unparsed string.
|
||||
Returns part of the JSON as an unparsed string. If the part does not exist or has the wrong type, an empty string will be returned.
|
||||
|
||||
If the part does not exist or has a wrong type, an empty string will be returned.
|
||||
**Syntax**
|
||||
|
||||
Example:
|
||||
```sql
|
||||
JSONExtractRaw(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns part of the JSON as an unparsed string. If the part does not exist or has the wrong type, an empty string is returned. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = '[-100, 200.0, 300]';
|
||||
```
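The difference between the raw and the parsed form is easiest to see on a string value. A minimal sketch with an illustrative document: the raw variant keeps the JSON text as-is (including the surrounding double quotes), while `JSONExtractString` returns the unescaped content:

```sql
SELECT
    JSONExtractRaw('{"a": "hello"}', 'a') AS raw_value,       -- unparsed JSON text, quotes included
    JSONExtractString('{"a": "hello"}', 'a') AS string_value; -- parsed and unescaped string
```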
|
||||
|
||||
## JSONExtractArrayRaw(json\[, indices_or_keys...\])
|
||||
### JSONExtractArrayRaw
|
||||
|
||||
Returns an array with elements of JSON array, each represented as unparsed string.
|
||||
Returns an array with the elements of a JSON array, each represented as an unparsed string. If the part does not exist or is not an array, an empty array is returned.
|
||||
|
||||
If the part does not exist or isn’t array, an empty array will be returned.
|
||||
**Syntax**
|
||||
|
||||
Example:
|
||||
```sql
|
||||
JSONExtractArrayRaw(json [, indices_or_keys...])
|
||||
```
|
||||
|
||||
``` sql
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an array with the elements of a JSON array, each represented as an unparsed string. If the part does not exist or is not an array, an empty array is returned. [Array](../data-types/array.md)([String](../data-types/string.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT JSONExtractArrayRaw('{"a": "hello", "b": [-100, 200.0, "hello"]}', 'b') = ['-100', '200.0', '"hello"'];
|
||||
```
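Because each element is returned as unparsed JSON text, it can be passed to another JSON function. A sketch assuming an illustrative array of objects under a hypothetical `items` key:

```sql
SELECT arrayMap(
    -- Each element is itself a JSON string, so it can be parsed again.
    item -> JSONExtractString(item, 'name'),
    JSONExtractArrayRaw('{"items": [{"name": "a"}, {"name": "b"}]}', 'items')
) AS names;
```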
|
||||
|
||||
## JSONExtractKeysAndValuesRaw
|
||||
### JSONExtractKeysAndValuesRaw
|
||||
|
||||
Extracts raw data from a JSON object.
|
||||
|
||||
@ -640,13 +924,30 @@ Result:
|
||||
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## JSON_EXISTS(json, path)
|
||||
### JSON_EXISTS
|
||||
|
||||
If the value exists in the JSON document, `1` will be returned.
|
||||
If the value exists in the JSON document, `1` will be returned. If the value does not exist, `0` will be returned.
|
||||
|
||||
If the value does not exist, `0` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSON_EXISTS(json, path)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — A string with valid JSON. [String](../data-types/string.md).
|
||||
- `path` — A string representing the path. [String](../data-types/string.md).
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)
|
||||
:::
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns `1` if the value exists in the JSON document, otherwise `0`.
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSON_EXISTS('{"hello":1}', '$.hello');
|
||||
@ -655,17 +956,32 @@ SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[*]');
|
||||
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[0]');
|
||||
```
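Since the function returns `1` or `0`, it can be used as a filter. A minimal sketch over an inline set of illustrative documents (the column alias `doc` is an assumption for this example):

```sql
SELECT doc
FROM
(
    -- Two sample documents; only the first contains the key `hello`.
    SELECT arrayJoin(['{"hello":1}', '{"bye":2}']) AS doc
)
WHERE JSON_EXISTS(doc, '$.hello');
```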
|
||||
|
||||
### JSON_QUERY
|
||||
|
||||
Parses JSON and extracts a value as a JSON array or JSON object. If the value does not exist, an empty string is returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSON_QUERY(json, path)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — A string with valid JSON. [String](../data-types/string.md).
|
||||
- `path` — A string representing the path. [String](../data-types/string.md).
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
|
||||
:::
|
||||
|
||||
## JSON_QUERY(json, path)
|
||||
**Returned value**
|
||||
|
||||
Parses a JSON and extract a value as JSON array or JSON object.
|
||||
- Returns the extracted value as a JSON array or JSON object. If the value does not exist, an empty string is returned. [String](../data-types/string.md).
|
||||
|
||||
If the value does not exist, an empty string will be returned.
|
||||
**Example**
|
||||
|
||||
Example:
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSON_QUERY('{"hello":"world"}', '$.hello');
|
||||
@ -682,17 +998,38 @@ Result:
|
||||
[2]
|
||||
String
|
||||
```
|
||||
|
||||
### JSON_VALUE
|
||||
|
||||
Parses JSON and extracts a value as a JSON scalar. If the value does not exist, an empty string is returned by default.
|
||||
|
||||
This function is controlled by the following settings:
|
||||
|
||||
- With `SET function_json_value_return_type_allow_nullable = true`, `NULL` is returned when the value does not exist. If the value is of a complex type (such as struct, array, or map), an empty string is returned by default.
|
||||
- With `SET function_json_value_return_type_allow_complex = true`, the complex value is returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSON_VALUE(json, path)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — A string with valid JSON. [String](../data-types/string.md).
|
||||
- `path` — A string representing the path. [String](../data-types/string.md).
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
|
||||
:::
|
||||
|
||||
## JSON_VALUE(json, path)
|
||||
**Returned value**
|
||||
|
||||
Parses a JSON and extract a value as JSON scalar.
|
||||
- Returns the extracted value as a JSON scalar if it exists, otherwise an empty string is returned. [String](../data-types/string.md).
|
||||
|
||||
If the value does not exist, an empty string will be returned by default, and by SET `function_json_value_return_type_allow_nullable` = `true`, `NULL` will be returned. If the value is complex type (such as: struct, array, map), an empty string will be returned by default, and by SET `function_json_value_return_type_allow_complex` = `true`, the complex value will be returned.
|
||||
**Example**
|
||||
|
||||
Example:
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSON_VALUE('{"hello":"world"}', '$.hello');
|
||||
@ -712,11 +1049,7 @@ world
|
||||
String
|
||||
```
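The two settings that control this function can also be applied per query. A sketch with illustrative documents and paths; the setting names are the ones documented for this function:

```sql
-- Missing value: request NULL instead of the default empty string.
SELECT JSON_VALUE('{"hello":"world"}', '$.missing')
SETTINGS function_json_value_return_type_allow_nullable = true;

-- Complex value (an object): request the value itself instead of the default empty string.
SELECT JSON_VALUE('{"hello":{"world":"!"}}', '$.hello')
SETTINGS function_json_value_return_type_allow_complex = true;
```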
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
|
||||
:::
|
||||
|
||||
## toJSONString
|
||||
### toJSONString
|
||||
|
||||
Serializes a value to its JSON representation. Various data types and nested structures are supported.
|
||||
64-bit [integers](../data-types/int-uint.md) or bigger (like `UInt64` or `Int128`) are enclosed in quotes by default. [output_format_json_quote_64bit_integers](../../operations/settings/settings.md#session_settings-output_format_json_quote_64bit_integers) controls this behavior.
|
||||
@ -762,7 +1095,7 @@ Result:
|
||||
- [output_format_json_quote_denormals](../../operations/settings/settings.md#settings-output_format_json_quote_denormals)
|
||||
|
||||
|
||||
## JSONArrayLength
|
||||
### JSONArrayLength
|
||||
|
||||
Returns the number of elements in the outermost JSON array. The function returns `NULL` if the input JSON string is invalid.
|
||||
|
||||
@ -795,7 +1128,7 @@ SELECT
|
||||
```
|
||||
|
||||
|
||||
## jsonMergePatch
|
||||
### jsonMergePatch
|
||||
|
||||
Returns the merged JSON object string which is formed by merging multiple JSON objects.
|
||||
|
||||
|
@ -735,8 +735,6 @@ LIMIT 10
|
||||
|
||||
Given a size (number of bytes), this function returns a readable, rounded size with suffix (KB, MB, etc.) as string.
|
||||
|
||||
The opposite operations of this function are [fromReadableDecimalSize](#fromReadableDecimalSize), [fromReadableDecimalSizeOrZero](#fromReadableDecimalSizeOrZero), and [fromReadableDecimalSizeOrNull](#fromReadableDecimalSizeOrNull).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
@ -768,8 +766,6 @@ Result:
|
||||
|
||||
Given a size (number of bytes), this function returns a readable, rounded size with suffix (KiB, MiB, etc.) as string.
|
||||
|
||||
The opposite operations of this function are [fromReadableSize](#fromReadableSize), [fromReadableSizeOrZero](#fromReadableSizeOrZero), and [fromReadableSizeOrNull](#fromReadableSizeOrNull).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
@ -894,238 +890,6 @@ SELECT
|
||||
└────────────────────┴────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## fromReadableSize
|
||||
|
||||
Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it throws an exception.
|
||||
|
||||
The opposite operation of this function is [formatReadableSize](#formatReadableSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableSize(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB']) AS readable_sizes,
|
||||
fromReadableSize(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableSizeOrNull
|
||||
|
||||
Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `NULL`.
|
||||
|
||||
The opposite operation of this function is [formatReadableSize](#formatReadableSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableSizeOrNull(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes,
|
||||
fromReadableSizeOrNull(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableSizeOrZero
|
||||
|
||||
Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `0`.
|
||||
|
||||
The opposite operation of this function is [formatReadableSize](#formatReadableSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableSizeOrZero(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes,
|
||||
fromReadableSizeOrZero(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableDecimalSize
|
||||
|
||||
Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it throws an exception.
|
||||
|
||||
The opposite operation of this function is [formatReadableDecimalSize](#formatReadableDecimalSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableDecimalSize(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with decimal units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB']) AS readable_sizes,
|
||||
fromReadableDecimalSize(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableDecimalSizeOrNull
|
||||
|
||||
Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `NULL`.
|
||||
|
||||
The opposite operation of this function is [formatReadableDecimalSize](#formatReadableDecimalSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableDecimalSizeOrNull(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with decimal units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes,
|
||||
fromReadableDecimalSizeOrNull(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableDecimalSizeOrZero
|
||||
|
||||
Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `0`.
|
||||
|
||||
The opposite operation of this function is [formatReadableDecimalSize](#formatReadableDecimalSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableDecimalSizeOrZero(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with decimal units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes,
|
||||
fromReadableDecimalSizeOrZero(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## parseTimeDelta
|
||||
|
||||
Parse a sequence of numbers followed by something resembling a time unit.
|
||||
|
@ -6,7 +6,33 @@ sidebar_label: URLs
|
||||
|
||||
# Functions for Working with URLs
|
||||
|
||||
All these functions do not follow the RFC. They are maximally simplified for improved performance.
|
||||
:::note
|
||||
The functions mentioned in this section are optimized for maximum performance and for the most part do not follow the RFC-3986 standard. Functions which implement RFC-3986 have `RFC` appended to their function name and are generally slower.
|
||||
:::
|
||||
|
||||
You can generally use the non-`RFC` function variants when working with publicly registered domains that contain neither user strings nor `@` symbols.
|
||||
The table below details which symbols in a URL can (`✔`) or cannot (`✗`) be parsed by the respective `RFC` and non-`RFC` variants:
|
||||
|
||||
|Symbol | non-`RFC`| `RFC` |
|
||||
|-------|----------|-------|
|
||||
| ' ' | ✗ |✗ |
|
||||
| \t | ✗ |✗ |
|
||||
| < | ✗ |✗ |
|
||||
| > | ✗ |✗ |
|
||||
| % | ✗ |✔* |
|
||||
| { | ✗ |✗ |
|
||||
| } | ✗ |✗ |
|
||||
| \| | ✗ |✗ |
|
||||
| \\\ | ✗ |✗ |
|
||||
| ^ | ✗ |✗ |
|
||||
| ~ | ✗ |✔* |
|
||||
| [ | ✗ |✗ |
|
||||
| ] | ✗ |✔ |
|
||||
| ; | ✗ |✔* |
|
||||
| = | ✗ |✔* |
|
||||
| & | ✗ |✔* |
|
||||
|
||||
Symbols marked `*` are sub-delimiters in RFC 3986 and are allowed in the user info that precedes the `@` symbol.
|
||||
|
||||
## Functions that Extract Parts of a URL
|
||||
|
||||
@ -16,21 +42,23 @@ If the relevant part isn’t present in a URL, an empty string is returned.
|
||||
|
||||
Extracts the protocol from a URL.
|
||||
|
||||
Examples of typical returned values: http, https, ftp, mailto, tel, magnet...
|
||||
Examples of typical returned values: http, https, ftp, mailto, tel, magnet.
|
||||
|
||||
### domain
|
||||
|
||||
Extracts the hostname from a URL.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
domain(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
The URL can be specified with or without a scheme. Examples:
|
||||
The URL can be specified with or without a protocol. Examples:
|
||||
|
||||
``` text
|
||||
svn+ssh://some.svn-hosting.com:80/repo/trunk
|
||||
@ -48,7 +76,7 @@ clickhouse.com
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Host name if ClickHouse can parse the input string as a URL, otherwise an empty string. [String](../data-types/string.md).
|
||||
- Host name if the input string can be parsed as a URL, otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -62,9 +90,103 @@ SELECT domain('svn+ssh://some.svn-hosting.com:80/repo/trunk');
|
||||
└────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### domainRFC
|
||||
|
||||
Extracts the hostname from a URL. Similar to [domain](#domain), but RFC 3986 conformant.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
domainRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Host name if the input string can be parsed as a URL, otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
domain('http://user:password@example.com:8080/path?query=value#fragment'),
|
||||
domainRFC('http://user:password@example.com:8080/path?query=value#fragment');
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─domain('http://user:password@example.com:8080/path?query=value#fragment')─┬─domainRFC('http://user:password@example.com:8080/path?query=value#fragment')─┐
|
||||
│ │ example.com │
|
||||
└───────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### domainWithoutWWW
|
||||
|
||||
Returns the domain and removes no more than one ‘www.’ from the beginning of it, if present.
|
||||
Returns the domain without leading `www.` if present.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
domainWithoutWWW(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if the input string can be parsed as a URL (without leading `www.`), otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT domainWithoutWWW('http://paul@www.example.com:80/');
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─domainWithoutWWW('http://paul@www.example.com:80/')─┐
|
||||
│ example.com │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### domainWithoutWWWRFC
|
||||
|
||||
Returns the domain without leading `www.` if present. Similar to [domainWithoutWWW](#domainwithoutwww) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
domainWithoutWWWRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if the input string can be parsed as a URL (without leading `www.`), otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
domainWithoutWWW('http://user:password@www.example.com:8080/path?query=value#fragment'),
|
||||
domainWithoutWWWRFC('http://user:password@www.example.com:8080/path?query=value#fragment');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─domainWithoutWWW('http://user:password@www.example.com:8080/path?query=value#fragment')─┬─domainWithoutWWWRFC('http://user:password@www.example.com:8080/path?query=value#fragment')─┐
|
||||
│ │ example.com │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### topLevelDomain
|
||||
|
||||
@ -76,63 +198,314 @@ topLevelDomain(url)
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
The URL can be specified with or without a scheme. Examples:
|
||||
:::note
|
||||
The URL can be specified with or without a protocol. Examples:
|
||||
|
||||
``` text
|
||||
svn+ssh://some.svn-hosting.com:80/repo/trunk
|
||||
some.svn-hosting.com:80/repo/trunk
|
||||
https://clickhouse.com/time/
|
||||
```
|
||||
:::
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../data-types/string.md).
|
||||
- Domain name if the input string can be parsed as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk')─┐
|
||||
│ com │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### topLevelDomainRFC
|
||||
|
||||
Extracts the top-level domain from a URL.
|
||||
Similar to [topLevelDomain](#topleveldomain), but conforms to RFC 3986.
|
||||
|
||||
``` sql
|
||||
topLevelDomainRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
:::note
|
||||
The URL can be specified with or without a protocol. Examples:
|
||||
|
||||
``` text
|
||||
svn+ssh://some.svn-hosting.com:80/repo/trunk
|
||||
some.svn-hosting.com:80/repo/trunk
|
||||
https://clickhouse.com/time/
|
||||
```
|
||||
:::
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if the input string can be parsed as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT topLevelDomain('http://foo:foo%41bar@foo.com'), topLevelDomainRFC('http://foo:foo%41bar@foo.com');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─topLevelDomain('http://foo:foo%41bar@foo.com')─┬─topLevelDomainRFC('http://foo:foo%41bar@foo.com')─┐
|
||||
│ │ com │
|
||||
└────────────────────────────────────────────────┴───────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### firstSignificantSubdomain
|
||||
|
||||
Returns the “first significant subdomain”. The first significant subdomain is a second-level domain if it is ‘com’, ‘net’, ‘org’, or ‘co’. Otherwise, it is a third-level domain. For example, `firstSignificantSubdomain (‘https://news.clickhouse.com/’) = ‘clickhouse’, firstSignificantSubdomain (‘https://news.clickhouse.com.tr/’) = ‘clickhouse’`. The list of “insignificant” second-level domains and other implementation details may change in the future.
|
||||
Returns the “first significant subdomain”.
|
||||
The first significant subdomain is a second-level domain for `com`, `net`, `org`, or `co`, otherwise it is a third-level domain.
|
||||
For example, `firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse'`, `firstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse'`.
|
||||
The list of "insignificant" second-level domains and other implementation details may change in the future.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
firstSignificantSubdomain(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- The first significant subdomain. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT firstSignificantSubdomain('http://www.example.com/a/b/c?a=b')
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─firstSignificantSubdomain('http://www.example.com/a/b/c?a=b')─┐
|
||||
│ example │
|
||||
└───────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### firstSignificantSubdomainRFC
|
||||
|
||||
Returns the “first significant subdomain”.
|
||||
The first significant subdomain is a second-level domain for `com`, `net`, `org`, or `co`, otherwise it is a third-level domain.
|
||||
For example, `firstSignificantSubdomain (‘https://news.clickhouse.com/’) = ‘clickhouse’, firstSignificantSubdomain (‘https://news.clickhouse.com.tr/’) = ‘clickhouse’`.
|
||||
The list of "insignificant" second-level domains and other implementation details may change in the future.
|
||||
Similar to [firstSignificantSubdomain](#firstsignificantsubdomain) but conforms to RFC 1034.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
firstSignificantSubdomainRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- The first significant subdomain. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
firstSignificantSubdomain('http://user:password@example.com:8080/path?query=value#fragment'),
|
||||
firstSignificantSubdomainRFC('http://user:password@example.com:8080/path?query=value#fragment');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─firstSignificantSubdomain('http://user:password@example.com:8080/path?query=value#fragment')─┬─firstSignificantSubdomainRFC('http://user:password@example.com:8080/path?query=value#fragment')─┐
|
||||
│ │ example │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### cutToFirstSignificantSubdomain
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain” (see the explanation above).
|
||||
Returns the part of the domain that includes top-level subdomains up to the [“first significant subdomain”](#firstsignificantsubdomain).
|
||||
|
||||
For example:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
cutToFirstSignificantSubdomain(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/'),
|
||||
cutToFirstSignificantSubdomain('www.tr'),
|
||||
cutToFirstSignificantSubdomain('tr');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/')─┬─cutToFirstSignificantSubdomain('www.tr')─┬─cutToFirstSignificantSubdomain('tr')─┐
|
||||
│ clickhouse.com.tr │ tr │ │
|
||||
└───────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### cutToFirstSignificantSubdomainRFC
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the [“first significant subdomain”](#firstsignificantsubdomain).
|
||||
Similar to [cutToFirstSignificantSubdomain](#cuttofirstsignificantsubdomain) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
cutToFirstSignificantSubdomainRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
cutToFirstSignificantSubdomain('http://user:password@example.com:8080'),
|
||||
cutToFirstSignificantSubdomainRFC('http://user:password@example.com:8080');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─cutToFirstSignificantSubdomain('http://user:password@example.com:8080')─┬─cutToFirstSignificantSubdomainRFC('http://user:password@example.com:8080')─┐
|
||||
│ │ example.com │
|
||||
└─────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
- `cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'`.
|
||||
- `cutToFirstSignificantSubdomain('www.tr') = 'tr'`.
|
||||
- `cutToFirstSignificantSubdomain('tr') = ''`.
|
||||
|
||||
### cutToFirstSignificantSubdomainWithWWW
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the “first significant subdomain”, without stripping "www".
|
||||
Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping `www`.
|
||||
|
||||
For example:
|
||||
**Syntax**
|
||||
|
||||
- `cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/') = 'clickhouse.com.tr'`.
|
||||
- `cutToFirstSignificantSubdomainWithWWW('www.tr') = 'www.tr'`.
|
||||
- `cutToFirstSignificantSubdomainWithWWW('tr') = ''`.
|
||||
```sql
|
||||
cutToFirstSignificantSubdomainWithWWW(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain (with `www`) if possible, otherwise returns an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/'),
|
||||
cutToFirstSignificantSubdomainWithWWW('www.tr'),
|
||||
cutToFirstSignificantSubdomainWithWWW('tr');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/')─┬─cutToFirstSignificantSubdomainWithWWW('www.tr')─┬─cutToFirstSignificantSubdomainWithWWW('tr')─┐
|
||||
│ clickhouse.com.tr │ www.tr │ │
|
||||
└──────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### cutToFirstSignificantSubdomainWithWWWRFC
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping `www`.
|
||||
Similar to [cutToFirstSignificantSubdomainWithWWW](#cuttofirstsignificantsubdomainwithwww) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
cutToFirstSignificantSubdomainWithWWWRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain (with "www") if possible, otherwise returns an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
cutToFirstSignificantSubdomainWithWWW('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy'),
|
||||
cutToFirstSignificantSubdomainWithWWWRFC('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─cutToFirstSignificantSubdomainWithWWW('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy')─┬─cutToFirstSignificantSubdomainWithWWWRFC('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy')─┐
|
||||
│ │ mail.ru │
|
||||
└───────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### cutToFirstSignificantSubdomainCustom
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain. Accepts custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
|
||||
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.
|
||||
Accepts custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
|
||||
This function can be useful if you need a fresh TLD list or if you have a custom list.
|
||||
|
||||
Can be useful if you need fresh TLD list or you have custom.
|
||||
|
||||
Configuration example:
|
||||
**Configuration example**
|
||||
|
||||
```xml
|
||||
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
|
||||
@ -146,17 +519,17 @@ Configuration example:
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
cutToFirstSignificantSubdomainCustom(URL, TLD)
|
||||
cutToFirstSignificantSubdomainCustom(url, tld)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `URL` — URL. [String](../data-types/string.md).
|
||||
- `TLD` — Custom TLD list name. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../data-types/string.md).
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -178,13 +551,39 @@ Result:
|
||||
|
||||
- [firstSignificantSubdomain](#firstsignificantsubdomain).
|
||||
|
||||
### cutToFirstSignificantSubdomainCustomRFC
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.
|
||||
Accepts custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
|
||||
This function can be useful if you need a fresh TLD list or if you have a custom list.
|
||||
Similar to [cutToFirstSignificantSubdomainCustom](#cuttofirstsignificantsubdomaincustom) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
cutToFirstSignificantSubdomainCustomRFC(url, tld)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**See Also**
|
||||
|
||||
- [firstSignificantSubdomain](#firstsignificantsubdomain).
|
||||
|
||||
### cutToFirstSignificantSubdomainCustomWithWWW
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. Accepts custom TLD list name.
|
||||
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
|
||||
Accepts custom TLD list name.
|
||||
It can be useful if you need a fresh TLD list or if you have a custom list.
|
||||
|
||||
Can be useful if you need fresh TLD list or you have custom.
|
||||
|
||||
Configuration example:
|
||||
**Configuration example**
|
||||
|
||||
```xml
|
||||
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
|
||||
@ -198,13 +597,13 @@ Configuration example:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
cutToFirstSignificantSubdomainCustomWithWWW(URL, TLD)
|
||||
cutToFirstSignificantSubdomainCustomWithWWW(url, tld)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `URL` — URL. [String](../data-types/string.md).
|
||||
- `TLD` — Custom TLD list name. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
@ -230,10 +629,36 @@ Result:
|
||||
|
||||
- [firstSignificantSubdomain](#firstsignificantsubdomain).
|
||||
|
||||
### cutToFirstSignificantSubdomainCustomWithWWWRFC
|
||||
|
||||
Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
|
||||
Accepts custom TLD list name.
|
||||
It can be useful if you need a fresh TLD list or if you have a custom list.
|
||||
Similar to [cutToFirstSignificantSubdomainCustomWithWWW](#cuttofirstsignificantsubdomaincustomwithwww) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
cutToFirstSignificantSubdomainCustomWithWWWRFC(url, tld)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**See Also**
|
||||
|
||||
- [firstSignificantSubdomain](#firstsignificantsubdomain).
|
||||
|
||||
### firstSignificantSubdomainCustom
|
||||
|
||||
Returns the first significant subdomain. Accepts custom TLD list name.
|
||||
|
||||
Returns the first significant subdomain.
|
||||
Accepts custom TLD list name.
|
||||
Can be useful if you need a fresh TLD list or if you have a custom list.
|
||||
|
||||
Configuration example:
|
||||
@ -250,17 +675,17 @@ Configuration example:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
firstSignificantSubdomainCustom(URL, TLD)
|
||||
firstSignificantSubdomainCustom(url, tld)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `URL` — URL. [String](../data-types/string.md).
|
||||
- `TLD` — Custom TLD list name. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- First significant subdomain. [String](../data-types/string.md).
|
||||
- First significant subdomain. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -282,47 +707,156 @@ Result:
|
||||
|
||||
- [firstSignificantSubdomain](#firstsignificantsubdomain).
|
||||
|
||||
### port(URL\[, default_port = 0\])
|
||||
### firstSignificantSubdomainCustomRFC
|
||||
|
||||
Returns the port or `default_port` if there is no port in the URL (or in case of validation error).
|
||||
Returns the first significant subdomain.
|
||||
Accepts custom TLD list name.
|
||||
Can be useful if you need a fresh TLD list or if you have a custom list.
|
||||
Similar to [firstSignificantSubdomainCustom](#firstsignificantsubdomaincustom) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
firstSignificantSubdomainCustomRFC(url, tld)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- First significant subdomain. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**See Also**
|
||||
|
||||
- [firstSignificantSubdomain](#firstsignificantsubdomain).
|
||||
|
||||
### port
|
||||
|
||||
Returns the port or `default_port` if the URL contains no port or cannot be parsed.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
port(url [, default_port = 0])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `default_port` — The default port number to be returned. [UInt16](../data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Port or the default port if there is no port in the URL or in case of a validation error. [UInt16](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT port('http://paul@www.example.com:80/');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─port('http://paul@www.example.com:80/')─┐
|
||||
│ 80 │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### portRFC
|
||||
|
||||
Returns the port or `default_port` if the URL contains no port or cannot be parsed.
|
||||
Similar to [port](#port), but RFC 3986 conformant.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
portRFC(url [, default_port = 0])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `default_port` — The default port number to be returned. [UInt16](../data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Port or the default port if there is no port in the URL or in case of a validation error. [UInt16](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
port('http://user:password@example.com:8080'),
|
||||
portRFC('http://user:password@example.com:8080');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─port('http://user:password@example.com:8080')─┬─portRFC('http://user:password@example.com:8080')─┐
|
||||
│ 0 │ 8080 │
|
||||
└───────────────────────────────────────────────┴──────────────────────────────────────────────────┘
|
||||
```
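
The "port or `default_port`" behavior described above can be modeled outside ClickHouse. The following is a minimal Python sketch, not ClickHouse's implementation: `port_or_default` is a hypothetical helper name, and it relies on the standard library's `urllib.parse.urlsplit`, which accepts `user:password@` before the host, so it is closer in spirit to `portRFC` than to `port`.

```python
from urllib.parse import urlsplit


def port_or_default(url: str, default_port: int = 0) -> int:
    """Return the explicit port in `url`, or `default_port` if absent or invalid."""
    try:
        port = urlsplit(url).port  # None when the URL has no explicit port
    except ValueError:
        # Raised for a non-numeric or out-of-range port, or a malformed authority.
        return default_port
    return default_port if port is None else port


print(port_or_default("http://user:password@example.com:8080"))  # 8080
print(port_or_default("https://clickhouse.com/time/", 443))      # 443
```

The sketch only handles URLs that include a protocol; `urlsplit` does not recognize a host in a string without `//`.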
|
||||
|
||||
### path
|
||||
|
||||
Returns the path. Example: `/top/news.html` The path does not include the query string.
|
||||
Returns the path without query string.
|
||||
|
||||
Example: `/top/news.html`.
|
||||
|
||||
### pathFull
|
||||
|
||||
The same as above, but including query string and fragment. Example: /top/news.html?page=2#comments
|
||||
The same as above, but including query string and fragment.
|
||||
|
||||
Example: `/top/news.html?page=2#comments`.
|
||||
|
||||
### queryString
|
||||
|
||||
Returns the query string. Example: page=1&lr=213. query-string does not include the initial question mark, as well as # and everything after #.
|
||||
Returns the query string without the initial question mark, `#` and everything after `#`.
|
||||
|
||||
Example: `page=1&lr=213`.
|
||||
|
||||
### fragment
|
||||
|
||||
Returns the fragment identifier. fragment does not include the initial hash symbol.
|
||||
Returns the fragment identifier without the initial hash symbol.
|
||||
|
||||
### queryStringAndFragment
|
||||
|
||||
Returns the query string and fragment identifier. Example: page=1#29390.
|
||||
Returns the query string and fragment identifier.
|
||||
|
||||
### extractURLParameter(URL, name)
|
||||
Example: `page=1#29390`.
|
||||
|
||||
Returns the value of the ‘name’ parameter in the URL, if present. Otherwise, an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.
|
||||
### extractURLParameter(url, name)
|
||||
|
||||
### extractURLParameters(URL)
|
||||
Returns the value of the `name` parameter in the URL, if present, otherwise an empty string is returned.
|
||||
If there are multiple parameters with this name, the first occurrence is returned.
|
||||
The function assumes that the parameter name in `url` is encoded in the same way as in the `name` argument.
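
The rules above (first occurrence wins, no decoding of names) can be illustrated with a small Python model. This is only a sketch of the documented semantics, not the ClickHouse implementation, and `extract_url_parameter` is a hypothetical name.

```python
def extract_url_parameter(url: str, name: str) -> str:
    """Return the value of the first `name` parameter, or "" if it is absent."""
    # Drop the fragment, then keep everything after the first "?".
    without_fragment = url.split("#", 1)[0]
    _, question_mark, query = without_fragment.partition("?")
    if not question_mark:
        return ""
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        # Names are compared byte for byte: "Client%20ID" != "Client ID".
        if key == name:
            return value
    return ""


print(extract_url_parameter("https://example.com/?a=1&b=2&a=3", "a"))  # 1
```

Because nothing is decoded, a lookup for `Client ID` does not match a URL that contains `Client%20ID=7`.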
|
||||
|
||||
Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.
|
||||
### extractURLParameters(url)
|
||||
|
||||
### extractURLParameterNames(URL)
|
||||
Returns an array of `name=value` strings corresponding to the URL parameters.
|
||||
The values are not decoded.
|
||||
|
||||
Returns an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.
|
||||
### extractURLParameterNames(url)
|
||||
|
||||
### URLHierarchy(URL)
|
||||
Returns an array of name strings corresponding to the names of URL parameters.
|
||||
The values are not decoded.
|
||||
|
||||
Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.
|
||||
### URLHierarchy(url)
|
||||
|
||||
### URLPathHierarchy(URL)
|
||||
Returns an array containing the URL, truncated at the end by the symbols `/` and `?` in the path and query string.
|
||||
Consecutive separator characters are counted as one.
|
||||
The cut is made in the position after all the consecutive separator characters.
|
||||
|
||||
### URLPathHierarchy(url)
|
||||
|
||||
The same as above, but without the protocol and host in the result. The / element (root) is not included.
|
||||
|
||||
@ -334,9 +868,10 @@ URLPathHierarchy('https://example.com/browse/CONV-6788') =
|
||||
]
|
||||
```
|
||||
|
||||
### encodeURLComponent(URL)
|
||||
### encodeURLComponent(url)
|
||||
|
||||
Returns the encoded URL.
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
@ -349,9 +884,10 @@ SELECT encodeURLComponent('http://127.0.0.1:8123/?query=SELECT 1;') AS EncodedUR
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### decodeURLComponent(URL)
|
||||
### decodeURLComponent(url)
|
||||
|
||||
Returns the decoded URL.
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
@ -364,9 +900,10 @@ SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS Decod
|
||||
└────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### encodeURLFormComponent(URL)
|
||||
### encodeURLFormComponent(url)
|
||||
|
||||
Returns the encoded URL. Follows RFC 1866: a space (` `) is encoded as a plus (`+`).
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
@ -379,9 +916,10 @@ SELECT encodeURLFormComponent('http://127.0.0.1:8123/?query=SELECT 1 2+3') AS En
|
||||
└───────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### decodeURLFormComponent(URL)
|
||||
### decodeURLFormComponent(url)
|
||||
|
||||
Returns the decoded URL. Follows RFC 1866: a plain plus (`+`) is decoded as a space (` `).
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
@ -401,12 +939,12 @@ Extracts network locality (`username:password@host:port`) from a URL.
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
netloc(URL)
|
||||
netloc(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
@ -428,44 +966,45 @@ Result:
|
||||
└───────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Functions that Remove Part of a URL
|
||||
## Functions that remove part of a URL
|
||||
|
||||
If the URL does not have anything similar, the URL remains unchanged.
|
||||
|
||||
### cutWWW
|
||||
|
||||
Removes no more than one ‘www.’ from the beginning of the URL’s domain, if present.
|
||||
Removes leading `www.` (if present) from the URL’s domain.
|
||||
|
||||
### cutQueryString
|
||||
|
||||
Removes query string. The question mark is also removed.
|
||||
Removes query string, including the question mark.
|
||||
|
||||
### cutFragment
|
||||
|
||||
Removes the fragment identifier. The number sign is also removed.
|
||||
Removes the fragment identifier, including the number sign.
|
||||
|
||||
### cutQueryStringAndFragment
|
||||
|
||||
Removes the query string and fragment identifier. The question mark and number sign are also removed.
|
||||
Removes the query string and fragment identifier, including the question mark and number sign.
|
||||
|
||||
### cutURLParameter(URL, name)
|
||||
### cutURLParameter(url, name)
|
||||
|
||||
Removes the `name` parameter from URL, if present. This function does not encode or decode characters in parameter names, e.g. `Client ID` and `Client%20ID` are treated as different parameter names.
|
||||
Removes the `name` parameter from a URL, if present.
|
||||
This function does not encode or decode characters in parameter names, e.g. `Client ID` and `Client%20ID` are treated as different parameter names.
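
As a rough illustration of this behavior, here is a Python sketch under stated assumptions. It is not ClickHouse's code: `cut_url_parameter` is a hypothetical name, and dropping the `?` when no parameters remain is a choice made by this sketch rather than documented behavior.

```python
from typing import List, Union


def cut_url_parameter(url: str, names: Union[str, List[str]]) -> str:
    """Remove the given parameter name(s) from the query string of `url`."""
    if isinstance(names, str):
        names = [names]
    head, hash_sign, fragment = url.partition("#")
    base, question_mark, query = head.partition("?")
    if not question_mark:
        return url  # no query string: the URL stays unchanged
    # Names are compared as written; nothing is encoded or decoded.
    kept = [p for p in query.split("&") if p.partition("=")[0] not in names]
    # Assumption of this sketch: omit "?" when every parameter was removed.
    result = base + ("?" + "&".join(kept) if kept else "")
    return result + hash_sign + fragment


print(cut_url_parameter("http://bigmir.net/?a=b&c=d&e=f#g", "a"))
# http://bigmir.net/?c=d&e=f#g
```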
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
cutURLParameter(URL, name)
|
||||
cutURLParameter(url, name)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `name` — name of URL parameter. [String](../data-types/string.md) or [Array](../data-types/array.md) of Strings.
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
- `name` — name of URL parameter. [String](../../sql-reference/data-types/string.md) or [Array](../../sql-reference/data-types/array.md) of Strings.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- URL with `name` URL parameter removed. [String](../data-types/string.md).
|
||||
- url with `name` URL parameter removed. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
|
@ -126,149 +126,6 @@ SELECT generateUUIDv7(1), generateUUIDv7(2);
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## generateUUIDv7ThreadMonotonic
|
||||
|
||||
Generates a [UUID](../data-types/uuid.md) of [version 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04).
|
||||
|
||||
The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bits) to distinguish UUIDs within a millisecond (including a variant field "2", 2 bits), and a random field (32 bits).
|
||||
For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes.
|
||||
In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value.
|
||||
|
||||
This function behaves like [generateUUIDv7](#generateUUIDv7) but gives no guarantee on counter monotonicity across different simultaneous requests.
|
||||
Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate UUIDs.
|
||||
|
||||
```
|
||||
0 1 2 3
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms | ver | counter_high_bits |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
|var| counter_low_bits |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| rand_b |
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
```
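
The bit layout in the diagram above can be made concrete with a short Python sketch. It only packs the fields as drawn (48-bit timestamp, 4-bit version, 12 high counter bits, 2-bit variant, 30 low counter bits, 32 random bits). `pack_uuid_v7` is a hypothetical helper, and the sketch does not model how ClickHouse chooses or increments the counter.

```python
import uuid


def pack_uuid_v7(unix_ts_ms: int, counter: int, rand_b: int) -> uuid.UUID:
    """Pack the fields of the diagram into one 128-bit UUID value."""
    counter_high = (counter >> 30) & 0xFFF      # upper 12 of the 42 counter bits
    counter_low = counter & 0x3FFFFFFF          # lower 30 of the 42 counter bits
    value = (unix_ts_ms & 0xFFFFFFFFFFFF) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                           # bits 79..76: version "7"
    value |= counter_high << 64                  # bits 75..64: counter_high_bits
    value |= 0b10 << 62                          # bits 63..62: variant "2"
    value |= counter_low << 32                   # bits 61..32: counter_low_bits
    value |= rand_b & 0xFFFFFFFF                 # bits 31..0: rand_b
    return uuid.UUID(int=value)


u = pack_uuid_v7(0x018F05E2E3B2, counter=5, rand_b=0xDEADBEEF)
print(u.version)  # 7
```

Because the timestamp and counter sit above the random field, two UUIDs packed with the same timestamp compare in counter order, which is the property the monotonic variants rely on.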
|
||||
|
||||
:::note
|
||||
As of April 2024, version 7 UUIDs are in draft status and their layout may change in the future.
|
||||
:::
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
generateUUIDv7ThreadMonotonic([expr])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `expr` — An arbitrary [expression](../syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned UUID. Optional.
|
||||
|
||||
**Returned value**
|
||||
|
||||
A value of type UUIDv7.
|
||||
|
||||
**Usage example**
|
||||
|
||||
First, create a table with a column of type UUID, then insert a generated UUIDv7 into the table.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE tab (uuid UUID) ENGINE = Memory;
|
||||
|
||||
INSERT INTO tab SELECT generateUUIDv7ThreadMonotonic();
|
||||
|
||||
SELECT * FROM tab;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─────────────────────────────────uuid─┐
|
||||
│ 018f05e2-e3b2-70cb-b8be-64b09b626d32 │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Example with multiple UUIDs generated per row**
|
||||
|
||||
```sql
|
||||
SELECT generateUUIDv7ThreadMonotonic(1), generateUUIDv7ThreadMonotonic(2);
|
||||
|
||||
┌─generateUUIDv7ThreadMonotonic(1)─────┬─generateUUIDv7ThreadMonotonic(2)─────┐
|
||||
│ 018f05e1-14ee-7bc5-9906-207153b400b1 │ 018f05e1-14ee-7bc5-9906-2072b8e96758 │
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## generateUUIDv7NonMonotonic
|
||||
|
||||
Generates a [UUID](../data-types/uuid.md) of [version 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04).
|
||||
|
||||
The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits) and a random field (76 bits, including a 2-bit variant field "2").
|
||||
|
||||
This function is the fastest `generateUUIDv7*` function but it gives no monotonicity guarantees within a timestamp.
|
||||
|
||||
```
|
||||
0 1 2 3
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms | ver | rand_a |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
|var| rand_b |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| rand_b |
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
```
|
||||
|
||||
:::note
|
||||
As of April 2024, version 7 UUIDs are in draft status and their layout may change in the future.
|
||||
:::

**Syntax**

``` sql
generateUUIDv7NonMonotonic([expr])
```

**Arguments**

- `expr` — An arbitrary [expression](../syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned UUID. Optional.

**Returned value**

A value of type UUID (version 7).

**Example**

First, create a table with a column of type UUID, then insert a generated UUIDv7 into the table.

``` sql
CREATE TABLE tab (uuid UUID) ENGINE = Memory;

INSERT INTO tab SELECT generateUUIDv7NonMonotonic();

SELECT * FROM tab;
```

Result:

```response
┌─────────────────────────────────uuid─┐
│ 018f05af-f4a8-778f-beee-1bedbc95c93b │
└──────────────────────────────────────┘
```

**Example with multiple UUIDs generated per row**

```sql
SELECT generateUUIDv7NonMonotonic(1), generateUUIDv7NonMonotonic(2);

┌─generateUUIDv7NonMonotonic(1)────────┬─generateUUIDv7NonMonotonic(2)────────┐
│ 018f05b1-8c2e-7567-a988-48d09606ae8c │ 018f05b1-8c2e-7946-895b-fcd7635da9a0 │
└──────────────────────────────────────┴──────────────────────────────────────┘
```

## empty

Checks whether the input UUID is empty.

55
docs/en/sql-reference/table-functions/loop.md
Normal file
@ -0,0 +1,55 @@
# loop

**Syntax**

``` sql
SELECT ... FROM loop(database, table);
SELECT ... FROM loop(database.table);
SELECT ... FROM loop(table);
SELECT ... FROM loop(other_table_function(...));
```

**Parameters**

- `database` — database name.
- `table` — table name.
- `other_table_function(...)` — other table function.
  Example: `SELECT * FROM loop(numbers(10));`
  `other_table_function(...)` here is `numbers(10)`.

**Returned Value**

An infinite loop that returns the query results over and over.

**Examples**

Selecting data from ClickHouse:

``` sql
SELECT * FROM loop(test_database, test_table);
SELECT * FROM loop(test_database.test_table);
SELECT * FROM loop(test_table);
```

Or using another table function:

``` sql
SELECT * FROM loop(numbers(3)) LIMIT 7;
   ┌─number─┐
1. │      0 │
2. │      1 │
3. │      2 │
   └────────┘
   ┌─number─┐
4. │      0 │
5. │      1 │
6. │      2 │
   └────────┘
   ┌─number─┐
7. │      0 │
   └────────┘
```
``` sql
SELECT * FROM loop(mysql('localhost:3306', 'test', 'test', 'user', 'password'));
...
```
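
The behaviour of `loop(numbers(3)) LIMIT 7` can be modelled outside SQL with a few lines of C++. This is only a sketch of the semantics, not how ClickHouse implements the table function: `loopWithLimit` is a hypothetical name, and returning nothing for an empty source is a choice made here to keep the sketch from spinning forever.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of `SELECT * FROM loop(source) LIMIT limit`:
// re-reads `source` from the beginning each time it is exhausted and
// stops once `limit` rows have been produced.
inline std::vector<int> loopWithLimit(const std::vector<int> & source, std::size_t limit)
{
    std::vector<int> rows;
    if (source.empty())
        return rows; // nothing to repeat (behaviour chosen for this sketch only)

    rows.reserve(limit);
    for (std::size_t i = 0; rows.size() < limit; i = (i + 1) % source.size())
        rows.push_back(source[i]);
    return rows;
}
```

Calling `loopWithLimit({0, 1, 2}, 7)` yields the same seven values as the `numbers(3)` example: the source is read twice in full, then once more for a single row.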
@ -112,113 +112,6 @@ SELECT generateUUIDv7(1), generateUUIDv7(2)
└──────────────────────────────────────┴──────────────────────────────────────┘
```

## generateUUIDv7ThreadMonotonic {#uuidv7threadmonotonic-function-generate}

Generates a [version 7 UUID](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). The generated UUID consists of a 48-bit timestamp (Unix time in milliseconds), version 7 and variant 2 markers, a counter that increases monotonically within a given timestamp, and random data, in the sequence specified below. For each new timestamp the counter starts at a new random value, and for each subsequent UUIDv7 it is incremented by one. If the counter overflows, the timestamp is forcibly incremented by 1 and the counter again starts at a random value. This function is a faster analogue of `generateUUIDv7`, at the cost of losing the guarantee that the counter is monotonic within the same timestamp across different concurrently running queries. Counter monotonicity is guaranteed only within a single thread that executes this function to generate several UUIDs.
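
The counter rule described above (random start for each new timestamp, increment by one, bump the timestamp on overflow) can be sketched as follows. This is not the ClickHouse implementation: `ThreadMonotonicState`, `TimestampAndCounter` and the 12-bit `COUNTER_BITS` width are assumptions made for illustration, and the random source is injected so the behaviour is easy to follow.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Assumed counter width, chosen for illustration only.
constexpr unsigned COUNTER_BITS = 12;
constexpr uint64_t COUNTER_MAX = (1ULL << COUNTER_BITS) - 1;

struct TimestampAndCounter
{
    uint64_t timestamp_ms;
    uint64_t counter;
};

// Hypothetical per-thread state: one instance per thread, so monotonicity
// holds only within that thread.
class ThreadMonotonicState
{
public:
    explicit ThreadMonotonicState(std::function<uint64_t()> random_source_)
        : random_source(std::move(random_source_))
    {
    }

    TimestampAndCounter next(uint64_t now_ms)
    {
        if (!initialized || now_ms > last.timestamp_ms)
        {
            // New timestamp: restart the counter from a random value.
            last = {now_ms, random_source() & COUNTER_MAX};
            initialized = true;
        }
        else if (last.counter < COUNTER_MAX)
        {
            // Same timestamp (or the clock is behind): increment by one.
            ++last.counter;
        }
        else
        {
            // Counter overflow: force the timestamp forward by 1 and restart.
            last = {last.timestamp_ms + 1, random_source() & COUNTER_MAX};
        }
        return last;
    }

private:
    std::function<uint64_t()> random_source;
    TimestampAndCounter last{0, 0};
    bool initialized = false;
};
```

In a real generator each thread would presumably keep such a state in a `thread_local` variable, which is what removes the need for synchronization and also why the ordering guarantee does not extend across threads.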

**Syntax**

``` sql
generateUUIDv7ThreadMonotonic([x])
```

**Arguments**

- `x` — An [expression](../syntax.md#syntax-expressions) that returns a value of one of the [supported data types](../data-types/index.md#data_types). The value is used to avoid [common subexpression elimination](index.md#common-subexpression-elimination) when the function is called several times in one query. Optional parameter.

**Returned value**

A value of type [UUID](../../sql-reference/functions/uuid-functions.md).

**Usage example**

This example demonstrates how to create a table with a UUID column and insert a generated UUIDv7 into it.

``` sql
CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog

INSERT INTO t_uuid SELECT generateUUIDv7ThreadMonotonic()

SELECT * FROM t_uuid
```

``` text
┌────────────────────────────────────x─┐
│ 018f05e2-e3b2-70cb-b8be-64b09b626d32 │
└──────────────────────────────────────┘
```

**Usage example for generating several values in one row**

```sql
SELECT generateUUIDv7ThreadMonotonic(1), generateUUIDv7ThreadMonotonic(2)

┌─generateUUIDv7ThreadMonotonic(1)─────┬─generateUUIDv7ThreadMonotonic(2)─────┐
│ 018f05e1-14ee-7bc5-9906-207153b400b1 │ 018f05e1-14ee-7bc5-9906-2072b8e96758 │
└──────────────────────────────────────┴──────────────────────────────────────┘
```

## generateUUIDv7NonMonotonic {#uuidv7nonmonotonic-function-generate}

Generates a [version 7 UUID](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). The generated UUID consists of a 48-bit timestamp (Unix time in milliseconds), version 7 and variant 2 markers, and random data, in the following sequence:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|                           unix_ts_ms                          |
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|          unix_ts_ms           |  ver  |       rand_a          |
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|var|                        rand_b                             |
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|                            rand_b                             |
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
```
::::note
As of April 2024, UUIDv7 is in draft status and its bit layout may still change.
::::

**Syntax**

``` sql
generateUUIDv7NonMonotonic([x])
```

**Arguments**

- `x` — An [expression](../syntax.md#syntax-expressions) that returns a value of one of the [supported data types](../data-types/index.md#data_types). The value is used to avoid [common subexpression elimination](index.md#common-subexpression-elimination) when the function is called several times in one query. Optional parameter.

**Returned value**

A value of type [UUID](../../sql-reference/functions/uuid-functions.md).

**Usage example**

This example demonstrates how to create a table with a UUID column and insert a generated UUIDv7 into it.

``` sql
CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog

INSERT INTO t_uuid SELECT generateUUIDv7NonMonotonic()

SELECT * FROM t_uuid
```

``` text
┌────────────────────────────────────x─┐
│ 018f05af-f4a8-778f-beee-1bedbc95c93b │
└──────────────────────────────────────┘
```

**Usage example for generating several values in one row**

```sql
SELECT generateUUIDv7NonMonotonic(1), generateUUIDv7NonMonotonic(2)
┌─generateUUIDv7NonMonotonic(1)────────┬─generateUUIDv7NonMonotonic(2)────────┐
│ 018f05b1-8c2e-7567-a988-48d09606ae8c │ 018f05b1-8c2e-7946-895b-fcd7635da9a0 │
└──────────────────────────────────────┴──────────────────────────────────────┘
```

## empty {#empty}

Checks whether the input UUID is empty.

@ -10,6 +10,7 @@ namespace DB
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
extern const int KEEPER_EXCEPTION;
|
||||
}
|
||||
|
||||
@ -441,7 +442,7 @@ void ReconfigCommand::execute(const DB::ASTKeeperQuery * query, DB::KeeperClient
|
||||
new_members = query->args[1].safeGet<String>();
|
||||
break;
|
||||
default:
|
||||
UNREACHABLE();
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected operation: {}", operation);
|
||||
}
|
||||
|
||||
auto response = client->zookeeper->reconfig(joining, leaving, new_members);
|
||||
|
@ -155,8 +155,8 @@ auto instructionFailToString(InstructionFail fail)
|
||||
ret("AVX2");
|
||||
case InstructionFail::AVX512:
|
||||
ret("AVX512");
|
||||
#undef ret
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
|
@ -792,9 +792,32 @@ try
|
||||
LOG_INFO(log, "Background threads finished in {} ms", watch.elapsedMilliseconds());
|
||||
});
|
||||
|
||||
/// This object will periodically calculate some metrics.
|
||||
ServerAsynchronousMetrics async_metrics(
|
||||
global_context,
|
||||
server_settings.asynchronous_metrics_update_period_s,
|
||||
server_settings.asynchronous_heavy_metrics_update_period_s,
|
||||
[&]() -> std::vector<ProtocolServerMetrics>
|
||||
{
|
||||
std::vector<ProtocolServerMetrics> metrics;
|
||||
|
||||
std::lock_guard lock(servers_lock);
|
||||
metrics.reserve(servers_to_start_before_tables.size() + servers.size());
|
||||
|
||||
for (const auto & server : servers_to_start_before_tables)
|
||||
metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});
|
||||
|
||||
for (const auto & server : servers)
|
||||
metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});
|
||||
return metrics;
|
||||
}
|
||||
);
|
||||
|
||||
/// NOTE: global context should be destroyed *before* GlobalThreadPool::shutdown()
|
||||
/// Otherwise GlobalThreadPool::shutdown() will hang, since Context holds some threads.
|
||||
SCOPE_EXIT({
|
||||
async_metrics.stop();
|
||||
|
||||
/** Ask to cancel background jobs all table engines,
|
||||
* and also query_log.
|
||||
* It is important to do early, not in destructor of Context, because
|
||||
@ -921,27 +944,6 @@ try
|
||||
}
|
||||
}
|
||||
|
||||
/// This object will periodically calculate some metrics.
|
||||
ServerAsynchronousMetrics async_metrics(
|
||||
global_context,
|
||||
server_settings.asynchronous_metrics_update_period_s,
|
||||
server_settings.asynchronous_heavy_metrics_update_period_s,
|
||||
[&]() -> std::vector<ProtocolServerMetrics>
|
||||
{
|
||||
std::vector<ProtocolServerMetrics> metrics;
|
||||
|
||||
std::lock_guard lock(servers_lock);
|
||||
metrics.reserve(servers_to_start_before_tables.size() + servers.size());
|
||||
|
||||
for (const auto & server : servers_to_start_before_tables)
|
||||
metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});
|
||||
|
||||
for (const auto & server : servers)
|
||||
metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});
|
||||
return metrics;
|
||||
}
|
||||
);
|
||||
|
||||
zkutil::validateZooKeeperConfig(config());
|
||||
bool has_zookeeper = zkutil::hasZooKeeperConfig(config());
|
||||
|
||||
@ -1748,6 +1750,11 @@ try
|
||||
|
||||
}
|
||||
|
||||
if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX))
|
||||
{
|
||||
PlacementInfo::PlacementInfo::instance().initialize(config());
|
||||
}
|
||||
|
||||
{
|
||||
std::lock_guard lock(servers_lock);
|
||||
/// We should start interserver communications before (and more important shutdown after) tables.
|
||||
@ -2096,11 +2103,6 @@ try
|
||||
load_metadata_tasks);
|
||||
}
|
||||
|
||||
if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX))
|
||||
{
|
||||
PlacementInfo::PlacementInfo::instance().initialize(config());
|
||||
}
|
||||
|
||||
/// Do not keep tasks in server, they should be kept inside databases. Used here to make dependent tasks only.
|
||||
load_metadata_tasks.clear();
|
||||
load_metadata_tasks.shrink_to_fit();
|
||||
|
@ -144,8 +144,7 @@ AccessEntityPtr deserializeAccessEntity(const String & definition, const String
|
||||
catch (Exception & e)
|
||||
{
|
||||
e.addMessage("Could not parse " + file_path);
|
||||
e.rethrow();
|
||||
UNREACHABLE();
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -258,7 +258,7 @@ namespace
|
||||
case TABLE_LEVEL: return AccessFlags::allFlagsGrantableOnTableLevel();
|
||||
case COLUMN_LEVEL: return AccessFlags::allFlagsGrantableOnColumnLevel();
|
||||
}
|
||||
UNREACHABLE();
|
||||
chassert(false);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -257,8 +257,7 @@ std::vector<UUID> IAccessStorage::insert(const std::vector<AccessEntityPtr> & mu
|
||||
}
|
||||
e.addMessage("After successfully inserting {}/{}: {}", successfully_inserted.size(), multiple_entities.size(), successfully_inserted_str);
|
||||
}
|
||||
e.rethrow();
|
||||
UNREACHABLE();
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
@ -361,8 +360,7 @@ std::vector<UUID> IAccessStorage::remove(const std::vector<UUID> & ids, bool thr
|
||||
}
|
||||
e.addMessage("After successfully removing {}/{}: {}", removed_names.size(), ids.size(), removed_names_str);
|
||||
}
|
||||
e.rethrow();
|
||||
UNREACHABLE();
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
@ -458,8 +456,7 @@ std::vector<UUID> IAccessStorage::update(const std::vector<UUID> & ids, const Up
|
||||
}
|
||||
e.addMessage("After successfully updating {}/{}: {}", names_of_updated.size(), ids.size(), names_of_updated_str);
|
||||
}
|
||||
e.rethrow();
|
||||
UNREACHABLE();
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -60,14 +60,13 @@ struct GroupArrayTrait
|
||||
template <typename Trait>
|
||||
constexpr const char * getNameByTrait()
|
||||
{
|
||||
if (Trait::last)
|
||||
if constexpr (Trait::last)
|
||||
return "groupArrayLast";
|
||||
if (Trait::sampler == Sampler::NONE)
|
||||
return "groupArray";
|
||||
else if (Trait::sampler == Sampler::RNG)
|
||||
return "groupArraySample";
|
||||
|
||||
UNREACHABLE();
|
||||
switch (Trait::sampler)
|
||||
{
|
||||
case Sampler::NONE: return "groupArray";
|
||||
case Sampler::RNG: return "groupArraySample";
|
||||
}
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
|
@ -414,7 +414,6 @@ public:
|
||||
break;
|
||||
return (i == events_size) ? base - i : unmatched_idx;
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override
|
||||
|
@ -463,7 +463,6 @@ public:
|
||||
return "sumWithOverflow";
|
||||
else if constexpr (Type == AggregateFunctionTypeSumKahan)
|
||||
return "sumKahan";
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
explicit AggregateFunctionSum(const DataTypes & argument_types_)
|
||||
|
@ -181,11 +181,11 @@ public:
|
||||
{
|
||||
public:
|
||||
Iterator(const PaddedPODArray<UInt64> & offsets_, size_t size_, size_t current_offset_, size_t current_row_)
|
||||
: offsets(offsets_), size(size_), current_offset(current_offset_), current_row(current_row_)
|
||||
: offsets(offsets_), offsets_size(offsets.size()), size(size_), current_offset(current_offset_), current_row(current_row_)
|
||||
{
|
||||
}
|
||||
|
||||
bool ALWAYS_INLINE isDefault() const { return current_offset == offsets.size() || current_row != offsets[current_offset]; }
|
||||
bool ALWAYS_INLINE isDefault() const { return current_offset == offsets_size || current_row != offsets[current_offset]; }
|
||||
size_t ALWAYS_INLINE getValueIndex() const { return isDefault() ? 0 : current_offset + 1; }
|
||||
size_t ALWAYS_INLINE getCurrentRow() const { return current_row; }
|
||||
size_t ALWAYS_INLINE getCurrentOffset() const { return current_offset; }
|
||||
@ -211,6 +211,7 @@ public:
|
||||
|
||||
private:
|
||||
const PaddedPODArray<UInt64> & offsets;
|
||||
const size_t offsets_size;
|
||||
const size_t size;
|
||||
size_t current_offset;
|
||||
size_t current_row;
|
||||
|
@ -127,6 +127,9 @@
|
||||
M(DestroyAggregatesThreads, "Number of threads in the thread pool for destroy aggregate states.") \
|
||||
M(DestroyAggregatesThreadsActive, "Number of threads in the thread pool for destroy aggregate states running a task.") \
|
||||
M(DestroyAggregatesThreadsScheduled, "Number of queued or active jobs in the thread pool for destroy aggregate states.") \
|
||||
M(ConcurrentHashJoinPoolThreads, "Number of threads in the thread pool for concurrent hash join.") \
|
||||
M(ConcurrentHashJoinPoolThreadsActive, "Number of threads in the thread pool for concurrent hash join running a task.") \
|
||||
M(ConcurrentHashJoinPoolThreadsScheduled, "Number of queued or active jobs in the thread pool for concurrent hash join.") \
|
||||
M(HashedDictionaryThreads, "Number of threads in the HashedDictionary thread pool.") \
|
||||
M(HashedDictionaryThreadsActive, "Number of threads in the HashedDictionary thread pool running a task.") \
|
||||
M(HashedDictionaryThreadsScheduled, "Number of queued or active jobs in the HashedDictionary thread pool.") \
|
||||
|
@ -41,7 +41,6 @@ UInt8 getDayOfWeek(const cctz::civil_day & date)
|
||||
case cctz::weekday::saturday: return 6;
|
||||
case cctz::weekday::sunday: return 7;
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
inline cctz::time_point<cctz::seconds> lookupTz(const cctz::time_zone & cctz_time_zone, const cctz::civil_day & date)
|
||||
|
@ -34,8 +34,6 @@ Int64 IntervalKind::toAvgNanoseconds() const
|
||||
default:
|
||||
return toAvgSeconds() * NANOSECONDS_PER_SECOND;
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
Int32 IntervalKind::toAvgSeconds() const
|
||||
@ -54,7 +52,6 @@ Int32 IntervalKind::toAvgSeconds() const
|
||||
case IntervalKind::Kind::Quarter: return 7889238; /// Exactly 1/4 of a year.
|
||||
case IntervalKind::Kind::Year: return 31556952; /// The average length of a Gregorian year is equal to 365.2425 days
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
Float64 IntervalKind::toSeconds() const
|
||||
@ -80,7 +77,6 @@ Float64 IntervalKind::toSeconds() const
|
||||
default:
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Not possible to get precise number of seconds in non-precise interval");
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool IntervalKind::isFixedLength() const
|
||||
@ -99,7 +95,6 @@ bool IntervalKind::isFixedLength() const
|
||||
case IntervalKind::Kind::Quarter:
|
||||
case IntervalKind::Kind::Year: return false;
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
IntervalKind IntervalKind::fromAvgSeconds(Int64 num_seconds)
|
||||
@ -141,7 +136,6 @@ const char * IntervalKind::toKeyword() const
|
||||
case IntervalKind::Kind::Quarter: return "QUARTER";
|
||||
case IntervalKind::Kind::Year: return "YEAR";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
@ -161,7 +155,6 @@ const char * IntervalKind::toLowercasedKeyword() const
|
||||
case IntervalKind::Kind::Quarter: return "quarter";
|
||||
case IntervalKind::Kind::Year: return "year";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
@ -192,7 +185,6 @@ const char * IntervalKind::toDateDiffUnit() const
|
||||
case IntervalKind::Kind::Year:
|
||||
return "year";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
@ -223,7 +215,6 @@ const char * IntervalKind::toNameOfFunctionToIntervalDataType() const
|
||||
case IntervalKind::Kind::Year:
|
||||
return "toIntervalYear";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
@ -257,7 +248,6 @@ const char * IntervalKind::toNameOfFunctionExtractTimePart() const
|
||||
case IntervalKind::Kind::Year:
|
||||
return "toYear";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
|
@ -54,8 +54,6 @@ String toString(TargetArch arch)
|
||||
case TargetArch::AMXTILE: return "amxtile";
|
||||
case TargetArch::AMXINT8: return "amxint8";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -75,7 +75,6 @@ const char * TasksStatsCounters::metricsProviderString(MetricsProvider provider)
|
||||
case MetricsProvider::Netlink:
|
||||
return "netlink";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool TasksStatsCounters::checkIfAvailable()
|
||||
|
@ -146,8 +146,6 @@ const char * errorMessage(Error code)
|
||||
case Error::ZSESSIONMOVED: return "Session moved to another server, so operation is ignored";
|
||||
case Error::ZNOTREADONLY: return "State-changing request is passed to read-only server";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool isHardwareError(Error zk_return_code)
|
||||
|
@ -466,7 +466,6 @@ void CompressionCodecDeflateQpl::doDecompressData(const char * source, UInt32 so
|
||||
sw_codec->doDecompressData(source, source_size, dest, uncompressed_size);
|
||||
return;
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void CompressionCodecDeflateQpl::flushAsynchronousDecompressRequests()
|
||||
|
@ -21,6 +21,11 @@
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
}
|
||||
|
||||
/** NOTE DoubleDelta is surprisingly bad name. The only excuse is that it comes from an academic paper.
|
||||
* Most people will think that "double delta" is just applying delta transform twice.
|
||||
* But in fact it is something more than applying delta transform twice.
|
||||
@ -142,9 +147,9 @@ namespace ErrorCodes
|
||||
{
|
||||
extern const int CANNOT_COMPRESS;
|
||||
extern const int CANNOT_DECOMPRESS;
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int ILLEGAL_SYNTAX_FOR_CODEC_TYPE;
|
||||
extern const int ILLEGAL_CODEC_PARAMETER;
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
namespace
|
||||
@ -163,9 +168,8 @@ inline Int64 getMaxValueForByteSize(Int8 byte_size)
|
||||
case sizeof(UInt64):
|
||||
return std::numeric_limits<Int64>::max();
|
||||
default:
|
||||
assert(false && "only 1, 2, 4 and 8 data sizes are supported");
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "only 1, 2, 4 and 8 data sizes are supported");
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
struct WriteSpec
|
||||
|
@ -5,6 +5,12 @@
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
ClusterUpdateActions joiningToClusterUpdates(const ClusterConfigPtr & cfg, std::string_view joining)
|
||||
{
|
||||
ClusterUpdateActions out;
|
||||
@ -79,7 +85,7 @@ String serializeClusterConfig(const ClusterConfigPtr & cfg, const ClusterUpdateA
|
||||
new_config.emplace_back(RaftServerConfig{*cfg->get_server(priority->id)});
|
||||
}
|
||||
else
|
||||
UNREACHABLE();
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected update");
|
||||
}
|
||||
|
||||
for (const auto & item : cfg->get_servers())
|
||||
|
@ -990,7 +990,7 @@ KeeperServer::ConfigUpdateState KeeperServer::applyConfigUpdate(
|
||||
raft_instance->set_priority(update->id, update->priority, /*broadcast on live leader*/true);
|
||||
return Accepted;
|
||||
}
|
||||
UNREACHABLE();
|
||||
std::unreachable();
|
||||
}
|
||||
|
||||
ClusterUpdateActions KeeperServer::getRaftConfigurationDiff(const Poco::Util::AbstractConfiguration & config)
|
||||
|
@ -478,4 +478,9 @@ bool Context::hasTraceCollector() const
|
||||
return false;
|
||||
}
|
||||
|
||||
bool Context::isBackgroundOperationContext() const
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -170,6 +170,8 @@ public:
|
||||
const ServerSettings & getServerSettings() const;
|
||||
|
||||
bool hasTraceCollector() const;
|
||||
|
||||
bool isBackgroundOperationContext() const;
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -667,8 +667,6 @@ public:
|
||||
case Types::AggregateFunctionState: return f(field.template get<AggregateFunctionStateData>());
|
||||
case Types::CustomType: return f(field.template get<CustomType>());
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
String dump() const;
|
||||
|
@ -36,7 +36,6 @@ String ISerialization::kindToString(Kind kind)
|
||||
case Kind::SPARSE:
|
||||
return "Sparse";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
ISerialization::Kind ISerialization::stringToKind(const String & str)
|
||||
|
@ -17,7 +17,6 @@ std::string toString(MetadataStorageTransactionState state)
|
||||
case MetadataStorageTransactionState::PARTIALLY_ROLLED_BACK:
|
||||
return "PARTIALLY_ROLLED_BACK";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -259,7 +259,10 @@ std::unique_ptr<WriteBufferFromFileBase> S3ObjectStorage::writeObject( /// NOLIN
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "S3 doesn't support append to files");
|
||||
|
||||
S3Settings::RequestSettings request_settings = s3_settings.get()->request_settings;
|
||||
if (auto query_context = CurrentThread::getQueryContext())
|
||||
/// NOTE: For background operations settings are not propagated from session or query. They are taken from
|
||||
/// default user's .xml config. It's obscure and unclear behavior. For them it's always better
|
||||
/// to rely on settings from disk.
|
||||
if (auto query_context = CurrentThread::getQueryContext(); query_context && !query_context->isBackgroundOperationContext())
|
||||
{
|
||||
request_settings.updateFromSettingsIfChanged(query_context->getSettingsRef());
|
||||
}
|
||||
|
@ -112,7 +112,6 @@ DiskPtr VolumeJBOD::getDisk(size_t /* index */) const
|
||||
return disks_by_size.top().disk;
|
||||
}
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
ReservationPtr VolumeJBOD::reserve(UInt64 bytes)
|
||||
@ -164,7 +163,6 @@ ReservationPtr VolumeJBOD::reserve(UInt64 bytes)
|
||||
return reservation;
|
||||
}
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool VolumeJBOD::areMergesAvoided() const
|
||||
|
@ -62,7 +62,6 @@ String escapingRuleToString(FormatSettings::EscapingRule escaping_rule)
|
||||
case FormatSettings::EscapingRule::Raw:
|
||||
return "Raw";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void skipFieldByEscapingRule(ReadBuffer & buf, FormatSettings::EscapingRule escaping_rule, const FormatSettings & format_settings)
|
||||
|
@ -1176,8 +1176,7 @@ public:
|
||||
/// You can compare the date, datetime, or datetime64 and an enumeration with a constant string.
|
||||
|| ((left.isDate() || left.isDate32() || left.isDateTime() || left.isDateTime64()) && (right.isDate() || right.isDate32() || right.isDateTime() || right.isDateTime64()) && left.idx == right.idx) /// only date vs date, or datetime vs datetime
|
||||
|| (left.isUUID() && right.isUUID())
|
||||
|| (left.isIPv4() && right.isIPv4())
|
||||
|| (left.isIPv6() && right.isIPv6())
|
||||
|| ((left.isIPv4() || left.isIPv6()) && (right.isIPv4() || right.isIPv6()))
|
||||
|| (left.isEnum() && right.isEnum() && arguments[0]->getName() == arguments[1]->getName()) /// only equivalent enum type values can be compared against
|
||||
|| (left_tuple && right_tuple && left_tuple->getElements().size() == right_tuple->getElements().size())
|
||||
|| (arguments[0]->equals(*arguments[1]))))
|
||||
@ -1266,6 +1265,8 @@ public:
|
||||
const bool left_is_float = which_left.isFloat();
|
||||
const bool right_is_float = which_right.isFloat();
|
||||
|
||||
const bool left_is_ipv4 = which_left.isIPv4();
|
||||
const bool right_is_ipv4 = which_right.isIPv4();
|
||||
const bool left_is_ipv6 = which_left.isIPv6();
|
||||
const bool right_is_ipv6 = which_right.isIPv6();
|
||||
const bool left_is_fixed_string = which_left.isFixedString();
|
||||
@ -1323,10 +1324,13 @@ public:
|
||||
{
|
||||
return res;
|
||||
}
|
||||
else if (((left_is_ipv6 && right_is_fixed_string) || (right_is_ipv6 && left_is_fixed_string)) && fixed_string_size == IPV6_BINARY_LENGTH)
|
||||
else if (
|
||||
(((left_is_ipv6 && right_is_fixed_string) || (right_is_ipv6 && left_is_fixed_string)) && fixed_string_size == IPV6_BINARY_LENGTH)
|
||||
|| ((left_is_ipv4 || left_is_ipv6) && (right_is_ipv4 || right_is_ipv6))
|
||||
)
|
||||
{
|
||||
/// Special treatment for FixedString(16) as a binary representation of IPv6 -
|
||||
/// CAST is customized for this case
|
||||
/// Special treatment for FixedString(16) as a binary representation of IPv6 & for comparing IPv4 & IPv6 values -
|
||||
/// CAST is customized for these cases
|
||||
ColumnPtr left_column = left_is_ipv6 ?
|
||||
col_with_type_and_name_left.column : castColumn(col_with_type_and_name_left, right_type);
|
||||
ColumnPtr right_column = right_is_ipv6 ?
|
||||
|
@ -149,8 +149,6 @@ struct IntegerRoundingComputation
|
||||
return x;
|
||||
}
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
static ALWAYS_INLINE T compute(T x, T scale)
|
||||
@ -163,8 +161,6 @@ struct IntegerRoundingComputation
|
||||
case ScaleMode::Negative:
|
||||
return computeImpl(x, scale);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
static ALWAYS_INLINE void compute(const T * __restrict in, size_t scale, T * __restrict out) requires std::integral<T>
|
||||
@ -247,8 +243,6 @@ inline float roundWithMode(float x, RoundingMode mode)
|
||||
case RoundingMode::Ceil: return ceilf(x);
|
||||
case RoundingMode::Trunc: return truncf(x);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
inline double roundWithMode(double x, RoundingMode mode)
|
||||
@ -260,8 +254,6 @@ inline double roundWithMode(double x, RoundingMode mode)
|
||||
case RoundingMode::Ceil: return ceil(x);
|
||||
case RoundingMode::Trunc: return trunc(x);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
|
@ -232,7 +232,6 @@ struct TimeWindowImpl<TUMBLE>
|
||||
default:
|
||||
throw Exception(ErrorCodes::SYNTAX_ERROR, "Fraction seconds are unsupported by windows yet");
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <typename ToType, IntervalKind::Kind unit>
|
||||
@ -422,7 +421,6 @@ struct TimeWindowImpl<HOP>
|
||||
default:
|
||||
throw Exception(ErrorCodes::SYNTAX_ERROR, "Fraction seconds are unsupported by windows yet");
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <typename ToType, IntervalKind::Kind kind>
|
||||
|
@ -381,8 +381,6 @@ bool PointInPolygonWithGrid<CoordinateType>::contains(CoordinateType x, Coordina
|
||||
case CellType::complexPolygon:
|
||||
return boost::geometry::within(Point(x, y), polygons[cell.index_of_inner_polygon]);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
|
@ -35,7 +35,6 @@ namespace
|
||||
case UserDefinedSQLObjectType::Function:
|
||||
return "function_";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
constexpr std::string_view sql_extension = ".sql";
|
||||
|
@ -1,221 +0,0 @@
|
||||
#pragma once
|
||||
|
||||
#include <base/types.h>
|
||||
#include <boost/algorithm/string/case_conv.hpp>
|
||||
|
||||
#include <Columns/ColumnNullable.h>
|
||||
#include <Columns/ColumnsNumber.h>
|
||||
#include <Columns/ColumnString.h>
|
||||
#include <Common/Exception.h>
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <Functions/FunctionHelpers.h>
|
||||
#include <Functions/IFunction.h>
|
||||
#include <IO/ReadBufferFromString.h>
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <cmath>
|
||||
#include <string_view>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int CANNOT_PARSE_INPUT_ASSERTION_FAILED;
|
||||
extern const int CANNOT_PARSE_NUMBER;
|
||||
extern const int CANNOT_PARSE_TEXT;
|
||||
extern const int ILLEGAL_COLUMN;
|
||||
extern const int UNEXPECTED_DATA_AFTER_PARSED_VALUE;
|
||||
}
|
||||
|
||||
enum class ErrorHandling : uint8_t
|
||||
{
|
||||
Exception,
|
||||
Zero,
|
||||
Null
|
||||
};
|
||||
|
||||
using ScaleFactors = std::unordered_map<std::string_view, size_t>;
|
||||
|
||||
/** fromReadable*Size - Returns the number of bytes corresponding to a given readable binary or decimal size.
|
||||
* Examples:
|
||||
* - `fromReadableSize('123 MiB')`
|
||||
* - `fromReadableDecimalSize('123 MB')`
|
||||
* Meant to be the inverse of `formatReadable*Size` with the following exceptions:
|
||||
* - Number of bytes is returned as an unsigned integer amount instead of a float. Decimal points are rounded up to the nearest integer.
|
||||
* - Negative numbers are not allowed as negative sizes don't make sense.
|
||||
* Flavours:
|
||||
* - fromReadableSize
|
||||
* - fromReadableSizeOrNull
|
||||
* - fromReadableSizeOrZero
|
||||
* - fromReadableDecimalSize
|
||||
* - fromReadableDecimalSizeOrNull
|
||||
* - fromReadableDecimalSizeOrZero
|
||||
*/
|
||||
template <typename Name, typename Impl, ErrorHandling error_handling>
|
||||
class FunctionFromReadable : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionFromReadable<Name, Impl, error_handling>>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
|
||||
{
|
||||
FunctionArgumentDescriptors args
|
||||
{
|
||||
{"readable_size", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isString), nullptr, "String"},
|
||||
};
|
||||
validateFunctionArgumentTypes(*this, arguments, args);
|
||||
DataTypePtr return_type = std::make_shared<DataTypeUInt64>();
|
||||
if (error_handling == ErrorHandling::Null)
|
||||
return std::make_shared<DataTypeNullable>(return_type);
|
||||
else
|
||||
return return_type;
|
||||
}
|
||||
|
||||
|
||||
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
|
||||
{
|
||||
const auto * col_str = checkAndGetColumn<ColumnString>(arguments[0].column.get());
|
||||
if (!col_str)
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::ILLEGAL_COLUMN,
|
||||
"Illegal column {} of first ('str') argument of function {}. Must be string.",
|
||||
arguments[0].column->getName(),
|
||||
getName()
|
||||
);
|
||||
}
|
||||
|
||||
const ScaleFactors & scale_factors = Impl::getScaleFactors();
|
||||
|
||||
auto col_res = ColumnUInt64::create(input_rows_count);
|
||||
|
||||
ColumnUInt8::MutablePtr col_null_map;
|
||||
if constexpr (error_handling == ErrorHandling::Null)
|
||||
col_null_map = ColumnUInt8::create(input_rows_count, 0);
|
||||
|
||||
auto & res_data = col_res->getData();
|
||||
|
||||
for (size_t i = 0; i < input_rows_count; ++i)
|
||||
{
|
||||
std::string_view value = col_str->getDataAt(i).toView();
|
||||
try
|
||||
{
|
||||
UInt64 num_bytes = parseReadableFormat(scale_factors, value);
|
||||
res_data[i] = num_bytes;
|
||||
}
|
||||
catch (const Exception &)
|
||||
{
|
||||
if constexpr (error_handling == ErrorHandling::Exception)
|
||||
{
|
||||
throw;
|
||||
}
|
||||
else
|
||||
{
|
||||
res_data[i] = 0;
|
||||
if constexpr (error_handling == ErrorHandling::Null)
|
||||
col_null_map->getData()[i] = 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
if constexpr (error_handling == ErrorHandling::Null)
|
||||
return ColumnNullable::create(std::move(col_res), std::move(col_null_map));
|
||||
else
|
||||
return col_res;
|
||||
}
|
||||
|
||||
private:
|
||||
|
||||
UInt64 parseReadableFormat(const ScaleFactors & scale_factors, const std::string_view & value) const
|
||||
{
|
||||
ReadBufferFromString buf(value);
|
||||
|
||||
// tryReadFloatText does not seem to raise an error on leading whitespace, so we check for it explicitly
|
||||
skipWhitespaceIfAny(buf);
|
||||
if (buf.getPosition() > 0)
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::CANNOT_PARSE_INPUT_ASSERTION_FAILED,
|
||||
"Invalid expression for function {} - Leading whitespace is not allowed (\"{}\")",
|
||||
getName(),
|
||||
value
|
||||
);
|
||||
}
|
||||
|
||||
Float64 base = 0;
|
||||
if (!tryReadFloatTextPrecise(base, buf)) // The default (fast) tryReadFloatText returns true on garbage input, so we use the Precise version
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::CANNOT_PARSE_NUMBER,
|
||||
"Invalid expression for function {} - Unable to parse readable size numeric component (\"{}\")",
|
||||
getName(),
|
||||
value
|
||||
);
|
||||
}
|
||||
else if (std::isnan(base) || !std::isfinite(base))
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Invalid expression for function {} - Invalid numeric component: {}",
|
||||
getName(),
|
||||
base
|
||||
);
|
||||
}
|
||||
else if (base < 0)
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Invalid expression for function {} - Negative sizes are not allowed ({})",
|
||||
getName(),
|
||||
base
|
||||
);
|
||||
}
|
||||
|
||||
skipWhitespaceIfAny(buf);
|
||||
|
||||
String unit;
|
||||
readStringUntilWhitespace(unit, buf);
|
||||
boost::algorithm::to_lower(unit);
|
||||
auto iter = scale_factors.find(unit);
|
||||
if (iter == scale_factors.end())
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::CANNOT_PARSE_TEXT,
|
||||
"Invalid expression for function {} - Unknown readable size unit (\"{}\")",
|
||||
getName(),
|
||||
unit
|
||||
);
|
||||
}
|
||||
else if (!buf.eof())
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::UNEXPECTED_DATA_AFTER_PARSED_VALUE,
|
||||
"Invalid expression for function {} - Found trailing characters after readable size string (\"{}\")",
|
||||
getName(),
|
||||
value
|
||||
);
|
||||
}
|
||||
|
||||
Float64 num_bytes_with_decimals = base * iter->second;
|
||||
if (num_bytes_with_decimals > std::numeric_limits<UInt64>::max())
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Invalid expression for function {} - Result is too big for output type (\"{}\")",
|
||||
getName(),
|
||||
num_bytes_with_decimals
|
||||
);
|
||||
}
|
||||
// As the input may be an arbitrary decimal number, we can end up with a non-integer number of bytes when parsing binary (e.g. MiB) units.
|
||||
// A fractional byte count doesn't make sense, so we round up to the smallest whole number of bytes that can hold the passed size.
|
||||
return static_cast<UInt64>(std::ceil(num_bytes_with_decimals));
|
||||
}
|
||||
};
|
||||
}
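The parsing steps in `parseReadableFormat` above (reject leading whitespace, read a non-negative finite number, skip whitespace, look up a lower-cased unit, reject trailing characters, round up) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `ScaleFactorsSketch` and `parseReadableSizeSketch` are hypothetical names, `std::strtod` stands in for ClickHouse's `tryReadFloatTextPrecise` (so it is locale-dependent and also accepts hex floats), and failures are reported as `std::nullopt` rather than as exceptions.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for the ScaleFactors alias; keys are lower-case unit names.
using ScaleFactorsSketch = std::unordered_map<std::string, std::uint64_t>;

// Returns the number of bytes, or std::nullopt if the input cannot be parsed.
std::optional<std::uint64_t> parseReadableSizeSketch(std::string_view value, const ScaleFactorsSketch & factors)
{
    // Reject empty input and leading whitespace explicitly (strtod would silently skip it).
    if (value.empty() || std::isspace(static_cast<unsigned char>(value.front())))
        return std::nullopt;

    // strtod needs a null-terminated buffer, so copy the view.
    const std::string text(value);
    char * end = nullptr;
    const double base = std::strtod(text.c_str(), &end);

    // No digits consumed, NaN/infinity, or a negative size: all invalid.
    if (end == text.c_str() || !std::isfinite(base) || base < 0)
        return std::nullopt;

    // Skip optional whitespace between the number and the unit.
    std::size_t pos = static_cast<std::size_t>(end - text.c_str());
    while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos])))
        ++pos;

    // Read the unit up to the next whitespace and lower-case it.
    std::string unit;
    while (pos < text.size() && !std::isspace(static_cast<unsigned char>(text[pos])))
        unit.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(text[pos++]))));

    // Unknown unit, or trailing characters after the unit.
    const auto it = factors.find(unit);
    if (it == factors.end() || pos != text.size())
        return std::nullopt;

    // Round up so the result can hold the requested size.
    const double bytes = std::ceil(base * static_cast<double>(it->second));

    // 2^64 as a double: anything at or above it does not fit into 64 bits.
    if (bytes >= 18446744073709551616.0)
        return std::nullopt;

    return static_cast<std::uint64_t>(bytes);
}
```

Returning `std::optional` here roughly mirrors the `OrNull` flavour; the exception and `OrZero` flavours would wrap the same routine with a different failure policy, much as the `ErrorHandling` template parameter does above.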
|
@ -1,122 +0,0 @@
|
||||
#include <base/types.h>
|
||||
#include <Functions/FunctionFactory.h>
|
||||
#include <Functions/fromReadable.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
struct Impl
|
||||
{
|
||||
static const ScaleFactors & getScaleFactors()
|
||||
{
|
||||
static const ScaleFactors scale_factors =
|
||||
{
|
||||
{"b", 1ull},
|
||||
{"kb", 1000ull},
|
||||
{"mb", 1000ull * 1000ull},
|
||||
{"gb", 1000ull * 1000ull * 1000ull},
|
||||
{"tb", 1000ull * 1000ull * 1000ull * 1000ull},
|
||||
{"pb", 1000ull * 1000ull * 1000ull * 1000ull * 1000ull},
|
||||
{"eb", 1000ull * 1000ull * 1000ull * 1000ull * 1000ull * 1000ull},
|
||||
};
|
||||
|
||||
return scale_factors;
|
||||
}
|
||||
};
|
||||
|
||||
struct NameFromReadableDecimalSize
|
||||
{
|
||||
static constexpr auto name = "fromReadableDecimalSize";
|
||||
};
|
||||
|
||||
struct NameFromReadableDecimalSizeOrNull
|
||||
{
|
||||
static constexpr auto name = "fromReadableDecimalSizeOrNull";
|
||||
};
|
||||
|
||||
struct NameFromReadableDecimalSizeOrZero
|
||||
{
|
||||
static constexpr auto name = "fromReadableDecimalSizeOrZero";
|
||||
};
|
||||
|
||||
using FunctionFromReadableDecimalSize = FunctionFromReadable<NameFromReadableDecimalSize, Impl, ErrorHandling::Exception>;
|
||||
using FunctionFromReadableDecimalSizeOrNull = FunctionFromReadable<NameFromReadableDecimalSizeOrNull, Impl, ErrorHandling::Null>;
|
||||
using FunctionFromReadableDecimalSizeOrZero = FunctionFromReadable<NameFromReadableDecimalSizeOrZero, Impl, ErrorHandling::Zero>;
|
||||
|
||||
|
||||
FunctionDocumentation fromReadableDecimalSize_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes. If the function is unable to parse the input value, it throws an exception.",
|
||||
.syntax = "fromReadableDecimalSize(x)",
|
||||
.arguments = {{"x", "Readable size with decimal units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB']) AS readable_sizes, fromReadableDecimalSize(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableDecimalSizeOrNull_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `NULL`.",
|
||||
.syntax = "fromReadableDecimalSizeOrNull(x)",
|
||||
.arguments = {{"x", "Readable size with decimal units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md)))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes, fromReadableDecimalSizeOrNull(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableDecimalSizeOrZero_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `0`.",
|
||||
.syntax = "fromReadableDecimalSizeOrZero(x)",
|
||||
.arguments = {{"x", "Readable size with decimal units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes, fromReadableDecimalSizeOrZero(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(FromReadableDecimalSize)
|
||||
{
|
||||
factory.registerFunction<FunctionFromReadableDecimalSize>(fromReadableDecimalSize_documentation);
|
||||
factory.registerFunction<FunctionFromReadableDecimalSizeOrNull>(fromReadableDecimalSizeOrNull_documentation);
|
||||
factory.registerFunction<FunctionFromReadableDecimalSizeOrZero>(fromReadableDecimalSizeOrZero_documentation);
|
||||
}
|
||||
}
|
@ -1,124 +0,0 @@
|
||||
#include <base/types.h>
|
||||
#include <Functions/FunctionFactory.h>
|
||||
#include <Functions/fromReadable.h>
|
||||
#include "Common/FunctionDocumentation.h"
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
struct Impl
|
||||
{
|
||||
static const ScaleFactors & getScaleFactors()
|
||||
{
|
||||
// ISO/IEC 80000-13 binary units
|
||||
static const ScaleFactors scale_factors =
|
||||
{
|
||||
{"b", 1ull},
|
||||
{"kib", 1024ull},
|
||||
{"mib", 1024ull * 1024ull},
|
||||
{"gib", 1024ull * 1024ull * 1024ull},
|
||||
{"tib", 1024ull * 1024ull * 1024ull * 1024ull},
|
||||
{"pib", 1024ull * 1024ull * 1024ull * 1024ull * 1024ull},
|
||||
{"eib", 1024ull * 1024ull * 1024ull * 1024ull * 1024ull * 1024ull},
|
||||
};
|
||||
|
||||
return scale_factors;
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
struct NameFromReadableSize
|
||||
{
|
||||
static constexpr auto name = "fromReadableSize";
|
||||
};
|
||||
|
||||
struct NameFromReadableSizeOrNull
|
||||
{
|
||||
static constexpr auto name = "fromReadableSizeOrNull";
|
||||
};
|
||||
|
||||
struct NameFromReadableSizeOrZero
|
||||
{
|
||||
static constexpr auto name = "fromReadableSizeOrZero";
|
||||
};
|
||||
|
||||
using FunctionFromReadableSize = FunctionFromReadable<NameFromReadableSize, Impl, ErrorHandling::Exception>;
|
||||
using FunctionFromReadableSizeOrNull = FunctionFromReadable<NameFromReadableSizeOrNull, Impl, ErrorHandling::Null>;
|
||||
using FunctionFromReadableSizeOrZero = FunctionFromReadable<NameFromReadableSizeOrZero, Impl, ErrorHandling::Zero>;
|
||||
|
||||
FunctionDocumentation fromReadableSize_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it throws an exception.",
|
||||
.syntax = "fromReadableSize(x)",
|
||||
.arguments = {{"x", "Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB']) AS readable_sizes, fromReadableSize(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableSizeOrNull_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `NULL`.",
|
||||
.syntax = "fromReadableSizeOrNull(x)",
|
||||
.arguments = {{"x", "Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md)))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes, fromReadableSizeOrNull(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableSizeOrZero_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `0`.",
|
||||
.syntax = "fromReadableSizeOrZero(x)",
|
||||
.arguments = {{"x", "Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes, fromReadableSizeOrZero(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘)",
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(FromReadableSize)
|
||||
{
|
||||
factory.registerFunction<FunctionFromReadableSize>(fromReadableSize_documentation);
|
||||
factory.registerFunction<FunctionFromReadableSizeOrNull>(fromReadableSizeOrNull_documentation);
|
||||
factory.registerFunction<FunctionFromReadableSizeOrZero>(fromReadableSizeOrZero_documentation);
|
||||
}
|
||||
}
|
@ -73,20 +73,6 @@ void setVariant(UUID & uuid)
|
||||
UUIDHelpers::getLowBytes(uuid) = (UUIDHelpers::getLowBytes(uuid) & rand_b_bits_mask) | variant_2_mask;
|
||||
}
|
||||
|
||||
struct FillAllRandomPolicy
|
||||
{
|
||||
static constexpr auto name = "generateUUIDv7NonMonotonic";
|
||||
static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), and a random field (74 bits, including a 2-bit variant field "2") to distinguish UUIDs within a millisecond. This function is the fastest generateUUIDv7* function but it gives no monotonicity guarantees within a timestamp.)";
|
||||
struct Data
|
||||
{
|
||||
void generate(UUID & uuid, uint64_t ts)
|
||||
{
|
||||
setTimestampAndVersion(uuid, ts);
|
||||
setVariant(uuid);
|
||||
}
|
||||
};
|
||||
};
|
||||
|
||||
struct CounterFields
|
||||
{
|
||||
uint64_t last_timestamp = 0;
|
||||
@ -133,44 +119,21 @@ struct CounterFields
|
||||
};
|
||||
|
||||
|
||||
struct GlobalCounterPolicy
|
||||
struct Data
|
||||
{
|
||||
static constexpr auto name = "generateUUIDv7";
|
||||
static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bits, including a 2-bit variant field "2") to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. Function generateUUIDv7 guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)";
|
||||
|
||||
/// Guarantee counter monotonicity within one timestamp across all threads generating UUIDv7 simultaneously.
|
||||
struct Data
|
||||
static inline CounterFields fields;
|
||||
static inline SharedMutex mutex; /// works a little bit faster than std::mutex here
|
||||
std::lock_guard<SharedMutex> guard;
|
||||
|
||||
Data()
|
||||
: guard(mutex)
|
||||
{}
|
||||
|
||||
void generate(UUID & uuid, uint64_t timestamp)
|
||||
{
|
||||
static inline CounterFields fields;
|
||||
static inline SharedMutex mutex; /// works a little bit faster than std::mutex here
|
||||
std::lock_guard<SharedMutex> guard;
|
||||
|
||||
Data()
|
||||
: guard(mutex)
|
||||
{}
|
||||
|
||||
void generate(UUID & uuid, uint64_t timestamp)
|
||||
{
|
||||
fields.generate(uuid, timestamp);
|
||||
}
|
||||
};
|
||||
};
|
||||
|
||||
struct ThreadLocalCounterPolicy
|
||||
{
|
||||
static constexpr auto name = "generateUUIDv7ThreadMonotonic";
|
||||
static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bits, including a 2-bit variant field "2") to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. This function behaves like generateUUIDv7 but gives no guarantee on counter monotonicity across different simultaneous requests. Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate UUIDs.)";
|
||||
|
||||
/// Guarantee counter monotonicity within one timestamp within the same thread. Faster than GlobalCounterPolicy if a query uses multiple threads.
|
||||
struct Data
|
||||
{
|
||||
static inline thread_local CounterFields fields;
|
||||
|
||||
void generate(UUID & uuid, uint64_t timestamp)
|
||||
{
|
||||
fields.generate(uuid, timestamp);
|
||||
}
|
||||
};
|
||||
fields.generate(uuid, timestamp);
|
||||
}
|
||||
};
|
||||
|
||||
}
|
||||
@ -181,11 +144,12 @@ DECLARE_AVX2_SPECIFIC_CODE(__VA_ARGS__)
|
||||
|
||||
DECLARE_SEVERAL_IMPLEMENTATIONS(
|
||||
|
||||
template <typename FillPolicy>
|
||||
class FunctionGenerateUUIDv7Base : public IFunction, public FillPolicy
|
||||
class FunctionGenerateUUIDv7Base : public IFunction
|
||||
{
|
||||
public:
|
||||
String getName() const final { return FillPolicy::name; }
|
||||
static constexpr auto name = "generateUUIDv7";
|
||||
|
||||
String getName() const final { return name; }
|
||||
size_t getNumberOfArguments() const final { return 0; }
|
||||
bool isDeterministic() const override { return false; }
|
||||
bool isDeterministicInScopeOfQuery() const final { return false; }
|
||||
@ -221,7 +185,7 @@ public:
|
||||
uint64_t timestamp = getTimestampMillisecond();
|
||||
for (UUID & uuid : vec_to)
|
||||
{
|
||||
typename FillPolicy::Data data;
|
||||
Data data;
|
||||
data.generate(uuid, timestamp);
|
||||
}
|
||||
}
|
||||
@ -231,19 +195,18 @@ public:
|
||||
) // DECLARE_SEVERAL_IMPLEMENTATIONS
|
||||
#undef DECLARE_SEVERAL_IMPLEMENTATIONS
|
||||
|
||||
template <typename FillPolicy>
|
||||
class FunctionGenerateUUIDv7Base : public TargetSpecific::Default::FunctionGenerateUUIDv7Base<FillPolicy>
|
||||
class FunctionGenerateUUIDv7Base : public TargetSpecific::Default::FunctionGenerateUUIDv7Base
|
||||
{
|
||||
public:
|
||||
using Self = FunctionGenerateUUIDv7Base<FillPolicy>;
|
||||
using Parent = TargetSpecific::Default::FunctionGenerateUUIDv7Base<FillPolicy>;
|
||||
using Self = FunctionGenerateUUIDv7Base;
|
||||
using Parent = TargetSpecific::Default::FunctionGenerateUUIDv7Base;
|
||||
|
||||
explicit FunctionGenerateUUIDv7Base(ContextPtr context) : selector(context)
|
||||
{
|
||||
selector.registerImplementation<TargetArch::Default, Parent>();
|
||||
|
||||
#if USE_MULTITARGET_CODE
|
||||
using ParentAVX2 = TargetSpecific::AVX2::FunctionGenerateUUIDv7Base<FillPolicy>;
|
||||
using ParentAVX2 = TargetSpecific::AVX2::FunctionGenerateUUIDv7Base;
|
||||
selector.registerImplementation<TargetArch::AVX2, ParentAVX2>();
|
||||
#endif
|
||||
}
|
||||
@ -262,27 +225,16 @@ private:
|
||||
ImplementationSelector<IFunction> selector;
|
||||
};
|
||||
|
||||
template<typename FillPolicy>
|
||||
void registerUUIDv7Generator(auto & factory)
|
||||
{
|
||||
static constexpr auto doc_syntax_format = "{}([expression])";
|
||||
static constexpr auto example_format = "SELECT {}()";
|
||||
static constexpr auto multiple_example_format = "SELECT {f}(1), {f}(2)";
|
||||
|
||||
FunctionDocumentation::Description description = FillPolicy::description;
|
||||
FunctionDocumentation::Syntax syntax = fmt::format(doc_syntax_format, FillPolicy::name);
|
||||
FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}};
|
||||
FunctionDocumentation::ReturnedValue returned_value = "A value of type UUID version 7.";
|
||||
FunctionDocumentation::Examples examples = {{"single", fmt::format(example_format, FillPolicy::name), ""}, {"multiple", fmt::format(multiple_example_format, fmt::arg("f", FillPolicy::name)), ""}};
|
||||
FunctionDocumentation::Categories categories = {"UUID"};
|
||||
|
||||
factory.template registerFunction<FunctionGenerateUUIDv7Base<FillPolicy>>({description, syntax, arguments, returned_value, examples, categories}, FunctionFactory::CaseInsensitive);
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(GenerateUUIDv7)
|
||||
{
|
||||
registerUUIDv7Generator<GlobalCounterPolicy>(factory);
|
||||
registerUUIDv7Generator<ThreadLocalCounterPolicy>(factory);
|
||||
registerUUIDv7Generator<FillAllRandomPolicy>(factory);
|
||||
FunctionDocumentation::Description description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bits, including a 2-bit variant field "2") to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. Function generateUUIDv7 guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)";
|
||||
FunctionDocumentation::Syntax syntax = "SELECT generateUUIDv7()";
|
||||
FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}};
|
||||
FunctionDocumentation::ReturnedValue returned_value = "A value of type UUID version 7.";
|
||||
FunctionDocumentation::Examples examples = {{"single", "SELECT generateUUIDv7()", ""}, {"multiple", "SELECT generateUUIDv7(1), generateUUIDv7(2)", ""}};
|
||||
FunctionDocumentation::Categories categories = {"UUID"};
|
||||
|
||||
factory.registerFunction<FunctionGenerateUUIDv7Base>({description, syntax, arguments, returned_value, examples, categories});
|
||||
}
|
||||
|
||||
}
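The bit layout named in the descriptions above (48-bit millisecond timestamp, 4-bit version "7", then 74 further bits that include the 2-bit variant "2") can be illustrated with plain 64-bit arithmetic. This is a minimal sketch, not the code from this file: `UuidSketch` and `makeUuidV7Sketch` are hypothetical names, the random bits are passed in by the caller, and the counter handling of `CounterFields` is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical two-word UUID: `high` holds bits 127..64, `low` holds bits 63..0.
struct UuidSketch
{
    std::uint64_t high = 0;
    std::uint64_t low = 0;
};

// Assemble a version-7 UUID from a millisecond timestamp and caller-supplied random bits.
UuidSketch makeUuidV7Sketch(std::uint64_t unix_ts_ms, std::uint64_t rand_a, std::uint64_t rand_b)
{
    constexpr std::uint64_t ts_mask = (1ull << 48) - 1;     // 48-bit timestamp
    constexpr std::uint64_t rand_a_mask = (1ull << 12) - 1; // 12 bits following the version
    constexpr std::uint64_t rand_b_mask = (1ull << 62) - 1; // 62 bits following the variant
    constexpr std::uint64_t version = 0x7;                  // 4-bit version field
    constexpr std::uint64_t variant = 0x2;                  // 2-bit variant field, binary 10

    UuidSketch uuid;
    // High word: timestamp (48 bits) | version (4 bits) | rand_a (12 bits).
    uuid.high = ((unix_ts_ms & ts_mask) << 16) | (version << 12) | (rand_a & rand_a_mask);
    // Low word: variant (2 bits) | rand_b (62 bits).
    uuid.low = (variant << 62) | (rand_b & rand_b_mask);
    return uuid;
}
```

In the counter-based flavour described above, part of these 74 bits would be filled by a counter rather than by random data; the version and variant positions stay the same.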
|
||||
|
@ -52,7 +52,6 @@ std::string toContentEncodingName(CompressionMethod method)
|
||||
case CompressionMethod::None:
|
||||
return "";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
CompressionMethod chooseHTTPCompressionMethod(const std::string & list)
|
||||
|
@ -88,7 +88,6 @@ public:
|
||||
case Status::TOO_LARGE_COMPRESSED_BLOCK:
|
||||
return "TOO_LARGE_COMPRESSED_BLOCK";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
explicit HadoopSnappyReadBuffer(
|
||||
|
@ -117,8 +117,6 @@ size_t AggregatedDataVariants::size() const
|
||||
APPLY_FOR_AGGREGATED_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
size_t AggregatedDataVariants::sizeWithoutOverflowRow() const
|
||||
@ -136,8 +134,6 @@ size_t AggregatedDataVariants::sizeWithoutOverflowRow() const
|
||||
APPLY_FOR_AGGREGATED_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
const char * AggregatedDataVariants::getMethodName() const
|
||||
@ -155,8 +151,6 @@ const char * AggregatedDataVariants::getMethodName() const
|
||||
APPLY_FOR_AGGREGATED_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool AggregatedDataVariants::isTwoLevel() const
|
||||
@ -174,8 +168,6 @@ bool AggregatedDataVariants::isTwoLevel() const
|
||||
APPLY_FOR_AGGREGATED_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool AggregatedDataVariants::isConvertibleToTwoLevel() const
|
||||
|
@ -799,7 +799,6 @@ String FileSegment::stateToString(FileSegment::State state)
|
||||
case FileSegment::State::DETACHED:
|
||||
return "DETACHED";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
bool FileSegment::assertCorrectness() const
|
||||
|
@ -130,6 +130,16 @@ public:
|
||||
UInt64 count_participating_replicas{0};
|
||||
UInt64 number_of_current_replica{0};
|
||||
|
||||
enum class BackgroundOperationType : uint8_t
|
||||
{
|
||||
NOT_A_BACKGROUND_OPERATION = 0,
|
||||
MERGE = 1,
|
||||
MUTATION = 2,
|
||||
};
|
||||
|
||||
/// Set when this ClientInfo and its context were created for a background operation (not a real query)
|
||||
BackgroundOperationType background_operation_type{BackgroundOperationType::NOT_A_BACKGROUND_OPERATION};
|
||||
|
||||
bool empty() const { return query_kind == QueryKind::NO_QUERY; }
|
||||
|
||||
/** Serialization and deserialization.
|
||||
|
@ -309,7 +309,6 @@ ComparisonGraphCompareResult ComparisonGraph<Node>::pathToCompareResult(Path pat
|
||||
case Path::GREATER: return inverse ? ComparisonGraphCompareResult::LESS : ComparisonGraphCompareResult::GREATER;
|
||||
case Path::GREATER_OR_EQUAL: return inverse ? ComparisonGraphCompareResult::LESS_OR_EQUAL : ComparisonGraphCompareResult::GREATER_OR_EQUAL;
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <ComparisonGraphNodeType Node>
|
||||
|
@ -1,10 +1,9 @@
|
||||
#include <memory>
|
||||
#include <mutex>
|
||||
#include <Columns/ColumnSparse.h>
|
||||
#include <Columns/FilterDescription.h>
|
||||
#include <Columns/IColumn.h>
|
||||
#include <Core/ColumnsWithTypeAndName.h>
|
||||
#include <Core/NamesAndTypes.h>
|
||||
#include <DataTypes/DataTypeLowCardinality.h>
|
||||
#include <Interpreters/ConcurrentHashJoin.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/ExpressionActions.h>
|
||||
@ -15,10 +14,20 @@
|
||||
#include <Parsers/ExpressionListParsers.h>
|
||||
#include <Parsers/IAST_fwd.h>
|
||||
#include <Parsers/parseQuery.h>
|
||||
#include <Common/CurrentThread.h>
|
||||
#include <Common/Exception.h>
|
||||
#include <Common/ThreadPool.h>
|
||||
#include <Common/WeakHash.h>
|
||||
#include <Common/scope_guard_safe.h>
|
||||
#include <Common/setThreadName.h>
|
||||
#include <Common/typeid_cast.h>
|
||||
#include <DataTypes/DataTypeLowCardinality.h>
|
||||
|
||||
namespace CurrentMetrics
|
||||
{
|
||||
extern const Metric ConcurrentHashJoinPoolThreads;
|
||||
extern const Metric ConcurrentHashJoinPoolThreadsActive;
|
||||
extern const Metric ConcurrentHashJoinPoolThreadsScheduled;
|
||||
}
|
||||
|
||||
namespace DB
|
||||
{
|
||||
@ -36,20 +45,82 @@ static UInt32 toPowerOfTwo(UInt32 x)
|
||||
return static_cast<UInt32>(1) << (32 - std::countl_zero(x - 1));
|
||||
}
|
||||
|
||||
ConcurrentHashJoin::ConcurrentHashJoin(ContextPtr context_, std::shared_ptr<TableJoin> table_join_, size_t slots_, const Block & right_sample_block, bool any_take_last_row_)
|
||||
ConcurrentHashJoin::ConcurrentHashJoin(
|
||||
ContextPtr context_, std::shared_ptr<TableJoin> table_join_, size_t slots_, const Block & right_sample_block, bool any_take_last_row_)
|
||||
: context(context_)
|
||||
, table_join(table_join_)
|
||||
, slots(toPowerOfTwo(std::min<UInt32>(static_cast<UInt32>(slots_), 256)))
|
||||
, pool(std::make_unique<ThreadPool>(
|
||||
CurrentMetrics::ConcurrentHashJoinPoolThreads,
|
||||
CurrentMetrics::ConcurrentHashJoinPoolThreadsActive,
|
||||
CurrentMetrics::ConcurrentHashJoinPoolThreadsScheduled,
|
||||
slots))
|
||||
{
|
||||
for (size_t i = 0; i < slots; ++i)
|
||||
{
|
||||
auto inner_hash_join = std::make_shared<InternalHashJoin>();
|
||||
hash_joins.resize(slots);
|
||||
|
||||
inner_hash_join->data = std::make_unique<HashJoin>(table_join_, right_sample_block, any_take_last_row_, 0, fmt::format("concurrent{}", i));
|
||||
/// A non-zero `max_joined_block_rows` allows a block to be processed partially, returning the unprocessed part.
|
||||
/// TODO: This is not handled properly by ConcurrentHashJoin, so we set it to 0 to disable the feature.
|
||||
inner_hash_join->data->setMaxJoinedBlockRows(0);
|
||||
hash_joins.emplace_back(std::move(inner_hash_join));
|
||||
try
|
||||
{
|
||||
for (size_t i = 0; i < slots; ++i)
|
||||
{
|
||||
pool->scheduleOrThrow(
|
||||
[&, idx = i, thread_group = CurrentThread::getGroup()]()
|
||||
{
|
||||
SCOPE_EXIT_SAFE({
|
||||
if (thread_group)
|
||||
CurrentThread::detachFromGroupIfNotDetached();
|
||||
});
|
||||
|
||||
if (thread_group)
|
||||
CurrentThread::attachToGroupIfDetached(thread_group);
|
||||
setThreadName("ConcurrentJoin");
|
||||
|
||||
auto inner_hash_join = std::make_shared<InternalHashJoin>();
|
||||
inner_hash_join->data = std::make_unique<HashJoin>(
|
||||
table_join_, right_sample_block, any_take_last_row_, 0, fmt::format("concurrent{}", idx));
|
||||
/// A non-zero `max_joined_block_rows` allows a block to be processed partially, returning the unprocessed part.
|
||||
/// TODO: This is not handled properly in ConcurrentHashJoin, so we set it to 0 to disable the feature.
|
||||
inner_hash_join->data->setMaxJoinedBlockRows(0);
|
||||
hash_joins[idx] = std::move(inner_hash_join);
|
||||
});
|
||||
}
|
||||
pool->wait();
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
pool->wait();
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
ConcurrentHashJoin::~ConcurrentHashJoin()
|
||||
{
|
||||
try
|
||||
{
|
||||
for (size_t i = 0; i < slots; ++i)
|
||||
{
|
||||
// Hash tables destruction may be very time-consuming.
|
||||
// Without the following code, they would be destroyed in the current thread (i.e. sequentially).
|
||||
// `InternalHashJoin` is moved here and will be destroyed in the destructor of the lambda function.
|
||||
pool->scheduleOrThrow(
|
||||
[join = std::move(hash_joins[i]), thread_group = CurrentThread::getGroup()]()
|
||||
{
|
||||
SCOPE_EXIT_SAFE({
|
||||
if (thread_group)
|
||||
CurrentThread::detachFromGroupIfNotDetached();
|
||||
});
|
||||
|
||||
if (thread_group)
|
||||
CurrentThread::attachToGroupIfDetached(thread_group);
|
||||
setThreadName("ConcurrentJoin");
|
||||
});
|
||||
}
|
||||
pool->wait();
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
pool->wait();
|
||||
}
|
||||
}
|
||||
|
||||
|
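The `slots` initializer in the constructor above clamps the requested slot count to 256 and then rounds it up to a power of two. A minimal sketch of that arithmetic follows. The names `roundUpToPowerOfTwo` and `computeSlots` are hypothetical and do not appear in the diff, and a plain loop stands in for the `std::countl_zero` expression the source uses, so the sketch also builds without C++20.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= x.
// The diff computes this with std::countl_zero; a loop is used here instead.
inline std::uint32_t roundUpToPowerOfTwo(std::uint32_t x)
{
    assert(x <= (1u << 31)); // larger inputs would overflow the shift below
    std::uint32_t result = 1;
    while (result < x)
        result <<= 1;
    return result;
}

// Hypothetical helper mirroring the `slots` member initializer:
// cast to 32 bits, clamp to 256, then round up to a power of two.
inline std::uint32_t computeSlots(std::size_t requested)
{
    const std::uint32_t clamped = std::min<std::uint32_t>(static_cast<std::uint32_t>(requested), 256);
    return roundUpToPowerOfTwo(clamped);
}
```

Under these assumptions a request for 5 slots yields 8 and a request for 1000 yields 256. The same `slots` value also sizes the thread pool and the `hash_joins` vector in the constructor.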
@ -10,6 +10,7 @@
|
||||
#include <base/defines.h>
|
||||
#include <base/types.h>
|
||||
#include <Common/Stopwatch.h>
|
||||
#include <Common/ThreadPool_fwd.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
@ -39,7 +40,7 @@ public:
|
||||
const Block & right_sample_block,
|
||||
bool any_take_last_row_ = false);
|
||||
|
||||
~ConcurrentHashJoin() override = default;
|
||||
~ConcurrentHashJoin() override;
|
||||
|
||||
std::string getName() const override { return "ConcurrentHashJoin"; }
|
||||
const TableJoin & getTableJoin() const override { return *table_join; }
|
||||
@ -66,6 +67,7 @@ private:
|
||||
ContextPtr context;
|
||||
std::shared_ptr<TableJoin> table_join;
|
||||
size_t slots;
|
||||
std::unique_ptr<ThreadPool> pool;
|
||||
std::vector<std::shared_ptr<InternalHashJoin>> hash_joins;
|
||||
|
||||
std::mutex totals_mutex;
|
||||
|
@ -2386,6 +2386,17 @@ void Context::setCurrentQueryId(const String & query_id)
|
||||
client_info.initial_query_id = client_info.current_query_id;
|
||||
}
|
||||
|
||||
void Context::setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType background_operation)
|
||||
{
|
||||
chassert(background_operation != ClientInfo::BackgroundOperationType::NOT_A_BACKGROUND_OPERATION);
|
||||
client_info.background_operation_type = background_operation;
|
||||
}
|
||||
|
||||
bool Context::isBackgroundOperationContext() const
|
||||
{
|
||||
return client_info.background_operation_type != ClientInfo::BackgroundOperationType::NOT_A_BACKGROUND_OPERATION;
|
||||
}
|
||||
|
||||
void Context::killCurrentQuery() const
|
||||
{
|
||||
if (auto elem = getProcessListElement())
|
||||
|
@ -760,6 +760,12 @@ public:
|
||||
void setCurrentDatabaseNameInGlobalContext(const String & name);
|
||||
void setCurrentQueryId(const String & query_id);
|
||||
|
||||
/// FIXME: background operations (such as merges and mutations) use the same Context object and even set a
|
||||
/// query_id for it (table_uuid::result_part_name). Queries could be told apart from background operations with
|
||||
/// something like `bool is_background = query_id.contains("::")`, but an explicit enum check states the purpose far more clearly.
|
||||
void setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType background_operation);
|
||||
bool isBackgroundOperationContext() const;
|
||||
|
||||
void killCurrentQuery() const;
|
||||
bool isCurrentQueryKilled() const;
|
||||
|
||||
|
@ -705,7 +705,6 @@ namespace
|
||||
APPLY_FOR_JOIN_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
}
|
||||
|
||||
@ -2641,8 +2640,6 @@ private:
|
||||
default:
|
||||
throw Exception(ErrorCodes::UNSUPPORTED_JOIN_KEYS, "Unsupported JOIN keys (type: {})", parent.data->type);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <typename Map>
|
||||
|
@ -322,8 +322,6 @@ public:
|
||||
APPLY_FOR_JOIN_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
size_t getTotalByteCountImpl(Type which) const
|
||||
@ -338,8 +336,6 @@ public:
|
||||
APPLY_FOR_JOIN_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
size_t getBufferSizeInCells(Type which) const
|
||||
@ -354,8 +350,6 @@ public:
|
||||
APPLY_FOR_JOIN_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
/// NOLINTEND(bugprone-macro-parentheses)
|
||||
};
|
||||
|
@ -33,7 +33,6 @@ BlockIO InterpreterTransactionControlQuery::execute()
|
||||
case ASTTransactionControl::SET_SNAPSHOT:
|
||||
return executeSetSnapshot(session_context, tcl.snapshot);
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
BlockIO InterpreterTransactionControlQuery::executeBegin(ContextMutablePtr session_context)
|
||||
|
@ -41,8 +41,6 @@ size_t SetVariantsTemplate<Variant>::getTotalRowCount() const
|
||||
APPLY_FOR_SET_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <typename Variant>
|
||||
@ -57,8 +55,6 @@ size_t SetVariantsTemplate<Variant>::getTotalByteCount() const
|
||||
APPLY_FOR_SET_VARIANTS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
template <typename Variant>
|
||||
|
@ -40,8 +40,6 @@ public:
|
||||
case TableOverride: return "EXPLAIN TABLE OVERRIDE";
|
||||
case CurrentTransaction: return "EXPLAIN CURRENT TRANSACTION";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
static ExplainKind fromString(const String & str)
|
||||
|
@ -42,7 +42,7 @@ Token quotedString(const char *& pos, const char * const token_begin, const char
|
||||
continue;
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
chassert(false);
|
||||
}
|
||||
}
|
||||
|
||||
@ -538,8 +538,6 @@ const char * getTokenName(TokenType type)
|
||||
APPLY_FOR_TOKENS(M)
|
||||
#undef M
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
|
@ -51,7 +51,7 @@ FilterAnalysisResult analyzeFilter(const QueryTreeNodePtr & filter_expression_no
|
||||
return result;
|
||||
}
|
||||
|
||||
bool isDeterministicConstant(const ConstantNode & root)
|
||||
bool canRemoveConstantFromGroupByKey(const ConstantNode & root)
|
||||
{
|
||||
const auto & source_expression = root.getSourceExpression();
|
||||
if (!source_expression)
|
||||
@ -64,15 +64,20 @@ bool isDeterministicConstant(const ConstantNode & root)
|
||||
const auto * node = nodes.top();
|
||||
nodes.pop();
|
||||
|
||||
if (node->getNodeType() == QueryTreeNodeType::QUERY)
|
||||
/// Allow removing constants from scalar subqueries. We send them to all the shards.
|
||||
continue;
|
||||
|
||||
const auto * constant_node = node->as<ConstantNode>();
|
||||
const auto * function_node = node->as<FunctionNode>();
|
||||
if (constant_node)
|
||||
{
|
||||
if (!isDeterministicConstant(*constant_node))
|
||||
if (!canRemoveConstantFromGroupByKey(*constant_node))
|
||||
return false;
|
||||
}
|
||||
else if (function_node)
|
||||
{
|
||||
/// Do not allow removing constants like `hostName()`
|
||||
if (!function_node->getFunctionOrThrow()->isDeterministic())
|
||||
return false;
|
||||
|
||||
@ -122,7 +127,7 @@ std::optional<AggregationAnalysisResult> analyzeAggregation(const QueryTreeNodeP
|
||||
|
||||
bool is_secondary_query = planner_context->getQueryContext()->getClientInfo().query_kind == ClientInfo::QueryKind::SECONDARY_QUERY;
|
||||
bool is_distributed_query = planner_context->getQueryContext()->isDistributed();
|
||||
bool check_deterministic_constants = is_secondary_query || is_distributed_query;
|
||||
bool check_constants_for_group_by_key = is_secondary_query || is_distributed_query;
|
||||
|
||||
if (query_node.hasGroupBy())
|
||||
{
|
||||
@ -139,7 +144,7 @@ std::optional<AggregationAnalysisResult> analyzeAggregation(const QueryTreeNodeP
|
||||
const auto * constant_key = grouping_set_key_node->as<ConstantNode>();
|
||||
group_by_with_constant_keys |= (constant_key != nullptr);
|
||||
|
||||
if (constant_key && !aggregates_descriptions.empty() && (!check_deterministic_constants || isDeterministicConstant(*constant_key)))
|
||||
if (constant_key && !aggregates_descriptions.empty() && (!check_constants_for_group_by_key || canRemoveConstantFromGroupByKey(*constant_key)))
|
||||
continue;
|
||||
|
||||
auto expression_dag_nodes = actions_visitor.visit(before_aggregation_actions, grouping_set_key_node);
|
||||
@ -191,7 +196,7 @@ std::optional<AggregationAnalysisResult> analyzeAggregation(const QueryTreeNodeP
|
||||
const auto * constant_key = group_by_key_node->as<ConstantNode>();
|
||||
group_by_with_constant_keys |= (constant_key != nullptr);
|
||||
|
||||
if (constant_key && !aggregates_descriptions.empty() && (!check_deterministic_constants || isDeterministicConstant(*constant_key)))
|
||||
if (constant_key && !aggregates_descriptions.empty() && (!check_constants_for_group_by_key || canRemoveConstantFromGroupByKey(*constant_key)))
|
||||
continue;
|
||||
|
||||
auto expression_dag_nodes = actions_visitor.visit(before_aggregation_actions, group_by_key_node);
|
||||
|
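The renamed check `canRemoveConstantFromGroupByKey` walks the source expression of a constant with an explicit stack: subqueries are skipped, nested constants are checked recursively, and any non-deterministic function (such as `hostName()`) makes the constant non-removable. The sketch below illustrates that traversal on a toy tree. `Node`, `NodeKind`, `makeNode` and `canRemoveConstant` are hypothetical stand-ins for the query tree types, and the parts of the function not visible in the hunks (what happens without a source expression, how function arguments are pushed) are assumptions.

```cpp
#include <memory>
#include <stack>
#include <utility>
#include <vector>

// Toy stand-ins for query tree nodes (hypothetical, not the ClickHouse types).
enum class NodeKind { Constant, Function, Query };

struct Node
{
    NodeKind kind = NodeKind::Constant;
    bool deterministic = true;                   // meaningful for Function nodes only
    std::vector<std::shared_ptr<Node>> children; // function arguments, or a constant's source expression
};

std::shared_ptr<Node> makeNode(NodeKind kind, bool deterministic = true, std::vector<std::shared_ptr<Node>> children = {})
{
    auto node = std::make_shared<Node>();
    node->kind = kind;
    node->deterministic = deterministic;
    node->children = std::move(children);
    return node;
}

// Returns true if the constant can be dropped from the GROUP BY key.
// Assumption: a constant without a source expression is always removable.
bool canRemoveConstant(const Node & root)
{
    std::stack<const Node *> nodes;
    for (const auto & child : root.children)
        nodes.push(child.get());

    while (!nodes.empty())
    {
        const Node * node = nodes.top();
        nodes.pop();

        if (node->kind == NodeKind::Query)
            continue; // scalar subqueries are allowed, as in the diff

        if (node->kind == NodeKind::Constant)
        {
            if (!canRemoveConstant(*node))
                return false;
        }
        else
        {
            if (!node->deterministic)
                return false; // e.g. hostName()
            for (const auto & argument : node->children)
                nodes.push(argument.get());
        }
    }
    return true;
}
```

In this model a constant folded from a non-deterministic function is kept in the key, while the same function hidden inside a scalar subquery does not block removal.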
@ -657,7 +657,6 @@ DataTypePtr MsgPackSchemaReader::getDataType(const msgpack::object & object)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Msgpack extension type {:x} is not supported", object_ext.type());
|
||||
}
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
std::optional<DataTypes> MsgPackSchemaReader::readRowAndGetDataTypes()
|
||||
|
@ -36,8 +36,6 @@ std::string IProcessor::statusToName(Status status)
|
||||
case Status::ExpandPipeline:
|
||||
return "ExpandPipeline";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
}
|
||||
|
156
src/Processors/QueryPlan/ReadFromLoopStep.cpp
Normal file
@ -0,0 +1,156 @@
|
||||
#include <Processors/QueryPlan/ReadFromLoopStep.h>
|
||||
#include <Processors/QueryPlan/QueryPlan.h>
|
||||
#include <Storages/IStorage.h>
|
||||
#include <QueryPipeline/QueryPipelineBuilder.h>
|
||||
#include <QueryPipeline/QueryPipeline.h>
|
||||
#include <Processors/QueryPlan/BuildQueryPipelineSettings.h>
|
||||
#include <Processors/QueryPlan/Optimizations/QueryPlanOptimizationSettings.h>
|
||||
#include <QueryPipeline/QueryPlanResourceHolder.h>
|
||||
#include <Processors/ISource.h>
|
||||
#include <Processors/Sources/NullSource.h>
|
||||
#include <Processors/Executors/PullingPipelineExecutor.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int TOO_MANY_RETRIES_TO_FETCH_PARTS;
|
||||
}
|
||||
class PullingPipelineExecutor;
|
||||
|
||||
class LoopSource : public ISource
|
||||
{
|
||||
public:
|
||||
|
||||
LoopSource(
|
||||
const Names & column_names_,
|
||||
const SelectQueryInfo & query_info_,
|
||||
const StorageSnapshotPtr & storage_snapshot_,
|
||||
ContextPtr & context_,
|
||||
QueryProcessingStage::Enum processed_stage_,
|
||||
StoragePtr inner_storage_,
|
||||
size_t max_block_size_,
|
||||
size_t num_streams_)
|
||||
: ISource(storage_snapshot_->getSampleBlockForColumns(column_names_))
|
||||
, column_names(column_names_)
|
||||
, query_info(query_info_)
|
||||
, storage_snapshot(storage_snapshot_)
|
||||
, processed_stage(processed_stage_)
|
||||
, context(context_)
|
||||
, inner_storage(std::move(inner_storage_))
|
||||
, max_block_size(max_block_size_)
|
||||
, num_streams(num_streams_)
|
||||
{
|
||||
}
|
||||
|
||||
String getName() const override { return "Loop"; }
|
||||
|
||||
Chunk generate() override
|
||||
{
|
||||
while (true)
|
||||
{
|
||||
if (!loop)
|
||||
{
|
||||
QueryPlan plan;
|
||||
auto storage_snapshot_ = inner_storage->getStorageSnapshotForQuery(inner_storage->getInMemoryMetadataPtr(), nullptr, context);
|
||||
inner_storage->read(
|
||||
plan,
|
||||
column_names,
|
||||
storage_snapshot_,
|
||||
query_info,
|
||||
context,
|
||||
processed_stage,
|
||||
max_block_size,
|
||||
num_streams);
|
||||
auto builder = plan.buildQueryPipeline(
|
||||
QueryPlanOptimizationSettings::fromContext(context),
|
||||
BuildQueryPipelineSettings::fromContext(context));
|
||||
QueryPlanResourceHolder resources;
|
||||
auto pipe = QueryPipelineBuilder::getPipe(std::move(*builder), resources);
|
||||
query_pipeline = QueryPipeline(std::move(pipe));
|
||||
executor = std::make_unique<PullingPipelineExecutor>(query_pipeline);
|
||||
loop = true;
|
||||
}
|
||||
Chunk chunk;
|
||||
if (executor->pull(chunk))
|
||||
{
|
||||
if (chunk)
|
||||
{
|
||||
retries_count = 0;
|
||||
return chunk;
|
||||
}
|
||||
|
||||
}
|
||||
else
|
||||
{
|
||||
++retries_count;
|
||||
if (retries_count > max_retries_count)
|
||||
throw Exception(ErrorCodes::TOO_MANY_RETRIES_TO_FETCH_PARTS, "Too many retries to pull from storage");
|
||||
loop = false;
|
||||
executor.reset();
|
||||
query_pipeline.reset();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private:
|
||||
|
||||
const Names column_names;
|
||||
SelectQueryInfo query_info;
|
||||
const StorageSnapshotPtr storage_snapshot;
|
||||
QueryProcessingStage::Enum processed_stage;
|
||||
ContextPtr context;
|
||||
StoragePtr inner_storage;
|
||||
size_t max_block_size;
|
||||
size_t num_streams;
|
||||
// Retry counter: if inner_storage fails to yield data `max_retries_count` times in a row, fail here instead of hanging.
|
||||
size_t retries_count = 0;
|
||||
size_t max_retries_count = 3;
|
||||
bool loop = false;
|
||||
QueryPipeline query_pipeline;
|
||||
std::unique_ptr<PullingPipelineExecutor> executor;
|
||||
};
|
||||
|
||||
ReadFromLoopStep::ReadFromLoopStep(
|
||||
const Names & column_names_,
|
||||
const SelectQueryInfo & query_info_,
|
||||
const StorageSnapshotPtr & storage_snapshot_,
|
||||
const ContextPtr & context_,
|
||||
QueryProcessingStage::Enum processed_stage_,
|
||||
StoragePtr inner_storage_,
|
||||
size_t max_block_size_,
|
||||
size_t num_streams_)
|
||||
: SourceStepWithFilter(
|
||||
DataStream{.header = storage_snapshot_->getSampleBlockForColumns(column_names_)},
|
||||
column_names_,
|
||||
query_info_,
|
||||
storage_snapshot_,
|
||||
context_)
|
||||
, column_names(column_names_)
|
||||
, processed_stage(processed_stage_)
|
||||
, inner_storage(std::move(inner_storage_))
|
||||
, max_block_size(max_block_size_)
|
||||
, num_streams(num_streams_)
|
||||
{
|
||||
}
|
||||
|
||||
Pipe ReadFromLoopStep::makePipe()
|
||||
{
|
||||
return Pipe(std::make_shared<LoopSource>(
|
||||
column_names, query_info, storage_snapshot, context, processed_stage, inner_storage, max_block_size, num_streams));
|
||||
}
|
||||
|
||||
void ReadFromLoopStep::initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &)
|
||||
{
|
||||
auto pipe = makePipe();
|
||||
|
||||
if (pipe.empty())
|
||||
{
|
||||
assert(output_stream != std::nullopt);
|
||||
pipe = Pipe(std::make_shared<NullSource>(output_stream->header));
|
||||
}
|
||||
|
||||
pipeline.init(std::move(pipe));
|
||||
}
|
||||
|
||||
}
|
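`LoopSource::generate()` above rebuilds the inner pipeline each time it is exhausted and counts consecutive passes that end without a new chunk, so that an empty inner storage fails instead of spinning forever. The following is a minimal sketch of that control flow with plain integers instead of chunks. `OnePass` and `LoopReader` are hypothetical names, and the factory callback stands in for `inner_storage->read(...)` plus pipeline construction.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pulling executor: yields items until exhausted.
class OnePass
{
public:
    explicit OnePass(std::vector<int> items_) : items(std::move(items_)) {}

    // Returns false once the pass is exhausted, like `executor->pull(chunk)`.
    bool pull(int & out)
    {
        if (pos == items.size())
            return false;
        out = items[pos++];
        return true;
    }

private:
    std::vector<int> items;
    std::size_t pos = 0;
};

// Hypothetical looping reader: restarts the pass when it ends and
// throws after too many consecutive passes that produced nothing.
class LoopReader
{
public:
    LoopReader(std::function<std::vector<int>()> make_items_, std::size_t max_retries_)
        : make_items(std::move(make_items_)), max_retries(max_retries_) {}

    int next()
    {
        while (true)
        {
            if (!pass)
                pass.emplace(make_items()); // rebuild the "pipeline"

            int value = 0;
            if (pass->pull(value))
            {
                retries = 0; // progress was made, reset the counter
                return value;
            }

            ++retries;
            if (retries > max_retries)
                throw std::runtime_error("Too many retries to pull from storage");
            pass.reset(); // start a fresh pass on the next iteration
        }
    }

private:
    std::function<std::vector<int>()> make_items;
    std::size_t max_retries;
    std::size_t retries = 0;
    std::optional<OnePass> pass;
};
```

With a factory that returns `{1, 2}`, successive `next()` calls return 1, 2, 1, 2 and so on. With a factory that returns an empty vector and `max_retries = 3`, the fourth empty pass throws, which matches the `retries_count > max_retries_count` comparison in the diff.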
37
src/Processors/QueryPlan/ReadFromLoopStep.h
Normal file
@ -0,0 +1,37 @@
|
||||
#pragma once
|
||||
#include <Core/QueryProcessingStage.h>
|
||||
#include <Processors/QueryPlan/SourceStepWithFilter.h>
|
||||
#include <QueryPipeline/Pipe.h>
|
||||
#include <Storages/SelectQueryInfo.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
class ReadFromLoopStep final : public SourceStepWithFilter
|
||||
{
|
||||
public:
|
||||
ReadFromLoopStep(
|
||||
const Names & column_names_,
|
||||
const SelectQueryInfo & query_info_,
|
||||
const StorageSnapshotPtr & storage_snapshot_,
|
||||
const ContextPtr & context_,
|
||||
QueryProcessingStage::Enum processed_stage_,
|
||||
StoragePtr inner_storage_,
|
||||
size_t max_block_size_,
|
||||
size_t num_streams_);
|
||||
|
||||
String getName() const override { return "ReadFromLoop"; }
|
||||
|
||||
void initializePipeline(QueryPipelineBuilder & pipeline, const BuildQueryPipelineSettings &) override;
|
||||
|
||||
private:
|
||||
|
||||
Pipe makePipe();
|
||||
|
||||
const Names column_names;
|
||||
QueryProcessingStage::Enum processed_stage;
|
||||
StoragePtr inner_storage;
|
||||
size_t max_block_size;
|
||||
size_t num_streams;
|
||||
};
|
||||
}
|
@ -1136,8 +1136,6 @@ static void addMergingFinal(
|
||||
return std::make_shared<GraphiteRollupSortedTransform>(header, num_outputs,
|
||||
sort_description, max_block_size_rows, /*max_block_size_bytes=*/0, merging_params.graphite_params, now);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
};
|
||||
|
||||
pipe.addTransform(get_merging_processor());
|
||||
@ -2125,8 +2123,6 @@ static const char * indexTypeToString(ReadFromMergeTree::IndexType type)
|
||||
case ReadFromMergeTree::IndexType::Skip:
|
||||
return "Skip";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
static const char * readTypeToString(ReadFromMergeTree::ReadType type)
|
||||
@ -2142,8 +2138,6 @@ static const char * readTypeToString(ReadFromMergeTree::ReadType type)
|
||||
case ReadFromMergeTree::ReadType::ParallelReplicas:
|
||||
return "Parallel";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void ReadFromMergeTree::describeActions(FormatSettings & format_settings) const
|
||||
|
@ -86,8 +86,6 @@ static String totalsModeToString(TotalsMode totals_mode, double auto_include_thr
|
||||
case TotalsMode::AFTER_HAVING_AUTO:
|
||||
return "after_having_auto threshold " + std::to_string(auto_include_threshold);
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void TotalsHavingStep::describeActions(FormatSettings & settings) const
|
||||
|
@ -67,7 +67,6 @@ static FillColumnDescription::StepFunction getStepFunction(
|
||||
FOR_EACH_INTERVAL_KIND(DECLARE_CASE)
|
||||
#undef DECLARE_CASE
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
static bool tryConvertFields(FillColumnDescription & descr, const DataTypePtr & type)
|
||||
|
@ -898,8 +898,6 @@ static std::exception_ptr addStorageToException(std::exception_ptr ptr, const St
|
||||
{
|
||||
return std::current_exception();
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void FinalizingViewsTransform::work()
|
||||
|
@ -93,7 +93,6 @@ String BackgroundJobsAssignee::toString(Type type)
|
||||
case Type::Moving:
|
||||
return "Moving";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void BackgroundJobsAssignee::start()
|
||||
|
@ -2964,8 +2964,6 @@ String KeyCondition::RPNElement::toString(std::string_view column_name, bool pri
|
||||
case ALWAYS_TRUE:
|
||||
return "true";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
|
||||
|
@ -312,6 +312,7 @@ ReplicatedMergeMutateTaskBase::PrepareResult MergeFromLogEntryTask::prepare()
|
||||
task_context = Context::createCopy(storage.getContext());
|
||||
task_context->makeQueryContext();
|
||||
task_context->setCurrentQueryId(getQueryId());
|
||||
task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE);
|
||||
|
||||
/// Add merge to list
|
||||
merge_mutate_entry = storage.getContext()->getMergeList().insert(
|
||||
|
@ -168,6 +168,7 @@ ContextMutablePtr MergePlainMergeTreeTask::createTaskContext() const
|
||||
context->makeQueryContext();
|
||||
auto queryId = getQueryId();
|
||||
context->setCurrentQueryId(queryId);
|
||||
context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE);
|
||||
return context;
|
||||
}
|
||||
|
||||
|
@ -555,6 +555,7 @@ bool MergeTask::VerticalMergeStage::prepareVerticalMergeForAllColumns() const
|
||||
std::unique_ptr<ReadBuffer> reread_buf = wbuf_readable ? wbuf_readable->tryGetReadBuffer() : nullptr;
|
||||
if (!reread_buf)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot read temporary file {}", ctx->rows_sources_uncompressed_write_buf->getFileName());
|
||||
|
||||
auto * reread_buffer_raw = dynamic_cast<ReadBufferFromFile *>(reread_buf.get());
|
||||
if (!reread_buffer_raw)
|
||||
{
|
||||
@ -573,6 +574,7 @@ bool MergeTask::VerticalMergeStage::prepareVerticalMergeForAllColumns() const
|
||||
ctx->it_name_and_type = global_ctx->gathering_columns.cbegin();
|
||||
|
||||
const auto & settings = global_ctx->context->getSettingsRef();
|
||||
|
||||
size_t max_delayed_streams = 0;
|
||||
if (global_ctx->new_data_part->getDataPartStorage().supportParallelWrite())
|
||||
{
|
||||
@ -581,19 +583,20 @@ bool MergeTask::VerticalMergeStage::prepareVerticalMergeForAllColumns() const
|
||||
else
|
||||
max_delayed_streams = DEFAULT_DELAYED_STREAMS_FOR_PARALLEL_WRITE;
|
||||
}
|
||||
|
||||
ctx->max_delayed_streams = max_delayed_streams;
|
||||
|
||||
bool all_parts_on_remote_disks = std::ranges::all_of(global_ctx->future_part->parts, [](const auto & part) { return part->isStoredOnRemoteDisk(); });
|
||||
ctx->use_prefetch = all_parts_on_remote_disks && global_ctx->data->getSettings()->vertical_merge_remote_filesystem_prefetch;
|
||||
|
||||
if (ctx->use_prefetch && ctx->it_name_and_type != global_ctx->gathering_columns.end())
|
||||
ctx->prepared_pipe = createPipeForReadingOneColumn(ctx->it_name_and_type->name);
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const
|
||||
Pipe MergeTask::VerticalMergeStage::createPipeForReadingOneColumn(const String & column_name) const
|
||||
{
|
||||
const auto & [column_name, column_type] = *ctx->it_name_and_type;
|
||||
Names column_names{column_name};
|
||||
|
||||
ctx->progress_before = global_ctx->merge_list_element_ptr->progress.load(std::memory_order_relaxed);
|
||||
global_ctx->column_progress = std::make_unique<MergeStageProgress>(ctx->progress_before, ctx->column_sizes->columnWeight(column_name));
|
||||
|
||||
Pipes pipes;
|
||||
for (size_t part_num = 0; part_num < global_ctx->future_part->parts.size(); ++part_num)
|
||||
{
|
||||
@ -602,23 +605,44 @@ void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const
|
||||
*global_ctx->data,
|
||||
global_ctx->storage_snapshot,
|
||||
global_ctx->future_part->parts[part_num],
|
||||
column_names,
|
||||
Names{column_name},
|
||||
/*mark_ranges=*/ {},
|
||||
global_ctx->input_rows_filtered,
|
||||
/*apply_deleted_mask=*/ true,
|
||||
ctx->read_with_direct_io,
|
||||
/*take_column_types_from_storage=*/ true,
|
||||
/*quiet=*/ false,
|
||||
global_ctx->input_rows_filtered);
|
||||
ctx->use_prefetch);
|
||||
|
||||
pipes.emplace_back(std::move(pipe));
|
||||
}
|
||||
|
||||
auto pipe = Pipe::unitePipes(std::move(pipes));
|
||||
ctx->rows_sources_read_buf->seek(0, 0);
|
||||
return Pipe::unitePipes(std::move(pipes));
|
||||
}
|
||||
|
||||
const auto data_settings = global_ctx->data->getSettings();
|
||||
void MergeTask::VerticalMergeStage::prepareVerticalMergeForOneColumn() const
|
||||
{
|
||||
const auto & column_name = ctx->it_name_and_type->name;
|
||||
|
||||
ctx->progress_before = global_ctx->merge_list_element_ptr->progress.load(std::memory_order_relaxed);
|
||||
global_ctx->column_progress = std::make_unique<MergeStageProgress>(ctx->progress_before, ctx->column_sizes->columnWeight(column_name));
|
||||
|
||||
Pipe pipe;
|
||||
if (ctx->prepared_pipe)
|
||||
{
|
||||
pipe = std::move(*ctx->prepared_pipe);
|
||||
|
||||
auto next_column_it = std::next(ctx->it_name_and_type);
|
||||
if (next_column_it != global_ctx->gathering_columns.end())
|
||||
ctx->prepared_pipe = createPipeForReadingOneColumn(next_column_it->name);
|
||||
}
|
||||
else
|
||||
{
|
||||
pipe = createPipeForReadingOneColumn(column_name);
|
||||
}
|
||||
|
||||
ctx->rows_sources_read_buf->seek(0, 0);
|
||||
bool is_result_sparse = global_ctx->new_data_part->getSerialization(column_name)->getKind() == ISerialization::Kind::SPARSE;
|
||||
|
||||
const auto data_settings = global_ctx->data->getSettings();
|
||||
auto transform = std::make_unique<ColumnGathererTransform>(
|
||||
pipe.getHeader(),
|
||||
pipe.numOutputPorts(),
|
||||
@ -983,11 +1007,10 @@ void MergeTask::ExecuteAndFinalizeHorizontalPart::createMergedStream()
|
||||
part,
|
||||
merging_column_names,
|
||||
/*mark_ranges=*/ {},
|
||||
global_ctx->input_rows_filtered,
|
||||
/*apply_deleted_mask=*/ true,
|
||||
ctx->read_with_direct_io,
|
||||
/*take_column_types_from_storage=*/ true,
|
||||
/*quiet=*/ false,
|
||||
global_ctx->input_rows_filtered);
|
||||
/*prefetch=*/ false);
|
||||
|
||||
if (global_ctx->metadata_snapshot->hasSortingKey())
|
||||
{
|
||||
|
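The vertical merge changes above prepare the read pipe for the next column while the current one is being merged: `prepareVerticalMergeForAllColumns` creates the pipe for the first column, and each `prepareVerticalMergeForOneColumn` call takes the prepared pipe and immediately creates the one for the following column. A minimal sketch of that hand-off is shown below, with strings in place of `Pipe` objects. `PrefetchingColumnReader` and its members are hypothetical names. Unlike the diff, the sketch resets the optional explicitly after moving from it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the prepared_pipe hand-off; a "pipe" is just a string here.
class PrefetchingColumnReader
{
public:
    PrefetchingColumnReader(std::vector<std::string> columns_, bool use_prefetch_)
        : columns(std::move(columns_)), use_prefetch(use_prefetch_) {}

    // Models prepareVerticalMergeForAllColumns: warm up the first column if prefetch is on.
    void prepareAll()
    {
        current = 0;
        prepared.reset();
        if (use_prefetch && !columns.empty())
            prepared = createPipe(columns[0]);
    }

    // Models prepareVerticalMergeForOneColumn: take the prepared pipe and
    // start preparing the next column, or create the pipe on demand.
    std::string prepareOne()
    {
        assert(current < columns.size());
        std::string pipe;
        if (prepared)
        {
            pipe = std::move(*prepared);
            prepared.reset();
            if (current + 1 < columns.size())
                prepared = createPipe(columns[current + 1]);
        }
        else
        {
            pipe = createPipe(columns[current]);
        }
        ++current;
        return pipe;
    }

    // Order in which pipes were created, to make the look-ahead visible.
    const std::vector<std::string> & creationLog() const { return creation_log; }

private:
    std::string createPipe(const std::string & column)
    {
        creation_log.push_back(column);
        return "pipe:" + column;
    }

    std::vector<std::string> columns;
    bool use_prefetch;
    std::size_t current = 0;
    std::optional<std::string> prepared;
    std::vector<std::string> creation_log;
};
```

With prefetch enabled, the creation log always runs one column ahead of the column being returned. With prefetch disabled, each pipe is created only when its column is reached, which corresponds to the `else` branch in the diff.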
@ -299,7 +299,9 @@ private:
|
||||
|
||||
Float64 progress_before = 0;
|
||||
std::unique_ptr<MergedColumnOnlyOutputStream> column_to{nullptr};
|
||||
std::optional<Pipe> prepared_pipe;
|
||||
size_t max_delayed_streams = 0;
|
||||
bool use_prefetch = false;
|
||||
std::list<std::unique_ptr<MergedColumnOnlyOutputStream>> delayed_streams;
|
||||
size_t column_elems_written{0};
|
||||
QueryPipeline column_parts_pipeline;
|
||||
@ -340,6 +342,8 @@ private:
|
||||
bool executeVerticalMergeForOneColumn() const;
|
||||
void finalizeVerticalMergeForOneColumn() const;
|
||||
|
||||
Pipe createPipeForReadingOneColumn(const String & column_name) const;
|
||||
|
||||
VerticalMergeRuntimeContextPtr ctx;
|
||||
GlobalRuntimeContextPtr global_ctx;
|
||||
};
|
||||
|
@ -1177,8 +1177,6 @@ String MergeTreeData::MergingParams::getModeName() const
|
||||
case Graphite: return "Graphite";
|
||||
case VersionedCollapsing: return "VersionedCollapsing";
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
Int64 MergeTreeData::getMaxBlockNumber() const
|
||||
|
@ -360,8 +360,6 @@ Block MergeTreeDataWriter::mergeBlock(
|
||||
return std::make_shared<GraphiteRollupSortedAlgorithm>(
|
||||
block, 1, sort_description, block_size + 1, /*block_size_bytes=*/0, merging_params.graphite_params, time(nullptr));
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
};
|
||||
|
||||
auto merging_algorithm = get_merging_algorithm();
|
||||
|
@ -42,8 +42,7 @@ public:
|
||||
std::optional<MarkRanges> mark_ranges_,
|
||||
bool apply_deleted_mask,
|
||||
bool read_with_direct_io_,
|
||||
bool take_column_types_from_storage,
|
||||
bool quiet = false);
|
||||
bool prefetch);
|
||||
|
||||
~MergeTreeSequentialSource() override;
|
||||
|
||||
@ -96,8 +95,7 @@ MergeTreeSequentialSource::MergeTreeSequentialSource(
|
||||
std::optional<MarkRanges> mark_ranges_,
|
||||
bool apply_deleted_mask,
|
||||
bool read_with_direct_io_,
|
||||
bool take_column_types_from_storage,
|
||||
bool quiet)
|
||||
bool prefetch)
|
||||
: ISource(storage_snapshot_->getSampleBlockForColumns(columns_to_read_))
|
||||
, storage(storage_)
|
||||
, storage_snapshot(storage_snapshot_)
|
||||
@ -107,16 +105,13 @@ MergeTreeSequentialSource::MergeTreeSequentialSource(
|
||||
, mark_ranges(std::move(mark_ranges_))
|
||||
, mark_cache(storage.getContext()->getMarkCache())
|
||||
{
|
||||
if (!quiet)
|
||||
{
|
||||
/// Print column name but don't pollute logs in case of many columns.
|
||||
if (columns_to_read.size() == 1)
|
||||
LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part, column {}",
|
||||
data_part->getMarksCount(), data_part->name, data_part->rows_count, columns_to_read.front());
|
||||
else
|
||||
LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part",
|
||||
data_part->getMarksCount(), data_part->name, data_part->rows_count);
|
||||
}
|
||||
/// Print column name but don't pollute logs in case of many columns.
|
||||
if (columns_to_read.size() == 1)
|
||||
LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part, column {}",
|
||||
data_part->getMarksCount(), data_part->name, data_part->rows_count, columns_to_read.front());
|
||||
else
|
||||
LOG_DEBUG(log, "Reading {} marks from part {}, total {} rows starting from the beginning of the part",
|
||||
data_part->getMarksCount(), data_part->name, data_part->rows_count);
|
||||
|
||||
auto alter_conversions = storage.getAlterConversionsForPart(data_part);
|
||||
|
||||
@ -131,21 +126,12 @@ MergeTreeSequentialSource::MergeTreeSequentialSource(
|
||||
storage.supportsSubcolumns(),
|
||||
columns_to_read);
|
||||
|
||||
NamesAndTypesList columns_for_reader;
|
||||
if (take_column_types_from_storage)
|
||||
{
|
||||
auto options = GetColumnsOptions(GetColumnsOptions::AllPhysical)
|
||||
.withExtendedObjects()
|
||||
.withVirtuals()
|
||||
.withSubcolumns(storage.supportsSubcolumns());
|
||||
auto options = GetColumnsOptions(GetColumnsOptions::AllPhysical)
|
||||
.withExtendedObjects()
|
||||
.withVirtuals()
|
||||
.withSubcolumns(storage.supportsSubcolumns());
|
||||
|
||||
columns_for_reader = storage_snapshot->getColumnsByNames(options, columns_to_read);
|
||||
}
|
||||
else
|
||||
{
|
||||
/// take columns from data_part
|
||||
columns_for_reader = data_part->getColumns().addTypes(columns_to_read);
|
||||
}
|
||||
auto columns_for_reader = storage_snapshot->getColumnsByNames(options, columns_to_read);
|
||||
|
||||
const auto & context = storage.getContext();
|
||||
ReadSettings read_settings = context->getReadSettings();
|
||||
@ -191,6 +177,9 @@ MergeTreeSequentialSource::MergeTreeSequentialSource(
|
||||
reader_settings,
|
||||
/*avg_value_size_hints=*/ {},
|
||||
/*profile_callback=*/ {});
|
||||
|
||||
if (prefetch)
|
||||
reader->prefetchBeginOfRange(Priority{});
|
||||
}
|
||||
|
||||
static void fillBlockNumberColumns(
|
||||
@ -313,11 +302,10 @@ Pipe createMergeTreeSequentialSource(
|
||||
MergeTreeData::DataPartPtr data_part,
|
||||
Names columns_to_read,
|
||||
std::optional<MarkRanges> mark_ranges,
|
||||
std::shared_ptr<std::atomic<size_t>> filtered_rows_count,
|
||||
bool apply_deleted_mask,
|
||||
bool read_with_direct_io,
|
||||
bool take_column_types_from_storage,
|
||||
bool quiet,
|
||||
std::shared_ptr<std::atomic<size_t>> filtered_rows_count)
|
||||
bool prefetch)
|
||||
{
|
||||
|
||||
/// The part might have some rows masked by lightweight deletes
|
||||
@ -329,7 +317,7 @@ Pipe createMergeTreeSequentialSource(
|
||||
|
||||
auto column_part_source = std::make_shared<MergeTreeSequentialSource>(type,
|
||||
storage, storage_snapshot, data_part, columns_to_read, std::move(mark_ranges),
|
||||
/*apply_deleted_mask=*/ false, read_with_direct_io, take_column_types_from_storage, quiet);
|
||||
/*apply_deleted_mask=*/ false, read_with_direct_io, prefetch);
|
||||
|
||||
Pipe pipe(std::move(column_part_source));
|
||||
|
||||
@ -408,11 +396,10 @@ public:
|
||||
data_part,
|
||||
columns_to_read,
|
||||
std::move(mark_ranges),
|
||||
/*filtered_rows_count=*/ nullptr,
|
||||
apply_deleted_mask,
|
||||
/*read_with_direct_io=*/ false,
|
||||
/*take_column_types_from_storage=*/ true,
|
||||
/*quiet=*/ false,
|
||||
/*filtered_rows_count=*/ nullptr);
|
||||
/*prefetch=*/ false);
|
||||
|
||||
pipeline.init(Pipe(std::move(source)));
|
||||
}
|
||||
|
@ -23,11 +23,10 @@ Pipe createMergeTreeSequentialSource(
|
||||
MergeTreeData::DataPartPtr data_part,
|
||||
Names columns_to_read,
|
||||
std::optional<MarkRanges> mark_ranges,
|
||||
std::shared_ptr<std::atomic<size_t>> filtered_rows_count,
|
||||
bool apply_deleted_mask,
|
||||
bool read_with_direct_io,
|
||||
bool take_column_types_from_storage,
|
||||
bool quiet,
|
||||
std::shared_ptr<std::atomic<size_t>> filtered_rows_count);
|
||||
bool prefetch);
|
||||
|
||||
class QueryPlan;
|
||||
|
||||
|
@ -35,7 +35,7 @@ struct Settings;
|
||||
M(UInt64, min_bytes_for_wide_part, 10485760, "Minimal uncompressed size in bytes to create part in wide format instead of compact", 0) \
|
||||
M(UInt64, min_rows_for_wide_part, 0, "Minimal number of rows to create part in wide format instead of compact", 0) \
|
||||
M(Float, ratio_of_defaults_for_sparse_serialization, 0.9375f, "Minimal ratio of number of default values to number of all values in column to store it in sparse serializations. If >= 1, columns will be always written in full serialization.", 0) \
|
||||
M(Bool, replace_long_file_name_to_hash, false, "If the file name for column is too long (more than 'max_file_name_length' bytes) replace it to SipHash128", 0) \
|
||||
M(Bool, replace_long_file_name_to_hash, true, "If the file name for column is too long (more than 'max_file_name_length' bytes) replace it to SipHash128", 0) \
|
||||
M(UInt64, max_file_name_length, 127, "The maximal length of the file name to keep it as is without hashing", 0) \
|
||||
M(UInt64, min_bytes_for_full_part_storage, 0, "Only available in ClickHouse Cloud", 0) \
|
||||
M(UInt64, min_rows_for_full_part_storage, 0, "Only available in ClickHouse Cloud", 0) \
|
||||
@ -148,6 +148,7 @@ struct Settings;
|
||||
M(UInt64, vertical_merge_algorithm_min_rows_to_activate, 16 * 8192, "Minimal (approximate) sum of rows in merging parts to activate Vertical merge algorithm.", 0) \
|
||||
M(UInt64, vertical_merge_algorithm_min_bytes_to_activate, 0, "Minimal (approximate) uncompressed size in bytes in merging parts to activate Vertical merge algorithm.", 0) \
|
||||
M(UInt64, vertical_merge_algorithm_min_columns_to_activate, 11, "Minimal amount of non-PK columns to activate Vertical merge algorithm.", 0) \
|
||||
M(Bool, vertical_merge_remote_filesystem_prefetch, true, "If true prefetching of data from remote filesystem is used for the next column during merge", 0) \
|
||||
M(UInt64, max_postpone_time_for_failed_mutations_ms, 5ULL * 60 * 1000, "The maximum postpone time for failed mutations.", 0) \
|
||||
\
|
||||
/** Compatibility settings */ \
|
||||
|
@ -206,6 +206,7 @@ ReplicatedMergeMutateTaskBase::PrepareResult MutateFromLogEntryTask::prepare()
|
||||
task_context = Context::createCopy(storage.getContext());
|
||||
task_context->makeQueryContext();
|
||||
task_context->setCurrentQueryId(getQueryId());
|
||||
task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MUTATION);
|
||||
|
||||
merge_mutate_entry = storage.getContext()->getMergeList().insert(
|
||||
storage.getStorageID(),
|
||||
|
@ -139,6 +139,7 @@ ContextMutablePtr MutatePlainMergeTreeTask::createTaskContext() const
|
||||
context->makeQueryContext();
|
||||
auto queryId = getQueryId();
|
||||
context->setCurrentQueryId(queryId);
|
||||
context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MUTATION);
|
||||
return context;
|
||||
}
|
||||
|
||||
|
@ -616,8 +616,6 @@ PartMovesBetweenShardsOrchestrator::Entry PartMovesBetweenShardsOrchestrator::st
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void PartMovesBetweenShardsOrchestrator::removePins(const Entry & entry, zkutil::ZooKeeperPtr zk)
|
||||
|
49
src/Storages/StorageLoop.cpp
Normal file
@ -0,0 +1,49 @@
|
||||
#include "StorageLoop.h"
|
||||
#include <Storages/StorageFactory.h>
|
||||
#include <Processors/QueryPlan/QueryPlan.h>
|
||||
#include <Processors/QueryPlan/ReadFromLoopStep.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace ErrorCodes
|
||||
{
|
||||
|
||||
}
|
||||
StorageLoop::StorageLoop(
|
||||
const StorageID & table_id_,
|
||||
StoragePtr inner_storage_)
|
||||
: IStorage(table_id_)
|
||||
, inner_storage(std::move(inner_storage_))
|
||||
{
|
||||
StorageInMemoryMetadata storage_metadata = inner_storage->getInMemoryMetadata();
|
||||
setInMemoryMetadata(storage_metadata);
|
||||
}
|
||||
|
||||
|
||||
void StorageLoop::read(
|
||||
QueryPlan & query_plan,
|
||||
const Names & column_names,
|
||||
const StorageSnapshotPtr & storage_snapshot,
|
||||
SelectQueryInfo & query_info,
|
||||
ContextPtr context,
|
||||
QueryProcessingStage::Enum processed_stage,
|
||||
size_t max_block_size,
|
||||
size_t num_streams)
|
||||
{
|
||||
query_info.optimize_trivial_count = false;
|
||||
|
||||
query_plan.addStep(std::make_unique<ReadFromLoopStep>(
|
||||
column_names, query_info, storage_snapshot, context, processed_stage, inner_storage, max_block_size, num_streams
|
||||
));
|
||||
}
|
||||
|
||||
void registerStorageLoop(StorageFactory & factory)
|
||||
{
|
||||
factory.registerStorage("Loop", [](const StorageFactory::Arguments & args)
|
||||
{
|
||||
StoragePtr inner_storage;
|
||||
return std::make_shared<StorageLoop>(args.table_id, inner_storage);
|
||||
});
|
||||
}
|
||||
}
|
33
src/Storages/StorageLoop.h
Normal file
@ -0,0 +1,33 @@
|
||||
#pragma once
|
||||
#include "config.h"
|
||||
#include <Storages/IStorage.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
class StorageLoop final : public IStorage
|
||||
{
|
||||
public:
|
||||
StorageLoop(
|
||||
const StorageID & table_id,
|
||||
StoragePtr inner_storage_);
|
||||
|
||||
std::string getName() const override { return "Loop"; }
|
||||
|
||||
void read(
|
||||
QueryPlan & query_plan,
|
||||
const Names & column_names,
|
||||
const StorageSnapshotPtr & storage_snapshot,
|
||||
SelectQueryInfo & query_info,
|
||||
ContextPtr context,
|
||||
QueryProcessingStage::Enum processed_stage,
|
||||
size_t max_block_size,
|
||||
size_t num_streams) override;
|
||||
|
||||
bool supportsTrivialCountOptimization(const StorageSnapshotPtr &, ContextPtr) const override { return false; }
|
||||
|
||||
private:
|
||||
StoragePtr inner_storage;
|
||||
};
|
||||
}
|
@ -297,7 +297,6 @@ namespace
|
||||
CASE_WINDOW_KIND(Year)
|
||||
#undef CASE_WINDOW_KIND
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
class AddingAggregatedChunkInfoTransform : public ISimpleTransform
|
||||
@ -920,7 +919,6 @@ UInt32 StorageWindowView::getWindowLowerBound(UInt32 time_sec)
|
||||
CASE_WINDOW_KIND(Year)
|
||||
#undef CASE_WINDOW_KIND
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
UInt32 StorageWindowView::getWindowUpperBound(UInt32 time_sec)
|
||||
@ -948,7 +946,6 @@ UInt32 StorageWindowView::getWindowUpperBound(UInt32 time_sec)
|
||||
CASE_WINDOW_KIND(Year)
|
||||
#undef CASE_WINDOW_KIND
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
|
||||
void StorageWindowView::addFireSignal(std::set<UInt32> & signals)
|
||||
|
@ -25,6 +25,7 @@ void registerStorageLiveView(StorageFactory & factory);
|
||||
void registerStorageGenerateRandom(StorageFactory & factory);
|
||||
void registerStorageExecutable(StorageFactory & factory);
|
||||
void registerStorageWindowView(StorageFactory & factory);
|
||||
void registerStorageLoop(StorageFactory & factory);
|
||||
#if USE_RAPIDJSON || USE_SIMDJSON
|
||||
void registerStorageFuzzJSON(StorageFactory & factory);
|
||||
#endif
|
||||
@ -120,6 +121,7 @@ void registerStorages()
|
||||
registerStorageGenerateRandom(factory);
|
||||
registerStorageExecutable(factory);
|
||||
registerStorageWindowView(factory);
|
||||
registerStorageLoop(factory);
|
||||
#if USE_RAPIDJSON || USE_SIMDJSON
|
||||
registerStorageFuzzJSON(factory);
|
||||
#endif
|
||||
|
156
src/TableFunctions/TableFunctionLoop.cpp
Normal file
@ -0,0 +1,156 @@
|
||||
#include "config.h"
|
||||
#include <TableFunctions/ITableFunction.h>
|
||||
#include <TableFunctions/TableFunctionFactory.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/DatabaseCatalog.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Common/Exception.h>
|
||||
#include <Interpreters/evaluateConstantExpression.h>
|
||||
#include <Storages/checkAndGetLiteralArgument.h>
|
||||
#include <Storages/StorageLoop.h>
|
||||
#include "registerTableFunctions.h"
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
|
||||
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
|
||||
extern const int UNKNOWN_TABLE;
|
||||
}
|
||||
namespace
|
||||
{
|
||||
class TableFunctionLoop : public ITableFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = "loop";
|
||||
std::string getName() const override { return name; }
|
||||
private:
|
||||
StoragePtr executeImpl(const ASTPtr & ast_function, ContextPtr context, const String & table_name, ColumnsDescription cached_columns, bool is_insert_query) const override;
|
||||
const char * getStorageTypeName() const override { return "Loop"; }
|
||||
ColumnsDescription getActualTableStructure(ContextPtr context, bool is_insert_query) const override;
|
||||
void parseArguments(const ASTPtr & ast_function, ContextPtr context) override;
|
||||
|
||||
// save the inner table function AST
|
||||
ASTPtr inner_table_function_ast;
|
||||
// save database and table
|
||||
std::string loop_database_name;
|
||||
std::string loop_table_name;
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
void TableFunctionLoop::parseArguments(const ASTPtr & ast_function, ContextPtr context)
|
||||
{
|
||||
const auto & args_func = ast_function->as<ASTFunction &>();
|
||||
|
||||
if (!args_func.arguments)
|
||||
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Table function 'loop' must have arguments.");
|
||||
|
||||
auto & args = args_func.arguments->children;
|
||||
if (args.empty())
|
||||
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "No arguments provided for table function 'loop'");
|
||||
|
||||
if (args.size() == 1)
|
||||
{
|
||||
if (const auto * id = args[0]->as<ASTIdentifier>())
|
||||
{
|
||||
String id_name = id->name();
|
||||
|
||||
size_t dot_pos = id_name.find('.');
|
||||
if (id_name.find('.', dot_pos + 1) != String::npos)
|
||||
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "There is more than one dot in the identifier");
|
||||
if (dot_pos != String::npos)
|
||||
{
|
||||
loop_database_name = id_name.substr(0, dot_pos);
|
||||
loop_table_name = id_name.substr(dot_pos + 1);
|
||||
}
|
||||
else
|
||||
{
|
||||
loop_table_name = id_name;
|
||||
}
|
||||
}
|
||||
else if (const auto * func = args[0]->as<ASTFunction>())
|
||||
{
|
||||
inner_table_function_ast = args[0];
|
||||
}
|
||||
else
|
||||
{
|
||||
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Expected identifier or function for argument 1 of function 'loop', got {}", args[0]->getID());
|
||||
}
|
||||
}
|
||||
// loop(database, table)
|
||||
else if (args.size() == 2)
|
||||
{
|
||||
args[0] = evaluateConstantExpressionForDatabaseName(args[0], context);
|
||||
args[1] = evaluateConstantExpressionOrIdentifierAsLiteral(args[1], context);
|
||||
|
||||
loop_database_name = checkAndGetLiteralArgument<String>(args[0], "database");
|
||||
loop_table_name = checkAndGetLiteralArgument<String>(args[1], "table");
|
||||
}
|
||||
else
|
||||
{
|
||||
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Table function 'loop' must have 1 or 2 arguments.");
|
||||
}
|
||||
}
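The single-argument branch of `parseArguments` above splits an identifier into an optional database and a table name and rejects a second dot. A minimal Python sketch of that rule follows; `split_loop_identifier` is a hypothetical helper written for illustration and is not part of ClickHouse. An empty database string stands for "use the current database", which is how `executeImpl` treats it.

```python
from typing import Tuple


def split_loop_identifier(identifier: str) -> Tuple[str, str]:
    """Split 'db.table' or 'table' into (database, table).

    Hypothetical helper mirroring the C++ logic: at most one dot is allowed,
    and a missing database is returned as an empty string.
    """
    first_dot = identifier.find(".")
    if first_dot == -1:
        # No dot: only a table name was given.
        return "", identifier
    if identifier.find(".", first_dot + 1) != -1:
        raise ValueError("There is more than one dot in the identifier")
    return identifier[:first_dot], identifier[first_dot + 1:]
```

For example, `split_loop_identifier("system.numbers")` yields the pair `("system", "numbers")`, while `split_loop_identifier("a.b.c")` raises `ValueError`.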
|
||||
|
||||
ColumnsDescription TableFunctionLoop::getActualTableStructure(ContextPtr /*context*/, bool /*is_insert_query*/) const
|
||||
{
|
||||
return ColumnsDescription();
|
||||
}
|
||||
|
||||
StoragePtr TableFunctionLoop::executeImpl(
|
||||
const ASTPtr & /*ast_function*/,
|
||||
ContextPtr context,
|
||||
const std::string & table_name,
|
||||
ColumnsDescription cached_columns,
|
||||
bool is_insert_query) const
|
||||
{
|
||||
StoragePtr storage;
|
||||
if (!loop_table_name.empty())
|
||||
{
|
||||
String database_name = loop_database_name;
|
||||
if (database_name.empty())
|
||||
database_name = context->getCurrentDatabase();
|
||||
|
||||
auto database = DatabaseCatalog::instance().getDatabase(database_name);
|
||||
storage = database->tryGetTable(loop_table_name, context);
|
||||
if (!storage)
|
||||
throw Exception(ErrorCodes::UNKNOWN_TABLE, "Table '{}' not found in database '{}'", loop_table_name, database_name);
|
||||
}
|
||||
|
||||
else
|
||||
{
|
||||
auto inner_table_function = TableFunctionFactory::instance().get(inner_table_function_ast, context);
|
||||
storage = inner_table_function->execute(
|
||||
inner_table_function_ast,
|
||||
context,
|
||||
table_name,
|
||||
std::move(cached_columns),
|
||||
is_insert_query);
|
||||
}
|
||||
auto res = std::make_shared<StorageLoop>(
|
||||
StorageID(getDatabaseName(), table_name),
|
||||
storage
|
||||
);
|
||||
res->startup();
|
||||
return res;
|
||||
}
|
||||
|
||||
void registerTableFunctionLoop(TableFunctionFactory & factory)
|
||||
{
|
||||
factory.registerFunction<TableFunctionLoop>(
|
||||
{.documentation
|
||||
= {.description=R"(The table function can be used to continuously output query results in an infinite loop.)",
|
||||
.examples{{"loop", "SELECT * FROM loop(numbers(3)) LIMIT 7", "0"
|
||||
"1"
|
||||
"2"
|
||||
"0"
|
||||
"1"
|
||||
"2"
|
||||
"0"}}
|
||||
}});
|
||||
}
|
||||
|
||||
}
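The documentation string registered above describes `loop` as returning the rows of the inner table repeatedly until the consumer stops reading. The semantics of the documented example can be sketched in Python as follows. This only models the observable behaviour; it says nothing about how `ReadFromLoopStep` is implemented, and `loop_rows` is a name made up for this illustration.

```python
from itertools import cycle, islice
from typing import Iterable, List


def loop_rows(rows: Iterable[int], limit: int) -> List[int]:
    """Repeat `rows` endlessly and keep the first `limit` values.

    Models `SELECT * FROM loop(<rows>) LIMIT <limit>`; hypothetical helper.
    """
    return list(islice(cycle(rows), limit))


print(loop_rows(range(3), 7))  # [0, 1, 2, 0, 1, 2, 0]
```

Without a `LIMIT`, the query would not terminate on its own, just as `cycle` never stops yielding for a non-empty input.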
|
@ -11,6 +11,7 @@ void registerTableFunctions()
|
||||
registerTableFunctionMerge(factory);
|
||||
registerTableFunctionRemote(factory);
|
||||
registerTableFunctionNumbers(factory);
|
||||
registerTableFunctionLoop(factory);
|
||||
registerTableFunctionGenerateSeries(factory);
|
||||
registerTableFunctionNull(factory);
|
||||
registerTableFunctionZeros(factory);
|
||||
|
@ -8,6 +8,7 @@ class TableFunctionFactory;
|
||||
void registerTableFunctionMerge(TableFunctionFactory & factory);
|
||||
void registerTableFunctionRemote(TableFunctionFactory & factory);
|
||||
void registerTableFunctionNumbers(TableFunctionFactory & factory);
|
||||
void registerTableFunctionLoop(TableFunctionFactory & factory);
|
||||
void registerTableFunctionGenerateSeries(TableFunctionFactory & factory);
|
||||
void registerTableFunctionNull(TableFunctionFactory & factory);
|
||||
void registerTableFunctionZeros(TableFunctionFactory & factory);
|
||||
|
@ -109,12 +109,12 @@ def main():
|
||||
test_script = jobs_scripts[test_job]
|
||||
if report_file.exists():
|
||||
report_file.unlink()
|
||||
extra_timeout_option = ""
|
||||
if test_job == JobNames.STATELESS_TEST_RELEASE:
|
||||
extra_timeout_option = str(3600)
|
||||
# "bugfix" must be present in checkname, as integration test runner checks this
|
||||
check_name = f"Validate bugfix: {test_job}"
|
||||
command = f"python3 {test_script} '{check_name}' {extra_timeout_option} --validate-bugfix --report-to-file {report_file}"
|
||||
command = (
|
||||
f"python3 {test_script} '{check_name}' "
|
||||
f"--validate-bugfix --report-to-file {report_file}"
|
||||
)
|
||||
print(f"Going to validate job [{test_job}], command [{command}]")
|
||||
_ = subprocess.run(
|
||||
command,
|
||||
|
@ -33,9 +33,10 @@ from subprocess import CalledProcessError
|
||||
from typing import List, Optional
|
||||
|
||||
import __main__
|
||||
|
||||
from env_helper import TEMP_PATH
|
||||
from get_robot_token import get_best_robot_token
|
||||
from git_helper import git_runner, is_shallow
|
||||
from git_helper import GIT_PREFIX, git_runner, is_shallow
|
||||
from github_helper import GitHub, PullRequest, PullRequests, Repository
|
||||
from lambda_shared_package.lambda_shared.pr import Labels
|
||||
from ssh import SSHKey
|
||||
@ -104,10 +105,6 @@ close it.
|
||||
|
||||
self.backport_created_label = backport_created_label
|
||||
|
||||
self.git_prefix = ( # All commits to cherrypick are done as robot-clickhouse
|
||||
"git -c user.email=robot-clickhouse@users.noreply.github.com "
|
||||
"-c user.name=robot-clickhouse -c commit.gpgsign=false"
|
||||
)
|
||||
self.pre_check()
|
||||
|
||||
def pre_check(self):
|
||||
@ -190,17 +187,17 @@ close it.
|
||||
def create_cherrypick(self):
|
||||
# First, create backport branch:
|
||||
# Checkout release branch with discarding every change
|
||||
git_runner(f"{self.git_prefix} checkout -f {self.name}")
|
||||
git_runner(f"{GIT_PREFIX} checkout -f {self.name}")
|
||||
# Create or reset backport branch
|
||||
git_runner(f"{self.git_prefix} checkout -B {self.backport_branch}")
|
||||
git_runner(f"{GIT_PREFIX} checkout -B {self.backport_branch}")
|
||||
# Merge all changes from PR's the first parent commit w/o applying anything
|
||||
# It will allow to create a merge commit like it would be a cherry-pick
|
||||
first_parent = git_runner(f"git rev-parse {self.pr.merge_commit_sha}^1")
|
||||
git_runner(f"{self.git_prefix} merge -s ours --no-edit {first_parent}")
|
||||
git_runner(f"{GIT_PREFIX} merge -s ours --no-edit {first_parent}")
|
||||
|
||||
# Second step, create cherrypick branch
|
||||
git_runner(
|
||||
f"{self.git_prefix} branch -f "
|
||||
f"{GIT_PREFIX} branch -f "
|
||||
f"{self.cherrypick_branch} {self.pr.merge_commit_sha}"
|
||||
)
|
||||
|
||||
@ -209,7 +206,7 @@ close it.
|
||||
# manually to the release branch already
|
||||
try:
|
||||
output = git_runner(
|
||||
f"{self.git_prefix} merge --no-commit --no-ff {self.cherrypick_branch}"
|
||||
f"{GIT_PREFIX} merge --no-commit --no-ff {self.cherrypick_branch}"
|
||||
)
|
||||
# 'up-to-date', 'up to date', who knows what else (╯°v°)╯ ^┻━┻
|
||||
if output.startswith("Already up") and output.endswith("date."):
|
||||
@ -223,14 +220,14 @@ close it.
|
||||
return
|
||||
except CalledProcessError:
|
||||
# There are most probably conflicts, they'll be resolved in PR
|
||||
git_runner(f"{self.git_prefix} reset --merge")
|
||||
git_runner(f"{GIT_PREFIX} reset --merge")
|
||||
else:
|
||||
# There are changes to apply, so continue
|
||||
git_runner(f"{self.git_prefix} reset --merge")
|
||||
git_runner(f"{GIT_PREFIX} reset --merge")
|
||||
|
||||
# Push, create the cherrypick PR, label and assign it
|
||||
for branch in [self.cherrypick_branch, self.backport_branch]:
|
||||
git_runner(f"{self.git_prefix} push -f {self.REMOTE} {branch}:{branch}")
|
||||
git_runner(f"{GIT_PREFIX} push -f {self.REMOTE} {branch}:{branch}")
|
||||
|
||||
self.cherrypick_pr = self.repo.create_pull(
|
||||
title=f"Cherry pick #{self.pr.number} to {self.name}: {self.pr.title}",
|
||||
@ -258,21 +255,19 @@ close it.
|
||||
# Checkout the backport branch from the remote and make all changes to
|
||||
# apply like they are only one cherry-pick commit on top of release
|
||||
logging.info("Creating backport for PR #%s", self.pr.number)
|
||||
git_runner(f"{self.git_prefix} checkout -f {self.backport_branch}")
|
||||
git_runner(
|
||||
f"{self.git_prefix} pull --ff-only {self.REMOTE} {self.backport_branch}"
|
||||
)
|
||||
git_runner(f"{GIT_PREFIX} checkout -f {self.backport_branch}")
|
||||
git_runner(f"{GIT_PREFIX} pull --ff-only {self.REMOTE} {self.backport_branch}")
|
||||
merge_base = git_runner(
|
||||
f"{self.git_prefix} merge-base "
|
||||
f"{GIT_PREFIX} merge-base "
|
||||
f"{self.REMOTE}/{self.name} {self.backport_branch}"
|
||||
)
|
||||
git_runner(f"{self.git_prefix} reset --soft {merge_base}")
|
||||
git_runner(f"{GIT_PREFIX} reset --soft {merge_base}")
|
||||
title = f"Backport #{self.pr.number} to {self.name}: {self.pr.title}"
|
||||
git_runner(f"{self.git_prefix} commit --allow-empty -F -", input=title)
|
||||
git_runner(f"{GIT_PREFIX} commit --allow-empty -F -", input=title)
|
||||
|
||||
# Push with force, create the backport PR, label and assign it
|
||||
git_runner(
|
||||
f"{self.git_prefix} push -f {self.REMOTE} "
|
||||
f"{GIT_PREFIX} push -f {self.REMOTE} "
|
||||
f"{self.backport_branch}:{self.backport_branch}"
|
||||
)
|
||||
self.backport_pr = self.repo.create_pull(
|
||||
@ -668,9 +663,11 @@ def main():
|
||||
args.repo,
|
||||
args.from_repo,
|
||||
args.dry_run,
|
||||
args.must_create_backport_label
|
||||
if isinstance(args.must_create_backport_label, list)
|
||||
else [args.must_create_backport_label],
|
||||
(
|
||||
args.must_create_backport_label
|
||||
if isinstance(args.must_create_backport_label, list)
|
||||
else [args.must_create_backport_label]
|
||||
),
|
||||
args.backport_created_label,
|
||||
)
|
||||
# https://github.com/python/mypy/issues/3004
|
||||
|
@ -18,6 +18,7 @@ import docker_images_helper
|
||||
import upload_result_helper
|
||||
from build_check import get_release_or_pr
|
||||
from ci_config import CI_CONFIG, Build, CILabels, CIStages, JobNames, StatusNames
|
||||
from ci_metadata import CiMetadata
|
||||
from ci_utils import GHActions, is_hex, normalize_string
|
||||
from clickhouse_helper import (
|
||||
CiLogsCredentials,
|
||||
@ -39,22 +40,23 @@ from digest_helper import DockerDigester, JobDigester
|
||||
from env_helper import (
|
||||
CI,
|
||||
GITHUB_JOB_API_URL,
|
||||
GITHUB_REPOSITORY,
|
||||
GITHUB_RUN_ID,
|
||||
GITHUB_RUN_URL,
|
||||
REPO_COPY,
|
||||
REPORT_PATH,
|
||||
S3_BUILDS_BUCKET,
|
||||
TEMP_PATH,
|
||||
GITHUB_RUN_ID,
|
||||
GITHUB_REPOSITORY,
|
||||
)
|
||||
from get_robot_token import get_best_robot_token
|
||||
from git_helper import GIT_PREFIX, Git
|
||||
from git_helper import Runner as GitRunner
|
||||
from github_helper import GitHub
|
||||
from pr_info import PRInfo
|
||||
from report import ERROR, SUCCESS, BuildResult, JobReport, PENDING
|
||||
from report import ERROR, FAILURE, PENDING, SUCCESS, BuildResult, JobReport, TestResult
|
||||
from s3_helper import S3Helper
|
||||
from ci_metadata import CiMetadata
|
||||
from stopwatch import Stopwatch
|
||||
from tee_popen import TeePopen
|
||||
from version_helper import get_version_from_repo
|
||||
|
||||
# pylint: disable=too-many-lines
|
||||
@ -1867,8 +1869,8 @@ def _run_test(job_name: str, run_command: str) -> int:
|
||||
run_command or CI_CONFIG.get_job_config(job_name).run_command
|
||||
), "Run command must be provided as input argument or be configured in job config"
|
||||
|
||||
if CI_CONFIG.get_job_config(job_name).timeout:
|
||||
os.environ["KILL_TIMEOUT"] = str(CI_CONFIG.get_job_config(job_name).timeout)
|
||||
env = os.environ.copy()
|
||||
timeout = CI_CONFIG.get_job_config(job_name).timeout or None
|
||||
|
||||
if not run_command:
|
||||
run_command = "/".join(
|
||||
@ -1879,26 +1881,27 @@ def _run_test(job_name: str, run_command: str) -> int:
|
||||
print("Use run command from a job config")
|
||||
else:
|
||||
print("Use run command from the workflow")
|
||||
os.environ["CHECK_NAME"] = job_name
|
||||
env["CHECK_NAME"] = job_name
|
||||
print(f"Going to start run command [{run_command}]")
|
||||
process = subprocess.run(
|
||||
run_command,
|
||||
stdout=sys.stdout,
|
||||
stderr=sys.stderr,
|
||||
text=True,
|
||||
check=False,
|
||||
shell=True,
|
||||
)
|
||||
stopwatch = Stopwatch()
|
||||
job_log = Path(TEMP_PATH) / "job_log.txt"
|
||||
with TeePopen(run_command, job_log, env, timeout) as process:
|
||||
retcode = process.wait()
|
||||
if retcode != 0:
|
||||
print(f"Run action failed for: [{job_name}] with exit code [{retcode}]")
|
||||
if timeout and process.timeout_exceeded:
|
||||
print(f"Timeout {timeout} exceeded, dumping the job report")
|
||||
JobReport(
|
||||
status=FAILURE,
|
||||
description=f"Timeout {timeout} exceeded",
|
||||
test_results=[TestResult.create_check_timeout_expired(timeout)],
|
||||
start_time=stopwatch.start_time_str,
|
||||
duration=stopwatch.duration_seconds,
|
||||
additional_files=[job_log],
|
||||
).dump()
|
||||
|
||||
if process.returncode == 0:
|
||||
print(f"Run action done for: [{job_name}]")
|
||||
exit_code = 0
|
||||
else:
|
||||
print(
|
||||
f"Run action failed for: [{job_name}] with exit code [{process.returncode}]"
|
||||
)
|
||||
exit_code = process.returncode
|
||||
return exit_code
|
||||
print(f"Run action done for: [{job_name}]")
|
||||
return retcode
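The reworked `_run_test` above moves timeout handling out of the individual check scripts and into the runner, which now reports whether the job exited on its own or hit the configured limit. A rough standard-library sketch of that pattern is given below. It uses `subprocess.run` rather than the repository's `TeePopen`, whose interface is not reproduced here, and `run_with_timeout` is a hypothetical name.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_with_timeout(
    command: List[str], timeout: Optional[float]
) -> Tuple[Optional[int], bool]:
    """Run `command` and return (exit code, timeout_exceeded).

    Hypothetical helper: the exit code is None when the timeout was hit,
    because the child is killed before it can report one.
    """
    try:
        completed = subprocess.run(command, check=False, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, True
    return completed.returncode, False


if __name__ == "__main__":
    code, timed_out = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    print(f"exit code [{code}], timeout exceeded [{timed_out}]")
```

Keeping the limit in one place means a job config only needs a `timeout` value, and the run command no longer has to carry it as a positional argument.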
|
||||
|
||||
|
||||
def _get_ext_check_name(check_name: str) -> str:
|
||||
|
@ -175,8 +175,8 @@ class JobNames(metaclass=WithIter):
|
||||
COMPATIBILITY_TEST = "Compatibility check (amd64)"
|
||||
COMPATIBILITY_TEST_ARM = "Compatibility check (aarch64)"
|
||||
|
||||
CLCIKBENCH_TEST = "ClickBench (amd64)"
|
||||
CLCIKBENCH_TEST_ARM = "ClickBench (aarch64)"
|
||||
CLICKBENCH_TEST = "ClickBench (amd64)"
|
||||
CLICKBENCH_TEST_ARM = "ClickBench (aarch64)"
|
||||
|
||||
LIBFUZZER_TEST = "libFuzzer tests"
|
||||
|
||||
@ -472,17 +472,18 @@ compatibility_test_common_params = {
|
||||
}
|
||||
stateless_test_common_params = {
|
||||
"digest": stateless_check_digest,
|
||||
"run_command": 'functional_test_check.py "$CHECK_NAME" $KILL_TIMEOUT',
|
||||
"run_command": 'functional_test_check.py "$CHECK_NAME"',
|
||||
"timeout": 10800,
|
||||
}
|
||||
stateful_test_common_params = {
|
||||
"digest": stateful_check_digest,
|
||||
"run_command": 'functional_test_check.py "$CHECK_NAME" $KILL_TIMEOUT',
|
||||
"run_command": 'functional_test_check.py "$CHECK_NAME"',
|
||||
"timeout": 3600,
|
||||
}
|
||||
stress_test_common_params = {
|
||||
"digest": stress_check_digest,
|
||||
"run_command": "stress_check.py",
|
||||
"timeout": 9000,
|
||||
}
|
||||
upgrade_test_common_params = {
|
||||
"digest": upgrade_check_digest,
|
||||
@ -531,6 +532,7 @@ clickbench_test_params = {
|
||||
docker=["clickhouse/clickbench"],
|
||||
),
|
||||
"run_command": 'clickbench.py "$CHECK_NAME"',
|
||||
"timeout": 900,
|
||||
}
|
||||
install_test_params = JobConfig(
|
||||
digest=install_check_digest,
|
||||
@ -1067,6 +1069,7 @@ CI_CONFIG = CIConfig(
|
||||
Build.PACKAGE_TSAN,
|
||||
Build.PACKAGE_MSAN,
|
||||
Build.PACKAGE_DEBUG,
|
||||
Build.BINARY_RELEASE,
|
||||
]
|
||||
),
|
||||
JobNames.BUILD_CHECK_SPECIAL: BuildReportConfig(
|
||||
@ -1084,7 +1087,6 @@ CI_CONFIG = CIConfig(
|
||||
Build.BINARY_AMD64_COMPAT,
|
||||
Build.BINARY_AMD64_MUSL,
|
||||
Build.PACKAGE_RELEASE_COVERAGE,
|
||||
Build.BINARY_RELEASE,
|
||||
Build.FUZZERS,
|
||||
]
|
||||
),
|
||||
@ -1111,6 +1113,7 @@ CI_CONFIG = CIConfig(
|
||||
exclude_files=[".md"],
|
||||
docker=["clickhouse/fasttest"],
|
||||
),
|
||||
timeout=2400,
|
||||
),
|
||||
),
|
||||
JobNames.STYLE_CHECK: TestConfig(
|
||||
@ -1123,7 +1126,9 @@ CI_CONFIG = CIConfig(
|
||||
"",
|
||||
# we run this check by label - no digest required
|
||||
job_config=JobConfig(
|
||||
run_by_label="pr-bugfix", run_command="bugfix_validate_check.py"
|
||||
run_by_label="pr-bugfix",
|
||||
run_command="bugfix_validate_check.py",
|
||||
timeout=900,
|
||||
),
|
||||
),
|
||||
},
|
||||
@ -1357,10 +1362,10 @@ CI_CONFIG = CIConfig(
|
||||
Build.PACKAGE_RELEASE, job_config=sqllogic_test_params
|
||||
),
|
||||
JobNames.SQLTEST: TestConfig(Build.PACKAGE_RELEASE, job_config=sql_test_params),
|
||||
JobNames.CLCIKBENCH_TEST: TestConfig(
|
||||
JobNames.CLICKBENCH_TEST: TestConfig(
|
||||
Build.PACKAGE_RELEASE, job_config=JobConfig(**clickbench_test_params) # type: ignore
|
||||
),
|
||||
JobNames.CLCIKBENCH_TEST_ARM: TestConfig(
|
||||
JobNames.CLICKBENCH_TEST_ARM: TestConfig(
|
||||
Build.PACKAGE_AARCH64, job_config=JobConfig(**clickbench_test_params) # type: ignore
|
||||
),
|
||||
JobNames.LIBFUZZER_TEST: TestConfig(
|
||||
@ -1368,7 +1373,7 @@ CI_CONFIG = CIConfig(
|
||||
job_config=JobConfig(
|
||||
run_by_label=CILabels.libFuzzer,
|
||||
timeout=10800,
|
||||
run_command='libfuzzer_test_check.py "$CHECK_NAME" 10800',
|
||||
run_command='libfuzzer_test_check.py "$CHECK_NAME"',
|
||||
),
|
||||
), # type: ignore
|
||||
},
|
||||
@ -1386,6 +1391,9 @@ REQUIRED_CHECKS = [
|
||||
JobNames.FAST_TEST,
|
||||
JobNames.STATEFUL_TEST_RELEASE,
|
||||
JobNames.STATELESS_TEST_RELEASE,
|
||||
JobNames.STATELESS_TEST_ASAN,
|
||||
JobNames.STATELESS_TEST_FLAKY_ASAN,
|
||||
JobNames.STATEFUL_TEST_ASAN,
|
||||
JobNames.STYLE_CHECK,
|
||||
JobNames.UNIT_TEST_ASAN,
|
||||
JobNames.UNIT_TEST_MSAN,
|
||||
@ -1419,6 +1427,11 @@ class CheckDescription:
|
||||
|
||||
|
||||
CHECK_DESCRIPTIONS = [
|
||||
CheckDescription(
|
||||
StatusNames.SYNC,
|
||||
"If it fails, ask a maintainer for help",
|
||||
lambda x: x == StatusNames.SYNC,
|
||||
),
|
||||
CheckDescription(
|
||||
"AST fuzzer",
|
||||
"Runs randomly generated queries to catch program errors. "
|
||||
|
@ -1,8 +1,7 @@
|
||||
from contextlib import contextmanager
|
||||
import os
|
||||
import signal
|
||||
from typing import Any, List, Union, Iterator
|
||||
from contextlib import contextmanager
|
||||
from pathlib import Path
|
||||
from typing import Any, Iterator, List, Union
|
||||
|
||||
|
||||
class WithIter(type):
|
||||
@ -49,14 +48,3 @@ class GHActions:
|
||||
for line in lines:
|
||||
print(line)
|
||||
print("::endgroup::")
|
||||
|
||||
|
||||
def set_job_timeout():
|
||||
def timeout_handler(_signum, _frame):
|
||||
print("Timeout expired")
|
||||
raise TimeoutError("Job's KILL_TIMEOUT expired")
|
||||
|
||||
kill_timeout = int(os.getenv("KILL_TIMEOUT", "0"))
|
||||
assert kill_timeout > 0, "kill timeout must be provided in KILL_TIMEOUT env"
|
||||
signal.signal(signal.SIGALRM, timeout_handler)
|
||||
signal.alarm(kill_timeout)
|
||||
|
@ -1,5 +1,4 @@
|
||||
#!/usr/bin/env python3
|
||||
import argparse
|
||||
import csv
|
||||
import logging
|
||||
import os
|
||||
@ -11,15 +10,7 @@ from typing import Tuple
|
||||
from docker_images_helper import DockerImage, get_docker_image, pull_image
|
||||
from env_helper import REPO_COPY, S3_BUILDS_BUCKET, TEMP_PATH
|
||||
from pr_info import PRInfo
|
||||
from report import (
|
||||
ERROR,
|
||||
FAILURE,
|
||||
SUCCESS,
|
||||
JobReport,
|
||||
TestResult,
|
||||
TestResults,
|
||||
read_test_results,
|
||||
)
|
||||
from report import ERROR, FAILURE, SUCCESS, JobReport, TestResults, read_test_results
|
||||
from stopwatch import Stopwatch
|
||||
from tee_popen import TeePopen
|
||||
|
||||
@ -80,30 +71,9 @@ def process_results(result_directory: Path) -> Tuple[str, str, TestResults]:
|
||||
return state, description, test_results
|
||||
|
||||
|
||||
def parse_args() -> argparse.Namespace:
|
||||
parser = argparse.ArgumentParser(
|
||||
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
|
||||
description="FastTest script",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--timeout",
|
||||
type=int,
|
||||
# Fast tests are in most cases done within 10 min, and a 40 min timeout should be sufficient,
|
||||
# though due to cold cache build time can be much longer
|
||||
# https://pastila.nl/?146195b6/9bb99293535e3817a9ea82c3f0f7538d.link#5xtClOjkaPLEjSuZ92L2/g==
|
||||
default=40,
|
||||
help="Timeout in minutes",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
args.timeout = args.timeout * 60
|
||||
return args
|
||||
|
||||
|
||||
def main():
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
stopwatch = Stopwatch()
|
||||
args = parse_args()
|
||||
|
||||
temp_path = Path(TEMP_PATH)
|
||||
temp_path.mkdir(parents=True, exist_ok=True)
|
||||
@ -134,14 +104,10 @@ def main():
|
||||
logs_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
run_log_path = logs_path / "run.log"
|
||||
timeout_expired = False
|
||||
|
||||
with TeePopen(run_cmd, run_log_path, timeout=args.timeout) as process:
|
||||
with TeePopen(run_cmd, run_log_path) as process:
|
||||
retcode = process.wait()
|
||||
if process.timeout_exceeded:
|
||||
logging.info("Timeout expired for command: %s", run_cmd)
|
||||
timeout_expired = True
|
||||
elif retcode == 0:
|
||||
if retcode == 0:
|
||||
logging.info("Run successfully")
|
||||
else:
|
||||
logging.info("Run failed")
|
||||
@ -175,11 +141,6 @@ def main():
|
||||
else:
|
||||
state, description, test_results = process_results(output_path)
|
||||
|
||||
if timeout_expired:
|
||||
test_results.append(TestResult.create_check_timeout_expired(args.timeout))
|
||||
state = FAILURE
|
||||
description = test_results[-1].name
|
||||
|
||||
JobReport(
|
||||
description=description,
|
||||
test_results=test_results,
|
||||
|
@ -68,7 +68,6 @@ def get_run_command(
|
||||
repo_path: Path,
|
||||
result_path: Path,
|
||||
server_log_path: Path,
|
||||
kill_timeout: int,
|
||||
additional_envs: List[str],
|
||||
ci_logs_args: str,
|
||||
image: DockerImage,
|
||||
@ -86,7 +85,6 @@ def get_run_command(
    )

    envs = [
        f"-e MAX_RUN_TIME={int(0.9 * kill_timeout)}",
        # a static link, don't use S3_URL or S3_DOWNLOAD
        '-e S3_URL="https://s3.amazonaws.com/clickhouse-datasets"',
    ]
@ -192,7 +190,6 @@ def process_results(
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("check_name")
    parser.add_argument("kill_timeout", type=int)
    parser.add_argument(
        "--validate-bugfix",
        action="store_true",
@ -224,12 +221,7 @@ def main():
    assert (
        check_name
    ), "Check name must be provided as an input arg or in CHECK_NAME env"
    kill_timeout = args.kill_timeout or int(os.getenv("KILL_TIMEOUT", "0"))
    assert (
        kill_timeout > 0
    ), "kill timeout must be provided as an input arg or in KILL_TIMEOUT env"
    validate_bugfix_check = args.validate_bugfix
    print(f"Running check [{check_name}] with timeout [{kill_timeout}]")

    flaky_check = "flaky" in check_name.lower()

@ -288,7 +280,6 @@ def main():
        repo_path,
        result_path,
        server_log_path,
        kill_timeout,
        additional_envs,
        ci_logs_args,
        docker_image,
@ -1,9 +1,12 @@
#!/usr/bin/env python
import argparse
import atexit
import logging
import os
import os.path as p
import re
import subprocess
import tempfile
from typing import Any, List, Optional

logger = logging.getLogger(__name__)
@ -19,12 +22,16 @@ SHA_REGEXP = re.compile(r"\A([0-9]|[a-f]){40}\Z")
CWD = p.dirname(p.realpath(__file__))
TWEAK = 1

GIT_PREFIX = (  # All commits to remote are done as robot-clickhouse
    "git -c user.email=robot-clickhouse@users.noreply.github.com "
    "-c user.name=robot-clickhouse -c commit.gpgsign=false "
    "-c core.sshCommand="
    "'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'"
)
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    GIT_KNOWN_HOSTS_FILE = f.name
    GIT_PREFIX = (  # All commits to remote are done as robot-clickhouse
        "git -c user.email=robot-clickhouse@users.noreply.github.com "
        "-c user.name=robot-clickhouse -c commit.gpgsign=false "
        "-c core.sshCommand="
        f"'ssh -o UserKnownHostsFile={GIT_KNOWN_HOSTS_FILE} "
        "-o StrictHostKeyChecking=accept-new'"
    )
    atexit.register(os.remove, f.name)


# Py 3.8 removeprefix and removesuffix
Some files were not shown because too many files have changed in this diff