Mirror of https://github.com/ClickHouse/ClickHouse.git, synced 2024-11-10 01:25:21 +00:00
Merge branch 'master' into add-compression-sorts-optimization
Commit f3563d81bd
.github/PULL_REQUEST_TEMPLATE.md
@ -11,6 +11,7 @@ tests/ci/cancel_and_rerun_workflow_lambda/app.py
|
||||
- Backward Incompatible Change
|
||||
- Build/Testing/Packaging Improvement
|
||||
- Documentation (changelog entry is not required)
|
||||
- Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)
|
||||
- Bug Fix (user-visible misbehavior in an official stable release)
|
||||
- CI Fix or Improvement (changelog entry is not required)
|
||||
- Not for changelog (changelog entry is not required)
|
||||
|
CHANGELOG.md
@ -1,4 +1,5 @@
|
||||
### Table of Contents
|
||||
**[ClickHouse release v24.5, 2024-05-30](#245)**<br/>
|
||||
**[ClickHouse release v24.4, 2024-04-30](#244)**<br/>
|
||||
**[ClickHouse release v24.3 LTS, 2024-03-26](#243)**<br/>
|
||||
**[ClickHouse release v24.2, 2024-02-29](#242)**<br/>
|
||||
@ -7,6 +8,179 @@
|
||||
|
||||
# 2024 Changelog
|
||||
|
||||
### <a id="245"></a> ClickHouse release 24.5, 2024-05-30
|
||||
|
||||
#### Backward Incompatible Change
|
||||
* Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make to drop such indexes before upgrade and re-create them after upgrade. [#62884](https://github.com/ClickHouse/ClickHouse/pull/62884) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Usage of the functions `neighbor`, `runningAccumulate`, `runningDifferenceStartingWithFirstValue`, and `runningDifference` is deprecated (because they are error-prone). Proper window functions should be used instead. To enable them again, set `allow_deprecated_functions = 1` or set `compatibility = '24.4'` or lower. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Queries from `system.columns` will work faster if there is a large number of columns but the user is not granted `SHOW TABLES` for many of the databases or tables. Note that in previous versions, if you granted `SHOW COLUMNS` on individual columns without granting `SHOW TABLES` on the corresponding tables, the `system.columns` table would show these columns, but in the new version it skips the table entirely. Removed the trace log messages "Access granted" and "Access denied" that slowed down queries. [#63439](https://github.com/ClickHouse/ClickHouse/pull/63439) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Setting `replace_long_file_name_to_hash` is enabled by default for `MergeTree` tables. [#64457](https://github.com/ClickHouse/ClickHouse/pull/64457) ([Anton Popov](https://github.com/CurtizJ)). The data written with this setting can be read by server versions since 23.9. After you use ClickHouse with this setting enabled, you cannot downgrade to versions 23.8 and earlier.
|
||||
|
||||
#### New Feature
|
||||
* Adds the `Form` format to read/write a single record in the `application/x-www-form-urlencoded` format. [#60199](https://github.com/ClickHouse/ClickHouse/pull/60199) ([Shaun Struwig](https://github.com/Blargian)).
|
||||
* Added the possibility to compress data in `CROSS JOIN`. [#60459](https://github.com/ClickHouse/ClickHouse/pull/60459) ([p1rattttt](https://github.com/p1rattttt)).
|
||||
* Added the possibility to execute `CROSS JOIN` in temporary files if the size exceeds the limits. [#63432](https://github.com/ClickHouse/ClickHouse/pull/63432) ([p1rattttt](https://github.com/p1rattttt)).
|
||||
* Support `JOIN` with inequality conditions that involve columns from both the left and right tables, e.g. `t1.y < t2.y` (see the example after this list). To enable, `SET allow_experimental_join_condition = 1`. [#60920](https://github.com/ClickHouse/ClickHouse/pull/60920) ([lgbo](https://github.com/lgbo-ustc)).
|
||||
* Maps can now have `Float32`, `Float64`, `Array(T)`, `Map(K, V)` and `Tuple(T1, T2, ...)` as keys. Closes [#54537](https://github.com/ClickHouse/ClickHouse/issues/54537). [#59318](https://github.com/ClickHouse/ClickHouse/pull/59318) ([李扬](https://github.com/taiyang-li)).
|
||||
* Introduce bulk loading to `EmbeddedRocksDB` by creating and ingesting SST files instead of relying on the RocksDB built-in memtable. This helps to increase import speed, especially for long-running insert queries to `StorageEmbeddedRocksDB` tables. Also, introduce `EmbeddedRocksDB` table settings. [#59163](https://github.com/ClickHouse/ClickHouse/pull/59163) [#63324](https://github.com/ClickHouse/ClickHouse/pull/63324) ([Duc Canh Le](https://github.com/canhld94)).
|
||||
* Users can now parse CRLF line endings in the TSV format using the setting `input_format_tsv_crlf_end_of_line`. Closes [#56257](https://github.com/ClickHouse/ClickHouse/issues/56257). [#59747](https://github.com/ClickHouse/ClickHouse/pull/59747) ([Shaun Struwig](https://github.com/Blargian)).
|
||||
* Added a new setting `input_format_force_null_for_omitted_fields` that forces NULL values for omitted fields. [#60887](https://github.com/ClickHouse/ClickHouse/pull/60887) ([Constantine Peresypkin](https://github.com/pkit)).
|
||||
* Previously, the S3 storage and the `s3` table function did not support selecting from archive files, such as tarballs, zip, and 7z. Now they allow iterating over files inside archives in S3. [#62259](https://github.com/ClickHouse/ClickHouse/pull/62259) ([Daniil Ivanik](https://github.com/divanik)).
|
||||
* Support for conditional function `clamp`. [#62377](https://github.com/ClickHouse/ClickHouse/pull/62377) ([skyoct](https://github.com/skyoct)).
|
||||
* Add `NPy` output format. [#62430](https://github.com/ClickHouse/ClickHouse/pull/62430) ([豪肥肥](https://github.com/HowePa)).
|
||||
* `Raw` format as a synonym for `TSVRaw`. [#63394](https://github.com/ClickHouse/ClickHouse/pull/63394) ([Unalian](https://github.com/Unalian)).
|
||||
* Added the new SQL function `generateSnowflakeID` for generating Twitter-style Snowflake IDs. [#63577](https://github.com/ClickHouse/ClickHouse/pull/63577) ([Danila Puzov](https://github.com/kazalika)).
|
||||
* On Linux and MacOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to `INTO OUTFILE`). [#63662](https://github.com/ClickHouse/ClickHouse/pull/63662) ([v01dXYZ](https://github.com/v01dXYZ)).
|
||||
* Change warning on high number of attached tables to differentiate tables, views and dictionaries. [#64180](https://github.com/ClickHouse/ClickHouse/pull/64180) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
|
||||
* Added the SQL function `fromReadableSize` (along with `OrNull` and `OrZero` variants). This function performs the opposite operation of the functions `formatReadableSize` and `formatReadableDecimalSize`, i.e., given a human-readable byte size, it returns the number of bytes. Example: `SELECT fromReadableSize('3.0 MiB')` returns `3145728`. [#64386](https://github.com/ClickHouse/ClickHouse/pull/64386) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
|
||||
* Provide support for `azureBlobStorage` function in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If `use_workload_identity` parameter is set in config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication. [#57881](https://github.com/ClickHouse/ClickHouse/pull/57881) ([Vinay Suryadevara](https://github.com/vinay92-ch)).
|
||||
* Add TTL information in the `system.parts_columns` table. [#63200](https://github.com/ClickHouse/ClickHouse/pull/63200) ([litlig](https://github.com/litlig)).
|
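A minimal sketch of the inequality-join feature mentioned above; the tables `t1`, `t2` and their columns are hypothetical, only the setting name and the shape of the `ON` clause come from the entry:

``` sql
-- Hypothetical tables; the feature is experimental and must be enabled explicitly.
SET allow_experimental_join_condition = 1;

SELECT t1.key, t1.y, t2.y
FROM t1
INNER JOIN t2 ON t1.key = t2.key AND t1.y < t2.y;
```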
||||
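A quick illustration of the new `clamp` function, assuming the argument order `clamp(value, min, max)`:

``` sql
-- 42 is clamped to the upper bound 10, so this should return 10.
SELECT clamp(42, 1, 10);
```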
|
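A one-line example of calling `generateSnowflakeID`:

``` sql
-- Each call produces a new 64-bit, time-ordered Snowflake ID.
SELECT generateSnowflakeID();
```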
||||
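A small sketch of the extended `Map` key types from the entry above, using `Float64` keys (the literal values are arbitrary):

``` sql
-- A Map(Float64, String) literal built with the map() function.
SELECT map(1.5, 'low', 2.5, 'high') AS m, toTypeName(m);
```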
#### Experimental Features
|
||||
* Implement the `Dynamic` data type that allows storing values of any type inside it without knowing all of them in advance. The `Dynamic` type is available under the setting `allow_experimental_dynamic_type` (see the example after this list). Reference: [#54864](https://github.com/ClickHouse/ClickHouse/issues/54864). [#63058](https://github.com/ClickHouse/ClickHouse/pull/63058) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Allowed to create `MaterializedMySQL` database without connection to MySQL. [#63397](https://github.com/ClickHouse/ClickHouse/pull/63397) ([Kirill](https://github.com/kirillgarbar)).
|
||||
* Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than `max_retries_before_automatic_recovery` (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. [#63549](https://github.com/ClickHouse/ClickHouse/pull/63549) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Account failed files in `s3queue_tracked_file_ttl_sec` and `s3queue_traked_files_limit` for `StorageS3Queue`. [#63638](https://github.com/ClickHouse/ClickHouse/pull/63638) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
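A minimal sketch of the experimental `Dynamic` type from the entry above; the table name `t` is hypothetical, and the `dynamicType` helper is assumed to be available for inspecting the stored type:

``` sql
SET allow_experimental_dynamic_type = 1;

CREATE TABLE t (d Dynamic) ENGINE = Memory;
INSERT INTO t VALUES (42), ('hello'), ([1, 2, 3]);

-- Every row keeps its own concrete type inside the Dynamic column.
SELECT d, dynamicType(d) FROM t;
```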
|
||||
#### Performance Improvement
|
||||
* Added a native Parquet reader, which can read Parquet binary data into ClickHouse columns directly. This feature can be activated by setting `input_format_parquet_use_native_reader` to true. [#60361](https://github.com/ClickHouse/ClickHouse/pull/60361) ([ZhiHong Zhang](https://github.com/copperybean)).
|
||||
* Less contention in the filesystem cache (part 4). Allow keeping the filesystem cache not filled to the limit by doing additional eviction in the background (controlled by `keep_free_space_size(elements)_ratio`). This releases pressure from space reservation for queries (in the `tryReserve` method). Also, this is done in a lock-free way as much as possible, e.g. it should not block normal cache usage. [#61250](https://github.com/ClickHouse/ClickHouse/pull/61250) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Skip merging of newly created projection blocks during `INSERT`-s. [#59405](https://github.com/ClickHouse/ClickHouse/pull/59405) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Process string functions `...UTF8` as plain ASCII if the input strings consist only of ASCII characters. Inspired by https://github.com/apache/doris/pull/29799. Overall speedup of 1.07x to 1.62x. Note that peak memory usage has decreased in some cases. [#61632](https://github.com/ClickHouse/ClickHouse/pull/61632) ([李扬](https://github.com/taiyang-li)).
|
||||
* Improved performance of selection (`{}`) globs in StorageS3. [#62120](https://github.com/ClickHouse/ClickHouse/pull/62120) ([Andrey Zvonov](https://github.com/zvonand)).
|
||||
* `HostResolver` keeps each IP address several times. If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of a forbidden IP was marked as failed, and on every retry these IPs had a chance to be chosen (and to fail again). Moreover, the DNS cache was dropped every 120 seconds, after which the failed IPs could be chosen again; this change addresses both issues. [#62652](https://github.com/ClickHouse/ClickHouse/pull/62652) ([Anton Ivashkin](https://github.com/ianton-ru)).
|
||||
* Function `splitByRegexp` is now faster when the regular expression argument is a single-character, trivial regular expression (in this case, it now falls back internally to `splitByChar`). [#62696](https://github.com/ClickHouse/ClickHouse/pull/62696) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Aggregation with 8-bit and 16-bit keys became faster: added min/max in FixedHashTable to limit the array index and reduce the `isZero()` calls during iteration. [#62746](https://github.com/ClickHouse/ClickHouse/pull/62746) ([Jiebin Sun](https://github.com/jiebinn)).
|
||||
* Add a new configuration `prefer_merge_sort_block_bytes` to control the memory usage and speed up sorting by 2 times when merging, when there are many columns. [#62904](https://github.com/ClickHouse/ClickHouse/pull/62904) ([LiuNeng](https://github.com/liuneng1994)).
|
||||
* `clickhouse-local` will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes [#62941](https://github.com/ClickHouse/ClickHouse/issues/62941). [#63074](https://github.com/ClickHouse/ClickHouse/pull/63074) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Micro-optimizations for the new analyzer. [#63429](https://github.com/ClickHouse/ClickHouse/pull/63429) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Index analysis will work if `DateTime` is compared to `DateTime64`. This closes [#63441](https://github.com/ClickHouse/ClickHouse/issues/63441). [#63443](https://github.com/ClickHouse/ClickHouse/pull/63443) [#63532](https://github.com/ClickHouse/ClickHouse/pull/63532) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Speed up indices of type `set` a little (around 1.5 times) by removing garbage. [#64098](https://github.com/ClickHouse/ClickHouse/pull/64098) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Optimized vertical merges in tables with sparse columns. [#64311](https://github.com/ClickHouse/ClickHouse/pull/64311) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Improve filtering of sparse columns: reduce redundant calls of `ColumnSparse::filter` to improve performance. [#64426](https://github.com/ClickHouse/ClickHouse/pull/64426) ([Jiebin Sun](https://github.com/jiebinn)).
|
||||
* Remove copying data when writing to the filesystem cache. [#63401](https://github.com/ClickHouse/ClickHouse/pull/63401) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Now backups with azure blob storage will use multicopy. [#64116](https://github.com/ClickHouse/ClickHouse/pull/64116) ([alesapin](https://github.com/alesapin)).
|
||||
* Allow to use native copy for azure even with different containers. [#64154](https://github.com/ClickHouse/ClickHouse/pull/64154) ([alesapin](https://github.com/alesapin)).
|
||||
* Finally enable native copy for azure. [#64182](https://github.com/ClickHouse/ClickHouse/pull/64182) ([alesapin](https://github.com/alesapin)).
|
||||
* Improve the iteration over sparse columns to reduce the number of calls to `size`. [#64497](https://github.com/ClickHouse/ClickHouse/pull/64497) ([Jiebin Sun](https://github.com/jiebinn)).
|
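For reference, the `splitByRegexp` fast path mentioned above applies to calls where the pattern is a single trivial character, for example:

``` sql
-- A single-character, trivial pattern now falls back internally to splitByChar.
SELECT splitByRegexp(',', 'a,b,c');  -- ['a','b','c']
```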
||||
|
||||
#### Improvement
|
||||
* Allow using `clickhouse-local` and its shortcuts `clickhouse` and `ch` with a query or queries file as a positional argument. Examples: `ch "SELECT 1"`, `ch --param_test Hello "SELECT {test:String}"`, `ch query.sql`. This closes [#62361](https://github.com/ClickHouse/ClickHouse/issues/62361). [#63081](https://github.com/ClickHouse/ClickHouse/pull/63081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. [#63365](https://github.com/ClickHouse/ClickHouse/pull/63365) ([Julia Kartseva](https://github.com/jkartseva)).
|
||||
* Support English-style Unicode quotes, e.g. “Hello”, ‘world’. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes [#58634](https://github.com/ClickHouse/ClickHouse/issues/58634). [#63381](https://github.com/ClickHouse/ClickHouse/pull/63381) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Allow trailing commas in the columns list in the INSERT query. For example, `INSERT INTO test (a, b, c, ) VALUES ...`. [#63803](https://github.com/ClickHouse/ClickHouse/pull/63803) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Better exception messages for the `Regexp` format. [#63804](https://github.com/ClickHouse/ClickHouse/pull/63804) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Allow trailing commas in the `Values` format. For example, this query is allowed: `INSERT INTO test (a, b, c) VALUES (4, 5, 6,);`. [#63810](https://github.com/ClickHouse/ClickHouse/pull/63810) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Make rabbitmq nack broken messages. Closes [#45350](https://github.com/ClickHouse/ClickHouse/issues/45350). [#60312](https://github.com/ClickHouse/ClickHouse/pull/60312) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes [#60460](https://github.com/ClickHouse/ClickHouse/issues/60460). [#60468](https://github.com/ClickHouse/ClickHouse/pull/60468) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Distinct error messages for the S3 'no key' error in the disk and storage cases. [#61108](https://github.com/ClickHouse/ClickHouse/pull/61108) ([Sema Checherinda](https://github.com/CheSema)).
|
||||
* The progress bar will work for trivial queries with LIMIT from `system.zeros`, `system.zeros_mt` (it already works for `system.numbers` and `system.numbers_mt`), and the `generateRandom` table function. As a bonus, if the total number of records is greater than the `max_rows_to_read` limit, it will throw an exception earlier. This closes [#58183](https://github.com/ClickHouse/ClickHouse/issues/58183). [#61823](https://github.com/ClickHouse/ClickHouse/pull/61823) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Support for "Merge Key" in YAML configurations (this is a weird feature of YAML, please never mind). [#62685](https://github.com/ClickHouse/ClickHouse/pull/62685) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Enhance error message when non-deterministic function is used with Replicated source. [#62896](https://github.com/ClickHouse/ClickHouse/pull/62896) ([Grégoire Pineau](https://github.com/lyrixx)).
|
||||
* Fix interserver secret for Distributed over Distributed from `remote`. [#63013](https://github.com/ClickHouse/ClickHouse/pull/63013) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Support `include_from` for YAML files. However, it is better to use `config.d`. [#63106](https://github.com/ClickHouse/ClickHouse/pull/63106) ([Eduard Karacharov](https://github.com/korowa)).
|
||||
* Keep previous data in terminal after picking from skim suggestions. [#63261](https://github.com/ClickHouse/ClickHouse/pull/63261) ([FlameFactory](https://github.com/FlameFactory)).
|
||||
* Width of fields (in Pretty formats or the `visibleWidth` function) now correctly ignores ANSI escape sequences. [#63270](https://github.com/ClickHouse/ClickHouse/pull/63270) ([Shaun Struwig](https://github.com/Blargian)).
|
||||
* Update the usage of error code `NUMBER_OF_ARGUMENTS_DOESNT_MATCH` by more accurate error codes when appropriate. [#63406](https://github.com/ClickHouse/ClickHouse/pull/63406) ([Yohann Jardin](https://github.com/yohannj)).
|
||||
* `os_user` and `client_hostname` are now correctly set up for queries for command line suggestions in clickhouse-client. This closes [#63430](https://github.com/ClickHouse/ClickHouse/issues/63430). [#63433](https://github.com/ClickHouse/ClickHouse/pull/63433) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Automatically correct `max_block_size` to the default value if it is zero. [#63587](https://github.com/ClickHouse/ClickHouse/pull/63587) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address [#52086](https://github.com/ClickHouse/ClickHouse/issues/52086). [#63656](https://github.com/ClickHouse/ClickHouse/pull/63656) ([Zimu Li](https://github.com/woodlzm)).
|
||||
* Enable truncate operation for object storage disks. [#63693](https://github.com/ClickHouse/ClickHouse/pull/63693) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
|
||||
* The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. [#63786](https://github.com/ClickHouse/ClickHouse/pull/63786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* ClickHouse disks now read the server setting to obtain the actual metadata format version. [#63831](https://github.com/ClickHouse/ClickHouse/pull/63831) ([Sema Checherinda](https://github.com/CheSema)).
|
||||
* Disable pretty format restrictions (`output_format_pretty_max_rows`/`output_format_pretty_max_value_width`) when stdout is not TTY. [#63942](https://github.com/ClickHouse/ClickHouse/pull/63942) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Exception handling now works when ClickHouse is used inside AWS Lambda. Author: [Alexey Coolnev](https://github.com/acoolnev). [#64014](https://github.com/ClickHouse/ClickHouse/pull/64014) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Throw `CANNOT_DECOMPRESS` instead of `CORRUPTED_DATA` on invalid compressed data passed via HTTP. [#64036](https://github.com/ClickHouse/ClickHouse/pull/64036) ([vdimir](https://github.com/vdimir)).
|
||||
* A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes [#61993](https://github.com/ClickHouse/ClickHouse/issues/61993). [#64084](https://github.com/ClickHouse/ClickHouse/pull/64084) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Added knob `metadata_storage_type` to keep free space on metadata storage disk. [#64128](https://github.com/ClickHouse/ClickHouse/pull/64128) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
|
||||
* Add metrics, logs, and thread names around parts filtering with indices. [#64130](https://github.com/ClickHouse/ClickHouse/pull/64130) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Metrics to track the number of directories created and removed by the `plain_rewritable` metadata storage, and the number of entries in the local-to-remote in-memory map. [#64175](https://github.com/ClickHouse/ClickHouse/pull/64175) ([Julia Kartseva](https://github.com/jkartseva)).
|
||||
* Ignore `allow_suspicious_primary_key` on `ATTACH` and verify on `ALTER`. [#64202](https://github.com/ClickHouse/ClickHouse/pull/64202) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. `limit` or `additional_table_filters`) would affect the query result. [#64205](https://github.com/ClickHouse/ClickHouse/pull/64205) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Test that the non-standard error code `QPSLimitExceeded` is supported and is a retryable error. [#64225](https://github.com/ClickHouse/ClickHouse/pull/64225) ([Sema Checherinda](https://github.com/CheSema)).
|
||||
* Settings from the user config no longer affect merges and mutations for MergeTree on top of object storage. [#64456](https://github.com/ClickHouse/ClickHouse/pull/64456) ([alesapin](https://github.com/alesapin)).
|
||||
* Test that `totalqpslimitexceeded` is a retriable s3 error. [#64520](https://github.com/ClickHouse/ClickHouse/pull/64520) ([Sema Checherinda](https://github.com/CheSema)).
|
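An illustration of the query-cache change above, reusing the `limit` setting named in the entry; with the new behavior, these two otherwise identical queries occupy separate query cache entries because their settings differ:

``` sql
-- Same query text, different settings => different query cache entries.
SELECT number FROM numbers(100) SETTINGS use_query_cache = 1, limit = 10;
SELECT number FROM numbers(100) SETTINGS use_query_cache = 1, limit = 20;
```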
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
* ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. [#60469](https://github.com/ClickHouse/ClickHouse/pull/60469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Experimentally support loongarch64 as a new platform for ClickHouse. [#63733](https://github.com/ClickHouse/ClickHouse/pull/63733) ([qiangxuhui](https://github.com/qiangxuhui)).
|
||||
* The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. [#63400](https://github.com/ClickHouse/ClickHouse/pull/63400) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes [#63494](https://github.com/ClickHouse/ClickHouse/issues/63494). [#63495](https://github.com/ClickHouse/ClickHouse/pull/63495) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Update Apache Datasketches library. It resolves [#63858](https://github.com/ClickHouse/ClickHouse/issues/63858). [#63923](https://github.com/ClickHouse/ClickHouse/pull/63923) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Enable GRPC support for aarch64 linux while cross-compiling binary. [#64072](https://github.com/ClickHouse/ClickHouse/pull/64072) ([alesapin](https://github.com/alesapin)).
|
||||
* Fix unwind on SIGSEGV on aarch64 (due to small stack for signal) [#64058](https://github.com/ClickHouse/ClickHouse/pull/64058) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
#### Bug Fix
|
||||
* Disabled `enable_vertical_final` setting by default. This feature should not be used because it has a bug: [#64543](https://github.com/ClickHouse/ClickHouse/issues/64543). [#64544](https://github.com/ClickHouse/ClickHouse/pull/64544) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Fix making backup when multiple shards are used [#57684](https://github.com/ClickHouse/ClickHouse/pull/57684) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
* Fix passing projections/indexes/primary key from columns list from CREATE query into inner table of MV [#59183](https://github.com/ClickHouse/ClickHouse/pull/59183) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix boundRatio incorrect merge [#60532](https://github.com/ClickHouse/ClickHouse/pull/60532) ([Tao Wang](https://github.com/wangtZJU)).
|
||||
* Fix crash when calling some functions on const low-cardinality columns [#61966](https://github.com/ClickHouse/ClickHouse/pull/61966) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Fix queries with FINAL give wrong result when table does not use adaptive granularity [#62432](https://github.com/ClickHouse/ClickHouse/pull/62432) ([Duc Canh Le](https://github.com/canhld94)).
|
||||
* Improve detection of cgroups v2 support for memory controllers [#62903](https://github.com/ClickHouse/ClickHouse/pull/62903) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Fix subsequent use of external tables in client [#62964](https://github.com/ClickHouse/ClickHouse/pull/62964) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix crash with untuple and unresolved lambda [#63131](https://github.com/ClickHouse/ClickHouse/pull/63131) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Fix premature server listen for connections [#63181](https://github.com/ClickHouse/ClickHouse/pull/63181) ([alesapin](https://github.com/alesapin)).
|
||||
* Fix intersecting parts when restarting after a DROP PART command [#63202](https://github.com/ClickHouse/ClickHouse/pull/63202) ([Han Fei](https://github.com/hanfei1991)).
|
||||
* Correctly load SQL security defaults during startup [#63209](https://github.com/ClickHouse/ClickHouse/pull/63209) ([pufit](https://github.com/pufit)).
|
||||
* JOIN filter push down filter join fix [#63234](https://github.com/ClickHouse/ClickHouse/pull/63234) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fix infinite loop in AzureObjectStorage::listObjects [#63257](https://github.com/ClickHouse/ClickHouse/pull/63257) ([Julia Kartseva](https://github.com/jkartseva)).
|
||||
* CROSS join ignore join_algorithm setting [#63273](https://github.com/ClickHouse/ClickHouse/pull/63273) ([vdimir](https://github.com/vdimir)).
|
||||
* Fix finalize WriteBufferToFileSegment and StatusFile [#63346](https://github.com/ClickHouse/ClickHouse/pull/63346) ([vdimir](https://github.com/vdimir)).
|
||||
* Fix logical error during SELECT query after ALTER in rare case [#63353](https://github.com/ClickHouse/ClickHouse/pull/63353) ([alesapin](https://github.com/alesapin)).
|
||||
* Fix `X-ClickHouse-Timezone` header with `session_timezone` [#63377](https://github.com/ClickHouse/ClickHouse/pull/63377) ([Andrey Zvonov](https://github.com/zvonand)).
|
||||
* Fix debug assert when using grouping WITH ROLLUP and LowCardinality types [#63398](https://github.com/ClickHouse/ClickHouse/pull/63398) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Small fixes for group_by_use_nulls [#63405](https://github.com/ClickHouse/ClickHouse/pull/63405) ([vdimir](https://github.com/vdimir)).
|
||||
* Fix backup/restore of projection part in case projection was removed from table metadata, but part still has projection [#63426](https://github.com/ClickHouse/ClickHouse/pull/63426) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Fix mysql dictionary source [#63481](https://github.com/ClickHouse/ClickHouse/pull/63481) ([vdimir](https://github.com/vdimir)).
|
||||
* Insert QueryFinish on AsyncInsertFlush with no data [#63483](https://github.com/ClickHouse/ClickHouse/pull/63483) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Fix: empty used_dictionaries in system.query_log [#63487](https://github.com/ClickHouse/ClickHouse/pull/63487) ([Eduard Karacharov](https://github.com/korowa)).
|
||||
* Make `MergeTreePrefetchedReadPool` safer [#63513](https://github.com/ClickHouse/ClickHouse/pull/63513) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Fix crash on exit with sentry enabled (due to openssl destroyed before sentry) [#63548](https://github.com/ClickHouse/ClickHouse/pull/63548) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix Array and Map support with Keyed hashing [#63628](https://github.com/ClickHouse/ClickHouse/pull/63628) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
|
||||
* Fix filter pushdown for Parquet and maybe StorageMerge [#63642](https://github.com/ClickHouse/ClickHouse/pull/63642) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Prevent conversion to Replicated if zookeeper path already exists [#63670](https://github.com/ClickHouse/ClickHouse/pull/63670) ([Kirill](https://github.com/kirillgarbar)).
|
||||
* Analyzer: views read only necessary columns [#63688](https://github.com/ClickHouse/ClickHouse/pull/63688) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Analyzer: Forbid WINDOW redefinition [#63694](https://github.com/ClickHouse/ClickHouse/pull/63694) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* flatten_nested was broken with the experimental Replicated database. [#63695](https://github.com/ClickHouse/ClickHouse/pull/63695) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix [#63653](https://github.com/ClickHouse/ClickHouse/issues/63653) [#63722](https://github.com/ClickHouse/ClickHouse/pull/63722) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Allow cast from Array(Nothing) to Map(Nothing, Nothing) [#63753](https://github.com/ClickHouse/ClickHouse/pull/63753) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix ILLEGAL_COLUMN in partial_merge join [#63755](https://github.com/ClickHouse/ClickHouse/pull/63755) ([vdimir](https://github.com/vdimir)).
|
||||
* Fix: remove redundant distinct with window functions [#63776](https://github.com/ClickHouse/ClickHouse/pull/63776) ([Igor Nikonov](https://github.com/devcrafter)).
|
||||
* Fix possible crash with SYSTEM UNLOAD PRIMARY KEY [#63778](https://github.com/ClickHouse/ClickHouse/pull/63778) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Fix a query with duplicating cycling alias. [#63791](https://github.com/ClickHouse/ClickHouse/pull/63791) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Make `TokenIterator` lazy as it should be [#63801](https://github.com/ClickHouse/ClickHouse/pull/63801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Add `endpoint_subpath` S3 URI setting [#63806](https://github.com/ClickHouse/ClickHouse/pull/63806) ([Julia Kartseva](https://github.com/jkartseva)).
|
||||
* Fix deadlock in `ParallelReadBuffer` [#63814](https://github.com/ClickHouse/ClickHouse/pull/63814) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* JOIN filter push down equivalent columns fix [#63819](https://github.com/ClickHouse/ClickHouse/pull/63819) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Remove data from all disks after DROP with Lazy database. [#63848](https://github.com/ClickHouse/ClickHouse/pull/63848) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
|
||||
* Fix incorrect result when reading from MV with parallel replicas and new analyzer [#63861](https://github.com/ClickHouse/ClickHouse/pull/63861) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Fixes in `find_super_nodes` and `find_big_family` command of keeper-client [#63862](https://github.com/ClickHouse/ClickHouse/pull/63862) ([Alexander Gololobov](https://github.com/davenger)).
|
||||
* Update lambda execution name [#63864](https://github.com/ClickHouse/ClickHouse/pull/63864) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix SIGSEGV due to CPU/Real profiler [#63865](https://github.com/ClickHouse/ClickHouse/pull/63865) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix `EXPLAIN CURRENT TRANSACTION` query [#63926](https://github.com/ClickHouse/ClickHouse/pull/63926) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fix analyzer: there's turtles all the way down... [#63930](https://github.com/ClickHouse/ClickHouse/pull/63930) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Allow certain ALTER TABLE commands for `plain_rewritable` disk [#63933](https://github.com/ClickHouse/ClickHouse/pull/63933) ([Julia Kartseva](https://github.com/jkartseva)).
|
||||
* Recursive CTE distributed fix [#63939](https://github.com/ClickHouse/ClickHouse/pull/63939) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fix reading of columns of type `Tuple(Map(LowCardinality(...)))` [#63956](https://github.com/ClickHouse/ClickHouse/pull/63956) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Analyzer: Fix COLUMNS resolve [#63962](https://github.com/ClickHouse/ClickHouse/pull/63962) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* LIMIT BY and skip_unused_shards with analyzer [#63983](https://github.com/ClickHouse/ClickHouse/pull/63983) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* A fix for some trash (experimental Kusto) [#63992](https://github.com/ClickHouse/ClickHouse/pull/63992) ([Yong Wang](https://github.com/kashwy)).
|
||||
* Deserialize untrusted binary inputs in a safer way [#64024](https://github.com/ClickHouse/ClickHouse/pull/64024) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Fix query analysis for queries with the setting `final` = 1 for Distributed tables over tables from other than the MergeTree family. [#64037](https://github.com/ClickHouse/ClickHouse/pull/64037) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Add missing settings to recoverLostReplica [#64040](https://github.com/ClickHouse/ClickHouse/pull/64040) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Fix SQL security access checks with analyzer [#64079](https://github.com/ClickHouse/ClickHouse/pull/64079) ([pufit](https://github.com/pufit)).
|
||||
* Fix analyzer: only interpolate expression should be used for DAG [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Fix azure backup writing multipart blocks by 1 MiB (read buffer size) instead of `max_upload_part_size` (in non-native copy case) [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Correctly fallback during backup copy [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Fix ARRAY JOIN with Distributed. [#64226](https://github.com/ClickHouse/ClickHouse/pull/64226) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix: CNF with mutually exclusive atoms reduction [#64256](https://github.com/ClickHouse/ClickHouse/pull/64256) ([Eduard Karacharov](https://github.com/korowa)).
|
||||
* Fix Logical error: Bad cast for Buffer table with prewhere. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
|
||||
### <a id="244"></a> ClickHouse release 24.4, 2024-04-30
|
||||
|
||||
#### Upgrade Notes
|
||||
|
contrib/aws
@ -1 +1 @@
|
||||
Subproject commit eb96e740453ae27afa1f367ba19f99bdcb38484d
|
||||
Subproject commit deeaa9e7c5fe690e3dacc4005d7ecfa7a66a32bb
|
@ -1,11 +0,0 @@
|
||||
sudo apt-get install apt-transport-https ca-certificates dirmngr
|
||||
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
|
||||
|
||||
echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
|
||||
/etc/apt/sources.list.d/clickhouse.list
|
||||
sudo apt-get update
|
||||
|
||||
sudo apt-get install -y clickhouse-server clickhouse-client
|
||||
|
||||
sudo service clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you set up a password.
|
@ -1,7 +0,0 @@
|
||||
sudo yum install yum-utils
|
||||
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
|
||||
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
|
||||
sudo yum install clickhouse-server clickhouse-client
|
||||
|
||||
sudo /etc/init.d/clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you set up a password.
|
@ -1,19 +0,0 @@
|
||||
export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \
|
||||
grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
|
||||
|
||||
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
|
||||
|
||||
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
|
||||
|
||||
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
|
||||
sudo /etc/init.d/clickhouse-server start
|
||||
|
||||
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
|
@ -7,21 +7,43 @@ description: A list of third-party libraries used
|
||||
|
||||
# Third-Party Libraries Used
|
||||
|
||||
ClickHouse utilizes third-party libraries for different purposes, e.g., to connect to other databases, to decode (encode) data during load (save) from (to) disk or to implement certain specialized SQL functions. To be independent of the available libraries in the target system, each third-party library is imported as a Git submodule into ClickHouse's source tree and compiled and linked with ClickHouse. A list of third-party libraries and their licenses can be obtained by the following query:
|
||||
ClickHouse utilizes third-party libraries for different purposes, e.g., to connect to other databases, to decode/encode data during load/save from/to disk, or to implement certain specialized SQL functions.
|
||||
To be independent of the available libraries in the target system, each third-party library is imported as a Git submodule into ClickHouse's source tree and compiled and linked with ClickHouse.
|
||||
A list of third-party libraries and their licenses can be obtained by the following query:
|
||||
|
||||
``` sql
|
||||
SELECT library_name, license_type, license_path FROM system.licenses ORDER BY library_name COLLATE 'en';
|
||||
```
|
||||
|
||||
Note that the listed libraries are the ones located in the `contrib/` directory of the ClickHouse repository. Depending on the build options, some of the libraries may have not been compiled, and as a result, their functionality may not be available at runtime.
|
||||
Note that the listed libraries are the ones located in the `contrib/` directory of the ClickHouse repository.
|
||||
Depending on the build options, some of the libraries may have not been compiled, and, as a result, their functionality may not be available at runtime.
|
||||
|
||||
[Example](https://play.clickhouse.com/play?user=play#U0VMRUNUIGxpYnJhcnlfbmFtZSwgbGljZW5zZV90eXBlLCBsaWNlbnNlX3BhdGggRlJPTSBzeXN0ZW0ubGljZW5zZXMgT1JERVIgQlkgbGlicmFyeV9uYW1lIENPTExBVEUgJ2VuJw==)
|
||||
|
||||
## Adding new third-party libraries and maintaining patches in third-party libraries {#adding-third-party-libraries}
|
||||
## Adding and maintaining third-party libraries
|
||||
|
||||
1. Each third-party library must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository. Avoid dumps/copies of external code, instead use Git submodule feature to pull third-party code from an external upstream repository.
|
||||
2. Submodules are listed in `.gitmodules`. If the external library can be used as-is, you may reference the upstream repository directly. Otherwise, i.e. if the external library requires patching/customization, create a fork of the official repository in the [ClickHouse organization in GitHub](https://github.com/ClickHouse).
|
||||
3. In the latter case, create a branch with `clickhouse/` prefix from the branch you want to integrate, e.g. `clickhouse/master` (for `master`) or `clickhouse/release/vX.Y.Z` (for a `release/vX.Y.Z` tag). The purpose of this branch is to isolate customization of the library from upstream work. For example, pulls from the upstream repository into the fork will leave all `clickhouse/` branches unaffected. Submodules in `contrib/` must only track `clickhouse/` branches of forked third-party repositories.
|
||||
4. To patch a fork of a third-party library, create a dedicated branch with the `clickhouse/` prefix in the fork, e.g. `clickhouse/fix-some-disaster`. Finally, merge the patch branch into the custom tracking branch (e.g. `clickhouse/master` or `clickhouse/release/vX.Y.Z`) using a PR.
|
||||
5. Always create patches of third-party libraries with the official repository in mind. Once a PR of a patch branch to the `clickhouse/` branch in the fork repository is done and the submodule version in ClickHouse official repository is bumped, consider opening another PR from the patch branch to the upstream library repository. This ensures, that 1) the contribution has more than a single use case and importance, 2) others will also benefit from it, 3) the change will not remain a maintenance burden solely on ClickHouse developers.
|
||||
9. To update a submodule with changes in the upstream repository, first merge upstream `master` (or a new `versionX.Y.Z` tag) into the `clickhouse`-tracking branch in the fork repository. Conflicts with patches/customization will need to be resolved in this merge (see Step 4.). Once the merge is done, bump the submodule in ClickHouse to point to the new hash in the fork.
|
||||
Each third-party library must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository.
|
||||
Avoid dumping copies of external code into the library directory.
|
||||
Instead create a Git submodule to pull third-party code from an external upstream repository.
|
||||
|
||||
All submodules used by ClickHouse are listed in the `.gitmodules` file.
|
||||
If the library can be used as-is (the default case), you can reference the upstream repository directly.
|
||||
If the library needs patching, create a fork of the upstream repository in the [ClickHouse organization on GitHub](https://github.com/ClickHouse).
|
||||
|
||||
In the latter case, we aim to isolate custom patches as much as possible from upstream commits.
|
||||
To that end, create a branch with prefix `clickhouse/` from the branch or tag you want to integrate, e.g. `clickhouse/master` (for branch `master`) or `clickhouse/release/vX.Y.Z` (for tag `release/vX.Y.Z`).
|
||||
This ensures that pulls from the upstream repository into the fork will leave custom `clickhouse/` branches unaffected.
|
||||
Submodules in `contrib/` must only track `clickhouse/` branches of forked third-party repositories.
|
||||
|
||||
Patches are only applied against `clickhouse/` branches of external libraries.
|
||||
For that, push the patch as a branch with the `clickhouse/` prefix, e.g. `clickhouse/fix-some-disaster`.
|
||||
Then create a PR from the new branch against the custom tracking branch with the `clickhouse/` prefix (e.g. `clickhouse/master` or `clickhouse/release/vX.Y.Z`) and merge the patch.
|
||||
|
||||
Create patches of third-party libraries with the official repository in mind and consider contributing the patch back to the upstream repository.
|
||||
This makes sure that others will also benefit from the patch and it will not be a maintenance burden for the ClickHouse team.
|
||||
|
||||
To pull upstream changes into the submodule, you can use two methods:
|
||||
- (less work but less clean): merge upstream `master` into the corresponding `clickhouse/` tracking branch in the forked repository. You will need to resolve merge conflicts with previous custom patches. This method can be used when the `clickhouse/` branch tracks an upstream development branch like `master`, `main`, `dev`, etc.
|
||||
- (more work but cleaner): create a new branch with the `clickhouse/` prefix from the upstream commit or tag you would like to integrate. Then re-apply all existing patches using new PRs (or squash them into a single PR). This method can be used when the `clickhouse/` branch tracks a specific upstream version branch or tag. It is cleaner in the sense that custom patches and upstream changes are better isolated from each other.
|
||||
|
||||
Once the submodule has been updated, bump the submodule in ClickHouse to point to the new hash in the fork.
|
||||
|
@ -111,29 +111,10 @@ clickhouse-client # or "clickhouse-client --password" if you've set up a passwor
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Deprecated Method for installing deb-packages</summary>
|
||||
|
||||
``` bash
|
||||
sudo apt-get install apt-transport-https ca-certificates dirmngr
|
||||
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
|
||||
|
||||
echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
|
||||
/etc/apt/sources.list.d/clickhouse.list
|
||||
sudo apt-get update
|
||||
|
||||
sudo apt-get install -y clickhouse-server clickhouse-client
|
||||
|
||||
sudo service clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you set up a password.
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>Migration Method for installing the deb-packages</summary>
|
||||
<summary>Old distributions method for installing the deb-packages</summary>
|
||||
|
||||
```bash
|
||||
sudo apt-key del E0C56BD4
|
||||
sudo apt-get install apt-transport-https ca-certificates dirmngr
|
||||
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
|
||||
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
|
||||
/etc/apt/sources.list.d/clickhouse.list
|
||||
@ -240,22 +221,6 @@ sudo systemctl start clickhouse-keeper
|
||||
sudo systemctl status clickhouse-keeper
|
||||
```
|
||||
|
||||
<details markdown="1">
|
||||
|
||||
<summary>Deprecated Method for installing rpm-packages</summary>
|
||||
|
||||
``` bash
|
||||
sudo yum install yum-utils
|
||||
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
|
||||
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
|
||||
sudo yum install clickhouse-server clickhouse-client
|
||||
|
||||
sudo /etc/init.d/clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you set up a password.
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
You can replace `stable` with `lts` to use different [release kinds](/knowledgebase/production) based on your needs.
|
||||
|
||||
Then run these commands to install packages:
|
||||
@ -308,33 +273,6 @@ tar -xzvf "clickhouse-client-$LATEST_VERSION-${ARCH}.tgz" \
|
||||
sudo "clickhouse-client-$LATEST_VERSION/install/doinst.sh"
|
||||
```
|
||||
|
||||
<details markdown="1">
|
||||
|
||||
<summary>Deprecated Method for installing tgz archives</summary>
|
||||
|
||||
``` bash
|
||||
export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \
|
||||
grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
|
||||
|
||||
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
|
||||
|
||||
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
|
||||
|
||||
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
|
||||
sudo /etc/init.d/clickhouse-server start
|
||||
|
||||
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
|
||||
```
|
||||
</details>
|
||||
|
||||
For production environments, it is recommended to use the latest `stable` version. You can find its number on the GitHub page https://github.com/ClickHouse/ClickHouse/tags with the postfix `-stable`.
|
||||
|
||||
### From Docker Image {#from-docker-image}
|
||||
|
@ -1235,6 +1235,168 @@ Result:
|
||||
|
||||
- [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter.
|
||||
|
||||
## toStartOfMillisecond
|
||||
|
||||
Rounds down a date with time to the start of the milliseconds.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
toStartOfMillisecond(value, [timezone])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `value` — Date and time. [DateTime64](../../sql-reference/data-types/datetime64.md).
|
||||
- `timezone` — [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). If not specified, the function uses the timezone of the `value` parameter. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Input value with sub-milliseconds. [DateTime64](../../sql-reference/data-types/datetime64.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query without timezone:
|
||||
|
||||
``` sql
|
||||
WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
|
||||
SELECT toStartOfMillisecond(dt64);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌────toStartOfMillisecond(dt64)─┐
|
||||
│ 2020-01-01 10:20:30.999000000 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
Query with timezone:
|
||||
|
||||
``` sql
|
||||
WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
SELECT toStartOfMillisecond(dt64, 'Asia/Istanbul');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─toStartOfMillisecond(dt64, 'Asia/Istanbul')─┐
|
||||
│ 2020-01-01 12:20:30.999 │
|
||||
└─────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## toStartOfMicrosecond
|
||||
|
||||
Rounds down a date with time to the start of the microseconds.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
toStartOfMicrosecond(value, [timezone])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `value` — Date and time. [DateTime64](../../sql-reference/data-types/datetime64.md).
|
||||
- `timezone` — [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). If not specified, the function uses the timezone of the `value` parameter. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Input value with sub-microseconds. [DateTime64](../../sql-reference/data-types/datetime64.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query without timezone:
|
||||
|
||||
``` sql
|
||||
WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
|
||||
SELECT toStartOfMicrosecond(dt64);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌────toStartOfMicrosecond(dt64)─┐
|
||||
│ 2020-01-01 10:20:30.999999000 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
Query with timezone:
|
||||
|
||||
``` sql
|
||||
WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
|
||||
SELECT toStartOfMicrosecond(dt64, 'Asia/Istanbul');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─toStartOfMicrosecond(dt64, 'Asia/Istanbul')─┐
|
||||
│ 2020-01-01 12:20:30.999999000 │
|
||||
└─────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**See also**
|
||||
|
||||
- [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter.
|
||||
|
||||
## toStartOfNanosecond
|
||||
|
||||
Rounds down a date with time to the start of the nanoseconds.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
toStartOfNanosecond(value, [timezone])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `value` — Date and time. [DateTime64](../../sql-reference/data-types/datetime64.md).
|
||||
- `timezone` — [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) for the returned value (optional). If not specified, the function uses the timezone of the `value` parameter. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Input value with nanoseconds. [DateTime64](../../sql-reference/data-types/datetime64.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query without timezone:
|
||||
|
||||
``` sql
|
||||
WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
|
||||
SELECT toStartOfNanosecond(dt64);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─────toStartOfNanosecond(dt64)─┐
|
||||
│ 2020-01-01 10:20:30.999999999 │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
Query with timezone:
|
||||
|
||||
``` sql
|
||||
WITH toDateTime64('2020-01-01 10:20:30.999999999', 9) AS dt64
|
||||
SELECT toStartOfNanosecond(dt64, 'Asia/Istanbul');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─toStartOfNanosecond(dt64, 'Asia/Istanbul')─┐
|
||||
│ 2020-01-01 12:20:30.999999999 │
|
||||
└────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**See also**
|
||||
|
||||
- [Timezone](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) server configuration parameter.
|
||||
|
||||
## toStartOfFiveMinutes
|
||||
|
||||
Rounds down a date with time to the start of the five-minute interval.
|
||||
@ -3953,6 +4115,43 @@ Result:
|
||||
│ 2023-03-16 18:00:00.000 │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## UTCTimestamp
|
||||
|
||||
Returns the current date and time at the moment of query analysis. The function is a constant expression.
|
||||
|
||||
:::note
|
||||
This function gives the same result as `now('UTC')`. It was added only for MySQL support and [`now`](#now-now) is the preferred usage.
|
||||
:::
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
UTCTimestamp()
|
||||
```
|
||||
|
||||
Alias: `UTC_timestamp`.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns the current date and time at the moment of query analysis. [DateTime](../data-types/datetime.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT UTCTimestamp();
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌──────UTCTimestamp()─┐
|
||||
│ 2024-05-28 08:32:09 │
|
||||
└─────────────────────┘
|
||||
```
|
||||
|
||||
## timeDiff
|
||||
|
||||
Returns the difference between two dates or dates with time values. The difference is calculated in units of seconds. It is the same as `dateDiff` and was added only for MySQL support. `dateDiff` is preferred.
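A minimal sketch, assuming the same second-based semantics as `dateDiff` (the timestamps are arbitrary example values):

```sql
SELECT timeDiff(toDateTime('2024-01-01 00:00:00'), toDateTime('2024-01-01 00:01:05'));
-- Expected: 65 (the difference in seconds)
```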
|
||||
|
@ -4,13 +4,13 @@ sidebar_position: 105
|
||||
sidebar_label: JSON
|
||||
---
|
||||
|
||||
There are two sets of functions to parse JSON.
|
||||
- `simpleJSON*` (`visitParam*`) is made to parse a special very limited subset of a JSON, but these functions are extremely fast.
|
||||
- `JSONExtract*` is made to parse normal JSON.
|
||||
There are two sets of functions to parse JSON:
|
||||
- [`simpleJSON*` (`visitParam*`)](#simplejson--visitparam-functions) which is made for parsing a limited subset of JSON extremely fast.
|
||||
- [`JSONExtract*`](#jsonextract-functions) which is made for parsing ordinary JSON.
|
||||
|
||||
# simpleJSON/visitParam functions
|
||||
## simpleJSON / visitParam functions
|
||||
|
||||
ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
|
||||
ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be. They try to do as little as possible to get the job done as quickly as possible.
|
||||
|
||||
The following assumptions are made:
|
||||
|
||||
@ -19,7 +19,7 @@ The following assumptions are made:
|
||||
3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
|
||||
4. The JSON does not have space characters outside of string literals.
|
||||
|
||||
## simpleJSONHas
|
||||
### simpleJSONHas
|
||||
|
||||
Checks whether there is a field named `field_name`. The result is `UInt8`.
|
||||
|
||||
@ -29,14 +29,16 @@ Checks whether there is a field named `field_name`. The result is `UInt8`.
|
||||
simpleJSONHas(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamHas`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns `1` if the field exists, `0` otherwise.
|
||||
- Returns `1` if the field exists, `0` otherwise. [UInt8](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -55,11 +57,13 @@ SELECT simpleJSONHas(json, 'foo') FROM jsons;
|
||||
SELECT simpleJSONHas(json, 'bar') FROM jsons;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
1
|
||||
0
|
||||
```
|
||||
## simpleJSONExtractUInt
|
||||
### simpleJSONExtractUInt
|
||||
|
||||
Parses `UInt64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
|
||||
|
||||
@ -69,14 +73,16 @@ Parses `UInt64` from the value of the field named `field_name`. If this is a str
|
||||
simpleJSONExtractUInt(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractUInt`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
|
||||
- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [UInt64](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -98,6 +104,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
4
|
||||
@ -106,7 +114,7 @@ SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
5
|
||||
```
|
||||
|
||||
## simpleJSONExtractInt
|
||||
### simpleJSONExtractInt
|
||||
|
||||
Parses `Int64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
|
||||
|
||||
@ -116,14 +124,16 @@ Parses `Int64` from the value of the field named `field_name`. If this is a stri
|
||||
simpleJSONExtractInt(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractInt`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
|
||||
- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [Int64](../data-types/int-uint.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -145,6 +155,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
-4
|
||||
@ -153,7 +165,7 @@ SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json;
|
||||
5
|
||||
```
|
||||
|
||||
## simpleJSONExtractFloat
|
||||
### simpleJSONExtractFloat
|
||||
|
||||
Parses `Float64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
|
||||
|
||||
@ -163,14 +175,16 @@ Parses `Float64` from the value of the field named `field_name`. If this is a st
|
||||
simpleJSONExtractFloat(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractFloat`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
|
||||
- Returns the number parsed from the field if the field exists and contains a number, `0` otherwise. [Float64](../data-types/float.md/#float32-float64).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -192,6 +206,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
-4000
|
||||
@ -200,7 +216,7 @@ SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json;
|
||||
5
|
||||
```
|
||||
|
||||
## simpleJSONExtractBool
|
||||
### simpleJSONExtractBool
|
||||
|
||||
Parses a true/false value from the value of the field named `field_name`. The result is `UInt8`.
|
||||
|
||||
@ -210,10 +226,12 @@ Parses a true/false value from the value of the field named `field_name`. The re
|
||||
simpleJSONExtractBool(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractBool`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
@ -240,6 +258,8 @@ SELECT simpleJSONExtractBool(json, 'bar') FROM jsons ORDER BY json;
|
||||
SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
0
|
||||
1
|
||||
@ -247,7 +267,7 @@ SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json;
|
||||
0
|
||||
```
|
||||
|
||||
## simpleJSONExtractRaw
|
||||
### simpleJSONExtractRaw
|
||||
|
||||
Returns the value of the field named `field_name` as a `String`, including separators.
|
||||
|
||||
@ -257,14 +277,16 @@ Returns the value of the field named `field_name` as a `String`, including separ
|
||||
simpleJSONExtractRaw(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractRaw`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the value of the field as a [`String`](../data-types/string.md#string), including separators if the field exists, or an empty `String` otherwise.
|
||||
- Returns the value of the field as a string, including separators if the field exists, or an empty string otherwise. [`String`](../data-types/string.md#string)
|
||||
|
||||
**Example**
|
||||
|
||||
@ -286,6 +308,8 @@ INSERT INTO jsons VALUES ('{"baz":2}');
|
||||
SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
|
||||
"-4e3"
|
||||
@ -294,7 +318,7 @@ SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json;
|
||||
{"def":[1,2,3]}
|
||||
```
|
||||
|
||||
## simpleJSONExtractString
|
||||
### simpleJSONExtractString
|
||||
|
||||
Parses `String` in double quotes from the value of the field named `field_name`.
|
||||
|
||||
@ -304,14 +328,16 @@ Parses `String` in double quotes from the value of the field named `field_name`.
|
||||
simpleJSONExtractString(json, field_name)
|
||||
```
|
||||
|
||||
Alias: `visitParamExtractString`.
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json`: The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
|
||||
- `json` — The JSON in which the field is searched for. [String](../data-types/string.md#string)
|
||||
- `field_name` — The name of the field to search for. [String literal](../syntax#string)
|
||||
|
||||
**Returned value**
|
||||
|
||||
It returns the value of a field as a [`String`](../data-types/string.md#string), including separators. The value is unescaped. It returns an empty `String`: if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist.
|
||||
- Returns the unescaped value of a field as a string, including separators. An empty string is returned if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist. [String](../data-types/string.md).
|
||||
|
||||
**Implementation details**
|
||||
|
||||
@ -336,6 +362,8 @@ INSERT INTO jsons VALUES ('{"foo":"hello}');
|
||||
SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
\n\0
|
||||
|
||||
@ -343,73 +371,61 @@ SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json;
|
||||
|
||||
```
|
||||
|
||||
## visitParamHas
|
||||
## JSONExtract functions
|
||||
|
||||
This function is [an alias of `simpleJSONHas`](./json-functions#simplejsonhas).
|
||||
The following functions are based on [simdjson](https://github.com/lemire/simdjson) and are designed for more complex JSON parsing requirements.
|
||||
|
||||
## visitParamExtractUInt
|
||||
### isValidJSON
|
||||
|
||||
This function is [an alias of `simpleJSONExtractUInt`](./json-functions#simplejsonextractuint).
|
||||
Checks whether the passed string is valid JSON.
|
||||
|
||||
## visitParamExtractInt
|
||||
**Syntax**
|
||||
|
||||
This function is [an alias of `simpleJSONExtractInt`](./json-functions#simplejsonextractint).
|
||||
```sql
|
||||
isValidJSON(json)
|
||||
```
|
||||
|
||||
## visitParamExtractFloat
|
||||
|
||||
This function is [an alias of `simpleJSONExtractFloat`](./json-functions#simplejsonextractfloat).
|
||||
|
||||
## visitParamExtractBool
|
||||
|
||||
This function is [an alias of `simpleJSONExtractBool`](./json-functions#simplejsonextractbool).
|
||||
|
||||
## visitParamExtractRaw
|
||||
|
||||
This function is [an alias of `simpleJSONExtractRaw`](./json-functions#simplejsonextractraw).
|
||||
|
||||
## visitParamExtractString
|
||||
|
||||
This function is [an alias of `simpleJSONExtractString`](./json-functions#simplejsonextractstring).
|
||||
|
||||
# JSONExtract functions
|
||||
|
||||
The following functions are based on [simdjson](https://github.com/lemire/simdjson) designed for more complex JSON parsing requirements.
|
||||
|
||||
## isValidJSON(json)
|
||||
|
||||
Checks that passed string is a valid json.
|
||||
|
||||
Examples:
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT isValidJSON('{"a": "hello", "b": [-100, 200.0, 300]}') = 1
|
||||
SELECT isValidJSON('not a json') = 0
|
||||
```
|
||||
|
||||
## JSONHas(json\[, indices_or_keys\]...)
|
||||
### JSONHas
|
||||
|
||||
If the value exists in the JSON document, `1` will be returned.
|
||||
If the value exists in the JSON document, `1` will be returned. If the value does not exist, `0` will be returned.
|
||||
|
||||
If the value does not exist, `0` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSONHas(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns `1` if the value exists in `json`, otherwise `0`. [UInt8](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 1
|
||||
SELECT JSONHas('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 4) = 0
|
||||
```
|
||||
|
||||
`indices_or_keys` is a list of zero or more arguments each of them can be either string or integer.
|
||||
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
Minimum index of the element is 1. Thus the element 0 does not exist.
|
||||
|
||||
You may use integers to access both JSON arrays and JSON objects.
|
||||
|
||||
So, for example:
|
||||
The minimum index of the element is 1. Thus the element 0 does not exist. You may use integers to access both JSON arrays and JSON objects. For example:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'a'
|
||||
@ -419,26 +435,62 @@ SELECT JSONExtractKey('{"a": "hello", "b": [-100, 200.0, 300]}', -2) = 'a'
|
||||
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 1) = 'hello'
|
||||
```
|
||||
|
||||
## JSONLength(json\[, indices_or_keys\]...)
|
||||
### JSONLength
|
||||
|
||||
Return the length of a JSON array or a JSON object.
|
||||
Return the length of a JSON array or a JSON object. If the value does not exist or has the wrong type, `0` will be returned.
|
||||
|
||||
If the value does not exist or has a wrong type, `0` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSONLength(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns the length of the JSON array or JSON object. Returns `0` if the value does not exist or has the wrong type. [UInt64](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 3
|
||||
SELECT JSONLength('{"a": "hello", "b": [-100, 200.0, 300]}') = 2
|
||||
```
|
||||
|
||||
## JSONType(json\[, indices_or_keys\]...)
|
||||
### JSONType
|
||||
|
||||
Return the type of a JSON value.
|
||||
Return the type of a JSON value. If the value does not exist, `Null` will be returned.
|
||||
|
||||
If the value does not exist, `Null` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSONType(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns the type of a JSON value as a string if the value exists, otherwise it returns `Null`. [String](../data-types/string.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}') = 'Object'
|
||||
@ -446,35 +498,191 @@ SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'String'
|
||||
SELECT JSONType('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = 'Array'
|
||||
```
|
||||
|
||||
## JSONExtractUInt(json\[, indices_or_keys\]...)
|
||||
### JSONExtractUInt
|
||||
|
||||
## JSONExtractInt(json\[, indices_or_keys\]...)
|
||||
Parses JSON and extracts a value of UInt type.
|
||||
|
||||
## JSONExtractFloat(json\[, indices_or_keys\]...)
|
||||
**Syntax**
|
||||
|
||||
## JSONExtractBool(json\[, indices_or_keys\]...)
|
||||
|
||||
Parses a JSON and extract a value. These functions are similar to `visitParam` functions.
|
||||
|
||||
If the value does not exist or has a wrong type, `0` will be returned.
|
||||
|
||||
Examples:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 1) = -100
|
||||
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) = 200.0
|
||||
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) = 300
|
||||
```sql
|
||||
JSONExtractUInt(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
## JSONExtractString(json\[, indices_or_keys\]...)
|
||||
**Parameters**
|
||||
|
||||
Parses a JSON and extract a string. This function is similar to `visitParamExtractString` functions.
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
If the value does not exist or has a wrong type, an empty string will be returned.
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
The value is unescaped. If unescaping failed, it returns an empty string.
|
||||
**Returned value**
|
||||
|
||||
Examples:
|
||||
- Returns a UInt value if it exists, otherwise it returns `Null`. [UInt64](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) as x, toTypeName(x);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌───x─┬─toTypeName(x)─┐
|
||||
│ 300 │ UInt64 │
|
||||
└─────┴───────────────┘
|
||||
```
|
||||
|
||||
### JSONExtractInt
|
||||
|
||||
Parses JSON and extracts a value of Int type.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractInt(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an Int value if it exists, otherwise it returns `Null`. [Int64](../data-types/int-uint.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', -1) as x, toTypeName(x);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌───x─┬─toTypeName(x)─┐
|
||||
│ 300 │ Int64 │
|
||||
└─────┴───────────────┘
|
||||
```
|
||||
|
||||
### JSONExtractFloat
|
||||
|
||||
Parses JSON and extracts a value of Float type.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractFloat(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns a Float value if it exists, otherwise it returns `Null`. [Float64](../data-types/float.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300]}', 'b', 2) as x, toTypeName(x);
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌───x─┬─toTypeName(x)─┐
|
||||
│ 200 │ Float64 │
|
||||
└─────┴───────────────┘
|
||||
```
|
||||
|
||||
### JSONExtractBool
|
||||
|
||||
Parses JSON and extracts a boolean value. If the value does not exist or has a wrong type, `0` will be returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractBool(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns a Boolean value if it exists, otherwise it returns `0`. [Bool](../data-types/boolean.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractBool('{"passed": true}', 'passed');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─JSONExtractBool('{"passed": true}', 'passed')─┐
|
||||
│ 1 │
|
||||
└───────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### JSONExtractString
|
||||
|
||||
Parses JSON and extracts a string. This function is similar to the [`visitParamExtractString`](#simplejsonextractstring) function. If the value does not exist or has a wrong type, an empty string will be returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractString(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an unescaped string from `json`. If unescaping failed, if the value does not exist or if it has a wrong type then it returns an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') = 'hello'
|
||||
@ -484,16 +692,35 @@ SELECT JSONExtractString('{"abc":"\\u263"}', 'abc') = ''
|
||||
SELECT JSONExtractString('{"abc":"hello}', 'abc') = ''
|
||||
```
|
||||
|
||||
## JSONExtract(json\[, indices_or_keys...\], Return_type)
|
||||
### JSONExtract
|
||||
|
||||
Parses a JSON and extract a value of the given ClickHouse data type.
|
||||
Parses JSON and extracts a value of the given ClickHouse data type. This function is a generalized version of the previous `JSONExtract<type>` functions. Meaning:
|
||||
|
||||
This is a generalization of the previous `JSONExtract<type>` functions.
|
||||
This means
|
||||
`JSONExtract(..., 'String')` returns exactly the same as `JSONExtractString()`,
|
||||
`JSONExtract(..., 'Float64')` returns exactly the same as `JSONExtractFloat()`.
|
||||
|
||||
Examples:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtract(json [, indices_or_keys...], return_type)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
- `return_type` — A string specifying the type of the value to extract. [String](../data-types/string.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns a value of the specified return type if it exists, otherwise it returns `0`, `Null`, or an empty string depending on the specified return type. [UInt64](../data-types/int-uint.md), [Int64](../data-types/int-uint.md), [Float64](../data-types/float.md), [Bool](../data-types/boolean.md) or [String](../data-types/string.md).
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'Tuple(String, Array(Float64))') = ('hello',[-100,200,300])
|
||||
@ -506,17 +733,38 @@ SELECT JSONExtract('{"day": "Thursday"}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday
|
||||
SELECT JSONExtract('{"day": 5}', 'day', 'Enum8(\'Sunday\' = 0, \'Monday\' = 1, \'Tuesday\' = 2, \'Wednesday\' = 3, \'Thursday\' = 4, \'Friday\' = 5, \'Saturday\' = 6)') = 'Friday'
|
||||
```
|
||||
|
||||
## JSONExtractKeysAndValues(json\[, indices_or_keys...\], Value_type)
|
||||
### JSONExtractKeysAndValues
|
||||
|
||||
Parses key-value pairs from a JSON where the values are of the given ClickHouse data type.
|
||||
Parses key-value pairs from JSON where the values are of the given ClickHouse data type.
|
||||
|
||||
Example:
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSONExtractKeysAndValues(json [, indices_or_keys...], value_type)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
- `value_type` — A string specifying the type of the value to extract. [String](../data-types/string.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an array of parsed key-value pairs. [Array](../data-types/array.md)([Tuple](../data-types/tuple.md)(`value_type`)).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractKeysAndValues('{"x": {"a": 5, "b": 7, "c": 11}}', 'x', 'Int8') = [('a',5),('b',7),('c',11)];
|
||||
```
|
||||
|
||||
## JSONExtractKeys
|
||||
### JSONExtractKeys
|
||||
|
||||
Parses a JSON string and extracts the keys.
|
||||
|
||||
@ -526,14 +774,14 @@ Parses a JSON string and extracts the keys.
|
||||
JSONExtractKeys(json[, a, b, c...])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
**Parameters**
|
||||
|
||||
- `json` — [String](../data-types/string.md) with valid JSON.
|
||||
- `a, b, c...` — Comma-separated indices or keys that specify the path to the inner field in a nested JSON object. Each argument can be either a [String](../data-types/string.md) to get the field by the key or an [Integer](../data-types/int-uint.md) to get the N-th field (indexed from 1, negative integers count from the end). If not set, the whole JSON is parsed as the top-level object. Optional parameter.
|
||||
|
||||
**Returned value**
|
||||
|
||||
Array with the keys of the JSON. [Array](../data-types/array.md)([String](../data-types/string.md)).
|
||||
- Returns an array with the keys of the JSON. [Array](../data-types/array.md)([String](../data-types/string.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -552,31 +800,67 @@ text
|
||||
└────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## JSONExtractRaw(json\[, indices_or_keys\]...)
|
||||
### JSONExtractRaw
|
||||
|
||||
Returns a part of JSON as unparsed string.
|
||||
Returns part of the JSON as an unparsed string. If the part does not exist or has the wrong type, an empty string will be returned.
|
||||
|
||||
If the part does not exist or has a wrong type, an empty string will be returned.
|
||||
**Syntax**
|
||||
|
||||
Example:
|
||||
```sql
|
||||
JSONExtractRaw(json [, indices_or_keys]...)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns part of the JSON as an unparsed string. If the part does not exist or has the wrong type, an empty string is returned. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT JSONExtractRaw('{"a": "hello", "b": [-100, 200.0, 300]}', 'b') = '[-100, 200.0, 300]';
|
||||
```
|
||||
|
||||
## JSONExtractArrayRaw(json\[, indices_or_keys...\])
|
||||
### JSONExtractArrayRaw
|
||||
|
||||
Returns an array with elements of JSON array, each represented as unparsed string.
|
||||
Returns an array with elements of a JSON array, each represented as an unparsed string. If the part does not exist or isn’t an array, an empty array will be returned.
|
||||
|
||||
If the part does not exist or isn’t array, an empty array will be returned.
|
||||
**Syntax**
|
||||
|
||||
Example:
|
||||
```sql
|
||||
JSONExtractArrayRaw(json [, indices_or_keys...])
|
||||
```
|
||||
|
||||
``` sql
|
||||
**Parameters**
|
||||
|
||||
- `json` — JSON string to parse. [String](../data-types/string.md).
|
||||
- `indices_or_keys` — A list of zero or more arguments, each of which can be either string or integer. [String](../data-types/string.md), [Int*](../data-types/int-uint.md).
|
||||
|
||||
`indices_or_keys` type:
|
||||
- String = access object member by key.
|
||||
- Positive integer = access the n-th member/key from the beginning.
|
||||
- Negative integer = access the n-th member/key from the end.
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns an array with elements of the JSON array, each represented as an unparsed string. An empty array is returned if the part does not exist or is not an array. [Array](../data-types/array.md)([String](../data-types/string.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT JSONExtractArrayRaw('{"a": "hello", "b": [-100, 200.0, "hello"]}', 'b') = ['-100', '200.0', '"hello"'];
|
||||
```
|
||||
|
||||
## JSONExtractKeysAndValuesRaw
|
||||
### JSONExtractKeysAndValuesRaw
|
||||
|
||||
Extracts raw data from a JSON object.
|
||||
|
||||
@ -640,13 +924,30 @@ Result:
|
||||
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## JSON_EXISTS(json, path)
|
||||
### JSON_EXISTS
|
||||
|
||||
If the value exists in the JSON document, `1` will be returned.
|
||||
If the value exists in the JSON document, `1` will be returned. If the value does not exist, `0` will be returned.
|
||||
|
||||
If the value does not exist, `0` will be returned.
|
||||
**Syntax**
|
||||
|
||||
Examples:
|
||||
```sql
|
||||
JSON_EXISTS(json, path)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — A string with valid JSON. [String](../data-types/string.md).
|
||||
- `path` — A string representing the path. [String](../data-types/string.md).
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_EXISTS(path, json)
|
||||
:::
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Returns `1` if the value exists in the JSON document, otherwise `0`.
|
||||
|
||||
**Examples**
|
||||
|
||||
``` sql
|
||||
SELECT JSON_EXISTS('{"hello":1}', '$.hello');
|
||||
@ -655,17 +956,32 @@ SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[*]');
|
||||
SELECT JSON_EXISTS('{"hello":["world"]}', '$.hello[0]');
|
||||
```
|
||||
|
||||
### JSON_QUERY
|
||||
|
||||
Parses a JSON and extracts a value as a JSON array or JSON object. If the value does not exist, an empty string will be returned.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSON_QUERY(json, path)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — A string with valid JSON. [String](../data-types/string.md).
|
||||
- `path` — A string representing the path. [String](../data-types/string.md).
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
|
||||
:::
|
||||
|
||||
## JSON_QUERY(json, path)
|
||||
**Returned value**
|
||||
|
||||
Parses a JSON and extract a value as JSON array or JSON object.
|
||||
- Returns the extracted value as a JSON array or JSON object. Otherwise, an empty string is returned if the value does not exist. [String](../data-types/string.md).
|
||||
|
||||
If the value does not exist, an empty string will be returned.
|
||||
**Example**
|
||||
|
||||
Example:
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSON_QUERY('{"hello":"world"}', '$.hello');
|
||||
@ -682,17 +998,38 @@ Result:
|
||||
[2]
|
||||
String
|
||||
```
|
||||
|
||||
### JSON_VALUE
|
||||
|
||||
Parses a JSON and extracts a value as a JSON scalar. If the value does not exist, an empty string will be returned by default.
|
||||
|
||||
This function is controlled by the following settings:
|
||||
|
||||
- If `function_json_value_return_type_allow_nullable` is set to `true`, `NULL` will be returned instead. If the value is of a complex type (such as a struct, array, or map), an empty string will be returned by default.
|
||||
- If `function_json_value_return_type_allow_complex` is set to `true`, the complex value will be returned.
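A rough sketch of how these settings change the result (the JSON and paths are illustrative):

```sql
-- Default: a missing value yields an empty string
SELECT JSON_VALUE('{"hello":"world"}', '$.missing');

-- With nullable results enabled, a missing value yields NULL instead
SELECT JSON_VALUE('{"hello":"world"}', '$.missing')
SETTINGS function_json_value_return_type_allow_nullable = true;

-- With complex values allowed, an array or object can be returned as-is
SELECT JSON_VALUE('{"hello":["world"]}', '$.hello')
SETTINGS function_json_value_return_type_allow_complex = true;
```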
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
JSON_VALUE(json, path)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `json` — A string with valid JSON. [String](../data-types/string.md).
|
||||
- `path` — A string representing the path. [String](../data-types/string.md).
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_QUERY(path, json)
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
|
||||
:::
|
||||
|
||||
## JSON_VALUE(json, path)
|
||||
**Returned value**
|
||||
|
||||
Parses a JSON and extract a value as JSON scalar.
|
||||
- Returns the extracted value as a JSON scalar if it exists, otherwise an empty string is returned. [String](../data-types/string.md).
|
||||
|
||||
If the value does not exist, an empty string will be returned by default, and by SET `function_json_value_return_type_allow_nullable` = `true`, `NULL` will be returned. If the value is complex type (such as: struct, array, map), an empty string will be returned by default, and by SET `function_json_value_return_type_allow_complex` = `true`, the complex value will be returned.
|
||||
**Example**
|
||||
|
||||
Example:
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT JSON_VALUE('{"hello":"world"}', '$.hello');
|
||||
@ -712,11 +1049,7 @@ world
|
||||
String
|
||||
```
|
||||
|
||||
:::note
|
||||
Before version 21.11 the order of arguments was wrong, i.e. JSON_VALUE(path, json)
|
||||
:::
|
||||
|
||||
## toJSONString
|
||||
### toJSONString
|
||||
|
||||
Serializes a value to its JSON representation. Various data types and nested structures are supported.
|
||||
64-bit [integers](../data-types/int-uint.md) or bigger (like `UInt64` or `Int128`) are enclosed in quotes by default. [output_format_json_quote_64bit_integers](../../operations/settings/settings.md#session_settings-output_format_json_quote_64bit_integers) controls this behavior.
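A rough sketch of the quoting behavior described above (assuming the setting applies to `toJSONString` output, as stated):

```sql
-- Default: 64-bit integers are serialized as quoted strings
SELECT toJSONString(toUInt64(42))
SETTINGS output_format_json_quote_64bit_integers = 1;
-- Expected: "42"

-- With quoting disabled, they are serialized as plain numbers
SELECT toJSONString(toUInt64(42))
SETTINGS output_format_json_quote_64bit_integers = 0;
-- Expected: 42
```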
|
||||
@ -762,7 +1095,7 @@ Result:
|
||||
- [output_format_json_quote_denormals](../../operations/settings/settings.md#settings-output_format_json_quote_denormals)
|
||||
|
||||
|
||||
## JSONArrayLength
|
||||
### JSONArrayLength
|
||||
|
||||
Returns the number of elements in the outermost JSON array. The function returns NULL if input JSON string is invalid.
|
||||
|
||||
@ -795,7 +1128,7 @@ SELECT
|
||||
```
|
||||
|
||||
|
||||
## jsonMergePatch
|
||||
### jsonMergePatch
|
||||
|
||||
Returns the merged JSON object string which is formed by merging multiple JSON objects.
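A brief sketch, assuming the usual JSON merge-patch behavior where later arguments take precedence on matching keys:

```sql
SELECT jsonMergePatch('{"a": 1, "b": 2}', '{"b": 3, "c": 4}') AS merged;
-- Expected: {"a":1,"b":3,"c":4}
```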
|
||||
|
||||
|
@ -735,6 +735,8 @@ LIMIT 10
|
||||
|
||||
Given a size (number of bytes), this function returns a readable, rounded size with suffix (KB, MB, etc.) as a string.
|
||||
|
||||
The opposite operations of this function are [fromReadableDecimalSize](#fromReadableDecimalSize), [fromReadableDecimalSizeOrZero](#fromReadableDecimalSizeOrZero), and [fromReadableDecimalSizeOrNull](#fromReadableDecimalSizeOrNull).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
@ -766,6 +768,8 @@ Result:
|
||||
|
||||
Given a size (number of bytes), this function returns a readable, rounded size with suffix (KiB, MiB, etc.) as a string.
|
||||
|
||||
The opposite operations of this function are [fromReadableSize](#fromReadableSize), [fromReadableSizeOrZero](#fromReadableSizeOrZero), and [fromReadableSizeOrNull](#fromReadableSizeOrNull).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
@ -890,6 +894,238 @@ SELECT
|
||||
└────────────────────┴────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## fromReadableSize
|
||||
|
||||
Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it throws an exception.
|
||||
|
||||
The opposite operation of this function is [formatReadableSize](#formatReadableSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableSize(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB']) AS readable_sizes,
|
||||
fromReadableSize(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableSizeOrNull
|
||||
|
||||
Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `NULL`.
|
||||
|
||||
The opposite operation of this function is [formatReadableSize](#formatReadableSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableSizeOrNull(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes,
|
||||
fromReadableSizeOrNull(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableSizeOrZero
|
||||
|
||||
Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `0`.
|
||||
|
||||
The opposite operation of this function is [formatReadableSize](#formatReadableSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableSizeOrZero(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes,
|
||||
fromReadableSizeOrZero(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableDecimalSize
|
||||
|
||||
Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it throws an exception.
|
||||
|
||||
The opposite operation of this function is [formatReadableDecimalSize](#formatReadableDecimalSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableDecimalSize(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with decimal units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB']) AS readable_sizes,
|
||||
fromReadableDecimalSize(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableDecimalSizeOrNull
|
||||
|
||||
Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `NULL`.
|
||||
|
||||
The opposite operation of this function is [formatReadableDecimalSize](#formatReadableDecimalSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableDecimalSizeOrNull(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with decimal units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes,
|
||||
fromReadableDecimalSizeOrNull(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## fromReadableDecimalSizeOrZero
|
||||
|
||||
Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes.
|
||||
If the function is unable to parse the input value, it returns `0`.
|
||||
|
||||
The opposite operation of this function is [formatReadableDecimalSize](#formatReadableDecimalSize).
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
fromReadableDecimalSizeOrZero(x)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `x` : Readable size with decimal units ([String](../../sql-reference/data-types/string.md)).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes,
|
||||
fromReadableDecimalSizeOrZero(readable_sizes) AS sizes
|
||||
```
|
||||
|
||||
```text
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB       │    5314 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘
|
||||
```
|
||||
|
||||
## parseTimeDelta
|
||||
|
||||
Parse a sequence of numbers followed by something resembling a time unit.
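A short sketch (the input string is an illustrative value):

```sql
SELECT parseTimeDelta('11s+22min');
-- Expected: 1331 (seconds)
```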
|
||||
|
@ -6,7 +6,33 @@ sidebar_label: URLs
|
||||
|
||||
# Functions for Working with URLs
|
||||
|
||||
All these functions do not follow the RFC. They are maximally simplified for improved performance.
|
||||
:::note
|
||||
The functions mentioned in this section are optimized for maximum performance and for the most part do not follow the RFC-3986 standard. Functions which implement RFC-3986 have `RFC` appended to their function name and are generally slower.
|
||||
:::
|
||||
|
||||
You can generally use the non-`RFC` function variants when working with publicly registered domains that contain neither user strings nor `@` symbols.
|
||||
The table below details which symbols in a URL can (`✔`) or cannot (`✗`) be parsed by the respective `RFC` and non-`RFC` variants:
|
||||
|
||||
|Symbol | non-`RFC`| `RFC` |
|
||||
|-------|----------|-------|
|
||||
| ' ' | ✗ |✗ |
|
||||
| \t | ✗ |✗ |
|
||||
| < | ✗ |✗ |
|
||||
| > | ✗ |✗ |
|
||||
| % | ✗ |✔* |
|
||||
| { | ✗ |✗ |
|
||||
| } | ✗ |✗ |
|
||||
| \| | ✗ |✗ |
|
||||
| \\\ | ✗ |✗ |
|
||||
| ^ | ✗ |✗ |
|
||||
| ~ | ✗ |✔* |
|
||||
| [ | ✗ |✗ |
|
||||
| ] | ✗ |✔ |
|
||||
| ; | ✗ |✔* |
|
||||
| = | ✗ |✔* |
|
||||
| & | ✗ |✔* |
|
||||
|
||||
Symbols marked `*` are sub-delimiters in RFC 3986 and are allowed for user info following the `@` symbol.
|
||||
|
||||
## Functions that Extract Parts of a URL
|
||||
|
||||
@ -16,21 +42,23 @@ If the relevant part isn’t present in a URL, an empty string is returned.
|
||||
|
||||
Extracts the protocol from a URL.
|
||||
|
||||
Examples of typical returned values: http, https, ftp, mailto, tel, magnet...
|
||||
Examples of typical returned values: http, https, ftp, mailto, tel, magnet.
|
||||
|
||||
### domain
|
||||
|
||||
Extracts the hostname from a URL.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
domain(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
The URL can be specified with or without a scheme. Examples:
|
||||
The URL can be specified with or without a protocol. Examples:
|
||||
|
||||
``` text
|
||||
svn+ssh://some.svn-hosting.com:80/repo/trunk
|
||||
@ -48,7 +76,7 @@ clickhouse.com
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Host name if ClickHouse can parse the input string as a URL, otherwise an empty string. [String](../data-types/string.md).
|
||||
- Host name if the input string can be parsed as a URL, otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -62,9 +90,103 @@ SELECT domain('svn+ssh://some.svn-hosting.com:80/repo/trunk');
|
||||
└────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### domainRFC
|
||||
|
||||
Extracts the hostname from a URL. Similar to [domain](#domain), but RFC 3986 conformant.
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
domainRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Host name if the input string can be parsed as a URL, otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
domain('http://user:password@example.com:8080/path?query=value#fragment'),
|
||||
domainRFC('http://user:password@example.com:8080/path?query=value#fragment');
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─domain('http://user:password@example.com:8080/path?query=value#fragment')─┬─domainRFC('http://user:password@example.com:8080/path?query=value#fragment')─┐
|
||||
│ │ example.com │
|
||||
└───────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### domainWithoutWWW
|
||||
|
||||
Returns the domain and removes no more than one ‘www.’ from the beginning of it, if present.
|
||||
Returns the domain without leading `www.` if present.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
domainWithoutWWW(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if the input string can be parsed as a URL (without leading `www.`), otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT domainWithoutWWW('http://paul@www.example.com:80/');
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─domainWithoutWWW('http://paul@www.example.com:80/')─┐
|
||||
│ example.com │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### domainWithoutWWWRFC
|
||||
|
||||
Returns the domain without leading `www.` if present. Similar to [domainWithoutWWW](#domainwithoutwww) but conforms to RFC 3986.
|
||||
|
||||
**Syntax**
|
||||
|
||||
```sql
|
||||
domainWithoutWWWRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if the input string can be parsed as a URL (without leading `www.`), otherwise an empty string. [String](../data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
domainWithoutWWW('http://user:password@www.example.com:8080/path?query=value#fragment'),
|
||||
domainWithoutWWWRFC('http://user:password@www.example.com:8080/path?query=value#fragment');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─domainWithoutWWW('http://user:password@www.example.com:8080/path?query=value#fragment')─┬─domainWithoutWWWRFC('http://user:password@www.example.com:8080/path?query=value#fragment')─┐
|
||||
│ │ example.com │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### topLevelDomain
|
||||
|
||||
@ -76,63 +198,314 @@ topLevelDomain(url)
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../data-types/string.md).
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
The URL can be specified with or without a scheme. Examples:
|
||||
:::note
|
||||
The URL can be specified with or without a protocol. Examples:
|
||||
|
||||
``` text
|
||||
svn+ssh://some.svn-hosting.com:80/repo/trunk
|
||||
some.svn-hosting.com:80/repo/trunk
|
||||
https://clickhouse.com/time/
|
||||
```
|
||||
:::
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../data-types/string.md).
|
||||
- Domain name if the input string can be parsed as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─topLevelDomain('svn+ssh://www.some.svn-hosting.com:80/repo/trunk')─┐
|
||||
│ com │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### topLevelDomainRFC
|
||||
|
||||
Extracts the top-level domain from a URL.
|
||||
Similar to [topLevelDomain](#topleveldomain), but conforms to RFC 3986.
|
||||
|
||||
``` sql
|
||||
topLevelDomainRFC(url)
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `url` — URL. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
:::note
|
||||
The URL can be specified with or without a protocol. Examples:
|
||||
|
||||
``` text
|
||||
svn+ssh://some.svn-hosting.com:80/repo/trunk
|
||||
some.svn-hosting.com:80/repo/trunk
|
||||
https://clickhouse.com/time/
|
||||
```
|
||||
:::
|
||||
|
||||
**Returned values**
|
||||
|
||||
- Domain name if the input string can be parsed as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
|
||||
|
||||
**Example**
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT topLevelDomain('http://foo:foo%41bar@foo.com'), topLevelDomainRFC('http://foo:foo%41bar@foo.com');
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
``` text
|
||||
┌─topLevelDomain('http://foo:foo%41bar@foo.com')─┬─topLevelDomainRFC('http://foo:foo%41bar@foo.com')─┐
|
||||
│ │ com │
|
||||
└────────────────────────────────────────────────┴───────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### firstSignificantSubdomain

Returns the “first significant subdomain”.
The first significant subdomain is a second-level domain for `com`, `net`, `org`, or `co`, otherwise it is a third-level domain.
For example, `firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse'` and `firstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse'`.
The list of "insignificant" second-level domains and other implementation details may change in the future.

**Syntax**

```sql
firstSignificantSubdomain(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- The first significant subdomain. [String](../data-types/string.md).

**Example**

Query:

```sql
SELECT firstSignificantSubdomain('http://www.example.com/a/b/c?a=b')
```

Result:

```response
┌─firstSignificantSubdomain('http://www.example.com/a/b/c?a=b')─┐
│ example                                                       │
└───────────────────────────────────────────────────────────────┘
```
### firstSignificantSubdomainRFC

Returns the “first significant subdomain”.
The first significant subdomain is a second-level domain for `com`, `net`, `org`, or `co`, otherwise it is a third-level domain.
For example, `firstSignificantSubdomain('https://news.clickhouse.com/') = 'clickhouse'` and `firstSignificantSubdomain('https://news.clickhouse.com.tr/') = 'clickhouse'`.
The list of "insignificant" second-level domains and other implementation details may change in the future.
Similar to [firstSignificantSubdomain](#firstsignificantsubdomain) but conforms to RFC 1034.

**Syntax**

```sql
firstSignificantSubdomainRFC(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- The first significant subdomain. [String](../data-types/string.md).

**Example**

Query:

```sql
SELECT
    firstSignificantSubdomain('http://user:password@example.com:8080/path?query=value#fragment'),
    firstSignificantSubdomainRFC('http://user:password@example.com:8080/path?query=value#fragment');
```

Result:

```response
┌─firstSignificantSubdomain('http://user:password@example.com:8080/path?query=value#fragment')─┬─firstSignificantSubdomainRFC('http://user:password@example.com:8080/path?query=value#fragment')─┐
│                                                                                               │ example                                                                                          │
└───────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────┘
```
### cutToFirstSignificantSubdomain

Returns the part of the domain that includes top-level subdomains up to the [“first significant subdomain”](#firstsignificantsubdomain).

**Syntax**

```sql
cutToFirstSignificantSubdomain(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. [String](../data-types/string.md).

**Example**

Query:

```sql
SELECT
    cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/'),
    cutToFirstSignificantSubdomain('www.tr'),
    cutToFirstSignificantSubdomain('tr');
```

Result:

```response
┌─cutToFirstSignificantSubdomain('https://news.clickhouse.com.tr/')─┬─cutToFirstSignificantSubdomain('www.tr')─┬─cutToFirstSignificantSubdomain('tr')─┐
│ clickhouse.com.tr                                                  │ tr                                       │                                      │
└────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┴──────────────────────────────────────┘
```
### cutToFirstSignificantSubdomainRFC

Returns the part of the domain that includes top-level subdomains up to the [“first significant subdomain”](#firstsignificantsubdomain).
Similar to [cutToFirstSignificantSubdomain](#cuttofirstsignificantsubdomain) but conforms to RFC 3986.

**Syntax**

```sql
cutToFirstSignificantSubdomainRFC(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain if possible, otherwise returns an empty string. [String](../data-types/string.md).

**Example**

Query:

```sql
SELECT
    cutToFirstSignificantSubdomain('http://user:password@example.com:8080'),
    cutToFirstSignificantSubdomainRFC('http://user:password@example.com:8080');
```

Result:

```response
┌─cutToFirstSignificantSubdomain('http://user:password@example.com:8080')─┬─cutToFirstSignificantSubdomainRFC('http://user:password@example.com:8080')─┐
│                                                                          │ example.com                                                                 │
└──────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘
```
### cutToFirstSignificantSubdomainWithWWW

Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping `www`.

**Syntax**

```sql
cutToFirstSignificantSubdomainWithWWW(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain (with `www`) if possible, otherwise returns an empty string. [String](../data-types/string.md).

**Example**

Query:

```sql
SELECT
    cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/'),
    cutToFirstSignificantSubdomainWithWWW('www.tr'),
    cutToFirstSignificantSubdomainWithWWW('tr');
```

Result:

```response
┌─cutToFirstSignificantSubdomainWithWWW('https://news.clickhouse.com.tr/')─┬─cutToFirstSignificantSubdomainWithWWW('www.tr')─┬─cutToFirstSignificantSubdomainWithWWW('tr')─┐
│ clickhouse.com.tr                                                         │ www.tr                                          │                                             │
└───────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────┘
```
### cutToFirstSignificantSubdomainWithWWWRFC

Returns the part of the domain that includes top-level subdomains up to the "first significant subdomain", without stripping `www`.
Similar to [cutToFirstSignificantSubdomainWithWWW](#cuttofirstsignificantsubdomainwithwww) but conforms to RFC 3986.

**Syntax**

```sql
cutToFirstSignificantSubdomainWithWWWRFC(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain (with "www") if possible, otherwise returns an empty string. [String](../data-types/string.md).

**Example**

Query:

```sql
SELECT
    cutToFirstSignificantSubdomainWithWWW('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy'),
    cutToFirstSignificantSubdomainWithWWWRFC('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy');
```

Result:

```response
┌─cutToFirstSignificantSubdomainWithWWW('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy')─┬─cutToFirstSignificantSubdomainWithWWWRFC('http:%2F%2Fwwwww.nova@mail.ru/economicheskiy')─┐
│                                                                                        │ mail.ru                                                                                   │
└────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
```
### cutToFirstSignificantSubdomainCustom

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.
Accepts a custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
This function can be useful if you need a fresh TLD list or if you have a custom list.

**Configuration example**

```xml
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
```

**Syntax**

``` sql
cutToFirstSignificantSubdomainCustom(url, tld)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).

**Example**
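The example that belongs here is elided in this excerpt. As a sketch (not the original example), assuming a TLD list has been registered on the server under the name `public_suffix_list`, a call might look like this:

```sql
-- 'public_suffix_list' is a hypothetical list name; it must match a list
-- configured on the server (see the configuration example above).
SELECT cutToFirstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list');
```

The returned value depends on which suffixes the configured list contains.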
**See Also**

- [firstSignificantSubdomain](#firstsignificantsubdomain).
### cutToFirstSignificantSubdomainCustomRFC

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain.
Accepts a custom [TLD list](https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) name.
This function can be useful if you need a fresh TLD list or if you have a custom list.
Similar to [cutToFirstSignificantSubdomainCustom](#cuttofirstsignificantsubdomaincustom) but conforms to RFC 3986.

**Syntax**

``` sql
cutToFirstSignificantSubdomainCustomRFC(url, tld)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).

**See Also**

- [firstSignificantSubdomain](#firstsignificantsubdomain).
### cutToFirstSignificantSubdomainCustomWithWWW

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
Accepts a custom TLD list name.
It can be useful if you need a fresh TLD list or if you have a custom list.

**Configuration example**

```xml
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
```

**Syntax**

```sql
cutToFirstSignificantSubdomainCustomWithWWW(url, tld)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. [String](../../sql-reference/data-types/string.md).

**See Also**

- [firstSignificantSubdomain](#firstsignificantsubdomain).
### cutToFirstSignificantSubdomainCustomWithWWWRFC

Returns the part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
Accepts a custom TLD list name.
It can be useful if you need a fresh TLD list or if you have a custom list.
Similar to [cutToFirstSignificantSubdomainCustomWithWWW](#cuttofirstsignificantsubdomaincustomwithwww) but conforms to RFC 3986.

**Syntax**

```sql
cutToFirstSignificantSubdomainCustomWithWWWRFC(url, tld)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

**Returned value**

- Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. [String](../../sql-reference/data-types/string.md).

**See Also**

- [firstSignificantSubdomain](#firstsignificantsubdomain).
### firstSignificantSubdomainCustom

Returns the first significant subdomain.
Accepts a custom TLD list name.
It can be useful if you need a fresh TLD list or if you have a custom list.

**Syntax**

```sql
firstSignificantSubdomainCustom(url, tld)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

**Returned value**

- First significant subdomain. [String](../../sql-reference/data-types/string.md).

**Example**
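The original example is elided in this excerpt. A hedged illustration, assuming a list named `public_suffix_list` is configured on the server:

```sql
-- 'public_suffix_list' is a hypothetical list name configured on the server.
SELECT firstSignificantSubdomainCustom('https://news.clickhouse.com.tr/', 'public_suffix_list');
```

With a list that treats `com.tr` as an insignificant suffix this should return `clickhouse`; the exact result depends on the configured list.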
**See Also**

- [firstSignificantSubdomain](#firstsignificantsubdomain).
### firstSignificantSubdomainCustomRFC

Returns the first significant subdomain.
Accepts a custom TLD list name.
It can be useful if you need a fresh TLD list or if you have a custom list.
Similar to [firstSignificantSubdomainCustom](#firstsignificantsubdomaincustom) but conforms to RFC 3986.

**Syntax**

```sql
firstSignificantSubdomainCustomRFC(url, tld)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `tld` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

**Returned value**

- First significant subdomain. [String](../../sql-reference/data-types/string.md).

**See Also**

- [firstSignificantSubdomain](#firstsignificantsubdomain).
### port

Returns the port or `default_port` if the URL contains no port or cannot be parsed.

**Syntax**

```sql
port(url [, default_port = 0])
```

**Arguments**

- `url` — URL. [String](../data-types/string.md).
- `default_port` — The default port number to be returned. [UInt16](../data-types/int-uint.md).

**Returned value**

- Port or the default port if there is no port in the URL or in case of a validation error. [UInt16](../data-types/int-uint.md).

**Example**

Query:

```sql
SELECT port('http://paul@www.example.com:80/');
```

Result:

```response
┌─port('http://paul@www.example.com:80/')─┐
│                                       80 │
└──────────────────────────────────────────┘
```
### portRFC

Returns the port or `default_port` if the URL contains no port or cannot be parsed.
Similar to [port](#port), but RFC 3986 conformant.

**Syntax**

```sql
portRFC(url [, default_port = 0])
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `default_port` — The default port number to be returned. [UInt16](../data-types/int-uint.md).

**Returned value**

- Port or the default port if there is no port in the URL or in case of a validation error. [UInt16](../data-types/int-uint.md).

**Example**

Query:

```sql
SELECT
    port('http://user:password@example.com:8080'),
    portRFC('http://user:password@example.com:8080');
```

Result:

```response
┌─port('http://user:password@example.com:8080')─┬─portRFC('http://user:password@example.com:8080')─┐
│                                              0 │                                              8080 │
└────────────────────────────────────────────────┴───────────────────────────────────────────────────┘
```
### path

Returns the path without query string.

Example: `/top/news.html`.

### pathFull

The same as above, but including query string and fragment.

Example: `/top/news.html?page=2#comments`.

### queryString

Returns the query string without the initial question mark, `#` and everything after `#`.

Example: `page=1&lr=213`.

### fragment

Returns the fragment identifier without the initial hash symbol.

### queryStringAndFragment

Returns the query string and fragment identifier.

Example: `page=1#29390`.
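None of the five functions above is accompanied by a query in this excerpt, so here is a small illustrative query (not taken from the original documentation) that exercises them on one sample URL:

```sql
SELECT
    path('http://example.com/top/news.html?page=2#comments')                   AS p,
    pathFull('http://example.com/top/news.html?page=2#comments')               AS pf,
    queryString('http://example.com/top/news.html?page=2#comments')            AS qs,
    fragment('http://example.com/top/news.html?page=2#comments')               AS f,
    queryStringAndFragment('http://example.com/top/news.html?page=2#comments') AS qsf;
```

Based on the descriptions above, this should return `/top/news.html`, `/top/news.html?page=2#comments`, `page=2`, `comments`, and `page=2#comments`, respectively.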
### extractURLParameter(url, name)

Returns the value of the `name` parameter in the URL, if present, otherwise an empty string is returned.
If there are multiple parameters with this name, the first occurrence is returned.
The function assumes that the parameter name is encoded in `url` in the same way as in the `name` argument.

### extractURLParameters(url)

Returns an array of `name=value` strings corresponding to the URL parameters.
The values are not decoded.

### extractURLParameterNames(url)

Returns an array of name strings corresponding to the names of URL parameters.
The values are not decoded.
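A small illustrative query (not from the original documentation) for the three functions above:

```sql
SELECT
    extractURLParameter('http://example.com/?page=1&lr=213', 'lr')  AS one_value,
    extractURLParameters('http://example.com/?page=1&lr=213')       AS name_value_pairs,
    extractURLParameterNames('http://example.com/?page=1&lr=213')   AS names;
```

Given the descriptions above, this should return `213`, `['page=1','lr=213']`, and `['page','lr']`.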
### URLHierarchy(url)

Returns an array containing the URL, truncated at the end by the symbols /,? in the path and query-string.
Consecutive separator characters are counted as one.
The cut is made in the position after all the consecutive separator characters.

### URLPathHierarchy(url)

The same as above, but without the protocol and host in the result. The / element (root) is not included.
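The URLPathHierarchy example is cut off in this excerpt; the following hedged query illustrates both functions (expected shapes are inferred from the descriptions above, not copied from the official example):

```sql
SELECT
    URLHierarchy('https://example.com/browse/CONV-6788')     AS with_host,
    URLPathHierarchy('https://example.com/browse/CONV-6788') AS path_only;
```

`URLHierarchy` should return prefixes that keep the protocol and host (such as `https://example.com/` and `https://example.com/browse/`), while `URLPathHierarchy` should return only path prefixes (such as `/browse/` and `/browse/CONV-6788`).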
### encodeURLComponent(url)

Returns the encoded URL.

### decodeURLComponent(url)

Returns the decoded URL.

### encodeURLFormComponent(url)

Returns the encoded URL. Follows rfc-1866, space(` `) is encoded as plus(`+`).

### decodeURLFormComponent(url)

Returns the decoded URL. Follows rfc-1866, plain plus(`+`) is decoded as space(` `).
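The original examples for the four functions above are elided here; as a rough sketch (round-tripping avoids depending on the exact encoded form):

```sql
SELECT
    encodeURLComponent('http://127.0.0.1:8123/?query=SELECT 1;')                     AS encoded,
    decodeURLComponent(encodeURLComponent('http://127.0.0.1:8123/?query=SELECT 1;')) AS round_trip,
    encodeURLFormComponent('SELECT 1 2+3')                                           AS form_encoded,
    decodeURLFormComponent(encodeURLFormComponent('SELECT 1 2+3'))                   AS form_round_trip;
```

`round_trip` and `form_round_trip` should reproduce the original strings; `encoded` percent-encodes the reserved characters, and `form_encoded` additionally turns spaces into `+`.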
### netloc

Extracts network locality (`username:password@host:port`) from a URL.

**Syntax**

``` sql
netloc(url)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).

**Returned value**

- `username:password@host:port`. [String](../../sql-reference/data-types/string.md).
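The netloc example is elided in this excerpt; a hedged illustration:

```sql
SELECT netloc('http://paul@www.example.com:80/');
```

This should return `paul@www.example.com:80`.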
## Functions that remove part of a URL

If the URL does not have anything similar, the URL remains unchanged.

### cutWWW

Removes leading `www.` (if present) from the URL’s domain.

### cutQueryString

Removes the query string, including the question mark.

### cutFragment

Removes the fragment identifier, including the number sign.

### cutQueryStringAndFragment

Removes the query string and fragment identifier, including the question mark and number sign.
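A combined illustration of the four cut functions above (not from the original documentation):

```sql
SELECT
    cutWWW('http://www.example.com/path')                            AS no_www,
    cutQueryString('http://example.com/path?page=1')                 AS no_query_string,
    cutFragment('http://example.com/path#comments')                  AS no_fragment,
    cutQueryStringAndFragment('http://example.com/path?page=1#top')  AS no_query_string_or_fragment;
```

Each column should yield `http://example.com/path`, with the `www.`, the query string, the fragment, or both removed, respectively.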
### cutURLParameter(url, name)

Removes the `name` parameter from a URL, if present.
This function does not encode or decode characters in parameter names, e.g. `Client ID` and `Client%20ID` are treated as different parameter names.

**Syntax**

``` sql
cutURLParameter(url, name)
```

**Arguments**

- `url` — URL. [String](../../sql-reference/data-types/string.md).
- `name` — name of URL parameter. [String](../../sql-reference/data-types/string.md) or [Array](../../sql-reference/data-types/array.md) of Strings.

**Returned value**

- URL with the `name` URL parameter removed. [String](../data-types/string.md).

**Example**
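The example body is elided in this excerpt; the following hedged queries illustrate both the single-name and the array form:

```sql
SELECT
    cutURLParameter('http://example.com/?a=b&c=d#e=f', 'a')         AS without_a,
    cutURLParameter('http://example.com/?a=b&c=d#e=f', ['c', 'e'])  AS without_c_and_e;
```

Each matching `name=value` pair should be removed while the rest of the URL is left intact.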
@ -126,149 +126,6 @@ SELECT generateUUIDv7(1), generateUUIDv7(2);
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## generateUUIDv7ThreadMonotonic
|
||||
|
||||
Generates a [UUID](../data-types/uuid.md) of [version 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04).
|
||||
|
||||
The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit) to distinguish UUIDs within a millisecond (including a variant field "2", 2 bit), and a random field (32 bits).
|
||||
For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes.
|
||||
In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value.
|
||||
|
||||
This function behaves like [generateUUIDv7](#generateUUIDv7) but gives no guarantee on counter monotony across different simultaneous requests.
|
||||
Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate UUIDs.
|
||||
|
||||
```
|
||||
0 1 2 3
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms | ver | counter_high_bits |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
|var| counter_low_bits |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| rand_b |
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
```
|
||||
|
||||
:::note
|
||||
As of April 2024, version 7 UUIDs are in draft status and their layout may change in future.
|
||||
:::
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
generateUUIDv7ThreadMonotonic([expr])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `expr` — An arbitrary [expression](../syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned UUID. Optional.
|
||||
|
||||
**Returned value**
|
||||
|
||||
A value of type UUIDv7.
|
||||
|
||||
**Usage example**
|
||||
|
||||
First, create a table with a column of type UUID, then insert a generated UUIDv7 into the table.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE tab (uuid UUID) ENGINE = Memory;
|
||||
|
||||
INSERT INTO tab SELECT generateUUIDv7ThreadMonotonic();
|
||||
|
||||
SELECT * FROM tab;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─────────────────────────────────uuid─┐
|
||||
│ 018f05e2-e3b2-70cb-b8be-64b09b626d32 │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Example with multiple UUIDs generated per row**
|
||||
|
||||
```sql
|
||||
SELECT generateUUIDv7ThreadMonotonic(1), generateUUIDv7ThreadMonotonic(2);
|
||||
|
||||
┌─generateUUIDv7ThreadMonotonic(1)─────┬─generateUUIDv7ThreadMonotonic(2)─────┐
|
||||
│ 018f05e1-14ee-7bc5-9906-207153b400b1 │ 018f05e1-14ee-7bc5-9906-2072b8e96758 │
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## generateUUIDv7NonMonotonic
|
||||
|
||||
Generates a [UUID](../data-types/uuid.md) of [version 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04).
|
||||
|
||||
The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits) and a random field (76 bits, including a 2-bit variant field "2").
|
||||
|
||||
This function is the fastest `generateUUIDv7*` function but it gives no monotonicity guarantees within a timestamp.
|
||||
|
||||
```
|
||||
0 1 2 3
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms | ver | rand_a |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
|var| rand_b |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| rand_b |
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
```
|
||||
|
||||
:::note
|
||||
As of April 2024, version 7 UUIDs are in draft status and their layout may change in future.
|
||||
:::
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
generateUUIDv7NonMonotonic([expr])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `expr` — An arbitrary [expression](../syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned UUID. Optional.
|
||||
|
||||
**Returned value**
|
||||
|
||||
A value of type UUIDv7.
|
||||
|
||||
**Example**
|
||||
|
||||
First, create a table with a column of type UUID, then insert a generated UUIDv7 into the table.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE tab (uuid UUID) ENGINE = Memory;
|
||||
|
||||
INSERT INTO tab SELECT generateUUIDv7NonMonotonic();
|
||||
|
||||
SELECT * FROM tab;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌─────────────────────────────────uuid─┐
|
||||
│ 018f05af-f4a8-778f-beee-1bedbc95c93b │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Example with multiple UUIDs generated per row**
|
||||
|
||||
```sql
|
||||
SELECT generateUUIDv7NonMonotonic(1), generateUUIDv7NonMonotonic(2);
|
||||
|
||||
┌─generateUUIDv7NonMonotonic(1) ───────┬─generateUUIDv7(2)NonMonotonic────────┐
|
||||
│ 018f05b1-8c2e-7567-a988-48d09606ae8c │ 018f05b1-8c2e-7946-895b-fcd7635da9a0 │
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## empty
|
||||
|
||||
Checks whether the input UUID is empty.
|
||||
@ -746,71 +603,6 @@ SELECT generateSnowflakeID(1), generateSnowflakeID(2);
|
||||
└────────────────────────┴────────────────────────┘
|
||||
```
|
||||
|
||||
## generateSnowflakeIDThreadMonotonic
|
||||
|
||||
Generates a [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID).
|
||||
|
||||
The generated Snowflake ID contains the current Unix timestamp in milliseconds 41 (+ 1 top zero bit) bits, followed by machine id (10 bits), a counter (12 bits) to distinguish IDs within a millisecond.
|
||||
For any given timestamp (unix_ts_ms), the counter starts at 0 and is incremented by 1 for each new Snowflake ID until the timestamp changes.
|
||||
In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to 0.
|
||||
|
||||
This function behaves like `generateSnowflakeID` but gives no guarantee on counter monotony across different simultaneous requests.
|
||||
Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate Snowflake IDs.
|
||||
|
||||
```
|
||||
0 1 2 3
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
|0| timestamp |
|
||||
├─┼ ┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| | machine_id | machine_seq_num |
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
```
|
||||
|
||||
**Syntax**
|
||||
|
||||
``` sql
|
||||
generateSnowflakeIDThreadMonotonic([expr])
|
||||
```
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `expr` — An arbitrary [expression](../../sql-reference/syntax.md#syntax-expressions) used to bypass [common subexpression elimination](../../sql-reference/functions/index.md#common-subexpression-elimination) if the function is called multiple times in a query. The value of the expression has no effect on the returned Snowflake ID. Optional.
|
||||
|
||||
**Returned value**
|
||||
|
||||
A value of type UInt64.
|
||||
|
||||
**Example**
|
||||
|
||||
First, create a table with a column of type UInt64, then insert a generated Snowflake ID into the table.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE tab (id UInt64) ENGINE = Memory;
|
||||
|
||||
INSERT INTO tab SELECT generateSnowflakeIDThreadMonotonic();
|
||||
|
||||
SELECT * FROM tab;
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```response
|
||||
┌──────────────────id─┐
|
||||
│ 7199082832006627328 │
|
||||
└─────────────────────┘
|
||||
```
|
||||
|
||||
**Example with multiple Snowflake IDs generated per row**
|
||||
|
||||
```sql
|
||||
SELECT generateSnowflakeIDThreadMonotonic(1), generateSnowflakeIDThreadMonotonic(2);
|
||||
|
||||
┌─generateSnowflakeIDThreadMonotonic(1)─┬─generateSnowflakeIDThreadMonotonic(2)─┐
|
||||
│ 7199082940311945216 │ 7199082940316139520 │
|
||||
└───────────────────────────────────────┴───────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## snowflakeToDateTime
|
||||
|
||||
Extracts the timestamp component of a [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID) in [DateTime](../data-types/datetime.md) format.
|
||||
|
@ -79,8 +79,6 @@ ORDER BY ts, event_type;
|
||||
│ 2020-01-03 00:00:00 │ imp │ │ 2 │ 0 │
|
||||
└─────────────────────┴────────────┴─────────┴────────────┴──────┘
|
||||
|
||||
SET allow_experimental_alter_materialized_view_structure=1;
|
||||
|
||||
ALTER TABLE mv MODIFY QUERY
|
||||
SELECT toStartOfDay(ts) ts, event_type, browser,
|
||||
count() events_cnt,
|
||||
@ -178,7 +176,6 @@ SELECT * FROM mv;
|
||||
└───┘
|
||||
```
|
||||
```sql
|
||||
set allow_experimental_alter_materialized_view_structure=1;
|
||||
ALTER TABLE mv MODIFY QUERY SELECT a * 2 as a FROM src_table;
|
||||
INSERT INTO src_table (a) VALUES (3), (4);
|
||||
SELECT * FROM mv;
|
||||
|
@ -206,6 +206,32 @@ Enables background data distribution when inserting data into distributed tables
|
||||
SYSTEM START DISTRIBUTED SENDS [db.]<distributed_table_name> [ON CLUSTER cluster_name]
|
||||
```
|
||||
|
||||
### STOP LISTEN

Closes the socket and gracefully terminates the existing connections to the server on the specified port with the specified protocol.

However, if the corresponding protocol settings were not specified in the clickhouse-server configuration, this command will have no effect.

```sql
SYSTEM STOP LISTEN [ON CLUSTER cluster_name] [QUERIES ALL | QUERIES DEFAULT | QUERIES CUSTOM | TCP | TCP WITH PROXY | TCP SECURE | HTTP | HTTPS | MYSQL | GRPC | POSTGRESQL | PROMETHEUS | CUSTOM 'protocol']
```

- If `CUSTOM 'protocol'` modifier is specified, the custom protocol with the specified name defined in the protocols section of the server configuration will be stopped.
- If `QUERIES ALL [EXCEPT .. [,..]]` modifier is specified, all protocols are stopped, unless specified with `EXCEPT` clause.
- If `QUERIES DEFAULT [EXCEPT .. [,..]]` modifier is specified, all default protocols are stopped, unless specified with `EXCEPT` clause.
- If `QUERIES CUSTOM [EXCEPT .. [,..]]` modifier is specified, all custom protocols are stopped, unless specified with `EXCEPT` clause.

### START LISTEN

Allows new connections to be established on the specified protocols.

However, if the server on the specified port and protocol was not stopped using the SYSTEM STOP LISTEN command, this command will have no effect.

```sql
SYSTEM START LISTEN [ON CLUSTER cluster_name] [QUERIES ALL | QUERIES DEFAULT | QUERIES CUSTOM | TCP | TCP WITH PROXY | TCP SECURE | HTTP | HTTPS | MYSQL | GRPC | POSTGRESQL | PROMETHEUS | CUSTOM 'protocol']
```
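For instance (a hypothetical maintenance scenario, not an example from the original page), the HTTP endpoint can be paused and later re-enabled like this:

```sql
-- Stop accepting new HTTP connections; the other protocols keep working.
SYSTEM STOP LISTEN HTTP;

-- Re-open the HTTP port once maintenance is finished.
SYSTEM START LISTEN HTTP;
```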
## Managing MergeTree Tables
|
||||
|
||||
ClickHouse can manage background processes in [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) tables.
|
||||
@ -463,77 +489,7 @@ Will do sync syscall.
|
||||
SYSTEM SYNC FILE CACHE [ON CLUSTER cluster_name]
|
||||
```
### UNLOAD PRIMARY KEY

Unload the primary keys for the given table or for all tables.

```sql
SYSTEM UNLOAD PRIMARY KEY [db.]name
SYSTEM UNLOAD PRIMARY KEY
```
## Managing Refreshable Materialized Views {#refreshable-materialized-views}

Commands to control background tasks performed by [Refreshable Materialized Views](../../sql-reference/statements/create/view.md#refreshable-materialized-view)

Keep an eye on [`system.view_refreshes`](../../operations/system-tables/view_refreshes.md) while using them.

### REFRESH VIEW

Trigger an immediate out-of-schedule refresh of a given view.

```sql
SYSTEM REFRESH VIEW [db.]name
```

### STOP VIEW, STOP VIEWS

Disable periodic refreshing of the given view or all refreshable views. If a refresh is in progress, cancel it too.

```sql
SYSTEM STOP VIEW [db.]name
```
```sql
SYSTEM STOP VIEWS
```

### START VIEW, START VIEWS

Enable periodic refreshing for the given view or all refreshable views. No immediate refresh is triggered.

```sql
SYSTEM START VIEW [db.]name
```
```sql
SYSTEM START VIEWS
```

### CANCEL VIEW

If there's a refresh in progress for the given view, interrupt and cancel it. Otherwise do nothing.

```sql
SYSTEM CANCEL VIEW [db.]name
```
|
@ -38,26 +38,6 @@ sudo service clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
|
||||
```
|
||||
|
||||
<details markdown="1">
|
||||
|
||||
<summary>Устаревший способ установки deb-пакетов</summary>
|
||||
|
||||
``` bash
|
||||
sudo apt-get install apt-transport-https ca-certificates dirmngr
|
||||
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
|
||||
|
||||
echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
|
||||
/etc/apt/sources.list.d/clickhouse.list
|
||||
sudo apt-get update
|
||||
|
||||
sudo apt-get install -y clickhouse-server clickhouse-client
|
||||
|
||||
sudo service clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you set up a password.
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
Чтобы использовать различные [версии ClickHouse](../faq/operations/production.md) в зависимости от ваших потребностей, вы можете заменить `stable` на `lts` или `testing`.
|
||||
|
||||
Также вы можете вручную скачать и установить пакеты из [репозитория](https://packages.clickhouse.com/deb/pool/stable).
|
||||
@ -110,22 +90,6 @@ sudo systemctl status clickhouse-server
|
||||
clickhouse-client # илм "clickhouse-client --password" если установлен пароль
|
||||
```
|
||||
|
||||
<details markdown="1">
|
||||
|
||||
<summary>Устаревший способ установки rpm-пакетов</summary>
|
||||
|
||||
``` bash
|
||||
sudo yum install yum-utils
|
||||
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
|
||||
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
|
||||
sudo yum install clickhouse-server clickhouse-client
|
||||
|
||||
sudo /etc/init.d/clickhouse-server start
|
||||
clickhouse-client # or "clickhouse-client --password" if you set up a password.
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
Для использования наиболее свежих версий нужно заменить `stable` на `testing` (рекомендуется для тестовых окружений). Также иногда доступен `prestable`.
|
||||
|
||||
Для непосредственной установки пакетов необходимо выполнить следующие команды:
|
||||
@ -178,33 +142,6 @@ tar -xzvf "clickhouse-client-$LATEST_VERSION-${ARCH}.tgz" \
|
||||
sudo "clickhouse-client-$LATEST_VERSION/install/doinst.sh"
|
||||
```
|
||||
|
||||
<details markdown="1">
|
||||
|
||||
<summary>Устаревший способ установки из архивов tgz</summary>
|
||||
|
||||
``` bash
|
||||
export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \
|
||||
grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
|
||||
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
|
||||
|
||||
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
|
||||
|
||||
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
|
||||
|
||||
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
|
||||
sudo /etc/init.d/clickhouse-server start
|
||||
|
||||
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
|
||||
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
|
||||
```
|
||||
</details>
|
||||
|
||||
Для продуктивных окружений рекомендуется использовать последнюю `stable`-версию. Её номер также можно найти на github с на вкладке https://github.com/ClickHouse/ClickHouse/tags c постфиксом `-stable`.
|
||||
|
||||
### Из Docker образа {#from-docker-image}
|
||||
|
@ -112,113 +112,6 @@ SELECT generateUUIDv7(1), generateUUIDv7(2)
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## generateUUIDv7ThreadMonotonic {#uuidv7threadmonotonic-function-generate}
|
||||
|
||||
Генерирует идентификатор [UUID версии 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). Генерируемый UUID состоит из 48-битной временной метки (Unix time в миллисекундах), маркеров версии 7 и варианта 2, монотонно возрастающего счётчика для данной временной метки и случайных данных в указанной ниже последовательности. Для каждой новой временной метки счётчик стартует с нового случайного значения, а для следующих UUIDv7 он увеличивается на единицу. В случае переполнения счётчика временная метка принудительно увеличивается на 1, и счётчик снова стартует со случайного значения. Данная функция является ускоренным аналогом функции `generateUUIDv7` за счёт потери гарантии монотонности счётчика при одной и той же метке времени между одновременно исполняемыми разными запросами. Монотонность счётчика гарантируется только в пределах одного треда, исполняющего данную функцию для генерации нескольких UUID.
|
||||
|
||||
**Синтаксис**
|
||||
|
||||
``` sql
|
||||
generateUUIDv7ThreadMonotonic([x])
|
||||
```
|
||||
|
||||
**Аргументы**
|
||||
|
||||
- `x` — [выражение](../syntax.md#syntax-expressions), возвращающее значение одного из [поддерживаемых типов данных](../data-types/index.md#data_types). Значение используется, чтобы избежать [склейки одинаковых выражений](index.md#common-subexpression-elimination), если функция вызывается несколько раз в одном запросе. Необязательный параметр.
|
||||
|
||||
**Возвращаемое значение**
|
||||
|
||||
Значение типа [UUID](../../sql-reference/functions/uuid-functions.md).
|
||||
|
||||
**Пример использования**
|
||||
|
||||
Этот пример демонстрирует, как создать таблицу с UUID-колонкой и добавить в нее сгенерированный UUIDv7.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog
|
||||
|
||||
INSERT INTO t_uuid SELECT generateUUIDv7ThreadMonotonic()
|
||||
|
||||
SELECT * FROM t_uuid
|
||||
```
|
||||
|
||||
``` text
|
||||
┌────────────────────────────────────x─┐
|
||||
│ 018f05e2-e3b2-70cb-b8be-64b09b626d32 │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Пример использования, для генерации нескольких значений в одной строке**
|
||||
|
||||
```sql
|
||||
SELECT generateUUIDv7ThreadMonotonic(1), generateUUIDv7ThreadMonotonic(7)
|
||||
|
||||
┌─generateUUIDv7ThreadMonotonic(1)─────┬─generateUUIDv7ThreadMonotonic(2)─────┐
|
||||
│ 018f05e1-14ee-7bc5-9906-207153b400b1 │ 018f05e1-14ee-7bc5-9906-2072b8e96758 │
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## generateUUIDv7NonMonotonic {#uuidv7nonmonotonic-function-generate}
|
||||
|
||||
Генерирует идентификатор [UUID версии 7](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04). Генерируемый UUID состоит из 48-битной временной метки (Unix time в миллисекундах), маркеров версии 7 и варианта 2, и случайных данных в следующей последовательности:
|
||||
```
|
||||
0 1 2 3
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| unix_ts_ms | ver | rand_a |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
|var| rand_b |
|
||||
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|
||||
| rand_b |
|
||||
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
|
||||
```
|
||||
::::note
|
||||
На апрель 2024 года UUIDv7 находится в статусе черновика и его раскладка по битам может в итоге измениться.
|
||||
::::
|
||||
|
||||
**Синтаксис**
|
||||
|
||||
``` sql
|
||||
generateUUIDv7NonMonotonic([x])
|
||||
```
|
||||
|
||||
**Аргументы**
|
||||
|
||||
- `x` — [выражение](../syntax.md#syntax-expressions), возвращающее значение одного из [поддерживаемых типов данных](../data-types/index.md#data_types). Значение используется, чтобы избежать [склейки одинаковых выражений](index.md#common-subexpression-elimination), если функция вызывается несколько раз в одном запросе. Необязательный параметр.
|
||||
|
||||
**Возвращаемое значение**
|
||||
|
||||
Значение типа [UUID](../../sql-reference/functions/uuid-functions.md).
|
||||
|
||||
**Пример использования**
|
||||
|
||||
Этот пример демонстрирует, как создать таблицу с UUID-колонкой и добавить в нее сгенерированный UUIDv7.
|
||||
|
||||
``` sql
|
||||
CREATE TABLE t_uuid (x UUID) ENGINE=TinyLog
|
||||
|
||||
INSERT INTO t_uuid SELECT generateUUIDv7NonMonotonic()
|
||||
|
||||
SELECT * FROM t_uuid
|
||||
```
|
||||
|
||||
``` text
|
||||
┌────────────────────────────────────x─┐
|
||||
│ 018f05af-f4a8-778f-beee-1bedbc95c93b │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Пример использования, для генерации нескольких значений в одной строке**
|
||||
|
||||
```sql
|
||||
SELECT generateUUIDv7NonMonotonic(1), generateUUIDv7NonMonotonic(7)
|
||||
┌─generateUUIDv7NonMonotonic(1)────────┬─generateUUIDv7NonMonotonic(2)────────┐
|
||||
│ 018f05b1-8c2e-7567-a988-48d09606ae8c │ 018f05b1-8c2e-7946-895b-fcd7635da9a0 │
|
||||
└──────────────────────────────────────┴──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## empty {#empty}
|
||||
|
||||
Проверяет, является ли входной UUID пустым.
|
||||
|
@ -38,26 +38,6 @@ sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
```

<details markdown="1">

<summary>Deprecated Method for installing deb-packages</summary>

``` bash
sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4

echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

sudo apt-get install -y clickhouse-server clickhouse-client

sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you set up a password.
```

</details>

If you want to use the most recent version, replace `stable` with `testing` (we recommend this only for test environments).

You can also download and install the packages manually from here: [download](https://packages.clickhouse.com/deb/pool/stable).

@ -95,22 +75,6 @@ sudo /etc/init.d/clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you set up a password.
```

<details markdown="1">

<summary>Deprecated Method for installing rpm-packages</summary>

``` bash
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client

sudo /etc/init.d/clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you set up a password.
```

</details>

If you want to use the most recent version, replace `stable` with `testing` (we recommend this only for test environments). `prestable` is also sometimes available.

Then run these commands to install the packages:

@ -164,34 +128,6 @@ tar -xzvf "clickhouse-client-$LATEST_VERSION-${ARCH}.tgz" \
sudo "clickhouse-client-$LATEST_VERSION/install/doinst.sh"
```

<details markdown="1">

<summary>Deprecated Method for installing tgz archives</summary>

``` bash
export LATEST_VERSION=$(curl -s https://repo.clickhouse.com/tgz/stable/ | \
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.com/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz

tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh

tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh

tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
sudo /etc/init.d/clickhouse-server start

tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
```

</details>

For production environments, the latest `stable` version is recommended. You can find it on the GitHub page https://github.com/ClickHouse/ClickHouse/tags, marked with the `-stable` suffix.

### `Docker` Installation Package {#from-docker-image}

@ -1,10 +1,11 @@
#!/bin/sh
### BEGIN INIT INFO
# Provides: clickhouse-server
# Required-Start: $network
# Required-Stop: $network
# Should-Start: $time
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Should-Start: $time $network
# Should-Stop: $network
# Short-Description: clickhouse-server daemon
### END INIT INFO
#
@ -792,9 +792,32 @@ try
        LOG_INFO(log, "Background threads finished in {} ms", watch.elapsedMilliseconds());
    });

    /// This object will periodically calculate some metrics.
    ServerAsynchronousMetrics async_metrics(
        global_context,
        server_settings.asynchronous_metrics_update_period_s,
        server_settings.asynchronous_heavy_metrics_update_period_s,
        [&]() -> std::vector<ProtocolServerMetrics>
        {
            std::vector<ProtocolServerMetrics> metrics;

            std::lock_guard lock(servers_lock);
            metrics.reserve(servers_to_start_before_tables.size() + servers.size());

            for (const auto & server : servers_to_start_before_tables)
                metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});

            for (const auto & server : servers)
                metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});
            return metrics;
        }
    );

    /// NOTE: global context should be destroyed *before* GlobalThreadPool::shutdown()
    /// Otherwise GlobalThreadPool::shutdown() will hang, since Context holds some threads.
    SCOPE_EXIT({
        async_metrics.stop();

        /** Ask to cancel background jobs all table engines,
          * and also query_log.
          * It is important to do early, not in destructor of Context, because
@ -921,27 +944,6 @@ try
        }
    }

    /// This object will periodically calculate some metrics.
    ServerAsynchronousMetrics async_metrics(
        global_context,
        server_settings.asynchronous_metrics_update_period_s,
        server_settings.asynchronous_heavy_metrics_update_period_s,
        [&]() -> std::vector<ProtocolServerMetrics>
        {
            std::vector<ProtocolServerMetrics> metrics;

            std::lock_guard lock(servers_lock);
            metrics.reserve(servers_to_start_before_tables.size() + servers.size());

            for (const auto & server : servers_to_start_before_tables)
                metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});

            for (const auto & server : servers)
                metrics.emplace_back(ProtocolServerMetrics{server.getPortName(), server.currentThreads()});
            return metrics;
        }
    );

    zkutil::validateZooKeeperConfig(config());
    bool has_zookeeper = zkutil::hasZooKeeperConfig(config());

@ -1748,6 +1750,11 @@ try

    }

    if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX))
    {
        PlacementInfo::PlacementInfo::instance().initialize(config());
    }

    {
        std::lock_guard lock(servers_lock);
        /// We should start interserver communications before (and more important shutdown after) tables.
@ -2096,11 +2103,6 @@ try
            load_metadata_tasks);
    }

    if (config().has(DB::PlacementInfo::PLACEMENT_CONFIG_PREFIX))
    {
        PlacementInfo::PlacementInfo::instance().initialize(config());
    }

    /// Do not keep tasks in server, they should be kept inside databases. Used here to make dependent tasks only.
    load_metadata_tasks.clear();
    load_metadata_tasks.shrink_to_fit();
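The hunk above constructs `ServerAsynchronousMetrics` earlier in `Server.cpp` and stops it from a `SCOPE_EXIT` block, so the collector is shut down before the state it observes is torn down. A minimal standalone sketch of that scope-guard ordering idea follows; it is illustrative only, and `ScopeGuard` here is a hypothetical stand-in for the `SCOPE_EXIT` macro, not ClickHouse code.

```cpp
#include <functional>
#include <iostream>
#include <utility>

/// Hypothetical stand-in for a SCOPE_EXIT-style guard: runs a callback when the
/// enclosing scope ends, in reverse order of declaration.
class ScopeGuard
{
public:
    explicit ScopeGuard(std::function<void()> fn) : callback(std::move(fn)) {}
    ~ScopeGuard() { callback(); }
private:
    std::function<void()> callback;
};

int main()
{
    std::cout << "construct metrics collector early\n";
    ScopeGuard stop_metrics([] { std::cout << "stop metrics collector\n"; });

    std::cout << "start servers and serve queries\n";
    /// When the scope ends, stop_metrics fires before anything declared above it
    /// is destroyed - the ordering the async_metrics.stop() call in the diff relies on.
}
```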
124
src/Analyzer/Resolve/ExpressionsStack.h
Normal file
@ -0,0 +1,124 @@
#pragma once

#include <IO/Operators.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <Analyzer/FunctionNode.h>

namespace DB
{

class ExpressionsStack
{
public:
    void push(const QueryTreeNodePtr & node)
    {
        if (node->hasAlias())
        {
            const auto & node_alias = node->getAlias();
            alias_name_to_expressions[node_alias].push_back(node);
        }

        if (const auto * function = node->as<FunctionNode>())
        {
            if (AggregateFunctionFactory::instance().isAggregateFunctionName(function->getFunctionName()))
                ++aggregate_functions_counter;
        }

        expressions.emplace_back(node);
    }

    void pop()
    {
        const auto & top_expression = expressions.back();
        const auto & top_expression_alias = top_expression->getAlias();

        if (!top_expression_alias.empty())
        {
            auto it = alias_name_to_expressions.find(top_expression_alias);
            auto & alias_expressions = it->second;
            alias_expressions.pop_back();

            if (alias_expressions.empty())
                alias_name_to_expressions.erase(it);
        }

        if (const auto * function = top_expression->as<FunctionNode>())
        {
            if (AggregateFunctionFactory::instance().isAggregateFunctionName(function->getFunctionName()))
                --aggregate_functions_counter;
        }

        expressions.pop_back();
    }

    [[maybe_unused]] const QueryTreeNodePtr & getRoot() const
    {
        return expressions.front();
    }

    const QueryTreeNodePtr & getTop() const
    {
        return expressions.back();
    }

    [[maybe_unused]] bool hasExpressionWithAlias(const std::string & alias) const
    {
        return alias_name_to_expressions.contains(alias);
    }

    bool hasAggregateFunction() const
    {
        return aggregate_functions_counter > 0;
    }

    QueryTreeNodePtr getExpressionWithAlias(const std::string & alias) const
    {
        auto expression_it = alias_name_to_expressions.find(alias);
        if (expression_it == alias_name_to_expressions.end())
            return {};

        return expression_it->second.front();
    }

    [[maybe_unused]] size_t size() const
    {
        return expressions.size();
    }

    bool empty() const
    {
        return expressions.empty();
    }

    void dump(WriteBuffer & buffer) const
    {
        buffer << expressions.size() << '\n';

        for (const auto & expression : expressions)
        {
            buffer << "Expression ";
            buffer << expression->formatASTForErrorMessage();

            const auto & alias = expression->getAlias();
            if (!alias.empty())
                buffer << " alias " << alias;

            buffer << '\n';
        }
    }

    [[maybe_unused]] String dump() const
    {
        WriteBufferFromOwnString buffer;
        dump(buffer);

        return buffer.str();
    }

private:
    QueryTreeNodes expressions;
    size_t aggregate_functions_counter = 0;
    std::unordered_map<std::string, QueryTreeNodes> alias_name_to_expressions;
};

}
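For readers skimming the diff, here is a simplified, self-contained analog of what `ExpressionsStack` tracks while the analyzer walks an expression tree: which nodes are currently on the stack and which node currently owns a given alias. It is an illustrative sketch only, with plain strings standing in for `QueryTreeNodePtr` and the aggregate-function counter omitted; it is not the ClickHouse implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

/// Plain-string stand-in for a query tree node with an optional alias.
struct Node { std::string text; std::string alias; };

class AliasStack
{
public:
    void push(const Node * node)
    {
        if (!node->alias.empty())
            alias_to_nodes[node->alias].push_back(node);
        stack.push_back(node);
    }

    void pop()
    {
        const Node * top = stack.back();
        if (!top->alias.empty())
        {
            auto & nodes = alias_to_nodes[top->alias];
            nodes.pop_back();
            if (nodes.empty())
                alias_to_nodes.erase(top->alias);
        }
        stack.pop_back();
    }

    /// Outermost node still on the stack that carries the given alias, or nullptr.
    const Node * findAlias(const std::string & alias) const
    {
        auto it = alias_to_nodes.find(alias);
        return it == alias_to_nodes.end() ? nullptr : it->second.front();
    }

private:
    std::vector<const Node *> stack;
    std::unordered_map<std::string, std::vector<const Node *>> alias_to_nodes;
};

int main()
{
    Node a{"x + 1", "a"};
    Node b{"a * 2", ""};
    AliasStack s;
    s.push(&a);
    s.push(&b);
    assert(s.findAlias("a") == &a);
    s.pop();
    s.pop();
    assert(s.findAlias("a") == nullptr);
}
```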
195
src/Analyzer/Resolve/IdentifierLookup.h
Normal file
@ -0,0 +1,195 @@
|
||||
#pragma once
|
||||
|
||||
#include <IO/WriteHelpers.h>
|
||||
#include <IO/Operators.h>
|
||||
#include <IO/WriteBufferFromString.h>
|
||||
|
||||
#include <Analyzer/IQueryTreeNode.h>
|
||||
#include <Analyzer/Identifier.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Identifier lookup context
|
||||
enum class IdentifierLookupContext : uint8_t
|
||||
{
|
||||
EXPRESSION = 0,
|
||||
FUNCTION,
|
||||
TABLE_EXPRESSION,
|
||||
};
|
||||
|
||||
inline const char * toString(IdentifierLookupContext identifier_lookup_context)
|
||||
{
|
||||
switch (identifier_lookup_context)
|
||||
{
|
||||
case IdentifierLookupContext::EXPRESSION: return "EXPRESSION";
|
||||
case IdentifierLookupContext::FUNCTION: return "FUNCTION";
|
||||
case IdentifierLookupContext::TABLE_EXPRESSION: return "TABLE_EXPRESSION";
|
||||
}
|
||||
}
|
||||
|
||||
inline const char * toStringLowercase(IdentifierLookupContext identifier_lookup_context)
|
||||
{
|
||||
switch (identifier_lookup_context)
|
||||
{
|
||||
case IdentifierLookupContext::EXPRESSION: return "expression";
|
||||
case IdentifierLookupContext::FUNCTION: return "function";
|
||||
case IdentifierLookupContext::TABLE_EXPRESSION: return "table expression";
|
||||
}
|
||||
}
|
||||
|
||||
/** Structure that represent identifier lookup during query analysis.
|
||||
* Lookup can be in query expression, function, table context.
|
||||
*/
|
||||
struct IdentifierLookup
|
||||
{
|
||||
Identifier identifier;
|
||||
IdentifierLookupContext lookup_context;
|
||||
|
||||
bool isExpressionLookup() const
|
||||
{
|
||||
return lookup_context == IdentifierLookupContext::EXPRESSION;
|
||||
}
|
||||
|
||||
bool isFunctionLookup() const
|
||||
{
|
||||
return lookup_context == IdentifierLookupContext::FUNCTION;
|
||||
}
|
||||
|
||||
bool isTableExpressionLookup() const
|
||||
{
|
||||
return lookup_context == IdentifierLookupContext::TABLE_EXPRESSION;
|
||||
}
|
||||
|
||||
String dump() const
|
||||
{
|
||||
return identifier.getFullName() + ' ' + toString(lookup_context);
|
||||
}
|
||||
};
|
||||
|
||||
inline bool operator==(const IdentifierLookup & lhs, const IdentifierLookup & rhs)
|
||||
{
|
||||
return lhs.identifier.getFullName() == rhs.identifier.getFullName() && lhs.lookup_context == rhs.lookup_context;
|
||||
}
|
||||
|
||||
[[maybe_unused]] inline bool operator!=(const IdentifierLookup & lhs, const IdentifierLookup & rhs)
|
||||
{
|
||||
return !(lhs == rhs);
|
||||
}
|
||||
|
||||
struct IdentifierLookupHash
|
||||
{
|
||||
size_t operator()(const IdentifierLookup & identifier_lookup) const
|
||||
{
|
||||
return std::hash<std::string>()(identifier_lookup.identifier.getFullName()) ^ static_cast<uint8_t>(identifier_lookup.lookup_context);
|
||||
}
|
||||
};
|
||||
|
||||
enum class IdentifierResolvePlace : UInt8
|
||||
{
|
||||
NONE = 0,
|
||||
EXPRESSION_ARGUMENTS,
|
||||
ALIASES,
|
||||
JOIN_TREE,
|
||||
/// Valid only for table lookup
|
||||
CTE,
|
||||
/// Valid only for table lookup
|
||||
DATABASE_CATALOG
|
||||
};
|
||||
|
||||
inline const char * toString(IdentifierResolvePlace resolved_identifier_place)
|
||||
{
|
||||
switch (resolved_identifier_place)
|
||||
{
|
||||
case IdentifierResolvePlace::NONE: return "NONE";
|
||||
case IdentifierResolvePlace::EXPRESSION_ARGUMENTS: return "EXPRESSION_ARGUMENTS";
|
||||
case IdentifierResolvePlace::ALIASES: return "ALIASES";
|
||||
case IdentifierResolvePlace::JOIN_TREE: return "JOIN_TREE";
|
||||
case IdentifierResolvePlace::CTE: return "CTE";
|
||||
case IdentifierResolvePlace::DATABASE_CATALOG: return "DATABASE_CATALOG";
|
||||
}
|
||||
}
|
||||
|
||||
struct IdentifierResolveResult
|
||||
{
|
||||
IdentifierResolveResult() = default;
|
||||
|
||||
QueryTreeNodePtr resolved_identifier;
|
||||
IdentifierResolvePlace resolve_place = IdentifierResolvePlace::NONE;
|
||||
bool resolved_from_parent_scopes = false;
|
||||
|
||||
[[maybe_unused]] bool isResolved() const
|
||||
{
|
||||
return resolve_place != IdentifierResolvePlace::NONE;
|
||||
}
|
||||
|
||||
[[maybe_unused]] bool isResolvedFromParentScopes() const
|
||||
{
|
||||
return resolved_from_parent_scopes;
|
||||
}
|
||||
|
||||
[[maybe_unused]] bool isResolvedFromExpressionArguments() const
|
||||
{
|
||||
return resolve_place == IdentifierResolvePlace::EXPRESSION_ARGUMENTS;
|
||||
}
|
||||
|
||||
[[maybe_unused]] bool isResolvedFromAliases() const
|
||||
{
|
||||
return resolve_place == IdentifierResolvePlace::ALIASES;
|
||||
}
|
||||
|
||||
[[maybe_unused]] bool isResolvedFromJoinTree() const
|
||||
{
|
||||
return resolve_place == IdentifierResolvePlace::JOIN_TREE;
|
||||
}
|
||||
|
||||
[[maybe_unused]] bool isResolvedFromCTEs() const
|
||||
{
|
||||
return resolve_place == IdentifierResolvePlace::CTE;
|
||||
}
|
||||
|
||||
void dump(WriteBuffer & buffer) const
|
||||
{
|
||||
if (!resolved_identifier)
|
||||
{
|
||||
buffer << "unresolved";
|
||||
return;
|
||||
}
|
||||
|
||||
buffer << resolved_identifier->formatASTForErrorMessage() << " place " << toString(resolve_place) << " resolved from parent scopes " << resolved_from_parent_scopes;
|
||||
}
|
||||
|
||||
[[maybe_unused]] String dump() const
|
||||
{
|
||||
WriteBufferFromOwnString buffer;
|
||||
dump(buffer);
|
||||
|
||||
return buffer.str();
|
||||
}
|
||||
};
|
||||
|
||||
struct IdentifierResolveState
|
||||
{
|
||||
IdentifierResolveResult resolve_result;
|
||||
bool cyclic_identifier_resolve = false;
|
||||
};
|
||||
|
||||
struct IdentifierResolveSettings
|
||||
{
|
||||
/// Allow to check join tree during identifier resolution
|
||||
bool allow_to_check_join_tree = true;
|
||||
|
||||
/// Allow to check CTEs during table identifier resolution
|
||||
bool allow_to_check_cte = true;
|
||||
|
||||
/// Allow to check parent scopes during identifier resolution
|
||||
bool allow_to_check_parent_scopes = true;
|
||||
|
||||
/// Allow to check database catalog during table identifier resolution
|
||||
bool allow_to_check_database_catalog = true;
|
||||
|
||||
/// Allow to resolve subquery during identifier resolution
|
||||
bool allow_to_resolve_subquery_during_identifier_resolution = true;
|
||||
};
|
||||
|
||||
}
|
184
src/Analyzer/Resolve/IdentifierResolveScope.cpp
Normal file
@ -0,0 +1,184 @@
|
||||
#include <Analyzer/Resolve/IdentifierResolveScope.h>
|
||||
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Analyzer/QueryNode.h>
|
||||
#include <Analyzer/UnionNode.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
IdentifierResolveScope::IdentifierResolveScope(QueryTreeNodePtr scope_node_, IdentifierResolveScope * parent_scope_)
|
||||
: scope_node(std::move(scope_node_))
|
||||
, parent_scope(parent_scope_)
|
||||
{
|
||||
if (parent_scope)
|
||||
{
|
||||
subquery_depth = parent_scope->subquery_depth;
|
||||
context = parent_scope->context;
|
||||
projection_mask_map = parent_scope->projection_mask_map;
|
||||
}
|
||||
else
|
||||
projection_mask_map = std::make_shared<std::map<IQueryTreeNode::Hash, size_t>>();
|
||||
|
||||
if (auto * union_node = scope_node->as<UnionNode>())
|
||||
{
|
||||
context = union_node->getContext();
|
||||
}
|
||||
else if (auto * query_node = scope_node->as<QueryNode>())
|
||||
{
|
||||
context = query_node->getContext();
|
||||
group_by_use_nulls = context->getSettingsRef().group_by_use_nulls &&
|
||||
(query_node->isGroupByWithGroupingSets() || query_node->isGroupByWithRollup() || query_node->isGroupByWithCube());
|
||||
}
|
||||
|
||||
if (context)
|
||||
join_use_nulls = context->getSettingsRef().join_use_nulls;
|
||||
else if (parent_scope)
|
||||
join_use_nulls = parent_scope->join_use_nulls;
|
||||
|
||||
aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_before_group_by;
|
||||
}
|
||||
|
||||
[[maybe_unused]] const IdentifierResolveScope * IdentifierResolveScope::getNearestQueryScope() const
|
||||
{
|
||||
const IdentifierResolveScope * scope_to_check = this;
|
||||
while (scope_to_check != nullptr)
|
||||
{
|
||||
if (scope_to_check->scope_node->getNodeType() == QueryTreeNodeType::QUERY)
|
||||
break;
|
||||
|
||||
scope_to_check = scope_to_check->parent_scope;
|
||||
}
|
||||
|
||||
return scope_to_check;
|
||||
}
|
||||
|
||||
IdentifierResolveScope * IdentifierResolveScope::getNearestQueryScope()
|
||||
{
|
||||
IdentifierResolveScope * scope_to_check = this;
|
||||
while (scope_to_check != nullptr)
|
||||
{
|
||||
if (scope_to_check->scope_node->getNodeType() == QueryTreeNodeType::QUERY)
|
||||
break;
|
||||
|
||||
scope_to_check = scope_to_check->parent_scope;
|
||||
}
|
||||
|
||||
return scope_to_check;
|
||||
}
|
||||
|
||||
AnalysisTableExpressionData & IdentifierResolveScope::getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node)
|
||||
{
|
||||
auto it = table_expression_node_to_data.find(table_expression_node);
|
||||
if (it == table_expression_node_to_data.end())
|
||||
{
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Table expression {} data must be initialized. In scope {}",
|
||||
table_expression_node->formatASTForErrorMessage(),
|
||||
scope_node->formatASTForErrorMessage());
|
||||
}
|
||||
|
||||
return it->second;
|
||||
}
|
||||
|
||||
const AnalysisTableExpressionData & IdentifierResolveScope::getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) const
|
||||
{
|
||||
auto it = table_expression_node_to_data.find(table_expression_node);
|
||||
if (it == table_expression_node_to_data.end())
|
||||
{
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR,
|
||||
"Table expression {} data must be initialized. In scope {}",
|
||||
table_expression_node->formatASTForErrorMessage(),
|
||||
scope_node->formatASTForErrorMessage());
|
||||
}
|
||||
|
||||
return it->second;
|
||||
}
|
||||
|
||||
void IdentifierResolveScope::pushExpressionNode(const QueryTreeNodePtr & node)
|
||||
{
|
||||
bool had_aggregate_function = expressions_in_resolve_process_stack.hasAggregateFunction();
|
||||
expressions_in_resolve_process_stack.push(node);
|
||||
if (group_by_use_nulls && had_aggregate_function != expressions_in_resolve_process_stack.hasAggregateFunction())
|
||||
aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_before_group_by;
|
||||
}
|
||||
|
||||
void IdentifierResolveScope::popExpressionNode()
|
||||
{
|
||||
bool had_aggregate_function = expressions_in_resolve_process_stack.hasAggregateFunction();
|
||||
expressions_in_resolve_process_stack.pop();
|
||||
if (group_by_use_nulls && had_aggregate_function != expressions_in_resolve_process_stack.hasAggregateFunction())
|
||||
aliases.alias_name_to_expression_node = &aliases.alias_name_to_expression_node_after_group_by;
|
||||
}
|
||||
|
||||
/// Dump identifier resolve scope
|
||||
[[maybe_unused]] void IdentifierResolveScope::dump(WriteBuffer & buffer) const
|
||||
{
|
||||
buffer << "Scope node " << scope_node->formatASTForErrorMessage() << '\n';
|
||||
buffer << "Identifier lookup to resolve state " << identifier_lookup_to_resolve_state.size() << '\n';
|
||||
for (const auto & [identifier, state] : identifier_lookup_to_resolve_state)
|
||||
{
|
||||
buffer << "Identifier " << identifier.dump() << " resolve result ";
|
||||
state.resolve_result.dump(buffer);
|
||||
buffer << '\n';
|
||||
}
|
||||
|
||||
buffer << "Expression argument name to node " << expression_argument_name_to_node.size() << '\n';
|
||||
for (const auto & [alias_name, node] : expression_argument_name_to_node)
|
||||
buffer << "Alias name " << alias_name << " node " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "Alias name to expression node table size " << aliases.alias_name_to_expression_node->size() << '\n';
|
||||
for (const auto & [alias_name, node] : *aliases.alias_name_to_expression_node)
|
||||
buffer << "Alias name " << alias_name << " expression node " << node->dumpTree() << '\n';
|
||||
|
||||
buffer << "Alias name to function node table size " << aliases.alias_name_to_lambda_node.size() << '\n';
|
||||
for (const auto & [alias_name, node] : aliases.alias_name_to_lambda_node)
|
||||
buffer << "Alias name " << alias_name << " lambda node " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "Alias name to table expression node table size " << aliases.alias_name_to_table_expression_node.size() << '\n';
|
||||
for (const auto & [alias_name, node] : aliases.alias_name_to_table_expression_node)
|
||||
buffer << "Alias name " << alias_name << " node " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "CTE name to query node table size " << cte_name_to_query_node.size() << '\n';
|
||||
for (const auto & [cte_name, node] : cte_name_to_query_node)
|
||||
buffer << "CTE name " << cte_name << " node " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "WINDOW name to window node table size " << window_name_to_window_node.size() << '\n';
|
||||
for (const auto & [window_name, node] : window_name_to_window_node)
|
||||
buffer << "CTE name " << window_name << " node " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "Nodes with duplicated aliases size " << aliases.nodes_with_duplicated_aliases.size() << '\n';
|
||||
for (const auto & node : aliases.nodes_with_duplicated_aliases)
|
||||
buffer << "Alias name " << node->getAlias() << " node " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "Expression resolve process stack " << '\n';
|
||||
expressions_in_resolve_process_stack.dump(buffer);
|
||||
|
||||
buffer << "Table expressions in resolve process size " << table_expressions_in_resolve_process.size() << '\n';
|
||||
for (const auto & node : table_expressions_in_resolve_process)
|
||||
buffer << "Table expression " << node->formatASTForErrorMessage() << '\n';
|
||||
|
||||
buffer << "Non cached identifier lookups during expression resolve " << non_cached_identifier_lookups_during_expression_resolve.size() << '\n';
|
||||
for (const auto & identifier_lookup : non_cached_identifier_lookups_during_expression_resolve)
|
||||
buffer << "Identifier lookup " << identifier_lookup.dump() << '\n';
|
||||
|
||||
buffer << "Table expression node to data " << table_expression_node_to_data.size() << '\n';
|
||||
for (const auto & [table_expression_node, table_expression_data] : table_expression_node_to_data)
|
||||
buffer << "Table expression node " << table_expression_node->formatASTForErrorMessage() << " data " << table_expression_data.dump() << '\n';
|
||||
|
||||
buffer << "Use identifier lookup to result cache " << use_identifier_lookup_to_result_cache << '\n';
|
||||
buffer << "Subquery depth " << subquery_depth << '\n';
|
||||
}
|
||||
|
||||
[[maybe_unused]] String IdentifierResolveScope::dump() const
|
||||
{
|
||||
WriteBufferFromOwnString buffer;
|
||||
dump(buffer);
|
||||
|
||||
return buffer.str();
|
||||
}
|
||||
}
|
231
src/Analyzer/Resolve/IdentifierResolveScope.h
Normal file
@ -0,0 +1,231 @@
|
||||
#pragma once
|
||||
|
||||
#include <Interpreters/Context_fwd.h>
|
||||
#include <Analyzer/HashUtils.h>
|
||||
#include <Analyzer/IQueryTreeNode.h>
|
||||
|
||||
#include <Analyzer/Resolve/IdentifierLookup.h>
|
||||
#include <Analyzer/Resolve/ScopeAliases.h>
|
||||
#include <Analyzer/Resolve/TableExpressionData.h>
|
||||
#include <Analyzer/Resolve/ExpressionsStack.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/** Projection names is name of query tree node that is used in projection part of query node.
|
||||
* Example: SELECT id FROM test_table;
|
||||
* `id` is projection name of column node
|
||||
*
|
||||
* Example: SELECT id AS id_alias FROM test_table;
|
||||
* `id_alias` is projection name of column node
|
||||
*
|
||||
* Calculation of projection names is done during expression nodes resolution. This is done this way
|
||||
* because after identifier node is resolved we lose information about identifier name. We could
|
||||
* potentially save this information in query tree node itself, but that would require to clone it in some cases.
|
||||
* Example: SELECT big_scalar_subquery AS a, a AS b, b AS c;
|
||||
* All 3 nodes in projection are the same big_scalar_subquery, but they have different projection names.
|
||||
* If we want to save it in query tree node, we have to clone subquery node that could lead to performance degradation.
|
||||
*
|
||||
* Possible solution is to separate query node metadata and query node content. So only node metadata could be cloned
|
||||
* if we want to change projection name. This solution does not seem to be easy for client of query tree because projection
|
||||
* name will be part of interface. If we potentially could hide projection names calculation in analyzer without introducing additional
|
||||
* changes in query tree structure that would be preferable.
|
||||
*
|
||||
* Currently each resolve method returns projection names array. Resolve method must compute projection names of node.
|
||||
* If node is resolved as list node this is case for `untuple` function or `matcher` result projection names array must contain projection names
|
||||
* for result nodes.
|
||||
* If node is not resolved as list node, projection names array contain single projection name for node.
|
||||
*
|
||||
* Rules for projection names:
|
||||
* 1. If node has alias. It is node projection name.
|
||||
* Except scenario where `untuple` function has alias. Example: SELECT untuple(expr) AS alias, alias.
|
||||
*
|
||||
* 2. For constant it is constant value string representation.
|
||||
*
|
||||
* 3. For identifier:
|
||||
* If identifier is resolved from JOIN TREE, we want to remove additional identifier qualifications.
|
||||
* Example: SELECT default.test_table.id FROM test_table.
|
||||
* Result projection name is `id`.
|
||||
*
|
||||
* Example: SELECT t1.id FROM test_table_1 AS t1, test_table_2 AS t2
|
||||
* In example both test_table_1, test_table_2 have `id` column.
|
||||
* In such case projection name is `t1.id` because if additional qualification is removed then column projection name `id` will be ambiguous.
|
||||
*
|
||||
* Example: SELECT default.test_table_1.id FROM test_table_1 AS t1, test_table_2 AS t2
|
||||
* In such case projection name is `test_table_1.id` because we remove unnecessary database qualification, but table name qualification cannot be removed
|
||||
* because otherwise column projection name `id` will be ambiguous.
|
||||
*
|
||||
* If identifier is not resolved from JOIN TREE. Identifier name is projection name.
|
||||
* Except scenario where `untuple` function resolved using identifier. Example: SELECT untuple(expr) AS alias, alias.
|
||||
* Example: SELECT sum(1, 1) AS value, value.
|
||||
* In such case both nodes have `value` projection names.
|
||||
*
|
||||
* Example: SELECT id AS value, value FROM test_table.
|
||||
* In such case both nodes have `value` projection names.
|
||||
*
|
||||
* Special case is `untuple` function. If `untuple` function specified with alias, then result nodes will have alias.tuple_column_name projection names.
|
||||
* Example: SELECT cast(tuple(1), 'Tuple(id UInt64)') AS value, untuple(value) AS a;
|
||||
* Result projection names are `value`, `a.id`.
|
||||
*
|
||||
* If `untuple` function does not have alias then result nodes will have `tupleElement(untuple_expression_projection_name, 'tuple_column_name') projection names.
|
||||
*
|
||||
* Example: SELECT cast(tuple(1), 'Tuple(id UInt64)') AS value, untuple(value);
|
||||
* Result projection names are `value`, `tupleElement(value, 'id')`;
|
||||
*
|
||||
* 4. For function:
|
||||
* Projection name consists from function_name(parameters_projection_names)(arguments_projection_names).
|
||||
* Additionally if function is window function. Window node projection name is used with OVER clause.
|
||||
* Example: function_name (parameters_names)(argument_projection_names) OVER window_name;
|
||||
* Example: function_name (parameters_names)(argument_projection_names) OVER (PARTITION BY id ORDER BY id).
|
||||
* Example: function_name (parameters_names)(argument_projection_names) OVER (window_name ORDER BY id).
|
||||
*
|
||||
* 5. For lambda:
|
||||
* If it is standalone lambda that returns single expression, function projection name is used.
|
||||
* Example: WITH (x -> x + 1) AS lambda SELECT lambda(1).
|
||||
* Projection name is `lambda(1)`.
|
||||
*
|
||||
* If is it standalone lambda that returns list, projection names of list nodes are used.
|
||||
* Example: WITH (x -> *) AS lambda SELECT lambda(1) FROM test_table;
|
||||
* If test_table has two columns `id`, `value`. Then result projection names are `id`, `value`.
|
||||
*
|
||||
* If lambda is argument of function.
|
||||
* Then projection name consists from lambda(tuple(lambda_arguments)(lambda_body_projection_name));
|
||||
*
|
||||
* 6. For matcher:
|
||||
* Matched nodes projection names are used as matcher projection names.
|
||||
*
|
||||
* Matched nodes must be qualified if needed.
|
||||
* Example: SELECT * FROM test_table_1 AS t1, test_table_2 AS t2.
|
||||
* In example table test_table_1 and test_table_2 both have `id`, `value` columns.
|
||||
* Matched nodes after unqualified matcher resolve must be qualified to avoid ambiguous projection names.
|
||||
* Result projection names must be `t1.id`, `t1.value`, `t2.id`, `t2.value`.
|
||||
*
|
||||
* There are special cases
|
||||
* 1. For lambda inside APPLY matcher transformer:
|
||||
* Example: SELECT * APPLY x -> toString(x) FROM test_table.
|
||||
* In such case lambda argument projection name `x` will be replaced by matched node projection name.
|
||||
* If table has two columns `id` and `value`. Then result projection names are `toString(id)`, `toString(value)`;
|
||||
*
|
||||
* 2. For unqualified matcher when JOIN tree contains JOIN with USING.
|
||||
* Example: SELECT * FROM test_table_1 AS t1 INNER JOIN test_table_2 AS t2 USING(id);
|
||||
* Result projection names must be `id`, `t1.value`, `t2.value`.
|
||||
*
|
||||
* 7. For subquery:
|
||||
* For subquery projection name consists of `_subquery_` prefix and implementation specific unique number suffix.
|
||||
* Example: SELECT (SELECT 1), (SELECT 1 UNION DISTINCT SELECT 1);
|
||||
* Result projection name can be `_subquery_1`, `_subquery_2`;
|
||||
*
|
||||
* 8. For table:
|
||||
* Table node can be used in expression context only as right argument of IN function. In that case identifier is used
|
||||
* as table node projection name.
|
||||
* Example: SELECT id IN test_table FROM test_table;
|
||||
* Result projection name is `in(id, test_table)`.
|
||||
*/
|
||||
using ProjectionName = String;
|
||||
using ProjectionNames = std::vector<ProjectionName>;
|
||||
constexpr auto PROJECTION_NAME_PLACEHOLDER = "__projection_name_placeholder";
|
||||
|
||||
struct IdentifierResolveScope
|
||||
{
|
||||
/// Construct identifier resolve scope using scope node, and parent scope
|
||||
IdentifierResolveScope(QueryTreeNodePtr scope_node_, IdentifierResolveScope * parent_scope_);
|
||||
|
||||
QueryTreeNodePtr scope_node;
|
||||
|
||||
IdentifierResolveScope * parent_scope = nullptr;
|
||||
|
||||
ContextPtr context;
|
||||
|
||||
/// Identifier lookup to result
|
||||
std::unordered_map<IdentifierLookup, IdentifierResolveState, IdentifierLookupHash> identifier_lookup_to_resolve_state;
|
||||
|
||||
/// Argument can be expression like constant, column, function or table expression
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> expression_argument_name_to_node;
|
||||
|
||||
ScopeAliases aliases;
|
||||
|
||||
/// Table column name to column node. Valid only during table ALIAS columns resolve.
|
||||
ColumnNameToColumnNodeMap column_name_to_column_node;
|
||||
|
||||
/// CTE name to query node
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> cte_name_to_query_node;
|
||||
|
||||
/// Window name to window node
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> window_name_to_window_node;
|
||||
|
||||
/// Current scope expression in resolve process stack
|
||||
ExpressionsStack expressions_in_resolve_process_stack;
|
||||
|
||||
/// Table expressions in resolve process
|
||||
std::unordered_set<const IQueryTreeNode *> table_expressions_in_resolve_process;
|
||||
|
||||
/// Current scope expression
|
||||
std::unordered_set<IdentifierLookup, IdentifierLookupHash> non_cached_identifier_lookups_during_expression_resolve;
|
||||
|
||||
/// Table expression node to data
|
||||
std::unordered_map<QueryTreeNodePtr, AnalysisTableExpressionData> table_expression_node_to_data;
|
||||
|
||||
QueryTreeNodePtrWithHashIgnoreTypesSet nullable_group_by_keys;
|
||||
/// Here we count the number of nullable GROUP BY keys we met resolving expression.
|
||||
/// E.g. for a query `SELECT tuple(tuple(number)) FROM numbers(10) GROUP BY (number, tuple(number)) with cube`
|
||||
/// both `number` and `tuple(number)` would be in nullable_group_by_keys.
|
||||
/// But when we resolve `tuple(tuple(number))` we should figure out that `tuple(number)` is already a key,
|
||||
/// and we should not convert `number` to nullable.
|
||||
size_t found_nullable_group_by_key_in_scope = 0;
|
||||
|
||||
/** It's possible that after a JOIN, a column in the projection has a type different from the column in the source table.
|
||||
* (For example, after join_use_nulls or USING column casted to supertype)
|
||||
* However, the column in the projection still refers to the table as its source.
|
||||
* This map is used to revert these columns back to their original columns in the source table.
|
||||
*/
|
||||
QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> join_columns_with_changed_types;
|
||||
|
||||
/// Use identifier lookup to result cache
|
||||
bool use_identifier_lookup_to_result_cache = true;
|
||||
|
||||
/// Apply nullability to aggregation keys
|
||||
bool group_by_use_nulls = false;
|
||||
/// Join returns NULLs instead of default values
|
||||
bool join_use_nulls = false;
|
||||
|
||||
/// JOINs count
|
||||
size_t joins_count = 0;
|
||||
|
||||
/// Subquery depth
|
||||
size_t subquery_depth = 0;
|
||||
|
||||
/** Scope join tree node for expression.
|
||||
* Valid only during analysis construction for single expression.
|
||||
*/
|
||||
QueryTreeNodePtr expression_join_tree_node;
|
||||
|
||||
/// Node hash to mask id map
|
||||
std::shared_ptr<std::map<IQueryTreeNode::Hash, size_t>> projection_mask_map;
|
||||
|
||||
struct ResolvedFunctionsCache
|
||||
{
|
||||
FunctionOverloadResolverPtr resolver;
|
||||
FunctionBasePtr function_base;
|
||||
};
|
||||
|
||||
std::map<IQueryTreeNode::Hash, ResolvedFunctionsCache> functions_cache;
|
||||
|
||||
[[maybe_unused]] const IdentifierResolveScope * getNearestQueryScope() const;
|
||||
|
||||
IdentifierResolveScope * getNearestQueryScope();
|
||||
|
||||
AnalysisTableExpressionData & getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node);
|
||||
|
||||
const AnalysisTableExpressionData & getTableExpressionDataOrThrow(const QueryTreeNodePtr & table_expression_node) const;
|
||||
|
||||
void pushExpressionNode(const QueryTreeNodePtr & node);
|
||||
|
||||
void popExpressionNode();
|
||||
|
||||
/// Dump identifier resolve scope
|
||||
[[maybe_unused]] void dump(WriteBuffer & buffer) const;
|
||||
|
||||
[[maybe_unused]] String dump() const;
|
||||
};
|
||||
|
||||
}
|
22
src/Analyzer/Resolve/QueryAnalysisPass.cpp
Normal file
@ -0,0 +1,22 @@
#include <Analyzer/Passes/QueryAnalysisPass.h>
#include <Analyzer/Resolve/QueryAnalyzer.h>
#include <Analyzer/createUniqueTableAliases.h>

namespace DB
{

QueryAnalysisPass::QueryAnalysisPass(QueryTreeNodePtr table_expression_, bool only_analyze_)
    : table_expression(std::move(table_expression_))
    , only_analyze(only_analyze_)
{}

QueryAnalysisPass::QueryAnalysisPass(bool only_analyze_) : only_analyze(only_analyze_) {}

void QueryAnalysisPass::run(QueryTreeNodePtr & query_tree_node, ContextPtr context)
{
    QueryAnalyzer analyzer(only_analyze);
    analyzer.resolve(query_tree_node, table_expression, context);
    createUniqueTableAliases(query_tree_node, table_expression, context);
}

}
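`QueryAnalysisPass::run` above is a thin wrapper: it resolves the tree with `QueryAnalyzer` and then calls `createUniqueTableAliases`. As a rough, hypothetical illustration of that second step (not the actual ClickHouse function, whose signature and output format differ), assigning deterministic unique aliases to table expressions could look like this:

```cpp
#include <iostream>
#include <string>
#include <vector>

/// Hypothetical sketch: give every table expression in a resolved query a unique
/// alias so later passes can refer to its columns unambiguously.
std::vector<std::string> assignUniqueTableAliases(const std::vector<std::string> & tables)
{
    std::vector<std::string> aliases;
    aliases.reserve(tables.size());
    for (size_t i = 0; i < tables.size(); ++i)
        aliases.push_back("__table" + std::to_string(i + 1));
    return aliases;
}

int main()
{
    for (const auto & alias : assignUniqueTableAliases({"t", "t", "numbers"}))
        std::cout << alias << '\n';  /// __table1, __table2, __table3
}
```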
File diff suppressed because it is too large
378
src/Analyzer/Resolve/QueryAnalyzer.h
Normal file
@ -0,0 +1,378 @@
|
||||
#pragma once
|
||||
|
||||
#include <Interpreters/Context_fwd.h>
|
||||
#include <Analyzer/HashUtils.h>
|
||||
#include <Analyzer/IQueryTreeNode.h>
|
||||
#include <Analyzer/Resolve/IdentifierLookup.h>
|
||||
|
||||
#include <Core/Joins.h>
|
||||
#include <Core/NamesAndTypes.h>
|
||||
|
||||
#include <Parsers/NullsAction.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
struct GetColumnsOptions;
|
||||
struct IdentifierResolveScope;
|
||||
struct AnalysisTableExpressionData;
|
||||
class QueryExpressionsAliasVisitor;
|
||||
|
||||
class QueryNode;
|
||||
class JoinNode;
|
||||
class ColumnNode;
|
||||
|
||||
using ProjectionName = String;
|
||||
using ProjectionNames = std::vector<ProjectionName>;
|
||||
|
||||
struct Settings;
|
||||
|
||||
/** Query analyzer implementation overview. Please check documentation in QueryAnalysisPass.h first.
|
||||
* And additional documentation for each method, where special cases are described in detail.
|
||||
*
|
||||
* Each node in query must be resolved. For each query tree node resolved state is specific.
|
||||
*
|
||||
* For constant node no resolve process exists, it is resolved during construction.
|
||||
*
|
||||
* For table node no resolve process exists, it is resolved during construction.
|
||||
*
|
||||
* For function node to be resolved parameters and arguments must be resolved, function node must be initialized with concrete aggregate or
|
||||
* non aggregate function and with result type.
|
||||
*
|
||||
* For lambda node there can be 2 different cases.
|
||||
* 1. Standalone: WITH (x -> x + 1) AS lambda SELECT lambda(1); Such lambdas are inlined in query tree during query analysis pass.
|
||||
* 2. Function arguments: WITH (x -> x + 1) AS lambda SELECT arrayMap(lambda, [1, 2, 3]); For such lambda resolution must
|
||||
* set concrete lambda arguments (initially they are identifier nodes) and resolve lambda expression body.
|
||||
*
|
||||
* For query node resolve process must resolve all its inner nodes.
|
||||
*
|
||||
* For matcher node resolve process must replace it with matched nodes.
|
||||
*
|
||||
* For identifier node resolve process must replace it with concrete non identifier node. This part is most complex because
|
||||
* for identifier resolution scopes and identifier lookup context play important part.
|
||||
*
|
||||
* ClickHouse SQL support lexical scoping for identifier resolution. Scope can be defined by query node or by expression node.
|
||||
* Expression nodes that can define scope are lambdas and table ALIAS columns.
|
||||
*
|
||||
* Identifier lookup context can be expression, function, table.
|
||||
*
|
||||
* Examples: WITH (x -> x + 1) as func SELECT func() FROM func; During function `func` resolution identifier lookup is performed
|
||||
* in function context.
|
||||
*
|
||||
* If there is no information about identifier context, the rules are as follows:
|
||||
* 1. Try to resolve identifier in expression context.
|
||||
* 2. Try to resolve identifier in function context, if it is allowed. Example: SELECT func(arguments); Here func identifier cannot be resolved in function context
|
||||
* because query projection does not support that.
|
||||
* 3. Try to resolve identifier in table context, if it is allowed. Example: SELECT table; Here table identifier cannot be resolved in function context
|
||||
* because query projection does not support that.
|
||||
*
|
||||
* TODO: This was not supported properly before, because matchers could not be resolved from aliases.
|
||||
*
|
||||
* Identifiers are resolved with following rules:
|
||||
* Resolution starts with current scope.
|
||||
* 1. Try to resolve identifier from expression scope arguments. Lambda expression arguments are greatest priority.
|
||||
* 2. Try to resolve identifier from aliases.
|
||||
* 3. Try to resolve identifier from join tree if scope is query, or if there are registered table columns in scope.
|
||||
* Steps 2 and 3 can be changed using prefer_column_name_to_alias setting.
|
||||
* 4. If it is table lookup, try to resolve identifier from CTE.
|
||||
* If identifier could not be resolved in current scope, resolution must be continued in parent scopes.
|
||||
* 5. Try to resolve identifier from parent scopes.
|
||||
*
|
||||
* Additional rules about aliases and scopes.
|
||||
* 1. Parent scope cannot refer alias from child scope.
|
||||
* 2. Child scope can refer to alias in parent scope.
|
||||
*
|
||||
* Example: SELECT arrayMap(x -> x + 1 AS a, [1,2,3]), a; Identifier a is unknown in parent scope.
|
||||
* Example: SELECT a FROM (SELECT 1 as a); Here we do not refer to alias a from child query scope. But we query it projection result, similar to tables.
|
||||
* Example: WITH 1 as a SELECT (SELECT a) as b; Here in child scope identifier a is resolved using alias from parent scope.
|
||||
*
|
||||
* Additional rules about identifier binding.
|
||||
* Bind for identifier to entity means that identifier first part match some node during analysis.
|
||||
* If other parts of identifier cannot be resolved in that node, exception must be thrown.
|
||||
*
|
||||
* Example:
|
||||
* CREATE TABLE test_table (id UInt64, compound_value Tuple(value UInt64)) ENGINE=TinyLog;
|
||||
* SELECT compound_value.value, 1 AS compound_value FROM test_table;
|
||||
* Identifier first part compound_value bound to entity with alias compound_value, but nested identifier part cannot be resolved from entity,
|
||||
* lookup should not be continued, and exception must be thrown because if lookup continues that way identifier can be resolved from join tree.
|
||||
*
|
||||
* TODO: This was not supported properly before analyzer because nested identifier could not be resolved from alias.
|
||||
*
|
||||
* More complex example:
|
||||
* CREATE TABLE test_table (id UInt64, value UInt64) ENGINE=TinyLog;
|
||||
* WITH cast(('Value'), 'Tuple (value UInt64') AS value SELECT (SELECT value FROM test_table);
|
||||
* Identifier first part value bound to test_table column value, but nested identifier part cannot be resolved from it,
|
||||
* lookup should not be continued, and exception must be thrown because if lookup continues identifier can be resolved from parent scope.
|
||||
*
|
||||
* TODO: Update exception messages
|
||||
* TODO: Table identifiers with optional UUID.
|
||||
* TODO: Lookup functions arrayReduce(sum, [1, 2, 3]);
|
||||
* TODO: Support function identifier resolve from parent query scope, if lambda in parent scope does not capture any columns.
|
||||
*/
|
||||
|
||||
class QueryAnalyzer
|
||||
{
|
||||
public:
|
||||
explicit QueryAnalyzer(bool only_analyze_);
|
||||
~QueryAnalyzer();
|
||||
|
||||
void resolve(QueryTreeNodePtr & node, const QueryTreeNodePtr & table_expression, ContextPtr context);
|
||||
|
||||
private:
|
||||
/// Utility functions
|
||||
|
||||
static bool isExpressionNodeType(QueryTreeNodeType node_type);
|
||||
|
||||
static bool isFunctionExpressionNodeType(QueryTreeNodeType node_type);
|
||||
|
||||
static bool isSubqueryNodeType(QueryTreeNodeType node_type);
|
||||
|
||||
static bool isTableExpressionNodeType(QueryTreeNodeType node_type);
|
||||
|
||||
static DataTypePtr getExpressionNodeResultTypeOrNull(const QueryTreeNodePtr & query_tree_node);
|
||||
|
||||
static ProjectionName calculateFunctionProjectionName(const QueryTreeNodePtr & function_node,
|
||||
const ProjectionNames & parameters_projection_names,
|
||||
const ProjectionNames & arguments_projection_names);
|
||||
|
||||
static ProjectionName calculateWindowProjectionName(const QueryTreeNodePtr & window_node,
|
||||
const QueryTreeNodePtr & parent_window_node,
|
||||
const String & parent_window_name,
|
||||
const ProjectionNames & partition_by_projection_names,
|
||||
const ProjectionNames & order_by_projection_names,
|
||||
const ProjectionName & frame_begin_offset_projection_name,
|
||||
const ProjectionName & frame_end_offset_projection_name);
|
||||
|
||||
static ProjectionName calculateSortColumnProjectionName(const QueryTreeNodePtr & sort_column_node,
|
||||
const ProjectionName & sort_expression_projection_name,
|
||||
const ProjectionName & fill_from_expression_projection_name,
|
||||
const ProjectionName & fill_to_expression_projection_name,
|
||||
const ProjectionName & fill_step_expression_projection_name);
|
||||
|
||||
static void collectCompoundExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
|
||||
const DataTypePtr & compound_expression_type,
|
||||
const Identifier & valid_identifier_prefix,
|
||||
std::unordered_set<Identifier> & valid_identifiers_result);
|
||||
|
||||
static void collectTableExpressionValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
|
||||
const QueryTreeNodePtr & table_expression,
|
||||
const AnalysisTableExpressionData & table_expression_data,
|
||||
std::unordered_set<Identifier> & valid_identifiers_result);
|
||||
|
||||
static void collectScopeValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
|
||||
const IdentifierResolveScope & scope,
|
||||
bool allow_expression_identifiers,
|
||||
bool allow_function_identifiers,
|
||||
bool allow_table_expression_identifiers,
|
||||
std::unordered_set<Identifier> & valid_identifiers_result);
|
||||
|
||||
static void collectScopeWithParentScopesValidIdentifiersForTypoCorrection(const Identifier & unresolved_identifier,
|
||||
const IdentifierResolveScope & scope,
|
||||
bool allow_expression_identifiers,
|
||||
bool allow_function_identifiers,
|
||||
bool allow_table_expression_identifiers,
|
||||
std::unordered_set<Identifier> & valid_identifiers_result);
|
||||
|
||||
static std::vector<String> collectIdentifierTypoHints(const Identifier & unresolved_identifier, const std::unordered_set<Identifier> & valid_identifiers);
|
||||
|
||||
static QueryTreeNodePtr wrapExpressionNodeInTupleElement(QueryTreeNodePtr expression_node, IdentifierView nested_path);
|
||||
|
||||
QueryTreeNodePtr tryGetLambdaFromSQLUserDefinedFunctions(const std::string & function_name, ContextPtr context);
|
||||
|
||||
void evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & query_tree_node, IdentifierResolveScope & scope);
|
||||
|
||||
static void mergeWindowWithParentWindow(const QueryTreeNodePtr & window_node, const QueryTreeNodePtr & parent_window_node, IdentifierResolveScope & scope);
|
||||
|
||||
void replaceNodesWithPositionalArguments(QueryTreeNodePtr & node_list, const QueryTreeNodes & projection_nodes, IdentifierResolveScope & scope);
|
||||
|
||||
static void convertLimitOffsetExpression(QueryTreeNodePtr & expression_node, const String & expression_description, IdentifierResolveScope & scope);
|
||||
|
||||
static void validateTableExpressionModifiers(const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope);
|
||||
|
||||
static void validateJoinTableExpressionWithoutAlias(const QueryTreeNodePtr & join_node, const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope);
|
||||
|
||||
static void checkDuplicateTableNamesOrAlias(const QueryTreeNodePtr & join_node, QueryTreeNodePtr & left_table_expr, QueryTreeNodePtr & right_table_expr, IdentifierResolveScope & scope);
|
||||
|
||||
static std::pair<bool, UInt64> recursivelyCollectMaxOrdinaryExpressions(QueryTreeNodePtr & node, QueryTreeNodes & into);
|
||||
|
||||
static void expandGroupByAll(QueryNode & query_tree_node_typed);
|
||||
|
||||
void expandOrderByAll(QueryNode & query_tree_node_typed, const Settings & settings);
|
||||
|
||||
static std::string
|
||||
rewriteAggregateFunctionNameIfNeeded(const std::string & aggregate_function_name, NullsAction action, const ContextPtr & context);
|
||||
|
||||
static std::optional<JoinTableSide> getColumnSideFromJoinTree(const QueryTreeNodePtr & resolved_identifier, const JoinNode & join_node);
|
||||
|
||||
static QueryTreeNodePtr convertJoinedColumnTypeToNullIfNeeded(
|
||||
const QueryTreeNodePtr & resolved_identifier,
|
||||
const JoinKind & join_kind,
|
||||
std::optional<JoinTableSide> resolved_side,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
/// Resolve identifier functions
|
||||
|
||||
static QueryTreeNodePtr tryResolveTableIdentifierFromDatabaseCatalog(const Identifier & table_identifier, ContextPtr context);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromCompoundExpression(const Identifier & expression_identifier,
|
||||
size_t identifier_bind_size,
|
||||
const QueryTreeNodePtr & compound_expression,
|
||||
String compound_expression_source,
|
||||
IdentifierResolveScope & scope,
|
||||
bool can_be_not_found = false);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromExpressionArguments(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
|
||||
|
||||
static bool tryBindIdentifierToAliases(const IdentifierLookup & identifier_lookup, const IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromAliases(const IdentifierLookup & identifier_lookup,
|
||||
IdentifierResolveScope & scope,
|
||||
IdentifierResolveSettings identifier_resolve_settings);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromTableColumns(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
|
||||
|
||||
static bool tryBindIdentifierToTableExpression(const IdentifierLookup & identifier_lookup,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
const IdentifierResolveScope & scope);
|
||||
|
||||
static bool tryBindIdentifierToTableExpressions(const IdentifierLookup & identifier_lookup,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
const IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromTableExpression(const IdentifierLookup & identifier_lookup,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr matchArrayJoinSubcolumns(
|
||||
const QueryTreeNodePtr & array_join_column_inner_expression,
|
||||
const ColumnNode & array_join_column_expression_typed,
|
||||
const QueryTreeNodePtr & resolved_expression,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveExpressionFromArrayJoinExpressions(const QueryTreeNodePtr & resolved_expression,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromArrayJoin(const IdentifierLookup & identifier_lookup,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromJoinTreeNode(const IdentifierLookup & identifier_lookup,
|
||||
const QueryTreeNodePtr & join_tree_node,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromJoinTree(const IdentifierLookup & identifier_lookup,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
IdentifierResolveResult tryResolveIdentifierInParentScopes(const IdentifierLookup & identifier_lookup, IdentifierResolveScope & scope);
|
||||
|
||||
IdentifierResolveResult tryResolveIdentifier(const IdentifierLookup & identifier_lookup,
|
||||
IdentifierResolveScope & scope,
|
||||
IdentifierResolveSettings identifier_resolve_settings = {});
|
||||
|
||||
QueryTreeNodePtr tryResolveIdentifierFromStorage(
|
||||
const Identifier & identifier,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
const AnalysisTableExpressionData & table_expression_data,
|
||||
IdentifierResolveScope & scope,
|
||||
size_t identifier_column_qualifier_parts,
|
||||
bool can_be_not_found = false);
|
||||
|
||||
/// Resolve query tree nodes functions
|
||||
|
||||
void qualifyColumnNodesWithProjectionNames(const QueryTreeNodes & column_nodes,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
const IdentifierResolveScope & scope);
|
||||
|
||||
static GetColumnsOptions buildGetColumnsOptions(QueryTreeNodePtr & matcher_node, const ContextPtr & context);
|
||||
|
||||
using QueryTreeNodesWithNames = std::vector<std::pair<QueryTreeNodePtr, std::string>>;
|
||||
|
||||
QueryTreeNodesWithNames getMatchedColumnNodesWithNames(const QueryTreeNodePtr & matcher_node,
|
||||
const QueryTreeNodePtr & table_expression_node,
|
||||
const NamesAndTypes & matched_columns,
|
||||
const IdentifierResolveScope & scope);
|
||||
|
||||
void updateMatchedColumnsFromJoinUsing(QueryTreeNodesWithNames & result_matched_column_nodes_with_names, const QueryTreeNodePtr & source_table_expression, IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodesWithNames resolveQualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope);
|
||||
|
||||
QueryTreeNodesWithNames resolveUnqualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope);
|
||||
|
||||
ProjectionNames resolveMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope);
|
||||
|
||||
ProjectionName resolveWindow(QueryTreeNodePtr & window_node, IdentifierResolveScope & scope);
|
||||
|
||||
ProjectionNames resolveLambda(const QueryTreeNodePtr & lambda_node,
|
||||
const QueryTreeNodePtr & lambda_node_to_resolve,
|
||||
const QueryTreeNodes & lambda_arguments,
|
||||
IdentifierResolveScope & scope);
|
||||
|
||||
ProjectionNames resolveFunction(QueryTreeNodePtr & function_node, IdentifierResolveScope & scope);
|
||||
|
||||
ProjectionNames resolveExpressionNode(QueryTreeNodePtr & node, IdentifierResolveScope & scope, bool allow_lambda_expression, bool allow_table_expression, bool ignore_alias = false);
|
||||
|
||||
ProjectionNames resolveExpressionNodeList(QueryTreeNodePtr & node_list, IdentifierResolveScope & scope, bool allow_lambda_expression, bool allow_table_expression);
|
||||
|
||||
ProjectionNames resolveSortNodeList(QueryTreeNodePtr & sort_node_list, IdentifierResolveScope & scope);
|
||||
|
||||
void resolveGroupByNode(QueryNode & query_node_typed, IdentifierResolveScope & scope);
|
||||
|
||||
void resolveInterpolateColumnsNodeList(QueryTreeNodePtr & interpolate_node_list, IdentifierResolveScope & scope);
|
||||
|
||||
void resolveWindowNodeList(QueryTreeNodePtr & window_node_list, IdentifierResolveScope & scope);
|
||||
|
||||
NamesAndTypes resolveProjectionExpressionNodeList(QueryTreeNodePtr & projection_node_list, IdentifierResolveScope & scope);
|
||||
|
||||
void initializeQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node, IdentifierResolveScope & scope);
|
||||
|
||||
void initializeTableExpressionData(const QueryTreeNodePtr & table_expression_node, IdentifierResolveScope & scope);
|
||||
|
||||
void resolveTableFunction(QueryTreeNodePtr & table_function_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor, bool nested_table_function);
|
||||
|
||||
void resolveArrayJoin(QueryTreeNodePtr & array_join_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor);
|
||||
|
||||
void resolveJoin(QueryTreeNodePtr & join_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor);
|
||||
|
||||
void resolveQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node, IdentifierResolveScope & scope, QueryExpressionsAliasVisitor & expressions_visitor);
|
||||
|
||||
void resolveQuery(const QueryTreeNodePtr & query_node, IdentifierResolveScope & scope);
|
||||
|
||||
void resolveUnion(const QueryTreeNodePtr & union_node, IdentifierResolveScope & scope);
|
||||
|
||||
/// Lambdas that are currently in resolve process
|
||||
std::unordered_set<IQueryTreeNode *> lambdas_in_resolve_process;
|
||||
|
||||
/// CTEs that are currently in resolve process
|
||||
std::unordered_set<std::string_view> ctes_in_resolve_process;
|
||||
|
||||
/// Function name to user defined lambda map
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> function_name_to_user_defined_lambda;
|
||||
|
||||
/// Array join expressions counter
|
||||
size_t array_join_expressions_counter = 1;
|
||||
|
||||
/// Subquery counter
|
||||
size_t subquery_counter = 1;
|
||||
|
||||
/// Global expression node to projection name map
|
||||
std::unordered_map<QueryTreeNodePtr, ProjectionName> node_to_projection_name;
|
||||
|
||||
/// Global resolve expression node to projection names map
|
||||
std::unordered_map<QueryTreeNodePtr, ProjectionNames> resolved_expressions;
|
||||
|
||||
/// Global resolve expression node to tree size
|
||||
std::unordered_map<QueryTreeNodePtr, size_t> node_to_tree_size;
|
||||
|
||||
/// Global scalar subquery to scalar value map
|
||||
std::unordered_map<QueryTreeNodePtrWithHash, Block> scalar_subquery_to_scalar_value_local;
|
||||
std::unordered_map<QueryTreeNodePtrWithHash, Block> scalar_subquery_to_scalar_value_global;
|
||||
|
||||
const bool only_analyze;
|
||||
};
|
||||
|
||||
}
|
119
src/Analyzer/Resolve/QueryExpressionsAliasVisitor.h
Normal file
119
src/Analyzer/Resolve/QueryExpressionsAliasVisitor.h
Normal file
@ -0,0 +1,119 @@
|
||||
#pragma once
|
||||
|
||||
#include <Analyzer/InDepthQueryTreeVisitor.h>
|
||||
#include <Analyzer/Resolve/ScopeAliases.h>
|
||||
#include <Analyzer/LambdaNode.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/** Visitor that extracts expression and function aliases from node and initialize scope tables with it.
|
||||
* Does not go into child lambdas and queries.
|
||||
*
|
||||
* Important:
|
||||
* Identifier nodes with aliases are added both in alias to expression and alias to function map.
|
||||
*
|
||||
* These is necessary because identifier with alias can give alias name to any query tree node.
|
||||
*
|
||||
* Example:
|
||||
* WITH (x -> x + 1) AS id, id AS value SELECT value(1);
|
||||
* In this example id as value is identifier node that has alias, during scope initialization we cannot derive
|
||||
* that id is actually lambda or expression.
|
||||
*
|
||||
* There are no easy solution here, without trying to make full featured expression resolution at this stage.
|
||||
* Example:
|
||||
* WITH (x -> x + 1) AS id, id AS id_1, id_1 AS id_2 SELECT id_2(1);
|
||||
* Example: SELECT a, b AS a, b AS c, 1 AS c;
|
||||
*
|
||||
* It is client responsibility after resolving identifier node with alias, make following actions:
|
||||
* 1. If identifier node was resolved in function scope, remove alias from scope expression map.
|
||||
* 2. If identifier node was resolved in expression scope, remove alias from scope function map.
|
||||
*
|
||||
* That way we separate alias map initialization and expressions resolution.
|
||||
*/
|
||||
class QueryExpressionsAliasVisitor : public InDepthQueryTreeVisitor<QueryExpressionsAliasVisitor>
|
||||
{
|
||||
public:
|
||||
explicit QueryExpressionsAliasVisitor(ScopeAliases & aliases_)
|
||||
: aliases(aliases_)
|
||||
{}
|
||||
|
||||
void visitImpl(QueryTreeNodePtr & node)
|
||||
{
|
||||
updateAliasesIfNeeded(node, false /*is_lambda_node*/);
|
||||
}
|
||||
|
||||
bool needChildVisit(const QueryTreeNodePtr &, const QueryTreeNodePtr & child)
|
||||
{
|
||||
if (auto * lambda_node = child->as<LambdaNode>())
|
||||
{
|
||||
updateAliasesIfNeeded(child, true /*is_lambda_node*/);
|
||||
return false;
|
||||
}
|
||||
else if (auto * query_tree_node = child->as<QueryNode>())
|
||||
{
|
||||
if (query_tree_node->isCTE())
|
||||
return false;
|
||||
|
||||
updateAliasesIfNeeded(child, false /*is_lambda_node*/);
|
||||
return false;
|
||||
}
|
||||
else if (auto * union_node = child->as<UnionNode>())
|
||||
{
|
||||
if (union_node->isCTE())
|
||||
return false;
|
||||
|
||||
updateAliasesIfNeeded(child, false /*is_lambda_node*/);
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
private:
|
||||
void addDuplicatingAlias(const QueryTreeNodePtr & node)
|
||||
{
|
||||
aliases.nodes_with_duplicated_aliases.emplace(node);
|
||||
auto cloned_node = node->clone();
|
||||
aliases.cloned_nodes_with_duplicated_aliases.emplace_back(cloned_node);
|
||||
aliases.nodes_with_duplicated_aliases.emplace(cloned_node);
|
||||
}
|
||||
|
||||
void updateAliasesIfNeeded(const QueryTreeNodePtr & node, bool is_lambda_node)
|
||||
{
|
||||
if (!node->hasAlias())
|
||||
return;
|
||||
|
||||
// We should not resolve expressions to WindowNode
|
||||
if (node->getNodeType() == QueryTreeNodeType::WINDOW)
|
||||
return;
|
||||
|
||||
const auto & alias = node->getAlias();
|
||||
|
||||
if (is_lambda_node)
|
||||
{
|
||||
if (aliases.alias_name_to_expression_node->contains(alias))
|
||||
addDuplicatingAlias(node);
|
||||
|
||||
auto [_, inserted] = aliases.alias_name_to_lambda_node.insert(std::make_pair(alias, node));
|
||||
if (!inserted)
|
||||
addDuplicatingAlias(node);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
if (aliases.alias_name_to_lambda_node.contains(alias))
|
||||
addDuplicatingAlias(node);
|
||||
|
||||
auto [_, inserted] = aliases.alias_name_to_expression_node->insert(std::make_pair(alias, node));
|
||||
if (!inserted)
|
||||
addDuplicatingAlias(node);
|
||||
|
||||
/// If node is identifier put it into transitive aliases map.
|
||||
if (const auto * identifier = typeid_cast<const IdentifierNode *>(node.get()))
|
||||
aliases.transitive_aliases.insert(std::make_pair(alias, identifier->getIdentifier()));
|
||||
}
|
||||
|
||||
ScopeAliases & aliases;
|
||||
};
|
||||
|
||||
}
|
91
src/Analyzer/Resolve/ScopeAliases.h
Normal file
91
src/Analyzer/Resolve/ScopeAliases.h
Normal file
@ -0,0 +1,91 @@
|
||||
#pragma once
|
||||
|
||||
#include <Analyzer/IQueryTreeNode.h>
|
||||
#include <Analyzer/Resolve/IdentifierLookup.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
struct ScopeAliases
|
||||
{
|
||||
/// Alias name to query expression node
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_expression_node_before_group_by;
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_expression_node_after_group_by;
|
||||
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> * alias_name_to_expression_node = nullptr;
|
||||
|
||||
/// Alias name to lambda node
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_lambda_node;
|
||||
|
||||
/// Alias name to table expression node
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> alias_name_to_table_expression_node;
|
||||
|
||||
/// Expressions like `x as y` where we can't say whether it's a function, expression or table.
|
||||
std::unordered_map<std::string, Identifier> transitive_aliases;
|
||||
|
||||
/// Nodes with duplicated aliases
|
||||
std::unordered_set<QueryTreeNodePtr> nodes_with_duplicated_aliases;
|
||||
std::vector<QueryTreeNodePtr> cloned_nodes_with_duplicated_aliases;
|
||||
|
||||
/// Names which are aliases from ARRAY JOIN.
|
||||
/// This is needed to properly qualify columns from matchers and avoid name collision.
|
||||
std::unordered_set<std::string> array_join_aliases;
|
||||
|
||||
std::unordered_map<std::string, QueryTreeNodePtr> & getAliasMap(IdentifierLookupContext lookup_context)
|
||||
{
|
||||
switch (lookup_context)
|
||||
{
|
||||
case IdentifierLookupContext::EXPRESSION: return *alias_name_to_expression_node;
|
||||
case IdentifierLookupContext::FUNCTION: return alias_name_to_lambda_node;
|
||||
case IdentifierLookupContext::TABLE_EXPRESSION: return alias_name_to_table_expression_node;
|
||||
}
|
||||
}
|
||||
|
||||
enum class FindOption
|
||||
{
|
||||
FIRST_NAME,
|
||||
FULL_NAME,
|
||||
};
|
||||
|
||||
const std::string & getKey(const Identifier & identifier, FindOption find_option)
|
||||
{
|
||||
switch (find_option)
|
||||
{
|
||||
case FindOption::FIRST_NAME: return identifier.front();
|
||||
case FindOption::FULL_NAME: return identifier.getFullName();
|
||||
}
|
||||
}
|
||||
|
||||
QueryTreeNodePtr * find(IdentifierLookup lookup, FindOption find_option)
|
||||
{
|
||||
auto & alias_map = getAliasMap(lookup.lookup_context);
|
||||
const std::string * key = &getKey(lookup.identifier, find_option);
|
||||
|
||||
auto it = alias_map.find(*key);
|
||||
|
||||
if (it != alias_map.end())
|
||||
return &it->second;
|
||||
|
||||
if (lookup.lookup_context == IdentifierLookupContext::TABLE_EXPRESSION)
|
||||
return {};
|
||||
|
||||
while (it == alias_map.end())
|
||||
{
|
||||
auto jt = transitive_aliases.find(*key);
|
||||
if (jt == transitive_aliases.end())
|
||||
return {};
|
||||
|
||||
key = &(getKey(jt->second, find_option));
|
||||
it = alias_map.find(*key);
|
||||
}
|
||||
|
||||
return &it->second;
|
||||
}
|
||||
|
||||
const QueryTreeNodePtr * find(IdentifierLookup lookup, FindOption find_option) const
|
||||
{
|
||||
return const_cast<ScopeAliases *>(this)->find(lookup, find_option);
|
||||
}
|
||||
};
|
||||
|
||||
}
|
83
src/Analyzer/Resolve/TableExpressionData.h
Normal file
83
src/Analyzer/Resolve/TableExpressionData.h
Normal file
@ -0,0 +1,83 @@
|
||||
#pragma once
|
||||
|
||||
#include <IO/Operators.h>
|
||||
#include <Analyzer/ColumnNode.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
struct StringTransparentHash
|
||||
{
|
||||
using is_transparent = void;
|
||||
using hash = std::hash<std::string_view>;
|
||||
|
||||
[[maybe_unused]] size_t operator()(const char * data) const
|
||||
{
|
||||
return hash()(data);
|
||||
}
|
||||
|
||||
size_t operator()(std::string_view data) const
|
||||
{
|
||||
return hash()(data);
|
||||
}
|
||||
|
||||
size_t operator()(const std::string & data) const
|
||||
{
|
||||
return hash()(data);
|
||||
}
|
||||
};
|
||||
|
||||
using ColumnNameToColumnNodeMap = std::unordered_map<std::string, ColumnNodePtr, StringTransparentHash, std::equal_to<>>;
|
||||
|
||||
struct AnalysisTableExpressionData
|
||||
{
|
||||
std::string table_expression_name;
|
||||
std::string table_expression_description;
|
||||
std::string database_name;
|
||||
std::string table_name;
|
||||
bool should_qualify_columns = true;
|
||||
NamesAndTypes column_names_and_types;
|
||||
ColumnNameToColumnNodeMap column_name_to_column_node;
|
||||
std::unordered_set<std::string> subcolumn_names; /// Subset columns that are subcolumns of other columns
|
||||
std::unordered_set<std::string, StringTransparentHash, std::equal_to<>> column_identifier_first_parts;
|
||||
|
||||
bool hasFullIdentifierName(IdentifierView identifier_view) const
|
||||
{
|
||||
return column_name_to_column_node.contains(identifier_view.getFullName());
|
||||
}
|
||||
|
||||
bool canBindIdentifier(IdentifierView identifier_view) const
|
||||
{
|
||||
return column_identifier_first_parts.contains(identifier_view.at(0));
|
||||
}
|
||||
|
||||
[[maybe_unused]] void dump(WriteBuffer & buffer) const
|
||||
{
|
||||
buffer << "Table expression name " << table_expression_name;
|
||||
|
||||
if (!table_expression_description.empty())
|
||||
buffer << " table expression description " << table_expression_description;
|
||||
|
||||
if (!database_name.empty())
|
||||
buffer << " database name " << database_name;
|
||||
|
||||
if (!table_name.empty())
|
||||
buffer << " table name " << table_name;
|
||||
|
||||
buffer << " should qualify columns " << should_qualify_columns;
|
||||
buffer << " columns size " << column_name_to_column_node.size() << '\n';
|
||||
|
||||
for (const auto & [column_name, column_node] : column_name_to_column_node)
|
||||
buffer << "Column name " << column_name << " column node " << column_node->dumpTree() << '\n';
|
||||
}
|
||||
|
||||
[[maybe_unused]] String dump() const
|
||||
{
|
||||
WriteBufferFromOwnString buffer;
|
||||
dump(buffer);
|
||||
|
||||
return buffer.str();
|
||||
}
|
||||
};
|
||||
|
||||
}
|
71
src/Analyzer/Resolve/TableExpressionsAliasVisitor.h
Normal file
71
src/Analyzer/Resolve/TableExpressionsAliasVisitor.h
Normal file
@ -0,0 +1,71 @@
|
||||
#pragma once
|
||||
|
||||
#include <Analyzer/InDepthQueryTreeVisitor.h>
|
||||
#include <Analyzer/Resolve/IdentifierResolveScope.h>
|
||||
#include <Analyzer/ArrayJoinNode.h>
|
||||
#include <Analyzer/JoinNode.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int MULTIPLE_EXPRESSIONS_FOR_ALIAS;
|
||||
}
|
||||
|
||||
class TableExpressionsAliasVisitor : public InDepthQueryTreeVisitor<TableExpressionsAliasVisitor>
|
||||
{
|
||||
public:
|
||||
explicit TableExpressionsAliasVisitor(IdentifierResolveScope & scope_)
|
||||
: scope(scope_)
|
||||
{}
|
||||
|
||||
void visitImpl(QueryTreeNodePtr & node)
|
||||
{
|
||||
updateAliasesIfNeeded(node);
|
||||
}
|
||||
|
||||
static bool needChildVisit(const QueryTreeNodePtr & node, const QueryTreeNodePtr & child)
|
||||
{
|
||||
auto node_type = node->getNodeType();
|
||||
|
||||
switch (node_type)
|
||||
{
|
||||
case QueryTreeNodeType::ARRAY_JOIN:
|
||||
{
|
||||
const auto & array_join_node = node->as<const ArrayJoinNode &>();
|
||||
return child.get() == array_join_node.getTableExpression().get();
|
||||
}
|
||||
case QueryTreeNodeType::JOIN:
|
||||
{
|
||||
const auto & join_node = node->as<const JoinNode &>();
|
||||
return child.get() == join_node.getLeftTableExpression().get() || child.get() == join_node.getRightTableExpression().get();
|
||||
}
|
||||
default:
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
private:
|
||||
void updateAliasesIfNeeded(const QueryTreeNodePtr & node)
|
||||
{
|
||||
if (!node->hasAlias())
|
||||
return;
|
||||
|
||||
const auto & node_alias = node->getAlias();
|
||||
auto [_, inserted] = scope.aliases.alias_name_to_table_expression_node.emplace(node_alias, node);
|
||||
if (!inserted)
|
||||
throw Exception(ErrorCodes::MULTIPLE_EXPRESSIONS_FOR_ALIAS,
|
||||
"Multiple table expressions with same alias {}. In scope {}",
|
||||
node_alias,
|
||||
scope.scope_node->formatASTForErrorMessage());
|
||||
}
|
||||
|
||||
IdentifierResolveScope & scope;
|
||||
};
|
||||
|
||||
}
|
@ -188,6 +188,7 @@ void BackupReaderS3::copyFileToDisk(const String & path_in_backup, size_t file_s
|
||||
fs::path(s3_uri.key) / path_in_backup,
|
||||
0,
|
||||
file_size,
|
||||
/* dest_s3_client= */ destination_disk->getS3StorageClient(),
|
||||
/* dest_bucket= */ blob_path[1],
|
||||
/* dest_key= */ blob_path[0],
|
||||
s3_settings.request_settings,
|
||||
@ -252,18 +253,20 @@ void BackupWriterS3::copyFileFromDisk(const String & path_in_backup, DiskPtr src
|
||||
{
|
||||
LOG_TRACE(log, "Copying file {} from disk {} to S3", src_path, src_disk->getName());
|
||||
copyS3File(
|
||||
client,
|
||||
src_disk->getS3StorageClient(),
|
||||
/* src_bucket */ blob_path[1],
|
||||
/* src_key= */ blob_path[0],
|
||||
start_pos,
|
||||
length,
|
||||
s3_uri.bucket,
|
||||
fs::path(s3_uri.key) / path_in_backup,
|
||||
/* dest_s3_client= */ client,
|
||||
/* dest_bucket= */ s3_uri.bucket,
|
||||
/* dest_key= */ fs::path(s3_uri.key) / path_in_backup,
|
||||
s3_settings.request_settings,
|
||||
read_settings,
|
||||
blob_storage_log,
|
||||
{},
|
||||
threadPoolCallbackRunnerUnsafe<void>(getBackupsIOThreadPool().get(), "BackupWriterS3"));
|
||||
threadPoolCallbackRunnerUnsafe<void>(getBackupsIOThreadPool().get(), "BackupWriterS3"),
|
||||
/*for_disk_s3=*/false);
|
||||
return; /// copied!
|
||||
}
|
||||
}
|
||||
@ -281,8 +284,9 @@ void BackupWriterS3::copyFile(const String & destination, const String & source,
|
||||
/* src_key= */ fs::path(s3_uri.key) / source,
|
||||
0,
|
||||
size,
|
||||
s3_uri.bucket,
|
||||
fs::path(s3_uri.key) / destination,
|
||||
/* dest_s3_client= */ client,
|
||||
/* dest_bucket= */ s3_uri.bucket,
|
||||
/* dest_key= */ fs::path(s3_uri.key) / destination,
|
||||
s3_settings.request_settings,
|
||||
read_settings,
|
||||
blob_storage_log,
|
||||
|
@ -215,6 +215,7 @@ add_object_library(clickhouse_databases_mysql Databases/MySQL)
|
||||
add_object_library(clickhouse_disks Disks)
|
||||
add_object_library(clickhouse_analyzer Analyzer)
|
||||
add_object_library(clickhouse_analyzer_passes Analyzer/Passes)
|
||||
add_object_library(clickhouse_analyzer_passes Analyzer/Resolve)
|
||||
add_object_library(clickhouse_planner Planner)
|
||||
add_object_library(clickhouse_interpreters Interpreters)
|
||||
add_object_library(clickhouse_interpreters_cache Interpreters/Cache)
|
||||
|
@ -322,7 +322,9 @@ ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const
|
||||
|
||||
size_t res_offset = 0;
|
||||
auto offset_it = begin();
|
||||
for (size_t i = 0; i < _size; ++i, ++offset_it)
|
||||
/// Replace the `++offset_it` with `offset_it.increaseCurrentRow()` and `offset_it.increaseCurrentOffset()`,
|
||||
/// to remove the redundant `isDefault()` in `++` of `Interator` and reuse the following `isDefault()`.
|
||||
for (size_t i = 0; i < _size; ++i, offset_it.increaseCurrentRow())
|
||||
{
|
||||
if (!offset_it.isDefault())
|
||||
{
|
||||
@ -337,6 +339,7 @@ ColumnPtr ColumnSparse::filter(const Filter & filt, ssize_t) const
|
||||
{
|
||||
values_filter.push_back(0);
|
||||
}
|
||||
offset_it.increaseCurrentOffset();
|
||||
}
|
||||
else
|
||||
{
|
||||
|
@ -181,14 +181,16 @@ public:
|
||||
{
|
||||
public:
|
||||
Iterator(const PaddedPODArray<UInt64> & offsets_, size_t size_, size_t current_offset_, size_t current_row_)
|
||||
: offsets(offsets_), size(size_), current_offset(current_offset_), current_row(current_row_)
|
||||
: offsets(offsets_), offsets_size(offsets.size()), size(size_), current_offset(current_offset_), current_row(current_row_)
|
||||
{
|
||||
}
|
||||
|
||||
bool ALWAYS_INLINE isDefault() const { return current_offset == offsets.size() || current_row != offsets[current_offset]; }
|
||||
bool ALWAYS_INLINE isDefault() const { return current_offset == offsets_size || current_row != offsets[current_offset]; }
|
||||
size_t ALWAYS_INLINE getValueIndex() const { return isDefault() ? 0 : current_offset + 1; }
|
||||
size_t ALWAYS_INLINE getCurrentRow() const { return current_row; }
|
||||
size_t ALWAYS_INLINE getCurrentOffset() const { return current_offset; }
|
||||
size_t ALWAYS_INLINE increaseCurrentRow() { return ++current_row; }
|
||||
size_t ALWAYS_INLINE increaseCurrentOffset() { return ++current_offset; }
|
||||
|
||||
bool operator==(const Iterator & other) const
|
||||
{
|
||||
@ -209,6 +211,7 @@ public:
|
||||
|
||||
private:
|
||||
const PaddedPODArray<UInt64> & offsets;
|
||||
const size_t offsets_size;
|
||||
const size_t size;
|
||||
size_t current_offset;
|
||||
size_t current_row;
|
||||
|
@ -174,6 +174,11 @@
|
||||
M(ObjectStorageAzureThreads, "Number of threads in the AzureObjectStorage thread pool.") \
|
||||
M(ObjectStorageAzureThreadsActive, "Number of threads in the AzureObjectStorage thread pool running a task.") \
|
||||
M(ObjectStorageAzureThreadsScheduled, "Number of queued or active jobs in the AzureObjectStorage thread pool.") \
|
||||
\
|
||||
M(DiskPlainRewritableAzureDirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for AzureObjectStorage.") \
|
||||
M(DiskPlainRewritableLocalDirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for LocalObjectStorage.") \
|
||||
M(DiskPlainRewritableS3DirectoryMapSize, "Number of local-to-remote path entries in the 'plain_rewritable' in-memory map for S3ObjectStorage.") \
|
||||
\
|
||||
M(MergeTreePartsLoaderThreads, "Number of threads in the MergeTree parts loader thread pool.") \
|
||||
M(MergeTreePartsLoaderThreadsActive, "Number of threads in the MergeTree parts loader thread pool running a task.") \
|
||||
M(MergeTreePartsLoaderThreadsScheduled, "Number of queued or active jobs in the MergeTree parts loader thread pool.") \
|
||||
|
@ -417,6 +417,13 @@ The server successfully detected this situation and will download merged part fr
|
||||
M(DiskS3PutObject, "Number of DiskS3 API PutObject calls.") \
|
||||
M(DiskS3GetObject, "Number of DiskS3 API GetObject calls.") \
|
||||
\
|
||||
M(DiskPlainRewritableAzureDirectoryCreated, "Number of directories created by the 'plain_rewritable' metadata storage for AzureObjectStorage.") \
|
||||
M(DiskPlainRewritableAzureDirectoryRemoved, "Number of directories removed by the 'plain_rewritable' metadata storage for AzureObjectStorage.") \
|
||||
M(DiskPlainRewritableLocalDirectoryCreated, "Number of directories created by the 'plain_rewritable' metadata storage for LocalObjectStorage.") \
|
||||
M(DiskPlainRewritableLocalDirectoryRemoved, "Number of directories removed by the 'plain_rewritable' metadata storage for LocalObjectStorage.") \
|
||||
M(DiskPlainRewritableS3DirectoryCreated, "Number of directories created by the 'plain_rewritable' metadata storage for S3ObjectStorage.") \
|
||||
M(DiskPlainRewritableS3DirectoryRemoved, "Number of directories removed by the 'plain_rewritable' metadata storage for S3ObjectStorage.") \
|
||||
\
|
||||
M(S3Clients, "Number of created S3 clients.") \
|
||||
M(TinyS3Clients, "Number of S3 clients copies which reuse an existing auth provider from another client.") \
|
||||
\
|
||||
|
@ -478,4 +478,9 @@ bool Context::hasTraceCollector() const
|
||||
return false;
|
||||
}
|
||||
|
||||
bool Context::isBackgroundOperationContext() const
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -170,6 +170,8 @@ public:
|
||||
const ServerSettings & getServerSettings() const;
|
||||
|
||||
bool hasTraceCollector() const;
|
||||
|
||||
bool isBackgroundOperationContext() const;
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -394,7 +394,7 @@ class IColumn;
|
||||
M(Bool, allow_experimental_analyzer, true, "Allow experimental analyzer.", 0) \
|
||||
M(Bool, analyzer_compatibility_join_using_top_level_identifier, false, "Force to resolve identifier in JOIN USING from projection (for example, in `SELECT a + 1 AS b FROM t1 JOIN t2 USING (b)` join will be performed by `t1.a + 1 = t2.b`, rather then `t1.b = t2.b`).", 0) \
|
||||
M(Bool, prefer_global_in_and_join, false, "If enabled, all IN/JOIN operators will be rewritten as GLOBAL IN/JOIN. It's useful when the to-be-joined tables are only available on the initiator and we need to always scatter their data on-the-fly during distributed processing with the GLOBAL keyword. It's also useful to reduce the need to access the external sources joining external tables.", 0) \
|
||||
M(Bool, enable_vertical_final, true, "If enable, remove duplicated rows during FINAL by marking rows as deleted and filtering them later instead of merging rows", 0) \
|
||||
M(Bool, enable_vertical_final, false, "Not recommended. If enable, remove duplicated rows during FINAL by marking rows as deleted and filtering them later instead of merging rows", 0) \
|
||||
\
|
||||
\
|
||||
/** Limits during query execution are part of the settings. \
|
||||
@ -924,7 +924,7 @@ class IColumn;
|
||||
M(Int64, ignore_cold_parts_seconds, 0, "Only available in ClickHouse Cloud. Exclude new data parts from SELECT queries until they're either pre-warmed (see cache_populated_by_fetch) or this many seconds old. Only for Replicated-/SharedMergeTree.", 0) \
|
||||
M(Int64, prefer_warmed_unmerged_parts_seconds, 0, "Only available in ClickHouse Cloud. If a merged part is less than this many seconds old and is not pre-warmed (see cache_populated_by_fetch), but all its source parts are available and pre-warmed, SELECT queries will read from those parts instead. Only for ReplicatedMergeTree. Note that this only checks whether CacheWarmer processed the part; if the part was fetched into cache by something else, it'll still be considered cold until CacheWarmer gets to it; if it was warmed, then evicted from cache, it'll still be considered warm.", 0) \
|
||||
M(Bool, iceberg_engine_ignore_schema_evolution, false, "Ignore schema evolution in Iceberg table engine and read all data using latest schema saved on table creation. Note that it can lead to incorrect result", 0) \
|
||||
M(Bool, allow_deprecated_functions, false, "Allow usage of deprecated functions", 0) \
|
||||
M(Bool, allow_deprecated_error_prone_window_functions, false, "Allow usage of deprecated error prone window functions (neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference)", 0) \
|
||||
|
||||
// End of COMMON_SETTINGS
|
||||
// Please add settings related to formats into the FORMAT_FACTORY_SETTINGS, move obsolete settings to OBSOLETE_SETTINGS and obsolete format settings to OBSOLETE_FORMAT_SETTINGS.
|
||||
|
@ -94,7 +94,7 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
|
||||
{"azure_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in AzureBlobStorage table engine"},
|
||||
{"s3_ignore_file_doesnt_exist", false, false, "Allow to return 0 rows when the requested files don't exist instead of throwing an exception in S3 table engine"},
|
||||
}},
|
||||
{"24.5", {{"allow_deprecated_functions", true, false, "Allow usage of deprecated functions"},
|
||||
{"24.5", {{"allow_deprecated_error_prone_window_functions", true, false, "Allow usage of deprecated error prone window functions (neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference)"},
|
||||
{"allow_experimental_join_condition", false, false, "Support join with inequal conditions which involve columns from both left and right table. e.g. t1.y < t2.y."},
|
||||
{"input_format_tsv_crlf_end_of_line", false, false, "Enables reading of CRLF line endings with TSV formats"},
|
||||
{"output_format_parquet_use_custom_encoder", false, true, "Enable custom Parquet encoder."},
|
||||
|
@ -936,7 +936,7 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep
|
||||
query_context->setSetting("allow_experimental_window_functions", 1);
|
||||
query_context->setSetting("allow_experimental_geo_types", 1);
|
||||
query_context->setSetting("allow_experimental_map_type", 1);
|
||||
query_context->setSetting("allow_deprecated_functions", 1);
|
||||
query_context->setSetting("allow_deprecated_error_prone_window_functions", 1);
|
||||
|
||||
query_context->setSetting("allow_suspicious_low_cardinality_types", 1);
|
||||
query_context->setSetting("allow_suspicious_fixed_string_types", 1);
|
||||
|
@ -350,6 +350,13 @@ public:
|
||||
return delegate;
|
||||
}
|
||||
|
||||
#if USE_AWS_S3
|
||||
std::shared_ptr<const S3::Client> getS3StorageClient() const override
|
||||
{
|
||||
return delegate->getS3StorageClient();
|
||||
}
|
||||
#endif
|
||||
|
||||
private:
|
||||
String wrappedPath(const String & path) const
|
||||
{
|
||||
|
@ -14,7 +14,6 @@
|
||||
#include <Disks/DirectoryIterator.h>
|
||||
|
||||
#include <memory>
|
||||
#include <mutex>
|
||||
#include <utility>
|
||||
#include <boost/noncopyable.hpp>
|
||||
#include <Poco/Timestamp.h>
|
||||
@ -116,13 +115,18 @@ public:
|
||||
/// Default constructor.
|
||||
IDisk(const String & name_, const Poco::Util::AbstractConfiguration & config, const String & config_prefix)
|
||||
: name(name_)
|
||||
, copying_thread_pool(CurrentMetrics::IDiskCopierThreads, CurrentMetrics::IDiskCopierThreadsActive, CurrentMetrics::IDiskCopierThreadsScheduled, config.getUInt(config_prefix + ".thread_pool_size", 16))
|
||||
, copying_thread_pool(
|
||||
CurrentMetrics::IDiskCopierThreads,
|
||||
CurrentMetrics::IDiskCopierThreadsActive,
|
||||
CurrentMetrics::IDiskCopierThreadsScheduled,
|
||||
config.getUInt(config_prefix + ".thread_pool_size", 16))
|
||||
{
|
||||
}
|
||||
|
||||
explicit IDisk(const String & name_)
|
||||
: name(name_)
|
||||
, copying_thread_pool(CurrentMetrics::IDiskCopierThreads, CurrentMetrics::IDiskCopierThreadsActive, CurrentMetrics::IDiskCopierThreadsScheduled, 16)
|
||||
, copying_thread_pool(
|
||||
CurrentMetrics::IDiskCopierThreads, CurrentMetrics::IDiskCopierThreadsActive, CurrentMetrics::IDiskCopierThreadsScheduled, 16)
|
||||
{
|
||||
}
|
||||
|
||||
@ -466,6 +470,17 @@ public:
|
||||
|
||||
virtual DiskPtr getDelegateDiskIfExists() const { return nullptr; }
|
||||
|
||||
#if USE_AWS_S3
|
||||
virtual std::shared_ptr<const S3::Client> getS3StorageClient() const
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::NOT_IMPLEMENTED,
|
||||
"Method getS3StorageClient() is not implemented for disk type: {}",
|
||||
getDataSourceDescription().toString());
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
protected:
|
||||
friend class DiskDecorator;
|
||||
|
||||
|
@ -274,6 +274,11 @@ bool CachedOnDiskReadBufferFromFile::canStartFromCache(size_t current_offset, co
|
||||
return current_write_offset > current_offset;
|
||||
}
|
||||
|
||||
String CachedOnDiskReadBufferFromFile::toString(ReadType type)
|
||||
{
|
||||
return String(magic_enum::enum_name(type));
|
||||
}
|
||||
|
||||
CachedOnDiskReadBufferFromFile::ImplementationBufferPtr
|
||||
CachedOnDiskReadBufferFromFile::getReadBufferForFileSegment(FileSegment & file_segment)
|
||||
{
|
||||
|
@ -129,19 +129,7 @@ private:
|
||||
|
||||
ReadType read_type = ReadType::REMOTE_FS_READ_BYPASS_CACHE;
|
||||
|
||||
static String toString(ReadType type)
|
||||
{
|
||||
switch (type)
|
||||
{
|
||||
case ReadType::CACHED:
|
||||
return "CACHED";
|
||||
case ReadType::REMOTE_FS_READ_BYPASS_CACHE:
|
||||
return "REMOTE_FS_READ_BYPASS_CACHE";
|
||||
case ReadType::REMOTE_FS_READ_AND_PUT_IN_CACHE:
|
||||
return "REMOTE_FS_READ_AND_PUT_IN_CACHE";
|
||||
}
|
||||
UNREACHABLE();
|
||||
}
|
||||
static String toString(ReadType type);
|
||||
|
||||
size_t first_offset = 0;
|
||||
String nextimpl_step_log_info;
|
||||
|
@ -127,6 +127,13 @@ public:
|
||||
}
|
||||
#endif
|
||||
|
||||
#if USE_AWS_S3
|
||||
std::shared_ptr<const S3::Client> getS3StorageClient() override
|
||||
{
|
||||
return object_storage->getS3StorageClient();
|
||||
}
|
||||
#endif
|
||||
|
||||
private:
|
||||
FileCacheKey getCacheKey(const std::string & path) const;
|
||||
|
||||
|
@ -582,6 +582,12 @@ UInt64 DiskObjectStorage::getRevision() const
|
||||
return metadata_helper->getRevision();
|
||||
}
|
||||
|
||||
#if USE_AWS_S3
|
||||
std::shared_ptr<const S3::Client> DiskObjectStorage::getS3StorageClient() const
|
||||
{
|
||||
return object_storage->getS3StorageClient();
|
||||
}
|
||||
#endif
|
||||
|
||||
DiskPtr DiskObjectStorageReservation::getDisk(size_t i) const
|
||||
{
|
||||
|
@ -6,6 +6,8 @@
|
||||
#include <Disks/ObjectStorages/IMetadataStorage.h>
|
||||
#include <Common/re2.h>
|
||||
|
||||
#include "config.h"
|
||||
|
||||
|
||||
namespace CurrentMetrics
|
||||
{
|
||||
@ -210,6 +212,10 @@ public:
|
||||
bool supportsChmod() const override { return metadata_storage->supportsChmod(); }
|
||||
void chmod(const String & path, mode_t mode) override;
|
||||
|
||||
#if USE_AWS_S3
|
||||
std::shared_ptr<const S3::Client> getS3StorageClient() const override;
|
||||
#endif
|
||||
|
||||
private:
|
||||
|
||||
/// Create actual disk object storage transaction for operations
|
||||
|
@ -18,6 +18,11 @@ namespace ErrorCodes
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
const MetadataStorageMetrics & IObjectStorage::getMetadataStorageMetrics() const
|
||||
{
|
||||
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method 'getMetadataStorageMetrics' is not implemented");
|
||||
}
|
||||
|
||||
bool IObjectStorage::existsOrHasAnyChild(const std::string & path) const
|
||||
{
|
||||
RelativePathsWithMetadata files;
|
||||
|
@ -1,10 +1,10 @@
|
||||
#pragma once
|
||||
|
||||
#include <filesystem>
|
||||
#include <string>
|
||||
#include <map>
|
||||
#include <mutex>
|
||||
#include <optional>
|
||||
#include <filesystem>
|
||||
|
||||
#include <Poco/Timestamp.h>
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
@ -13,17 +13,18 @@
|
||||
#include <IO/WriteSettings.h>
|
||||
#include <IO/copyData.h>
|
||||
|
||||
#include <Disks/ObjectStorages/StoredObject.h>
|
||||
#include <Disks/DiskType.h>
|
||||
#include <Common/ThreadPool_fwd.h>
|
||||
#include <Common/ObjectStorageKey.h>
|
||||
#include <Disks/WriteMode.h>
|
||||
#include <Interpreters/Context_fwd.h>
|
||||
#include <Core/Types.h>
|
||||
#include <Disks/DirectoryIterator.h>
|
||||
#include <Common/ThreadPool.h>
|
||||
#include <Common/threadPoolCallbackRunner.h>
|
||||
#include <Disks/DiskType.h>
|
||||
#include <Disks/ObjectStorages/MetadataStorageMetrics.h>
|
||||
#include <Disks/ObjectStorages/StoredObject.h>
|
||||
#include <Disks/WriteMode.h>
|
||||
#include <Interpreters/Context_fwd.h>
|
||||
#include <Common/Exception.h>
|
||||
#include <Common/ObjectStorageKey.h>
|
||||
#include <Common/ThreadPool.h>
|
||||
#include <Common/ThreadPool_fwd.h>
|
||||
#include <Common/threadPoolCallbackRunner.h>
|
||||
#include "config.h"
|
||||
|
||||
#if USE_AZURE_BLOB_STORAGE
|
||||
@ -31,6 +32,10 @@
|
||||
#include <azure/storage/blobs.hpp>
|
||||
#endif
|
||||
|
||||
#if USE_AWS_S3
|
||||
#include <IO/S3/Client.h>
|
||||
#endif
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
@ -111,6 +116,8 @@ public:
|
||||
|
||||
virtual std::string getDescription() const = 0;
|
||||
|
||||
virtual const MetadataStorageMetrics & getMetadataStorageMetrics() const;
|
||||
|
||||
/// Object exists or not
|
||||
virtual bool exists(const StoredObject & object) const = 0;
|
||||
|
||||
@ -257,6 +264,13 @@ public:
|
||||
}
|
||||
#endif
|
||||
|
||||
#if USE_AWS_S3
|
||||
virtual std::shared_ptr<const S3::Client> getS3StorageClient()
|
||||
{
|
||||
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "This function is only implemented for S3ObjectStorage");
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
private:
|
||||
mutable std::mutex throttlers_mutex;
|
||||
|
@ -52,11 +52,16 @@ void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::execute(std:
|
||||
|
||||
[[maybe_unused]] auto result = path_map.emplace(path, std::move(key_prefix));
|
||||
chassert(result.second);
|
||||
auto metric = object_storage->getMetadataStorageMetrics().directory_map_size;
|
||||
CurrentMetrics::add(metric, 1);
|
||||
|
||||
writeString(path.string(), *buf);
|
||||
buf->finalize();
|
||||
|
||||
write_finalized = true;
|
||||
|
||||
auto event = object_storage->getMetadataStorageMetrics().directory_created;
|
||||
ProfileEvents::increment(event);
|
||||
}
|
||||
|
||||
void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
|
||||
@ -65,6 +70,9 @@ void MetadataStorageFromPlainObjectStorageCreateDirectoryOperation::undo(std::un
|
||||
if (write_finalized)
|
||||
{
|
||||
path_map.erase(path);
|
||||
auto metric = object_storage->getMetadataStorageMetrics().directory_map_size;
|
||||
CurrentMetrics::sub(metric, 1);
|
||||
|
||||
object_storage->removeObject(StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME));
|
||||
}
|
||||
else if (write_created)
|
||||
@ -165,7 +173,15 @@ void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::execute(std:
|
||||
auto object_key = ObjectStorageKey::createAsRelative(key_prefix, PREFIX_PATH_FILE_NAME);
|
||||
auto object = StoredObject(object_key.serialize(), path / PREFIX_PATH_FILE_NAME);
|
||||
object_storage->removeObject(object);
|
||||
|
||||
path_map.erase(path_it);
|
||||
auto metric = object_storage->getMetadataStorageMetrics().directory_map_size;
|
||||
CurrentMetrics::sub(metric, 1);
|
||||
|
||||
removed = true;
|
||||
|
||||
auto event = object_storage->getMetadataStorageMetrics().directory_removed;
|
||||
ProfileEvents::increment(event);
|
||||
}
|
||||
|
||||
void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::undo(std::unique_lock<SharedMutex> &)
|
||||
@ -185,6 +201,8 @@ void MetadataStorageFromPlainObjectStorageRemoveDirectoryOperation::undo(std::un
|
||||
buf->finalize();
|
||||
|
||||
path_map.emplace(path, std::move(key_prefix));
|
||||
auto metric = object_storage->getMetadataStorageMetrics().directory_map_size;
|
||||
CurrentMetrics::add(metric, 1);
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -50,6 +50,8 @@ MetadataStorageFromPlainObjectStorage::PathMap loadPathPrefixMap(const std::stri
|
||||
res.first->second,
|
||||
remote_path.parent_path().string());
|
||||
}
|
||||
auto metric = object_storage->getMetadataStorageMetrics().directory_map_size;
|
||||
CurrentMetrics::add(metric, result.size());
|
||||
return result;
|
||||
}
|
||||
|
||||
@ -134,6 +136,12 @@ MetadataStorageFromPlainRewritableObjectStorage::MetadataStorageFromPlainRewrita
|
||||
object_storage->setKeysGenerator(keys_gen);
|
||||
}
|
||||
|
||||
MetadataStorageFromPlainRewritableObjectStorage::~MetadataStorageFromPlainRewritableObjectStorage()
|
||||
{
|
||||
auto metric = object_storage->getMetadataStorageMetrics().directory_map_size;
|
||||
CurrentMetrics::sub(metric, path_map->size());
|
||||
}
|
||||
|
||||
std::vector<std::string> MetadataStorageFromPlainRewritableObjectStorage::getDirectChildrenOnDisk(
|
||||
const std::string & storage_key, const RelativePathsWithMetadata & remote_paths, const std::string & local_path) const
|
||||
{
|
||||
|
@ -14,6 +14,7 @@ private:
|
||||
|
||||
public:
|
||||
MetadataStorageFromPlainRewritableObjectStorage(ObjectStoragePtr object_storage_, String storage_path_prefix_);
|
||||
~MetadataStorageFromPlainRewritableObjectStorage() override;
|
||||
|
||||
MetadataStorageType getType() const override { return MetadataStorageType::PlainRewritable; }
|
||||
|
||||
|
24
src/Disks/ObjectStorages/MetadataStorageMetrics.h
Normal file
24
src/Disks/ObjectStorages/MetadataStorageMetrics.h
Normal file
@ -0,0 +1,24 @@
|
||||
#pragma once
|
||||
|
||||
#include <Disks/DiskType.h>
|
||||
#include <Common/CurrentMetrics.h>
|
||||
#include <Common/ProfileEvents.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
struct MetadataStorageMetrics
|
||||
{
|
||||
const ProfileEvents::Event directory_created = ProfileEvents::end();
|
||||
const ProfileEvents::Event directory_removed = ProfileEvents::end();
|
||||
|
||||
CurrentMetrics::Metric directory_map_size = CurrentMetrics::end();
|
||||
|
||||
template <typename ObjectStorage, MetadataStorageType metadata_type>
|
||||
static MetadataStorageMetrics create()
|
||||
{
|
||||
return MetadataStorageMetrics{};
|
||||
}
|
||||
};
|
||||
|
||||
}
|
@ -23,6 +23,7 @@
|
||||
#include <Disks/ObjectStorages/MetadataStorageFactory.h>
|
||||
#include <Disks/ObjectStorages/PlainObjectStorage.h>
|
||||
#include <Disks/ObjectStorages/PlainRewritableObjectStorage.h>
|
||||
#include <Disks/ObjectStorages/createMetadataStorageMetrics.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Common/Macros.h>
|
||||
|
||||
@ -85,7 +86,9 @@ ObjectStoragePtr createObjectStorage(
|
||||
DataSourceDescription{DataSourceType::ObjectStorage, type, MetadataStorageType::PlainRewritable, /*description*/ ""}
|
||||
.toString());
|
||||
|
||||
return std::make_shared<PlainRewritableObjectStorage<BaseObjectStorage>>(std::forward<Args>(args)...);
|
||||
auto metadata_storage_metrics = DB::MetadataStorageMetrics::create<BaseObjectStorage, MetadataStorageType::PlainRewritable>();
|
||||
return std::make_shared<PlainRewritableObjectStorage<BaseObjectStorage>>(
|
||||
std::move(metadata_storage_metrics), std::forward<Args>(args)...);
|
||||
}
|
||||
else
|
||||
return std::make_shared<BaseObjectStorage>(std::forward<Args>(args)...);
|
||||
@ -256,8 +259,9 @@ void registerS3PlainRewritableObjectStorage(ObjectStorageFactory & factory)
|
||||
auto client = getClient(config, config_prefix, context, *settings, true);
|
||||
auto key_generator = getKeyGenerator(uri, config, config_prefix);
|
||||
|
||||
auto metadata_storage_metrics = DB::MetadataStorageMetrics::create<S3ObjectStorage, MetadataStorageType::PlainRewritable>();
|
||||
auto object_storage = std::make_shared<PlainRewritableObjectStorage<S3ObjectStorage>>(
|
||||
std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name);
|
||||
std::move(metadata_storage_metrics), std::move(client), std::move(settings), uri, s3_capabilities, key_generator, name);
|
||||
|
||||
/// NOTE: should we still perform this check for clickhouse-disks?
|
||||
if (!skip_access_check)
|
||||
|
@ -16,8 +16,9 @@ class PlainRewritableObjectStorage : public BaseObjectStorage
|
||||
{
|
||||
public:
|
||||
template <class... Args>
|
||||
explicit PlainRewritableObjectStorage(Args &&... args)
|
||||
explicit PlainRewritableObjectStorage(MetadataStorageMetrics && metadata_storage_metrics_, Args &&... args)
|
||||
: BaseObjectStorage(std::forward<Args>(args)...)
|
||||
, metadata_storage_metrics(std::move(metadata_storage_metrics_))
|
||||
/// A basic key generator is required for checking S3 capabilities,
|
||||
/// it will be reset later by metadata storage.
|
||||
, key_generator(createObjectStorageKeysGeneratorAsIsWithPrefix(BaseObjectStorage::getCommonKeyPrefix()))
|
||||
@ -26,6 +27,8 @@ public:
|
||||
|
||||
std::string getName() const override { return "PlainRewritable" + BaseObjectStorage::getName(); }
|
||||
|
||||
const MetadataStorageMetrics & getMetadataStorageMetrics() const override { return metadata_storage_metrics; }
|
||||
|
||||
bool isWriteOnce() const override { return false; }
|
||||
|
||||
bool isPlain() const override { return true; }
|
||||
@ -37,6 +40,7 @@ public:
|
||||
void setKeysGenerator(ObjectStorageKeysGeneratorPtr gen) override { key_generator = gen; }
|
||||
|
||||
private:
|
||||
MetadataStorageMetrics metadata_storage_metrics;
|
||||
ObjectStorageKeysGeneratorPtr key_generator;
|
||||
};
|
||||
|
||||
|
@ -259,7 +259,10 @@ std::unique_ptr<WriteBufferFromFileBase> S3ObjectStorage::writeObject( /// NOLIN
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "S3 doesn't support append to files");
|
||||
|
||||
S3Settings::RequestSettings request_settings = s3_settings.get()->request_settings;
|
||||
if (auto query_context = CurrentThread::getQueryContext())
|
||||
/// NOTE: For background operations settings are not propagated from session or query. They are taken from
|
||||
/// default user's .xml config. It's obscure and unclear behavior. For them it's always better
|
||||
/// to rely on settings from disk.
|
||||
if (auto query_context = CurrentThread::getQueryContext(); query_context && !query_context->isBackgroundOperationContext())
|
||||
{
|
||||
request_settings.updateFromSettingsIfChanged(query_context->getSettingsRef());
|
||||
}
|
||||
@ -495,13 +498,14 @@ void S3ObjectStorage::copyObjectToAnotherObjectStorage( // NOLINT
|
||||
try
|
||||
{
|
||||
copyS3File(
|
||||
current_client,
|
||||
uri.bucket,
|
||||
object_from.remote_path,
|
||||
0,
|
||||
size,
|
||||
dest_s3->uri.bucket,
|
||||
object_to.remote_path,
|
||||
/*src_s3_client=*/current_client,
|
||||
/*src_bucket=*/uri.bucket,
|
||||
/*src_key=*/object_from.remote_path,
|
||||
/*src_offset=*/0,
|
||||
/*src_size=*/size,
|
||||
/*dest_s3_client=*/current_client,
|
||||
/*dest_bucket=*/dest_s3->uri.bucket,
|
||||
/*dest_key=*/object_to.remote_path,
|
||||
settings_ptr->request_settings,
|
||||
patchSettings(read_settings),
|
||||
BlobStorageLogWriter::create(disk_name),
|
||||
@ -535,13 +539,15 @@ void S3ObjectStorage::copyObject( // NOLINT
|
||||
auto size = S3::getObjectSize(*current_client, uri.bucket, object_from.remote_path, {}, settings_ptr->request_settings);
|
||||
auto scheduler = threadPoolCallbackRunnerUnsafe<void>(getThreadPoolWriter(), "S3ObjStor_copy");
|
||||
|
||||
copyS3File(current_client,
|
||||
uri.bucket,
|
||||
object_from.remote_path,
|
||||
0,
|
||||
size,
|
||||
uri.bucket,
|
||||
object_to.remote_path,
|
||||
copyS3File(
|
||||
/*src_s3_client=*/current_client,
|
||||
/*src_bucket=*/uri.bucket,
|
||||
/*src_key=*/object_from.remote_path,
|
||||
/*src_offset=*/0,
|
||||
/*src_size=*/size,
|
||||
/*dest_s3_client=*/current_client,
|
||||
/*dest_bucket=*/uri.bucket,
|
||||
/*dest_key=*/object_to.remote_path,
|
||||
settings_ptr->request_settings,
|
||||
patchSettings(read_settings),
|
||||
BlobStorageLogWriter::create(disk_name),
|
||||
@ -617,6 +623,11 @@ ObjectStorageKey S3ObjectStorage::generateObjectKeyForPath(const std::string & p
|
||||
return key_generator->generate(path, /* is_directory */ false);
|
||||
}
|
||||
|
||||
std::shared_ptr<const S3::Client> S3ObjectStorage::getS3StorageClient()
|
||||
{
|
||||
return client.get();
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#endif
|
||||
|
@ -168,6 +168,7 @@ public:
|
||||
|
||||
bool isReadOnly() const override { return s3_settings.get()->read_only; }
|
||||
|
||||
std::shared_ptr<const S3::Client> getS3StorageClient() override;
|
||||
private:
|
||||
void setNewSettings(std::unique_ptr<S3ObjectStorageSettings> && s3_settings_);
|
||||
|
||||
|
@ -3,6 +3,8 @@
|
||||
#include "config.h"
|
||||
|
||||
#include <Disks/ObjectStorages/IObjectStorage.h>
|
||||
|
||||
#include <filesystem>
|
||||
#include <shared_mutex>
|
||||
|
||||
namespace Poco
|
||||
|
67
src/Disks/ObjectStorages/createMetadataStorageMetrics.h
Normal file
67
src/Disks/ObjectStorages/createMetadataStorageMetrics.h
Normal file
@ -0,0 +1,67 @@
|
||||
#pragma once
|
||||
|
||||
#if USE_AWS_S3
|
||||
# include <Disks/ObjectStorages/S3/S3ObjectStorage.h>
|
||||
#endif
|
||||
#if USE_AZURE_BLOB_STORAGE && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)
|
||||
# include <Disks/ObjectStorages/AzureBlobStorage/AzureObjectStorage.h>
|
||||
#endif
|
||||
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
|
||||
# include <Disks/ObjectStorages/Local/LocalObjectStorage.h>
|
||||
#endif
|
||||
#include <Disks/ObjectStorages/MetadataStorageMetrics.h>
|
||||
|
||||
namespace ProfileEvents
|
||||
{
|
||||
extern const Event DiskPlainRewritableAzureDirectoryCreated;
|
||||
extern const Event DiskPlainRewritableAzureDirectoryRemoved;
|
||||
extern const Event DiskPlainRewritableLocalDirectoryCreated;
|
||||
extern const Event DiskPlainRewritableLocalDirectoryRemoved;
|
||||
extern const Event DiskPlainRewritableS3DirectoryCreated;
|
||||
extern const Event DiskPlainRewritableS3DirectoryRemoved;
|
||||
}
|
||||
|
||||
namespace CurrentMetrics
|
||||
{
|
||||
extern const Metric DiskPlainRewritableAzureDirectoryMapSize;
|
||||
extern const Metric DiskPlainRewritableLocalDirectoryMapSize;
|
||||
extern const Metric DiskPlainRewritableS3DirectoryMapSize;
|
||||
}
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
#if USE_AWS_S3
|
||||
template <>
|
||||
inline MetadataStorageMetrics MetadataStorageMetrics::create<S3ObjectStorage, MetadataStorageType::PlainRewritable>()
|
||||
{
|
||||
return MetadataStorageMetrics{
|
||||
.directory_created = ProfileEvents::DiskPlainRewritableS3DirectoryCreated,
|
||||
.directory_removed = ProfileEvents::DiskPlainRewritableS3DirectoryRemoved,
|
||||
.directory_map_size = CurrentMetrics::DiskPlainRewritableS3DirectoryMapSize};
|
||||
}
|
||||
#endif
|
||||
|
||||
#if USE_AZURE_BLOB_STORAGE && !defined(CLICKHOUSE_KEEPER_STANDALONE_BUILD)
|
||||
template <>
|
||||
inline MetadataStorageMetrics MetadataStorageMetrics::create<AzureObjectStorage, MetadataStorageType::PlainRewritable>()
|
||||
{
|
||||
return MetadataStorageMetrics{
|
||||
.directory_created = ProfileEvents::DiskPlainRewritableAzureDirectoryCreated,
|
||||
.directory_removed = ProfileEvents::DiskPlainRewritableAzureDirectoryRemoved,
|
||||
.directory_map_size = CurrentMetrics::DiskPlainRewritableAzureDirectoryMapSize};
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifndef CLICKHOUSE_KEEPER_STANDALONE_BUILD
|
||||
template <>
|
||||
inline MetadataStorageMetrics MetadataStorageMetrics::create<LocalObjectStorage, MetadataStorageType::PlainRewritable>()
|
||||
{
|
||||
return MetadataStorageMetrics{
|
||||
.directory_created = ProfileEvents::DiskPlainRewritableLocalDirectoryCreated,
|
||||
.directory_removed = ProfileEvents::DiskPlainRewritableLocalDirectoryRemoved,
|
||||
.directory_map_size = CurrentMetrics::DiskPlainRewritableLocalDirectoryMapSize};
|
||||
}
|
||||
#endif
|
||||
|
||||
}
|
221
src/Functions/fromReadable.h
Normal file
221
src/Functions/fromReadable.h
Normal file
@ -0,0 +1,221 @@
|
||||
#pragma once
|
||||
|
||||
#include <base/types.h>
|
||||
#include <boost/algorithm/string/case_conv.hpp>
|
||||
|
||||
#include <Columns/ColumnNullable.h>
|
||||
#include <Columns/ColumnsNumber.h>
|
||||
#include <Columns/ColumnString.h>
|
||||
#include <Common/Exception.h>
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <Functions/FunctionHelpers.h>
|
||||
#include <Functions/IFunction.h>
|
||||
#include <IO/ReadBufferFromString.h>
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <cmath>
|
||||
#include <string_view>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int CANNOT_PARSE_INPUT_ASSERTION_FAILED;
|
||||
extern const int CANNOT_PARSE_NUMBER;
|
||||
extern const int CANNOT_PARSE_TEXT;
|
||||
extern const int ILLEGAL_COLUMN;
|
||||
extern const int UNEXPECTED_DATA_AFTER_PARSED_VALUE;
|
||||
}
|
||||
|
||||
enum class ErrorHandling : uint8_t
|
||||
{
|
||||
Exception,
|
||||
Zero,
|
||||
Null
|
||||
};
|
||||
|
||||
using ScaleFactors = std::unordered_map<std::string_view, size_t>;
|
||||
|
||||
/** fromReadble*Size - Returns the number of bytes corresponding to a given readable binary or decimal size.
|
||||
* Examples:
|
||||
* - `fromReadableSize('123 MiB')`
|
||||
* - `fromReadableDecimalSize('123 MB')`
|
||||
* Meant to be the inverse of `formatReadable*Size` with the following exceptions:
|
||||
* - Number of bytes is returned as an unsigned integer amount instead of a float. Decimal points are rounded up to the nearest integer.
|
||||
* - Negative numbers are not allowed as negative sizes don't make sense.
|
||||
* Flavours:
|
||||
* - fromReadableSize
|
||||
* - fromReadableSizeOrNull
|
||||
* - fromReadableSizeOrZero
|
||||
* - fromReadableDecimalSize
|
||||
* - fromReadableDecimalSizeOrNull
|
||||
* - fromReadableDecimalSizeOrZero
|
||||
*/
|
||||
template <typename Name, typename Impl, ErrorHandling error_handling>
|
||||
class FunctionFromReadable : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionFromReadable<Name, Impl, error_handling>>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
|
||||
{
|
||||
FunctionArgumentDescriptors args
|
||||
{
|
||||
{"readable_size", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isString), nullptr, "String"},
|
||||
};
|
||||
validateFunctionArgumentTypes(*this, arguments, args);
|
||||
DataTypePtr return_type = std::make_shared<DataTypeUInt64>();
|
||||
if (error_handling == ErrorHandling::Null)
|
||||
return std::make_shared<DataTypeNullable>(return_type);
|
||||
else
|
||||
return return_type;
|
||||
}
|
||||
|
||||
|
||||
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
|
||||
{
|
||||
const auto * col_str = checkAndGetColumn<ColumnString>(arguments[0].column.get());
|
||||
if (!col_str)
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::ILLEGAL_COLUMN,
|
||||
"Illegal column {} of first ('str') argument of function {}. Must be string.",
|
||||
arguments[0].column->getName(),
|
||||
getName()
|
||||
);
|
||||
}
|
||||
|
||||
const ScaleFactors & scale_factors = Impl::getScaleFactors();
|
||||
|
||||
auto col_res = ColumnUInt64::create(input_rows_count);
|
||||
|
||||
ColumnUInt8::MutablePtr col_null_map;
|
||||
if constexpr (error_handling == ErrorHandling::Null)
|
||||
col_null_map = ColumnUInt8::create(input_rows_count, 0);
|
||||
|
||||
auto & res_data = col_res->getData();
|
||||
|
||||
for (size_t i = 0; i < input_rows_count; ++i)
|
||||
{
|
||||
std::string_view value = col_str->getDataAt(i).toView();
|
||||
try
|
||||
{
|
||||
UInt64 num_bytes = parseReadableFormat(scale_factors, value);
|
||||
res_data[i] = num_bytes;
|
||||
}
|
||||
catch (const Exception &)
|
||||
{
|
||||
if constexpr (error_handling == ErrorHandling::Exception)
|
||||
{
|
||||
throw;
|
||||
}
|
||||
else
|
||||
{
|
||||
res_data[i] = 0;
|
||||
if constexpr (error_handling == ErrorHandling::Null)
|
||||
col_null_map->getData()[i] = 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
if constexpr (error_handling == ErrorHandling::Null)
|
||||
return ColumnNullable::create(std::move(col_res), std::move(col_null_map));
|
||||
else
|
||||
return col_res;
|
||||
}
|
||||
|
||||
private:
|
||||
|
||||
UInt64 parseReadableFormat(const ScaleFactors & scale_factors, const std::string_view & value) const
|
||||
{
|
||||
ReadBufferFromString buf(value);
|
||||
|
||||
// tryReadFloatText does seem to not raise any error when there is leading whitespace so we check it explicitly
|
||||
skipWhitespaceIfAny(buf);
|
||||
if (buf.getPosition() > 0)
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::CANNOT_PARSE_INPUT_ASSERTION_FAILED,
|
||||
"Invalid expression for function {} - Leading whitespace is not allowed (\"{}\")",
|
||||
getName(),
|
||||
value
|
||||
);
|
||||
}
|
||||
|
||||
Float64 base = 0;
|
||||
if (!tryReadFloatTextPrecise(base, buf)) // If we use the default (fast) tryReadFloatText this returns True on garbage input so we use the Precise version
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::CANNOT_PARSE_NUMBER,
|
||||
"Invalid expression for function {} - Unable to parse readable size numeric component (\"{}\")",
|
||||
getName(),
|
||||
value
|
||||
);
|
||||
}
|
||||
else if (std::isnan(base) || !std::isfinite(base))
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Invalid expression for function {} - Invalid numeric component: {}",
|
||||
getName(),
|
||||
base
|
||||
);
|
||||
}
|
||||
else if (base < 0)
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Invalid expression for function {} - Negative sizes are not allowed ({})",
|
||||
getName(),
|
||||
base
|
||||
);
|
||||
}
|
||||
|
||||
skipWhitespaceIfAny(buf);
|
||||
|
||||
String unit;
|
||||
readStringUntilWhitespace(unit, buf);
|
||||
boost::algorithm::to_lower(unit);
|
||||
auto iter = scale_factors.find(unit);
|
||||
if (iter == scale_factors.end())
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::CANNOT_PARSE_TEXT,
|
||||
"Invalid expression for function {} - Unknown readable size unit (\"{}\")",
|
||||
getName(),
|
||||
unit
|
||||
);
|
||||
}
|
||||
else if (!buf.eof())
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::UNEXPECTED_DATA_AFTER_PARSED_VALUE,
|
||||
"Invalid expression for function {} - Found trailing characters after readable size string (\"{}\")",
|
||||
getName(),
|
||||
value
|
||||
);
|
||||
}
|
||||
|
||||
Float64 num_bytes_with_decimals = base * iter->second;
|
||||
if (num_bytes_with_decimals > std::numeric_limits<UInt64>::max())
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Invalid expression for function {} - Result is too big for output type (\"{}\")",
|
||||
getName(),
|
||||
num_bytes_with_decimals
|
||||
);
|
||||
}
|
||||
// As the input might be an arbitrary decimal number we might end up with a non-integer amount of bytes when parsing binary (eg MiB) units.
|
||||
// This doesn't make sense so we round up to indicate the byte size that can fit the passed size.
|
||||
return static_cast<UInt64>(std::ceil(num_bytes_with_decimals));
|
||||
}
|
||||
};
|
||||
}
|
122
src/Functions/fromReadableDecimalSize.cpp
Normal file
122
src/Functions/fromReadableDecimalSize.cpp
Normal file
@ -0,0 +1,122 @@
|
||||
#include <base/types.h>
|
||||
#include <Functions/FunctionFactory.h>
|
||||
#include <Functions/fromReadable.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
struct Impl
|
||||
{
|
||||
static const ScaleFactors & getScaleFactors()
|
||||
{
|
||||
static const ScaleFactors scale_factors =
|
||||
{
|
||||
{"b", 1ull},
|
||||
{"kb", 1000ull},
|
||||
{"mb", 1000ull * 1000ull},
|
||||
{"gb", 1000ull * 1000ull * 1000ull},
|
||||
{"tb", 1000ull * 1000ull * 1000ull * 1000ull},
|
||||
{"pb", 1000ull * 1000ull * 1000ull * 1000ull * 1000ull},
|
||||
{"eb", 1000ull * 1000ull * 1000ull * 1000ull * 1000ull * 1000ull},
|
||||
};
|
||||
|
||||
return scale_factors;
|
||||
}
|
||||
};
|
||||
|
||||
struct NameFromReadableDecimalSize
|
||||
{
|
||||
static constexpr auto name = "fromReadableDecimalSize";
|
||||
};
|
||||
|
||||
struct NameFromReadableDecimalSizeOrNull
|
||||
{
|
||||
static constexpr auto name = "fromReadableDecimalSizeOrNull";
|
||||
};
|
||||
|
||||
struct NameFromReadableDecimalSizeOrZero
|
||||
{
|
||||
static constexpr auto name = "fromReadableDecimalSizeOrZero";
|
||||
};
|
||||
|
||||
using FunctionFromReadableDecimalSize = FunctionFromReadable<NameFromReadableDecimalSize, Impl, ErrorHandling::Exception>;
|
||||
using FunctionFromReadableDecimalSizeOrNull = FunctionFromReadable<NameFromReadableDecimalSizeOrNull, Impl, ErrorHandling::Null>;
|
||||
using FunctionFromReadableDecimalSizeOrZero = FunctionFromReadable<NameFromReadableDecimalSizeOrZero, Impl, ErrorHandling::Zero>;
|
||||
|
||||
|
||||
FunctionDocumentation fromReadableDecimalSize_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KB`, `MB`, etc. as a unit, this function returns the corresponding number of bytes. If the function is unable to parse the input value, it throws an exception.",
|
||||
.syntax = "fromReadableDecimalSize(x)",
|
||||
.arguments = {{"x", "Readable size with decimal units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB']) AS readable_sizes, fromReadableDecimalSize(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableDecimalSizeOrNull_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit, this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `NULL`",
|
||||
.syntax = "fromReadableDecimalSizeOrNull(x)",
|
||||
.arguments = {{"x", "Readable size with decimal units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md)))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes, fromReadableSizeOrNull(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableDecimalSizeOrZero_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit, this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `0`",
|
||||
.syntax = "formatReadableSizeOrZero(x)",
|
||||
.arguments = {{"x", "Readable size with decimal units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KB', '3 MB', '5.314 KB', 'invalid']) AS readable_sizes, fromReadableSizeOrZero(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KB │ 1000 │
|
||||
│ 3 MB │ 3000000 │
|
||||
│ 5.314 KB │ 5314 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(FromReadableDecimalSize)
|
||||
{
|
||||
factory.registerFunction<FunctionFromReadableDecimalSize>(fromReadableDecimalSize_documentation);
|
||||
factory.registerFunction<FunctionFromReadableDecimalSizeOrNull>(fromReadableDecimalSizeOrNull_documentation);
|
||||
factory.registerFunction<FunctionFromReadableDecimalSizeOrZero>(fromReadableDecimalSizeOrZero_documentation);
|
||||
}
|
||||
}
|
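As a quick sanity check of the decimal scale factors above, the conversion for the "5.314 KB" row of the example table is a multiplication followed by a ceil. A hypothetical standalone snippet, not part of the patch:

#include <cmath>
#include <cstdint>
#include <iostream>

int main()
{
    /// Decimal (SI) units: 1 KB = 1000 bytes, so "5.314 KB" -> ceil(5.314 * 1000) = 5314 bytes.
    std::cout << static_cast<uint64_t>(std::ceil(5.314 * 1000)) << '\n';    /// prints 5314
}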
124
src/Functions/fromReadableSize.cpp
Normal file
@ -0,0 +1,124 @@
|
||||
#include <base/types.h>
|
||||
#include <Functions/FunctionFactory.h>
|
||||
#include <Functions/fromReadable.h>
|
||||
#include "Common/FunctionDocumentation.h"
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
struct Impl
|
||||
{
|
||||
static const ScaleFactors & getScaleFactors()
|
||||
{
|
||||
// ISO/IEC 80000-13 binary units
|
||||
static const ScaleFactors scale_factors =
|
||||
{
|
||||
{"b", 1ull},
|
||||
{"kib", 1024ull},
|
||||
{"mib", 1024ull * 1024ull},
|
||||
{"gib", 1024ull * 1024ull * 1024ull},
|
||||
{"tib", 1024ull * 1024ull * 1024ull * 1024ull},
|
||||
{"pib", 1024ull * 1024ull * 1024ull * 1024ull * 1024ull},
|
||||
{"eib", 1024ull * 1024ull * 1024ull * 1024ull * 1024ull * 1024ull},
|
||||
};
|
||||
|
||||
return scale_factors;
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
struct NameFromReadableSize
|
||||
{
|
||||
static constexpr auto name = "fromReadableSize";
|
||||
};
|
||||
|
||||
struct NameFromReadableSizeOrNull
|
||||
{
|
||||
static constexpr auto name = "fromReadableSizeOrNull";
|
||||
};
|
||||
|
||||
struct NameFromReadableSizeOrZero
|
||||
{
|
||||
static constexpr auto name = "fromReadableSizeOrZero";
|
||||
};
|
||||
|
||||
using FunctionFromReadableSize = FunctionFromReadable<NameFromReadableSize, Impl, ErrorHandling::Exception>;
|
||||
using FunctionFromReadableSizeOrNull = FunctionFromReadable<NameFromReadableSizeOrNull, Impl, ErrorHandling::Null>;
|
||||
using FunctionFromReadableSizeOrZero = FunctionFromReadable<NameFromReadableSizeOrZero, Impl, ErrorHandling::Zero>;
|
||||
|
||||
FunctionDocumentation fromReadableSize_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it throws an exception.",
|
||||
.syntax = "fromReadableSize(x)",
|
||||
.arguments = {{"x", "Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB']) AS readable_sizes, fromReadableSize(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableSizeOrNull_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `NULL`",
|
||||
.syntax = "fromReadableSizeOrNull(x)",
|
||||
.arguments = {{"x", "Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or NULL if unable to parse the input (Nullable([UInt64](../../sql-reference/data-types/int-uint.md)))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes, fromReadableSize(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ ᴺᵁᴸᴸ │
|
||||
└────────────────┴─────────┘)"
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
|
||||
FunctionDocumentation fromReadableSizeOrZero_documentation {
|
||||
.description = "Given a string containing a byte size and `B`, `KiB`, `MiB`, etc. as a unit (i.e. [ISO/IEC 80000-13](https://en.wikipedia.org/wiki/ISO/IEC_80000) unit), this function returns the corresponding number of bytes. If the function is unable to parse the input value, it returns `0`",
|
||||
.syntax = "fromReadableSizeOrZero(x)",
|
||||
.arguments = {{"x", "Readable size with ISO/IEC 80000-13 units ([String](../../sql-reference/data-types/string.md))"}},
|
||||
.returned_value = "Number of bytes, rounded up to the nearest integer, or 0 if unable to parse the input ([UInt64](../../sql-reference/data-types/int-uint.md))",
|
||||
.examples = {
|
||||
{
|
||||
"basic",
|
||||
"SELECT arrayJoin(['1 B', '1 KiB', '3 MiB', '5.314 KiB', 'invalid']) AS readable_sizes, fromReadableSize(readable_sizes) AS sizes;",
|
||||
R"(
|
||||
┌─readable_sizes─┬───sizes─┐
|
||||
│ 1 B │ 1 │
|
||||
│ 1 KiB │ 1024 │
|
||||
│ 3 MiB │ 3145728 │
|
||||
│ 5.314 KiB │ 5442 │
|
||||
│ invalid │ 0 │
|
||||
└────────────────┴─────────┘)",
|
||||
},
|
||||
},
|
||||
.categories = {"OtherFunctions"},
|
||||
};
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(FromReadableSize)
|
||||
{
|
||||
factory.registerFunction<FunctionFromReadableSize>(fromReadableSize_documentation);
|
||||
factory.registerFunction<FunctionFromReadableSizeOrNull>(fromReadableSizeOrNull_documentation);
|
||||
factory.registerFunction<FunctionFromReadableSizeOrZero>(fromReadableSizeOrZero_documentation);
|
||||
}
|
||||
}
|
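The same check for the binary units defined above: "5.314 KiB" from the example tables is 5.314 * 1024 = 5441.536 bytes, rounded up to 5442. A hypothetical standalone snippet, not part of the patch:

#include <cmath>
#include <cstdint>
#include <iostream>

int main()
{
    /// Binary (ISO/IEC 80000-13) units: 1 KiB = 1024 bytes,
    /// so "5.314 KiB" -> ceil(5.314 * 1024) = ceil(5441.536) = 5442 bytes.
    std::cout << static_cast<uint64_t>(std::ceil(5.314 * 1024)) << '\n';    /// prints 5442
}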
@ -123,61 +123,37 @@ SnowflakeIdRange getRangeOfAvailableIds(const SnowflakeId & available, size_t in
|
||||
return {begin, end};
|
||||
}
|
||||
|
||||
struct GlobalCounterPolicy
|
||||
struct Data
|
||||
{
|
||||
static constexpr auto name = "generateSnowflakeID";
|
||||
static constexpr auto description = R"(Generates a Snowflake ID. The generated Snowflake ID contains the current Unix timestamp in milliseconds 41 (+ 1 top zero bit) bits, followed by machine id (10 bits), a counter (12 bits) to distinguish IDs within a millisecond. For any given timestamp (unix_ts_ms), the counter starts at 0 and is incremented by 1 for each new Snowflake ID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to 0. Function generateSnowflakeID guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)";
|
||||
|
||||
/// Guarantee counter monotonicity within one timestamp across all threads generating Snowflake IDs simultaneously.
|
||||
struct Data
|
||||
static inline std::atomic<uint64_t> lowest_available_snowflake_id = 0;
|
||||
|
||||
SnowflakeId reserveRange(size_t input_rows_count)
|
||||
{
|
||||
static inline std::atomic<uint64_t> lowest_available_snowflake_id = 0;
|
||||
|
||||
SnowflakeId reserveRange(size_t input_rows_count)
|
||||
uint64_t available_snowflake_id = lowest_available_snowflake_id.load();
|
||||
SnowflakeIdRange range;
|
||||
do
|
||||
{
|
||||
uint64_t available_snowflake_id = lowest_available_snowflake_id.load();
|
||||
SnowflakeIdRange range;
|
||||
do
|
||||
{
|
||||
range = getRangeOfAvailableIds(toSnowflakeId(available_snowflake_id), input_rows_count);
|
||||
}
|
||||
while (!lowest_available_snowflake_id.compare_exchange_weak(available_snowflake_id, fromSnowflakeId(range.end)));
|
||||
/// if CAS failed --> another thread updated `lowest_available_snowflake_id` and we re-try
|
||||
/// else --> our thread reserved ID range [begin, end) and return the beginning of the range
|
||||
|
||||
return range.begin;
|
||||
range = getRangeOfAvailableIds(toSnowflakeId(available_snowflake_id), input_rows_count);
|
||||
}
|
||||
};
|
||||
};
|
||||
while (!lowest_available_snowflake_id.compare_exchange_weak(available_snowflake_id, fromSnowflakeId(range.end)));
|
||||
/// CAS failed --> another thread updated `lowest_available_snowflake_id` and we re-try
|
||||
/// else --> our thread reserved ID range [begin, end) and return the beginning of the range
|
||||
|
||||
struct ThreadLocalCounterPolicy
|
||||
{
|
||||
static constexpr auto name = "generateSnowflakeIDThreadMonotonic";
|
||||
static constexpr auto description = R"(Generates a Snowflake ID. The generated Snowflake ID contains the current Unix timestamp in milliseconds 41 (+ 1 top zero bit) bits, followed by machine id (10 bits), a counter (12 bits) to distinguish IDs within a millisecond. For any given timestamp (unix_ts_ms), the counter starts at 0 and is incremented by 1 for each new Snowflake ID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to 0. This function behaves like generateSnowflakeID but gives no guarantee on counter monotony across different simultaneous requests. Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate Snowflake IDs.)";
|
||||
|
||||
/// Guarantee counter monotonicity within one timestamp within the same thread. Faster than GlobalCounterPolicy if a query uses multiple threads.
|
||||
struct Data
|
||||
{
|
||||
static inline thread_local uint64_t lowest_available_snowflake_id = 0;
|
||||
|
||||
SnowflakeId reserveRange(size_t input_rows_count)
|
||||
{
|
||||
SnowflakeIdRange range = getRangeOfAvailableIds(toSnowflakeId(lowest_available_snowflake_id), input_rows_count);
|
||||
lowest_available_snowflake_id = fromSnowflakeId(range.end);
|
||||
return range.begin;
|
||||
}
|
||||
};
|
||||
return range.begin;
|
||||
}
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
template <typename FillPolicy>
|
||||
class FunctionGenerateSnowflakeID : public IFunction, public FillPolicy
|
||||
class FunctionGenerateSnowflakeID : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = "generateSnowflakeID";
|
||||
|
||||
static FunctionPtr create(ContextPtr /*context*/) { return std::make_shared<FunctionGenerateSnowflakeID>(); }
|
||||
|
||||
String getName() const override { return FillPolicy::name; }
|
||||
String getName() const override { return name; }
|
||||
size_t getNumberOfArguments() const override { return 0; }
|
||||
bool isDeterministic() const override { return false; }
|
||||
bool isDeterministicInScopeOfQuery() const override { return false; }
|
||||
@ -205,7 +181,7 @@ public:
|
||||
{
|
||||
vec_to.resize(input_rows_count);
|
||||
|
||||
typename FillPolicy::Data data;
|
||||
Data data;
|
||||
SnowflakeId snowflake_id = data.reserveRange(input_rows_count); /// returns begin of available snowflake ids range
|
||||
|
||||
for (UInt64 & to_row : vec_to)
|
||||
@ -229,27 +205,16 @@ public:
|
||||
|
||||
};
|
||||
|
||||
template<typename FillPolicy>
|
||||
void registerSnowflakeIDGenerator(auto & factory)
|
||||
{
|
||||
static constexpr auto doc_syntax_format = "{}([expression])";
|
||||
static constexpr auto example_format = "SELECT {}()";
|
||||
static constexpr auto multiple_example_format = "SELECT {f}(1), {f}(2)";
|
||||
|
||||
FunctionDocumentation::Description description = FillPolicy::description;
|
||||
FunctionDocumentation::Syntax syntax = fmt::format(doc_syntax_format, FillPolicy::name);
|
||||
FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}};
|
||||
FunctionDocumentation::ReturnedValue returned_value = "A value of type UInt64";
|
||||
FunctionDocumentation::Examples examples = {{"single", fmt::format(example_format, FillPolicy::name), ""}, {"multiple", fmt::format(multiple_example_format, fmt::arg("f", FillPolicy::name)), ""}};
|
||||
FunctionDocumentation::Categories categories = {"Snowflake ID"};
|
||||
|
||||
factory.template registerFunction<FunctionGenerateSnowflakeID<FillPolicy>>({description, syntax, arguments, returned_value, examples, categories}, FunctionFactory::CaseInsensitive);
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(GenerateSnowflakeID)
|
||||
{
|
||||
registerSnowflakeIDGenerator<GlobalCounterPolicy>(factory);
|
||||
registerSnowflakeIDGenerator<ThreadLocalCounterPolicy>(factory);
|
||||
FunctionDocumentation::Description description = R"(Generates a Snowflake ID. The generated Snowflake ID contains the current Unix timestamp in milliseconds 41 (+ 1 top zero bit) bits, followed by machine id (10 bits), a counter (12 bits) to distinguish IDs within a millisecond. For any given timestamp (unix_ts_ms), the counter starts at 0 and is incremented by 1 for each new Snowflake ID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to 0. Function generateSnowflakeID guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)";
|
||||
FunctionDocumentation::Syntax syntax = "generateSnowflakeID([expression])";
|
||||
FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}};
|
||||
FunctionDocumentation::ReturnedValue returned_value = "A value of type UInt64";
|
||||
FunctionDocumentation::Examples examples = {{"single", "SELECT generateSnowflakeID()", "7201148511606784000"}, {"multiple", "SELECT generateSnowflakeID(1), generateSnowflakeID(2)", ""}};
|
||||
FunctionDocumentation::Categories categories = {"Snowflake ID"};
|
||||
|
||||
factory.registerFunction<FunctionGenerateSnowflakeID>({description, syntax, arguments, returned_value, examples, categories});
|
||||
}
|
||||
|
||||
}
|
||||
|
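The layout spelled out in the documentation string above (one top zero bit, a 41-bit millisecond timestamp, a 10-bit machine id and a 12-bit counter) boils down to plain bit packing. The following is an illustrative composition of those fields, not the code from this patch:

#include <cstdint>

/// Illustrative packing of the documented layout:
/// [1 zero bit][41-bit Unix timestamp in ms][10-bit machine id][12-bit counter]
uint64_t composeSnowflakeID(uint64_t unix_ts_ms, uint64_t machine_id, uint64_t counter)
{
    return ((unix_ts_ms & ((1ull << 41) - 1)) << 22)    /// timestamp occupies bits 22..62, the top bit stays 0
         | ((machine_id  & ((1ull << 10) - 1)) << 12)   /// machine id occupies bits 12..21
         |  (counter     & ((1ull << 12) - 1));         /// counter occupies bits 0..11
}

The counter portion is what getRangeOfAvailableIds and the compare-exchange loop above reserve per block of generated rows.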
@ -73,20 +73,6 @@ void setVariant(UUID & uuid)
|
||||
UUIDHelpers::getLowBytes(uuid) = (UUIDHelpers::getLowBytes(uuid) & rand_b_bits_mask) | variant_2_mask;
|
||||
}
|
||||
|
||||
struct FillAllRandomPolicy
|
||||
{
|
||||
static constexpr auto name = "generateUUIDv7NonMonotonic";
|
||||
static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), and a random field (74 bit, including a 2-bit variant field "2") to distinguish UUIDs within a millisecond. This function is the fastest generateUUIDv7* function but it gives no monotonicity guarantees within a timestamp.)";
|
||||
struct Data
|
||||
{
|
||||
void generate(UUID & uuid, uint64_t ts)
|
||||
{
|
||||
setTimestampAndVersion(uuid, ts);
|
||||
setVariant(uuid);
|
||||
}
|
||||
};
|
||||
};
|
||||
|
||||
struct CounterFields
|
||||
{
|
||||
uint64_t last_timestamp = 0;
|
||||
@ -133,44 +119,21 @@ struct CounterFields
|
||||
};
|
||||
|
||||
|
||||
struct GlobalCounterPolicy
|
||||
struct Data
|
||||
{
|
||||
static constexpr auto name = "generateUUIDv7";
|
||||
static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit, including a variant field "2", 2 bit) to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. Function generateUUIDv7 guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)";
|
||||
|
||||
/// Guarantee counter monotonicity within one timestamp across all threads generating UUIDv7 simultaneously.
|
||||
struct Data
|
||||
static inline CounterFields fields;
|
||||
static inline SharedMutex mutex; /// works a little bit faster than std::mutex here
|
||||
std::lock_guard<SharedMutex> guard;
|
||||
|
||||
Data()
|
||||
: guard(mutex)
|
||||
{}
|
||||
|
||||
void generate(UUID & uuid, uint64_t timestamp)
|
||||
{
|
||||
static inline CounterFields fields;
|
||||
static inline SharedMutex mutex; /// works a little bit faster than std::mutex here
|
||||
std::lock_guard<SharedMutex> guard;
|
||||
|
||||
Data()
|
||||
: guard(mutex)
|
||||
{}
|
||||
|
||||
void generate(UUID & uuid, uint64_t timestamp)
|
||||
{
|
||||
fields.generate(uuid, timestamp);
|
||||
}
|
||||
};
|
||||
};
|
||||
|
||||
struct ThreadLocalCounterPolicy
|
||||
{
|
||||
static constexpr auto name = "generateUUIDv7ThreadMonotonic";
|
||||
static constexpr auto description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit, including a variant field "2", 2 bit) to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. This function behaves like generateUUIDv7 but gives no guarantee on counter monotony across different simultaneous requests. Monotonicity within one timestamp is guaranteed only within the same thread calling this function to generate UUIDs.)";
|
||||
|
||||
/// Guarantee counter monotonicity within one timestamp within the same thread. Faster than GlobalCounterPolicy if a query uses multiple threads.
|
||||
struct Data
|
||||
{
|
||||
static inline thread_local CounterFields fields;
|
||||
|
||||
void generate(UUID & uuid, uint64_t timestamp)
|
||||
{
|
||||
fields.generate(uuid, timestamp);
|
||||
}
|
||||
};
|
||||
fields.generate(uuid, timestamp);
|
||||
}
|
||||
};
|
||||
|
||||
}
|
||||
@ -181,11 +144,12 @@ DECLARE_AVX2_SPECIFIC_CODE(__VA_ARGS__)
|
||||
|
||||
DECLARE_SEVERAL_IMPLEMENTATIONS(
|
||||
|
||||
template <typename FillPolicy>
|
||||
class FunctionGenerateUUIDv7Base : public IFunction, public FillPolicy
|
||||
class FunctionGenerateUUIDv7Base : public IFunction
|
||||
{
|
||||
public:
|
||||
String getName() const final { return FillPolicy::name; }
|
||||
static constexpr auto name = "generateUUIDv7";
|
||||
|
||||
String getName() const final { return name; }
|
||||
size_t getNumberOfArguments() const final { return 0; }
|
||||
bool isDeterministic() const override { return false; }
|
||||
bool isDeterministicInScopeOfQuery() const final { return false; }
|
||||
@ -221,7 +185,7 @@ public:
|
||||
uint64_t timestamp = getTimestampMillisecond();
|
||||
for (UUID & uuid : vec_to)
|
||||
{
|
||||
typename FillPolicy::Data data;
|
||||
Data data;
|
||||
data.generate(uuid, timestamp);
|
||||
}
|
||||
}
|
||||
@ -231,19 +195,18 @@ public:
|
||||
) // DECLARE_SEVERAL_IMPLEMENTATIONS
|
||||
#undef DECLARE_SEVERAL_IMPLEMENTATIONS
|
||||
|
||||
template <typename FillPolicy>
|
||||
class FunctionGenerateUUIDv7Base : public TargetSpecific::Default::FunctionGenerateUUIDv7Base<FillPolicy>
|
||||
class FunctionGenerateUUIDv7Base : public TargetSpecific::Default::FunctionGenerateUUIDv7Base
|
||||
{
|
||||
public:
|
||||
using Self = FunctionGenerateUUIDv7Base<FillPolicy>;
|
||||
using Parent = TargetSpecific::Default::FunctionGenerateUUIDv7Base<FillPolicy>;
|
||||
using Self = FunctionGenerateUUIDv7Base;
|
||||
using Parent = TargetSpecific::Default::FunctionGenerateUUIDv7Base;
|
||||
|
||||
explicit FunctionGenerateUUIDv7Base(ContextPtr context) : selector(context)
|
||||
{
|
||||
selector.registerImplementation<TargetArch::Default, Parent>();
|
||||
|
||||
#if USE_MULTITARGET_CODE
|
||||
using ParentAVX2 = TargetSpecific::AVX2::FunctionGenerateUUIDv7Base<FillPolicy>;
|
||||
using ParentAVX2 = TargetSpecific::AVX2::FunctionGenerateUUIDv7Base;
|
||||
selector.registerImplementation<TargetArch::AVX2, ParentAVX2>();
|
||||
#endif
|
||||
}
|
||||
@ -262,27 +225,16 @@ private:
|
||||
ImplementationSelector<IFunction> selector;
|
||||
};
|
||||
|
||||
template<typename FillPolicy>
|
||||
void registerUUIDv7Generator(auto & factory)
|
||||
{
|
||||
static constexpr auto doc_syntax_format = "{}([expression])";
|
||||
static constexpr auto example_format = "SELECT {}()";
|
||||
static constexpr auto multiple_example_format = "SELECT {f}(1), {f}(2)";
|
||||
|
||||
FunctionDocumentation::Description description = FillPolicy::description;
|
||||
FunctionDocumentation::Syntax syntax = fmt::format(doc_syntax_format, FillPolicy::name);
|
||||
FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}};
|
||||
FunctionDocumentation::ReturnedValue returned_value = "A value of type UUID version 7.";
|
||||
FunctionDocumentation::Examples examples = {{"single", fmt::format(example_format, FillPolicy::name), ""}, {"multiple", fmt::format(multiple_example_format, fmt::arg("f", FillPolicy::name)), ""}};
|
||||
FunctionDocumentation::Categories categories = {"UUID"};
|
||||
|
||||
factory.template registerFunction<FunctionGenerateUUIDv7Base<FillPolicy>>({description, syntax, arguments, returned_value, examples, categories}, FunctionFactory::CaseInsensitive);
|
||||
}
|
||||
|
||||
REGISTER_FUNCTION(GenerateUUIDv7)
|
||||
{
|
||||
registerUUIDv7Generator<GlobalCounterPolicy>(factory);
|
||||
registerUUIDv7Generator<ThreadLocalCounterPolicy>(factory);
|
||||
registerUUIDv7Generator<FillAllRandomPolicy>(factory);
|
||||
FunctionDocumentation::Description description = R"(Generates a UUID of version 7. The generated UUID contains the current Unix timestamp in milliseconds (48 bits), followed by version "7" (4 bits), a counter (42 bit, including a variant field "2", 2 bit) to distinguish UUIDs within a millisecond, and a random field (32 bits). For any given timestamp (unix_ts_ms), the counter starts at a random value and is incremented by 1 for each new UUID until the timestamp changes. In case the counter overflows, the timestamp field is incremented by 1 and the counter is reset to a random new start value. Function generateUUIDv7 guarantees that the counter field within a timestamp increments monotonically across all function invocations in concurrently running threads and queries.)";
|
||||
FunctionDocumentation::Syntax syntax = "SELECT generateUUIDv7()";
|
||||
FunctionDocumentation::Arguments arguments = {{"expression", "The expression is used to bypass common subexpression elimination if the function is called multiple times in a query but otherwise ignored. Optional."}};
|
||||
FunctionDocumentation::ReturnedValue returned_value = "A value of type UUID version 7.";
|
||||
FunctionDocumentation::Examples examples = {{"single", "SELECT generateUUIDv7()", ""}, {"multiple", "SELECT generateUUIDv7(1), generateUUIDv7(2)", ""}};
|
||||
FunctionDocumentation::Categories categories = {"UUID"};
|
||||
|
||||
factory.registerFunction<FunctionGenerateUUIDv7Base>({description, syntax, arguments, returned_value, examples, categories});
|
||||
}
|
||||
|
||||
}
|
||||
|
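For the UUIDv7 layout from the documentation string (48-bit millisecond timestamp, version "7", a counter that includes the 2-bit variant, and a 32-bit random tail), the placement of the timestamp and version nibble in the most significant 64 bits can be sketched as follows. This is an illustration of the standard UUIDv7 field order under stated assumptions, not the CounterFields logic from the patch:

#include <cstdint>

/// Rough sketch of the high 64 bits only:
/// [48-bit Unix timestamp in ms][4-bit version = 7][12 high counter bits];
/// the low 64 bits carry the 2-bit variant "2", the remaining counter bits and the random tail.
uint64_t composeUUIDv7HighBits(uint64_t unix_ts_ms, uint64_t counter_high_12bits)
{
    return ((unix_ts_ms & ((1ull << 48) - 1)) << 16)    /// timestamp occupies the top 48 bits
         | (0x7ull << 12)                               /// version nibble is "7"
         | (counter_high_12bits & 0x0FFFull);           /// 12 counter bits fill the rest
}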
@ -36,11 +36,11 @@ public:
|
||||
|
||||
static FunctionPtr create(ContextPtr context)
|
||||
{
|
||||
if (!context->getSettingsRef().allow_deprecated_functions)
|
||||
if (!context->getSettingsRef().allow_deprecated_error_prone_window_functions)
|
||||
throw Exception(
|
||||
ErrorCodes::DEPRECATED_FUNCTION,
|
||||
"Function {} is deprecated since its usage is error-prone (see docs)."
|
||||
"Please use proper window function or set `allow_deprecated_functions` setting to enable it",
|
||||
"Please use proper window function or set `allow_deprecated_error_prone_window_functions` setting to enable it",
|
||||
name);
|
||||
|
||||
return std::make_shared<FunctionNeighbor>();
|
||||
|
@ -39,11 +39,11 @@ public:
|
||||
|
||||
static FunctionPtr create(ContextPtr context)
|
||||
{
|
||||
if (!context->getSettingsRef().allow_deprecated_functions)
|
||||
if (!context->getSettingsRef().allow_deprecated_error_prone_window_functions)
|
||||
throw Exception(
|
||||
ErrorCodes::DEPRECATED_FUNCTION,
|
||||
"Function {} is deprecated since its usage is error-prone (see docs)."
|
||||
"Please use proper window function or set `allow_deprecated_functions` setting to enable it",
|
||||
"Please use proper window function or set `allow_deprecated_error_prone_window_functions` setting to enable it",
|
||||
name);
|
||||
|
||||
return std::make_shared<FunctionRunningAccumulate>();
|
||||
|
@ -139,11 +139,11 @@ public:
|
||||
|
||||
static FunctionPtr create(ContextPtr context)
|
||||
{
|
||||
if (!context->getSettingsRef().allow_deprecated_functions)
|
||||
if (!context->getSettingsRef().allow_deprecated_error_prone_window_functions)
|
||||
throw Exception(
|
||||
ErrorCodes::DEPRECATED_FUNCTION,
|
||||
"Function {} is deprecated since its usage is error-prone (see docs)."
|
||||
"Please use proper window function or set `allow_deprecated_functions` setting to enable it",
|
||||
"Please use proper window function or set `allow_deprecated_error_prone_window_functions` setting to enable it",
|
||||
name);
|
||||
|
||||
return std::make_shared<FunctionRunningDifferenceImpl<is_first_line_zero>>();
|
||||
|
@ -652,14 +652,25 @@ namespace
|
||||
const std::optional<std::map<String, String>> & object_metadata_,
|
||||
ThreadPoolCallbackRunnerUnsafe<void> schedule_,
|
||||
bool for_disk_s3_,
|
||||
BlobStorageLogWriterPtr blob_storage_log_)
|
||||
: UploadHelper(client_ptr_, dest_bucket_, dest_key_, request_settings_, object_metadata_, schedule_, for_disk_s3_, blob_storage_log_, getLogger("copyS3File"))
|
||||
BlobStorageLogWriterPtr blob_storage_log_,
|
||||
std::function<void()> fallback_method_)
|
||||
: UploadHelper(
|
||||
client_ptr_,
|
||||
dest_bucket_,
|
||||
dest_key_,
|
||||
request_settings_,
|
||||
object_metadata_,
|
||||
schedule_,
|
||||
for_disk_s3_,
|
||||
blob_storage_log_,
|
||||
getLogger("copyS3File"))
|
||||
, src_bucket(src_bucket_)
|
||||
, src_key(src_key_)
|
||||
, offset(src_offset_)
|
||||
, size(src_size_)
|
||||
, supports_multipart_copy(client_ptr_->supportsMultiPartCopy())
|
||||
, read_settings(read_settings_)
|
||||
, fallback_method(std::move(fallback_method_))
|
||||
{
|
||||
}
|
||||
|
||||
@ -682,14 +693,7 @@ namespace
|
||||
size_t size;
|
||||
bool supports_multipart_copy;
|
||||
const ReadSettings read_settings;
|
||||
|
||||
CreateReadBuffer getSourceObjectReadBuffer()
|
||||
{
|
||||
return [&]
|
||||
{
|
||||
return std::make_unique<ReadBufferFromS3>(client_ptr, src_bucket, src_key, "", request_settings, read_settings);
|
||||
};
|
||||
}
|
||||
std::function<void()> fallback_method;
|
||||
|
||||
void performSingleOperationCopy()
|
||||
{
|
||||
@ -744,28 +748,21 @@ namespace
|
||||
if (outcome.GetError().GetExceptionName() == "EntityTooLarge" ||
|
||||
outcome.GetError().GetExceptionName() == "InvalidRequest" ||
|
||||
outcome.GetError().GetExceptionName() == "InvalidArgument" ||
|
||||
outcome.GetError().GetExceptionName() == "AccessDenied" ||
|
||||
(outcome.GetError().GetExceptionName() == "InternalError" &&
|
||||
outcome.GetError().GetResponseCode() == Aws::Http::HttpResponseCode::GATEWAY_TIMEOUT &&
|
||||
outcome.GetError().GetMessage().contains("use the Rewrite method in the JSON API")))
|
||||
{
|
||||
if (!supports_multipart_copy)
|
||||
if (!supports_multipart_copy || outcome.GetError().GetExceptionName() == "AccessDenied")
|
||||
{
|
||||
LOG_INFO(log, "Multipart upload using copy is not supported, will try regular upload for Bucket: {}, Key: {}, Object size: {}",
|
||||
dest_bucket,
|
||||
dest_key,
|
||||
size);
|
||||
copyDataToS3File(
|
||||
getSourceObjectReadBuffer(),
|
||||
offset,
|
||||
size,
|
||||
client_ptr,
|
||||
LOG_INFO(
|
||||
log,
|
||||
"Multipart upload using copy is not supported, will try regular upload for Bucket: {}, Key: {}, Object size: "
|
||||
"{}",
|
||||
dest_bucket,
|
||||
dest_key,
|
||||
request_settings,
|
||||
blob_storage_log,
|
||||
object_metadata,
|
||||
schedule,
|
||||
for_disk_s3);
|
||||
size);
|
||||
fallback_method();
|
||||
break;
|
||||
}
|
||||
else
|
||||
@ -859,17 +856,29 @@ void copyDataToS3File(
|
||||
ThreadPoolCallbackRunnerUnsafe<void> schedule,
|
||||
bool for_disk_s3)
|
||||
{
|
||||
CopyDataToFileHelper helper{create_read_buffer, offset, size, dest_s3_client, dest_bucket, dest_key, settings, object_metadata, schedule, for_disk_s3, blob_storage_log};
|
||||
CopyDataToFileHelper helper{
|
||||
create_read_buffer,
|
||||
offset,
|
||||
size,
|
||||
dest_s3_client,
|
||||
dest_bucket,
|
||||
dest_key,
|
||||
settings,
|
||||
object_metadata,
|
||||
schedule,
|
||||
for_disk_s3,
|
||||
blob_storage_log};
|
||||
helper.performCopy();
|
||||
}
|
||||
|
||||
|
||||
void copyS3File(
|
||||
const std::shared_ptr<const S3::Client> & s3_client,
|
||||
const std::shared_ptr<const S3::Client> & src_s3_client,
|
||||
const String & src_bucket,
|
||||
const String & src_key,
|
||||
size_t src_offset,
|
||||
size_t src_size,
|
||||
std::shared_ptr<const S3::Client> dest_s3_client,
|
||||
const String & dest_bucket,
|
||||
const String & dest_key,
|
||||
const S3Settings::RequestSettings & settings,
|
||||
@ -879,19 +888,50 @@ void copyS3File(
|
||||
ThreadPoolCallbackRunnerUnsafe<void> schedule,
|
||||
bool for_disk_s3)
|
||||
{
|
||||
if (settings.allow_native_copy)
|
||||
if (!dest_s3_client)
|
||||
dest_s3_client = src_s3_client;
|
||||
|
||||
std::function<void()> fallback_method = [&]
|
||||
{
|
||||
CopyFileHelper helper{s3_client, src_bucket, src_key, src_offset, src_size, dest_bucket, dest_key, settings, read_settings, object_metadata, schedule, for_disk_s3, blob_storage_log};
|
||||
helper.performCopy();
|
||||
}
|
||||
else
|
||||
auto create_read_buffer
|
||||
= [&] { return std::make_unique<ReadBufferFromS3>(src_s3_client, src_bucket, src_key, "", settings, read_settings); };
|
||||
|
||||
copyDataToS3File(
|
||||
create_read_buffer,
|
||||
src_offset,
|
||||
src_size,
|
||||
dest_s3_client,
|
||||
dest_bucket,
|
||||
dest_key,
|
||||
settings,
|
||||
blob_storage_log,
|
||||
object_metadata,
|
||||
schedule,
|
||||
for_disk_s3);
|
||||
};
|
||||
|
||||
if (!settings.allow_native_copy)
|
||||
{
|
||||
auto create_read_buffer = [&]
|
||||
{
|
||||
return std::make_unique<ReadBufferFromS3>(s3_client, src_bucket, src_key, "", settings, read_settings);
|
||||
};
|
||||
copyDataToS3File(create_read_buffer, src_offset, src_size, s3_client, dest_bucket, dest_key, settings, blob_storage_log, object_metadata, schedule, for_disk_s3);
|
||||
fallback_method();
|
||||
return;
|
||||
}
|
||||
|
||||
CopyFileHelper helper{
|
||||
src_s3_client,
|
||||
src_bucket,
|
||||
src_key,
|
||||
src_offset,
|
||||
src_size,
|
||||
dest_bucket,
|
||||
dest_key,
|
||||
settings,
|
||||
read_settings,
|
||||
object_metadata,
|
||||
schedule,
|
||||
for_disk_s3,
|
||||
blob_storage_log,
|
||||
std::move(fallback_method)};
|
||||
helper.performCopy();
|
||||
}
|
||||
|
||||
}
|
||||
|
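The restructuring above turns the read-and-upload path into a reusable fallback_method that is invoked both when allow_native_copy is disabled and when the server-side copy fails with an error that native copy cannot recover from (for example AccessDenied when the destination credentials cannot read the source bucket). A self-contained toy model of that control flow, with invented names rather than the ClickHouse API:

#include <functional>
#include <iostream>
#include <string>

/// Toy model: native_copy reports failure through an error code string,
/// fallback_method is the read-from-source-and-upload path.
void copyObject(bool allow_native_copy,
                const std::function<bool(std::string & error)> & native_copy,
                const std::function<void()> & fallback_method)
{
    if (!allow_native_copy)
    {
        fallback_method();
        return;
    }

    std::string error;
    if (!native_copy(error) && (error == "EntityTooLarge" || error == "AccessDenied"))
        fallback_method();    /// the same fallback is reused after a failed server-side copy
}

int main()
{
    auto failing_copy = [](std::string & error) { error = "AccessDenied"; return false; };
    copyObject(true, failing_copy, [] { std::cout << "fell back to read + upload\n"; });
}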
@ -31,11 +31,12 @@ using CreateReadBuffer = std::function<std::unique_ptr<SeekableReadBuffer>()>;
|
||||
///
|
||||
/// read_settings - used for throttling in case native copy is not possible
|
||||
void copyS3File(
|
||||
const std::shared_ptr<const S3::Client> & s3_client,
|
||||
const std::shared_ptr<const S3::Client> & src_s3_client,
|
||||
const String & src_bucket,
|
||||
const String & src_key,
|
||||
size_t src_offset,
|
||||
size_t src_size,
|
||||
std::shared_ptr<const S3::Client> dest_s3_client,
|
||||
const String & dest_bucket,
|
||||
const String & dest_key,
|
||||
const S3Settings::RequestSettings & settings,
|
||||
|
@ -130,6 +130,16 @@ public:
|
||||
UInt64 count_participating_replicas{0};
|
||||
UInt64 number_of_current_replica{0};
|
||||
|
||||
enum class BackgroundOperationType : uint8_t
|
||||
{
|
||||
NOT_A_BACKGROUND_OPERATION = 0,
|
||||
MERGE = 1,
|
||||
MUTATION = 2,
|
||||
};
|
||||
|
||||
/// It's ClientInfo and context created for background operation (not real query)
|
||||
BackgroundOperationType background_operation_type{BackgroundOperationType::NOT_A_BACKGROUND_OPERATION};
|
||||
|
||||
bool empty() const { return query_kind == QueryKind::NO_QUERY; }
|
||||
|
||||
/** Serialization and deserialization.
|
||||
|
@ -2386,6 +2386,17 @@ void Context::setCurrentQueryId(const String & query_id)
|
||||
client_info.initial_query_id = client_info.current_query_id;
|
||||
}
|
||||
|
||||
void Context::setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType background_operation)
|
||||
{
|
||||
chassert(background_operation != ClientInfo::BackgroundOperationType::NOT_A_BACKGROUND_OPERATION);
|
||||
client_info.background_operation_type = background_operation;
|
||||
}
|
||||
|
||||
bool Context::isBackgroundOperationContext() const
|
||||
{
|
||||
return client_info.background_operation_type != ClientInfo::BackgroundOperationType::NOT_A_BACKGROUND_OPERATION;
|
||||
}
|
||||
|
||||
void Context::killCurrentQuery() const
|
||||
{
|
||||
if (auto elem = getProcessListElement())
|
||||
|
@ -760,6 +760,12 @@ public:
|
||||
void setCurrentDatabaseNameInGlobalContext(const String & name);
|
||||
void setCurrentQueryId(const String & query_id);
|
||||
|
||||
/// FIXME: for background operations (like Merge and Mutation) we also use the same Context object and even setup
|
||||
/// query_id for it (table_uuid::result_part_name). We can distinguish queries from background operation in some way like
|
||||
/// bool is_background = query_id.contains("::"), but it's much worse than just enum check with more clear purpose
|
||||
void setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType background_operation);
|
||||
bool isBackgroundOperationContext() const;
|
||||
|
||||
void killCurrentQuery() const;
|
||||
bool isCurrentQueryKilled() const;
|
||||
|
||||
|
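A self-contained toy model of the new flag, using a made-up ToyContext class rather than the real Context, only to show the intent: a merge or mutation marks its context once, and any later code can cheaply distinguish it from a real user query.

#include <cstdint>
#include <iostream>

enum class BackgroundOperationType : uint8_t { NOT_A_BACKGROUND_OPERATION = 0, MERGE = 1, MUTATION = 2 };

struct ToyContext
{
    BackgroundOperationType background_operation_type = BackgroundOperationType::NOT_A_BACKGROUND_OPERATION;

    void setBackgroundOperationTypeForContext(BackgroundOperationType t) { background_operation_type = t; }
    bool isBackgroundOperationContext() const
    {
        return background_operation_type != BackgroundOperationType::NOT_A_BACKGROUND_OPERATION;
    }
};

int main()
{
    ToyContext merge_task_context;
    merge_task_context.setBackgroundOperationTypeForContext(BackgroundOperationType::MERGE);
    std::cout << merge_task_context.isBackgroundOperationContext() << '\n';    /// prints 1
}

The MergeTree task changes further down perform exactly this marking on their task contexts.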
@ -15,18 +15,7 @@ namespace DB
|
||||
|
||||
static String typeToString(FilesystemCacheLogElement::CacheType type)
|
||||
{
|
||||
switch (type)
|
||||
{
|
||||
case FilesystemCacheLogElement::CacheType::READ_FROM_CACHE:
|
||||
return "READ_FROM_CACHE";
|
||||
case FilesystemCacheLogElement::CacheType::READ_FROM_FS_AND_DOWNLOADED_TO_CACHE:
|
||||
return "READ_FROM_FS_AND_DOWNLOADED_TO_CACHE";
|
||||
case FilesystemCacheLogElement::CacheType::READ_FROM_FS_BYPASSING_CACHE:
|
||||
return "READ_FROM_FS_BYPASSING_CACHE";
|
||||
case FilesystemCacheLogElement::CacheType::WRITE_THROUGH_CACHE:
|
||||
return "WRITE_THROUGH_CACHE";
|
||||
}
|
||||
UNREACHABLE();
|
||||
return String(magic_enum::enum_name(type));
|
||||
}
|
||||
|
||||
ColumnsDescription FilesystemCacheLogElement::getColumnsDescription()
|
||||
|
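The hand-written switch is replaced with magic_enum, which derives the enumerator name at compile time. A minimal usage example with a stand-in enum (not the ClickHouse one):

#include <iostream>
#include <string>
#include <magic_enum.hpp>

enum class CacheType { READ_FROM_CACHE, READ_FROM_FS_BYPASSING_CACHE };

int main()
{
    /// enum_name returns a std::string_view with the enumerator's identifier,
    /// which is what makes the explicit switch above unnecessary.
    std::cout << std::string(magic_enum::enum_name(CacheType::READ_FROM_CACHE)) << '\n';    /// READ_FROM_CACHE
}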
@ -14,6 +14,7 @@ namespace DB
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER;
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
static void replaceFilterToConstant(Block & block, const String & filter_column_name)
|
||||
@ -81,7 +82,11 @@ static std::unique_ptr<IFilterDescription> combineFilterAndIndices(
|
||||
auto mutable_holder = ColumnUInt8::create(num_rows, 0);
|
||||
auto & data = mutable_holder->getData();
|
||||
for (auto idx : selected_by_indices)
|
||||
{
|
||||
if (idx >= num_rows)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Index {} out of range {}", idx, num_rows);
|
||||
data[idx] = 1;
|
||||
}
|
||||
|
||||
/// AND two filters
|
||||
auto * begin = data.data();
|
||||
|
@ -312,6 +312,7 @@ ReplicatedMergeMutateTaskBase::PrepareResult MergeFromLogEntryTask::prepare()
|
||||
task_context = Context::createCopy(storage.getContext());
|
||||
task_context->makeQueryContext();
|
||||
task_context->setCurrentQueryId(getQueryId());
|
||||
task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE);
|
||||
|
||||
/// Add merge to list
|
||||
merge_mutate_entry = storage.getContext()->getMergeList().insert(
|
||||
|
@ -168,6 +168,7 @@ ContextMutablePtr MergePlainMergeTreeTask::createTaskContext() const
|
||||
context->makeQueryContext();
|
||||
auto queryId = getQueryId();
|
||||
context->setCurrentQueryId(queryId);
|
||||
context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MERGE);
|
||||
return context;
|
||||
}
|
||||
|
||||
|
@ -35,7 +35,7 @@ struct Settings;
|
||||
M(UInt64, min_bytes_for_wide_part, 10485760, "Minimal uncompressed size in bytes to create part in wide format instead of compact", 0) \
|
||||
M(UInt64, min_rows_for_wide_part, 0, "Minimal number of rows to create part in wide format instead of compact", 0) \
|
||||
M(Float, ratio_of_defaults_for_sparse_serialization, 0.9375f, "Minimal ratio of number of default values to number of all values in column to store it in sparse serializations. If >= 1, columns will be always written in full serialization.", 0) \
|
||||
M(Bool, replace_long_file_name_to_hash, false, "If the file name for column is too long (more than 'max_file_name_length' bytes) replace it to SipHash128", 0) \
|
||||
M(Bool, replace_long_file_name_to_hash, true, "If the file name for column is too long (more than 'max_file_name_length' bytes) replace it to SipHash128", 0) \
|
||||
M(UInt64, max_file_name_length, 127, "The maximal length of the file name to keep it as is without hashing", 0) \
|
||||
M(UInt64, min_bytes_for_full_part_storage, 0, "Only available in ClickHouse Cloud", 0) \
|
||||
M(UInt64, min_rows_for_full_part_storage, 0, "Only available in ClickHouse Cloud", 0) \
|
||||
|
@ -206,6 +206,7 @@ ReplicatedMergeMutateTaskBase::PrepareResult MutateFromLogEntryTask::prepare()
|
||||
task_context = Context::createCopy(storage.getContext());
|
||||
task_context->makeQueryContext();
|
||||
task_context->setCurrentQueryId(getQueryId());
|
||||
task_context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MUTATION);
|
||||
|
||||
merge_mutate_entry = storage.getContext()->getMergeList().insert(
|
||||
storage.getStorageID(),
|
||||
|
@ -139,6 +139,7 @@ ContextMutablePtr MutatePlainMergeTreeTask::createTaskContext() const
|
||||
context->makeQueryContext();
|
||||
auto queryId = getQueryId();
|
||||
context->setCurrentQueryId(queryId);
|
||||
context->setBackgroundOperationTypeForContext(ClientInfo::BackgroundOperationType::MUTATION);
|
||||
return context;
|
||||
}
|
||||
|
||||
|
@ -9,7 +9,7 @@ from threading import Thread
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import requests
|
||||
from lambda_shared.pr import Labels, check_pr_description
|
||||
from lambda_shared.pr import Labels
|
||||
from lambda_shared.token import get_cached_access_token
|
||||
|
||||
NEED_RERUN_OR_CANCELL_WORKFLOWS = {
|
||||
@ -321,21 +321,21 @@ def main(event):
|
||||
return
|
||||
|
||||
if action == "edited":
|
||||
print("PR is edited, check if the body is correct")
|
||||
error, _ = check_pr_description(
|
||||
pull_request["body"], pull_request["base"]["repo"]["full_name"]
|
||||
)
|
||||
if error:
|
||||
print(
|
||||
f"The PR's body is wrong, is going to comment it. The error is: {error}"
|
||||
)
|
||||
post_json = {
|
||||
"body": "This is an automatic comment. The PR descriptions does not "
|
||||
f"match the [template]({pull_request['base']['repo']['html_url']}/"
|
||||
"blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1).\n\n"
|
||||
f"Please, edit it accordingly.\n\nThe error is: {error}"
|
||||
}
|
||||
_exec_post_with_retry(pull_request["comments_url"], token, json=post_json)
|
||||
print("PR is edited - do nothing")
|
||||
# error, _ = check_pr_description(
|
||||
# pull_request["body"], pull_request["base"]["repo"]["full_name"]
|
||||
# )
|
||||
# if error:
|
||||
# print(
|
||||
# f"The PR's body is wrong, is going to comment it. The error is: {error}"
|
||||
# )
|
||||
# post_json = {
|
||||
# "body": "This is an automatic comment. The PR descriptions does not "
|
||||
# f"match the [template]({pull_request['base']['repo']['html_url']}/"
|
||||
# "blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1).\n\n"
|
||||
# f"Please, edit it accordingly.\n\nThe error is: {error}"
|
||||
# }
|
||||
# _exec_post_with_retry(pull_request["comments_url"], token, json=post_json)
|
||||
return
|
||||
|
||||
if action == "synchronize":
|
||||
|
@ -33,9 +33,10 @@ from subprocess import CalledProcessError
|
||||
from typing import List, Optional
|
||||
|
||||
import __main__
|
||||
|
||||
from env_helper import TEMP_PATH
|
||||
from get_robot_token import get_best_robot_token
|
||||
from git_helper import git_runner, is_shallow
|
||||
from git_helper import GIT_PREFIX, git_runner, is_shallow
|
||||
from github_helper import GitHub, PullRequest, PullRequests, Repository
|
||||
from lambda_shared_package.lambda_shared.pr import Labels
|
||||
from ssh import SSHKey
|
||||
@ -104,10 +105,6 @@ close it.
|
||||
|
||||
self.backport_created_label = backport_created_label
|
||||
|
||||
self.git_prefix = ( # All commits to cherrypick are done as robot-clickhouse
|
||||
"git -c user.email=robot-clickhouse@users.noreply.github.com "
|
||||
"-c user.name=robot-clickhouse -c commit.gpgsign=false"
|
||||
)
|
||||
self.pre_check()
|
||||
|
||||
def pre_check(self):
|
||||
@ -190,17 +187,17 @@ close it.
|
||||
def create_cherrypick(self):
|
||||
# First, create backport branch:
|
||||
# Checkout release branch with discarding every change
|
||||
git_runner(f"{self.git_prefix} checkout -f {self.name}")
|
||||
git_runner(f"{GIT_PREFIX} checkout -f {self.name}")
|
||||
# Create or reset backport branch
|
||||
git_runner(f"{self.git_prefix} checkout -B {self.backport_branch}")
|
||||
git_runner(f"{GIT_PREFIX} checkout -B {self.backport_branch}")
|
||||
# Merge all changes from PR's the first parent commit w/o applying anything
|
||||
# It will allow to create a merge commit like it would be a cherry-pick
|
||||
first_parent = git_runner(f"git rev-parse {self.pr.merge_commit_sha}^1")
|
||||
git_runner(f"{self.git_prefix} merge -s ours --no-edit {first_parent}")
|
||||
git_runner(f"{GIT_PREFIX} merge -s ours --no-edit {first_parent}")
|
||||
|
||||
# Second step, create cherrypick branch
|
||||
git_runner(
|
||||
f"{self.git_prefix} branch -f "
|
||||
f"{GIT_PREFIX} branch -f "
|
||||
f"{self.cherrypick_branch} {self.pr.merge_commit_sha}"
|
||||
)
|
||||
|
||||
@ -209,7 +206,7 @@ close it.
|
||||
# manually to the release branch already
|
||||
try:
|
||||
output = git_runner(
|
||||
f"{self.git_prefix} merge --no-commit --no-ff {self.cherrypick_branch}"
|
||||
f"{GIT_PREFIX} merge --no-commit --no-ff {self.cherrypick_branch}"
|
||||
)
|
||||
# 'up-to-date', 'up to date', who knows what else (╯°v°)╯ ^┻━┻
|
||||
if output.startswith("Already up") and output.endswith("date."):
|
||||
@ -223,14 +220,14 @@ close it.
|
||||
return
|
||||
except CalledProcessError:
|
||||
# There are most probably conflicts, they'll be resolved in PR
|
||||
git_runner(f"{self.git_prefix} reset --merge")
|
||||
git_runner(f"{GIT_PREFIX} reset --merge")
|
||||
else:
|
||||
# There are changes to apply, so continue
|
||||
git_runner(f"{self.git_prefix} reset --merge")
|
||||
git_runner(f"{GIT_PREFIX} reset --merge")
|
||||
|
||||
# Push, create the cherrypick PR, label and assign it
|
||||
for branch in [self.cherrypick_branch, self.backport_branch]:
|
||||
git_runner(f"{self.git_prefix} push -f {self.REMOTE} {branch}:{branch}")
|
||||
git_runner(f"{GIT_PREFIX} push -f {self.REMOTE} {branch}:{branch}")
|
||||
|
||||
self.cherrypick_pr = self.repo.create_pull(
|
||||
title=f"Cherry pick #{self.pr.number} to {self.name}: {self.pr.title}",
|
||||
@ -245,6 +242,10 @@ close it.
|
||||
)
|
||||
self.cherrypick_pr.add_to_labels(Labels.PR_CHERRYPICK)
|
||||
self.cherrypick_pr.add_to_labels(Labels.DO_NOT_TEST)
|
||||
if Labels.PR_CRITICAL_BUGFIX in [label.name for label in self.pr.labels]:
|
||||
self.cherrypick_pr.add_to_labels(Labels.PR_CRITICAL_BUGFIX)
|
||||
elif Labels.PR_BUGFIX in [label.name for label in self.pr.labels]:
|
||||
self.cherrypick_pr.add_to_labels(Labels.PR_BUGFIX)
|
||||
self._assign_new_pr(self.cherrypick_pr)
|
||||
# update cherrypick PR to get the state for PR.mergable
|
||||
self.cherrypick_pr.update()
|
||||
@ -254,21 +255,19 @@ close it.
|
||||
# Checkout the backport branch from the remote and make all changes to
|
||||
# apply like they are only one cherry-pick commit on top of release
|
||||
logging.info("Creating backport for PR #%s", self.pr.number)
|
||||
git_runner(f"{self.git_prefix} checkout -f {self.backport_branch}")
|
||||
git_runner(
|
||||
f"{self.git_prefix} pull --ff-only {self.REMOTE} {self.backport_branch}"
|
||||
)
|
||||
git_runner(f"{GIT_PREFIX} checkout -f {self.backport_branch}")
|
||||
git_runner(f"{GIT_PREFIX} pull --ff-only {self.REMOTE} {self.backport_branch}")
|
||||
merge_base = git_runner(
|
||||
f"{self.git_prefix} merge-base "
|
||||
f"{GIT_PREFIX} merge-base "
|
||||
f"{self.REMOTE}/{self.name} {self.backport_branch}"
|
||||
)
|
||||
git_runner(f"{self.git_prefix} reset --soft {merge_base}")
|
||||
git_runner(f"{GIT_PREFIX} reset --soft {merge_base}")
|
||||
title = f"Backport #{self.pr.number} to {self.name}: {self.pr.title}"
|
||||
git_runner(f"{self.git_prefix} commit --allow-empty -F -", input=title)
|
||||
git_runner(f"{GIT_PREFIX} commit --allow-empty -F -", input=title)
|
||||
|
||||
# Push with force, create the backport PR, lable and assign it
|
||||
git_runner(
|
||||
f"{self.git_prefix} push -f {self.REMOTE} "
|
||||
f"{GIT_PREFIX} push -f {self.REMOTE} "
|
||||
f"{self.backport_branch}:{self.backport_branch}"
|
||||
)
|
||||
self.backport_pr = self.repo.create_pull(
|
||||
@ -280,6 +279,10 @@ close it.
|
||||
head=self.backport_branch,
|
||||
)
|
||||
self.backport_pr.add_to_labels(Labels.PR_BACKPORT)
|
||||
if Labels.PR_CRITICAL_BUGFIX in [label.name for label in self.pr.labels]:
|
||||
self.backport_pr.add_to_labels(Labels.PR_CRITICAL_BUGFIX)
|
||||
elif Labels.PR_BUGFIX in [label.name for label in self.pr.labels]:
|
||||
self.backport_pr.add_to_labels(Labels.PR_BUGFIX)
|
||||
self._assign_new_pr(self.backport_pr)
|
||||
|
||||
def ping_cherry_pick_assignees(self, dry_run: bool) -> None:
|
||||
@ -660,9 +663,11 @@ def main():
|
||||
args.repo,
|
||||
args.from_repo,
|
||||
args.dry_run,
|
||||
args.must_create_backport_label
|
||||
if isinstance(args.must_create_backport_label, list)
|
||||
else [args.must_create_backport_label],
|
||||
(
|
||||
args.must_create_backport_label
|
||||
if isinstance(args.must_create_backport_label, list)
|
||||
else [args.must_create_backport_label]
|
||||
),
|
||||
args.backport_created_label,
|
||||
)
|
||||
# https://github.com/python/mypy/issues/3004
|
||||
|
@ -1067,6 +1067,7 @@ CI_CONFIG = CIConfig(
|
||||
Build.PACKAGE_TSAN,
|
||||
Build.PACKAGE_MSAN,
|
||||
Build.PACKAGE_DEBUG,
|
||||
Build.BINARY_RELEASE,
|
||||
]
|
||||
),
|
||||
JobNames.BUILD_CHECK_SPECIAL: BuildReportConfig(
|
||||
@ -1084,7 +1085,6 @@ CI_CONFIG = CIConfig(
|
||||
Build.BINARY_AMD64_COMPAT,
|
||||
Build.BINARY_AMD64_MUSL,
|
||||
Build.PACKAGE_RELEASE_COVERAGE,
|
||||
Build.BINARY_RELEASE,
|
||||
Build.FUZZERS,
|
||||
]
|
||||
),
|
||||
@ -1386,6 +1386,9 @@ REQUIRED_CHECKS = [
|
||||
JobNames.FAST_TEST,
|
||||
JobNames.STATEFUL_TEST_RELEASE,
|
||||
JobNames.STATELESS_TEST_RELEASE,
|
||||
JobNames.STATELESS_TEST_ASAN,
|
||||
JobNames.STATELESS_TEST_FLAKY_ASAN,
|
||||
JobNames.STATEFUL_TEST_ASAN,
|
||||
JobNames.STYLE_CHECK,
|
||||
JobNames.UNIT_TEST_ASAN,
|
||||
JobNames.UNIT_TEST_MSAN,
|
||||
@ -1419,6 +1422,11 @@ class CheckDescription:
|
||||
|
||||
|
||||
CHECK_DESCRIPTIONS = [
|
||||
CheckDescription(
|
||||
StatusNames.SYNC,
|
||||
"If it fails, ask a maintainer for help",
|
||||
lambda x: x == StatusNames.SYNC,
|
||||
),
|
||||
CheckDescription(
|
||||
"AST fuzzer",
|
||||
"Runs randomly generated queries to catch program errors. "
|
||||
|
@ -1,9 +1,12 @@
|
||||
#!/usr/bin/env python
|
||||
import argparse
|
||||
import atexit
|
||||
import logging
|
||||
import os
|
||||
import os.path as p
|
||||
import re
|
||||
import subprocess
|
||||
import tempfile
|
||||
from typing import Any, List, Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@ -19,12 +22,16 @@ SHA_REGEXP = re.compile(r"\A([0-9]|[a-f]){40}\Z")
|
||||
CWD = p.dirname(p.realpath(__file__))
|
||||
TWEAK = 1
|
||||
|
||||
GIT_PREFIX = ( # All commits to remote are done as robot-clickhouse
|
||||
"git -c user.email=robot-clickhouse@users.noreply.github.com "
|
||||
"-c user.name=robot-clickhouse -c commit.gpgsign=false "
|
||||
"-c core.sshCommand="
|
||||
"'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'"
|
||||
)
|
||||
with tempfile.NamedTemporaryFile("w", delete=False) as f:
|
||||
GIT_KNOWN_HOSTS_FILE = f.name
|
||||
GIT_PREFIX = ( # All commits to remote are done as robot-clickhouse
|
||||
"git -c user.email=robot-clickhouse@users.noreply.github.com "
|
||||
"-c user.name=robot-clickhouse -c commit.gpgsign=false "
|
||||
"-c core.sshCommand="
|
||||
f"'ssh -o UserKnownHostsFile={GIT_KNOWN_HOSTS_FILE} "
|
||||
"-o StrictHostKeyChecking=accept-new'"
|
||||
)
|
||||
atexit.register(os.remove, f.name)
|
||||
|
||||
|
||||
# Py 3.8 removeprefix and removesuffix
|
||||
|
@ -50,6 +50,8 @@ TRUSTED_CONTRIBUTORS = {
|
||||
|
||||
|
||||
class Labels:
|
||||
PR_BUGFIX = "pr-bugfix"
|
||||
PR_CRITICAL_BUGFIX = "pr-critical-bugfix"
|
||||
CAN_BE_TESTED = "can be tested"
|
||||
DO_NOT_TEST = "do not test"
|
||||
MUST_BACKPORT = "pr-must-backport"
|
||||
@ -68,8 +70,8 @@ class Labels:
|
||||
RELEASE_LTS = "release-lts"
|
||||
SUBMODULE_CHANGED = "submodule changed"
|
||||
|
||||
# pr-bugfix autoport can lead to issues in releases, let's do ci fixes only
|
||||
AUTO_BACKPORT = {"pr-ci"}
|
||||
# automatic backport for critical bug fixes
|
||||
AUTO_BACKPORT = {"pr-critical-bugfix"}
|
||||
|
||||
|
||||
# Descriptions are used in .github/PULL_REQUEST_TEMPLATE.md, keep comments there
|
||||
@ -84,6 +86,7 @@ LABEL_CATEGORIES = {
|
||||
"Bug Fix (user-visible misbehaviour in official stable or prestable release)",
|
||||
"Bug Fix (user-visible misbehavior in official stable or prestable release)",
|
||||
],
|
||||
"pr-critical-bugfix": ["Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)"],
|
||||
"pr-build": [
|
||||
"Build/Testing/Packaging Improvement",
|
||||
"Build Improvement",
|
||||
|
@ -415,12 +415,12 @@ class BuildResult:
|
||||
for file in Path(REPORT_PATH).iterdir():
|
||||
if f"{build_name}.json" in file.name:
|
||||
any_report = file
|
||||
if "_master_" in file.name:
|
||||
master_report = file
|
||||
elif f"_{head_ref}_" in file.name:
|
||||
ref_report = file
|
||||
elif pr_number and f"_{pr_number}_" in file.name:
|
||||
pr_report = file
|
||||
if "_master_" in file.name:
|
||||
master_report = file
|
||||
elif f"_{head_ref}_" in file.name:
|
||||
ref_report = file
|
||||
elif pr_number and f"_{pr_number}_" in file.name:
|
||||
pr_report = file
|
||||
|
||||
if not any_report:
|
||||
return None
|
||||
|
@ -92,6 +92,13 @@
|
||||
<max_size>22548578304</max_size>
|
||||
<delayed_cleanup_interval_ms>100</delayed_cleanup_interval_ms>
|
||||
</s3_cache_multi_2>
|
||||
<s3_no_cache>
|
||||
<type>s3</type>
|
||||
<endpoint>http://localhost:11111/test/special/</endpoint>
|
||||
<access_key_id>clickhouse</access_key_id>
|
||||
<secret_access_key>clickhouse</secret_access_key>
|
||||
<s3_check_objects_after_upload>0</s3_check_objects_after_upload>
|
||||
</s3_no_cache>
|
||||
</disks>
|
||||
<policies>
|
||||
<local_remote>
|
||||
@ -107,6 +114,13 @@
|
||||
</main>
|
||||
</volumes>
|
||||
</s3_cache>
|
||||
<s3_no_cache>
|
||||
<volumes>
|
||||
<main>
|
||||
<disk>s3_no_cache</disk>
|
||||
</main>
|
||||
</volumes>
|
||||
</s3_no_cache>
|
||||
<s3_cache_multi>
|
||||
<volumes>
|
||||
<main>
|
||||
|
@ -513,6 +513,7 @@ class ClickHouseCluster:
|
||||
self.minio_redirect_host = "proxy1"
|
||||
self.minio_redirect_ip = None
|
||||
self.minio_redirect_port = 8080
|
||||
self.minio_docker_id = self.get_instance_docker_id(self.minio_host)
|
||||
|
||||
self.spark_session = None
|
||||
|
||||
|
@ -183,6 +183,9 @@ class _ServerRuntime:
            )
            request_handler.write_error(429, data)

    # make sure that Alibaba errors (QpsLimitExceeded, TotalQpsLimitExceededAction) are retriable
    # we patched contrib/aws to achive it: https://github.com/ClickHouse/aws-sdk-cpp/pull/22 https://github.com/ClickHouse/aws-sdk-cpp/pull/23
    # https://www.alibabacloud.com/help/en/oss/support/http-status-code-503
    class QpsLimitExceededAction:
        def inject_error(self, request_handler):
            data = (
@ -195,6 +198,18 @@ class _ServerRuntime:
            )
            request_handler.write_error(429, data)

    class TotalQpsLimitExceededAction:
        def inject_error(self, request_handler):
            data = (
                '<?xml version="1.0" encoding="UTF-8"?>'
                "<Error>"
                "<Code>TotalQpsLimitExceeded</Code>"
                "<Message>Please reduce your request rate.</Message>"
                "<RequestId>txfbd566d03042474888193-00608d7537</RequestId>"
                "</Error>"
            )
            request_handler.write_error(429, data)

    class RedirectAction:
        def __init__(self, host="localhost", port=1):
            self.dst_host = _and_then(host, str)
@ -269,6 +284,10 @@ class _ServerRuntime:
                self.error_handler = _ServerRuntime.QpsLimitExceededAction(
                    *self.action_args
                )
            elif self.action == "total_qps_limit_exceeded":
                self.error_handler = _ServerRuntime.TotalQpsLimitExceededAction(
                    *self.action_args
                )
            else:
                self.error_handler = _ServerRuntime.Expected500ErrorAction()

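The mock answers with HTTP 429 and an Alibaba-style TotalQpsLimitExceeded body so the client-side retry logic can be exercised. A standalone sketch of the kind of retry predicate being tested; the error-code list is illustrative and is not the patched SDK code:

    RETRIABLE_S3_ERRORS = {"SlowDown", "QpsLimitExceeded", "TotalQpsLimitExceeded"}

    def should_retry(http_status: int, error_code: str) -> bool:
        # Illustrative predicate: throttling responses (HTTP 429/503) carrying
        # one of the known rate-limit error codes are transient and retried.
        return http_status in (429, 503) and error_code in RETRIABLE_S3_ERRORS

    assert should_retry(429, "TotalQpsLimitExceeded")
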
@ -0,0 +1,22 @@
<?xml version="1.0"?>
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3_restricted_user>
                <type>s3</type>
                <endpoint>http://minio1:9001/root/data/disks/disk_s3_restricted_user/</endpoint>
                <access_key_id>miniorestricted1</access_key_id>
                <secret_access_key>minio123</secret_access_key>
            </disk_s3_restricted_user>
        </disks>
        <policies>
            <policy_s3_restricted>
                <volumes>
                    <main>
                        <disk>disk_s3_restricted_user</disk>
                    </main>
                </volumes>
            </policy_s3_restricted>
        </policies>
    </storage_configuration>
</clickhouse>

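A table can be placed on the restricted disk by referencing the policy by name, and the disk also becomes visible in system.disks. A minimal sketch following the pattern of the other integration tests; table name and schema are illustrative:

    node.query(
        "CREATE TABLE t_restricted (id UInt64) ENGINE = MergeTree ORDER BY id "
        "SETTINGS storage_policy = 'policy_s3_restricted'"
    )
    print(node.query("SELECT name, type FROM system.disks WHERE name = 'disk_s3_restricted_user'"))
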
@ -3,8 +3,11 @@ import pytest
from helpers.cluster import ClickHouseCluster
from helpers.test_tools import TSV
import uuid
import os


CONFIG_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "configs")

cluster = ClickHouseCluster(__file__)
node = cluster.add_instance(
    "node",
@ -20,13 +23,127 @@ node = cluster.add_instance(
    ],
    with_minio=True,
    with_zookeeper=True,
    stay_alive=True,
)


def setup_minio_users():
    # create 2 extra users with restricted access
    # miniorestricted1 - full access to bucket 'root', no access to other buckets
    # miniorestricted2 - full access to bucket 'root2', no access to other buckets
    # storage policy 'policy_s3_restricted' defines a policy for storing files inside bucket 'root' using 'miniorestricted1' user
    for user, bucket in [("miniorestricted1", "root"), ("miniorestricted2", "root2")]:
        print(
            cluster.exec_in_container(
                cluster.minio_docker_id,
                [
                    "mc",
                    "alias",
                    "set",
                    "root",
                    "http://minio1:9001",
                    "minio",
                    "minio123",
                ],
            )
        )
        policy = f"""
{{
    "Version": "2012-10-17",
    "Statement": [
        {{
            "Effect": "Allow",
            "Principal": {{
                "AWS": [
                    "*"
                ]
            }},
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": [
                "arn:aws:s3:::{bucket}"
            ]
        }},
        {{
            "Effect": "Allow",
            "Principal": {{
                "AWS": [
                    "*"
                ]
            }},
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::{bucket}/*"
            ]
        }}
    ]
}}"""

        cluster.exec_in_container(
            cluster.minio_docker_id,
            ["bash", "-c", f"cat >/tmp/{bucket}_policy.json <<EOL{policy}"],
        )
        cluster.exec_in_container(
            cluster.minio_docker_id, ["cat", f"/tmp/{bucket}_policy.json"]
        )
        print(
            cluster.exec_in_container(
                cluster.minio_docker_id,
                ["mc", "admin", "user", "add", "root", user, "minio123"],
            )
        )
        print(
            cluster.exec_in_container(
                cluster.minio_docker_id,
                [
                    "mc",
                    "admin",
                    "policy",
                    "create",
                    "root",
                    f"{bucket}only",
                    f"/tmp/{bucket}_policy.json",
                ],
            )
        )
        print(
            cluster.exec_in_container(
                cluster.minio_docker_id,
                [
                    "mc",
                    "admin",
                    "policy",
                    "attach",
                    "root",
                    f"{bucket}only",
                    "--user",
                    user,
                ],
            )
        )

    node.stop_clickhouse()
    node.copy_file_to_container(
        os.path.join(CONFIG_DIR, "disk_s3_restricted_user.xml"),
        "/etc/clickhouse-server/config.d/disk_s3_restricted_user.xml",
    )
    node.start_clickhouse()


@pytest.fixture(scope="module", autouse=True)
def start_cluster():
    try:
        cluster.start()
        setup_minio_users()
        yield
    finally:
        cluster.shutdown()
@ -137,6 +254,7 @@ def check_system_tables(backup_query_id=None):
        ("disk_s3_cache", "ObjectStorage", "S3", "Local"),
        ("disk_s3_other_bucket", "ObjectStorage", "S3", "Local"),
        ("disk_s3_plain", "ObjectStorage", "S3", "Plain"),
        ("disk_s3_restricted_user", "ObjectStorage", "S3", "Local"),
    )
    assert len(expected_disks) == len(disks)
    for expected_disk in expected_disks:
@ -588,3 +706,22 @@ def test_user_specific_auth(start_cluster):
    )

    node.query("DROP TABLE IF EXISTS test.specific_auth")


def test_backup_to_s3_different_credentials():
    storage_policy = "policy_s3_restricted"

    def check_backup_restore(allow_s3_native_copy):
        backup_name = new_backup_name()
        backup_destination = f"S3('http://minio1:9001/root2/data/backups/{backup_name}', 'miniorestricted2', 'minio123')"
        settings = {"allow_s3_native_copy": allow_s3_native_copy}
        (backup_events, _) = check_backup_and_restore(
            storage_policy,
            backup_destination,
            backup_settings=settings,
            restore_settings=settings,
        )
        check_system_tables(backup_events["query_id"])

    check_backup_restore(False)
    check_backup_restore(True)

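The backup destination string in the new test is the same syntax a user would pass to BACKUP/RESTORE when the backup bucket needs different credentials from the storage disk. A minimal sketch of the equivalent SQL, with an illustrative table name and backup path:

    node.query(
        "BACKUP TABLE data TO S3('http://minio1:9001/root2/data/backups/b1', "
        "'miniorestricted2', 'minio123') SETTINGS allow_s3_native_copy = 0"
    )
    node.query(
        "RESTORE TABLE data AS data_restored FROM S3('http://minio1:9001/root2/data/backups/b1', "
        "'miniorestricted2', 'minio123')"
    )
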
@ -205,6 +205,7 @@ def test_upload_s3_fail_upload_part_when_multi_part_upload(
    [
        ("slow_down", "DB::Exception: Slow Down."),
        ("qps_limit_exceeded", "DB::Exception: Please reduce your request rate."),
        ("total_qps_limit_exceeded", "DB::Exception: Please reduce your request rate."),
        (
            "connection_refused",
            "Poco::Exception. Code: 1000, e.code() = 111, Connection refused",

@ -90,7 +90,7 @@ def test_lost_part_same_replica(start_cluster):
    )

    assert node1.contains_in_log(
        "Created empty part"
        f"Created empty part {victim_part_from_the_middle}"
    ), f"Seems like empty part {victim_part_from_the_middle} is not created or log message changed"

    assert node1.query("SELECT COUNT() FROM mt0") == "4\n"
@ -143,7 +143,10 @@ def test_lost_part_other_replica(start_cluster):
    node1.query("CHECK TABLE mt1")

    node2.query("SYSTEM START REPLICATION QUEUES")
    res, err = node1.query_and_get_answer_with_error("SYSTEM SYNC REPLICA mt1")
    # Reduce timeout in sync replica since it might never finish with merge stopped and we don't want to wait 300s
    res, err = node1.query_and_get_answer_with_error(
        "SYSTEM SYNC REPLICA mt1", settings={"receive_timeout": 30}
    )
    print("result: ", res)
    print("error: ", res)

@ -158,10 +161,10 @@ def test_lost_part_other_replica(start_cluster):
    )

    assert node1.contains_in_log(
        "Created empty part"
    ), "Seems like empty part {} is not created or log message changed".format(
        victim_part_from_the_middle
    )
        f"Created empty part {victim_part_from_the_middle}"
    ) or node1.contains_in_log(
        f"Part {victim_part_from_the_middle} looks broken. Removing it and will try to fetch."
    ), f"Seems like empty part {victim_part_from_the_middle} is not created or log message changed"

    assert_eq_with_retry(node2, "SELECT COUNT() FROM mt1", "4")
    assert_eq_with_retry(node2, "SELECT COUNT() FROM system.replication_queue", "0")

@ -9,20 +9,10 @@ cluster = ClickHouseCluster(__file__)

NUM_WORKERS = 5

nodes = []
for i in range(NUM_WORKERS):
    name = "node{}".format(i + 1)
    node = cluster.add_instance(
        name,
        main_configs=["configs/storage_conf.xml"],
        env_variables={"ENDPOINT_SUBPATH": name},
        with_minio=True,
        stay_alive=True,
    )
    nodes.append(node)

MAX_ROWS = 1000

dirs_created = []


def gen_insert_values(size):
    return ",".join(
@ -38,6 +28,17 @@ insert_values = ",".join(

@pytest.fixture(scope="module", autouse=True)
def start_cluster():
    for i in range(NUM_WORKERS):
        cluster.add_instance(
            f"node{i + 1}",
            main_configs=["configs/storage_conf.xml"],
            with_minio=True,
            env_variables={"ENDPOINT_SUBPATH": f"node{i + 1}"},
            stay_alive=True,
            # Override ENDPOINT_SUBPATH.
            instance_env_variables=i > 0,
        )

    try:
        cluster.start()
        yield cluster
@ -64,10 +65,10 @@ def test_insert():
        gen_insert_values(random.randint(1, MAX_ROWS)) for _ in range(0, NUM_WORKERS)
    ]
    threads = []
    assert len(cluster.instances) == NUM_WORKERS
    for i in range(NUM_WORKERS):
        t = threading.Thread(
            target=create_insert, args=(nodes[i], insert_values_arr[i])
        )
        node = cluster.instances[f"node{i + 1}"]
        t = threading.Thread(target=create_insert, args=(node, insert_values_arr[i]))
        threads.append(t)
        t.start()

@ -75,48 +76,61 @@ def test_insert():
        t.join()

    for i in range(NUM_WORKERS):
        node = cluster.instances[f"node{i + 1}"]
        assert (
            nodes[i].query("SELECT * FROM test ORDER BY id FORMAT Values")
            node.query("SELECT * FROM test ORDER BY id FORMAT Values")
            == insert_values_arr[i]
        )

    for i in range(NUM_WORKERS):
        nodes[i].query("ALTER TABLE test MODIFY SETTING old_parts_lifetime = 59")
        node = cluster.instances[f"node{i + 1}"]
        node.query("ALTER TABLE test MODIFY SETTING old_parts_lifetime = 59")
        assert (
            nodes[i]
            .query(
            node.query(
                "SELECT engine_full from system.tables WHERE database = currentDatabase() AND name = 'test'"
            )
            .find("old_parts_lifetime = 59")
            ).find("old_parts_lifetime = 59")
            != -1
        )

        nodes[i].query("ALTER TABLE test RESET SETTING old_parts_lifetime")
        node.query("ALTER TABLE test RESET SETTING old_parts_lifetime")
        assert (
            nodes[i]
            .query(
            node.query(
                "SELECT engine_full from system.tables WHERE database = currentDatabase() AND name = 'test'"
            )
            .find("old_parts_lifetime")
            ).find("old_parts_lifetime")
            == -1
        )
        nodes[i].query("ALTER TABLE test MODIFY COMMENT 'new description'")
        node.query("ALTER TABLE test MODIFY COMMENT 'new description'")
        assert (
            nodes[i]
            .query(
            node.query(
                "SELECT comment from system.tables WHERE database = currentDatabase() AND name = 'test'"
            )
            .find("new description")
            ).find("new description")
            != -1
        )

        created = int(
            node.query(
                "SELECT value FROM system.events WHERE event = 'DiskPlainRewritableS3DirectoryCreated'"
            )
        )
        assert created > 0
        dirs_created.append(created)
        assert (
            int(
                node.query(
                    "SELECT value FROM system.metrics WHERE metric = 'DiskPlainRewritableS3DirectoryMapSize'"
                )
            )
            == created
        )


@pytest.mark.order(1)
def test_restart():
    insert_values_arr = []
    for i in range(NUM_WORKERS):
        node = cluster.instances[f"node{i + 1}"]
        insert_values_arr.append(
            nodes[i].query("SELECT * FROM test ORDER BY id FORMAT Values")
            node.query("SELECT * FROM test ORDER BY id FORMAT Values")
        )

    def restart(node):
@ -124,7 +138,7 @@ def test_restart():

    threads = []
    for i in range(NUM_WORKERS):
        t = threading.Thread(target=restart, args=(nodes[i],))
        t = threading.Thread(target=restart, args=(node,))
        threads.append(t)
        t.start()

@ -132,8 +146,9 @@ def test_restart():
        t.join()

    for i in range(NUM_WORKERS):
        node = cluster.instances[f"node{i + 1}"]
        assert (
            nodes[i].query("SELECT * FROM test ORDER BY id FORMAT Values")
            node.query("SELECT * FROM test ORDER BY id FORMAT Values")
            == insert_values_arr[i]
        )

@ -141,7 +156,16 @@ def test_restart():
@pytest.mark.order(2)
def test_drop():
    for i in range(NUM_WORKERS):
        nodes[i].query("DROP TABLE IF EXISTS test SYNC")
        node = cluster.instances[f"node{i + 1}"]
        node.query("DROP TABLE IF EXISTS test SYNC")

        removed = int(
            node.query(
                "SELECT value FROM system.events WHERE event = 'DiskPlainRewritableS3DirectoryRemoved'"
            )
        )

        assert dirs_created[i] == removed

    it = cluster.minio_client.list_objects(
        cluster.minio_bucket, "data/", recursive=True

42
tests/performance/sparse_column_filter.xml
Normal file
@ -0,0 +1,42 @@
<test>
    <substitutions>
        <substitution>
            <name>serialization</name>
            <values>
                <value>sparse</value>
            </values>
        </substitution>
        <substitution>
            <name>ratio</name>
            <values>
                <value>10</value>
                <value>100</value>
                <value>1000</value>
            </values>
        </substitution>
    </substitutions>

    <create_query>
        CREATE TABLE test_{serialization}_{ratio} (id UInt64, u8 UInt8, u64 UInt64, str String)
        ENGINE = MergeTree ORDER BY id
        SETTINGS ratio_of_defaults_for_sparse_serialization = 0.8
    </create_query>

    <create_query>SYSTEM STOP MERGES test_{serialization}_{ratio}</create_query>

    <fill_query>
        INSERT INTO test_{serialization}_{ratio} SELECT
            number,
            number % {ratio} = 0 ? rand(1) : 0,
            number % {ratio} = 0 ? rand(2) : 0,
            number % {ratio} = 0 ? randomPrintableASCII(64, 3) : ''
        FROM numbers(100000000)
    </fill_query>

    <query>SELECT str, COUNT(DISTINCT id) as i FROM test_{serialization}_{ratio} WHERE notEmpty(str) GROUP BY str ORDER BY i DESC LIMIT 10</query>
    <query>SELECT str, COUNT(DISTINCT u8) as u FROM test_{serialization}_{ratio} WHERE notEmpty(str) GROUP BY str ORDER BY u DESC LIMIT 10</query>
    <query>SELECT str, COUNT(DISTINCT u64) as u FROM test_{serialization}_{ratio} WHERE notEmpty(str) GROUP BY str ORDER BY u DESC LIMIT 10</query>

    <drop_query>DROP TABLE IF EXISTS test_{serialization}_{ratio}</drop_query>
</test>

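The WHERE notEmpty(str) filter in these queries is what exercises filtering over a sparsely serialized column. One way to confirm that the inserted data is actually stored sparsely is to inspect system.parts_columns; a minimal sketch, assuming the ratio=10 table from the test and that the serialization_kind column is available:

    rows = node.query(
        """
        SELECT column, serialization_kind
        FROM system.parts_columns
        WHERE table = 'test_sparse_10' AND active AND column = 'str'
        """
    )
    print(rows)  # expected to report 'Sparse' for the mostly-empty column
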
@ -1,5 +1,5 @@
-- Disable external aggregation because the state is reset for each new block of data in 'runningAccumulate' function.
SET max_bytes_before_external_group_by = 0;
SET allow_deprecated_functions = 1;
SET allow_deprecated_error_prone_window_functions = 1;

SELECT k, finalizeAggregation(sum_state), runningAccumulate(sum_state) FROM (SELECT intDiv(number, 50000) AS k, sumState(number) AS sum_state FROM (SELECT number FROM system.numbers LIMIT 1000000) GROUP BY k ORDER BY k);

@ -1,4 +1,4 @@
SET allow_deprecated_functions = 1;
SET allow_deprecated_error_prone_window_functions = 1;
DROP TABLE IF EXISTS arena;
CREATE TABLE arena (k UInt8, d String) ENGINE = Memory;
INSERT INTO arena SELECT number % 10 AS k, hex(intDiv(number, 10) % 1000) AS d FROM system.numbers LIMIT 10000000;

@ -1,4 +1,4 @@
SET allow_deprecated_functions = 1;
SET allow_deprecated_error_prone_window_functions = 1;
select runningDifference(x) from (select arrayJoin([0, 1, 5, 10]) as x);
select '-';
select runningDifference(x) from (select arrayJoin([2, Null, 3, Null, 10]) as x);

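These tests switch from the old allow_deprecated_functions setting to the renamed allow_deprecated_error_prone_window_functions. Outside of tests, the deprecated functions can usually be rewritten with proper window functions; a hedged sketch of the runningDifference case (the exact rewrite depends on the desired ordering, and the column values here are illustrative):

    # runningDifference(x) over an ordered sequence can be expressed with the
    # non-deprecated window function lagInFrame:
    node.query(
        """
        SELECT x,
               x - lagInFrame(x) OVER (ORDER BY x
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS diff
        FROM (SELECT arrayJoin([0, 1, 5, 10]) AS x)
        """
    )
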
@ -11,7 +11,11 @@ ${CLICKHOUSE_CLIENT} --query "DROP DATABASE IF EXISTS parallel_ddl"

function query()
{
    for _ in {1..50}; do
    local it=0
    TIMELIMIT=30
    while [ $SECONDS -lt "$TIMELIMIT" ] && [ $it -lt 50 ];
    do
        it=$((it+1))
        ${CLICKHOUSE_CLIENT} --query "CREATE DATABASE IF NOT EXISTS parallel_ddl"
        ${CLICKHOUSE_CLIENT} --query "DROP DATABASE IF EXISTS parallel_ddl"
    done

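The change bounds the stress loop both by iteration count and by wall-clock time, so the test cannot run away when the server is slow. The same dual bound expressed in Python, for illustration only:

    import time

    def run_bounded(action, time_limit_s=30, max_iterations=50):
        # Illustrative only: stop on whichever limit is reached first.
        deadline = time.monotonic() + time_limit_s
        iterations = 0
        while time.monotonic() < deadline and iterations < max_iterations:
            iterations += 1
            action()
        return iterations
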
Some files were not shown because too many files have changed in this diff.