mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-22 07:31:57 +00:00
Merge branch 'master' into add-test-658
This commit is contained in:
commit
445487e073
@ -1,7 +1,7 @@
|
||||
# To run clang-tidy from CMake, build ClickHouse with -DENABLE_CLANG_TIDY=1. To show all warnings, it is
|
||||
# recommended to pass "-k0" to Ninja.
|
||||
|
||||
# Enable all checks + disale selected checks. Feel free to remove disabled checks from below list if
|
||||
# Enable all checks + disable selected checks. Feel free to remove disabled checks from below list if
|
||||
# a) the new check is not controversial (this includes many checks in readability-* and google-*) or
|
||||
# b) too noisy (checks with > 100 new warnings are considered noisy, this includes e.g. cppcoreguidelines-*).
|
||||
|
||||
|
51
CHANGELOG.md
51
CHANGELOG.md
@ -1,6 +1,6 @@
|
||||
### Table of Contents
|
||||
**[ClickHouse release v22.9, 2022-09-22](#229)**<br/>
|
||||
**[ClickHouse release v22.8, 2022-08-18](#228)**<br/>
|
||||
**[ClickHouse release v22.8-lts, 2022-08-18](#228)**<br/>
|
||||
**[ClickHouse release v22.7, 2022-07-21](#227)**<br/>
|
||||
**[ClickHouse release v22.6, 2022-06-16](#226)**<br/>
|
||||
**[ClickHouse release v22.5, 2022-05-19](#225)**<br/>
|
||||
@ -10,10 +10,10 @@
|
||||
**[ClickHouse release v22.1, 2022-01-18](#221)**<br/>
|
||||
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**<br/>
|
||||
|
||||
|
||||
### <a id="229"></a> ClickHouse release 22.9, 2022-09-22
|
||||
|
||||
#### Backward Incompatible Change
|
||||
|
||||
* Upgrade from 20.3 and older to 22.9 and newer should be done through an intermediate version if there are any `ReplicatedMergeTree` tables, otherwise server with the new version will not start. [#40641](https://github.com/ClickHouse/ClickHouse/pull/40641) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Remove the functions `accurate_Cast` and `accurate_CastOrNull` (they are different to `accurateCast` and `accurateCastOrNull` by underscore in the name and they are not affected by the value of `cast_keep_nullable` setting). These functions were undocumented, untested, unused, and unneeded. They appeared to be alive due to code generalization. [#40682](https://github.com/ClickHouse/ClickHouse/pull/40682) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Add a test to ensure that every new table function will be documented. See [#40649](https://github.com/ClickHouse/ClickHouse/issues/40649). Rename table function `MeiliSearch` to `meilisearch`. [#40709](https://github.com/ClickHouse/ClickHouse/pull/40709) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
@ -21,6 +21,7 @@
|
||||
* Make interpretation of YAML configs to be more conventional. [#41044](https://github.com/ClickHouse/ClickHouse/pull/41044) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
|
||||
#### New Feature
|
||||
|
||||
* Support `insert_quorum = 'auto'` to use majority number. [#39970](https://github.com/ClickHouse/ClickHouse/pull/39970) ([Sachin](https://github.com/SachinSetiya)).
|
||||
* Add embedded dashboards to ClickHouse server. This is a demo project about how to achieve 90% results with 1% effort using ClickHouse features. [#40461](https://github.com/ClickHouse/ClickHouse/pull/40461) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Added new settings constraint writability kind `changeable_in_readonly`. [#40631](https://github.com/ClickHouse/ClickHouse/pull/40631) ([Sergei Trifonov](https://github.com/serxa)).
|
||||
@ -38,6 +39,7 @@
|
||||
* Improvement for in-memory data parts: remove completely processed WAL files. [#40592](https://github.com/ClickHouse/ClickHouse/pull/40592) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
#### Performance Improvement
|
||||
|
||||
* Implement compression of marks and primary key. Close [#34437](https://github.com/ClickHouse/ClickHouse/issues/34437). [#37693](https://github.com/ClickHouse/ClickHouse/pull/37693) ([zhongyuankai](https://github.com/zhongyuankai)).
|
||||
* Allow to load marks with threadpool in advance. Regulated by setting `load_marks_asynchronously` (default: 0). [#40821](https://github.com/ClickHouse/ClickHouse/pull/40821) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Virtual filesystem over s3 will use random object names split into multiple path prefixes for better performance on AWS. [#40968](https://github.com/ClickHouse/ClickHouse/pull/40968) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
@ -58,6 +60,7 @@
|
||||
* Parallel hash JOIN for Float data types might be suboptimal. Make it better. [#41183](https://github.com/ClickHouse/ClickHouse/pull/41183) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
* During startup and ATTACH call, `ReplicatedMergeTree` tables will be readonly until the ZooKeeper connection is made and the setup is finished. [#40148](https://github.com/ClickHouse/ClickHouse/pull/40148) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Add `enable_extended_results_for_datetime_functions` option to return results of type Date32 for functions toStartOfYear, toStartOfISOYear, toStartOfQuarter, toStartOfMonth, toStartOfWeek, toMonday and toLastDayOfMonth when argument is Date32 or DateTime64, otherwise results of Date type are returned. For compatibility reasons default value is ‘0’. [#41214](https://github.com/ClickHouse/ClickHouse/pull/41214) ([Roman Vasin](https://github.com/rvasin)).
|
||||
* For security and stability reasons, CatBoost models are no longer evaluated within the ClickHouse server. Instead, the evaluation is now done in the clickhouse-library-bridge, a separate process that loads the catboost library and communicates with the server process via HTTP. [#40897](https://github.com/ClickHouse/ClickHouse/pull/40897) ([Robert Schulze](https://github.com/rschu1ze)). [#39629](https://github.com/ClickHouse/ClickHouse/pull/39629) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
@ -108,6 +111,7 @@
|
||||
* Add `has_lightweight_delete` to system.parts. [#41564](https://github.com/ClickHouse/ClickHouse/pull/41564) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Enforce documentation for every setting. [#40644](https://github.com/ClickHouse/ClickHouse/pull/40644) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Enforce documentation for every current metric. [#40645](https://github.com/ClickHouse/ClickHouse/pull/40645) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Enforce documentation for every profile event counter. Write the documentation where it was missing. [#40646](https://github.com/ClickHouse/ClickHouse/pull/40646) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
@ -217,15 +221,16 @@
|
||||
* Fix read bytes/rows in X-ClickHouse-Summary with materialized views. [#41586](https://github.com/ClickHouse/ClickHouse/pull/41586) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Fix possible `pipeline stuck` exception for queries with `OFFSET`. The error was found with `enable_optimize_predicate_expression = 0` and always false condition in `WHERE`. Fixes [#41383](https://github.com/ClickHouse/ClickHouse/issues/41383). [#41588](https://github.com/ClickHouse/ClickHouse/pull/41588) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
|
||||
### <a id="228"></a> ClickHouse release 22.8, 2022-08-18
|
||||
### <a id="228"></a> ClickHouse release 22.8-lts, 2022-08-18
|
||||
|
||||
#### Backward Incompatible Change
|
||||
|
||||
* Extended range of `Date32` and `DateTime64` to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation is using the proleptic Gregorian calendar (which is conformant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601):2004 (clause 3.2.1 The Gregorian calendar)) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments. E.g. if in previous versions the value of `1899-01-01` was clamped to `1925-01-01`, in the new version it will be clamped to `1900-01-01`. It changes the behavior of rounding with `toStartOfInterval` if you pass `INTERVAL 3 QUARTER` up to one quarter because the intervals are counted from an implementation-specific point of time. Closes [#28216](https://github.com/ClickHouse/ClickHouse/issues/28216), improves [#38393](https://github.com/ClickHouse/ClickHouse/issues/38393). [#39425](https://github.com/ClickHouse/ClickHouse/pull/39425) ([Roman Vasin](https://github.com/rvasin)).
|
||||
* Now, all relevant dictionary sources respect `remote_url_allow_hosts` setting. It was already done for HTTP, Cassandra, Redis. Added ClickHouse, MongoDB, MySQL, PostgreSQL. Host is checked only for dictionaries created from DDL. [#39184](https://github.com/ClickHouse/ClickHouse/pull/39184) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Make the remote filesystem cache composable, allow not to evict certain files (regarding idx, mrk, ..), delete old cache version. Now it is possible to configure cache over Azure blob storage disk, over Local disk, over StaticWeb disk, etc. This PR is marked backward incompatible because cache configuration changes and in order for cache to work need to update the config file. Old cache will still be used with new configuration. The server will startup fine with the old cache configuration. Closes https://github.com/ClickHouse/ClickHouse/issues/36140. Closes https://github.com/ClickHouse/ClickHouse/issues/37889. ([Kseniia Sumarokova](https://github.com/kssenii)). [#36171](https://github.com/ClickHouse/ClickHouse/pull/36171))
|
||||
|
||||
#### New Feature
|
||||
|
||||
* Query parameters can be set in interactive mode as `SET param_abc = 'def'` and transferred via the native protocol as settings. [#39906](https://github.com/ClickHouse/ClickHouse/pull/39906) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Quota key can be set in the native protocol ([Yakov Olkhovsky](https://github.com/ClickHouse/ClickHouse/pull/39874)).
|
||||
* Added a setting `exact_rows_before_limit` (0/1). When enabled, ClickHouse will provide exact value for `rows_before_limit_at_least` statistic, but with the cost that the data before limit will have to be read completely. This closes [#6613](https://github.com/ClickHouse/ClickHouse/issues/6613). [#25333](https://github.com/ClickHouse/ClickHouse/pull/25333) ([kevin wan](https://github.com/MaxWk)).
|
||||
@ -240,12 +245,14 @@
|
||||
* Add new setting schema_inference_hints that allows to specify structure hints in schema inference for specific columns. Closes [#39569](https://github.com/ClickHouse/ClickHouse/issues/39569). [#40068](https://github.com/ClickHouse/ClickHouse/pull/40068) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
|
||||
#### Experimental Feature
|
||||
|
||||
* Support SQL standard DELETE FROM syntax on merge tree tables and lightweight delete implementation for merge tree families. [#37893](https://github.com/ClickHouse/ClickHouse/pull/37893) ([Jianmei Zhang](https://github.com/zhangjmruc)) ([Alexander Gololobov](https://github.com/davenger)). Note: this new feature does not make ClickHouse an HTAP DBMS.
|
||||
|
||||
#### Performance Improvement
|
||||
|
||||
* Improved memory usage during memory efficient merging of aggregation results. [#39429](https://github.com/ClickHouse/ClickHouse/pull/39429) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Added concurrency control logic to limit total number of concurrent threads created by queries. [#37558](https://github.com/ClickHouse/ClickHouse/pull/37558) ([Sergei Trifonov](https://github.com/serxa)). Add `concurrent_threads_soft_limit parameter` to increase performance in case of high QPS by means of limiting total number of threads for all queries. [#37285](https://github.com/ClickHouse/ClickHouse/pull/37285) ([Roman Vasin](https://github.com/rvasin)).
|
||||
* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)). Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
|
||||
* Add `SLRU` cache policy for uncompressed cache and marks cache. ([Kseniia Sumarokova](https://github.com/kssenii)). [#34651](https://github.com/ClickHouse/ClickHouse/pull/34651) ([alexX512](https://github.com/alexX512)). Decoupling local cache function and cache algorithm [#38048](https://github.com/ClickHouse/ClickHouse/pull/38048) ([Han Shukai](https://github.com/KinderRiven)).
|
||||
* Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common operations in analytics like data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec which utilizes the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the [Intel® Query Processing Library (QPL)](https://github.com/intel/qpl) which abstracts access to the hardware accelerator, respectively to a software fallback in case the hardware accelerator is not available. DEFLATE provides in general higher compression rates than ClickHouse's LZ4 default codec, and as a result, offers less disk I/O and lower main memory consumption. [#36654](https://github.com/ClickHouse/ClickHouse/pull/36654) ([jasperzhu](https://github.com/jinjunzh)). [#39494](https://github.com/ClickHouse/ClickHouse/pull/39494) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* `DISTINCT` in order with `ORDER BY`: Deduce way to sort based on input stream sort description. Skip sorting if input stream is already sorted. [#38719](https://github.com/ClickHouse/ClickHouse/pull/38719) ([Igor Nikonov](https://github.com/devcrafter)). Improve memory usage (significantly) and query execution time + use `DistinctSortedChunkTransform` for final distinct when `DISTINCT` columns match `ORDER BY` columns, but rename to `DistinctSortedStreamTransform` in `EXPLAIN PIPELINE` → this improves memory usage significantly + remove unnecessary allocations in hot loop in `DistinctSortedChunkTransform`. [#39432](https://github.com/ClickHouse/ClickHouse/pull/39432) ([Igor Nikonov](https://github.com/devcrafter)). Use `DistinctSortedTransform` only when sort description is applicable to DISTINCT columns, otherwise fall back to ordinary DISTINCT implementation + it allows making less checks during `DistinctSortedTransform` execution. [#39528](https://github.com/ClickHouse/ClickHouse/pull/39528) ([Igor Nikonov](https://github.com/devcrafter)). Fix: `DistinctSortedTransform` didn't take advantage of sorting. It never cleared HashSet since clearing_columns were detected incorrectly (always empty). So, it basically worked as ordinary `DISTINCT` (`DistinctTransform`). The fix reduces memory usage significantly. [#39538](https://github.com/ClickHouse/ClickHouse/pull/39538) ([Igor Nikonov](https://github.com/devcrafter)).
|
||||
* Use local node as first priority to get structure of remote table when executing `cluster` and similar table functions. [#39440](https://github.com/ClickHouse/ClickHouse/pull/39440) ([Mingliang Pan](https://github.com/liangliangpan)).
|
||||
@ -256,6 +263,7 @@
|
||||
* Improve bytes to bits mask transform for SSE/AVX/AVX512. [#39586](https://github.com/ClickHouse/ClickHouse/pull/39586) ([Guo Wangyang](https://github.com/guowangy)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
* Normalize `AggregateFunction` types and state representations because optimizations like [#35788](https://github.com/ClickHouse/ClickHouse/pull/35788) will treat `count(not null columns)` as `count()`, which might confuses distributed interpreters with the following error : `Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported`. [#39420](https://github.com/ClickHouse/ClickHouse/pull/39420) ([Amos Bird](https://github.com/amosbird)). The functions with identical states can be used in materialized views interchangeably.
|
||||
* Rework and simplify the `system.backups` table, remove the `internal` column, allow user to set the ID of operation, add columns `num_files`, `uncompressed_size`, `compressed_size`, `start_time`, `end_time`. [#39503](https://github.com/ClickHouse/ClickHouse/pull/39503) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
* Improved structure of DDL query result table for `Replicated` database (separate columns with shard and replica name, more clear status) - `CREATE TABLE ... ON CLUSTER` queries can be normalized on initiator first if `distributed_ddl_entry_format_version` is set to 3 (default value). It means that `ON CLUSTER` queries may not work if initiator does not belong to the cluster that specified in query. Fixes [#37318](https://github.com/ClickHouse/ClickHouse/issues/37318), [#39500](https://github.com/ClickHouse/ClickHouse/issues/39500) - Ignore `ON CLUSTER` clause if database is `Replicated` and cluster name equals to database name. Related to [#35570](https://github.com/ClickHouse/ClickHouse/issues/35570) - Miscellaneous minor fixes for `Replicated` database engine - Check metadata consistency when starting up `Replicated` database, start replica recovery in case of mismatch of local metadata and metadata in Keeper. Resolves [#24880](https://github.com/ClickHouse/ClickHouse/issues/24880). [#37198](https://github.com/ClickHouse/ClickHouse/pull/37198) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
@ -294,6 +302,7 @@
|
||||
* Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes [#32401](https://github.com/ClickHouse/ClickHouse/issues/32401)). [#40293](https://github.com/ClickHouse/ClickHouse/pull/40293) ([Josh Taylor](https://github.com/joshuataylor)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* [ClickFiddle](https://fiddle.clickhouse.com/): A new tool for testing ClickHouse versions in read/write mode (**Igor Baliuk**).
|
||||
* ClickHouse binary is made self-extracting [#35775](https://github.com/ClickHouse/ClickHouse/pull/35775) ([Yakov Olkhovskiy, Arthur Filatenkov](https://github.com/yakov-olkhovskiy)).
|
||||
* Update tzdata to 2022b to support the new timezone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently, after it falls back on 2022-09-21. There are corrections of the historical time zone of Asia/Tehran in the year 1977: Iran adopted standard time in 1935, not 1946. In 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
@ -308,6 +317,7 @@
|
||||
* Docker: Now entrypoint.sh in docker image creates and executes chown for all folders it found in config for multidisk setup [#17717](https://github.com/ClickHouse/ClickHouse/issues/17717). [#39121](https://github.com/ClickHouse/ClickHouse/pull/39121) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
|
||||
#### Bug Fix
|
||||
|
||||
* Fix possible segfault in `CapnProto` input format. This bug was found and send through ClickHouse bug-bounty [program](https://github.com/ClickHouse/ClickHouse/issues/38986) by *kiojj*. [#40241](https://github.com/ClickHouse/ClickHouse/pull/40241) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Fix a very rare case of incorrect behavior of array subscript operator. This closes [#28720](https://github.com/ClickHouse/ClickHouse/issues/28720). [#40185](https://github.com/ClickHouse/ClickHouse/pull/40185) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix insufficient argument check for encryption functions (found by query fuzzer). This closes [#39987](https://github.com/ClickHouse/ClickHouse/issues/39987). [#40194](https://github.com/ClickHouse/ClickHouse/pull/40194) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
@ -358,16 +368,17 @@
|
||||
* A fix for reverse DNS resolution. [#40134](https://github.com/ClickHouse/ClickHouse/pull/40134) ([Arthur Passos](https://github.com/arthurpassos)).
|
||||
* Fix unexpected result `arrayDifference` of `Array(UInt32). [#40211](https://github.com/ClickHouse/ClickHouse/pull/40211) ([Duc Canh Le](https://github.com/canhld94)).
|
||||
|
||||
|
||||
### <a id="227"></a> ClickHouse release 22.7, 2022-07-21
|
||||
|
||||
#### Upgrade Notes
|
||||
|
||||
* Enable setting `enable_positional_arguments` by default. It allows queries like `SELECT ... ORDER BY 1, 2` where 1, 2 are the references to the select clause. If you need to return the old behavior, disable this setting. [#38204](https://github.com/ClickHouse/ClickHouse/pull/38204) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Disable `format_csv_allow_single_quotes` by default. See [#37096](https://github.com/ClickHouse/ClickHouse/issues/37096). ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* `Ordinary` database engine and old storage definition syntax for `*MergeTree` tables are deprecated. By default it's not possible to create new databases with `Ordinary` engine. If `system` database has `Ordinary` engine it will be automatically converted to `Atomic` on server startup. There are settings to keep old behavior (`allow_deprecated_database_ordinary` and `allow_deprecated_syntax_for_merge_tree`), but these settings may be removed in future releases. [#38335](https://github.com/ClickHouse/ClickHouse/pull/38335) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Force rewriting comma join to inner by default (set default value `cross_to_inner_join_rewrite = 2`). To have old behavior set `cross_to_inner_join_rewrite = 1`. [#39326](https://github.com/ClickHouse/ClickHouse/pull/39326) ([Vladimir C](https://github.com/vdimir)). If you will face any incompatibilities, you can turn this setting back.
|
||||
|
||||
#### New Feature
|
||||
|
||||
* Support expressions with window functions. Closes [#19857](https://github.com/ClickHouse/ClickHouse/issues/19857). [#37848](https://github.com/ClickHouse/ClickHouse/pull/37848) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* Add new `direct` join algorithm for `EmbeddedRocksDB` tables, see [#33582](https://github.com/ClickHouse/ClickHouse/issues/33582). [#35363](https://github.com/ClickHouse/ClickHouse/pull/35363) ([Vladimir C](https://github.com/vdimir)).
|
||||
* Added full sorting merge join algorithm. [#35796](https://github.com/ClickHouse/ClickHouse/pull/35796) ([Vladimir C](https://github.com/vdimir)).
|
||||
@ -395,9 +406,11 @@
|
||||
* Add `clickhouse-diagnostics` binary to the packages. [#38647](https://github.com/ClickHouse/ClickHouse/pull/38647) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
||||
#### Experimental Feature
|
||||
|
||||
* Adds new setting `implicit_transaction` to run standalone queries inside a transaction. It handles both creation and closing (via COMMIT if the query succeeded or ROLLBACK if it didn't) of the transaction automatically. [#38344](https://github.com/ClickHouse/ClickHouse/pull/38344) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
|
||||
#### Performance Improvement
|
||||
|
||||
* Distinct optimization for sorted columns. Use specialized distinct transformation in case input stream is sorted by column(s) in distinct. Optimization can be applied to pre-distinct, final distinct, or both. Initial implementation by @dimarub2000. [#37803](https://github.com/ClickHouse/ClickHouse/pull/37803) ([Igor Nikonov](https://github.com/devcrafter)).
|
||||
* Improve performance of `ORDER BY`, `MergeTree` merges, window functions using batch version of `BinaryHeap`. [#38022](https://github.com/ClickHouse/ClickHouse/pull/38022) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* More parallel execution for queries with `FINAL` [#36396](https://github.com/ClickHouse/ClickHouse/pull/36396) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
@ -407,7 +420,7 @@
|
||||
* Improve performance of insertion to columns of type `JSON`. [#38320](https://github.com/ClickHouse/ClickHouse/pull/38320) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Optimized insertion and lookups in the HashTable. [#38413](https://github.com/ClickHouse/ClickHouse/pull/38413) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Fix performance degradation from [#32493](https://github.com/ClickHouse/ClickHouse/issues/32493). [#38417](https://github.com/ClickHouse/ClickHouse/pull/38417) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Improve performance of joining with numeric columns using SIMD instructions. [#37235](https://github.com/ClickHouse/ClickHouse/pull/37235) ([zzachimed](https://github.com/zzachimed)). [#38565](https://github.com/ClickHouse/ClickHouse/pull/38565) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Improve performance of joining with numeric columns using SIMD instructions. [#37235](https://github.com/ClickHouse/ClickHouse/pull/37235) ([zzachimed](https://github.com/zzachimed)). [#38565](https://github.com/ClickHouse/ClickHouse/pull/38565) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Norm and Distance functions for arrays speed up 1.2-2 times. [#38740](https://github.com/ClickHouse/ClickHouse/pull/38740) ([Alexander Gololobov](https://github.com/davenger)).
|
||||
* Add AVX-512 VBMI optimized `copyOverlap32Shuffle` for LZ4 decompression. In other words, LZ4 decompression performance is improved. [#37891](https://github.com/ClickHouse/ClickHouse/pull/37891) ([Guo Wangyang](https://github.com/guowangy)).
|
||||
* `ORDER BY (a, b)` will use all the same benefits as `ORDER BY a, b`. [#38873](https://github.com/ClickHouse/ClickHouse/pull/38873) ([Igor Nikonov](https://github.com/devcrafter)).
|
||||
@ -419,6 +432,7 @@
|
||||
* The table `system.asynchronous_metric_log` is further optimized for storage space. This closes [#38134](https://github.com/ClickHouse/ClickHouse/issues/38134). See the [YouTube video](https://www.youtube.com/watch?v=0fSp9SF8N8A). [#38428](https://github.com/ClickHouse/ClickHouse/pull/38428) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
* Support SQL standard CREATE INDEX and DROP INDEX syntax. [#35166](https://github.com/ClickHouse/ClickHouse/pull/35166) ([Jianmei Zhang](https://github.com/zhangjmruc)).
|
||||
* Send profile events for INSERT queries (previously only SELECT was supported). [#37391](https://github.com/ClickHouse/ClickHouse/pull/37391) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Implement in order aggregation (`optimize_aggregation_in_order`) for fully materialized projections. [#37469](https://github.com/ClickHouse/ClickHouse/pull/37469) ([Azat Khuzhin](https://github.com/azat)).
|
||||
@ -464,6 +478,7 @@
|
||||
* Allow to declare `RabbitMQ` queue without default arguments `x-max-length` and `x-overflow`. [#39259](https://github.com/ClickHouse/ClickHouse/pull/39259) ([rnbondarenko](https://github.com/rnbondarenko)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Apply Clang Thread Safety Analysis (TSA) annotations to ClickHouse. [#38068](https://github.com/ClickHouse/ClickHouse/pull/38068) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Adapt universal installation script for FreeBSD. [#39302](https://github.com/ClickHouse/ClickHouse/pull/39302) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Preparation for building on `s390x` platform. [#39193](https://github.com/ClickHouse/ClickHouse/pull/39193) ([Harry Lee](https://github.com/HarryLeeIBM)).
|
||||
@ -473,6 +488,7 @@
|
||||
* Change `all|noarch` packages to architecture-dependent - Fix some documentation for it - Push aarch64|arm64 packages to artifactory and release assets - Fixes [#36443](https://github.com/ClickHouse/ClickHouse/issues/36443). [#38580](https://github.com/ClickHouse/ClickHouse/pull/38580) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
|
||||
|
||||
* Fix rounding for `Decimal128/Decimal256` with more than 19-digits long scale. [#38027](https://github.com/ClickHouse/ClickHouse/pull/38027) ([Igor Nikonov](https://github.com/devcrafter)).
|
||||
* Fixed crash caused by data race in storage `Hive` (integration table engine). [#38887](https://github.com/ClickHouse/ClickHouse/pull/38887) ([lgbo](https://github.com/lgbo-ustc)).
|
||||
* Fix crash when executing GRANT ALL ON *.* with ON CLUSTER. It was broken in https://github.com/ClickHouse/ClickHouse/pull/35767. This closes [#38618](https://github.com/ClickHouse/ClickHouse/issues/38618). [#38674](https://github.com/ClickHouse/ClickHouse/pull/38674) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
@ -529,6 +545,7 @@
|
||||
### <a id="226"></a> ClickHouse release 22.6, 2022-06-16
|
||||
|
||||
#### Backward Incompatible Change
|
||||
|
||||
* Remove support for octal number literals in SQL. In previous versions they were parsed as Float64. [#37765](https://github.com/ClickHouse/ClickHouse/pull/37765) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Changes how settings using `seconds` as type are parsed to support floating point values (for example: `max_execution_time=0.5`). Infinity or NaN values will throw an exception. [#37187](https://github.com/ClickHouse/ClickHouse/pull/37187) ([Raúl Marín](https://github.com/Algunenano)).
|
||||
* Changed format of binary serialization of columns of experimental type `Object`. New format is more convenient to implement by third-party clients. [#37482](https://github.com/ClickHouse/ClickHouse/pull/37482) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
@ -537,6 +554,7 @@
|
||||
* If you run different ClickHouse versions on a cluster with AArch64 CPU or mix AArch64 and amd64 on a cluster, and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, and the size of the result is huge, the data will not be fully aggregated in the result of these queries during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
|
||||
|
||||
#### New Feature
|
||||
|
||||
* Add `GROUPING` function. It allows to disambiguate the records in the queries with `ROLLUP`, `CUBE` or `GROUPING SETS`. Closes [#19426](https://github.com/ClickHouse/ClickHouse/issues/19426). [#37163](https://github.com/ClickHouse/ClickHouse/pull/37163) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* A new codec [FPC](https://userweb.cs.txstate.edu/~burtscher/papers/dcc07a.pdf) algorithm for floating point data compression. [#37553](https://github.com/ClickHouse/ClickHouse/pull/37553) ([Mikhail Guzov](https://github.com/koloshmet)).
|
||||
* Add new columnar JSON formats: `JSONColumns`, `JSONCompactColumns`, `JSONColumnsWithMetadata`. Closes [#36338](https://github.com/ClickHouse/ClickHouse/issues/36338) Closes [#34509](https://github.com/ClickHouse/ClickHouse/issues/34509). [#36975](https://github.com/ClickHouse/ClickHouse/pull/36975) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
@ -557,11 +575,13 @@
|
||||
* Added `SYSTEM UNFREEZE` query that deletes the whole backup regardless if the corresponding table is deleted or not. [#36424](https://github.com/ClickHouse/ClickHouse/pull/36424) ([Vadim Volodin](https://github.com/PolyProgrammist)).
|
||||
|
||||
#### Experimental Feature
|
||||
|
||||
* Enables `POPULATE` for `WINDOW VIEW`. [#36945](https://github.com/ClickHouse/ClickHouse/pull/36945) ([vxider](https://github.com/Vxider)).
|
||||
* `ALTER TABLE ... MODIFY QUERY` support for `WINDOW VIEW`. [#37188](https://github.com/ClickHouse/ClickHouse/pull/37188) ([vxider](https://github.com/Vxider)).
|
||||
* This PR changes the behavior of the `ENGINE` syntax in `WINDOW VIEW`, to make it like in `MATERIALIZED VIEW`. [#37214](https://github.com/ClickHouse/ClickHouse/pull/37214) ([vxider](https://github.com/Vxider)).
|
||||
|
||||
#### Performance Improvement
|
||||
|
||||
* Added numerous optimizations for ARM NEON [#38093](https://github.com/ClickHouse/ClickHouse/pull/38093)([Daniel Kutenin](https://github.com/danlark1)), ([Alexandra Pilipyuk](https://github.com/chalice19)) Note: if you run different ClickHouse versions on a cluster with ARM CPU and use distributed queries with GROUP BY multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, the result of the aggregation query will be wrong during upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
|
||||
* Improve performance and memory usage for select of subset of columns for formats Native, Protobuf, CapnProto, JSONEachRow, TSKV, all formats with suffixes WithNames/WithNamesAndTypes. Previously while selecting only subset of columns from files in these formats all columns were read and stored in memory. Now only required columns are read. This PR enables setting `input_format_skip_unknown_fields` by default, because otherwise in case of select of subset of columns exception will be thrown. [#37192](https://github.com/ClickHouse/ClickHouse/pull/37192) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Now more filters can be pushed down for join. [#37472](https://github.com/ClickHouse/ClickHouse/pull/37472) ([Amos Bird](https://github.com/amosbird)).
|
||||
@ -592,6 +612,7 @@
|
||||
* In function: CompressedWriteBuffer::nextImpl(), there is an unnecessary write-copy step that would happen frequently during inserting data. Below shows the differentiation with this patch: - Before: 1. Compress "working_buffer" into "compressed_buffer" 2. write-copy into "out" - After: Directly Compress "working_buffer" into "out". [#37242](https://github.com/ClickHouse/ClickHouse/pull/37242) ([jasperzhu](https://github.com/jinjunzh)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
* Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS. Closes [#37360](https://github.com/ClickHouse/ClickHouse/issues/37360). [#37667](https://github.com/ClickHouse/ClickHouse/pull/37667) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* Fix stack traces collection on ARM. Closes [#37044](https://github.com/ClickHouse/ClickHouse/issues/37044). Closes [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638). [#37797](https://github.com/ClickHouse/ClickHouse/pull/37797) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Client will try every IP address returned by DNS resolution until successful connection. [#37273](https://github.com/ClickHouse/ClickHouse/pull/37273) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
@ -633,6 +654,7 @@
|
||||
* Add implicit grants with grant option too. For example `GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION` now allows `A` to execute `GRANT CREATE VIEW ON test.* TO B`. [#38017](https://github.com/ClickHouse/ClickHouse/pull/38017) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Use `clang-14` and LLVM infrastructure version 14 for builds. This closes [#34681](https://github.com/ClickHouse/ClickHouse/issues/34681). [#34754](https://github.com/ClickHouse/ClickHouse/pull/34754) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Note: `clang-14` has [a bug](https://github.com/google/sanitizers/issues/1540) in ThreadSanitizer that makes our CI work worse.
|
||||
* Allow to drop privileges at startup. This simplifies Docker images. Closes [#36293](https://github.com/ClickHouse/ClickHouse/issues/36293). [#36341](https://github.com/ClickHouse/ClickHouse/pull/36341) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Add docs spellcheck to CI. [#37790](https://github.com/ClickHouse/ClickHouse/pull/37790) ([Vladimir C](https://github.com/vdimir)).
|
||||
@ -690,7 +712,6 @@
|
||||
* Fix possible heap-use-after-free error when reading system.projection_parts and system.projection_parts_columns . This fixes [#37184](https://github.com/ClickHouse/ClickHouse/issues/37184). [#37185](https://github.com/ClickHouse/ClickHouse/pull/37185) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Fixed `DateTime64` fractional seconds behavior prior to Unix epoch. [#37697](https://github.com/ClickHouse/ClickHouse/pull/37697) ([Andrey Zvonov](https://github.com/zvonand)). [#37039](https://github.com/ClickHouse/ClickHouse/pull/37039) ([李扬](https://github.com/taiyang-li)).
|
||||
|
||||
|
||||
### <a id="225"></a> ClickHouse release 22.5, 2022-05-19
|
||||
|
||||
#### Upgrade Notes
|
||||
@ -743,7 +764,7 @@
|
||||
* Implement partial GROUP BY key for optimize_aggregation_in_order. [#35111](https://github.com/ClickHouse/ClickHouse/pull/35111) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
|
||||
* Show names of erroneous files in case of parsing errors while executing table functions `file`, `s3` and `url`. [#36314](https://github.com/ClickHouse/ClickHouse/pull/36314) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Allowed to increase the number of threads for executing background operations (merges, mutations, moves and fetches) at runtime if they are specified at top level config. [#36425](https://github.com/ClickHouse/ClickHouse/pull/36425) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
* Now date time conversion functions that generates time before 1970-01-01 00:00:00 with partial hours/minutes timezones will be saturated to zero instead of overflow. This is the continuation of https://github.com/ClickHouse/ClickHouse/pull/29953 which addresses https://github.com/ClickHouse/ClickHouse/pull/29953#discussion_r800550280 . Mark as improvement because it's implementation defined behavior (and very rare case) and we are allowed to break it. [#36656](https://github.com/ClickHouse/ClickHouse/pull/36656) ([Amos Bird](https://github.com/amosbird)).
|
||||
@ -852,7 +873,6 @@
|
||||
* Fix ALTER DROP COLUMN of nested column with compact parts (i.e. `ALTER TABLE x DROP COLUMN n`, when there is column `n.d`). [#35797](https://github.com/ClickHouse/ClickHouse/pull/35797) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix substring function range error length when `offset` and `length` is negative constant and `s` is not constant. [#33861](https://github.com/ClickHouse/ClickHouse/pull/33861) ([RogerYK](https://github.com/RogerYK)).
|
||||
|
||||
|
||||
### <a id="224"></a> ClickHouse release 22.4, 2022-04-19
|
||||
|
||||
#### Backward Incompatible Change
|
||||
@ -1004,8 +1024,7 @@
|
||||
* Fix mutations in tables with enabled sparse columns. [#35284](https://github.com/ClickHouse/ClickHouse/pull/35284) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Do not delay final part writing by default (fixes possible `Memory limit exceeded` during `INSERT` by adding `max_insert_delayed_streams_for_parallel_write` with default to 1000 for writes to s3 and disabled as before otherwise). [#34780](https://github.com/ClickHouse/ClickHouse/pull/34780) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
|
||||
## <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17
|
||||
### <a id="223"></a> ClickHouse release v22.3-lts, 2022-03-17
|
||||
|
||||
#### Backward Incompatible Change
|
||||
|
||||
@ -1132,7 +1151,6 @@
|
||||
* Fix incorrect result of trivial count query when part movement feature is used [#34089](https://github.com/ClickHouse/ClickHouse/issues/34089). [#34385](https://github.com/ClickHouse/ClickHouse/pull/34385) ([nvartolomei](https://github.com/nvartolomei)).
|
||||
* Fix inconsistency of `max_query_size` limitation in distributed subqueries. [#34078](https://github.com/ClickHouse/ClickHouse/pull/34078) ([Chao Ma](https://github.com/godliness)).
|
||||
|
||||
|
||||
### <a id="222"></a> ClickHouse release v22.2, 2022-02-17
|
||||
|
||||
#### Upgrade Notes
|
||||
@ -1308,7 +1326,6 @@
|
||||
* Fix issue [#18206](https://github.com/ClickHouse/ClickHouse/issues/18206). [#33977](https://github.com/ClickHouse/ClickHouse/pull/33977) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
* This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). [#33574](https://github.com/ClickHouse/ClickHouse/pull/33574) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
|
||||
|
||||
### <a id="221"></a> ClickHouse release v22.1, 2022-01-18
|
||||
|
||||
#### Upgrade Notes
|
||||
@ -1335,7 +1352,6 @@
|
||||
* Add function `decodeURLFormComponent` slightly different to `decodeURLComponent`. Close [#10298](https://github.com/ClickHouse/ClickHouse/issues/10298). [#33451](https://github.com/ClickHouse/ClickHouse/pull/33451) ([SuperDJY](https://github.com/cmsxbc)).
|
||||
* Allow to split `GraphiteMergeTree` rollup rules for plain/tagged metrics (optional rule_type field). [#33494](https://github.com/ClickHouse/ClickHouse/pull/33494) ([Michail Safronov](https://github.com/msaf1980)).
|
||||
|
||||
|
||||
#### Performance Improvement
|
||||
|
||||
* Support moving conditions to `PREWHERE` (setting `optimize_move_to_prewhere`) for tables of `Merge` engine if its all underlying tables supports `PREWHERE`. [#33300](https://github.com/ClickHouse/ClickHouse/pull/33300) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
@ -1351,7 +1367,6 @@
|
||||
* Optimize selecting of MergeTree parts that can be moved between volumes. [#33225](https://github.com/ClickHouse/ClickHouse/pull/33225) ([OnePiece](https://github.com/zhongyuankai)).
|
||||
* Fix `sparse_hashed` dict performance with sequential keys (wrong hash function). [#32536](https://github.com/ClickHouse/ClickHouse/pull/32536) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
|
||||
#### Experimental Feature
|
||||
|
||||
* Parallel reading from multiple replicas within a shard during distributed query without using sample key. To enable this, set `allow_experimental_parallel_reading_from_replicas = 1` and `max_parallel_replicas` to any number. This closes [#26748](https://github.com/ClickHouse/ClickHouse/issues/26748). [#29279](https://github.com/ClickHouse/ClickHouse/pull/29279) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
|
||||
@ -1364,7 +1379,6 @@
|
||||
* Fix ACL with explicit digit hash in `clickhouse-keeper`: now the behavior consistent with ZooKeeper and generated digest is always accepted. [#33249](https://github.com/ClickHouse/ClickHouse/pull/33249) ([小路](https://github.com/nicelulu)). [#33246](https://github.com/ClickHouse/ClickHouse/pull/33246).
|
||||
* Fix unexpected projection removal when detaching parts. [#32067](https://github.com/ClickHouse/ClickHouse/pull/32067) ([Amos Bird](https://github.com/amosbird)).
|
||||
|
||||
|
||||
#### Improvement
|
||||
|
||||
* Now date time conversion functions that generates time before `1970-01-01 00:00:00` will be saturated to zero instead of overflow. [#29953](https://github.com/ClickHouse/ClickHouse/pull/29953) ([Amos Bird](https://github.com/amosbird)). It also fixes a bug in index analysis if date truncation function would yield result before the Unix epoch.
|
||||
@ -1411,7 +1425,6 @@
|
||||
* Updating `modification_time` for data part in `system.parts` after part movement [#32964](https://github.com/ClickHouse/ClickHouse/issues/32964). [#32965](https://github.com/ClickHouse/ClickHouse/pull/32965) ([save-my-heart](https://github.com/save-my-heart)).
|
||||
* Potential issue, cannot be exploited: integer overflow may happen in array resize. [#33024](https://github.com/ClickHouse/ClickHouse/pull/33024) ([varadarajkumar](https://github.com/varadarajkumar)).
|
||||
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Add packages, functional tests and Docker builds for AArch64 (ARM) version of ClickHouse. [#32911](https://github.com/ClickHouse/ClickHouse/pull/32911) ([Mikhail f. Shiryaev](https://github.com/Felixoid)). [#32415](https://github.com/ClickHouse/ClickHouse/pull/32415)
|
||||
@ -1426,7 +1439,6 @@
|
||||
* Inject git information into clickhouse binary file. So we can get source code revision easily from clickhouse binary file. [#33124](https://github.com/ClickHouse/ClickHouse/pull/33124) ([taiyang-li](https://github.com/taiyang-li)).
|
||||
* Remove obsolete code from ConfigProcessor. Yandex specific code is not used anymore. The code contained one minor defect. This defect was reported by [Mallik Hassan](https://github.com/SadiHassan) in [#33032](https://github.com/ClickHouse/ClickHouse/issues/33032). This closes [#33032](https://github.com/ClickHouse/ClickHouse/issues/33032). [#33026](https://github.com/ClickHouse/ClickHouse/pull/33026) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in official stable or prestable release)
|
||||
|
||||
* Several fixes for format parsing. This is relevant if `clickhouse-server` is open for write access to adversary. Specifically crafted input data for `Native` format may lead to reading uninitialized memory or crash. This is relevant if `clickhouse-server` is open for write access to adversary. [#33050](https://github.com/ClickHouse/ClickHouse/pull/33050) ([Heena Bansal](https://github.com/HeenaBansal2009)). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. [#33022](https://github.com/ClickHouse/ClickHouse/pull/33022) ([Harry Lee](https://github.com/HarryLeeIBM)). Fix null pointer dereference in `LowCardinality` data when deserializing `LowCardinality` data in the Native format. [#33021](https://github.com/ClickHouse/ClickHouse/pull/33021) ([Harry Lee](https://github.com/HarryLeeIBM)).
|
||||
@ -1485,5 +1497,4 @@
|
||||
* Fix possible crash (or incorrect result) in case of `LowCardinality` arguments of window function. Fixes [#31114](https://github.com/ClickHouse/ClickHouse/issues/31114). [#31888](https://github.com/ClickHouse/ClickHouse/pull/31888) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix hang up with command `DROP TABLE system.query_log sync`. [#33293](https://github.com/ClickHouse/ClickHouse/pull/33293) ([zhanghuajie](https://github.com/zhanghuajieHIT)).
|
||||
|
||||
|
||||
## [Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021)
|
||||
|
@ -10,7 +10,7 @@ ClickHouse® is an open-source column-oriented database management system that a
|
||||
* [Documentation](https://clickhouse.com/docs/en/) provides more in-depth information.
|
||||
* [YouTube channel](https://www.youtube.com/c/ClickHouseDB) has a lot of content about ClickHouse in video format.
|
||||
* [Slack](https://join.slack.com/t/clickhousedb/shared_invite/zt-rxm3rdrk-lIUmhLC3V8WTaL0TGxsOmg) and [Telegram](https://telegram.me/clickhouse_en) allow chatting with ClickHouse users in real-time.
|
||||
* [Blog](https://clickhouse.com/blog/en/) contains various ClickHouse-related articles, as well as announcements and reports about events.
|
||||
* [Blog](https://clickhouse.com/blog/) contains various ClickHouse-related articles, as well as announcements and reports about events.
|
||||
* [Code Browser (Woboq)](https://clickhouse.com/codebrowser/ClickHouse/index.html) with syntax highlight and navigation.
|
||||
* [Code Browser (github.dev)](https://github.dev/ClickHouse/ClickHouse) with syntax highlight, powered by github.dev.
|
||||
* [Contacts](https://clickhouse.com/company/contact) can help to get your questions answered if there are any.
|
||||
|
@ -23,7 +23,7 @@ namespace
|
||||
{
|
||||
|
||||
/// Trim ending whitespace inplace
|
||||
void trim(String & s)
|
||||
void rightTrim(String & s)
|
||||
{
|
||||
s.erase(std::find_if(s.rbegin(), s.rend(), [](unsigned char ch) { return !std::isspace(ch); }).base(), s.end());
|
||||
}
|
||||
@ -441,7 +441,7 @@ LineReader::InputStatus ReplxxLineReader::readOneLine(const String & prompt)
|
||||
return (errno != EAGAIN) ? ABORT : RESET_LINE;
|
||||
input = cinput;
|
||||
|
||||
trim(input);
|
||||
rightTrim(input);
|
||||
return INPUT_LINE;
|
||||
}
|
||||
|
||||
@ -512,6 +512,9 @@ void ReplxxLineReader::openInteractiveHistorySearch()
|
||||
/// NOTE: You can use one of the following to configure the behaviour additionally:
|
||||
/// - SKIM_DEFAULT_OPTIONS
|
||||
/// - FZF_DEFAULT_OPTS
|
||||
///
|
||||
/// And also note, that fzf and skim is 95% compatible (at least option
|
||||
/// that is used here)
|
||||
std::string fuzzy_finder_command = fmt::format(
|
||||
"{} --read0 --tac --no-sort --tiebreak=index --bind=ctrl-r:toggle-sort --height=30% < {} > {}",
|
||||
fuzzy_finder, history_file.getPath(), output_file.getPath());
|
||||
@ -521,7 +524,8 @@ void ReplxxLineReader::openInteractiveHistorySearch()
|
||||
{
|
||||
if (executeCommand(argv) == 0)
|
||||
{
|
||||
const std::string & new_query = readFile(output_file.getPath());
|
||||
std::string new_query = readFile(output_file.getPath());
|
||||
rightTrim(new_query);
|
||||
rx.set_state(replxx::Replxx::State(new_query.c_str(), new_query.size()));
|
||||
}
|
||||
}
|
||||
|
@ -3,7 +3,6 @@
|
||||
#endif
|
||||
#include <unistd.h>
|
||||
#include <base/safeExit.h>
|
||||
#include <base/defines.h>
|
||||
|
||||
[[noreturn]] void safeExit(int code)
|
||||
{
|
||||
|
@ -3,15 +3,15 @@
|
||||
# This is a workaround for bug in llvm/clang,
|
||||
# that does not produce .debug_aranges with LTO
|
||||
#
|
||||
# NOTE: this is a temporary solution, that should be removed once [1] will be
|
||||
# resolved.
|
||||
# NOTE: this is a temporary solution, that should be removed after upgrading to
|
||||
# clang-16/llvm-16.
|
||||
#
|
||||
# [1]: https://discourse.llvm.org/t/clang-does-not-produce-full-debug-aranges-section-with-thinlto/64898/8
|
||||
# Refs: https://reviews.llvm.org/D133092
|
||||
|
||||
# NOTE: only -flto=thin is supported.
|
||||
# NOTE: it is not possible to check was there -gdwarf-aranges initially or not.
|
||||
if [[ "$*" =~ -plugin-opt=thinlto ]]; then
|
||||
exec "@LLD_PATH@" -mllvm -generate-arange-section "$@"
|
||||
exec "@LLD_PATH@" -plugin-opt=-generate-arange-section "$@"
|
||||
else
|
||||
exec "@LLD_PATH@" "$@"
|
||||
fi
|
||||
|
2
contrib/cctz
vendored
2
contrib/cctz
vendored
@ -1 +1 @@
|
||||
Subproject commit 49c656c62fbd36a1bc20d64c476853bdb7cf7bb9
|
||||
Subproject commit 7a454c25c7d16053bcd327cdd16329212a08fa4a
|
2
contrib/llvm-project
vendored
2
contrib/llvm-project
vendored
@ -1 +1 @@
|
||||
Subproject commit c7f7cfc85e4b81c1c76cdd633dd8808d2dfd6114
|
||||
Subproject commit 3a39038345a400e7e767811b142a94355d511215
|
@ -1,4 +1,4 @@
|
||||
if (APPLE OR NOT ARCH_AMD64 OR SANITIZE STREQUAL "undefined" OR NOT USE_STATIC_LIBRARIES)
|
||||
if (APPLE OR NOT ARCH_AMD64 OR SANITIZE STREQUAL "undefined")
|
||||
set (ENABLE_EMBEDDED_COMPILER_DEFAULT OFF)
|
||||
else()
|
||||
set (ENABLE_EMBEDDED_COMPILER_DEFAULT ON)
|
||||
@ -6,15 +6,16 @@ endif()
|
||||
|
||||
option (ENABLE_EMBEDDED_COMPILER "Enable support for 'compile_expressions' option for query execution" ${ENABLE_EMBEDDED_COMPILER_DEFAULT})
|
||||
|
||||
# If USE_STATIC_LIBRARIES=0 was passed to CMake, we'll still build LLVM statically to keep complexity minimal.
|
||||
|
||||
if (NOT ENABLE_EMBEDDED_COMPILER)
|
||||
message(STATUS "Not using LLVM")
|
||||
return()
|
||||
endif()
|
||||
|
||||
# TODO: Enable shared library build
|
||||
# TODO: Enable compilation on AArch64
|
||||
|
||||
set (LLVM_VERSION "14.0.0bundled")
|
||||
set (LLVM_VERSION "15.0.0bundled")
|
||||
set (LLVM_INCLUDE_DIRS
|
||||
"${ClickHouse_SOURCE_DIR}/contrib/llvm-project/llvm/include"
|
||||
"${ClickHouse_BINARY_DIR}/contrib/llvm-project/llvm/include"
|
||||
@ -62,9 +63,6 @@ set (REQUIRED_LLVM_LIBRARIES
|
||||
# list(APPEND REQUIRED_LLVM_LIBRARIES LLVMAArch64Info LLVMAArch64Desc LLVMAArch64CodeGen)
|
||||
# endif ()
|
||||
|
||||
# ld: unknown option: --color-diagnostics
|
||||
# set (LINKER_SUPPORTS_COLOR_DIAGNOSTICS 0 CACHE INTERNAL "")
|
||||
|
||||
set (CMAKE_INSTALL_RPATH "ON") # Do not adjust RPATH in llvm, since then it will not be able to find libcxx/libcxxabi/libunwind
|
||||
set (LLVM_COMPILER_CHECKED 1 CACHE INTERNAL "") # Skip internal compiler selection
|
||||
set (LLVM_ENABLE_EH 1 CACHE INTERNAL "") # With exception handling
|
||||
@ -80,6 +78,7 @@ set(LLVM_ENABLE_LIBXML2 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_LIBEDIT 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_LIBPFM 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_ZLIB 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_ZSTD 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_Z3_SOLVER 0 CACHE INTERNAL "")
|
||||
set(LLVM_INCLUDE_TOOLS 0 CACHE INTERNAL "")
|
||||
set(LLVM_BUILD_TOOLS 0 CACHE INTERNAL "")
|
||||
@ -96,9 +95,6 @@ set(LLVM_INCLUDE_DOCS 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_OCAMLDOC 0 CACHE INTERNAL "")
|
||||
set(LLVM_ENABLE_BINDINGS 0 CACHE INTERNAL "")
|
||||
|
||||
# C++20 is currently not supported due to ambiguous operator != etc.
|
||||
set (CMAKE_CXX_STANDARD 17)
|
||||
|
||||
set (LLVM_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/llvm-project/llvm")
|
||||
set (LLVM_BINARY_DIR "${ClickHouse_BINARY_DIR}/contrib/llvm-project/llvm")
|
||||
add_subdirectory ("${LLVM_SOURCE_DIR}" "${LLVM_BINARY_DIR}")
|
||||
|
@ -36,10 +36,7 @@ RUN arch=${TARGETARCH:-amd64} \
|
||||
# repo versions doesn't work correctly with C++17
|
||||
# also we push reports to s3, so we add index.html to subfolder urls
|
||||
# https://github.com/ClickHouse-Extras/woboq_codebrowser/commit/37e15eaf377b920acb0b48dbe82471be9203f76b
|
||||
# TODO: remove branch in a few weeks after merge, e.g. in May or June 2022
|
||||
#
|
||||
# FIXME: update location of a repo
|
||||
RUN git clone https://github.com/azat/woboq_codebrowser --branch llvm-15 \
|
||||
RUN git clone https://github.com/ClickHouse/woboq_codebrowser \
|
||||
&& cd woboq_codebrowser \
|
||||
&& cmake . -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang\+\+-${LLVM_VERSION} -DCMAKE_C_COMPILER=clang-${LLVM_VERSION} \
|
||||
&& ninja \
|
||||
|
22
docker/test/stress/run.sh
Executable file → Normal file
22
docker/test/stress/run.sh
Executable file → Normal file
@ -47,7 +47,6 @@ function install_packages()
|
||||
|
||||
function configure()
|
||||
{
|
||||
export ZOOKEEPER_FAULT_INJECTION=1
|
||||
# install test configs
|
||||
export USE_DATABASE_ORDINARY=1
|
||||
export EXPORT_S3_STORAGE_POLICIES=1
|
||||
@ -203,6 +202,7 @@ quit
|
||||
|
||||
install_packages package_folder
|
||||
|
||||
export ZOOKEEPER_FAULT_INJECTION=1
|
||||
configure
|
||||
|
||||
azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --debug /azurite_log &
|
||||
@ -243,6 +243,7 @@ stop
|
||||
|
||||
# Let's enable S3 storage by default
|
||||
export USE_S3_STORAGE_FOR_MERGE_TREE=1
|
||||
export ZOOKEEPER_FAULT_INJECTION=1
|
||||
configure
|
||||
|
||||
# But we still need default disk because some tables loaded only into it
|
||||
@ -375,6 +376,8 @@ else
|
||||
install_packages previous_release_package_folder
|
||||
|
||||
# Start server from previous release
|
||||
# Previous version may not be ready for fault injections
|
||||
export ZOOKEEPER_FAULT_INJECTION=0
|
||||
configure
|
||||
|
||||
# Avoid "Setting s3_check_objects_after_upload is neither a builtin setting..."
|
||||
@ -389,12 +392,23 @@ else
|
||||
|
||||
clickhouse-client --query="SELECT 'Server version: ', version()"
|
||||
|
||||
# Install new package before running stress test because we should use new clickhouse-client and new clickhouse-test
|
||||
# But we should leave old binary in /usr/bin/ for gdb (so it will print sane stacktarces)
|
||||
# Install new package before running stress test because we should use new
|
||||
# clickhouse-client and new clickhouse-test.
|
||||
#
|
||||
# But we should leave old binary in /usr/bin/ and debug symbols in
|
||||
# /usr/lib/debug/usr/bin (if any) for gdb and internal DWARF parser, so it
|
||||
# will print sane stacktraces and also to avoid possible crashes.
|
||||
#
|
||||
# FIXME: those files can be extracted directly from debian package, but
|
||||
# actually better solution will be to use different PATH instead of playing
|
||||
# games with files from packages.
|
||||
mv /usr/bin/clickhouse previous_release_package_folder/
|
||||
mv /usr/lib/debug/usr/bin/clickhouse.debug previous_release_package_folder/
|
||||
install_packages package_folder
|
||||
mv /usr/bin/clickhouse package_folder/
|
||||
mv /usr/lib/debug/usr/bin/clickhouse.debug package_folder/
|
||||
mv previous_release_package_folder/clickhouse /usr/bin/
|
||||
mv previous_release_package_folder/clickhouse.debug /usr/lib/debug/usr/bin/clickhouse.debug
|
||||
|
||||
mkdir tmp_stress_output
|
||||
|
||||
@ -410,6 +424,8 @@ else
|
||||
|
||||
# Start new server
|
||||
mv package_folder/clickhouse /usr/bin/
|
||||
mv package_folder/clickhouse.debug /usr/lib/debug/usr/bin/clickhouse.debug
|
||||
export ZOOKEEPER_FAULT_INJECTION=1
|
||||
configure
|
||||
start 500
|
||||
clickhouse-client --query "SELECT 'Backward compatibility check: Server successfully started', 'OK'" >> /test_output/test_results.tsv \
|
||||
|
@ -5,6 +5,7 @@ FROM ubuntu:20.04
|
||||
ARG apt_archive="http://archive.ubuntu.com"
|
||||
RUN sed -i "s|http://archive.ubuntu.com|$apt_archive|g" /etc/apt/sources.list
|
||||
|
||||
# 15.0.2
|
||||
ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=15
|
||||
|
||||
RUN apt-get update \
|
||||
@ -58,6 +59,9 @@ RUN apt-get update \
|
||||
RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld
|
||||
# for external_symbolizer_path
|
||||
RUN ln -s /usr/bin/llvm-symbolizer-${LLVM_VERSION} /usr/bin/llvm-symbolizer
|
||||
# FIXME: workaround for "The imported target "merge-fdata" references the file" error
|
||||
# https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/commit/992e52c0b156a5ba9c6a8a54f8c4857ddd3d371d
|
||||
RUN sed -i '/_IMPORT_CHECK_FILES_FOR_\(mlir-\|llvm-bolt\|merge-fdata\|MLIR\)/ {s|^|#|}' /usr/lib/llvm-${LLVM_VERSION}/lib/cmake/llvm/LLVMExports-*.cmake
|
||||
|
||||
ARG CCACHE_VERSION=4.6.1
|
||||
RUN mkdir /tmp/ccache \
|
||||
|
@ -1,14 +0,0 @@
|
||||
---
|
||||
slug: /en/development/browse-code
|
||||
sidebar_label: Source Code Browser
|
||||
sidebar_position: 72
|
||||
description: Various ways to browse and edit the source code
|
||||
---
|
||||
|
||||
# Browse ClickHouse Source Code
|
||||
|
||||
You can use the **Woboq** online code browser available [here](https://clickhouse.com/codebrowser/ClickHouse/src/index.html). It provides code navigation and semantic highlighting, search and indexing. The code snapshot is updated daily.
|
||||
|
||||
Also, you can browse sources on [GitHub](https://github.com/ClickHouse/ClickHouse) as usual.
|
||||
|
||||
If you’re interested what IDE to use, we recommend CLion, QT Creator, VS Code and KDevelop (with caveats). You can use any favorite IDE. Vim and Emacs also count.
|
@ -38,13 +38,13 @@ For other Linux distribution - check the availability of the [prebuild packages]
|
||||
#### Use the latest clang for Builds
|
||||
|
||||
``` bash
|
||||
export CC=clang-14
|
||||
export CXX=clang++-14
|
||||
export CC=clang-15
|
||||
export CXX=clang++-15
|
||||
```
|
||||
|
||||
In this example we use version 14 that is the latest as of Feb 2022.
|
||||
In this example we use version 15 that is the latest as of Sept 2022.
|
||||
|
||||
Gcc can also be used though it is discouraged.
|
||||
Gcc cannot be used.
|
||||
|
||||
### Checkout ClickHouse Sources {#checkout-clickhouse-sources}
|
||||
|
||||
|
@ -419,6 +419,8 @@ Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `St
|
||||
|
||||
For `Map` data type client can specify if index should be created for keys or values using [mapKeys](../../../sql-reference/functions/tuple-map-functions.md#mapkeys) or [mapValues](../../../sql-reference/functions/tuple-map-functions.md#mapvalues) function.
|
||||
|
||||
There are also special-purpose and experimental indexes to support approximate nearest neighbor (ANN) queries. See [here](annindexes.md) for details.
|
||||
|
||||
The following functions can use the filter: [equals](../../../sql-reference/functions/comparison-functions.md), [notEquals](../../../sql-reference/functions/comparison-functions.md), [in](../../../sql-reference/functions/in-functions), [notIn](../../../sql-reference/functions/in-functions), [has](../../../sql-reference/functions/array-functions#hasarr-elem), [hasAny](../../../sql-reference/functions/array-functions#hasany), [hasAll](../../../sql-reference/functions/array-functions#hasall).
|
||||
|
||||
Example of index creation for `Map` data type
|
||||
|
@ -1020,6 +1020,62 @@ Example:
|
||||
}
|
||||
```
|
||||
|
||||
To use object name as column value you can use special setting [format_json_object_each_row_column_for_object_name](../operations/settings/settings.md#format_json_object_each_row_column_for_object_name). Value of this setting is set to the name of a column, that is used as JSON key for a row in resulting object.
|
||||
Examples:
|
||||
|
||||
For output:
|
||||
|
||||
Let's say we have table `test` with two columns:
|
||||
```
|
||||
┌─object_name─┬─number─┐
|
||||
│ first_obj │ 1 │
|
||||
│ second_obj │ 2 │
|
||||
│ third_obj │ 3 │
|
||||
└─────────────┴────────┘
|
||||
```
|
||||
Let's output it in `JSONObjectEachRow` format and use `format_json_object_each_row_column_for_object_name` setting:
|
||||
|
||||
```sql
|
||||
select * from test settings format_json_object_each_row_column_for_object_name='object_name'
|
||||
```
|
||||
|
||||
The output:
|
||||
```json
|
||||
{
|
||||
"first_obj": {"number": 1},
|
||||
"second_obj": {"number": 2},
|
||||
"third_obj": {"number": 3}
|
||||
}
|
||||
```
|
||||
|
||||
For input:
|
||||
|
||||
Let's say we stored output from previous example in a file with name `data.json`:
|
||||
```sql
|
||||
select * from file('data.json', JSONObjectEachRow, 'object_name String, number UInt64') settings format_json_object_each_row_column_for_object_name='object_name'
|
||||
```
|
||||
|
||||
```
|
||||
┌─object_name─┬─number─┐
|
||||
│ first_obj │ 1 │
|
||||
│ second_obj │ 2 │
|
||||
│ third_obj │ 3 │
|
||||
└─────────────┴────────┘
|
||||
```
|
||||
|
||||
It also works in schema inference:
|
||||
|
||||
```sql
|
||||
desc file('data.json', JSONObjectEachRow) settings format_json_object_each_row_column_for_object_name='object_name'
|
||||
```
|
||||
|
||||
```
|
||||
┌─name────────┬─type────────────┐
|
||||
│ object_name │ String │
|
||||
│ number │ Nullable(Int64) │
|
||||
└─────────────┴─────────────────┘
|
||||
```
|
||||
|
||||
|
||||
### Inserting Data {#json-inserting-data}
|
||||
|
||||
|
@ -41,6 +41,7 @@ ClickHouse Inc does **not** maintain the libraries listed below and hasn’t don
|
||||
- [node-clickhouse](https://github.com/apla/node-clickhouse)
|
||||
- [nestjs-clickhouse](https://github.com/depyronick/nestjs-clickhouse)
|
||||
- [clickhouse-client](https://github.com/depyronick/clickhouse-client)
|
||||
- [node-clickhouse-orm](https://github.com/zimv/node-clickhouse-orm)
|
||||
- Perl
|
||||
- [perl-DBD-ClickHouse](https://github.com/elcamlost/perl-DBD-ClickHouse)
|
||||
- [HTTP-ClickHouse](https://metacpan.org/release/HTTP-ClickHouse)
|
||||
|
@ -171,6 +171,55 @@ end_time: 2022-08-30 09:21:46
|
||||
1 row in set. Elapsed: 0.002 sec.
|
||||
```
|
||||
|
||||
## Backup to S3
|
||||
|
||||
It is possible to `BACKUP`/`RESTORE` to S3, but this disk should be configured
|
||||
in a proper way, since by default you will need to backup metadata from local
|
||||
disk to make backup full.
|
||||
|
||||
First of all, you need to configure S3 disk in a special way:
|
||||
|
||||
```xml
|
||||
<clickhouse>
|
||||
<storage_configuration>
|
||||
<disks>
|
||||
<s3_plain>
|
||||
<type>s3_plain</type>
|
||||
<endpoint></endpoint>
|
||||
<access_key_id></access_key_id>
|
||||
<secret_access_key></secret_access_key>
|
||||
</s3_plain>
|
||||
</disks>
|
||||
<policies>
|
||||
<s3>
|
||||
<volumes>
|
||||
<main>
|
||||
<disk>s3</disk>
|
||||
</main>
|
||||
</volumes>
|
||||
</s3>
|
||||
</policies>
|
||||
</storage_configuration>
|
||||
|
||||
<backups>
|
||||
<allowed_disk>s3_plain</allowed_disk>
|
||||
</backups>
|
||||
</clickhouse>
|
||||
```
|
||||
|
||||
And then `BACKUP`/`RESTORE` as usual:
|
||||
|
||||
```sql
|
||||
BACKUP TABLE data TO Disk('s3_plain', 'cloud_backup');
|
||||
RESTORE TABLE data AS data_restored FROM Disk('s3_plain', 'cloud_backup');
|
||||
```
|
||||
|
||||
:::note
|
||||
But keep in mind that:
|
||||
- This disk should not be used for `MergeTree` itself, only for `BACKUP`/`RESTORE`
|
||||
- It has excessive API calls
|
||||
:::
|
||||
|
||||
## Alternatives
|
||||
|
||||
ClickHouse stores data on disk, and there are many ways to backup disks. These are some alternatives that have been used in the past, and that may fit in well in your environment.
|
||||
|
@ -1502,6 +1502,21 @@ If not set, [tmp_path](#tmp-path) is used, otherwise it is ignored.
|
||||
- Policy should have exactly one volume with local disks.
|
||||
:::
|
||||
|
||||
## max_temporary_data_on_disk_size {#max_temporary_data_on_disk_size}
|
||||
|
||||
Limit the amount of disk space consumed by temporary files in `tmp_path` for the server.
|
||||
Queries that exceed this limit will fail with an exception.
|
||||
|
||||
Default value: `0`.
|
||||
|
||||
**See also**
|
||||
|
||||
- [max_temporary_data_on_disk_size_for_user](../../operations/settings/query-complexity.md#settings_max_temporary_data_on_disk_size_for_user)
|
||||
- [max_temporary_data_on_disk_size_for_query](../../operations/settings/query-complexity.md#settings_max_temporary_data_on_disk_size_for_query)
|
||||
- [tmp_path](#tmp-path)
|
||||
- [tmp_policy](#tmp-policy)
|
||||
- [max_server_memory_usage](#max_server_memory_usage)
|
||||
|
||||
## uncompressed_cache_size {#server-settings-uncompressed_cache_size}
|
||||
|
||||
Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md).
|
||||
|
@ -313,4 +313,19 @@ When inserting data, ClickHouse calculates the number of partitions in the inser
|
||||
|
||||
> “Too many partitions for single INSERT block (more than” + toString(max_parts) + “). The limit is controlled by ‘max_partitions_per_insert_block’ setting. A large number of partitions is a common misconception. It will lead to severe negative performance impact, including slow server startup, slow INSERT queries and slow SELECT queries. Recommended total number of partitions for a table is under 1000..10000. Please note, that partitioning is not intended to speed up SELECT queries (ORDER BY key is sufficient to make range queries fast). Partitions are intended for data manipulation (DROP PARTITION, etc).”
|
||||
|
||||
## max_temporary_data_on_disk_size_for_user {#settings_max_temporary_data_on_disk_size_for_user}
|
||||
|
||||
The maximum amount of data consumed by temporary files on disk in bytes for all concurrently running user queries.
|
||||
Zero means unlimited.
|
||||
|
||||
Default value: 0.
|
||||
|
||||
|
||||
## max_temporary_data_on_disk_size_for_query {#settings_max_temporary_data_on_disk_size_for_query}
|
||||
|
||||
The maximum amount of data consumed by temporary files on disk in bytes for all concurrently running queries.
|
||||
Zero means unlimited.
|
||||
|
||||
Default value: 0.
|
||||
|
||||
[Original article](https://clickhouse.com/docs/en/operations/settings/query_complexity/) <!--hide-->
|
||||
|
@ -3902,6 +3902,13 @@ Controls validation of UTF-8 sequences in JSON output formats, doesn't impact fo
|
||||
|
||||
Disabled by default.
|
||||
|
||||
### format_json_object_each_row_column_for_object_name {#format_json_object_each_row_column_for_object_name}
|
||||
|
||||
The name of column that will be used for storing/writing object names in [JSONObjectEachRow](../../interfaces/formats.md#jsonobjecteachrow) format.
|
||||
Column type should be String. If value is empty, default names `row_{i}`will be used for object names.
|
||||
|
||||
Default value: ''.
|
||||
|
||||
## TSV format settings {#tsv-format-settings}
|
||||
|
||||
### input_format_tsv_empty_as_default {#input_format_tsv_empty_as_default}
|
||||
|
@ -1,14 +0,0 @@
|
||||
---
|
||||
slug: /ru/development/browse-code
|
||||
sidebar_position: 72
|
||||
sidebar_label: "Навигация по коду ClickHouse"
|
||||
---
|
||||
|
||||
|
||||
# Навигация по коду ClickHouse {#navigatsiia-po-kodu-clickhouse}
|
||||
|
||||
Для навигации по коду онлайн доступен **Woboq**, он расположен [здесь](https://clickhouse.com/codebrowser/ClickHouse/src/index.html). В нём реализовано удобное перемещение между исходными файлами, семантическая подсветка, подсказки, индексация и поиск. Слепок кода обновляется ежедневно.
|
||||
|
||||
Также вы можете просматривать исходники на [GitHub](https://github.com/ClickHouse/ClickHouse).
|
||||
|
||||
Если вы интересуетесь, какую среду разработки выбрать для работы с ClickHouse, мы рекомендуем CLion, QT Creator, VSCode или KDevelop (с некоторыми предостережениями). Вы можете использовать свою любимую среду разработки, Vim и Emacs тоже считаются.
|
@ -34,6 +34,7 @@ sidebar_label: "Клиентские библиотеки от сторонни
|
||||
- [node-clickhouse](https://github.com/apla/node-clickhouse)
|
||||
- [nestjs-clickhouse](https://github.com/depyronick/nestjs-clickhouse)
|
||||
- [clickhouse-client](https://github.com/depyronick/clickhouse-client)
|
||||
- [node-clickhouse-orm](https://github.com/zimv/node-clickhouse-orm)
|
||||
- Perl
|
||||
- [perl-DBD-ClickHouse](https://github.com/elcamlost/perl-DBD-ClickHouse)
|
||||
- [HTTP-ClickHouse](https://metacpan.org/release/HTTP-ClickHouse)
|
||||
|
@ -1,13 +0,0 @@
|
||||
---
|
||||
slug: /zh/development/browse-code
|
||||
sidebar_position: 63
|
||||
sidebar_label: "\u6D4F\u89C8\u6E90\u4EE3\u7801"
|
||||
---
|
||||
|
||||
# 浏览ClickHouse源代码 {#browse-clickhouse-source-code}
|
||||
|
||||
您可以使用 **Woboq** 在线代码浏览器 [点击这里](https://clickhouse.com/codebrowser/ClickHouse/src/index.html). 它提供了代码导航和语义突出显示、搜索和索引。 代码快照每天更新。
|
||||
|
||||
此外,您还可以像往常一样浏览源代码 [GitHub](https://github.com/ClickHouse/ClickHouse)
|
||||
|
||||
如果你希望了解哪种IDE较好,我们推荐使用CLion,QT Creator,VS Code和KDevelop(有注意事项)。 您可以使用任何您喜欢的IDE。 Vim和Emacs也可以。
|
@ -1,10 +1,460 @@
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/brown-benchmark
|
||||
sidebar_label: Brown University Benchmark
|
||||
description: A new analytical benchmark for machine-generated log data
|
||||
title: "Brown University Benchmark"
|
||||
sidebar_label: 布朗大学基准
|
||||
description: 机器生成日志数据的新分析基准
|
||||
title: "布朗大学基准"
|
||||
---
|
||||
|
||||
import Content from '@site/docs/en/getting-started/example-datasets/brown-benchmark.md';
|
||||
`MgBench` 是机器生成的日志数据的新分析基准,[Andrew Crotty](http://cs.brown.edu/people/acrotty/)。
|
||||
|
||||
<Content />
|
||||
下载数据:
|
||||
|
||||
```bash
|
||||
wget https://datasets.clickhouse.com/mgbench{1..3}.csv.xz
|
||||
```
|
||||
|
||||
解压数据:
|
||||
|
||||
```bash
|
||||
xz -v -d mgbench{1..3}.csv.xz
|
||||
```
|
||||
|
||||
创建数据库和表:
|
||||
|
||||
```sql
|
||||
CREATE DATABASE mgbench;
|
||||
```
|
||||
|
||||
```sql
|
||||
USE mgbench;
|
||||
```
|
||||
|
||||
```sql
|
||||
CREATE TABLE mgbench.logs1 (
|
||||
log_time DateTime,
|
||||
machine_name LowCardinality(String),
|
||||
machine_group LowCardinality(String),
|
||||
cpu_idle Nullable(Float32),
|
||||
cpu_nice Nullable(Float32),
|
||||
cpu_system Nullable(Float32),
|
||||
cpu_user Nullable(Float32),
|
||||
cpu_wio Nullable(Float32),
|
||||
disk_free Nullable(Float32),
|
||||
disk_total Nullable(Float32),
|
||||
part_max_used Nullable(Float32),
|
||||
load_fifteen Nullable(Float32),
|
||||
load_five Nullable(Float32),
|
||||
load_one Nullable(Float32),
|
||||
mem_buffers Nullable(Float32),
|
||||
mem_cached Nullable(Float32),
|
||||
mem_free Nullable(Float32),
|
||||
mem_shared Nullable(Float32),
|
||||
swap_free Nullable(Float32),
|
||||
bytes_in Nullable(Float32),
|
||||
bytes_out Nullable(Float32)
|
||||
)
|
||||
ENGINE = MergeTree()
|
||||
ORDER BY (machine_group, machine_name, log_time);
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
CREATE TABLE mgbench.logs2 (
|
||||
log_time DateTime,
|
||||
client_ip IPv4,
|
||||
request String,
|
||||
status_code UInt16,
|
||||
object_size UInt64
|
||||
)
|
||||
ENGINE = MergeTree()
|
||||
ORDER BY log_time;
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
CREATE TABLE mgbench.logs3 (
|
||||
log_time DateTime64,
|
||||
device_id FixedString(15),
|
||||
device_name LowCardinality(String),
|
||||
device_type LowCardinality(String),
|
||||
device_floor UInt8,
|
||||
event_type LowCardinality(String),
|
||||
event_unit FixedString(1),
|
||||
event_value Nullable(Float32)
|
||||
)
|
||||
ENGINE = MergeTree()
|
||||
ORDER BY (event_type, log_time);
|
||||
```
|
||||
|
||||
插入数据:
|
||||
|
||||
```
|
||||
clickhouse-client --query "INSERT INTO mgbench.logs1 FORMAT CSVWithNames" < mgbench1.csv
|
||||
clickhouse-client --query "INSERT INTO mgbench.logs2 FORMAT CSVWithNames" < mgbench2.csv
|
||||
clickhouse-client --query "INSERT INTO mgbench.logs3 FORMAT CSVWithNames" < mgbench3.csv
|
||||
```
|
||||
|
||||
## 运行基准查询:
|
||||
|
||||
```sql
|
||||
USE mgbench;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q1.1: 自午夜以来每个 Web 服务器的 CPU/网络利用率是多少?
|
||||
|
||||
SELECT machine_name,
|
||||
MIN(cpu) AS cpu_min,
|
||||
MAX(cpu) AS cpu_max,
|
||||
AVG(cpu) AS cpu_avg,
|
||||
MIN(net_in) AS net_in_min,
|
||||
MAX(net_in) AS net_in_max,
|
||||
AVG(net_in) AS net_in_avg,
|
||||
MIN(net_out) AS net_out_min,
|
||||
MAX(net_out) AS net_out_max,
|
||||
AVG(net_out) AS net_out_avg
|
||||
FROM (
|
||||
SELECT machine_name,
|
||||
COALESCE(cpu_user, 0.0) AS cpu,
|
||||
COALESCE(bytes_in, 0.0) AS net_in,
|
||||
COALESCE(bytes_out, 0.0) AS net_out
|
||||
FROM logs1
|
||||
WHERE machine_name IN ('anansi','aragog','urd')
|
||||
AND log_time >= TIMESTAMP '2017-01-11 00:00:00'
|
||||
) AS r
|
||||
GROUP BY machine_name;
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q1.2:最近一天有哪些机房的机器离线?
|
||||
|
||||
SELECT machine_name,
|
||||
log_time
|
||||
FROM logs1
|
||||
WHERE (machine_name LIKE 'cslab%' OR
|
||||
machine_name LIKE 'mslab%')
|
||||
AND load_one IS NULL
|
||||
AND log_time >= TIMESTAMP '2017-01-10 00:00:00'
|
||||
ORDER BY machine_name,
|
||||
log_time;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q1.3:特定工作站过去 10 天的每小时的平均指标是多少?
|
||||
|
||||
SELECT dt,
|
||||
hr,
|
||||
AVG(load_fifteen) AS load_fifteen_avg,
|
||||
AVG(load_five) AS load_five_avg,
|
||||
AVG(load_one) AS load_one_avg,
|
||||
AVG(mem_free) AS mem_free_avg,
|
||||
AVG(swap_free) AS swap_free_avg
|
||||
FROM (
|
||||
SELECT CAST(log_time AS DATE) AS dt,
|
||||
EXTRACT(HOUR FROM log_time) AS hr,
|
||||
load_fifteen,
|
||||
load_five,
|
||||
load_one,
|
||||
mem_free,
|
||||
swap_free
|
||||
FROM logs1
|
||||
WHERE machine_name = 'babbage'
|
||||
AND load_fifteen IS NOT NULL
|
||||
AND load_five IS NOT NULL
|
||||
AND load_one IS NOT NULL
|
||||
AND mem_free IS NOT NULL
|
||||
AND swap_free IS NOT NULL
|
||||
AND log_time >= TIMESTAMP '2017-01-01 00:00:00'
|
||||
) AS r
|
||||
GROUP BY dt,
|
||||
hr
|
||||
ORDER BY dt,
|
||||
hr;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q1.4: 1 个月内,每台服务器的磁盘 I/O 阻塞的频率是多少?
|
||||
|
||||
SELECT machine_name,
|
||||
COUNT(*) AS spikes
|
||||
FROM logs1
|
||||
WHERE machine_group = 'Servers'
|
||||
AND cpu_wio > 0.99
|
||||
AND log_time >= TIMESTAMP '2016-12-01 00:00:00'
|
||||
AND log_time < TIMESTAMP '2017-01-01 00:00:00'
|
||||
GROUP BY machine_name
|
||||
ORDER BY spikes DESC
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q1.5:哪些外部可访问的虚拟机的运行内存不足?
|
||||
|
||||
SELECT machine_name,
|
||||
dt,
|
||||
MIN(mem_free) AS mem_free_min
|
||||
FROM (
|
||||
SELECT machine_name,
|
||||
CAST(log_time AS DATE) AS dt,
|
||||
mem_free
|
||||
FROM logs1
|
||||
WHERE machine_group = 'DMZ'
|
||||
AND mem_free IS NOT NULL
|
||||
) AS r
|
||||
GROUP BY machine_name,
|
||||
dt
|
||||
HAVING MIN(mem_free) < 10000
|
||||
ORDER BY machine_name,
|
||||
dt;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q1.6: 每小时所有文件服务器的总网络流量是多少?
|
||||
|
||||
SELECT dt,
|
||||
hr,
|
||||
SUM(net_in) AS net_in_sum,
|
||||
SUM(net_out) AS net_out_sum,
|
||||
SUM(net_in) + SUM(net_out) AS both_sum
|
||||
FROM (
|
||||
SELECT CAST(log_time AS DATE) AS dt,
|
||||
EXTRACT(HOUR FROM log_time) AS hr,
|
||||
COALESCE(bytes_in, 0.0) / 1000000000.0 AS net_in,
|
||||
COALESCE(bytes_out, 0.0) / 1000000000.0 AS net_out
|
||||
FROM logs1
|
||||
WHERE machine_name IN ('allsorts','andes','bigred','blackjack','bonbon',
|
||||
'cadbury','chiclets','cotton','crows','dove','fireball','hearts','huey',
|
||||
'lindt','milkduds','milkyway','mnm','necco','nerds','orbit','peeps',
|
||||
'poprocks','razzles','runts','smarties','smuggler','spree','stride',
|
||||
'tootsie','trident','wrigley','york')
|
||||
) AS r
|
||||
GROUP BY dt,
|
||||
hr
|
||||
ORDER BY both_sum DESC
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q2.1:过去 2 周内哪些请求导致了服务器错误?
|
||||
|
||||
SELECT *
|
||||
FROM logs2
|
||||
WHERE status_code >= 500
|
||||
AND log_time >= TIMESTAMP '2012-12-18 00:00:00'
|
||||
ORDER BY log_time;
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q2.2:在特定的某 2 周内,用户密码文件是否被泄露了?
|
||||
|
||||
SELECT *
|
||||
FROM logs2
|
||||
WHERE status_code >= 200
|
||||
AND status_code < 300
|
||||
AND request LIKE '%/etc/passwd%'
|
||||
AND log_time >= TIMESTAMP '2012-05-06 00:00:00'
|
||||
AND log_time < TIMESTAMP '2012-05-20 00:00:00';
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q2.3:过去一个月顶级请求的平均路径深度是多少?
|
||||
|
||||
SELECT top_level,
|
||||
AVG(LENGTH(request) - LENGTH(REPLACE(request, '/', ''))) AS depth_avg
|
||||
FROM (
|
||||
SELECT SUBSTRING(request FROM 1 FOR len) AS top_level,
|
||||
request
|
||||
FROM (
|
||||
SELECT POSITION(SUBSTRING(request FROM 2), '/') AS len,
|
||||
request
|
||||
FROM logs2
|
||||
WHERE status_code >= 200
|
||||
AND status_code < 300
|
||||
AND log_time >= TIMESTAMP '2012-12-01 00:00:00'
|
||||
) AS r
|
||||
WHERE len > 0
|
||||
) AS s
|
||||
WHERE top_level IN ('/about','/courses','/degrees','/events',
|
||||
'/grad','/industry','/news','/people',
|
||||
'/publications','/research','/teaching','/ugrad')
|
||||
GROUP BY top_level
|
||||
ORDER BY top_level;
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q2.4:在过去的 3 个月里,哪些客户端发出了过多的请求?
|
||||
|
||||
SELECT client_ip,
|
||||
COUNT(*) AS num_requests
|
||||
FROM logs2
|
||||
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
|
||||
GROUP BY client_ip
|
||||
HAVING COUNT(*) >= 100000
|
||||
ORDER BY num_requests DESC;
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q2.5:每天的独立访问者数量是多少?
|
||||
|
||||
SELECT dt,
|
||||
COUNT(DISTINCT client_ip)
|
||||
FROM (
|
||||
SELECT CAST(log_time AS DATE) AS dt,
|
||||
client_ip
|
||||
FROM logs2
|
||||
) AS r
|
||||
GROUP BY dt
|
||||
ORDER BY dt;
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q2.6:平均和最大数据传输速率(Gbps)是多少?
|
||||
|
||||
SELECT AVG(transfer) / 125000000.0 AS transfer_avg,
|
||||
MAX(transfer) / 125000000.0 AS transfer_max
|
||||
FROM (
|
||||
SELECT log_time,
|
||||
SUM(object_size) AS transfer
|
||||
FROM logs2
|
||||
GROUP BY log_time
|
||||
) AS r;
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q3.1:自 2019/11/29 17:00 以来,室温是否达到过冰点?
|
||||
|
||||
SELECT *
|
||||
FROM logs3
|
||||
WHERE event_type = 'temperature'
|
||||
AND event_value <= 32.0
|
||||
AND log_time >= '2019-11-29 17:00:00.000';
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q3.4:在过去的 6 个月里,每扇门打开的频率是多少?
|
||||
|
||||
SELECT device_name,
|
||||
device_floor,
|
||||
COUNT(*) AS ct
|
||||
FROM logs3
|
||||
WHERE event_type = 'door_open'
|
||||
AND log_time >= '2019-06-01 00:00:00.000'
|
||||
GROUP BY device_name,
|
||||
device_floor
|
||||
ORDER BY ct DESC;
|
||||
```
|
||||
|
||||
下面的查询 3.5 使用了 UNION 关键词。设置该模式以便组合 SELECT 的查询结果。该设置仅在未明确指定 UNION ALL 或 UNION DISTINCT 但使用了 UNION 进行共享时使用。
|
||||
|
||||
```sql
|
||||
SET union_default_mode = 'DISTINCT'
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Q3.5: 在冬季和夏季,建筑物内哪些地方会出现较大的温度变化?
|
||||
|
||||
WITH temperature AS (
|
||||
SELECT dt,
|
||||
device_name,
|
||||
device_type,
|
||||
device_floor
|
||||
FROM (
|
||||
SELECT dt,
|
||||
hr,
|
||||
device_name,
|
||||
device_type,
|
||||
device_floor,
|
||||
AVG(event_value) AS temperature_hourly_avg
|
||||
FROM (
|
||||
SELECT CAST(log_time AS DATE) AS dt,
|
||||
EXTRACT(HOUR FROM log_time) AS hr,
|
||||
device_name,
|
||||
device_type,
|
||||
device_floor,
|
||||
event_value
|
||||
FROM logs3
|
||||
WHERE event_type = 'temperature'
|
||||
) AS r
|
||||
GROUP BY dt,
|
||||
hr,
|
||||
device_name,
|
||||
device_type,
|
||||
device_floor
|
||||
) AS s
|
||||
GROUP BY dt,
|
||||
device_name,
|
||||
device_type,
|
||||
device_floor
|
||||
HAVING MAX(temperature_hourly_avg) - MIN(temperature_hourly_avg) >= 25.0
|
||||
)
|
||||
SELECT DISTINCT device_name,
|
||||
device_type,
|
||||
device_floor,
|
||||
'WINTER'
|
||||
FROM temperature
|
||||
WHERE dt >= DATE '2018-12-01'
|
||||
AND dt < DATE '2019-03-01'
|
||||
UNION
|
||||
SELECT DISTINCT device_name,
|
||||
device_type,
|
||||
device_floor,
|
||||
'SUMMER'
|
||||
FROM temperature
|
||||
WHERE dt >= DATE '2019-06-01'
|
||||
AND dt < DATE '2019-09-01';
|
||||
```
|
||||
|
||||
|
||||
```sql
|
||||
-- Q3.6:对于每种类别的设备,每月的功耗指标是什么?
|
||||
|
||||
SELECT yr,
|
||||
mo,
|
||||
SUM(coffee_hourly_avg) AS coffee_monthly_sum,
|
||||
AVG(coffee_hourly_avg) AS coffee_monthly_avg,
|
||||
SUM(printer_hourly_avg) AS printer_monthly_sum,
|
||||
AVG(printer_hourly_avg) AS printer_monthly_avg,
|
||||
SUM(projector_hourly_avg) AS projector_monthly_sum,
|
||||
AVG(projector_hourly_avg) AS projector_monthly_avg,
|
||||
SUM(vending_hourly_avg) AS vending_monthly_sum,
|
||||
AVG(vending_hourly_avg) AS vending_monthly_avg
|
||||
FROM (
|
||||
SELECT dt,
|
||||
yr,
|
||||
mo,
|
||||
hr,
|
||||
AVG(coffee) AS coffee_hourly_avg,
|
||||
AVG(printer) AS printer_hourly_avg,
|
||||
AVG(projector) AS projector_hourly_avg,
|
||||
AVG(vending) AS vending_hourly_avg
|
||||
FROM (
|
||||
SELECT CAST(log_time AS DATE) AS dt,
|
||||
EXTRACT(YEAR FROM log_time) AS yr,
|
||||
EXTRACT(MONTH FROM log_time) AS mo,
|
||||
EXTRACT(HOUR FROM log_time) AS hr,
|
||||
CASE WHEN device_name LIKE 'coffee%' THEN event_value END AS coffee,
|
||||
CASE WHEN device_name LIKE 'printer%' THEN event_value END AS printer,
|
||||
CASE WHEN device_name LIKE 'projector%' THEN event_value END AS projector,
|
||||
CASE WHEN device_name LIKE 'vending%' THEN event_value END AS vending
|
||||
FROM logs3
|
||||
WHERE device_type = 'meter'
|
||||
) AS r
|
||||
GROUP BY dt,
|
||||
yr,
|
||||
mo,
|
||||
hr
|
||||
) AS s
|
||||
GROUP BY yr,
|
||||
mo
|
||||
ORDER BY yr,
|
||||
mo;
|
||||
```
|
||||
|
||||
此数据集可在 [Playground](https://play.clickhouse.com/play?user=play) 中进行交互式的请求, [example](https://play.clickhouse.com/play?user=play#U0VMRUNUIG1hY2hpbmVfbmFtZSwKICAgICAgIE1JTihjcHUpIEFTIGNwdV9taW4sCiAgICAgICBNQVgoY3B1KSBBUyBjcHVfbWF4LAogICAgICAgQVZHKGNwdSkgQVMgY3B1X2F2ZywKICAgICAgIE1JTihuZXRfaW4pIEFTIG5ldF9pbl9taW4sCiAgICAgICBNQVgobmV0X2luKSBBUyBuZXRfaW5fbWF4LAogICAgICAgQVZHKG5ldF9pbikgQVMgbmV0X2luX2F2ZywKICAgICAgIE1JTihuZXRfb3V0KSBBUyBuZXRfb3V0X21pbiwKICAgICAgIE1BWChuZXRfb3V0KSBBUyBuZXRfb3V0X21heCwKICAgICAgIEFWRyhuZXRfb3V0KSBBUyBuZXRfb3V0X2F2ZwpGUk9NICgKICBTRUxFQ1QgbWFjaGluZV9uYW1lLAogICAgICAgICBDT0FMRVNDRShjcHVfdXNlciwgMC4wKSBBUyBjcHUsCiAgICAgICAgIENPQUxFU0NFKGJ5dGVzX2luLCAwLjApIEFTIG5ldF9pbiwKICAgICAgICAgQ09BTEVTQ0UoYnl0ZXNfb3V0LCAwLjApIEFTIG5ldF9vdXQKICBGUk9NIG1nYmVuY2gubG9nczEKICBXSEVSRSBtYWNoaW5lX25hbWUgSU4gKCdhbmFuc2knLCdhcmFnb2cnLCd1cmQnKQogICAgQU5EIGxvZ190aW1lID49IFRJTUVTVEFNUCAnMjAxNy0wMS0xMSAwMDowMDowMCcKKSBBUyByCkdST1VQIEJZIG1hY2hpbmVfbmFtZQ==).
|
||||
|
@ -1,9 +1,232 @@
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/cell-towers
|
||||
sidebar_label: Cell Towers
|
||||
title: "Cell Towers"
|
||||
sidebar_label: 蜂窝信号塔
|
||||
sidebar_position: 3
|
||||
title: "蜂窝信号塔"
|
||||
---
|
||||
|
||||
import Content from '@site/docs/en/getting-started/example-datasets/cell-towers.md';
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
import CodeBlock from '@theme/CodeBlock';
|
||||
import ActionsMenu from '@site/docs/en/_snippets/_service_actions_menu.md';
|
||||
import SQLConsoleDetail from '@site/docs/en/_snippets/_launch_sql_console.md';
|
||||
|
||||
该数据集来自 [OpenCellid](https://www.opencellid.org/) - 世界上最大的蜂窝信号塔的开放数据库。
|
||||
|
||||
截至 2021 年,它拥有超过 4000 万条关于全球蜂窝信号塔(GSM、LTE、UMTS 等)的记录及其地理坐标和元数据(国家代码、网络等)。
|
||||
|
||||
OpenCelliD 项目在 `Creative Commons Attribution-ShareAlike 4.0 International License` 协议下许可使用,我们根据相同许可条款重新分发此数据集的快照。登录后即可下载最新版本的数据集。
|
||||
|
||||
|
||||
## 获取数据集 {#get-the-dataset}
|
||||
|
||||
<Tabs groupId="deployMethod">
|
||||
<TabItem value="serverless" label="ClickHouse Cloud" default>
|
||||
|
||||
在 ClickHouse Cloud 上可以通过一个按钮实现通过 S3 上传此数据集。登录你的 ClickHouse Cloud 组织,或通过 [ClickHouse.cloud](https://clickhouse.cloud) 创建免费试用版。<ActionsMenu menu="Load Data" />
|
||||
|
||||
从 **Sample data** 选项卡中选择 **Cell Towers** 数据集,然后选择 **Load data**:
|
||||
|
||||
![加载数据集](@site/docs/en/_snippets/images/cloud-load-data-sample.png)
|
||||
|
||||
检查 cell_towers 的表结构:
|
||||
|
||||
```sql
|
||||
DESCRIBE TABLE cell_towers
|
||||
```
|
||||
|
||||
<SQLConsoleDetail />
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="selfmanaged" label="Self-managed">
|
||||
|
||||
1. 下载 2021 年 2 月以来的数据集快照:[cell_towers.csv.xz](https://datasets.clickhouse.com/cell_towers.csv.xz) (729 MB)。
|
||||
|
||||
2. 验证完整性(可选步骤):
|
||||
|
||||
```bash
|
||||
md5sum cell_towers.csv.xz
|
||||
```
|
||||
|
||||
```response
|
||||
8cf986f4a0d9f12c6f384a0e9192c908 cell_towers.csv.xz
|
||||
```
|
||||
|
||||
3. 使用以下命令解压:
|
||||
|
||||
```bash
|
||||
xz -d cell_towers.csv.xz
|
||||
```
|
||||
|
||||
4. 创建表:
|
||||
|
||||
```sql
|
||||
CREATE TABLE cell_towers
|
||||
(
|
||||
radio Enum8('' = 0, 'CDMA' = 1, 'GSM' = 2, 'LTE' = 3, 'NR' = 4, 'UMTS' = 5),
|
||||
mcc UInt16,
|
||||
net UInt16,
|
||||
area UInt16,
|
||||
cell UInt64,
|
||||
unit Int16,
|
||||
lon Float64,
|
||||
lat Float64,
|
||||
range UInt32,
|
||||
samples UInt32,
|
||||
changeable UInt8,
|
||||
created DateTime,
|
||||
updated DateTime,
|
||||
averageSignal UInt8
|
||||
)
|
||||
ENGINE = MergeTree ORDER BY (radio, mcc, net, created);
|
||||
```
|
||||
|
||||
5. 插入数据集:
|
||||
|
||||
```bash
|
||||
clickhouse-client --query "INSERT INTO cell_towers FORMAT CSVWithNames" < cell_towers.csv
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
## 查询示例 {#examples}
|
||||
|
||||
1. 按类型划分的基站数量:
|
||||
|
||||
```sql
|
||||
SELECT radio, count() AS c FROM cell_towers GROUP BY radio ORDER BY c DESC
|
||||
```
|
||||
```response
|
||||
┌─radio─┬────────c─┐
|
||||
│ UMTS │ 20686487 │
|
||||
│ LTE │ 12101148 │
|
||||
│ GSM │ 9931312 │
|
||||
│ CDMA │ 556344 │
|
||||
│ NR │ 867 │
|
||||
└───────┴──────────┘
|
||||
|
||||
5 rows in set. Elapsed: 0.011 sec. Processed 43.28 million rows, 43.28 MB (3.83 billion rows/s., 3.83 GB/s.)
|
||||
```
|
||||
|
||||
2. 各个[移动国家代码(MCC)](https://en.wikipedia.org/wiki/Mobile_country_code)对应的蜂窝信号塔数量:
|
||||
|
||||
```sql
|
||||
SELECT mcc, count() FROM cell_towers GROUP BY mcc ORDER BY count() DESC LIMIT 10
|
||||
```
|
||||
```response
|
||||
┌─mcc─┬─count()─┐
|
||||
│ 310 │ 5024650 │
|
||||
│ 262 │ 2622423 │
|
||||
│ 250 │ 1953176 │
|
||||
│ 208 │ 1891187 │
|
||||
│ 724 │ 1836150 │
|
||||
│ 404 │ 1729151 │
|
||||
│ 234 │ 1618924 │
|
||||
│ 510 │ 1353998 │
|
||||
│ 440 │ 1343355 │
|
||||
│ 311 │ 1332798 │
|
||||
└─────┴─────────┘
|
||||
|
||||
10 rows in set. Elapsed: 0.019 sec. Processed 43.28 million rows, 86.55 MB (2.33 billion rows/s., 4.65 GB/s.)
|
||||
```
|
||||
|
||||
排名靠前的国家是:美国、德国和俄罗斯。
|
||||
|
||||
你可以通过在 ClickHouse 中创建一个 [External Dictionary](../../sql-reference/dictionaries/external-dictionaries/external-dicts.md) 来解码这些值。
|
||||
|
||||
## 用例:合并地理数据 {#use-case}
|
||||
|
||||
使用 `pointInPolygon` 函数。
|
||||
|
||||
1. 创建一个用于存储多边形的表:
|
||||
|
||||
<Tabs groupId="deployMethod">
|
||||
<TabItem value="serverless" label="ClickHouse Cloud" default>
|
||||
|
||||
```sql
|
||||
CREATE TABLE moscow (polygon Array(Tuple(Float64, Float64)))
|
||||
ORDER BY polygon;
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="selfmanaged" label="Self-managed">
|
||||
|
||||
```sql
|
||||
CREATE TEMPORARY TABLE
|
||||
moscow (polygon Array(Tuple(Float64, Float64)));
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
2. 以下点大致上构造了莫斯科的地理围栏(除“新莫斯科”外):
|
||||
|
||||
```sql
|
||||
INSERT INTO moscow VALUES ([(37.84172564285271, 55.78000432402266),
|
||||
(37.8381207618713, 55.775874525970494), (37.83979446823122, 55.775626746008065), (37.84243326983639, 55.77446586811748), (37.84262672750849, 55.771974101091104), (37.84153238623039, 55.77114545193181), (37.841124690460184, 55.76722010265554),
|
||||
(37.84239076983644, 55.76654891107098), (37.842283558197025, 55.76258709833121), (37.8421759312134, 55.758073999993734), (37.84198330422974, 55.75381499999371), (37.8416827275085, 55.749277102484484), (37.84157576190186, 55.74794544108413),
|
||||
(37.83897929098507, 55.74525257875241), (37.83739676451868, 55.74404373042019), (37.838732481460525, 55.74298009816793), (37.841183997352545, 55.743060321833575), (37.84097476190185, 55.73938799999373), (37.84048155819702, 55.73570799999372),
|
||||
(37.840095812164286, 55.73228210777237), (37.83983814285274, 55.73080491981639), (37.83846476321406, 55.729799917464675), (37.83835745269769, 55.72919751082619), (37.838636380279524, 55.72859509486539), (37.8395161005249, 55.727705075632784),
|
||||
(37.83897964285276, 55.722727886185154), (37.83862557539366, 55.72034817326636), (37.83559735744853, 55.71944437307499), (37.835370708803126, 55.71831419154461), (37.83738169402022, 55.71765218986692), (37.83823396494291, 55.71691750159089),
|
||||
(37.838056931213345, 55.71547311301385), (37.836812846557606, 55.71221445615604), (37.83522525396725, 55.709331054395555), (37.83269301586908, 55.70953687463627), (37.829667367706236, 55.70903403789297), (37.83311126588435, 55.70552351822608),
|
||||
(37.83058993121339, 55.70041317726053), (37.82983872750851, 55.69883771404813), (37.82934501586913, 55.69718947487017), (37.828926414016685, 55.69504441658371), (37.82876530422971, 55.69287499999378), (37.82894754100031, 55.690759754047335),
|
||||
(37.827697554878185, 55.68951421135665), (37.82447346292115, 55.68965045405069), (37.83136543914793, 55.68322046195302), (37.833554015869154, 55.67814012759211), (37.83544184655761, 55.67295011628339), (37.837480388885474, 55.6672498719639),
|
||||
(37.838960677246064, 55.66316274139358), (37.83926093121332, 55.66046999999383), (37.839025050262435, 55.65869897264431), (37.83670784390257, 55.65794084879904), (37.835656529083245, 55.65694309303843), (37.83704060449217, 55.65689306460552),
|
||||
(37.83696819873806, 55.65550363526252), (37.83760389616388, 55.65487847246661), (37.83687972750851, 55.65356745541324), (37.83515216004943, 55.65155951234079), (37.83312418518067, 55.64979413590619), (37.82801726983639, 55.64640836412121),
|
||||
(37.820614174591, 55.64164525405531), (37.818908190475426, 55.6421883258084), (37.81717543386075, 55.64112490388471), (37.81690987037274, 55.63916106913107), (37.815099354492155, 55.637925371757085), (37.808769150787356, 55.633798276884455),
|
||||
(37.80100123544311, 55.62873670012244), (37.79598013491824, 55.62554336109055), (37.78634567724606, 55.62033499605651), (37.78334147619623, 55.618768681480326), (37.77746201055901, 55.619855533402706), (37.77527329626457, 55.61909966711279),
|
||||
(37.77801986242668, 55.618770300976294), (37.778212973541216, 55.617257701952106), (37.77784818518065, 55.61574504433011), (37.77016867724609, 55.61148576294007), (37.760191219573976, 55.60599579539028), (37.75338926983641, 55.60227892751446),
|
||||
(37.746329965606634, 55.59920577639331), (37.73939925396728, 55.59631430313617), (37.73273665739439, 55.5935318803559), (37.7299954450912, 55.59350760316188), (37.7268679946899, 55.59469840523759), (37.72626726983634, 55.59229549697373),
|
||||
(37.7262673598022, 55.59081598950582), (37.71897193121335, 55.5877595845419), (37.70871550793456, 55.58393177431724), (37.700497489410374, 55.580917323756644), (37.69204305026244, 55.57778089778455), (37.68544477378839, 55.57815154690915),
|
||||
(37.68391050793454, 55.57472945079756), (37.678803592590306, 55.57328235936491), (37.6743402539673, 55.57255251445782), (37.66813862698363, 55.57216388774464), (37.617927457672096, 55.57505691895805), (37.60443099999999, 55.5757737568051),
|
||||
(37.599683515869145, 55.57749105910326), (37.59754177842709, 55.57796291823627), (37.59625834786988, 55.57906686095235), (37.59501783265684, 55.57746616444403), (37.593090671936025, 55.57671634534502), (37.587018007904, 55.577944600233785),
|
||||
(37.578692203704804, 55.57982895000019), (37.57327546607398, 55.58116294118248), (37.57385012109279, 55.581550362779), (37.57399562266922, 55.5820107079112), (37.5735356072979, 55.58226289171689), (37.57290393054962, 55.582393529795155),
|
||||
(37.57037722355653, 55.581919415056234), (37.5592298306885, 55.584471614867844), (37.54189249206543, 55.58867650795186), (37.5297256269836, 55.59158133551745), (37.517837865081766, 55.59443656218868), (37.51200186508174, 55.59635625174229),
|
||||
(37.506808949737554, 55.59907823904434), (37.49820432275389, 55.6062944994944), (37.494406071441674, 55.60967103463367), (37.494760001358024, 55.61066689753365), (37.49397137107085, 55.61220931698269), (37.49016528606031, 55.613417718449064),
|
||||
(37.48773249206542, 55.61530616333343), (37.47921386508177, 55.622640129112334), (37.470652153442394, 55.62993723476164), (37.46273446298218, 55.6368075123157), (37.46350692265317, 55.64068225239439), (37.46050283203121, 55.640794546982576),
|
||||
(37.457627470916734, 55.64118904154646), (37.450718034393326, 55.64690488145138), (37.44239252645875, 55.65397824729769), (37.434587576721185, 55.66053543155961), (37.43582144975277, 55.661693766520735), (37.43576786245721, 55.662755031737014),
|
||||
(37.430982915344174, 55.664610641628116), (37.428547447097685, 55.66778515273695), (37.42945134592044, 55.668633314343566), (37.42859571562949, 55.66948145750025), (37.4262836402282, 55.670813882451405), (37.418709037048295, 55.6811141674414),
|
||||
(37.41922139651101, 55.68235377885389), (37.419218771842885, 55.68359335082235), (37.417196501327446, 55.684375235224735), (37.41607020370478, 55.68540557585352), (37.415640857147146, 55.68686637150793), (37.414632153442334, 55.68903015131686),
|
||||
(37.413344899475064, 55.690896881757396), (37.41171432275391, 55.69264232162232), (37.40948282275393, 55.69455101638112), (37.40703674603271, 55.69638690385348), (37.39607169577025, 55.70451821283731), (37.38952706878662, 55.70942491932811),
|
||||
(37.387778313491815, 55.71149057784176), (37.39049275399779, 55.71419814298992), (37.385557272491454, 55.7155489617061), (37.38388335714726, 55.71849856042102), (37.378368238098155, 55.7292763261685), (37.37763597123337, 55.730845879211614),
|
||||
(37.37890062088197, 55.73167906388319), (37.37750451918789, 55.734703664681774), (37.375610832015965, 55.734851959522246), (37.3723813571472, 55.74105626086403), (37.37014935714723, 55.746115620904355), (37.36944173016362, 55.750883999993725),
|
||||
(37.36975304365541, 55.76335905525834), (37.37244070571134, 55.76432079697595), (37.3724259757175, 55.76636979670426), (37.369922155757884, 55.76735417953104), (37.369892695770275, 55.76823419316575), (37.370214730163575, 55.782312184391266),
|
||||
(37.370493611114505, 55.78436801120489), (37.37120164550783, 55.78596427165359), (37.37284851456452, 55.7874378183096), (37.37608325135799, 55.7886695054807), (37.3764587460632, 55.78947647305964), (37.37530000265506, 55.79146512926804),
|
||||
(37.38235915344241, 55.79899647809345), (37.384344043655396, 55.80113596939471), (37.38594269577028, 55.80322699999366), (37.38711208598329, 55.804919036911976), (37.3880239841309, 55.806610999993666), (37.38928977249147, 55.81001864976979),
|
||||
(37.39038389947512, 55.81348641242801), (37.39235781481933, 55.81983538336746), (37.393709457672124, 55.82417822811877), (37.394685720901464, 55.82792275755836), (37.39557615344238, 55.830447148154136), (37.39844478226658, 55.83167107969975),
|
||||
(37.40019761214057, 55.83151823557964), (37.400398790382326, 55.83264967594742), (37.39659544313046, 55.83322180909622), (37.39667059524539, 55.83402792148566), (37.39682089947515, 55.83638877400216), (37.39643489154053, 55.83861656112751),
|
||||
(37.3955338994751, 55.84072348043264), (37.392680272491454, 55.84502158126453), (37.39241188227847, 55.84659117913199), (37.392529730163616, 55.84816071336481), (37.39486835714723, 55.85288092980303), (37.39873052645878, 55.859893456073635),
|
||||
(37.40272161111449, 55.86441833633205), (37.40697072750854, 55.867579567544375), (37.410007082016016, 55.868369880337), (37.4120992989502, 55.86920843741314), (37.412668021163924, 55.87055369615854), (37.41482461111453, 55.87170587948249),
|
||||
(37.41862266137694, 55.873183961039565), (37.42413732540892, 55.874879126654704), (37.4312182698669, 55.875614937236705), (37.43111093783558, 55.8762723478417), (37.43332105622856, 55.87706546369396), (37.43385747619623, 55.87790681284802),
|
||||
(37.441303050262405, 55.88027084462084), (37.44747234260555, 55.87942070143253), (37.44716141796871, 55.88072960917233), (37.44769797085568, 55.88121221323979), (37.45204320500181, 55.882080694420715), (37.45673176190186, 55.882346110794586),
|
||||
(37.463383999999984, 55.88252729504517), (37.46682797486874, 55.88294937719063), (37.470014457672086, 55.88361266759345), (37.47751410450743, 55.88546991372396), (37.47860317658232, 55.88534929207307), (37.48165826025772, 55.882563306475106),
|
||||
(37.48316434442331, 55.8815803226785), (37.483831555817645, 55.882427612793315), (37.483182967125686, 55.88372791409729), (37.483092277908824, 55.88495581062434), (37.4855716508179, 55.8875561994203), (37.486440636245746, 55.887827444039566),
|
||||
(37.49014203439328, 55.88897899871799), (37.493210285705544, 55.890208937135604), (37.497512451065035, 55.891342397444696), (37.49780744510645, 55.89174030252967), (37.49940333499519, 55.89239745507079), (37.50018383334346, 55.89339220941865),
|
||||
(37.52421672750851, 55.903869074155224), (37.52977457672118, 55.90564076517974), (37.53503220370484, 55.90661661218259), (37.54042858064267, 55.90714113744566), (37.54320461007303, 55.905645048442985), (37.545686966066306, 55.906608607018505),
|
||||
(37.54743976120755, 55.90788552162358), (37.55796999999999, 55.90901557907218), (37.572711542327866, 55.91059395704873), (37.57942799999998, 55.91073854155573), (37.58502865872187, 55.91009969268444), (37.58739968913264, 55.90794809960554),
|
||||
(37.59131567193598, 55.908713267595054), (37.612687423278814, 55.902866854295375), (37.62348079629517, 55.90041967242986), (37.635797880950896, 55.898141151686396), (37.649487626983664, 55.89639275532968), (37.65619302513125, 55.89572360207488),
|
||||
(37.66294133862307, 55.895295577183965), (37.66874564418033, 55.89505457604897), (37.67375601586915, 55.89254677027454), (37.67744661901856, 55.8947775867987), (37.688347, 55.89450045676125), (37.69480554232789, 55.89422926332761),
|
||||
(37.70107096560668, 55.89322256101114), (37.705962965606716, 55.891763491662616), (37.711885134918205, 55.889110234998974), (37.71682005026245, 55.886577568759876), (37.7199315476074, 55.88458159806678), (37.72234560316464, 55.882281005794134),
|
||||
(37.72364385977171, 55.8809452036196), (37.725371142837474, 55.8809722706006), (37.727870902099546, 55.88037213862385), (37.73394330422971, 55.877941504088696), (37.745339592590376, 55.87208120378722), (37.75525267724611, 55.86703807949492),
|
||||
(37.76919976190188, 55.859821640197474), (37.827835219574, 55.82962968399116), (37.83341438888553, 55.82575289922351), (37.83652584655761, 55.82188784027888), (37.83809213491821, 55.81612575504693), (37.83605359521481, 55.81460347077685),
|
||||
(37.83632178569025, 55.81276696067908), (37.838623105812026, 55.811486181656385), (37.83912198147584, 55.807329380532785), (37.839079078033414, 55.80510270463816), (37.83965844708251, 55.79940712529036), (37.840581150787344, 55.79131399999368),
|
||||
(37.84172564285271, 55.78000432402266)]);
|
||||
```
|
||||
|
||||
3. 检查莫斯科有多少个蜂窝信号塔:
|
||||
|
||||
```sql
|
||||
SELECT count() FROM cell_towers
|
||||
WHERE pointInPolygon((lon, lat), (SELECT * FROM moscow))
|
||||
```
|
||||
```response
|
||||
┌─count()─┐
|
||||
│ 310463 │
|
||||
└─────────┘
|
||||
|
||||
1 rows in set. Elapsed: 0.067 sec. Processed 43.28 million rows, 692.42 MB (645.83 million rows/s., 10.33 GB/s.)
|
||||
```
|
||||
|
||||
虽然不能创建临时表,但此数据集仍可在 [Playground](https://play.clickhouse.com/play?user=play) 中进行交互式的请求, [example](https://play.clickhouse.com/play?user=play#U0VMRUNUIG1jYywgY291bnQoKSBGUk9NIGNlbGxfdG93ZXJzIEdST1VQIEJZIG1jYyBPUkRFUiBCWSBjb3VudCgpIERFU0M=).
|
||||
|
||||
<Content />
|
||||
|
@ -1,9 +1,352 @@
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/menus
|
||||
sidebar_label: New York Public Library "What's on the Menu?" Dataset
|
||||
title: "New York Public Library \"What's on the Menu?\" Dataset"
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/menus
|
||||
sidebar_label: '纽约公共图书馆“菜单上有什么?”数据集'
|
||||
title: '纽约公共图书馆“菜单上有什么?”数据集'
|
||||
---
|
||||
|
||||
import Content from '@site/docs/en/getting-started/example-datasets/menus.md';
|
||||
该数据集由纽约公共图书馆创建。其中含有有关酒店、餐馆和咖啡馆的菜单上的菜肴及其价格的历史数据。
|
||||
|
||||
<Content />
|
||||
来源:http://menus.nypl.org/data
|
||||
数据为开放数据。
|
||||
|
||||
数据来自于图书馆中的档案,因此可能不完整,以至于难以进行统计分析。尽管如此,该数据集也是非常有意思的。数据集中只有 130 万条关于菜单中的菜肴的记录 - 这对于 ClickHouse 来说是一个非常小的数据量,但这仍是一个很好的例子。
|
||||
|
||||
## 下载数据集 {#download-dataset}
|
||||
|
||||
运行命令:
|
||||
|
||||
```bash
|
||||
wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz
|
||||
```
|
||||
|
||||
如果有需要可以使用 http://menus.nypl.org/data 中的最新链接。下载的大小约为 35 MB。
|
||||
|
||||
## 解压数据集 {#unpack-dataset}
|
||||
|
||||
```bash
|
||||
tar xvf 2021_08_01_07_01_17_data.tgz
|
||||
```
|
||||
|
||||
解压后的的大小约为 150 MB。
|
||||
|
||||
数据集由四个表组成:
|
||||
|
||||
- `Menu` - 有关菜单的信息,其中包含:餐厅名称,看到菜单的日期等
|
||||
- `Dish` - 有关菜肴的信息,其中包含:菜肴名称以及一些特征。
|
||||
- `MenuPage` - 有关菜单中页面的信息,每个页面都属于某个 `Menu`。
|
||||
- `MenuItem` - 菜单项。某个菜单页面上的菜肴及其价格:指向 `Dish` 和 `MenuPage`的链接。
|
||||
|
||||
## 创建表 {#create-tables}
|
||||
|
||||
使用 [Decimal](/docs/zh/sql-reference/data-types/decimal.md) 数据类型来存储价格。
|
||||
|
||||
```sql
|
||||
CREATE TABLE dish
|
||||
(
|
||||
id UInt32,
|
||||
name String,
|
||||
description String,
|
||||
menus_appeared UInt32,
|
||||
times_appeared Int32,
|
||||
first_appeared UInt16,
|
||||
last_appeared UInt16,
|
||||
lowest_price Decimal64(3),
|
||||
highest_price Decimal64(3)
|
||||
) ENGINE = MergeTree ORDER BY id;
|
||||
|
||||
CREATE TABLE menu
|
||||
(
|
||||
id UInt32,
|
||||
name String,
|
||||
sponsor String,
|
||||
event String,
|
||||
venue String,
|
||||
place String,
|
||||
physical_description String,
|
||||
occasion String,
|
||||
notes String,
|
||||
call_number String,
|
||||
keywords String,
|
||||
language String,
|
||||
date String,
|
||||
location String,
|
||||
location_type String,
|
||||
currency String,
|
||||
currency_symbol String,
|
||||
status String,
|
||||
page_count UInt16,
|
||||
dish_count UInt16
|
||||
) ENGINE = MergeTree ORDER BY id;
|
||||
|
||||
CREATE TABLE menu_page
|
||||
(
|
||||
id UInt32,
|
||||
menu_id UInt32,
|
||||
page_number UInt16,
|
||||
image_id String,
|
||||
full_height UInt16,
|
||||
full_width UInt16,
|
||||
uuid UUID
|
||||
) ENGINE = MergeTree ORDER BY id;
|
||||
|
||||
CREATE TABLE menu_item
|
||||
(
|
||||
id UInt32,
|
||||
menu_page_id UInt32,
|
||||
price Decimal64(3),
|
||||
high_price Decimal64(3),
|
||||
dish_id UInt32,
|
||||
created_at DateTime,
|
||||
updated_at DateTime,
|
||||
xpos Float64,
|
||||
ypos Float64
|
||||
) ENGINE = MergeTree ORDER BY id;
|
||||
```
|
||||
|
||||
## 导入数据 {#import-data}
|
||||
|
||||
执行以下命令将数据导入 ClickHouse:
|
||||
|
||||
```bash
|
||||
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO dish FORMAT CSVWithNames" < Dish.csv
|
||||
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu FORMAT CSVWithNames" < Menu.csv
|
||||
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu_page FORMAT CSVWithNames" < MenuPage.csv
|
||||
clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO menu_item FORMAT CSVWithNames" < MenuItem.csv
|
||||
```
|
||||
|
||||
因为数据由带有标题的 CSV 表示,所以使用 [CSVWithNames](/docs/zh/interfaces/formats.md#csvwithnames) 格式。
|
||||
|
||||
因为只有双引号用于数据字段,单引号可以在值内,所以禁用了 `format_csv_allow_single_quotes` 以避免混淆 CSV 解析器。
|
||||
|
||||
因为数据中没有 [NULL](/docs/zh/sql-reference/syntax.md#null-literal) 值,所以禁用 [input_format_null_as_default](/docs/zh/operations/settings/settings.md#settings-input-format-null-as-default)。不然 ClickHouse 将会尝试解析 `\N` 序列,并可能与数据中的 `\` 混淆。
|
||||
|
||||
设置 [date_time_input_format best_effort](/docs/zh/operations/settings/settings.md#settings-date_time_input_format) 以便解析各种格式的 [DateTime](/docs/zh/sql-reference/data-types/datetime.md)字段。例如,识别像“2000-01-01 01:02”这样没有秒数的 ISO-8601 时间字符串。如果没有此设置,则仅允许使用固定的 DateTime 格式。
|
||||
|
||||
## 非规范化数据 {#denormalize-data}
|
||||
|
||||
数据以 [规范化形式] (https://en.wikipedia.org/wiki/Database_normalization#Normal_forms) 在多个表格中呈现。这意味着如果你想进行如查询菜单项中的菜名这类的查询,则必须执行 [JOIN](/docs/zh/sql-reference/statements/select/join.md#select-join)。在典型的分析任务中,预先处理联接的数据以避免每次都执行“联接”会更有效率。这中操作被称为“非规范化”数据。
|
||||
|
||||
我们将创建一个表“menu_item_denorm”,其中将包含所有联接在一起的数据:
|
||||
|
||||
```sql
|
||||
CREATE TABLE menu_item_denorm
|
||||
ENGINE = MergeTree ORDER BY (dish_name, created_at)
|
||||
AS SELECT
|
||||
price,
|
||||
high_price,
|
||||
created_at,
|
||||
updated_at,
|
||||
xpos,
|
||||
ypos,
|
||||
dish.id AS dish_id,
|
||||
dish.name AS dish_name,
|
||||
dish.description AS dish_description,
|
||||
dish.menus_appeared AS dish_menus_appeared,
|
||||
dish.times_appeared AS dish_times_appeared,
|
||||
dish.first_appeared AS dish_first_appeared,
|
||||
dish.last_appeared AS dish_last_appeared,
|
||||
dish.lowest_price AS dish_lowest_price,
|
||||
dish.highest_price AS dish_highest_price,
|
||||
menu.id AS menu_id,
|
||||
menu.name AS menu_name,
|
||||
menu.sponsor AS menu_sponsor,
|
||||
menu.event AS menu_event,
|
||||
menu.venue AS menu_venue,
|
||||
menu.place AS menu_place,
|
||||
menu.physical_description AS menu_physical_description,
|
||||
menu.occasion AS menu_occasion,
|
||||
menu.notes AS menu_notes,
|
||||
menu.call_number AS menu_call_number,
|
||||
menu.keywords AS menu_keywords,
|
||||
menu.language AS menu_language,
|
||||
menu.date AS menu_date,
|
||||
menu.location AS menu_location,
|
||||
menu.location_type AS menu_location_type,
|
||||
menu.currency AS menu_currency,
|
||||
menu.currency_symbol AS menu_currency_symbol,
|
||||
menu.status AS menu_status,
|
||||
menu.page_count AS menu_page_count,
|
||||
menu.dish_count AS menu_dish_count
|
||||
FROM menu_item
|
||||
JOIN dish ON menu_item.dish_id = dish.id
|
||||
JOIN menu_page ON menu_item.menu_page_id = menu_page.id
|
||||
JOIN menu ON menu_page.menu_id = menu.id;
|
||||
```
|
||||
|
||||
## 验证数据 {#validate-data}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT count() FROM menu_item_denorm;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌─count()─┐
|
||||
│ 1329175 │
|
||||
└─────────┘
|
||||
```
|
||||
|
||||
## 运行一些查询 {#run-queries}
|
||||
|
||||
### 菜品的平均历史价格 {#query-averaged-historical-prices}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
|
||||
count(),
|
||||
round(avg(price), 2),
|
||||
bar(avg(price), 0, 100, 100)
|
||||
FROM menu_item_denorm
|
||||
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022)
|
||||
GROUP BY d
|
||||
ORDER BY d ASC;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 100, 100)─┐
|
||||
│ 1850 │ 618 │ 1.5 │ █▍ │
|
||||
│ 1860 │ 1634 │ 1.29 │ █▎ │
|
||||
│ 1870 │ 2215 │ 1.36 │ █▎ │
|
||||
│ 1880 │ 3909 │ 1.01 │ █ │
|
||||
│ 1890 │ 8837 │ 1.4 │ █▍ │
|
||||
│ 1900 │ 176292 │ 0.68 │ ▋ │
|
||||
│ 1910 │ 212196 │ 0.88 │ ▊ │
|
||||
│ 1920 │ 179590 │ 0.74 │ ▋ │
|
||||
│ 1930 │ 73707 │ 0.6 │ ▌ │
|
||||
│ 1940 │ 58795 │ 0.57 │ ▌ │
|
||||
│ 1950 │ 41407 │ 0.95 │ ▊ │
|
||||
│ 1960 │ 51179 │ 1.32 │ █▎ │
|
||||
│ 1970 │ 12914 │ 1.86 │ █▋ │
|
||||
│ 1980 │ 7268 │ 4.35 │ ████▎ │
|
||||
│ 1990 │ 11055 │ 6.03 │ ██████ │
|
||||
│ 2000 │ 2467 │ 11.85 │ ███████████▋ │
|
||||
│ 2010 │ 597 │ 25.66 │ █████████████████████████▋ │
|
||||
└──────┴─────────┴──────────────────────┴──────────────────────────────┘
|
||||
```
|
||||
|
||||
带上一粒盐。
|
||||
|
||||
### 汉堡价格 {#query-burger-prices}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
|
||||
count(),
|
||||
round(avg(price), 2),
|
||||
bar(avg(price), 0, 50, 100)
|
||||
FROM menu_item_denorm
|
||||
WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%burger%')
|
||||
GROUP BY d
|
||||
ORDER BY d ASC;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)───────────┐
|
||||
│ 1880 │ 2 │ 0.42 │ ▋ │
|
||||
│ 1890 │ 7 │ 0.85 │ █▋ │
|
||||
│ 1900 │ 399 │ 0.49 │ ▊ │
|
||||
│ 1910 │ 589 │ 0.68 │ █▎ │
|
||||
│ 1920 │ 280 │ 0.56 │ █ │
|
||||
│ 1930 │ 74 │ 0.42 │ ▋ │
|
||||
│ 1940 │ 119 │ 0.59 │ █▏ │
|
||||
│ 1950 │ 134 │ 1.09 │ ██▏ │
|
||||
│ 1960 │ 272 │ 0.92 │ █▋ │
|
||||
│ 1970 │ 108 │ 1.18 │ ██▎ │
|
||||
│ 1980 │ 88 │ 2.82 │ █████▋ │
|
||||
│ 1990 │ 184 │ 3.68 │ ███████▎ │
|
||||
│ 2000 │ 21 │ 7.14 │ ██████████████▎ │
|
||||
│ 2010 │ 6 │ 18.42 │ ████████████████████████████████████▋ │
|
||||
└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘
|
||||
```
|
||||
|
||||
###伏特加{#query-vodka}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
|
||||
count(),
|
||||
round(avg(price), 2),
|
||||
bar(avg(price), 0, 50, 100)
|
||||
FROM menu_item_denorm
|
||||
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%vodka%')
|
||||
GROUP BY d
|
||||
ORDER BY d ASC;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)─┐
|
||||
│ 1910 │ 2 │ 0 │ │
|
||||
│ 1920 │ 1 │ 0.3 │ ▌ │
|
||||
│ 1940 │ 21 │ 0.42 │ ▋ │
|
||||
│ 1950 │ 14 │ 0.59 │ █▏ │
|
||||
│ 1960 │ 113 │ 2.17 │ ████▎ │
|
||||
│ 1970 │ 37 │ 0.68 │ █▎ │
|
||||
│ 1980 │ 19 │ 2.55 │ █████ │
|
||||
│ 1990 │ 86 │ 3.6 │ ███████▏ │
|
||||
│ 2000 │ 2 │ 3.98 │ ███████▊ │
|
||||
└──────┴─────────┴──────────────────────┴─────────────────────────────┘
|
||||
```
|
||||
|
||||
要查询 `Vodka`,必须声明通过 `ILIKE '%vodka%'` 进行查询。
|
||||
|
||||
### 鱼子酱 {#query-caviar}
|
||||
|
||||
列出鱼子酱的价格。另外,列出任何带有鱼子酱的菜肴的名称。
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d,
|
||||
count(),
|
||||
round(avg(price), 2),
|
||||
bar(avg(price), 0, 50, 100),
|
||||
any(dish_name)
|
||||
FROM menu_item_denorm
|
||||
WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%caviar%')
|
||||
GROUP BY d
|
||||
ORDER BY d ASC;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)──────┬─any(dish_name)──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ 1090 │ 1 │ 0 │ │ Caviar │
|
||||
│ 1880 │ 3 │ 0 │ │ Caviar │
|
||||
│ 1890 │ 39 │ 0.59 │ █▏ │ Butter and caviar │
|
||||
│ 1900 │ 1014 │ 0.34 │ ▋ │ Anchovy Caviar on Toast │
|
||||
│ 1910 │ 1588 │ 1.35 │ ██▋ │ 1/1 Brötchen Caviar │
|
||||
│ 1920 │ 927 │ 1.37 │ ██▋ │ ASTRAKAN CAVIAR │
|
||||
│ 1930 │ 289 │ 1.91 │ ███▋ │ Astrachan caviar │
|
||||
│ 1940 │ 201 │ 0.83 │ █▋ │ (SPECIAL) Domestic Caviar Sandwich │
|
||||
│ 1950 │ 81 │ 2.27 │ ████▌ │ Beluga Caviar │
|
||||
│ 1960 │ 126 │ 2.21 │ ████▍ │ Beluga Caviar │
|
||||
│ 1970 │ 105 │ 0.95 │ █▊ │ BELUGA MALOSSOL CAVIAR AMERICAN DRESSING │
|
||||
│ 1980 │ 12 │ 7.22 │ ██████████████▍ │ Authentic Iranian Beluga Caviar the world's finest black caviar presented in ice garni and a sampling of chilled 100° Russian vodka │
|
||||
│ 1990 │ 74 │ 14.42 │ ████████████████████████████▋ │ Avocado Salad, Fresh cut avocado with caviare │
|
||||
│ 2000 │ 3 │ 7.82 │ ███████████████▋ │ Aufgeschlagenes Kartoffelsueppchen mit Forellencaviar │
|
||||
│ 2010 │ 6 │ 15.58 │ ███████████████████████████████▏ │ "OYSTERS AND PEARLS" "Sabayon" of Pearl Tapioca with Island Creek Oysters and Russian Sevruga Caviar │
|
||||
└──────┴─────────┴──────────────────────┴──────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
至少他们有伏特加配鱼子酱。真棒。
|
||||
|
||||
## 在线 Playground{#playground}
|
||||
|
||||
此数据集已经上传到了 ClickHouse Playground 中,[example](https://play.clickhouse.com/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==)。
|
||||
|
@ -1,9 +1,416 @@
|
||||
---
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/opensky
|
||||
sidebar_label: Air Traffic Data
|
||||
title: "Crowdsourced air traffic data from The OpenSky Network 2020"
|
||||
sidebar_label: 空中交通数据
|
||||
description: 该数据集中的数据是从完整的 OpenSky 数据集中衍生而来的,对其中的数据进行了必要的清理,用以展示在 COVID-19 期间空中交通的发展。
|
||||
title: "来自 The OpenSky Network 2020 的众包空中交通数据"
|
||||
---
|
||||
|
||||
import Content from '@site/docs/en/getting-started/example-datasets/opensky.md';
|
||||
该数据集中的数据是从完整的 OpenSky 数据集中派生和清理的,以说明 COVID-19 大流行期间空中交通的发展。它涵盖了自 2019 年 1 月 1 日以来该网络中 2500 多名成员观测到的所有航班。直到 COVID-19 大流行结束,更多数据将定期的更新到数据集中。
|
||||
|
||||
<Content />
|
||||
来源:https://zenodo.org/record/5092942#.YRBCyTpRXYd
|
||||
|
||||
Martin Strohmeier、Xavier Olive、Jannis Lübbe、Matthias Schäfer 和 Vincent Lenders “来自 OpenSky 网络 2019-2020 的众包空中交通数据”地球系统科学数据 13(2),2021 https://doi.org/10.5194/essd- 13-357-2021
|
||||
|
||||
## 下载数据集 {#download-dataset}
|
||||
|
||||
运行命令:
|
||||
|
||||
```bash
|
||||
wget -O- https://zenodo.org/record/5092942 | grep -oP 'https://zenodo.org/record/5092942/files/flightlist_\d+_\d+\.csv\.gz' | xargs wget
|
||||
```
|
||||
|
||||
Download will take about 2 minutes with good internet connection. There are 30 files with total size of 4.3 GB.
|
||||
|
||||
## 创建表 {#create-table}
|
||||
|
||||
```sql
|
||||
CREATE TABLE opensky
|
||||
(
|
||||
callsign String,
|
||||
number String,
|
||||
icao24 String,
|
||||
registration String,
|
||||
typecode String,
|
||||
origin String,
|
||||
destination String,
|
||||
firstseen DateTime,
|
||||
lastseen DateTime,
|
||||
day DateTime,
|
||||
latitude_1 Float64,
|
||||
longitude_1 Float64,
|
||||
altitude_1 Float64,
|
||||
latitude_2 Float64,
|
||||
longitude_2 Float64,
|
||||
altitude_2 Float64
|
||||
) ENGINE = MergeTree ORDER BY (origin, destination, callsign);
|
||||
```
|
||||
|
||||
## 导入数据 {#import-data}
|
||||
|
||||
将数据并行导入到 ClickHouse:
|
||||
|
||||
```bash
|
||||
ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c 'gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
|
||||
```
|
||||
|
||||
- 这里我们将文件列表(`ls -1 flightlist_*.csv.gz`)传递给`xargs`以进行并行处理。 `xargs -P100` 指定最多使用 100 个并行工作程序,但由于我们只有 30 个文件,工作程序的数量将只有 30 个。
|
||||
- 对于每个文件,`xargs` 将通过 `bash -c` 为每个文件运行一个脚本文件。该脚本通过使用 `{}` 表示文件名占位符,然后 `xargs` 由命令进行填充(使用 `-I{}`)。
|
||||
- 该脚本会将文件 (`gzip -c -d "{}"`) 解压缩到标准输出(`-c` 参数),并将输出重定向到 `clickhouse-client`。
|
||||
- 我们还要求使用扩展解析器解析 [DateTime](../../sql-reference/data-types/datetime.md) 字段 ([--date_time_input_format best_effort](../../operations/settings/ settings.md#settings-date_time_input_format)) 以识别具有时区偏移的 ISO-8601 格式。
|
||||
|
||||
最后,`clickhouse-client` 会以 [CSVWithNames](../../interfaces/formats.md#csvwithnames) 格式读取输入数据然后执行插入。
|
||||
|
||||
并行导入需要 24 秒。
|
||||
|
||||
如果您不想使用并行导入,以下是顺序导入的方式:
|
||||
|
||||
```bash
|
||||
for file in flightlist_*.csv.gz; do gzip -c -d "$file" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"; done
|
||||
```
|
||||
|
||||
## 验证数据 {#validate-data}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT count() FROM opensky;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌──count()─┐
|
||||
│ 66010819 │
|
||||
└──────────┘
|
||||
```
|
||||
|
||||
ClickHouse 中的数据集大小只有 2.66 GiB,检查一下。
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'opensky';
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌─formatReadableSize(total_bytes)─┐
|
||||
│ 2.66 GiB │
|
||||
└─────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 运行一些查询 {#run-queries}
|
||||
|
||||
总行驶距离为 680 亿公里。
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT formatReadableQuantity(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) / 1000) FROM opensky;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌─formatReadableQuantity(divide(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)), 1000))─┐
|
||||
│ 68.72 billion │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
平均飞行距离约为 1000 公里。
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) FROM opensky;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌─avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))─┐
|
||||
│ 1041090.6465708319 │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 最繁忙的始发机场和观测到的平均距离{#busy-airports-average-distance}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
origin,
|
||||
count(),
|
||||
round(avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))) AS distance,
|
||||
bar(distance, 0, 10000000, 100) AS bar
|
||||
FROM opensky
|
||||
WHERE origin != ''
|
||||
GROUP BY origin
|
||||
ORDER BY count() DESC
|
||||
LIMIT 100;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌─origin─┬─count()─┬─distance─┬─bar────────────────────────────────────┐
|
||||
1. │ KORD │ 745007 │ 1546108 │ ███████████████▍ │
|
||||
2. │ KDFW │ 696702 │ 1358721 │ █████████████▌ │
|
||||
3. │ KATL │ 667286 │ 1169661 │ ███████████▋ │
|
||||
4. │ KDEN │ 582709 │ 1287742 │ ████████████▊ │
|
||||
5. │ KLAX │ 581952 │ 2628393 │ ██████████████████████████▎ │
|
||||
6. │ KLAS │ 447789 │ 1336967 │ █████████████▎ │
|
||||
7. │ KPHX │ 428558 │ 1345635 │ █████████████▍ │
|
||||
8. │ KSEA │ 412592 │ 1757317 │ █████████████████▌ │
|
||||
9. │ KCLT │ 404612 │ 880355 │ ████████▋ │
|
||||
10. │ VIDP │ 363074 │ 1445052 │ ██████████████▍ │
|
||||
11. │ EDDF │ 362643 │ 2263960 │ ██████████████████████▋ │
|
||||
12. │ KSFO │ 361869 │ 2445732 │ ████████████████████████▍ │
|
||||
13. │ KJFK │ 349232 │ 2996550 │ █████████████████████████████▊ │
|
||||
14. │ KMSP │ 346010 │ 1287328 │ ████████████▋ │
|
||||
15. │ LFPG │ 344748 │ 2206203 │ ██████████████████████ │
|
||||
16. │ EGLL │ 341370 │ 3216593 │ ████████████████████████████████▏ │
|
||||
17. │ EHAM │ 340272 │ 2116425 │ █████████████████████▏ │
|
||||
18. │ KEWR │ 337696 │ 1826545 │ ██████████████████▎ │
|
||||
19. │ KPHL │ 320762 │ 1291761 │ ████████████▊ │
|
||||
20. │ OMDB │ 308855 │ 2855706 │ ████████████████████████████▌ │
|
||||
21. │ UUEE │ 307098 │ 1555122 │ ███████████████▌ │
|
||||
22. │ KBOS │ 304416 │ 1621675 │ ████████████████▏ │
|
||||
23. │ LEMD │ 291787 │ 1695097 │ ████████████████▊ │
|
||||
24. │ YSSY │ 272979 │ 1875298 │ ██████████████████▋ │
|
||||
25. │ KMIA │ 265121 │ 1923542 │ ███████████████████▏ │
|
||||
26. │ ZGSZ │ 263497 │ 745086 │ ███████▍ │
|
||||
27. │ EDDM │ 256691 │ 1361453 │ █████████████▌ │
|
||||
28. │ WMKK │ 254264 │ 1626688 │ ████████████████▎ │
|
||||
29. │ CYYZ │ 251192 │ 2175026 │ █████████████████████▋ │
|
||||
30. │ KLGA │ 248699 │ 1106935 │ ███████████ │
|
||||
31. │ VHHH │ 248473 │ 3457658 │ ██████████████████████████████████▌ │
|
||||
32. │ RJTT │ 243477 │ 1272744 │ ████████████▋ │
|
||||
33. │ KBWI │ 241440 │ 1187060 │ ███████████▋ │
|
||||
34. │ KIAD │ 239558 │ 1683485 │ ████████████████▋ │
|
||||
35. │ KIAH │ 234202 │ 1538335 │ ███████████████▍ │
|
||||
36. │ KFLL │ 223447 │ 1464410 │ ██████████████▋ │
|
||||
37. │ KDAL │ 212055 │ 1082339 │ ██████████▋ │
|
||||
38. │ KDCA │ 207883 │ 1013359 │ ██████████▏ │
|
||||
39. │ LIRF │ 207047 │ 1427965 │ ██████████████▎ │
|
||||
40. │ PANC │ 206007 │ 2525359 │ █████████████████████████▎ │
|
||||
41. │ LTFJ │ 205415 │ 860470 │ ████████▌ │
|
||||
42. │ KDTW │ 204020 │ 1106716 │ ███████████ │
|
||||
43. │ VABB │ 201679 │ 1300865 │ █████████████ │
|
||||
44. │ OTHH │ 200797 │ 3759544 │ █████████████████████████████████████▌ │
|
||||
45. │ KMDW │ 200796 │ 1232551 │ ████████████▎ │
|
||||
46. │ KSAN │ 198003 │ 1495195 │ ██████████████▊ │
|
||||
47. │ KPDX │ 197760 │ 1269230 │ ████████████▋ │
|
||||
48. │ SBGR │ 197624 │ 2041697 │ ████████████████████▍ │
|
||||
49. │ VOBL │ 189011 │ 1040180 │ ██████████▍ │
|
||||
50. │ LEBL │ 188956 │ 1283190 │ ████████████▋ │
|
||||
51. │ YBBN │ 188011 │ 1253405 │ ████████████▌ │
|
||||
52. │ LSZH │ 187934 │ 1572029 │ ███████████████▋ │
|
||||
53. │ YMML │ 187643 │ 1870076 │ ██████████████████▋ │
|
||||
54. │ RCTP │ 184466 │ 2773976 │ ███████████████████████████▋ │
|
||||
55. │ KSNA │ 180045 │ 778484 │ ███████▋ │
|
||||
56. │ EGKK │ 176420 │ 1694770 │ ████████████████▊ │
|
||||
57. │ LOWW │ 176191 │ 1274833 │ ████████████▋ │
|
||||
58. │ UUDD │ 176099 │ 1368226 │ █████████████▋ │
|
||||
59. │ RKSI │ 173466 │ 3079026 │ ██████████████████████████████▋ │
|
||||
60. │ EKCH │ 172128 │ 1229895 │ ████████████▎ │
|
||||
61. │ KOAK │ 171119 │ 1114447 │ ███████████▏ │
|
||||
62. │ RPLL │ 170122 │ 1440735 │ ██████████████▍ │
|
||||
63. │ KRDU │ 167001 │ 830521 │ ████████▎ │
|
||||
64. │ KAUS │ 164524 │ 1256198 │ ████████████▌ │
|
||||
65. │ KBNA │ 163242 │ 1022726 │ ██████████▏ │
|
||||
66. │ KSDF │ 162655 │ 1380867 │ █████████████▋ │
|
||||
67. │ ENGM │ 160732 │ 910108 │ █████████ │
|
||||
68. │ LIMC │ 160696 │ 1564620 │ ███████████████▋ │
|
||||
69. │ KSJC │ 159278 │ 1081125 │ ██████████▋ │
|
||||
70. │ KSTL │ 157984 │ 1026699 │ ██████████▎ │
|
||||
71. │ UUWW │ 156811 │ 1261155 │ ████████████▌ │
|
||||
72. │ KIND │ 153929 │ 987944 │ █████████▊ │
|
||||
73. │ ESSA │ 153390 │ 1203439 │ ████████████ │
|
||||
74. │ KMCO │ 153351 │ 1508657 │ ███████████████ │
|
||||
75. │ KDVT │ 152895 │ 74048 │ ▋ │
|
||||
76. │ VTBS │ 152645 │ 2255591 │ ██████████████████████▌ │
|
||||
77. │ CYVR │ 149574 │ 2027413 │ ████████████████████▎ │
|
||||
78. │ EIDW │ 148723 │ 1503985 │ ███████████████ │
|
||||
79. │ LFPO │ 143277 │ 1152964 │ ███████████▌ │
|
||||
80. │ EGSS │ 140830 │ 1348183 │ █████████████▍ │
|
||||
81. │ KAPA │ 140776 │ 420441 │ ████▏ │
|
||||
82. │ KHOU │ 138985 │ 1068806 │ ██████████▋ │
|
||||
83. │ KTPA │ 138033 │ 1338223 │ █████████████▍ │
|
||||
84. │ KFFZ │ 137333 │ 55397 │ ▌ │
|
||||
85. │ NZAA │ 136092 │ 1581264 │ ███████████████▋ │
|
||||
86. │ YPPH │ 133916 │ 1271550 │ ████████████▋ │
|
||||
87. │ RJBB │ 133522 │ 1805623 │ ██████████████████ │
|
||||
88. │ EDDL │ 133018 │ 1265919 │ ████████████▋ │
|
||||
89. │ ULLI │ 130501 │ 1197108 │ ███████████▊ │
|
||||
90. │ KIWA │ 127195 │ 250876 │ ██▌ │
|
||||
91. │ KTEB │ 126969 │ 1189414 │ ███████████▊ │
|
||||
92. │ VOMM │ 125616 │ 1127757 │ ███████████▎ │
|
||||
93. │ LSGG │ 123998 │ 1049101 │ ██████████▍ │
|
||||
94. │ LPPT │ 122733 │ 1779187 │ █████████████████▋ │
|
||||
95. │ WSSS │ 120493 │ 3264122 │ ████████████████████████████████▋ │
|
||||
96. │ EBBR │ 118539 │ 1579939 │ ███████████████▋ │
|
||||
97. │ VTBD │ 118107 │ 661627 │ ██████▌ │
|
||||
98. │ KVNY │ 116326 │ 692960 │ ██████▊ │
|
||||
99. │ EDDT │ 115122 │ 941740 │ █████████▍ │
|
||||
100. │ EFHK │ 114860 │ 1629143 │ ████████████████▎ │
|
||||
└────────┴─────────┴──────────┴────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 每周来自莫斯科三个主要机场的航班数量 {#flights-from-moscow}
|
||||
|
||||
请求:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
toMonday(day) AS k,
|
||||
count() AS c,
|
||||
bar(c, 0, 10000, 100) AS bar
|
||||
FROM opensky
|
||||
WHERE origin IN ('UUEE', 'UUDD', 'UUWW')
|
||||
GROUP BY k
|
||||
ORDER BY k ASC;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
```text
|
||||
┌──────────k─┬────c─┬─bar──────────────────────────────────────────────────────────────────────────┐
|
||||
1. │ 2018-12-31 │ 5248 │ ████████████████████████████████████████████████████▍ │
|
||||
2. │ 2019-01-07 │ 6302 │ ███████████████████████████████████████████████████████████████ │
|
||||
3. │ 2019-01-14 │ 5701 │ █████████████████████████████████████████████████████████ │
|
||||
4. │ 2019-01-21 │ 5638 │ ████████████████████████████████████████████████████████▍ │
|
||||
5. │ 2019-01-28 │ 5731 │ █████████████████████████████████████████████████████████▎ │
|
||||
6. │ 2019-02-04 │ 5683 │ ████████████████████████████████████████████████████████▋ │
|
||||
7. │ 2019-02-11 │ 5759 │ █████████████████████████████████████████████████████████▌ │
|
||||
8. │ 2019-02-18 │ 5736 │ █████████████████████████████████████████████████████████▎ │
|
||||
9. │ 2019-02-25 │ 5873 │ ██████████████████████████████████████████████████████████▋ │
|
||||
10. │ 2019-03-04 │ 5965 │ ███████████████████████████████████████████████████████████▋ │
|
||||
11. │ 2019-03-11 │ 5900 │ ███████████████████████████████████████████████████████████ │
|
||||
12. │ 2019-03-18 │ 5823 │ ██████████████████████████████████████████████████████████▏ │
|
||||
13. │ 2019-03-25 │ 5899 │ ██████████████████████████████████████████████████████████▊ │
|
||||
14. │ 2019-04-01 │ 6043 │ ████████████████████████████████████████████████████████████▍ │
|
||||
15. │ 2019-04-08 │ 6098 │ ████████████████████████████████████████████████████████████▊ │
|
||||
16. │ 2019-04-15 │ 6196 │ █████████████████████████████████████████████████████████████▊ │
|
||||
17. │ 2019-04-22 │ 6486 │ ████████████████████████████████████████████████████████████████▋ │
|
||||
18. │ 2019-04-29 │ 6682 │ ██████████████████████████████████████████████████████████████████▋ │
|
||||
19. │ 2019-05-06 │ 6739 │ ███████████████████████████████████████████████████████████████████▍ │
|
||||
20. │ 2019-05-13 │ 6600 │ ██████████████████████████████████████████████████████████████████ │
|
||||
21. │ 2019-05-20 │ 6575 │ █████████████████████████████████████████████████████████████████▋ │
|
||||
22. │ 2019-05-27 │ 6786 │ ███████████████████████████████████████████████████████████████████▋ │
|
||||
23. │ 2019-06-03 │ 6872 │ ████████████████████████████████████████████████████████████████████▋ │
|
||||
24. │ 2019-06-10 │ 7045 │ ██████████████████████████████████████████████████████████████████████▍ │
|
||||
25. │ 2019-06-17 │ 7045 │ ██████████████████████████████████████████████████████████████████████▍ │
|
||||
26. │ 2019-06-24 │ 6852 │ ████████████████████████████████████████████████████████████████████▌ │
|
||||
27. │ 2019-07-01 │ 7248 │ ████████████████████████████████████████████████████████████████████████▍ │
|
||||
28. │ 2019-07-08 │ 7284 │ ████████████████████████████████████████████████████████████████████████▋ │
|
||||
29. │ 2019-07-15 │ 7142 │ ███████████████████████████████████████████████████████████████████████▍ │
|
||||
30. │ 2019-07-22 │ 7108 │ ███████████████████████████████████████████████████████████████████████ │
|
||||
31. │ 2019-07-29 │ 7251 │ ████████████████████████████████████████████████████████████████████████▌ │
|
||||
32. │ 2019-08-05 │ 7403 │ ██████████████████████████████████████████████████████████████████████████ │
|
||||
33. │ 2019-08-12 │ 7457 │ ██████████████████████████████████████████████████████████████████████████▌ │
|
||||
34. │ 2019-08-19 │ 7502 │ ███████████████████████████████████████████████████████████████████████████ │
|
||||
35. │ 2019-08-26 │ 7540 │ ███████████████████████████████████████████████████████████████████████████▍ │
|
||||
36. │ 2019-09-02 │ 7237 │ ████████████████████████████████████████████████████████████████████████▎ │
|
||||
37. │ 2019-09-09 │ 7328 │ █████████████████████████████████████████████████████████████████████████▎ │
|
||||
38. │ 2019-09-16 │ 5566 │ ███████████████████████████████████████████████████████▋ │
|
||||
39. │ 2019-09-23 │ 7049 │ ██████████████████████████████████████████████████████████████████████▍ │
|
||||
40. │ 2019-09-30 │ 6880 │ ████████████████████████████████████████████████████████████████████▋ │
|
||||
41. │ 2019-10-07 │ 6518 │ █████████████████████████████████████████████████████████████████▏ │
|
||||
42. │ 2019-10-14 │ 6688 │ ██████████████████████████████████████████████████████████████████▊ │
|
||||
43. │ 2019-10-21 │ 6667 │ ██████████████████████████████████████████████████████████████████▋ │
|
||||
44. │ 2019-10-28 │ 6303 │ ███████████████████████████████████████████████████████████████ │
|
||||
45. │ 2019-11-04 │ 6298 │ ██████████████████████████████████████████████████████████████▊ │
|
||||
46. │ 2019-11-11 │ 6137 │ █████████████████████████████████████████████████████████████▎ │
|
||||
47. │ 2019-11-18 │ 6051 │ ████████████████████████████████████████████████████████████▌ │
|
||||
48. │ 2019-11-25 │ 5820 │ ██████████████████████████████████████████████████████████▏ │
|
||||
49. │ 2019-12-02 │ 5942 │ ███████████████████████████████████████████████████████████▍ │
|
||||
50. │ 2019-12-09 │ 4891 │ ████████████████████████████████████████████████▊ │
|
||||
51. │ 2019-12-16 │ 5682 │ ████████████████████████████████████████████████████████▋ │
|
||||
52. │ 2019-12-23 │ 6111 │ █████████████████████████████████████████████████████████████ │
|
||||
53. │ 2019-12-30 │ 5870 │ ██████████████████████████████████████████████████████████▋ │
|
||||
54. │ 2020-01-06 │ 5953 │ ███████████████████████████████████████████████████████████▌ │
|
||||
55. │ 2020-01-13 │ 5698 │ ████████████████████████████████████████████████████████▊ │
|
||||
56. │ 2020-01-20 │ 5339 │ █████████████████████████████████████████████████████▍ │
|
||||
57. │ 2020-01-27 │ 5566 │ ███████████████████████████████████████████████████████▋ │
|
||||
58. │ 2020-02-03 │ 5801 │ ██████████████████████████████████████████████████████████ │
|
||||
59. │ 2020-02-10 │ 5692 │ ████████████████████████████████████████████████████████▊ │
|
||||
60. │ 2020-02-17 │ 5912 │ ███████████████████████████████████████████████████████████ │
|
||||
61. │ 2020-02-24 │ 6031 │ ████████████████████████████████████████████████████████████▎ │
|
||||
62. │ 2020-03-02 │ 6105 │ █████████████████████████████████████████████████████████████ │
|
||||
63. │ 2020-03-09 │ 5823 │ ██████████████████████████████████████████████████████████▏ │
|
||||
64. │ 2020-03-16 │ 4659 │ ██████████████████████████████████████████████▌ │
|
||||
65. │ 2020-03-23 │ 3720 │ █████████████████████████████████████▏ │
|
||||
66. │ 2020-03-30 │ 1720 │ █████████████████▏ │
|
||||
67. │ 2020-04-06 │ 849 │ ████████▍ │
|
||||
68. │ 2020-04-13 │ 710 │ ███████ │
|
||||
69. │ 2020-04-20 │ 725 │ ███████▏ │
|
||||
70. │ 2020-04-27 │ 920 │ █████████▏ │
|
||||
71. │ 2020-05-04 │ 859 │ ████████▌ │
|
||||
72. │ 2020-05-11 │ 1047 │ ██████████▍ │
|
||||
73. │ 2020-05-18 │ 1135 │ ███████████▎ │
|
||||
74. │ 2020-05-25 │ 1266 │ ████████████▋ │
|
||||
75. │ 2020-06-01 │ 1793 │ █████████████████▊ │
|
||||
76. │ 2020-06-08 │ 1979 │ ███████████████████▋ │
|
||||
77. │ 2020-06-15 │ 2297 │ ██████████████████████▊ │
|
||||
78. │ 2020-06-22 │ 2788 │ ███████████████████████████▊ │
|
||||
79. │ 2020-06-29 │ 3389 │ █████████████████████████████████▊ │
|
||||
80. │ 2020-07-06 │ 3545 │ ███████████████████████████████████▍ │
|
||||
81. │ 2020-07-13 │ 3569 │ ███████████████████████████████████▋ │
|
||||
82. │ 2020-07-20 │ 3784 │ █████████████████████████████████████▋ │
|
||||
83. │ 2020-07-27 │ 3960 │ ███████████████████████████████████████▌ │
|
||||
84. │ 2020-08-03 │ 4323 │ ███████████████████████████████████████████▏ │
|
||||
85. │ 2020-08-10 │ 4581 │ █████████████████████████████████████████████▋ │
|
||||
86. │ 2020-08-17 │ 4791 │ ███████████████████████████████████████████████▊ │
|
||||
87. │ 2020-08-24 │ 4928 │ █████████████████████████████████████████████████▎ │
|
||||
88. │ 2020-08-31 │ 4687 │ ██████████████████████████████████████████████▋ │
|
||||
89. │ 2020-09-07 │ 4643 │ ██████████████████████████████████████████████▍ │
|
||||
90. │ 2020-09-14 │ 4594 │ █████████████████████████████████████████████▊ │
|
||||
91. │ 2020-09-21 │ 4478 │ ████████████████████████████████████████████▋ │
|
||||
92. │ 2020-09-28 │ 4382 │ ███████████████████████████████████████████▋ │
|
||||
93. │ 2020-10-05 │ 4261 │ ██████████████████████████████████████████▌ │
|
||||
94. │ 2020-10-12 │ 4243 │ ██████████████████████████████████████████▍ │
|
||||
95. │ 2020-10-19 │ 3941 │ ███████████████████████████████████████▍ │
|
||||
96. │ 2020-10-26 │ 3616 │ ████████████████████████████████████▏ │
|
||||
97. │ 2020-11-02 │ 3586 │ ███████████████████████████████████▋ │
|
||||
98. │ 2020-11-09 │ 3403 │ ██████████████████████████████████ │
|
||||
99. │ 2020-11-16 │ 3336 │ █████████████████████████████████▎ │
|
||||
100. │ 2020-11-23 │ 3230 │ ████████████████████████████████▎ │
|
||||
101. │ 2020-11-30 │ 3183 │ ███████████████████████████████▋ │
|
||||
102. │ 2020-12-07 │ 3285 │ ████████████████████████████████▋ │
|
||||
103. │ 2020-12-14 │ 3367 │ █████████████████████████████████▋ │
|
||||
104. │ 2020-12-21 │ 3748 │ █████████████████████████████████████▍ │
|
||||
105. │ 2020-12-28 │ 3986 │ ███████████████████████████████████████▋ │
|
||||
106. │ 2021-01-04 │ 3906 │ ███████████████████████████████████████ │
|
||||
107. │ 2021-01-11 │ 3425 │ ██████████████████████████████████▎ │
|
||||
108. │ 2021-01-18 │ 3144 │ ███████████████████████████████▍ │
|
||||
109. │ 2021-01-25 │ 3115 │ ███████████████████████████████▏ │
|
||||
110. │ 2021-02-01 │ 3285 │ ████████████████████████████████▋ │
|
||||
111. │ 2021-02-08 │ 3321 │ █████████████████████████████████▏ │
|
||||
112. │ 2021-02-15 │ 3475 │ ██████████████████████████████████▋ │
|
||||
113. │ 2021-02-22 │ 3549 │ ███████████████████████████████████▍ │
|
||||
114. │ 2021-03-01 │ 3755 │ █████████████████████████████████████▌ │
|
||||
115. │ 2021-03-08 │ 3080 │ ██████████████████████████████▋ │
|
||||
116. │ 2021-03-15 │ 3789 │ █████████████████████████████████████▊ │
|
||||
117. │ 2021-03-22 │ 3804 │ ██████████████████████████████████████ │
|
||||
118. │ 2021-03-29 │ 4238 │ ██████████████████████████████████████████▍ │
|
||||
119. │ 2021-04-05 │ 4307 │ ███████████████████████████████████████████ │
|
||||
120. │ 2021-04-12 │ 4225 │ ██████████████████████████████████████████▎ │
|
||||
121. │ 2021-04-19 │ 4391 │ ███████████████████████████████████████████▊ │
|
||||
122. │ 2021-04-26 │ 4868 │ ████████████████████████████████████████████████▋ │
|
||||
123. │ 2021-05-03 │ 4977 │ █████████████████████████████████████████████████▋ │
|
||||
124. │ 2021-05-10 │ 5164 │ ███████████████████████████████████████████████████▋ │
|
||||
125. │ 2021-05-17 │ 4986 │ █████████████████████████████████████████████████▋ │
|
||||
126. │ 2021-05-24 │ 5024 │ ██████████████████████████████████████████████████▏ │
|
||||
127. │ 2021-05-31 │ 4824 │ ████████████████████████████████████████████████▏ │
|
||||
128. │ 2021-06-07 │ 5652 │ ████████████████████████████████████████████████████████▌ │
|
||||
129. │ 2021-06-14 │ 5613 │ ████████████████████████████████████████████████████████▏ │
|
||||
130. │ 2021-06-21 │ 6061 │ ████████████████████████████████████████████████████████████▌ │
|
||||
131. │ 2021-06-28 │ 2554 │ █████████████████████████▌ │
|
||||
└────────────┴──────┴──────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 在线 Playground {#playground}
|
||||
|
||||
你可以使用交互式资源 [Online Playground](https://play.clickhouse.com/play?user=play) 来尝试对此数据集的其他查询。 例如, [执行这个查询](https://play.clickhouse.com/play?user=play#U0VMRUNUCiAgICBvcmlnaW4sCiAgICBjb3VudCgpLAogICAgcm91bmQoYXZnKGdlb0Rpc3RhbmNlKGxvbmdpdHVkZV8xLCBsYXRpdHVkZV8xLCBsb25naXR1ZGVfMiwgbGF0aXR1ZGVfMikpKSBBUyBkaXN0YW5jZSwKICAgIGJhcihkaXN0YW5jZSwgMCwgMTAwMDAwMDAsIDEwMCkgQVMgYmFyCkZST00gb3BlbnNreQpXSEVSRSBvcmlnaW4gIT0gJycKR1JPVVAgQlkgb3JpZ2luCk9SREVSIEJZIGNvdW50KCkgREVTQwpMSU1JVCAxMDA=). 但是,请注意无法在 Playground 中创建临时表。
|
||||
|
@ -1,9 +1,339 @@
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/recipes
|
||||
sidebar_label: Recipes Dataset
|
||||
title: "Recipes Dataset"
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/recipes
|
||||
sidebar_label: 食谱数据集
|
||||
title: "食谱数据集"
|
||||
---
|
||||
|
||||
import Content from '@site/docs/en/getting-started/example-datasets/recipes.md';
|
||||
RecipeNLG 数据集可在 [此处](https://recipenlg.cs.put.poznan.pl/dataset) 下载。其中包含 220 万份食谱。大小略小于 1 GB。
|
||||
|
||||
<Content />
|
||||
## 下载并解压数据集
|
||||
|
||||
1. 进入下载页面[https://recipenlg.cs.put.poznan.pl/dataset](https://recipenlg.cs.put.poznan.pl/dataset)。
|
||||
2. 接受条款和条件并下载 zip 文件。
|
||||
3. 使用 `unzip` 解压 zip 文件,得到 `full_dataset.csv` 文件。
|
||||
|
||||
## 创建表
|
||||
|
||||
运行 clickhouse-client 并执行以下 CREATE 请求:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE recipes
|
||||
(
|
||||
title String,
|
||||
ingredients Array(String),
|
||||
directions Array(String),
|
||||
link String,
|
||||
source LowCardinality(String),
|
||||
NER Array(String)
|
||||
) ENGINE = MergeTree ORDER BY title;
|
||||
```
|
||||
|
||||
## 插入数据
|
||||
|
||||
运行以下命令:
|
||||
|
||||
``` bash
|
||||
clickhouse-client --query "
|
||||
INSERT INTO recipes
|
||||
SELECT
|
||||
title,
|
||||
JSONExtract(ingredients, 'Array(String)'),
|
||||
JSONExtract(directions, 'Array(String)'),
|
||||
link,
|
||||
source,
|
||||
JSONExtract(NER, 'Array(String)')
|
||||
FROM input('num UInt32, title String, ingredients String, directions String, link String, source LowCardinality(String), NER String')
|
||||
FORMAT CSVWithNames
|
||||
" --input_format_with_names_use_header 0 --format_csv_allow_single_quote 0 --input_format_allow_errors_num 10 < full_dataset.csv
|
||||
```
|
||||
|
||||
这是一个展示如何解析自定义 CSV,这其中涉及了许多调整。
|
||||
|
||||
说明:
|
||||
- 数据集为 CSV 格式,但在插入时需要一些预处理;使用表函数 [input](../../sql-reference/table-functions/input.md) 进行预处理;
|
||||
- CSV 文件的结构在表函数 `input` 的参数中指定;
|
||||
- 字段 `num`(行号)是不需要的 - 可以忽略并从文件中进行解析;
|
||||
- 使用 `FORMAT CSVWithNames`,因为标题不包含第一个字段的名称,因此 CSV 中的标题将被忽略(通过命令行参数 `--input_format_with_names_use_header 0`);
|
||||
- 文件仅使用双引号将 CSV 字符串括起来;一些字符串没有用双引号括起来,单引号也不能被解析为括起来的字符串 - 所以添加`--format_csv_allow_single_quote 0`参数接受文件中的单引号;
|
||||
- 由于某些 CSV 的字符串的开头包含 `\M/` 因此无法被解析; CSV 中唯一可能以反斜杠开头的值是 `\N`,这个值被解析为 SQL NULL。通过添加`--input_format_allow_errors_num 10`参数,允许在导入过程中跳过 10 个格式错误;
|
||||
- 在数据集中的 Ingredients、directions 和 NER 字段为数组;但这些数组并没有以一般形式表示:这些字段作为 JSON 序列化为字符串,然后放入 CSV 中 - 在导入是将它们解析为字符串,然后使用 [JSONExtract](../../sql-reference/functions/json-functions.md ) 函数将其转换为数组。
|
||||
|
||||
## 验证插入的数据
|
||||
|
||||
通过检查行数:
|
||||
|
||||
请求:
|
||||
|
||||
``` sql
|
||||
SELECT count() FROM recipes;
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─count()─┐
|
||||
│ 2231141 │
|
||||
└─────────┘
|
||||
```
|
||||
|
||||
## 示例查询
|
||||
|
||||
### 按配方数量排列的顶级组件:
|
||||
|
||||
在此示例中,我们学习如何使用 [arrayJoin](../../sql-reference/functions/array-join/) 函数将数组扩展为行的集合。
|
||||
|
||||
请求:
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
arrayJoin(NER) AS k,
|
||||
count() AS c
|
||||
FROM recipes
|
||||
GROUP BY k
|
||||
ORDER BY c DESC
|
||||
LIMIT 50
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─k────────────────────┬──────c─┐
|
||||
│ salt │ 890741 │
|
||||
│ sugar │ 620027 │
|
||||
│ butter │ 493823 │
|
||||
│ flour │ 466110 │
|
||||
│ eggs │ 401276 │
|
||||
│ onion │ 372469 │
|
||||
│ garlic │ 358364 │
|
||||
│ milk │ 346769 │
|
||||
│ water │ 326092 │
|
||||
│ vanilla │ 270381 │
|
||||
│ olive oil │ 197877 │
|
||||
│ pepper │ 179305 │
|
||||
│ brown sugar │ 174447 │
|
||||
│ tomatoes │ 163933 │
|
||||
│ egg │ 160507 │
|
||||
│ baking powder │ 148277 │
|
||||
│ lemon juice │ 146414 │
|
||||
│ Salt │ 122557 │
|
||||
│ cinnamon │ 117927 │
|
||||
│ sour cream │ 116682 │
|
||||
│ cream cheese │ 114423 │
|
||||
│ margarine │ 112742 │
|
||||
│ celery │ 112676 │
|
||||
│ baking soda │ 110690 │
|
||||
│ parsley │ 102151 │
|
||||
│ chicken │ 101505 │
|
||||
│ onions │ 98903 │
|
||||
│ vegetable oil │ 91395 │
|
||||
│ oil │ 85600 │
|
||||
│ mayonnaise │ 84822 │
|
||||
│ pecans │ 79741 │
|
||||
│ nuts │ 78471 │
|
||||
│ potatoes │ 75820 │
|
||||
│ carrots │ 75458 │
|
||||
│ pineapple │ 74345 │
|
||||
│ soy sauce │ 70355 │
|
||||
│ black pepper │ 69064 │
|
||||
│ thyme │ 68429 │
|
||||
│ mustard │ 65948 │
|
||||
│ chicken broth │ 65112 │
|
||||
│ bacon │ 64956 │
|
||||
│ honey │ 64626 │
|
||||
│ oregano │ 64077 │
|
||||
│ ground beef │ 64068 │
|
||||
│ unsalted butter │ 63848 │
|
||||
│ mushrooms │ 61465 │
|
||||
│ Worcestershire sauce │ 59328 │
|
||||
│ cornstarch │ 58476 │
|
||||
│ green pepper │ 58388 │
|
||||
│ Cheddar cheese │ 58354 │
|
||||
└──────────────────────┴────────┘
|
||||
|
||||
50 rows in set. Elapsed: 0.112 sec. Processed 2.23 million rows, 361.57 MB (19.99 million rows/s., 3.24 GB/s.)
|
||||
```
|
||||
|
||||
### 最复杂的草莓食谱
|
||||
|
||||
``` sql
|
||||
SELECT
|
||||
title,
|
||||
length(NER),
|
||||
length(directions)
|
||||
FROM recipes
|
||||
WHERE has(NER, 'strawberry')
|
||||
ORDER BY length(directions) DESC
|
||||
LIMIT 10
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─title────────────────────────────────────────────────────────────┬─length(NER)─┬─length(directions)─┐
|
||||
│ Chocolate-Strawberry-Orange Wedding Cake │ 24 │ 126 │
|
||||
│ Strawberry Cream Cheese Crumble Tart │ 19 │ 47 │
|
||||
│ Charlotte-Style Ice Cream │ 11 │ 45 │
|
||||
│ Sinfully Good a Million Layers Chocolate Layer Cake, With Strawb │ 31 │ 45 │
|
||||
│ Sweetened Berries With Elderflower Sherbet │ 24 │ 44 │
|
||||
│ Chocolate-Strawberry Mousse Cake │ 15 │ 42 │
|
||||
│ Rhubarb Charlotte with Strawberries and Rum │ 20 │ 42 │
|
||||
│ Chef Joey's Strawberry Vanilla Tart │ 7 │ 37 │
|
||||
│ Old-Fashioned Ice Cream Sundae Cake │ 17 │ 37 │
|
||||
│ Watermelon Cake │ 16 │ 36 │
|
||||
└──────────────────────────────────────────────────────────────────┴─────────────┴────────────────────┘
|
||||
|
||||
10 rows in set. Elapsed: 0.215 sec. Processed 2.23 million rows, 1.48 GB (10.35 million rows/s., 6.86 GB/s.)
|
||||
```
|
||||
|
||||
在此示例中,我们使用 [has](../../sql-reference/functions/array-functions/#hasarr-elem) 函数来按过滤数组类型元素并按 directions 的数量进行排序。
|
||||
|
||||
有一个婚礼蛋糕需要整个126个步骤来制作!显示 directions:
|
||||
|
||||
请求:
|
||||
|
||||
``` sql
|
||||
SELECT arrayJoin(directions)
|
||||
FROM recipes
|
||||
WHERE title = 'Chocolate-Strawberry-Orange Wedding Cake'
|
||||
```
|
||||
|
||||
结果:
|
||||
|
||||
``` text
|
||||
┌─arrayJoin(directions)───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Position 1 rack in center and 1 rack in bottom third of oven and preheat to 350F. │
|
||||
│ Butter one 5-inch-diameter cake pan with 2-inch-high sides, one 8-inch-diameter cake pan with 2-inch-high sides and one 12-inch-diameter cake pan with 2-inch-high sides. │
|
||||
│ Dust pans with flour; line bottoms with parchment. │
|
||||
│ Combine 1/3 cup orange juice and 2 ounces unsweetened chocolate in heavy small saucepan. │
|
||||
│ Stir mixture over medium-low heat until chocolate melts. │
|
||||
│ Remove from heat. │
|
||||
│ Gradually mix in 1 2/3 cups orange juice. │
|
||||
│ Sift 3 cups flour, 2/3 cup cocoa, 2 teaspoons baking soda, 1 teaspoon salt and 1/2 teaspoon baking powder into medium bowl. │
|
||||
│ using electric mixer, beat 1 cup (2 sticks) butter and 3 cups sugar in large bowl until blended (mixture will look grainy). │
|
||||
│ Add 4 eggs, 1 at a time, beating to blend after each. │
|
||||
│ Beat in 1 tablespoon orange peel and 1 tablespoon vanilla extract. │
|
||||
│ Add dry ingredients alternately with orange juice mixture in 3 additions each, beating well after each addition. │
|
||||
│ Mix in 1 cup chocolate chips. │
|
||||
│ Transfer 1 cup plus 2 tablespoons batter to prepared 5-inch pan, 3 cups batter to prepared 8-inch pan and remaining batter (about 6 cups) to 12-inch pan. │
|
||||
│ Place 5-inch and 8-inch pans on center rack of oven. │
|
||||
│ Place 12-inch pan on lower rack of oven. │
|
||||
│ Bake cakes until tester inserted into center comes out clean, about 35 minutes. │
|
||||
│ Transfer cakes in pans to racks and cool completely. │
|
||||
│ Mark 4-inch diameter circle on one 6-inch-diameter cardboard cake round. │
|
||||
│ Cut out marked circle. │
|
||||
│ Mark 7-inch-diameter circle on one 8-inch-diameter cardboard cake round. │
|
||||
│ Cut out marked circle. │
|
||||
│ Mark 11-inch-diameter circle on one 12-inch-diameter cardboard cake round. │
|
||||
│ Cut out marked circle. │
|
||||
│ Cut around sides of 5-inch-cake to loosen. │
|
||||
│ Place 4-inch cardboard over pan. │
|
||||
│ Hold cardboard and pan together; turn cake out onto cardboard. │
|
||||
│ Peel off parchment.Wrap cakes on its cardboard in foil. │
|
||||
│ Repeat turning out, peeling off parchment and wrapping cakes in foil, using 7-inch cardboard for 8-inch cake and 11-inch cardboard for 12-inch cake. │
|
||||
│ Using remaining ingredients, make 1 more batch of cake batter and bake 3 more cake layers as described above. │
|
||||
│ Cool cakes in pans. │
|
||||
│ Cover cakes in pans tightly with foil. │
|
||||
│ (Can be prepared ahead. │
|
||||
│ Let stand at room temperature up to 1 day or double-wrap all cake layers and freeze up to 1 week. │
|
||||
│ Bring cake layers to room temperature before using.) │
|
||||
│ Place first 12-inch cake on its cardboard on work surface. │
|
||||
│ Spread 2 3/4 cups ganache over top of cake and all the way to edge. │
|
||||
│ Spread 2/3 cup jam over ganache, leaving 1/2-inch chocolate border at edge. │
|
||||
│ Drop 1 3/4 cups white chocolate frosting by spoonfuls over jam. │
|
||||
│ Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. │
|
||||
│ Rub some cocoa powder over second 12-inch cardboard. │
|
||||
│ Cut around sides of second 12-inch cake to loosen. │
|
||||
│ Place cardboard, cocoa side down, over pan. │
|
||||
│ Turn cake out onto cardboard. │
|
||||
│ Peel off parchment. │
|
||||
│ Carefully slide cake off cardboard and onto filling on first 12-inch cake. │
|
||||
│ Refrigerate. │
|
||||
│ Place first 8-inch cake on its cardboard on work surface. │
|
||||
│ Spread 1 cup ganache over top all the way to edge. │
|
||||
│ Spread 1/4 cup jam over, leaving 1/2-inch chocolate border at edge. │
|
||||
│ Drop 1 cup white chocolate frosting by spoonfuls over jam. │
|
||||
│ Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. │
|
||||
│ Rub some cocoa over second 8-inch cardboard. │
|
||||
│ Cut around sides of second 8-inch cake to loosen. │
|
||||
│ Place cardboard, cocoa side down, over pan. │
|
||||
│ Turn cake out onto cardboard. │
|
||||
│ Peel off parchment. │
|
||||
│ Slide cake off cardboard and onto filling on first 8-inch cake. │
|
||||
│ Refrigerate. │
|
||||
│ Place first 5-inch cake on its cardboard on work surface. │
|
||||
│ Spread 1/2 cup ganache over top of cake and all the way to edge. │
|
||||
│ Spread 2 tablespoons jam over, leaving 1/2-inch chocolate border at edge. │
|
||||
│ Drop 1/3 cup white chocolate frosting by spoonfuls over jam. │
|
||||
│ Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. │
|
||||
│ Rub cocoa over second 6-inch cardboard. │
|
||||
│ Cut around sides of second 5-inch cake to loosen. │
|
||||
│ Place cardboard, cocoa side down, over pan. │
|
||||
│ Turn cake out onto cardboard. │
|
||||
│ Peel off parchment. │
|
||||
│ Slide cake off cardboard and onto filling on first 5-inch cake. │
|
||||
│ Chill all cakes 1 hour to set filling. │
|
||||
│ Place 12-inch tiered cake on its cardboard on revolving cake stand. │
|
||||
│ Spread 2 2/3 cups frosting over top and sides of cake as a first coat. │
|
||||
│ Refrigerate cake. │
|
||||
│ Place 8-inch tiered cake on its cardboard on cake stand. │
|
||||
│ Spread 1 1/4 cups frosting over top and sides of cake as a first coat. │
|
||||
│ Refrigerate cake. │
|
||||
│ Place 5-inch tiered cake on its cardboard on cake stand. │
|
||||
│ Spread 3/4 cup frosting over top and sides of cake as a first coat. │
|
||||
│ Refrigerate all cakes until first coats of frosting set, about 1 hour. │
|
||||
│ (Cakes can be made to this point up to 1 day ahead; cover and keep refrigerate.) │
|
||||
│ Prepare second batch of frosting, using remaining frosting ingredients and following directions for first batch. │
|
||||
│ Spoon 2 cups frosting into pastry bag fitted with small star tip. │
|
||||
│ Place 12-inch cake on its cardboard on large flat platter. │
|
||||
│ Place platter on cake stand. │
|
||||
│ Using icing spatula, spread 2 1/2 cups frosting over top and sides of cake; smooth top. │
|
||||
│ Using filled pastry bag, pipe decorative border around top edge of cake. │
|
||||
│ Refrigerate cake on platter. │
|
||||
│ Place 8-inch cake on its cardboard on cake stand. │
|
||||
│ Using icing spatula, spread 1 1/2 cups frosting over top and sides of cake; smooth top. │
|
||||
│ Using pastry bag, pipe decorative border around top edge of cake. │
|
||||
│ Refrigerate cake on its cardboard. │
|
||||
│ Place 5-inch cake on its cardboard on cake stand. │
|
||||
│ Using icing spatula, spread 3/4 cup frosting over top and sides of cake; smooth top. │
|
||||
│ Using pastry bag, pipe decorative border around top edge of cake, spooning more frosting into bag if necessary. │
|
||||
│ Refrigerate cake on its cardboard. │
|
||||
│ Keep all cakes refrigerated until frosting sets, about 2 hours. │
|
||||
│ (Can be prepared 2 days ahead. │
|
||||
│ Cover loosely; keep refrigerated.) │
|
||||
│ Place 12-inch cake on platter on work surface. │
|
||||
│ Press 1 wooden dowel straight down into and completely through center of cake. │
|
||||
│ Mark dowel 1/4 inch above top of frosting. │
|
||||
│ Remove dowel and cut with serrated knife at marked point. │
|
||||
│ Cut 4 more dowels to same length. │
|
||||
│ Press 1 cut dowel back into center of cake. │
|
||||
│ Press remaining 4 cut dowels into cake, positioning 3 1/2 inches inward from cake edges and spacing evenly. │
|
||||
│ Place 8-inch cake on its cardboard on work surface. │
|
||||
│ Press 1 dowel straight down into and completely through center of cake. │
|
||||
│ Mark dowel 1/4 inch above top of frosting. │
|
||||
│ Remove dowel and cut with serrated knife at marked point. │
|
||||
│ Cut 3 more dowels to same length. │
|
||||
│ Press 1 cut dowel back into center of cake. │
|
||||
│ Press remaining 3 cut dowels into cake, positioning 2 1/2 inches inward from edges and spacing evenly. │
|
||||
│ Using large metal spatula as aid, place 8-inch cake on its cardboard atop dowels in 12-inch cake, centering carefully. │
|
||||
│ Gently place 5-inch cake on its cardboard atop dowels in 8-inch cake, centering carefully. │
|
||||
│ Using citrus stripper, cut long strips of orange peel from oranges. │
|
||||
│ Cut strips into long segments. │
|
||||
│ To make orange peel coils, wrap peel segment around handle of wooden spoon; gently slide peel off handle so that peel keeps coiled shape. │
|
||||
│ Garnish cake with orange peel coils, ivy or mint sprigs, and some berries. │
|
||||
│ (Assembled cake can be made up to 8 hours ahead. │
|
||||
│ Let stand at cool room temperature.) │
|
||||
│ Remove top and middle cake tiers. │
|
||||
│ Remove dowels from cakes. │
|
||||
│ Cut top and middle cakes into slices. │
|
||||
│ To cut 12-inch cake: Starting 3 inches inward from edge and inserting knife straight down, cut through from top to bottom to make 6-inch-diameter circle in center of cake. │
|
||||
│ Cut outer portion of cake into slices; cut inner portion into slices and serve with strawberries. │
|
||||
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
126 rows in set. Elapsed: 0.011 sec. Processed 8.19 thousand rows, 5.34 MB (737.75 thousand rows/s., 480.59 MB/s.)
|
||||
```
|
||||
|
||||
### 在线 Playground
|
||||
|
||||
此数据集也可在 [在线 Playground](https://play.clickhouse.com/play?user=play#U0VMRUNUCiAgICBhcnJheUpvaW4oTkVSKSBBUyBrLAogICAgY291bnQoKSBBUyBjCkZST00gcmVjaXBlcwpHUk9VUCBCWSBrCk9SREVSIEJZIGMgREVTQwpMSU1JVCA1MA==) 中体验。
|
||||
|
||||
[原文链接](https://clickhouse.com/docs/en/getting-started/example-datasets/recipes/)
|
||||
|
@ -1,10 +1,450 @@
|
||||
---
|
||||
slug: /zh/getting-started/example-datasets/uk-price-paid
|
||||
sidebar_label: UK Property Price Paid
|
||||
sidebar_label: 英国房地产支付价格
|
||||
sidebar_position: 1
|
||||
title: "UK Property Price Paid"
|
||||
title: "英国房地产支付价格"
|
||||
---
|
||||
|
||||
import Content from '@site/docs/en/getting-started/example-datasets/uk-price-paid.md';
|
||||
该数据集包含自 1995 年以来有关英格兰和威尔士房地产价格的数据。未压缩的大小约为 4 GiB,在 ClickHouse 中大约需要 278 MiB。
|
||||
|
||||
<Content />
|
||||
来源:https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
|
||||
字段说明:https://www.gov.uk/guidance/about-the-price-data
|
||||
|
||||
包含 HM Land Registry data © Crown copyright and database right 2021.。此数据集需在 Open Government License v3.0 的许可下使用。
|
||||
|
||||
## 创建表 {#create-table}
|
||||
|
||||
```sql
|
||||
CREATE TABLE uk_price_paid
|
||||
(
|
||||
price UInt32,
|
||||
date Date,
|
||||
postcode1 LowCardinality(String),
|
||||
postcode2 LowCardinality(String),
|
||||
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
|
||||
is_new UInt8,
|
||||
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
|
||||
addr1 String,
|
||||
addr2 String,
|
||||
street LowCardinality(String),
|
||||
locality LowCardinality(String),
|
||||
town LowCardinality(String),
|
||||
district LowCardinality(String),
|
||||
county LowCardinality(String)
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY (postcode1, postcode2, addr1, addr2);
|
||||
```
|
||||
|
||||
## 预处理和插入数据 {#preprocess-import-data}
|
||||
|
||||
我们将使用 `url` 函数将数据流式传输到 ClickHouse。我们需要首先预处理一些传入的数据,其中包括:
|
||||
|
||||
- 将`postcode` 拆分为两个不同的列 - `postcode1` 和 `postcode2`,因为这更适合存储和查询
|
||||
- 将`time` 字段转换为日期为它只包含 00:00 时间
|
||||
- 忽略 [UUid](/docs/zh/sql-reference/data-types/uuid.md) 字段,因为我们不需要它进行分析
|
||||
- 使用 [transform](/docs/zh/sql-reference/functions/other-functions.md#transform) 函数将 `Enum` 字段 `type` 和 `duration` 转换为更易读的 `Enum` 字段
|
||||
- 将 `is_new` 字段从单字符串(` Y`/`N`) 到 [UInt8](/docs/zh/sql-reference/data-types/int-uint.md#uint8-uint16-uint32-uint64-uint256-int8-int16-int32-int64 -int128-int256) 字段为 0 或 1
|
||||
- 删除最后两列,因为它们都具有相同的值(即 0)
|
||||
|
||||
`url` 函数将来自网络服务器的数据流式传输到 ClickHouse 表中。以下命令将 500 万行插入到 `uk_price_paid` 表中:
|
||||
|
||||
```sql
|
||||
INSERT INTO uk_price_paid
|
||||
WITH
|
||||
splitByChar(' ', postcode) AS p
|
||||
SELECT
|
||||
toUInt32(price_string) AS price,
|
||||
parseDateTimeBestEffortUS(time) AS date,
|
||||
p[1] AS postcode1,
|
||||
p[2] AS postcode2,
|
||||
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
|
||||
b = 'Y' AS is_new,
|
||||
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
|
||||
addr1,
|
||||
addr2,
|
||||
street,
|
||||
locality,
|
||||
town,
|
||||
district,
|
||||
county
|
||||
FROM url(
|
||||
'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
|
||||
'CSV',
|
||||
'uuid_string String,
|
||||
price_string String,
|
||||
time String,
|
||||
postcode String,
|
||||
a String,
|
||||
b String,
|
||||
c String,
|
||||
addr1 String,
|
||||
addr2 String,
|
||||
street String,
|
||||
locality String,
|
||||
town String,
|
||||
district String,
|
||||
county String,
|
||||
d String,
|
||||
e String'
|
||||
) SETTINGS max_http_get_redirects=10;
|
||||
```
|
||||
|
||||
需要等待一两分钟以便数据插入,具体时间取决于网络速度。
|
||||
|
||||
## 验证数据 {#validate-data}
|
||||
|
||||
让我们通过查看插入了多少行来验证它是否有效:
|
||||
|
||||
```sql
|
||||
SELECT count()
|
||||
FROM uk_price_paid
|
||||
```
|
||||
|
||||
在执行此查询时,数据集有 27,450,499 行。让我们看看 ClickHouse 中表的大小是多少:
|
||||
|
||||
```sql
|
||||
SELECT formatReadableSize(total_bytes)
|
||||
FROM system.tables
|
||||
WHERE name = 'uk_price_paid'
|
||||
```
|
||||
|
||||
请注意,表的大小仅为 221.43 MiB!
|
||||
|
||||
## 运行一些查询 {#run-queries}
|
||||
|
||||
让我们运行一些查询来分析数据:
|
||||
|
||||
### 查询 1. 每年平均价格 {#average-price}
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
toYear(date) AS year,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 1000000, 80
|
||||
)
|
||||
FROM uk_price_paid
|
||||
GROUP BY year
|
||||
ORDER BY year
|
||||
```
|
||||
|
||||
结果如下所示:
|
||||
|
||||
```response
|
||||
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
|
||||
│ 1995 │ 67934 │ █████▍ │
|
||||
│ 1996 │ 71508 │ █████▋ │
|
||||
│ 1997 │ 78536 │ ██████▎ │
|
||||
│ 1998 │ 85441 │ ██████▋ │
|
||||
│ 1999 │ 96038 │ ███████▋ │
|
||||
│ 2000 │ 107487 │ ████████▌ │
|
||||
│ 2001 │ 118888 │ █████████▌ │
|
||||
│ 2002 │ 137948 │ ███████████ │
|
||||
│ 2003 │ 155893 │ ████████████▍ │
|
||||
│ 2004 │ 178888 │ ██████████████▎ │
|
||||
│ 2005 │ 189359 │ ███████████████▏ │
|
||||
│ 2006 │ 203532 │ ████████████████▎ │
|
||||
│ 2007 │ 219375 │ █████████████████▌ │
|
||||
│ 2008 │ 217056 │ █████████████████▎ │
|
||||
│ 2009 │ 213419 │ █████████████████ │
|
||||
│ 2010 │ 236110 │ ██████████████████▊ │
|
||||
│ 2011 │ 232805 │ ██████████████████▌ │
|
||||
│ 2012 │ 238381 │ ███████████████████ │
|
||||
│ 2013 │ 256927 │ ████████████████████▌ │
|
||||
│ 2014 │ 280008 │ ██████████████████████▍ │
|
||||
│ 2015 │ 297263 │ ███████████████████████▋ │
|
||||
│ 2016 │ 313518 │ █████████████████████████ │
|
||||
│ 2017 │ 346371 │ ███████████████████████████▋ │
|
||||
│ 2018 │ 350556 │ ████████████████████████████ │
|
||||
│ 2019 │ 352184 │ ████████████████████████████▏ │
|
||||
│ 2020 │ 375808 │ ██████████████████████████████ │
|
||||
│ 2021 │ 381105 │ ██████████████████████████████▍ │
|
||||
│ 2022 │ 362572 │ █████████████████████████████ │
|
||||
└──────┴────────┴────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 查询 2. 伦敦每年的平均价格 {#average-price-london}
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
toYear(date) AS year,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 2000000, 100
|
||||
)
|
||||
FROM uk_price_paid
|
||||
WHERE town = 'LONDON'
|
||||
GROUP BY year
|
||||
ORDER BY year
|
||||
```
|
||||
|
||||
结果如下所示:
|
||||
|
||||
```response
|
||||
┌─year─┬───price─┬─bar(round(avg(price)), 0, 2000000, 100)───────────────┐
|
||||
│ 1995 │ 109110 │ █████▍ │
|
||||
│ 1996 │ 118659 │ █████▊ │
|
||||
│ 1997 │ 136526 │ ██████▋ │
|
||||
│ 1998 │ 153002 │ ███████▋ │
|
||||
│ 1999 │ 180633 │ █████████ │
|
||||
│ 2000 │ 215849 │ ██████████▋ │
|
||||
│ 2001 │ 232987 │ ███████████▋ │
|
||||
│ 2002 │ 263668 │ █████████████▏ │
|
||||
│ 2003 │ 278424 │ █████████████▊ │
|
||||
│ 2004 │ 304664 │ ███████████████▏ │
|
||||
│ 2005 │ 322887 │ ████████████████▏ │
|
||||
│ 2006 │ 356195 │ █████████████████▋ │
|
||||
│ 2007 │ 404062 │ ████████████████████▏ │
|
||||
│ 2008 │ 420741 │ █████████████████████ │
|
||||
│ 2009 │ 427754 │ █████████████████████▍ │
|
||||
│ 2010 │ 480322 │ ████████████████████████ │
|
||||
│ 2011 │ 496278 │ ████████████████████████▋ │
|
||||
│ 2012 │ 519482 │ █████████████████████████▊ │
|
||||
│ 2013 │ 616195 │ ██████████████████████████████▋ │
|
||||
│ 2014 │ 724121 │ ████████████████████████████████████▏ │
|
||||
│ 2015 │ 792101 │ ███████████████████████████████████████▌ │
|
||||
│ 2016 │ 843589 │ ██████████████████████████████████████████▏ │
|
||||
│ 2017 │ 983523 │ █████████████████████████████████████████████████▏ │
|
||||
│ 2018 │ 1016753 │ ██████████████████████████████████████████████████▋ │
|
||||
│ 2019 │ 1041673 │ ████████████████████████████████████████████████████ │
|
||||
│ 2020 │ 1060027 │ █████████████████████████████████████████████████████ │
|
||||
│ 2021 │ 958249 │ ███████████████████████████████████████████████▊ │
|
||||
│ 2022 │ 902596 │ █████████████████████████████████████████████▏ │
|
||||
└──────┴─────────┴───────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
2020 年房价出事了!但这并不令人意外……
|
||||
|
||||
### 查询 3. 最昂贵的社区 {#most-expensive-neighborhoods}
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
town,
|
||||
district,
|
||||
count() AS c,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 5000000, 100)
|
||||
FROM uk_price_paid
|
||||
WHERE date >= '2020-01-01'
|
||||
GROUP BY
|
||||
town,
|
||||
district
|
||||
HAVING c >= 100
|
||||
ORDER BY price DESC
|
||||
LIMIT 100
|
||||
```
|
||||
|
||||
结果如下所示:
|
||||
|
||||
```response
|
||||
┌─town─────────────────┬─district───────────────┬─────c─┬───price─┬─bar(round(avg(price)), 0, 5000000, 100)─────────────────────────┐
|
||||
│ LONDON │ CITY OF LONDON │ 578 │ 3149590 │ ██████████████████████████████████████████████████████████████▊ │
|
||||
│ LONDON │ CITY OF WESTMINSTER │ 7083 │ 2903794 │ ██████████████████████████████████████████████████████████ │
|
||||
│ LONDON │ KENSINGTON AND CHELSEA │ 4986 │ 2333782 │ ██████████████████████████████████████████████▋ │
|
||||
│ LEATHERHEAD │ ELMBRIDGE │ 203 │ 2071595 │ █████████████████████████████████████████▍ │
|
||||
│ VIRGINIA WATER │ RUNNYMEDE │ 308 │ 1939465 │ ██████████████████████████████████████▋ │
|
||||
│ LONDON │ CAMDEN │ 5750 │ 1673687 │ █████████████████████████████████▍ │
|
||||
│ WINDLESHAM │ SURREY HEATH │ 182 │ 1428358 │ ████████████████████████████▌ │
|
||||
│ NORTHWOOD │ THREE RIVERS │ 112 │ 1404170 │ ████████████████████████████ │
|
||||
│ BARNET │ ENFIELD │ 259 │ 1338299 │ ██████████████████████████▋ │
|
||||
│ LONDON │ ISLINGTON │ 5504 │ 1275520 │ █████████████████████████▌ │
|
||||
│ LONDON │ RICHMOND UPON THAMES │ 1345 │ 1261935 │ █████████████████████████▏ │
|
||||
│ COBHAM │ ELMBRIDGE │ 727 │ 1251403 │ █████████████████████████ │
|
||||
│ BEACONSFIELD │ BUCKINGHAMSHIRE │ 680 │ 1199970 │ ███████████████████████▊ │
|
||||
│ LONDON │ TOWER HAMLETS │ 10012 │ 1157827 │ ███████████████████████▏ │
|
||||
│ LONDON │ HOUNSLOW │ 1278 │ 1144389 │ ██████████████████████▊ │
|
||||
│ BURFORD │ WEST OXFORDSHIRE │ 182 │ 1139393 │ ██████████████████████▋ │
|
||||
│ RICHMOND │ RICHMOND UPON THAMES │ 1649 │ 1130076 │ ██████████████████████▌ │
|
||||
│ KINGSTON UPON THAMES │ RICHMOND UPON THAMES │ 147 │ 1126111 │ ██████████████████████▌ │
|
||||
│ ASCOT │ WINDSOR AND MAIDENHEAD │ 773 │ 1106109 │ ██████████████████████ │
|
||||
│ LONDON │ HAMMERSMITH AND FULHAM │ 6162 │ 1056198 │ █████████████████████ │
|
||||
│ RADLETT │ HERTSMERE │ 513 │ 1045758 │ ████████████████████▊ │
|
||||
│ LEATHERHEAD │ GUILDFORD │ 354 │ 1045175 │ ████████████████████▊ │
|
||||
│ WEYBRIDGE │ ELMBRIDGE │ 1275 │ 1036702 │ ████████████████████▋ │
|
||||
│ FARNHAM │ EAST HAMPSHIRE │ 107 │ 1033682 │ ████████████████████▋ │
|
||||
│ ESHER │ ELMBRIDGE │ 915 │ 1032753 │ ████████████████████▋ │
|
||||
│ FARNHAM │ HART │ 102 │ 1002692 │ ████████████████████ │
|
||||
│ GERRARDS CROSS │ BUCKINGHAMSHIRE │ 845 │ 983639 │ ███████████████████▋ │
|
||||
│ CHALFONT ST GILES │ BUCKINGHAMSHIRE │ 286 │ 973993 │ ███████████████████▍ │
|
||||
│ SALCOMBE │ SOUTH HAMS │ 215 │ 965724 │ ███████████████████▎ │
|
||||
│ SURBITON │ ELMBRIDGE │ 181 │ 960346 │ ███████████████████▏ │
|
||||
│ BROCKENHURST │ NEW FOREST │ 226 │ 951278 │ ███████████████████ │
|
||||
│ SUTTON COLDFIELD │ LICHFIELD │ 110 │ 930757 │ ██████████████████▌ │
|
||||
│ EAST MOLESEY │ ELMBRIDGE │ 372 │ 927026 │ ██████████████████▌ │
|
||||
│ LLANGOLLEN │ WREXHAM │ 127 │ 925681 │ ██████████████████▌ │
|
||||
│ OXFORD │ SOUTH OXFORDSHIRE │ 638 │ 923830 │ ██████████████████▍ │
|
||||
│ LONDON │ MERTON │ 4383 │ 923194 │ ██████████████████▍ │
|
||||
│ GUILDFORD │ WAVERLEY │ 261 │ 905733 │ ██████████████████ │
|
||||
│ TEDDINGTON │ RICHMOND UPON THAMES │ 1147 │ 894856 │ █████████████████▊ │
|
||||
│ HARPENDEN │ ST ALBANS │ 1271 │ 893079 │ █████████████████▋ │
|
||||
│ HENLEY-ON-THAMES │ SOUTH OXFORDSHIRE │ 1042 │ 887557 │ █████████████████▋ │
|
||||
│ POTTERS BAR │ WELWYN HATFIELD │ 314 │ 863037 │ █████████████████▎ │
|
||||
│ LONDON │ WANDSWORTH │ 13210 │ 857318 │ █████████████████▏ │
|
||||
│ BILLINGSHURST │ CHICHESTER │ 255 │ 856508 │ █████████████████▏ │
|
||||
│ LONDON │ SOUTHWARK │ 7742 │ 843145 │ ████████████████▋ │
|
||||
│ LONDON │ HACKNEY │ 6656 │ 839716 │ ████████████████▋ │
|
||||
│ LUTTERWORTH │ HARBOROUGH │ 1096 │ 836546 │ ████████████████▋ │
|
||||
│ KINGSTON UPON THAMES │ KINGSTON UPON THAMES │ 1846 │ 828990 │ ████████████████▌ │
|
||||
│ LONDON │ EALING │ 5583 │ 820135 │ ████████████████▍ │
|
||||
│ INGATESTONE │ CHELMSFORD │ 120 │ 815379 │ ████████████████▎ │
|
||||
│ MARLOW │ BUCKINGHAMSHIRE │ 718 │ 809943 │ ████████████████▏ │
|
||||
│ EAST GRINSTEAD │ TANDRIDGE │ 105 │ 809461 │ ████████████████▏ │
|
||||
│ CHIGWELL │ EPPING FOREST │ 484 │ 809338 │ ████████████████▏ │
|
||||
│ EGHAM │ RUNNYMEDE │ 989 │ 807858 │ ████████████████▏ │
|
||||
│ HASLEMERE │ CHICHESTER │ 223 │ 804173 │ ████████████████ │
|
||||
│ PETWORTH │ CHICHESTER │ 288 │ 803206 │ ████████████████ │
|
||||
│ TWICKENHAM │ RICHMOND UPON THAMES │ 2194 │ 802616 │ ████████████████ │
|
||||
│ WEMBLEY │ BRENT │ 1698 │ 801733 │ ████████████████ │
|
||||
│ HINDHEAD │ WAVERLEY │ 233 │ 801482 │ ████████████████ │
|
||||
│ LONDON │ BARNET │ 8083 │ 792066 │ ███████████████▋ │
|
||||
│ WOKING │ GUILDFORD │ 343 │ 789360 │ ███████████████▋ │
|
||||
│ STOCKBRIDGE │ TEST VALLEY │ 318 │ 777909 │ ███████████████▌ │
|
||||
│ BERKHAMSTED │ DACORUM │ 1049 │ 776138 │ ███████████████▌ │
|
||||
│ MAIDENHEAD │ BUCKINGHAMSHIRE │ 236 │ 775572 │ ███████████████▌ │
|
||||
│ SOLIHULL │ STRATFORD-ON-AVON │ 142 │ 770727 │ ███████████████▍ │
|
||||
│ GREAT MISSENDEN │ BUCKINGHAMSHIRE │ 431 │ 764493 │ ███████████████▎ │
|
||||
│ TADWORTH │ REIGATE AND BANSTEAD │ 920 │ 757511 │ ███████████████▏ │
|
||||
│ LONDON │ BRENT │ 4124 │ 757194 │ ███████████████▏ │
|
||||
│ THAMES DITTON │ ELMBRIDGE │ 470 │ 750828 │ ███████████████ │
|
||||
│ LONDON │ LAMBETH │ 10431 │ 750532 │ ███████████████ │
|
||||
│ RICKMANSWORTH │ THREE RIVERS │ 1500 │ 747029 │ ██████████████▊ │
|
||||
│ KINGS LANGLEY │ DACORUM │ 281 │ 746536 │ ██████████████▊ │
|
||||
│ HARLOW │ EPPING FOREST │ 172 │ 739423 │ ██████████████▋ │
|
||||
│ TONBRIDGE │ SEVENOAKS │ 103 │ 738740 │ ██████████████▋ │
|
||||
│ BELVEDERE │ BEXLEY │ 686 │ 736385 │ ██████████████▋ │
|
||||
│ CRANBROOK │ TUNBRIDGE WELLS │ 769 │ 734328 │ ██████████████▋ │
|
||||
│ SOLIHULL │ WARWICK │ 116 │ 733286 │ ██████████████▋ │
|
||||
│ ALDERLEY EDGE │ CHESHIRE EAST │ 357 │ 732882 │ ██████████████▋ │
|
||||
│ WELWYN │ WELWYN HATFIELD │ 404 │ 730281 │ ██████████████▌ │
|
||||
│ CHISLEHURST │ BROMLEY │ 870 │ 730279 │ ██████████████▌ │
|
||||
│ LONDON │ HARINGEY │ 6488 │ 726715 │ ██████████████▌ │
|
||||
│ AMERSHAM │ BUCKINGHAMSHIRE │ 965 │ 725426 │ ██████████████▌ │
|
||||
│ SEVENOAKS │ SEVENOAKS │ 2183 │ 725102 │ ██████████████▌ │
|
||||
│ BOURNE END │ BUCKINGHAMSHIRE │ 269 │ 724595 │ ██████████████▍ │
|
||||
│ NORTHWOOD │ HILLINGDON │ 568 │ 722436 │ ██████████████▍ │
|
||||
│ PURFLEET │ THURROCK │ 143 │ 722205 │ ██████████████▍ │
|
||||
│ SLOUGH │ BUCKINGHAMSHIRE │ 832 │ 721529 │ ██████████████▍ │
|
||||
│ INGATESTONE │ BRENTWOOD │ 301 │ 718292 │ ██████████████▎ │
|
||||
│ EPSOM │ REIGATE AND BANSTEAD │ 315 │ 709264 │ ██████████████▏ │
|
||||
│ ASHTEAD │ MOLE VALLEY │ 524 │ 708646 │ ██████████████▏ │
|
||||
│ BETCHWORTH │ MOLE VALLEY │ 155 │ 708525 │ ██████████████▏ │
|
||||
│ OXTED │ TANDRIDGE │ 645 │ 706946 │ ██████████████▏ │
|
||||
│ READING │ SOUTH OXFORDSHIRE │ 593 │ 705466 │ ██████████████ │
|
||||
│ FELTHAM │ HOUNSLOW │ 1536 │ 703815 │ ██████████████ │
|
||||
│ TUNBRIDGE WELLS │ WEALDEN │ 207 │ 703296 │ ██████████████ │
|
||||
│ LEWES │ WEALDEN │ 116 │ 701349 │ ██████████████ │
|
||||
│ OXFORD │ OXFORD │ 3656 │ 700813 │ ██████████████ │
|
||||
│ MAYFIELD │ WEALDEN │ 177 │ 698158 │ █████████████▊ │
|
||||
│ PINNER │ HARROW │ 997 │ 697876 │ █████████████▊ │
|
||||
│ LECHLADE │ COTSWOLD │ 155 │ 696262 │ █████████████▊ │
|
||||
│ WALTON-ON-THAMES │ ELMBRIDGE │ 1850 │ 690102 │ █████████████▋ │
|
||||
└──────────────────────┴────────────────────────┴───────┴─────────┴─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 使用 Projection 加速查询 {#speedup-with-projections}
|
||||
|
||||
[Projections](/docs/zh/sql-reference/statements/alter/projection.mdx) 允许我们通过存储任意格式的预先聚合的数据来提高查询速度。在此示例中,我们创建了一个按年份、地区和城镇分组的房产的平均价格、总价格和数量的 Projection。在执行时,如果 ClickHouse 认为 Projection 可以提高查询的性能,它将使用 Projection(何时使用由 ClickHouse 决定)。
|
||||
|
||||
### 构建投影{#build-projection}
|
||||
|
||||
让我们通过维度 `toYear(date)`、`district` 和 `town` 创建一个聚合 Projection:
|
||||
|
||||
```sql
|
||||
ALTER TABLE uk_price_paid
|
||||
ADD PROJECTION projection_by_year_district_town
|
||||
(
|
||||
SELECT
|
||||
toYear(date),
|
||||
district,
|
||||
town,
|
||||
avg(price),
|
||||
sum(price),
|
||||
count()
|
||||
GROUP BY
|
||||
toYear(date),
|
||||
district,
|
||||
town
|
||||
)
|
||||
```
|
||||
|
||||
填充现有数据的 Projection。 (如果不进行 materialize 操作,则 ClickHouse 只会为新插入的数据创建 Projection):
|
||||
|
||||
```sql
|
||||
ALTER TABLE uk_price_paid
|
||||
MATERIALIZE PROJECTION projection_by_year_district_town
|
||||
SETTINGS mutations_sync = 1
|
||||
```
|
||||
|
||||
## Test Performance {#test-performance}
|
||||
|
||||
让我们再次运行相同的 3 个查询:
|
||||
|
||||
### 查询 1. 每年平均价格 {#average-price-projections}
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
toYear(date) AS year,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 1000000, 80)
|
||||
FROM uk_price_paid
|
||||
GROUP BY year
|
||||
ORDER BY year ASC
|
||||
```
|
||||
|
||||
结果是一样的,但是性能更好!
|
||||
```response
|
||||
No projection: 28 rows in set. Elapsed: 1.775 sec. Processed 27.45 million rows, 164.70 MB (15.47 million rows/s., 92.79 MB/s.)
|
||||
With projection: 28 rows in set. Elapsed: 0.665 sec. Processed 87.51 thousand rows, 3.21 MB (131.51 thousand rows/s., 4.82 MB/s.)
|
||||
```
|
||||
|
||||
|
||||
### 查询 2. 伦敦每年的平均价格 {#average-price-london-projections}
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
toYear(date) AS year,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 2000000, 100)
|
||||
FROM uk_price_paid
|
||||
WHERE town = 'LONDON'
|
||||
GROUP BY year
|
||||
ORDER BY year ASC
|
||||
```
|
||||
|
||||
Same result, but notice the improvement in query performance:
|
||||
|
||||
```response
|
||||
No projection: 28 rows in set. Elapsed: 0.720 sec. Processed 27.45 million rows, 46.61 MB (38.13 million rows/s., 64.74 MB/s.)
|
||||
With projection: 28 rows in set. Elapsed: 0.015 sec. Processed 87.51 thousand rows, 3.51 MB (5.74 million rows/s., 230.24 MB/s.)
|
||||
```
|
||||
|
||||
### 查询 3. 最昂贵的社区 {#most-expensive-neighborhoods-projections}
|
||||
|
||||
注意:需要修改 (date >= '2020-01-01') 以使其与 Projection 定义的维度 (`toYear(date) >= 2020)` 匹配:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
town,
|
||||
district,
|
||||
count() AS c,
|
||||
round(avg(price)) AS price,
|
||||
bar(price, 0, 5000000, 100)
|
||||
FROM uk_price_paid
|
||||
WHERE toYear(date) >= 2020
|
||||
GROUP BY
|
||||
town,
|
||||
district
|
||||
HAVING c >= 100
|
||||
ORDER BY price DESC
|
||||
LIMIT 100
|
||||
```
|
||||
|
||||
同样,结果是相同的,但请注意查询性能的改进:
|
||||
|
||||
```response
|
||||
No projection: 100 rows in set. Elapsed: 0.928 sec. Processed 27.45 million rows, 103.80 MB (29.56 million rows/s., 111.80 MB/s.)
|
||||
With projection: 100 rows in set. Elapsed: 0.336 sec. Processed 17.32 thousand rows, 1.23 MB (51.61 thousand rows/s., 3.65 MB/s.)
|
||||
```
|
||||
|
||||
### 在 Playground 上测试{#playground}
|
||||
|
||||
也可以在 [Online Playground](https://play.clickhouse.com/play?user=play#U0VMRUNUIHRvd24sIGRpc3RyaWN0LCBjb3VudCgpIEFTIGMsIHJvdW5kKGF2ZyhwcmljZSkpIEFTIHByaWNlLCBiYXIocHJpY2UsIDAsIDUwMDAwMDAsIDEwMCkgRlJPTSB1a19wcmljZV9wYWlkIFdIRVJFIGRhdGUgPj0gJzIwMjAtMDEtMDEnIEdST1VQIEJZIHRvd24sIGRpc3RyaWN0IEhBVklORyBjID49IDEwMCBPUkRFUiBCWSBwcmljZSBERVNDIExJTUlUIDEwMA==) 上找到此数据集。
|
||||
|
@ -35,6 +35,9 @@ Yandex**没有**维护下面列出的库,也没有做过任何广泛的测试
|
||||
- NodeJs
|
||||
- [clickhouse (NodeJs)](https://github.com/TimonKK/clickhouse)
|
||||
- [node-clickhouse](https://github.com/apla/node-clickhouse)
|
||||
- [nestjs-clickhouse](https://github.com/depyronick/nestjs-clickhouse)
|
||||
- [clickhouse-client](https://github.com/depyronick/clickhouse-client)
|
||||
- [node-clickhouse-orm](https://github.com/zimv/node-clickhouse-orm)
|
||||
- Perl
|
||||
- [perl-DBD-ClickHouse](https://github.com/elcamlost/perl-DBD-ClickHouse)
|
||||
- [HTTP-ClickHouse](https://metacpan.org/release/HTTP-ClickHouse)
|
||||
|
@ -120,7 +120,11 @@ use_cron()
|
||||
if [ -x "/bin/systemctl" ] && [ -f /etc/systemd/system/clickhouse-server.service ] && [ -d /run/systemd/system ]; then
|
||||
return 1
|
||||
fi
|
||||
# 2. disabled by config
|
||||
# 2. checking whether the config is existed
|
||||
if [ ! -f "$CLICKHOUSE_CRONFILE" ]; then
|
||||
return 1
|
||||
fi
|
||||
# 3. disabled by config
|
||||
if [ -z "$CLICKHOUSE_CRONFILE" ]; then
|
||||
return 2
|
||||
fi
|
||||
|
@ -1,6 +1,6 @@
|
||||
module github.com/ClickHouse/ClickHouse/programs/diagnostics
|
||||
|
||||
go 1.17
|
||||
go 1.19
|
||||
|
||||
require (
|
||||
github.com/ClickHouse/clickhouse-go/v2 v2.0.12
|
||||
|
@ -65,7 +65,6 @@ github.com/Azure/go-autorest/logger v0.2.0/go.mod h1:T9E3cAhj2VqvPOtCYAvby9aBXkZ
|
||||
github.com/Azure/go-autorest/tracing v0.6.0/go.mod h1:+vhtPC754Xsa23ID7GlGsrdKBpUA79WCAKPPZVC2DeU=
|
||||
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
|
||||
github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
|
||||
github.com/ClickHouse/clickhouse-go v1.5.3 h1:Vok8zUb/wlqc9u8oEqQzBMBRDoFd8NxPRqgYEqMnV88=
|
||||
github.com/ClickHouse/clickhouse-go v1.5.3/go.mod h1:EaI/sW7Azgz9UATzd5ZdZHRUhHgv5+JMS9NSr2smCJI=
|
||||
github.com/ClickHouse/clickhouse-go/v2 v2.0.12 h1:Nbl/NZwoM6LGJm7smNBgvtdr/rxjlIssSW3eG/Nmb9E=
|
||||
github.com/ClickHouse/clickhouse-go/v2 v2.0.12/go.mod h1:u4RoNQLLM2W6hNSPYrIESLJqaWSInZVmfM+MlaAhXcg=
|
||||
@ -457,7 +456,6 @@ github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0/go.mod h1:8NvIoxWQoOIhqOTXgf
|
||||
github.com/grpc-ecosystem/grpc-gateway v1.9.5/go.mod h1:vNeuVxBJEsws4ogUvrchl83t/GYV9WGTSLVdBhOQFDY=
|
||||
github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw=
|
||||
github.com/hashicorp/consul/api v1.11.0/go.mod h1:XjsvQN+RJGWI2TWy1/kqaE16HrR2J/FWgkYjdZQsX9M=
|
||||
github.com/hashicorp/consul/api v1.12.0/go.mod h1:6pVBMo0ebnYdt2S3H87XhekM/HHrUoTD2XXb/VrZVy0=
|
||||
github.com/hashicorp/consul/sdk v0.8.0/go.mod h1:GBvyrGALthsZObzUGsfgHZQDXjg4lOjagTIwIR1vPms=
|
||||
github.com/hashicorp/errwrap v0.0.0-20141028054710-7554cd9344ce/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
|
||||
github.com/hashicorp/errwrap v1.0.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
|
||||
@ -663,9 +661,7 @@ github.com/paulmach/protoscan v0.2.1-0.20210522164731-4e53c6875432/go.mod h1:2sV
|
||||
github.com/pelletier/go-toml v1.9.4 h1:tjENF6MfZAg8e4ZmZTeWaWiT2vXtsoO6+iuOjFhECwM=
|
||||
github.com/pelletier/go-toml v1.9.4/go.mod h1:u1nR/EPcESfeI/szUZKdtJ0xRNbUoANCkoOuaOx1Y+c=
|
||||
github.com/peterbourgon/diskv v2.0.1+incompatible/go.mod h1:uqqh8zWWbv1HBMNONnaR/tNboyR3/BZd58JJSHlUSCU=
|
||||
github.com/pierrec/lz4 v2.0.5+incompatible h1:2xWsjqPFWcplujydGg4WmhC/6fZqK42wMM8aXeqhl0I=
|
||||
github.com/pierrec/lz4 v2.0.5+incompatible/go.mod h1:pdkljMzZIN41W+lC3N2tnIh5sFi+IEE17M5jbnwPHcY=
|
||||
github.com/pierrec/lz4/v4 v4.1.12/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4=
|
||||
github.com/pierrec/lz4/v4 v4.1.14 h1:+fL8AQEZtz/ijeNnpduH0bROTu0O3NZAlPjQxGn8LwE=
|
||||
github.com/pierrec/lz4/v4 v4.1.14/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4=
|
||||
github.com/pkg/errors v0.8.0/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
|
||||
@ -717,7 +713,6 @@ github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQD
|
||||
github.com/ryanuber/columnize v0.0.0-20160712163229-9b3edd62028f/go.mod h1:sm1tb6uqfes/u+d4ooFouqFdy9/2g9QGwK3SQygK0Ts=
|
||||
github.com/safchain/ethtool v0.0.0-20190326074333-42ed695e3de8/go.mod h1:Z0q5wiBQGYcxhMZ6gUqHn6pYNLypFAvaL3UvgZLR0U4=
|
||||
github.com/sagikazarmark/crypt v0.3.0/go.mod h1:uD/D+6UF4SrIR1uGEv7bBNkNqLGqUr43MRiaGWX1Nig=
|
||||
github.com/sagikazarmark/crypt v0.4.0/go.mod h1:ALv2SRj7GxYV4HO9elxH9nS6M9gW+xDNxqmyJ6RfDFM=
|
||||
github.com/satori/go.uuid v1.2.0/go.mod h1:dA0hQrYB0VpLJoorglMZABFdXlWrHn1NEOzdhQKdks0=
|
||||
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529/go.mod h1:DxrIzT+xaE7yg65j358z/aeFdxmN0P9QXhEzd20vsDc=
|
||||
github.com/seccomp/libseccomp-golang v0.9.1/go.mod h1:GbW5+tmTXfcxTToHLXlScSlAvWlF4P2Ca7zGrPiEpWo=
|
||||
@ -1083,7 +1078,6 @@ golang.org/x/sys v0.0.0-20211109184856-51b60fd695b3/go.mod h1:oPkhp1MJrh7nUepCBc
|
||||
golang.org/x/sys v0.0.0-20211110154304-99a53858aa08/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20211124211545-fe61309f8881/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20211205182925-97ca703d548d/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20211210111614-af8b64212486/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 h1:XfKQ4OlFl8okEOr5UvAqFRVj8pY/4yfcXrddB8qAbU0=
|
||||
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
|
||||
@ -1202,7 +1196,6 @@ google.golang.org/api v0.57.0/go.mod h1:dVPlbZyBo2/OjBpmvNdpn2GRm6rPy75jyU7bmhdr
|
||||
google.golang.org/api v0.59.0/go.mod h1:sT2boj7M9YJxZzgeZqXogmhfmRWDtPzT31xkieUbuZU=
|
||||
google.golang.org/api v0.61.0/go.mod h1:xQRti5UdCmoCEqFxcz93fTl338AVqDgyaDRuOZ3hg9I=
|
||||
google.golang.org/api v0.62.0/go.mod h1:dKmwPCydfsad4qCH08MSdgWjfHOyfpd4VtDGgRFdavw=
|
||||
google.golang.org/api v0.63.0/go.mod h1:gs4ij2ffTRXwuzzgJl/56BdwJaA194ijkfn++9tDuPo=
|
||||
google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM=
|
||||
google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
|
||||
google.golang.org/appengine v1.5.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
|
||||
|
@ -58,7 +58,7 @@ void DisksApp::addOptions(
|
||||
("disk", po::value<String>(), "Set disk name")
|
||||
("command_name", po::value<String>(), "Name for command to do")
|
||||
("send-logs", "Send logs")
|
||||
("log-level", "Logging level")
|
||||
("log-level", po::value<String>(), "Logging level")
|
||||
;
|
||||
|
||||
positional_options_description.add("command_name", 1);
|
||||
|
@ -45,6 +45,7 @@ if (BUILD_STANDALONE_KEEPER)
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperLogStore.cpp
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperServer.cpp
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperSnapshotManager.cpp
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperSnapshotManagerS3.cpp
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStateMachine.cpp
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStateManager.cpp
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/../../src/Coordination/KeeperStorage.cpp
|
||||
|
@ -13,7 +13,6 @@
|
||||
#include <Interpreters/DatabaseCatalog.h>
|
||||
#include <base/getFQDNOrHostName.h>
|
||||
#include <Common/scope_guard_safe.h>
|
||||
#include <Interpreters/UserDefinedSQLObjectsLoader.h>
|
||||
#include <Interpreters/Session.h>
|
||||
#include <Access/AccessControl.h>
|
||||
#include <Common/Exception.h>
|
||||
@ -32,6 +31,7 @@
|
||||
#include <Parsers/IAST.h>
|
||||
#include <Parsers/ASTInsertQuery.h>
|
||||
#include <Common/ErrorHandlers.h>
|
||||
#include <Functions/UserDefined/IUserDefinedSQLObjectsLoader.h>
|
||||
#include <Functions/registerFunctions.h>
|
||||
#include <AggregateFunctions/registerAggregateFunctions.h>
|
||||
#include <TableFunctions/registerTableFunctions.h>
|
||||
@ -602,8 +602,6 @@ void LocalServer::processConfig()
|
||||
global_context->setCurrentDatabase(default_database);
|
||||
applyCmdOptions(global_context);
|
||||
|
||||
bool enable_objects_loader = false;
|
||||
|
||||
if (config().has("path"))
|
||||
{
|
||||
String path = global_context->getPath();
|
||||
@ -611,12 +609,6 @@ void LocalServer::processConfig()
|
||||
/// Lock path directory before read
|
||||
status.emplace(fs::path(path) / "status", StatusFile::write_full_info);
|
||||
|
||||
LOG_DEBUG(log, "Loading user defined objects from {}", path);
|
||||
Poco::File(path + "user_defined/").createDirectories();
|
||||
UserDefinedSQLObjectsLoader::instance().loadObjects(global_context);
|
||||
enable_objects_loader = true;
|
||||
LOG_DEBUG(log, "Loaded user defined objects.");
|
||||
|
||||
LOG_DEBUG(log, "Loading metadata from {}", path);
|
||||
loadMetadataSystem(global_context);
|
||||
attachSystemTablesLocal(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::SYSTEM_DATABASE));
|
||||
@ -630,6 +622,9 @@ void LocalServer::processConfig()
|
||||
DatabaseCatalog::instance().loadDatabases();
|
||||
}
|
||||
|
||||
/// For ClickHouse local if path is not set the loader will be disabled.
|
||||
global_context->getUserDefinedSQLObjectsLoader().loadObjects();
|
||||
|
||||
LOG_DEBUG(log, "Loaded metadata.");
|
||||
}
|
||||
else if (!config().has("no-system-tables"))
|
||||
@ -639,9 +634,6 @@ void LocalServer::processConfig()
|
||||
attachInformationSchema(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::INFORMATION_SCHEMA_UPPERCASE));
|
||||
}
|
||||
|
||||
/// Persist SQL user defined objects only if user_defined folder was created
|
||||
UserDefinedSQLObjectsLoader::instance().enable(enable_objects_loader);
|
||||
|
||||
server_display_name = config().getString("display_name", getFQDNOrHostName());
|
||||
prompt_by_server_display_name = config().getRawString("prompt_by_server_display_name.default", "{display_name} :) ");
|
||||
std::map<String, String> prompt_substitutions{{"display_name", server_display_name}};
|
||||
|
@ -53,7 +53,6 @@
|
||||
#include <Interpreters/ExternalDictionariesLoader.h>
|
||||
#include <Interpreters/ProcessList.h>
|
||||
#include <Interpreters/loadMetadata.h>
|
||||
#include <Interpreters/UserDefinedSQLObjectsLoader.h>
|
||||
#include <Interpreters/JIT/CompiledExpressionCache.h>
|
||||
#include <Access/AccessControl.h>
|
||||
#include <Storages/StorageReplicatedMergeTree.h>
|
||||
@ -62,6 +61,7 @@
|
||||
#include <Storages/Cache/ExternalDataSourceCache.h>
|
||||
#include <Storages/Cache/registerRemoteFileMetadatas.h>
|
||||
#include <AggregateFunctions/registerAggregateFunctions.h>
|
||||
#include <Functions/UserDefined/IUserDefinedSQLObjectsLoader.h>
|
||||
#include <Functions/registerFunctions.h>
|
||||
#include <TableFunctions/registerTableFunctions.h>
|
||||
#include <Formats/registerFormats.h>
|
||||
@ -82,13 +82,17 @@
|
||||
#if USE_BORINGSSL
|
||||
#include <Compression/CompressionCodecEncrypted.h>
|
||||
#endif
|
||||
#include <Server/HTTP/HTTPServerConnectionFactory.h>
|
||||
#include <Server/MySQLHandlerFactory.h>
|
||||
#include <Server/PostgreSQLHandlerFactory.h>
|
||||
#include <Server/ProxyV1HandlerFactory.h>
|
||||
#include <Server/TLSHandlerFactory.h>
|
||||
#include <Server/CertificateReloader.h>
|
||||
#include <Server/ProtocolServerAdapter.h>
|
||||
#include <Server/HTTP/HTTPServer.h>
|
||||
#include <Interpreters/AsynchronousInsertQueue.h>
|
||||
#include <filesystem>
|
||||
#include <unordered_set>
|
||||
|
||||
#include "config.h"
|
||||
#include "config_version.h"
|
||||
@ -387,7 +391,16 @@ bool getListenTry(const Poco::Util::AbstractConfiguration & config)
|
||||
{
|
||||
bool listen_try = config.getBool("listen_try", false);
|
||||
if (!listen_try)
|
||||
listen_try = DB::getMultipleValuesFromConfig(config, "", "listen_host").empty();
|
||||
{
|
||||
Poco::Util::AbstractConfiguration::Keys protocols;
|
||||
config.keys("protocols", protocols);
|
||||
listen_try =
|
||||
DB::getMultipleValuesFromConfig(config, "", "listen_host").empty() &&
|
||||
std::none_of(protocols.begin(), protocols.end(), [&](const auto & protocol)
|
||||
{
|
||||
return config.has("protocols." + protocol + ".host") && config.has("protocols." + protocol + ".port");
|
||||
});
|
||||
}
|
||||
return listen_try;
|
||||
}
|
||||
|
||||
@ -971,10 +984,10 @@ int Server::main(const std::vector<std::string> & /*args*/)
|
||||
|
||||
/// Storage with temporary data for processing of heavy queries.
|
||||
{
|
||||
std::string tmp_path = config().getString("tmp_path", path / "tmp/");
|
||||
std::string tmp_policy = config().getString("tmp_policy", "");
|
||||
size_t tmp_max_size = config().getUInt64("tmp_max_size", 0);
|
||||
const VolumePtr & volume = global_context->setTemporaryStorage(tmp_path, tmp_policy, tmp_max_size);
|
||||
std::string temporary_path = config().getString("tmp_path", path / "tmp/");
|
||||
std::string temporary_policy = config().getString("tmp_policy", "");
|
||||
size_t max_size = config().getUInt64("max_temporary_data_on_disk_size", 0);
|
||||
const VolumePtr & volume = global_context->setTemporaryStorage(temporary_path, temporary_policy, max_size);
|
||||
for (const DiskPtr & disk : volume->getDisks())
|
||||
setupTmpPath(log, disk->getPath());
|
||||
}
|
||||
@ -1010,12 +1023,6 @@ int Server::main(const std::vector<std::string> & /*args*/)
|
||||
fs::create_directories(user_scripts_path);
|
||||
}
|
||||
|
||||
{
|
||||
std::string user_defined_path = config().getString("user_defined_path", path / "user_defined/");
|
||||
global_context->setUserDefinedPath(user_defined_path);
|
||||
fs::create_directories(user_defined_path);
|
||||
}
|
||||
|
||||
/// top_level_domains_lists
|
||||
{
|
||||
const std::string & top_level_domains_path = config().getString("top_level_domains_path", path / "top_level_domains/");
|
||||
@ -1559,18 +1566,6 @@ int Server::main(const std::vector<std::string> & /*args*/)
|
||||
/// system logs may copy global context.
|
||||
global_context->setCurrentDatabaseNameInGlobalContext(default_database);
|
||||
|
||||
LOG_INFO(log, "Loading user defined objects from {}", path_str);
|
||||
try
|
||||
{
|
||||
UserDefinedSQLObjectsLoader::instance().loadObjects(global_context);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
tryLogCurrentException(log, "Caught exception while loading user defined objects");
|
||||
throw;
|
||||
}
|
||||
LOG_DEBUG(log, "Loaded user defined objects");
|
||||
|
||||
LOG_INFO(log, "Loading metadata from {}", path_str);
|
||||
|
||||
try
|
||||
@ -1598,6 +1593,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
|
||||
database_catalog.loadDatabases();
|
||||
/// After loading validate that default database exists
|
||||
database_catalog.assertDatabaseExists(default_database);
|
||||
/// Load user-defined SQL functions.
|
||||
global_context->getUserDefinedSQLObjectsLoader().loadObjects();
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
@ -1853,6 +1850,82 @@ int Server::main(const std::vector<std::string> & /*args*/)
|
||||
return Application::EXIT_OK;
|
||||
}
|
||||
|
||||
std::unique_ptr<TCPProtocolStackFactory> Server::buildProtocolStackFromConfig(
|
||||
const Poco::Util::AbstractConfiguration & config,
|
||||
const std::string & protocol,
|
||||
Poco::Net::HTTPServerParams::Ptr http_params,
|
||||
AsynchronousMetrics & async_metrics,
|
||||
bool & is_secure)
|
||||
{
|
||||
auto create_factory = [&](const std::string & type, const std::string & conf_name) -> TCPServerConnectionFactory::Ptr
|
||||
{
|
||||
if (type == "tcp")
|
||||
return TCPServerConnectionFactory::Ptr(new TCPHandlerFactory(*this, false, false));
|
||||
|
||||
if (type == "tls")
|
||||
#if USE_SSL
|
||||
return TCPServerConnectionFactory::Ptr(new TLSHandlerFactory(*this, conf_name));
|
||||
#else
|
||||
throw Exception{"SSL support for TCP protocol is disabled because Poco library was built without NetSSL support.",
|
||||
ErrorCodes::SUPPORT_IS_DISABLED};
|
||||
#endif
|
||||
|
||||
if (type == "proxy1")
|
||||
return TCPServerConnectionFactory::Ptr(new ProxyV1HandlerFactory(*this, conf_name));
|
||||
if (type == "mysql")
|
||||
return TCPServerConnectionFactory::Ptr(new MySQLHandlerFactory(*this));
|
||||
if (type == "postgres")
|
||||
return TCPServerConnectionFactory::Ptr(new PostgreSQLHandlerFactory(*this));
|
||||
if (type == "http")
|
||||
return TCPServerConnectionFactory::Ptr(
|
||||
new HTTPServerConnectionFactory(context(), http_params, createHandlerFactory(*this, config, async_metrics, "HTTPHandler-factory"))
|
||||
);
|
||||
if (type == "prometheus")
|
||||
return TCPServerConnectionFactory::Ptr(
|
||||
new HTTPServerConnectionFactory(context(), http_params, createHandlerFactory(*this, config, async_metrics, "PrometheusHandler-factory"))
|
||||
);
|
||||
if (type == "interserver")
|
||||
return TCPServerConnectionFactory::Ptr(
|
||||
new HTTPServerConnectionFactory(context(), http_params, createHandlerFactory(*this, config, async_metrics, "InterserverIOHTTPHandler-factory"))
|
||||
);
|
||||
|
||||
throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "Protocol configuration error, unknown protocol name '{}'", type);
|
||||
};
|
||||
|
||||
std::string conf_name = "protocols." + protocol;
|
||||
std::string prefix = conf_name + ".";
|
||||
std::unordered_set<std::string> pset {conf_name};
|
||||
|
||||
auto stack = std::make_unique<TCPProtocolStackFactory>(*this, conf_name);
|
||||
|
||||
while (true)
|
||||
{
|
||||
// if there is no "type" - it's a reference to another protocol and this is just an endpoint
|
||||
if (config.has(prefix + "type"))
|
||||
{
|
||||
std::string type = config.getString(prefix + "type");
|
||||
if (type == "tls")
|
||||
{
|
||||
if (is_secure)
|
||||
throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "Protocol '{}' contains more than one TLS layer", protocol);
|
||||
is_secure = true;
|
||||
}
|
||||
|
||||
stack->append(create_factory(type, conf_name));
|
||||
}
|
||||
|
||||
if (!config.has(prefix + "impl"))
|
||||
break;
|
||||
|
||||
conf_name = "protocols." + config.getString(prefix + "impl");
|
||||
prefix = conf_name + ".";
|
||||
|
||||
if (!pset.insert(conf_name).second)
|
||||
throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "Protocol '{}' configuration contains a loop on '{}'", protocol, conf_name);
|
||||
}
|
||||
|
||||
return stack;
|
||||
}
|
||||
|
||||
void Server::createServers(
|
||||
Poco::Util::AbstractConfiguration & config,
|
||||
@ -1871,6 +1944,55 @@ void Server::createServers(
|
||||
http_params->setTimeout(settings.http_receive_timeout);
|
||||
http_params->setKeepAliveTimeout(keep_alive_timeout);
|
||||
|
||||
Poco::Util::AbstractConfiguration::Keys protocols;
|
||||
config.keys("protocols", protocols);
|
||||
|
||||
for (const auto & protocol : protocols)
|
||||
{
|
||||
std::vector<std::string> hosts;
|
||||
if (config.has("protocols." + protocol + ".host"))
|
||||
hosts.push_back(config.getString("protocols." + protocol + ".host"));
|
||||
else
|
||||
hosts = listen_hosts;
|
||||
|
||||
for (const auto & host : hosts)
|
||||
{
|
||||
std::string conf_name = "protocols." + protocol;
|
||||
std::string prefix = conf_name + ".";
|
||||
|
||||
if (!config.has(prefix + "port"))
|
||||
continue;
|
||||
|
||||
std::string description {"<undefined> protocol"};
|
||||
if (config.has(prefix + "description"))
|
||||
description = config.getString(prefix + "description");
|
||||
std::string port_name = prefix + "port";
|
||||
bool is_secure = false;
|
||||
auto stack = buildProtocolStackFromConfig(config, protocol, http_params, async_metrics, is_secure);
|
||||
|
||||
if (stack->empty())
|
||||
throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "Protocol '{}' stack empty", protocol);
|
||||
|
||||
createServer(config, host, port_name.c_str(), listen_try, start_servers, servers, [&](UInt16 port) -> ProtocolServerAdapter
|
||||
{
|
||||
Poco::Net::ServerSocket socket;
|
||||
auto address = socketBindListen(config, socket, host, port, is_secure);
|
||||
socket.setReceiveTimeout(settings.receive_timeout);
|
||||
socket.setSendTimeout(settings.send_timeout);
|
||||
|
||||
return ProtocolServerAdapter(
|
||||
host,
|
||||
port_name.c_str(),
|
||||
description + ": " + address.toString(),
|
||||
std::make_unique<TCPServer>(
|
||||
stack.release(),
|
||||
server_pool,
|
||||
socket,
|
||||
new Poco::Net::TCPServerParams));
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
for (const auto & listen_host : listen_hosts)
|
||||
{
|
||||
/// HTTP
|
||||
@ -2118,13 +2240,50 @@ void Server::updateServers(
|
||||
{
|
||||
if (!server.isStopping())
|
||||
{
|
||||
bool has_host = std::find(listen_hosts.begin(), listen_hosts.end(), server.getListenHost()) != listen_hosts.end();
|
||||
bool has_port = !config.getString(server.getPortName(), "").empty();
|
||||
std::string port_name = server.getPortName();
|
||||
bool has_host = false;
|
||||
bool is_http = false;
|
||||
if (port_name.starts_with("protocols."))
|
||||
{
|
||||
std::string protocol = port_name.substr(0, port_name.find_last_of('.'));
|
||||
has_host = config.has(protocol + ".host");
|
||||
|
||||
/// NOTE: better to compare using getPortName() over using
|
||||
/// dynamic_cast<> since HTTPServer is also used for prometheus and
|
||||
/// internal replication communications.
|
||||
bool is_http = server.getPortName() == "http_port" || server.getPortName() == "https_port";
|
||||
std::string conf_name = protocol;
|
||||
std::string prefix = protocol + ".";
|
||||
std::unordered_set<std::string> pset {conf_name};
|
||||
while (true)
|
||||
{
|
||||
if (config.has(prefix + "type"))
|
||||
{
|
||||
std::string type = config.getString(prefix + "type");
|
||||
if (type == "http")
|
||||
{
|
||||
is_http = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (!config.has(prefix + "impl"))
|
||||
break;
|
||||
|
||||
conf_name = "protocols." + config.getString(prefix + "impl");
|
||||
prefix = conf_name + ".";
|
||||
|
||||
if (!pset.insert(conf_name).second)
|
||||
throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "Protocol '{}' configuration contains a loop on '{}'", protocol, conf_name);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
/// NOTE: better to compare using getPortName() over using
|
||||
/// dynamic_cast<> since HTTPServer is also used for prometheus and
|
||||
/// internal replication communications.
|
||||
is_http = server.getPortName() == "http_port" || server.getPortName() == "https_port";
|
||||
}
|
||||
|
||||
if (!has_host)
|
||||
has_host = std::find(listen_hosts.begin(), listen_hosts.end(), server.getListenHost()) != listen_hosts.end();
|
||||
bool has_port = !config.getString(port_name, "").empty();
|
||||
bool force_restart = is_http && !isSameConfiguration(previous_config, config, "http_handlers");
|
||||
if (force_restart)
|
||||
LOG_TRACE(log, "<http_handlers> had been changed, will reload {}", server.getDescription());
|
||||
|
@ -3,6 +3,8 @@
|
||||
#include <Server/IServer.h>
|
||||
|
||||
#include <Daemon/BaseDaemon.h>
|
||||
#include <Server/TCPProtocolStackFactory.h>
|
||||
#include <Poco/Net/HTTPServerParams.h>
|
||||
|
||||
/** Server provides three interfaces:
|
||||
* 1. HTTP - simple interface for any applications.
|
||||
@ -77,6 +79,13 @@ private:
|
||||
UInt16 port,
|
||||
[[maybe_unused]] bool secure = false) const;
|
||||
|
||||
std::unique_ptr<TCPProtocolStackFactory> buildProtocolStackFromConfig(
|
||||
const Poco::Util::AbstractConfiguration & config,
|
||||
const std::string & protocol,
|
||||
Poco::Net::HTTPServerParams::Ptr http_params,
|
||||
AsynchronousMetrics & async_metrics,
|
||||
bool & is_secure);
|
||||
|
||||
using CreateServerFunc = std::function<ProtocolServerAdapter(UInt16)>;
|
||||
void createServer(
|
||||
Poco::Util::AbstractConfiguration & config,
|
||||
|
@ -0,0 +1,38 @@
|
||||
#include <AggregateFunctions/AggregateFunctionFactory.h>
|
||||
#include <AggregateFunctions/AggregateFunctionAnalysisOfVariance.h>
|
||||
#include <AggregateFunctions/FactoryHelpers.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
AggregateFunctionPtr createAggregateFunctionAnalysisOfVariance(const std::string & name, const DataTypes & arguments, const Array & parameters, const Settings *)
|
||||
{
|
||||
assertNoParameters(name, parameters);
|
||||
assertBinary(name, arguments);
|
||||
|
||||
if (!isNumber(arguments[0]) || !isNumber(arguments[1]))
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Aggregate function {} only supports numerical types", name);
|
||||
|
||||
return std::make_shared<AggregateFunctionAnalysisOfVariance>(arguments, parameters);
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
void registerAggregateFunctionAnalysisOfVariance(AggregateFunctionFactory & factory)
|
||||
{
|
||||
AggregateFunctionProperties properties = { .is_order_dependent = false };
|
||||
factory.registerFunction("analysisOfVariance", {createAggregateFunctionAnalysisOfVariance, properties}, AggregateFunctionFactory::CaseInsensitive);
|
||||
|
||||
/// This is widely used term
|
||||
factory.registerAlias("anova", "analysisOfVariance", AggregateFunctionFactory::CaseInsensitive);
|
||||
}
|
||||
|
||||
}
|
98
src/AggregateFunctions/AggregateFunctionAnalysisOfVariance.h
Normal file
98
src/AggregateFunctions/AggregateFunctionAnalysisOfVariance.h
Normal file
@ -0,0 +1,98 @@
|
||||
#pragma once
|
||||
|
||||
#include <IO/VarInt.h>
|
||||
#include <IO/WriteHelpers.h>
|
||||
|
||||
#include <array>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <DataTypes/DataTypeTuple.h>
|
||||
#include <Columns/ColumnNullable.h>
|
||||
#include <Columns/ColumnsCommon.h>
|
||||
#include <AggregateFunctions/IAggregateFunction.h>
|
||||
#include <AggregateFunctions/Moments.h>
|
||||
#include "Common/NaNUtils.h"
|
||||
#include <Common/assert_cast.h>
|
||||
#include <Core/Types.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
}
|
||||
|
||||
class AggregateFunctionAnalysisOfVarianceData final : public AnalysisOfVarianceMoments<Float64>
|
||||
{
|
||||
};
|
||||
|
||||
|
||||
/// One way analysis of variance
|
||||
/// Provides a statistical test of whether two or more population means are equal (null hypothesis)
|
||||
/// Has an assumption that subjects from group i have normal distribution.
|
||||
/// Accepts two arguments - a value and a group number which this value belongs to.
|
||||
/// Groups are enumerated starting from 0 and there should be at least two groups to perform a test
|
||||
/// Moreover there should be at least one group with the number of observations greater than one.
|
||||
class AggregateFunctionAnalysisOfVariance final : public IAggregateFunctionDataHelper<AggregateFunctionAnalysisOfVarianceData, AggregateFunctionAnalysisOfVariance>
|
||||
{
|
||||
public:
|
||||
explicit AggregateFunctionAnalysisOfVariance(const DataTypes & arguments, const Array & params)
|
||||
: IAggregateFunctionDataHelper(arguments, params)
|
||||
{}
|
||||
|
||||
DataTypePtr getReturnType() const override
|
||||
{
|
||||
DataTypes types {std::make_shared<DataTypeNumber<Float64>>(), std::make_shared<DataTypeNumber<Float64>>() };
|
||||
Strings names {"f_statistic", "p_value"};
|
||||
return std::make_shared<DataTypeTuple>(
|
||||
std::move(types),
|
||||
std::move(names)
|
||||
);
|
||||
}
|
||||
|
||||
String getName() const override { return "analysisOfVariance"; }
|
||||
|
||||
bool allocatesMemoryInArena() const override { return false; }
|
||||
|
||||
void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override
|
||||
{
|
||||
data(place).add(columns[0]->getFloat64(row_num), columns[1]->getUInt(row_num));
|
||||
}
|
||||
|
||||
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, Arena *) const override
|
||||
{
|
||||
data(place).merge(data(rhs));
|
||||
}
|
||||
|
||||
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> /* version */) const override
|
||||
{
|
||||
data(place).write(buf);
|
||||
}
|
||||
|
||||
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, std::optional<size_t> /* version */, Arena *) const override
|
||||
{
|
||||
data(place).read(buf);
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override
|
||||
{
|
||||
auto f_stat = data(place).getFStatistic();
|
||||
if (std::isinf(f_stat) || isNaN(f_stat))
|
||||
throw Exception("F statistic is not defined or infinite for these arguments", ErrorCodes::BAD_ARGUMENTS);
|
||||
|
||||
auto p_value = data(place).getPValue(f_stat);
|
||||
|
||||
/// Because p-value is a probability.
|
||||
p_value = std::min(1.0, std::max(0.0, p_value));
|
||||
|
||||
auto & column_tuple = assert_cast<ColumnTuple &>(to);
|
||||
auto & column_stat = assert_cast<ColumnVector<Float64> &>(column_tuple.getColumn(0));
|
||||
auto & column_value = assert_cast<ColumnVector<Float64> &>(column_tuple.getColumn(1));
|
||||
|
||||
column_stat.getData().push_back(f_stat);
|
||||
column_value.getData().push_back(p_value);
|
||||
}
|
||||
|
||||
};
|
||||
|
||||
}
|
@ -156,6 +156,11 @@ public:
|
||||
nested_func->insertResultInto(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
nested_func->insertMergeResultInto(place, to, arena);
|
||||
}
|
||||
|
||||
bool allocatesMemoryInArena() const override
|
||||
{
|
||||
return nested_func->allocatesMemoryInArena();
|
||||
|
@ -196,7 +196,8 @@ public:
|
||||
this->data(place).deserialize(buf, arena);
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
template <bool MergeResult>
|
||||
void insertResultIntoImpl(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const
|
||||
{
|
||||
auto arguments = this->data(place).getArguments(this->argument_types);
|
||||
ColumnRawPtrs arguments_raw(arguments.size());
|
||||
@ -205,7 +206,20 @@ public:
|
||||
|
||||
assert(!arguments.empty());
|
||||
nested_func->addBatchSinglePlace(0, arguments[0]->size(), getNestedPlace(place), arguments_raw.data(), arena);
|
||||
nested_func->insertResultInto(getNestedPlace(place), to, arena);
|
||||
if constexpr (MergeResult)
|
||||
nested_func->insertMergeResultInto(getNestedPlace(place), to, arena);
|
||||
else
|
||||
nested_func->insertResultInto(getNestedPlace(place), to, arena);
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<false>(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<true>(place, to, arena);
|
||||
}
|
||||
|
||||
size_t sizeOfData() const override
|
||||
|
@ -114,4 +114,6 @@ struct AggregateUtils
|
||||
}
|
||||
};
|
||||
|
||||
const String & getAggregateFunctionCanonicalNameIfAny(const String & name);
|
||||
|
||||
}
|
||||
|
@ -257,7 +257,8 @@ public:
|
||||
}
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
template <bool merge>
|
||||
void insertResultIntoImpl(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const
|
||||
{
|
||||
AggregateFunctionForEachData & state = data(place);
|
||||
|
||||
@ -268,13 +269,26 @@ public:
|
||||
char * nested_state = state.array_of_aggregate_datas;
|
||||
for (size_t i = 0; i < state.dynamic_array_size; ++i)
|
||||
{
|
||||
nested_func->insertResultInto(nested_state, elems_to, arena);
|
||||
if constexpr (merge)
|
||||
nested_func->insertMergeResultInto(nested_state, elems_to, arena);
|
||||
else
|
||||
nested_func->insertResultInto(nested_state, elems_to, arena);
|
||||
nested_state += nested_size_of_data;
|
||||
}
|
||||
|
||||
offsets_to.push_back(offsets_to.back() + state.dynamic_array_size);
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<false>(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<true>(place, to, arena);
|
||||
}
|
||||
|
||||
bool allocatesMemoryInArena() const override
|
||||
{
|
||||
return true;
|
||||
|
@ -183,6 +183,11 @@ public:
|
||||
nested_func->insertResultInto(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
nested_func->insertMergeResultInto(place, to, arena);
|
||||
}
|
||||
|
||||
bool allocatesMemoryInArena() const override
|
||||
{
|
||||
return nested_func->allocatesMemoryInArena();
|
||||
|
@ -264,7 +264,8 @@ public:
|
||||
}
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
template <bool merge>
|
||||
void insertResultIntoImpl(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const
|
||||
{
|
||||
auto & map_column = assert_cast<ColumnMap &>(to);
|
||||
auto & nested_column = map_column.getNestedColumn();
|
||||
@ -288,13 +289,26 @@ public:
|
||||
for (auto & key : keys)
|
||||
{
|
||||
key_column.insert(key);
|
||||
nested_func->insertResultInto(merged_maps[key], val_column, arena);
|
||||
if constexpr (merge)
|
||||
nested_func->insertMergeResultInto(merged_maps[key], val_column, arena);
|
||||
else
|
||||
nested_func->insertResultInto(merged_maps[key], val_column, arena);
|
||||
}
|
||||
|
||||
IColumn::Offsets & res_offsets = nested_column.getOffsets();
|
||||
res_offsets.push_back(val_column.size());
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<false>(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<true>(place, to, arena);
|
||||
}
|
||||
|
||||
bool allocatesMemoryInArena() const override { return true; }
|
||||
|
||||
AggregateFunctionPtr getNestedFunction() const override { return nested_func; }
|
||||
|
@ -163,14 +163,18 @@ public:
|
||||
}
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
template <bool merge>
|
||||
void insertResultIntoImpl(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const
|
||||
{
|
||||
if constexpr (result_is_nullable)
|
||||
{
|
||||
ColumnNullable & to_concrete = assert_cast<ColumnNullable &>(to);
|
||||
if (getFlag(place))
|
||||
{
|
||||
nested_function->insertResultInto(nestedPlace(place), to_concrete.getNestedColumn(), arena);
|
||||
if constexpr (merge)
|
||||
nested_function->insertMergeResultInto(nestedPlace(place), to_concrete.getNestedColumn(), arena);
|
||||
else
|
||||
nested_function->insertResultInto(nestedPlace(place), to_concrete.getNestedColumn(), arena);
|
||||
to_concrete.getNullMapData().push_back(0);
|
||||
}
|
||||
else
|
||||
@ -180,10 +184,23 @@ public:
|
||||
}
|
||||
else
|
||||
{
|
||||
nested_function->insertResultInto(nestedPlace(place), to, arena);
|
||||
if constexpr (merge)
|
||||
nested_function->insertMergeResultInto(nestedPlace(place), to, arena);
|
||||
else
|
||||
nested_function->insertResultInto(nestedPlace(place), to, arena);
|
||||
}
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<false>(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<true>(place, to, arena);
|
||||
}
|
||||
|
||||
bool allocatesMemoryInArena() const override
|
||||
{
|
||||
return nested_function->allocatesMemoryInArena();
|
||||
|
@ -265,10 +265,11 @@ public:
|
||||
}
|
||||
}
|
||||
|
||||
void insertResultInto(
|
||||
template <bool merge>
|
||||
void insertResultIntoImpl(
|
||||
AggregateDataPtr __restrict place,
|
||||
IColumn & to,
|
||||
Arena * arena) const override
|
||||
Arena * arena) const
|
||||
{
|
||||
if (place[size_of_data])
|
||||
{
|
||||
@ -277,7 +278,12 @@ public:
|
||||
// -OrNull
|
||||
|
||||
if (inner_nullable)
|
||||
nested_function->insertResultInto(place, to, arena);
|
||||
{
|
||||
if constexpr (merge)
|
||||
nested_function->insertMergeResultInto(place, to, arena);
|
||||
else
|
||||
nested_function->insertResultInto(place, to, arena);
|
||||
}
|
||||
else
|
||||
{
|
||||
ColumnNullable & col = typeid_cast<ColumnNullable &>(to);
|
||||
@ -289,14 +295,26 @@ public:
|
||||
else
|
||||
{
|
||||
// -OrDefault
|
||||
|
||||
nested_function->insertResultInto(place, to, arena);
|
||||
if constexpr (merge)
|
||||
nested_function->insertMergeResultInto(place, to, arena);
|
||||
else
|
||||
nested_function->insertResultInto(place, to, arena);
|
||||
}
|
||||
}
|
||||
else
|
||||
to.insertDefault();
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<false>(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<true>(place, to, arena);
|
||||
}
|
||||
|
||||
AggregateFunctionPtr getNestedFunction() const override { return nested_function; }
|
||||
};
|
||||
|
||||
|
@ -46,7 +46,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
|
||||
if (which.idx == TypeIndex::DateTime64) return std::make_shared<Function<DateTime64, false>>(argument_types, params);
|
||||
|
||||
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);
|
||||
|
||||
|
@ -40,7 +40,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
|
||||
if (which.idx == TypeIndex::DateTime) return std::make_shared<Function<DataTypeDateTime::FieldType, false>>(argument_types, params);
|
||||
|
||||
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);
|
||||
|
||||
|
@ -47,7 +47,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
|
||||
if (which.idx == TypeIndex::DateTime64) return std::make_shared<Function<DateTime64, false>>(argument_types, params);
|
||||
|
||||
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);
|
||||
|
||||
|
@ -46,7 +46,7 @@ AggregateFunctionPtr createAggregateFunctionQuantile(
|
||||
if (which.idx == TypeIndex::DateTime64) return std::make_shared<Function<DateTime64, false>>(argument_types, params);
|
||||
|
||||
if (which.idx == TypeIndex::Int128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<Int128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt128) return std::make_shared<Function<UInt128, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::Int256) return std::make_shared<Function<Int256, true>>(argument_types, params);
|
||||
if (which.idx == TypeIndex::UInt256) return std::make_shared<Function<UInt256, true>>(argument_types, params);
|
||||
|
||||
|
@ -195,17 +195,33 @@ public:
|
||||
return std::make_shared<DataTypeArray>(nested_function->getReturnType());
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
template <bool merge>
|
||||
void insertResultIntoImpl(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const
|
||||
{
|
||||
auto & col = assert_cast<ColumnArray &>(to);
|
||||
auto & col_offsets = assert_cast<ColumnArray::ColumnOffsets &>(col.getOffsetsColumn());
|
||||
|
||||
for (size_t i = 0; i < total; ++i)
|
||||
nested_function->insertResultInto(place + i * size_of_data, col.getData(), arena);
|
||||
{
|
||||
if constexpr (merge)
|
||||
nested_function->insertMergeResultInto(place + i * size_of_data, col.getData(), arena);
|
||||
else
|
||||
nested_function->insertResultInto(place + i * size_of_data, col.getData(), arena);
|
||||
}
|
||||
|
||||
col_offsets.getData().push_back(col.getData().size());
|
||||
}
|
||||
|
||||
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<false>(place, to, arena);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
|
||||
{
|
||||
insertResultIntoImpl<true>(place, to, arena);
|
||||
}
|
||||
|
||||
AggregateFunctionPtr getNestedFunction() const override { return nested_function; }
|
||||
};
|
||||
|
||||
|
@ -111,6 +111,11 @@ public:
|
||||
assert_cast<ColumnAggregateFunction &>(to).getData().push_back(place);
|
||||
}
|
||||
|
||||
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override
|
||||
{
|
||||
assert_cast<ColumnAggregateFunction &>(to).insertFrom(place);
|
||||
}
|
||||
|
||||
/// Aggregate function or aggregate function state.
|
||||
bool isState() const override { return true; }
|
||||
|
||||
|
@ -40,7 +40,15 @@ struct WelchTTestData : public TTestMoments<Float64>
|
||||
Float64 denominator_x = sx2 * sx2 / (nx * nx * (nx - 1));
|
||||
Float64 denominator_y = sy2 * sy2 / (ny * ny * (ny - 1));
|
||||
|
||||
return numerator / (denominator_x + denominator_y);
|
||||
auto result = numerator / (denominator_x + denominator_y);
|
||||
|
||||
if (result <= 0 || std::isinf(result) || isNaN(result))
|
||||
throw Exception(
|
||||
ErrorCodes::BAD_ARGUMENTS,
|
||||
"Cannot calculate p_value, because the t-distribution \
|
||||
has inappropriate value of degrees of freedom (={}). It should be > 0", result);
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
std::tuple<Float64, Float64> getResult() const
|
||||
|
@ -164,6 +164,18 @@ public:
|
||||
/// window function.
|
||||
virtual void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const = 0;
|
||||
|
||||
/// Special method for aggregate functions with -State combinator, it behaves the same way as insertResultInto,
|
||||
/// but if we need to insert AggregateData into ColumnAggregateFunction we use special method
|
||||
/// insertInto that inserts default value and then performs merge with provided AggregateData
|
||||
/// instead of just copying pointer to this AggregateData. Used in WindowTransform.
|
||||
virtual void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const
|
||||
{
|
||||
if (isState())
|
||||
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Function {} is marked as State but method insertMergeResultInto is not implemented");
|
||||
|
||||
insertResultInto(place, to, arena);
|
||||
}
|
||||
|
||||
/// Used for machine learning methods. Predict result from trained model.
|
||||
/// Will insert result into `to` column for rows in range [offset, offset + limit).
|
||||
virtual void predictValues(
|
||||
|
@ -4,7 +4,9 @@
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <boost/math/distributions/students_t.hpp>
|
||||
#include <boost/math/distributions/normal.hpp>
|
||||
#include <boost/math/distributions/fisher_f.hpp>
|
||||
#include <cfloat>
|
||||
#include <numeric>
|
||||
|
||||
|
||||
namespace DB
|
||||
@ -13,6 +15,7 @@ struct Settings;
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int DECIMAL_OVERFLOW;
|
||||
}
|
||||
|
||||
@ -476,4 +479,127 @@ struct ZTestMoments
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct AnalysisOfVarianceMoments
|
||||
{
|
||||
/// Sums of values within a group
|
||||
std::vector<T> xs1{};
|
||||
/// Sums of squared values within a group
|
||||
std::vector<T> xs2{};
|
||||
/// Sizes of each group. Total number of observations is just a sum of all these values
|
||||
std::vector<size_t> ns{};
|
||||
|
||||
void resizeIfNeeded(size_t possible_size)
|
||||
{
|
||||
if (xs1.size() >= possible_size)
|
||||
return;
|
||||
|
||||
xs1.resize(possible_size, 0.0);
|
||||
xs2.resize(possible_size, 0.0);
|
||||
ns.resize(possible_size, 0);
|
||||
}
|
||||
|
||||
void add(T value, size_t group)
|
||||
{
|
||||
resizeIfNeeded(group + 1);
|
||||
xs1[group] += value;
|
||||
xs2[group] += value * value;
|
||||
ns[group] += 1;
|
||||
}
|
||||
|
||||
void merge(const AnalysisOfVarianceMoments & rhs)
|
||||
{
|
||||
resizeIfNeeded(rhs.xs1.size());
|
||||
for (size_t i = 0; i < rhs.xs1.size(); ++i)
|
||||
{
|
||||
xs1[i] += rhs.xs1[i];
|
||||
xs2[i] += rhs.xs2[i];
|
||||
ns[i] += rhs.ns[i];
|
||||
}
|
||||
}
|
||||
|
||||
void write(WriteBuffer & buf) const
|
||||
{
|
||||
writeVectorBinary(xs1, buf);
|
||||
writeVectorBinary(xs2, buf);
|
||||
writeVectorBinary(ns, buf);
|
||||
}
|
||||
|
||||
void read(ReadBuffer & buf)
|
||||
{
|
||||
readVectorBinary(xs1, buf);
|
||||
readVectorBinary(xs2, buf);
|
||||
readVectorBinary(ns, buf);
|
||||
}
|
||||
|
||||
Float64 getMeanAll() const
|
||||
{
|
||||
const auto n = std::accumulate(ns.begin(), ns.end(), 0UL);
|
||||
if (n == 0)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There are no observations to calculate mean value");
|
||||
|
||||
return std::accumulate(xs1.begin(), xs1.end(), 0.0) / n;
|
||||
}
|
||||
|
||||
Float64 getMeanGroup(size_t group) const
|
||||
{
|
||||
if (ns[group] == 0)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is no observations for group {}", group);
|
||||
|
||||
return xs1[group] / ns[group];
|
||||
}
|
||||
|
||||
Float64 getBetweenGroupsVariation() const
|
||||
{
|
||||
Float64 res = 0;
|
||||
auto mean = getMeanAll();
|
||||
|
||||
for (size_t i = 0; i < xs1.size(); ++i)
|
||||
{
|
||||
auto group_mean = getMeanGroup(i);
|
||||
res += ns[i] * (group_mean - mean) * (group_mean - mean);
|
||||
}
|
||||
return res;
|
||||
}
|
||||
|
||||
Float64 getWithinGroupsVariation() const
|
||||
{
|
||||
Float64 res = 0;
|
||||
for (size_t i = 0; i < xs1.size(); ++i)
|
||||
{
|
||||
auto group_mean = getMeanGroup(i);
|
||||
res += xs2[i] + ns[i] * group_mean * group_mean - 2 * group_mean * xs1[i];
|
||||
}
|
||||
return res;
|
||||
}
|
||||
|
||||
Float64 getFStatistic() const
|
||||
{
|
||||
const auto k = xs1.size();
|
||||
const auto n = std::accumulate(ns.begin(), ns.end(), 0UL);
|
||||
|
||||
if (k == 1)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There should be more than one group to calculate f-statistics");
|
||||
|
||||
if (k == n)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is only one observation in each group");
|
||||
|
||||
return (getBetweenGroupsVariation() * (n - k)) / (getWithinGroupsVariation() * (k - 1));
|
||||
}
|
||||
|
||||
Float64 getPValue(Float64 f_statistic) const
|
||||
{
|
||||
const auto k = xs1.size();
|
||||
const auto n = std::accumulate(ns.begin(), ns.end(), 0UL);
|
||||
|
||||
if (k == 1)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There should be more than one group to calculate f-statistics");
|
||||
|
||||
if (k == n)
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is only one observation in each group");
|
||||
|
||||
return 1.0f - boost::math::cdf(boost::math::fisher_f(k - 1, n - k), f_statistic);
|
||||
}
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -72,6 +72,7 @@ void registerAggregateFunctionNothing(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionExponentialMovingAverage(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionSparkbar(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionIntervalLengthSum(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionAnalysisOfVariance(AggregateFunctionFactory &);
|
||||
|
||||
class AggregateFunctionCombinatorFactory;
|
||||
void registerAggregateFunctionCombinatorIf(AggregateFunctionCombinatorFactory &);
|
||||
@ -156,6 +157,7 @@ void registerAggregateFunctions()
|
||||
registerAggregateFunctionIntervalLengthSum(factory);
|
||||
registerAggregateFunctionExponentialMovingAverage(factory);
|
||||
registerAggregateFunctionSparkbar(factory);
|
||||
registerAggregateFunctionAnalysisOfVariance(factory);
|
||||
|
||||
registerWindowFunctions(factory);
|
||||
}
|
||||
|
@ -32,10 +32,12 @@ void BackupFactory::registerBackupEngine(const String & engine_name, const Creat
|
||||
}
|
||||
|
||||
void registerBackupEnginesFileAndDisk(BackupFactory &);
|
||||
void registerBackupEngineS3(BackupFactory &);
|
||||
|
||||
void registerBackupEngines(BackupFactory & factory)
|
||||
{
|
||||
registerBackupEnginesFileAndDisk(factory);
|
||||
registerBackupEngineS3(factory);
|
||||
}
|
||||
|
||||
BackupFactory::BackupFactory()
|
||||
|
375
src/Backups/BackupIO_S3.cpp
Normal file
375
src/Backups/BackupIO_S3.cpp
Normal file
@ -0,0 +1,375 @@
|
||||
#include <Backups/BackupIO_S3.h>
|
||||
|
||||
#if USE_AWS_S3
|
||||
#include <Common/quoteString.h>
|
||||
#include <Interpreters/threadPoolCallbackRunner.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Storages/StorageS3Settings.h>
|
||||
#include <IO/IOThreadPool.h>
|
||||
#include <IO/ReadBufferFromS3.h>
|
||||
#include <IO/WriteBufferFromS3.h>
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
#include <aws/core/auth/AWSCredentials.h>
|
||||
#include <aws/s3/S3Client.h>
|
||||
#include <filesystem>
|
||||
|
||||
#include <aws/s3/model/ListObjectsRequest.h>
|
||||
|
||||
|
||||
namespace fs = std::filesystem;
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int S3_ERROR;
|
||||
extern const int LOGICAL_ERROR;
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
std::shared_ptr<Aws::S3::S3Client>
|
||||
makeS3Client(const S3::URI & s3_uri, const String & access_key_id, const String & secret_access_key, const ContextPtr & context)
|
||||
{
|
||||
auto settings = context->getStorageS3Settings().getSettings(s3_uri.uri.toString());
|
||||
|
||||
Aws::Auth::AWSCredentials credentials(access_key_id, secret_access_key);
|
||||
HeaderCollection headers;
|
||||
if (access_key_id.empty())
|
||||
{
|
||||
credentials = Aws::Auth::AWSCredentials(settings.auth_settings.access_key_id, settings.auth_settings.secret_access_key);
|
||||
headers = settings.auth_settings.headers;
|
||||
}
|
||||
|
||||
S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration(
|
||||
settings.auth_settings.region,
|
||||
context->getRemoteHostFilter(),
|
||||
context->getGlobalContext()->getSettingsRef().s3_max_redirects,
|
||||
context->getGlobalContext()->getSettingsRef().enable_s3_requests_logging,
|
||||
/* for_disk_s3 = */ false);
|
||||
|
||||
client_configuration.endpointOverride = s3_uri.endpoint;
|
||||
client_configuration.maxConnections = context->getSettingsRef().s3_max_connections;
|
||||
/// Increase connect timeout
|
||||
client_configuration.connectTimeoutMs = 10 * 1000;
|
||||
/// Requests in backups can be extremely long, set to one hour
|
||||
client_configuration.requestTimeoutMs = 60 * 60 * 1000;
|
||||
|
||||
return S3::ClientFactory::instance().create(
|
||||
client_configuration,
|
||||
s3_uri.is_virtual_hosted_style,
|
||||
credentials.GetAWSAccessKeyId(),
|
||||
credentials.GetAWSSecretKey(),
|
||||
settings.auth_settings.server_side_encryption_customer_key_base64,
|
||||
std::move(headers),
|
||||
settings.auth_settings.use_environment_credentials.value_or(
|
||||
context->getConfigRef().getBool("s3.use_environment_credentials", false)),
|
||||
settings.auth_settings.use_insecure_imds_request.value_or(
|
||||
context->getConfigRef().getBool("s3.use_insecure_imds_request", false)));
|
||||
}
|
||||
|
||||
Aws::Vector<Aws::S3::Model::Object> listObjects(Aws::S3::S3Client & client, const S3::URI & s3_uri, const String & file_name)
|
||||
{
|
||||
Aws::S3::Model::ListObjectsRequest request;
|
||||
request.SetBucket(s3_uri.bucket);
|
||||
request.SetPrefix(fs::path{s3_uri.key} / file_name);
|
||||
request.SetMaxKeys(1);
|
||||
auto outcome = client.ListObjects(request);
|
||||
if (!outcome.IsSuccess())
|
||||
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
|
||||
return outcome.GetResult().GetContents();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
BackupReaderS3::BackupReaderS3(
|
||||
const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_)
|
||||
: s3_uri(s3_uri_)
|
||||
, client(makeS3Client(s3_uri_, access_key_id_, secret_access_key_, context_))
|
||||
, max_single_read_retries(context_->getSettingsRef().s3_max_single_read_retries)
|
||||
, read_settings(context_->getReadSettings())
|
||||
{
|
||||
}
|
||||
|
||||
DataSourceDescription BackupReaderS3::getDataSourceDescription() const
|
||||
{
|
||||
return DataSourceDescription{DataSourceType::S3, s3_uri.endpoint, false, false};
|
||||
}
|
||||
|
||||
|
||||
BackupReaderS3::~BackupReaderS3() = default;
|
||||
|
||||
bool BackupReaderS3::fileExists(const String & file_name)
|
||||
{
|
||||
return !listObjects(*client, s3_uri, file_name).empty();
|
||||
}
|
||||
|
||||
UInt64 BackupReaderS3::getFileSize(const String & file_name)
|
||||
{
|
||||
auto objects = listObjects(*client, s3_uri, file_name);
|
||||
if (objects.empty())
|
||||
throw Exception(ErrorCodes::S3_ERROR, "Object {} must exist");
|
||||
return objects[0].GetSize();
|
||||
}
|
||||
|
||||
std::unique_ptr<SeekableReadBuffer> BackupReaderS3::readFile(const String & file_name)
|
||||
{
|
||||
return std::make_unique<ReadBufferFromS3>(
|
||||
client, s3_uri.bucket, fs::path(s3_uri.key) / file_name, s3_uri.version_id, max_single_read_retries, read_settings);
|
||||
}
|
||||
|
||||
|
||||
BackupWriterS3::BackupWriterS3(
|
||||
const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_)
|
||||
: s3_uri(s3_uri_)
|
||||
, client(makeS3Client(s3_uri_, access_key_id_, secret_access_key_, context_))
|
||||
, max_single_read_retries(context_->getSettingsRef().s3_max_single_read_retries)
|
||||
, read_settings(context_->getReadSettings())
|
||||
, rw_settings(context_->getStorageS3Settings().getSettings(s3_uri.uri.toString()).rw_settings)
|
||||
{
|
||||
rw_settings.updateFromSettingsIfEmpty(context_->getSettingsRef());
|
||||
}
|
||||
|
||||
DataSourceDescription BackupWriterS3::getDataSourceDescription() const
|
||||
{
|
||||
return DataSourceDescription{DataSourceType::S3, s3_uri.endpoint, false, false};
|
||||
}
|
||||
|
||||
bool BackupWriterS3::supportNativeCopy(DataSourceDescription data_source_description) const
|
||||
{
|
||||
return getDataSourceDescription() == data_source_description;
|
||||
}
|
||||
|
||||
|
||||
void BackupWriterS3::copyObjectImpl(
|
||||
const String & src_bucket,
|
||||
const String & src_key,
|
||||
const String & dst_bucket,
|
||||
const String & dst_key,
|
||||
std::optional<Aws::S3::Model::HeadObjectResult> head,
|
||||
std::optional<ObjectAttributes> metadata) const
|
||||
{
|
||||
Aws::S3::Model::CopyObjectRequest request;
|
||||
request.SetCopySource(src_bucket + "/" + src_key);
|
||||
request.SetBucket(dst_bucket);
|
||||
request.SetKey(dst_key);
|
||||
if (metadata)
|
||||
{
|
||||
request.SetMetadata(*metadata);
|
||||
request.SetMetadataDirective(Aws::S3::Model::MetadataDirective::REPLACE);
|
||||
}
|
||||
|
||||
auto outcome = client->CopyObject(request);
|
||||
|
||||
if (!outcome.IsSuccess() && outcome.GetError().GetExceptionName() == "EntityTooLarge")
|
||||
{ // Can't come here with MinIO, MinIO allows single part upload for large objects.
|
||||
copyObjectMultipartImpl(src_bucket, src_key, dst_bucket, dst_key, head, metadata);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!outcome.IsSuccess())
|
||||
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
|
||||
|
||||
}
|
||||
|
||||
Aws::S3::Model::HeadObjectOutcome BackupWriterS3::requestObjectHeadData(const std::string & bucket_from, const std::string & key) const
|
||||
{
|
||||
Aws::S3::Model::HeadObjectRequest request;
|
||||
request.SetBucket(bucket_from);
|
||||
request.SetKey(key);
|
||||
|
||||
return client->HeadObject(request);
|
||||
}
|
||||
|
||||
void BackupWriterS3::copyObjectMultipartImpl(
|
||||
const String & src_bucket,
|
||||
const String & src_key,
|
||||
const String & dst_bucket,
|
||||
const String & dst_key,
|
||||
std::optional<Aws::S3::Model::HeadObjectResult> head,
|
||||
std::optional<ObjectAttributes> metadata) const
|
||||
{
|
||||
if (!head)
|
||||
head = requestObjectHeadData(src_bucket, src_key).GetResult();
|
||||
|
||||
size_t size = head->GetContentLength();
|
||||
|
||||
String multipart_upload_id;
|
||||
|
||||
{
|
||||
Aws::S3::Model::CreateMultipartUploadRequest request;
|
||||
request.SetBucket(dst_bucket);
|
||||
request.SetKey(dst_key);
|
||||
if (metadata)
|
||||
request.SetMetadata(*metadata);
|
||||
|
||||
auto outcome = client->CreateMultipartUpload(request);
|
||||
|
||||
if (!outcome.IsSuccess())
|
||||
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
|
||||
|
||||
multipart_upload_id = outcome.GetResult().GetUploadId();
|
||||
}
|
||||
|
||||
std::vector<String> part_tags;
|
||||
|
||||
size_t upload_part_size = rw_settings.min_upload_part_size;
|
||||
for (size_t position = 0, part_number = 1; position < size; ++part_number, position += upload_part_size)
|
||||
{
|
||||
Aws::S3::Model::UploadPartCopyRequest part_request;
|
||||
part_request.SetCopySource(src_bucket + "/" + src_key);
|
||||
part_request.SetBucket(dst_bucket);
|
||||
part_request.SetKey(dst_key);
|
||||
part_request.SetUploadId(multipart_upload_id);
|
||||
part_request.SetPartNumber(part_number);
|
||||
part_request.SetCopySourceRange(fmt::format("bytes={}-{}", position, std::min(size, position + upload_part_size) - 1));
|
||||
|
||||
auto outcome = client->UploadPartCopy(part_request);
|
||||
if (!outcome.IsSuccess())
|
||||
{
|
||||
Aws::S3::Model::AbortMultipartUploadRequest abort_request;
|
||||
abort_request.SetBucket(dst_bucket);
|
||||
abort_request.SetKey(dst_key);
|
||||
abort_request.SetUploadId(multipart_upload_id);
|
||||
client->AbortMultipartUpload(abort_request);
|
||||
// In error case we throw exception later with first error from UploadPartCopy
|
||||
}
|
||||
if (!outcome.IsSuccess())
|
||||
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
|
||||
|
||||
auto etag = outcome.GetResult().GetCopyPartResult().GetETag();
|
||||
part_tags.push_back(etag);
|
||||
}
|
||||
|
||||
{
|
||||
Aws::S3::Model::CompleteMultipartUploadRequest req;
|
||||
req.SetBucket(dst_bucket);
|
||||
req.SetKey(dst_key);
|
||||
req.SetUploadId(multipart_upload_id);
|
||||
|
||||
Aws::S3::Model::CompletedMultipartUpload multipart_upload;
|
||||
for (size_t i = 0; i < part_tags.size(); ++i)
|
||||
{
|
||||
Aws::S3::Model::CompletedPart part;
|
||||
multipart_upload.AddParts(part.WithETag(part_tags[i]).WithPartNumber(i + 1));
|
||||
}
|
||||
|
||||
req.SetMultipartUpload(multipart_upload);
|
||||
|
||||
auto outcome = client->CompleteMultipartUpload(req);
|
||||
|
||||
if (!outcome.IsSuccess())
|
||||
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
|
||||
}
|
||||
}
|
||||
|
||||
void BackupWriterS3::copyFileNative(DiskPtr from_disk, const String & file_name_from, const String & file_name_to)
|
||||
{
|
||||
if (!from_disk)
|
||||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot natively copy data to disk without source disk");
|
||||
|
||||
auto objects = from_disk->getStorageObjects(file_name_from);
|
||||
if (objects.size() > 1)
|
||||
{
|
||||
copyFileThroughBuffer(from_disk->readFile(file_name_from), file_name_to);
|
||||
}
|
||||
else
|
||||
{
|
||||
auto object_storage = from_disk->getObjectStorage();
|
||||
std::string source_bucket = object_storage->getObjectsNamespace();
|
||||
auto file_path = fs::path(s3_uri.key) / file_name_to;
|
||||
|
||||
auto head = requestObjectHeadData(source_bucket, objects[0].absolute_path).GetResult();
|
||||
static constexpr int64_t multipart_upload_threashold = 5UL * 1024 * 1024 * 1024;
|
||||
if (head.GetContentLength() >= multipart_upload_threashold)
|
||||
{
|
||||
copyObjectMultipartImpl(
|
||||
source_bucket, objects[0].absolute_path, s3_uri.bucket, file_path, head);
|
||||
}
|
||||
else
|
||||
{
|
||||
copyObjectImpl(
|
||||
source_bucket, objects[0].absolute_path, s3_uri.bucket, file_path, head);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
BackupWriterS3::~BackupWriterS3() = default;
|
||||
|
||||
bool BackupWriterS3::fileExists(const String & file_name)
|
||||
{
|
||||
return !listObjects(*client, s3_uri, file_name).empty();
|
||||
}
|
||||
|
||||
UInt64 BackupWriterS3::getFileSize(const String & file_name)
|
||||
{
|
||||
auto objects = listObjects(*client, s3_uri, file_name);
|
||||
if (objects.empty())
|
||||
throw Exception(ErrorCodes::S3_ERROR, "Object {} must exist");
|
||||
return objects[0].GetSize();
|
||||
}
|
||||
|
||||
bool BackupWriterS3::fileContentsEqual(const String & file_name, const String & expected_file_contents)
|
||||
{
|
||||
if (listObjects(*client, s3_uri, file_name).empty())
|
||||
return false;
|
||||
|
||||
try
|
||||
{
|
||||
auto in = std::make_unique<ReadBufferFromS3>(
|
||||
client, s3_uri.bucket, fs::path(s3_uri.key) / file_name, s3_uri.version_id, max_single_read_retries, read_settings);
|
||||
String actual_file_contents(expected_file_contents.size(), ' ');
|
||||
return (in->read(actual_file_contents.data(), actual_file_contents.size()) == actual_file_contents.size())
|
||||
&& (actual_file_contents == expected_file_contents) && in->eof();
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
std::unique_ptr<WriteBuffer> BackupWriterS3::writeFile(const String & file_name)
|
||||
{
|
||||
return std::make_unique<WriteBufferFromS3>(
|
||||
client,
|
||||
s3_uri.bucket,
|
||||
fs::path(s3_uri.key) / file_name,
|
||||
rw_settings,
|
||||
std::nullopt,
|
||||
DBMS_DEFAULT_BUFFER_SIZE,
|
||||
threadPoolCallbackRunner<void>(IOThreadPool::get(), "BackupWriterS3"));
|
||||
}
|
||||
|
||||
void BackupWriterS3::removeFiles(const Strings & file_names)
|
||||
{
|
||||
/// One call of DeleteObjects() cannot remove more than 1000 keys.
|
||||
size_t chunk_size_limit = 1000;
|
||||
|
||||
size_t current_position = 0;
|
||||
while (current_position < file_names.size())
|
||||
{
|
||||
std::vector<Aws::S3::Model::ObjectIdentifier> current_chunk;
|
||||
for (; current_position < file_names.size() && current_chunk.size() < chunk_size_limit; ++current_position)
|
||||
{
|
||||
Aws::S3::Model::ObjectIdentifier obj;
|
||||
obj.SetKey(fs::path(s3_uri.key) / file_names[current_position]);
|
||||
current_chunk.push_back(obj);
|
||||
}
|
||||
|
||||
Aws::S3::Model::Delete delkeys;
|
||||
delkeys.SetObjects(current_chunk);
|
||||
Aws::S3::Model::DeleteObjectsRequest request;
|
||||
request.SetBucket(s3_uri.bucket);
|
||||
request.SetDelete(delkeys);
|
||||
|
||||
auto outcome = client->DeleteObjects(request);
|
||||
if (!outcome.IsSuccess())
|
||||
throw Exception(outcome.GetError().GetMessage(), ErrorCodes::S3_ERROR);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#endif
|
92
src/Backups/BackupIO_S3.h
Normal file
92
src/Backups/BackupIO_S3.h
Normal file
@ -0,0 +1,92 @@
|
||||
#pragma once
|
||||
|
||||
#include "config.h"
|
||||
|
||||
#if USE_AWS_S3
|
||||
#include <Backups/BackupIO.h>
|
||||
#include <IO/S3Common.h>
|
||||
#include <IO/ReadSettings.h>
|
||||
#include <Storages/StorageS3Settings.h>
|
||||
|
||||
#include <aws/s3/S3Client.h>
|
||||
#include <aws/s3/model/CopyObjectRequest.h>
|
||||
#include <aws/s3/model/ListObjectsV2Request.h>
|
||||
#include <aws/s3/model/HeadObjectRequest.h>
|
||||
#include <aws/s3/model/DeleteObjectRequest.h>
|
||||
#include <aws/s3/model/DeleteObjectsRequest.h>
|
||||
#include <aws/s3/model/CreateMultipartUploadRequest.h>
|
||||
#include <aws/s3/model/CompleteMultipartUploadRequest.h>
|
||||
#include <aws/s3/model/UploadPartCopyRequest.h>
|
||||
#include <aws/s3/model/AbortMultipartUploadRequest.h>
|
||||
#include <aws/s3/model/HeadObjectResult.h>
|
||||
#include <aws/s3/model/ListObjectsV2Result.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Represents a backup stored to AWS S3.
|
||||
class BackupReaderS3 : public IBackupReader
|
||||
{
|
||||
public:
|
||||
BackupReaderS3(const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_);
|
||||
~BackupReaderS3() override;
|
||||
|
||||
bool fileExists(const String & file_name) override;
|
||||
UInt64 getFileSize(const String & file_name) override;
|
||||
std::unique_ptr<SeekableReadBuffer> readFile(const String & file_name) override;
|
||||
DataSourceDescription getDataSourceDescription() const override;
|
||||
|
||||
private:
|
||||
S3::URI s3_uri;
|
||||
std::shared_ptr<Aws::S3::S3Client> client;
|
||||
UInt64 max_single_read_retries;
|
||||
ReadSettings read_settings;
|
||||
};
|
||||
|
||||
|
||||
class BackupWriterS3 : public IBackupWriter
|
||||
{
|
||||
public:
|
||||
BackupWriterS3(const S3::URI & s3_uri_, const String & access_key_id_, const String & secret_access_key_, const ContextPtr & context_);
|
||||
~BackupWriterS3() override;
|
||||
|
||||
bool fileExists(const String & file_name) override;
|
||||
UInt64 getFileSize(const String & file_name) override;
|
||||
bool fileContentsEqual(const String & file_name, const String & expected_file_contents) override;
|
||||
std::unique_ptr<WriteBuffer> writeFile(const String & file_name) override;
|
||||
void removeFiles(const Strings & file_names) override;
|
||||
|
||||
DataSourceDescription getDataSourceDescription() const override;
|
||||
bool supportNativeCopy(DataSourceDescription data_source_description) const override;
|
||||
void copyFileNative(DiskPtr from_disk, const String & file_name_from, const String & file_name_to) override;
|
||||
|
||||
private:
|
||||
|
||||
Aws::S3::Model::HeadObjectOutcome requestObjectHeadData(const std::string & bucket_from, const std::string & key) const;
|
||||
|
||||
void copyObjectImpl(
|
||||
const String & src_bucket,
|
||||
const String & src_key,
|
||||
const String & dst_bucket,
|
||||
const String & dst_key,
|
||||
std::optional<Aws::S3::Model::HeadObjectResult> head = std::nullopt,
|
||||
std::optional<ObjectAttributes> metadata = std::nullopt) const;
|
||||
|
||||
void copyObjectMultipartImpl(
|
||||
const String & src_bucket,
|
||||
const String & src_key,
|
||||
const String & dst_bucket,
|
||||
const String & dst_key,
|
||||
std::optional<Aws::S3::Model::HeadObjectResult> head = std::nullopt,
|
||||
std::optional<ObjectAttributes> metadata = std::nullopt) const;
|
||||
|
||||
S3::URI s3_uri;
|
||||
std::shared_ptr<Aws::S3::S3Client> client;
|
||||
UInt64 max_single_read_retries;
|
||||
ReadSettings read_settings;
|
||||
S3Settings::ReadWriteSettings rw_settings;
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
#endif
|
@ -455,6 +455,7 @@ void BackupImpl::createLockFile()
|
||||
assert(uuid);
|
||||
auto out = writer->writeFile(lock_file_name);
|
||||
writeUUIDText(*uuid, *out);
|
||||
out->finalize();
|
||||
}
|
||||
|
||||
bool BackupImpl::checkLockFile(bool throw_if_failed) const
|
||||
|
129
src/Backups/registerBackupEngineS3.cpp
Normal file
129
src/Backups/registerBackupEngineS3.cpp
Normal file
@ -0,0 +1,129 @@
|
||||
#include "config.h"
|
||||
|
||||
#include <Backups/BackupFactory.h>
|
||||
#include <Common/Exception.h>
|
||||
|
||||
#if USE_AWS_S3
|
||||
#include <Backups/BackupIO_S3.h>
|
||||
#include <Backups/BackupImpl.h>
|
||||
#include <IO/Archives/hasRegisteredArchiveFileExtension.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
#include <filesystem>
|
||||
#endif
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace fs = std::filesystem;
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
|
||||
extern const int SUPPORT_IS_DISABLED;
|
||||
}
|
||||
|
||||
#if USE_AWS_S3
|
||||
namespace
|
||||
{
|
||||
String removeFileNameFromURL(String & url)
|
||||
{
|
||||
Poco::URI url2{url};
|
||||
String path = url2.getPath();
|
||||
size_t slash_pos = path.find_last_of('/');
|
||||
String file_name = path.substr(slash_pos + 1);
|
||||
path.resize(slash_pos + 1);
|
||||
url2.setPath(path);
|
||||
url = url2.toString();
|
||||
return file_name;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
void registerBackupEngineS3(BackupFactory & factory)
|
||||
{
|
||||
auto creator_fn = []([[maybe_unused]] const BackupFactory::CreateParams & params) -> std::unique_ptr<IBackup>
|
||||
{
|
||||
#if USE_AWS_S3
|
||||
String backup_name = params.backup_info.toString();
|
||||
const String & id_arg = params.backup_info.id_arg;
|
||||
const auto & args = params.backup_info.args;
|
||||
|
||||
String s3_uri, access_key_id, secret_access_key;
|
||||
|
||||
if (!id_arg.empty())
|
||||
{
|
||||
const auto & config = params.context->getConfigRef();
|
||||
auto config_prefix = "named_collections." + id_arg;
|
||||
|
||||
if (!config.has(config_prefix))
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is no collection named `{}` in config", id_arg);
|
||||
|
||||
s3_uri = config.getString(config_prefix + ".url");
|
||||
access_key_id = config.getString(config_prefix + ".access_key_id", "");
|
||||
secret_access_key = config.getString(config_prefix + ".secret_access_key", "");
|
||||
|
||||
if (config.has(config_prefix + ".filename"))
|
||||
s3_uri = fs::path(s3_uri) / config.getString(config_prefix + ".filename");
|
||||
|
||||
if (args.size() > 1)
|
||||
throw Exception(
|
||||
"Backup S3 requires 1 or 2 arguments: named_collection, [filename]",
|
||||
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
|
||||
|
||||
if (args.size() == 1)
|
||||
s3_uri = fs::path(s3_uri) / args[0].safeGet<String>();
|
||||
}
|
||||
else
|
||||
{
|
||||
if ((args.size() != 1) && (args.size() != 3))
|
||||
throw Exception(
|
||||
"Backup S3 requires 1 or 3 arguments: url, [access_key_id, secret_access_key]",
|
||||
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
|
||||
|
||||
s3_uri = args[0].safeGet<String>();
|
||||
if (args.size() >= 3)
|
||||
{
|
||||
access_key_id = args[1].safeGet<String>();
|
||||
secret_access_key = args[2].safeGet<String>();
|
||||
}
|
||||
}
|
||||
|
||||
BackupImpl::ArchiveParams archive_params;
|
||||
if (hasRegisteredArchiveFileExtension(s3_uri))
|
||||
{
|
||||
if (params.is_internal_backup)
|
||||
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Using archives with backups on clusters is disabled");
|
||||
|
||||
archive_params.archive_name = removeFileNameFromURL(s3_uri);
|
||||
archive_params.compression_method = params.compression_method;
|
||||
archive_params.compression_level = params.compression_level;
|
||||
archive_params.password = params.password;
|
||||
}
|
||||
else
|
||||
{
|
||||
if (!params.password.empty())
|
||||
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Password is not applicable, backup cannot be encrypted");
|
||||
}
|
||||
|
||||
if (params.open_mode == IBackup::OpenMode::READ)
|
||||
{
|
||||
auto reader = std::make_shared<BackupReaderS3>(S3::URI{Poco::URI{s3_uri}}, access_key_id, secret_access_key, params.context);
|
||||
return std::make_unique<BackupImpl>(backup_name, archive_params, params.base_backup_info, reader, params.context);
|
||||
}
|
||||
else
|
||||
{
|
||||
auto writer = std::make_shared<BackupWriterS3>(S3::URI{Poco::URI{s3_uri}}, access_key_id, secret_access_key, params.context);
|
||||
return std::make_unique<BackupImpl>(backup_name, archive_params, params.base_backup_info, writer, params.context, params.is_internal_backup, params.backup_coordination, params.backup_uuid);
|
||||
}
|
||||
#else
|
||||
throw Exception("S3 support is disabled", ErrorCodes::SUPPORT_IS_DISABLED);
|
||||
#endif
|
||||
};
|
||||
|
||||
factory.registerBackupEngine("S3", creator_fn);
|
||||
}
|
||||
|
||||
}
|
@ -284,6 +284,7 @@ add_object_library(clickhouse_processors_ttl Processors/TTL)
|
||||
add_object_library(clickhouse_processors_merges_algorithms Processors/Merges/Algorithms)
|
||||
add_object_library(clickhouse_processors_queryplan Processors/QueryPlan)
|
||||
add_object_library(clickhouse_processors_queryplan_optimizations Processors/QueryPlan/Optimizations)
|
||||
add_object_library(clickhouse_user_defined_functions Functions/UserDefined)
|
||||
|
||||
if (TARGET ch_contrib::nuraft)
|
||||
add_object_library(clickhouse_coordination Coordination)
|
||||
|
@ -1498,6 +1498,7 @@ void ClientBase::processParsedSingleQuery(const String & full_query, const Strin
|
||||
if (!old_settings)
|
||||
old_settings.emplace(global_context->getSettingsRef());
|
||||
global_context->applySettingsChanges(settings_ast.as<ASTSetQuery>()->changes);
|
||||
global_context->resetSettingsToDefaultValue(settings_ast.as<ASTSetQuery>()->default_settings);
|
||||
};
|
||||
|
||||
const auto * insert = parsed_query->as<ASTInsertQuery>();
|
||||
@ -1543,6 +1544,7 @@ void ClientBase::processParsedSingleQuery(const String & full_query, const Strin
|
||||
else
|
||||
global_context->applySettingChange(change);
|
||||
}
|
||||
global_context->resetSettingsToDefaultValue(set_query->default_settings);
|
||||
}
|
||||
if (const auto * use_query = parsed_query->as<ASTUseQuery>())
|
||||
{
|
||||
|
@ -393,24 +393,38 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead
|
||||
Poco::Net::Socket::SocketList write_list;
|
||||
Poco::Net::Socket::SocketList except_list;
|
||||
|
||||
for (const ReplicaState & state : replica_states)
|
||||
{
|
||||
Connection * connection = state.connection;
|
||||
if (connection != nullptr)
|
||||
read_list.push_back(*connection->socket);
|
||||
}
|
||||
|
||||
auto timeout = is_draining ? drain_timeout : receive_timeout;
|
||||
int n = Poco::Net::Socket::select(
|
||||
read_list,
|
||||
write_list,
|
||||
except_list,
|
||||
timeout);
|
||||
int n = 0;
|
||||
|
||||
/// EINTR loop
|
||||
while (true)
|
||||
{
|
||||
read_list.clear();
|
||||
for (const ReplicaState & state : replica_states)
|
||||
{
|
||||
Connection * connection = state.connection;
|
||||
if (connection != nullptr)
|
||||
read_list.push_back(*connection->socket);
|
||||
}
|
||||
|
||||
/// poco returns 0 on EINTR, let's reset errno to ensure that EINTR came from select().
|
||||
errno = 0;
|
||||
|
||||
n = Poco::Net::Socket::select(
|
||||
read_list,
|
||||
write_list,
|
||||
except_list,
|
||||
timeout);
|
||||
if (n <= 0 && errno == EINTR)
|
||||
continue;
|
||||
break;
|
||||
}
|
||||
|
||||
/// We treat any error as timeout for simplicity.
|
||||
/// And we also check if read_list is still empty just in case.
|
||||
if (n <= 0 || read_list.empty())
|
||||
{
|
||||
const auto & addresses = dumpAddressesUnlocked();
|
||||
for (ReplicaState & state : replica_states)
|
||||
{
|
||||
Connection * connection = state.connection;
|
||||
@ -423,7 +437,7 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead
|
||||
throw Exception(ErrorCodes::TIMEOUT_EXCEEDED,
|
||||
"Timeout ({} ms) exceeded while reading from {}",
|
||||
timeout.totalMilliseconds(),
|
||||
dumpAddressesUnlocked());
|
||||
addresses);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -236,6 +236,21 @@ ASTPtr QueryFuzzer::getRandomColumnLike()
|
||||
return new_ast;
|
||||
}
|
||||
|
||||
ASTPtr QueryFuzzer::getRandomExpressionList()
|
||||
{
|
||||
if (column_like.empty())
|
||||
{
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
ASTPtr new_ast = std::make_shared<ASTExpressionList>();
|
||||
for (size_t i = 0; i < fuzz_rand() % 5 + 1; ++i)
|
||||
{
|
||||
new_ast->children.push_back(getRandomColumnLike());
|
||||
}
|
||||
return new_ast;
|
||||
}
|
||||
|
||||
void QueryFuzzer::replaceWithColumnLike(ASTPtr & ast)
|
||||
{
|
||||
if (column_like.empty())
|
||||
@ -453,6 +468,16 @@ bool QueryFuzzer::isSuitableForFuzzing(const ASTCreateQuery & create)
|
||||
return create.columns_list && create.columns_list->columns;
|
||||
}
|
||||
|
||||
static String getOriginalTableName(const String & full_name)
|
||||
{
|
||||
return full_name.substr(0, full_name.find("__fuzz_"));
|
||||
}
|
||||
|
||||
static String getFuzzedTableName(const String & original_name, size_t index)
|
||||
{
|
||||
return original_name + "__fuzz_" + toString(index);
|
||||
}
|
||||
|
||||
void QueryFuzzer::fuzzCreateQuery(ASTCreateQuery & create)
|
||||
{
|
||||
if (create.columns_list && create.columns_list->columns)
|
||||
@ -486,10 +511,9 @@ void QueryFuzzer::fuzzCreateQuery(ASTCreateQuery & create)
|
||||
}
|
||||
|
||||
auto full_name = create.getTable();
|
||||
auto original_name = full_name.substr(0, full_name.find("__fuzz_"));
|
||||
|
||||
auto original_name = getOriginalTableName(full_name);
|
||||
size_t index = index_of_fuzzed_table[original_name]++;
|
||||
auto new_name = original_name + "__fuzz_" + toString(index);
|
||||
auto new_name = getFuzzedTableName(original_name, index);
|
||||
|
||||
create.setTable(new_name);
|
||||
|
||||
@ -650,7 +674,8 @@ void QueryFuzzer::fuzzTableName(ASTTableExpression & table)
|
||||
if (table_id.empty())
|
||||
return;
|
||||
|
||||
auto it = original_table_name_to_fuzzed.find(table_id.getTableName());
|
||||
auto original_name = getOriginalTableName(table_id.getTableName());
|
||||
auto it = original_table_name_to_fuzzed.find(original_name);
|
||||
if (it != original_table_name_to_fuzzed.end() && !it->second.empty())
|
||||
{
|
||||
auto new_table_name = it->second.begin();
|
||||
@ -713,7 +738,7 @@ ASTs QueryFuzzer::getDropQueriesForFuzzedTables(const ASTDropQuery & drop_query)
|
||||
/// Drop all created tables, not only unique ones.
|
||||
for (size_t i = 0; i < it->second; ++i)
|
||||
{
|
||||
auto fuzzed_name = table_name + "__fuzz_" + toString(i);
|
||||
auto fuzzed_name = getFuzzedTableName(table_name, i);
|
||||
auto & query = queries.emplace_back(drop_query.clone());
|
||||
query->as<ASTDropQuery>()->setTable(fuzzed_name);
|
||||
/// Just in case add IF EXISTS to avoid exceptions.
|
||||
@ -734,7 +759,9 @@ void QueryFuzzer::notifyQueryFailed(ASTPtr ast)
|
||||
if (pos != std::string::npos)
|
||||
{
|
||||
auto original_name = table_name.substr(0, pos);
|
||||
original_table_name_to_fuzzed[original_name].erase(table_name);
|
||||
auto it = original_table_name_to_fuzzed.find(original_name);
|
||||
if (it != original_table_name_to_fuzzed.end())
|
||||
it->second.erase(table_name);
|
||||
}
|
||||
};
|
||||
|
||||
@ -841,7 +868,52 @@ void QueryFuzzer::fuzz(ASTPtr & ast)
|
||||
else if (auto * select = typeid_cast<ASTSelectQuery *>(ast.get()))
|
||||
{
|
||||
fuzzColumnLikeExpressionList(select->select().get());
|
||||
fuzzColumnLikeExpressionList(select->groupBy().get());
|
||||
|
||||
if (select->groupBy().get())
|
||||
{
|
||||
if (fuzz_rand() % 50 == 0)
|
||||
{
|
||||
select->groupBy()->children.clear();
|
||||
select->setExpression(ASTSelectQuery::Expression::GROUP_BY, {});
|
||||
select->group_by_with_grouping_sets = false;
|
||||
select->group_by_with_rollup = false;
|
||||
select->group_by_with_cube = false;
|
||||
select->group_by_with_totals = true;
|
||||
}
|
||||
else if (fuzz_rand() % 100 == 0)
|
||||
{
|
||||
select->group_by_with_grouping_sets = !select->group_by_with_grouping_sets;
|
||||
}
|
||||
else if (fuzz_rand() % 100 == 0)
|
||||
{
|
||||
select->group_by_with_rollup = !select->group_by_with_rollup;
|
||||
}
|
||||
else if (fuzz_rand() % 100 == 0)
|
||||
{
|
||||
select->group_by_with_cube = !select->group_by_with_cube;
|
||||
}
|
||||
else if (fuzz_rand() % 100 == 0)
|
||||
{
|
||||
select->group_by_with_totals = !select->group_by_with_totals;
|
||||
}
|
||||
}
|
||||
else if (fuzz_rand() % 50 == 0)
|
||||
{
|
||||
select->setExpression(ASTSelectQuery::Expression::GROUP_BY, getRandomExpressionList());
|
||||
}
|
||||
|
||||
if (select->where().get())
|
||||
{
|
||||
if (fuzz_rand() % 50 == 0)
|
||||
{
|
||||
select->where()->children.clear();
|
||||
select->setExpression(ASTSelectQuery::Expression::WHERE, {});
|
||||
}
|
||||
}
|
||||
else if (fuzz_rand() % 50 == 0)
|
||||
{
|
||||
select->setExpression(ASTSelectQuery::Expression::WHERE, getRandomColumnLike());
|
||||
}
|
||||
fuzzOrderByList(select->orderBy().get());
|
||||
|
||||
fuzz(select->children);
|
||||
|
@ -8,6 +8,7 @@
|
||||
#include <pcg-random/pcg_random.hpp>
|
||||
|
||||
#include <Common/randomSeed.h>
|
||||
#include "Parsers/IAST_fwd.h"
|
||||
#include <Core/Field.h>
|
||||
#include <Parsers/IAST.h>
|
||||
|
||||
@ -72,6 +73,7 @@ struct QueryFuzzer
|
||||
Field getRandomField(int type);
|
||||
Field fuzzField(Field field);
|
||||
ASTPtr getRandomColumnLike();
|
||||
ASTPtr getRandomExpressionList();
|
||||
DataTypePtr fuzzDataType(DataTypePtr type);
|
||||
DataTypePtr getRandomType();
|
||||
ASTs getInsertQueriesForFuzzedTables(const String & full_query);
|
||||
|
@ -33,6 +33,7 @@
|
||||
M(TemporaryFilesForSort, "Number of temporary files created for external sorting") \
|
||||
M(TemporaryFilesForAggregation, "Number of temporary files created for external aggregation") \
|
||||
M(TemporaryFilesForJoin, "Number of temporary files created for JOIN") \
|
||||
M(TemporaryFilesUnknown, "Number of temporary files created without known purpose") \
|
||||
M(Read, "Number of read (read, pread, io_getevents, etc.) syscalls in fly") \
|
||||
M(Write, "Number of write (write, pwrite, io_getevents, etc.) syscalls in fly") \
|
||||
M(NetworkReceive, "Number of threads receiving data from network. Only ClickHouse-related network interaction is included, not by 3rd party libraries.") \
|
||||
|
@ -178,7 +178,11 @@ public:
|
||||
func = std::forward<Function>(func),
|
||||
args = std::make_tuple(std::forward<Args>(args)...)]() mutable /// mutable is needed to destroy capture
|
||||
{
|
||||
SCOPE_EXIT(state->event.set());
|
||||
SCOPE_EXIT(
|
||||
{
|
||||
state->finished = true;
|
||||
state->event.set();
|
||||
});
|
||||
|
||||
state->thread_id = std::this_thread::get_id();
|
||||
|
||||
@ -213,6 +217,17 @@ public:
|
||||
|
||||
~ThreadFromGlobalPoolImpl()
|
||||
{
|
||||
/// The problem is that the our ThreadFromGlobalPool can be actually finished
|
||||
/// before we try to join the thread or check whether it is joinable or not.
|
||||
/// In some places we have code like:
|
||||
/// if (thread->joinable())
|
||||
/// thread->join();
|
||||
/// Where join() won't be executed in case when we call it
|
||||
/// from the same std::thread and it will end to std::abort().
|
||||
/// So we just do nothing in this case
|
||||
if (state->finished)
|
||||
return;
|
||||
|
||||
if (initialized())
|
||||
abort();
|
||||
}
|
||||
@ -252,6 +267,9 @@ protected:
|
||||
|
||||
/// The state used in this object and inside the thread job.
|
||||
Poco::Event event;
|
||||
|
||||
/// To allow joining to the same std::thread after finishing
|
||||
std::atomic<bool> finished{false};
|
||||
};
|
||||
std::shared_ptr<State> state;
|
||||
|
||||
|
@ -99,7 +99,7 @@ void ZooKeeper::init(ZooKeeperArgs args_)
|
||||
if (dns_error)
|
||||
throw KeeperException("Cannot resolve any of provided ZooKeeper hosts due to DNS error", Coordination::Error::ZCONNECTIONLOSS);
|
||||
else
|
||||
throw KeeperException("Cannot use any of provided ZooKeeper nodes", Coordination::Error::ZBADARGUMENTS);
|
||||
throw KeeperException("Cannot use any of provided ZooKeeper nodes", Coordination::Error::ZCONNECTIONLOSS);
|
||||
}
|
||||
|
||||
impl = std::make_unique<Coordination::ZooKeeper>(nodes, args, zk_log);
|
||||
|
@ -307,7 +307,19 @@ void reverseTransposeBytes(const UInt64 * matrix, UInt32 col, T & value)
|
||||
template <typename T>
|
||||
void load(const char * src, T * buf, UInt32 tail = 64)
|
||||
{
|
||||
memcpy(buf, src, tail * sizeof(T));
|
||||
if constexpr (std::endian::native == std::endian::little)
|
||||
{
|
||||
memcpy(buf, src, tail * sizeof(T));
|
||||
}
|
||||
else
|
||||
{
|
||||
/// Since the algorithm uses little-endian integers, data is loaded
|
||||
/// as little-endian types on big-endian machine (s390x, etc).
|
||||
for (UInt32 i = 0; i < tail; ++i)
|
||||
{
|
||||
buf[i] = unalignedLoadLE<T>(src + i * sizeof(T));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
|
@ -1,14 +1,21 @@
|
||||
#include <Coordination/KeeperDispatcher.h>
|
||||
|
||||
#include <Poco/Path.h>
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
|
||||
#include <Common/hex.h>
|
||||
#include <Common/setThreadName.h>
|
||||
#include <Common/ZooKeeper/KeeperException.h>
|
||||
#include <future>
|
||||
#include <chrono>
|
||||
#include <Poco/Path.h>
|
||||
#include <Common/hex.h>
|
||||
#include <filesystem>
|
||||
#include <Common/checkStackSize.h>
|
||||
#include <Common/CurrentMetrics.h>
|
||||
|
||||
|
||||
#include <future>
|
||||
#include <chrono>
|
||||
#include <filesystem>
|
||||
#include <iterator>
|
||||
#include <limits>
|
||||
|
||||
namespace CurrentMetrics
|
||||
{
|
||||
extern const Metric KeeperAliveConnections;
|
||||
@ -32,9 +39,7 @@ KeeperDispatcher::KeeperDispatcher()
|
||||
: responses_queue(std::numeric_limits<size_t>::max())
|
||||
, configuration_and_settings(std::make_shared<KeeperConfigurationAndSettings>())
|
||||
, log(&Poco::Logger::get("KeeperDispatcher"))
|
||||
{
|
||||
}
|
||||
|
||||
{}
|
||||
|
||||
void KeeperDispatcher::requestThread()
|
||||
{
|
||||
@ -191,7 +196,13 @@ void KeeperDispatcher::snapshotThread()
|
||||
|
||||
try
|
||||
{
|
||||
task.create_snapshot(std::move(task.snapshot));
|
||||
auto snapshot_path = task.create_snapshot(std::move(task.snapshot));
|
||||
|
||||
if (snapshot_path.empty())
|
||||
continue;
|
||||
|
||||
if (isLeader())
|
||||
snapshot_s3.uploadSnapshot(snapshot_path);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
@ -285,7 +296,9 @@ void KeeperDispatcher::initialize(const Poco::Util::AbstractConfiguration & conf
|
||||
responses_thread = ThreadFromGlobalPool([this] { responseThread(); });
|
||||
snapshot_thread = ThreadFromGlobalPool([this] { snapshotThread(); });
|
||||
|
||||
server = std::make_unique<KeeperServer>(configuration_and_settings, config, responses_queue, snapshots_queue);
|
||||
snapshot_s3.startup(config);
|
||||
|
||||
server = std::make_unique<KeeperServer>(configuration_and_settings, config, responses_queue, snapshots_queue, snapshot_s3);
|
||||
|
||||
try
|
||||
{
|
||||
@ -312,7 +325,6 @@ void KeeperDispatcher::initialize(const Poco::Util::AbstractConfiguration & conf
|
||||
/// Start it after keeper server start
|
||||
session_cleaner_thread = ThreadFromGlobalPool([this] { sessionCleanerTask(); });
|
||||
update_configuration_thread = ThreadFromGlobalPool([this] { updateConfigurationThread(); });
|
||||
updateConfiguration(config);
|
||||
|
||||
LOG_DEBUG(log, "Dispatcher initialized");
|
||||
}
|
||||
@ -415,6 +427,8 @@ void KeeperDispatcher::shutdown()
|
||||
if (server)
|
||||
server->shutdown();
|
||||
|
||||
snapshot_s3.shutdown();
|
||||
|
||||
CurrentMetrics::set(CurrentMetrics::KeeperAliveConnections, 0);
|
||||
|
||||
}
|
||||
@ -678,6 +692,8 @@ void KeeperDispatcher::updateConfiguration(const Poco::Util::AbstractConfigurati
|
||||
if (!push_result)
|
||||
throw Exception(ErrorCodes::SYSTEM_ERROR, "Cannot push configuration update to queue");
|
||||
}
|
||||
|
||||
snapshot_s3.updateS3Configuration(config);
|
||||
}
|
||||
|
||||
void KeeperDispatcher::updateKeeperStatLatency(uint64_t process_time_ms)
|
||||
|
@ -14,6 +14,7 @@
|
||||
#include <Coordination/CoordinationSettings.h>
|
||||
#include <Coordination/Keeper4LWInfo.h>
|
||||
#include <Coordination/KeeperConnectionStats.h>
|
||||
#include <Coordination/KeeperSnapshotManagerS3.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
@ -76,6 +77,8 @@ private:
|
||||
/// Counter for new session_id requests.
|
||||
std::atomic<int64_t> internal_session_id_counter{0};
|
||||
|
||||
KeeperSnapshotManagerS3 snapshot_s3;
|
||||
|
||||
/// Thread put requests to raft
|
||||
void requestThread();
|
||||
/// Thread put responses for subscribed sessions
|
||||
|
@ -8,6 +8,7 @@
|
||||
#include <string>
|
||||
#include <Coordination/KeeperStateMachine.h>
|
||||
#include <Coordination/KeeperStateManager.h>
|
||||
#include <Coordination/KeeperSnapshotManagerS3.h>
|
||||
#include <Coordination/LoggerWrapper.h>
|
||||
#include <Coordination/ReadBufferFromNuraftBuffer.h>
|
||||
#include <Coordination/WriteBufferFromNuraftBuffer.h>
|
||||
@ -105,7 +106,8 @@ KeeperServer::KeeperServer(
|
||||
const KeeperConfigurationAndSettingsPtr & configuration_and_settings_,
|
||||
const Poco::Util::AbstractConfiguration & config,
|
||||
ResponsesQueue & responses_queue_,
|
||||
SnapshotsQueue & snapshots_queue_)
|
||||
SnapshotsQueue & snapshots_queue_,
|
||||
KeeperSnapshotManagerS3 & snapshot_manager_s3)
|
||||
: server_id(configuration_and_settings_->server_id)
|
||||
, coordination_settings(configuration_and_settings_->coordination_settings)
|
||||
, log(&Poco::Logger::get("KeeperServer"))
|
||||
@ -125,6 +127,7 @@ KeeperServer::KeeperServer(
|
||||
configuration_and_settings_->snapshot_storage_path,
|
||||
coordination_settings,
|
||||
keeper_context,
|
||||
config.getBool("keeper_server.upload_snapshot_on_exit", true) ? &snapshot_manager_s3 : nullptr,
|
||||
checkAndGetSuperdigest(configuration_and_settings_->super_digest));
|
||||
|
||||
state_manager = nuraft::cs_new<KeeperStateManager>(
|
||||
|
@ -71,7 +71,8 @@ public:
|
||||
const KeeperConfigurationAndSettingsPtr & settings_,
|
||||
const Poco::Util::AbstractConfiguration & config_,
|
||||
ResponsesQueue & responses_queue_,
|
||||
SnapshotsQueue & snapshots_queue_);
|
||||
SnapshotsQueue & snapshots_queue_,
|
||||
KeeperSnapshotManagerS3 & snapshot_manager_s3);
|
||||
|
||||
/// Load state machine from the latest snapshot and load log storage. Start NuRaft with required settings.
|
||||
void startup(const Poco::Util::AbstractConfiguration & config, bool enable_ipv6 = true);
|
||||
|
@ -87,7 +87,7 @@ public:
|
||||
};
|
||||
|
||||
using KeeperStorageSnapshotPtr = std::shared_ptr<KeeperStorageSnapshot>;
|
||||
using CreateSnapshotCallback = std::function<void(KeeperStorageSnapshotPtr &&)>;
|
||||
using CreateSnapshotCallback = std::function<std::string(KeeperStorageSnapshotPtr &&)>;
|
||||
|
||||
|
||||
using SnapshotMetaAndStorage = std::pair<SnapshotMetadataPtr, KeeperStoragePtr>;
|
||||
|
311
src/Coordination/KeeperSnapshotManagerS3.cpp
Normal file
311
src/Coordination/KeeperSnapshotManagerS3.cpp
Normal file
@ -0,0 +1,311 @@
|
||||
#include <Coordination/KeeperSnapshotManagerS3.h>
|
||||
|
||||
#if USE_AWS_S3
|
||||
#include <Core/UUID.h>
|
||||
|
||||
#include <Common/Exception.h>
|
||||
#include <Common/setThreadName.h>
|
||||
|
||||
#include <IO/S3Common.h>
|
||||
#include <IO/WriteBufferFromS3.h>
|
||||
#include <IO/ReadBufferFromS3.h>
|
||||
#include <IO/ReadBufferFromFile.h>
|
||||
#include <IO/ReadHelpers.h>
|
||||
#include <IO/S3/PocoHTTPClient.h>
|
||||
#include <IO/WriteHelpers.h>
|
||||
#include <IO/copyData.h>
|
||||
|
||||
#include <aws/core/auth/AWSCredentials.h>
|
||||
#include <aws/s3/S3Client.h>
|
||||
#include <aws/s3/S3Errors.h>
|
||||
#include <aws/s3/model/HeadObjectRequest.h>
|
||||
#include <aws/s3/model/DeleteObjectRequest.h>
|
||||
|
||||
#include <filesystem>
|
||||
|
||||
namespace fs = std::filesystem;
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
struct KeeperSnapshotManagerS3::S3Configuration
|
||||
{
|
||||
S3Configuration(S3::URI uri_, S3::AuthSettings auth_settings_, std::shared_ptr<const Aws::S3::S3Client> client_)
|
||||
: uri(std::move(uri_))
|
||||
, auth_settings(std::move(auth_settings_))
|
||||
, client(std::move(client_))
|
||||
{}
|
||||
|
||||
S3::URI uri;
|
||||
S3::AuthSettings auth_settings;
|
||||
std::shared_ptr<const Aws::S3::S3Client> client;
|
||||
};
|
||||
|
||||
KeeperSnapshotManagerS3::KeeperSnapshotManagerS3()
|
||||
: snapshots_s3_queue(std::numeric_limits<size_t>::max())
|
||||
, log(&Poco::Logger::get("KeeperSnapshotManagerS3"))
|
||||
, uuid(UUIDHelpers::generateV4())
|
||||
{}
|
||||
|
||||
void KeeperSnapshotManagerS3::updateS3Configuration(const Poco::Util::AbstractConfiguration & config)
|
||||
{
|
||||
try
|
||||
{
|
||||
const std::string config_prefix = "keeper_server.s3_snapshot";
|
||||
|
||||
if (!config.has(config_prefix))
|
||||
{
|
||||
std::lock_guard client_lock{snapshot_s3_client_mutex};
|
||||
if (snapshot_s3_client)
|
||||
LOG_INFO(log, "S3 configuration was removed");
|
||||
snapshot_s3_client = nullptr;
|
||||
return;
|
||||
}
|
||||
|
||||
auto auth_settings = S3::AuthSettings::loadFromConfig(config_prefix, config);
|
||||
|
||||
auto endpoint = config.getString(config_prefix + ".endpoint");
|
||||
auto new_uri = S3::URI{Poco::URI(endpoint)};
|
||||
|
||||
{
|
||||
std::lock_guard client_lock{snapshot_s3_client_mutex};
|
||||
// if client is not changed (same auth settings, same endpoint) we don't need to update
|
||||
if (snapshot_s3_client && snapshot_s3_client->client && auth_settings == snapshot_s3_client->auth_settings
|
||||
&& snapshot_s3_client->uri.uri == new_uri.uri)
|
||||
return;
|
||||
}
|
||||
|
||||
LOG_INFO(log, "S3 configuration was updated");
|
||||
|
||||
auto credentials = Aws::Auth::AWSCredentials(auth_settings.access_key_id, auth_settings.secret_access_key);
|
||||
HeaderCollection headers = auth_settings.headers;
|
||||
|
||||
static constexpr size_t s3_max_redirects = 10;
|
||||
static constexpr bool enable_s3_requests_logging = false;
|
||||
|
||||
if (!new_uri.key.empty())
|
||||
{
|
||||
LOG_ERROR(log, "Invalid endpoint defined for S3, it shouldn't contain key, endpoint: {}", endpoint);
|
||||
return;
|
||||
}
|
||||
|
||||
S3::PocoHTTPClientConfiguration client_configuration = S3::ClientFactory::instance().createClientConfiguration(
|
||||
auth_settings.region,
|
||||
RemoteHostFilter(), s3_max_redirects,
|
||||
enable_s3_requests_logging,
|
||||
/* for_disk_s3 = */ false);
|
||||
|
||||
client_configuration.endpointOverride = new_uri.endpoint;
|
||||
|
||||
auto client = S3::ClientFactory::instance().create(
|
||||
client_configuration,
|
||||
new_uri.is_virtual_hosted_style,
|
||||
credentials.GetAWSAccessKeyId(),
|
||||
credentials.GetAWSSecretKey(),
|
||||
auth_settings.server_side_encryption_customer_key_base64,
|
||||
std::move(headers),
|
||||
auth_settings.use_environment_credentials.value_or(false),
|
||||
auth_settings.use_insecure_imds_request.value_or(false));
|
||||
|
||||
auto new_client = std::make_shared<KeeperSnapshotManagerS3::S3Configuration>(std::move(new_uri), std::move(auth_settings), std::move(client));
|
||||
|
||||
{
|
||||
std::lock_guard client_lock{snapshot_s3_client_mutex};
|
||||
snapshot_s3_client = std::move(new_client);
|
||||
}
|
||||
LOG_INFO(log, "S3 client was updated");
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
LOG_ERROR(log, "Failed to create an S3 client for snapshots");
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
}
|
||||
}
|
||||
std::shared_ptr<KeeperSnapshotManagerS3::S3Configuration> KeeperSnapshotManagerS3::getSnapshotS3Client() const
|
||||
{
|
||||
std::lock_guard lock{snapshot_s3_client_mutex};
|
||||
return snapshot_s3_client;
|
||||
}
|
||||
|
||||
void KeeperSnapshotManagerS3::uploadSnapshotImpl(const std::string & snapshot_path)
|
||||
{
|
||||
try
|
||||
{
|
||||
auto s3_client = getSnapshotS3Client();
|
||||
if (s3_client == nullptr)
|
||||
return;
|
||||
|
||||
S3Settings::ReadWriteSettings read_write_settings;
|
||||
read_write_settings.upload_part_size_multiply_parts_count_threshold = 10000;
|
||||
|
||||
const auto create_writer = [&](const auto & key)
|
||||
{
|
||||
return WriteBufferFromS3
|
||||
{
|
||||
s3_client->client,
|
||||
s3_client->uri.bucket,
|
||||
key,
|
||||
read_write_settings
|
||||
};
|
||||
};
|
||||
|
||||
const auto file_exists = [&](const auto & key)
|
||||
{
|
||||
Aws::S3::Model::HeadObjectRequest request;
|
||||
request.SetBucket(s3_client->uri.bucket);
|
||||
request.SetKey(key);
|
||||
auto outcome = s3_client->client->HeadObject(request);
|
||||
|
||||
if (outcome.IsSuccess())
|
||||
return true;
|
||||
|
||||
const auto & error = outcome.GetError();
|
||||
if (error.GetErrorType() != Aws::S3::S3Errors::NO_SUCH_KEY && error.GetErrorType() != Aws::S3::S3Errors::RESOURCE_NOT_FOUND)
|
||||
throw S3Exception(error.GetErrorType(), "Failed to verify existence of lock file: {}", error.GetMessage());
|
||||
|
||||
return false;
|
||||
};
|
||||
|
||||
|
||||
LOG_INFO(log, "Will try to upload snapshot on {} to S3", snapshot_path);
|
||||
ReadBufferFromFile snapshot_file(snapshot_path);
|
||||
|
||||
auto snapshot_name = fs::path(snapshot_path).filename().string();
|
||||
auto lock_file = fmt::format(".{}_LOCK", snapshot_name);
|
||||
|
||||
if (file_exists(snapshot_name))
|
||||
{
|
||||
LOG_ERROR(log, "Snapshot {} already exists", snapshot_name);
|
||||
return;
|
||||
}
|
||||
|
||||
// First we need to verify that there isn't already a lock file for the snapshot we want to upload
|
||||
// Only leader uploads a snapshot, but there can be a rare case where we have 2 leaders in NuRaft
|
||||
if (file_exists(lock_file))
|
||||
{
|
||||
LOG_ERROR(log, "Lock file for {} already, exists. Probably a different node is already uploading the snapshot", snapshot_name);
|
||||
return;
|
||||
}
|
||||
|
||||
// We write our UUID to lock file
|
||||
LOG_DEBUG(log, "Trying to create a lock file");
|
||||
WriteBufferFromS3 lock_writer = create_writer(lock_file);
|
||||
writeUUIDText(uuid, lock_writer);
|
||||
lock_writer.finalize();
|
||||
|
||||
// We read back the written UUID, if it's the same we can upload the file
|
||||
ReadBufferFromS3 lock_reader
|
||||
{
|
||||
s3_client->client,
|
||||
s3_client->uri.bucket,
|
||||
lock_file,
|
||||
"",
|
||||
1,
|
||||
{}
|
||||
};
|
||||
|
||||
std::string read_uuid;
|
||||
readStringUntilEOF(read_uuid, lock_reader);
|
||||
|
||||
if (read_uuid != toString(uuid))
|
||||
{
|
||||
LOG_ERROR(log, "Failed to create a lock file");
|
||||
return;
|
||||
}
|
||||
|
||||
SCOPE_EXIT(
|
||||
{
|
||||
LOG_INFO(log, "Removing lock file");
|
||||
try
|
||||
{
|
||||
Aws::S3::Model::DeleteObjectRequest delete_request;
|
||||
delete_request.SetBucket(s3_client->uri.bucket);
|
||||
delete_request.SetKey(lock_file);
|
||||
auto delete_outcome = s3_client->client->DeleteObject(delete_request);
|
||||
if (!delete_outcome.IsSuccess())
|
||||
throw S3Exception(delete_outcome.GetError().GetMessage(), delete_outcome.GetError().GetErrorType());
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
LOG_INFO(log, "Failed to delete lock file for {} from S3", snapshot_path);
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
}
|
||||
});
|
||||
|
||||
WriteBufferFromS3 snapshot_writer = create_writer(snapshot_name);
|
||||
copyData(snapshot_file, snapshot_writer);
|
||||
snapshot_writer.finalize();
|
||||
|
||||
LOG_INFO(log, "Successfully uploaded {} to S3", snapshot_path);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
LOG_INFO(log, "Failure during upload of {} to S3", snapshot_path);
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
}
|
||||
}
|
||||
|
||||
void KeeperSnapshotManagerS3::snapshotS3Thread()
|
||||
{
|
||||
setThreadName("KeeperS3SnpT");
|
||||
|
||||
while (!shutdown_called)
|
||||
{
|
||||
std::string snapshot_path;
|
||||
if (!snapshots_s3_queue.pop(snapshot_path))
|
||||
break;
|
||||
|
||||
if (shutdown_called)
|
||||
break;
|
||||
|
||||
uploadSnapshotImpl(snapshot_path);
|
||||
}
|
||||
}
|
||||
|
||||
void KeeperSnapshotManagerS3::uploadSnapshot(const std::string & path, bool async_upload)
|
||||
{
|
||||
if (getSnapshotS3Client() == nullptr)
|
||||
return;
|
||||
|
||||
if (async_upload)
|
||||
{
|
||||
if (!snapshots_s3_queue.push(path))
|
||||
LOG_WARNING(log, "Failed to add snapshot {} to S3 queue", path);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
uploadSnapshotImpl(path);
|
||||
}
|
||||
|
||||
void KeeperSnapshotManagerS3::startup(const Poco::Util::AbstractConfiguration & config)
|
||||
{
|
||||
updateS3Configuration(config);
|
||||
snapshot_s3_thread = ThreadFromGlobalPool([this] { snapshotS3Thread(); });
|
||||
}
|
||||
|
||||
void KeeperSnapshotManagerS3::shutdown()
|
||||
{
|
||||
if (shutdown_called)
|
||||
return;
|
||||
|
||||
LOG_DEBUG(log, "Shutting down KeeperSnapshotManagerS3");
|
||||
shutdown_called = true;
|
||||
|
||||
try
|
||||
{
|
||||
snapshots_s3_queue.finish();
|
||||
if (snapshot_s3_thread.joinable())
|
||||
snapshot_s3_thread.join();
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
tryLogCurrentException(__PRETTY_FUNCTION__);
|
||||
}
|
||||
|
||||
LOG_INFO(log, "KeeperSnapshotManagerS3 shut down");
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#endif
|
68
src/Coordination/KeeperSnapshotManagerS3.h
Normal file
68
src/Coordination/KeeperSnapshotManagerS3.h
Normal file
@ -0,0 +1,68 @@
|
||||
#pragma once
|
||||
|
||||
#include "config.h"
|
||||
|
||||
#include <Poco/Util/AbstractConfiguration.h>
|
||||
|
||||
#if USE_AWS_S3
|
||||
#include <Common/ConcurrentBoundedQueue.h>
|
||||
#include <Common/ThreadPool.h>
|
||||
#include <Common/logger_useful.h>
|
||||
|
||||
#include <string>
|
||||
#endif
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
#if USE_AWS_S3
|
||||
class KeeperSnapshotManagerS3
|
||||
{
|
||||
public:
|
||||
KeeperSnapshotManagerS3();
|
||||
|
||||
void updateS3Configuration(const Poco::Util::AbstractConfiguration & config);
|
||||
void uploadSnapshot(const std::string & path, bool async_upload = true);
|
||||
|
||||
void startup(const Poco::Util::AbstractConfiguration & config);
|
||||
void shutdown();
|
||||
private:
|
||||
using SnapshotS3Queue = ConcurrentBoundedQueue<std::string>;
|
||||
SnapshotS3Queue snapshots_s3_queue;
|
||||
|
||||
/// Upload new snapshots to S3
|
||||
ThreadFromGlobalPool snapshot_s3_thread;
|
||||
|
||||
struct S3Configuration;
|
||||
mutable std::mutex snapshot_s3_client_mutex;
|
||||
std::shared_ptr<S3Configuration> snapshot_s3_client;
|
||||
|
||||
std::atomic<bool> shutdown_called{false};
|
||||
|
||||
Poco::Logger * log;
|
||||
|
||||
UUID uuid;
|
||||
|
||||
std::shared_ptr<S3Configuration> getSnapshotS3Client() const;
|
||||
|
||||
void uploadSnapshotImpl(const std::string & snapshot_path);
|
||||
|
||||
/// Thread upload snapshots to S3 in the background
|
||||
void snapshotS3Thread();
|
||||
};
|
||||
#else
|
||||
class KeeperSnapshotManagerS3
|
||||
{
|
||||
public:
|
||||
KeeperSnapshotManagerS3() = default;
|
||||
|
||||
void updateS3Configuration(const Poco::Util::AbstractConfiguration &) {}
|
||||
void uploadSnapshot(const std::string &, [[maybe_unused]] bool async_upload = true) {}
|
||||
|
||||
void startup(const Poco::Util::AbstractConfiguration &) {}
|
||||
|
||||
void shutdown() {}
|
||||
};
|
||||
#endif
|
||||
|
||||
}
|
@ -44,6 +44,7 @@ KeeperStateMachine::KeeperStateMachine(
|
||||
const std::string & snapshots_path_,
|
||||
const CoordinationSettingsPtr & coordination_settings_,
|
||||
const KeeperContextPtr & keeper_context_,
|
||||
KeeperSnapshotManagerS3 * snapshot_manager_s3_,
|
||||
const std::string & superdigest_)
|
||||
: coordination_settings(coordination_settings_)
|
||||
, snapshot_manager(
|
||||
@ -59,6 +60,7 @@ KeeperStateMachine::KeeperStateMachine(
|
||||
, log(&Poco::Logger::get("KeeperStateMachine"))
|
||||
, superdigest(superdigest_)
|
||||
, keeper_context(keeper_context_)
|
||||
, snapshot_manager_s3(snapshot_manager_s3_)
|
||||
{
|
||||
}
|
||||
|
||||
@ -400,13 +402,22 @@ void KeeperStateMachine::create_snapshot(nuraft::snapshot & s, nuraft::async_res
|
||||
}
|
||||
|
||||
when_done(ret, exception);
|
||||
|
||||
return ret ? latest_snapshot_path : "";
|
||||
};
|
||||
|
||||
|
||||
if (keeper_context->server_state == KeeperContext::Phase::SHUTDOWN)
|
||||
{
|
||||
LOG_INFO(log, "Creating a snapshot during shutdown because 'create_snapshot_on_exit' is enabled.");
|
||||
snapshot_task.create_snapshot(std::move(snapshot_task.snapshot));
|
||||
auto snapshot_path = snapshot_task.create_snapshot(std::move(snapshot_task.snapshot));
|
||||
|
||||
if (!snapshot_path.empty() && snapshot_manager_s3)
|
||||
{
|
||||
LOG_INFO(log, "Uploading snapshot {} during shutdown because 'upload_snapshot_on_exit' is enabled.", snapshot_path);
|
||||
snapshot_manager_s3->uploadSnapshot(snapshot_path, /* asnyc_upload */ false);
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -2,11 +2,13 @@
|
||||
|
||||
#include <Coordination/CoordinationSettings.h>
|
||||
#include <Coordination/KeeperSnapshotManager.h>
|
||||
#include <Coordination/KeeperSnapshotManagerS3.h>
|
||||
#include <Coordination/KeeperContext.h>
|
||||
#include <Coordination/KeeperStorage.h>
|
||||
|
||||
#include <libnuraft/nuraft.hxx>
|
||||
#include <Common/ConcurrentBoundedQueue.h>
|
||||
#include <Common/logger_useful.h>
|
||||
#include <Coordination/KeeperContext.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
@ -26,6 +28,7 @@ public:
|
||||
const std::string & snapshots_path_,
|
||||
const CoordinationSettingsPtr & coordination_settings_,
|
||||
const KeeperContextPtr & keeper_context_,
|
||||
KeeperSnapshotManagerS3 * snapshot_manager_s3_,
|
||||
const std::string & superdigest_ = "");
|
||||
|
||||
/// Read state from the latest snapshot
|
||||
@ -146,6 +149,8 @@ private:
|
||||
const std::string superdigest;
|
||||
|
||||
KeeperContextPtr keeper_context;
|
||||
|
||||
KeeperSnapshotManagerS3 * snapshot_manager_s3;
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -1318,7 +1318,7 @@ void testLogAndStateMachine(Coordination::CoordinationSettingsPtr settings, uint
|
||||
|
||||
ResponsesQueue queue(std::numeric_limits<size_t>::max());
|
||||
SnapshotsQueue snapshots_queue{1};
|
||||
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context);
|
||||
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context, nullptr);
|
||||
state_machine->init();
|
||||
DB::KeeperLogStore changelog("./logs", settings->rotate_log_storage_interval, true, enable_compression);
|
||||
changelog.init(state_machine->last_commit_index() + 1, settings->reserved_log_items);
|
||||
@ -1359,7 +1359,7 @@ void testLogAndStateMachine(Coordination::CoordinationSettingsPtr settings, uint
|
||||
}
|
||||
|
||||
SnapshotsQueue snapshots_queue1{1};
|
||||
auto restore_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue1, "./snapshots", settings, keeper_context);
|
||||
auto restore_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue1, "./snapshots", settings, keeper_context, nullptr);
|
||||
restore_machine->init();
|
||||
EXPECT_EQ(restore_machine->last_commit_index(), total_logs - total_logs % settings->snapshot_distance);
|
||||
|
||||
@ -1471,7 +1471,7 @@ TEST_P(CoordinationTest, TestEphemeralNodeRemove)
|
||||
|
||||
ResponsesQueue queue(std::numeric_limits<size_t>::max());
|
||||
SnapshotsQueue snapshots_queue{1};
|
||||
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context);
|
||||
auto state_machine = std::make_shared<KeeperStateMachine>(queue, snapshots_queue, "./snapshots", settings, keeper_context, nullptr);
|
||||
state_machine->init();
|
||||
|
||||
std::shared_ptr<ZooKeeperCreateRequest> request_c = std::make_shared<ZooKeeperCreateRequest>();
|
||||
|
@ -42,7 +42,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
|
||||
*/
|
||||
|
||||
#define COMMON_SETTINGS(M) \
|
||||
M(Dialect, dialect, Dialect::clickhouse, "Which SQL dialect will be used to parse query", 0)\
|
||||
M(Dialect, dialect, Dialect::clickhouse, "Which dialect will be used to parse query", 0)\
|
||||
M(UInt64, min_compress_block_size, 65536, "The actual size of the block to compress, if the uncompressed data less than max_compress_block_size is no less than this value and no less than the volume of data for one mark.", 0) \
|
||||
M(UInt64, max_compress_block_size, 1048576, "The maximum size of blocks of uncompressed data before compressing for writing to a table.", 0) \
|
||||
M(UInt64, max_block_size, DEFAULT_BLOCK_SIZE, "Maximum block size for reading", 0) \
|
||||
@ -84,7 +84,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
|
||||
M(UInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.", 0) \
|
||||
M(UInt64, s3_min_upload_part_size, 16*1024*1024, "The minimum size of part to upload during multipart upload to S3.", 0) \
|
||||
M(UInt64, s3_upload_part_size_multiply_factor, 2, "Multiply s3_min_upload_part_size by this factor each time s3_multiply_parts_count_threshold parts were uploaded from a single write to S3.", 0) \
|
||||
M(UInt64, s3_upload_part_size_multiply_parts_count_threshold, 1000, "Each time this number of parts was uploaded to S3 s3_min_upload_part_size multiplied by s3_upload_part_size_multiply_factor.", 0) \
|
||||
M(UInt64, s3_upload_part_size_multiply_parts_count_threshold, 500, "Each time this number of parts was uploaded to S3 s3_min_upload_part_size multiplied by s3_upload_part_size_multiply_factor.", 0) \
|
||||
M(UInt64, s3_max_single_part_upload_size, 32*1024*1024, "The maximum size of object to upload using singlepart upload to S3.", 0) \
|
||||
M(UInt64, s3_max_single_read_retries, 4, "The maximum number of retries during single S3 read.", 0) \
|
||||
M(UInt64, s3_max_unexpected_write_error_retries, 4, "The maximum number of retries in case of unexpected errors during S3 write.", 0) \
|
||||
@ -776,6 +776,8 @@ static constexpr UInt64 operator""_GiB(unsigned long long value)
|
||||
M(Bool, output_format_json_array_of_rows, false, "Output a JSON array of all rows in JSONEachRow(Compact) format.", 0) \
|
||||
M(Bool, output_format_json_validate_utf8, false, "Validate UTF-8 sequences in JSON output formats, doesn't impact formats JSON/JSONCompact/JSONColumnsWithMetadata, they always validate utf8", 0) \
|
||||
\
|
||||
M(String, format_json_object_each_row_column_for_object_name, "", "The name of column that will be used as object names in JSONObjectEachRow format. Column type should be String", 0) \
|
||||
\
|
||||
M(UInt64, output_format_pretty_max_rows, 10000, "Rows limit for Pretty formats.", 0) \
|
||||
M(UInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.", 0) \
|
||||
M(UInt64, output_format_pretty_max_value_width, 10000, "Maximum width of value to display in Pretty formats. If greater - it will be cut.", 0) \
|
||||
@ -890,6 +892,8 @@ struct Settings : public BaseSettings<SettingsTraits>, public IHints<2, Settings
|
||||
|
||||
void set(std::string_view name, const Field & value) override;
|
||||
|
||||
void setDefaultValue(const String & name) { resetToDefault(name); }
|
||||
|
||||
private:
|
||||
void applyCompatibilitySetting();
|
||||
|
||||
|
@ -10,6 +10,7 @@
|
||||
#include <DataTypes/DataTypeAggregateFunction.h>
|
||||
#include <DataTypes/Serializations/SerializationAggregateFunction.h>
|
||||
#include <DataTypes/DataTypeFactory.h>
|
||||
#include <DataTypes/transformTypesRecursively.h>
|
||||
#include <IO/WriteBufferFromString.h>
|
||||
#include <IO/Operators.h>
|
||||
|
||||
@ -241,6 +242,23 @@ static DataTypePtr create(const ASTPtr & arguments)
|
||||
return std::make_shared<DataTypeAggregateFunction>(function, argument_types, params_row, version);
|
||||
}
|
||||
|
||||
void setVersionToAggregateFunctions(DataTypePtr & type, bool if_empty, std::optional<size_t> revision)
|
||||
{
|
||||
auto callback = [revision, if_empty](DataTypePtr & column_type)
|
||||
{
|
||||
const auto * aggregate_function_type = typeid_cast<const DataTypeAggregateFunction *>(column_type.get());
|
||||
if (aggregate_function_type && aggregate_function_type->isVersioned())
|
||||
{
|
||||
if (revision)
|
||||
aggregate_function_type->updateVersionFromRevision(*revision, if_empty);
|
||||
else
|
||||
aggregate_function_type->setVersion(0, if_empty);
|
||||
}
|
||||
};
|
||||
|
||||
callOnNestedSimpleTypes(type, callback);
|
||||
}
|
||||
|
||||
|
||||
void registerDataTypeAggregateFunction(DataTypeFactory & factory)
|
||||
{
|
||||
|
@ -70,8 +70,6 @@ public:
|
||||
|
||||
bool isVersioned() const { return function->isVersioned(); }
|
||||
|
||||
size_t getVersionFromRevision(size_t revision) const { return function->getVersionFromRevision(revision); }
|
||||
|
||||
/// Version is not empty only if it was parsed from AST or implicitly cast to 0 or version according
|
||||
/// to server revision.
|
||||
/// It is ok to have an empty version value here - then for serialization a default (latest)
|
||||
@ -84,6 +82,13 @@ public:
|
||||
|
||||
version = version_;
|
||||
}
|
||||
|
||||
void updateVersionFromRevision(size_t revision, bool if_empty) const
|
||||
{
|
||||
setVersion(function->getVersionFromRevision(revision), if_empty);
|
||||
}
|
||||
};
|
||||
|
||||
void setVersionToAggregateFunctions(DataTypePtr & type, bool if_empty, std::optional<size_t> revision = std::nullopt);
|
||||
|
||||
}
|
||||
|
@ -76,9 +76,9 @@ void SerializationDate::serializeTextCSV(const IColumn & column, size_t row_num,
|
||||
|
||||
void SerializationDate::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings &) const
|
||||
{
|
||||
LocalDate value;
|
||||
DayNum value;
|
||||
readCSV(value, istr);
|
||||
assert_cast<ColumnUInt16 &>(column).getData().push_back(value.getDayNum());
|
||||
assert_cast<ColumnUInt16 &>(column).getData().push_back(value);
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -175,4 +175,10 @@ void transformTypesRecursively(DataTypes & types, std::function<void(DataTypes &
|
||||
transform_simple_types(types);
|
||||
}
|
||||
|
||||
void callOnNestedSimpleTypes(DataTypePtr & type, std::function<void(DataTypePtr &)> callback)
|
||||
{
|
||||
DataTypes types = {type};
|
||||
transformTypesRecursively(types, [callback](auto & data_types){ callback(data_types[0]); }, {});
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -14,4 +14,6 @@ namespace DB
|
||||
/// Function transform_complex_types will be applied to complex types (Array/Map/Tuple) after recursive call to their nested types.
|
||||
void transformTypesRecursively(DataTypes & types, std::function<void(DataTypes &)> transform_simple_types, std::function<void(DataTypes &)> transform_complex_types);
|
||||
|
||||
void callOnNestedSimpleTypes(DataTypePtr & type, std::function<void(DataTypePtr &)> callback);
|
||||
|
||||
}
|
||||
|
@ -1,6 +1,7 @@
|
||||
#include <Databases/DDLDependencyVisitor.h>
|
||||
#include <Dictionaries/getDictionaryConfigurationFromAST.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/evaluateConstantExpression.h>
|
||||
#include <Parsers/ASTCreateQuery.h>
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
@ -11,6 +12,8 @@
|
||||
namespace DB
|
||||
{
|
||||
|
||||
using TableLoadingDependenciesVisitor = DDLDependencyVisitor::Visitor;
|
||||
|
||||
TableNamesSet getDependenciesSetFromCreateQuery(ContextPtr global_context, const QualifiedTableName & table, const ASTPtr & ast)
|
||||
{
|
||||
assert(global_context == global_context->getGlobalContext());
|
||||
@ -35,7 +38,7 @@ void DDLDependencyVisitor::visit(const ASTPtr & ast, Data & data)
|
||||
visit(*storage, data);
|
||||
}
|
||||
|
||||
bool DDLDependencyVisitor::needChildVisit(const ASTPtr & node, const ASTPtr & child)
|
||||
bool DDLMatcherBase::needChildVisit(const ASTPtr & node, const ASTPtr & child)
|
||||
{
|
||||
if (node->as<ASTStorage>())
|
||||
return false;
|
||||
@ -49,20 +52,26 @@ bool DDLDependencyVisitor::needChildVisit(const ASTPtr & node, const ASTPtr & ch
|
||||
return true;
|
||||
}
|
||||
|
||||
void DDLDependencyVisitor::visit(const ASTFunction & function, Data & data)
|
||||
ssize_t DDLMatcherBase::getPositionOfTableNameArgument(const ASTFunction & function)
|
||||
{
|
||||
if (function.name == "joinGet" ||
|
||||
function.name == "dictHas" ||
|
||||
function.name == "dictIsIn" ||
|
||||
function.name.starts_with("dictGet"))
|
||||
{
|
||||
extractTableNameFromArgument(function, data, 0);
|
||||
}
|
||||
else if (Poco::toLower(function.name) == "in")
|
||||
{
|
||||
extractTableNameFromArgument(function, data, 1);
|
||||
}
|
||||
return 0;
|
||||
|
||||
if (Poco::toLower(function.name) == "in")
|
||||
return 1;
|
||||
|
||||
return -1;
|
||||
}
|
||||
|
||||
void DDLDependencyVisitor::visit(const ASTFunction & function, Data & data)
|
||||
{
|
||||
ssize_t table_name_arg_idx = getPositionOfTableNameArgument(function);
|
||||
if (table_name_arg_idx < 0)
|
||||
return;
|
||||
extractTableNameFromArgument(function, data, table_name_arg_idx);
|
||||
}
|
||||
|
||||
void DDLDependencyVisitor::visit(const ASTFunctionWithKeyValueArguments & dict_source, Data & data)
|
||||
@ -140,4 +149,50 @@ void DDLDependencyVisitor::extractTableNameFromArgument(const ASTFunction & func
|
||||
data.dependencies.emplace(std::move(qualified_name));
|
||||
}
|
||||
|
||||
|
||||
void NormalizeAndEvaluateConstants::visit(const ASTPtr & ast, Data & data)
|
||||
{
|
||||
assert(data.create_query_context->hasQueryContext());
|
||||
|
||||
/// Looking for functions in column default expressions and dictionary source definition
|
||||
if (const auto * function = ast->as<ASTFunction>())
|
||||
visit(*function, data);
|
||||
else if (const auto * dict_source = ast->as<ASTFunctionWithKeyValueArguments>())
|
||||
visit(*dict_source, data);
|
||||
}
|
||||
|
||||
void NormalizeAndEvaluateConstants::visit(const ASTFunction & function, Data & data)
|
||||
{
|
||||
/// Replace expressions like "dictGet(currentDatabase() || '.dict', 'value', toUInt32(1))"
|
||||
/// with "dictGet('db_name.dict', 'value', toUInt32(1))"
|
||||
ssize_t table_name_arg_idx = getPositionOfTableNameArgument(function);
|
||||
if (table_name_arg_idx < 0)
|
||||
return;
|
||||
|
||||
if (!function.arguments || function.arguments->children.size() <= static_cast<size_t>(table_name_arg_idx))
|
||||
return;
|
||||
|
||||
auto & arg = function.arguments->as<ASTExpressionList &>().children[table_name_arg_idx];
|
||||
if (arg->as<ASTFunction>())
|
||||
arg = evaluateConstantExpressionAsLiteral(arg, data.create_query_context);
|
||||
}
|
||||
|
||||
|
||||
void NormalizeAndEvaluateConstants::visit(const ASTFunctionWithKeyValueArguments & dict_source, Data & data)
|
||||
{
|
||||
if (!dict_source.elements)
|
||||
return;
|
||||
|
||||
auto & expr_list = dict_source.elements->as<ASTExpressionList &>();
|
||||
for (auto & child : expr_list.children)
|
||||
{
|
||||
ASTPair * pair = child->as<ASTPair>();
|
||||
if (pair->second->as<ASTFunction>())
|
||||
{
|
||||
auto ast_literal = evaluateConstantExpressionAsLiteral(pair->children[0], data.create_query_context);
|
||||
pair->replace(pair->second, ast_literal);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -14,11 +14,19 @@ using TableNamesSet = std::unordered_set<QualifiedTableName>;
|
||||
|
||||
TableNamesSet getDependenciesSetFromCreateQuery(ContextPtr global_context, const QualifiedTableName & table, const ASTPtr & ast);
|
||||
|
||||
|
||||
class DDLMatcherBase
|
||||
{
|
||||
public:
|
||||
static bool needChildVisit(const ASTPtr & node, const ASTPtr & child);
|
||||
static ssize_t getPositionOfTableNameArgument(const ASTFunction & function);
|
||||
};
|
||||
|
||||
/// Visits ASTCreateQuery and extracts names of table (or dictionary) dependencies
|
||||
/// from column default expressions (joinGet, dictGet, etc)
|
||||
/// or dictionary source (for dictionaries from local ClickHouse table).
|
||||
/// Does not validate AST, works a best-effort way.
|
||||
class DDLDependencyVisitor
|
||||
class DDLDependencyVisitor : public DDLMatcherBase
|
||||
{
|
||||
public:
|
||||
struct Data
|
||||
@ -32,7 +40,6 @@ public:
|
||||
using Visitor = ConstInDepthNodeVisitor<DDLDependencyVisitor, true>;
|
||||
|
||||
static void visit(const ASTPtr & ast, Data & data);
|
||||
static bool needChildVisit(const ASTPtr & node, const ASTPtr & child);
|
||||
|
||||
private:
|
||||
static void visit(const ASTFunction & function, Data & data);
|
||||
@ -42,6 +49,24 @@ private:
|
||||
static void extractTableNameFromArgument(const ASTFunction & function, Data & data, size_t arg_idx);
|
||||
};
|
||||
|
||||
using TableLoadingDependenciesVisitor = DDLDependencyVisitor::Visitor;
|
||||
class NormalizeAndEvaluateConstants : public DDLMatcherBase
|
||||
{
|
||||
public:
|
||||
struct Data
|
||||
{
|
||||
ContextPtr create_query_context;
|
||||
};
|
||||
|
||||
using Visitor = ConstInDepthNodeVisitor<NormalizeAndEvaluateConstants, true>;
|
||||
|
||||
static void visit(const ASTPtr & ast, Data & data);
|
||||
|
||||
private:
|
||||
static void visit(const ASTFunction & function, Data & data);
|
||||
static void visit(const ASTFunctionWithKeyValueArguments & dict_source, Data & data);
|
||||
|
||||
};
|
||||
|
||||
using NormalizeAndEvaluateConstantsVisitor = NormalizeAndEvaluateConstants::Visitor;
|
||||
|
||||
}
|
||||
|
@ -452,6 +452,11 @@ void buildConfigurationFromFunctionWithKeyValueArguments(
|
||||
}
|
||||
else if (const auto * func = pair->second->as<ASTFunction>())
|
||||
{
|
||||
/// This branch exists only for compatibility.
|
||||
/// It's not possible to have a function in a dictionary definition since 22.10,
|
||||
/// because query must be normalized on dictionary creation. It's possible only when we load old metadata.
|
||||
/// For debug builds allow it only during server startup to avoid crash in BC check in Stress Tests.
|
||||
assert(!Context::getGlobalContextInstance()->isServerCompletelyStarted());
|
||||
auto builder = FunctionFactory::instance().tryGet(func->name, context);
|
||||
auto function = builder->build({});
|
||||
function->prepare({});
|
||||
|
@ -241,6 +241,11 @@ DiskObjectStoragePtr DiskDecorator::createDiskObjectStorage()
|
||||
return delegate->createDiskObjectStorage();
|
||||
}
|
||||
|
||||
ObjectStoragePtr DiskDecorator::getObjectStorage()
|
||||
{
|
||||
return delegate->getObjectStorage();
|
||||
}
|
||||
|
||||
DiskPtr DiskDecorator::getNestedDisk() const
|
||||
{
|
||||
if (const auto * decorator = dynamic_cast<const DiskDecorator *>(delegate.get()))
|
||||
|
@ -89,6 +89,7 @@ public:
|
||||
void getRemotePathsRecursive(const String & path, std::vector<LocalPathWithObjectStoragePaths> & paths_map) override { return delegate->getRemotePathsRecursive(path, paths_map); }
|
||||
|
||||
DiskObjectStoragePtr createDiskObjectStorage() override;
|
||||
ObjectStoragePtr getObjectStorage() override;
|
||||
NameSet getCacheLayersNames() const override { return delegate->getCacheLayersNames(); }
|
||||
|
||||
MetadataStoragePtr getMetadataStorage() override { return delegate->getMetadataStorage(); }
|
||||
|
@ -366,6 +366,14 @@ public:
|
||||
/// Return current disk revision.
|
||||
virtual UInt64 getRevision() const { return 0; }
|
||||
|
||||
virtual ObjectStoragePtr getObjectStorage()
|
||||
{
|
||||
throw Exception(
|
||||
ErrorCodes::NOT_IMPLEMENTED,
|
||||
"Method getObjectStorage() is not implemented for disk type: {}",
|
||||
getDataSourceDescription().type);
|
||||
}
|
||||
|
||||
/// Create disk object storage according to disk type.
|
||||
/// For example for DiskLocal create DiskObjectStorage(LocalObjectStorage),
|
||||
/// for DiskObjectStorage create just a copy.
|
||||
|
@ -82,6 +82,11 @@ DiskTransactionPtr DiskObjectStorage::createTransaction()
|
||||
return std::make_shared<FakeDiskTransaction>(*this);
|
||||
}
|
||||
|
||||
ObjectStoragePtr DiskObjectStorage::getObjectStorage()
|
||||
{
|
||||
return object_storage;
|
||||
}
|
||||
|
||||
DiskTransactionPtr DiskObjectStorage::createObjectStorageTransaction()
|
||||
{
|
||||
return std::make_shared<DiskObjectStorageTransaction>(
|
||||
|
@ -166,6 +166,8 @@ public:
|
||||
|
||||
UInt64 getRevision() const override;
|
||||
|
||||
ObjectStoragePtr getObjectStorage() override;
|
||||
|
||||
DiskObjectStoragePtr createDiskObjectStorage() override;
|
||||
|
||||
bool supportsCache() const override;
|
||||
|
@ -9,6 +9,7 @@
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Store metadata in the disk itself.
|
||||
class FakeMetadataStorageFromDisk final : public IMetadataStorage
|
||||
{
|
||||
private:
|
||||
|
@ -12,6 +12,7 @@ class Logger;
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Treat local disk as an object storage (for interface compatibility).
|
||||
class LocalObjectStorage : public IObjectStorage
|
||||
{
|
||||
public:
|
||||
|
@ -10,6 +10,8 @@
|
||||
namespace DB
|
||||
{
|
||||
|
||||
/// Store metadata on a separate disk
|
||||
/// (used for object storages, like S3 and related).
|
||||
class MetadataStorageFromDisk final : public IMetadataStorage
|
||||
{
|
||||
private:
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user