Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-12-01 20:12:02 +00:00)

Commit f46f7257dd: Merge branch 'master' into fix-nothing-error
@@ -176,6 +176,8 @@ CheckOptions:
    value: CamelCase
  - key: modernize-loop-convert.UseCxx20ReverseRanges
    value: false
  - key: performance-move-const-arg.CheckTriviallyCopyableMove
    value: false
  # Workaround clang-tidy bug: https://github.com/llvm/llvm-project/issues/46097
  - key: readability-identifier-naming.TypeTemplateParameterIgnoredRegexp
    value: expr-type
2  .github/workflows/backport.yml  vendored
@@ -9,6 +9,8 @@ concurrency:
on: # yamllint disable-line rule:truthy
  schedule:
    - cron: '0 */3 * * *'
  workflow_dispatch:

jobs:
  CherryPick:
    runs-on: [self-hosted, style-checker]
@@ -13,9 +13,7 @@ max-statements=200
ignore-long-lines = (# )?<?https?://\S+>?$

[MESSAGES CONTROL]
disable = bad-continuation,
    missing-docstring,
    bad-whitespace,
disable = missing-docstring,
    too-few-public-methods,
    invalid-name,
    too-many-arguments,
171  CHANGELOG.md
@@ -1,10 +1,177 @@
### Table of Contents
**[ClickHouse release v22.6, 2022-06-16](#226)**<br>
**[ClickHouse release v22.5, 2022-05-19](#225)**<br>
**[ClickHouse release v22.4, 2022-04-20](#224)**<br>
**[ClickHouse release v22.3-lts, 2022-03-17](#223)**<br>
**[ClickHouse release v22.2, 2022-02-17](#222)**<br>
**[ClickHouse release v22.1, 2022-01-18](#221)**<br>
**[Changelog for 2021](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/whats-new/changelog/2021.md)**<br>
**[Changelog for 2021](https://clickhouse.com/docs/en/whats-new/changelog/2021/)**<br>

### <a id="226"></a> ClickHouse release 22.6, 2022-06-16

#### Backward Incompatible Change
* Remove support for octal number literals in SQL. In previous versions they were parsed as Float64. [#37765](https://github.com/ClickHouse/ClickHouse/pull/37765) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Settings that use `seconds` as their type are now parsed to support floating point values (for example: `max_execution_time=0.5`). Infinity or NaN values throw an exception. [#37187](https://github.com/ClickHouse/ClickHouse/pull/37187) ([Raúl Marín](https://github.com/Algunenano)).
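  A minimal sketch of the new behavior (the query under the timeout is illustrative):

  ```sql
  -- Fractional values are now accepted for settings typed as seconds.
  SET max_execution_time = 0.5;
  -- A long-running query is now cancelled after half a second.
  SELECT count() FROM numbers(100000000000);
  ```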
* Changed the format of binary serialization of columns of experimental type `Object`. The new format is more convenient for third-party clients to implement. [#37482](https://github.com/ClickHouse/ClickHouse/pull/37482) ([Anton Popov](https://github.com/CurtizJ)).
* Turn on the setting `output_format_json_named_tuples_as_objects` by default. It serializes named tuples as JSON objects in JSON formats. [#37756](https://github.com/ClickHouse/ClickHouse/pull/37756) ([Anton Popov](https://github.com/CurtizJ)).
* LIKE patterns with a trailing escape symbol ('\\') are now disallowed (as mandated by the SQL standard). [#37764](https://github.com/ClickHouse/ClickHouse/pull/37764) ([Robert Schulze](https://github.com/rschu1ze)).
* If you run different ClickHouse versions on a cluster with AArch64 CPUs, or mix AArch64 and amd64 on a cluster, and use distributed queries with GROUP BY on multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, and the size of the result is huge, the data will not be fully aggregated in the result of these queries during the upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.

#### New Feature
* A new [FPC](https://userweb.cs.txstate.edu/~burtscher/papers/dcc07a.pdf) codec for floating point data compression. [#37553](https://github.com/ClickHouse/ClickHouse/pull/37553) ([Mikhail Guzov](https://github.com/koloshmet)).
* Add new columnar JSON formats: `JSONColumns`, `JSONCompactColumns`, `JSONColumnsWithMetadata`. Closes [#36338](https://github.com/ClickHouse/ClickHouse/issues/36338). Closes [#34509](https://github.com/ClickHouse/ClickHouse/issues/34509). [#36975](https://github.com/ClickHouse/ClickHouse/pull/36975) ([Kruglov Pavel](https://github.com/Avogar)).
* Added an OpenTelemetry trace visualization tool based on d3js. [#37810](https://github.com/ClickHouse/ClickHouse/pull/37810) ([Sergei Trifonov](https://github.com/serxa)).
* Support INSERTs into the `system.zookeeper` table. Closes [#22130](https://github.com/ClickHouse/ClickHouse/issues/22130). [#37596](https://github.com/ClickHouse/ClickHouse/pull/37596) ([Han Fei](https://github.com/hanfei1991)).
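  A hedged sketch of such an INSERT; the (`name`, `path`, `value`) column triple is an assumption about the writable columns of this virtual table:

  ```sql
  -- Creates the ZooKeeper node /clickhouse/test-node with the given value.
  INSERT INTO system.zookeeper (name, path, value) VALUES ('test-node', '/clickhouse', 'some value');
  ```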
* Support a non-constant pattern argument for the `LIKE`, `ILIKE` and `match` functions. [#37251](https://github.com/ClickHouse/ClickHouse/pull/37251) ([Robert Schulze](https://github.com/rschu1ze)).
* Executable user defined functions now support parameters. Example: `SELECT test_function(parameters)(arguments)`. Closes [#37578](https://github.com/ClickHouse/ClickHouse/issues/37578). [#37720](https://github.com/ClickHouse/ClickHouse/pull/37720) ([Maksim Kita](https://github.com/kitaisreal)).
* Add a `merge_reason` column to the `system.part_log` table. [#36912](https://github.com/ClickHouse/ClickHouse/pull/36912) ([Sema Checherinda](https://github.com/CheSema)).
* Add support for Maps and Records in Avro format. Add a new setting `input_format_avro_null_as_default` that allows inserting null as the default in Avro format. Closes [#18925](https://github.com/ClickHouse/ClickHouse/issues/18925). Closes [#37378](https://github.com/ClickHouse/ClickHouse/issues/37378). Closes [#32899](https://github.com/ClickHouse/ClickHouse/issues/32899). [#37525](https://github.com/ClickHouse/ClickHouse/pull/37525) ([Kruglov Pavel](https://github.com/Avogar)).
* Add a `clickhouse-disks` tool to introspect and operate on virtual filesystems configured for ClickHouse. [#36060](https://github.com/ClickHouse/ClickHouse/pull/36060) ([Artyom Yurkov](https://github.com/Varinara)).
* Add H3 unidirectional edge functions. [#36843](https://github.com/ClickHouse/ClickHouse/pull/36843) ([Bharat Nallan](https://github.com/bharatnc)).
* Add support for calculating [hashids](https://hashids.org/) from unsigned integers. [#37013](https://github.com/ClickHouse/ClickHouse/pull/37013) ([Michael Nutt](https://github.com/mnutt)).
* Explicit `SALT` specification is allowed for `CREATE USER <user> IDENTIFIED WITH sha256_hash`. [#37377](https://github.com/ClickHouse/ClickHouse/pull/37377) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
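  A minimal sketch of the syntax; the hash and salt are illustrative placeholders (the hash is assumed to be computed over the password concatenated with the salt):

  ```sql
  CREATE USER u1 IDENTIFIED WITH sha256_hash
      BY '7A37B85C8918EAC19A9089C0FA5A2AB4DCE3F90528DCDEEC108B23DDF3607B99'
      SALT 'salt';
  ```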
* Add two new settings, `input_format_csv_skip_first_lines`/`input_format_tsv_skip_first_lines`, to allow skipping a specified number of lines at the beginning of the file in CSV/TSV formats. [#37537](https://github.com/ClickHouse/ClickHouse/pull/37537) ([Kruglov Pavel](https://github.com/Avogar)).
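  A small usage sketch (the file name and schema are illustrative):

  ```sql
  -- Skip two header/comment lines before parsing the CSV rows.
  SELECT *
  FROM file('data.csv', 'CSV', 'id UInt32, name String')
  SETTINGS input_format_csv_skip_first_lines = 2;
  ```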
* The `showCertificate` function shows the current server's SSL certificate. [#37540](https://github.com/ClickHouse/ClickHouse/pull/37540) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* An HTTP source for Data Dictionaries in Named Collections is supported. [#37581](https://github.com/ClickHouse/ClickHouse/pull/37581) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Added a new window function `nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL x SECOND])`. [#37628](https://github.com/ClickHouse/ClickHouse/pull/37628) ([Andrey Zvonov](https://github.com/zvonand)).
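  A hedged usage sketch (table and column names are illustrative; as a window function it is used with an OVER clause):

  ```sql
  -- Per-second rate of change of `value`, clamped to zero on counter resets.
  SELECT
      ts,
      nonNegativeDerivative(value, ts, INTERVAL 1 SECOND) OVER (ORDER BY ts ASC) AS rate
  FROM metrics;
  ```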
* Implemented changing the comment of `ReplicatedMergeTree` tables. [#37416](https://github.com/ClickHouse/ClickHouse/pull/37416) ([Vasily Nemkov](https://github.com/Enmk)).
* Added a `SYSTEM UNFREEZE` query that deletes the whole backup, regardless of whether the corresponding table has been deleted. [#36424](https://github.com/ClickHouse/ClickHouse/pull/36424) ([Vadim Volodin](https://github.com/PolyProgrammist)).

#### Experimental Feature
* Add a `GROUPING` function. Closes [#19426](https://github.com/ClickHouse/ClickHouse/issues/19426). [#37163](https://github.com/ClickHouse/ClickHouse/pull/37163) ([Dmitry Novik](https://github.com/novikd)). The behavior of this function will be changed in subsequent releases.
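  A minimal sketch of the semantics as of this release (table and column names are illustrative; note the entry above warns the behavior will change):

  ```sql
  -- GROUPING(region) is 1 on the ROLLUP total row, where `region` is aggregated away, and 0 otherwise.
  SELECT region, GROUPING(region) AS is_total, sum(amount)
  FROM sales
  GROUP BY ROLLUP(region);
  ```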
* Enable `POPULATE` for `WINDOW VIEW`. [#36945](https://github.com/ClickHouse/ClickHouse/pull/36945) ([vxider](https://github.com/Vxider)).
* `ALTER TABLE ... MODIFY QUERY` support for `WINDOW VIEW`. [#37188](https://github.com/ClickHouse/ClickHouse/pull/37188) ([vxider](https://github.com/Vxider)).
* Change the behavior of the `ENGINE` syntax in `WINDOW VIEW` to match `MATERIALIZED VIEW`. [#37214](https://github.com/ClickHouse/ClickHouse/pull/37214) ([vxider](https://github.com/Vxider)).

#### Performance Improvement
* Added numerous optimizations for ARM NEON. [#38093](https://github.com/ClickHouse/ClickHouse/pull/38093) ([Daniel Kutenin](https://github.com/danlark1), [Alexandra Pilipyuk](https://github.com/chalice19)). Note: if you run different ClickHouse versions on a cluster with ARM CPUs and use distributed queries with GROUP BY on multiple keys of fixed-size type that fit in 256 bits but don't fit in 64 bits, the result of the aggregation query will be wrong during the upgrade. Workaround: upgrade with downtime instead of a rolling upgrade.
* Improve performance and memory usage when selecting a subset of columns for the formats Native, Protobuf, CapnProto, JSONEachRow, TSKV, and all formats with the suffixes WithNames/WithNamesAndTypes. Previously, while selecting only a subset of columns from files in these formats, all columns were read and stored in memory. Now only the required columns are read. This PR enables the setting `input_format_skip_unknown_fields` by default, because otherwise selecting a subset of columns would throw an exception. [#37192](https://github.com/ClickHouse/ClickHouse/pull/37192) ([Kruglov Pavel](https://github.com/Avogar)).
* Now more filters can be pushed down for JOIN. [#37472](https://github.com/ClickHouse/ClickHouse/pull/37472) ([Amos Bird](https://github.com/amosbird)).
* Load marks only for the necessary columns when reading wide parts. [#36879](https://github.com/ClickHouse/ClickHouse/pull/36879) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Improved performance of aggregation when sparse columns (which can be enabled by the experimental setting `ratio_of_defaults_for_sparse_serialization` in `MergeTree` tables) are used as arguments of aggregate functions. [#37617](https://github.com/ClickHouse/ClickHouse/pull/37617) ([Anton Popov](https://github.com/CurtizJ)).
* Optimize the function `COALESCE` with two arguments. [#37666](https://github.com/ClickHouse/ClickHouse/pull/37666) ([Anton Popov](https://github.com/CurtizJ)).
* Replace `multiIf` with `if` when `multiIf` has only one condition, because the function `if` is more performant. [#37695](https://github.com/ClickHouse/ClickHouse/pull/37695) ([Anton Popov](https://github.com/CurtizJ)).
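  The rewrite, illustrated (the table `t` is assumed); both forms are equivalent when `multiIf` has a single condition:

  ```sql
  -- Written by the user:
  SELECT multiIf(x > 0, 'positive', 'non-positive') FROM t;
  -- Internally rewritten to the cheaper:
  SELECT if(x > 0, 'positive', 'non-positive') FROM t;
  ```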
* Improve performance of the `dictGetDescendants` and `dictGetChildren` functions: create the temporary parent-to-children hierarchical index once per query instead of per function call during the query. Allow specifying `BIDIRECTIONAL` for `HIERARCHICAL` attributes; the dictionary will then maintain a parent-to-children index in memory, so `dictGetDescendants` and `dictGetChildren` will not create a temporary index per query. Closes [#32481](https://github.com/ClickHouse/ClickHouse/issues/32481). [#37148](https://github.com/ClickHouse/ClickHouse/pull/37148) ([Maksim Kita](https://github.com/kitaisreal)).
* Aggregate state destruction may now be posted to a thread pool. For queries with LIMIT and big state it provides a significant speedup, e.g. `select uniq(number) from numbers_mt(1e7) group by number limit 100` became around 2.5x faster. [#37855](https://github.com/ClickHouse/ClickHouse/pull/37855) ([Nikita Taranov](https://github.com/nickitat)).
* Improve performance of sorting by a single column. [#37195](https://github.com/ClickHouse/ClickHouse/pull/37195) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of single-column sorting using sorting queue specializations. [#37990](https://github.com/ClickHouse/ClickHouse/pull/37990) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved performance of array norm and distance functions by 2x-4x. [#37394](https://github.com/ClickHouse/ClickHouse/pull/37394) ([Alexander Gololobov](https://github.com/davenger)).
* Improve performance of number comparison functions using dynamic dispatch. [#37399](https://github.com/ClickHouse/ClickHouse/pull/37399) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of ORDER BY with LIMIT. [#37481](https://github.com/ClickHouse/ClickHouse/pull/37481) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of the `hasAll` function using the dynamic dispatch infrastructure. [#37484](https://github.com/ClickHouse/ClickHouse/pull/37484) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of the `greatCircleAngle`, `greatCircleDistance` and `geoDistance` functions. [#37524](https://github.com/ClickHouse/ClickHouse/pull/37524) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of INSERT into MergeTree when there are multiple columns in ORDER BY. [#35762](https://github.com/ClickHouse/ClickHouse/pull/35762) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix excessive CPU usage in the background when there are a lot of tables. [#38028](https://github.com/ClickHouse/ClickHouse/pull/38028) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance of the `not` function using dynamic dispatch. [#38058](https://github.com/ClickHouse/ClickHouse/pull/38058) ([Maksim Kita](https://github.com/kitaisreal)).
* Optimized the internal caching of re2 patterns, which occur e.g. in LIKE and MATCH functions. [#37544](https://github.com/ClickHouse/ClickHouse/pull/37544) ([Robert Schulze](https://github.com/rschu1ze)).
* Improve the all-in-one filter bitmask generator function with AVX-512 instructions. [#37588](https://github.com/ClickHouse/ClickHouse/pull/37588) ([yaqi-zhao](https://github.com/yaqi-zhao)).
* Apply the read method `threadpool` for the Hive integration engine. This will significantly speed up reading. [#36328](https://github.com/ClickHouse/ClickHouse/pull/36328) ([李扬](https://github.com/taiyang-li)).
* When all the columns to read are partition keys, construct columns by the file's row number without actually reading the Hive file. [#37103](https://github.com/ClickHouse/ClickHouse/pull/37103) ([lgbo](https://github.com/lgbo-ustc)).
* Support multiple disks for caching Hive files. [#37279](https://github.com/ClickHouse/ClickHouse/pull/37279) ([lgbo](https://github.com/lgbo-ustc)).
* Limiting the maximum cache usage per query can effectively prevent cache pool contamination. [Related issues](https://github.com/ClickHouse/ClickHouse/issues/28961). [#37859](https://github.com/ClickHouse/ClickHouse/pull/37859) ([Han Shukai](https://github.com/KinderRiven)).
* Currently ClickHouse directly downloads all remote files to the local cache (even if they are only read once), which frequently causes IO on the local hard disk. In some scenarios these IOs are not necessary and may easily cause a performance regression, as observed when running SSB Q1-Q4. [#37516](https://github.com/ClickHouse/ClickHouse/pull/37516) ([Han Shukai](https://github.com/KinderRiven)).
* Allow pruning the list of files via virtual columns such as `_file` and `_path` when reading from S3. This is for [#37174](https://github.com/ClickHouse/ClickHouse/issues/37174), [#23494](https://github.com/ClickHouse/ClickHouse/issues/23494). [#37356](https://github.com/ClickHouse/ClickHouse/pull/37356) ([Amos Bird](https://github.com/amosbird)).
* Remove an unnecessary write-copy step in `CompressedWriteBuffer::nextImpl()` that happened frequently while inserting data. Before this patch: compress `working_buffer` into `compressed_buffer`, then write-copy into `out`. After: compress `working_buffer` directly into `out`. [#37242](https://github.com/ClickHouse/ClickHouse/pull/37242) ([jasperzhu](https://github.com/jinjunzh)).

#### Improvement
* Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS. Closes [#37360](https://github.com/ClickHouse/ClickHouse/issues/37360). [#37667](https://github.com/ClickHouse/ClickHouse/pull/37667) ([Dmitry Novik](https://github.com/novikd)).
* Fix stack trace collection on ARM. Closes [#37044](https://github.com/ClickHouse/ClickHouse/issues/37044). Closes [#15638](https://github.com/ClickHouse/ClickHouse/issues/15638). [#37797](https://github.com/ClickHouse/ClickHouse/pull/37797) ([Maksim Kita](https://github.com/kitaisreal)).
* The client will try every IP address returned by DNS resolution until a connection succeeds. [#37273](https://github.com/ClickHouse/ClickHouse/pull/37273) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Allow using String type instead of Binary in Arrow/Parquet/ORC formats. This PR introduces 3 new settings for it: `output_format_arrow_string_as_string`, `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`. The default value for all of them is `false`. [#37327](https://github.com/ClickHouse/ClickHouse/pull/37327) ([Kruglov Pavel](https://github.com/Avogar)).
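  A small sketch for the Parquet output case (the table name is illustrative):

  ```sql
  -- Emit String columns as Parquet String instead of Binary.
  SELECT * FROM events
  SETTINGS output_format_parquet_string_as_string = 1
  FORMAT Parquet;
  ```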
* Apply the setting `input_format_max_rows_to_read_for_schema_inference` to the total number of rows read from all files matched by a glob. Previously the setting was applied to each file in the glob separately, so with a huge number of nulls we could read the first `input_format_max_rows_to_read_for_schema_inference` rows from each file and get nothing. Also increase the default value of this setting to 25000. [#37332](https://github.com/ClickHouse/ClickHouse/pull/37332) ([Kruglov Pavel](https://github.com/Avogar)).
* Add a separate `CLUSTER` grant (and an `access_control_improvements.on_cluster_queries_require_cluster_grant` configuration directive, which defaults to `false` for backward compatibility). [#35767](https://github.com/ClickHouse/ClickHouse/pull/35767) ([Azat Khuzhin](https://github.com/azat)).
* Added support for schema inference for `hdfsCluster`. [#35812](https://github.com/ClickHouse/ClickHouse/pull/35812) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Implement a `least_used` load balancing algorithm for disks inside a volume (multi-disk configuration). [#36686](https://github.com/ClickHouse/ClickHouse/pull/36686) ([Azat Khuzhin](https://github.com/azat)).
* Modify the HTTP endpoint to return the full stats under the `X-ClickHouse-Summary` header when `send_progress_in_http_headers=0` (before, it would return all zeros), to return the `X-ClickHouse-Exception-Code` header when progress has been sent before (`send_progress_in_http_headers=1`), and to return `HTTP_REQUEST_TIMEOUT` (408) instead of `HTTP_INTERNAL_SERVER_ERROR` (500) on `TIMEOUT_EXCEEDED` errors. [#36884](https://github.com/ClickHouse/ClickHouse/pull/36884) ([Raúl Marín](https://github.com/Algunenano)).
* Allow a user to inspect grants from granted roles. [#36941](https://github.com/ClickHouse/ClickHouse/pull/36941) ([nvartolomei](https://github.com/nvartolomei)).
* Do not calculate an integral numerically but use CDF functions instead. This will speed up execution and increase precision. This fixes [#36714](https://github.com/ClickHouse/ClickHouse/issues/36714). [#36953](https://github.com/ClickHouse/ClickHouse/pull/36953) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add a default implementation for Nothing in functions. Now most functions return a column with type Nothing if one of their arguments is Nothing. This also solves the problem with functions like arrayMap/arrayFilter and similar when they have an empty array as an argument. Previously, queries like `select arrayMap(x -> 2 * x, []);` failed because a function inside a lambda could not work with type `Nothing`; now such queries return an empty array with type `Array(Nothing)`. Also add support for arrays of nullable types in functions like arrayFilter/arrayFill. Previously, queries like `select arrayFilter(x -> x % 2, [1, NULL])` failed; now they work (if the result of the lambda is NULL, the value is not included in the result). Closes [#37000](https://github.com/ClickHouse/ClickHouse/issues/37000). [#37048](https://github.com/ClickHouse/ClickHouse/pull/37048) ([Kruglov Pavel](https://github.com/Avogar)).
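  The two cases from this entry, as runnable queries:

  ```sql
  -- Previously failed; now returns [] with type Array(Nothing).
  SELECT arrayMap(x -> 2 * x, []);
  -- Previously failed; now returns [1] (rows where the lambda yields NULL are dropped).
  SELECT arrayFilter(x -> x % 2, [1, NULL]);
  ```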
* Now if a shard has a local replica, we create a local plan and a plan to read from all remote replicas. They have a shared initiator which coordinates reading. [#37204](https://github.com/ClickHouse/ClickHouse/pull/37204) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* No longer abort server startup if the configuration option `mark_cache_size` is not explicitly set. [#37326](https://github.com/ClickHouse/ClickHouse/pull/37326) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow providing `NULL`/`NOT NULL` right after the type in a column declaration. [#37337](https://github.com/ClickHouse/ClickHouse/pull/37337) ([Igor Nikonov](https://github.com/devcrafter)).
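  A minimal sketch of the new declaration syntax (the table name is illustrative):

  ```sql
  -- `x Int64 NULL` is shorthand for Nullable(Int64); `y String NOT NULL` stays non-nullable.
  CREATE TABLE t0
  (
      x Int64 NULL,
      y String NOT NULL
  )
  ENGINE = Memory;
  ```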
* Optimize getting a read buffer for file segments in the `PARTIALLY_DOWNLOADED` state. [#37338](https://github.com/ClickHouse/ClickHouse/pull/37338) ([xiedeyantu](https://github.com/xiedeyantu)).
* Try to improve short-circuit function processing to fix problems with stress tests. [#37384](https://github.com/ClickHouse/ClickHouse/pull/37384) ([Kruglov Pavel](https://github.com/Avogar)).
* Closes [#37395](https://github.com/ClickHouse/ClickHouse/issues/37395). [#37415](https://github.com/ClickHouse/ClickHouse/pull/37415) ([Memo](https://github.com/Joeywzr)).
* Fix an extremely rare deadlock during part fetch in zero-copy replication. Fixes [#37423](https://github.com/ClickHouse/ClickHouse/issues/37423). [#37424](https://github.com/ClickHouse/ClickHouse/pull/37424) ([metahys](https://github.com/metahys)).
* Don't allow creating a storage with an unknown data format. [#37450](https://github.com/ClickHouse/ClickHouse/pull/37450) ([Kruglov Pavel](https://github.com/Avogar)).
* Set the default value of `global_memory_usage_overcommit_max_wait_microseconds` to 5 seconds. Add info about `OvercommitTracker` to the OOM exception message. Add a `MemoryOvercommitWaitTimeMicroseconds` profile event. [#37460](https://github.com/ClickHouse/ClickHouse/pull/37460) ([Dmitry Novik](https://github.com/novikd)).
* Do not display `-0.0` CPU time in clickhouse-client. It can appear due to rounding errors. This closes [#38003](https://github.com/ClickHouse/ClickHouse/issues/38003). This closes [#38038](https://github.com/ClickHouse/ClickHouse/issues/38038). [#38064](https://github.com/ClickHouse/ClickHouse/pull/38064) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Play UI: Keep controls in place when the page is scrolled horizontally. This makes edits comfortable even if the table is wide and was scrolled far to the right. The feature was proposed by Maksym Tereshchenko from CaspianDB. [#37470](https://github.com/ClickHouse/ClickHouse/pull/37470) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Modify the query div in play.html to be extendable beyond 20% height. For very long queries it is helpful to extend the textarea element, but since the div had a fixed height, the extended textarea previously hid the data div underneath. With this fix, extending the textarea pushes the data div down/up so that it is not hidden. Also keep the query box width at 100% even when the user adjusts the size of the query textarea. [#37488](https://github.com/ClickHouse/ClickHouse/pull/37488) ([guyco87](https://github.com/guyco87)).
* Added `ProfileEvents` for introspection of the type of written (inserted or merged) parts (`Inserted{Wide/Compact/InMemory}Parts`, `MergedInto{Wide/Compact/InMemory}Parts`). Added the column `part_type` to `system.part_log`. Resolves [#37495](https://github.com/ClickHouse/ClickHouse/issues/37495). [#37536](https://github.com/ClickHouse/ClickHouse/pull/37536) ([Anton Popov](https://github.com/CurtizJ)).
* clickhouse-keeper improvement: move broken logs to a timestamped folder. [#37565](https://github.com/ClickHouse/ClickHouse/pull/37565) ([Antonio Andelic](https://github.com/antonio2368)).
* Do not write expired columns by TTL after subsequent merges (previously, only the first merge/optimize of a part would skip writing columns expired by TTL; all others would write them). [#37570](https://github.com/ClickHouse/ClickHouse/pull/37570) ([Azat Khuzhin](https://github.com/azat)).
* More precise result of the `dumpColumnStructure` miscellaneous function in the presence of LowCardinality or Sparse columns. In previous versions, these functions converted the argument to a full column before returning the result. This is needed to provide an answer in [#6935](https://github.com/ClickHouse/ClickHouse/issues/6935). [#37633](https://github.com/ClickHouse/ClickHouse/pull/37633) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* clickhouse-keeper: store only unique session IDs for watches. [#37641](https://github.com/ClickHouse/ClickHouse/pull/37641) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible "Cannot write to finalized buffer". [#37645](https://github.com/ClickHouse/ClickHouse/pull/37645) ([Azat Khuzhin](https://github.com/azat)).
* Add a setting `support_batch_delete` for `DiskS3` to disable multi-object delete calls, which Google Cloud Storage doesn't support. [#37659](https://github.com/ClickHouse/ClickHouse/pull/37659) ([Fred Wulff](https://github.com/frew)).
* Add an option to disable connection pooling in the ODBC bridge. [#37705](https://github.com/ClickHouse/ClickHouse/pull/37705) ([Anton Kozlov](https://github.com/tonickkozlov)).
* The functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants` now support a nullable `HIERARCHICAL` attribute in dictionaries. Closes [#35521](https://github.com/ClickHouse/ClickHouse/issues/35521). [#37805](https://github.com/ClickHouse/ClickHouse/pull/37805) ([Maksim Kita](https://github.com/kitaisreal)).
* Expose BoringSSL version-related info in the `system.build_options` table. [#37850](https://github.com/ClickHouse/ClickHouse/pull/37850) ([Bharat Nallan](https://github.com/bharatnc)).
* Now clickhouse-server removes `delete_tmp` directories on server start. Fixes [#26503](https://github.com/ClickHouse/ClickHouse/issues/26503). [#37906](https://github.com/ClickHouse/ClickHouse/pull/37906) ([alesapin](https://github.com/alesapin)).
* Clean up broken detached parts after a timeout. Closes [#25195](https://github.com/ClickHouse/ClickHouse/issues/25195). [#37975](https://github.com/ClickHouse/ClickHouse/pull/37975) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Now, in the MergeTree family of table engines, failed-to-move parts are removed instantly. [#37994](https://github.com/ClickHouse/ClickHouse/pull/37994) ([alesapin](https://github.com/alesapin)).
* Now if the setting `always_fetch_merged_part` is enabled for ReplicatedMergeTree, merges will look for parts on other replicas less frequently, reducing the load on [Zoo]Keeper. [#37995](https://github.com/ClickHouse/ClickHouse/pull/37995) ([alesapin](https://github.com/alesapin)).
* Add implicit grants with the grant option too. For example, `GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION` now allows `A` to execute `GRANT CREATE VIEW ON test.* TO B`. [#38017](https://github.com/ClickHouse/ClickHouse/pull/38017) ([Vitaly Baranov](https://github.com/vitlibar)).
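  The example from this entry, spelled out (user and database names are taken from the entry itself):

  ```sql
  GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION;
  -- CREATE TABLE implies CREATE VIEW, and the grant option now carries over,
  -- so user A may execute:
  GRANT CREATE VIEW ON test.* TO B;
  ```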

#### Build/Testing/Packaging Improvement
* Use `clang-14` and LLVM infrastructure version 14 for builds. This closes [#34681](https://github.com/ClickHouse/ClickHouse/issues/34681). [#34754](https://github.com/ClickHouse/ClickHouse/pull/34754) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Note: `clang-14` has [a bug](https://github.com/google/sanitizers/issues/1540) in ThreadSanitizer that makes our CI work worse.
* Allow dropping privileges at startup. This simplifies Docker images. Closes [#36293](https://github.com/ClickHouse/ClickHouse/issues/36293). [#36341](https://github.com/ClickHouse/ClickHouse/pull/36341) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add docs spellcheck to CI. [#37790](https://github.com/ClickHouse/ClickHouse/pull/37790) ([Vladimir C](https://github.com/vdimir)).
* Fix overly aggressive stripping which removed the embedded hash required for checking the consistency of the executable. [#37993](https://github.com/ClickHouse/ClickHouse/pull/37993) ([Robert Schulze](https://github.com/rschu1ze)).

#### Bug Fix

* Fix `SELECT ... INTERSECT` and `EXCEPT SELECT` statements with constant string types. [#37738](https://github.com/ClickHouse/ClickHouse/pull/37738) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix `GROUP BY` `AggregateFunction` (i.e. grouping by a column that has the `AggregateFunction` type). [#37093](https://github.com/ClickHouse/ClickHouse/pull/37093) ([Azat Khuzhin](https://github.com/azat)).
* (experimental WINDOW VIEW) Fix `addDependency` in WindowView. This bug can be reproduced as in [#37237](https://github.com/ClickHouse/ClickHouse/issues/37237). [#37224](https://github.com/ClickHouse/ClickHouse/pull/37224) ([vxider](https://github.com/Vxider)).
* Fix an inconsistency in the ORDER BY ... WITH FILL feature. A query containing ORDER BY ... WITH FILL could generate extra rows when multiple WITH FILL columns were present. [#38074](https://github.com/ClickHouse/ClickHouse/pull/38074) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Move `addDependency` from the constructor to `startup()` to avoid adding a dependency on a *dropped* table; fixes [#37237](https://github.com/ClickHouse/ClickHouse/issues/37237). [#37243](https://github.com/ClickHouse/ClickHouse/pull/37243) ([vxider](https://github.com/Vxider)).
* Fix inserting defaults for missing values in columnar formats. Previously, missing columns were filled with defaults for types, not for columns. [#37253](https://github.com/ClickHouse/ClickHouse/pull/37253) ([Kruglov Pavel](https://github.com/Avogar)).
* (experimental Object type) Fix some cases of inserting nested arrays into columns of type `Object`. [#37305](https://github.com/ClickHouse/ClickHouse/pull/37305) ([Anton Popov](https://github.com/CurtizJ)).
* Fix unexpected errors with a clash of constant strings in aggregate functions, PREWHERE and JOIN. Closes [#36891](https://github.com/ClickHouse/ClickHouse/issues/36891). [#37336](https://github.com/ClickHouse/ClickHouse/pull/37336) ([Vladimir C](https://github.com/vdimir)).
* Fix projections with GROUP/ORDER BY in the query and `optimize_aggregation_in_order` (before, the result was incorrect since only finish sorting was performed). [#37342](https://github.com/ClickHouse/ClickHouse/pull/37342) ([Azat Khuzhin](https://github.com/azat)).
* Fixed an error with special symbols in S3 key names. Fixes [#33009](https://github.com/ClickHouse/ClickHouse/issues/33009). [#37344](https://github.com/ClickHouse/ClickHouse/pull/37344) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Throw an exception when GROUPING SETS is used with ROLLUP or CUBE. [#37367](https://github.com/ClickHouse/ClickHouse/pull/37367) ([Dmitry Novik](https://github.com/novikd)).
* Fix a LOGICAL_ERROR in getMaxSourcePartsSizeForMerge during merges (when non-standard, greater values of `background_pool_size`/`background_merges_mutations_concurrency_ratio` have been specified in `config.xml` (the new way) rather than in `users.xml` (the deprecated way)). [#37413](https://github.com/ClickHouse/ClickHouse/pull/37413) ([Azat Khuzhin](https://github.com/azat)).
* Stop removing the UTF-8 BOM in RowBinary format. [#37428](https://github.com/ClickHouse/ClickHouse/pull/37428) ([Paul Loyd](https://github.com/loyd)).
* clickhouse-keeper bugfix: fix force recovery for a single-node cluster. [#37440](https://github.com/ClickHouse/ClickHouse/pull/37440) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix a logical error in normalizeUTF8 functions. Closes [#37298](https://github.com/ClickHouse/ClickHouse/issues/37298). [#37443](https://github.com/ClickHouse/ClickHouse/pull/37443) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix casting a low-cardinality column of a nullable type in JoinSwitcher. Closes [#37385](https://github.com/ClickHouse/ClickHouse/issues/37385). [#37453](https://github.com/ClickHouse/ClickHouse/pull/37453) ([Vladimir C](https://github.com/vdimir)).
* Fix named tuples output in ORC/Arrow/Parquet formats. [#37458](https://github.com/ClickHouse/ClickHouse/pull/37458) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix optimization of monotonous functions in the ORDER BY clause in the presence of GROUPING SETS. Fixes [#37401](https://github.com/ClickHouse/ClickHouse/issues/37401). [#37493](https://github.com/ClickHouse/ClickHouse/pull/37493) ([Dmitry Novik](https://github.com/novikd)).
* Fix an error when joining with a dictionary under some conditions. Closes [#37386](https://github.com/ClickHouse/ClickHouse/issues/37386). [#37530](https://github.com/ClickHouse/ClickHouse/pull/37530) ([Vladimir C](https://github.com/vdimir)).
* Prohibit `optimize_aggregation_in_order` with `GROUPING SETS` (fixes a `LOGICAL_ERROR`). [#37542](https://github.com/ClickHouse/ClickHouse/pull/37542) ([Azat Khuzhin](https://github.com/azat)).
* Fix wrong dump information of ActionsDAG. [#37587](https://github.com/ClickHouse/ClickHouse/pull/37587) ([zhanglistar](https://github.com/zhanglistar)).
* Fix converting types for UNION queries (could produce a LOGICAL_ERROR). [#37593](https://github.com/ClickHouse/ClickHouse/pull/37593) ([Azat Khuzhin](https://github.com/azat)).
* Fix the `WITH FILL` modifier with negative intervals in the `STEP` clause. Fixes [#37514](https://github.com/ClickHouse/ClickHouse/issues/37514). [#37600](https://github.com/ClickHouse/ClickHouse/pull/37600) ([Anton Popov](https://github.com/CurtizJ)).
* Fix illegal joinGet array usage when `join_use_nulls = 1`. This fixes [#37562](https://github.com/ClickHouse/ClickHouse/issues/37562). [#37650](https://github.com/ClickHouse/ClickHouse/pull/37650) ([Amos Bird](https://github.com/amosbird)).
* Fix a columns number mismatch in cross join. Closes [#37561](https://github.com/ClickHouse/ClickHouse/issues/37561). [#37653](https://github.com/ClickHouse/ClickHouse/pull/37653) ([Vladimir C](https://github.com/vdimir)).
* Fix a segmentation fault in `show create table` from a MySQL database when it is configured with named collections. Closes [#37683](https://github.com/ClickHouse/ClickHouse/issues/37683). [#37690](https://github.com/ClickHouse/ClickHouse/pull/37690) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix RabbitMQ Storage not being able to start up on server restart if the storage was created without a SETTINGS clause. Closes [#37463](https://github.com/ClickHouse/ClickHouse/issues/37463). [#37691](https://github.com/ClickHouse/ClickHouse/pull/37691) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Disable CREATE/DROP of SQL user defined functions in readonly mode. Closes [#37280](https://github.com/ClickHouse/ClickHouse/issues/37280). [#37699](https://github.com/ClickHouse/ClickHouse/pull/37699) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix formatting of Nullable arguments for executable user defined functions. Closes [#35897](https://github.com/ClickHouse/ClickHouse/issues/35897). [#37711](https://github.com/ClickHouse/ClickHouse/pull/37711) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix the optimization enabled by the setting `optimize_monotonous_functions_in_order_by` in distributed queries. Fixes [#36037](https://github.com/ClickHouse/ClickHouse/issues/36037). [#37724](https://github.com/ClickHouse/ClickHouse/pull/37724) ([Anton Popov](https://github.com/CurtizJ)).
* Fix a possible logical error: `Invalid Field get from type UInt64 to type Float64` in the `values` table function. Closes [#37602](https://github.com/ClickHouse/ClickHouse/issues/37602). [#37754](https://github.com/ClickHouse/ClickHouse/pull/37754) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix a possible segfault in schema inference in case of an exception in the SchemaReader constructor. Closes [#37680](https://github.com/ClickHouse/ClickHouse/issues/37680). [#37760](https://github.com/ClickHouse/ClickHouse/pull/37760) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix the setting `cast_ipv4_ipv6_default_on_conversion_error` for the internal cast function. Closes [#35156](https://github.com/ClickHouse/ClickHouse/issues/35156). [#37761](https://github.com/ClickHouse/ClickHouse/pull/37761) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix a `toString` error on `DataTypeDate32`. [#37775](https://github.com/ClickHouse/ClickHouse/pull/37775) ([LiuNeng](https://github.com/liuneng1994)).
* The clickhouse-keeper setting `dead_session_check_period_ms` was transformed into microseconds (multiplied by 1000), which led to dead sessions only being cleaned up after several minutes (instead of 500 ms). [#37824](https://github.com/ClickHouse/ClickHouse/pull/37824) ([Michael Lex](https://github.com/mlex)).
* Fix possible "No more packets are available" for distributed queries (when `async_socket_for_remote`/`use_hedged_requests` are disabled). [#37826](https://github.com/ClickHouse/ClickHouse/pull/37826) ([Azat Khuzhin](https://github.com/azat)).
* (experimental WINDOW VIEW) Do not drop the inner target table when executing `ALTER TABLE … MODIFY QUERY` in WindowView. [#37879](https://github.com/ClickHouse/ClickHouse/pull/37879) ([vxider](https://github.com/Vxider)).
* Fix directory ownership of the coordination dir in the clickhouse-keeper Docker image. Fixes [#37914](https://github.com/ClickHouse/ClickHouse/issues/37914). [#37915](https://github.com/ClickHouse/ClickHouse/pull/37915) ([James Maidment](https://github.com/jamesmaidment)).
* Dictionaries: fix custom queries with an update field and `{condition}`. Closes [#33746](https://github.com/ClickHouse/ClickHouse/issues/33746). [#37947](https://github.com/ClickHouse/ClickHouse/pull/37947) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix a possible incorrect result of `SELECT ... WITH FILL` when `ORDER BY` should be applied after the `WITH FILL` result (e.g. for an outer query). The incorrect result was caused by an optimization for `ORDER BY` expressions ([#35623](https://github.com/ClickHouse/ClickHouse/issues/35623)). Closes [#37904](https://github.com/ClickHouse/ClickHouse/issues/37904). [#37959](https://github.com/ClickHouse/ClickHouse/pull/37959) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* (experimental WINDOW VIEW) Add missing default columns when pushing to the target table in WindowView; fixes [#37815](https://github.com/ClickHouse/ClickHouse/issues/37815). [#37965](https://github.com/ClickHouse/ClickHouse/pull/37965) ([vxider](https://github.com/Vxider)).
* Fixed a too-large stack frame that would cause compilation to fail. [#37996](https://github.com/ClickHouse/ClickHouse/pull/37996) ([Han Shukai](https://github.com/KinderRiven)).
* When `enable_filesystem_query_cache_limit` is enabled, throw an error if the reserved cache size exceeds the remaining cache size. [#38004](https://github.com/ClickHouse/ClickHouse/pull/38004) ([xiedeyantu](https://github.com/xiedeyantu)).
* Fix converting types for UNION queries (could produce a LOGICAL_ERROR). [#34775](https://github.com/ClickHouse/ClickHouse/pull/34775) ([Azat Khuzhin](https://github.com/azat)).
* A TTL merge might not be scheduled again if the BackgroundExecutor was busy: `merges_with_ttl_counter` is increased in `selectPartsToMerge()`, the merge task is ignored if the BackgroundExecutor is busy, and `merges_with_ttl_counter` is never decreased. [#36387](https://github.com/ClickHouse/ClickHouse/pull/36387) ([lthaooo](https://github.com/lthaooo)).
* Fix the overridden settings value of `normalize_function_names`. [#36937](https://github.com/ClickHouse/ClickHouse/pull/36937) ([李扬](https://github.com/taiyang-li)).
* Fix exponential time decaying window functions. They now respect the boundaries of the window. [#36944](https://github.com/ClickHouse/ClickHouse/pull/36944) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Fix a possible heap-use-after-free error when reading system.projection_parts and system.projection_parts_columns. This fixes [#37184](https://github.com/ClickHouse/ClickHouse/issues/37184). [#37185](https://github.com/ClickHouse/ClickHouse/pull/37185) ([Amos Bird](https://github.com/amosbird)).
* Fixed `DateTime64` fractional seconds behavior prior to the Unix epoch. [#37697](https://github.com/ClickHouse/ClickHouse/pull/37697) ([Andrey Zvonov](https://github.com/zvonand)). [#37039](https://github.com/ClickHouse/ClickHouse/pull/37039) ([李扬](https://github.com/taiyang-li)).


### <a id="225"></a> ClickHouse release 22.5, 2022-05-19

@@ -172,7 +339,7 @@

#### Backward Incompatible Change

* Do not allow SETTINGS after FORMAT for INSERT queries (there is compatibility setting `parser_settings_after_format_compact` to accept such queries, but it is turned OFF by default). [#35883](https://github.com/ClickHouse/ClickHouse/pull/35883) ([Azat Khuzhin](https://github.com/azat)).
* Do not allow SETTINGS after FORMAT for INSERT queries (there is compatibility setting `allow_settings_after_format_in_insert` to accept such queries, but it is turned OFF by default). [#35883](https://github.com/ClickHouse/ClickHouse/pull/35883) ([Azat Khuzhin](https://github.com/azat)).
* Function `yandexConsistentHash` (consistent hashing algorithm by Konstantin "kostik" Oblakov) is renamed to `kostikConsistentHash`. The old name is left as an alias for compatibility. Although this change is backward compatible, we may remove the alias in subsequent releases, that's why it's recommended to update the usages of this function in your apps. [#35553](https://github.com/ClickHouse/ClickHouse/pull/35553) ([Alexey Milovidov](https://github.com/alexey-milovidov)).

#### New Feature
@@ -244,16 +244,18 @@ endif ()
# Add a section with the hash of the compiled machine code for integrity checks.
# Only for official builds, because adding a section can be time consuming (rewrite of several GB).
# And cross compiled binaries are not supported (since you cannot execute clickhouse hash-binary)
if (OBJCOPY_PATH AND CLICKHOUSE_OFFICIAL_BUILD AND (NOT CMAKE_TOOLCHAIN_FILE OR CMAKE_TOOLCHAIN_FILE MATCHES "linux/toolchain-x86_64.cmake$"))
if (CLICKHOUSE_OFFICIAL_BUILD AND (NOT CMAKE_TOOLCHAIN_FILE OR CMAKE_TOOLCHAIN_FILE MATCHES "linux/toolchain-x86_64.cmake$"))
    message(STATUS "Official build: A checksum hash will be added to the clickhouse executable")
    set (USE_BINARY_HASH 1 CACHE STRING "Calculate binary hash and store it in the separate section")
else ()
    message(STATUS "No official build: A checksum hash will not be added to the clickhouse executable")
endif ()

# Allows to build stripped binary in a separate directory
if (OBJCOPY_PATH AND STRIP_PATH)
    option(INSTALL_STRIPPED_BINARIES "Build stripped binaries with debug info in separate directory" OFF)
    if (INSTALL_STRIPPED_BINARIES)
        set(STRIPPED_BINARIES_OUTPUT "stripped" CACHE STRING "A separate directory for stripped information")
    endif()
# Optionally split binaries and debug symbols.
option(INSTALL_STRIPPED_BINARIES "Split binaries and debug symbols" OFF)
if (INSTALL_STRIPPED_BINARIES)
    message(STATUS "Will split binaries and debug symbols")
    set(STRIPPED_BINARIES_OUTPUT "stripped" CACHE STRING "A separate directory for stripped information")
endif()

cmake_host_system_information(RESULT AVAILABLE_PHYSICAL_MEMORY QUERY AVAILABLE_PHYSICAL_MEMORY) # Not available under freebsd
@@ -13,7 +13,3 @@ ClickHouse® is an open-source column-oriented database management system that a
* [Code Browser (Woboq)](https://clickhouse.com/codebrowser/ClickHouse/index.html) with syntax highlight and navigation.
* [Code Browser (github.dev)](https://github.dev/ClickHouse/ClickHouse) with syntax highlight, powered by github.dev.
* [Contacts](https://clickhouse.com/company/#contact) can help to get your questions answered if there are any.

## Upcoming Events

* [ClickHouse Meetup Amsterdam (in-person and online)](https://www.meetup.com/clickhouse-netherlands-user-group/events/286017044/) on June 8th, 2022
@@ -49,7 +49,7 @@ struct Decimal
    using NativeType = T;

    constexpr Decimal() = default;
    constexpr Decimal(Decimal<T> &&) = default;
    constexpr Decimal(Decimal<T> &&) noexcept = default;
    constexpr Decimal(const Decimal<T> &) = default;

    constexpr Decimal(const T & value_): value(value_) {}
@@ -57,7 +57,7 @@ struct Decimal
    template <typename U>
    constexpr Decimal(const Decimal<U> & x): value(x.value) {}

    constexpr Decimal<T> & operator = (Decimal<T> &&) = default;
    constexpr Decimal<T> & operator=(Decimal<T> &&) noexcept = default;
    constexpr Decimal<T> & operator = (const Decimal<T> &) = default;

    constexpr operator T () const { return value; }
@@ -9,7 +9,7 @@ std::string errnoToString(int code, int the_errno)
    char buf[buf_size];
#ifndef _GNU_SOURCE
    int rc = strerror_r(the_errno, buf, buf_size);
#ifdef __APPLE__
#ifdef OS_DARWIN
    if (rc != 0 && rc != EINVAL)
#else
    if (rc != 0)
@@ -16,7 +16,7 @@ uint64_t getAvailableMemoryAmountOrZero()
{
#if defined(_SC_PHYS_PAGES) // linux
    return getPageSize() * sysconf(_SC_PHYS_PAGES);
#elif defined(__FreeBSD__)
#elif defined(OS_FREEBSD)
    struct vmtotal vmt;
    size_t vmt_size = sizeof(vmt);
    if (sysctlbyname("vm.vmtotal", &vmt, &vmt_size, NULL, 0) == 0)
@@ -6,7 +6,7 @@

#include <base/defines.h>

#if defined(__linux__) && !defined(THREAD_SANITIZER) && !defined(USE_MUSL)
#if defined(OS_LINUX) && !defined(THREAD_SANITIZER) && !defined(USE_MUSL)
#define USE_PHDR_CACHE 1
#endif

@@ -29,7 +29,7 @@ if (ARCH_NATIVE)
    set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")

elseif (ARCH_AARCH64)
    set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=armv8-a+crc")
    set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=armv8-a+crc+simd+crypto+dotprod+ssbs")

elseif (ARCH_PPC64LE)
    # Note that gcc and clang have support for x86 SSE2 intrinsics when building for PowerPC
@@ -19,9 +19,12 @@ macro(clickhouse_strip_binary)
    COMMAND mkdir -p "${STRIP_DESTINATION_DIR}/lib/debug/bin"
    COMMAND mkdir -p "${STRIP_DESTINATION_DIR}/bin"
    COMMAND cp "${STRIP_BINARY_PATH}" "${STRIP_DESTINATION_DIR}/bin/${STRIP_TARGET}"
    # Splits debug symbols into separate file, leaves the binary untouched:
    COMMAND "${OBJCOPY_PATH}" --only-keep-debug --compress-debug-sections "${STRIP_DESTINATION_DIR}/bin/${STRIP_TARGET}" "${STRIP_DESTINATION_DIR}/lib/debug/bin/${STRIP_TARGET}.debug"
    COMMAND chmod 0644 "${STRIP_DESTINATION_DIR}/lib/debug/bin/${STRIP_TARGET}.debug"
    COMMAND "${STRIP_PATH}" --remove-section=.comment --remove-section=.note "${STRIP_DESTINATION_DIR}/bin/${STRIP_TARGET}"
    # Strips binary, sections '.note' & '.comment' are removed in line with Debian's stripping policy: www.debian.org/doc/debian-policy/ch-files.html, section '.clickhouse.hash' is needed for integrity check:
    COMMAND "${STRIP_PATH}" --remove-section=.comment --remove-section=.note --keep-section=.clickhouse.hash "${STRIP_DESTINATION_DIR}/bin/${STRIP_TARGET}"
    # Associate stripped binary with debug symbols:
    COMMAND "${OBJCOPY_PATH}" --add-gnu-debuglink "${STRIP_DESTINATION_DIR}/lib/debug/bin/${STRIP_TARGET}.debug" "${STRIP_DESTINATION_DIR}/bin/${STRIP_TARGET}"
    COMMENT "Stripping clickhouse binary" VERBATIM
)
@@ -77,6 +77,7 @@ if (OS_LINUX AND NOT LINKER_NAME)

if (NOT LINKER_NAME)
    if (GOLD_PATH)
        message (WARNING "Linking with gold is not recommended. Please use lld.")
        if (COMPILER_GCC)
            set (LINKER_NAME "gold")
        else ()
@@ -705,3 +705,109 @@ target_compile_options(_crypto PRIVATE -Wno-gnu-anonymous-struct)

add_library(OpenSSL::Crypto ALIAS _crypto)
add_library(OpenSSL::SSL ALIAS _ssl)

# Helper function used in the populate_openssl_vars function below
function(from_hex HEX DEC)
    string(TOUPPER "${HEX}" HEX)
    set(_res 0)
    string(LENGTH "${HEX}" _strlen)

    while (_strlen GREATER 0)
        math(EXPR _res "${_res} * 16")
        string(SUBSTRING "${HEX}" 0 1 NIBBLE)
        string(SUBSTRING "${HEX}" 1 -1 HEX)
        if (NIBBLE STREQUAL "A")
            math(EXPR _res "${_res} + 10")
        elseif (NIBBLE STREQUAL "B")
            math(EXPR _res "${_res} + 11")
        elseif (NIBBLE STREQUAL "C")
            math(EXPR _res "${_res} + 12")
        elseif (NIBBLE STREQUAL "D")
            math(EXPR _res "${_res} + 13")
        elseif (NIBBLE STREQUAL "E")
            math(EXPR _res "${_res} + 14")
        elseif (NIBBLE STREQUAL "F")
            math(EXPR _res "${_res} + 15")
        else ()
            math(EXPR _res "${_res} + ${NIBBLE}")
        endif ()

        string(LENGTH "${HEX}" _strlen)
    endwhile ()

    set(${DEC} ${_res} PARENT_SCOPE)
endfunction()

# ClickHouse uses BoringSSL which is a fork of OpenSSL.
# This populates CMAKE var OPENSSL_VERSION from the OPENSSL_VERSION_NUMBER defined
# in contrib/boringssl/include/openssl/base.h. It also sets the CMAKE var OPENSSL_IS_BORING_SSL
# if it's defined in the file. Both OPENSSL_VERSION and OPENSSL_IS_BORING_SSL variables will be
# used to populate flags in the `system.build_options` table for more context on ssl version used.
# This cmake script is adopted from FindOpenSSL cmake module and slightly modified for this use-case.
if (EXISTS "${BORINGSSL_SOURCE_DIR}/include/openssl/base.h")
    file(STRINGS "${BORINGSSL_SOURCE_DIR}/include/openssl/base.h" openssl_version_str
         REGEX "^#[\t ]*define[\t ]+OPENSSL_VERSION_NUMBER[\t ]+0x([0-9a-fA-F])+.*")

    file(STRINGS "${BORINGSSL_SOURCE_DIR}/include/openssl/base.h" openssl_is_boringssl
         REGEX "^#[\t ]*define[\t ]+OPENSSL_IS_BORINGSSL.*")

    # Set to true if OPENSSL_IS_BORING_SSL is defined
    if (openssl_is_boringssl)
        set(OPENSSL_IS_BORING_SSL 1)
    endif ()

    # If openssl_version_str is defined extrapolate and set OPENSSL_VERSION
    if (openssl_version_str)
        # The version number is encoded as 0xMNNFFPPS: major minor fix patch status
        # The status gives if this is a developer or prerelease and is ignored here.
        # Major, minor, and fix directly translate into the version numbers shown in
        # the string. The patch field translates to the single character suffix that
        # indicates the bug fix state, which 00 -> nothing, 01 -> a, 02 -> b and so
        # on.

        string(REGEX REPLACE "^.*OPENSSL_VERSION_NUMBER[\t ]+0x([0-9a-fA-F])([0-9a-fA-F][0-9a-fA-F])([0-9a-fA-F][0-9a-fA-F])([0-9a-fA-F][0-9a-fA-F])([0-9a-fA-F]).*$"
               "\\1;\\2;\\3;\\4;\\5" OPENSSL_VERSION_LIST "${openssl_version_str}")
        list(GET OPENSSL_VERSION_LIST 0 OPENSSL_VERSION_MAJOR)
        list(GET OPENSSL_VERSION_LIST 1 OPENSSL_VERSION_MINOR)
        from_hex("${OPENSSL_VERSION_MINOR}" OPENSSL_VERSION_MINOR)
        list(GET OPENSSL_VERSION_LIST 2 OPENSSL_VERSION_FIX)
        from_hex("${OPENSSL_VERSION_FIX}" OPENSSL_VERSION_FIX)
        list(GET OPENSSL_VERSION_LIST 3 OPENSSL_VERSION_PATCH)

        if (NOT OPENSSL_VERSION_PATCH STREQUAL "00")
            from_hex("${OPENSSL_VERSION_PATCH}" _tmp)
            # 96 is the ASCII code of 'a' minus 1
            math(EXPR OPENSSL_VERSION_PATCH_ASCII "${_tmp} + 96")
            unset(_tmp)
            # Once anyone knows how OpenSSL would call the patch versions beyond 'z'
            # this should be updated to handle that, too. This has not happened yet
            # so it is simply ignored here for now.
            string(ASCII "${OPENSSL_VERSION_PATCH_ASCII}" OPENSSL_VERSION_PATCH_STRING)
        endif ()

        set(OPENSSL_VERSION "${OPENSSL_VERSION_MAJOR}.${OPENSSL_VERSION_MINOR}.${OPENSSL_VERSION_FIX}${OPENSSL_VERSION_PATCH_STRING}")
    else ()
        # Since OpenSSL 3.0.0, the new version format is MAJOR.MINOR.PATCH and
        # a new OPENSSL_VERSION_STR macro contains exactly that
        file(STRINGS "${BORINGSSL_SOURCE_DIR}/include/openssl/base.h" OPENSSL_VERSION_STR
             REGEX "^#[\t ]*define[\t ]+OPENSSL_VERSION_STR[\t ]+\"([0-9])+\\.([0-9])+\\.([0-9])+\".*")
        string(REGEX REPLACE "^.*OPENSSL_VERSION_STR[\t ]+\"([0-9]+\\.[0-9]+\\.[0-9]+)\".*$"
               "\\1" OPENSSL_VERSION_STR "${OPENSSL_VERSION_STR}")

        set(OPENSSL_VERSION "${OPENSSL_VERSION_STR}")

        # Setting OPENSSL_VERSION_MAJOR OPENSSL_VERSION_MINOR and OPENSSL_VERSION_FIX
        string(REGEX MATCHALL "([0-9])+" OPENSSL_VERSION_NUMBER "${OPENSSL_VERSION}")
        list(POP_FRONT OPENSSL_VERSION_NUMBER
             OPENSSL_VERSION_MAJOR
             OPENSSL_VERSION_MINOR
             OPENSSL_VERSION_FIX)

        unset(OPENSSL_VERSION_NUMBER)
        unset(OPENSSL_VERSION_STR)
    endif ()
endif ()

# Set CMAKE variables so that they can be referenced properly from everywhere
set(OPENSSL_VERSION "${OPENSSL_VERSION}" CACHE INTERNAL "")
set(OPENSSL_IS_BORING_SSL "${OPENSSL_IS_BORING_SSL}" CACHE INTERNAL 0)
2  contrib/grpc  vendored
@@ -1 +1 @@
Subproject commit 7eac189a6badddac593580ec2ad1478bd2656fc7
Subproject commit 5e23e96c0c02e451dbb291cf9f66231d02b6cdb6
2  contrib/libunwind  vendored
@@ -1 +1 @@
Subproject commit c4ea9848a697747dfa35325af9b3452f30841685
Subproject commit 5022f30f3e092a54a7c101c335ce5e08769db366
@@ -76,9 +76,7 @@ message (STATUS "LLVM library Directory: ${LLVM_LIBRARY_DIRS}")
message (STATUS "LLVM C++ compiler flags: ${LLVM_CXXFLAGS}")

# ld: unknown option: --color-diagnostics
if (APPLE)
    set (LINKER_SUPPORTS_COLOR_DIAGNOSTICS 0 CACHE INTERNAL "")
endif ()
set (LINKER_SUPPORTS_COLOR_DIAGNOSTICS 0 CACHE INTERNAL "")

# Do not adjust RPATH in llvm, since then it will not be able to find libcxx/libcxxabi/libunwind
set (CMAKE_INSTALL_RPATH "ON")
2  contrib/simdjson  vendored
@@ -1 +1 @@
Subproject commit 8df32cea3359cb30120795da6020b3b73da01d38
Subproject commit de196dd7a3a16e4056b0551ffa3b85c2f52581e1
2  contrib/zstd  vendored
@@ -1 +1 @@
Subproject commit a488ba114ec17ea1054b9057c26a046fc122b3b6
Subproject commit b944db0c451ba1bc6bbd8e201d5f88f9041bf1f9
@@ -50,7 +50,7 @@ GetLibraryVersion("${HEADER_CONTENT}" LIBVER_MAJOR LIBVER_MINOR LIBVER_RELEASE)
MESSAGE(STATUS "ZSTD VERSION ${LIBVER_MAJOR}.${LIBVER_MINOR}.${LIBVER_RELEASE}")

# cd contrib/zstd/lib
# find . -name '*.c' | grep -vP 'deprecated|legacy' | sort | sed 's/^\./ "${LIBRARY_DIR}/"'
# find . -name '*.c' -or -name '*.S' | grep -vP 'deprecated|legacy' | sort | sed 's/^\./ "${LIBRARY_DIR}/"'
SET(Sources
    "${LIBRARY_DIR}/common/debug.c"
    "${LIBRARY_DIR}/common/entropy_common.c"
@@ -73,6 +73,7 @@ SET(Sources
    "${LIBRARY_DIR}/compress/zstd_ldm.c"
    "${LIBRARY_DIR}/compress/zstdmt_compress.c"
    "${LIBRARY_DIR}/compress/zstd_opt.c"
    "${LIBRARY_DIR}/decompress/huf_decompress_amd64.S"
    "${LIBRARY_DIR}/decompress/huf_decompress.c"
    "${LIBRARY_DIR}/decompress/zstd_ddict.c"
    "${LIBRARY_DIR}/decompress/zstd_decompress_block.c"
@@ -85,6 +86,7 @@ SET(Sources
# cd contrib/zstd/lib
# find . -name '*.h' | grep -vP 'deprecated|legacy' | sort | sed 's/^\./ "${LIBRARY_DIR}/"'
SET(Headers
    "${LIBRARY_DIR}/common/bits.h"
    "${LIBRARY_DIR}/common/bitstream.h"
    "${LIBRARY_DIR}/common/compiler.h"
    "${LIBRARY_DIR}/common/cpu.h"
@@ -94,11 +96,13 @@ SET(Headers
    "${LIBRARY_DIR}/common/huf.h"
    "${LIBRARY_DIR}/common/mem.h"
    "${LIBRARY_DIR}/common/pool.h"
    "${LIBRARY_DIR}/common/portability_macros.h"
    "${LIBRARY_DIR}/common/threading.h"
    "${LIBRARY_DIR}/common/xxhash.h"
    "${LIBRARY_DIR}/common/zstd_deps.h"
    "${LIBRARY_DIR}/common/zstd_internal.h"
    "${LIBRARY_DIR}/common/zstd_trace.h"
    "${LIBRARY_DIR}/compress/clevels.h"
    "${LIBRARY_DIR}/compress/hist.h"
    "${LIBRARY_DIR}/compress/zstd_compress_internal.h"
    "${LIBRARY_DIR}/compress/zstd_compress_literals.h"
@@ -1,15 +1,16 @@
# rebuild in #36968
# docker build -t clickhouse/docs-builder .
# nodejs 17 prefers ipv6 and is broken in our environment
FROM node:16.14.2-alpine3.15
FROM node:16-alpine

RUN apk add --no-cache git openssh bash

# TODO: clean before merge!
ARG DOCS_BRANCH=main
# At this point we want to really update /opt/clickhouse-docs
# despite the cached images
ARG CACHE_INVALIDATOR=0

RUN git clone https://github.com/ClickHouse/clickhouse-docs.git \
    --depth=1 --branch=${DOCS_BRANCH} /opt/clickhouse-docs
    --depth=1 --branch=main /opt/clickhouse-docs

WORKDIR /opt/clickhouse-docs

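A note on the `CACHE_INVALIDATOR` argument above: because it sits before the clone layer, passing a fresh value forces Docker to re-run the clone despite layer caching. A minimal sketch of an invocation (the tag matches the comment at the top of the file; the timestamp trick is an assumption, any changing value works):

```bash
# Hypothetical build command: bump CACHE_INVALIDATOR to force a fresh clone
# of clickhouse-docs; DOCS_BRANCH defaults to main.
docker build -t clickhouse/docs-builder \
    --build-arg CACHE_INVALIDATOR=$(date +%s) \
    --build-arg DOCS_BRANCH=main .
```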
@@ -8,8 +8,6 @@ if [ "$GIT_DOCS_BRANCH" ] && ! [ "$GIT_DOCS_BRANCH" == "$GIT_BRANCH" ]; then
    git fetch origin --depth=1 -- "$GIT_DOCS_BRANCH:$GIT_DOCS_BRANCH"
    git checkout "$GIT_DOCS_BRANCH"
else
    # Untracked yarn.lock could cause pull to fail
    git clean -fdx
    # Update docs repo
    git pull
fi
@@ -42,6 +42,7 @@ DATA_DIR="${CLICKHOUSE_DATA_DIR:-/var/lib/clickhouse}"
LOG_DIR="${LOG_DIR:-/var/log/clickhouse-keeper}"
LOG_PATH="${LOG_DIR}/clickhouse-keeper.log"
ERROR_LOG_PATH="${LOG_DIR}/clickhouse-keeper.err.log"
COORDINATION_DIR="${DATA_DIR}/coordination"
COORDINATION_LOG_DIR="${DATA_DIR}/coordination/log"
COORDINATION_SNAPSHOT_DIR="${DATA_DIR}/coordination/snapshots"
CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0}
@@ -49,6 +50,7 @@ CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0}
for dir in "$DATA_DIR" \
    "$LOG_DIR" \
    "$TMP_DIR" \
    "$COORDINATION_DIR" \
    "$COORDINATION_LOG_DIR" \
    "$COORDINATION_SNAPSHOT_DIR"
do
@@ -21,7 +21,9 @@ By default, starting above server instance will be run as default user without p

### connect to it from a native client
```bash
$ docker run -it --rm --link some-clickhouse-server:clickhouse-server clickhouse/clickhouse-client --host clickhouse-server
$ docker run -it --rm --link some-clickhouse-server:clickhouse-server --entrypoint clickhouse-client clickhouse/clickhouse-server --host clickhouse-server
# OR
$ docker exec -it some-clickhouse-server clickhouse-client
```

More information about [ClickHouse client](https://clickhouse.com/docs/en/interfaces/cli/).
@@ -7,22 +7,12 @@ RUN apt-get update -y \
    && env DEBIAN_FRONTEND=noninteractive \
        apt-get install --yes --no-install-recommends \
            python3-requests \
            llvm-9
    && apt-get clean

COPY s3downloader /s3downloader

ENV S3_URL="https://clickhouse-datasets.s3.amazonaws.com"
ENV DATASETS="hits visits"
ENV EXPORT_S3_STORAGE_POLICIES=1

# Download Minio-related binaries
RUN arch=${TARGETARCH:-amd64} \
    && if [ "$arch" = "amd64" ] ; then wget "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio-20220103182258.0.0.x86_64.rpm"; else wget "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio-20220103182258.0.0.aarch64.rpm" ; fi \
    && wget "https://dl.min.io/client/mc/release/linux-${arch}/mc" \
    && chmod +x ./mc
ENV MINIO_ROOT_USER="clickhouse"
ENV MINIO_ROOT_PASSWORD="clickhouse"
COPY setup_minio.sh /

COPY run.sh /
CMD ["/bin/bash", "/run.sh"]
@@ -17,7 +17,7 @@ ln -s /usr/share/clickhouse-test/clickhouse-test /usr/bin/clickhouse-test
# install test configs
/usr/share/clickhouse-test/config/install.sh

./setup_minio.sh
./setup_minio.sh stateful

function start()
{
@@ -1,77 +0,0 @@
#!/bin/bash

# TODO: Make this file shared with stateless tests
#
# Usage for local run:
#
# ./docker/test/stateful/setup_minio.sh ./tests/
#

set -e -x -a -u

rpm2cpio ./minio-20220103182258.0.0.*.rpm | cpio -i --make-directories
find / -name minio
cp ./usr/local/bin/minio ./

ls -lha

mkdir -p ./minio_data

if [ ! -f ./minio ]; then
    echo 'MinIO binary not found, downloading...'

    BINARY_TYPE=$(uname -s | tr '[:upper:]' '[:lower:]')

    wget "https://dl.min.io/server/minio/release/${BINARY_TYPE}-amd64/minio" \
        && chmod +x ./minio \
        && wget "https://dl.min.io/client/mc/release/${BINARY_TYPE}-amd64/mc" \
        && chmod +x ./mc
fi

MINIO_ROOT_USER=${MINIO_ROOT_USER:-clickhouse}
MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-clickhouse}

./minio --version
./minio server --address ":11111" ./minio_data &

i=0
while ! curl -v --silent http://localhost:11111 2>&1 | grep AccessDenied
do
    if [[ $i == 60 ]]; then
        echo "Failed to setup minio"
        exit 0
    fi
    echo "Trying to connect to minio"
    sleep 1
    i=$((i + 1))
done

lsof -i :11111

sleep 5

./mc alias set clickminio http://localhost:11111 clickhouse clickhouse
./mc admin user add clickminio test testtest
./mc admin policy set clickminio readwrite user=test
./mc mb clickminio/test


# Upload data to Minio. By default after unpacking all tests will be in
# /usr/share/clickhouse-test/queries

TEST_PATH=${1:-/usr/share/clickhouse-test}
MINIO_DATA_PATH=${TEST_PATH}/queries/1_stateful/data_minio

# Iterating over globs will cause the redundant FILE variable to be a path to a file, not a filename
# shellcheck disable=SC2045
for FILE in $(ls "${MINIO_DATA_PATH}"); do
    echo "$FILE";
    ./mc cp "${MINIO_DATA_PATH}"/"$FILE" clickminio/test/"$FILE";
done

mkdir -p ~/.aws
cat <<EOT >> ~/.aws/credentials
[default]
aws_access_key_id=clickhouse
aws_secret_access_key=clickhouse
EOT
1 docker/test/stateful/setup_minio.sh (symbolic link)
@@ -0,0 +1 @@
../stateless/setup_minio.sh
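The stateful image now reuses the stateless script through a relative symlink rather than keeping its own copy. For reference, such a link would be created like this (paths exactly as in the diff):

```bash
# Run from docker/test/stateful; the link target is relative to that directory
ln -s ../stateless/setup_minio.sh setup_minio.sh
```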
@@ -5,37 +5,36 @@ FROM clickhouse/test-base:$FROM_TAG

ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.4.20200302/clickhouse-odbc-1.1.4-Linux.tar.gz"

# golang version 1.13 on Ubuntu 20 is enough for tests
RUN apt-get update -y \
    && env DEBIAN_FRONTEND=noninteractive \
        apt-get install --yes --no-install-recommends \
            awscli \
            brotli \
            expect \
            zstd \
            golang \
            lsof \
            mysql-client=8.0* \
            ncdu \
            netcat-openbsd \
            openjdk-11-jre-headless \
            openssl \
            postgresql-client \
            protobuf-compiler \
            python3 \
            python3-lxml \
            python3-pip \
            python3-requests \
            python3-termcolor \
            python3-pip \
            qemu-user-static \
            sqlite3 \
            sudo \
            # golang version 1.13 on Ubuntu 20 is enough for tests
            golang \
            telnet \
            tree \
            unixodbc \
            wget \
            mysql-client=8.0* \
            postgresql-client \
            sqlite3 \
            awscli \
            openjdk-11-jre-headless \
            rpm2cpio \
            cpio
            zstd \
    && apt-get clean


RUN pip3 install numpy scipy pandas Jinja2
@@ -53,13 +52,17 @@ RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
ENV NUM_TRIES=1
ENV MAX_RUN_TIME=0

# Unrelated to vars in setup_minio.sh, but should be the same there
# to have the same binaries for local running scenario
ARG MINIO_SERVER_VERSION=2022-01-03T18-22-58Z
ARG MINIO_CLIENT_VERSION=2022-01-05T23-52-51Z
ARG TARGETARCH

# Download Minio-related binaries
RUN arch=${TARGETARCH:-amd64} \
    && if [ "$arch" = "amd64" ] ; then wget "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio-20220103182258.0.0.x86_64.rpm"; else wget "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio-20220103182258.0.0.aarch64.rpm" ; fi \
    && wget "https://dl.min.io/client/mc/release/linux-${arch}/mc" \
    && chmod +x ./mc
    && wget "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio.RELEASE.${MINIO_SERVER_VERSION}" -O ./minio \
    && wget "https://dl.min.io/client/mc/release/linux-${arch}/archive/mc.RELEASE.${MINIO_CLIENT_VERSION}" -O ./mc \
    && chmod +x ./mc ./minio


RUN wget 'https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz' \
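Since the MinIO releases are now plain build arguments, they can be pinned differently without editing the Dockerfile. A hedged sketch (the image tag and build context path are assumptions):

```bash
# Override the pinned MinIO server/client releases at image build time
docker build \
    --build-arg MINIO_SERVER_VERSION=2022-01-03T18-22-58Z \
    --build-arg MINIO_CLIENT_VERSION=2022-01-05T23-52-51Z \
    -t clickhouse/stateless-test docker/test/stateless
```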
@@ -18,7 +18,7 @@ ln -s /usr/share/clickhouse-test/clickhouse-test /usr/bin/clickhouse-test
# install test configs
/usr/share/clickhouse-test/config/install.sh

./setup_minio.sh
./setup_minio.sh stateless
./setup_hdfs_minicluster.sh

# For flaky check we also enable thread fuzzer
@@ -1,29 +1,41 @@
#!/bin/bash

# Usage for local run:
#
# ./docker/test/stateless/setup_minio.sh ./tests/
#
USAGE='Usage for local run:

./docker/test/stateless/setup_minio.sh { stateful | stateless } ./tests/

'

set -e -x -a -u

rpm2cpio ./minio-20220103182258.0.0.*.rpm | cpio -i --make-directories
find / -name minio
cp ./usr/local/bin/minio ./
TEST_TYPE="$1"
shift

case $TEST_TYPE in
    stateless) QUERY_DIR=0_stateless ;;
    stateful) QUERY_DIR=1_stateful ;;
    *) echo "unknown test type $TEST_TYPE"; echo "${USAGE}"; exit 1 ;;
esac

ls -lha

mkdir -p ./minio_data

if [ ! -f ./minio ]; then
    MINIO_SERVER_VERSION=${MINIO_SERVER_VERSION:-2022-01-03T18-22-58Z}
    MINIO_CLIENT_VERSION=${MINIO_CLIENT_VERSION:-2022-01-05T23-52-51Z}
    case $(uname -m) in
        x86_64) BIN_ARCH=amd64 ;;
        aarch64) BIN_ARCH=arm64 ;;
        *) echo "unknown architecture $(uname -m)"; exit 1 ;;
    esac
    echo 'MinIO binary not found, downloading...'

    BINARY_TYPE=$(uname -s | tr '[:upper:]' '[:lower:]')

    wget "https://dl.min.io/server/minio/release/${BINARY_TYPE}-amd64/minio" \
        && chmod +x ./minio \
        && wget "https://dl.min.io/client/mc/release/${BINARY_TYPE}-amd64/mc" \
        && chmod +x ./mc
    wget "https://dl.min.io/server/minio/release/${BINARY_TYPE}-${BIN_ARCH}/archive/minio.RELEASE.${MINIO_SERVER_VERSION}" -O ./minio \
        && wget "https://dl.min.io/client/mc/release/${BINARY_TYPE}-${BIN_ARCH}/archive/mc.RELEASE.${MINIO_CLIENT_VERSION}" -O ./mc \
        && chmod +x ./mc ./minio
fi

MINIO_ROOT_USER=${MINIO_ROOT_USER:-clickhouse}
@@ -52,14 +64,16 @@ sleep 5
./mc admin user add clickminio test testtest
./mc admin policy set clickminio readwrite user=test
./mc mb clickminio/test
./mc policy set public clickminio/test
if [ "$TEST_TYPE" = "stateless" ]; then
    ./mc policy set public clickminio/test
fi


# Upload data to Minio. By default after unpacking all tests will be in
# /usr/share/clickhouse-test/queries

TEST_PATH=${1:-/usr/share/clickhouse-test}
MINIO_DATA_PATH=${TEST_PATH}/queries/0_stateless/data_minio
MINIO_DATA_PATH=${TEST_PATH}/queries/${QUERY_DIR}/data_minio

# Iterating over globs will cause the redundant FILE variable to be a path to a file, not a filename
# shellcheck disable=SC2045
@@ -71,6 +85,6 @@ done
mkdir -p ~/.aws
cat <<EOT >> ~/.aws/credentials
[default]
aws_access_key_id=clickhouse
aws_secret_access_key=clickhouse
aws_access_key_id=${MINIO_ROOT_USER}
aws_secret_access_key=${MINIO_ROOT_PASSWORD}
EOT
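Putting the new dispatch logic together, a local run of the merged script would look like this (paths assumed to be relative to a repository checkout, as in the script's own usage string):

```bash
# The first argument selects the query directory (0_stateless or 1_stateful);
# after the shift, the next argument is the tests tree and defaults to
# /usr/share/clickhouse-test when omitted.
./docker/test/stateless/setup_minio.sh stateless ./tests/
./docker/test/stateless/setup_minio.sh stateful ./tests/
```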
@@ -174,7 +174,7 @@ install_packages package_folder

configure

./setup_minio.sh
./setup_minio.sh stateful # to have a proper environment

start

@@ -8,16 +8,16 @@ ARG apt_archive="http://archive.ubuntu.com"
RUN sed -i "s|http://archive.ubuntu.com|$apt_archive|g" /etc/apt/sources.list

RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \
    aspell \
    curl \
    git \
    libxml2-utils \
    moreutils \
    pylint \
    python3-fuzzywuzzy \
    python3-pip \
    shellcheck \
    yamllint \
    && pip3 install black boto3 codespell dohq-artifactory PyGithub unidiff
    && pip3 install black boto3 codespell dohq-artifactory PyGithub unidiff pylint==2.6.2

# Architecture of the image when BuildKit/buildx is used
ARG TARGETARCH
@@ -18,6 +18,7 @@ def process_result(result_folder):
        ("typos", "typos_output.txt"),
        ("whitespaces", "whitespaces_output.txt"),
        ("workflows", "workflows_output.txt"),
        ("doc typos", "doc_spell_output.txt"),
    )

    for name, out_file in checks:
@@ -11,6 +11,8 @@ echo "Check python formatting with black" | ts
./check-black -n |& tee /test_output/black_output.txt
echo "Check typos" | ts
./check-typos |& tee /test_output/typos_output.txt
echo "Check docs spelling" | ts
./check-doc-aspell |& tee /test_output/doc_spell_output.txt
echo "Check whitespaces" | ts
./check-whitespaces -n |& tee /test_output/whitespaces_output.txt
echo "Check workflows" | ts
@@ -11,6 +11,7 @@ TIMEOUT_SIGN = "[ Timeout! "
UNKNOWN_SIGN = "[ UNKNOWN "
SKIPPED_SIGN = "[ SKIPPED "
HUNG_SIGN = "Found hung queries in processlist"
DATABASE_SIGN = "Database: "

SUCCESS_FINISH_SIGNS = ["All tests have finished", "No tests were run"]

@@ -27,14 +28,19 @@ def process_test_log(log_path):
    retries = False
    success_finish = False
    test_results = []
    test_end = True
    with open(log_path, "r") as test_file:
        for line in test_file:
            original_line = line
            line = line.strip()

            if any(s in line for s in SUCCESS_FINISH_SIGNS):
                success_finish = True
            # Ignore hung check report, since it may be quite large.
            # (and may break python parser which has limit of 128KiB for each row).
            if HUNG_SIGN in line:
                hung = True
                break
            if RETRIES_SIGN in line:
                retries = True
            if any(
@@ -67,8 +73,17 @@
            else:
                success += int(OK_SIGN in line)
                test_results.append((test_name, "OK", test_time, []))
            elif len(test_results) > 0 and test_results[-1][1] == "FAIL":
                test_end = False
            elif (
                len(test_results) > 0 and test_results[-1][1] == "FAIL" and not test_end
            ):
                test_results[-1][3].append(original_line)
            # Database printed after everything else in case of failures,
            # so this is a stop marker for capturing test output.
            #
            # And it is handled after everything else to include line with database into the report.
            if DATABASE_SIGN in line:
                test_end = True

    test_results = [
        (test[0], test[1], test[2], "".join(test[3])) for test in test_results
@@ -113,7 +128,7 @@ def process_result(result_path):
        test_results,
    ) = process_test_log(result_path)
    is_flacky_check = 1 < int(os.environ.get("NUM_TRIES", 1))
    logging.info("Is flacky check: %s", is_flacky_check)
    logging.info("Is flaky check: %s", is_flacky_check)
    # If no tests were run (success == 0) it indicates an error (e.g. server did not start or crashed immediately)
    # But it's Ok for "flaky checks" - they can contain just one test for check which is marked as skipped.
    if failed != 0 or unknown != 0 or (success == 0 and (not is_flacky_check)):
@@ -138,7 +138,7 @@ It's important to name tests correctly, so one could turn some tests subset off

| Tester flag| What should be in test name | When flag should be added |
|---|---|---|
| `--[no-]zookeeper`| "zookeeper" or "replica" | Test uses tables from ReplicatedMergeTree family |
| `--[no-]zookeeper`| "zookeeper" or "replica" | Test uses tables from `ReplicatedMergeTree` family |
| `--[no-]shard` | "shard" or "distributed" or "global"| Test using connections to 127.0.0.2 or similar |
| `--[no-]long` | "long" or "deadlock" or "race" | Test runs longer than 60 seconds |

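For illustration, this is roughly how those flags drive selection; the test file name below is hypothetical:

```bash
# A test named 01234_zookeeper_replica_recovery.sql would be skipped by:
clickhouse-test --no-zookeeper
# and any test with "long" in its name by:
clickhouse-test --no-long
```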
@@ -5,7 +5,7 @@ sidebar_position: 62

# Overview of ClickHouse Architecture

ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during query execution it is processed by arrays (vectors or chunks of columns).
Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called “vectorized query execution” and it helps lower the cost of actual data processing.

> This idea is nothing new. It dates back to the `APL` (A programming language, 1957) and its descendants: `A +` (APL dialect), `J` (1990), `K` (1993), and `Q` (programming language from Kx Systems, 2003). Array programming is used in scientific data processing. Neither is this idea something new in relational databases: for example, it is used in the `VectorWise` system (also known as Actian Vector Analytic Database by Actian Corporation).
@@ -149,13 +149,13 @@ The server implements several different interfaces:
- A TCP interface for the native ClickHouse client and for cross-server communication during distributed query execution.
- An interface for transferring data for replication.

Internally, it is just a primitive multithreaded server without coroutines or fibers. Since the server is not designed to process a high rate of simple queries but to process a relatively low rate of complex queries, each of them can process a vast amount of data for analytics.
Internally, it is just a primitive multithread server without coroutines or fibers. Since the server is not designed to process a high rate of simple queries but to process a relatively low rate of complex queries, each of them can process a vast amount of data for analytics.

The server initializes the `Context` class with the necessary environment for query execution: the list of available databases, users and access rights, settings, clusters, the process list, the query log, and so on. Interpreters use this environment.

We maintain full backward and forward compatibility for the server TCP protocol: old clients can talk to new servers, and new clients can talk to old servers. But we do not want to maintain it eternally, and we are removing support for old versions after about one year.

:::note
For most external applications, we recommend using the HTTP interface because it is simple and easy to use. The TCP protocol is more tightly linked to internal data structures: it uses an internal format for passing blocks of data, and it uses custom framing for compressed data. We haven’t released a C library for that protocol because it requires linking most of the ClickHouse codebase, which is not practical.
:::

@@ -178,7 +178,7 @@ To execute queries and do side activities ClickHouse allocates threads from one

Server pool is a `Poco::ThreadPool` class instance defined in `Server::main()` method. It can have at most `max_connection` threads. Every thread is dedicated to a single active connection.

Global thread pool is `GlobalThreadPool` singleton class. To allocate thread from it `ThreadFromGlobalPool` is used. It has an interface similar to `std::thread`, but pulls thread from the global pool and does all necessary initializations. It is configured with the following settings:
Global thread pool is `GlobalThreadPool` singleton class. To allocate thread from it `ThreadFromGlobalPool` is used. It has an interface similar to `std::thread`, but pulls thread from the global pool and does all necessary initialization. It is configured with the following settings:
* `max_thread_pool_size` - limit on thread count in pool.
* `max_thread_pool_free_size` - limit on idle thread count waiting for new jobs.
* `thread_pool_queue_size` - limit on scheduled job count.
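These are server-level settings. A minimal sketch of overriding them through a `config.d` drop-in follows; the values shown are the documented defaults, and the file path is an assumption about a typical installation:

```bash
# Hypothetical override; adjust the values to the workload
cat > /etc/clickhouse-server/config.d/thread_pool.xml <<'EOF'
<clickhouse>
    <max_thread_pool_size>10000</max_thread_pool_size>
    <max_thread_pool_free_size>1000</max_thread_pool_free_size>
    <thread_pool_queue_size>10000</thread_pool_queue_size>
</clickhouse>
EOF
```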
@@ -189,7 +189,7 @@ IO thread pool is implemented as a plain `ThreadPool` accessible via `IOThreadPo

For periodic task execution there is `BackgroundSchedulePool` class. You can register tasks using `BackgroundSchedulePool::TaskHolder` objects and the pool ensures that no task runs two jobs at the same time. It also allows you to postpone task execution to a specific instant in the future or temporarily deactivate task. Global `Context` provides a few instances of this class for different purposes. For general purpose tasks `Context::getSchedulePool()` is used.

There are also specialized thread pools for preemptable tasks. Such `IExecutableTask` task can be split into ordered sequence of jobs, called steps. To schedule these tasks in a manner allowing short tasks to be prioritied over long ones `MergeTreeBackgroundExecutor` is used. As name suggests it is used for background MergeTree related operations such as merges, mutations, fetches and moves. Pool instances are available using `Context::getCommonExecutor()` and other similar methods.
There are also specialized thread pools for preemptable tasks. Such `IExecutableTask` task can be split into ordered sequence of jobs, called steps. To schedule these tasks in a manner allowing short tasks to be prioritized over long ones `MergeTreeBackgroundExecutor` is used. As name suggests it is used for background MergeTree related operations such as merges, mutations, fetches and moves. Pool instances are available using `Context::getCommonExecutor()` and other similar methods.

No matter what pool is used for a job, at start `ThreadStatus` instance is created for this job. It encapsulates all per-thread information: thread id, query id, performance counters, resource consumption and many other useful data. Job can access it via thread local pointer by `CurrentThread::get()` call, so we do not need to pass it to every function.

@@ -201,7 +201,7 @@ Servers in a cluster setup are mostly independent. You can create a `Distributed

Things become more complicated when you have subqueries in IN or JOIN clauses, and each of them uses a `Distributed` table. We have different strategies for the execution of these queries.

There is no global query plan for distributed query execution. Each node has its local query plan for its part of the job. We only have simple one-pass distributed query execution: we send queries for remote nodes and then merge the results. But this is not feasible for complicated queries with high cardinality GROUP BYs or with a large amount of temporary data for JOIN. In such cases, we need to “reshuffle” data between servers, which requires additional coordination. ClickHouse does not support that kind of query execution, and we need to work on it.
There is no global query plan for distributed query execution. Each node has its local query plan for its part of the job. We only have simple one-pass distributed query execution: we send queries for remote nodes and then merge the results. But this is not feasible for complicated queries with high cardinality `GROUP BY`s or with a large amount of temporary data for JOIN. In such cases, we need to “reshuffle” data between servers, which requires additional coordination. ClickHouse does not support that kind of query execution, and we need to work on it.

## Merge Tree {#merge-tree}

@@ -231,7 +231,7 @@ Replication is physical: only compressed parts are transferred between nodes, no

Besides, each replica stores its state in ZooKeeper as the set of parts and its checksums. When the state on the local filesystem diverges from the reference state in ZooKeeper, the replica restores its consistency by downloading missing and broken parts from other replicas. When there is some unexpected or broken data in the local filesystem, ClickHouse does not remove it, but moves it to a separate directory and forgets it.

:::note
The ClickHouse cluster consists of independent shards, and each shard consists of replicas. The cluster is **not elastic**, so after adding a new shard, data is not rebalanced between shards automatically. Instead, the cluster load is supposed to be adjusted to be uneven. This implementation gives you more control, and it is ok for relatively small clusters, such as tens of nodes. But for clusters with hundreds of nodes that we are using in production, this approach becomes a significant drawback. We should implement a table engine that spans across the cluster with dynamically replicated regions that could be split and balanced between clusters automatically.
:::

@@ -4,7 +4,7 @@ sidebar_label: Build on Mac OS X
description: How to build ClickHouse on Mac OS X
---

# How to Build ClickHouse on Mac OS X

:::info You don't have to build ClickHouse yourself!
You can install pre-built ClickHouse as described in [Quick Start](https://clickhouse.com/#quick-start). Follow **macOS (Intel)** or **macOS (Apple silicon)** installation instructions.
@@ -20,9 +20,9 @@ It is also possible to compile with Apple's XCode `apple-clang` or Homebrew's `g

First install [Homebrew](https://brew.sh/)

## For Apple's Clang (discouraged): Install Xcode and Command Line Tools {#install-xcode-and-command-line-tools}
## For Apple's Clang (discouraged): Install XCode and Command Line Tools {#install-xcode-and-command-line-tools}

Install the latest [Xcode](https://apps.apple.com/am/app/xcode/id497799835?mt=12) from App Store.
Install the latest [XCode](https://apps.apple.com/am/app/xcode/id497799835?mt=12) from App Store.

Open it at least once to accept the end-user license agreement and automatically install the required components.

@@ -62,7 +62,7 @@ cmake --build build
# The resulting binary will be created at: build/programs/clickhouse
```

To build using Xcode's native AppleClang compiler in Xcode IDE (this option is only for development builds and workflows, and is **not recommended** unless you know what you are doing):
To build using XCode native AppleClang compiler in XCode IDE (this option is only for development builds and workflows, and is **not recommended** unless you know what you are doing):

``` bash
cd ClickHouse
@@ -71,7 +71,7 @@ mkdir build
cd build
XCODE_IDE=1 ALLOW_APPLECLANG=1 cmake -G Xcode -DCMAKE_BUILD_TYPE=Debug -DENABLE_JEMALLOC=OFF ..
cmake --open .
# ...then, in Xcode IDE select ALL_BUILD scheme and start the building process.
# ...then, in XCode IDE select ALL_BUILD scheme and start the building process.
# The resulting binary will be created at: ./programs/Debug/clickhouse
```

@@ -91,9 +91,9 @@ cmake --build build

## Caveats {#caveats}

If you intend to run `clickhouse-server`, make sure to increase the system’s maxfiles variable.
If you intend to run `clickhouse-server`, make sure to increase the system’s `maxfiles` variable.

:::note
You’ll need to use sudo.
:::

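A hedged sketch of raising the limit for the current boot; the values are illustrative, and macOS versions differ in how persistent limits are configured:

```bash
# Requires sudo; applies until reboot
sudo launchctl limit maxfiles 524288 524288
# Raise the shell's own soft limit before starting the server
ulimit -n 524288
```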
@@ -19,7 +19,7 @@ The following tutorial is based on the Ubuntu Linux system. With appropriate cha
### Install Git, CMake, Python and Ninja {#install-git-cmake-python-and-ninja}

``` bash
sudo apt-get install git cmake python ninja-build
sudo apt-get install git cmake ccache python3 ninja-build
```

Or cmake3 instead of cmake on older systems.
@@ -130,7 +130,7 @@ Here is an example of how to install the new `cmake` from the official website:
```
wget https://github.com/Kitware/CMake/releases/download/v3.22.2/cmake-3.22.2-linux-x86_64.sh
chmod +x cmake-3.22.2-linux-x86_64.sh
./cmake-3.22.2-linux-x86_64.sh
export PATH=/home/milovidov/work/cmake-3.22.2-linux-x86_64/bin/:${PATH}
hash cmake
```
@@ -163,7 +163,7 @@ ClickHouse is available in pre-built binaries and packages. Binaries are portabl

They are built for stable, prestable and testing releases, as well as for every commit to master and for every pull request.

To find the freshest build from `master`, go to [commits page](https://github.com/ClickHouse/ClickHouse/commits/master), click on the first green checkmark or red cross near commit, and click to the “Details” link right after “ClickHouse Build Check”.
To find the freshest build from `master`, go to [commits page](https://github.com/ClickHouse/ClickHouse/commits/master), click on the first green check mark or red cross near commit, and click to the “Details” link right after “ClickHouse Build Check”.

## Faster builds for development: Split build configuration {#split-build}

@@ -19,7 +19,7 @@ cmake .. \

## CMake files types

1. ClickHouse's source CMake files (located in the root directory and in /src).
1. ClickHouse source CMake files (located in the root directory and in /src).
2. Arch-dependent CMake files (located in /cmake/*os_name*).
3. Libraries finders (search for contrib libraries, located in /contrib/*/CMakeLists.txt).
4. Contrib build CMake files (used instead of libraries' own CMake files, located in /cmake/modules)
@@ -456,7 +456,7 @@ option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests"

#### If the option's state could produce unwanted (or unusual) result, explicitly warn the user.

Suppose you have an option that may strip debug symbols from the ClickHouse's part.
Suppose you have an option that may strip debug symbols from the ClickHouse part.
This can speed up the linking process, but produces a binary that cannot be debugged.
In that case, prefer explicitly raising a warning telling the developer that he may be doing something wrong.
Also, such options should be disabled if applicable.
@@ -31,7 +31,7 @@ If you are not sure what to do, ask a maintainer for help.

## Merge With Master

Verifies that the PR can be merged to master. If not, it will fail with the
message 'Cannot fetch mergecommit'. To fix this check, resolve the conflict as
message `Cannot fetch mergecommit`. To fix this check, resolve the conflict as
described in the [GitHub
documentation](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/resolving-a-merge-conflict-on-github),
or merge the `master` branch to your pull request branch using git.
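One way to do the git-based resolution locally, assuming the official repository is configured as an `upstream` remote and `my-feature` is your branch (both names are illustrative):

```bash
git fetch upstream
git merge upstream/master   # resolve conflicts, then: git add ... && git commit
git push origin my-feature
```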
@@ -57,7 +57,7 @@ You have to specify a changelog category for your change (e.g., Bug Fix), and
write a user-readable message describing the change for [CHANGELOG.md](../whats-new/changelog/)


## Push To Dockerhub
## Push To DockerHub

Builds docker images used for build and tests, then pushes them to DockerHub.

@@ -118,7 +118,7 @@ Builds ClickHouse in various configurations for use in further steps. You have t
- **Compiler**: `gcc-9` or `clang-10` (or `clang-10-xx` for other architectures e.g. `clang-10-freebsd`).
- **Build type**: `Debug` or `RelWithDebInfo` (cmake).
- **Sanitizer**: `none` (without sanitizers), `address` (ASan), `memory` (MSan), `undefined` (UBSan), or `thread` (TSan).
- **Splitted** `splitted` is a [split build](../development/build.md#split-build)
- **Split** `splitted` is a [split build](../development/build.md#split-build)
- **Status**: `success` or `fail`
- **Build log**: link to the building and files copying log, useful when build failed.
- **Build time**.
@@ -96,9 +96,9 @@ SELECT library_name, license_type, license_path FROM system.licenses ORDER BY li

## Adding new third-party libraries and maintaining patches in third-party libraries {#adding-third-party-libraries}

1. Each third-party libary must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository. Avoid dumps/copies of external code, instead use Git's submodule feature to pull third-party code from an external upstream repository.
2. Submodules are listed in `.gitmodule`. If the external library can be used as-is, you may reference the upstream repository directly. Otherwise, i.e. the external libary requires patching/customization, create a fork of the official repository in the [Clickhouse organization in GitHub](https://github.com/ClickHouse).
1. Each third-party library must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository. Avoid dumps/copies of external code, instead use Git submodule feature to pull third-party code from an external upstream repository.
2. Submodules are listed in `.gitmodule`. If the external library can be used as-is, you may reference the upstream repository directly. Otherwise, i.e. the external library requires patching/customization, create a fork of the official repository in the [Clickhouse organization in GitHub](https://github.com/ClickHouse).
3. In the latter case, create a branch with `clickhouse/` prefix from the branch you want to integrate, e.g. `clickhouse/master` (for `master`) or `clickhouse/release/vX.Y.Z` (for a `release/vX.Y.Z` tag). The purpose of this branch is to isolate customization of the library from upstream work. For example, pulls from the upstream repository into the fork will leave all `clickhouse/` branches unaffected. Submodules in `contrib/` must only track `clickhouse/` branches of forked third-party repositories.
4. To patch a fork of a third-party library, create a dedicated branch with `clickhouse/` prefix in the fork, e.g. `clickhouse/fix-some-desaster`. Finally, merge the patch branch into the custom tracking branch (e.g. `clickhouse/master` or `clickhouse/release/vX.Y.Z`) using a PR.
5. Always create patches of third-party libraries with the official repository in mind. Once a PR of a patch branch to the `clickhouse/` branch in the fork repository is done and the submodule version in ClickHouse's official repository is bumped, consider opening another PR from the patch branch to the upstream library repository. This ensures, that 1) the contribution has more than a single use case and importance, 2) others will also benefit from it, 3) the change will not remain a maintenance burden solely on ClickHouse developers.
5. Always create patches of third-party libraries with the official repository in mind. Once a PR of a patch branch to the `clickhouse/` branch in the fork repository is done and the submodule version in ClickHouse official repository is bumped, consider opening another PR from the patch branch to the upstream library repository. This ensures, that 1) the contribution has more than a single use case and importance, 2) others will also benefit from it, 3) the change will not remain a maintenance burden solely on ClickHouse developers.
9. To update a submodule with changes in the upstream repository, first merge upstream `master` (or a new `versionX.Y.Z` tag) into the `clickhouse`-tracking branch in the fork repository. Conflicts with patches/customization will need to be resolved in this merge (see Step 4.). Once the merge is done, bump the submodule in ClickHouse to point to the new hash in the fork.
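A sketch of the final bump from step 9 in shell form; `contrib/example-lib` stands in for a real submodule:

```bash
cd contrib/example-lib
git fetch origin
git checkout clickhouse/master   # the fork's tracking branch, already merged with upstream
cd ../..
git add contrib/example-lib
git commit -m "Bump example-lib submodule"
```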
@@ -70,7 +70,7 @@ You can also clone the repository via https protocol:

This, however, will not let you send your changes to the server. You can still use it temporarily and add the SSH keys later replacing the remote address of the repository with `git remote` command.

You can also add original ClickHouse repo’s address to your local repository to pull updates from there:
You can also add original ClickHouse repo address to your local repository to pull updates from there:

    git remote add upstream git@github.com:ClickHouse/ClickHouse.git

@@ -177,7 +177,7 @@ If you require to build all the binaries (utilities and tests), you should run n

Full build requires about 30GB of free disk space or 15GB to build the main binaries.

When a large amount of RAM is available on build machine you should limit the number of build tasks run in parallel with `-j` param:
When a large amount of RAM is available on build machine you should limit the number of build tasks run in parallel with `-j` parameter:

    ninja -j 1 clickhouse-server clickhouse-client

@@ -269,7 +269,7 @@ Developing ClickHouse often requires loading realistic datasets. It is particula

Navigate to your fork repository in GitHub’s UI. If you have been developing in a branch, you need to select that branch. There will be a “Pull request” button located on the screen. In essence, this means “create a request for accepting my changes into the main repository”.

A pull request can be created even if the work is not completed yet. In this case please put the word “WIP” (work in progress) at the beginning of the title, it can be changed later. This is useful for cooperative reviewing and discussion of changes as well as for running all of the available tests. It is important that you provide a brief description of your changes, it will later be used for generating release changelogs.
A pull request can be created even if the work is not completed yet. In this case please put the word “WIP” (work in progress) at the beginning of the title, it can be changed later. This is useful for cooperative reviewing and discussion of changes as well as for running all of the available tests. It is important that you provide a brief description of your changes, it will later be used for generating release changelog.

Testing will commence as soon as ClickHouse employees label your PR with a tag “can be tested”. The results of some first checks (e.g. code style) will come in within several minutes. Build check results will arrive within half an hour. And the main set of tests will report itself within an hour.

@@ -2,7 +2,7 @@

Rust library integration will be described based on BLAKE3 hash-function integration.

The first step is forking a library and making neccessary changes for Rust and C/C++ compatibility.
The first step is forking a library and making necessary changes for Rust and C/C++ compatibility.

After forking library repository you need to change target settings in Cargo.toml file. Firstly, you need to switch build to static library. Secondly, you need to add cbindgen crate to the crate list. We will use it later to generate C-header automatically.

@@ -51,9 +51,9 @@ pub unsafe extern "C" fn blake3_apply_shim(
}
```

This method gets C-compatible string, its size and output string pointer as input. Then, it converts C-compatible inputs into types that are used by actual library methods and calls them. After that, it should convert library methods' outputs back into C-compatible type. In that particular case library supported direct writing into pointer by method fill(), so the convertion was not needed. The main advice here is to create less methods, so you will need to do less convertions on each method call and won't create much overhead.
This method gets C-compatible string, its size and output string pointer as input. Then, it converts C-compatible inputs into types that are used by actual library methods and calls them. After that, it should convert library methods' outputs back into C-compatible type. In that particular case library supported direct writing into pointer by method fill(), so the conversion was not needed. The main advice here is to create less methods, so you will need to do less conversions on each method call and won't create much overhead.

Also, you should use attribute #[no_mangle] and extern "C" for every C-compatible attribute. Without it library can compile incorrectly and cbindgen won't launch header autogeneration.
Also, you should use attribute #[no_mangle] and `extern "C"` for every C-compatible attribute. Without it library can compile incorrectly and cbindgen won't launch header autogeneration.

After all these steps you can test your library in a small project to find all problems with compatibility or header generation. If any problems occur during header generation, you can try to configure it with cbindgen.toml file (you can find an example of it in BLAKE3 directory or a template here: [https://github.com/eqrion/cbindgen/blob/master/template.toml](https://github.com/eqrion/cbindgen/blob/master/template.toml)). If everything works correctly, you can finally integrate its methods into ClickHouse.

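For the header-generation step mentioned above, a plausible cbindgen invocation looks like this; the output path is an assumption, and the crate name follows the BLAKE3 example:

```bash
cargo install cbindgen
# Reads cbindgen.toml next to Cargo.toml and emits a C header for the crate
cbindgen --config cbindgen.toml --crate blake3 --output include/blake3.h
```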
@@ -4,7 +4,7 @@ sidebar_label: C++ Guide
description: A list of recommendations regarding coding style, naming convention, formatting and more
---

# How to Write C++ Code

## General Recommendations {#general-recommendations}

@@ -196,7 +196,7 @@ std::cerr << static_cast<int>(c) << std::endl;

The same is true for small methods in any classes or structs.

For templated classes and structs, do not separate the method declarations from the implementation (because otherwise they must be defined in the same translation unit).
For template classes and structs, do not separate the method declarations from the implementation (because otherwise they must be defined in the same translation unit).

**31.** You can wrap lines at 140 characters, instead of 80.

@@ -285,7 +285,7 @@ Note: You can use Doxygen to generate documentation from these comments. But Dox
/// WHAT THE FAIL???
```

**14.** Do not use comments to make delimeters.
**14.** Do not use comments to make delimiters.

``` cpp
///******************************************************
@@ -491,7 +491,7 @@ if (0 != close(fd))
    throwFromErrno("Cannot close file " + file_name, ErrorCodes::CANNOT_CLOSE_FILE);
```

You can use assert to check invariants in code.
You can use assert to check invariant in code.

**4.** Exception types.

@@ -552,9 +552,9 @@ Do not try to implement lock-free data structures unless it is your primary area

In most cases, prefer references.

**10.** const.
**10.** `const`.

Use constant references, pointers to constants, `const_iterator`, and const methods.
Use constant references, pointers to constants, `const_iterator`, and `const` methods.

Consider `const` to be default and use non-`const` only when necessary.

@@ -596,7 +596,7 @@ public:
    AggregateFunctionPtr get(const String & name, const DataTypes & argument_types) const;
```

**15.** namespace.
**15.** `namespace`.

There is no need to use a separate `namespace` for application code.

@@ -606,7 +606,7 @@ For medium to large libraries, put everything in a `namespace`.

In the library’s `.h` file, you can use `namespace detail` to hide implementation details not needed for the application code.

In a `.cpp` file, you can use a `static` or anonymous namespace to hide symbols.
In a `.cpp` file, you can use a `static` or anonymous `namespace` to hide symbols.

Also, a `namespace` can be used for an `enum` to prevent the corresponding names from falling into an external `namespace` (but it’s better to use an `enum class`).

@@ -4,7 +4,7 @@ sidebar_label: Testing
description: Most of ClickHouse features can be tested with functional tests and they are mandatory to use for every change in ClickHouse code that can be tested that way.
---

# ClickHouse Testing

## Functional Tests

@@ -85,7 +85,7 @@ Performance tests allow to measure and compare performance of some isolated part

Each test runs one or multiple queries (possibly with combinations of parameters) in a loop.

If you want to improve performance of ClickHouse in some scenario, and if improvements can be observed on simple queries, it is highly recommended to write a performance test. It always makes sense to use `perf top` or other perf tools during your tests.
If you want to improve performance of ClickHouse in some scenario, and if improvements can be observed on simple queries, it is highly recommended to write a performance test. It always makes sense to use `perf top` or other `perf` tools during your tests.

## Test Tools and Scripts {#test-tools-and-scripts}

@@ -228,7 +228,7 @@ Our Security Team did some basic overview of ClickHouse capabilities from the se

We run `clang-tidy` on per-commit basis. `clang-static-analyzer` checks are also enabled. `clang-tidy` is also used for some style checks.

We have evaluated `clang-tidy`, `Coverity`, `cppcheck`, `PVS-Studio`, `tscancode`, `CodeQL`. You will find instructions for usage in `tests/instructions/` directory.

If you use `CLion` as an IDE, you can leverage some `clang-tidy` checks out of the box.

@@ -244,7 +244,7 @@ In debug build we also involve a customization of libc that ensures that no "har

Debug assertions are used extensively.

In debug build, if exception with "logical error" code (implies a bug) is being thrown, the program is terminated prematurally. It allows to use exceptions in release build but make it an assertion in debug build.
In debug build, if exception with "logical error" code (implies a bug) is being thrown, the program is terminated prematurely. It allows to use exceptions in release build but make it an assertion in debug build.

Debug version of jemalloc is used for debug builds.
Debug version of libc++ is used for debug builds.
@@ -253,7 +253,7 @@ Debug version of libc++ is used for debug builds.

Data stored on disk is checksummed. Data in MergeTree tables is checksummed in three ways simultaneously (compressed data blocks, uncompressed data blocks, the total checksum across blocks). Data transferred over network between client and server or between servers is also checksummed. Replication ensures bit-identical data on replicas.

It is required to protect from faulty hardware (bit rot on storage media, bit flips in RAM on server, bit flips in RAM of network controller, bit flips in RAM of network switch, bit flips in RAM of client, bit flips on the wire). Note that bit flips are common and likely to occur even for ECC RAM and in presense of TCP checksums (if you manage to run thousands of servers processing petabytes of data each day). [See the video (russian)](https://www.youtube.com/watch?v=ooBAQIe0KlQ).
It is required to protect from faulty hardware (bit rot on storage media, bit flips in RAM on server, bit flips in RAM of network controller, bit flips in RAM of network switch, bit flips in RAM of client, bit flips on the wire). Note that bit flips are common and likely to occur even for ECC RAM and in presence of TCP checksums (if you manage to run thousands of servers processing petabytes of data each day). [See the video (russian)](https://www.youtube.com/watch?v=ooBAQIe0KlQ).

ClickHouse provides diagnostics that will help ops engineers to find faulty hardware.

@@ -12,7 +12,7 @@ The table engine (type of table) determines:
- Which queries are supported, and how.
- Concurrent data access.
- Use of indexes, if present.
- Whether multithreaded request execution is possible.
- Whether multithread request execution is possible.
- Data replication parameters.

## Engine Families {#engine-families}
@@ -40,7 +40,7 @@ Uniqueness of rows is determined by the `ORDER BY` table section, not `PRIMARY K

When merging, `ReplacingMergeTree` leaves only one row from all the rows with the same sorting key:

- The last in the selection, if `ver` not set. A selection is a set of rows in a set of parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert will remain for each unique sorting key.
- With the maximum version, if `ver` specified.
- With the maximum version, if `ver` specified. If `ver` is the same for several rows, then it will use "if `ver` is not specified" rule for them, i.e. the most recent inserted row will remain.
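A small sketch of the tie-breaking rule described above; the table name is illustrative and `OPTIMIZE ... FINAL` forces the merge for demonstration:

```bash
clickhouse-client --multiquery --query "
CREATE TABLE default.replacing_demo (key UInt32, value String, ver UInt32)
ENGINE = ReplacingMergeTree(ver) ORDER BY key;
INSERT INTO default.replacing_demo VALUES (1, 'first', 1);
INSERT INTO default.replacing_demo VALUES (1, 'second', 1);
-- same key and same ver: after the merge the most recently inserted row remains
OPTIMIZE TABLE default.replacing_demo FINAL;
SELECT * FROM default.replacing_demo;
"
```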

**Query clauses**

@@ -1,78 +1,139 @@
---
sidebar_label: Web Analytics Data
description: Dataset consists of two tables containing anonymized web analytics data with hits and visits
description: Dataset consisting of two tables containing anonymized web analytics data with hits and visits
---

# Anonymized Web Analytics Data

Dataset consists of two tables containing anonymized web analytics data with hits (`hits_v1`) and visits (`visits_v1`).
This dataset consists of two tables containing anonymized web analytics data with hits (`hits_v1`) and visits (`visits_v1`).

The dataset consists of two tables, either of them can be downloaded as a compressed `tsv.xz` file or as prepared partitions. In addition to that, an extended version of the `hits` table containing 100 million rows is available as TSV at https://datasets.clickhouse.com/hits/tsv/hits_100m_obfuscated_v1.tsv.xz and as prepared partitions at https://datasets.clickhouse.com/hits/partitions/hits_100m_obfuscated_v1.tar.xz.
The tables can be downloaded as compressed `tsv.xz` files. In addition to the sample worked with in this document, an extended (7.5GB) version of the `hits` table containing 100 million rows is available as TSV at [https://datasets.clickhouse.com/hits/tsv/hits_100m_obfuscated_v1.tsv.xz](https://datasets.clickhouse.com/hits/tsv/hits_100m_obfuscated_v1.tsv.xz).

## Obtaining Tables from Prepared Partitions {#obtaining-tables-from-prepared-partitions}
## Download and ingest the data

Download and import hits table:

``` bash
curl -O https://datasets.clickhouse.com/hits/partitions/hits_v1.tar
tar xvf hits_v1.tar -C /var/lib/clickhouse # path to ClickHouse data directory
# check permissions on unpacked data, fix if required
sudo service clickhouse-server restart
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
```

Download and import visits:

``` bash
curl -O https://datasets.clickhouse.com/visits/partitions/visits_v1.tar
tar xvf visits_v1.tar -C /var/lib/clickhouse # path to ClickHouse data directory
# check permissions on unpacked data, fix if required
sudo service clickhouse-server restart
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
```

## Obtaining Tables from Compressed TSV File {#obtaining-tables-from-compressed-tsv-file}

### Download the hits compressed TSV file:

``` bash
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
# Validate the checksum
md5sum hits_v1.tsv
# Checksum should be equal to: f3631b6295bf06989c1437491f7592cb
```

### Create the database and table

```bash
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
```

For `hits_v1`:

```bash
clickhouse-client --query "CREATE TABLE datasets.hits_v1 ( WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, URLDomain String, RefererDomain String, Refresh UInt8, IsRobot UInt8, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), UTCEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), RemoteIP UInt32, RemoteIP6 FixedString(16), WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming Int32, DNSTiming Int32, ConnectTiming Int32, ResponseStartTiming Int32, ResponseEndTiming Int32, FetchTiming Int32, RedirectTiming Int32, DOMInteractiveTiming Int32, DOMContentLoadedTiming Int32, DOMCompleteTiming Int32, LoadEventStartTiming Int32, LoadEventEndTiming Int32, NSToDOMContentLoadedTiming Int32, FirstPaintTiming Int32, RedirectCount Int8, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, GoalsReached Array(UInt32), OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32, YCLID UInt64, ShareService String, ShareURL String, ShareTitle String, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), IslandID FixedString(16), RequestNum UInt32, RequestTry UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
```

Or for `hits_100m_obfuscated`:

```bash
clickhouse-client --query="CREATE TABLE default.hits_100m_obfuscated (WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, Refresh UInt8, RefererCategoryID UInt16, RefererRegionID UInt32, URLCategoryID UInt16, URLRegionID UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, OriginalURL String, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), LocalEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, RemoteIP UInt32, WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming UInt32, DNSTiming UInt32, ConnectTiming UInt32, ResponseStartTiming UInt32, ResponseEndTiming UInt32, FetchTiming UInt32, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
```

### Import the hits data:

```bash
cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000
# optionally you can optimize the table
clickhouse-client --query "OPTIMIZE TABLE datasets.hits_v1 FINAL"
```

Verify the count of rows:

```bash
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
```

```response
8873898
```

### Download the visits compressed TSV file:

``` bash
curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
# Validate the checksum
md5sum visits_v1.tsv
# Checksum should be equal to: 6dafe1a0f24e59e3fc2d0fed85601de6
```

### Create the visits table

```bash
clickhouse-client --query "CREATE TABLE datasets.visits_v1 ( CounterID UInt32, StartDate Date, Sign Int8, IsNew UInt8, VisitID UInt64, UserID UInt64, StartTime DateTime, Duration UInt32, UTCStartTime DateTime, PageViews Int32, Hits Int32, IsBounce UInt8, Referer String, StartURL String, RefererDomain String, StartURLDomain String, EndURL String, LinkURL String, IsDownload UInt8, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, PlaceID Int32, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), IsYandex UInt8, GoalReachesDepth Int32, GoalReachesURL Int32, GoalReachesAny Int32, SocialSourceNetworkID UInt8, SocialSourcePage String, MobilePhoneModel String, ClientEventTime DateTime, RegionID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RemoteIP UInt32, RemoteIP6 FixedString(16), IPNetworkID UInt32, SilverlightVersion3 UInt32, CodeVersion UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, UserAgentMajor UInt16, UserAgentMinor UInt16, WindowClientWidth UInt16, WindowClientHeight UInt16, SilverlightVersion2 UInt8, SilverlightVersion4 UInt16, FlashVersion3 UInt16, FlashVersion4 UInt16, ClientTimeZone Int16, OS UInt8, UserAgent UInt8, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, NetMajor UInt8, NetMinor UInt8, MobilePhone UInt8, SilverlightVersion1 UInt8, Age UInt8, Sex UInt8, Income UInt8, JavaEnable UInt8, CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, BrowserLanguage UInt16, BrowserCountry UInt16, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), Params Array(String), Goals Nested(ID UInt32, Serial UInt32, EventTime DateTime, Price Int64, OrderID String, CurrencyID UInt32), WatchIDs Array(UInt64), ParamSumPrice Int64, ParamCurrency FixedString(3), ParamCurrencyID UInt16, ClickLogID UInt64, ClickEventID Int32, ClickGoodEvent Int32, ClickEventTime DateTime, ClickPriorityID Int32, ClickPhraseID Int32, ClickPageID Int32, ClickPlaceID Int32, ClickTypeID Int32, ClickResourceID Int32, ClickCost UInt32, ClickClientIP UInt32, ClickDomainID UInt32, ClickURL String, ClickAttempt UInt8, ClickOrderID UInt32, ClickBannerID UInt32, ClickMarketCategoryID UInt32, ClickMarketPP UInt32, ClickMarketCategoryName String, ClickMarketPPName String, ClickAWAPSCampaignName String, ClickPageName String, ClickTargetType UInt16, ClickTargetPhraseID UInt64, ClickContextType UInt8, ClickSelectType Int8, ClickOptions String, ClickGroupBannerID Int32, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, FirstVisit DateTime, PredLastVisit Date, LastVisit Date, TotalVisits UInt32, TraficSource Nested(ID Int8, SearchEngineID UInt16, AdvEngineID UInt8, PlaceID UInt16, SocialSourceNetworkID UInt8, Domain String, SearchPhrase String, SocialSourcePage String), Attendance FixedString(16), CLID UInt32, YCLID UInt64, NormalizedRefererHash UInt64, SearchPhraseHash UInt64, RefererDomainHash UInt64, NormalizedStartURLHash UInt64, StartURLDomainHash UInt64, NormalizedEndURLHash UInt64, TopLevelDomain UInt64, URLScheme UInt64, OpenstatServiceNameHash UInt64, OpenstatCampaignIDHash UInt64, OpenstatAdIDHash UInt64, OpenstatSourceIDHash UInt64, UTMSourceHash UInt64, UTMMediumHash UInt64, UTMCampaignHash UInt64, UTMContentHash UInt64, UTMTermHash UInt64, FromHash UInt64, WebVisorEnabled UInt8, WebVisorActivity UInt32, 
ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), Market Nested(Type UInt8, GoalID UInt32, OrderID String, OrderPrice Int64, PP UInt32, DirectPlaceID UInt32, DirectOrderID UInt32, DirectBannerID UInt32, GoodID String, GoodName String, GoodQuantity Int32, GoodPrice Int64), IslandID FixedString(16)) ENGINE = CollapsingMergeTree(Sign) PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
```

### Import the visits data

```bash
cat visits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.visits_v1 FORMAT TSV" --max_insert_block_size=100000
# optionally you can optimize the table
clickhouse-client --query "OPTIMIZE TABLE datasets.visits_v1 FINAL"
```

Verify the count:

```bash
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
```

```response
1680609
```

## Example Queries {#example-queries}

[The ClickHouse tutorial](../../tutorial.md) is based on this web analytics dataset, and the recommended way to get started with this dataset is to go through the tutorial.
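
A first query to sanity-check the import can be as simple as a top-N aggregation. This is a sketch, not from the original page; it only assumes the `datasets.hits_v1` table created above:

```sql
-- ten most active counters by number of recorded hits
SELECT CounterID, count() AS hits
FROM datasets.hits_v1
GROUP BY CounterID
ORDER BY hits DESC
LIMIT 10;
```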

## An example JOIN

The hits and visits dataset is used in the ClickHouse test routines; this is one of the queries from the test suite. The rest of the tests are referenced in the *Next Steps* section at the end of this page.

``` bash
clickhouse-client --query "SELECT
    EventDate,
    hits,
    visits
FROM
(
    SELECT
        EventDate,
        count() AS hits
    FROM datasets.hits_v1
    GROUP BY EventDate
) ANY LEFT JOIN
(
    SELECT
        StartDate AS EventDate,
        sum(Sign) AS visits
    FROM datasets.visits_v1
    GROUP BY EventDate
) USING EventDate
ORDER BY hits DESC
LIMIT 10
SETTINGS joined_subquery_requires_alias = 0
FORMAT PrettyCompact"
```

```response
┌──EventDate─┬────hits─┬─visits─┐
│ 2014-03-17 │ 1406958 │ 265108 │
│ 2014-03-19 │ 1405797 │ 261624 │
│ 2014-03-18 │ 1383658 │ 258723 │
│ 2014-03-20 │ 1353623 │ 255328 │
│ 2014-03-21 │ 1245779 │ 236232 │
│ 2014-03-23 │ 1046491 │ 202212 │
│ 2014-03-22 │ 1031592 │ 197354 │
└────────────┴─────────┴────────┘
```

## Next Steps

[A Practical Introduction to Sparse Primary Indexes in ClickHouse](../../guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-intro.md) uses the hits dataset to discuss the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices.

Additional examples of queries to these tables can be found among the ClickHouse [stateful tests](https://github.com/ClickHouse/ClickHouse/blob/d7129855757f38ceec3e4ecc6dafacdabe9b178f/tests/queries/1_stateful/00172_parallel_join.sql).

:::note
The test suite uses a database named `test`, and the tables are named `hits` and `visits`. You can rename your database and tables, or edit the SQL from the test file.
:::
@ -190,8 +190,7 @@ sudo ./clickhouse install

### From Precompiled Binaries for Non-Standard Environments {#from-binaries-non-linux}

For non-Linux operating systems and for AArch64 CPU architecture, ClickHouse builds are provided as a cross-compiled binary from the latest commit of the `master` branch (with a few hours delay).

- [MacOS x86_64](https://builds.clickhouse.com/master/macos/clickhouse)
@ -119,7 +119,7 @@ Dates with times are written in the format `YYYY-MM-DD hh:mm:ss` and parsed in t

This all occurs in the system time zone at the time the client or server starts (depending on which of them formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.

As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats `YYYY-MM-DD hh:mm:ss` and `NNNNNNNNNN` are differentiated automatically.

Strings are output with backslash-escaped special characters. The following escape sequences are used for output: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\'`, `\\`. Parsing also supports the sequences `\a`, `\v`, and `\xHH` (hex escape sequences) and any `\c` sequences, where `c` is any character (these sequences are converted to `c`). Thus, reading data supports formats where a line feed can be written as `\n` or `\`, or as a line feed. For example, the string `Hello world` with a line feed between the words instead of space can be parsed in any of the following variations:
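
The variations themselves are cut off in this copy of the diff; based on the escaping rules just described, they would look like the following (the backslash in the second variant escapes a literal line feed):

``` text
Hello\nworld

Hello\
world
```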
@ -333,8 +333,9 @@ Total rows: 2

```

``` sql
INSERT INTO UserActivity SETTINGS
format_template_resultset = '/some/path/resultset.format', format_template_row = '/some/path/row.format'
FORMAT Template
```

`/some/path/resultset.format`:

@ -359,8 +360,9 @@ Similar to `Template`, but skips whitespace characters between delimiters and va

It’s possible to read `JSON` using this format, if values of columns have the same order in all rows. For example, the following request can be used for inserting data from output example of format [JSON](#json):

``` sql
INSERT INTO table_name SETTINGS
format_template_resultset = '/some/path/resultset.format', format_template_row = '/some/path/row.format', format_template_rows_between_delimiter = ','
FORMAT TemplateIgnoreSpaces
```

`/some/path/resultset.format`:

@ -816,7 +818,7 @@ Columns that are not present in the block will be filled with default values (yo

## JSONEachRow {#jsoneachrow}

In this format, ClickHouse outputs each row as a separate, newline-delimited JSON object.

Example:
@ -1337,7 +1339,7 @@ Arrays can be nested and can have a value of the `Nullable` type as an argument.

You can insert CapnProto data from a file into a ClickHouse table with the following command:

``` bash
$ cat capnproto_messages.bin | clickhouse-client --query "INSERT INTO test.hits SETTINGS format_schema = 'schema:Message' FORMAT CapnProto"
```

Where `schema.capnp` looks like this:
@ -1363,9 +1365,9 @@ Columns `name` ([String](../sql-reference/data-types/string.md)) and `value` (nu

Rows may optionally contain `help` ([String](../sql-reference/data-types/string.md)) and `timestamp` (number).
Column `type` ([String](../sql-reference/data-types/string.md)) is either `counter`, `gauge`, `histogram`, `summary`, `untyped` or empty.
Each metric value may also have some `labels` ([Map(String, String)](../sql-reference/data-types/map.md)).
Several consecutive rows may refer to the same metric with different labels. The table should be sorted by metric name (e.g., with `ORDER BY name`).

There are special requirements for the labels of `histogram` and `summary`; see the [Prometheus doc](https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries) for details. Special rules apply to rows with the labels `{'count':''}` and `{'sum':''}`: they are converted to `<metric_name>_count` and `<metric_name>_sum` respectively.

**Example:**
@ -1439,7 +1441,7 @@ SELECT * FROM test.table FORMAT Protobuf SETTINGS format_schema = 'schemafile:Me

```

``` bash
cat protobuf_messages.bin | clickhouse-client --query "INSERT INTO test.table SETTINGS format_schema='schemafile:MessageType' FORMAT Protobuf"
```

where the file `schemafile.proto` looks like this:
@ -1665,7 +1667,7 @@ To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-e

### Parquet format settings {#parquet-format-settings}

- [output_format_parquet_row_group_size](../operations/settings/settings.md#output_format_parquet_row_group_size) - row group size in rows while data output. Default value - `1000000`.
- [output_format_parquet_string_as_string](../operations/settings/settings.md#output_format_parquet_string_as_string) - use Parquet String type instead of Binary for String columns. Default value - `false`.
- [input_format_parquet_import_nested](../operations/settings/settings.md#input_format_parquet_import_nested) - allow inserting array of structs into [Nested](../sql-reference/data-types/nested-data-structures/nested.md) table in Parquet input format. Default value - `false`.
- [input_format_parquet_case_insensitive_column_matching](../operations/settings/settings.md#input_format_parquet_case_insensitive_column_matching) - ignore case when matching Parquet columns with ClickHouse columns. Default value - `false`.
@ -1845,7 +1847,7 @@ When working with the `Regexp` format, you can use the following settings:

- Quoted (similarly to [Values](#data-format-values))
- Raw (extracts subpatterns as a whole, no escaping rules, similarly to [TSVRaw](#tabseparatedraw))

- `format_regexp_skip_unmatched` — [UInt8](../sql-reference/data-types/int-uint.md). Defines the need to throw an exception in case the `format_regexp` expression does not match the imported data. Can be set to `0` or `1`.

**Usage**

@ -1875,7 +1877,7 @@ CREATE TABLE imp_regex_table (id UInt32, array Array(UInt32), string String, dat

Import command:

```bash
$ cat data.tsv | clickhouse-client --query "INSERT INTO imp_regex_table SETTINGS format_regexp='id: (.+?) array: (.+?) string: (.+?) date: (.+?)', format_regexp_escaping_rule='Escaped', format_regexp_skip_unmatched=0 FORMAT Regexp;"
```

Query:
@ -422,7 +422,7 @@ Now `rule` can configure `method`, `headers`, `url`, `handler`:

- `query` — use with `predefined_query_handler` type, executes query when the handler is called.

- `query_param_name` — use with `dynamic_query_handler` type, extracts and executes the value corresponding to the `query_param_name` value in HTTP request parameters.

- `status` — use with `static` type, response status code.

@ -477,9 +477,9 @@ In one `predefined_query_handler` only supports one `query` of an insert type.

### dynamic_query_handler {#dynamic_query_handler}

In `dynamic_query_handler`, the query is written in the form of a parameter of the HTTP request. The difference is that in `predefined_query_handler`, the query is written in the configuration file. You can configure `query_param_name` in `dynamic_query_handler`.

ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query`. It is an optional configuration. If there is no definition in the configuration file, the parameter is not passed in.

To experiment with this functionality, the example defines the values of [max_threads](../operations/settings/settings.md#settings-max_threads) and `max_final_threads` and queries whether the settings were set successfully.
@ -5,7 +5,7 @@ sidebar_label: PostgreSQL Interface

# PostgreSQL Interface

ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect a PostgreSQL client application to ClickHouse that is not already directly supported by ClickHouse (for example, Amazon Redshift).

To enable the PostgreSQL wire protocol, add the [postgresql_port](../operations/server-configuration-parameters/settings#server_configuration_parameters-postgresql_port) setting to your server's configuration file. For example, you could define the port in a new XML file in your `config.d` folder:

@ -59,7 +59,7 @@ The PostgreSQL protocol currently only supports plain-text passwords.

## Using SSL

If you have SSL/TLS configured on your ClickHouse instance, then `postgresql_port` will use the same settings (the port is shared for both secure and insecure clients).

Each client has its own method of connecting using SSL. The following command demonstrates how to pass in the certificates and key to securely connect `psql` to ClickHouse:
@ -47,6 +47,8 @@ ClickHouse Inc does **not** maintain the libraries listed below and hasn’t don

    - [ClickHouse (Ruby)](https://github.com/shlima/click_house)
    - [clickhouse-activerecord](https://github.com/PNixx/clickhouse-activerecord)
- Rust
    - [clickhouse.rs](https://github.com/loyd/clickhouse.rs)
    - [clickhouse-rs](https://github.com/suharev7/clickhouse-rs)
    - [Klickhouse](https://github.com/Protryon/klickhouse)
- R
    - [clickhouse-r](https://github.com/hannesmuehleisen/clickhouse-r)
@ -53,7 +53,7 @@ Internal coordination settings are located in the `<keeper_server>.<coordination

- `auto_forwarding` — Allow to forward write requests from followers to the leader (default: true).
- `shutdown_timeout` — Wait to finish internal connections and shutdown (ms) (default: 5000).
- `startup_timeout` — If the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000).
- `four_letter_word_white_list` — White list of 4lw commands (default: `conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro`).

Quorum configuration is located in the `<keeper_server>.<raft_configuration>` section and contains a description of the servers.
@ -122,7 +122,7 @@ clickhouse keeper --config /etc/your_path_to_config/config.xml

ClickHouse Keeper also provides 4lw commands which are almost the same as in ZooKeeper. Each command is composed of four letters such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on server and connections respectively.

The 4lw commands have a whitelist configuration `four_letter_word_white_list` which has the default value `conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro`.

You can issue the commands to ClickHouse Keeper via telnet or nc, at the client port.

@ -132,7 +132,7 @@ echo mntr | nc localhost 9181

Below are the detailed 4lw commands:

- `ruok`: Tests if server is running in a non-error state. The server will respond with `imok` if it is running. Otherwise it will not respond at all. A response of `imok` does not necessarily indicate that the server has joined the quorum, just that the server process is active and bound to the specified client port. Use `stat` for details on state with regard to quorum and client connection information.

```
imok
```
@ -330,9 +330,9 @@ E.g. for a 3-node cluster, it will continue working correctly if only 1 node cra

Cluster configuration can be dynamically configured but there are some limitations. Reconfiguration relies on Raft also
so to add/remove a node from the cluster you need to have a quorum. If you lose too many nodes in your cluster at the same time without any chance
of starting them again, Raft will stop working and not allow you to reconfigure your cluster using the conventional way.

Nevertheless, ClickHouse Keeper has a recovery mode which allows you to forcefully reconfigure your cluster with only 1 node.
This should be done only as your last resort if you cannot start your nodes again, or start a new instance on the same endpoint.

Important things to note before continuing:
@ -57,7 +57,7 @@ Substitutions can also be performed from ZooKeeper. To do this, specify the attr

The `config.xml` file can specify a separate config with user settings, profiles, and quotas. The relative path to this config is set in the `users_config` element. By default, it is `users.xml`. If `users_config` is omitted, the user settings, profiles, and quotas are specified directly in `config.xml`.

Users configuration can be split into separate files similar to `config.xml` and `config.d/`.
The directory name is defined as the `users_config` setting value without the `.xml` suffix, concatenated with `.d`.
Directory `users.d` is used by default, as `users_config` defaults to `users.xml`.
@ -2,6 +2,18 @@

The values of `merge_tree` settings (for all MergeTree tables) can be viewed in the table `system.merge_tree_settings`; they can be overridden in `config.xml` in the `merge_tree` section, or set in the `SETTINGS` section of each table.

These are example overrides for `max_suspicious_broken_parts`:

## max_suspicious_broken_parts

If the number of broken parts in a single partition exceeds the `max_suspicious_broken_parts` value, automatic deletion is denied.

Possible values:

- Any positive integer.

Default value: 10.
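
The same value can also be inspected and overridden per table from SQL; a minimal sketch (the table name `t` is illustrative, not from the original page):

``` sql
-- inspect the current value
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name = 'max_suspicious_broken_parts';

-- override it for a single table
ALTER TABLE t MODIFY SETTING max_suspicious_broken_parts = 100;
```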

Override example in `config.xml`:
@ -152,6 +164,187 @@ Default value: 604800 (1 week).

Similar to [replicated_deduplication_window](#replicated-deduplication-window), `replicated_deduplication_window_seconds` specifies how long to store hash sums of blocks for insert deduplication. Hash sums older than `replicated_deduplication_window_seconds` are removed from ClickHouse Keeper, even if they are less than `replicated_deduplication_window`.

## max_replicated_logs_to_keep

How many records may remain in the ClickHouse Keeper log if there is an inactive replica. An inactive replica becomes lost when this number is exceeded.

Possible values:

- Any positive integer.

Default value: 1000

## min_replicated_logs_to_keep

Keep about this number of last records in the ZooKeeper log, even if they are obsolete. It does not affect the operation of tables: it is used only to diagnose the ZooKeeper log before cleaning.

Possible values:

- Any positive integer.

Default value: 10
## prefer_fetch_merged_part_time_threshold

If the time passed since a replication log (ClickHouse Keeper or ZooKeeper) entry creation exceeds this threshold, and the sum of the size of parts is greater than `prefer_fetch_merged_part_size_threshold`, then prefer fetching the merged part from a replica instead of doing the merge locally. This is to speed up very long merges.

Possible values:

- Any positive integer.

Default value: 3600

## prefer_fetch_merged_part_size_threshold

If the sum of the size of parts exceeds this threshold and the time since a replication log entry creation is greater than `prefer_fetch_merged_part_time_threshold`, then prefer fetching the merged part from a replica instead of doing the merge locally. This is to speed up very long merges.

Possible values:

- Any positive integer.

Default value: 10,737,418,240

## execute_merges_on_single_replica_time_threshold

When this setting has a value greater than zero, only a single replica starts the merge immediately, and other replicas wait up to that amount of time to download the result instead of doing merges locally. If the chosen replica doesn't finish the merge during that amount of time, fallback to standard behavior happens.

Possible values:

- Any positive integer.

Default value: 0 (seconds)
## remote_fs_execute_merges_on_single_replica_time_threshold

When this setting has a value greater than zero, only a single replica starts the merge immediately if the merged part is on shared storage and `allow_remote_fs_zero_copy_replication` is enabled.

Possible values:

- Any positive integer.

Default value: 1800

## try_fetch_recompressed_part_timeout

Recompression works slowly in most cases, so we do not start a merge with recompression until this timeout expires and instead try to fetch the recompressed part from the replica that was assigned the merge with recompression.

Possible values:

- Any positive integer.

Default value: 7200
## always_fetch_merged_part

If true, this replica never merges parts and always downloads merged parts from other replicas.

Possible values:

- true, false

Default value: false

## max_suspicious_broken_parts

Maximum number of broken parts; if exceeded, automatic deletion is denied.

Possible values:

- Any positive integer.

Default value: 10
## max_suspicious_broken_parts_bytes

Maximum size of all broken parts; if exceeded, automatic deletion is denied.

Possible values:

- Any positive integer.

Default value: 1,073,741,824

## max_files_to_modify_in_alter_columns

Do not apply ALTER if the number of files for modification (deletion, addition) is greater than this setting.

Possible values:

- Any positive integer.

Default value: 75
## max_files_to_remove_in_alter_columns

Do not apply ALTER if the number of files for deletion is greater than this setting.

Possible values:

- Any positive integer.

Default value: 50

## replicated_max_ratio_of_wrong_parts

If the ratio of wrong parts to the total number of parts is less than this value, the server is allowed to start.

Possible values:

- Float, 0.0 - 1.0

Default value: 0.5
## replicated_max_parallel_fetches_for_host

Limit parallel fetches from an endpoint (actually the pool size).

Possible values:

- Any positive integer.

Default value: 15

## replicated_fetches_http_connection_timeout

HTTP connection timeout for part fetch requests. Inherited from the default profile `http_connection_timeout` if not set explicitly.

Possible values:

- Any positive integer.

Default value: Inherited from the default profile `http_connection_timeout` if not set explicitly.
## replicated_can_become_leader

If true, replicated table replicas on this node will try to acquire leadership.

Possible values:

- true, false

Default value: true

## zookeeper_session_expiration_check_period

ZooKeeper session expiration check period, in seconds.

Possible values:

- Any positive integer.

Default value: 60

## detach_old_local_parts_when_cloning_replica

Do not remove old local parts when repairing a lost replica.

Possible values:

- true, false

Default value: true

## replicated_fetches_http_connection_timeout {#replicated_fetches_http_connection_timeout}

HTTP connection timeout (in seconds) for part fetch requests. Inherited from default profile [http_connection_timeout](./settings.md#http_connection_timeout) if not set explicitly.
@ -70,7 +70,7 @@ Regardless of RAID use, always use replication for data security.

Enable NCQ with a long queue. For HDD, choose the CFQ scheduler, and for SSD, choose noop. Don’t reduce the ‘readahead’ setting.
For HDD, enable the write cache.

Make sure that [`fstrim`](https://en.wikipedia.org/wiki/Trim_(computing)) is enabled for NVMe and SSD disks in your OS (usually it's implemented using a cronjob or systemd service).

## File System {#file-system}
@ -94,7 +94,7 @@ Use at least a 10 GB network, if possible. 1 Gb will also work, but it will be m

## Huge Pages {#huge-pages}

If you are using an old Linux kernel, disable transparent huge pages. They interfere with the memory allocator, which leads to significant performance degradation.
On newer Linux kernels transparent huge pages are alright.

@ -107,7 +107,7 @@ If you are using OpenStack, set

```
cpu_mode=host-passthrough
```

in `nova.conf`.

If you are using libvirt, set
@ -136,7 +136,7 @@ Do not change `minSessionTimeout` setting, large values may affect ClickHouse re

With the default settings, ZooKeeper is a time bomb:

> The ZooKeeper server won’t delete files from old snapshots and logs when using the default configuration (see `autopurge`), and this is the responsibility of the operator.

This bomb must be defused.
@ -241,7 +241,7 @@ JAVA_OPTS="-Xms{{ '{{' }} cluster.get('xms','128M') {{ '}}' }} \

-XX:MaxGCPauseMillis=50"
```

Salt initialization:

``` text
description "zookeeper-{{ '{{' }} cluster['name'] {{ '}}' }} centralized coordination service"
```
@ -3,7 +3,7 @@ sidebar_position: 46

sidebar_label: Troubleshooting
---

# Troubleshooting

- [Installation](#troubleshooting-installation-errors)
- [Connecting to the server](#troubleshooting-accepts-no-connections)

@ -26,7 +26,7 @@ Possible issues:

### Server Is Not Running {#server-is-not-running}

**Check if server is running**

Command:
@ -6,10 +6,10 @@ sidebar_position: 33

Calculates the amount `Σ((x - x̅)^2) / (n - 1)`, where `n` is the sample size and `x̅` is the average value of `x`.

It represents an unbiased estimate of the variance of a random variable if the passed values form its sample.

Returns `Float64`. When `n <= 1`, returns `+∞`.

:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `varSampStable` function. It works slower but provides a lower computational error.
:::
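
A short worked example (not part of the original page): for the sample `[1, 2, 3, 4]` the mean is `2.5`, the squared deviations sum to `5`, and `n - 1 = 3`, so the result is `5 / 3 ≈ 1.667`:

``` sql
SELECT varSamp(x) AS v
FROM (SELECT arrayJoin([1, 2, 3, 4]) AS x);
-- v = 1.6666666666666667
```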

@ -4,7 +4,7 @@ sidebar_label: H3 Indexes

# Functions for Working with H3 Indexes

[H3](https://eng.uber.com/h3/) is a geographical indexing system where Earth’s surface is divided into a grid of even hexagonal cells. This system is hierarchical, i.e. each hexagon on the top level ("parent") can be split into seven even but smaller ones ("children"), and so on.

The level of the hierarchy is called `resolution` and can receive a value from `0` till `15`, where `0` is the `base` level with the largest and coarsest cells.
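
As a quick illustration of resolutions (a sketch added here, with arbitrary coordinates), `geoToH3(lon, lat, resolution)` returns the `UInt64` index of the cell containing a point:

``` sql
-- index a point at the finest resolution, 15
SELECT geoToH3(37.79506683, 55.71290588, 15) AS h3Index;
```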
@ -1398,4 +1398,4 @@ Result:

│ [(37.42012867767779,-122.03773496427027),(37.33755608435299,-122.090428929044)] │
└─────────────────────────────────────────────────────────────────────────────────┘
```
[Original article](https://clickhouse.com/docs/en/sql-reference/functions/geo/h3) <!--hide-->
@ -174,22 +174,24 @@ Result:

Creating `test_function_sum_json` with named arguments and format [JSONEachRow](../../interfaces/formats.md#jsoneachrow) using XML configuration.
File `test_function.xml`:
```xml
<functions>
    <function>
        <type>executable</type>
        <name>test_function_sum_json</name>
        <return_type>UInt64</return_type>
        <return_name>result_name</return_name>
        <argument>
            <type>UInt64</type>
            <name>argument_1</name>
        </argument>
        <argument>
            <type>UInt64</type>
            <name>argument_2</name>
        </argument>
        <format>JSONEachRow</format>
        <command>test_function_sum_json.py</command>
    </function>
</functions>
```

Script file `test_function_sum_json.py` inside the `user_scripts` folder.
@ -224,6 +226,50 @@ Result:

└──────────────────────────────┘
```

Executable user-defined functions can take constant parameters configured in the `command` setting (this works only for user-defined functions with the `executable` type).
File `test_function_parameter_python.xml`:
```xml
<functions>
    <function>
        <type>executable</type>
        <name>test_function_parameter_python</name>
        <return_type>String</return_type>
        <argument>
            <type>UInt64</type>
        </argument>
        <format>TabSeparated</format>
        <command>test_function_parameter_python.py {test_parameter:UInt64}</command>
    </function>
</functions>
```

Script file `test_function_parameter_python.py` inside the `user_scripts` folder:

```python
#!/usr/bin/python3

import sys

if __name__ == "__main__":
    # Echo every input line back, prefixed with the constant parameter
    # that ClickHouse substitutes into the command line.
    for line in sys.stdin:
        print("Parameter " + str(sys.argv[1]) + " value " + str(line), end="")
        sys.stdout.flush()
```

Query:

``` sql
SELECT test_function_parameter_python(1)(2);
```

Result:

``` text
┌─test_function_parameter_python(1)(2)─┐
│ Parameter 1 value 2                  │
└──────────────────────────────────────┘
```
## Error Handling

Some functions might throw an exception if the data is invalid. In this case, the query is canceled and an error text is returned to the client. For distributed processing, when an exception occurs on one of the servers, the other servers also attempt to abort the query.
@ -32,7 +32,7 @@ Integer value in the `Int8`, `Int16`, `Int32`, `Int64`, `Int128` or `Int256` dat

Functions use [rounding towards zero](https://en.wikipedia.org/wiki/Rounding#Rounding_towards_zero), meaning they truncate fractional digits of numbers.

The behavior of functions for the [NaN and Inf](../../sql-reference/data-types/float.md#data_type-float-nan-inf) arguments is undefined. Remember about [numeric conversion issues](#numeric-conversion-issues) when using the functions.

**Example**

@ -131,7 +131,7 @@ Integer value in the `UInt8`, `UInt16`, `UInt32`, `UInt64` or `UInt256` data typ

Functions use [rounding towards zero](https://en.wikipedia.org/wiki/Rounding#Rounding_towards_zero), meaning they truncate fractional digits of numbers.

The behavior of functions for negative arguments and for the [NaN and Inf](../../sql-reference/data-types/float.md#data_type-float-nan-inf) arguments is undefined. If you pass a string with a negative number, for example `'-32'`, ClickHouse raises an exception. Remember about [numeric conversion issues](#numeric-conversion-issues) when using the functions.

**Example**
@ -689,7 +689,7 @@ x::t

- Converted value.

:::note
If the input value does not fit the bounds of the target type, the result overflows. For example, `CAST(-1, 'UInt8')` returns `255`.
:::
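
The overflow behavior described in the note can be checked directly:

``` sql
SELECT CAST(-1, 'UInt8') AS v;  -- v = 255: -1 wraps around the UInt8 range
```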
@ -1433,7 +1433,7 @@ Result:

Converts a `DateTime64` to an `Int64` value with fixed sub-second precision. The input value is scaled up or down appropriately depending on its precision.

:::note
The output value is a timestamp in UTC, not in the timezone of `DateTime64`.
:::
@ -248,6 +248,7 @@ Specialized codecs:

- `Delta(delta_bytes)` — Compression approach in which raw values are replaced by the difference of two neighboring values, except for the first value that stays unchanged. Up to `delta_bytes` are used for storing delta values, so `delta_bytes` is the maximum size of raw values. Possible `delta_bytes` values: 1, 2, 4, 8. The default value for `delta_bytes` is `sizeof(type)` if equal to 1, 2, 4, or 8. In all other cases, it’s 1.
- `DoubleDelta` — Calculates delta of deltas and writes it in compact binary form. Optimal compression rates are achieved for monotonic sequences with a constant stride, such as time series data. Can be used with any fixed-width type. Implements the algorithm used in Gorilla TSDB, extending it to support 64-bit types. Uses 1 extra bit for 32-byte deltas: 5-bit prefixes instead of 4-bit prefixes. For additional information, see Compressing Time Stamps in [Gorilla: A Fast, Scalable, In-Memory Time Series Database](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf).
- `Gorilla` — Calculates XOR between current and previous value and writes it in compact binary form. Efficient when storing a series of floating point values that change slowly, because the best compression rate is achieved when neighboring values are binary equal. Implements the algorithm used in Gorilla TSDB, extending it to support 64-bit types. For additional information, see Compressing Values in [Gorilla: A Fast, Scalable, In-Memory Time Series Database](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf).
- `FPC` - Repeatedly predicts the next floating point value in the sequence using the better of two predictors, then XORs the actual with the predicted value, and leading-zero compresses the result. Similar to Gorilla, this is efficient when storing a series of floating point values that change slowly. For 64-bit values (double), FPC is faster than Gorilla, for 32-bit values your mileage may vary. For a detailed description of the algorithm see [High Throughput Compression of Double-Precision Floating-Point Data](https://userweb.cs.txstate.edu/~burtscher/papers/dcc07a.pdf).
- `T64` — Compression approach that crops unused high bits of values in integer data types (including `Enum`, `Date` and `DateTime`). At each step of its algorithm, codec takes a block of 64 values, puts them into 64x64 bit matrix, transposes it, crops the unused bits of values and returns the rest as a sequence. Unused bits are the bits, that do not differ between maximum and minimum values in the whole data part for which the compression is used.

`DoubleDelta` and `Gorilla` codecs are used in Gorilla TSDB as the components of its compressing algorithm. Gorilla approach is effective in scenarios when there is a sequence of slowly changing values with their timestamps. Timestamps are effectively compressed by the `DoubleDelta` codec, and values are effectively compressed by the `Gorilla` codec. For example, to get an effectively stored table, you can create it in the following configuration:
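
The table definition referenced above is cut off in this copy; a minimal sketch of such a configuration (table and column names are illustrative):

``` sql
CREATE TABLE codec_example
(
    timestamp DateTime CODEC(DoubleDelta),  -- slowly increasing timestamps
    slow_values Float32 CODEC(Gorilla)      -- slowly changing measurements
)
ENGINE = MergeTree()
ORDER BY timestamp;
```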
@@ -252,12 +252,14 @@ This is an experimental feature that may change in backwards-incompatible ways i
:::

``` sql
CREATE WINDOW VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]table_name] [ENGINE = engine] [WATERMARK = strategy] [ALLOWED_LATENESS = interval_function] AS SELECT ... GROUP BY time_window_function
CREATE WINDOW VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]table_name] [INNER ENGINE engine] [ENGINE engine] [WATERMARK strategy] [ALLOWED_LATENESS interval_function] [POPULATE] AS SELECT ... GROUP BY time_window_function
```

Window view can aggregate data by time window and output the results when the window is ready to fire. It stores the partial aggregation results in an inner (or specified) table to reduce latency and can push the processing result to a specified table or push notifications using the WATCH query.

Creating a window view is similar to creating `MATERIALIZED VIEW`. Window view needs an inner storage engine to store intermediate data. The inner storage will use `AggregatingMergeTree` as the default engine.
Creating a window view is similar to creating `MATERIALIZED VIEW`. Window view needs an inner storage engine to store intermediate data. The inner storage can be specified by using the `INNER ENGINE` clause; the window view will use `AggregatingMergeTree` as the default inner engine.

When creating a window view without `TO [db].[table]`, you must specify `ENGINE` – the table engine for storing data. For example, see the sketch below.
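A minimal sketch of such a statement, assuming a source table `data` with columns `id` and `timestamp` (both hypothetical):

``` sql
CREATE WINDOW VIEW wv ENGINE = Memory AS
SELECT count(id) AS cnt, tumbleStart(w_id) AS window_start
FROM data
GROUP BY tumble(timestamp, INTERVAL '10' SECOND) AS w_id;
```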
### Time Window Functions

@@ -297,6 +299,8 @@ CREATE WINDOW VIEW test.wv TO test.dst WATERMARK=ASCENDING ALLOWED_LATENESS=INTE

Note that elements emitted by a late firing should be treated as updated results of a previous computation. Instead of firing at the end of windows, the window view will fire immediately when the late event arrives. Thus, it will result in multiple outputs for the same window. Users need to take these duplicated results into account or deduplicate them.

You can modify the `SELECT` query that was specified in the window view by using the `ALTER TABLE … MODIFY QUERY` statement. The data structure resulting from the new `SELECT` query should be the same as that of the original `SELECT` query, with or without the `TO [db.]name` clause. Note that the data in the current window will be lost because the intermediate state cannot be reused. A sketch of such a modification follows below.
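A hedged sketch of such a modification, reusing the hypothetical `wv` view from the sketch above and only widening the window:

``` sql
ALTER TABLE wv MODIFY QUERY
SELECT count(id) AS cnt, tumbleStart(w_id) AS window_start
FROM data
GROUP BY tumble(timestamp, INTERVAL '1' MINUTE) AS w_id;
```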
### Monitoring New Windows

Window view supports the [WATCH](../../../sql-reference/statements/watch.md) query to monitor changes, or the `TO` syntax to output the results to a table.

@@ -314,6 +318,7 @@ WATCH [db.]window_view

- `window_view_clean_interval`: The interval, in seconds, at which the window view frees outdated data. The system retains the windows that have not been fully triggered according to the system time or the `WATERMARK` configuration; the other data is deleted (see the sketch after this list).
- `window_view_heartbeat_interval`: The heartbeat interval in seconds to indicate that the watch query is alive.
- `wait_for_window_view_fire_signal_timeout`: Timeout for waiting for the window view fire signal in event time processing.
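These are query-level settings; a small sketch of how they could be set (the values are illustrative only):

``` sql
SET window_view_clean_interval = 60;
SET window_view_heartbeat_interval = 15;
```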
### Example
@@ -32,6 +32,7 @@ The list of available `SYSTEM` statements:

- [START TTL MERGES](#query_language-start-ttl-merges)
- [STOP MOVES](#query_language-stop-moves)
- [START MOVES](#query_language-start-moves)
- [SYSTEM UNFREEZE](#query_language-system-unfreeze)
- [STOP FETCHES](#query_language-system-stop-fetches)
- [START FETCHES](#query_language-system-start-fetches)
- [STOP REPLICATED SENDS](#query_language-system-start-replicated-sends)

@@ -239,6 +240,14 @@ Returns `Ok.` even if table does not exist. Returns error when database does not

SYSTEM START MOVES [[db.]merge_tree_family_table_name]
```

### SYSTEM UNFREEZE {#query_language-system-unfreeze}

Clears a frozen backup with the specified name from all disks. See more about unfreezing separate parts in [ALTER TABLE table_name UNFREEZE WITH NAME](alter/partition.md#alter_unfreeze-partition).

``` sql
SYSTEM UNFREEZE WITH NAME <backup_name>
```
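For comparison, a hedged sketch of the per-table `ALTER TABLE` variant referenced above (the table and backup names are placeholders):

``` sql
ALTER TABLE table_name UNFREEZE WITH NAME 'backup_name';
```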
## Managing ReplicatedMergeTree Tables

ClickHouse can manage background replication-related processes in [ReplicatedMergeTree](../../engines/table-engines/mergetree-family/replication.md#table_engines-replication) tables.
@@ -38,7 +38,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]

During a merge, `ReplacingMergeTree` keeps only one row for each unique sorting key:

- The last one in the selection, if `ver` is not set. A selection here is a set of rows in the set of data parts participating in the merge. The most recently created part (the last insert) will be the last one in the selection. Thus, after deduplication, the very last row from the most recent insert remains for each unique sorting key.
- The one with the maximum version, if `ver` is set.
- The one with the maximum version, if `ver` is set. If several rows have the same `ver`, the rule for rows without `ver` is applied to them, i.e. the very last row from the most recent insert remains after the merge. A minimal sketch follows after this list.
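A minimal sketch of a table using the `ver` rule described above (all names are illustrative):

``` sql
CREATE TABLE replacing_example
(
    key UInt64,
    value String,
    ver UInt32
)
ENGINE = ReplacingMergeTree(ver)
ORDER BY key;
-- After merges, only the row with the highest `ver` per `key` survives.
```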
**Query clauses**

@@ -22,9 +22,9 @@ ClickHouse позволяет автоматически удалять данн

ClickHouse does not delete data in real time like an [OLTP](https://en.wikipedia.org/wiki/Online_transaction_processing) DBMS. The closest thing to such deletion is mutations, which are executed with `ALTER ... DELETE` or `ALTER ... UPDATE` queries. Unlike regular `DELETE` and `UPDATE` queries, mutations run asynchronously, in batches, not in real time. Otherwise, after the words `ALTER TABLE`, the syntax of regular queries and mutations is the same.

`ALTER DELETE` can be used for flexible removal of outdated data. If you need to do this regularly, the only drawback of this approach is that an external system is required to trigger the query. There may also be some performance issues, because mutations rewrite entire data parts even if they contain only a single row that needs to be deleted.
`ALTER DELETE` can be used for flexible removal of outdated data. If you need to do this regularly, the main drawback of this approach is that an external system is required to trigger the query. There may also be some performance issues, because mutations rewrite entire data parts even if they contain only a single row that needs to be deleted. A sketch is shown below.
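A minimal sketch of such a scheduled cleanup query (the `events` table and `created_at` column are hypothetical):

``` sql
ALTER TABLE events DELETE WHERE created_at < now() - INTERVAL 3 YEAR;
```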
This is the most common approach to making your ClickHouse-based system compliant with the [GDPR](https://gdpr-info.eu) principles.

For more details, see the [Mutations](../../sql-reference/statements/alter/index.md#alter-mutations) section.
@@ -41,6 +41,8 @@ sidebar_label: "Клиентские библиотеки от сторонни
  - [ClickHouse (Ruby)](https://github.com/shlima/click_house)
  - [clickhouse-activerecord](https://github.com/PNixx/clickhouse-activerecord)
- Rust
  - [clickhouse.rs](https://github.com/loyd/clickhouse.rs)
  - [clickhouse-rs](https://github.com/suharev7/clickhouse-rs)
  - [Klickhouse](https://github.com/Protryon/klickhouse)
- R
  - [clickhouse-r](https://github.com/hannesmuehleisen/clickhouse-r)
@@ -3,23 +3,21 @@ sidebar_position: 66
sidebar_label: ClickHouse Keeper
---

# [pre-production] ClickHouse Keeper {#clickHouse-keeper}
# ClickHouse Keeper {#clickHouse-keeper}

The ClickHouse server uses the [ZooKeeper](https://zookeeper.apache.org/) coordination service for data [replication](../engines/table-engines/mergetree-family/replication.md) and execution of [distributed DDL queries](../sql-reference/distributed-ddl.md). ClickHouse Keeper is an alternative coordination service compatible with ZooKeeper.

:::danger "Warning"
ClickHouse Keeper is in the pre-production stage and is being tested in ClickHouse CI and on several internal installations.

## Implementation details {#implementation-details}

ZooKeeper is one of the first well-known open-source coordination services. It is implemented in Java and has a fairly simple and powerful data model. ZooKeeper's coordination algorithm is called ZAB (ZooKeeper Atomic Broadcast). It does not guarantee linearizability of reads, because each ZooKeeper node serves reads locally. Unlike ZooKeeper, ClickHouse Keeper is written in C++ and uses the [RAFT](https://raft.github.io/) algorithm ([implementation](https://github.com/eBay/NuRaft)). This algorithm allows linearizability of reads and writes and has several open-source implementations in different languages.

By default, ClickHouse Keeper provides the same guarantees as ZooKeeper (linearizable writes, sequentially consistent reads). It has a compatible client-server protocol, so any standard ZooKeeper client can be used to interact with ClickHouse Keeper. Snapshots and logs have a format incompatible with ZooKeeper, but ZooKeeper data can be converted to a ClickHouse Keeper snapshot with `clickhouse-keeper-converter`. The ClickHouse Keeper inter-server protocol is also incompatible with ZooKeeper, so a mixed ZooKeeper / ClickHouse Keeper cluster is impossible.
By default, ClickHouse Keeper provides the same guarantees as ZooKeeper (linearizable writes, non-linearizable reads). ClickHouse Keeper provides a compatible client-server protocol, so any standard ZooKeeper client can be used to interact with ClickHouse Keeper. Snapshots and logs have a format incompatible with ZooKeeper, but ZooKeeper data can be converted to a ClickHouse Keeper snapshot with `clickhouse-keeper-converter`. The ClickHouse Keeper inter-server protocol is also incompatible with ZooKeeper, so a mixed ZooKeeper / ClickHouse Keeper cluster is impossible.

The ClickHouse Keeper access control (ACL) system is implemented the same way as in [ZooKeeper](https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl). ClickHouse Keeper supports the same set of permissions and identical schemes: `world`, `auth`, `digest`, `host` and `ip`. Digest authentication uses a `username:password` pair. The password is encoded in Base64.
The ClickHouse Keeper access control (ACL) system is implemented the same way as in [ZooKeeper](https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl). ClickHouse Keeper supports the same set of permissions and identical schemes: `world`, `auth`, `digest`. Digest authentication uses a `username:password` pair. The password is encoded in Base64.

:::info "Note"
:::note
External integrations are not supported.
:::

## Configuration {#configuration}
@@ -27,34 +25,36 @@ ClickHouse Keeper может использоваться как равноце

- `tcp_port` — port for client connections (ZooKeeper default: `2181`).
- `tcp_port_secure` — encrypted port for SSL connections between client and server.
- `server_id` — unique server identifier; each cluster member must have a unique number (1, 2, 3, and so on).
- `log_storage_path` — path to coordination logs; as with ZooKeeper, it is best to store them on an otherwise idle device.
- `log_storage_path` — path to coordination logs; as with ZooKeeper, it is best to store them on a device that is not under load.
- `snapshot_storage_path` — path to coordination snapshots.

Other common parameters are inherited from the ClickHouse server configuration (`listen_host`, `logger`, and so on).

Internal coordination settings are located in `<keeper_server>.<coordination_settings>`:

- `operation_timeout_ms` — maximum timeout for a single client operation, in milliseconds (default: 10000).
- `session_timeout_ms` — maximum timeout for a client session, in milliseconds (default: 30000).
- `dead_session_check_period_ms` — how often ClickHouse Keeper checks for dead sessions and removes them, in milliseconds (default: 500).
- `heart_beat_interval_ms` — how often the ClickHouse Keeper leader sends heartbeats to follower nodes, in milliseconds (default: 500).
- `election_timeout_lower_bound_ms` — time after which a follower may initiate a leader election if it has not received a heartbeat from the leader (default: 1000).
- `election_timeout_upper_bound_ms` — time after which a follower must initiate a leader election if it has not received a heartbeat from the leader (default: 2000).
- `rotate_log_storage_interval` — how many coordination log records to store in a single file (default: 100000).
- `reserved_log_items` — minimum number of coordination log records to keep after taking a snapshot (default: 100000).
- `snapshot_distance` — how often ClickHouse Keeper creates new snapshots (in the number of log records), in milliseconds (default: 100000).
- `snapshots_to_keep` — how many snapshots to keep (default: 3).
- `stale_log_gap` — threshold after which the leader considers a follower stale and sends it a snapshot instead of logs (default: 10000).
- `fresh_log_gap` — maximum lag behind the leader, in log records, at which a follower still considers itself caught up (default: 200).
- `max_requests_batch_size` — number of write requests batched into one before being sent through RAFT (default: 100).
- `force_sync` — call `fsync` on every write to the coordination log (default: true).
- `quorum_reads` — execute read requests like write requests through the whole RAFT consensus, with a negative impact on performance and log size (default: false).
- `raft_logs_level` — logging level for messages in the text log (trace, debug, and so on) (default: information).
- `auto_forwarding` — allow forwarding write requests from followers to the leader (default: true).
- `shutdown_timeout` — timeout for finishing internal connections and shutting down, in milliseconds (default: 5000).
- `dead_session_check_period_ms` — how often ClickHouse Keeper checks for dead sessions and removes them, in milliseconds (default: 500).
- `election_timeout_lower_bound_ms` — time after which a follower may initiate a leader re-election if it has not received a heartbeat signal from the leader (default: 1000).
- `election_timeout_upper_bound_ms` — time after which a follower must initiate a leader re-election if it has not received a heartbeat signal from the leader (default: 2000).
- `force_sync` — call `fsync` on every write to the coordination log (default: true).
- `four_letter_word_white_list` — list of allowed 4-letter commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `fresh_log_gap` — minimum lag behind the leader, in log records, below which a follower considers itself up to date (default: 200).
- `heart_beat_interval_ms` — how often the ClickHouse Keeper leader sends heartbeat signals to follower nodes, in milliseconds (default: 500).
- `max_requests_batch_size` — number of write requests batched into one before being sent through RAFT (default: 100).
- `min_session_timeout_ms` — Min timeout for client session (ms) (default: 10000).
- `operation_timeout_ms` — maximum timeout for a single client operation, in milliseconds (default: 10000).
- `quorum_reads` — execute read requests like write requests through the RAFT consensus (default: false).
- `raft_logs_level` — logging level for messages in the text log (trace, debug, and so on) (default: default).
- `reserved_log_items` — minimum number of coordination log records to keep after taking a snapshot (default: 100000).
- `rotate_log_storage_interval` — how many coordination log records to store in a single file (default: 100000).
- `session_timeout_ms` — maximum timeout for a client session, in milliseconds (default: 30000).
- `shutdown_timeout` — timeout for finishing internal connections during shutdown, in milliseconds (default: 5000).
- `snapshot_distance` — how often ClickHouse Keeper creates new snapshots (in the number of log records) (default: 100000).
- `snapshots_to_keep` — how many snapshots to keep (default: 3).
- `stale_log_gap` — threshold after which the leader considers a follower lagging and sends it a snapshot instead of logs (default: 10000).
- `startup_timeout` — time after which the server shuts down if it has not connected to other quorum members, in milliseconds (default: 30000).
- `four_letter_word_allow_list` — list of allowed 4-letter commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro") (see the sketch after this list).
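Four-letter commands are sent over the client port; a sketch assuming the default Keeper port `9181`:

```bash
echo mntr | nc localhost 9181
```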
The quorum configuration is located in `<keeper_server>.<raft_configuration>` and contains a description of the servers.

@@ -67,6 +67,10 @@ ClickHouse Keeper может использоваться как равноце
- `port` — port on which this server accepts connections for internal communication.

:::note
When changing the topology of a ClickHouse Keeper cluster (e.g., replacing a server), make sure you preserve the `server_id` to `hostname` mapping, do not reuse existing `server_id`s for new servers, and do not shuffle the identifiers. Such mistakes can happen if you use automated cluster deployment without logic that preserves the identifiers.
:::

Examples of a three-node quorum configuration can be found in the [integration tests](https://github.com/ClickHouse/ClickHouse/tree/master/tests/integration) prefixed with `test_keeper_`. Example configuration for server #1:

```xml
@@ -314,4 +318,31 @@ clickhouse-keeper-converter --zookeeper-logs-dir /var/lib/zookeeper/version-2 --

4. Copy the snapshot to the ClickHouse server nodes with a configured `keeper`, or start ClickHouse Keeper instead of ZooKeeper. The snapshot must be present on all nodes; otherwise, empty nodes can win the leader election and the converted data may be discarded on startup.

## Recovering after quorum loss

Because ClickHouse Keeper is based on the Raft protocol, it can stay operational when a certain number of nodes fail, depending on the cluster size.
For example, a 3-node cluster keeps working if at most one node fails.

The cluster configuration can be changed dynamically, with some restrictions.
Reconfiguration also relies on Raft, so adding a new node to the cluster or removing an old one requires reaching a quorum within the current cluster configuration.
If more nodes in your cluster have failed than Raft allows for your current configuration, and you cannot bring them back, Raft stops working and does not let you reconfigure the cluster through the standard mechanism.

Nevertheless, ClickHouse Keeper can be started in a recovery mode that allows you to reconfigure the cluster using only a single node.
This mechanism should be used only as a last resort, when you cannot restore the existing nodes or start a new server with the same identifier.

Important:
- Make sure the failed nodes cannot connect to the cluster again.
- Do not start any new nodes until you complete the procedure below.

After you have done the above, follow these steps (a sketch of the commands involved follows after this list).
1. Pick one Keeper node to become the new leader. Note that the data of this node will be used by the whole cluster, so it is recommended to pick the node with the most up-to-date state.
2. Before doing anything else, back up the data from its `log_storage_path` and `snapshot_storage_path` directories.
3. Update the configuration on all cluster nodes you are going to use.
4. Send the `rcvr` command to the node you picked, or stop it and start it again with the `--force-recovery` argument. This switches the node into recovery mode.
5. Start the remaining cluster nodes one by one, and before starting the next node make sure the `mntr` command returns `follower` in the `zk_server_state` field.
6. While in recovery mode, the leader returns an error for the `mntr` command until a quorum is reached with the new nodes. Any requests from clients and followers return errors as well.
7. Once the quorum is reached, the leader switches to normal operation and starts processing all requests through Raft. Make sure the `mntr` command returns `leader` in the `zk_server_state` field.
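A sketch of the commands from steps 4 and 5, assuming the default client port `9181` and hypothetical host names `keeper-1` and `keeper-2`:

```bash
# Step 4: switch the chosen node into recovery mode.
echo rcvr | nc keeper-1 9181

# Step 5: after starting each remaining node, check its role
# before starting the next one.
echo mntr | nc keeper-2 9181 | grep zk_server_state
```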
[Original article](https://clickhouse.com/docs/en/operations/clickhouse-keeper/) <!--hide-->

@@ -174,22 +174,24 @@ SELECT test_function_sum(2, 2);
Creating `test_function_sum_json` with named arguments and the [JSONEachRow](../../interfaces/formats.md#jsoneachrow) format using XML configuration.
File test_function.xml.
```xml
<function>
    <type>executable</type>
    <name>test_function_sum_json</name>
    <return_type>UInt64</return_type>
    <return_name>result_name</return_name>
    <argument>
        <type>UInt64</type>
        <name>argument_1</name>
    </argument>
    <argument>
        <type>UInt64</type>
        <name>argument_2</name>
    </argument>
    <format>JSONEachRow</format>
    <command>test_function_sum_json.py</command>
</function>
<functions>
    <function>
        <type>executable</type>
        <name>test_function_sum_json</name>
        <return_type>UInt64</return_type>
        <return_name>result_name</return_name>
        <argument>
            <type>UInt64</type>
            <name>argument_1</name>
        </argument>
        <argument>
            <type>UInt64</type>
            <name>argument_2</name>
        </argument>
        <format>JSONEachRow</format>
        <command>test_function_sum_json.py</command>
    </function>
</functions>
```

Script file `test_function_sum_json.py` inside the `user_scripts` folder.
@@ -224,6 +226,50 @@ SELECT test_function_sum_json(2, 2);
└──────────────────────────────┘
```

Executable user-defined functions can take constant parameters configured as part of the `command` setting (this works only for user-defined functions of the `executable` type).
File test_function_parameter_python.xml.
```xml
<functions>
    <function>
        <type>executable</type>
        <name>test_function_parameter_python</name>
        <return_type>String</return_type>
        <argument>
            <type>UInt64</type>
        </argument>
        <format>TabSeparated</format>
        <command>test_function_parameter_python.py {test_parameter:UInt64}</command>
    </function>
</functions>
```

Script file `test_function_parameter_python.py` inside the `user_scripts` folder.
```python
#!/usr/bin/python3

import sys

if __name__ == "__main__":
    for line in sys.stdin:
        print("Parameter " + str(sys.argv[1]) + " value " + str(line), end="")
        sys.stdout.flush()
```

Query:

``` sql
SELECT test_function_parameter_python(1)(2);
```

Result:

``` text
┌─test_function_parameter_python(1)(2)─┐
│ Parameter 1 value 2                  │
└──────────────────────────────────────┘
```

## Error handling {#obrabotka-oshibok}

Some functions may throw an exception on invalid data. In that case, the query is cancelled and the error text is returned to the client. In distributed query processing, when an exception occurs on one of the servers, the other servers are also asked to abort the query.
@@ -45,7 +45,7 @@ CHECK TABLE test_table;
└───────────┴───────────┴─────────┘
```

If `check_query_single_value_result` = 0, the `CHECK TABLE` query returns the status of the table as a whole.
If `check_query_single_value_result` = 1, the `CHECK TABLE` query returns the status of the table as a whole.

```sql
SET check_query_single_value_result = 1;
@@ -30,6 +30,7 @@ sidebar_label: SYSTEM
- [START TTL MERGES](#query_language-start-ttl-merges)
- [STOP MOVES](#query_language-stop-moves)
- [START MOVES](#query_language-start-moves)
- [SYSTEM UNFREEZE](#query_language-system-unfreeze)
- [STOP FETCHES](#query_language-system-stop-fetches)
- [START FETCHES](#query_language-system-start-fetches)
- [STOP REPLICATED SENDS](#query_language-system-start-replicated-sends)

@@ -235,6 +236,14 @@ SYSTEM STOP MOVES [[db.]merge_tree_family_table_name]
SYSTEM START MOVES [[db.]merge_tree_family_table_name]
```

### SYSTEM UNFREEZE {#query_language-system-unfreeze}

Removes all "frozen" partitions of the given backup from disk. See the [ALTER TABLE table_name UNFREEZE WITH NAME](alter/partition.md#alter_unfreeze-partition) query for removing partitions individually.

``` sql
SYSTEM UNFREEZE WITH NAME <backup_name>
```

## Managing ReplicatedMergeTree Tables {#query-language-system-replicated}

ClickHouse can manage background replication-related processes in [ReplicatedMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) family tables.
@@ -5,4 +5,4 @@ sidebar_position: 82

# What's new in ClickHouse?

The development roadmap is briefly outlined [here](https://github.com/ClickHouse/ClickHouse/issues/17623), and news about previous releases is described in detail in the [changelog](./changelog/).
The development roadmap is briefly outlined [here](https://github.com/ClickHouse/ClickHouse/issues/32513), and news about previous releases is described in detail in the [changelog](./changelog/).
@@ -41,6 +41,10 @@ Yandex**没有**维护下面列出的库,也没有做过任何广泛的测试
- Ruby
  - [ClickHouse (Ruby)](https://github.com/shlima/click_house)
  - [clickhouse-activerecord](https://github.com/PNixx/clickhouse-activerecord)
- Rust
  - [clickhouse.rs](https://github.com/loyd/clickhouse.rs)
  - [clickhouse-rs](https://github.com/suharev7/clickhouse-rs)
  - [Klickhouse](https://github.com/Protryon/klickhouse)
- R
  - [clickhouse-r](https://github.com/hannesmuehleisen/clickhouse-r)
  - [RClickHouse](https://github.com/IMSMWU/RClickHouse)
@@ -250,12 +250,14 @@ Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table defa
`set allow_experimental_window_view = 1`.

``` sql
CREATE WINDOW VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]table_name] [ENGINE = engine] [WATERMARK = strategy] [ALLOWED_LATENESS = interval_function] AS SELECT ... GROUP BY time_window_function
CREATE WINDOW VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]table_name] [INNER ENGINE engine] [ENGINE engine] [WATERMARK strategy] [ALLOWED_LATENESS interval_function] [POPULATE] AS SELECT ... GROUP BY time_window_function
```

A window view can aggregate data by time window and automatically fire the corresponding window computation once the window is ready. It lowers processing latency by saving intermediate computation state, and it can push the results to a target table or to the terminal via the `WATCH` statement.

Creating a window view is similar to creating a materialized view. A window view stores intermediate computation state in an inner storage engine, which defaults to `AggregatingMergeTree`.
Creating a window view is similar to creating a materialized view. A window view stores the intermediate state of window computations in an inner storage engine specified with the `INNER ENGINE` clause; by default, `AggregatingMergeTree` is used as the inner state storage engine.

When creating a window view without `TO [db].[table]`, you must specify `ENGINE`, the table engine for storing data. See the sketch after this paragraph.
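A hedged sketch mirroring the syntax above (the `dst` table and the `data` table with columns `id` and `timestamp` are hypothetical; `dst` must already exist):

``` sql
CREATE WINDOW VIEW wv TO dst AS
SELECT count(id) AS cnt, tumbleStart(w_id) AS window_start
FROM data
GROUP BY tumble(timestamp, INTERVAL '10' SECOND) AS w_id;
```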
### Time window functions {#window-view-shi-jian-chuang-kou-han-shu}

@@ -295,6 +297,10 @@ CREATE WINDOW VIEW test.wv TO test.dst WATERMARK=ASCENDING ALLOWED_LATENESS=INTE

Note that late messages need to update previous results. Unlike firing at the end of the window, the window view fires immediately when a late message arrives, which produces multiple outputs for the same window. Users need to take this into account and deduplicate the results.

### Modifying the query {#window-view-cha-xun-yu-ju-xiu-gai}

Users can modify the window view's `SELECT` query with the `ALTER TABLE ... MODIFY QUERY` statement. Whether or not the `TO [db.]name` clause is used, the new `SELECT` query must produce the same data structure as the old one. Note that since the intermediate window state cannot be reused, the data in the current window is lost when the query is modified.

### Monitoring new windows {#window-view-xin-chuang-kou-jian-kong}

A window view can push results to the terminal via the `WATCH` statement or to a table via the `TO` clause.

@@ -309,6 +315,7 @@ WATCH [db.]name [LIMIT n]

- `window_view_clean_interval`: the interval, in seconds, at which the window view cleans up expired data. The system periodically removes expired data; window data that has not fired yet is not removed.
- `window_view_heartbeat_interval`: the heartbeat interval used to determine whether a watch query is alive.
- `wait_for_window_view_fire_signal_timeout`: timeout for waiting for the window fire signal in event time processing.

### Example {#window-view-shi-li}
@@ -26,6 +26,7 @@ sidebar_label: SYSTEM
- [START TTL MERGES](#query_language-start-ttl-merges)
- [STOP MOVES](#query_language-stop-moves)
- [START MOVES](#query_language-start-moves)
- [SYSTEM UNFREEZE](#query_language-system-unfreeze)
- [STOP FETCHES](#query_language-system-stop-fetches)
- [START FETCHES](#query_language-system-start-fetches)
- [STOP REPLICATED SENDS](#query_language-system-start-replicated-sends)

@@ -203,6 +204,14 @@ SYSTEM STOP MOVES [[db.]merge_tree_family_table_name]
SYSTEM STOP MOVES [[db.]merge_tree_family_table_name]
```

### SYSTEM UNFREEZE {#query_language-system-unfreeze}

Removes frozen backups with the specified name from all disks. See [ALTER TABLE table_name UNFREEZE WITH NAME](alter/partition.md#alter_unfreeze-partition) for more about unfreezing individual parts.

``` sql
SYSTEM UNFREEZE WITH NAME <backup_name>
```

## Managing ReplicatedMergeTree Tables {#query-language-system-replicated}

Manages background replication-related processes for [ReplicatedMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) family tables.
@@ -508,7 +508,7 @@ else ()
endif()

if (USE_BINARY_HASH)
    add_custom_command(TARGET clickhouse POST_BUILD COMMAND ./clickhouse hash-binary > hash && ${OBJCOPY_PATH} --add-section .note.ClickHouse.hash=hash clickhouse COMMENT "Adding .note.ClickHouse.hash to clickhouse" VERBATIM)
    add_custom_command(TARGET clickhouse POST_BUILD COMMAND ./clickhouse hash-binary > hash && ${OBJCOPY_PATH} --add-section .clickhouse.hash=hash clickhouse COMMENT "Adding section '.clickhouse.hash' to clickhouse binary" VERBATIM)
endif()

if (INSTALL_STRIPPED_BINARIES)
@@ -5,7 +5,7 @@
#include <sys/stat.h>
#include <pwd.h>

#if defined(__linux__)
#if defined(OS_LINUX)
#include <syscall.h>
#include <linux/capability.h>
#endif

@@ -789,7 +789,7 @@ int mainEntryClickHouseInstall(int argc, char ** argv)
     * then attempt to run this file will end up with a cryptic "Operation not permitted" message.
     */

#if defined(__linux__)
#if defined(OS_LINUX)
    fmt::print("Setting capabilities for clickhouse binary. This is optional.\n");
    std::string command = fmt::format("command -v setcap >/dev/null"
        " && command -v capsh >/dev/null"
@@ -2,7 +2,7 @@
#include <csetjmp>
#include <unistd.h>

#ifdef __linux__
#ifdef OS_LINUX
#include <sys/mman.h>
#endif

@@ -82,7 +82,7 @@ int mainEntryClickHouseDisks(int argc, char ** argv);
int mainEntryClickHouseHashBinary(int, char **)
{
    /// Intentionally without newline. So you can run:
    /// objcopy --add-section .note.ClickHouse.hash=<(./clickhouse hash-binary) clickhouse
    /// objcopy --add-section .clickhouse.hash=<(./clickhouse hash-binary) clickhouse
    std::cout << getHashOfLoadedBinaryHex();
    return 0;
}
@@ -339,7 +339,7 @@ struct Checker
        checkRequiredInstructions();
    }
} checker
#ifndef __APPLE__
#ifndef OS_DARWIN
    __attribute__((init_priority(101))) /// Run before other static initializers.
#endif
;
@@ -619,7 +619,7 @@ private:

public:
    explicit MarkovModel(MarkovModelParameters params_)
        : params(params_), code_points(params.order, BEGIN) {}
        : params(std::move(params_)), code_points(params.order, BEGIN) {}

    void consume(const char * data, size_t size)
    {
@@ -830,7 +830,7 @@ private:
    MarkovModel markov_model;

public:
    StringModel(UInt64 seed_, MarkovModelParameters params_) : seed(seed_), markov_model(params_) {}
    StringModel(UInt64 seed_, MarkovModelParameters params_) : seed(seed_), markov_model(std::move(params_)) {}

    void train(const IColumn & column) override
    {
@@ -744,16 +744,18 @@ int Server::main(const std::vector<std::string> & /*args*/)
    /// But there are other sections of the binary (e.g. exception handling tables)
    /// that are interpreted (not executed) but can alter the behaviour of the program as well.

    /// Please keep the below log messages in-sync with the ones in daemon/BaseDaemon.cpp

    String calculated_binary_hash = getHashOfLoadedBinaryHex();

    if (stored_binary_hash.empty())
    {
        LOG_WARNING(log, "Calculated checksum of the binary: {}."
            " There is no information about the reference checksum.", calculated_binary_hash);
        LOG_WARNING(log, "Integrity check of the executable skipped because the reference checksum could not be read."
            " (calculated checksum: {})", calculated_binary_hash);
    }
    else if (calculated_binary_hash == stored_binary_hash)
    {
        LOG_INFO(log, "Calculated checksum of the binary: {}, integrity check passed.", calculated_binary_hash);
        LOG_INFO(log, "Integrity check of the executable successfully passed (checksum: {})", calculated_binary_hash);
    }
    else
    {
@@ -769,14 +771,14 @@ int Server::main(const std::vector<std::string> & /*args*/)
        else
        {
            throw Exception(ErrorCodes::CORRUPTED_DATA,
                "Calculated checksum of the ClickHouse binary ({0}) does not correspond"
                " to the reference checksum stored in the binary ({1})."
                " It may indicate one of the following:"
                " - the file {2} was changed just after startup;"
                " - the file {2} is damaged on disk due to faulty hardware;"
                " - the loaded executable is damaged in memory due to faulty hardware;"
                "Calculated checksum of the executable ({0}) does not correspond"
                " to the reference checksum stored in the executable ({1})."
                " This may indicate one of the following:"
                " - the executable {2} was changed just after startup;"
                " - the executable {2} was corrupted on disk due to faulty hardware;"
                " - the loaded executable was corrupted in memory due to faulty hardware;"
                " - the file {2} was intentionally modified;"
                " - logical error in code."
                " - a logical error in the code."
                , calculated_binary_hash, stored_binary_hash, executable_path);
        }
    }
@@ -164,6 +164,7 @@ enum class AccessType
    M(SYSTEM_FLUSH_LOGS, "FLUSH LOGS", GLOBAL, SYSTEM_FLUSH) \
    M(SYSTEM_FLUSH, "", GROUP, SYSTEM) \
    M(SYSTEM_THREAD_FUZZER, "SYSTEM START THREAD FUZZER, SYSTEM STOP THREAD FUZZER, START THREAD FUZZER, STOP THREAD FUZZER", GLOBAL, SYSTEM) \
    M(SYSTEM_UNFREEZE, "SYSTEM UNFREEZE", GLOBAL, SYSTEM) \
    M(SYSTEM, "", GROUP, ALL) /* allows to execute SYSTEM {SHUTDOWN|RELOAD CONFIG|...} */ \
    \
    M(dictGet, "dictHas, dictGetHierarchy, dictIsIn", DICTIONARY, ALL) /* allows to execute functions dictGet(), dictHas(), dictGetHierarchy(), dictIsIn() */\
@@ -120,6 +120,7 @@ namespace

    AccessRights res = access;
    res.modifyFlags(modifier);
    res.modifyFlagsWithGrantOption(modifier);

    /// Anyone has access to the "system" and "information_schema" database.
    res.grant(AccessType::SELECT, DatabaseCatalog::SYSTEM_DATABASE);
@@ -326,7 +326,7 @@ Strings BackupCoordinationDistributed::listFiles(const String & prefix, const St
        elements.push_back(String{new_element});
    }

    std::sort(elements.begin(), elements.end());
    ::sort(elements.begin(), elements.end());
    return elements;
}

@@ -84,7 +84,7 @@ namespace
        return true;
    });

    std::sort(res.begin(), res.end());
    ::sort(res.begin(), res.end());
    res.erase(std::unique(res.begin(), res.end()), res.end());
    return res;
}
@@ -113,7 +113,7 @@ namespace
        return true;
    });

    std::sort(res.begin(), res.end());
    ::sort(res.begin(), res.end());
    res.erase(std::unique(res.begin(), res.end()), res.end());
    return res;
}
@@ -950,18 +950,17 @@ void ClientBase::onProfileEvents(Block & block)
            progress_indication.addThreadIdToList(host_name, thread_id);
            auto event_name = names.getDataAt(i);
            auto value = array_values[i];

            /// Ignore negative time delta or memory usage just in case.
            if (value < 0)
                continue;

            if (event_name == user_time_name)
            {
                thread_times[host_name][thread_id].user_ms = value;
            }
            else if (event_name == system_time_name)
            {
                thread_times[host_name][thread_id].system_ms = value;
            }
            else if (event_name == MemoryTracker::USAGE_EVENT_NAME)
            {
                thread_times[host_name][thread_id].memory_usage = value;
            }
        }
        auto elapsed_time = profile_events.watch.elapsedMicroseconds();
        progress_indication.updateThreadEventData(thread_times, elapsed_time);
@@ -22,8 +22,8 @@ namespace ErrorCodes
    extern const int ILLEGAL_COLUMN;
    extern const int DUPLICATE_COLUMN;
    extern const int NUMBER_OF_DIMENSIONS_MISMATHED;
    extern const int NOT_IMPLEMENTED;
    extern const int SIZES_OF_COLUMNS_DOESNT_MATCH;
    extern const int ARGUMENT_OUT_OF_BOUND;
}

namespace
@@ -179,7 +179,7 @@ ColumnObject::Subcolumn::Subcolumn(
{
}

size_t ColumnObject::Subcolumn::Subcolumn::size() const
size_t ColumnObject::Subcolumn::size() const
{
    size_t res = num_of_defaults_in_prefix;
    for (const auto & part : data)
@@ -187,7 +187,7 @@ size_t ColumnObject::Subcolumn::Subcolumn::size() const
    return res;
}

size_t ColumnObject::Subcolumn::Subcolumn::byteSize() const
size_t ColumnObject::Subcolumn::byteSize() const
{
    size_t res = 0;
    for (const auto & part : data)
@@ -195,7 +195,7 @@ size_t ColumnObject::Subcolumn::Subcolumn::byteSize() const
    return res;
}

size_t ColumnObject::Subcolumn::Subcolumn::allocatedBytes() const
size_t ColumnObject::Subcolumn::allocatedBytes() const
{
    size_t res = 0;
    for (const auto & part : data)
@@ -203,6 +203,37 @@ size_t ColumnObject::Subcolumn::Subcolumn::allocatedBytes() const
    return res;
}

void ColumnObject::Subcolumn::get(size_t n, Field & res) const
{
    if (isFinalized())
    {
        getFinalizedColumn().get(n, res);
        return;
    }

    size_t ind = n;
    if (ind < num_of_defaults_in_prefix)
    {
        res = least_common_type.get()->getDefault();
        return;
    }

    ind -= num_of_defaults_in_prefix;
    for (const auto & part : data)
    {
        if (ind < part->size())
        {
            part->get(ind, res);
            res = convertFieldToTypeOrThrow(res, *least_common_type.get());
            return;
        }

        ind -= part->size();
    }

    throw Exception(ErrorCodes::ARGUMENT_OUT_OF_BOUND, "Index ({}) for getting field is out of range", n);
}

void ColumnObject::Subcolumn::checkTypes() const
{
    DataTypes prefix_types;
@@ -221,7 +252,7 @@ void ColumnObject::Subcolumn::checkTypes() const

void ColumnObject::Subcolumn::insert(Field field)
{
    auto info = getFieldInfo(field);
    auto info = DB::getFieldInfo(field);
    insert(std::move(field), std::move(info));
}

@@ -244,8 +275,8 @@ static bool isConversionRequiredBetweenIntegers(const IDataType & lhs, const IDa
    bool is_native_int = which_lhs.isNativeInt() && which_rhs.isNativeInt();
    bool is_native_uint = which_lhs.isNativeUInt() && which_rhs.isNativeUInt();

    return (is_native_int || is_native_uint)
        && lhs.getSizeOfValueInMemory() <= rhs.getSizeOfValueInMemory();
    return (!is_native_int && !is_native_uint)
        || lhs.getSizeOfValueInMemory() > rhs.getSizeOfValueInMemory();
}

void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
@@ -288,7 +319,7 @@ void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)
    }
    else if (!least_common_base_type->equals(*base_type) && !isNothing(base_type))
    {
        if (!isConversionRequiredBetweenIntegers(*base_type, *least_common_base_type))
        if (isConversionRequiredBetweenIntegers(*base_type, *least_common_base_type))
        {
            base_type = getLeastSupertype(DataTypes{std::move(base_type), least_common_base_type}, true);
            type_changed = true;
@@ -305,35 +336,96 @@ void ColumnObject::Subcolumn::insert(Field field, FieldInfo info)

void ColumnObject::Subcolumn::insertRangeFrom(const Subcolumn & src, size_t start, size_t length)
{
    assert(src.isFinalized());
    const auto & src_column = src.data.back();
    const auto & src_type = src.least_common_type.get();
    assert(start + length <= src.size());
    size_t end = start + length;

    if (data.empty())
    {
        addNewColumnPart(src.least_common_type.get());
        data.back()->insertRangeFrom(*src_column, start, length);
        addNewColumnPart(src.getLeastCommonType());
    }
    else if (least_common_type.get()->equals(*src_type))
    else if (!least_common_type.get()->equals(*src.getLeastCommonType()))
    {
        data.back()->insertRangeFrom(*src_column, start, length);
    }
    else
    {
        auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type.get(), src_type}, true);
        auto casted_column = castColumn({src_column, src_type, ""}, new_least_common_type);

        if (!least_common_type.get()->equals(*new_least_common_type))
        auto new_least_common_type = getLeastSupertype(DataTypes{least_common_type.get(), src.getLeastCommonType()}, true);
        if (!new_least_common_type->equals(*least_common_type.get()))
            addNewColumnPart(std::move(new_least_common_type));
    }

        data.back()->insertRangeFrom(*casted_column, start, length);
    if (end <= src.num_of_defaults_in_prefix)
    {
        data.back()->insertManyDefaults(length);
        return;
    }

    if (start < src.num_of_defaults_in_prefix)
        data.back()->insertManyDefaults(src.num_of_defaults_in_prefix - start);

    auto insert_from_part = [&](const auto & column, size_t from, size_t n)
    {
        assert(from + n <= column->size());
        auto column_type = getDataTypeByColumn(*column);

        if (column_type->equals(*least_common_type.get()))
        {
            data.back()->insertRangeFrom(*column, from, n);
            return;
        }

        /// If we need to insert large range, there is no sense to cut part of column and cast it.
        /// Casting of all column and inserting from it can be faster.
        /// Threshold is just a guess.

        if (n * 3 >= column->size())
        {
            auto casted_column = castColumn({column, column_type, ""}, least_common_type.get());
            data.back()->insertRangeFrom(*casted_column, from, n);
            return;
        }

        auto casted_column = column->cut(from, n);
        casted_column = castColumn({casted_column, column_type, ""}, least_common_type.get());
        data.back()->insertRangeFrom(*casted_column, 0, n);
    };

    size_t pos = 0;
    size_t processed_rows = src.num_of_defaults_in_prefix;

    /// Find the first part of the column that intersects the range.
    while (pos < src.data.size() && processed_rows + src.data[pos]->size() < start)
    {
        processed_rows += src.data[pos]->size();
        ++pos;
    }

    /// Insert from the first part of column.
    if (pos < src.data.size() && processed_rows < start)
    {
        size_t part_start = start - processed_rows;
        size_t part_length = std::min(src.data[pos]->size() - part_start, end - start);
        insert_from_part(src.data[pos], part_start, part_length);
        processed_rows += src.data[pos]->size();
        ++pos;
    }

    /// Insert from the parts of column in the middle of range.
    while (pos < src.data.size() && processed_rows + src.data[pos]->size() < end)
    {
        insert_from_part(src.data[pos], 0, src.data[pos]->size());
        processed_rows += src.data[pos]->size();
        ++pos;
    }

    /// Insert from the last part of column if needed.
    if (pos < src.data.size() && processed_rows < end)
    {
        size_t part_end = end - processed_rows;
        insert_from_part(src.data[pos], 0, part_end);
    }
}

bool ColumnObject::Subcolumn::isFinalized() const
{
    return data.empty() ||
        (data.size() == 1 && !data[0]->isSparse() && num_of_defaults_in_prefix == 0);
    return num_of_defaults_in_prefix == 0 &&
        (data.empty() || (data.size() == 1 && !data[0]->isSparse()));
}

void ColumnObject::Subcolumn::finalize()
@@ -432,6 +524,13 @@ void ColumnObject::Subcolumn::popBack(size_t n)
    num_of_defaults_in_prefix -= n;
}

ColumnObject::Subcolumn ColumnObject::Subcolumn::cut(size_t start, size_t length) const
{
    Subcolumn new_subcolumn(0, is_nullable);
    new_subcolumn.insertRangeFrom(*this, start, length);
    return new_subcolumn;
}

Field ColumnObject::Subcolumn::getLastField() const
{
    if (data.empty())
@@ -442,6 +541,18 @@ Field ColumnObject::Subcolumn::getLastField() const
    return (*last_part)[last_part->size() - 1];
}

FieldInfo ColumnObject::Subcolumn::getFieldInfo() const
{
    const auto & base_type = least_common_type.getBase();
    return FieldInfo
    {
        .scalar_type = base_type,
        .have_nulls = base_type->isNullable(),
        .need_convert = false,
        .num_dimensions = least_common_type.getNumberOfDimensions(),
    };
}

ColumnObject::Subcolumn ColumnObject::Subcolumn::recreateWithDefaultValues(const FieldInfo & field_info) const
{
    auto scalar_type = field_info.scalar_type;
@@ -479,6 +590,13 @@ const ColumnPtr & ColumnObject::Subcolumn::getFinalizedColumnPtr() const
    return data[0];
}

ColumnObject::Subcolumn::LeastCommonType::LeastCommonType()
    : type(std::make_shared<DataTypeNothing>())
    , base_type(type)
    , num_dimensions(0)
{
}

ColumnObject::Subcolumn::LeastCommonType::LeastCommonType(DataTypePtr type_)
    : type(std::move(type_))
    , base_type(getBaseTypeOfArray(type))
@@ -525,16 +643,6 @@ size_t ColumnObject::size() const
    return num_rows;
}

MutableColumnPtr ColumnObject::cloneResized(size_t new_size) const
{
    /// cloneResized with new_size == 0 is used for cloneEmpty().
    if (new_size != 0)
        throw Exception(ErrorCodes::NOT_IMPLEMENTED,
            "ColumnObject doesn't support resize to non-zero length");

    return ColumnObject::create(is_nullable);
}

size_t ColumnObject::byteSize() const
{
    size_t res = 0;
@@ -553,23 +661,21 @@ size_t ColumnObject::allocatedBytes() const

void ColumnObject::forEachSubcolumn(ColumnCallback callback)
{
    if (!isFinalized())
        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot iterate over non-finalized ColumnObject");

    for (auto & entry : subcolumns)
        callback(entry->data.data.back());
        for (auto & part : entry->data.data)
            callback(part);
}

void ColumnObject::insert(const Field & field)
{
    const auto & object = field.get<const Object &>();

    HashSet<StringRef, StringRefHash> inserted;
    HashSet<StringRef, StringRefHash> inserted_paths;
    size_t old_size = size();
    for (const auto & [key_str, value] : object)
    {
        PathInData key(key_str);
        inserted.insert(key_str);
        inserted_paths.insert(key_str);
        if (!hasSubcolumn(key))
            addSubcolumn(key, old_size);

@@ -578,8 +684,14 @@ void ColumnObject::insert(const Field & field)
    }

    for (auto & entry : subcolumns)
        if (!inserted.has(entry->path.getPath()))
            entry->data.insertDefault();
    {
        if (!inserted_paths.has(entry->path.getPath()))
        {
            bool inserted = tryInsertDefaultFromNested(entry);
            if (!inserted)
                entry->data.insertDefault();
        }
    }

    ++num_rows;
}
@@ -594,26 +706,21 @@ void ColumnObject::insertDefault()

Field ColumnObject::operator[](size_t n) const
{
    if (!isFinalized())
        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");

    Object object;
    for (const auto & entry : subcolumns)
        object[entry->path.getPath()] = (*entry->data.data.back())[n];

    Field object;
    get(n, object);
    return object;
}

void ColumnObject::get(size_t n, Field & res) const
{
    if (!isFinalized())
        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot get Field from non-finalized ColumnObject");
    assert(n < size());
    res = Object();

    auto & object = res.get<Object &>();
    for (const auto & entry : subcolumns)
    {
        auto it = object.try_emplace(entry->path.getPath()).first;
        entry->data.data.back()->get(n, it->second);
        entry->data.get(n, it->second);
    }
}

@@ -626,41 +733,28 @@ void ColumnObject::insertFrom(const IColumn & src, size_t n)
void ColumnObject::insertRangeFrom(const IColumn & src, size_t start, size_t length)
{
    const auto & src_object = assert_cast<const ColumnObject &>(src);
    if (!src_object.isFinalized())
        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot insertRangeFrom non-finalized ColumnObject");

    for (auto & entry : subcolumns)
    {
        if (src_object.hasSubcolumn(entry->path))
            entry->data.insertRangeFrom(src_object.getSubcolumn(entry->path), start, length);
        else
            entry->data.insertManyDefaults(length);
    }

    for (const auto & entry : src_object.subcolumns)
    {
        if (!hasSubcolumn(entry->path))
        {
            if (entry->path.hasNested())
            {
                const auto & base_type = entry->data.getLeastCommonTypeBase();
                FieldInfo field_info
                {
                    .scalar_type = base_type,
                    .have_nulls = base_type->isNullable(),
                    .need_convert = false,
                    .num_dimensions = entry->data.getNumberOfDimensions(),
                };

                addNestedSubcolumn(entry->path, field_info, num_rows);
            }
                addNestedSubcolumn(entry->path, entry->data.getFieldInfo(), num_rows);
            else
            {
                addSubcolumn(entry->path, num_rows);
            }
        }

        auto & subcolumn = getSubcolumn(entry->path);
        subcolumn.insertRangeFrom(entry->data, start, length);
        auto & subcolumn = getSubcolumn(entry->path);
        subcolumn.insertRangeFrom(entry->data, start, length);
    }

    for (auto & entry : subcolumns)
    {
        if (!src_object.hasSubcolumn(entry->path))
        {
            bool inserted = tryInsertManyDefaultsFromNested(entry);
            if (!inserted)
                entry->data.insertManyDefaults(length);
        }
    }

@@ -668,21 +762,6 @@ void ColumnObject::insertRangeFrom(const IColumn & src, size_t start, size_t len
    finalize();
}

ColumnPtr ColumnObject::replicate(const Offsets & offsets) const
{
    if (!isFinalized())
        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot replicate non-finalized ColumnObject");

    auto res_column = ColumnObject::create(is_nullable);
    for (const auto & entry : subcolumns)
    {
        auto replicated_data = entry->data.data.back()->replicate(offsets)->assumeMutable();
        res_column->addSubcolumn(entry->path, std::move(replicated_data));
    }

    return res_column;
}

void ColumnObject::popBack(size_t length)
{
    for (auto & entry : subcolumns)
@@ -692,10 +771,15 @@ void ColumnObject::popBack(size_t length)
}

template <typename Func>
ColumnPtr ColumnObject::applyForSubcolumns(Func && func, std::string_view func_name) const
MutableColumnPtr ColumnObject::applyForSubcolumns(Func && func) const
{
    if (!isFinalized())
        throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot {} non-finalized ColumnObject", func_name);
    {
        auto finalized = IColumn::mutate(getPtr());
        auto & finalized_object = assert_cast<ColumnObject &>(*finalized);
        finalized_object.finalize();
        return finalized_object.applyForSubcolumns(std::forward<Func>(func));
    }

    auto res = ColumnObject::create(is_nullable);
    for (const auto & subcolumn : subcolumns)
@@ -703,22 +787,36 @@ ColumnPtr ColumnObject::applyForSubcolumns(Func && func, std::string_view func_n
        auto new_subcolumn = func(subcolumn->data.getFinalizedColumn());
        res->addSubcolumn(subcolumn->path, new_subcolumn->assumeMutable());
    }

    return res;
}

ColumnPtr ColumnObject::permute(const Permutation & perm, size_t limit) const
{
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.permute(perm, limit); }, "permute");
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.permute(perm, limit); });
}

ColumnPtr ColumnObject::filter(const Filter & filter, ssize_t result_size_hint) const
{
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.filter(filter, result_size_hint); }, "filter");
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.filter(filter, result_size_hint); });
}

ColumnPtr ColumnObject::index(const IColumn & indexes, size_t limit) const
{
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.index(indexes, limit); }, "index");
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.index(indexes, limit); });
}

ColumnPtr ColumnObject::replicate(const Offsets & offsets) const
{
    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.replicate(offsets); });
}

MutableColumnPtr ColumnObject::cloneResized(size_t new_size) const
{
    if (new_size == 0)
        return ColumnObject::create(is_nullable);

    return applyForSubcolumns([&](const auto & subcolumn) { return subcolumn.cloneResized(new_size); });
}

const ColumnObject::Subcolumn & ColumnObject::getSubcolumn(const PathInData & key) const
@ -810,6 +908,92 @@ void ColumnObject::addNestedSubcolumn(const PathInData & key, const FieldInfo &
|
||||
|
||||
if (num_rows == 0)
|
||||
num_rows = new_size;
|
||||
else if (new_size != num_rows)
|
||||
throw Exception(ErrorCodes::SIZES_OF_COLUMNS_DOESNT_MATCH,
|
||||
"Required size of subcolumn {} ({}) is inconsistent with column size ({})",
|
||||
key.getPath(), new_size, num_rows);
|
||||
}
|
||||
|
||||
const ColumnObject::Subcolumns::Node * ColumnObject::getLeafOfTheSameNested(const Subcolumns::NodePtr & entry) const
|
||||
{
|
||||
if (!entry->path.hasNested())
|
||||
return nullptr;
|
||||
|
||||
size_t old_size = entry->data.size();
|
||||
const auto * current_node = subcolumns.findLeaf(entry->path);
|
||||
const Subcolumns::Node * leaf = nullptr;
|
||||
|
||||
while (current_node)
|
||||
{
|
||||
/// Try to find the first Nested up to the current node.
|
||||
const auto * node_nested = subcolumns.findParent(current_node,
|
||||
[](const auto & candidate) { return candidate.isNested(); });
|
||||
|
||||
if (!node_nested)
|
||||
break;
|
||||
|
||||
/// Find the leaf with subcolumn that contains values
|
||||
/// for the last rows.
|
||||
/// If there are no leaves, skip current node and find
|
||||
/// the next node up to the current.
|
||||
leaf = subcolumns.findLeaf(node_nested,
|
||||
[&](const auto & candidate)
|
||||
{
|
||||
return candidate.data.size() > old_size;
|
||||
});
|
||||
|
||||
if (leaf)
|
||||
break;
|
||||
|
||||
current_node = node_nested->parent;
|
||||
}
|
||||
|
||||
if (leaf && isNothing(leaf->data.getLeastCommonTypeBase()))
|
||||
return nullptr;
|
||||
|
||||
return leaf;
|
||||
}
|
||||
|
||||
bool ColumnObject::tryInsertManyDefaultsFromNested(const Subcolumns::NodePtr & entry) const
|
||||
{
|
||||
const auto * leaf = getLeafOfTheSameNested(entry);
|
||||
if (!leaf)
|
||||
return false;
|
||||
|
||||
size_t old_size = entry->data.size();
|
||||
auto field_info = entry->data.getFieldInfo();
|
||||
|
||||
/// Cut the needed range from the found leaf
|
||||
/// and replace scalar values to the correct
|
||||
/// default values for given entry.
|
||||
auto new_subcolumn = leaf->data
|
||||
.cut(old_size, leaf->data.size() - old_size)
|
||||
.recreateWithDefaultValues(field_info);
|
||||
|
||||
entry->data.insertRangeFrom(new_subcolumn, 0, new_subcolumn.size());
|
||||
return true;
|
||||
}
|
||||
|
||||
bool ColumnObject::tryInsertDefaultFromNested(const Subcolumns::NodePtr & entry) const
|
||||
{
|
||||
const auto * leaf = getLeafOfTheSameNested(entry);
|
||||
if (!leaf)
|
||||
return false;
|
||||
|
||||
auto last_field = leaf->data.getLastField();
|
||||
if (last_field.isNull())
|
||||
return false;
|
||||
|
||||
size_t leaf_num_dimensions = leaf->data.getNumberOfDimensions();
|
||||
size_t entry_num_dimensions = entry->data.getNumberOfDimensions();
|
||||
|
||||
auto default_scalar = entry_num_dimensions > leaf_num_dimensions
|
||||
? createEmptyArrayField(entry_num_dimensions - leaf_num_dimensions)
|
||||
: entry->data.getLeastCommonTypeBase()->getDefault();
|
||||
|
||||
auto default_field = applyVisitor(FieldVisitorReplaceScalars(default_scalar, leaf_num_dimensions), last_field);
|
||||
entry->data.insert(std::move(default_field));
|
||||
return true;
|
||||
}
|
||||
|
||||
PathsInData ColumnObject::getKeys() const
|
||||
@ -835,7 +1019,7 @@ void ColumnObject::finalize()
|
||||
{
|
||||
const auto & least_common_type = entry->data.getLeastCommonType();
|
||||
|
||||
/// Do not add subcolumns, which consists only from NULLs.
|
||||
/// Do not add subcolumns, which consist only from NULLs.
|
||||
if (isNothing(getBaseTypeOfArray(least_common_type)))
|
||||
continue;
|
||||
|
||||
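
The functional change in the hunks above: applyForSubcolumns no longer throws LOGICAL_ERROR for a non-finalized ColumnObject; it finalizes a mutable copy of itself and recurses, so permute, filter, index, replicate and cloneResized also work on non-finalized columns. A minimal self-contained sketch of that finalize-on-demand pattern (illustrative names only, not the ClickHouse API):

    #include <utility>
    #include <vector>

    /// Toy stand-in for a column that buffers inserted parts until finalize().
    struct DemoColumn
    {
        std::vector<std::vector<int>> parts;  /// pending, not yet merged
        std::vector<int> merged;              /// single storage after finalize()
        bool finalized = false;

        void finalize()
        {
            for (const auto & part : parts)
                merged.insert(merged.end(), part.begin(), part.end());
            parts.clear();
            finalized = true;
        }

        /// Finalize-on-demand, as in the new applyForSubcolumns:
        /// instead of throwing, finalize a copy and apply func to that copy.
        template <typename Func>
        DemoColumn apply(Func && func) const
        {
            if (!finalized)
            {
                DemoColumn copy = *this;  /// analogue of IColumn::mutate(getPtr())
                copy.finalize();
                return copy.apply(std::forward<Func>(func));
            }

            DemoColumn res;
            res.finalized = true;
            res.merged = func(merged);
            return res;
        }
    };

The original object stays untouched; only the copy is finalized, which is what keeps the method const.
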
@@ -65,6 +65,7 @@ public:
        size_t size() const;
        size_t byteSize() const;
        size_t allocatedBytes() const;
        void get(size_t n, Field & res) const;

        bool isFinalized() const;
        const DataTypePtr & getLeastCommonType() const { return least_common_type.get(); }
@@ -84,6 +85,8 @@ public:
        void insertRangeFrom(const Subcolumn & src, size_t start, size_t length);
        void popBack(size_t n);

        Subcolumn cut(size_t start, size_t length) const;

        /// Converts all column's parts to the common type and
        /// creates a single column that stores all values.
        void finalize();
@@ -91,6 +94,8 @@ public:
        /// Returns last inserted field.
        Field getLastField() const;

        FieldInfo getFieldInfo() const;

        /// Recreates subcolumn with default scalar values and keeps sizes of arrays.
        /// Used to create columns of type Nested with consistent array sizes.
        Subcolumn recreateWithDefaultValues(const FieldInfo & field_info) const;
@@ -101,13 +106,16 @@ public:
        const IColumn & getFinalizedColumn() const;
        const ColumnPtr & getFinalizedColumnPtr() const;

        const std::vector<WrappedPtr> & getData() const { return data; }
        size_t getNumberOfDefaultsInPrefix() const { return num_of_defaults_in_prefix; }

        friend class ColumnObject;

    private:
        class LeastCommonType
        {
        public:
            LeastCommonType() = default;
            LeastCommonType();
            explicit LeastCommonType(DataTypePtr type_);

            const DataTypePtr & get() const { return type; }
@@ -175,6 +183,11 @@ public:
    /// It cares about consistency of sizes of Nested arrays.
    void addNestedSubcolumn(const PathInData & key, const FieldInfo & field_info, size_t new_size);

    /// Finds a subcolumn from the same Nested type as @entry and inserts
    /// an array with default values with consistent sizes as in Nested type.
    bool tryInsertDefaultFromNested(const Subcolumns::NodePtr & entry) const;
    bool tryInsertManyDefaultsFromNested(const Subcolumns::NodePtr & entry) const;

    const Subcolumns & getSubcolumns() const { return subcolumns; }
    Subcolumns & getSubcolumns() { return subcolumns; }
    PathsInData getKeys() const;
@@ -189,7 +202,6 @@ public:
    TypeIndex getDataType() const override { return TypeIndex::Object; }

    size_t size() const override;
    MutableColumnPtr cloneResized(size_t new_size) const override;
    size_t byteSize() const override;
    size_t allocatedBytes() const override;
    void forEachSubcolumn(ColumnCallback callback) override;
@@ -197,13 +209,14 @@
    void insertDefault() override;
    void insertFrom(const IColumn & src, size_t n) override;
    void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
    ColumnPtr replicate(const Offsets & offsets) const override;
    void popBack(size_t length) override;
    Field operator[](size_t n) const override;
    void get(size_t n, Field & res) const override;
    ColumnPtr permute(const Permutation & perm, size_t limit) const override;
    ColumnPtr filter(const Filter & filter, ssize_t result_size_hint) const override;
    ColumnPtr index(const IColumn & indexes, size_t limit) const override;
    ColumnPtr replicate(const Offsets & offsets) const override;
    MutableColumnPtr cloneResized(size_t new_size) const override;

    /// All other methods throw exception.

@@ -236,7 +249,11 @@ private:
    }

    template <typename Func>
    ColumnPtr applyForSubcolumns(Func && func, std::string_view func_name) const;
    MutableColumnPtr applyForSubcolumns(Func && func) const;

    /// For given subcolumn return subcolumn from the same Nested type.
    /// It's used to get shared sized of Nested to insert correct default values.
    const Subcolumns::Node * getLeafOfTheSameNested(const Subcolumns::NodePtr & entry) const;
};

}
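
Subcolumn::cut and Subcolumn::recreateWithDefaultValues declared above are used together in tryInsertManyDefaultsFromNested: take the trailing rows of a sibling Nested leaf and rebuild them with default scalars while keeping every array's size, so all arrays of one Nested stay consistent. A rough standalone illustration of that keep-shapes-reset-values step (plain vectors, not the Subcolumn API):

    #include <vector>

    /// Keep the size of each array, replace every scalar with a default.
    std::vector<std::vector<int>> recreateWithDefaults(
        const std::vector<std::vector<int>> & arrays, int default_value = 0)
    {
        std::vector<std::vector<int>> res;
        res.reserve(arrays.size());
        for (const auto & arr : arrays)
            res.emplace_back(arr.size(), default_value);  /// same shape, default values
        return res;
    }

For example, {{1, 2}, {3}} becomes {{0, 0}, {0}}: the offsets of the Nested arrays are preserved, only the values are defaulted.
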
@@ -318,7 +318,59 @@ template <typename T>
void ColumnVector<T>::updatePermutation(IColumn::PermutationSortDirection direction, IColumn::PermutationSortStability stability,
    size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_ranges) const
{
    auto sort = [](auto begin, auto end, auto pred) { ::sort(begin, end, pred); };
    bool reverse = direction == IColumn::PermutationSortDirection::Descending;
    bool ascending = direction == IColumn::PermutationSortDirection::Ascending;
    bool sort_is_stable = stability == IColumn::PermutationSortStability::Stable;

    auto sort = [&](auto begin, auto end, auto pred)
    {
        /// A case for radix sort
        if constexpr (is_arithmetic_v<T> && !is_big_int_v<T>)
        {
            /// TODO: LSD RadixSort is currently not stable if direction is descending, or value is floating point
            bool use_radix_sort = (sort_is_stable && ascending && !std::is_floating_point_v<T>) || !sort_is_stable;
            size_t size = end - begin;

            /// Thresholds on size. Lower threshold is arbitrary. Upper threshold is chosen by the type for histogram counters.
            if (size >= 256 && size <= std::numeric_limits<UInt32>::max() && use_radix_sort)
            {
                PaddedPODArray<ValueWithIndex<T>> pairs(size);
                size_t index = 0;

                for (auto * it = begin; it != end; ++it)
                {
                    pairs[index] = {data[*it], static_cast<UInt32>(*it)};
                    ++index;
                }

                RadixSort<RadixSortTraits<T>>::executeLSD(pairs.data(), size, reverse, begin);

                /// Radix sort treats all NaNs to be greater than all numbers.
                /// If the user needs the opposite, we must move them accordingly.
                if (std::is_floating_point_v<T> && nan_direction_hint < 0)
                {
                    size_t nans_to_move = 0;

                    for (size_t i = 0; i < size; ++i)
                    {
                        if (isNaN(data[begin[reverse ? i : size - 1 - i]]))
                            ++nans_to_move;
                        else
                            break;
                    }

                    if (nans_to_move)
                    {
                        std::rotate(begin, begin + (reverse ? nans_to_move : size - nans_to_move), end);
                    }
                }

                return;
            }
        }

        ::sort(begin, end, pred);
    };
    auto partial_sort = [](auto begin, auto mid, auto end, auto pred) { ::partial_sort(begin, mid, end, pred); };

    if (direction == IColumn::PermutationSortDirection::Ascending && stability == IColumn::PermutationSortStability::Unstable)
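
The NaN handling above relies on LSD radix sort placing all NaNs at the greatest end; when nan_direction_hint asks for NaNs first, the already-sorted run is fixed up with a single std::rotate. A self-contained sketch of that fix-up on plain values (the real code rotates the permutation range, not the column data itself):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    int main()
    {
        /// Ascending result of a radix sort: NaNs ended up last.
        std::vector<double> data{1.0, 2.0, 3.0, NAN, NAN};
        size_t size = data.size();

        /// Count the trailing NaNs.
        size_t nans_to_move = 0;
        for (size_t i = 0; i < size; ++i)
        {
            if (std::isnan(data[size - 1 - i]))
                ++nans_to_move;
            else
                break;
        }

        /// One rotate moves them to the front: {NaN, NaN, 1.0, 2.0, 3.0}.
        if (nans_to_move)
            std::rotate(data.begin(), data.begin() + (size - nans_to_move), data.end());
    }
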
@@ -8,6 +9,
#if defined(__AVX512F__) || defined(__AVX512BW__) || defined(__AVX__) || defined(__AVX2__)
#include <immintrin.h>
#endif
#if defined(__aarch64__) && defined(__ARM_NEON)
# include <arm_neon.h>
#endif

/// Common helper methods for implementation of different columns.

@@ -44,6 +47,22 @@ inline UInt64 bytes64MaskToBits64Mask(const UInt8 * bytes64)
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(bytes64 + 32)), zero16))) << 32) & 0xffff00000000)
        | ((static_cast<UInt64>(_mm_movemask_epi8(_mm_cmpeq_epi8(
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(bytes64 + 48)), zero16))) << 48) & 0xffff000000000000);
#elif defined(__aarch64__) && defined(__ARM_NEON)
    const uint8x16_t bitmask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80, 0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
    const auto * src = reinterpret_cast<const unsigned char *>(bytes64);
    const uint8x16_t p0 = vceqzq_u8(vld1q_u8(src));
    const uint8x16_t p1 = vceqzq_u8(vld1q_u8(src + 16));
    const uint8x16_t p2 = vceqzq_u8(vld1q_u8(src + 32));
    const uint8x16_t p3 = vceqzq_u8(vld1q_u8(src + 48));
    uint8x16_t t0 = vandq_u8(p0, bitmask);
    uint8x16_t t1 = vandq_u8(p1, bitmask);
    uint8x16_t t2 = vandq_u8(p2, bitmask);
    uint8x16_t t3 = vandq_u8(p3, bitmask);
    uint8x16_t sum0 = vpaddq_u8(t0, t1);
    uint8x16_t sum1 = vpaddq_u8(t2, t3);
    sum0 = vpaddq_u8(sum0, sum1);
    sum0 = vpaddq_u8(sum0, sum0);
    UInt64 res = vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
#else
    UInt64 res = 0;
    for (size_t i = 0; i < 64; ++i)
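
The new NEON branch computes the same contract as the SSE2 branch above it: a 64-bit mask built by comparing each of the 64 input bytes against zero (the scalar fallback is truncated in this diff). A hedged scalar reference for that per-byte contract (illustrative, not the exact ClickHouse fallback; any final masking or inversion happens outside the shown lines):

    #include <cstddef>
    #include <cstdint>

    /// Bit i of the result is set iff bytes64[i] == 0; the SIMD branches
    /// produce the same mask 16 bytes at a time.
    uint64_t zeroBytesToBits(const uint8_t * bytes64)
    {
        uint64_t res = 0;
        for (size_t i = 0; i < 64; ++i)
            res |= static_cast<uint64_t>(bytes64[i] == 0) << i;
        return res;
    }
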
@@ -90,7 +90,7 @@ public:
    /// Creates column with the same type and specified size.
    /// If size is less current size, then data is cut.
    /// If size is greater, than default values are appended.
    [[nodiscard]] virtual MutablePtr cloneResized(size_t /*size*/) const { throw Exception("Cannot cloneResized() column " + getName(), ErrorCodes::NOT_IMPLEMENTED); }
    [[nodiscard]] virtual MutablePtr cloneResized(size_t /*size*/) const { throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Cannot cloneResized() column {}", getName()); }

    /// Returns number of values in column.
    [[nodiscard]] virtual size_t size() const = 0;
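
The replacement swaps string concatenation for the variadic Exception overload, which formats its message fmt-style. Equivalent formatting with plain fmt (assuming only that the fmt library is available):

    #include <fmt/format.h>
    #include <iostream>

    int main()
    {
        /// Same message the new Exception overload produces.
        std::cout << fmt::format("Cannot cloneResized() column {}", "ColumnObject") << '\n';
    }
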
120 src/Columns/tests/gtest_column_object.cpp Normal file
@@ -0,0 +1,120 @@
#include <Common/FieldVisitorsAccurateComparison.h>
#include <DataTypes/getLeastSupertype.h>
#include <Interpreters/castColumn.h>
#include <Interpreters/convertFieldToType.h>
#include <Columns/ColumnObject.h>
#include <Common/FieldVisitorToString.h>

#include <Common/randomSeed.h>
#include <fmt/core.h>
#include <pcg_random.hpp>
#include <gtest/gtest.h>
#include <random>

using namespace DB;

static pcg64 rng(randomSeed());

Field getRandomField(size_t type)
{
    switch (type)
    {
        case 0:
            return rng();
        case 1:
            return std::uniform_real_distribution<>(0.0, 1.0)(rng);
        case 2:
            return std::string(rng() % 10, 'a' + rng() % 26);
        default:
            return Field();
    }
}

std::pair<ColumnObject::Subcolumn, std::vector<Field>> generate(size_t size)
{
    bool has_defaults = rng() % 3 == 0;
    size_t num_defaults = has_defaults ? rng() % size : 0;

    ColumnObject::Subcolumn subcolumn(num_defaults, false);
    std::vector<Field> fields;

    while (subcolumn.size() < size)
    {
        size_t part_size = rng() % (size - subcolumn.size()) + 1;
        size_t field_type = rng() % 3;

        for (size_t i = 0; i < part_size; ++i)
        {
            fields.push_back(getRandomField(field_type));
            subcolumn.insert(fields.back());
        }
    }

    std::vector<Field> result_fields;
    for (size_t i = 0; i < num_defaults; ++i)
        result_fields.emplace_back();

    result_fields.insert(result_fields.end(), fields.begin(), fields.end());
    return {std::move(subcolumn), std::move(result_fields)};
}

void checkFieldsAreEqual(ColumnObject::Subcolumn subcolumn, const std::vector<Field> & fields)
{
    ASSERT_EQ(subcolumn.size(), fields.size());
    for (size_t i = 0; i < subcolumn.size(); ++i)
    {
        Field field;
        subcolumn.get(i, field); // Also check 'get' method.
        if (!applyVisitor(FieldVisitorAccurateEquals(), field, fields[i]))
        {
            std::cerr << fmt::format("Wrong value at position {}, expected {}, got {}",
                i, applyVisitor(FieldVisitorToString(), fields[i]), applyVisitor(FieldVisitorToString(), field));
            ASSERT_TRUE(false);
        }
    }
}

constexpr size_t T = 1000;
constexpr size_t N = 1000;

TEST(ColumnObject, InsertRangeFrom)
{
    for (size_t t = 0; t < T; ++t)
    {
        auto [subcolumn_dst, fields_dst] = generate(N);
        auto [subcolumn_src, fields_src] = generate(N);

        ASSERT_EQ(subcolumn_dst.size(), fields_dst.size());
        ASSERT_EQ(subcolumn_src.size(), fields_src.size());

        const auto & type_dst = subcolumn_dst.getLeastCommonType();
        const auto & type_src = subcolumn_src.getLeastCommonType();
        auto type_res = getLeastSupertype(DataTypes{type_dst, type_src}, true);

        size_t from = rng() % subcolumn_src.size();
        size_t to = rng() % subcolumn_src.size();
        if (from > to)
            std::swap(from, to);
        ++to;

        for (auto & field : fields_dst)
        {
            if (field.isNull())
                field = type_res->getDefault();
            else
                field = convertFieldToTypeOrThrow(field, *type_res);
        }

        for (size_t i = from; i < to; ++i)
        {
            if (fields_src[i].isNull())
                fields_dst.push_back(type_res->getDefault());
            else
                fields_dst.push_back(convertFieldToTypeOrThrow(fields_src[i], *type_res));

        }

        subcolumn_dst.insertRangeFrom(subcolumn_src, from, to - from);
        checkFieldsAreEqual(subcolumn_dst, fields_dst);
    }
}
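
The test above stresses Subcolumn::insertRangeFrom with random mixes of integers, floats, strings and leading defaults, checking every row against a model vector of Fields. Assuming the usual googletest runner for ClickHouse unit tests (the binary name may differ between build setups), the case can be run in isolation with a filter:

    ./unit_tests_dbms --gtest_filter='ColumnObject.InsertRangeFrom'
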
@@ -11,7 +11,7 @@
#include <Common/FieldVisitors.h>

using namespace DB;
pcg64 rng(randomSeed());
static pcg64 rng(randomSeed());

std::pair<MutableColumnPtr, MutableColumnPtr> createColumns(size_t n, size_t k)
{

@@ -11,7 +11,7 @@
#include <pcg_random.hpp>
#include <Common/thread_local_rng.h>

#if !defined(__APPLE__) && !defined(__FreeBSD__)
#if !defined(OS_DARWIN) && !defined(OS_FREEBSD)
#include <malloc.h>
#endif

@@ -1,4 +1,4 @@
#if defined(__ELF__) && !defined(__FreeBSD__)
#if defined(__ELF__) && !defined(OS_FREEBSD)

/*
 * Copyright 2012-present Facebook, Inc.

@@ -1,6 +1,6 @@
#pragma once

#if defined(__ELF__) && !defined(__FreeBSD__)
#if defined(__ELF__) && !defined(OS_FREEBSD)

/*
 * Copyright 2012-present Facebook, Inc.

@@ -1,4 +1,4 @@
#if defined(__ELF__) && !defined(__FreeBSD__)
#if defined(__ELF__) && !defined(OS_FREEBSD)

#include <Common/Elf.h>
#include <Common/Exception.h>
@@ -176,9 +176,9 @@ String Elf::getBuildID(const char * nhdr_pos, size_t size)
#endif // OS_SUNOS


String Elf::getBinaryHash() const
String Elf::getStoredBinaryHash() const
{
    if (auto section = findSectionByName(".note.ClickHouse.hash"))
    if (auto section = findSectionByName(".clickhouse.hash"))
        return {section->begin(), section->end()};
    else
        return {};
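
With the method renamed from getBinaryHash to getStoredBinaryHash and the section renamed from .note.ClickHouse.hash to .clickhouse.hash, the stored hash remains inspectable with standard binutils, for example (binary path is illustrative):

    readelf -x .clickhouse.hash ./clickhouse
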
@@ -1,6 +1,6 @@
#pragma once

#if defined(__ELF__) && !defined(__FreeBSD__)
#if defined(__ELF__) && !defined(OS_FREEBSD)

#include <IO/MMapReadBufferFromFile.h>

@@ -61,7 +61,7 @@ public:
    static String getBuildID(const char * nhdr_pos, size_t size);

    /// Hash of the binary for integrity checks.
    String getBinaryHash() const;
    String getStoredBinaryHash() const;

private:
    MMapReadBufferFromFile in;

Some files were not shown because too many files have changed in this diff