diff --git a/.gitignore b/.gitignore index a469ff7bca1..0bf31508419 100644 --- a/.gitignore +++ b/.gitignore @@ -33,6 +33,9 @@ /docs/ja/single.md /docs/fa/single.md /docs/en/development/cmake-in-clickhouse.md +/docs/ja/development/cmake-in-clickhouse.md +/docs/zh/development/cmake-in-clickhouse.md +/docs/ru/development/cmake-in-clickhouse.md # callgrind files callgrind.out.* diff --git a/AUTHORS b/AUTHORS index 12838d7fa14..1d2e5adc523 100644 --- a/AUTHORS +++ b/AUTHORS @@ -1,2 +1,2 @@ -To see the list of authors who created the source code of ClickHouse, published and distributed by YANDEX LLC as the owner, +To see the list of authors who created the source code of ClickHouse, published and distributed by ClickHouse, Inc. as the owner, run "SELECT * FROM system.contributors;" query on any ClickHouse server. diff --git a/LICENSE b/LICENSE index 9167b80e269..c46bc7d19e1 100644 --- a/LICENSE +++ b/LICENSE @@ -1,4 +1,4 @@ -Copyright 2016-2021 Yandex LLC +Copyright 2016-2021 ClickHouse, Inc. Apache License Version 2.0, January 2004 diff --git a/SECURITY.md b/SECURITY.md index 846b7e8239c..0405d5cf8fc 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -28,15 +28,16 @@ The following versions of ClickHouse server are currently being supported with s | 21.3 | ✅ | | 21.4 | :x: | | 21.5 | :x: | -| 21.6 | ✅ | +| 21.6 | :x: | | 21.7 | ✅ | | 21.8 | ✅ | +| 21.9 | ✅ | ## Reporting a Vulnerability We're extremely grateful for security researchers and users that report vulnerabilities to the ClickHouse Open Source Community. All reports are thoroughly investigated by developers. -To report a potential vulnerability in ClickHouse please send the details about it to [clickhouse-feedback@yandex-team.com](mailto:clickhouse-feedback@yandex-team.com). +To report a potential vulnerability in ClickHouse please send the details about it to [security@clickhouse.com](mailto:security@clickhouse.com). ### When Should I Report a Vulnerability? diff --git a/debian/control b/debian/control index 9b34e982698..633b7e7c8a3 100644 --- a/debian/control +++ b/debian/control @@ -1,16 +1,13 @@ Source: clickhouse Section: database Priority: optional -Maintainer: Alexey Milovidov +Maintainer: Alexey Milovidov Build-Depends: debhelper (>= 9), cmake | cmake3, ninja-build, - clang-11, - llvm-11, + clang-13, + llvm-13, libc6-dev, - libicu-dev, - libreadline-dev, - gperf, tzdata Standards-Version: 3.9.8 diff --git a/docker/test/stress/stress b/docker/test/stress/stress index 73a84ad4c40..8fc4ade2da6 100755 --- a/docker/test/stress/stress +++ b/docker/test/stress/stress @@ -70,6 +70,9 @@ def compress_stress_logs(output_path, files_prefix): def prepare_for_hung_check(drop_databases): # FIXME this function should not exist, but... + # ThreadFuzzer significantly slows down server and causes false-positive hung check failures + call("clickhouse client -q 'SYSTEM STOP THREAD FUZZER'", shell=True, stderr=STDOUT) + # We attach gdb to clickhouse-server before running tests # to print stacktraces of all crashes even if clickhouse cannot print it for some reason. # However, it obstruct checking for hung queries. diff --git a/docs/en/commercial/cloud.md b/docs/en/commercial/cloud.md index c6ed80d4fdb..afa2e23b7a8 100644 --- a/docs/en/commercial/cloud.md +++ b/docs/en/commercial/cloud.md @@ -6,4 +6,4 @@ toc_title: Cloud # ClickHouse Cloud Service {#clickhouse-cloud-service} !!! info "Info" - Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](/company/#contact) to learn more. 
+ Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more. diff --git a/docs/en/commercial/support.md b/docs/en/commercial/support.md index 8ee976c8d6f..33b69b40b2d 100644 --- a/docs/en/commercial/support.md +++ b/docs/en/commercial/support.md @@ -6,4 +6,4 @@ toc_title: Support # ClickHouse Commercial Support Service {#clickhouse-commercial-support-service} !!! info "Info" - Detailed public description for ClickHouse support services is not ready yet, please [contact us](/company/#contact) to learn more. + Detailed public description for ClickHouse support services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more. diff --git a/docs/en/development/build-osx.md b/docs/en/development/build-osx.md index a22d2031803..d188b4bb147 100644 --- a/docs/en/development/build-osx.md +++ b/docs/en/development/build-osx.md @@ -114,15 +114,25 @@ To do so, create the `/Library/LaunchDaemons/limit.maxfiles.plist` file with the ``` -Execute the following command: +Give the file correct permissions: ``` bash sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist ``` -Reboot. +Validate that the file is correct: -To check if it’s working, you can use `ulimit -n` command. +``` bash +plutil /Library/LaunchDaemons/limit.maxfiles.plist +``` + +Load the file (or reboot): + +``` bash +sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist +``` + +To check if it’s working, use the `ulimit -n` or `launchctl limit maxfiles` commands. ## Run ClickHouse server: diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md index fd82dbaea7a..f7118f7557e 100644 --- a/docs/en/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md @@ -100,8 +100,8 @@ For a description of parameters, see the [CREATE query description](../../../sql - `min_merge_bytes_to_use_direct_io` — The minimum data volume for merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds `min_merge_bytes_to_use_direct_io` bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (`O_DIRECT` option). If `min_merge_bytes_to_use_direct_io = 0`, then direct I/O is disabled. Default value: `10 * 1024 * 1024 * 1024` bytes. - `merge_with_ttl_timeout` — Minimum delay in seconds before repeating a merge with delete TTL. Default value: `14400` seconds (4 hours). - - `merge_with_recompression_ttl_timeout` — Minimum delay in seconds before repeating a merge with recompression TTL. Default value: `14400` seconds (4 hours). - - `try_fetch_recompressed_part_timeout` — Timeout (in seconds) before starting merge with recompression. During this time ClickHouse tries to fetch recompressed part from replica which assigned this merge with recompression. Default value: `7200` seconds (2 hours). + - `merge_with_recompression_ttl_timeout` — Minimum delay in seconds before repeating a merge with recompression TTL. Default value: `14400` seconds (4 hours). + - `try_fetch_recompressed_part_timeout` — Timeout (in seconds) before starting merge with recompression. During this time ClickHouse tries to fetch recompressed part from replica which assigned this merge with recompression. 
Default value: `7200` seconds (2 hours). - `write_final_mark` — Enables or disables writing the final index mark at the end of data part (after the last byte). Default value: 1. Don’t turn it off. - `merge_max_block_size` — Maximum number of rows in block for merge operations. Default value: 8192. - `storage_policy` — Storage policy. See [Using Multiple Block Devices for Data Storage](#table_engine-mergetree-multiple-volumes). @@ -335,7 +335,16 @@ SELECT count() FROM table WHERE u64 * i32 == 10 AND u64 * length(s) >= 1234 The optional `false_positive` parameter is the probability of receiving a false positive response from the filter. Possible values: (0, 1). Default value: 0.025. - Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `String`, `FixedString`, `Array`, `LowCardinality`, `Nullable`, `UUID`. + Supported data types: `Int*`, `UInt*`, `Float*`, `Enum`, `Date`, `DateTime`, `String`, `FixedString`, `Array`, `LowCardinality`, `Nullable`, `UUID`, `Map`. + + For `Map` data type client can specify if index should be created for keys or values using [mapKeys](../../../sql-reference/functions/tuple-map-functions.md#mapkeys) or [mapValues](../../../sql-reference/functions/tuple-map-functions.md#mapvalues) function. + + Example of index creation for `Map` data type + +``` +INDEX map_key_index mapKeys(map_column) TYPE bloom_filter GRANULARITY 1 +INDEX map_key_index mapValues(map_column) TYPE bloom_filter GRANULARITY 1 +``` The following functions can use it: [equals](../../../sql-reference/functions/comparison-functions.md), [notEquals](../../../sql-reference/functions/comparison-functions.md), [in](../../../sql-reference/functions/in-functions.md), [notIn](../../../sql-reference/functions/in-functions.md), [has](../../../sql-reference/functions/array-functions.md). @@ -398,7 +407,7 @@ Projections are an experimental feature. To enable them you must set the [allow_ Projections are not supported in the `SELECT` statements with the [FINAL](../../../sql-reference/statements/select/from.md#select-from-final) modifier. ### Projection Query {#projection-query} -A projection query is what defines a projection. It implicitly selects data from the parent table. +A projection query is what defines a projection. It implicitly selects data from the parent table. **Syntax** ```sql @@ -548,7 +557,7 @@ ORDER BY d TTL d + INTERVAL 1 MONTH DELETE WHERE toDayOfWeek(d) = 1; ``` -Creating a table, where expired rows are recompressed: +Creating a table, where expired rows are recompressed: ```sql CREATE TABLE table_for_recompression diff --git a/docs/en/getting-started/example-datasets/github-events.md b/docs/en/getting-started/example-datasets/github-events.md index a6c71733832..e470e88b182 100644 --- a/docs/en/getting-started/example-datasets/github-events.md +++ b/docs/en/getting-started/example-datasets/github-events.md @@ -7,5 +7,5 @@ toc_title: GitHub Events Dataset contains all events on GitHub from 2011 to Dec 6 2020, the size is 3.1 billion records. Download size is 75 GB and it will require up to 200 GB space on disk if stored in a table with lz4 compression. -Full dataset description, insights, download instruction and interactive queries are posted [here](https://github-sql.github.io/explorer/). +Full dataset description, insights, download instruction and interactive queries are posted [here](https://ghe.clickhouse.tech/). 
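The `mergetree.md` hunk above documents `bloom_filter` indexes over `Map` keys and values, but only shows the bare `INDEX` clauses. As a hedged sketch (the table and column names are hypothetical, not taken from the patch), a complete definition could look like this:

```sql
-- On versions where Map is still experimental, SET allow_experimental_map_type = 1 may be needed first.
CREATE TABLE map_index_example
(
    id UInt64,
    map_column Map(String, String),
    -- skipping index over the keys of map_column
    INDEX map_key_index mapKeys(map_column) TYPE bloom_filter GRANULARITY 1,
    -- skipping index over the values of map_column
    INDEX map_value_index mapValues(map_column) TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;
```

Each index needs its own name, so the value index is called `map_value_index` here rather than reusing `map_key_index`.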
diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md index c8b3d690e17..6e06fc1f06a 100644 --- a/docs/en/interfaces/formats.md +++ b/docs/en/interfaces/formats.md @@ -1553,18 +1553,20 @@ ClickHouse supports reading and writing [MessagePack](https://msgpack.org/) data ### Data Types Matching {#data-types-matching-msgpack} -| MsgPack data type | ClickHouse data type | -|---------------------------------|----------------------------------------------------------------------------------| -| `uint N`, `positive fixint` | [UIntN](../sql-reference/data-types/int-uint.md) | -| `int N` | [IntN](../sql-reference/data-types/int-uint.md) | -| `fixstr`, `str 8`, `str 16`, `str 32` | [String](../sql-reference/data-types/string.md), [FixedString](../sql-reference/data-types/fixedstring.md) | -| `float 32` | [Float32](../sql-reference/data-types/float.md) | -| `float 64` | [Float64](../sql-reference/data-types/float.md) | -| `uint 16` | [Date](../sql-reference/data-types/date.md) | -| `uint 32` | [DateTime](../sql-reference/data-types/datetime.md) | -| `uint 64` | [DateTime64](../sql-reference/data-types/datetime.md) | -| `fixarray`, `array 16`, `array 32`| [Array](../sql-reference/data-types/array.md) | -| `nil` | [Nothing](../sql-reference/data-types/special-data-types/nothing.md) | +| MessagePack data type (`INSERT`) | ClickHouse data type | MessagePack data type (`SELECT`) | +|--------------------------------------------------------------------|-----------------------------------------------------------|------------------------------------| +| `uint N`, `positive fixint` | [UIntN](../sql-reference/data-types/int-uint.md) | `uint N` | +| `int N` | [IntN](../sql-reference/data-types/int-uint.md) | `int N` | +| `bool` | [UInt8](../sql-reference/data-types/int-uint.md) | `uint 8` | +| `fixstr`, `str 8`, `str 16`, `str 32`, `bin 8`, `bin 16`, `bin 32` | [String](../sql-reference/data-types/string.md) | `bin 8`, `bin 16`, `bin 32` | +| `fixstr`, `str 8`, `str 16`, `str 32`, `bin 8`, `bin 16`, `bin 32` | [FixedString](../sql-reference/data-types/fixedstring.md) | `bin 8`, `bin 16`, `bin 32` | +| `float 32` | [Float32](../sql-reference/data-types/float.md) | `float 32` | +| `float 64` | [Float64](../sql-reference/data-types/float.md) | `float 64` | +| `uint 16` | [Date](../sql-reference/data-types/date.md) | `uint 16` | +| `uint 32` | [DateTime](../sql-reference/data-types/datetime.md) | `uint 32` | +| `uint 64` | [DateTime64](../sql-reference/data-types/datetime.md) | `uint 64` | +| `fixarray`, `array 16`, `array 32` | [Array](../sql-reference/data-types/array.md) | `fixarray`, `array 16`, `array 32` | +| `fixmap`, `map 16`, `map 32` | [Map](../sql-reference/data-types/map.md) | `fixmap`, `map 16`, `map 32` | Example: diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index e3a46d46cd7..31452f10fca 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -810,7 +810,7 @@ If ClickHouse should read more than `merge_tree_max_bytes_to_use_cache` bytes in The cache of uncompressed blocks stores data extracted for queries. ClickHouse uses this cache to speed up responses to repeated small queries. This setting protects the cache from trashing by queries that read a large amount of data. The [uncompressed_cache_size](../../operations/server-configuration-parameters/settings.md#server-settings-uncompressed_cache_size) server setting defines the size of the cache of uncompressed blocks. 
-Possible value: +Possible values: - Any positive integer. @@ -818,23 +818,23 @@ Default value: 2013265920. ## merge_tree_clear_old_temporary_directories_interval_seconds {#setting-merge-tree-clear-old-temporary-directories-interval-seconds} -The interval in seconds for ClickHouse to execute the cleanup old temporary directories. +Sets the interval in seconds for ClickHouse to execute the cleanup of old temporary directories. -Possible value: +Possible values: - Any positive integer. -Default value: 60. +Default value: `60` seconds. ## merge_tree_clear_old_parts_interval_seconds {#setting-merge-tree-clear-old-parts-interval-seconds} -The interval in seconds for ClickHouse to execute the cleanup old parts, WALs, and mutations. +Sets the interval in seconds for ClickHouse to execute the cleanup of old parts, WALs, and mutations. -Possible value: +Possible values: - Any positive integer. -Default value: 1. +Default value: `1` second. ## min_bytes_to_use_direct_io {#settings-min-bytes-to-use-direct-io} @@ -2833,6 +2833,43 @@ Possible values: Default value: `1`. +## output_format_csv_null_representation {#output_format_csv_null_representation} + +Defines the representation of `NULL` for [CSV](../../interfaces/formats.md#csv) output format. User can set any string as a value, for example, `My NULL`. + +Default value: `\N`. + +**Examples** + +Query + +```sql +SELECT * from csv_custom_null FORMAT CSV; +``` + +Result + +```text +788 +\N +\N +``` + +Query + +```sql +SET output_format_csv_null_representation = 'My NULL'; +SELECT * FROM csv_custom_null FORMAT CSV; +``` + +Result + +```text +788 +My NULL +My NULL +``` + ## output_format_tsv_null_representation {#output_format_tsv_null_representation} Defines the representation of `NULL` for [TSV](../../interfaces/formats.md#tabseparated) output format. User can set any string as a value, for example, `My NULL`. @@ -3306,7 +3343,7 @@ Result: └─────┘ ``` -## optimize_fuse_sum_count_avg {#optimize_fuse_sum_count_avg} +## optimize_syntax_fuse_functions {#optimize_syntax_fuse_functions} Enables to fuse aggregate functions with identical argument. It rewrites query contains at least two aggregate functions from [sum](../../sql-reference/aggregate-functions/reference/sum.md#agg_function-sum), [count](../../sql-reference/aggregate-functions/reference/count.md#agg_function-count) or [avg](../../sql-reference/aggregate-functions/reference/avg.md#agg_function-avg) with identical argument to [sumCount](../../sql-reference/aggregate-functions/reference/sumcount.md#agg_function-sumCount). @@ -3323,7 +3360,7 @@ Query: ``` sql CREATE TABLE fuse_tbl(a Int8, b Int8) Engine = Log; -SET optimize_fuse_sum_count_avg = 1; +SET optimize_syntax_fuse_functions = 1; EXPLAIN SYNTAX SELECT sum(a), sum(b), count(b), avg(b) from fuse_tbl FORMAT TSV; ``` @@ -3567,6 +3604,18 @@ Possible values: Default value: `1000`. +## short_circuit_function_evaluation {#short-circuit-function-evaluation} + +Allows calculating the [if](../../sql-reference/functions/conditional-functions.md#if), [multiIf](../../sql-reference/functions/conditional-functions.md#multiif), [and](../../sql-reference/functions/logical-functions.md#logical-and-function), and [or](../../sql-reference/functions/logical-functions.md#logical-or-function) functions according to a [short scheme](https://en.wikipedia.org/wiki/Short-circuit_evaluation). This helps optimize the execution of complex expressions in these functions and prevent possible exceptions (such as division by zero when it is not expected). 
+ +Possible values: + +- `enable` — Enables short-circuit function evaluation for functions that are suitable for it (can throw an exception or computationally heavy). +- `force_enable` — Enables short-circuit function evaluation for all functions. +- `disable` — Disables short-circuit function evaluation. + +Default value: `enable`. + ## max_hyperscan_regexp_length {#max-hyperscan-regexp-length} Defines the maximum length for each regular expression in the [hyperscan multi-match functions](../../sql-reference/functions/string-search-functions.md#multimatchanyhaystack-pattern1-pattern2-patternn). @@ -3592,7 +3641,6 @@ Result: ┌─multiMatchAny('abcd', ['ab', 'bcd', 'c', 'd'])─┐ │ 1 │ └────────────────────────────────────────────────┘ - ``` Query: @@ -3611,7 +3659,6 @@ Exception: Regexp length too large. - [max_hyperscan_regexp_total_length](#max-hyperscan-regexp-total-length) - ## max_hyperscan_regexp_total_length {#max-hyperscan-regexp-total-length} Sets the maximum length total of all regular expressions in each [hyperscan multi-match function](../../sql-reference/functions/string-search-functions.md#multimatchanyhaystack-pattern1-pattern2-patternn). diff --git a/docs/en/sql-reference/aggregate-functions/reference/index.md b/docs/en/sql-reference/aggregate-functions/reference/index.md index 6615482a52e..59befed8785 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/index.md +++ b/docs/en/sql-reference/aggregate-functions/reference/index.md @@ -66,6 +66,8 @@ ClickHouse-specific aggregate functions: - [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md) - [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md) - [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md) +- [quantileBFloat16](../../../sql-reference/aggregate-functions/reference/quantilebfloat16.md#quantilebfloat16) +- [quantileBFloat16Weighted](../../../sql-reference/aggregate-functions/reference/quantilebfloat16.md#quantilebfloat16weighted) - [simpleLinearRegression](../../../sql-reference/aggregate-functions/reference/simplelinearregression.md) - [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md) - [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md) diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md b/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md index 25c7233aa56..728c200441d 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md +++ b/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md @@ -58,6 +58,10 @@ Result: ``` Note that all floating point values in the example are truncated to 1.0 when converting to `bfloat16`. +# quantileBFloat16Weighted {#quantilebfloat16weighted} + +Like `quantileBFloat16` but takes into account the weight of each sequence member. 
+ **See Also** - [median](../../../sql-reference/aggregate-functions/reference/median.md#median) diff --git a/docs/en/sql-reference/aggregate-functions/reference/sumcount.md b/docs/en/sql-reference/aggregate-functions/reference/sumcount.md index 2986511e01a..00a7a9fc9f1 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/sumcount.md +++ b/docs/en/sql-reference/aggregate-functions/reference/sumcount.md @@ -43,4 +43,4 @@ Result: **See also** -- [optimize_fuse_sum_count_avg](../../../operations/settings/settings.md#optimize_fuse_sum_count_avg) setting. +- [optimize_syntax_fuse_functions](../../../operations/settings/settings.md#optimize_syntax_fuse_functions) setting. diff --git a/docs/en/sql-reference/functions/bitmap-functions.md b/docs/en/sql-reference/functions/bitmap-functions.md index f8d1fdc69fa..a6104835469 100644 --- a/docs/en/sql-reference/functions/bitmap-functions.md +++ b/docs/en/sql-reference/functions/bitmap-functions.md @@ -107,7 +107,7 @@ bitmapSubsetLimit(bitmap, range_start, cardinality_limit) The subset. -Type: `Bitmap object`. +Type: [Bitmap object](#bitmap_functions-bitmapbuild). **Example** @@ -125,9 +125,9 @@ Result: └───────────────────────────┘ ``` -## subBitmap {#subBitmap} +## subBitmap {#subbitmap} -Creates a subset of bitmap limit the results to `cardinality_limit` with offset of `offset`. +Returns the bitmap elements, starting from the `offset` position. The number of returned elements is limited by the `cardinality_limit` parameter. Analog of the [substring](string-functions.md#substring)) string function, but for bitmap. **Syntax** @@ -137,15 +137,15 @@ subBitmap(bitmap, offset, cardinality_limit) **Arguments** -- `bitmap` – [Bitmap object](#bitmap_functions-bitmapbuild). -- `offset` – the number of offsets. Type: [UInt32](../../sql-reference/data-types/int-uint.md). -- `cardinality_limit` – The subset cardinality upper limit. Type: [UInt32](../../sql-reference/data-types/int-uint.md). +- `bitmap` – The bitmap. Type: [Bitmap object](#bitmap_functions-bitmapbuild). +- `offset` – The position of the first element of the subset. Type: [UInt32](../../sql-reference/data-types/int-uint.md). +- `cardinality_limit` – The maximum number of elements in the subset. Type: [UInt32](../../sql-reference/data-types/int-uint.md). **Returned value** The subset. -Type: `Bitmap object`. +Type: [Bitmap object](#bitmap_functions-bitmapbuild). **Example** diff --git a/docs/en/sql-reference/functions/conditional-functions.md b/docs/en/sql-reference/functions/conditional-functions.md index a23da82a9c6..241112f7f7f 100644 --- a/docs/en/sql-reference/functions/conditional-functions.md +++ b/docs/en/sql-reference/functions/conditional-functions.md @@ -12,11 +12,13 @@ Controls conditional branching. Unlike most systems, ClickHouse always evaluate **Syntax** ``` sql -SELECT if(cond, then, else) +if(cond, then, else) ``` If the condition `cond` evaluates to a non-zero value, returns the result of the expression `then`, and the result of the expression `else`, if present, is skipped. If the `cond` is zero or `NULL`, then the result of the `then` expression is skipped and the result of the `else` expression, if present, is returned. +You can use the [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation) setting to calculate the `if` function according to a short scheme. If this setting is enabled, `then` expression is evaluated only on rows where `cond` is true, `else` expression – where `cond` is false. 
For example, an exception about division by zero is not thrown when executing the query `SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10)`, because `intDiv(42, number)` will be evaluated only for numbers that doesn't satisfy condition `number = 0`. + **Arguments** - `cond` – The condition for evaluation that can be zero or not. The type is UInt8, Nullable(UInt8) or NULL. @@ -115,9 +117,15 @@ Returns `then` if the `cond` evaluates to be true (greater than zero), otherwise Allows you to write the [CASE](../../sql-reference/operators/index.md#operator_case) operator more compactly in the query. -Syntax: `multiIf(cond_1, then_1, cond_2, then_2, ..., else)` +**Syntax** -**Arguments:** +``` sql +multiIf(cond_1, then_1, cond_2, then_2, ..., else) +``` + +You can use the [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation) setting to calculate the `multiIf` function according to a short scheme. If this setting is enabled, `then_i` expression is evaluated only on rows where `((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}) AND cond_i)` is true, `cond_i` will be evaluated only on rows where `((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}))` is true. For example, an exception about division by zero is not thrown when executing the query `SELECT multiIf(number = 2, intDiv(1, number), number = 5) FROM numbers(10)`. + +**Arguments** - `cond_N` — The condition for the function to return `then_N`. - `then_N` — The result of the function when executed. @@ -201,4 +209,3 @@ FROM LEFT_RIGHT │ 4 │ ᴺᵁᴸᴸ │ Both equal │ └──────┴───────┴──────────────────┘ ``` - diff --git a/docs/en/sql-reference/functions/logical-functions.md b/docs/en/sql-reference/functions/logical-functions.md index 965ed97f20c..dcdb01e2059 100644 --- a/docs/en/sql-reference/functions/logical-functions.md +++ b/docs/en/sql-reference/functions/logical-functions.md @@ -19,6 +19,8 @@ Calculates the result of the logical conjunction between two or more values. Cor and(val1, val2...) ``` +You can use the [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation) setting to calculate the `and` function according to a short scheme. If this setting is enabled, `vali` is evaluated only on rows where `(val1 AND val2 AND ... AND val{i-1})` is true. For example, an exception about division by zero is not thrown when executing the query `SELECT and(number = 2, intDiv(1, number)) FROM numbers(10)`. + **Arguments** - `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). @@ -68,9 +70,11 @@ Calculates the result of the logical disjunction between two or more values. Cor **Syntax** ``` sql -and(val1, val2...) +or(val1, val2...) ``` +You can use the [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation) setting to calculate the `or` function according to a short scheme. If this setting is enabled, `vali` is evaluated only on rows where `((NOT val1) AND (NOT val2) AND ... AND (NOT val{i-1}))` is true. For example, an exception about division by zero is not thrown when executing the query `SELECT or(number = 0, intDiv(1, number) != 0) FROM numbers(10)`. + **Arguments** - `val1, val2, ...` — List of at least two values. 
[Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). diff --git a/docs/en/sql-reference/functions/rounding-functions.md b/docs/en/sql-reference/functions/rounding-functions.md index f564f15659c..ad92ba502e1 100644 --- a/docs/en/sql-reference/functions/rounding-functions.md +++ b/docs/en/sql-reference/functions/rounding-functions.md @@ -176,7 +176,7 @@ roundBankers(4.5) = 4 roundBankers(3.55, 1) = 3.6 roundBankers(3.65, 1) = 3.6 roundBankers(10.35, 1) = 10.4 -roundBankers(10.755, 2) = 11,76 +roundBankers(10.755, 2) = 10.76 ``` **See Also** diff --git a/docs/en/sql-reference/statements/alter/order-by.md b/docs/en/sql-reference/statements/alter/order-by.md index d41b9a91724..16f9ace206d 100644 --- a/docs/en/sql-reference/statements/alter/order-by.md +++ b/docs/en/sql-reference/statements/alter/order-by.md @@ -11,7 +11,7 @@ ALTER TABLE [db].name [ON CLUSTER cluster] MODIFY ORDER BY new_expression The command changes the [sorting key](../../../engines/table-engines/mergetree-family/mergetree.md) of the table to `new_expression` (an expression or a tuple of expressions). Primary key remains the same. -The command is lightweight in a sense that it only changes metadata. To keep the property that data part rows are ordered by the sorting key expression you cannot add expressions containing existing columns to the sorting key (only columns added by the `ADD COLUMN` command in the same `ALTER` query). +The command is lightweight in a sense that it only changes metadata. To keep the property that data part rows are ordered by the sorting key expression you cannot add expressions containing existing columns to the sorting key (only columns added by the `ADD COLUMN` command in the same `ALTER` query, without default column value). !!! note "Note" It only works for tables in the [`MergeTree`](../../../engines/table-engines/mergetree-family/mergetree.md) family (including [replicated](../../../engines/table-engines/mergetree-family/replication.md) tables). diff --git a/docs/en/sql-reference/syntax.md b/docs/en/sql-reference/syntax.md index 7a908dd221f..207b2b82cd2 100644 --- a/docs/en/sql-reference/syntax.md +++ b/docs/en/sql-reference/syntax.md @@ -104,6 +104,28 @@ There are many nuances to processing `NULL`. For example, if at least one of the In queries, you can check `NULL` using the [IS NULL](../sql-reference/operators/index.md#operator-is-null) and [IS NOT NULL](../sql-reference/operators/index.md) operators and the related functions `isNull` and `isNotNull`. +### Heredoc {#heredeoc} + +A [heredoc](https://en.wikipedia.org/wiki/Here_document) is a way to define a string (often multiline), while maintaining the original formatting. A heredoc is defined as a custom string literal, placed between two `$` symbols, for example `$heredoc$`. A value between two heredocs is processed "as-is". + +You can use a heredoc to embed snippets of SQL, HTML, or XML code, etc. + +**Example** + +Query: + +```sql +SELECT $smth$SHOW CREATE VIEW my_view$smth$; +``` + +Result: + +```text +┌─'SHOW CREATE VIEW my_view'─┐ +│ SHOW CREATE VIEW my_view │ +└────────────────────────────┘ +``` + ## Functions {#functions} Function calls are written like an identifier with a list of arguments (possibly empty) in round brackets. In contrast to standard SQL, the brackets are required, even for an empty argument list. Example: `now()`. 
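The conditional and logical function pages above repeatedly reference the new `short_circuit_function_evaluation` setting. A hedged walk-through of the behaviour they describe, reusing the example query from the documentation text (the outcome under `disable` is what that text implies rather than something verified here):

```sql
-- Default mode: intDiv(42, number) is evaluated only for rows where number != 0,
-- so the query finishes without an exception.
SET short_circuit_function_evaluation = 'enable';
SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10);

-- With short-circuit evaluation disabled, both branches are evaluated for every row,
-- so the row with number = 0 can raise a division-by-zero error.
SET short_circuit_function_evaluation = 'disable';
SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10);
```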
diff --git a/docs/ja/commercial/cloud.md b/docs/ja/commercial/cloud.md index dceffcd591f..312b8aed6ea 100644 --- a/docs/ja/commercial/cloud.md +++ b/docs/ja/commercial/cloud.md @@ -8,4 +8,4 @@ toc_title: "\u30AF\u30E9\u30A6\u30C9" # ClickHouse Cloud Service {#clickhouse-cloud-service} !!! info "Info" - Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](/company/#contact) to learn more. + Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more. diff --git a/docs/ja/development/cmake-in-clickhouse.md b/docs/ja/development/cmake-in-clickhouse.md deleted file mode 120000 index 0eb485952cd..00000000000 --- a/docs/ja/development/cmake-in-clickhouse.md +++ /dev/null @@ -1 +0,0 @@ -../../en/development/cmake-in-clickhouse.md \ No newline at end of file diff --git a/docs/ru/commercial/cloud.md b/docs/ru/commercial/cloud.md index 2bdb8d68da5..549978f3cdf 100644 --- a/docs/ru/commercial/cloud.md +++ b/docs/ru/commercial/cloud.md @@ -6,4 +6,4 @@ toc_title: "Поставщики облачных услуг ClickHouse" # Поставщики облачных услуг ClickHouse {#clickhouse-cloud-service-providers} !!! info "Info" - Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](/company/#contact) to learn more. + Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more. diff --git a/docs/ru/development/cmake-in-clickhouse.md b/docs/ru/development/cmake-in-clickhouse.md deleted file mode 120000 index 0eb485952cd..00000000000 --- a/docs/ru/development/cmake-in-clickhouse.md +++ /dev/null @@ -1 +0,0 @@ -../../en/development/cmake-in-clickhouse.md \ No newline at end of file diff --git a/docs/ru/operations/clickhouse-keeper.md b/docs/ru/operations/clickhouse-keeper.md index 3a724fc3d35..14d95ebae68 100644 --- a/docs/ru/operations/clickhouse-keeper.md +++ b/docs/ru/operations/clickhouse-keeper.md @@ -94,7 +94,7 @@ ClickHouse Keeper может использоваться как равноце ## Как запустить -ClickHouse Keeper входит в пакет` clickhouse-server`, просто добавьте кофигурацию `` и запустите сервер ClickHouse как обычно. Если вы хотите запустить ClickHouse Keeper автономно, сделайте это аналогичным способом: +ClickHouse Keeper входит в пакет `clickhouse-server`, просто добавьте кофигурацию `` и запустите сервер ClickHouse как обычно. Если вы хотите запустить ClickHouse Keeper автономно, сделайте это аналогичным способом: ```bash clickhouse-keeper --config /etc/your_path_to_config/config.xml --daemon @@ -116,4 +116,4 @@ clickhouse-keeper-converter --zookeeper-logs-dir /var/lib/zookeeper/version-2 -- 4. Скопируйте снэпшот на узлы сервера ClickHouse с настроенным `keeper` или запустите ClickHouse Keeper вместо ZooKeeper. Снэпшот должен сохраняться на всех узлах: в противном случае пустые узлы могут захватить лидерство и сконвертированные данные могут быть отброшены на старте. 
-[Original article](https://clickhouse.com/docs/en/operations/clickhouse-keeper/) \ No newline at end of file +[Original article](https://clickhouse.com/docs/en/operations/clickhouse-keeper/) diff --git a/docs/ru/operations/settings/settings.md b/docs/ru/operations/settings/settings.md index 44e05d862c6..6258da2c30d 100644 --- a/docs/ru/operations/settings/settings.md +++ b/docs/ru/operations/settings/settings.md @@ -801,12 +801,32 @@ ClickHouse может парсить только базовый формат `Y Кэш несжатых блоков хранит данные, извлечённые при выполнении запросов. ClickHouse использует кэш для ускорения ответов на повторяющиеся небольшие запросы. Настройка защищает кэш от переполнения. Настройка сервера [uncompressed_cache_size](../server-configuration-parameters/settings.md#server-settings-uncompressed_cache_size) определяет размер кэша несжатых блоков. -Возможное значение: +Возможные значения: - Положительное целое число. Значение по умолчанию: 2013265920. +## merge_tree_clear_old_temporary_directories_interval_seconds {#setting-merge-tree-clear-old-temporary-directories-interval-seconds} + +Задает интервал в секундах для удаления старых временных каталогов на сервере ClickHouse. + +Возможные значения: + +- Положительное целое число. + +Значение по умолчанию: `60` секунд. + +## merge_tree_clear_old_parts_interval_seconds {#setting-merge-tree-clear-old-parts-interval-seconds} + +Задает интервал в секундах для удаления старых кусков данных, журналов предзаписи (WAL) и мутаций на сервере ClickHouse . + +Возможные значения: + +- Положительное целое число. + +Значение по умолчанию: `1` секунда. + ## min_bytes_to_use_direct_io {#settings-min-bytes-to-use-direct-io} Минимальный объём данных, необходимый для прямого (небуферизованного) чтения/записи (direct I/O) на диск. @@ -3122,7 +3142,7 @@ SELECT * FROM test LIMIT 10 OFFSET 100; Значение по умолчанию: `1800`. -## optimize_fuse_sum_count_avg {#optimize_fuse_sum_count_avg} +## optimize_syntax_fuse_functions {#optimize_syntax_fuse_functions} Позволяет объединить агрегатные функции с одинаковым аргументом. Запрос, содержащий по крайней мере две агрегатные функции: [sum](../../sql-reference/aggregate-functions/reference/sum.md#agg_function-sum), [count](../../sql-reference/aggregate-functions/reference/count.md#agg_function-count) или [avg](../../sql-reference/aggregate-functions/reference/avg.md#agg_function-avg) с одинаковым аргументом, перезаписывается как [sumCount](../../sql-reference/aggregate-functions/reference/sumcount.md#agg_function-sumCount). @@ -3139,7 +3159,7 @@ SELECT * FROM test LIMIT 10 OFFSET 100; ``` sql CREATE TABLE fuse_tbl(a Int8, b Int8) Engine = Log; -SET optimize_fuse_sum_count_avg = 1; +SET optimize_syntax_fuse_functions = 1; EXPLAIN SYNTAX SELECT sum(a), sum(b), count(b), avg(b) from fuse_tbl FORMAT TSV; ``` @@ -3333,7 +3353,7 @@ SETTINGS index_granularity = 8192 │ ## force_optimize_projection {#force-optimize-projection} -Включает или отключает обязательное использование [проекций](../../engines/table-engines/mergetree-family/mergetree.md#projections) в запросах `SELECT`, если поддержка проекций включена (см. настройку [allow_experimental_projection_optimization](#allow-experimental-projection-optimization)). +Включает или отключает обязательное использование [проекций](../../engines/table-engines/mergetree-family/mergetree.md#projections) в запросах `SELECT`, если поддержка проекций включена (см. настройку [allow_experimental_projection_optimization](#allow-experimental-projection-optimization)). 
Возможные значения: @@ -3376,6 +3396,18 @@ SETTINGS index_granularity = 8192 │ Значение по умолчанию: `1000`. +## short_circuit_function_evaluation {#short-circuit-function-evaluation} + +Позволяет вычислять функции [if](../../sql-reference/functions/conditional-functions.md#if), [multiIf](../../sql-reference/functions/conditional-functions.md#multiif), [and](../../sql-reference/functions/logical-functions.md#logical-and-function) и [or](../../sql-reference/functions/logical-functions.md#logical-or-function) по [короткой схеме](https://ru-wikipedia-org.turbopages.org/ru.wikipedia.org/s/wiki/Вычисления_по_короткой_схеме). Это помогает оптимизировать выполнение сложных выражений в этих функциях и предотвратить возможные исключения (например, деление на ноль, когда оно не ожидается). + +Возможные значения: + +- `enable` — по короткой схеме вычисляются функции, которые подходят для этого (могут сгенерировать исключение или требуют сложных вычислений). +- `force_enable` — все функции вычисляются по короткой схеме. +- `disable` — вычисление функций по короткой схеме отключено. + +Значение по умолчанию: `enable`. + ## max_hyperscan_regexp_length {#max-hyperscan-regexp-length} Задает максимальную длину каждого регулярного выражения в [hyperscan-функциях](../../sql-reference/functions/string-search-functions.md#multimatchanyhaystack-pattern1-pattern2-patternn) поиска множественных совпадений в строке. diff --git a/docs/ru/sql-reference/aggregate-functions/reference/index.md b/docs/ru/sql-reference/aggregate-functions/reference/index.md index b2172e1e70e..c372a0713ca 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/index.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/index.md @@ -61,6 +61,8 @@ toc_hidden: true - [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md) - [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md) - [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md) +- [quantileBFloat16](../../../sql-reference/aggregate-functions/reference/quantilebfloat16.md#quantilebfloat16) +- [quantileBFloat16Weighted](../../../sql-reference/aggregate-functions/reference/quantilebfloat16.md#quantilebfloat16weighted) - [simpleLinearRegression](../../../sql-reference/aggregate-functions/reference/simplelinearregression.md) - [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md) - [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md) diff --git a/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md b/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md index f8622e3fd05..8f240a3a009 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md @@ -58,6 +58,10 @@ SELECT quantileBFloat16(0.75)(a), quantileBFloat16(0.75)(b) FROM example_table; ``` Обратите внимание, что все числа с плавающей точкой в примере были округлены до 1.0 при преобразовании к `bfloat16`. +# quantileBFloat16Weighted {#quantilebfloat16weighted} + +Версия функции `quantileBFloat16`, которая учитывает вес каждого элемента последовательности. + **См. 
также** - [median](../../../sql-reference/aggregate-functions/reference/median.md#median) diff --git a/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md b/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md index 5a8f93209cf..ac721577a9a 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md @@ -43,4 +43,4 @@ SELECT sumCount(x) from s_table; **Смотрите также** -- Настройка [optimize_fuse_sum_count_avg](../../../operations/settings/settings.md#optimize_fuse_sum_count_avg) +- Настройка [optimize_syntax_fuse_functions](../../../operations/settings/settings.md#optimize_syntax_fuse_functions) diff --git a/docs/ru/sql-reference/functions/bitmap-functions.md b/docs/ru/sql-reference/functions/bitmap-functions.md index 3da729664d0..011d339c847 100644 --- a/docs/ru/sql-reference/functions/bitmap-functions.md +++ b/docs/ru/sql-reference/functions/bitmap-functions.md @@ -66,15 +66,14 @@ bitmapSubsetLimit(bitmap, range_start, cardinality_limit) **Аргументы** - `bitmap` – битмап. [Bitmap object](#bitmap_functions-bitmapbuild). - - `range_start` – начальная точка подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md#bitmap-functions). -- `cardinality_limit` – Верхний предел подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md#bitmap-functions). +- `cardinality_limit` – верхний предел подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md#bitmap-functions). **Возвращаемое значение** Подмножество битмапа. -Тип: `Bitmap object`. +Тип: [Bitmap object](#bitmap_functions-bitmapbuild). **Пример** @@ -92,6 +91,44 @@ SELECT bitmapToArray(bitmapSubsetLimit(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12 └───────────────────────────┘ ``` +## subBitmap {#subbitmap} + +Возвращает элементы битмапа, начиная с позиции `offset`. Число возвращаемых элементов ограничивается параметром `cardinality_limit`. Аналог строковой функции [substring](string-functions.md#substring)), но для битмапа. + +**Синтаксис** + +``` sql +subBitmap(bitmap, offset, cardinality_limit) +``` + +**Аргументы** + +- `bitmap` – битмап. Тип: [Bitmap object](#bitmap_functions-bitmapbuild). +- `offset` – позиция первого элемента возвращаемого подмножества. Тип: [UInt32](../../sql-reference/data-types/int-uint.md). +- `cardinality_limit` – максимальное число элементов возвращаемого подмножества. Тип: [UInt32](../../sql-reference/data-types/int-uint.md). + +**Возвращаемое значение** + +Подмножество битмапа. + +Тип: [Bitmap object](#bitmap_functions-bitmapbuild). + +**Пример** + +Запрос: + +``` sql +SELECT bitmapToArray(subBitmap(bitmapBuild([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,100,200,500]), toUInt32(10), toUInt32(10))) AS res; +``` + +Результат: + +``` text +┌─res─────────────────────────────┐ +│ [10,11,12,13,14,15,16,17,18,19] │ +└─────────────────────────────────┘ +``` + ## bitmapContains {#bitmap_functions-bitmapcontains} Проверяет вхождение элемента в битовый массив. 
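The English and Russian reference pages above add `quantileBFloat16Weighted` without a usage example. A minimal sketch, assuming it follows the same `(level)(expr, weight)` argument convention as the other `*Weighted` quantile functions; the table and data are hypothetical:

```sql
CREATE TABLE weighted_example (a UInt32, w UInt64) ENGINE = Memory;
INSERT INTO weighted_example VALUES (1, 1), (2, 3), (3, 1), (4, 2);

-- 0.75-quantile of column a, with each row counted w times.
SELECT quantileBFloat16Weighted(0.75)(a, w) FROM weighted_example;
```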
diff --git a/docs/ru/sql-reference/functions/conditional-functions.md b/docs/ru/sql-reference/functions/conditional-functions.md index 8a78425efd4..521df785df2 100644 --- a/docs/ru/sql-reference/functions/conditional-functions.md +++ b/docs/ru/sql-reference/functions/conditional-functions.md @@ -12,11 +12,13 @@ toc_title: "Условные функции" **Синтаксис** ``` sql -SELECT if(cond, then, else) +if(cond, then, else) ``` Если условие `cond` не равно нулю, то возвращается результат выражения `then`. Если условие `cond` равно нулю или является NULL, то результат выражения `then` пропускается и возвращается результат выражения `else`. +Чтобы вычислять функцию `if` по короткой схеме, используйте настройку [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation). Если настройка включена, то выражение `then` вычисляется только для строк, где условие `cond` верно, а выражение `else` – для строк, где условие `cond` неверно. Например, при выполнении запроса `SELECT if(number = 0, 0, intDiv(42, number)) FROM numbers(10)` не будет сгенерировано исключение из-за деления на ноль, так как `intDiv(42, number)` будет вычислено только для чисел, которые не удовлетворяют условию `number = 0`. + **Аргументы** - `cond` – проверяемое условие. Может быть [UInt8](../../sql-reference/functions/conditional-functions.md) или `NULL`. @@ -77,7 +79,13 @@ SELECT if(0, plus(2, 2), plus(2, 6)); Позволяет более компактно записать оператор [CASE](../operators/index.md#operator_case) в запросе. - multiIf(cond_1, then_1, cond_2, then_2...else) +**Синтаксис** + +``` sql +multiIf(cond_1, then_1, cond_2, then_2, ..., else) +``` + +Чтобы вычислять функцию `multiIf` по короткой схеме, используйте настройку [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation). Если настройка включена, то выражение `then_i` вычисляется только для строк, где условие `((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}) AND cond_i)` верно, `cond_i` вычисляется только для строк, где условие `((NOT cond_1) AND (NOT cond_2) AND ... AND (NOT cond_{i-1}))` верно. Например, при выполнении запроса `SELECT multiIf(number = 2, intDiv(1, number), number = 5) FROM numbers(10)` не будет сгенерировано исключение из-за деления на ноль. **Аргументы** @@ -110,4 +118,3 @@ SELECT if(0, plus(2, 2), plus(2, 6)); │ ᴺᵁᴸᴸ │ └────────────────────────────────────────────┘ ``` - diff --git a/docs/ru/sql-reference/functions/logical-functions.md b/docs/ru/sql-reference/functions/logical-functions.md index 837541ec58f..6ba55dca30f 100644 --- a/docs/ru/sql-reference/functions/logical-functions.md +++ b/docs/ru/sql-reference/functions/logical-functions.md @@ -19,6 +19,8 @@ toc_title: "Логические функции" and(val1, val2...) ``` +Чтобы вычислять функцию `and` по короткой схеме, используйте настройку [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation). Если настройка включена, то выражение `vali` вычисляется только для строк, где условие `(val1 AND val2 AND ... AND val{i-1})` верно. Например, при выполнении запроса `SELECT and(number = 2, intDiv(1, number)) FROM numbers(10)` не будет сгенерировано исключение из-за деления на ноль. + **Аргументы** - `val1, val2, ...` — список из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). 
@@ -71,6 +73,8 @@ SELECT and(NULL, 1, 10, -2); and(val1, val2...) ``` +Чтобы вычислять функцию `or` по короткой схеме, используйте настройку [short_circuit_function_evaluation](../../operations/settings/settings.md#short-circuit-function-evaluation). Если настройка включена, то выражение `vali` вычисляется только для строк, где условие `((NOT val1) AND (NOT val2) AND ... AND (NOT val{i-1}))` верно. Например, при выполнении запроса `SELECT or(number = 0, intDiv(1, number) != 0) FROM numbers(10)` не будет сгенерировано исключение из-за деления на ноль. + **Аргументы** - `val1, val2, ...` — список из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). diff --git a/docs/ru/sql-reference/syntax.md b/docs/ru/sql-reference/syntax.md index 488e327dd31..6705b1068fe 100644 --- a/docs/ru/sql-reference/syntax.md +++ b/docs/ru/sql-reference/syntax.md @@ -3,7 +3,7 @@ toc_priority: 31 toc_title: "Синтаксис" --- -# Синтаксис {#sintaksis} +# Синтаксис {#syntax} В системе есть два вида парсеров: полноценный парсер SQL (recursive descent parser) и парсер форматов данных (быстрый потоковый парсер). Во всех случаях кроме запроса INSERT, используется только полноценный парсер SQL. @@ -21,11 +21,11 @@ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') Далее пойдёт речь о полноценном парсере. О парсерах форматов, смотри раздел «Форматы». -## Пробелы {#probely} +## Пробелы {#spaces} Между синтаксическими конструкциями (в том числе, в начале и конце запроса) может быть расположено произвольное количество пробельных символов. К пробельным символам относятся пробел, таб, перевод строки, CR, form feed. -## Комментарии {#kommentarii} +## Комментарии {#comments} Поддерживаются комментарии в SQL-стиле и C-стиле. Комментарии в SQL-стиле: от `--` до конца строки. Пробел после `--` может не ставиться. @@ -63,7 +63,7 @@ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') Существуют: числовые, строковые, составные литералы и `NULL`. -### Числовые {#chislovye} +### Числовые {#numeric} Числовой литерал пытается распарситься: @@ -83,7 +83,7 @@ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') Минимальный набор символов, которых вам необходимо экранировать в строковых литералах: `'` и `\`. Одинарная кавычка может быть экранирована одинарной кавычкой, литералы `'It\'s'` и `'It''s'` эквивалентны. -### Составные {#sostavnye} +### Составные {#compound} Поддерживаются конструкции для массивов: `[1, 2, 3]` и кортежей: `(1, 'Hello, world!', 2)`. На самом деле, это вовсе не литералы, а выражение с оператором создания массива и оператором создания кортежа, соответственно. @@ -102,17 +102,39 @@ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') В запросах можно проверить `NULL` с помощью операторов [IS NULL](operators/index.md#operator-is-null) и [IS NOT NULL](operators/index.md), а также соответствующих функций `isNull` и `isNotNull`. -## Функции {#funktsii} +### Heredoc {#heredeoc} + +Синтаксис [heredoc](https://ru.wikipedia.org/wiki/Heredoc-синтаксис) — это способ определения строк с сохранением исходного формата (часто с переносом строки). `Heredoc` задается как произвольный строковый литерал между двумя символами `$`, например `$heredoc$`. Значение между двумя `heredoc` обрабатывается "как есть". + +Синтаксис `heredoc` часто используют для вставки кусков кода SQL, HTML, XML и т.п. 
+ +**Пример** + +Запрос: + +```sql +SELECT $smth$SHOW CREATE VIEW my_view$smth$; +``` + +Результат: + +```text +┌─'SHOW CREATE VIEW my_view'─┐ +│ SHOW CREATE VIEW my_view │ +└────────────────────────────┘ +``` + +## Функции {#functions} Функции записываются как идентификатор со списком аргументов (возможно, пустым) в скобках. В отличие от стандартного SQL, даже в случае пустого списка аргументов, скобки обязательны. Пример: `now()`. Бывают обычные и агрегатные функции (смотрите раздел «Агрегатные функции»). Некоторые агрегатные функции могут содержать два списка аргументов в круглых скобках. Пример: `quantile(0.9)(x)`. Такие агрегатные функции называются «параметрическими», а первый список аргументов называется «параметрами». Синтаксис агрегатных функций без параметров ничем не отличается от обычных функций. -## Операторы {#operatory} +## Операторы {#operators} Операторы преобразуются в соответствующие им функции во время парсинга запроса, с учётом их приоритета и ассоциативности. Например, выражение `1 + 2 * 3 + 4` преобразуется в `plus(plus(1, multiply(2, 3)), 4)`. -## Типы данных и движки таблиц {#tipy-dannykh-i-dvizhki-tablits} +## Типы данных и движки таблиц {#data_types-and-database-table-engines} Типы данных и движки таблиц в запросе `CREATE` записываются также, как идентификаторы или также как функции. То есть, могут содержать или не содержать список аргументов в круглых скобках. Подробнее смотрите разделы «Типы данных», «Движки таблиц», «CREATE». diff --git a/docs/ru/whats-new/index.md b/docs/ru/whats-new/index.md index d8a26423813..f3d6e2839bc 100644 --- a/docs/ru/whats-new/index.md +++ b/docs/ru/whats-new/index.md @@ -5,4 +5,4 @@ toc_priority: 82 # Что нового в ClickHouse? -Планы развития вкратце изложены [здесь](extended-roadmap.md), а новости по предыдущим релизам подробно описаны в [журнале изменений](changelog/index.md). +Планы развития вкратце изложены [здесь](https://github.com/ClickHouse/ClickHouse/issues/17623), а новости по предыдущим релизам подробно описаны в [журнале изменений](changelog/index.md). 
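The heredoc section added to `syntax.md` above (in both the English and Russian pages) notes that heredocs are often multiline, while the included example is a single line. A hedged supplementary sketch (`$doc$` is an arbitrary tag chosen for this example):

```sql
-- The HTML snippet between the $doc$ markers is kept exactly as written, line breaks included.
SELECT $doc$<p>
  Hello from a heredoc literal.
</p>$doc$ AS html_snippet;
```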
diff --git a/docs/tools/blog.py b/docs/tools/blog.py index d0f2496f914..bfc8c0908e9 100644 --- a/docs/tools/blog.py +++ b/docs/tools/blog.py @@ -51,7 +51,7 @@ def build_for_lang(lang, args): if args.htmlproofer: plugins.append('htmlproofer') - website_url = 'https://clickhouse.tech' + website_url = 'https://clickhouse.com' site_name = site_names.get(lang, site_names['en']) blog_nav, post_meta = nav.build_blog_nav(lang, args) raw_config = dict( @@ -62,7 +62,7 @@ def build_for_lang(lang, args): strict=True, theme=theme_cfg, nav=blog_nav, - copyright='©2016–2021 Yandex LLC', + copyright='©2016–2021 ClickHouse, Inc.', use_directory_urls=True, repo_name='ClickHouse/ClickHouse', repo_url='https://github.com/ClickHouse/ClickHouse/', diff --git a/docs/tools/build.py b/docs/tools/build.py index 025cf348c1f..3ea6d3e38c7 100755 --- a/docs/tools/build.py +++ b/docs/tools/build.py @@ -203,6 +203,7 @@ if __name__ == '__main__': arg_parser.add_argument('--verbose', action='store_true') args = arg_parser.parse_args() + args.minify = False # TODO remove logging.basicConfig( level=logging.DEBUG if args.verbose else logging.INFO, diff --git a/docs/tools/cmake_in_clickhouse_generator.py b/docs/tools/cmake_in_clickhouse_generator.py index 1414ffc4b9e..8b440823df3 100644 --- a/docs/tools/cmake_in_clickhouse_generator.py +++ b/docs/tools/cmake_in_clickhouse_generator.py @@ -155,6 +155,12 @@ def generate_cmake_flags_files() -> None: with open(footer_file_name, "r") as footer: f.write(footer.read()) + other_languages = ["docs/ja/development/cmake-in-clickhouse.md", + "docs/zh/development/cmake-in-clickhouse.md", + "docs/ru/development/cmake-in-clickhouse.md"] + + for lang in other_languages: + os.symlink(output_file_name, os.path.join(root_path, lang)) if __name__ == '__main__': generate_cmake_flags_files() diff --git a/docs/tools/test.py b/docs/tools/test.py index 90ab6d1cac2..53ed9505acd 100755 --- a/docs/tools/test.py +++ b/docs/tools/test.py @@ -8,6 +8,9 @@ import subprocess def test_single_page(input_path, lang): + if not (lang == 'en' or lang == 'ru'): + return + with open(input_path) as f: soup = bs4.BeautifulSoup( f, @@ -33,11 +36,8 @@ def test_single_page(input_path, lang): logging.info('Link to nowhere: %s' % href) if links_to_nowhere: - if lang == 'en' or lang == 'ru': - logging.error(f'Found {links_to_nowhere} links to nowhere in {lang}') - # TODO: restore sys.exit(1) here - else: - logging.warning(f'Found {links_to_nowhere} links to nowhere in {lang}') + logging.error(f'Found {links_to_nowhere} links to nowhere in {lang}') + sys.exit(1) if len(anchor_points) <= 10: logging.error('Html parsing is probably broken') diff --git a/docs/tools/website.py b/docs/tools/website.py index 2e0d0974a5d..5e4f48e3441 100644 --- a/docs/tools/website.py +++ b/docs/tools/website.py @@ -215,10 +215,12 @@ def minify_file(path, css_digest, js_digest): content = minify_html(content) content = content.replace('base.css?css_digest', f'base.css?{css_digest}') content = content.replace('base.js?js_digest', f'base.js?{js_digest}') - elif path.endswith('.css'): - content = cssmin.cssmin(content) - elif path.endswith('.js'): - content = jsmin.jsmin(content) +# TODO: restore cssmin +# elif path.endswith('.css'): +# content = cssmin.cssmin(content) +# TODO: restore jsmin +# elif path.endswith('.js'): +# content = jsmin.jsmin(content) with open(path, 'wb') as f: f.write(content.encode('utf-8')) @@ -240,7 +242,7 @@ def minify_website(args): js_in = get_js_in(args) js_out = f'{args.output_dir}/js/base.js' - if args.minify: + if 
args.minify and False: # TODO: return closure js_in = [js[1:-1] for js in js_in] closure_args = [ '--js', *js_in, '--js_output_file', js_out, diff --git a/docs/zh/commercial/cloud.md b/docs/zh/commercial/cloud.md index e8c098db5be..61a4f638bbc 100644 --- a/docs/zh/commercial/cloud.md +++ b/docs/zh/commercial/cloud.md @@ -8,4 +8,4 @@ toc_title: 云 # ClickHouse Cloud Service {#clickhouse-cloud-service} !!! info "Info" - Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](/company/#contact) to learn more. + Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more. diff --git a/docs/zh/commercial/support.md b/docs/zh/commercial/support.md index 5e5d00a22b8..a3714c98d44 100644 --- a/docs/zh/commercial/support.md +++ b/docs/zh/commercial/support.md @@ -6,4 +6,4 @@ toc_title: 支持 # ClickHouse 商业支持服务提供商 {#clickhouse-commercial-support-service-providers} !!! info "Info" - Detailed public description for ClickHouse support services is not ready yet, please [contact us](/company/#contact) to learn more. + Detailed public description for ClickHouse support services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more. diff --git a/docs/zh/development/cmake-in-clickhouse.md b/docs/zh/development/cmake-in-clickhouse.md deleted file mode 120000 index 0eb485952cd..00000000000 --- a/docs/zh/development/cmake-in-clickhouse.md +++ /dev/null @@ -1 +0,0 @@ -../../en/development/cmake-in-clickhouse.md \ No newline at end of file diff --git a/programs/copier/ClusterCopier.cpp b/programs/copier/ClusterCopier.cpp index de26e34bf2e..1e8222f8769 100644 --- a/programs/copier/ClusterCopier.cpp +++ b/programs/copier/ClusterCopier.cpp @@ -1964,7 +1964,7 @@ UInt64 ClusterCopier::executeQueryOnCluster( } catch (...) { - LOG_WARNING(log, "Seemns like node with address {} is unreachable.", node.host_name); + LOG_WARNING(log, "Node with address {} seems to be unreachable.", node.host_name); continue; } diff --git a/programs/local/LocalServer.cpp b/programs/local/LocalServer.cpp index c70f0c07ea9..02328ca3d79 100644 --- a/programs/local/LocalServer.cpp +++ b/programs/local/LocalServer.cpp @@ -306,6 +306,7 @@ try attachInformationSchema(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::INFORMATION_SCHEMA)); attachInformationSchema(global_context, *createMemoryDatabaseIfNotExists(global_context, DatabaseCatalog::INFORMATION_SCHEMA_UPPERCASE)); loadMetadata(global_context); + startupSystemTables(); DatabaseCatalog::instance().loadDatabases(); LOG_DEBUG(log, "Loaded metadata."); } diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index 961d8e7b4cf..5804d28d337 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -918,7 +918,7 @@ if (ThreadFuzzer::instance().isEffective()) global_context, settings.async_insert_threads, settings.async_insert_max_data_size, - AsynchronousInsertQueue::Timeout{.busy = settings.async_insert_busy_timeout, .stale = settings.async_insert_stale_timeout})); + AsynchronousInsertQueue::Timeout{.busy = settings.async_insert_busy_timeout_ms, .stale = settings.async_insert_stale_timeout_ms})); /// Size of cache for marks (index of MergeTree family of tables). It is mandatory. 
size_t mark_cache_size = config().getUInt64("mark_cache_size"); @@ -1116,6 +1116,7 @@ if (ThreadFuzzer::instance().isEffective()) database_catalog.loadMarkedAsDroppedTables(); /// Then, load remaining databases loadMetadata(global_context, default_database); + startupSystemTables(); database_catalog.loadDatabases(); /// After loading validate that default database exists database_catalog.assertDatabaseExists(default_database); diff --git a/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp b/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp new file mode 100644 index 00000000000..8569e8f9c8c --- /dev/null +++ b/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp @@ -0,0 +1,98 @@ +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; +} + + +/** See the comments in ExponentiallySmoothedCounter.h + */ +class AggregateFunctionExponentialMovingAverage final + : public IAggregateFunctionDataHelper +{ +private: + String name; + Float64 half_decay; + +public: + AggregateFunctionExponentialMovingAverage(const DataTypes & argument_types_, const Array & params) + : IAggregateFunctionDataHelper(argument_types_, params) + { + if (params.size() != 1) + throw Exception{"Aggregate function " + getName() + " requires exactly one parameter: half decay time.", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH}; + + half_decay = applyVisitor(FieldVisitorConvertToNumber(), params[0]); + } + + String getName() const override + { + return "exponentialMovingAverage"; + } + + DataTypePtr getReturnType() const override + { + return std::make_shared>(); + } + + bool allocatesMemoryInArena() const override { return false; } + + void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override + { + const auto & value = columns[0]->getFloat64(row_num); + const auto & time = columns[1]->getFloat64(row_num); + this->data(place).add(value, time, half_decay); + } + + void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, Arena *) const override + { + this->data(place).merge(this->data(rhs), half_decay); + } + + void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf) const override + { + writeBinary(this->data(place).value, buf); + writeBinary(this->data(place).time, buf); + } + + void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, Arena *) const override + { + readBinary(this->data(place).value, buf); + readBinary(this->data(place).time, buf); + } + + void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override + { + auto & column = assert_cast &>(to); + column.getData().push_back(this->data(place).get(half_decay)); + } +}; + +void registerAggregateFunctionExponentialMovingAverage(AggregateFunctionFactory & factory) +{ + factory.registerFunction("exponentialMovingAverage", + [](const std::string & name, const DataTypes & argument_types, const Array & params, const Settings *) -> AggregateFunctionPtr + { + assertBinary(name, argument_types); + for (const auto & type : argument_types) + if (!isNumber(*type)) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Both arguments for aggregate function {} must have numeric type, got {}", name, type->getName()); + return std::make_shared(argument_types, params); + }); +} + +} diff --git 
a/src/AggregateFunctions/AggregateFunctionUniqCombined.cpp b/src/AggregateFunctions/AggregateFunctionUniqCombined.cpp index e137937343b..2b7e0d97372 100644 --- a/src/AggregateFunctions/AggregateFunctionUniqCombined.cpp +++ b/src/AggregateFunctions/AggregateFunctionUniqCombined.cpp @@ -102,7 +102,7 @@ namespace // This range is hardcoded below if (precision_param > 20 || precision_param < 12) throw Exception( - "Parameter for aggregate function " + name + " is out or range: [12, 20].", ErrorCodes::ARGUMENT_OUT_OF_BOUND); + "Parameter for aggregate function " + name + " is out of range: [12, 20].", ErrorCodes::ARGUMENT_OUT_OF_BOUND); precision = precision_param; } diff --git a/src/AggregateFunctions/registerAggregateFunctions.cpp b/src/AggregateFunctions/registerAggregateFunctions.cpp index dd1f292a392..5d0af719290 100644 --- a/src/AggregateFunctions/registerAggregateFunctions.cpp +++ b/src/AggregateFunctions/registerAggregateFunctions.cpp @@ -50,7 +50,9 @@ void registerAggregateFunctionWelchTTest(AggregateFunctionFactory &); void registerAggregateFunctionStudentTTest(AggregateFunctionFactory &); void registerAggregateFunctionSingleValueOrNull(AggregateFunctionFactory &); void registerAggregateFunctionSequenceNextNode(AggregateFunctionFactory &); +void registerAggregateFunctionExponentialMovingAverage(AggregateFunctionFactory &); void registerAggregateFunctionSparkbar(AggregateFunctionFactory &); +void registerAggregateFunctionIntervalLengthSum(AggregateFunctionFactory &); class AggregateFunctionCombinatorFactory; void registerAggregateFunctionCombinatorIf(AggregateFunctionCombinatorFactory &); @@ -66,8 +68,6 @@ void registerAggregateFunctionCombinatorDistinct(AggregateFunctionCombinatorFact void registerWindowFunctions(AggregateFunctionFactory & factory); -void registerAggregateFunctionIntervalLengthSum(AggregateFunctionFactory &); - void registerAggregateFunctions() { { @@ -116,11 +116,11 @@ void registerAggregateFunctions() registerAggregateFunctionWelchTTest(factory); registerAggregateFunctionStudentTTest(factory); registerAggregateFunctionSingleValueOrNull(factory); + registerAggregateFunctionIntervalLengthSum(factory); + registerAggregateFunctionExponentialMovingAverage(factory); + registerAggregateFunctionSparkbar(factory); registerWindowFunctions(factory); - - registerAggregateFunctionIntervalLengthSum(factory); - registerAggregateFunctionSparkbar(factory); } { diff --git a/src/Backups/renameInCreateQuery.cpp b/src/Backups/renameInCreateQuery.cpp index a36995654ee..4c78844d266 100644 --- a/src/Backups/renameInCreateQuery.cpp +++ b/src/Backups/renameInCreateQuery.cpp @@ -160,26 +160,29 @@ namespace if (args.size() <= db_name_index) return; - String db_name = evaluateConstantExpressionForDatabaseName(args[db_name_index], data.context)->as().value.safeGet(); + String name = evaluateConstantExpressionForDatabaseName(args[db_name_index], data.context)->as().value.safeGet(); - String table_name; size_t table_name_index = static_cast(-1); - size_t dot = String::npos; - if (function.name != "Distributed") - dot = db_name.find('.'); - if (dot != String::npos) - { - table_name = db_name.substr(dot + 1); - db_name.resize(dot); - } + + QualifiedTableName qualified_name; + + if (function.name == "Distributed") + qualified_name.table = name; else + qualified_name = QualifiedTableName::parseFromString(name); + + if (qualified_name.database.empty()) { + std::swap(qualified_name.database, qualified_name.table); table_name_index = 2; if (args.size() <= table_name_index) return; - table_name = 
evaluateConstantExpressionForDatabaseName(args[table_name_index], data.context)->as().value.safeGet(); + qualified_name.table = evaluateConstantExpressionForDatabaseName(args[table_name_index], data.context)->as().value.safeGet(); } + const String & db_name = qualified_name.database; + const String & table_name = qualified_name.table; + if (db_name.empty() || table_name.empty()) return; diff --git a/src/Client/Connection.cpp b/src/Client/Connection.cpp index 366e61bc8e2..dffac97d8ce 100644 --- a/src/Client/Connection.cpp +++ b/src/Client/Connection.cpp @@ -130,10 +130,16 @@ void Connection::connect(const ConnectionTimeouts & timeouts) } catch (Poco::TimeoutException & e) { + /// disconnect() will reset the socket, get timeouts before. + const std::string & message = fmt::format("{} ({}, receive timeout {} ms, send timeout {} ms)", + e.displayText(), getDescription(), + socket->getReceiveTimeout().totalMilliseconds(), + socket->getSendTimeout().totalMilliseconds()); + disconnect(); /// Add server address to exception. Also Exception will remember stack trace. It's a pity that more precise exception type is lost. - throw NetException(e.displayText() + " (" + getDescription() + ")", ErrorCodes::SOCKET_TIMEOUT); + throw NetException(message, ErrorCodes::SOCKET_TIMEOUT); } } @@ -413,7 +419,12 @@ void Connection::sendQuery( if (!connected) connect(timeouts); - TimeoutSetter timeout_setter(*socket, timeouts.send_timeout, timeouts.receive_timeout, true); + /// Query is not executed within sendQuery() function. + /// + /// This means that a temporary timeout (set via TimeoutSetter) is not + /// enough, since the next query could otherwise reuse the timeouts of the previous query. + socket->setReceiveTimeout(timeouts.receive_timeout); + socket->setSendTimeout(timeouts.send_timeout); if (settings) { diff --git a/src/Common/Config/YAMLParser.cpp b/src/Common/Config/YAMLParser.cpp index 8d758eefdc0..e7917a0a616 100644 --- a/src/Common/Config/YAMLParser.cpp +++ b/src/Common/Config/YAMLParser.cpp @@ -156,7 +156,7 @@ Poco::AutoPtr YAMLParser::parse(const String& path) throw Exception(ErrorCodes::CANNOT_OPEN_FILE, "Unable to open YAML configuration file {}", path); } Poco::AutoPtr xml = new Document; - Poco::AutoPtr root_node = xml->createElement("yandex"); + Poco::AutoPtr root_node = xml->createElement("clickhouse"); xml->appendChild(root_node); try { diff --git a/src/Common/Exception.cpp b/src/Common/Exception.cpp index 09629b436b2..408cab8b702 100644 --- a/src/Common/Exception.cpp +++ b/src/Common/Exception.cpp @@ -534,6 +534,13 @@ ExecutionStatus ExecutionStatus::fromCurrentException(const std::string & start_ return ExecutionStatus(getCurrentExceptionCode(), msg); } +ExecutionStatus ExecutionStatus::fromText(const std::string & data) +{ + ExecutionStatus status; + status.deserializeText(data); + return status; +} + ParsingException::ParsingException() = default; ParsingException::ParsingException(const std::string & msg, int code) : Exception(msg, code) diff --git a/src/Common/Exception.h b/src/Common/Exception.h index d04b0f71b9e..3aa06f8c988 100644 --- a/src/Common/Exception.h +++ b/src/Common/Exception.h @@ -184,6 +184,8 @@ struct ExecutionStatus static ExecutionStatus fromCurrentException(const std::string & start_of_message = ""); + static ExecutionStatus fromText(const std::string & data); + std::string serializeText() const; void deserializeText(const std::string & data); diff --git a/src/Common/ExponentiallySmoothedCounter.h b/src/Common/ExponentiallySmoothedCounter.h new file mode 100644 index 
00000000000..28d4e5e25c1 --- /dev/null +++ b/src/Common/ExponentiallySmoothedCounter.h @@ -0,0 +1,114 @@ +#pragma once + +#include +#include + + +namespace DB +{ + +/** https://en.wikipedia.org/wiki/Exponential_smoothing + * + * Exponentially smoothed average over time is weighted average with weight proportional to negative exponent of the time passed. + * For example, the last value is taken with weight 1/2, the value one second ago with weight 1/4, two seconds ago - 1/8, etc. + * It can be understood as an average over sliding window, but with different kernel. + * + * As an advantage, it is easy to update. Instead of collecting values and calculating a series of x1 / 2 + x2 / 4 + x3 / 8... + * just calculate x_old / 2 + x_new / 2. + * + * It is often used for resource usage metrics. For example, "load average" in Linux is exponentially smoothed moving average. + * We can use exponentially smoothed counters in query scheduler. + */ +struct ExponentiallySmoothedAverage +{ + /// The sum. It contains the last value and all previous values scaled accordingly to the difference of their time to the reference time. + /// Older values are summed with exponentially smaller coefficients. + /// To obtain the average, you have to divide this value to the sum of all coefficients (see 'sumWeights'). + + double value = 0; + + /// The point of reference. You can translate the value to a different point of reference (see 'remap'). + /// You can imagine that the value exponentially decays over time. + /// But it is also meaningful to treat the whole counters as constants over time but in another non-linear coordinate system, + /// that inflates over time, while the counter itself does not change + /// (it continues to be the same physical quantity, but only changes its representation in the "usual" coordinate system). + + /// Recap: the whole counter is one dimensional and it can be represented as a curve formed by two dependent coordinates in 2d plane, + /// the space can be represented by (value, time) coordinates, and the curves will be exponentially decaying over time, + /// alternatively the space can be represented by (exponentially_adjusted_value, time) and then the curves will be constant over time. + + /// Also useful analogy is the exponential representation of a number: x = a * exp(b) = a * e (where e = exp(b)) + /// a number x is represented by a curve in 2d plane that can be parametrized by coordinates (a, b) or (a, e). + + double time = 0; + + + ExponentiallySmoothedAverage() + { + } + + ExponentiallySmoothedAverage(double current_value, double current_time) + : value(current_value), time(current_time) + { + } + + /// How much value decays after time_passed. + static double scale(double time_passed, double half_decay_time) + { + return exp2(-time_passed / half_decay_time); + } + + /// Sum of weights of all values. Divide by it to get the average. + static double sumWeights(double half_decay_time) + { + double k = scale(1.0, half_decay_time); + return 1 / (1 - k); + } + + /// Obtain the same counter in another point of reference. + ExponentiallySmoothedAverage remap(double current_time, double half_decay_time) const + { + return ExponentiallySmoothedAverage(value * scale(current_time - time, half_decay_time), current_time); + } + + /// Merge two counters. It is done by moving to the same point of reference and summing the values. 
+ static ExponentiallySmoothedAverage merge(const ExponentiallySmoothedAverage & a, const ExponentiallySmoothedAverage & b, double half_decay_time) + { + if (a.time > b.time) + return ExponentiallySmoothedAverage(a.value + b.remap(a.time, half_decay_time).value, a.time); + if (a.time < b.time) + return ExponentiallySmoothedAverage(b.value + a.remap(b.time, half_decay_time).value, b.time); + + return ExponentiallySmoothedAverage(a.value + b.value, a.time); + } + + void merge(const ExponentiallySmoothedAverage & other, double half_decay_time) + { + *this = merge(*this, other, half_decay_time); + } + + void add(double new_value, double current_time, double half_decay_time) + { + merge(ExponentiallySmoothedAverage(new_value, current_time), half_decay_time); + } + + /// Calculate the average from the sum. + double get(double half_decay_time) const + { + return value / sumWeights(half_decay_time); + } + + double get(double current_time, double half_decay_time) const + { + return remap(current_time, half_decay_time).get(half_decay_time); + } + + /// Compare two counters (by moving to the same point of reference and comparing sums). + /// You can store the counters in container and sort it without changing the stored values over time. + bool less(const ExponentiallySmoothedAverage & other, double half_decay_time) const + { + return remap(other.time, half_decay_time).value < other.value; + } +}; + +} diff --git a/src/Common/FileChecker.cpp b/src/Common/FileChecker.cpp index 173c4bd8a3a..91be58e3e39 100644 --- a/src/Common/FileChecker.cpp +++ b/src/Common/FileChecker.cpp @@ -111,7 +111,7 @@ void FileChecker::save() const std::unique_ptr out = disk->writeFile(tmp_files_info_path); /// So complex JSON structure - for compatibility with the old format. - writeCString("{\"yandex\":{", *out); + writeCString("{\"clickhouse\":{", *out); auto settings = FormatSettings(); for (auto it = map.begin(); it != map.end(); ++it) @@ -153,7 +153,7 @@ void FileChecker::load() } JSON json(out.str()); - JSON files = json["yandex"]; + JSON files = json.has("clickhouse") ? json["clickhouse"] : json["yandex"]; for (const JSON file : files) // NOLINT map[unescapeForFileName(file.getName())] = file.getValue()["size"].toUInt(); } diff --git a/src/Common/SettingsChanges.cpp b/src/Common/SettingsChanges.cpp index e7c769ad55a..370b465eba3 100644 --- a/src/Common/SettingsChanges.cpp +++ b/src/Common/SettingsChanges.cpp @@ -1,6 +1,5 @@ #include - namespace DB { namespace diff --git a/src/Common/SettingsChanges.h b/src/Common/SettingsChanges.h index 734d8ecb227..073af21c6cc 100644 --- a/src/Common/SettingsChanges.h +++ b/src/Common/SettingsChanges.h @@ -5,6 +5,9 @@ namespace DB { + +class IColumn; + struct SettingChange { String name; diff --git a/src/Common/ThreadFuzzer.cpp b/src/Common/ThreadFuzzer.cpp index 9963a5308c3..896d8ee4e62 100644 --- a/src/Common/ThreadFuzzer.cpp +++ b/src/Common/ThreadFuzzer.cpp @@ -128,6 +128,9 @@ void ThreadFuzzer::initConfiguration() bool ThreadFuzzer::isEffective() const { + if (!isStarted()) + return false; + #if THREAD_FUZZER_WRAP_PTHREAD # define CHECK_WRAPPER_PARAMS(RET, NAME, ...) 
\ if (NAME##_before_yield_probability.load(std::memory_order_relaxed)) \ @@ -159,6 +162,20 @@ bool ThreadFuzzer::isEffective() const || (sleep_probability > 0 && sleep_time_us > 0)); } +void ThreadFuzzer::stop() +{ + started.store(false, std::memory_order_relaxed); +} + +void ThreadFuzzer::start() +{ + started.store(true, std::memory_order_relaxed); +} + +bool ThreadFuzzer::isStarted() +{ + return started.load(std::memory_order_relaxed); +} static void injection( double yield_probability, @@ -166,6 +183,10 @@ static void injection( double sleep_probability, double sleep_time_us [[maybe_unused]]) { + DENY_ALLOCATIONS_IN_SCOPE; + if (!ThreadFuzzer::isStarted()) + return; + if (yield_probability > 0 && std::bernoulli_distribution(yield_probability)(thread_local_rng)) { diff --git a/src/Common/ThreadFuzzer.h b/src/Common/ThreadFuzzer.h index 1a9e98ca674..743b8c75dc0 100644 --- a/src/Common/ThreadFuzzer.h +++ b/src/Common/ThreadFuzzer.h @@ -1,6 +1,6 @@ #pragma once #include - +#include namespace DB { @@ -54,6 +54,10 @@ public: bool isEffective() const; + static void stop(); + static void start(); + static bool isStarted(); + private: uint64_t cpu_time_period_us = 0; double yield_probability = 0; @@ -61,6 +65,8 @@ private: double sleep_probability = 0; double sleep_time_us = 0; + inline static std::atomic started{true}; + ThreadFuzzer(); void initConfiguration(); diff --git a/src/Common/ZooKeeper/ZooKeeper.h b/src/Common/ZooKeeper/ZooKeeper.h index bfbfea03aae..27dfdad7cdd 100644 --- a/src/Common/ZooKeeper/ZooKeeper.h +++ b/src/Common/ZooKeeper/ZooKeeper.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -277,6 +278,8 @@ public: void setZooKeeperLog(std::shared_ptr zk_log_); + UInt32 getSessionUptime() const { return session_uptime.elapsedSeconds(); } + private: friend class EphemeralNodeHolder; @@ -307,6 +310,8 @@ private: Poco::Logger * log = nullptr; std::shared_ptr zk_log; + + AtomicStopwatch session_uptime; }; diff --git a/src/Core/PostgreSQL/PoolWithFailover.cpp b/src/Core/PostgreSQL/PoolWithFailover.cpp index b8b8e78396c..3addb511c3b 100644 --- a/src/Core/PostgreSQL/PoolWithFailover.cpp +++ b/src/Core/PostgreSQL/PoolWithFailover.cpp @@ -5,6 +5,8 @@ #include "Utils.h" #include #include +#include +#include namespace DB { @@ -18,7 +20,7 @@ namespace postgres { PoolWithFailover::PoolWithFailover( - const Poco::Util::AbstractConfiguration & config, const String & config_prefix, + const DB::ExternalDataSourcesConfigurationByPriority & configurations_by_priority, size_t pool_size, size_t pool_wait_timeout_, size_t max_tries_) : pool_wait_timeout(pool_wait_timeout_) , max_tries(max_tries_) @@ -26,45 +28,19 @@ PoolWithFailover::PoolWithFailover( LOG_TRACE(&Poco::Logger::get("PostgreSQLConnectionPool"), "PostgreSQL connection pool size: {}, connection wait timeout: {}, max failover tries: {}", pool_size, pool_wait_timeout, max_tries_); - auto db = config.getString(config_prefix + ".db", ""); - auto host = config.getString(config_prefix + ".host", ""); - auto port = config.getUInt(config_prefix + ".port", 0); - auto user = config.getString(config_prefix + ".user", ""); - auto password = config.getString(config_prefix + ".password", ""); - - if (config.has(config_prefix + ".replica")) + for (const auto & [priority, configurations] : configurations_by_priority) { - Poco::Util::AbstractConfiguration::Keys config_keys; - config.keys(config_prefix, config_keys); - - for (const auto & config_key : config_keys) + for (const auto & replica_configuration : configurations) 
{ - if (config_key.starts_with("replica")) - { - std::string replica_name = config_prefix + "." + config_key; - size_t priority = config.getInt(replica_name + ".priority", 0); - - auto replica_host = config.getString(replica_name + ".host", host); - auto replica_port = config.getUInt(replica_name + ".port", port); - auto replica_user = config.getString(replica_name + ".user", user); - auto replica_password = config.getString(replica_name + ".password", password); - - auto connection_string = formatConnectionString(db, replica_host, replica_port, replica_user, replica_password).first; - replicas_with_priority[priority].emplace_back(connection_string, pool_size); - } + auto connection_string = formatConnectionString(replica_configuration.database, + replica_configuration.host, replica_configuration.port, replica_configuration.username, replica_configuration.password).first; + replicas_with_priority[priority].emplace_back(connection_string, pool_size, getConnectionForLog(replica_configuration.host, replica_configuration.port)); } } - else - { - auto connection_string = formatConnectionString(db, host, port, user, password).first; - replicas_with_priority[0].emplace_back(connection_string, pool_size); - } } PoolWithFailover::PoolWithFailover( - const std::string & database, - const RemoteDescription & addresses, - const std::string & user, const std::string & password, + const DB::StoragePostgreSQLConfiguration & configuration, size_t pool_size, size_t pool_wait_timeout_, size_t max_tries_) : pool_wait_timeout(pool_wait_timeout_) , max_tries(max_tries_) @@ -73,11 +49,11 @@ PoolWithFailover::PoolWithFailover( pool_size, pool_wait_timeout, max_tries_); /// Replicas have the same priority, but traversed replicas are moved to the end of the queue. - for (const auto & [host, port] : addresses) + for (const auto & [host, port] : configuration.addresses) { LOG_DEBUG(&Poco::Logger::get("PostgreSQLPoolWithFailover"), "Adding address host: {}, port: {} to connection pool", host, port); - auto connection_string = formatConnectionString(database, host, port, user, password).first; - replicas_with_priority[0].emplace_back(connection_string, pool_size); + auto connection_string = formatConnectionString(configuration.database, host, port, configuration.username, configuration.password).first; + replicas_with_priority[0].emplace_back(connection_string, pool_size, getConnectionForLog(host, port)); } } @@ -85,6 +61,7 @@ ConnectionHolderPtr PoolWithFailover::get() { std::lock_guard lock(mutex); + DB::WriteBufferFromOwnString error_message; for (size_t try_idx = 0; try_idx < max_tries; ++try_idx) { for (auto & priority : replicas_with_priority) @@ -115,6 +92,7 @@ ConnectionHolderPtr PoolWithFailover::get() catch (const pqxx::broken_connection & pqxx_error) { LOG_ERROR(log, "Connection error: {}", pqxx_error.what()); + error_message << "Try " << try_idx + 1 << ". 
Connection to `" << replica.name_for_log << "` failed: " << pqxx_error.what() << "\n"; replica.pool->returnObject(std::move(connection)); continue; @@ -136,7 +114,7 @@ ConnectionHolderPtr PoolWithFailover::get() } } - throw DB::Exception(DB::ErrorCodes::POSTGRESQL_CONNECTION_FAILURE, "Unable to connect to any of the replicas"); + throw DB::Exception(DB::ErrorCodes::POSTGRESQL_CONNECTION_FAILURE, error_message.str()); } } diff --git a/src/Core/PostgreSQL/PoolWithFailover.h b/src/Core/PostgreSQL/PoolWithFailover.h index 9150262e242..9d491625a0b 100644 --- a/src/Core/PostgreSQL/PoolWithFailover.h +++ b/src/Core/PostgreSQL/PoolWithFailover.h @@ -11,6 +11,7 @@ #include #include #include +#include namespace postgres @@ -27,17 +28,13 @@ public: static constexpr inline auto POSTGRESQL_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES = 5; PoolWithFailover( - const Poco::Util::AbstractConfiguration & config, - const std::string & config_prefix, + const DB::ExternalDataSourcesConfigurationByPriority & configurations_by_priority, size_t pool_size = POSTGRESQL_POOL_DEFAULT_SIZE, size_t pool_wait_timeout = POSTGRESQL_POOL_WAIT_TIMEOUT, size_t max_tries_ = POSTGRESQL_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES); PoolWithFailover( - const std::string & database, - const RemoteDescription & addresses, - const std::string & user, - const std::string & password, + const DB::StoragePostgreSQLConfiguration & configuration, size_t pool_size = POSTGRESQL_POOL_DEFAULT_SIZE, size_t pool_wait_timeout = POSTGRESQL_POOL_WAIT_TIMEOUT, size_t max_tries_ = POSTGRESQL_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES); @@ -51,9 +48,10 @@ private: { String connection_string; PoolPtr pool; + String name_for_log; - PoolHolder(const String & connection_string_, size_t pool_size) - : connection_string(connection_string_), pool(std::make_shared(pool_size)) {} + PoolHolder(const String & connection_string_, size_t pool_size, const String & name_for_log_) + : connection_string(connection_string_), pool(std::make_shared(pool_size)), name_for_log(name_for_log_) {} }; /// Highest priority is 0, the bigger the number in map, the less the priority diff --git a/src/Core/PostgreSQL/Utils.cpp b/src/Core/PostgreSQL/Utils.cpp index accc3b29a93..60b13218202 100644 --- a/src/Core/PostgreSQL/Utils.cpp +++ b/src/Core/PostgreSQL/Utils.cpp @@ -3,6 +3,7 @@ #if USE_LIBPQXX #include +#include namespace postgres { @@ -19,6 +20,11 @@ ConnectionInfo formatConnectionString(String dbname, String host, UInt16 port, S return std::make_pair(out.str(), host + ':' + DB::toString(port)); } +String getConnectionForLog(const String & host, UInt16 port) +{ + return host + ":" + DB::toString(port); +} + String formatNameForLogs(const String & postgres_database_name, const String & postgres_table_name) { /// Logger for StorageMaterializedPostgreSQL - both db and table names. 
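The PoolWithFailover hunks above also change how PostgreSQL connection failures are reported: instead of the generic "Unable to connect to any of the replicas", each failed attempt now appends a line naming the replica (via the new `getConnectionForLog`, which formats "host:port"), and the accumulated text is thrown once `max_tries` is exhausted. Below is a minimal standalone sketch of that retry/accumulation pattern, not ClickHouse code: `std::ostringstream` and `std::runtime_error` stand in for `DB::WriteBufferFromOwnString` and `DB::Exception`, and the always-failing attempt is a placeholder for the real libpqxx connection.

``` cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Replica { std::string host; unsigned short port; };

// Same formatting as the new postgres::getConnectionForLog helper: "host:port".
static std::string getConnectionForLog(const std::string & host, unsigned short port)
{
    return host + ":" + std::to_string(port);
}

void connectWithFailover(const std::vector<Replica> & replicas, size_t max_tries)
{
    std::ostringstream error_message; // stand-in for DB::WriteBufferFromOwnString
    for (size_t try_idx = 0; try_idx < max_tries; ++try_idx)
    {
        for (const auto & replica : replicas)
        {
            try
            {
                // The real pool opens a libpqxx connection and returns it on success;
                // here every attempt fails so that the accumulated message is shown.
                throw std::runtime_error("connection refused");
            }
            catch (const std::exception & e)
            {
                // One line per failed attempt, tagged with the try number and replica name.
                error_message << "Try " << try_idx + 1 << ". Connection to `"
                              << getConnectionForLog(replica.host, replica.port)
                              << "` failed: " << e.what() << "\n";
            }
        }
    }
    // All tries exhausted: throw the per-replica details instead of a generic message.
    throw std::runtime_error(error_message.str());
}

int main()
{
    try
    {
        connectWithFailover({{"pg1", 5432}, {"pg2", 5432}}, 2);
    }
    catch (const std::exception & e)
    {
        std::cout << e.what();
    }
    return 0;
}
```

In the real code the same information also reaches the server log via the existing LOG_ERROR call before each retry, so the final exception is a summary rather than the only trace of what happened.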
diff --git a/src/Core/PostgreSQL/Utils.h b/src/Core/PostgreSQL/Utils.h index 59b44f8f5e1..c80089d5741 100644 --- a/src/Core/PostgreSQL/Utils.h +++ b/src/Core/PostgreSQL/Utils.h @@ -22,6 +22,8 @@ namespace postgres ConnectionInfo formatConnectionString(String dbname, String host, UInt16 port, String user, String password); +String getConnectionForLog(const String & host, UInt16 port); + String formatNameForLogs(const String & postgres_database_name, const String & postgres_table_name); } diff --git a/src/Core/QualifiedTableName.h b/src/Core/QualifiedTableName.h index 453d55d85c7..c1cb9b27d15 100644 --- a/src/Core/QualifiedTableName.h +++ b/src/Core/QualifiedTableName.h @@ -2,11 +2,20 @@ #include #include +#include +#include #include +#include +#include namespace DB { +namespace ErrorCodes +{ +extern const int SYNTAX_ERROR; +} + //TODO replace with StorageID struct QualifiedTableName { @@ -30,6 +39,46 @@ struct QualifiedTableName hash_state.update(table.data(), table.size()); return hash_state.get64(); } + + /// NOTE: It's different from compound identifier parsing and does not support escaping and dots in name. + /// Usually it's better to use ParserIdentifier instead, + /// but we parse DDL dictionary name (and similar things) this way for historical reasons. + static std::optional tryParseFromString(const String & maybe_qualified_name) + { + if (maybe_qualified_name.empty()) + return {}; + + /// Do not allow dot at the beginning and at the end + auto pos = maybe_qualified_name.find('.'); + if (pos == 0 || pos == (maybe_qualified_name.size() - 1)) + return {}; + + QualifiedTableName name; + if (pos == std::string::npos) + { + name.table = std::move(maybe_qualified_name); + } + else if (maybe_qualified_name.find('.', pos + 1) != std::string::npos) + { + /// Do not allow multiple dots + return {}; + } + else + { + name.database = maybe_qualified_name.substr(0, pos); + name.table = maybe_qualified_name.substr(pos + 1); + } + + return name; + } + + static QualifiedTableName parseFromString(const String & maybe_qualified_name) + { + auto name = tryParseFromString(maybe_qualified_name); + if (!name) + throw Exception(ErrorCodes::SYNTAX_ERROR, "Invalid qualified name: {}", maybe_qualified_name); + return *name; + } }; } @@ -47,5 +96,23 @@ template <> struct hash return qualified_table.hash(); } }; - } + +namespace fmt +{ + template <> + struct formatter + { + constexpr auto parse(format_parse_context & ctx) + { + return ctx.begin(); + } + + template + auto format(const DB::QualifiedTableName & name, FormatContext & ctx) + { + return format_to(ctx.out(), "{}.{}", DB::backQuoteIfNeed(name.database), DB::backQuoteIfNeed(name.table)); + } + }; +} + diff --git a/src/Core/Settings.h b/src/Core/Settings.h index 4980897965f..a728ba636ad 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -386,6 +386,7 @@ class IColumn; M(Bool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.", 0) \ M(Bool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.", 0) \ M(Bool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. 
Currently supported only by 'mysql', 'postgresql' and 'odbc' table functions.", 0) \ + M(Bool, external_table_strict_query, false, "If it is set to true, transforming expression to local filter is forbidden for queries to external tables.", 0) \ \ M(Bool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.", 0) \ M(UInt64, max_hyperscan_regexp_length, 0, "Max length of regexp than can be used in hyperscan multi-match functions. Zero means unlimited.", 0) \ @@ -455,7 +456,7 @@ class IColumn; M(Bool, allow_non_metadata_alters, true, "Allow to execute alters which affects not only tables metadata, but also data on disk", 0) \ M(Bool, enable_global_with_statement, true, "Propagate WITH statements to UNION queries and all subqueries", 0) \ M(Bool, aggregate_functions_null_for_empty, false, "Rewrite all aggregate functions in a query, adding -OrNull suffix to them", 0) \ - M(Bool, optimize_fuse_sum_count_avg, false, "Fuse aggregate functions sum(), avg(), count() with identical arguments into one sumCount() call, if the query has at least two different functions", 0) \ + M(Bool, optimize_syntax_fuse_functions, false, "Fuse aggregate functions (`sum, avg, count` with identical arguments into one `sumCount`, quantile-family functions with the same argument into `quantiles*(...)[...]`)", 0) \ M(Bool, flatten_nested, true, "If true, columns of type Nested will be flatten to separate array columns instead of one array of tuples", 0) \ M(Bool, asterisk_include_materialized_columns, false, "Include MATERIALIZED columns for wildcard query", 0) \ M(Bool, asterisk_include_alias_columns, false, "Include ALIAS columns for wildcard query", 0) \ @@ -507,8 +508,16 @@ class IColumn; M(Bool, remote_filesystem_read_prefetch, true, "Should use prefetching when reading data from remote filesystem.", 0) \ M(Int64, read_priority, 0, "Priority to read data from local filesystem. Only supported for 'pread_threadpool' method.", 0) \ \ - M(Int64, remote_disk_read_backoff_threashold, 10000, "Max wait time when trying to read data for remote disk", 0) \ - M(Int64, remote_disk_read_backoff_max_tries, 5, "Max attempts to read with backoff", 0) \ + M(UInt64, async_insert_threads, 16, "Maximum number of threads to actually parse and insert data in background. Zero means asynchronous mode is disabled", 0) \ + M(Bool, async_insert, false, "If true, data from INSERT query is stored in queue and later flushed to table in background. Makes sense only for inserts via HTTP protocol. If wait_for_async_insert is false, INSERT query is processed almost instantly, otherwise client will wait until data will be flushed to table", 0) \ + M(Bool, wait_for_async_insert, true, "If true wait for processing of asynchronous insertion", 0) \ + M(Seconds, wait_for_async_insert_timeout, DBMS_DEFAULT_LOCK_ACQUIRE_TIMEOUT_SEC, "Timeout for waiting for processing asynchronous insertion", 0) \ + M(UInt64, async_insert_max_data_size, 100000, "Maximum size in bytes of unparsed data collected per query before being inserted", 0) \ + M(Milliseconds, async_insert_busy_timeout_ms, 200, "Maximum time to wait before dumping collected data per query since the first data appeared", 0) \ + M(Milliseconds, async_insert_stale_timeout_ms, 0, "Maximum time to wait before dumping collected data per query since the last data appeared. 
Zero means no timeout at all", 0) \ + \ + M(Int64, remote_fs_read_backoff_threshold, 10000, "Max wait time when trying to read data for remote disk", 0) \ + M(Int64, remote_fs_read_backoff_max_tries, 5, "Max attempts to read with backoff", 0) \ \ /** Experimental functions */ \ M(Bool, allow_experimental_funnel_functions, false, "Enable experimental functions for funnel analysis.", 0) \ @@ -525,6 +534,7 @@ class IColumn; M(HandleKafkaErrorMode, handle_kafka_error_mode, HandleKafkaErrorMode::DEFAULT, "Obsolete setting, does nothing.", 0) \ M(Bool, database_replicated_ddl_output, true, "Obsolete setting, does nothing.", 0) \ M(UInt64, replication_alter_columns_timeout, 60, "Obsolete setting, does nothing.", 0) \ + M(Bool, optimize_fuse_sum_count_avg, false, "Obsolete, use optimize_syntax_fuse_functions", 0) \ /** The section above is for obsolete settings. Do not add anything there. */ @@ -577,6 +587,7 @@ class IColumn; M(String, output_format_avro_codec, "", "Compression codec used for output. Possible values: 'null', 'deflate', 'snappy'.", 0) \ M(UInt64, output_format_avro_sync_interval, 16 * 1024, "Sync interval in bytes.", 0) \ M(Bool, output_format_tsv_crlf_end_of_line, false, "If it is set true, end of line in TSV format will be \\r\\n instead of \\n.", 0) \ + M(String, output_format_csv_null_representation, "\\N", "Custom NULL representation in CSV format", 0) \ M(String, output_format_tsv_null_representation, "\\N", "Custom NULL representation in TSV format", 0) \ M(Bool, output_format_decimal_trailing_zeros, false, "Output trailing zeros when printing Decimal values. E.g. 1.230000 instead of 1.23.", 0) \ \ @@ -605,14 +616,6 @@ class IColumn; M(Bool, output_format_pretty_row_numbers, false, "Add row numbers before each row for pretty output format", 0) \ M(Bool, insert_distributed_one_random_shard, false, "If setting is enabled, inserting into distributed table will choose a random shard to write when there is no sharding key", 0) \ \ - M(UInt64, async_insert_threads, 16, "Maximum number of threads to actually parse and insert data in background. Zero means asynchronous mode is disabled", 0) \ - M(Bool, async_insert_mode, false, "Insert query is processed almost instantly, but an actual data queued for later asynchronous insertion", 0) \ - M(Bool, wait_for_async_insert, true, "If true wait for processing of asynchronous insertion", 0) \ - M(Seconds, wait_for_async_insert_timeout, DBMS_DEFAULT_LOCK_ACQUIRE_TIMEOUT_SEC, "Timeout for waiting for processing asynchronous insertion", 0) \ - M(UInt64, async_insert_max_data_size, 1000000, "Maximum size in bytes of unparsed data collected per query before being inserted", 0) \ - M(Milliseconds, async_insert_busy_timeout, 200, "Maximum time to wait before dumping collected data per query since the first data appeared", 0) \ - M(Milliseconds, async_insert_stale_timeout, 0, "Maximum time to wait before dumping collected data per query since the last data appeared. 
Zero means no timeout at all", 0) \ - \ M(Bool, cross_to_inner_join_rewrite, true, "Use inner join instead of comma/cross join if possible", 0) \ \ M(Bool, output_format_arrow_low_cardinality_as_dictionary, false, "Enable output LowCardinality type as Dictionary Arrow type", 0) \ diff --git a/src/DataTypes/Serializations/SerializationNullable.cpp b/src/DataTypes/Serializations/SerializationNullable.cpp index 4de2b08c043..b607d5871d6 100644 --- a/src/DataTypes/Serializations/SerializationNullable.cpp +++ b/src/DataTypes/Serializations/SerializationNullable.cpp @@ -327,7 +327,7 @@ void SerializationNullable::serializeTextCSV(const IColumn & column, size_t row_ const ColumnNullable & col = assert_cast(column); if (col.isNullAt(row_num)) - writeCString("\\N", ostr); + writeString(settings.csv.null_representation, ostr); else nested->serializeTextCSV(col.getNestedColumn(), row_num, ostr, settings); } diff --git a/src/Databases/DDLDependencyVisitor.cpp b/src/Databases/DDLDependencyVisitor.cpp new file mode 100644 index 00000000000..0399ec59b16 --- /dev/null +++ b/src/Databases/DDLDependencyVisitor.cpp @@ -0,0 +1,103 @@ +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +void DDLDependencyVisitor::visit(const ASTPtr & ast, Data & data) +{ + /// Looking for functions in column default expressions and dictionary source definition + if (const auto * function = ast->as()) + visit(*function, data); + else if (const auto * dict_source = ast->as()) + visit(*dict_source, data); +} + +bool DDLDependencyVisitor::needChildVisit(const ASTPtr & node, const ASTPtr & /*child*/) +{ + return !node->as(); +} + +void DDLDependencyVisitor::visit(const ASTFunction & function, Data & data) +{ + if (function.name == "joinGet" || + function.name == "dictHas" || + function.name == "dictIsIn" || + function.name.starts_with("dictGet")) + { + extractTableNameFromArgument(function, data, 0); + } + else if (Poco::toLower(function.name) == "in") + { + extractTableNameFromArgument(function, data, 1); + } + +} + +void DDLDependencyVisitor::visit(const ASTFunctionWithKeyValueArguments & dict_source, Data & data) +{ + if (dict_source.name != "clickhouse") + return; + if (!dict_source.elements) + return; + + auto config = getDictionaryConfigurationFromAST(data.create_query->as(), data.global_context); + auto info = getInfoIfClickHouseDictionarySource(config, data.global_context); + + if (!info || !info->is_local) + return; + + if (info->table_name.database.empty()) + info->table_name.database = data.default_database; + data.dependencies.emplace(std::move(info->table_name)); +} + + +void DDLDependencyVisitor::extractTableNameFromArgument(const ASTFunction & function, Data & data, size_t arg_idx) +{ + /// Just ignore incorrect arguments, proper exception will be thrown later + if (!function.arguments || function.arguments->children.size() <= arg_idx) + return; + + QualifiedTableName qualified_name; + + const auto * arg = function.arguments->as()->children[arg_idx].get(); + if (const auto * literal = arg->as()) + { + if (literal->value.getType() != Field::Types::String) + return; + + auto maybe_qualified_name = QualifiedTableName::tryParseFromString(literal->value.get()); + /// Just return if name if invalid + if (!maybe_qualified_name) + return; + + qualified_name = std::move(*maybe_qualified_name); + } + else if (const auto * identifier = arg->as()) + { + auto table_identifier = identifier->createTable(); + /// Just return if table identified is invalid + if (!table_identifier) + return; + + 
qualified_name.database = table_identifier->getDatabaseName(); + qualified_name.table = table_identifier->shortName(); + } + else + { + assert(false); + return; + } + + if (qualified_name.database.empty()) + qualified_name.database = data.default_database; + data.dependencies.emplace(std::move(qualified_name)); +} + +} diff --git a/src/Databases/DDLDependencyVisitor.h b/src/Databases/DDLDependencyVisitor.h new file mode 100644 index 00000000000..c0b39d70b08 --- /dev/null +++ b/src/Databases/DDLDependencyVisitor.h @@ -0,0 +1,42 @@ +#pragma once +#include +#include +#include + +namespace DB +{ + +class ASTFunction; +class ASTFunctionWithKeyValueArguments; + +/// Visits ASTCreateQuery and extracts names of table (or dictionary) dependencies +/// from column default expressions (joinGet, dictGet, etc) +/// or dictionary source (for dictionaries from local ClickHouse table). +/// Does not validate AST, works a best-effort way. +class DDLDependencyVisitor +{ +public: + struct Data + { + using TableNamesSet = std::set; + String default_database; + TableNamesSet dependencies; + ContextPtr global_context; + ASTPtr create_query; + }; + + using Visitor = ConstInDepthNodeVisitor; + + static void visit(const ASTPtr & ast, Data & data); + static bool needChildVisit(const ASTPtr & node, const ASTPtr & child); + +private: + static void visit(const ASTFunction & function, Data & data); + static void visit(const ASTFunctionWithKeyValueArguments & dict_source, Data & data); + + static void extractTableNameFromArgument(const ASTFunction & function, Data & data, size_t arg_idx); +}; + +using TableLoadingDependenciesVisitor = DDLDependencyVisitor::Visitor; + +} diff --git a/src/Databases/DatabaseAtomic.cpp b/src/Databases/DatabaseAtomic.cpp index 2dbcd652004..5c75f6f1036 100644 --- a/src/Databases/DatabaseAtomic.cpp +++ b/src/Databases/DatabaseAtomic.cpp @@ -416,40 +416,49 @@ UUID DatabaseAtomic::tryGetTableUUID(const String & table_name) const return UUIDHelpers::Nil; } -void DatabaseAtomic::loadStoredObjects( - ContextMutablePtr local_context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) +void DatabaseAtomic::beforeLoadingMetadata(ContextMutablePtr /*context*/, bool force_restore, bool /*force_attach*/) { + if (!force_restore) + return; + /// Recreate symlinks to table data dirs in case of force restore, because some of them may be broken - if (has_force_restore_data_flag) + for (const auto & table_path : fs::directory_iterator(path_to_table_symlinks)) { - for (const auto & table_path : fs::directory_iterator(path_to_table_symlinks)) + if (!fs::is_symlink(table_path)) { - if (!fs::is_symlink(table_path)) - { - throw Exception(ErrorCodes::ABORTED, - "'{}' is not a symlink. Atomic database should contains only symlinks.", std::string(table_path.path())); - } - - fs::remove(table_path); - } - } - - DatabaseOrdinary::loadStoredObjects(local_context, has_force_restore_data_flag, force_attach, skip_startup_tables); - - if (has_force_restore_data_flag) - { - NameToPathMap table_names; - { - std::lock_guard lock{mutex}; - table_names = table_name_to_path; + throw Exception(ErrorCodes::ABORTED, + "'{}' is not a symlink. 
Atomic database should contains only symlinks.", std::string(table_path.path())); } - fs::create_directories(path_to_table_symlinks); - for (const auto & table : table_names) - tryCreateSymlink(table.first, table.second, true); + fs::remove(table_path); } } +void DatabaseAtomic::loadStoredObjects( + ContextMutablePtr local_context, bool force_restore, bool force_attach, bool skip_startup_tables) +{ + beforeLoadingMetadata(local_context, force_restore, force_attach); + DatabaseOrdinary::loadStoredObjects(local_context, force_restore, force_attach, skip_startup_tables); +} + +void DatabaseAtomic::startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) +{ + DatabaseOrdinary::startupTables(thread_pool, force_restore, force_attach); + + if (!force_restore) + return; + + NameToPathMap table_names; + { + std::lock_guard lock{mutex}; + table_names = table_name_to_path; + } + + fs::create_directories(path_to_table_symlinks); + for (const auto & table : table_names) + tryCreateSymlink(table.first, table.second, true); +} + void DatabaseAtomic::tryCreateSymlink(const String & table_name, const String & actual_data_path, bool if_data_path_exist) { try diff --git a/src/Databases/DatabaseAtomic.h b/src/Databases/DatabaseAtomic.h index 8be009cd6ca..1fe13f8b27f 100644 --- a/src/Databases/DatabaseAtomic.h +++ b/src/Databases/DatabaseAtomic.h @@ -47,7 +47,11 @@ public: DatabaseTablesIteratorPtr getTablesIterator(ContextPtr context, const FilterByNameFunction & filter_by_table_name) const override; - void loadStoredObjects(ContextMutablePtr context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) override; + void loadStoredObjects(ContextMutablePtr context, bool force_restore, bool force_attach, bool skip_startup_tables) override; + + void beforeLoadingMetadata(ContextMutablePtr context, bool force_restore, bool force_attach) override; + + void startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) override; /// Atomic database cannot be detached if there is detached table which still in use void assertCanBeDetached(bool cleanup) override; diff --git a/src/Databases/DatabaseFactory.cpp b/src/Databases/DatabaseFactory.cpp index 047d4a55802..731af1458bf 100644 --- a/src/Databases/DatabaseFactory.cpp +++ b/src/Databases/DatabaseFactory.cpp @@ -13,6 +13,7 @@ #include #include #include +#include #include #if !defined(ARCADIA_BUILD) @@ -38,6 +39,7 @@ #include // Y_IGNORE #include #include +#include #endif #if USE_SQLITE @@ -141,40 +143,66 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String else if (engine_name == "MySQL" || engine_name == "MaterializeMySQL" || engine_name == "MaterializedMySQL") { const ASTFunction * engine = engine_define->engine; - if (!engine->arguments || engine->arguments->children.size() != 4) - throw Exception( - engine_name + " Database require mysql_hostname, mysql_database_name, mysql_username, mysql_password arguments.", - ErrorCodes::BAD_ARGUMENTS); + if (!engine->arguments) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Engine `{}` must have arguments", engine_name); + StorageMySQLConfiguration configuration; ASTs & arguments = engine->arguments->children; - arguments[1] = evaluateConstantExpressionOrIdentifierAsLiteral(arguments[1], context); - const auto & host_port = safeGetLiteralValue(arguments[0], engine_name); - const auto & mysql_database_name = safeGetLiteralValue(arguments[1], engine_name); - const auto & mysql_user_name = safeGetLiteralValue(arguments[2], 
engine_name); - const auto & mysql_user_password = safeGetLiteralValue(arguments[3], engine_name); + if (auto named_collection = getExternalDataSourceConfiguration(arguments, context, true)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + + configuration.set(common_configuration); + configuration.addresses = {std::make_pair(configuration.host, configuration.port)}; + + if (!storage_specific_args.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "MySQL database require mysql_hostname, mysql_database_name, mysql_username, mysql_password arguments."); + } + else + { + if (arguments.size() != 4) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "MySQL database require mysql_hostname, mysql_database_name, mysql_username, mysql_password arguments."); + + + arguments[1] = evaluateConstantExpressionOrIdentifierAsLiteral(arguments[1], context); + const auto & host_port = safeGetLiteralValue(arguments[0], engine_name); + + if (engine_name == "MySQL") + { + size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; + configuration.addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 3306); + } + else + { + const auto & [remote_host, remote_port] = parseAddress(host_port, 3306); + configuration.host = remote_host; + configuration.port = remote_port; + } + + configuration.database = safeGetLiteralValue(arguments[1], engine_name); + configuration.username = safeGetLiteralValue(arguments[2], engine_name); + configuration.password = safeGetLiteralValue(arguments[3], engine_name); + } try { if (engine_name == "MySQL") { auto mysql_database_settings = std::make_unique(); - /// Split into replicas if needed. - size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; - auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 3306); - auto mysql_pool = mysqlxx::PoolWithFailover(mysql_database_name, addresses, mysql_user_name, mysql_user_password); + auto mysql_pool = mysqlxx::PoolWithFailover(configuration.database, configuration.addresses, configuration.username, configuration.password); mysql_database_settings->loadFromQueryContext(context); mysql_database_settings->loadFromQuery(*engine_define); /// higher priority return std::make_shared( - context, database_name, metadata_path, engine_define, mysql_database_name, std::move(mysql_database_settings), std::move(mysql_pool)); + context, database_name, metadata_path, engine_define, configuration.database, std::move(mysql_database_settings), std::move(mysql_pool)); } - const auto & [remote_host_name, remote_port] = parseAddress(host_port, 3306); - MySQLClient client(remote_host_name, remote_port, mysql_user_name, mysql_user_password); - auto mysql_pool = mysqlxx::Pool(mysql_database_name, remote_host_name, mysql_user_name, mysql_user_password, remote_port); - + MySQLClient client(configuration.host, configuration.port, configuration.username, configuration.password); + auto mysql_pool = mysqlxx::Pool(configuration.database, configuration.host, configuration.username, configuration.password, configuration.port); auto materialize_mode_settings = std::make_unique(); @@ -183,12 +211,12 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String if (create.uuid == UUIDHelpers::Nil) return std::make_shared>( - context, database_name, metadata_path, uuid, mysql_database_name, - std::move(mysql_pool), std::move(client), std::move(materialize_mode_settings)); + context, database_name, metadata_path, uuid, 
configuration.database, std::move(mysql_pool), + std::move(client), std::move(materialize_mode_settings)); else return std::make_shared>( - context, database_name, metadata_path, uuid, mysql_database_name, - std::move(mysql_pool), std::move(client), std::move(materialize_mode_settings)); + context, database_name, metadata_path, uuid, configuration.database, std::move(mysql_pool), + std::move(client), std::move(materialize_mode_settings)); } catch (...) { @@ -242,77 +270,109 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String else if (engine_name == "PostgreSQL") { const ASTFunction * engine = engine_define->engine; - - if (!engine->arguments || engine->arguments->children.size() < 4 || engine->arguments->children.size() > 6) - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{} Database require `host:port`, `database_name`, `username`, `password` [, `schema` = "", `use_table_cache` = 0].", - engine_name); + if (!engine->arguments) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Engine `{}` must have arguments", engine_name); ASTs & engine_args = engine->arguments->children; + auto use_table_cache = false; + StoragePostgreSQLConfiguration configuration; - for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context); + if (auto named_collection = getExternalDataSourceConfiguration(engine_args, context, true)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); - const auto & host_port = safeGetLiteralValue(engine_args[0], engine_name); - const auto & postgres_database_name = safeGetLiteralValue(engine_args[1], engine_name); - const auto & username = safeGetLiteralValue(engine_args[2], engine_name); - const auto & password = safeGetLiteralValue(engine_args[3], engine_name); + configuration.set(common_configuration); + configuration.addresses = {std::make_pair(configuration.host, configuration.port)}; - String schema; - if (engine->arguments->children.size() >= 5) - schema = safeGetLiteralValue(engine_args[4], engine_name); + for (const auto & [arg_name, arg_value] : storage_specific_args) + { + if (arg_name == "use_table_cache") + use_table_cache = true; + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Unexpected key-value argument." 
+ "Got: {}, but expected one of:" + "host, port, username, password, database, schema, use_table_cache.", arg_name); + } + } + else + { + if (engine_args.size() < 4 || engine_args.size() > 6) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "PostgreSQL Database require `host:port`, `database_name`, `username`, `password`" + "[, `schema` = "", `use_table_cache` = 0"); - auto use_table_cache = 0; - if (engine->arguments->children.size() >= 6) + for (auto & engine_arg : engine_args) + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context); + + const auto & host_port = safeGetLiteralValue(engine_args[0], engine_name); + size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; + + configuration.addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 5432); + configuration.database = safeGetLiteralValue(engine_args[1], engine_name); + configuration.username = safeGetLiteralValue(engine_args[2], engine_name); + configuration.password = safeGetLiteralValue(engine_args[3], engine_name); + + if (engine_args.size() >= 5) + configuration.schema = safeGetLiteralValue(engine_args[4], engine_name); + } + + if (engine_args.size() >= 6) use_table_cache = safeGetLiteralValue(engine_args[5], engine_name); - /// Split into replicas if needed. - size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; - auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 5432); - - /// no connection is made here - auto connection_pool = std::make_shared( - postgres_database_name, - addresses, - username, password, + auto pool = std::make_shared(configuration, context->getSettingsRef().postgresql_connection_pool_size, context->getSettingsRef().postgresql_connection_pool_wait_timeout); return std::make_shared( - context, metadata_path, engine_define, database_name, postgres_database_name, schema, connection_pool, use_table_cache); + context, metadata_path, engine_define, database_name, configuration, pool, use_table_cache); } else if (engine_name == "MaterializedPostgreSQL") { const ASTFunction * engine = engine_define->engine; - - if (!engine->arguments || engine->arguments->children.size() != 4) - { - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{} Database require `host:port`, `database_name`, `username`, `password`.", - engine_name); - } + if (!engine->arguments) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Engine `{}` must have arguments", engine_name); ASTs & engine_args = engine->arguments->children; + StoragePostgreSQLConfiguration configuration; - for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context); + if (auto named_collection = getExternalDataSourceConfiguration(engine_args, context, true)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); - const auto & host_port = safeGetLiteralValue(engine_args[0], engine_name); - const auto & postgres_database_name = safeGetLiteralValue(engine_args[1], engine_name); - const auto & username = safeGetLiteralValue(engine_args[2], engine_name); - const auto & password = safeGetLiteralValue(engine_args[3], engine_name); + if (!storage_specific_args.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "MaterializedPostgreSQL Database requires only `host`, `port`, `database_name`, `username`, `password`."); + } + else + { + if (engine_args.size() != 4) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + 
"MaterializedPostgreSQL Database require `host:port`, `database_name`, `username`, `password`."); - auto parsed_host_port = parseAddress(host_port, 5432); - auto connection_info = postgres::formatConnectionString(postgres_database_name, parsed_host_port.first, parsed_host_port.second, username, password); + for (auto & engine_arg : engine_args) + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context); + + auto parsed_host_port = parseAddress(safeGetLiteralValue(engine_args[0], engine_name), 5432); + + configuration.host = parsed_host_port.first; + configuration.port = parsed_host_port.second; + configuration.database = safeGetLiteralValue(engine_args[1], engine_name); + configuration.username = safeGetLiteralValue(engine_args[2], engine_name); + configuration.password = safeGetLiteralValue(engine_args[3], engine_name); + } + + auto connection_info = postgres::formatConnectionString( + configuration.database, configuration.host, configuration.port, configuration.username, configuration.password); auto postgresql_replica_settings = std::make_unique(); - if (engine_define->settings) postgresql_replica_settings->loadFromQuery(*engine_define); return std::make_shared( context, metadata_path, uuid, create.attach, - database_name, postgres_database_name, connection_info, + database_name, configuration.database, connection_info, std::move(postgresql_replica_settings)); } diff --git a/src/Databases/DatabaseLazy.cpp b/src/Databases/DatabaseLazy.cpp index 7e0e1b7aa43..384c5ff47dd 100644 --- a/src/Databases/DatabaseLazy.cpp +++ b/src/Databases/DatabaseLazy.cpp @@ -36,7 +36,7 @@ DatabaseLazy::DatabaseLazy(const String & name_, const String & metadata_path_, void DatabaseLazy::loadStoredObjects( - ContextMutablePtr local_context, bool /* has_force_restore_data_flag */, bool /*force_attach*/, bool /* skip_startup_tables */) + ContextMutablePtr local_context, bool /* force_restore */, bool /*force_attach*/, bool /* skip_startup_tables */) { iterateMetadataFiles(local_context, [this](const String & file_name) { diff --git a/src/Databases/DatabaseLazy.h b/src/Databases/DatabaseLazy.h index bc79a49b2fe..45c816c2e76 100644 --- a/src/Databases/DatabaseLazy.h +++ b/src/Databases/DatabaseLazy.h @@ -26,7 +26,7 @@ public: bool canContainDistributedTables() const override { return false; } - void loadStoredObjects(ContextMutablePtr context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) override; + void loadStoredObjects(ContextMutablePtr context, bool force_restore, bool force_attach, bool skip_startup_tables) override; void createTable( ContextPtr context, diff --git a/src/Databases/DatabaseMemory.cpp b/src/Databases/DatabaseMemory.cpp index c0af027e027..0ca1de09dd1 100644 --- a/src/Databases/DatabaseMemory.cpp +++ b/src/Databases/DatabaseMemory.cpp @@ -42,12 +42,17 @@ void DatabaseMemory::dropTable( try { table->drop(); - fs::path table_data_dir{getTableDataPath(table_name)}; - if (fs::exists(table_data_dir)) - fs::remove_all(table_data_dir); + if (table->storesDataOnDisk()) + { + assert(database_name != DatabaseCatalog::TEMPORARY_DATABASE); + fs::path table_data_dir{getTableDataPath(table_name)}; + if (fs::exists(table_data_dir)) + fs::remove_all(table_data_dir); + } } catch (...) 
{ + assert(database_name != DatabaseCatalog::TEMPORARY_DATABASE); attachTableUnlocked(table_name, table, lock); throw; } diff --git a/src/Databases/DatabaseOnDisk.cpp b/src/Databases/DatabaseOnDisk.cpp index f5b930a83c7..e941e18625d 100644 --- a/src/Databases/DatabaseOnDisk.cpp +++ b/src/Databases/DatabaseOnDisk.cpp @@ -46,7 +46,7 @@ std::pair createTableFromAST( const String & database_name, const String & table_data_path_relative, ContextMutablePtr context, - bool has_force_restore_data_flag) + bool force_restore) { ast_create_query.attach = true; ast_create_query.database = database_name; @@ -88,7 +88,7 @@ std::pair createTableFromAST( context->getGlobalContext(), columns, constraints, - has_force_restore_data_flag) + force_restore) }; } diff --git a/src/Databases/DatabaseOnDisk.h b/src/Databases/DatabaseOnDisk.h index e375704be33..dce82c2b441 100644 --- a/src/Databases/DatabaseOnDisk.h +++ b/src/Databases/DatabaseOnDisk.h @@ -16,7 +16,7 @@ std::pair createTableFromAST( const String & database_name, const String & table_data_path_relative, ContextMutablePtr context, - bool has_force_restore_data_flag); + bool force_restore); /** Get the string with the table definition based on the CREATE query. * It is an ATTACH query that you can execute to create a table from the correspondent database. diff --git a/src/Databases/DatabaseOrdinary.cpp b/src/Databases/DatabaseOrdinary.cpp index bfe5de4c95f..1bdb273c9fb 100644 --- a/src/Databases/DatabaseOrdinary.cpp +++ b/src/Databases/DatabaseOrdinary.cpp @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include #include #include @@ -27,8 +29,6 @@ namespace fs = std::filesystem; namespace DB { -static constexpr size_t PRINT_MESSAGE_EACH_N_OBJECTS = 256; -static constexpr size_t PRINT_MESSAGE_EACH_N_SECONDS = 5; static constexpr size_t METADATA_FILE_BUFFER_SIZE = 32768; namespace @@ -39,7 +39,7 @@ namespace DatabaseOrdinary & database, const String & database_name, const String & metadata_path, - bool has_force_restore_data_flag) + bool force_restore) { try { @@ -48,7 +48,7 @@ namespace database_name, database.getTableDataPath(query), context, - has_force_restore_data_flag); + force_restore); database.attachTable(table_name, table, database.getTableDataPath(query)); } @@ -60,15 +60,6 @@ namespace throw; } } - - void logAboutProgress(Poco::Logger * log, size_t processed, size_t total, AtomicStopwatch & watch) - { - if (processed % PRINT_MESSAGE_EACH_N_OBJECTS == 0 || watch.compareAndRestart(PRINT_MESSAGE_EACH_N_SECONDS)) - { - LOG_INFO(log, "{}%", processed * 100.0 / total); - watch.restart(); - } - } } @@ -84,20 +75,88 @@ DatabaseOrdinary::DatabaseOrdinary( } void DatabaseOrdinary::loadStoredObjects( - ContextMutablePtr local_context, bool has_force_restore_data_flag, bool /*force_attach*/, bool skip_startup_tables) + ContextMutablePtr local_context, bool force_restore, bool force_attach, bool skip_startup_tables) { /** Tables load faster if they are loaded in sorted (by name) order. * Otherwise (for the ext4 filesystem), `DirectoryIterator` iterates through them in some order, * which does not correspond to order tables creation and does not correspond to order of their location on disk. 
*/ - using FileNames = std::map; - std::mutex file_names_mutex; - FileNames file_names; - size_t total_dictionaries = 0; + ParsedTablesMetadata metadata; + loadTablesMetadata(local_context, metadata); - auto process_metadata = [&file_names, &total_dictionaries, &file_names_mutex, this]( - const String & file_name) + size_t total_tables = metadata.parsed_tables.size() - metadata.total_dictionaries; + + AtomicStopwatch watch; + std::atomic dictionaries_processed{0}; + std::atomic tables_processed{0}; + + ThreadPool pool; + + /// We must attach dictionaries before attaching tables + /// because while we're attaching tables we may need to have some dictionaries attached + /// (for example, dictionaries can be used in the default expressions for some tables). + /// On the other hand we can attach any dictionary (even sourced from ClickHouse table) + /// without having any tables attached. It is so because attaching of a dictionary means + /// loading of its config only, it doesn't involve loading the dictionary itself. + + /// Attach dictionaries. + for (const auto & name_with_path_and_query : metadata.parsed_tables) + { + const auto & name = name_with_path_and_query.first; + const auto & path = name_with_path_and_query.second.path; + const auto & ast = name_with_path_and_query.second.ast; + const auto & create_query = ast->as(); + + if (create_query.is_dictionary) + { + pool.scheduleOrThrowOnError([&]() + { + loadTableFromMetadata(local_context, path, name, ast, force_restore); + + /// Messages, so that it's not boring to wait for the server to load for a long time. + logAboutProgress(log, ++dictionaries_processed, metadata.total_dictionaries, watch); + }); + } + } + + pool.wait(); + + /// Attach tables. + for (const auto & name_with_path_and_query : metadata.parsed_tables) + { + const auto & name = name_with_path_and_query.first; + const auto & path = name_with_path_and_query.second.path; + const auto & ast = name_with_path_and_query.second.ast; + const auto & create_query = ast->as(); + + if (!create_query.is_dictionary) + { + pool.scheduleOrThrowOnError([&]() + { + loadTableFromMetadata(local_context, path, name, ast, force_restore); + + /// Messages, so that it's not boring to wait for the server to load for a long time. + logAboutProgress(log, ++tables_processed, total_tables, watch); + }); + } + } + + pool.wait(); + + if (!skip_startup_tables) + { + /// After all tables was basically initialized, startup them. 
+ startupTables(pool, force_restore, force_attach); + } +} + +void DatabaseOrdinary::loadTablesMetadata(ContextPtr local_context, ParsedTablesMetadata & metadata) +{ + size_t prev_tables_count = metadata.parsed_tables.size(); + size_t prev_total_dictionaries = metadata.total_dictionaries; + + auto process_metadata = [&metadata, this](const String & file_name) { fs::path path(getMetadataPath()); fs::path file_path(file_name); @@ -122,9 +181,29 @@ void DatabaseOrdinary::loadStoredObjects( return; } - std::lock_guard lock{file_names_mutex}; - file_names[file_name] = ast; - total_dictionaries += create_query->is_dictionary; + TableLoadingDependenciesVisitor::Data data; + data.default_database = metadata.default_database; + data.create_query = ast; + data.global_context = getContext(); + TableLoadingDependenciesVisitor visitor{data}; + visitor.visit(ast); + QualifiedTableName qualified_name{database_name, create_query->table}; + + std::lock_guard lock{metadata.mutex}; + metadata.parsed_tables[qualified_name] = ParsedTableMetadata{full_path.string(), ast}; + if (data.dependencies.empty()) + { + metadata.independent_database_objects.emplace_back(std::move(qualified_name)); + } + else + { + for (const auto & dependency : data.dependencies) + { + metadata.dependencies_info[dependency].dependent_database_objects.push_back(qualified_name); + ++metadata.dependencies_info[qualified_name].dependencies_count; + } + } + metadata.total_dictionaries += create_query->is_dictionary; } } catch (Exception & e) @@ -136,86 +215,29 @@ void DatabaseOrdinary::loadStoredObjects( iterateMetadataFiles(local_context, process_metadata); - size_t total_tables = file_names.size() - total_dictionaries; + size_t objects_in_database = metadata.parsed_tables.size() - prev_tables_count; + size_t dictionaries_in_database = metadata.total_dictionaries - prev_total_dictionaries; + size_t tables_in_database = objects_in_database - dictionaries_in_database; - LOG_INFO(log, "Total {} tables and {} dictionaries.", total_tables, total_dictionaries); - - AtomicStopwatch watch; - std::atomic tables_processed{0}; - - ThreadPool pool; - - /// We must attach dictionaries before attaching tables - /// because while we're attaching tables we may need to have some dictionaries attached - /// (for example, dictionaries can be used in the default expressions for some tables). - /// On the other hand we can attach any dictionary (even sourced from ClickHouse table) - /// without having any tables attached. It is so because attaching of a dictionary means - /// loading of its config only, it doesn't involve loading the dictionary itself. - - /// Attach dictionaries. - for (const auto & name_with_query : file_names) - { - const auto & create_query = name_with_query.second->as(); - - if (create_query.is_dictionary) - { - pool.scheduleOrThrowOnError([&]() - { - tryAttachTable( - local_context, - create_query, - *this, - database_name, - getMetadataPath() + name_with_query.first, - has_force_restore_data_flag); - - /// Messages, so that it's not boring to wait for the server to load for a long time. - logAboutProgress(log, ++tables_processed, total_tables, watch); - }); - } - } - - pool.wait(); - - /// Attach tables. 
- for (const auto & name_with_query : file_names) - { - const auto & create_query = name_with_query.second->as(); - - if (!create_query.is_dictionary) - { - pool.scheduleOrThrowOnError([&]() - { - tryAttachTable( - local_context, - create_query, - *this, - database_name, - getMetadataPath() + name_with_query.first, - has_force_restore_data_flag); - - /// Messages, so that it's not boring to wait for the server to load for a long time. - logAboutProgress(log, ++tables_processed, total_tables, watch); - }); - } - } - - pool.wait(); - - if (!skip_startup_tables) - { - /// After all tables was basically initialized, startup them. - startupTablesImpl(pool); - } + LOG_INFO(log, "Metadata processed, database {} has {} tables and {} dictionaries in total.", + database_name, tables_in_database, dictionaries_in_database); } -void DatabaseOrdinary::startupTables() +void DatabaseOrdinary::loadTableFromMetadata(ContextMutablePtr local_context, const String & file_path, const QualifiedTableName & name, const ASTPtr & ast, bool force_restore) { - ThreadPool pool; - startupTablesImpl(pool); + assert(name.database == database_name); + const auto & create_query = ast->as(); + + tryAttachTable( + local_context, + create_query, + *this, + name.database, + file_path, + force_restore); } -void DatabaseOrdinary::startupTablesImpl(ThreadPool & thread_pool) +void DatabaseOrdinary::startupTables(ThreadPool & thread_pool, bool /*force_restore*/, bool /*force_attach*/) { LOG_INFO(log, "Starting up tables."); @@ -240,6 +262,7 @@ void DatabaseOrdinary::startupTablesImpl(ThreadPool & thread_pool) } catch (...) { + /// We have to wait for jobs to finish here, because job function has reference to variables on the stack of current thread. thread_pool.wait(); throw; } diff --git a/src/Databases/DatabaseOrdinary.h b/src/Databases/DatabaseOrdinary.h index 5540632d60c..982be2024ce 100644 --- a/src/Databases/DatabaseOrdinary.h +++ b/src/Databases/DatabaseOrdinary.h @@ -21,9 +21,15 @@ public: String getEngineName() const override { return "Ordinary"; } - void loadStoredObjects(ContextMutablePtr context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) override; + void loadStoredObjects(ContextMutablePtr context, bool force_restore, bool force_attach, bool skip_startup_tables) override; - void startupTables() override; + bool supportsLoadingInTopologicalOrder() const override { return true; } + + void loadTablesMetadata(ContextPtr context, ParsedTablesMetadata & metadata) override; + + void loadTableFromMetadata(ContextMutablePtr local_context, const String & file_path, const QualifiedTableName & name, const ASTPtr & ast, bool force_restore) override; + + void startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) override; void alterTable( ContextPtr context, @@ -37,8 +43,6 @@ protected: const String & table_metadata_path, const String & statement, ContextPtr query_context); - - void startupTablesImpl(ThreadPool & thread_pool); }; } diff --git a/src/Databases/DatabaseReplicated.cpp b/src/Databases/DatabaseReplicated.cpp index da03eb6aba6..c2ff002ea36 100644 --- a/src/Databases/DatabaseReplicated.cpp +++ b/src/Databases/DatabaseReplicated.cpp @@ -305,13 +305,21 @@ void DatabaseReplicated::createReplicaNodesInZooKeeper(const zkutil::ZooKeeperPt createEmptyLogEntry(current_zookeeper); } -void DatabaseReplicated::loadStoredObjects( - ContextMutablePtr local_context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) +void 
DatabaseReplicated::beforeLoadingMetadata(ContextMutablePtr /*context*/, bool /*force_restore*/, bool force_attach) { tryConnectToZooKeeperAndInitDatabase(force_attach); +} - DatabaseAtomic::loadStoredObjects(local_context, has_force_restore_data_flag, force_attach, skip_startup_tables); +void DatabaseReplicated::loadStoredObjects( + ContextMutablePtr local_context, bool force_restore, bool force_attach, bool skip_startup_tables) +{ + beforeLoadingMetadata(local_context, force_restore, force_attach); + DatabaseAtomic::loadStoredObjects(local_context, force_restore, force_attach, skip_startup_tables); +} +void DatabaseReplicated::startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) +{ + DatabaseAtomic::startupTables(thread_pool, force_restore, force_attach); ddl_worker = std::make_unique(this, getContext()); ddl_worker->startup(); } diff --git a/src/Databases/DatabaseReplicated.h b/src/Databases/DatabaseReplicated.h index 1e0daeed07e..60526a1e5b0 100644 --- a/src/Databases/DatabaseReplicated.h +++ b/src/Databases/DatabaseReplicated.h @@ -57,7 +57,12 @@ public: void drop(ContextPtr /*context*/) override; - void loadStoredObjects(ContextMutablePtr context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) override; + void loadStoredObjects(ContextMutablePtr context, bool force_restore, bool force_attach, bool skip_startup_tables) override; + + void beforeLoadingMetadata(ContextMutablePtr context, bool force_restore, bool force_attach) override; + + void startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) override; + void shutdown() override; friend struct DatabaseReplicatedTask; diff --git a/src/Databases/IDatabase.h b/src/Databases/IDatabase.h index 3cb1856d08d..e1ba359bbfc 100644 --- a/src/Databases/IDatabase.h +++ b/src/Databases/IDatabase.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -27,11 +28,14 @@ class ASTCreateQuery; class AlterCommands; class SettingsChanges; using DictionariesWithID = std::vector>; +struct ParsedTablesMetadata; +struct QualifiedTableName; namespace ErrorCodes { extern const int NOT_IMPLEMENTED; extern const int CANNOT_GET_CREATE_TABLE_QUERY; + extern const int LOGICAL_ERROR; } class IDatabaseTablesIterator @@ -127,13 +131,32 @@ public: /// You can call only once, right after the object is created. virtual void loadStoredObjects( ContextMutablePtr /*context*/, - bool /*has_force_restore_data_flag*/, + bool /*force_restore*/, bool /*force_attach*/ = false, bool /* skip_startup_tables */ = false) { } - virtual void startupTables() {} + virtual bool supportsLoadingInTopologicalOrder() const { return false; } + + virtual void beforeLoadingMetadata( + ContextMutablePtr /*context*/, + bool /*force_restore*/, + bool /*force_attach*/) + { + } + + virtual void loadTablesMetadata(ContextPtr /*local_context*/, ParsedTablesMetadata & /*metadata*/) + { + throw Exception(ErrorCodes::LOGICAL_ERROR, "Not implemented"); + } + + virtual void loadTableFromMetadata(ContextMutablePtr /*local_context*/, const String & /*file_path*/, const QualifiedTableName & /*name*/, const ASTPtr & /*ast*/, bool /*force_restore*/) + { + throw Exception(ErrorCodes::LOGICAL_ERROR, "Not implemented"); + } + + virtual void startupTables(ThreadPool & /*thread_pool*/, bool /*force_restore*/, bool /*force_attach*/) {} /// Check the existence of the table. 
virtual bool isTableExist(const String & name, ContextPtr context) const = 0; diff --git a/src/Databases/MySQL/DatabaseMaterializedMySQL.cpp b/src/Databases/MySQL/DatabaseMaterializedMySQL.cpp index 0d81a4e1a98..2b4649c275a 100644 --- a/src/Databases/MySQL/DatabaseMaterializedMySQL.cpp +++ b/src/Databases/MySQL/DatabaseMaterializedMySQL.cpp @@ -94,10 +94,10 @@ void DatabaseMaterializedMySQL::setException(const std::exception_ptr & ex } template -void DatabaseMaterializedMySQL::loadStoredObjects( - ContextMutablePtr context_, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) +void DatabaseMaterializedMySQL::startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) { - Base::loadStoredObjects(context_, has_force_restore_data_flag, force_attach, skip_startup_tables); + Base::startupTables(thread_pool, force_restore, force_attach); + if (!force_attach) materialize_thread.assertMySQLAvailable(); diff --git a/src/Databases/MySQL/DatabaseMaterializedMySQL.h b/src/Databases/MySQL/DatabaseMaterializedMySQL.h index 292edc97878..ac32607a22c 100644 --- a/src/Databases/MySQL/DatabaseMaterializedMySQL.h +++ b/src/Databases/MySQL/DatabaseMaterializedMySQL.h @@ -43,7 +43,7 @@ protected: public: String getEngineName() const override { return "MaterializedMySQL"; } - void loadStoredObjects(ContextMutablePtr context_, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) override; + void startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) override; void createTable(ContextPtr context_, const String & name, const StoragePtr & table, const ASTPtr & query) override; diff --git a/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.cpp b/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.cpp index cb3cda8ab79..8fb75473d15 100644 --- a/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.cpp +++ b/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.cpp @@ -108,11 +108,9 @@ void DatabaseMaterializedPostgreSQL::startSynchronization() } -void DatabaseMaterializedPostgreSQL::loadStoredObjects( - ContextMutablePtr local_context, bool has_force_restore_data_flag, bool force_attach, bool skip_startup_tables) +void DatabaseMaterializedPostgreSQL::startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) { - DatabaseAtomic::loadStoredObjects(local_context, has_force_restore_data_flag, force_attach, skip_startup_tables); - + DatabaseAtomic::startupTables(thread_pool, force_restore, force_attach); try { startSynchronization(); diff --git a/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.h b/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.h index a0f9b3fce7a..5770187ad09 100644 --- a/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.h +++ b/src/Databases/PostgreSQL/DatabaseMaterializedPostgreSQL.h @@ -42,7 +42,7 @@ public: String getMetadataPath() const override { return metadata_path; } - void loadStoredObjects(ContextMutablePtr, bool, bool force_attach, bool skip_startup_tables) override; + void startupTables(ThreadPool & thread_pool, bool force_restore, bool force_attach) override; DatabaseTablesIteratorPtr getTablesIterator(ContextPtr context, const DatabaseOnDisk::FilterByNameFunction & filter_by_table_name) const override; diff --git a/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp b/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp index 8dad3aa3a5c..f17ee2a8e17 100644 --- a/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp +++ 
b/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp @@ -39,16 +39,14 @@ DatabasePostgreSQL::DatabasePostgreSQL( const String & metadata_path_, const ASTStorage * database_engine_define_, const String & dbname_, - const String & postgres_dbname_, - const String & postgres_schema_, + const StoragePostgreSQLConfiguration & configuration_, postgres::PoolWithFailoverPtr pool_, bool cache_tables_) : IDatabase(dbname_) , WithContext(context_->getGlobalContext()) , metadata_path(metadata_path_) , database_engine_define(database_engine_define_->clone()) - , postgres_dbname(postgres_dbname_) - , postgres_schema(postgres_schema_) + , configuration(configuration_) , pool(std::move(pool_)) , cache_tables(cache_tables_) { @@ -59,17 +57,17 @@ DatabasePostgreSQL::DatabasePostgreSQL( String DatabasePostgreSQL::getTableNameForLogs(const String & table_name) const { - if (postgres_schema.empty()) - return fmt::format("{}.{}", postgres_dbname, table_name); - return fmt::format("{}.{}.{}", postgres_dbname, postgres_schema, table_name); + if (configuration.schema.empty()) + return fmt::format("{}.{}", configuration.database, table_name); + return fmt::format("{}.{}.{}", configuration.database, configuration.schema, table_name); } String DatabasePostgreSQL::formatTableName(const String & table_name) const { - if (postgres_schema.empty()) + if (configuration.schema.empty()) return doubleQuoteString(table_name); - return fmt::format("{}.{}", doubleQuoteString(postgres_schema), doubleQuoteString(table_name)); + return fmt::format("{}.{}", doubleQuoteString(configuration.schema), doubleQuoteString(table_name)); } @@ -78,7 +76,7 @@ bool DatabasePostgreSQL::empty() const std::lock_guard lock(mutex); auto connection_holder = pool->get(); - auto tables_list = fetchPostgreSQLTablesList(connection_holder->get(), postgres_schema); + auto tables_list = fetchPostgreSQLTablesList(connection_holder->get(), configuration.schema); for (const auto & table_name : tables_list) if (!detached_or_dropped.count(table_name)) @@ -94,7 +92,7 @@ DatabaseTablesIteratorPtr DatabasePostgreSQL::getTablesIterator(ContextPtr local Tables tables; auto connection_holder = pool->get(); - auto table_names = fetchPostgreSQLTablesList(connection_holder->get(), postgres_schema); + auto table_names = fetchPostgreSQLTablesList(connection_holder->get(), configuration.schema); for (const auto & table_name : table_names) if (!detached_or_dropped.count(table_name)) @@ -125,7 +123,7 @@ bool DatabasePostgreSQL::checkPostgresTable(const String & table_name) const "WHERE schemaname != 'pg_catalog' AND {} " "AND tablename = '{}'", formatTableName(table_name), - (postgres_schema.empty() ? "schemaname != 'information_schema'" : "schemaname = " + quoteString(postgres_schema)), + (configuration.schema.empty() ? 
"schemaname != 'information_schema'" : "schemaname = " + quoteString(configuration.schema)), formatTableName(table_name))); } catch (pqxx::undefined_table const &) @@ -179,7 +177,7 @@ StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr, auto storage = StoragePostgreSQL::create( StorageID(database_name, table_name), pool, table_name, - ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, postgres_schema); + ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, configuration.schema, configuration.on_conflict); if (cache_tables) cached_tables[table_name] = storage; @@ -306,7 +304,7 @@ void DatabasePostgreSQL::removeOutdatedTables() { std::lock_guard lock{mutex}; auto connection_holder = pool->get(); - auto actual_tables = fetchPostgreSQLTablesList(connection_holder->get(), postgres_schema); + auto actual_tables = fetchPostgreSQLTablesList(connection_holder->get(), configuration.schema); if (cache_tables) { diff --git a/src/Databases/PostgreSQL/DatabasePostgreSQL.h b/src/Databases/PostgreSQL/DatabasePostgreSQL.h index 629f9eadf2d..0f66a6c7b90 100644 --- a/src/Databases/PostgreSQL/DatabasePostgreSQL.h +++ b/src/Databases/PostgreSQL/DatabasePostgreSQL.h @@ -10,7 +10,7 @@ #include #include #include - +#include namespace DB { @@ -32,8 +32,7 @@ public: const String & metadata_path_, const ASTStorage * database_engine_define, const String & dbname_, - const String & postgres_dbname_, - const String & postgres_schema_, + const StoragePostgreSQLConfiguration & configuration, postgres::PoolWithFailoverPtr pool_, bool cache_tables_); @@ -70,8 +69,7 @@ protected: private: String metadata_path; ASTPtr database_engine_define; - String postgres_dbname; - String postgres_schema; + StoragePostgreSQLConfiguration configuration; postgres::PoolWithFailoverPtr pool; const bool cache_tables; diff --git a/src/Databases/TablesLoader.cpp b/src/Databases/TablesLoader.cpp new file mode 100644 index 00000000000..48d751b5795 --- /dev/null +++ b/src/Databases/TablesLoader.cpp @@ -0,0 +1,255 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int INFINITE_LOOP; + extern const int LOGICAL_ERROR; +} + +static constexpr size_t PRINT_MESSAGE_EACH_N_OBJECTS = 256; +static constexpr size_t PRINT_MESSAGE_EACH_N_SECONDS = 5; + + +void logAboutProgress(Poco::Logger * log, size_t processed, size_t total, AtomicStopwatch & watch) +{ + if (processed % PRINT_MESSAGE_EACH_N_OBJECTS == 0 || watch.compareAndRestart(PRINT_MESSAGE_EACH_N_SECONDS)) + { + LOG_INFO(log, "{}%", processed * 100.0 / total); + watch.restart(); + } +} + +TablesLoader::TablesLoader(ContextMutablePtr global_context_, Databases databases_, bool force_restore_, bool force_attach_) +: global_context(global_context_) +, databases(std::move(databases_)) +, force_restore(force_restore_) +, force_attach(force_attach_) +{ + metadata.default_database = global_context->getCurrentDatabase(); + log = &Poco::Logger::get("TablesLoader"); +} + + +void TablesLoader::loadTables() +{ + bool need_resolve_dependencies = !global_context->getConfigRef().has("ignore_table_dependencies_on_metadata_loading"); + + /// Load all Lazy, MySQl, PostgreSQL, SQLite, etc databases first. 
+ for (auto & database : databases)
+ {
+ if (need_resolve_dependencies && database.second->supportsLoadingInTopologicalOrder())
+ databases_to_load.push_back(database.first);
+ else
+ database.second->loadStoredObjects(global_context, force_restore, force_attach, true);
+ }
+
+ if (databases_to_load.empty())
+ return;
+
+ /// Read and parse metadata from Ordinary, Atomic, Materialized*, Replicated, etc databases. Build dependency graph.
+ for (auto & database_name : databases_to_load)
+ {
+ databases[database_name]->beforeLoadingMetadata(global_context, force_restore, force_attach);
+ databases[database_name]->loadTablesMetadata(global_context, metadata);
+ }
+
+ LOG_INFO(log, "Parsed metadata of {} tables in {} databases in {} sec",
+ metadata.parsed_tables.size(), databases_to_load.size(), stopwatch.elapsedSeconds());
+
+ stopwatch.restart();
+
+ logDependencyGraph();
+
+ /// Some tables were loaded by database with loadStoredObjects(...). Remove them from graph if necessary.
+ removeUnresolvableDependencies();
+
+ loadTablesInTopologicalOrder(pool);
+}
+
+void TablesLoader::startupTables()
+{
+ /// Start up tables after all tables are loaded. Background tasks (merges, mutations, etc) may slow down data parts loading.
+ for (auto & database : databases)
+ database.second->startupTables(pool, force_restore, force_attach);
+}
+
+
+void TablesLoader::removeUnresolvableDependencies()
+{
+ auto need_exclude_dependency = [this](const QualifiedTableName & dependency_name, const DependenciesInfo & info)
+ {
+ /// Table exists and will be loaded
+ if (metadata.parsed_tables.contains(dependency_name))
+ return false;
+ /// Table exists and it's already loaded
+ if (DatabaseCatalog::instance().isTableExist(StorageID(dependency_name.database, dependency_name.table), global_context))
+ return true;
+ /// It's an XML dictionary. It was loaded before tables and DDL dictionaries.
+ if (dependency_name.database == metadata.default_database &&
+ global_context->getExternalDictionariesLoader().has(dependency_name.table))
+ return true;
+
+ /// Some tables depend on table "dependency_name", but there is no such table in DatabaseCatalog and we don't have its metadata.
+ /// We will ignore it and try to load dependent tables without "dependency_name"
+ /// (but most likely dependent tables will fail to load).
+ LOG_WARNING(log, "Tables {} depend on {}, but it seems that it does not exist. Will ignore it and try to load existing tables",
+ fmt::join(info.dependent_database_objects, ", "), dependency_name);
+
+ if (info.dependencies_count)
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "Table {} does not exist, but we have seen its AST and found {} dependencies. "
+ "It's a bug", dependency_name, info.dependencies_count);
+ if (info.dependent_database_objects.empty())
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "Table {} does not have dependencies and dependent tables as expected. "
+ "It's a bug", dependency_name);
+
+ return true;
+ };
+
+ auto table_it = metadata.dependencies_info.begin();
+ while (table_it != metadata.dependencies_info.end())
+ {
+ auto & info = table_it->second;
+ if (need_exclude_dependency(table_it->first, info))
+ table_it = removeResolvedDependency(table_it, metadata.independent_database_objects);
+ else
+ ++table_it;
+ }
+}
+
+void TablesLoader::loadTablesInTopologicalOrder(ThreadPool & pool)
+{
+ /// Load independent tables in parallel.
+ /// Then remove loaded tables from dependency graph, find tables/dictionaries that do not have unresolved dependencies anymore,
+ /// move them to the list of independent tables and load.
+ /// Repeat while there are still independent tables to load.
+ /// If none remain, then either all objects are loaded or there is a cyclic dependency.
+ /// Complexity: O(V + E)
+ size_t level = 0;
+ do
+ {
+ assert(metadata.parsed_tables.size() == tables_processed + metadata.independent_database_objects.size() + getNumberOfTablesWithDependencies());
+ logDependencyGraph();
+
+ startLoadingIndependentTables(pool, level);
+
+ TableNames new_independent_database_objects;
+ for (const auto & table_name : metadata.independent_database_objects)
+ {
+ auto info_it = metadata.dependencies_info.find(table_name);
+ if (info_it == metadata.dependencies_info.end())
+ {
+ /// No tables depend on table_name and it was not even added to dependencies_info
+ continue;
+ }
+ removeResolvedDependency(info_it, new_independent_database_objects);
+ }
+
+ pool.wait();
+
+ metadata.independent_database_objects = std::move(new_independent_database_objects);
+ ++level;
+ } while (!metadata.independent_database_objects.empty());
+
+ checkCyclicDependencies();
+}
+
+DependenciesInfosIter TablesLoader::removeResolvedDependency(const DependenciesInfosIter & info_it, TableNames & independent_database_objects)
+{
+ auto & info = info_it->second;
+ if (info.dependencies_count)
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "Table {} is in the list of independent tables, but dependencies count is {}. "
+ "It's a bug", info_it->first, info.dependencies_count);
+ if (info.dependent_database_objects.empty())
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "Table {} does not have dependent tables. It's a bug", info_it->first);
+
+ /// Decrement number of dependencies for each dependent table
+ for (auto & dependent_table : info.dependent_database_objects)
+ {
+ auto & dependent_info = metadata.dependencies_info[dependent_table];
+ auto & dependencies_count = dependent_info.dependencies_count;
+ if (dependencies_count == 0)
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "Trying to decrement 0 dependencies counter for {}. 
It's a bug", dependent_table); + --dependencies_count; + if (dependencies_count == 0) + { + independent_database_objects.push_back(dependent_table); + if (dependent_info.dependent_database_objects.empty()) + metadata.dependencies_info.erase(dependent_table); + } + } + + return metadata.dependencies_info.erase(info_it); +} + +void TablesLoader::startLoadingIndependentTables(ThreadPool & pool, size_t level) +{ + size_t total_tables = metadata.parsed_tables.size(); + + LOG_INFO(log, "Loading {} tables with {} dependency level", metadata.independent_database_objects.size(), level); + + for (const auto & table_name : metadata.independent_database_objects) + { + pool.scheduleOrThrowOnError([this, total_tables, &table_name]() + { + const auto & path_and_query = metadata.parsed_tables[table_name]; + databases[table_name.database]->loadTableFromMetadata(global_context, path_and_query.path, table_name, path_and_query.ast, force_restore); + logAboutProgress(log, ++tables_processed, total_tables, stopwatch); + }); + } +} + +size_t TablesLoader::getNumberOfTablesWithDependencies() const +{ + size_t number_of_tables_with_dependencies = 0; + for (const auto & info : metadata.dependencies_info) + if (info.second.dependencies_count) + ++number_of_tables_with_dependencies; + return number_of_tables_with_dependencies; +} + +void TablesLoader::checkCyclicDependencies() const +{ + /// Loading is finished if all dependencies are resolved + if (metadata.dependencies_info.empty()) + return; + + for (const auto & info : metadata.dependencies_info) + { + LOG_WARNING(log, "Cannot resolve dependencies: Table {} have {} dependencies and {} dependent tables. List of dependent tables: {}", + info.first, info.second.dependencies_count, + info.second.dependent_database_objects.size(), fmt::join(info.second.dependent_database_objects, ", ")); + assert(info.second.dependencies_count == 0); + } + + throw Exception(ErrorCodes::INFINITE_LOOP, "Cannot attach {} tables due to cyclic dependencies. " + "See server log for details.", metadata.dependencies_info.size()); +} + +void TablesLoader::logDependencyGraph() const +{ + LOG_TEST(log, "Have {} independent tables: {}", + metadata.independent_database_objects.size(), + fmt::join(metadata.independent_database_objects, ", ")); + for (const auto & dependencies : metadata.dependencies_info) + { + LOG_TEST(log, + "Table {} have {} dependencies and {} dependent tables. 
List of dependent tables: {}", + dependencies.first, + dependencies.second.dependencies_count, + dependencies.second.dependent_database_objects.size(), + fmt::join(dependencies.second.dependent_database_objects, ", ")); + } +} + +} diff --git a/src/Databases/TablesLoader.h b/src/Databases/TablesLoader.h new file mode 100644 index 00000000000..12f6c2e86a5 --- /dev/null +++ b/src/Databases/TablesLoader.h @@ -0,0 +1,112 @@ +#pragma once +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace Poco +{ + class Logger; +} + +class AtomicStopwatch; + +namespace DB +{ + +void logAboutProgress(Poco::Logger * log, size_t processed, size_t total, AtomicStopwatch & watch); + + +class IDatabase; +using DatabasePtr = std::shared_ptr; + +struct ParsedTableMetadata +{ + String path; + ASTPtr ast; +}; + +using ParsedMetadata = std::map; +using TableNames = std::vector; + +struct DependenciesInfo +{ + /// How many dependencies this table have + size_t dependencies_count = 0; + /// List of tables/dictionaries which depend on this table/dictionary + TableNames dependent_database_objects; +}; + +using DependenciesInfos = std::unordered_map; +using DependenciesInfosIter = std::unordered_map::iterator; + +struct ParsedTablesMetadata +{ + String default_database; + + std::mutex mutex; + ParsedMetadata parsed_tables; + + /// For logging + size_t total_dictionaries = 0; + + /// List of tables/dictionaries that do not have any dependencies and can be loaded + TableNames independent_database_objects; + + /// Actually it contains two different maps (with, probably, intersecting keys): + /// 1. table/dictionary name -> number of dependencies + /// 2. table/dictionary name -> dependent tables/dictionaries list (adjacency list of dependencies graph). + /// If table A depends on table B, then there is an edge B --> A, i.e. dependencies_info[B].dependent_database_objects contains A. + /// And dependencies_info[C].dependencies_count is a number of incoming edges for vertex C (how many tables we have to load before C). + DependenciesInfos dependencies_info; +}; + +/// Loads tables (and dictionaries) from specified databases +/// taking into account dependencies between them. 
+class TablesLoader +{ +public: + using Databases = std::map; + + TablesLoader(ContextMutablePtr global_context_, Databases databases_, bool force_restore_ = false, bool force_attach_ = false); + TablesLoader() = delete; + + void loadTables(); + void startupTables(); + +private: + ContextMutablePtr global_context; + Databases databases; + bool force_restore; + bool force_attach; + + Strings databases_to_load; + ParsedTablesMetadata metadata; + Poco::Logger * log; + std::atomic tables_processed{0}; + AtomicStopwatch stopwatch; + + ThreadPool pool; + + void removeUnresolvableDependencies(); + + void loadTablesInTopologicalOrder(ThreadPool & pool); + + DependenciesInfosIter removeResolvedDependency(const DependenciesInfosIter & info_it, TableNames & independent_database_objects); + + void startLoadingIndependentTables(ThreadPool & pool, size_t level); + + void checkCyclicDependencies() const; + + size_t getNumberOfTablesWithDependencies() const; + + void logDependencyGraph() const; +}; + +} diff --git a/src/Databases/ya.make b/src/Databases/ya.make index 34f47a5edf0..d088ba16fe2 100644 --- a/src/Databases/ya.make +++ b/src/Databases/ya.make @@ -9,6 +9,7 @@ PEERDIR( SRCS( + DDLDependencyVisitor.cpp DatabaseAtomic.cpp DatabaseDictionary.cpp DatabaseFactory.cpp @@ -30,6 +31,7 @@ SRCS( SQLite/DatabaseSQLite.cpp SQLite/SQLiteUtils.cpp SQLite/fetchSQLiteTableStructure.cpp + TablesLoader.cpp ) diff --git a/src/Dictionaries/MongoDBDictionarySource.cpp b/src/Dictionaries/MongoDBDictionarySource.cpp index 23ea9bc00e2..ebf479d7038 100644 --- a/src/Dictionaries/MongoDBDictionarySource.cpp +++ b/src/Dictionaries/MongoDBDictionarySource.cpp @@ -2,6 +2,8 @@ #include "DictionarySourceFactory.h" #include "DictionaryStructure.h" #include "registerDictionaries.h" +#include + namespace DB { @@ -13,19 +15,20 @@ void registerDictionarySourceMongoDB(DictionarySourceFactory & factory) const Poco::Util::AbstractConfiguration & config, const std::string & root_config_prefix, Block & sample_block, - ContextPtr, + ContextPtr context, const std::string & /* default_database */, bool /* created_from_ddl */) { const auto config_prefix = root_config_prefix + ".mongodb"; + auto configuration = getExternalDataSourceConfiguration(config, config_prefix, context); return std::make_unique(dict_struct, config.getString(config_prefix + ".uri", ""), - config.getString(config_prefix + ".host", ""), - config.getUInt(config_prefix + ".port", 0), - config.getString(config_prefix + ".user", ""), - config.getString(config_prefix + ".password", ""), + configuration.host, + configuration.port, + configuration.username, + configuration.password, config.getString(config_prefix + ".method", ""), - config.getString(config_prefix + ".db", ""), + configuration.database, config.getString(config_prefix + ".collection"), sample_block); }; diff --git a/src/Dictionaries/MySQLDictionarySource.cpp b/src/Dictionaries/MySQLDictionarySource.cpp index bd53c1e60a7..4f805687c26 100644 --- a/src/Dictionaries/MySQLDictionarySource.cpp +++ b/src/Dictionaries/MySQLDictionarySource.cpp @@ -12,6 +12,8 @@ #include #include #include +#include + namespace DB { @@ -32,38 +34,43 @@ void registerDictionarySourceMysql(DictionarySourceFactory & factory) [[maybe_unused]] const std::string & config_prefix, [[maybe_unused]] Block & sample_block, [[maybe_unused]] ContextPtr global_context, - const std::string & /* default_database */, - bool /* created_from_ddl */) -> DictionarySourcePtr { + const std::string & /* default_database */, + [[maybe_unused]] bool created_from_ddl) 
-> DictionarySourcePtr { #if USE_MYSQL - StreamSettings mysql_input_stream_settings(global_context->getSettingsRef() - , config.getBool(config_prefix + ".mysql.close_connection", false) || config.getBool(config_prefix + ".mysql.share_connection", false) - , false - , config.getBool(config_prefix + ".mysql.fail_on_connection_loss", false) ? 1 : default_num_tries_on_connection_loss); + StreamSettings mysql_input_stream_settings( + global_context->getSettingsRef(), + config.getBool(config_prefix + ".mysql.close_connection", false) || config.getBool(config_prefix + ".mysql.share_connection", false), + false, + config.getBool(config_prefix + ".mysql.fail_on_connection_loss", false) ? 1 : default_num_tries_on_connection_loss); auto settings_config_prefix = config_prefix + ".mysql"; - - auto table = config.getString(settings_config_prefix + ".table", ""); - auto where = config.getString(settings_config_prefix + ".where", ""); + auto configuration = getExternalDataSourceConfiguration(config, settings_config_prefix, global_context); auto query = config.getString(settings_config_prefix + ".query", ""); - - if (query.empty() && table.empty()) + if (query.empty() && configuration.table.empty()) throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL dictionary source configuration must contain table or query field"); - MySQLDictionarySource::Configuration configuration + MySQLDictionarySource::Configuration dictionary_configuration { - .db = config.getString(settings_config_prefix + ".db", ""), - .table = table, + .db = configuration.database, + .table = configuration.table, .query = query, - .where = where, + .where = config.getString(settings_config_prefix + ".where", ""), .invalidate_query = config.getString(settings_config_prefix + ".invalidate_query", ""), .update_field = config.getString(settings_config_prefix + ".update_field", ""), .update_lag = config.getUInt64(settings_config_prefix + ".update_lag", 1), .dont_check_update_time = config.getBool(settings_config_prefix + ".dont_check_update_time", false) }; - auto pool = std::make_shared(mysqlxx::PoolFactory::instance().get(config, settings_config_prefix)); + std::shared_ptr pool; + if (created_from_ddl) + { + std::vector> addresses{std::make_pair(configuration.host, configuration.port)}; + pool = std::make_shared(configuration.database, addresses, configuration.username, configuration.password); + } + else + pool = std::make_shared(mysqlxx::PoolFactory::instance().get(config, settings_config_prefix)); - return std::make_unique(dict_struct, configuration, std::move(pool), sample_block, mysql_input_stream_settings); + return std::make_unique(dict_struct, dictionary_configuration, std::move(pool), sample_block, mysql_input_stream_settings); #else throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Dictionary source of type `mysql` is disabled because ClickHouse was built without mysql support."); diff --git a/src/Dictionaries/PostgreSQLDictionarySource.cpp b/src/Dictionaries/PostgreSQLDictionarySource.cpp index 3fe9e899cd9..56b75f024ad 100644 --- a/src/Dictionaries/PostgreSQLDictionarySource.cpp +++ b/src/Dictionaries/PostgreSQLDictionarySource.cpp @@ -1,6 +1,7 @@ #include "PostgreSQLDictionarySource.h" #include +#include #include "DictionarySourceFactory.h" #include "registerDictionaries.h" @@ -10,6 +11,7 @@ #include #include "readInvalidateQuery.h" #include +#include #endif @@ -29,19 +31,13 @@ namespace { ExternalQueryBuilder makeExternalQueryBuilder(const DictionaryStructure & dict_struct, const String & schema, const String & table, const String 
& query, const String & where) { - auto schema_value = schema; - auto table_value = table; + QualifiedTableName qualified_name{schema, table}; + + if (qualified_name.database.empty() && !qualified_name.table.empty()) + qualified_name = QualifiedTableName::parseFromString(qualified_name.table); - if (schema_value.empty()) - { - if (auto pos = table_value.find('.'); pos != std::string::npos) - { - schema_value = table_value.substr(0, pos); - table_value = table_value.substr(pos + 1); - } - } /// Do not need db because it is already in a connection string. - return {dict_struct, "", schema_value, table_value, query, where, IdentifierQuotingStyle::DoubleQuotes}; + return {dict_struct, "", qualified_name.database, qualified_name.table, query, where, IdentifierQuotingStyle::DoubleQuotes}; } } @@ -182,22 +178,24 @@ void registerDictionarySourcePostgreSQL(DictionarySourceFactory & factory) const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, Block & sample_block, - ContextPtr global_context, + ContextPtr context, const std::string & /* default_database */, bool /* created_from_ddl */) -> DictionarySourcePtr { #if USE_LIBPQXX const auto settings_config_prefix = config_prefix + ".postgresql"; - auto pool = std::make_shared( - config, settings_config_prefix, - global_context->getSettingsRef().postgresql_connection_pool_size, - global_context->getSettingsRef().postgresql_connection_pool_wait_timeout); - PostgreSQLDictionarySource::Configuration configuration + auto configuration = getExternalDataSourceConfigurationByPriority(config, settings_config_prefix, context); + auto pool = std::make_shared( + configuration.replicas_configurations, + context->getSettingsRef().postgresql_connection_pool_size, + context->getSettingsRef().postgresql_connection_pool_wait_timeout); + + PostgreSQLDictionarySource::Configuration dictionary_configuration { - .db = config.getString(fmt::format("{}.db", settings_config_prefix), ""), - .schema = config.getString(fmt::format("{}.schema", settings_config_prefix), ""), - .table = config.getString(fmt::format("{}.table", settings_config_prefix), ""), + .db = configuration.database, + .schema = configuration.schema, + .table = configuration.table, .query = config.getString(fmt::format("{}.query", settings_config_prefix), ""), .where = config.getString(fmt::format("{}.where", settings_config_prefix), ""), .invalidate_query = config.getString(fmt::format("{}.invalidate_query", settings_config_prefix), ""), @@ -205,13 +203,13 @@ void registerDictionarySourcePostgreSQL(DictionarySourceFactory & factory) .update_lag = config.getUInt64(fmt::format("{}.update_lag", settings_config_prefix), 1) }; - return std::make_unique(dict_struct, configuration, pool, sample_block); + return std::make_unique(dict_struct, dictionary_configuration, pool, sample_block); #else (void)dict_struct; (void)config; (void)config_prefix; (void)sample_block; - (void)global_context; + (void)context; throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Dictionary source of type `postgresql` is disabled because ClickHouse was built without postgresql support."); #endif diff --git a/src/Dictionaries/XDBCDictionarySource.cpp b/src/Dictionaries/XDBCDictionarySource.cpp index 9fc7e92634b..bf7526580c0 100644 --- a/src/Dictionaries/XDBCDictionarySource.cpp +++ b/src/Dictionaries/XDBCDictionarySource.cpp @@ -38,29 +38,22 @@ namespace const std::string & where_, IXDBCBridgeHelper & bridge_) { - std::string schema = schema_; - std::string table = table_; + QualifiedTableName 
qualified_name{schema_, table_}; if (bridge_.isSchemaAllowed()) { - if (schema.empty()) - { - if (auto pos = table.find('.'); pos != std::string::npos) - { - schema = table.substr(0, pos); - table = table.substr(pos + 1); - } - } + if (qualified_name.database.empty()) + qualified_name = QualifiedTableName::parseFromString(qualified_name.table); } else { - if (!schema.empty()) + if (!qualified_name.database.empty()) throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Dictionary source of type {} specifies a schema but schema is not supported by {}-driver", bridge_.getName()); } - return {dict_struct_, db_, schema, table, query_, where_, bridge_.getIdentifierQuotingStyle()}; + return {dict_struct_, db_, qualified_name.database, qualified_name.table, query_, where_, bridge_.getIdentifierQuotingStyle()}; } } diff --git a/src/Dictionaries/getDictionaryConfigurationFromAST.cpp b/src/Dictionaries/getDictionaryConfigurationFromAST.cpp index c77ac36ade6..0ed5b3af83d 100644 --- a/src/Dictionaries/getDictionaryConfigurationFromAST.cpp +++ b/src/Dictionaries/getDictionaryConfigurationFromAST.cpp @@ -4,7 +4,6 @@ #include #include #include -#include #include #include #include @@ -16,6 +15,8 @@ #include #include #include +#include +#include namespace DB @@ -576,4 +577,28 @@ getDictionaryConfigurationFromAST(const ASTCreateQuery & query, ContextPtr conte return conf; } +std::optional +getInfoIfClickHouseDictionarySource(DictionaryConfigurationPtr & config, ContextPtr global_context) +{ + ClickHouseDictionarySourceInfo info; + + String host = config->getString("dictionary.source.clickhouse.host", ""); + UInt16 port = config->getUInt("dictionary.source.clickhouse.port", 0); + String database = config->getString("dictionary.source.clickhouse.db", ""); + String table = config->getString("dictionary.source.clickhouse.table", ""); + bool secure = config->getBool("dictionary.source.clickhouse.secure", false); + + if (host.empty() || port == 0 || table.empty()) + return {}; + + info.table_name = {database, table}; + + UInt16 default_port = secure ? 
global_context->getTCPPortSecure().value_or(0) : global_context->getTCPPort(); + if (!isLocalAddress({host, port}, default_port)) + return info; + + info.is_local = true; + return info; +} + } diff --git a/src/Dictionaries/getDictionaryConfigurationFromAST.h b/src/Dictionaries/getDictionaryConfigurationFromAST.h index b464fdf1d8c..ec44b9815ff 100644 --- a/src/Dictionaries/getDictionaryConfigurationFromAST.h +++ b/src/Dictionaries/getDictionaryConfigurationFromAST.h @@ -15,4 +15,13 @@ using DictionaryConfigurationPtr = Poco::AutoPtr +getInfoIfClickHouseDictionarySource(DictionaryConfigurationPtr & config, ContextPtr global_context); + } diff --git a/src/Disks/DiskWebServer.cpp b/src/Disks/DiskWebServer.cpp index 93d5593f0f2..19f2305dcb6 100644 --- a/src/Disks/DiskWebServer.cpp +++ b/src/Disks/DiskWebServer.cpp @@ -112,23 +112,29 @@ public: const String & uri_, RemoteMetadata metadata_, ContextPtr context_, - size_t buf_size_) + size_t buf_size_, + size_t backoff_threshold_, + size_t max_tries_) : ReadIndirectBufferFromRemoteFS(metadata_) , uri(uri_) , context(context_) , buf_size(buf_size_) + , backoff_threshold(backoff_threshold_) + , max_tries(max_tries_) { } std::unique_ptr createReadBuffer(const String & path) override { - return std::make_unique(fs::path(uri) / path, context, buf_size); + return std::make_unique(fs::path(uri) / path, context, buf_size, backoff_threshold, max_tries); } private: String uri; ContextPtr context; size_t buf_size; + size_t backoff_threshold; + size_t max_tries; }; @@ -190,7 +196,8 @@ std::unique_ptr DiskWebServer::readFile(const String & p RemoteMetadata meta(path, remote_path); meta.remote_fs_objects.emplace_back(std::make_pair(remote_path, iter->second.size)); - auto reader = std::make_unique(url, meta, getContext(), read_settings.remote_fs_buffer_size); + auto reader = std::make_unique(url, meta, getContext(), + read_settings.remote_fs_buffer_size, read_settings.remote_fs_backoff_threshold, read_settings.remote_fs_backoff_max_tries); return std::make_unique(std::move(reader), min_bytes_for_seek); } diff --git a/src/Disks/ReadIndirectBufferFromWebServer.cpp b/src/Disks/ReadIndirectBufferFromWebServer.cpp index 10761867e44..809e6d67107 100644 --- a/src/Disks/ReadIndirectBufferFromWebServer.cpp +++ b/src/Disks/ReadIndirectBufferFromWebServer.cpp @@ -22,18 +22,17 @@ namespace ErrorCodes static const auto WAIT_MS = 10; -ReadIndirectBufferFromWebServer::ReadIndirectBufferFromWebServer(const String & url_, - ContextPtr context_, - size_t buf_size_) + +ReadIndirectBufferFromWebServer::ReadIndirectBufferFromWebServer( + const String & url_, ContextPtr context_, size_t buf_size_, size_t backoff_threshold_, size_t max_tries_) : BufferWithOwnMemory(buf_size_) , log(&Poco::Logger::get("ReadIndirectBufferFromWebServer")) , context(context_) , url(url_) , buf_size(buf_size_) + , backoff_threshold_ms(backoff_threshold_) + , max_tries(max_tries_) { - const auto & settings = context->getSettingsRef(); - wait_threshold_ms = settings.remote_disk_read_backoff_threashold; - max_tries = settings.remote_disk_read_backoff_max_tries; } @@ -79,7 +78,7 @@ bool ReadIndirectBufferFromWebServer::nextImpl() WriteBufferFromOwnString error_msg; for (size_t i = 0; (i < max_tries) && !successful_read && !next_result; ++i) { - while (milliseconds_to_wait < wait_threshold_ms) + while (milliseconds_to_wait < backoff_threshold_ms) { try { diff --git a/src/Disks/ReadIndirectBufferFromWebServer.h b/src/Disks/ReadIndirectBufferFromWebServer.h index bc66aedd518..04bb155f83b 100644 --- 
a/src/Disks/ReadIndirectBufferFromWebServer.h +++ b/src/Disks/ReadIndirectBufferFromWebServer.h @@ -18,7 +18,8 @@ class ReadIndirectBufferFromWebServer : public BufferWithOwnMemorysecond.input_creator || it->second.input_processor_creator); +} + +bool FormatFactory::isOutputFormat(const String & name) const +{ + auto it = dict.find(name); + return it != dict.end() && (it->second.output_creator || it->second.output_processor_creator); +} + FormatFactory & FormatFactory::instance() { static FormatFactory ret; diff --git a/src/Formats/FormatFactory.h b/src/Formats/FormatFactory.h index e935eb4d761..7ff72387509 100644 --- a/src/Formats/FormatFactory.h +++ b/src/Formats/FormatFactory.h @@ -187,6 +187,9 @@ public: return dict; } + bool isInputFormat(const String & name) const; + bool isOutputFormat(const String & name) const; + private: FormatsDictionary dict; diff --git a/src/Formats/FormatSettings.h b/src/Formats/FormatSettings.h index 3e1e00584c0..3a274f99a5c 100644 --- a/src/Formats/FormatSettings.h +++ b/src/Formats/FormatSettings.h @@ -76,6 +76,7 @@ struct FormatSettings bool crlf_end_of_line = false; bool input_format_enum_as_number = false; bool input_format_arrays_as_nested_csv = false; + String null_representation = "\\N"; } csv; struct Custom diff --git a/src/Functions/uptime.cpp b/src/Functions/FunctionConstantBase.h similarity index 50% rename from src/Functions/uptime.cpp rename to src/Functions/FunctionConstantBase.h index bb767101fea..35096a9942f 100644 --- a/src/Functions/uptime.cpp +++ b/src/Functions/FunctionConstantBase.h @@ -1,30 +1,35 @@ +#pragma once #include #include -#include #include namespace DB { -/** Returns server uptime in seconds. - */ -class FunctionUptime : public IFunction +/// Base class for constant functions +template +class FunctionConstantBase : public IFunction { public: - static constexpr auto name = "uptime"; - static FunctionPtr create(ContextPtr context) + + /// For server-level constants (uptime(), version(), etc) + explicit FunctionConstantBase(ContextPtr context, T && constant_value_) + : is_distributed(context->isDistributed()) + , constant_value(std::forward(constant_value_)) { - return std::make_shared(context->isDistributed(), context->getUptimeSeconds()); } - explicit FunctionUptime(bool is_distributed_, time_t uptime_) : is_distributed(is_distributed_), uptime(uptime_) + /// For real constants (pi(), e(), etc) + explicit FunctionConstantBase(const T & constant_value_) + : is_distributed(false) + , constant_value(constant_value_) { } String getName() const override { - return name; + return Derived::name; } size_t getNumberOfArguments() const override @@ -34,29 +39,26 @@ public: DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override { - return std::make_shared(); + return std::make_shared(); } bool isDeterministic() const override { return false; } bool isDeterministicInScopeOfQuery() const override { return true; } + + /// Some functions may return different values on different shards/replicas, so it's not constant for distributed query bool isSuitableForConstantFolding() const override { return !is_distributed; } bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override { - return DataTypeUInt32().createColumnConst(input_rows_count, static_cast(uptime)); + return ColumnT().createColumnConst(input_rows_count, constant_value); } private: 
bool is_distributed; - time_t uptime; + const T constant_value; }; - -void registerFunctionUptime(FunctionFactory & factory) -{ - factory.registerFunction(); } -} diff --git a/src/Functions/FunctionJoinGet.cpp b/src/Functions/FunctionJoinGet.cpp index ee173607437..f0dff0ac7e4 100644 --- a/src/Functions/FunctionJoinGet.cpp +++ b/src/Functions/FunctionJoinGet.cpp @@ -48,22 +48,11 @@ getJoin(const ColumnsWithTypeAndName & arguments, ContextPtr context) "Illegal type " + arguments[0].type->getName() + " of first argument of function joinGet, expected a const string.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - size_t dot = join_name.find('.'); - String database_name; - if (dot == String::npos) - { - database_name = context->getCurrentDatabase(); - dot = 0; - } - else - { - database_name = join_name.substr(0, dot); - ++dot; - } - String table_name = join_name.substr(dot); - if (table_name.empty()) - throw Exception("joinGet does not allow empty table name", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - auto table = DatabaseCatalog::instance().getTable({database_name, table_name}, std::const_pointer_cast(context)); + auto qualified_name = QualifiedTableName::parseFromString(join_name); + if (qualified_name.database.empty()) + qualified_name.database = context->getCurrentDatabase(); + + auto table = DatabaseCatalog::instance().getTable({qualified_name.database, qualified_name.table}, std::const_pointer_cast(context)); auto storage_join = std::dynamic_pointer_cast(table); if (!storage_join) throw Exception("Table " + join_name + " should have engine StorageJoin", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); diff --git a/src/Functions/FunctionMathConstFloat64.h b/src/Functions/FunctionMathConstFloat64.h deleted file mode 100644 index 1d866b3dcd8..00000000000 --- a/src/Functions/FunctionMathConstFloat64.h +++ /dev/null @@ -1,36 +0,0 @@ -#pragma once - -#include -#include -#include - - -namespace DB -{ - -template -class FunctionMathConstFloat64 : public IFunction -{ -public: - static constexpr auto name = Impl::name; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - -private: - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 0; } - - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } - - DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override - { - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr & result_type, size_t input_rows_count) const override - { - return result_type->createColumnConst(input_rows_count, Impl::value); - } -}; - -} diff --git a/src/Functions/array/arrayIndex.h b/src/Functions/array/arrayIndex.h index 137f3a2ec78..c231ddbb373 100644 --- a/src/Functions/array/arrayIndex.h +++ b/src/Functions/array/arrayIndex.h @@ -3,10 +3,12 @@ #include #include #include +#include #include #include #include #include +#include #include #include #include @@ -373,23 +375,108 @@ public: size_t getNumberOfArguments() const override { return 2; } - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { - const DataTypeArray * array_type = checkAndGetDataType(arguments[0].get()); + auto first_argument_type = arguments[0].type; + auto second_argument_type = arguments[1].type; - if (!array_type) - throw Exception("First argument for function " + getName() + " must be an 
array.", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + const DataTypeArray * array_type = checkAndGetDataType(first_argument_type.get()); + const DataTypeMap * map_type = checkAndGetDataType(first_argument_type.get()); - if (!arguments[1]->onlyNull() && !allowArguments(array_type->getNestedType(), arguments[1])) + DataTypePtr inner_type; + + /// If map is first argument only has(map_column, key) function is supported + if constexpr (std::is_same_v) + { + if (!array_type && !map_type) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "First argument for function {} must be an array or map.", + getName()); + + inner_type = map_type ? map_type->getKeyType() : array_type->getNestedType(); + } + else + { + if (!array_type) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "First argument for function {} must be an array.", + getName()); + + inner_type = array_type->getNestedType(); + } + + if (!second_argument_type->onlyNull() && !allowArguments(inner_type, second_argument_type)) + { + const char * first_argument_type_name = map_type ? "map" : "array"; throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, - "Types of array and 2nd argument of function `{}` must be identical up to nullability, cardinality, " + "Types of {} and 2nd argument of function `{}` must be identical up to nullability, cardinality, " "numeric types, or Enum and numeric type. Passed: {} and {}.", - getName(), arguments[0]->getName(), arguments[1]->getName()); + first_argument_type_name, + getName(), + first_argument_type->getName(), + second_argument_type->getName()); + } return std::make_shared>(); } + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t /*input_rows_count*/) const override + { + if constexpr (std::is_same_v) + { + if (isMap(arguments[0].type)) + { + auto non_const_map_column = arguments[0].column->convertToFullColumnIfConst(); + + const auto & map_column = assert_cast(*non_const_map_column); + const auto & map_array_column = map_column.getNestedColumn(); + auto offsets = map_array_column.getOffsetsPtr(); + auto keys = map_column.getNestedData().getColumnPtr(0); + auto array_column = ColumnArray::create(std::move(keys), std::move(offsets)); + + const auto & type_map = assert_cast(*arguments[0].type); + auto array_type = std::make_shared(type_map.getKeyType()); + + auto arguments_copy = arguments; + arguments_copy[0].column = std::move(array_column); + arguments_copy[0].type = std::move(array_type); + arguments_copy[0].name = arguments[0].name; + + return executeArrayImpl(arguments_copy, result_type); + } + } + + return executeArrayImpl(arguments, result_type); + } + +private: + using ResultType = typename ConcreteAction::ResultType; + using ResultColumnType = ColumnVector; + using ResultColumnPtr = decltype(ResultColumnType::create()); + + using NullMaps = std::pair; + + struct ExecutionData + { + const IColumn& left; + const IColumn& right; + const ColumnArray::Offsets& offsets; + ColumnPtr result_column; + NullMaps maps; + ResultColumnPtr result { ResultColumnType::create() }; + + inline void moveResult() { result_column = std::move(result); } + }; + + static inline bool allowArguments(const DataTypePtr & inner_type, const DataTypePtr & arg) + { + auto inner_type_decayed = removeNullable(removeLowCardinality(inner_type)); + auto arg_decayed = removeNullable(removeLowCardinality(arg)); + + return ((isNativeNumber(inner_type_decayed) || isEnum(inner_type_decayed)) && isNativeNumber(arg_decayed)) + || getLeastSupertype({inner_type_decayed, 
arg_decayed}); + } + /** * If one or both arguments passed to this function are nullable, * we create a new column that contains non-nullable arguments: @@ -404,7 +491,7 @@ public: * (they are vectors of Fields, which may represent the NULL value), * they do not require any preprocessing. */ - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t /*input_rows_count*/) const override + ColumnPtr executeArrayImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type) const { const ColumnPtr & ptr = arguments[0].column; @@ -419,11 +506,13 @@ public: if (col_array) nullable = checkAndGetColumn(col_array->getData()); - auto & arg_column = arguments[1].column; + const auto & arg_column = arguments[1].column; const ColumnNullable * arg_nullable = checkAndGetColumn(*arg_column); if (!nullable && !arg_nullable) + { return executeOnNonNullable(arguments, result_type); + } else { /** @@ -483,34 +572,6 @@ public: } } -private: - using ResultType = typename ConcreteAction::ResultType; - using ResultColumnType = ColumnVector; - using ResultColumnPtr = decltype(ResultColumnType::create()); - - using NullMaps = std::pair; - - struct ExecutionData - { - const IColumn& left; - const IColumn& right; - const ColumnArray::Offsets& offsets; - ColumnPtr result_column; - NullMaps maps; - ResultColumnPtr result { ResultColumnType::create() }; - - inline void moveResult() { result_column = std::move(result); } - }; - - static inline bool allowArguments(const DataTypePtr & array_inner_type, const DataTypePtr & arg) - { - auto inner_type_decayed = removeNullable(removeLowCardinality(array_inner_type)); - auto arg_decayed = removeNullable(removeLowCardinality(arg)); - - return ((isNativeNumber(inner_type_decayed) || isEnum(inner_type_decayed)) && isNativeNumber(arg_decayed)) - || getLeastSupertype({inner_type_decayed, arg_decayed}); - } - #define INTEGRAL_TPL_PACK UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64 ColumnPtr executeOnNonNullable(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type) const diff --git a/src/Functions/buildId.cpp b/src/Functions/buildId.cpp deleted file mode 100644 index 047bddeed9b..00000000000 --- a/src/Functions/buildId.cpp +++ /dev/null @@ -1,80 +0,0 @@ -#if defined(__ELF__) && !defined(__FreeBSD__) - -#include -#include -#include -#include -#include -#include - - -namespace DB -{ -namespace -{ - -/** buildId() - returns the compiler build id of the running binary. 
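Editor's note: the `executeImpl` change in `arrayIndex.h` above supports `has(map, key)` by taking the map column's keys and offsets, wrapping them into an array column, and reusing the existing array search. A simplified standalone sketch of that idea, using flat vectors instead of ClickHouse columns:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// A Map's keys stored ClickHouse-style: flattened keys plus per-row offsets.
// Row i owns keys[offsets[i - 1] .. offsets[i]).
struct MapKeysColumn
{
    std::vector<std::string> keys;
    std::vector<uint64_t> offsets;
};

// has(map, key): reuse the "array contains" logic on the keys column.
std::vector<uint8_t> hasKey(const MapKeysColumn & col, const std::string & key)
{
    std::vector<uint8_t> result(col.offsets.size(), 0);
    size_t begin = 0;
    for (size_t row = 0; row < col.offsets.size(); ++row)
    {
        size_t end = col.offsets[row];
        for (size_t i = begin; i < end; ++i)
        {
            if (col.keys[i] == key)
            {
                result[row] = 1;
                break;
            }
        }
        begin = end;
    }
    return result;
}

int main()
{
    // Two rows: {'a': .., 'b': ..} and {'c': ..}
    MapKeysColumn col{{"a", "b", "c"}, {2, 3}};
    for (uint8_t v : hasKey(col, "b"))
        std::cout << int(v) << ' ';   // prints: 1 0
    std::cout << '\n';
}
```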
- */ -class FunctionBuildId : public IFunction -{ -public: - static constexpr auto name = "buildId"; - static FunctionPtr create(ContextPtr context) - { - return std::make_shared(context->isDistributed()); - } - - explicit FunctionBuildId(bool is_distributed_) : is_distributed(is_distributed_) - { - } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override - { - return 0; - } - - bool isDeterministic() const override { return false; } - bool isDeterministicInScopeOfQuery() const override { return true; } - bool isSuitableForConstantFolding() const override { return !is_distributed; } - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override - { - return false; - } - - DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override - { - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override - { - return DataTypeString().createColumnConst(input_rows_count, SymbolIndex::instance()->getBuildIDHex()); - } - -private: - bool is_distributed; -}; - -} - -void registerFunctionBuildId(FunctionFactory & factory) -{ - factory.registerFunction(); -} - -} - -#else - -namespace DB -{ -class FunctionFactory; -void registerFunctionBuildId(FunctionFactory &) {} -} - -#endif diff --git a/src/Functions/e.cpp b/src/Functions/e.cpp deleted file mode 100644 index c43bb7d572a..00000000000 --- a/src/Functions/e.cpp +++ /dev/null @@ -1,24 +0,0 @@ -#include -#include - -namespace DB -{ -namespace -{ - -struct EImpl -{ - static constexpr auto name = "e"; - static constexpr double value = 2.7182818284590452353602874713526624977572470; -}; - -using FunctionE = FunctionMathConstFloat64; - -} - -void registerFunctionE(FunctionFactory & factory) -{ - factory.registerFunction(); -} - -} diff --git a/src/Functions/hostName.cpp b/src/Functions/hostName.cpp deleted file mode 100644 index 2739b37e175..00000000000 --- a/src/Functions/hostName.cpp +++ /dev/null @@ -1,70 +0,0 @@ -#include -#include -#include -#include -#include -#include - - -namespace DB -{ -namespace -{ - -/// Get the host name. Is is constant on single server, but is not constant in distributed queries. 
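Editor's note: the files deleted here (`buildId.cpp`, `e.cpp`, `hostName.cpp`, and the others removed later in this patch) each repeated the same boilerplate: zero arguments, a fixed return type, non-deterministic, and constant-foldable only when the query is not distributed. The new `FunctionConstantBase` template centralizes that. A simplified standalone sketch of the CRTP pattern follows; it illustrates the idea only and is not the real `IFunction` interface.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Simplified CRTP base: each derived class supplies its name and the constant
// value; the base supplies the shared behaviour.
template <typename Derived, typename T>
class ConstantFunctionBase
{
public:
    // "Server-level" constants differ between hosts, so they must not be folded
    // into a literal when the query is distributed.
    ConstantFunctionBase(bool is_distributed_, T value_)
        : is_distributed(is_distributed_), value(std::move(value_)) {}

    std::string getName() const { return Derived::name; }
    bool isSuitableForConstantFolding() const { return !is_distributed; }
    T execute() const { return value; }

private:
    bool is_distributed;
    T value;
};

// Adding a new constant function is now a few lines instead of a whole file.
struct UptimeLike : ConstantFunctionBase<UptimeLike, long>
{
    static constexpr auto name = "uptime";
    UptimeLike(bool is_distributed_, long seconds)
        : ConstantFunctionBase(is_distributed_, seconds) {}
};

int main()
{
    UptimeLike f(/*is_distributed=*/true, 12345);
    std::cout << f.getName() << " = " << f.execute()
              << ", foldable = " << f.isSuitableForConstantFolding() << '\n';
}
```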
-class FunctionHostName : public IFunction -{ -public: - static constexpr auto name = "hostName"; - static FunctionPtr create(ContextPtr context) - { - return std::make_shared(context->isDistributed()); - } - - explicit FunctionHostName(bool is_distributed_) : is_distributed(is_distributed_) - { - } - - String getName() const override - { - return name; - } - - bool isDeterministic() const override { return false; } - - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } - - bool isDeterministicInScopeOfQuery() const override - { - return true; - } - - bool isSuitableForConstantFolding() const override { return !is_distributed; } - - size_t getNumberOfArguments() const override - { - return 0; - } - - DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override - { - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr & result_type, size_t input_rows_count) const override - { - return result_type->createColumnConst(input_rows_count, DNSResolver::instance().getHostName()); - } -private: - bool is_distributed; -}; - -} - -void registerFunctionHostName(FunctionFactory & factory) -{ - factory.registerFunction(); - factory.registerAlias("hostname", "hostName"); -} - -} diff --git a/src/Functions/map.cpp b/src/Functions/map.cpp index c1c639cb6f9..5517dced3e0 100644 --- a/src/Functions/map.cpp +++ b/src/Functions/map.cpp @@ -155,61 +155,25 @@ public: return NameMapContains::name; } - size_t getNumberOfArguments() const override { return 2; } + size_t getNumberOfArguments() const override { return impl.getNumberOfArguments(); } - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; } + bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & arguments) const override + { + return impl.isSuitableForShortCircuitArgumentsExecution(arguments); + } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { - if (arguments.size() != 2) - throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " - + toString(arguments.size()) + ", should be 2", - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - const DataTypeMap * map_type = checkAndGetDataType(arguments[0].type.get()); - - if (!map_type) - throw Exception{"First argument for function " + getName() + " must be a map", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; - - auto key_type = map_type->getKeyType(); - - if (!(isNumber(arguments[1].type) && isNumber(key_type)) - && key_type->getName() != arguments[1].type->getName()) - throw Exception{"Second argument for function " + getName() + " must be a " + key_type->getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; - - return std::make_shared(); + return impl.getReturnTypeImpl(arguments); } - bool useDefaultImplementationForConstants() const override { return true; } - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override { - bool is_const = isColumnConst(*arguments[0].column); - const ColumnMap * col_map = is_const ? 
checkAndGetColumnConstData(arguments[0].column.get()) : checkAndGetColumn(arguments[0].column.get()); - const DataTypeMap * map_type = checkAndGetDataType(arguments[0].type.get()); - if (!col_map || !map_type) - throw Exception{"First argument for function " + getName() + " must be a map", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; - - auto key_type = map_type->getKeyType(); - const auto & nested_column = col_map->getNestedColumn(); - const auto & keys_data = col_map->getNestedData().getColumn(0); - - /// Prepare arguments to call arrayIndex for check has the array element. - ColumnPtr column_array = ColumnArray::create(keys_data.getPtr(), nested_column.getOffsetsPtr()); - ColumnsWithTypeAndName new_arguments = - { - { - is_const ? ColumnConst::create(std::move(column_array), keys_data.size()) : std::move(column_array), - std::make_shared(key_type), - "" - }, - arguments[1] - }; - - return FunctionArrayIndex().executeImpl(new_arguments, result_type, input_rows_count); + return impl.executeImpl(arguments, result_type, input_rows_count); } + +private: + FunctionArrayIndex impl; }; diff --git a/src/Functions/mathConstants.cpp b/src/Functions/mathConstants.cpp new file mode 100644 index 00000000000..ecc2f8c48b5 --- /dev/null +++ b/src/Functions/mathConstants.cpp @@ -0,0 +1,47 @@ +#include +#include + +namespace DB +{ + +namespace +{ + template + class FunctionMathConstFloat64 : public FunctionConstantBase, Float64, DataTypeFloat64> + { + public: + static constexpr auto name = Impl::name; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + FunctionMathConstFloat64() : FunctionConstantBase, Float64, DataTypeFloat64>(Impl::value) {} + }; + + + struct EImpl + { + static constexpr char name[] = "e"; + static constexpr double value = 2.7182818284590452353602874713526624977572470; + }; + + using FunctionE = FunctionMathConstFloat64; + + + struct PiImpl + { + static constexpr char name[] = "pi"; + static constexpr double value = 3.1415926535897932384626433832795028841971693; + }; + + using FunctionPi = FunctionMathConstFloat64; +} + +void registerFunctionE(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +void registerFunctionPi(FunctionFactory & factory) +{ + factory.registerFunction(FunctionFactory::CaseInsensitive); +} + +} diff --git a/src/Functions/pi.cpp b/src/Functions/pi.cpp deleted file mode 100644 index efa536c7314..00000000000 --- a/src/Functions/pi.cpp +++ /dev/null @@ -1,24 +0,0 @@ -#include -#include - -namespace DB -{ -namespace -{ - -struct PiImpl -{ - static constexpr auto name = "pi"; - static constexpr double value = 3.1415926535897932384626433832795028841971693; -}; - -using FunctionPi = FunctionMathConstFloat64; - -} - -void registerFunctionPi(FunctionFactory & factory) -{ - factory.registerFunction(FunctionFactory::CaseInsensitive); -} - -} diff --git a/src/Functions/registerFunctionsMiscellaneous.cpp b/src/Functions/registerFunctionsMiscellaneous.cpp index 04561203c67..dfd986c5f82 100644 --- a/src/Functions/registerFunctionsMiscellaneous.cpp +++ b/src/Functions/registerFunctionsMiscellaneous.cpp @@ -80,6 +80,7 @@ void registerFunctionIsIPAddressContainedIn(FunctionFactory &); void registerFunctionQueryID(FunctionFactory & factory); void registerFunctionInitialQueryID(FunctionFactory & factory); void registerFunctionServerUUID(FunctionFactory &); +void registerFunctionZooKeeperSessionUptime(FunctionFactory &); #if USE_ICU void registerFunctionConvertCharset(FunctionFactory &); @@ -160,6 +161,7 @@ void registerFunctionsMiscellaneous(FunctionFactory 
& factory) registerFunctionQueryID(factory); registerFunctionInitialQueryID(factory); registerFunctionServerUUID(factory); + registerFunctionZooKeeperSessionUptime(factory); #if USE_ICU registerFunctionConvertCharset(factory); diff --git a/src/Functions/serverConstants.cpp b/src/Functions/serverConstants.cpp new file mode 100644 index 00000000000..6808e6607cf --- /dev/null +++ b/src/Functions/serverConstants.cpp @@ -0,0 +1,144 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#if !defined(ARCADIA_BUILD) +# include +#endif + + +namespace DB +{ +namespace +{ + +#if defined(__ELF__) && !defined(__FreeBSD__) + /// buildId() - returns the compiler build id of the running binary. + class FunctionBuildId : public FunctionConstantBase + { + public: + static constexpr auto name = "buildId"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionBuildId(ContextPtr context) : FunctionConstantBase(context, SymbolIndex::instance()->getBuildIDHex()) {} + }; +#endif + + + /// Get the host name. Is is constant on single server, but is not constant in distributed queries. + class FunctionHostName : public FunctionConstantBase + { + public: + static constexpr auto name = "hostName"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionHostName(ContextPtr context) : FunctionConstantBase(context, DNSResolver::instance().getHostName()) {} + }; + + + class FunctionServerUUID : public FunctionConstantBase + { + public: + static constexpr auto name = "serverUUID"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionServerUUID(ContextPtr context) : FunctionConstantBase(context, ServerUUID::get()) {} + }; + + + class FunctionTcpPort : public FunctionConstantBase + { + public: + static constexpr auto name = "tcpPort"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionTcpPort(ContextPtr context) : FunctionConstantBase(context, context->getTCPPort()) {} + }; + + + /// Returns the server time zone. + class FunctionTimezone : public FunctionConstantBase + { + public: + static constexpr auto name = "timezone"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionTimezone(ContextPtr context) : FunctionConstantBase(context, String{DateLUT::instance().getTimeZone()}) {} + }; + + + /// Returns server uptime in seconds. + class FunctionUptime : public FunctionConstantBase + { + public: + static constexpr auto name = "uptime"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionUptime(ContextPtr context) : FunctionConstantBase(context, context->getUptimeSeconds()) {} + }; + + + /// version() - returns the current version as a string. 
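Editor's note: with `FunctionMathConstFloat64` now layered on `FunctionConstantBase` (see `mathConstants.cpp` above), adding another pure constant stays a handful of lines. Below is a hedged sketch of what a hypothetical `phi()` function could look like inside the ClickHouse tree, following the `EImpl`/`PiImpl` pattern from this patch; it is illustration only and is not part of the change.

```cpp
// Hypothetical addition to src/Functions/mathConstants.cpp, mirroring EImpl/PiImpl.
// Assumes the FunctionMathConstFloat64 helper introduced by this patch.
struct PhiImpl
{
    static constexpr char name[] = "phi";                        // hypothetical function name
    static constexpr double value = 1.6180339887498948482045868; // golden ratio
};

using FunctionPhi = FunctionMathConstFloat64<PhiImpl>;

void registerFunctionPhi(FunctionFactory & factory)              // hypothetical registrar
{
    factory.registerFunction<FunctionPhi>();
}
```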
+ class FunctionVersion : public FunctionConstantBase + { + public: + static constexpr auto name = "version"; + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + explicit FunctionVersion(ContextPtr context) : FunctionConstantBase(context, VERSION_STRING) {} + }; + + class FunctionZooKeeperSessionUptime : public FunctionConstantBase + { + public: + static constexpr auto name = "zookeeperSessionUptime"; + explicit FunctionZooKeeperSessionUptime(ContextPtr context) : FunctionConstantBase(context, context->getZooKeeperSessionUptime()) {} + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } + }; +} + + +void registerFunctionBuildId([[maybe_unused]] FunctionFactory & factory) +{ +#if defined(__ELF__) && !defined(__FreeBSD__) + factory.registerFunction(); +#endif +} + +void registerFunctionHostName(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerAlias("hostname", "hostName"); +} + +void registerFunctionServerUUID(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +void registerFunctionTcpPort(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +void registerFunctionTimezone(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerAlias("timeZone", "timezone"); +} + +void registerFunctionUptime(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +void registerFunctionVersion(FunctionFactory & factory) +{ + factory.registerFunction(FunctionFactory::CaseInsensitive); +} + +void registerFunctionZooKeeperSessionUptime(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +} + diff --git a/src/Functions/serverUUID.cpp b/src/Functions/serverUUID.cpp deleted file mode 100644 index 4b70b1576ac..00000000000 --- a/src/Functions/serverUUID.cpp +++ /dev/null @@ -1,60 +0,0 @@ -#include -#include -#include -#include - - -namespace DB -{ - -namespace -{ - -class FunctionServerUUID : public IFunction - { - public: - static constexpr auto name = "serverUUID"; - - static FunctionPtr create(ContextPtr context) - { - return std::make_shared(context->isDistributed(), ServerUUID::get()); - } - - explicit FunctionServerUUID(bool is_distributed_, UUID server_uuid_) - : is_distributed(is_distributed_), server_uuid(server_uuid_) - { - } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 0; } - - DataTypePtr getReturnTypeImpl(const DataTypes &) const override { return std::make_shared(); } - - bool isDeterministic() const override { return false; } - - bool isDeterministicInScopeOfQuery() const override { return true; } - - bool isSuitableForConstantFolding() const override { return !is_distributed; } - - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo &) const override { return false; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override - { - return DataTypeUUID().createColumnConst(input_rows_count, server_uuid); - } - - private: - bool is_distributed; - const UUID server_uuid; - }; - -} - -void registerFunctionServerUUID(FunctionFactory & factory) -{ - factory.registerFunction(); -} - -} - diff --git a/src/Functions/tcpPort.cpp b/src/Functions/tcpPort.cpp deleted file mode 100644 index 10b89faa1be..00000000000 --- a/src/Functions/tcpPort.cpp +++ /dev/null @@ -1,57 +0,0 @@ -#include -#include -#include - - -namespace DB -{ - -namespace -{ - -class FunctionTcpPort : public IFunction -{ -public: - 
static constexpr auto name = "tcpPort"; - - static FunctionPtr create(ContextPtr context) - { - return std::make_shared(context->isDistributed(), context->getTCPPort()); - } - - explicit FunctionTcpPort(bool is_distributed_, UInt16 port_) : is_distributed(is_distributed_), port(port_) - { - } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 0; } - - DataTypePtr getReturnTypeImpl(const DataTypes &) const override { return std::make_shared(); } - - bool isDeterministic() const override { return false; } - - bool isDeterministicInScopeOfQuery() const override { return true; } - - bool isSuitableForConstantFolding() const override { return !is_distributed; } - - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override - { - return DataTypeUInt16().createColumnConst(input_rows_count, port); - } - -private: - bool is_distributed; - const UInt64 port; -}; - -} - -void registerFunctionTcpPort(FunctionFactory & factory) -{ - factory.registerFunction(); -} - -} diff --git a/src/Functions/timezone.cpp b/src/Functions/timezone.cpp deleted file mode 100644 index 3b2319c22ca..00000000000 --- a/src/Functions/timezone.cpp +++ /dev/null @@ -1,65 +0,0 @@ -#include -#include -#include -#include -#include -#include - - -namespace DB -{ -namespace -{ - -/** Returns the server time zone. - */ -class FunctionTimezone : public IFunction -{ -public: - static constexpr auto name = "timezone"; - static FunctionPtr create(ContextPtr context) - { - return std::make_shared(context->isDistributed()); - } - - explicit FunctionTimezone(bool is_distributed_) : is_distributed(is_distributed_) - { - } - - String getName() const override - { - return name; - } - size_t getNumberOfArguments() const override - { - return 0; - } - - DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override - { - return std::make_shared(); - } - - bool isDeterministic() const override { return false; } - bool isDeterministicInScopeOfQuery() const override { return true; } - bool isSuitableForConstantFolding() const override { return !is_distributed; } - - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override - { - return DataTypeString().createColumnConst(input_rows_count, DateLUT::instance().getTimeZone()); - } -private: - bool is_distributed; -}; - -} - -void registerFunctionTimezone(FunctionFactory & factory) -{ - factory.registerFunction(); - factory.registerAlias("timeZone", "timezone"); -} - -} diff --git a/src/Functions/version.cpp b/src/Functions/version.cpp deleted file mode 100644 index 81e40655eef..00000000000 --- a/src/Functions/version.cpp +++ /dev/null @@ -1,63 +0,0 @@ -#include -#include -#include -#include -#include - -#if !defined(ARCADIA_BUILD) -# include -#endif - -namespace DB -{ - -/** version() - returns the current version as a string. 
- */ -class FunctionVersion : public IFunction -{ -public: - static constexpr auto name = "version"; - static FunctionPtr create(ContextPtr context) - { - return std::make_shared(context->isDistributed()); - } - - explicit FunctionVersion(bool is_distributed_) : is_distributed(is_distributed_) - { - } - - String getName() const override - { - return name; - } - - bool isDeterministic() const override { return false; } - bool isDeterministicInScopeOfQuery() const override { return true; } - bool isSuitableForConstantFolding() const override { return !is_distributed; } - bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } - - size_t getNumberOfArguments() const override - { - return 0; - } - - DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override - { - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override - { - return DataTypeString().createColumnConst(input_rows_count, VERSION_STRING); - } -private: - bool is_distributed; -}; - - -void registerFunctionVersion(FunctionFactory & factory) -{ - factory.registerFunction(FunctionFactory::CaseInsensitive); -} - -} diff --git a/src/Functions/ya.make b/src/Functions/ya.make index fbfff751314..431f279e682 100644 --- a/src/Functions/ya.make +++ b/src/Functions/ya.make @@ -218,7 +218,6 @@ SRCS( blockNumber.cpp blockSerializedSize.cpp blockSize.cpp - buildId.cpp byteSize.cpp caseWithExpression.cpp cbrt.cpp @@ -249,7 +248,6 @@ SRCS( divide/divide.cpp divide/divideImpl.cpp dumpColumnStructure.cpp - e.cpp empty.cpp encodeXMLComponent.cpp encrypt.cpp @@ -306,6 +304,7 @@ SRCS( h3IndexesAreNeighbors.cpp h3IsValid.cpp h3ToChildren.cpp + h3ToGeoBoundary.cpp h3ToParent.cpp h3ToString.cpp h3kRing.cpp @@ -314,7 +313,6 @@ SRCS( hasThreadFuzzer.cpp hasToken.cpp hasTokenCaseInsensitive.cpp - hostName.cpp hyperscanRegexpChecker.cpp hypot.cpp identity.cpp @@ -362,6 +360,7 @@ SRCS( map.cpp match.cpp materialize.cpp + mathConstants.cpp minus.cpp modulo.cpp moduloOrZero.cpp @@ -402,7 +401,6 @@ SRCS( nullIf.cpp padString.cpp partitionId.cpp - pi.cpp plus.cpp pointInEllipses.cpp pointInPolygon.cpp @@ -479,7 +477,7 @@ SRCS( s2RectIntersection.cpp s2RectUnion.cpp s2ToGeo.cpp - serverUUID.cpp + serverConstants.cpp sigmoid.cpp sign.cpp sin.cpp @@ -505,13 +503,11 @@ SRCS( synonyms.cpp tan.cpp tanh.cpp - tcpPort.cpp tgamma.cpp throwIf.cpp tid.cpp timeSlot.cpp timeSlots.cpp - timezone.cpp timezoneOf.cpp timezoneOffset.cpp toColumnTypeName.cpp @@ -574,9 +570,7 @@ SRCS( tupleToNameValuePairs.cpp upper.cpp upperUTF8.cpp - uptime.cpp validateNestedArraySizes.cpp - version.cpp visibleWidth.cpp visitParamExtractBool.cpp visitParamExtractFloat.cpp diff --git a/src/IO/ReadBufferFromPocoSocket.cpp b/src/IO/ReadBufferFromPocoSocket.cpp index 50e0fad0265..527b68e623a 100644 --- a/src/IO/ReadBufferFromPocoSocket.cpp +++ b/src/IO/ReadBufferFromPocoSocket.cpp @@ -54,7 +54,9 @@ bool ReadBufferFromPocoSocket::nextImpl() } catch (const Poco::TimeoutException &) { - throw NetException("Timeout exceeded while reading from socket (" + peer_address.toString() + ")", ErrorCodes::SOCKET_TIMEOUT); + throw NetException(fmt::format("Timeout exceeded while reading from socket ({}, {} ms)", + peer_address.toString(), + socket.impl()->getReceiveTimeout().totalMilliseconds()), ErrorCodes::SOCKET_TIMEOUT); } catch (const Poco::IOException & e) { diff --git a/src/IO/ReadSettings.h b/src/IO/ReadSettings.h index 
100041d3dec..379b7bc2216 100644 --- a/src/IO/ReadSettings.h +++ b/src/IO/ReadSettings.h @@ -66,6 +66,9 @@ struct ReadSettings /// For 'pread_threadpool' method. Lower is more priority. size_t priority = 0; + size_t remote_fs_backoff_threshold = 10000; + size_t remote_fs_backoff_max_tries = 4; + ReadSettings adjustBufferSize(size_t file_size) const { ReadSettings res = *this; diff --git a/src/IO/WriteBufferFromPocoSocket.cpp b/src/IO/WriteBufferFromPocoSocket.cpp index a0e4de4c831..79c8952f5a3 100644 --- a/src/IO/WriteBufferFromPocoSocket.cpp +++ b/src/IO/WriteBufferFromPocoSocket.cpp @@ -57,7 +57,9 @@ void WriteBufferFromPocoSocket::nextImpl() } catch (const Poco::TimeoutException &) { - throw NetException("Timeout exceeded while writing to socket (" + peer_address.toString() + ")", ErrorCodes::SOCKET_TIMEOUT); + throw NetException(fmt::format("Timeout exceeded while writing to socket ({}, {} ms)", + peer_address.toString(), + socket.impl()->getSendTimeout().totalMilliseconds()), ErrorCodes::SOCKET_TIMEOUT); } catch (const Poco::IOException & e) { diff --git a/src/Interpreters/AsynchronousInsertQueue.cpp b/src/Interpreters/AsynchronousInsertQueue.cpp index 5b9521f334e..da41eb82d5e 100644 --- a/src/Interpreters/AsynchronousInsertQueue.cpp +++ b/src/Interpreters/AsynchronousInsertQueue.cpp @@ -18,6 +18,7 @@ #include #include #include +#include namespace DB @@ -27,6 +28,7 @@ namespace ErrorCodes { extern const int TIMEOUT_EXCEEDED; extern const int UNKNOWN_EXCEPTION; + extern const int UNKNOWN_FORMAT; } AsynchronousInsertQueue::InsertQuery::InsertQuery(const ASTPtr & query_, const Settings & settings_) @@ -166,6 +168,9 @@ void AsynchronousInsertQueue::push(ASTPtr query, ContextPtr query_context) auto table = interpreter.getTable(insert_query); auto sample_block = interpreter.getSampleBlock(insert_query, table, table->getInMemoryMetadataPtr()); + if (!FormatFactory::instance().isInputFormat(insert_query.format)) + throw Exception(ErrorCodes::UNKNOWN_FORMAT, "Unknown input format {}", insert_query.format); + query_context->checkAccess(AccessType::INSERT, insert_query.table_id, sample_block.getNames()); String bytes; @@ -324,7 +329,7 @@ void AsynchronousInsertQueue::cleanup() } if (total_removed) - LOG_TRACE(log, "Removed stale entries for {} queries from asynchronous insertion queue", keys_to_remove.size()); + LOG_TRACE(log, "Removed stale entries for {} queries from asynchronous insertion queue", total_removed); } { diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index 14a91959d5c..7e43343ab34 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -1689,6 +1689,14 @@ zkutil::ZooKeeperPtr Context::getZooKeeper() const return shared->zookeeper; } +UInt32 Context::getZooKeeperSessionUptime() const +{ + std::lock_guard lock(shared->zookeeper_mutex); + if (!shared->zookeeper || shared->zookeeper->expired()) + return 0; + return shared->zookeeper->getSessionUptime(); +} + void Context::setSystemZooKeeperLogAfterInitializationIfNeeded() { /// It can be nearly impossible to understand in which order global objects are initialized on server startup. 
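Editor's note: `Context::getZooKeeperSessionUptime` above reports how long the current ZooKeeper session has been alive and deliberately returns 0 when there is no session or it has expired; the new `zookeeperSessionUptime()` SQL function shown earlier in `serverConstants.cpp` exposes that value. A simplified standalone sketch of the same bookkeeping, with toy types:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <optional>

using Clock = std::chrono::steady_clock;

// Simplified stand-in for the ZooKeeper session state kept by Context.
struct ZooKeeperSession
{
    std::optional<Clock::time_point> started_at;  // empty: no session established yet
    bool expired = false;

    // Mirrors getZooKeeperSessionUptime(): 0 when there is no usable session,
    // otherwise whole seconds since the session was established.
    uint32_t uptimeSeconds() const
    {
        if (!started_at || expired)
            return 0;
        auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(Clock::now() - *started_at);
        return static_cast<uint32_t>(elapsed.count());
    }
};

int main()
{
    ZooKeeperSession session;
    std::cout << session.uptimeSeconds() << '\n';   // 0: not connected yet

    session.started_at = Clock::now() - std::chrono::seconds(42);
    std::cout << session.uptimeSeconds() << '\n';   // ~42

    session.expired = true;
    std::cout << session.uptimeSeconds() << '\n';   // 0 again
}
```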
@@ -2769,8 +2777,8 @@ void Context::setAsynchronousInsertQueue(const std::shared_ptrasync_insert_queue = ptr; } @@ -2836,6 +2844,9 @@ ReadSettings Context::getReadSettings() const res.local_fs_prefetch = settings.local_filesystem_read_prefetch; res.remote_fs_prefetch = settings.remote_filesystem_read_prefetch; + res.remote_fs_backoff_threshold = settings.remote_fs_read_backoff_threshold; + res.remote_fs_backoff_max_tries = settings.remote_fs_read_backoff_max_tries; + res.local_fs_buffer_size = settings.max_read_buffer_size; res.direct_io_threshold = settings.min_bytes_to_use_direct_io; res.mmap_threshold = settings.min_bytes_to_use_mmap_io; diff --git a/src/Interpreters/Context.h b/src/Interpreters/Context.h index b7472fb1c29..f8718967aa3 100644 --- a/src/Interpreters/Context.h +++ b/src/Interpreters/Context.h @@ -659,6 +659,8 @@ public: /// Same as above but return a zookeeper connection from auxiliary_zookeepers configuration entry. std::shared_ptr getAuxiliaryZooKeeper(const String & name) const; + UInt32 getZooKeeperSessionUptime() const; + #if USE_NURAFT std::shared_ptr & getKeeperDispatcher() const; #endif diff --git a/src/Interpreters/DDLWorker.cpp b/src/Interpreters/DDLWorker.cpp index 7abe78472b0..4c08ff06e87 100644 --- a/src/Interpreters/DDLWorker.cpp +++ b/src/Interpreters/DDLWorker.cpp @@ -772,7 +772,9 @@ bool DDLWorker::tryExecuteQueryOnLeaderReplica( String shard_path = task.getShardNodePath(); String is_executed_path = fs::path(shard_path) / "executed"; String tries_to_execute_path = fs::path(shard_path) / "tries_to_execute"; - zookeeper->createAncestors(fs::path(shard_path) / ""); /* appends "/" at the end of shard_path */ + assert(shard_path.starts_with(String(fs::path(task.entry_path) / "shards" / ""))); + zookeeper->createIfNotExists(fs::path(task.entry_path) / "shards", ""); + zookeeper->createIfNotExists(shard_path, ""); /// Leader replica creates is_executed_path node on successful query execution. /// We will remove create_shard_flag from zk operations list, if current replica is just waiting for leader to execute the query. diff --git a/src/Interpreters/DatabaseCatalog.cpp b/src/Interpreters/DatabaseCatalog.cpp index ace258d9013..f273f8a165d 100644 --- a/src/Interpreters/DatabaseCatalog.cpp +++ b/src/Interpreters/DatabaseCatalog.cpp @@ -157,15 +157,6 @@ void DatabaseCatalog::loadDatabases() /// Another background thread which drops temporary LiveViews. /// We should start it after loadMarkedAsDroppedTables() to avoid race condition. TemporaryLiveViewCleaner::instance().startup(); - - /// Start up tables after all databases are loaded. 
- for (const auto & [database_name, database] : databases) - { - if (database_name == DatabaseCatalog::TEMPORARY_DATABASE) - continue; - - database->startupTables(); - } } void DatabaseCatalog::shutdownImpl() diff --git a/src/Interpreters/ExternalDictionariesLoader.cpp b/src/Interpreters/ExternalDictionariesLoader.cpp index cbb0e52b91b..bf2ce9e66ee 100644 --- a/src/Interpreters/ExternalDictionariesLoader.cpp +++ b/src/Interpreters/ExternalDictionariesLoader.cpp @@ -89,57 +89,53 @@ DictionaryStructure ExternalDictionariesLoader::getDictionaryStructure(const std std::string ExternalDictionariesLoader::resolveDictionaryName(const std::string & dictionary_name, const std::string & current_database_name) const { - bool has_dictionary = has(dictionary_name); - if (has_dictionary) + if (has(dictionary_name)) return dictionary_name; - std::string resolved_name = resolveDictionaryNameFromDatabaseCatalog(dictionary_name); - has_dictionary = has(resolved_name); + std::string resolved_name = resolveDictionaryNameFromDatabaseCatalog(dictionary_name, current_database_name); - if (!has_dictionary) - { - /// If dictionary not found. And database was not implicitly specified - /// we can qualify dictionary name with current database name. - /// It will help if dictionary is created with DDL and is in current database. - if (dictionary_name.find('.') == std::string::npos) - { - String dictionary_name_with_database = current_database_name + '.' + dictionary_name; - resolved_name = resolveDictionaryNameFromDatabaseCatalog(dictionary_name_with_database); - has_dictionary = has(resolved_name); - } - } + if (has(resolved_name)) + return resolved_name; - if (!has_dictionary) - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Dictionary ({}) not found", backQuote(dictionary_name)); - - return resolved_name; + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Dictionary ({}) not found", backQuote(dictionary_name)); } -std::string ExternalDictionariesLoader::resolveDictionaryNameFromDatabaseCatalog(const std::string & name) const +std::string ExternalDictionariesLoader::resolveDictionaryNameFromDatabaseCatalog(const std::string & name, const std::string & current_database_name) const { /// If it's dictionary from Atomic database, then we need to convert qualified name to UUID. /// Try to split name and get id from associated StorageDictionary. /// If something went wrong, return name as is. - auto pos = name.find('.'); - if (pos == std::string::npos || name.find('.', pos + 1) != std::string::npos) - return name; + String res = name; - std::string maybe_database_name = name.substr(0, pos); - std::string maybe_table_name = name.substr(pos + 1); + auto qualified_name = QualifiedTableName::tryParseFromString(name); + if (!qualified_name) + return res; + + if (qualified_name->database.empty()) + { + /// Ether database name is not specified and we should use current one + /// or it's an XML dictionary. + bool is_xml_dictionary = has(name); + if (is_xml_dictionary) + return res; + + qualified_name->database = current_database_name; + res = current_database_name + '.' 
+ name; + } auto [db, table] = DatabaseCatalog::instance().tryGetDatabaseAndTable( - {maybe_database_name, maybe_table_name}, + {qualified_name->database, qualified_name->table}, const_pointer_cast(getContext())); if (!db) - return name; + return res; assert(table); if (db->getUUID() == UUIDHelpers::Nil) - return name; + return res; if (table->getName() != "Dictionary") - return name; + return res; return toString(table->getStorageID().uuid); } diff --git a/src/Interpreters/ExternalDictionariesLoader.h b/src/Interpreters/ExternalDictionariesLoader.h index 06f64ef30c5..f748d75d908 100644 --- a/src/Interpreters/ExternalDictionariesLoader.h +++ b/src/Interpreters/ExternalDictionariesLoader.h @@ -42,7 +42,7 @@ protected: std::string resolveDictionaryName(const std::string & dictionary_name, const std::string & current_database_name) const; /// Try convert qualified dictionary name to persistent UUID - std::string resolveDictionaryNameFromDatabaseCatalog(const std::string & name) const; + std::string resolveDictionaryNameFromDatabaseCatalog(const std::string & name, const std::string & current_database_name) const; friend class StorageSystemDictionaries; friend class DatabaseDictionary; diff --git a/src/Interpreters/GatherFunctionQuantileVisitor.cpp b/src/Interpreters/GatherFunctionQuantileVisitor.cpp new file mode 100644 index 00000000000..ec3866ba0c0 --- /dev/null +++ b/src/Interpreters/GatherFunctionQuantileVisitor.cpp @@ -0,0 +1,85 @@ +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +/// Mapping from quantile functions for single value to plural +static const std::unordered_map quantile_fuse_name_mapping = { + {NameQuantile::name, NameQuantiles::name}, + {NameQuantileBFloat16::name, NameQuantilesBFloat16::name}, + {NameQuantileBFloat16Weighted::name, NameQuantilesBFloat16Weighted::name}, + {NameQuantileDeterministic::name, NameQuantilesDeterministic::name}, + {NameQuantileExact::name, NameQuantilesExact::name}, + {NameQuantileExactExclusive::name, NameQuantilesExactExclusive::name}, + {NameQuantileExactHigh::name, NameQuantilesExactHigh::name}, + {NameQuantileExactInclusive::name, NameQuantilesExactInclusive::name}, + {NameQuantileExactLow::name, NameQuantilesExactLow::name}, + {NameQuantileExactWeighted::name, NameQuantilesExactWeighted::name}, + {NameQuantileTDigest::name, NameQuantilesTDigest::name}, + {NameQuantileTDigestWeighted::name, NameQuantilesTDigestWeighted::name}, + {NameQuantileTiming::name, NameQuantilesTiming::name}, + {NameQuantileTimingWeighted::name, NameQuantilesTimingWeighted::name}, +}; + +String GatherFunctionQuantileData::getFusedName(const String & func_name) +{ + if (auto it = quantile_fuse_name_mapping.find(func_name); it != quantile_fuse_name_mapping.end()) + return it->second; + throw DB::Exception(ErrorCodes::LOGICAL_ERROR, "Function '{}' is not quantile-family or cannot be fused", func_name); +} + +void GatherFunctionQuantileData::visit(ASTFunction & function, ASTPtr & ast) +{ + if (!quantile_fuse_name_mapping.contains(function.name)) + return; + + fuse_quantile[function.name].addFuncNode(ast); +} + +void GatherFunctionQuantileData::FuseQuantileAggregatesData::addFuncNode(ASTPtr & ast) +{ + const auto * func = ast->as(); + if (!func) + return; + + const auto & arguments = func->arguments->children; + + bool need_two_args = func->name == NameQuantileDeterministic::name + || func->name == NameQuantileExactWeighted::name + || func->name == NameQuantileTimingWeighted::name + || func->name 
== NameQuantileTDigestWeighted::name + || func->name == NameQuantileBFloat16Weighted::name; + if (arguments.size() != (need_two_args ? 2 : 1)) + return; + + if (arguments[0]->getColumnName().find(',') != std::string::npos) + return; + String arg_name = arguments[0]->getColumnName(); + if (need_two_args) + { + if (arguments[1]->getColumnName().find(',') != std::string::npos) + return; + arg_name += "," + arguments[1]->getColumnName(); + } + + arg_map_function[arg_name].push_back(&ast); +} + +bool GatherFunctionQuantileData::needChild(const ASTPtr & node, const ASTPtr &) +{ + /// Skip children of quantile* functions to escape cycles in further processing + if (const auto * func = node ? node->as() : nullptr) + return !quantile_fuse_name_mapping.contains(func->name); + return true; +} + +} + diff --git a/src/Interpreters/GatherFunctionQuantileVisitor.h b/src/Interpreters/GatherFunctionQuantileVisitor.h new file mode 100644 index 00000000000..19f092720af --- /dev/null +++ b/src/Interpreters/GatherFunctionQuantileVisitor.h @@ -0,0 +1,35 @@ +#pragma once + +#include +#include +#include +#include + +namespace DB +{ + +/// Gather all the `quantile*` functions +class GatherFunctionQuantileData +{ +public: + struct FuseQuantileAggregatesData + { + std::unordered_map> arg_map_function; + + void addFuncNode(ASTPtr & ast); + }; + + using TypeToVisit = ASTFunction; + + std::unordered_map fuse_quantile; + + void visit(ASTFunction & function, ASTPtr & ast); + + static String getFusedName(const String & func_name); + + static bool needChild(const ASTPtr & node, const ASTPtr &); +}; + +using GatherFunctionQuantileVisitor = InDepthNodeVisitor, true>; + +} diff --git a/src/Interpreters/InterpreterCreateQuery.cpp b/src/Interpreters/InterpreterCreateQuery.cpp index 7e061662534..db4b8a72a7d 100644 --- a/src/Interpreters/InterpreterCreateQuery.cpp +++ b/src/Interpreters/InterpreterCreateQuery.cpp @@ -53,6 +53,7 @@ #include #include #include +#include #include @@ -271,9 +272,13 @@ BlockIO InterpreterCreateQuery::createDatabase(ASTCreateQuery & create) renamed = true; } - /// We use global context here, because storages lifetime is bigger than query context lifetime - database->loadStoredObjects( - getContext()->getGlobalContext(), has_force_restore_data_flag, create.attach && force_attach, skip_startup_tables); //-V560 + if (!load_database_without_tables) + { + /// We use global context here, because storages lifetime is bigger than query context lifetime + TablesLoader loader{getContext()->getGlobalContext(), {{database_name, database}}, has_force_restore_data_flag, create.attach && force_attach}; //-V560 + loader.loadTables(); + loader.startupTables(); + } } catch (...) { diff --git a/src/Interpreters/InterpreterCreateQuery.h b/src/Interpreters/InterpreterCreateQuery.h index 1ef5e0470fc..89d27a30555 100644 --- a/src/Interpreters/InterpreterCreateQuery.h +++ b/src/Interpreters/InterpreterCreateQuery.h @@ -52,9 +52,9 @@ public: force_attach = force_attach_; } - void setSkipStartupTables(bool skip_startup_tables_) + void setLoadDatabaseWithoutTables(bool load_database_without_tables_) { - skip_startup_tables = skip_startup_tables_; + load_database_without_tables = load_database_without_tables_; } /// Obtain information about columns, their types, default values and column comments, @@ -99,7 +99,7 @@ private: /// Is this an internal query - not from the user. 
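Editor's note: earlier in this patch, both the `joinGet` change and the `ExternalDictionariesLoader` rewrite replace hand-rolled dot splitting with `QualifiedTableName` parsing plus a fallback to the current database. A simplified standalone sketch of that resolution logic; the struct and function names here are illustrative, not the real ClickHouse classes.

```cpp
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

struct QualifiedName
{
    std::string database;
    std::string table;
};

// Accepts "table" or "database.table"; rejects empty parts and extra dots,
// roughly what QualifiedTableName::tryParseFromString guards against.
std::optional<QualifiedName> tryParseQualifiedName(const std::string & name)
{
    auto dot = name.find('.');
    if (dot == std::string::npos)
    {
        if (name.empty())
            return std::nullopt;
        return QualifiedName{"", name};
    }
    std::string database = name.substr(0, dot);
    std::string table = name.substr(dot + 1);
    if (database.empty() || table.empty() || table.find('.') != std::string::npos)
        return std::nullopt;
    return QualifiedName{database, table};
}

QualifiedName resolve(const std::string & name, const std::string & current_database)
{
    auto parsed = tryParseQualifiedName(name);
    if (!parsed)
        throw std::invalid_argument("Bad qualified name: " + name);
    if (parsed->database.empty())
        parsed->database = current_database;   // fall back to the current database
    return *parsed;
}

int main()
{
    auto a = resolve("users", "default");
    auto b = resolve("db1.users", "default");
    std::cout << a.database << '.' << a.table << '\n';   // default.users
    std::cout << b.database << '.' << b.table << '\n';   // db1.users
}
```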
bool internal = false; bool force_attach = false; - bool skip_startup_tables = false; + bool load_database_without_tables = false; mutable String as_database_saved; mutable String as_table_saved; diff --git a/src/Interpreters/InterpreterDropQuery.cpp b/src/Interpreters/InterpreterDropQuery.cpp index 8ad9b658fba..509211df3b6 100644 --- a/src/Interpreters/InterpreterDropQuery.cpp +++ b/src/Interpreters/InterpreterDropQuery.cpp @@ -355,6 +355,13 @@ BlockIO InterpreterDropQuery::executeToDatabaseImpl(const ASTDropQuery & query, } } + if (!drop && query.no_delay) + { + /// Avoid "some tables are still in use" when sync mode is enabled + for (const auto & table_uuid : uuids_to_wait) + database->waitDetachedTableNotInUse(table_uuid); + } + /// Protects from concurrent CREATE TABLE queries auto db_guard = DatabaseCatalog::instance().getExclusiveDDLGuardForDatabase(database_name); diff --git a/src/Interpreters/InterpreterSystemQuery.cpp b/src/Interpreters/InterpreterSystemQuery.cpp index 226ff124cfb..18d31e2f89c 100644 --- a/src/Interpreters/InterpreterSystemQuery.cpp +++ b/src/Interpreters/InterpreterSystemQuery.cpp @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -445,6 +446,12 @@ BlockIO InterpreterSystemQuery::execute() case Type::STOP_LISTEN_QUERIES: case Type::START_LISTEN_QUERIES: throw Exception(ErrorCodes::NOT_IMPLEMENTED, "{} is not supported yet", query.type); + case Type::STOP_THREAD_FUZZER: + ThreadFuzzer::stop(); + break; + case Type::START_THREAD_FUZZER: + ThreadFuzzer::start(); + break; default: throw Exception("Unknown type of SYSTEM query", ErrorCodes::BAD_ARGUMENTS); } @@ -877,6 +884,8 @@ AccessRightsElements InterpreterSystemQuery::getRequiredAccessForDDLOnCluster() } case Type::STOP_LISTEN_QUERIES: break; case Type::START_LISTEN_QUERIES: break; + case Type::STOP_THREAD_FUZZER: break; + case Type::START_THREAD_FUZZER: break; case Type::UNKNOWN: break; case Type::END: break; } diff --git a/src/Interpreters/TreeOptimizer.cpp b/src/Interpreters/TreeOptimizer.cpp index 5f135d8a6e5..3236418fe6f 100644 --- a/src/Interpreters/TreeOptimizer.cpp +++ b/src/Interpreters/TreeOptimizer.cpp @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -626,6 +627,59 @@ void optimizeFunctionsToSubcolumns(ASTPtr & query, const StorageMetadataPtr & me RewriteFunctionToSubcolumnVisitor(data).visit(query); } +std::shared_ptr getQuantileFuseCandidate(const String & func_name, std::vector & functions) +{ + if (functions.size() < 2) + return nullptr; + + const auto & common_arguments = (*functions[0])->as()->arguments->children; + auto func_base = makeASTFunction(GatherFunctionQuantileData::getFusedName(func_name)); + func_base->arguments->children = common_arguments; + func_base->parameters = std::make_shared(); + + for (const auto * ast : functions) + { + assert(ast && *ast); + const auto * func = (*ast)->as(); + assert(func && func->parameters->as()); + const ASTs & parameters = func->parameters->as().children; + if (parameters.size() != 1) + return nullptr; /// query is illegal, give up + func_base->parameters->children.push_back(parameters[0]); + } + return func_base; +} + +/// Rewrites multi quantile()() functions with the same arguments to quantiles()()[] +/// eg:SELECT quantile(0.5)(x), quantile(0.9)(x), quantile(0.95)(x) FROM... +/// rewrite to : SELECT quantiles(0.5, 0.9, 0.95)(x)[1], quantiles(0.5, 0.9, 0.95)(x)[2], quantiles(0.5, 0.9, 0.95)(x)[3] FROM ... 
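Editor's note: the comment above describes the rewrite driven by the new `optimize_syntax_fuse_functions` setting: several `quantile*` calls over the same argument are fused into one `quantiles*` call, and each original call becomes an `arrayElement` over its result. A simplified standalone sketch of the grouping-and-fusing step, working on plain call descriptions rather than AST nodes:

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct QuantileCall
{
    std::string func;       // e.g. "quantile"
    std::string argument;   // column expression, e.g. "x"
    double level;           // the single parameter, e.g. 0.5
};

// Fuse calls sharing the function name and argument into one plural call,
// then refer to each original call as fused_result[index].
// (The real optimizer uses an explicit name mapping, e.g. quantileExact -> quantilesExact.)
std::vector<std::string> fuseQuantiles(const std::vector<QuantileCall> & calls)
{
    std::map<std::pair<std::string, std::string>, std::vector<size_t>> groups;
    for (size_t i = 0; i < calls.size(); ++i)
        groups[{calls[i].func, calls[i].argument}].push_back(i);

    std::vector<std::string> rewritten(calls.size());
    for (const auto & [key, indexes] : groups)
    {
        const auto & [func, argument] = key;
        if (indexes.size() < 2)   // nothing to fuse, keep the call as is
        {
            size_t i = indexes.front();
            rewritten[i] = func + "(" + std::to_string(calls[i].level) + ")(" + argument + ")";
            continue;
        }
        std::string fused = func + "s(";   // toy rename: quantile -> quantiles
        for (size_t k = 0; k < indexes.size(); ++k)
            fused += (k ? ", " : "") + std::to_string(calls[indexes[k]].level);
        fused += ")(" + argument + ")";
        for (size_t k = 0; k < indexes.size(); ++k)
            rewritten[indexes[k]] = fused + "[" + std::to_string(k + 1) + "]";
    }
    return rewritten;
}

int main()
{
    std::vector<QuantileCall> calls = {{"quantile", "x", 0.5}, {"quantile", "x", 0.9}, {"quantile", "x", 0.95}};
    for (const auto & expr : fuseQuantiles(calls))
        std::cout << expr << '\n';   // quantiles(0.5.., 0.9.., 0.95..)(x)[1] / [2] / [3]
}
```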
+void optimizeFuseQuantileFunctions(ASTPtr & query) +{ + GatherFunctionQuantileVisitor::Data data{}; + GatherFunctionQuantileVisitor(data).visit(query); + for (auto & candidate : data.fuse_quantile) + { + String func_name = candidate.first; + auto & args_to_functions = candidate.second; + + /// Try to fuse multiply `quantile*` Function to plural + for (auto it : args_to_functions.arg_map_function) + { + std::vector & functions = it.second; + auto func_base = getQuantileFuseCandidate(func_name, functions); + if (!func_base) + continue; + for (size_t i = 0; i < functions.size(); ++i) + { + std::shared_ptr ast_new = makeASTFunction("arrayElement", func_base, std::make_shared(i + 1)); + if (const auto & alias = (*functions[i])->tryGetAlias(); !alias.empty()) + ast_new->setAlias(alias); + *functions[i] = ast_new; + } + } + } +} + } void TreeOptimizer::optimizeIf(ASTPtr & query, Aliases & aliases, bool if_chain_to_multiif) @@ -723,6 +777,9 @@ void TreeOptimizer::apply(ASTPtr & query, TreeRewriterResult & result, /// Remove duplicated columns from USING(...). optimizeUsing(select_query); + + if (settings.optimize_syntax_fuse_functions) + optimizeFuseQuantileFunctions(query); } } diff --git a/src/Interpreters/TreeRewriter.cpp b/src/Interpreters/TreeRewriter.cpp index f92baf5536e..8ac7b48ddcb 100644 --- a/src/Interpreters/TreeRewriter.cpp +++ b/src/Interpreters/TreeRewriter.cpp @@ -1072,7 +1072,7 @@ void TreeRewriter::normalize( // if we have at least two different functions. E.g. we will replace sum(x) // and count(x) with sumCount(x).1 and sumCount(x).2, and sumCount() will // be calculated only once because of CSE. - if (settings.optimize_fuse_sum_count_avg) + if (settings.optimize_fuse_sum_count_avg || settings.optimize_syntax_fuse_functions) { FuseSumCountAggregatesVisitor::Data data; FuseSumCountAggregatesVisitor(data).visit(query); diff --git a/src/Interpreters/UserDefinedExecutableFunction.h b/src/Interpreters/UserDefinedExecutableFunction.h index dc5b92ea745..240422a02ca 100644 --- a/src/Interpreters/UserDefinedExecutableFunction.h +++ b/src/Interpreters/UserDefinedExecutableFunction.h @@ -63,7 +63,6 @@ public: std::shared_ptr clone() const override { - std::cerr << "UserDefinedExecutableFunction::clone " << this << std::endl; return std::make_shared(configuration, lifetime, process_pool); } diff --git a/src/Interpreters/ZooKeeperLog.cpp b/src/Interpreters/ZooKeeperLog.cpp index 39bd9a75f3e..fdcbe430834 100644 --- a/src/Interpreters/ZooKeeperLog.cpp +++ b/src/Interpreters/ZooKeeperLog.cpp @@ -19,6 +19,41 @@ namespace DB { + +DataTypePtr getCoordinationErrorCodesEnumType() +{ + return std::make_shared( + DataTypeEnum8::Values + { + {"ZOK", static_cast(Coordination::Error::ZOK)}, + + {"ZSYSTEMERROR", static_cast(Coordination::Error::ZSYSTEMERROR)}, + {"ZRUNTIMEINCONSISTENCY", static_cast(Coordination::Error::ZRUNTIMEINCONSISTENCY)}, + {"ZDATAINCONSISTENCY", static_cast(Coordination::Error::ZDATAINCONSISTENCY)}, + {"ZCONNECTIONLOSS", static_cast(Coordination::Error::ZCONNECTIONLOSS)}, + {"ZMARSHALLINGERROR", static_cast(Coordination::Error::ZMARSHALLINGERROR)}, + {"ZUNIMPLEMENTED", static_cast(Coordination::Error::ZUNIMPLEMENTED)}, + {"ZOPERATIONTIMEOUT", static_cast(Coordination::Error::ZOPERATIONTIMEOUT)}, + {"ZBADARGUMENTS", static_cast(Coordination::Error::ZBADARGUMENTS)}, + {"ZINVALIDSTATE", static_cast(Coordination::Error::ZINVALIDSTATE)}, + + {"ZAPIERROR", static_cast(Coordination::Error::ZAPIERROR)}, + {"ZNONODE", static_cast(Coordination::Error::ZNONODE)}, + {"ZNOAUTH", 
static_cast(Coordination::Error::ZNOAUTH)}, + {"ZBADVERSION", static_cast(Coordination::Error::ZBADVERSION)}, + {"ZNOCHILDRENFOREPHEMERALS", static_cast(Coordination::Error::ZNOCHILDRENFOREPHEMERALS)}, + {"ZNODEEXISTS", static_cast(Coordination::Error::ZNODEEXISTS)}, + {"ZNOTEMPTY", static_cast(Coordination::Error::ZNOTEMPTY)}, + {"ZSESSIONEXPIRED", static_cast(Coordination::Error::ZSESSIONEXPIRED)}, + {"ZINVALIDCALLBACK", static_cast(Coordination::Error::ZINVALIDCALLBACK)}, + {"ZINVALIDACL", static_cast(Coordination::Error::ZINVALIDACL)}, + {"ZAUTHFAILED", static_cast(Coordination::Error::ZAUTHFAILED)}, + {"ZCLOSING", static_cast(Coordination::Error::ZCLOSING)}, + {"ZNOTHING", static_cast(Coordination::Error::ZNOTHING)}, + {"ZSESSIONMOVED", static_cast(Coordination::Error::ZSESSIONMOVED)}, + }); +} + NamesAndTypesList ZooKeeperLogElement::getNamesAndTypes() { auto type_enum = std::make_shared( @@ -52,36 +87,7 @@ NamesAndTypesList ZooKeeperLogElement::getNamesAndTypes() {"SessionID", static_cast(Coordination::OpNum::SessionID)}, }); - auto error_enum = std::make_shared( - DataTypeEnum8::Values - { - {"ZOK", static_cast(Coordination::Error::ZOK)}, - - {"ZSYSTEMERROR", static_cast(Coordination::Error::ZSYSTEMERROR)}, - {"ZRUNTIMEINCONSISTENCY", static_cast(Coordination::Error::ZRUNTIMEINCONSISTENCY)}, - {"ZDATAINCONSISTENCY", static_cast(Coordination::Error::ZDATAINCONSISTENCY)}, - {"ZCONNECTIONLOSS", static_cast(Coordination::Error::ZCONNECTIONLOSS)}, - {"ZMARSHALLINGERROR", static_cast(Coordination::Error::ZMARSHALLINGERROR)}, - {"ZUNIMPLEMENTED", static_cast(Coordination::Error::ZUNIMPLEMENTED)}, - {"ZOPERATIONTIMEOUT", static_cast(Coordination::Error::ZOPERATIONTIMEOUT)}, - {"ZBADARGUMENTS", static_cast(Coordination::Error::ZBADARGUMENTS)}, - {"ZINVALIDSTATE", static_cast(Coordination::Error::ZINVALIDSTATE)}, - - {"ZAPIERROR", static_cast(Coordination::Error::ZAPIERROR)}, - {"ZNONODE", static_cast(Coordination::Error::ZNONODE)}, - {"ZNOAUTH", static_cast(Coordination::Error::ZNOAUTH)}, - {"ZBADVERSION", static_cast(Coordination::Error::ZBADVERSION)}, - {"ZNOCHILDRENFOREPHEMERALS", static_cast(Coordination::Error::ZNOCHILDRENFOREPHEMERALS)}, - {"ZNODEEXISTS", static_cast(Coordination::Error::ZNODEEXISTS)}, - {"ZNOTEMPTY", static_cast(Coordination::Error::ZNOTEMPTY)}, - {"ZSESSIONEXPIRED", static_cast(Coordination::Error::ZSESSIONEXPIRED)}, - {"ZINVALIDCALLBACK", static_cast(Coordination::Error::ZINVALIDCALLBACK)}, - {"ZINVALIDACL", static_cast(Coordination::Error::ZINVALIDACL)}, - {"ZAUTHFAILED", static_cast(Coordination::Error::ZAUTHFAILED)}, - {"ZCLOSING", static_cast(Coordination::Error::ZCLOSING)}, - {"ZNOTHING", static_cast(Coordination::Error::ZNOTHING)}, - {"ZSESSIONMOVED", static_cast(Coordination::Error::ZSESSIONMOVED)}, - }); + auto error_enum = getCoordinationErrorCodesEnumType(); auto watch_type_enum = std::make_shared( DataTypeEnum8::Values diff --git a/src/Interpreters/ZooKeeperLog.h b/src/Interpreters/ZooKeeperLog.h index d3ef68625af..d721081fdae 100644 --- a/src/Interpreters/ZooKeeperLog.h +++ b/src/Interpreters/ZooKeeperLog.h @@ -73,4 +73,6 @@ class ZooKeeperLog : public SystemLog using SystemLog::SystemLog; }; +DataTypePtr getCoordinationErrorCodesEnumType(); + } diff --git a/src/Interpreters/executeQuery.cpp b/src/Interpreters/executeQuery.cpp index ecf2d87dd5c..0b1746feebc 100644 --- a/src/Interpreters/executeQuery.cpp +++ b/src/Interpreters/executeQuery.cpp @@ -582,7 +582,7 @@ static std::tuple executeQueryImpl( auto * queue = context->getAsynchronousInsertQueue(); 
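Editor's note: two guards now sit in front of the asynchronous insert path. `executeQuery.cpp` routes an INSERT to the queue only when the renamed `async_insert` setting is on and the query has inlined data and no SELECT (the condition continues just below), and `AsynchronousInsertQueue::push` additionally rejects unknown input formats with `UNKNOWN_FORMAT`, as added earlier in this patch. A compact standalone sketch of that decision, with simplified inputs:

```cpp
#include <iostream>
#include <set>
#include <stdexcept>
#include <string>

struct InsertQueryInfo
{
    bool has_select = false;
    bool has_inlined_data = false;
    std::string format;
};

// Returns true when the insert should go through the asynchronous queue.
// Throws for unknown input formats, mirroring the UNKNOWN_FORMAT check.
bool routeToAsyncQueue(const InsertQueryInfo & query, bool async_insert_setting,
                       const std::set<std::string> & known_input_formats)
{
    if (!async_insert_setting || query.has_select || !query.has_inlined_data)
        return false;   // fall back to the ordinary synchronous insert

    if (!known_input_formats.count(query.format))
        throw std::runtime_error("Unknown input format " + query.format);

    return true;
}

int main()
{
    std::set<std::string> formats = {"Values", "CSV", "JSONEachRow"};

    InsertQueryInfo ok{false, true, "Values"};
    std::cout << routeToAsyncQueue(ok, /*async_insert=*/true, formats) << '\n';   // 1

    InsertQueryInfo bad{false, true, "NotAFormat"};
    try { routeToAsyncQueue(bad, true, formats); }
    catch (const std::exception & e) { std::cout << e.what() << '\n'; }
}
```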
const bool async_insert = queue && insert_query && !insert_query->select - && insert_query->hasInlinedData() && settings.async_insert_mode; + && insert_query->hasInlinedData() && settings.async_insert; if (async_insert) { diff --git a/src/Interpreters/loadMetadata.cpp b/src/Interpreters/loadMetadata.cpp index 230831a6674..858b4281f5a 100644 --- a/src/Interpreters/loadMetadata.cpp +++ b/src/Interpreters/loadMetadata.cpp @@ -11,6 +11,7 @@ #include #include +#include #include #include @@ -43,7 +44,7 @@ static void executeCreateQuery( interpreter.setInternal(true); interpreter.setForceAttach(true); interpreter.setForceRestoreData(has_force_restore_data_flag); - interpreter.setSkipStartupTables(true); + interpreter.setLoadDatabaseWithoutTables(true); interpreter.execute(); } @@ -161,8 +162,16 @@ void loadMetadata(ContextMutablePtr context, const String & default_database_nam if (create_default_db_if_not_exists && !metadata_dir_for_default_db_already_exists) databases.emplace(default_database_name, path + "/" + escapeForFileName(default_database_name)); + TablesLoader::Databases loaded_databases; for (const auto & [name, db_path] : databases) + { loadDatabase(context, name, db_path, has_force_restore_data_flag); + loaded_databases.insert({name, DatabaseCatalog::instance().getDatabase(name)}); + } + + TablesLoader loader{context, std::move(loaded_databases), has_force_restore_data_flag, /* force_attach */ true}; + loader.loadTables(); + loader.startupTables(); if (has_force_restore_data_flag) { @@ -197,11 +206,28 @@ static void loadSystemDatabaseImpl(ContextMutablePtr context, const String & dat } } + +void startupSystemTables() +{ + ThreadPool pool; + DatabaseCatalog::instance().getSystemDatabase()->startupTables(pool, /* force_restore */ true, /* force_attach */ true); +} + void loadMetadataSystem(ContextMutablePtr context) { loadSystemDatabaseImpl(context, DatabaseCatalog::SYSTEM_DATABASE, "Atomic"); loadSystemDatabaseImpl(context, DatabaseCatalog::INFORMATION_SCHEMA, "Memory"); loadSystemDatabaseImpl(context, DatabaseCatalog::INFORMATION_SCHEMA_UPPERCASE, "Memory"); + + TablesLoader::Databases databases = + { + {DatabaseCatalog::SYSTEM_DATABASE, DatabaseCatalog::instance().getSystemDatabase()}, + {DatabaseCatalog::INFORMATION_SCHEMA, DatabaseCatalog::instance().getDatabase(DatabaseCatalog::INFORMATION_SCHEMA)}, + {DatabaseCatalog::INFORMATION_SCHEMA_UPPERCASE, DatabaseCatalog::instance().getDatabase(DatabaseCatalog::INFORMATION_SCHEMA_UPPERCASE)}, + }; + TablesLoader loader{context, databases, /* force_restore */ true, /* force_attach */ true}; + loader.loadTables(); + /// Will startup tables in system database after all databases are loaded. } } diff --git a/src/Interpreters/loadMetadata.h b/src/Interpreters/loadMetadata.h index 529d2e43fc8..e918b5f530c 100644 --- a/src/Interpreters/loadMetadata.h +++ b/src/Interpreters/loadMetadata.h @@ -1,6 +1,7 @@ #pragma once #include +#include namespace DB @@ -14,4 +15,8 @@ void loadMetadataSystem(ContextMutablePtr context); /// Use separate function to load system tables. void loadMetadata(ContextMutablePtr context, const String & default_database_name = {}); +/// Background operations in system tables may slowdown loading of the rest tables, +/// so we startup system tables after all databases are loaded. 
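Editor's note: the comment above states the ordering rationale: system tables are loaded first but started only after every ordinary database has been loaded and started, so their background work cannot slow down the rest of startup. A simplified standalone sketch of that load/startup phasing, with toy types rather than the real `TablesLoader`:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Toy two-phase loader: loading parses metadata, startup kicks off background work.
struct Database
{
    std::string name;
    void loadTables()    { std::cout << "load "    << name << '\n'; }
    void startupTables() { std::cout << "startup " << name << '\n'; }
};

int main()
{
    Database system{"system"};
    std::vector<Database> ordinary{{"default"}, {"analytics"}};

    // 1. Load system tables, but do not start them yet (loadMetadataSystem).
    system.loadTables();

    // 2. Load and then start all ordinary databases (loadMetadata + TablesLoader).
    for (auto & db : ordinary)
        db.loadTables();
    for (auto & db : ordinary)
        db.startupTables();

    // 3. Only now start system tables (startupSystemTables).
    system.startupTables();
}
```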
+void startupSystemTables(); + } diff --git a/src/Interpreters/ya.make b/src/Interpreters/ya.make index e8b33d09914..f24a10e55df 100644 --- a/src/Interpreters/ya.make +++ b/src/Interpreters/ya.make @@ -28,6 +28,7 @@ SRCS( ApplyWithSubqueryVisitor.cpp ArithmeticOperationsInAgrFuncOptimize.cpp ArrayJoinAction.cpp + AsynchronousInsertQueue.cpp AsynchronousMetricLog.cpp AsynchronousMetrics.cpp BloomFilter.cpp @@ -64,6 +65,7 @@ SRCS( ExtractExpressionInfoVisitor.cpp FillingRow.cpp FunctionNameNormalizer.cpp + GatherFunctionQuantileVisitor.cpp HashJoin.cpp IExternalLoadable.cpp IInterpreter.cpp diff --git a/src/Parsers/ASTSystemQuery.cpp b/src/Parsers/ASTSystemQuery.cpp index 7575f2718df..ba8e49b98ca 100644 --- a/src/Parsers/ASTSystemQuery.cpp +++ b/src/Parsers/ASTSystemQuery.cpp @@ -3,14 +3,46 @@ #include #include +#include namespace DB { +namespace +{ + std::vector getTypeIndexToTypeName() + { + constexpr std::size_t types_size = magic_enum::enum_count(); + + std::vector type_index_to_type_name; + type_index_to_type_name.resize(types_size); + + auto entries = magic_enum::enum_entries(); + for (const auto & [entry, str] : entries) + { + auto str_copy = String(str); + std::replace(str_copy.begin(), str_copy.end(), '_', ' '); + type_index_to_type_name[static_cast(entry)] = std::move(str_copy); + } + + return type_index_to_type_name; + } +} + +const char * ASTSystemQuery::typeToString(Type type) +{ + /** During parsing, if a SYSTEM query is not parsed properly, its type is added to the Expected variants as a description (see IParser.h). + * The description string must be statically allocated. + */ + static std::vector type_index_to_type_name = getTypeIndexToTypeName(); + const auto & type_name = type_index_to_type_name[static_cast(type)]; + return type_name.data(); +} + void ASTSystemQuery::formatImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const { settings.ostr << (settings.hilite ? hilite_keyword : "") << "SYSTEM "; - settings.ostr << type << (settings.hilite ? hilite_none : ""); + settings.ostr << typeToString(type) << (settings.hilite ?
hilite_none : ""); auto print_database_table = [&] { diff --git a/src/Parsers/ASTSystemQuery.h b/src/Parsers/ASTSystemQuery.h index f55ccc59160..2248818fd0f 100644 --- a/src/Parsers/ASTSystemQuery.h +++ b/src/Parsers/ASTSystemQuery.h @@ -15,7 +15,7 @@ class ASTSystemQuery : public IAST, public ASTQueryWithOnCluster { public: - enum class Type + enum class Type : UInt64 { UNKNOWN, SHUTDOWN, @@ -61,9 +61,13 @@ public: FLUSH_DISTRIBUTED, STOP_DISTRIBUTED_SENDS, START_DISTRIBUTED_SENDS, + START_THREAD_FUZZER, + STOP_THREAD_FUZZER, END }; + static const char * typeToString(Type type); + Type type = Type::UNKNOWN; String target_model; diff --git a/src/Parsers/ParserSystemQuery.cpp b/src/Parsers/ParserSystemQuery.cpp index 5381566263e..81afdad9a6e 100644 --- a/src/Parsers/ParserSystemQuery.cpp +++ b/src/Parsers/ParserSystemQuery.cpp @@ -6,7 +6,7 @@ #include #include -#include +#include #include namespace ErrorCodes @@ -70,16 +70,11 @@ bool ParserSystemQuery::parseImpl(IParser::Pos & pos, ASTPtr & node, Expected & bool found = false; - // If query is executed on single replica, we want to parse input like FLUSH DISTRIBUTED - // If query is executed on cluster, we also want to parse serialized input like FLUSH_DISTRIBUTED - for (const auto & [entry, str] : magic_enum::enum_entries()) + for (const auto & type : magic_enum::enum_values()) { - String underscore_to_space(str); - std::replace(underscore_to_space.begin(), underscore_to_space.end(), '_', ' '); - - if (ParserKeyword(underscore_to_space).ignore(pos, expected) || ParserKeyword(str).ignore(pos, expected)) + if (ParserKeyword{ASTSystemQuery::typeToString(type)}.ignore(pos, expected)) { - res->type = entry; + res->type = type; found = true; break; } diff --git a/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp b/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp index 1d5f7d5e2f8..2b568166d5b 100644 --- a/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp @@ -7,15 +7,17 @@ #include #include -#include #include #include +#include +#include #include -#include #include #include #include +#include +#include namespace DB { @@ -50,95 +52,213 @@ void MsgPackVisitor::reset() info_stack = {}; } -void MsgPackVisitor::insert_integer(UInt64 value) // NOLINT +template +static bool checkAndInsertNullable(IColumn & column, DataTypePtr type, InsertFunc insert_func) { - Info & info = info_stack.top(); - switch (info.type->getTypeId()) + if (type->isNullable()) + { + auto & nullable_column = assert_cast(column); + auto & nested_column = nullable_column.getNestedColumn(); + const auto & nested_type = assert_cast(type.get())->getNestedType(); + insert_func(nested_column, nested_type); + nullable_column.getNullMapColumn().insertValue(0); + return true; + } + + return false; +} + +template +static bool checkAndInsertLowCardinality(IColumn & column, DataTypePtr type, InsertFunc insert_func) +{ + if (type->lowCardinality()) + { + auto & lc_column = assert_cast(column); + auto tmp_column = lc_column.getDictionary().getNestedColumn()->cloneEmpty(); + auto dict_type = assert_cast(type.get())->getDictionaryType(); + insert_func(*tmp_column, dict_type); + lc_column.insertFromFullColumn(*tmp_column, 0); + return true; + } + return false; +} + +static void insertInteger(IColumn & column, DataTypePtr type, UInt64 value) +{ + auto insert_func = [&](IColumn & column_, DataTypePtr type_) + { + insertInteger(column_, type_, value); + }; + + if (checkAndInsertNullable(column, type, insert_func) || 
checkAndInsertLowCardinality(column, type, insert_func)) + return; + + switch (type->getTypeId()) { case TypeIndex::UInt8: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::Date: [[fallthrough]]; case TypeIndex::UInt16: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::DateTime: [[fallthrough]]; case TypeIndex::UInt32: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::UInt64: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::Int8: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::Int16: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::Int32: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::Int64: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } case TypeIndex::DateTime64: { - assert_cast(info.column).insertValue(value); + assert_cast(column).insertValue(value); break; } default: - throw Exception("Type " + info.type->getName() + " is not supported for MsgPack input format", ErrorCodes::ILLEGAL_COLUMN); + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack integer into column with type {}.", type->getName()); } } +static void insertString(IColumn & column, DataTypePtr type, const char * value, size_t size) +{ + auto insert_func = [&](IColumn & column_, DataTypePtr type_) + { + insertString(column_, type_, value, size); + }; + + if (checkAndInsertNullable(column, type, insert_func) || checkAndInsertLowCardinality(column, type, insert_func)) + return; + + if (!isStringOrFixedString(type)) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack string into column with type {}.", type->getName()); + + column.insertData(value, size); +} + +static void insertFloat32(IColumn & column, DataTypePtr type, Float32 value) // NOLINT +{ + auto insert_func = [&](IColumn & column_, DataTypePtr type_) + { + insertFloat32(column_, type_, value); + }; + + if (checkAndInsertNullable(column, type, insert_func) || checkAndInsertLowCardinality(column, type, insert_func)) + return; + + if (!WhichDataType(type).isFloat32()) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack float32 into column with type {}.", type->getName()); + + assert_cast(column).insertValue(value); +} + +static void insertFloat64(IColumn & column, DataTypePtr type, Float64 value) // NOLINT +{ + auto insert_func = [&](IColumn & column_, DataTypePtr type_) + { + insertFloat64(column_, type_, value); + }; + + if (checkAndInsertNullable(column, type, insert_func) || checkAndInsertLowCardinality(column, type, insert_func)) + return; + + if (!WhichDataType(type).isFloat64()) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack float64 into column with type {}.", type->getName()); + + assert_cast(column).insertValue(value); +} + +static void insertNull(IColumn & column, DataTypePtr type) +{ + auto insert_func = [&](IColumn & column_, DataTypePtr type_) + { + insertNull(column_, type_); + }; + + /// LowCardinality(Nullable(...)) + if (checkAndInsertLowCardinality(column, type, insert_func)) + return; + + if (!type->isNullable()) + throw 
Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack null into non-nullable column with type {}.", type->getName()); + + assert_cast(column).insertDefault(); +} + bool MsgPackVisitor::visit_positive_integer(UInt64 value) // NOLINT { - insert_integer(value); + insertInteger(info_stack.top().column, info_stack.top().type, value); return true; } bool MsgPackVisitor::visit_negative_integer(Int64 value) // NOLINT { - insert_integer(value); + insertInteger(info_stack.top().column, info_stack.top().type, value); return true; } -bool MsgPackVisitor::visit_str(const char* value, size_t size) // NOLINT +bool MsgPackVisitor::visit_str(const char * value, size_t size) // NOLINT { - info_stack.top().column.insertData(value, size); + insertString(info_stack.top().column, info_stack.top().type, value, size); + return true; +} + +bool MsgPackVisitor::visit_bin(const char * value, size_t size) // NOLINT +{ + insertString(info_stack.top().column, info_stack.top().type, value, size); return true; } bool MsgPackVisitor::visit_float32(Float32 value) // NOLINT { - assert_cast(info_stack.top().column).insertValue(value); + insertFloat32(info_stack.top().column, info_stack.top().type, value); return true; } bool MsgPackVisitor::visit_float64(Float64 value) // NOLINT { - assert_cast(info_stack.top().column).insertValue(value); + insertFloat64(info_stack.top().column, info_stack.top().type, value); + return true; +} + +bool MsgPackVisitor::visit_boolean(bool value) +{ + insertInteger(info_stack.top().column, info_stack.top().type, UInt64(value)); return true; } bool MsgPackVisitor::start_array(size_t size) // NOLINT { + if (!isArray(info_stack.top().type)) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack array into column with type {}.", info_stack.top().type->getName()); + auto nested_type = assert_cast(*info_stack.top().type).getNestedType(); ColumnArray & column_array = assert_cast(info_stack.top().column); ColumnArray::Offsets & offsets = column_array.getOffsets(); @@ -154,6 +274,50 @@ bool MsgPackVisitor::end_array() // NOLINT return true; } +bool MsgPackVisitor::start_map(uint32_t size) // NOLINT +{ + if (!isMap(info_stack.top().type)) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot insert MessagePack map into column with type {}.", info_stack.top().type->getName()); + ColumnArray & column_array = assert_cast(info_stack.top().column).getNestedColumn(); + ColumnArray::Offsets & offsets = column_array.getOffsets(); + offsets.push_back(offsets.back() + size); + return true; +} + +bool MsgPackVisitor::start_map_key() // NOLINT +{ + auto key_column = assert_cast(info_stack.top().column).getNestedData().getColumns()[0]; + auto key_type = assert_cast(*info_stack.top().type).getKeyType(); + info_stack.push(Info{*key_column, key_type}); + return true; +} + +bool MsgPackVisitor::end_map_key() // NOLINT +{ + info_stack.pop(); + return true; +} + +bool MsgPackVisitor::start_map_value() // NOLINT +{ + auto value_column = assert_cast(info_stack.top().column).getNestedData().getColumns()[1]; + auto value_type = assert_cast(*info_stack.top().type).getValueType(); + info_stack.push(Info{*value_column, value_type}); + return true; +} + +bool MsgPackVisitor::end_map_value() // NOLINT +{ + info_stack.pop(); + return true; +} + +bool MsgPackVisitor::visit_nil() +{ + insertNull(info_stack.top().column, info_stack.top().type); + return true; +} + void MsgPackVisitor::parse_error(size_t, size_t) // NOLINT { throw Exception("Error occurred while parsing msgpack data.", 
ErrorCodes::INCORRECT_DATA); diff --git a/src/Processors/Formats/Impl/MsgPackRowInputFormat.h b/src/Processors/Formats/Impl/MsgPackRowInputFormat.h index ac44772929a..1d790b4485b 100644 --- a/src/Processors/Formats/Impl/MsgPackRowInputFormat.h +++ b/src/Processors/Formats/Impl/MsgPackRowInputFormat.h @@ -33,22 +33,27 @@ public: bool visit_negative_integer(Int64 value); bool visit_float32(Float32 value); bool visit_float64(Float64 value); - bool visit_str(const char* value, size_t size); + bool visit_str(const char * value, size_t size); + bool visit_bin(const char * value, size_t size); + bool visit_boolean(bool value); bool start_array(size_t size); bool end_array(); + bool visit_nil(); + bool start_map(uint32_t size); + bool start_map_key(); + bool end_map_key(); + bool start_map_value(); + bool end_map_value(); /// This function will be called if error occurs in parsing [[noreturn]] void parse_error(size_t parsed_offset, size_t error_offset); /// Update info_stack void set_info(IColumn & column, DataTypePtr type); - - void insert_integer(UInt64 value); - void reset(); private: - /// Stack is needed to process nested arrays + /// Stack is needed to process arrays and maps std::stack info_stack; }; diff --git a/src/Processors/Formats/Impl/MsgPackRowOutputFormat.cpp b/src/Processors/Formats/Impl/MsgPackRowOutputFormat.cpp index bb20d3d9899..27f736128f7 100644 --- a/src/Processors/Formats/Impl/MsgPackRowOutputFormat.cpp +++ b/src/Processors/Formats/Impl/MsgPackRowOutputFormat.cpp @@ -6,15 +6,18 @@ #include #include -#include #include #include +#include +#include #include #include #include #include #include +#include +#include namespace DB { @@ -91,15 +94,15 @@ void MsgPackRowOutputFormat::serializeField(const IColumn & column, DataTypePtr case TypeIndex::String: { const StringRef & string = assert_cast(column).getDataAt(row_num); - packer.pack_str(string.size); - packer.pack_str_body(string.data, string.size); + packer.pack_bin(string.size); + packer.pack_bin_body(string.data, string.size); return; } case TypeIndex::FixedString: { const StringRef & string = assert_cast(column).getDataAt(row_num); - packer.pack_str(string.size); - packer.pack_str_body(string.data, string.size); + packer.pack_bin(string.size); + packer.pack_bin_body(string.data, string.size); return; } case TypeIndex::Array: @@ -132,6 +135,35 @@ void MsgPackRowOutputFormat::serializeField(const IColumn & column, DataTypePtr packer.pack_nil(); return; } + case TypeIndex::Map: + { + const auto & map_column = assert_cast(column); + const auto & nested_column = map_column.getNestedColumn(); + const auto & key_value_columns = map_column.getNestedData().getColumns(); + const auto & key_column = key_value_columns[0]; + const auto & value_column = key_value_columns[1]; + + const auto & map_type = assert_cast(*data_type); + const auto & offsets = nested_column.getOffsets(); + size_t offset = offsets[row_num - 1]; + size_t size = offsets[row_num] - offset; + packer.pack_map(size); + for (size_t i = 0; i < size; ++i) + { + serializeField(*key_column, map_type.getKeyType(), offset + i); + serializeField(*value_column, map_type.getValueType(), offset + i); + } + return; + } + case TypeIndex::LowCardinality: + { + const auto & lc_column = assert_cast(column); + auto dict_type = assert_cast(data_type.get())->getDictionaryType(); + auto dict_column = lc_column.getDictionary().getNestedColumn(); + size_t index = lc_column.getIndexAt(row_num); + serializeField(*dict_column, dict_type, index); + return; + } default: break; } diff --git 
a/src/Processors/ya.make b/src/Processors/ya.make index 7d1bf047712..f2063609440 100644 --- a/src/Processors/ya.make +++ b/src/Processors/ya.make @@ -28,6 +28,7 @@ SRCS( Executors/PollingQueue.cpp Executors/PullingAsyncPipelineExecutor.cpp Executors/PullingPipelineExecutor.cpp + Executors/StreamingFormatExecutor.cpp ForkProcessor.cpp Formats/IInputFormat.cpp Formats/IOutputFormat.cpp @@ -58,12 +59,8 @@ SRCS( Formats/Impl/MySQLOutputFormat.cpp Formats/Impl/NullFormat.cpp Formats/Impl/ODBCDriver2BlockOutputFormat.cpp - Formats/Impl/ORCBlockInputFormat.cpp - Formats/Impl/ORCBlockOutputFormat.cpp Formats/Impl/ParallelFormattingOutputFormat.cpp Formats/Impl/ParallelParsingInputFormat.cpp - Formats/Impl/ParquetBlockInputFormat.cpp - Formats/Impl/ParquetBlockOutputFormat.cpp Formats/Impl/PostgreSQLOutputFormat.cpp Formats/Impl/PrettyBlockOutputFormat.cpp Formats/Impl/PrettyCompactBlockOutputFormat.cpp diff --git a/src/Storages/ExternalDataSourceConfiguration.cpp b/src/Storages/ExternalDataSourceConfiguration.cpp new file mode 100644 index 00000000000..69af9424eaf --- /dev/null +++ b/src/Storages/ExternalDataSourceConfiguration.cpp @@ -0,0 +1,306 @@ +#include "ExternalDataSourceConfiguration.h" + +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int BAD_ARGUMENTS; +} + +String ExternalDataSourceConfiguration::toString() const +{ + WriteBufferFromOwnString configuration_info; + configuration_info << "username: " << username << "\t"; + if (addresses.empty()) + { + configuration_info << "host: " << host << "\t"; + configuration_info << "port: " << port << "\t"; + } + else + { + for (const auto & [replica_host, replica_port] : addresses) + { + configuration_info << "host: " << replica_host << "\t"; + configuration_info << "port: " << replica_port << "\t"; + } + } + return configuration_info.str(); +} + + +void ExternalDataSourceConfiguration::set(const ExternalDataSourceConfiguration & conf) +{ + host = conf.host; + port = conf.port; + username = conf.username; + password = conf.password; + database = conf.database; + table = conf.table; + schema = conf.schema; +} + + +std::optional getExternalDataSourceConfiguration(const ASTs & args, ContextPtr context, bool is_database_engine) +{ + if (args.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "External data source must have arguments"); + + ExternalDataSourceConfiguration configuration; + StorageSpecificArgs non_common_args; + + if (const auto * collection = typeid_cast(args[0].get())) + { + const auto & config = context->getConfigRef(); + const auto & config_prefix = fmt::format("named_collections.{}", collection->name()); + + if (!config.has(config_prefix)) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is no collection named `{}` in config", collection->name()); + + configuration.host = config.getString(config_prefix + ".host", ""); + configuration.port = config.getInt(config_prefix + ".port", 0); + configuration.username = config.getString(config_prefix + ".user", ""); + configuration.password = config.getString(config_prefix + ".password", ""); + configuration.database = config.getString(config_prefix + ".database", ""); + configuration.table = config.getString(config_prefix + ".table", ""); + configuration.schema = config.getString(config_prefix + ".schema", ""); + + if ((args.size() == 1) && (configuration.host.empty() || configuration.port == 0 + || configuration.username.empty() || configuration.password.empty() + || configuration.database.empty() || 
(configuration.table.empty() && !is_database_engine))) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Named collection of connection parameters is missing some of the parameters and no key-value arguments are added"); + } + + for (size_t i = 1; i < args.size(); ++i) + { + if (const auto * ast_function = typeid_cast(args[i].get())) + { + const auto * args_expr = assert_cast(ast_function->arguments.get()); + auto function_args = args_expr->children; + if (function_args.size() != 2) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected key-value defined argument"); + + auto arg_name = function_args[0]->as()->name(); + auto arg_value = evaluateConstantExpressionOrIdentifierAsLiteral(function_args[1], context)->as()->value; + + if (arg_name == "host") + configuration.host = arg_value.safeGet(); + else if (arg_name == "port") + configuration.port = arg_value.safeGet(); + else if (arg_name == "user") + configuration.username = arg_value.safeGet(); + else if (arg_name == "password") + configuration.password = arg_value.safeGet(); + else if (arg_name == "database") + configuration.database = arg_value.safeGet(); + else if (arg_name == "table") + configuration.table = arg_value.safeGet(); + else if (arg_name == "schema") + configuration.schema = arg_value.safeGet(); + else + non_common_args.emplace_back(std::make_pair(arg_name, arg_value)); + } + else + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected key-value defined argument"); + } + } + + ExternalDataSourceConfig source_config{ .configuration = configuration, .specific_args = non_common_args }; + return source_config; + } + return std::nullopt; +} + + +ExternalDataSourceConfiguration getExternalDataSourceConfiguration( + const Poco::Util::AbstractConfiguration & dict_config, const String & dict_config_prefix, ContextPtr context) +{ + ExternalDataSourceConfiguration configuration; + + auto collection_name = dict_config.getString(dict_config_prefix + ".name", ""); + if (!collection_name.empty()) + { + const auto & config = context->getConfigRef(); + const auto & config_prefix = fmt::format("named_collections.{}", collection_name); + + if (!config.has(config_prefix)) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is no collection named `{}` in config", collection_name); + + configuration.host = dict_config.getString(dict_config_prefix + ".host", config.getString(config_prefix + ".host", "")); + configuration.port = dict_config.getInt(dict_config_prefix + ".port", config.getUInt(config_prefix + ".port", 0)); + configuration.username = dict_config.getString(dict_config_prefix + ".user", config.getString(config_prefix + ".user", "")); + configuration.password = dict_config.getString(dict_config_prefix + ".password", config.getString(config_prefix + ".password", "")); + configuration.database = dict_config.getString(dict_config_prefix + ".db", config.getString(config_prefix + ".database", "")); + configuration.table = dict_config.getString(dict_config_prefix + ".table", config.getString(config_prefix + ".table", "")); + configuration.schema = dict_config.getString(dict_config_prefix + ".schema", config.getString(config_prefix + ".schema", "")); + + if (configuration.host.empty() || configuration.port == 0 || configuration.username.empty() || configuration.password.empty() + || configuration.database.empty() || configuration.table.empty()) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Named collection of connection parameters is missing some of the parameters and dictionary parameters are added"); + } + } + else + { + 
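+        /// No named collection is referenced, so all connection parameters are taken directly
+        /// from the dictionary configuration section.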
configuration.host = dict_config.getString(dict_config_prefix + ".host", ""); + configuration.port = dict_config.getUInt(dict_config_prefix + ".port", 0); + configuration.username = dict_config.getString(dict_config_prefix + ".user", ""); + configuration.password = dict_config.getString(dict_config_prefix + ".password", ""); + configuration.database = dict_config.getString(dict_config_prefix + ".db", ""); + configuration.table = dict_config.getString(fmt::format("{}.table", dict_config_prefix), ""); + configuration.schema = dict_config.getString(fmt::format("{}.schema", dict_config_prefix), ""); + } + return configuration; +} + + +ExternalDataSourcesByPriority getExternalDataSourceConfigurationByPriority( + const Poco::Util::AbstractConfiguration & dict_config, const String & dict_config_prefix, ContextPtr context) +{ + auto common_configuration = getExternalDataSourceConfiguration(dict_config, dict_config_prefix, context); + ExternalDataSourcesByPriority configuration + { + .database = common_configuration.database, + .table = common_configuration.table, + .schema = common_configuration.schema, + .replicas_configurations = {} + }; + + if (dict_config.has(dict_config_prefix + ".replica")) + { + Poco::Util::AbstractConfiguration::Keys config_keys; + dict_config.keys(dict_config_prefix, config_keys); + + for (const auto & config_key : config_keys) + { + if (config_key.starts_with("replica")) + { + ExternalDataSourceConfiguration replica_configuration(common_configuration); + String replica_name = dict_config_prefix + "." + config_key; + size_t priority = dict_config.getInt(replica_name + ".priority", 0); + + replica_configuration.host = dict_config.getString(replica_name + ".host", common_configuration.host); + replica_configuration.port = dict_config.getUInt(replica_name + ".port", common_configuration.port); + replica_configuration.username = dict_config.getString(replica_name + ".user", common_configuration.username); + replica_configuration.password = dict_config.getString(replica_name + ".password", common_configuration.password); + + if (replica_configuration.host.empty() || replica_configuration.port == 0 + || replica_configuration.username.empty() || replica_configuration.password.empty()) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Named collection of connection parameters is missing some of the parameters and no other dictionary parameters are added"); + } + + configuration.replicas_configurations[priority].emplace_back(replica_configuration); + } + } + } + else + { + configuration.replicas_configurations[0].emplace_back(common_configuration); + } + + return configuration; +} + + +void URLBasedDataSourceConfiguration::set(const URLBasedDataSourceConfiguration & conf) +{ + url = conf.url; + format = conf.format; + compression_method = conf.compression_method; + structure = conf.structure; +} + + +std::optional getURLBasedDataSourceConfiguration(const ASTs & args, ContextPtr context) +{ + if (args.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "External data source must have arguments"); + + URLBasedDataSourceConfiguration configuration; + StorageSpecificArgs non_common_args; + + if (const auto * collection = typeid_cast(args[0].get())) + { + const auto & config = context->getConfigRef(); + auto config_prefix = fmt::format("named_collections.{}", collection->name()); + + if (!config.has(config_prefix)) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "There is no collection named `{}` in config", collection->name()); + + Poco::Util::AbstractConfiguration::Keys keys; + 
config.keys(config_prefix, keys); + for (const auto & key : keys) + { + if (key == "url") + { + configuration.url = config.getString(config_prefix + ".url", ""); + } + else if (key == "headers") + { + Poco::Util::AbstractConfiguration::Keys header_keys; + config.keys(config_prefix + ".headers", header_keys); + for (const auto & header : header_keys) + { + const auto header_prefix = config_prefix + ".headers." + header; + configuration.headers.emplace_back(std::make_pair(config.getString(header_prefix + ".name"), config.getString(header_prefix + ".value"))); + } + } + else + non_common_args.emplace_back(std::make_pair(key, config.getString(config_prefix + '.' + key))); + } + + for (size_t i = 1; i < args.size(); ++i) + { + if (const auto * ast_function = typeid_cast(args[i].get())) + { + const auto * args_expr = assert_cast(ast_function->arguments.get()); + auto function_args = args_expr->children; + if (function_args.size() != 2) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected key-value defined argument"); + + auto arg_name = function_args[0]->as()->name(); + auto arg_value = evaluateConstantExpressionOrIdentifierAsLiteral(function_args[1], context)->as()->value; + + if (arg_name == "url") + configuration.url = arg_value.safeGet(); + else if (arg_name == "format") + configuration.format = arg_value.safeGet(); + else if (arg_name == "compression_method") + configuration.compression_method = arg_value.safeGet(); + else if (arg_name == "structure") + configuration.structure = arg_value.safeGet(); + else + non_common_args.emplace_back(std::make_pair(arg_name, arg_value)); + } + else + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected key-value defined argument"); + } + } + + if (configuration.url.empty() || configuration.format.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Storage requires {}", configuration.url.empty() ? "url" : "format"); + + URLBasedDataSourceConfig source_config{ .configuration = configuration, .specific_args = non_common_args }; + return source_config; + } + return std::nullopt; +} + +} diff --git a/src/Storages/ExternalDataSourceConfiguration.h b/src/Storages/ExternalDataSourceConfiguration.h new file mode 100644 index 00000000000..df3b4f6da1f --- /dev/null +++ b/src/Storages/ExternalDataSourceConfiguration.h @@ -0,0 +1,115 @@ +#pragma once + +#include +#include +#include + + +namespace DB +{ + +struct ExternalDataSourceConfiguration +{ + String host; + UInt16 port = 0; + String username; + String password; + String database; + String table; + String schema; + + std::vector> addresses; /// Failover replicas. + + String toString() const; + + void set(const ExternalDataSourceConfiguration & conf); +}; + +using ExternalDataSourceConfigurationPtr = std::shared_ptr; + + +struct StoragePostgreSQLConfiguration : ExternalDataSourceConfiguration +{ + String on_conflict; +}; + + +struct StorageMySQLConfiguration : ExternalDataSourceConfiguration +{ + bool replace_query = false; + String on_duplicate_clause; +}; + +struct StorageMongoDBConfiguration : ExternalDataSourceConfiguration +{ + String collection; + String options; +}; + + +using StorageSpecificArgs = std::vector>; + +struct ExternalDataSourceConfig +{ + ExternalDataSourceConfiguration configuration; + StorageSpecificArgs specific_args; +}; + +/* If there is a storage engine's configuration specified in the named_collections, + * this function returns valid for usage ExternalDataSourceConfiguration struct + * otherwise std::nullopt is returned. 
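+ *
+ * For illustration (the collection name and credentials here are only an example), a named collection
+ * in the server configuration could look like:
+ *
+ *   <named_collections>
+ *       <postgresql_configuration>
+ *           <host>127.0.0.1</host>
+ *           <port>5432</port>
+ *           <user>postgres</user>
+ *           <password>secret</password>
+ *           <database>postgres_database</database>
+ *       </postgresql_configuration>
+ *   </named_collections>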
+ * + * If any configuration options are provided as key-value engine arguments, they will override + * configuration values, i.e. ENGINE = PostgreSQL(postgresql_configuration, database = 'postgres_database'); + * + * Any key-value engine argument except common (`host`, `port`, `username`, `password`, `database`) + * is returned in EngineArgs struct. + */ +std::optional getExternalDataSourceConfiguration(const ASTs & args, ContextPtr context, bool is_database_engine = false); + +ExternalDataSourceConfiguration getExternalDataSourceConfiguration( + const Poco::Util::AbstractConfiguration & dict_config, const String & dict_config_prefix, ContextPtr context); + + +/// Highest priority is 0, the bigger the number in map, the less the priority. +using ExternalDataSourcesConfigurationByPriority = std::map>; + +struct ExternalDataSourcesByPriority +{ + String database; + String table; + String schema; + ExternalDataSourcesConfigurationByPriority replicas_configurations; +}; + +ExternalDataSourcesByPriority +getExternalDataSourceConfigurationByPriority(const Poco::Util::AbstractConfiguration & dict_config, const String & dict_config_prefix, ContextPtr context); + + +struct URLBasedDataSourceConfiguration +{ + String url; + String format; + String compression_method = "auto"; + String structure; + + std::vector> headers; + + void set(const URLBasedDataSourceConfiguration & conf); +}; + +struct StorageS3Configuration : URLBasedDataSourceConfiguration +{ + String access_key_id; + String secret_access_key; +}; + +struct URLBasedDataSourceConfig +{ + URLBasedDataSourceConfiguration configuration; + StorageSpecificArgs specific_args; +}; + +std::optional getURLBasedDataSourceConfiguration(const ASTs & args, ContextPtr context); + +} diff --git a/src/Storages/MergeTree/BackgroundJobsAssignee.h b/src/Storages/MergeTree/BackgroundJobsAssignee.h index 2be92502347..4ccb4c0169b 100644 --- a/src/Storages/MergeTree/BackgroundJobsAssignee.h +++ b/src/Storages/MergeTree/BackgroundJobsAssignee.h @@ -71,7 +71,7 @@ public: void scheduleMoveTask(ExecutableTaskPtr move_task); /// Just call finish - virtual ~BackgroundJobsAssignee(); + ~BackgroundJobsAssignee(); BackgroundJobsAssignee( MergeTreeData & data_, diff --git a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp index 6eb9f55a233..304aee86e44 100644 --- a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp +++ b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp @@ -114,6 +114,7 @@ void MergeTreeBackgroundExecutor::routine(TaskRuntimeDataPtr item) /// This is significant to order the destructors. item->task.reset(); + item->is_done.set(); return; } @@ -149,6 +150,7 @@ void MergeTreeBackgroundExecutor::routine(TaskRuntimeDataPtr item) /// The thread that shutdowns storage will scan queues in order to find some tasks to wait for, but will find nothing. /// So, the destructor of a task and the destructor of a storage will be executed concurrently. item->task.reset(); + item->is_done.set(); } } @@ -176,13 +178,7 @@ void MergeTreeBackgroundExecutor::threadFunction() active.push_back(item); } - routine(item); - - /// When storage shutdowns it will wait until all related background tasks - /// are finished, because they may want to interact with its fields - /// and this will cause segfault. - if (item->is_currently_deleting) - item->is_done.set(); + routine(std::move(item)); } catch (...) 
{ diff --git a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h index 27a08f4628a..6c3d904b3f0 100644 --- a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h +++ b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h @@ -128,7 +128,7 @@ private: ExecutableTaskPtr task; CurrentMetrics::Increment increment; - std::atomic_bool is_currently_deleting{false}; + bool is_currently_deleting{false}; /// Actually autoreset=false is needed only for unit test /// where multiple threads could remove tasks corresponding to the same storage /// This scenario is not possible in reality. diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index ee1387af49b..00e7cb09137 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -978,6 +978,7 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) DataPartsVector broken_parts_to_detach; size_t suspicious_broken_parts = 0; + size_t suspicious_broken_parts_bytes = 0; std::atomic has_adaptive_parts = false; std::atomic has_non_adaptive_parts = false; @@ -1004,17 +1005,18 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) if (part_disk_ptr->exists(marker_path)) { + /// NOTE: getBytesOnDisk() cannot be used here, since it may be zero if checksums.txt does not exist + size_t size_of_part = IMergeTreeDataPart::calculateTotalSizeOnDisk(part->volume->getDisk(), part->getFullRelativePath()); LOG_WARNING(log, - "Detaching stale part {}{}, which should have been deleted after a move. That can only happen " - "after unclean restart of ClickHouse after move of a part having an operation blocking that " - "stale copy of part.", - getFullPathOnDisk(part_disk_ptr), part_name); - + "Detaching stale part {}{} (size: {}), which should have been deleted after a move. " + "That can only happen after unclean restart of ClickHouse after move of a part having an operation blocking that stale copy of part.", + getFullPathOnDisk(part_disk_ptr), part_name, formatReadableSizeWithBinarySuffix(size_of_part)); std::lock_guard loading_lock(mutex); broken_parts_to_detach.push_back(part); ++suspicious_broken_parts; + suspicious_broken_parts_bytes += size_of_part; return; } @@ -1043,16 +1045,20 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) /// Ignore broken parts that can appear as a result of hard server restart. if (broken) { - LOG_ERROR(log, - "Detaching broken part {}{}. If it happened after update, it is likely because of backward " - "incompatibility. You need to resolve this manually", - getFullPathOnDisk(part_disk_ptr), part_name); + /// NOTE: getBytesOnDisk() cannot be used here, since it may be zero if checksums.txt does not exist + size_t size_of_part = IMergeTreeDataPart::calculateTotalSizeOnDisk(part->volume->getDisk(), part->getFullRelativePath()); + LOG_ERROR(log, + "Detaching broken part {}{} (size: {}). " + "If it happened after update, it is likely because of backward incompatibility. 
" + "You need to resolve this manually", + getFullPathOnDisk(part_disk_ptr), part_name, formatReadableSizeWithBinarySuffix(size_of_part)); std::lock_guard loading_lock(mutex); broken_parts_to_detach.push_back(part); ++suspicious_broken_parts; + suspicious_broken_parts_bytes += size_of_part; return; } @@ -1099,8 +1105,14 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) has_non_adaptive_index_granularity_parts = has_non_adaptive_parts; if (suspicious_broken_parts > settings->max_suspicious_broken_parts && !skip_sanity_checks) - throw Exception("Suspiciously many (" + toString(suspicious_broken_parts) + ") broken parts to remove.", - ErrorCodes::TOO_MANY_UNEXPECTED_DATA_PARTS); + throw Exception(ErrorCodes::TOO_MANY_UNEXPECTED_DATA_PARTS, + "Suspiciously many ({}) broken parts to remove.", + suspicious_broken_parts); + + if (suspicious_broken_parts_bytes > settings->max_suspicious_broken_parts_bytes && !skip_sanity_checks) + throw Exception(ErrorCodes::TOO_MANY_UNEXPECTED_DATA_PARTS, + "Suspiciously big size ({}) of all broken parts to remove.", + formatReadableSizeWithBinarySuffix(suspicious_broken_parts_bytes)); for (auto & part : broken_parts_to_detach) part->renameToDetached("broken-on-start"); /// detached parts must not have '_' in prefixes diff --git a/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp b/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp index 83c29c40c3f..cec09f53cdf 100644 --- a/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexBloomFilter.cpp @@ -46,20 +46,29 @@ MergeTreeIndexGranulePtr MergeTreeIndexBloomFilter::createIndexGranule() const bool MergeTreeIndexBloomFilter::mayBenefitFromIndexForIn(const ASTPtr & node) const { - const String & column_name = node->getColumnName(); + Names required_columns = index.expression->getRequiredColumns(); + NameSet required_columns_set(required_columns.begin(), required_columns.end()); - for (const auto & cname : index.column_names) - if (column_name == cname) + std::vector nodes_to_check; + nodes_to_check.emplace_back(node); + + while (!nodes_to_check.empty()) + { + auto node_to_check = nodes_to_check.back(); + nodes_to_check.pop_back(); + + const auto & column_name = node_to_check->getColumnName(); + if (required_columns_set.find(column_name) != required_columns_set.end()) return true; - if (const auto * func = typeid_cast(node.get())) - { - for (const auto & children : func->arguments->children) - if (mayBenefitFromIndexForIn(children)) - return true; + if (const auto * function = typeid_cast(node_to_check.get())) + { + auto & function_arguments_children = function->arguments->children; + nodes_to_check.insert(nodes_to_check.end(), function_arguments_children.begin(), function_arguments_children.end()); + } } - return false; + return true; } MergeTreeIndexAggregatorPtr MergeTreeIndexBloomFilter::createIndexAggregator() const diff --git a/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp b/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp index 7ca6f9ff1bd..cb617b0ef22 100644 --- a/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.cpp @@ -1,6 +1,7 @@ #include #include #include +#include #include #include #include @@ -188,7 +189,6 @@ bool MergeTreeIndexConditionBloomFilter::mayBeTrueOnGranule(const MergeTreeIndex const auto & filter = filters[query_index_hash.first]; const ColumnPtr & hash_column = query_index_hash.second; - match_rows = 
maybeTrueOnBloomFilter(&*hash_column, filter, hash_functions, @@ -276,7 +276,9 @@ bool MergeTreeIndexConditionBloomFilter::traverseFunction(const ASTPtr & node, B if (functionIsInOrGlobalInOperator(function->name)) { - if (const auto & prepared_set = getPreparedSet(arguments[1])) + auto prepared_set = getPreparedSet(arguments[1]); + + if (prepared_set) { if (traverseASTIn(function->name, arguments[0], prepared_set, out)) maybe_useful = true; @@ -285,6 +287,7 @@ bool MergeTreeIndexConditionBloomFilter::traverseFunction(const ASTPtr & node, B else if (function->name == "equals" || function->name == "notEquals" || function->name == "has" || + function->name == "mapContains" || function->name == "indexOf" || function->name == "hasAny" || function->name == "hasAll") @@ -308,14 +311,22 @@ bool MergeTreeIndexConditionBloomFilter::traverseFunction(const ASTPtr & node, B } bool MergeTreeIndexConditionBloomFilter::traverseASTIn( - const String & function_name, const ASTPtr & key_ast, const SetPtr & prepared_set, RPNElement & out) + const String & function_name, + const ASTPtr & key_ast, + const SetPtr & prepared_set, + RPNElement & out) { const auto prepared_info = getPreparedSetInfo(prepared_set); - return traverseASTIn(function_name, key_ast, prepared_info.type, prepared_info.column, out); + return traverseASTIn(function_name, key_ast, prepared_set, prepared_info.type, prepared_info.column, out); } bool MergeTreeIndexConditionBloomFilter::traverseASTIn( - const String & function_name, const ASTPtr & key_ast, const DataTypePtr & type, const ColumnPtr & column, RPNElement & out) + const String & function_name, + const ASTPtr & key_ast, + const SetPtr & prepared_set, + const DataTypePtr & type, + const ColumnPtr & column, + RPNElement & out) { if (header.has(key_ast->getColumnName())) { @@ -352,10 +363,83 @@ bool MergeTreeIndexConditionBloomFilter::traverseASTIn( const auto & sub_data_types = tuple_data_type->getElements(); for (size_t index = 0; index < arguments.size(); ++index) - match_with_subtype |= traverseASTIn(function_name, arguments[index], sub_data_types[index], sub_columns[index], out); + match_with_subtype |= traverseASTIn(function_name, arguments[index], nullptr, sub_data_types[index], sub_columns[index], out); return match_with_subtype; } + + if (function->name == "arrayElement") + { + /** Try to parse arrayElement for mapKeys index. + * It is important to ignore keys like column_map['Key'] IN ('') because if key does not exists in map + * we return default value for arrayElement. + * + * We cannot skip keys that does not exist in map if comparison is with default type value because + * that way we skip necessary granules where map key does not exists. 
+ */ + + if (!prepared_set) + return false; + + auto default_column_to_check = type->createColumnConstWithDefaultValue(1)->convertToFullColumnIfConst(); + ColumnWithTypeAndName default_column_with_type_to_check { default_column_to_check, type, "" }; + ColumnsWithTypeAndName default_columns_with_type_to_check = {default_column_with_type_to_check}; + auto set_contains_default_value_predicate_column = prepared_set->execute(default_columns_with_type_to_check, false /*negative*/); + const auto & set_contains_default_value_predicate_column_typed = assert_cast(*set_contains_default_value_predicate_column); + bool set_contain_default_value = set_contains_default_value_predicate_column_typed.getData()[0]; + if (set_contain_default_value) + return false; + + const auto & col_name = assert_cast(function->arguments.get()->children[0].get())->name(); + auto map_keys_index_column_name = fmt::format("mapKeys({})", col_name); + auto map_values_index_column_name = fmt::format("mapValues({})", col_name); + + if (header.has(map_keys_index_column_name)) + { + /// For mapKeys we serialize key argument with bloom filter + + auto & argument = function->arguments.get()->children[1]; + + if (const auto * literal = argument->as()) + { + size_t position = header.getPositionByName(map_keys_index_column_name); + const DataTypePtr & index_type = header.getByPosition(position).type; + + auto element_key = literal->value; + const DataTypePtr actual_type = BloomFilter::getPrimitiveType(index_type); + out.predicate.emplace_back(std::make_pair(position, BloomFilterHash::hashWithField(actual_type.get(), element_key))); + return true; + } + else + { + return false; + } + } + else if (header.has(map_values_index_column_name)) + { + /// For mapValues we serialize set with bloom filter + + size_t row_size = column->size(); + size_t position = header.getPositionByName(map_values_index_column_name); + const DataTypePtr & index_type = header.getByPosition(position).type; + const auto & array_type = assert_cast(*index_type); + const auto & array_nested_type = array_type.getNestedType(); + const auto & converted_column = castColumn(ColumnWithTypeAndName{column, type, ""}, array_nested_type); + out.predicate.emplace_back(std::make_pair(position, BloomFilterHash::hashWithColumn(array_nested_type, converted_column, 0, row_size))); + } + else + { + return false; + } + + if (function_name == "in" || function_name == "globalIn") + out.function = RPNElement::FUNCTION_IN; + + if (function_name == "notIn" || function_name == "globalNotIn") + out.function = RPNElement::FUNCTION_NOT_IN; + + return true; + } } return false; @@ -420,7 +504,12 @@ static bool indexOfCanUseBloomFilter(const ASTPtr & parent) bool MergeTreeIndexConditionBloomFilter::traverseASTEquals( - const String & function_name, const ASTPtr & key_ast, const DataTypePtr & value_type, const Field & value_field, RPNElement & out, const ASTPtr & parent) + const String & function_name, + const ASTPtr & key_ast, + const DataTypePtr & value_type, + const Field & value_field, + RPNElement & out, + const ASTPtr & parent) { if (header.has(key_ast->getColumnName())) { @@ -488,6 +577,29 @@ bool MergeTreeIndexConditionBloomFilter::traverseASTEquals( return true; } + if (function_name == "mapContains" || function_name == "has") + { + const auto & col_name = assert_cast(key_ast.get())->name(); + auto map_keys_index_column_name = fmt::format("mapKeys({})", col_name); + + if (!header.has(map_keys_index_column_name)) + return false; + + size_t position = 
header.getPositionByName(map_keys_index_column_name); + const DataTypePtr & index_type = header.getByPosition(position).type; + const auto * array_type = typeid_cast(index_type.get()); + + if (!array_type) + return false; + + out.function = RPNElement::FUNCTION_HAS; + const DataTypePtr actual_type = BloomFilter::getPrimitiveType(array_type->getNestedType()); + Field converted_field = convertFieldToType(value_field, *actual_type, value_type.get()); + out.predicate.emplace_back(std::make_pair(position, BloomFilterHash::hashWithField(actual_type.get(), converted_field))); + + return true; + } + if (const auto * function = key_ast->as()) { WhichDataType which(value_type); @@ -509,6 +621,42 @@ bool MergeTreeIndexConditionBloomFilter::traverseASTEquals( return match_with_subtype; } + + if (function->name == "arrayElement") + { + /** Try to parse arrayElement for mapKeys index. + * It is important to ignore keys like column_map['Key'] = '' because if key does not exists in map + * we return default value for arrayElement. + * + * We cannot skip keys that does not exist in map if comparison is with default type value because + * that way we skip necessary granules where map key does not exists. + */ + if (value_field == value_type->getDefault()) + return false; + + const auto & col_name = assert_cast(function->arguments.get()->children[0].get())->name(); + + auto map_keys_index_column_name = fmt::format("mapKeys({})", col_name); + + if (!header.has(map_keys_index_column_name)) + return false; + + size_t position = header.getPositionByName(map_keys_index_column_name); + const DataTypePtr & index_type = header.getByPosition(position).type; + out.function = function_name == "equals" ? RPNElement::FUNCTION_EQUALS : RPNElement::FUNCTION_NOT_EQUALS; + + auto & argument = function->arguments.get()->children[1]; + + if (const auto * literal = argument->as()) + { + auto element_key = literal->value; + const DataTypePtr actual_type = BloomFilter::getPrimitiveType(index_type); + out.predicate.emplace_back(std::make_pair(position, BloomFilterHash::hashWithField(actual_type.get(), element_key))); + return true; + } + + return false; + } } return false; diff --git a/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h b/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h index f2bbd047ca1..5c6559ba298 100644 --- a/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h +++ b/src/Storages/MergeTree/MergeTreeIndexConditionBloomFilter.h @@ -70,13 +70,27 @@ private: bool traverseFunction(const ASTPtr & node, Block & block_with_constants, RPNElement & out, const ASTPtr & parent); - bool traverseASTIn(const String & function_name, const ASTPtr & key_ast, const SetPtr & prepared_set, RPNElement & out); + bool traverseASTIn( + const String & function_name, + const ASTPtr & key_ast, + const SetPtr & prepared_set, + RPNElement & out); bool traverseASTIn( - const String & function_name, const ASTPtr & key_ast, const DataTypePtr & type, const ColumnPtr & column, RPNElement & out); + const String & function_name, + const ASTPtr & key_ast, + const SetPtr & prepared_set, + const DataTypePtr & type, + const ColumnPtr & column, + RPNElement & out); bool traverseASTEquals( - const String & function_name, const ASTPtr & key_ast, const DataTypePtr & value_type, const Field & value_field, RPNElement & out, const ASTPtr & parent); + const String & function_name, + const ASTPtr & key_ast, + const DataTypePtr & value_type, + const Field & value_field, + RPNElement & out, + const ASTPtr & parent); }; } diff --git 
a/src/Storages/MergeTree/MergeTreeIndexFullText.cpp b/src/Storages/MergeTree/MergeTreeIndexFullText.cpp index 1c71d77b334..8f43b1606cb 100644 --- a/src/Storages/MergeTree/MergeTreeIndexFullText.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexFullText.cpp @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -155,13 +156,40 @@ void MergeTreeIndexAggregatorFullText::update(const Block & block, size_t * pos, for (size_t col = 0; col < index_columns.size(); ++col) { - const auto & column = block.getByName(index_columns[col]).column; - for (size_t i = 0; i < rows_read; ++i) + const auto & column_with_type = block.getByName(index_columns[col]); + const auto & column = column_with_type.column; + size_t current_position = *pos; + + if (isArray(column_with_type.type)) { - auto ref = column->getDataAt(*pos + i); - columnToBloomFilter(ref.data, ref.size, token_extractor, granule->bloom_filters[col]); + const auto & column_array = assert_cast(*column); + const auto & column_offsets = column_array.getOffsets(); + const auto & column_key = column_array.getData(); + + for (size_t i = 0; i < rows_read; ++i) + { + size_t element_start_row = column_offsets[current_position - 1]; + size_t elements_size = column_offsets[current_position] - element_start_row; + + for (size_t row_num = 0; row_num < elements_size; row_num++) + { + auto ref = column_key.getDataAt(element_start_row + row_num); + columnToBloomFilter(ref.data, ref.size, token_extractor, granule->bloom_filters[col]); + } + + current_position += 1; + } + } + else + { + for (size_t i = 0; i < rows_read; ++i) + { + auto ref = column->getDataAt(current_position + i); + columnToBloomFilter(ref.data, ref.size, token_extractor, granule->bloom_filters[col]); + } } } + granule->has_elems = true; *pos += rows_read; } @@ -202,6 +230,7 @@ bool MergeTreeConditionFullText::alwaysUnknownOrTrue() const } else if (element.function == RPNElement::FUNCTION_EQUALS || element.function == RPNElement::FUNCTION_NOT_EQUALS + || element.function == RPNElement::FUNCTION_HAS || element.function == RPNElement::FUNCTION_IN || element.function == RPNElement::FUNCTION_NOT_IN || element.function == RPNElement::FUNCTION_MULTI_SEARCH @@ -251,7 +280,8 @@ bool MergeTreeConditionFullText::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx rpn_stack.emplace_back(true, true); } else if (element.function == RPNElement::FUNCTION_EQUALS - || element.function == RPNElement::FUNCTION_NOT_EQUALS) + || element.function == RPNElement::FUNCTION_NOT_EQUALS + || element.function == RPNElement::FUNCTION_HAS) { rpn_stack.emplace_back(granule->bloom_filters[element.key_column].contains(*element.bloom_filter), true); @@ -378,6 +408,15 @@ bool MergeTreeConditionFullText::atomFromAST( else if (!token_extractor->supportLike() && (func_name == "like" || func_name == "notLike")) return false; + if (func_name == "has") + { + out.key_column = key_column_num; + out.function = RPNElement::FUNCTION_HAS; + out.bloom_filter = std::make_unique(params); + stringToBloomFilter(const_value.get(), token_extractor, *out.bloom_filter); + + return true; + } if (func_name == "notEquals") { out.key_column = key_column_num; @@ -837,10 +876,18 @@ MergeTreeIndexPtr bloomFilterIndexCreator( void bloomFilterIndexValidator(const IndexDescription & index, bool /*attach*/) { - for (const auto & data_type : index.data_types) + for (const auto & index_data_type : index.data_types) { - if (data_type->getTypeId() != TypeIndex::String && data_type->getTypeId() != TypeIndex::FixedString) - throw Exception("Bloom 
filter index can be used only with `String` or `FixedString` column.", ErrorCodes::INCORRECT_QUERY); + WhichDataType data_type(index_data_type); + + if (data_type.isArray()) + { + const auto & array_type = assert_cast(*index_data_type); + data_type = WhichDataType(array_type.getNestedType()); + } + + if (!data_type.isString() && !data_type.isFixedString()) + throw Exception("Bloom filter index can be used only with `String`, `FixedString` column or Array with `String` or `FixedString` values column.", ErrorCodes::INCORRECT_QUERY); } if (index.type == NgramTokenExtractor::getName()) diff --git a/src/Storages/MergeTree/MergeTreeIndexFullText.h b/src/Storages/MergeTree/MergeTreeIndexFullText.h index d34cbc61da2..b1c70a9c04f 100644 --- a/src/Storages/MergeTree/MergeTreeIndexFullText.h +++ b/src/Storages/MergeTree/MergeTreeIndexFullText.h @@ -112,6 +112,7 @@ private: /// Atoms of a Boolean expression. FUNCTION_EQUALS, FUNCTION_NOT_EQUALS, + FUNCTION_HAS, FUNCTION_IN, FUNCTION_NOT_IN, FUNCTION_MULTI_SEARCH, diff --git a/src/Storages/MergeTree/MergeTreeSettings.h b/src/Storages/MergeTree/MergeTreeSettings.h index 889b89b9a27..92a892c963f 100644 --- a/src/Storages/MergeTree/MergeTreeSettings.h +++ b/src/Storages/MergeTree/MergeTreeSettings.h @@ -79,6 +79,7 @@ struct Settings; M(Seconds, try_fetch_recompressed_part_timeout, 7200, "Recompression works slow in most cases, so we don't start merge with recompression until this timeout and trying to fetch recompressed part from replica which assigned this merge with recompression.", 0) \ M(Bool, always_fetch_merged_part, false, "If true, replica never merge parts and always download merged parts from other replicas.", 0) \ M(UInt64, max_suspicious_broken_parts, 10, "Max broken parts, if more - deny automatic deletion.", 0) \ + M(UInt64, max_suspicious_broken_parts_bytes, 1ULL * 1024 * 1024 * 1024, "Max size of all broken parts, if more - deny automatic deletion.", 0) \ M(UInt64, max_files_to_modify_in_alter_columns, 75, "Not apply ALTER if number of files for modification(deletion, addition) more than this.", 0) \ M(UInt64, max_files_to_remove_in_alter_columns, 50, "Not apply ALTER, if number of files for deletion more than this.", 0) \ M(Float, replicated_max_ratio_of_wrong_parts, 0.5, "If ratio of wrong parts to total number of parts is less than this - allow to start.", 0) \ diff --git a/src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp b/src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp index 46033efc12e..e7b5fa8256c 100644 --- a/src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp +++ b/src/Storages/PostgreSQL/MaterializedPostgreSQLConsumer.cpp @@ -436,7 +436,7 @@ void MaterializedPostgreSQLConsumer::processReplicationMessage(const char * repl if (new_relation_definition) { - current_schema_data.column_identifiers.emplace_back(std::make_tuple(data_type_id, type_modifier)); + current_schema_data.column_identifiers.emplace_back(std::make_pair(data_type_id, type_modifier)); } else { diff --git a/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp b/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp index fdded7283c4..1b17e6c0c6e 100644 --- a/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp +++ b/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp @@ -1,10 +1,11 @@ #include "StorageMaterializedPostgreSQL.h" #if USE_LIBPQXX +#include #include -#include #include #include +#include #include #include #include @@ -20,8 +21,8 @@ #include #include #include -#include #include +#include #include @@ -30,7 +31,6 
@@ namespace DB namespace ErrorCodes { - extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int LOGICAL_ERROR; extern const int BAD_ARGUMENTS; } @@ -471,22 +471,6 @@ void registerStorageMaterializedPostgreSQL(StorageFactory & factory) { auto creator_fn = [](const StorageFactory::Arguments & args) { - ASTs & engine_args = args.engine_args; - bool has_settings = args.storage_def->settings; - auto postgresql_replication_settings = std::make_unique(); - - if (has_settings) - postgresql_replication_settings->loadFromQuery(*args.storage_def); - - if (engine_args.size() != 5) - throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, - "Storage MaterializedPostgreSQL requires 5 parameters: " - "PostgreSQL('host:port', 'database', 'table', 'username', 'password'. Got {}", - engine_args.size()); - - for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getContext()); - StorageInMemoryMetadata metadata; metadata.setColumns(args.columns); metadata.setConstraints(args.constraints); @@ -502,20 +486,19 @@ void registerStorageMaterializedPostgreSQL(StorageFactory & factory) else metadata.primary_key = KeyDescription::getKeyFromAST(args.storage_def->order_by->ptr(), metadata.columns, args.getContext()); - auto parsed_host_port = parseAddress(engine_args[0]->as().value.safeGet(), 5432); - const String & remote_table = engine_args[2]->as().value.safeGet(); - const String & remote_database = engine_args[1]->as().value.safeGet(); - - /// No connection is made here, see Storages/PostgreSQL/PostgreSQLConnection.cpp + auto configuration = StoragePostgreSQL::getConfiguration(args.engine_args, args.getContext()); auto connection_info = postgres::formatConnectionString( - remote_database, - parsed_host_port.first, - parsed_host_port.second, - engine_args[3]->as().value.safeGet(), - engine_args[4]->as().value.safeGet()); + configuration.database, configuration.host, configuration.port, + configuration.username, configuration.password); + + bool has_settings = args.storage_def->settings; + auto postgresql_replication_settings = std::make_unique(); + + if (has_settings) + postgresql_replication_settings->loadFromQuery(*args.storage_def); return StorageMaterializedPostgreSQL::create( - args.table_id, args.attach, remote_database, remote_table, connection_info, + args.table_id, args.attach, configuration.database, configuration.table, connection_info, metadata, args.getContext(), std::move(postgresql_replication_settings)); }; diff --git a/src/Storages/StorageExternalDistributed.cpp b/src/Storages/StorageExternalDistributed.cpp index f20e49fe23a..52f8262398c 100644 --- a/src/Storages/StorageExternalDistributed.cpp +++ b/src/Storages/StorageExternalDistributed.cpp @@ -15,6 +15,7 @@ #include #include #include +#include #include @@ -31,10 +32,7 @@ StorageExternalDistributed::StorageExternalDistributed( const StorageID & table_id_, ExternalStorageEngine table_engine, const String & cluster_description, - const String & remote_database, - const String & remote_table, - const String & username, - const String & password, + const ExternalDataSourceConfiguration & configuration, const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment, @@ -66,15 +64,16 @@ StorageExternalDistributed::StorageExternalDistributed( addresses = parseRemoteDescriptionForExternalDatabase(shard_description, max_addresses, 3306); mysqlxx::PoolWithFailover pool( - remote_database, + configuration.database, addresses, - 
username, password); + configuration.username, + configuration.password); shard = StorageMySQL::create( table_id_, std::move(pool), - remote_database, - remote_table, + configuration.database, + configuration.table, /* replace_query = */ false, /* on_duplicate_clause = */ "", columns_, @@ -90,15 +89,16 @@ StorageExternalDistributed::StorageExternalDistributed( case ExternalStorageEngine::PostgreSQL: { addresses = parseRemoteDescriptionForExternalDatabase(shard_description, max_addresses, 5432); + StoragePostgreSQLConfiguration postgres_conf; + postgres_conf.set(configuration); + postgres_conf.addresses = addresses; auto pool = std::make_shared( - remote_database, - addresses, - username, password, + postgres_conf, context->getSettingsRef().postgresql_connection_pool_size, context->getSettingsRef().postgresql_connection_pool_wait_timeout); - shard = StoragePostgreSQL::create(table_id_, std::move(pool), remote_table, columns_, constraints_, String{}); + shard = StoragePostgreSQL::create(table_id_, std::move(pool), configuration.table, columns_, constraints_, String{}); break; } #endif @@ -113,13 +113,10 @@ StorageExternalDistributed::StorageExternalDistributed( } #else - (void)table_engine; - (void)remote_database; - (void)remote_table; - (void)username; - (void)password; - (void)shards_descriptions; + (void)configuration; + (void)cluster_description; (void)addresses; + (void)table_engine; #endif } @@ -207,64 +204,113 @@ void registerStorageExternalDistributed(StorageFactory & factory) factory.registerStorage("ExternalDistributed", [](const StorageFactory::Arguments & args) { ASTs & engine_args = args.engine_args; + if (engine_args.size() < 2) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Engine ExternalDistributed must have at least 2 arguments: engine_name, named_collection and/or description"); - if (engine_args.size() != 6) - throw Exception( - "Storage MySQLiDistributed requires 5 parameters: ExternalDistributed('engine_name', 'cluster_description', database, table, 'user', 'password').", - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getLocalContext()); - - const String & engine_name = engine_args[0]->as().value.safeGet(); - const String & addresses_description = engine_args[1]->as().value.safeGet(); - + auto engine_name = engine_args[0]->as().value.safeGet(); StorageExternalDistributed::ExternalStorageEngine table_engine; if (engine_name == "URL") - { table_engine = StorageExternalDistributed::ExternalStorageEngine::URL; + else if (engine_name == "MySQL") + table_engine = StorageExternalDistributed::ExternalStorageEngine::MySQL; + else if (engine_name == "PostgreSQL") + table_engine = StorageExternalDistributed::ExternalStorageEngine::PostgreSQL; + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "External storage engine {} is not supported for StorageExternalDistributed. 
Supported engines are: MySQL, PostgreSQL, URL", + engine_name); + + ASTs inner_engine_args(engine_args.begin() + 1, engine_args.end()); + String cluster_description; + + if (engine_name == "URL") + { + URLBasedDataSourceConfiguration configuration; + if (auto named_collection = getURLBasedDataSourceConfiguration(inner_engine_args, args.getLocalContext())) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); + + for (const auto & [name, value] : storage_specific_args) + { + if (name == "description") + cluster_description = value.safeGet(); + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Unknown key-value argument {} for table engine URL", name); + } + + if (cluster_description.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Engine ExternalDistributed must have `description` key-value argument or named collection parameter"); + } + else + { + for (auto & engine_arg : engine_args) + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getLocalContext()); + + cluster_description = engine_args[1]->as().value.safeGet(); + configuration.format = engine_args[2]->as().value.safeGet(); + configuration.compression_method = "auto"; + if (engine_args.size() == 4) + configuration.compression_method = engine_args[3]->as().value.safeGet(); + } - const String & format_name = engine_args[2]->as().value.safeGet(); - String compression_method = "auto"; - if (engine_args.size() == 4) - compression_method = engine_args[3]->as().value.safeGet(); auto format_settings = StorageURL::getFormatSettingsFromArgs(args); return StorageExternalDistributed::create( - addresses_description, + cluster_description, args.table_id, - format_name, + configuration.format, format_settings, - compression_method, + configuration.compression_method, args.columns, args.constraints, args.getContext()); } else { - if (engine_name == "MySQL") - table_engine = StorageExternalDistributed::ExternalStorageEngine::MySQL; - else if (engine_name == "PostgreSQL") - table_engine = StorageExternalDistributed::ExternalStorageEngine::PostgreSQL; - else - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "External storage engine {} is not supported for StorageExternalDistributed. 
Supported engines are: MySQL, PostgreSQL, URL", - engine_name); + ExternalDataSourceConfiguration configuration; + if (auto named_collection = getExternalDataSourceConfiguration(inner_engine_args, args.getLocalContext())) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); + + for (const auto & [name, value] : storage_specific_args) + { + if (name == "description") + cluster_description = value.safeGet(); + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Unknown key-value argument {} for table engine ExternalDistributed", name); + } + + if (cluster_description.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Engine ExternalDistributed must have `description` key-value argument or named collection parameter"); + } + else + { + if (engine_args.size() != 6) + throw Exception( + "Storage ExternalDistributed requires 6 parameters: " + "ExternalDistributed('engine_name', 'cluster_description', 'database', 'table', 'user', 'password').", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + cluster_description = engine_args[1]->as().value.safeGet(); + configuration.database = engine_args[2]->as().value.safeGet(); + configuration.table = engine_args[3]->as().value.safeGet(); + configuration.username = engine_args[4]->as().value.safeGet(); + configuration.password = engine_args[5]->as().value.safeGet(); + } - const String & remote_database = engine_args[2]->as().value.safeGet(); - const String & remote_table = engine_args[3]->as().value.safeGet(); - const String & username = engine_args[4]->as().value.safeGet(); - const String & password = engine_args[5]->as().value.safeGet(); return StorageExternalDistributed::create( args.table_id, table_engine, - addresses_description, - remote_database, - remote_table, - username, - password, + cluster_description, + configuration, args.columns, args.constraints, args.comment, diff --git a/src/Storages/StorageExternalDistributed.h b/src/Storages/StorageExternalDistributed.h index c85276c09dd..9f04133d63d 100644 --- a/src/Storages/StorageExternalDistributed.h +++ b/src/Storages/StorageExternalDistributed.h @@ -11,11 +11,13 @@ namespace DB { +struct ExternalDataSourceConfiguration; + /// Storages MySQL and PostgreSQL use ConnectionPoolWithFailover and support multiple replicas. /// This class unites multiple storages with replicas into multiple shards with replicas. /// A query to external database is passed to one replica on each shard, the result is united. /// Replicas on each shard have the same priority, traversed replicas are moved to the end of the queue. -/// TODO: try `load_balancing` setting for replicas priorities same way as for table function `remote` +/// Similar approach is used for URL storage. 
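The header change just below replaces four positional `String` connection parameters with a single `ExternalDataSourceConfiguration` argument. A minimal standalone sketch of why that pattern helps, using hypothetical type and function names rather than the real ClickHouse declarations:

``` cpp
// Illustrative only: a single named configuration bundle instead of a growing
// list of positional String parameters. Names here are hypothetical.
#include <cstdint>
#include <iostream>
#include <string>

struct ExternalSourceConfig   // stand-in, not the actual ExternalDataSourceConfiguration
{
    std::string host;
    uint16_t port = 0;
    std::string username;
    std::string password;
    std::string database;
    std::string table;
};

// Before: createShard(db, table, user, password, ...): two adjacent String
// arguments can be swapped silently. After: one named bundle, and adding a
// field does not break every constructor in the call chain.
void createShard(const ExternalSourceConfig & config)
{
    std::cout << "shard -> " << config.host << ':' << config.port
              << '/' << config.database << '.' << config.table << '\n';
}

int main()
{
    ExternalSourceConfig config{"db1.example.com", 5432, "reader", "secret", "shop", "orders"};
    createShard(config);
}
```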
class StorageExternalDistributed final : public shared_ptr_helper, public DB::IStorage { friend struct shared_ptr_helper; @@ -44,10 +46,7 @@ protected: const StorageID & table_id_, ExternalStorageEngine table_engine, const String & cluster_description, - const String & remote_database_, - const String & remote_table_, - const String & username, - const String & password, + const ExternalDataSourceConfiguration & configuration, const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment, diff --git a/src/Storages/StorageMongoDB.cpp b/src/Storages/StorageMongoDB.cpp index 3bdef7fd295..2a1f7cc2aa9 100644 --- a/src/Storages/StorageMongoDB.cpp +++ b/src/Storages/StorageMongoDB.cpp @@ -24,6 +24,7 @@ namespace ErrorCodes { extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int MONGODB_CANNOT_AUTHENTICATE; + extern const int BAD_ARGUMENTS; } StorageMongoDB::StorageMongoDB( @@ -102,42 +103,72 @@ Pipe StorageMongoDB::read( return Pipe(std::make_shared(connection, createCursor(database_name, collection_name, sample_block), sample_block, max_block_size, true)); } -void registerStorageMongoDB(StorageFactory & factory) -{ - factory.registerStorage("MongoDB", [](const StorageFactory::Arguments & args) - { - ASTs & engine_args = args.engine_args; +StorageMongoDBConfiguration StorageMongoDB::getConfiguration(ASTs engine_args, ContextPtr context) +{ + StorageMongoDBConfiguration configuration; + if (auto named_collection = getExternalDataSourceConfiguration(engine_args, context)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); + + for (const auto & [arg_name, arg_value] : storage_specific_args) + { + if (arg_name == "collection") + configuration.collection = arg_value.safeGet(); + else if (arg_name == "options") + configuration.options = arg_value.safeGet(); + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Unexpected key-value argument." + "Got: {}, but expected one of:" + "host, port, username, password, database, table, options.", arg_name); + } + } + else + { if (engine_args.size() < 5 || engine_args.size() > 6) throw Exception( "Storage MongoDB requires from 5 to 6 parameters: MongoDB('host:port', database, collection, 'user', 'password' [, 'options']).", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getLocalContext()); + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context); /// 27017 is the default MongoDB port. 
auto parsed_host_port = parseAddress(engine_args[0]->as().value.safeGet(), 27017); - const String & remote_database = engine_args[1]->as().value.safeGet(); - const String & collection = engine_args[2]->as().value.safeGet(); - const String & username = engine_args[3]->as().value.safeGet(); - const String & password = engine_args[4]->as().value.safeGet(); - - String options; + configuration.host = parsed_host_port.first; + configuration.port = parsed_host_port.second; + configuration.database = engine_args[1]->as().value.safeGet(); + configuration.collection = engine_args[2]->as().value.safeGet(); + configuration.username = engine_args[3]->as().value.safeGet(); + configuration.password = engine_args[4]->as().value.safeGet(); if (engine_args.size() >= 6) - options = engine_args[5]->as().value.safeGet(); + configuration.options = engine_args[5]->as().value.safeGet(); + + } + + return configuration; +} + + +void registerStorageMongoDB(StorageFactory & factory) +{ + factory.registerStorage("MongoDB", [](const StorageFactory::Arguments & args) + { + auto configuration = StorageMongoDB::getConfiguration(args.engine_args, args.getLocalContext()); return StorageMongoDB::create( args.table_id, - parsed_host_port.first, - parsed_host_port.second, - remote_database, - collection, - username, - password, - options, + configuration.host, + configuration.port, + configuration.database, + configuration.collection, + configuration.username, + configuration.password, + configuration.options, args.columns, args.constraints, args.comment); diff --git a/src/Storages/StorageMongoDB.h b/src/Storages/StorageMongoDB.h index 3014b88a9ca..c925418c888 100644 --- a/src/Storages/StorageMongoDB.h +++ b/src/Storages/StorageMongoDB.h @@ -3,6 +3,7 @@ #include #include +#include #include @@ -42,6 +43,8 @@ public: size_t max_block_size, unsigned num_streams) override; + static StorageMongoDBConfiguration getConfiguration(ASTs engine_args, ContextPtr context); + private: void connectIfNotConnected(); diff --git a/src/Storages/StorageMySQL.cpp b/src/Storages/StorageMySQL.cpp index 7f458ef82af..7f08dfbfe99 100644 --- a/src/Storages/StorageMySQL.cpp +++ b/src/Storages/StorageMySQL.cpp @@ -234,68 +234,91 @@ SinkToStoragePtr StorageMySQL::write(const ASTPtr & /*query*/, const StorageMeta local_context->getSettingsRef().mysql_max_rows_to_insert); } -void registerStorageMySQL(StorageFactory & factory) -{ - factory.registerStorage("MySQL", [](const StorageFactory::Arguments & args) - { - ASTs & engine_args = args.engine_args; +StorageMySQLConfiguration StorageMySQL::getConfiguration(ASTs engine_args, ContextPtr context_) +{ + StorageMySQLConfiguration configuration; + + if (auto named_collection = getExternalDataSourceConfiguration(engine_args, context_)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); + configuration.addresses = {std::make_pair(configuration.host, configuration.port)}; + + for (const auto & [arg_name, arg_value] : storage_specific_args) + { + if (arg_name == "replace_query") + configuration.replace_query = arg_value.safeGet(); + else if (arg_name == "on_duplicate_clause") + configuration.on_duplicate_clause = arg_value.safeGet(); + else + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Unexpected key-value argument." 
+ "Got: {}, but expected one of:" + "host, port, username, password, database, table, replace_query, on_duplicate_clause.", arg_name); + } + } + else + { if (engine_args.size() < 5 || engine_args.size() > 7) throw Exception( "Storage MySQL requires 5-7 parameters: MySQL('host:port' (or 'addresses_pattern'), database, table, 'user', 'password'[, replace_query, 'on_duplicate_clause']).", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getLocalContext()); + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context_); - /// 3306 is the default MySQL port. - const String & host_port = engine_args[0]->as().value.safeGet(); - const String & remote_database = engine_args[1]->as().value.safeGet(); - const String & remote_table = engine_args[2]->as().value.safeGet(); - const String & username = engine_args[3]->as().value.safeGet(); - const String & password = engine_args[4]->as().value.safeGet(); - size_t max_addresses = args.getContext()->getSettingsRef().glob_expansion_max_elements; + const auto & host_port = engine_args[0]->as().value.safeGet(); + size_t max_addresses = context_->getSettingsRef().glob_expansion_max_elements; - /// TODO: move some arguments from the arguments to the SETTINGS. - MySQLSettings mysql_settings; + configuration.addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 3306); + configuration.database = engine_args[1]->as().value.safeGet(); + configuration.table = engine_args[2]->as().value.safeGet(); + configuration.username = engine_args[3]->as().value.safeGet(); + configuration.password = engine_args[4]->as().value.safeGet(); + + if (engine_args.size() >= 6) + configuration.replace_query = engine_args[5]->as().value.safeGet(); + if (engine_args.size() == 7) + configuration.on_duplicate_clause = engine_args[6]->as().value.safeGet(); + } + + if (configuration.replace_query && !configuration.on_duplicate_clause.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Only one of 'replace_query' and 'on_duplicate_clause' can be specified, or none of them"); + + return configuration; +} + + +void registerStorageMySQL(StorageFactory & factory) +{ + factory.registerStorage("MySQL", [](const StorageFactory::Arguments & args) + { + auto configuration = StorageMySQL::getConfiguration(args.engine_args, args.getLocalContext()); + + MySQLSettings mysql_settings; /// TODO: move some arguments from the arguments to the SETTINGS. 
if (args.storage_def->settings) - { mysql_settings.loadFromQuery(*args.storage_def); - } if (!mysql_settings.connection_pool_size) throw Exception("connection_pool_size cannot be zero.", ErrorCodes::BAD_ARGUMENTS); - auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 3306); mysqlxx::PoolWithFailover pool( - remote_database, - addresses, - username, - password, + configuration.database, configuration.addresses, + configuration.username, configuration.password, MYSQLXX_POOL_WITH_FAILOVER_DEFAULT_START_CONNECTIONS, mysql_settings.connection_pool_size, mysql_settings.connection_max_tries, mysql_settings.connection_wait_timeout); - bool replace_query = false; - std::string on_duplicate_clause; - if (engine_args.size() >= 6) - replace_query = engine_args[5]->as().value.safeGet(); - if (engine_args.size() == 7) - on_duplicate_clause = engine_args[6]->as().value.safeGet(); - - if (replace_query && !on_duplicate_clause.empty()) - throw Exception( - "Only one of 'replace_query' and 'on_duplicate_clause' can be specified, or none of them", - ErrorCodes::BAD_ARGUMENTS); - return StorageMySQL::create( args.table_id, std::move(pool), - remote_database, - remote_table, - replace_query, - on_duplicate_clause, + configuration.database, + configuration.table, + configuration.replace_query, + configuration.on_duplicate_clause, args.columns, args.constraints, args.comment, diff --git a/src/Storages/StorageMySQL.h b/src/Storages/StorageMySQL.h index 70d7a4455b1..2aad915e073 100644 --- a/src/Storages/StorageMySQL.h +++ b/src/Storages/StorageMySQL.h @@ -10,6 +10,7 @@ #include #include +#include #include @@ -50,6 +51,8 @@ public: SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + static StorageMySQLConfiguration getConfiguration(ASTs engine_args, ContextPtr context_); + private: friend class StorageMySQLSink; diff --git a/src/Storages/StoragePostgreSQL.cpp b/src/Storages/StoragePostgreSQL.cpp index 3617e964734..d4fedaf78c8 100644 --- a/src/Storages/StoragePostgreSQL.cpp +++ b/src/Storages/StoragePostgreSQL.cpp @@ -3,43 +3,51 @@ #if USE_LIBPQXX #include -#include -#include -#include -#include +#include +#include +#include +#include + #include #include #include #include #include #include + #include #include #include #include -#include -#include -#include -#include #include + #include #include -#include -#include -#include -#include + #include -#include #include +#include +#include + +#include +#include +#include + +#include + +#include +#include + namespace DB { namespace ErrorCodes { - extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int NOT_IMPLEMENTED; + extern const int BAD_ARGUMENTS; + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; } StoragePostgreSQL::StoragePostgreSQL( @@ -376,53 +384,80 @@ SinkToStoragePtr StoragePostgreSQL::write( } +StoragePostgreSQLConfiguration StoragePostgreSQL::getConfiguration(ASTs engine_args, ContextPtr context) +{ + StoragePostgreSQLConfiguration configuration; + if (auto named_collection = getExternalDataSourceConfiguration(engine_args, context)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + + configuration.set(common_configuration); + configuration.addresses = {std::make_pair(configuration.host, configuration.port)}; + + for (const auto & [arg_name, arg_value] : storage_specific_args) + { + if (arg_name == "on_conflict") + configuration.on_conflict = arg_value.safeGet(); + else + throw 
Exception(ErrorCodes::BAD_ARGUMENTS, + "Unexpected key-value argument." + "Got: {}, but expected one of:" + "host, port, username, password, database, table, schema, on_conflict.", arg_name); + } + } + else + { + if (engine_args.size() < 5 || engine_args.size() > 7) + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Storage PostgreSQL requires from 5 to 7 parameters: " + "PostgreSQL('host:port', 'database', 'table', 'username', 'password' [, 'schema', 'ON CONFLICT ...']. Got: {}", + engine_args.size()); + + for (auto & engine_arg : engine_args) + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, context); + + const auto & host_port = engine_args[0]->as().value.safeGet(); + size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; + + configuration.addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 5432); + if (configuration.addresses.size() == 1) + { + configuration.host = configuration.addresses[0].first; + configuration.port = configuration.addresses[0].second; + } + + configuration.database = engine_args[1]->as().value.safeGet(); + configuration.table = engine_args[2]->as().value.safeGet(); + configuration.username = engine_args[3]->as().value.safeGet(); + configuration.password = engine_args[4]->as().value.safeGet(); + + if (engine_args.size() >= 6) + configuration.schema = engine_args[5]->as().value.safeGet(); + if (engine_args.size() >= 7) + configuration.on_conflict = engine_args[6]->as().value.safeGet(); + } + + return configuration; +} + + void registerStoragePostgreSQL(StorageFactory & factory) { factory.registerStorage("PostgreSQL", [](const StorageFactory::Arguments & args) { - ASTs & engine_args = args.engine_args; - - if (engine_args.size() < 5 || engine_args.size() > 7) - throw Exception("Storage PostgreSQL requires from 5 to 7 parameters: " - "PostgreSQL('host:port', 'database', 'table', 'username', 'password' [, 'schema', 'ON CONFLICT ...']", - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getLocalContext()); - - auto host_port = engine_args[0]->as().value.safeGet(); - /// Split into replicas if needed. 
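Both branches above hand the first engine argument to parseRemoteDescriptionForExternalDatabase, which expands a comma-separated replica description into (host, port) pairs with an engine-specific default port (5432 for PostgreSQL, 3306 for MySQL). A rough standalone approximation of that splitting step; the real helper also expands address patterns and honours glob_expansion_max_elements:

``` cpp
// Rough approximation of expanding "host1:5433,host2" into (host, port) pairs.
// Only the comma split and default-port fallback are modelled here.
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<std::string, uint16_t>>
splitReplicaDescription(const std::string & description, uint16_t default_port)
{
    std::vector<std::pair<std::string, uint16_t>> addresses;
    std::istringstream stream(description);
    std::string replica;
    while (std::getline(stream, replica, ','))
    {
        auto colon = replica.rfind(':');
        if (colon == std::string::npos)
            addresses.emplace_back(replica, default_port);
        else
            addresses.emplace_back(replica.substr(0, colon),
                                   static_cast<uint16_t>(std::stoi(replica.substr(colon + 1))));
    }
    return addresses;
}

int main()
{
    for (const auto & [host, port] : splitReplicaDescription("pg1:5433,pg2", 5432))
        std::cout << host << ':' << port << '\n';   // pg1:5433 then pg2:5432
}
```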
- size_t max_addresses = args.getContext()->getSettingsRef().glob_expansion_max_elements; - auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 5432); - - const String & remote_database = engine_args[1]->as().value.safeGet(); - const String & remote_table = engine_args[2]->as().value.safeGet(); - const String & username = engine_args[3]->as().value.safeGet(); - const String & password = engine_args[4]->as().value.safeGet(); - - String remote_table_schema, on_conflict; - if (engine_args.size() >= 6) - remote_table_schema = engine_args[5]->as().value.safeGet(); - if (engine_args.size() >= 7) - on_conflict = engine_args[6]->as().value.safeGet(); - - auto pool = std::make_shared( - remote_database, - addresses, - username, - password, + auto configuration = StoragePostgreSQL::getConfiguration(args.engine_args, args.getLocalContext()); + auto pool = std::make_shared(configuration, args.getContext()->getSettingsRef().postgresql_connection_pool_size, args.getContext()->getSettingsRef().postgresql_connection_pool_wait_timeout); return StoragePostgreSQL::create( args.table_id, std::move(pool), - remote_table, + configuration.table, args.columns, args.constraints, args.comment, - remote_table_schema, - on_conflict); + configuration.schema, + configuration.on_conflict); }, { .source_access_type = AccessType::POSTGRES, diff --git a/src/Storages/StoragePostgreSQL.h b/src/Storages/StoragePostgreSQL.h index a12b52e6e48..31797dc4cb4 100644 --- a/src/Storages/StoragePostgreSQL.h +++ b/src/Storages/StoragePostgreSQL.h @@ -10,12 +10,12 @@ #include #include #include +#include namespace DB { - class StoragePostgreSQL final : public shared_ptr_helper, public IStorage { friend struct shared_ptr_helper; @@ -43,6 +43,8 @@ public: SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + static StoragePostgreSQLConfiguration getConfiguration(ASTs engine_args, ContextPtr context); + private: friend class PostgreSQLBlockOutputStream; diff --git a/src/Storages/StorageReplicatedMergeTree.cpp b/src/Storages/StorageReplicatedMergeTree.cpp index 29646c99262..e67947ae811 100644 --- a/src/Storages/StorageReplicatedMergeTree.cpp +++ b/src/Storages/StorageReplicatedMergeTree.cpp @@ -961,7 +961,9 @@ void StorageReplicatedMergeTree::checkTableStructure(const String & zookeeper_pr const ColumnsDescription & old_columns = metadata_snapshot->getColumns(); if (columns_from_zk != old_columns) { - throw Exception("Table columns structure in ZooKeeper is different from local table structure", ErrorCodes::INCOMPATIBLE_COLUMNS); + throw Exception(ErrorCodes::INCOMPATIBLE_COLUMNS, + "Table columns structure in ZooKeeper is different from local table structure. 
Local columns:\n" + "{}\nZookeeper columns:\n{}", old_columns.toString(), columns_from_zk.toString()); } } diff --git a/src/Storages/StorageS3.cpp b/src/Storages/StorageS3.cpp index 1b22d96dc80..ae4e523d2d8 100644 --- a/src/Storages/StorageS3.cpp +++ b/src/Storages/StorageS3.cpp @@ -732,20 +732,70 @@ void StorageS3::updateClientAndAuthSettings(ContextPtr ctx, StorageS3::ClientAut upd.auth_settings = std::move(settings); } -void registerStorageS3Impl(const String & name, StorageFactory & factory) -{ - factory.registerStorage(name, [](const StorageFactory::Arguments & args) - { - ASTs & engine_args = args.engine_args; +StorageS3Configuration StorageS3::getConfiguration(ASTs & engine_args, ContextPtr local_context) +{ + StorageS3Configuration configuration; + + if (auto named_collection = getURLBasedDataSourceConfiguration(engine_args, local_context)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); + + for (const auto & [arg_name, arg_value] : storage_specific_args) + { + if (arg_name == "access_key_id") + configuration.access_key_id = arg_value.safeGet(); + else if (arg_name == "secret_access_key") + configuration.secret_access_key = arg_value.safeGet(); + else + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "Unknown key-value argument `{}` for StorageS3, expected: url, [access_key_id, secret_access_key], name of used format and [compression_method].", + arg_name); + } + } + else + { if (engine_args.size() < 2 || engine_args.size() > 5) throw Exception( "Storage S3 requires 2 to 5 arguments: url, [access_key_id, secret_access_key], name of used format and [compression_method].", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); for (auto & engine_arg : engine_args) - engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, args.getLocalContext()); + engine_arg = evaluateConstantExpressionOrIdentifierAsLiteral(engine_arg, local_context); + configuration.url = engine_args[0]->as().value.safeGet(); + if (engine_args.size() >= 4) + { + configuration.access_key_id = engine_args[1]->as().value.safeGet(); + configuration.secret_access_key = engine_args[2]->as().value.safeGet(); + } + + if (engine_args.size() == 3 || engine_args.size() == 5) + { + configuration.compression_method = engine_args.back()->as().value.safeGet(); + configuration.format = engine_args[engine_args.size() - 2]->as().value.safeGet(); + } + else + { + configuration.compression_method = "auto"; + configuration.format = engine_args.back()->as().value.safeGet(); + } + } + + return configuration; +} + + +void registerStorageS3Impl(const String & name, StorageFactory & factory) +{ + factory.registerStorage(name, [](const StorageFactory::Arguments & args) + { + auto & engine_args = args.engine_args; + if (engine_args.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "External data source must have arguments"); + + auto configuration = StorageS3::getConfiguration(engine_args, args.getLocalContext()); // Use format settings from global server context + settings from // the SETTINGS clause of the create query. Settings from current // session and user are ignored. @@ -760,9 +810,7 @@ void registerStorageS3Impl(const String & name, StorageFactory & factory) for (const auto & change : changes) { if (user_format_settings.has(change.name)) - { user_format_settings.set(change.name, change.value); - } } // Apply changes from SETTINGS clause, with validation. 
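The S3 registration above decides which trailing argument is the format and which is the compression method purely from the argument count: credentials are present only with 4 or 5 arguments, and a compression method only with 3 or 5. A compact sketch of that disambiguation under the same assumptions:

``` cpp
// Sketch of the S3 positional-argument layout handled above:
//   url, format
//   url, format, compression
//   url, access_key_id, secret_access_key, format
//   url, access_key_id, secret_access_key, format, compression
#include <iostream>
#include <string>
#include <vector>

struct S3Args
{
    std::string url, access_key_id, secret_access_key, format, compression = "auto";
};

S3Args parseS3Args(const std::vector<std::string> & args)
{
    S3Args parsed;
    parsed.url = args.at(0);
    if (args.size() >= 4)   // credentials are present
    {
        parsed.access_key_id = args[1];
        parsed.secret_access_key = args[2];
    }
    if (args.size() == 3 || args.size() == 5)   // trailing compression method
    {
        parsed.compression = args.back();
        parsed.format = args[args.size() - 2];
    }
    else
    {
        parsed.format = args.back();
    }
    return parsed;
}

int main()
{
    auto parsed = parseS3Args({"https://bucket.s3.amazonaws.com/data/*.csv.gz", "CSV", "gzip"});
    std::cout << parsed.format << ' ' << parsed.compression << '\n';   // CSV gzip
}
```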
@@ -774,42 +822,18 @@ void registerStorageS3Impl(const String & name, StorageFactory & factory) format_settings = getFormatSettings(args.getContext()); } - String url = engine_args[0]->as().value.safeGet(); - Poco::URI uri (url); - S3::URI s3_uri (uri); - - String access_key_id; - String secret_access_key; - if (engine_args.size() >= 4) - { - access_key_id = engine_args[1]->as().value.safeGet(); - secret_access_key = engine_args[2]->as().value.safeGet(); - } - - UInt64 max_single_read_retries = args.getLocalContext()->getSettingsRef().s3_max_single_read_retries; - UInt64 min_upload_part_size = args.getLocalContext()->getSettingsRef().s3_min_upload_part_size; - UInt64 max_single_part_upload_size = args.getLocalContext()->getSettingsRef().s3_max_single_part_upload_size; - UInt64 max_connections = args.getLocalContext()->getSettingsRef().s3_max_connections; - - String compression_method; - String format_name; - if (engine_args.size() == 3 || engine_args.size() == 5) - { - compression_method = engine_args.back()->as().value.safeGet(); - format_name = engine_args[engine_args.size() - 2]->as().value.safeGet(); - } - else - { - compression_method = "auto"; - format_name = engine_args.back()->as().value.safeGet(); - } + S3::URI s3_uri(Poco::URI(configuration.url)); + auto max_single_read_retries = args.getLocalContext()->getSettingsRef().s3_max_single_read_retries; + auto min_upload_part_size = args.getLocalContext()->getSettingsRef().s3_min_upload_part_size; + auto max_single_part_upload_size = args.getLocalContext()->getSettingsRef().s3_max_single_part_upload_size; + auto max_connections = args.getLocalContext()->getSettingsRef().s3_max_connections; return StorageS3::create( s3_uri, - access_key_id, - secret_access_key, + configuration.access_key_id, + configuration.secret_access_key, args.table_id, - format_name, + configuration.format, max_single_read_retries, min_upload_part_size, max_single_part_upload_size, @@ -819,7 +843,7 @@ void registerStorageS3Impl(const String & name, StorageFactory & factory) args.comment, args.getContext(), format_settings, - compression_method); + configuration.compression_method); }, { .supports_settings = true, diff --git a/src/Storages/StorageS3.h b/src/Storages/StorageS3.h index cfd7e496928..d51581be74b 100644 --- a/src/Storages/StorageS3.h +++ b/src/Storages/StorageS3.h @@ -18,6 +18,7 @@ #include #include #include +#include namespace Aws::S3 { @@ -141,6 +142,8 @@ public: bool supportsPartitionBy() const override; + static StorageS3Configuration getConfiguration(ASTs & engine_args, ContextPtr local_context); + private: friend class StorageS3Cluster; diff --git a/src/Storages/StorageURL.cpp b/src/Storages/StorageURL.cpp index 1aa5ac7f236..81820ce5e1d 100644 --- a/src/Storages/StorageURL.cpp +++ b/src/Storages/StorageURL.cpp @@ -6,7 +6,6 @@ #include #include -#include #include #include #include @@ -32,8 +31,10 @@ namespace ErrorCodes { extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int NETWORK_ERROR; + extern const int BAD_ARGUMENTS; } + IStorageURLBase::IStorageURLBase( const Poco::URI & uri_, ContextPtr /*context_*/, @@ -43,8 +44,9 @@ IStorageURLBase::IStorageURLBase( const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment, - const String & compression_method_) - : IStorage(table_id_), uri(uri_), compression_method(compression_method_), format_name(format_name_), format_settings(format_settings_) + const String & compression_method_, + const ReadWriteBufferFromHTTP::HTTPHeaderEntries & headers_) 
+ : IStorage(table_id_), uri(uri_), compression_method(compression_method_), format_name(format_name_), format_settings(format_settings_), headers(headers_) { StorageInMemoryMetadata storage_metadata; storage_metadata.setColumns(columns_); @@ -69,10 +71,14 @@ namespace const ColumnsDescription & columns, UInt64 max_block_size, const ConnectionTimeouts & timeouts, - const CompressionMethod compression_method) + const CompressionMethod compression_method, + const ReadWriteBufferFromHTTP::HTTPHeaderEntries & headers_ = {}) : SourceWithProgress(sample_block), name(std::move(name_)) { - ReadWriteBufferFromHTTP::HTTPHeaderEntries header; + ReadWriteBufferFromHTTP::HTTPHeaderEntries headers; + + for (const auto & header : headers_) + headers.emplace_back(header); // Propagate OpenTelemetry trace context, if any, downstream. if (CurrentThread::isInitialized()) @@ -80,12 +86,12 @@ namespace const auto & thread_trace_context = CurrentThread::get().thread_trace_context; if (thread_trace_context.trace_id != UUID()) { - header.emplace_back("traceparent", + headers.emplace_back("traceparent", thread_trace_context.composeTraceparentHeader()); if (!thread_trace_context.tracestate.empty()) { - header.emplace_back("tracestate", + headers.emplace_back("tracestate", thread_trace_context.tracestate); } } @@ -100,7 +106,7 @@ namespace context->getSettingsRef().max_http_get_redirects, Poco::Net::HTTPBasicCredentials{}, DBMS_DEFAULT_BUFFER_SIZE, - header, + headers, context->getRemoteHostFilter()), compression_method); @@ -237,7 +243,8 @@ Pipe IStorageURLBase::read( metadata_snapshot->getColumns(), max_block_size, ConnectionTimeouts::getHTTPTimeouts(local_context), - chooseCompressionMethod(request_uri.getPath(), compression_method))); + chooseCompressionMethod(request_uri.getPath(), compression_method), + headers)); } @@ -312,8 +319,9 @@ StorageURL::StorageURL( const ConstraintsDescription & constraints_, const String & comment, ContextPtr context_, - const String & compression_method_) - : IStorageURLBase(uri_, context_, table_id_, format_name_, format_settings_, columns_, constraints_, comment, compression_method_) + const String & compression_method_, + const ReadWriteBufferFromHTTP::HTTPHeaderEntries & headers_) + : IStorageURLBase(uri_, context_, table_id_, format_name_, format_settings_, columns_, constraints_, comment, compression_method_, headers_) { context_->getRemoteHostFilter().checkURL(uri); } @@ -375,45 +383,73 @@ FormatSettings StorageURL::getFormatSettingsFromArgs(const StorageFactory::Argum return format_settings; } +URLBasedDataSourceConfiguration StorageURL::getConfiguration(ASTs & args, ContextPtr local_context) +{ + URLBasedDataSourceConfiguration configuration; + + if (auto named_collection = getURLBasedDataSourceConfiguration(args, local_context)) + { + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); + + if (!storage_specific_args.empty()) + { + String illegal_args; + for (const auto & arg : storage_specific_args) + { + if (!illegal_args.empty()) + illegal_args += ", "; + illegal_args += arg.first; + } + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Unknown arguments {} for table function URL", illegal_args); + } + } + else + { + if (args.size() != 2 && args.size() != 3) + throw Exception( + "Storage URL requires 2 or 3 arguments: url, name of used format and optional compression method.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + for (auto & arg : args) + arg = 
evaluateConstantExpressionOrIdentifierAsLiteral(arg, local_context); + + configuration.url = args[0]->as().value.safeGet(); + configuration.format = args[1]->as().value.safeGet(); + if (args.size() == 3) + configuration.compression_method = args[2]->as().value.safeGet(); + } + + return configuration; +} + void registerStorageURL(StorageFactory & factory) { factory.registerStorage("URL", [](const StorageFactory::Arguments & args) { ASTs & engine_args = args.engine_args; - - if (engine_args.size() != 2 && engine_args.size() != 3) - throw Exception( - "Storage URL requires 2 or 3 arguments: url, name of used format and optional compression method.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - engine_args[0] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[0], args.getLocalContext()); - - const String & url = engine_args[0]->as().value.safeGet(); - Poco::URI uri(url); - - engine_args[1] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[1], args.getLocalContext()); - - const String & format_name = engine_args[1]->as().value.safeGet(); - - String compression_method = "auto"; - if (engine_args.size() == 3) - { - engine_args[2] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[2], args.getLocalContext()); - compression_method = engine_args[2]->as().value.safeGet(); - } - + auto configuration = StorageURL::getConfiguration(engine_args, args.getLocalContext()); auto format_settings = StorageURL::getFormatSettingsFromArgs(args); + Poco::URI uri(configuration.url); + + ReadWriteBufferFromHTTP::HTTPHeaderEntries headers; + for (const auto & [header, value] : configuration.headers) + { + auto value_literal = value.safeGet(); + headers.emplace_back(std::make_pair(header, value_literal)); + } return StorageURL::create( uri, args.table_id, - format_name, + configuration.format, format_settings, args.columns, args.constraints, args.comment, args.getContext(), - compression_method); + configuration.compression_method, + headers); }, { .supports_settings = true, diff --git a/src/Storages/StorageURL.h b/src/Storages/StorageURL.h index 7d61661b68d..930272ab1aa 100644 --- a/src/Storages/StorageURL.h +++ b/src/Storages/StorageURL.h @@ -6,7 +6,9 @@ #include #include #include +#include #include +#include namespace DB @@ -44,7 +46,8 @@ protected: const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment, - const String & compression_method_); + const String & compression_method_, + const ReadWriteBufferFromHTTP::HTTPHeaderEntries & headers_ = {}); Poco::URI uri; String compression_method; @@ -54,6 +57,7 @@ protected: // For `url` table function, we use settings from current query context. // In this case, format_settings is not set. 
std::optional format_settings; + ReadWriteBufferFromHTTP::HTTPHeaderEntries headers; virtual std::string getReadMethod() const; @@ -113,7 +117,8 @@ public: const ConstraintsDescription & constraints_, const String & comment, ContextPtr context_, - const String & compression_method_); + const String & compression_method_, + const ReadWriteBufferFromHTTP::HTTPHeaderEntries & headers_ = {}); String getName() const override { @@ -126,6 +131,8 @@ public: } static FormatSettings getFormatSettingsFromArgs(const StorageFactory::Arguments & args); + + static URLBasedDataSourceConfiguration getConfiguration(ASTs & args, ContextPtr context); }; @@ -152,6 +159,13 @@ public: size_t max_block_size, unsigned num_streams) override; + struct Configuration + { + String url; + String compression_method = "auto"; + std::vector> headers; + }; + private: std::vector uri_options; }; diff --git a/src/Storages/StorageView.cpp b/src/Storages/StorageView.cpp index 790b925f891..e4b0e57a563 100644 --- a/src/Storages/StorageView.cpp +++ b/src/Storages/StorageView.cpp @@ -158,8 +158,8 @@ void StorageView::read( { throw DB::Exception(ErrorCodes::INCORRECT_QUERY, "Query from view {} returned Nullable column having not Nullable type in structure. " - "If query from view has JOIN, it may be cause by different values of 'json_use_nulls' setting. " - "You may explicitly specify 'json_use_nulls' in 'CREATE VIEW' query to avoid this error", + "If query from view has JOIN, it may be cause by different values of 'join_use_nulls' setting. " + "You may explicitly specify 'join_use_nulls' in 'CREATE VIEW' query to avoid this error", getStorageID().getFullTableName()); } diff --git a/src/Storages/System/StorageSystemDDLWorkerQueue.cpp b/src/Storages/System/StorageSystemDDLWorkerQueue.cpp index 5b9ed938e22..1df8b43515e 100644 --- a/src/Storages/System/StorageSystemDDLWorkerQueue.cpp +++ b/src/Storages/System/StorageSystemDDLWorkerQueue.cpp @@ -1,23 +1,14 @@ -#include -#include - -#include "StorageSystemDDLWorkerQueue.h" - -#include +#include #include - -#include #include #include #include #include +#include +#include #include -#include -#include -#include -#include +#include #include - #include #include #include @@ -25,207 +16,334 @@ namespace fs = std::filesystem; -enum Status -{ - ACTIVE, - FINISHED, - UNKNOWN, - ERRORED -}; namespace DB { -std::vector> getStatusEnumsAndValues() + +enum class Status +{ + INACTIVE, + ACTIVE, + FINISHED, + REMOVING, + UNKNOWN, +}; + +using GetResponseFuture = std::future; +using ListResponseFuture = std::future; +using GetResponseFutures = std::vector; +using ListResponseFutures = std::vector; + +static std::vector> getStatusEnumsAndValues() { return std::vector>{ - {"Active", static_cast(Status::ACTIVE)}, - {"Finished", static_cast(Status::FINISHED)}, - {"Unknown", static_cast(Status::UNKNOWN)}, - {"Errored", static_cast(Status::ERRORED)}, + {"Inactive", static_cast(Status::INACTIVE)}, + {"Active", static_cast(Status::ACTIVE)}, + {"Finished", static_cast(Status::FINISHED)}, + {"Removing", static_cast(Status::REMOVING)}, + {"Unknown", static_cast(Status::UNKNOWN)}, }; } - -std::vector> getZooKeeperErrorEnumsAndValues() -{ - return std::vector>{ - {"ZOK", static_cast(Coordination::Error::ZOK)}, - {"ZSYSTEMERROR", static_cast(Coordination::Error::ZSYSTEMERROR)}, - {"ZRUNTIMEINCONSISTENCY", static_cast(Coordination::Error::ZRUNTIMEINCONSISTENCY)}, - {"ZDATAINCONSISTENCY", static_cast(Coordination::Error::ZDATAINCONSISTENCY)}, - {"ZCONNECTIONLOSS", static_cast(Coordination::Error::ZCONNECTIONLOSS)}, - 
{"ZMARSHALLINGERROR", static_cast(Coordination::Error::ZMARSHALLINGERROR)}, - {"ZUNIMPLEMENTED", static_cast(Coordination::Error::ZUNIMPLEMENTED)}, - {"ZOPERATIONTIMEOUT", static_cast(Coordination::Error::ZOPERATIONTIMEOUT)}, - {"ZBADARGUMENTS", static_cast(Coordination::Error::ZBADARGUMENTS)}, - {"ZINVALIDSTATE", static_cast(Coordination::Error::ZINVALIDSTATE)}, - {"ZAPIERROR", static_cast(Coordination::Error::ZAPIERROR)}, - {"ZNONODE", static_cast(Coordination::Error::ZNONODE)}, - {"ZNOAUTH", static_cast(Coordination::Error::ZNOAUTH)}, - {"ZBADVERSION", static_cast(Coordination::Error::ZBADVERSION)}, - {"ZNOCHILDRENFOREPHEMERALS", static_cast(Coordination::Error::ZNOCHILDRENFOREPHEMERALS)}, - {"ZNODEEXISTS", static_cast(Coordination::Error::ZNODEEXISTS)}, - {"ZNOTEMPTY", static_cast(Coordination::Error::ZNOTEMPTY)}, - {"ZSESSIONEXPIRED", static_cast(Coordination::Error::ZSESSIONEXPIRED)}, - {"ZINVALIDCALLBACK", static_cast(Coordination::Error::ZINVALIDCALLBACK)}, - {"ZINVALIDACL", static_cast(Coordination::Error::ZINVALIDACL)}, - {"ZAUTHFAILED", static_cast(Coordination::Error::ZAUTHFAILED)}, - {"ZCLOSING", static_cast(Coordination::Error::ZCLOSING)}, - {"ZNOTHING", static_cast(Coordination::Error::ZNOTHING)}, - {"ZSESSIONMOVED", static_cast(Coordination::Error::ZSESSIONMOVED)}, - }; -} - - NamesAndTypesList StorageSystemDDLWorkerQueue::getNamesAndTypes() { return { - {"entry", std::make_shared()}, - {"host_name", std::make_shared()}, - {"host_address", std::make_shared()}, - {"port", std::make_shared()}, - {"status", std::make_shared(getStatusEnumsAndValues())}, - {"cluster", std::make_shared()}, - {"query", std::make_shared()}, - {"initiator", std::make_shared()}, - {"query_start_time", std::make_shared()}, - {"query_finish_time", std::make_shared()}, - {"query_duration_ms", std::make_shared()}, - {"exception_code", std::make_shared(getZooKeeperErrorEnumsAndValues())}, + {"entry", std::make_shared()}, + {"entry_version", std::make_shared(std::make_shared())}, + {"initiator_host", std::make_shared(std::make_shared())}, + {"initiator_port", std::make_shared(std::make_shared())}, + {"cluster", std::make_shared()}, + {"query", std::make_shared()}, + {"settings", std::make_shared(std::make_shared(), std::make_shared())}, + {"query_create_time", std::make_shared()}, + + {"host", std::make_shared(std::make_shared())}, + {"port", std::make_shared(std::make_shared())}, + {"status", std::make_shared(std::make_shared(getStatusEnumsAndValues()))}, + {"exception_code", std::make_shared(std::make_shared())}, + {"exception_text", std::make_shared(std::make_shared())}, + {"query_finish_time", std::make_shared(std::make_shared())}, + {"query_duration_ms", std::make_shared(std::make_shared())}, }; } -static String clusterNameFromDDLQuery(ContextPtr context, const DDLLogEntry & entry) +static String clusterNameFromDDLQuery(ContextPtr context, const DDLTask & task) { - const char * begin = entry.query.data(); - const char * end = begin + entry.query.size(); - ASTPtr query; - ASTQueryWithOnCluster * query_on_cluster; + const char * begin = task.entry.query.data(); + const char * end = begin + task.entry.query.size(); String cluster_name; ParserQuery parser_query(end); - String description; - query = parseQuery(parser_query, begin, end, description, 0, context->getSettingsRef().max_parser_depth); - if (query && (query_on_cluster = dynamic_cast(query.get()))) + String description = fmt::format("from {}", task.entry_path); + ASTPtr query = parseQuery(parser_query, begin, end, description, + 
context->getSettingsRef().max_query_size, context->getSettingsRef().max_parser_depth); + if (const auto * query_on_cluster = dynamic_cast(query.get())) cluster_name = query_on_cluster->cluster; return cluster_name; } +static void fillCommonColumns(MutableColumns & res_columns, size_t & col, const DDLTask & task, const String & cluster_name, UInt64 query_create_time_ms) +{ + /// entry + res_columns[col++]->insert(task.entry_name); + + /// entry_version + res_columns[col++]->insert(task.entry.version); + + if (task.entry.initiator.empty()) + { + /// initiator_host + res_columns[col++]->insert(Field{}); + /// initiator_port + res_columns[col++]->insert(Field{}); + } + else + { + HostID initiator = HostID::fromString(task.entry.initiator); + /// initiator_host + res_columns[col++]->insert(initiator.host_name); + /// initiator_port + res_columns[col++]->insert(initiator.port); + } + + /// cluster + res_columns[col++]->insert(cluster_name); + + /// query + res_columns[col++]->insert(task.entry.query); + + Map settings_map; + if (task.entry.settings) + { + for (const auto & change : *task.entry.settings) + { + Tuple pair; + pair.push_back(change.name); + pair.push_back(toString(change.value)); + settings_map.push_back(std::move(pair)); + } + } + + /// settings + res_columns[col++]->insert(settings_map); + + res_columns[col++]->insert(static_cast(query_create_time_ms / 1000)); +} + +static void repeatValuesInCommonColumns(MutableColumns & res_columns, size_t num_filled_columns) +{ + if (res_columns[num_filled_columns - 1]->size() == res_columns[num_filled_columns]->size() + 1) + { + /// Common columns are already filled + return; + } + + /// Copy values from previous row + assert(res_columns[num_filled_columns - 1]->size() == res_columns[num_filled_columns]->size()); + for (size_t filled_col = 0; filled_col < num_filled_columns; ++filled_col) + res_columns[filled_col]->insert((*res_columns[filled_col])[res_columns[filled_col]->size() - 1]); +} + +static void fillHostnameColumns(MutableColumns & res_columns, size_t & col, const HostID & host_id) +{ + /// NOTE host_id.host_name can be a domain name or an IP address + /// We could try to resolve domain name or reverse resolve an address and add two separate columns, + /// but seems like it's not really needed, so we show host_id.host_name as is. 
+ + /// host + res_columns[col++]->insert(host_id.host_name); + + /// port + res_columns[col++]->insert(host_id.port); +} + +static void fillStatusColumnsWithNulls(MutableColumns & res_columns, size_t & col, Status status) +{ + /// status + res_columns[col++]->insert(static_cast(status)); + /// exception_code + res_columns[col++]->insert(Field{}); + /// exception_text + res_columns[col++]->insert(Field{}); + /// query_finish_time + res_columns[col++]->insert(Field{}); + /// query_duration_ms + res_columns[col++]->insert(Field{}); +} + +static void fillStatusColumns(MutableColumns & res_columns, size_t & col, + GetResponseFuture & finished_data_future, + UInt64 query_create_time_ms) +{ + auto maybe_finished_status = finished_data_future.get(); + if (maybe_finished_status.error == Coordination::Error::ZNONODE) + return fillStatusColumnsWithNulls(res_columns, col, Status::REMOVING); + + /// asyncTryGet should throw on other error codes + assert(maybe_finished_status.error == Coordination::Error::ZOK); + + /// status + res_columns[col++]->insert(static_cast(Status::FINISHED)); + + auto execution_status = ExecutionStatus::fromText(maybe_finished_status.data); + /// exception_code + res_columns[col++]->insert(execution_status.code); + /// exception_text + res_columns[col++]->insert(execution_status.message); + + UInt64 query_finish_time_ms = maybe_finished_status.stat.ctime; + /// query_finish_time + res_columns[col++]->insert(static_cast(query_finish_time_ms / 1000)); + /// query_duration_ms + res_columns[col++]->insert(static_cast(query_finish_time_ms - query_create_time_ms)); +} + + void StorageSystemDDLWorkerQueue::fillData(MutableColumns & res_columns, ContextPtr context, const SelectQueryInfo &) const { zkutil::ZooKeeperPtr zookeeper = context->getZooKeeper(); - Coordination::Error zk_exception_code = Coordination::Error::ZOK; - String ddl_zookeeper_path = config.getString("distributed_ddl.path", "/clickhouse/task_queue/ddl/"); - String ddl_query_path; + fs::path ddl_zookeeper_path = context->getConfigRef().getString("distributed_ddl.path", "/clickhouse/task_queue/ddl/"); - // this is equivalent to query zookeeper at the `ddl_zookeeper_path` - /* [zk: localhost:2181(CONNECTED) 51] ls /clickhouse/task_queue/ddl - [query-0000000000, query-0000000001, query-0000000002, query-0000000003, query-0000000004] - */ + Strings ddl_task_paths = zookeeper->getChildren(ddl_zookeeper_path); - zkutil::Strings queries; + GetResponseFutures ddl_task_futures; + ListResponseFutures active_nodes_futures; + ListResponseFutures finished_nodes_futures; - Coordination::Error code = zookeeper->tryGetChildren(ddl_zookeeper_path, queries); - // if there is an error here, just register the code. - // the queries will be empty and so there will be nothing to fill the table - if (code != Coordination::Error::ZOK && code != Coordination::Error::ZNONODE) - zk_exception_code = code; - - const auto clusters = context->getClusters(); - for (const auto & name_and_cluster : clusters->getContainer()) + for (const auto & task_path : ddl_task_paths) { - const ClusterPtr & cluster = name_and_cluster.second; - const auto & shards_info = cluster->getShardsInfo(); - const auto & addresses_with_failover = cluster->getShardsAddresses(); - for (size_t shard_index = 0; shard_index < shards_info.size(); ++shard_index) + ddl_task_futures.push_back(zookeeper->asyncTryGet(ddl_zookeeper_path / task_path)); + /// List status dirs. Active host may become finished, so we list active first. 
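The code that follows resolves each host's status in a fixed order: finished nodes first, then active nodes, then the remaining hosts from the task entry, with a processed-host set so a host that flips from active to finished between the two listings is reported only once. A condensed standalone model of that ordering logic:

``` cpp
// Condensed model of the per-host status resolution below: finished wins over
// active, active wins over inactive, and each host is reported exactly once.
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

enum class Status { Inactive, Active, Finished };

void reportHosts(const std::vector<std::string> & finished,
                 const std::vector<std::string> & active,
                 const std::vector<std::string> & all_hosts)
{
    std::unordered_set<std::string> processed;

    auto report = [&](const std::string & host, Status status)
    {
        if (!processed.insert(host).second)
            return;     // already reported with a stronger status
        static const char * names[] = {"Inactive", "Active", "Finished"};
        std::cout << host << " -> " << names[static_cast<int>(status)] << '\n';
    };

    for (const auto & host : finished)
        report(host, Status::Finished);
    for (const auto & host : active)
        report(host, Status::Active);
    for (const auto & host : all_hosts)
        report(host, Status::Inactive);
}

int main()
{
    // "host2:9000" appears in both listings because it finished while the
    // status dirs were being read; it is reported once, as Finished.
    reportHosts({"host1:9000", "host2:9000"},
                {"host2:9000", "host3:9000"},
                {"host1:9000", "host2:9000", "host3:9000", "host4:9000"});
}
```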
+ active_nodes_futures.push_back(zookeeper->asyncTryGetChildrenNoThrow(ddl_zookeeper_path / task_path / "active")); + finished_nodes_futures.push_back(zookeeper->asyncTryGetChildrenNoThrow(ddl_zookeeper_path / task_path / "finished")); + } + + for (size_t i = 0; i < ddl_task_paths.size(); ++i) + { + auto maybe_task = ddl_task_futures[i].get(); + if (maybe_task.error != Coordination::Error::ZOK) { - const auto & shard_addresses = addresses_with_failover[shard_index]; - const auto & shard_info = shards_info[shard_index]; - const auto pool_status = shard_info.pool->getStatus(); - for (size_t replica_index = 0; replica_index < shard_addresses.size(); ++replica_index) + /// Task is removed + assert(maybe_task.error == Coordination::Error::ZNONODE); + continue; + } + + DDLTask task{ddl_task_paths[i], ddl_zookeeper_path / ddl_task_paths[i]}; + try + { + task.entry.parse(maybe_task.data); + } + catch (Exception & e) + { + e.addMessage("On parsing DDL entry {}: {}", task.entry_path, maybe_task.data); + throw; + } + + String cluster_name = clusterNameFromDDLQuery(context, task); + UInt64 query_create_time_ms = maybe_task.stat.ctime; + + size_t col = 0; + fillCommonColumns(res_columns, col, task, cluster_name, query_create_time_ms); + + /// At first we process finished nodes, to avoid duplication if some host was active + /// and suddenly become finished during status dirs listing. + /// Then we process active (but not finished) hosts. + /// And then we process the rest hosts from task.entry.hosts list. + /// NOTE: It's not guaranteed that task.entry.hosts contains all host ids from status dirs. + std::unordered_set processed_hosts; + + /// Race condition with DDLWorker::cleanupQueue(...) is possible. + /// We may get incorrect list of finished nodes if task is currently removing. + /// To avoid showing INACTIVE status for hosts that have actually executed query, + /// we will detect if someone is removing task and show special REMOVING status. + /// Also we should distinguish it from another case when status dirs are not created yet (extremely rare case). + bool is_removing_task = false; + + auto maybe_finished_hosts = finished_nodes_futures[i].get(); + if (maybe_finished_hosts.error == Coordination::Error::ZOK) + { + GetResponseFutures finished_status_futures; + for (const auto & host_id_str : maybe_finished_hosts.names) + finished_status_futures.push_back(zookeeper->asyncTryGet(fs::path(task.entry_path) / "finished" / host_id_str)); + + for (size_t host_idx = 0; host_idx < maybe_finished_hosts.names.size(); ++host_idx) { - /* Dir contents of every query will be similar to - [zk: localhost:2181(CONNECTED) 53] ls /clickhouse/task_queue/ddl/query-0000000004 - [active, finished] - */ - std::vector> futures; - futures.reserve(queries.size()); - for (const String & q : queries) - { - futures.push_back(zookeeper->asyncTryGet(fs::path(ddl_zookeeper_path) / q)); - } - for (size_t query_id = 0; query_id < queries.size(); query_id++) - { - Int64 query_finish_time = 0; - size_t i = 0; - res_columns[i++]->insert(queries[query_id]); // entry - const auto & address = shard_addresses[replica_index]; - res_columns[i++]->insert(address.host_name); - auto resolved = address.getResolvedAddress(); - res_columns[i++]->insert(resolved ? 
resolved->host().toString() : String()); // host_address - res_columns[i++]->insert(address.port); - ddl_query_path = fs::path(ddl_zookeeper_path) / queries[query_id]; - - zkutil::Strings active_nodes; - zkutil::Strings finished_nodes; - - code = zookeeper->tryGetChildren(fs::path(ddl_query_path) / "active", active_nodes); - if (code != Coordination::Error::ZOK && code != Coordination::Error::ZNONODE) - zk_exception_code = code; - - code = zookeeper->tryGetChildren(fs::path(ddl_query_path) / "finished", finished_nodes); - if (code != Coordination::Error::ZOK && code != Coordination::Error::ZNONODE) - zk_exception_code = code; - - /* status: - * active: If the hostname:port entry is present under active path. - * finished: If the hostname:port entry is present under the finished path. - * errored: If the hostname:port entry is present under the finished path but the error count is not 0. - * unknown: If the above cases don't hold true, then status is unknown. - */ - if (std::find(active_nodes.begin(), active_nodes.end(), address.toString()) != active_nodes.end()) - { - res_columns[i++]->insert(static_cast(Status::ACTIVE)); - } - else if (std::find(finished_nodes.begin(), finished_nodes.end(), address.toString()) != finished_nodes.end()) - { - if (pool_status[replica_index].error_count != 0) - { - res_columns[i++]->insert(static_cast(Status::ERRORED)); - } - else - { - res_columns[i++]->insert(static_cast(Status::FINISHED)); - } - // regardless of the status finished or errored, the node host_name:port entry was found under the /finished path - // & should be able to get the contents of the znode at /finished path. - auto res_fn = zookeeper->asyncTryGet(fs::path(ddl_query_path) / "finished"); - auto stat_fn = res_fn.get().stat; - query_finish_time = stat_fn.mtime; - } - else - { - res_columns[i++]->insert(static_cast(Status::UNKNOWN)); - } - - Coordination::GetResponse res; - res = futures[query_id].get(); - - auto query_start_time = res.stat.mtime; - - DDLLogEntry entry; - entry.parse(res.data); - String cluster_name = clusterNameFromDDLQuery(context, entry); - - res_columns[i++]->insert(cluster_name); - res_columns[i++]->insert(entry.query); - res_columns[i++]->insert(entry.initiator); - res_columns[i++]->insert(UInt64(query_start_time / 1000)); - res_columns[i++]->insert(UInt64(query_finish_time / 1000)); - res_columns[i++]->insert(UInt64(query_finish_time - query_start_time)); - res_columns[i++]->insert(static_cast(zk_exception_code)); - } + const auto & host_id_str = maybe_finished_hosts.names[host_idx]; + HostID host_id = HostID::fromString(host_id_str); + repeatValuesInCommonColumns(res_columns, col); + size_t rest_col = col; + fillHostnameColumns(res_columns, rest_col, host_id); + fillStatusColumns(res_columns, rest_col, finished_status_futures[host_idx], query_create_time_ms); + processed_hosts.insert(host_id_str); } } + else if (maybe_finished_hosts.error == Coordination::Error::ZNONODE) + { + /// Rare case: Either status dirs are not created yet or already removed. 
+ /// We can distinguish it by checking if task node exists, because "query-xxx" and "query-xxx/finished" + /// are removed in single multi-request + is_removing_task = !zookeeper->exists(task.entry_path); + } + else + { + throw Coordination::Exception(maybe_finished_hosts.error, fs::path(task.entry_path) / "finished"); + } + + /// Process active nodes + auto maybe_active_hosts = active_nodes_futures[i].get(); + if (maybe_active_hosts.error == Coordination::Error::ZOK) + { + for (const auto & host_id_str : maybe_active_hosts.names) + { + if (processed_hosts.contains(host_id_str)) + continue; + + HostID host_id = HostID::fromString(host_id_str); + repeatValuesInCommonColumns(res_columns, col); + size_t rest_col = col; + fillHostnameColumns(res_columns, rest_col, host_id); + fillStatusColumnsWithNulls(res_columns, rest_col, Status::ACTIVE); + processed_hosts.insert(host_id_str); + } + } + else if (maybe_active_hosts.error == Coordination::Error::ZNONODE) + { + /// Rare case: Either status dirs are not created yet or task is currently removing. + /// When removing a task, at first we remove "query-xxx/active" (not recursively), + /// then recursively remove everything except "query-xxx/finished" + /// and then remove "query-xxx" and "query-xxx/finished". + is_removing_task = is_removing_task || + (zookeeper->exists(fs::path(task.entry_path) / "finished") && !zookeeper->exists(fs::path(task.entry_path) / "active")) || + !zookeeper->exists(task.entry_path); + } + else + { + throw Coordination::Exception(maybe_active_hosts.error, fs::path(task.entry_path) / "active"); + } + + /// Process the rest hosts + for (const auto & host_id : task.entry.hosts) + { + if (processed_hosts.contains(host_id.toString())) + continue; + + Status status = is_removing_task ? Status::REMOVING : Status::INACTIVE; + repeatValuesInCommonColumns(res_columns, col); + size_t rest_col = col; + fillHostnameColumns(res_columns, rest_col, host_id); + fillStatusColumnsWithNulls(res_columns, rest_col, status); + processed_hosts.insert(host_id.toString()); + } + + if (processed_hosts.empty()) + { + /// We don't know any hosts, just fill the rest columns with nulls. 
+ /// host + res_columns[col++]->insert(Field{}); + /// port + res_columns[col++]->insert(Field{}); + fillStatusColumnsWithNulls(res_columns, col, Status::UNKNOWN); + } } } + } diff --git a/src/Storages/System/StorageSystemDDLWorkerQueue.h b/src/Storages/System/StorageSystemDDLWorkerQueue.h index 891ba4db509..9c29172f0f5 100644 --- a/src/Storages/System/StorageSystemDDLWorkerQueue.h +++ b/src/Storages/System/StorageSystemDDLWorkerQueue.h @@ -1,12 +1,8 @@ #pragma once - -#include -#include - #include #include #include - +#include namespace DB { @@ -19,10 +15,9 @@ class StorageSystemDDLWorkerQueue final : public shared_ptr_helper { friend struct shared_ptr_helper; - Poco::Util::LayeredConfiguration & config = Poco::Util::Application::instance().config(); protected: - void fillData(MutableColumns & res_columns, ContextPtr context, const SelectQueryInfo & query_info) const override; + void fillData(MutableColumns & res_columns, ContextPtr context, const SelectQueryInfo &) const override; using IStorageSystemOneBlock::IStorageSystemOneBlock; diff --git a/src/Storages/tests/gtest_transform_query_for_external_database.cpp b/src/Storages/tests/gtest_transform_query_for_external_database.cpp index 9501be73a38..13e1ca882af 100644 --- a/src/Storages/tests/gtest_transform_query_for_external_database.cpp +++ b/src/Storages/tests/gtest_transform_query_for_external_database.cpp @@ -219,3 +219,33 @@ TEST(TransformQueryForExternalDatabase, ForeignColumnInWhere) "WHERE column > 2 AND (apply_id = 1 OR table2.num = 1) AND table2.attr != ''", R"(SELECT "column", "apply_id" FROM "test"."table" WHERE ("column" > 2) AND ("apply_id" = 1))"); } + +TEST(TransformQueryForExternalDatabase, NoStrict) +{ + const State & state = State::instance(); + + check(state, 1, + "SELECT field FROM table WHERE field IN (SELECT attr FROM table2)", + R"(SELECT "field" FROM "test"."table")"); +} + +TEST(TransformQueryForExternalDatabase, Strict) +{ + const State & state = State::instance(); + state.context->setSetting("external_table_strict_query", true); + + check(state, 1, + "SELECT field FROM table WHERE field = '1'", + R"(SELECT "field" FROM "test"."table" WHERE "field" = '1')"); + check(state, 1, + "SELECT field FROM table WHERE field IN ('1', '2')", + R"(SELECT "field" FROM "test"."table" WHERE "field" IN ('1', '2'))"); + check(state, 1, + "SELECT field FROM table WHERE field LIKE '%test%'", + R"(SELECT "field" FROM "test"."table" WHERE "field" LIKE '%test%')"); + + /// removeUnknownSubexpressionsFromWhere() takes place + EXPECT_THROW(check(state, 1, "SELECT field FROM table WHERE field IN (SELECT attr FROM table2)", ""), Exception); + /// !isCompatible() takes place + EXPECT_THROW(check(state, 1, "SELECT column FROM test.table WHERE left(column, 10) = RIGHT(column, 10) AND SUBSTRING(column FROM 1 FOR 2) = 'Hello'", ""), Exception); +} diff --git a/src/Storages/transformQueryForExternalDatabase.cpp b/src/Storages/transformQueryForExternalDatabase.cpp index 1bd665de460..96510585600 100644 --- a/src/Storages/transformQueryForExternalDatabase.cpp +++ b/src/Storages/transformQueryForExternalDatabase.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -20,6 +21,7 @@ namespace DB namespace ErrorCodes { extern const int LOGICAL_ERROR; + extern const int INCORRECT_QUERY; } namespace @@ -248,6 +250,7 @@ String transformQueryForExternalDatabase( { auto clone_query = query_info.query->clone(); const Names used_columns = query_info.syntax_analyzer_result->requiredSourceColumns(); + bool strict = 
context->getSettingsRef().external_table_strict_query; auto select = std::make_shared(); @@ -275,6 +278,10 @@ String transformQueryForExternalDatabase( { select->setExpression(ASTSelectQuery::Expression::WHERE, std::move(original_where)); } + else if (strict) + { + throw Exception("Query contains non-compatible expressions (and external_table_strict_query=true)", ErrorCodes::INCORRECT_QUERY); + } else if (const auto * function = original_where->as()) { if (function->name == "and") @@ -292,6 +299,10 @@ String transformQueryForExternalDatabase( } } } + else if (strict && original_where) + { + throw Exception("Query contains non-compatible expressions (and external_table_strict_query=true)", ErrorCodes::INCORRECT_QUERY); + } ASTPtr select_ptr = select; dropAliases(select_ptr); diff --git a/src/Storages/transformQueryForExternalDatabase.h b/src/Storages/transformQueryForExternalDatabase.h index 215afab8b30..6f7d6af5319 100644 --- a/src/Storages/transformQueryForExternalDatabase.h +++ b/src/Storages/transformQueryForExternalDatabase.h @@ -22,6 +22,9 @@ class IAST; * that contain only compatible expressions. * * Compatible expressions are comparisons of identifiers, constants, and logical operations on them. + * + * Throws INCORRECT_QUERY if external_table_strict_query (from context settings) + * is set and some expression from WHERE is not compatible. */ String transformQueryForExternalDatabase( const SelectQueryInfo & query_info, diff --git a/src/Storages/ya.make b/src/Storages/ya.make index b9eb47e2ab8..c0da9b29382 100644 --- a/src/Storages/ya.make +++ b/src/Storages/ya.make @@ -164,6 +164,7 @@ SRCS( StorageView.cpp StorageXDBC.cpp System/StorageSystemAggregateFunctionCombinators.cpp + System/StorageSystemAsynchronousInserts.cpp System/StorageSystemAsynchronousMetrics.cpp System/StorageSystemBuildOptions.cpp System/StorageSystemClusters.cpp diff --git a/src/TableFunctions/ITableFunctionFileLike.h b/src/TableFunctions/ITableFunctionFileLike.h index 7c96ce610b3..2069f02b0dd 100644 --- a/src/TableFunctions/ITableFunctionFileLike.h +++ b/src/TableFunctions/ITableFunctionFileLike.h @@ -12,6 +12,15 @@ class Context; */ class ITableFunctionFileLike : public ITableFunction { + +protected: + void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; + + String filename; + String format; + String structure; + String compression_method = "auto"; + private: StoragePtr executeImpl(const ASTPtr & ast_function, ContextPtr context, const std::string & table_name, ColumnsDescription cached_columns) const override; @@ -21,13 +30,6 @@ private: ColumnsDescription getActualTableStructure(ContextPtr context) const override; - void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; - bool hasStaticStructure() const override { return true; } - - String filename; - String format; - String structure; - String compression_method = "auto"; }; } diff --git a/src/TableFunctions/TableFunctionMySQL.cpp b/src/TableFunctions/TableFunctionMySQL.cpp index 09f9cf8b1f5..005a689f895 100644 --- a/src/TableFunctions/TableFunctionMySQL.cpp +++ b/src/TableFunctions/TableFunctionMySQL.cpp @@ -34,8 +34,6 @@ namespace DB namespace ErrorCodes { extern const int LOGICAL_ERROR; - extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; - extern const int BAD_ARGUMENTS; extern const int UNKNOWN_TABLE; } @@ -46,46 +44,19 @@ void TableFunctionMySQL::parseArguments(const ASTPtr & ast_function, ContextPtr if (!args_func.arguments) throw Exception("Table function 'mysql' must have arguments.", 
ErrorCodes::LOGICAL_ERROR); - ASTs & args = args_func.arguments->children; - - if (args.size() < 5 || args.size() > 7) - throw Exception("Table function 'mysql' requires 5-7 parameters: MySQL('host:port', database, table, 'user', 'password'[, replace_query, 'on_duplicate_clause']).", - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - for (auto & arg : args) - arg = evaluateConstantExpressionOrIdentifierAsLiteral(arg, context); - - String host_port = args[0]->as().value.safeGet(); - remote_database_name = args[1]->as().value.safeGet(); - remote_table_name = args[2]->as().value.safeGet(); - user_name = args[3]->as().value.safeGet(); - password = args[4]->as().value.safeGet(); - - /// Split into replicas if needed. 3306 is the default MySQL port number - size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; - auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 3306); - pool.emplace(remote_database_name, addresses, user_name, password); - - if (args.size() >= 6) - replace_query = args[5]->as().value.safeGet() > 0; - if (args.size() == 7) - on_duplicate_clause = args[6]->as().value.safeGet(); - - if (replace_query && !on_duplicate_clause.empty()) - throw Exception( - "Only one of 'replace_query' and 'on_duplicate_clause' can be specified, or none of them", - ErrorCodes::BAD_ARGUMENTS); + configuration = StorageMySQL::getConfiguration(args_func.arguments->children, context); + pool.emplace(configuration->database, configuration->addresses, configuration->username, configuration->password); } ColumnsDescription TableFunctionMySQL::getActualTableStructure(ContextPtr context) const { const auto & settings = context->getSettingsRef(); - const auto tables_and_columns = fetchTablesColumnsList(*pool, remote_database_name, {remote_table_name}, settings, settings.mysql_datatypes_support_level); + const auto tables_and_columns = fetchTablesColumnsList(*pool, configuration->database, {configuration->table}, settings, settings.mysql_datatypes_support_level); - const auto columns = tables_and_columns.find(remote_table_name); + const auto columns = tables_and_columns.find(configuration->table); if (columns == tables_and_columns.end()) - throw Exception("MySQL table " + (remote_database_name.empty() ? "" : (backQuote(remote_database_name) + ".")) - + backQuote(remote_table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); + throw Exception("MySQL table " + (configuration->database.empty() ? 
"" : (backQuote(configuration->database) + ".")) + + backQuote(configuration->table) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); return columns->second; } @@ -101,10 +72,10 @@ StoragePtr TableFunctionMySQL::executeImpl( auto res = StorageMySQL::create( StorageID(getDatabaseName(), table_name), std::move(*pool), - remote_database_name, - remote_table_name, - replace_query, - on_duplicate_clause, + configuration->database, + configuration->table, + configuration->replace_query, + configuration->on_duplicate_clause, columns, ConstraintsDescription{}, String{}, diff --git a/src/TableFunctions/TableFunctionMySQL.h b/src/TableFunctions/TableFunctionMySQL.h index 64c7d56cf2a..78e84dfed29 100644 --- a/src/TableFunctions/TableFunctionMySQL.h +++ b/src/TableFunctions/TableFunctionMySQL.h @@ -5,6 +5,7 @@ #if USE_MYSQL #include +#include #include @@ -30,14 +31,8 @@ private: ColumnsDescription getActualTableStructure(ContextPtr context) const override; void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; - String remote_database_name; - String remote_table_name; - String user_name; - String password; - bool replace_query = false; - String on_duplicate_clause; - mutable std::optional pool; + std::optional configuration; }; } diff --git a/src/TableFunctions/TableFunctionPostgreSQL.cpp b/src/TableFunctions/TableFunctionPostgreSQL.cpp index 568cc6171fd..980066622c8 100644 --- a/src/TableFunctions/TableFunctionPostgreSQL.cpp +++ b/src/TableFunctions/TableFunctionPostgreSQL.cpp @@ -21,7 +21,6 @@ namespace DB namespace ErrorCodes { - extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int BAD_ARGUMENTS; } @@ -33,12 +32,12 @@ StoragePtr TableFunctionPostgreSQL::executeImpl(const ASTPtr & /*ast_function*/, auto result = std::make_shared( StorageID(getDatabaseName(), table_name), connection_pool, - remote_table_name, + configuration->table, columns, ConstraintsDescription{}, String{}, - remote_table_schema, - on_conflict); + configuration->schema, + configuration->on_conflict); result->startup(); return result; @@ -51,8 +50,8 @@ ColumnsDescription TableFunctionPostgreSQL::getActualTableStructure(ContextPtr c auto connection_holder = connection_pool->get(); auto columns = fetchPostgreSQLTableStructure( connection_holder->get(), - remote_table_schema.empty() ? doubleQuoteString(remote_table_name) - : doubleQuoteString(remote_table_schema) + '.' + doubleQuoteString(remote_table_name), + configuration->schema.empty() ? doubleQuoteString(configuration->table) + : doubleQuoteString(configuration->schema) + '.' + doubleQuoteString(configuration->table), use_nulls).columns; return ColumnsDescription{*columns}; @@ -62,37 +61,13 @@ ColumnsDescription TableFunctionPostgreSQL::getActualTableStructure(ContextPtr c void TableFunctionPostgreSQL::parseArguments(const ASTPtr & ast_function, ContextPtr context) { const auto & func_args = ast_function->as(); - if (!func_args.arguments) throw Exception("Table function 'PostgreSQL' must have arguments.", ErrorCodes::BAD_ARGUMENTS); - ASTs & args = func_args.arguments->children; - - if (args.size() < 5 || args.size() > 7) - throw Exception("Table function 'PostgreSQL' requires from 5 to 7 parameters: " - "PostgreSQL('host:port', 'database', 'table', 'user', 'password', [, 'schema', 'ON CONFLICT ...']).", - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - for (auto & arg : args) - arg = evaluateConstantExpressionOrIdentifierAsLiteral(arg, context); - - /// Split into replicas if needed. 5432 is a default postgresql port. 
- const auto & host_port = args[0]->as().value.safeGet(); - size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; - auto addresses = parseRemoteDescriptionForExternalDatabase(host_port, max_addresses, 5432); - - remote_table_name = args[2]->as().value.safeGet(); - - if (args.size() >= 6) - remote_table_schema = args[5]->as().value.safeGet(); - if (args.size() >= 7) - on_conflict = args[6]->as().value.safeGet(); - - connection_pool = std::make_shared( - args[1]->as().value.safeGet(), - addresses, - args[3]->as().value.safeGet(), - args[4]->as().value.safeGet()); + configuration.emplace(StoragePostgreSQL::getConfiguration(func_args.arguments->children, context)); + connection_pool = std::make_shared(*configuration, + context->getSettingsRef().postgresql_connection_pool_size, + context->getSettingsRef().postgresql_connection_pool_wait_timeout); } diff --git a/src/TableFunctions/TableFunctionPostgreSQL.h b/src/TableFunctions/TableFunctionPostgreSQL.h index e3810a0e391..ff363e3a6cf 100644 --- a/src/TableFunctions/TableFunctionPostgreSQL.h +++ b/src/TableFunctions/TableFunctionPostgreSQL.h @@ -6,6 +6,7 @@ #if USE_LIBPQXX #include #include +#include namespace DB @@ -27,9 +28,8 @@ private: ColumnsDescription getActualTableStructure(ContextPtr context) const override; void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; - String connection_str; - String remote_table_name, remote_table_schema, on_conflict; postgres::PoolWithFailoverPtr connection_pool; + std::optional configuration; }; } diff --git a/src/TableFunctions/TableFunctionRemote.cpp b/src/TableFunctions/TableFunctionRemote.cpp index 08f61a49fa5..3c39e3f2ec0 100644 --- a/src/TableFunctions/TableFunctionRemote.cpp +++ b/src/TableFunctions/TableFunctionRemote.cpp @@ -93,14 +93,8 @@ void TableFunctionRemote::parseArguments(const ASTPtr & ast_function, ContextPtr ++arg_num; - size_t dot = remote_database.find('.'); - if (dot != String::npos) - { - /// NOTE Bad - do not support identifiers in backquotes. 
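Editor's note: the `NOTE Bad` comment above, together with the substr()-based split removed below, documents the weakness of the old approach: a plain `find('.')` cannot tell the database separator from a dot inside a backquoted identifier. The patch replaces it with QualifiedTableName::parseFromString(); whether that helper covers backquoted names is not shown in this diff, so treat the following as a toy illustration of the problem, not of the new helper:

``` python
# Toy backquote-aware split of "db.table"; a dot inside `...` is not a separator.
def parse_qualified_name(s):
    in_backquotes = False
    for i, ch in enumerate(s):
        if ch == '`':
            in_backquotes = not in_backquotes
        elif ch == '.' and not in_backquotes:
            return s[:i], s[i + 1:]      # (database, table)
    return '', s                         # no database part

# parse_qualified_name('db.`weird.table`') -> ('db', '`weird.table`')
# parse_qualified_name('just_a_table')     -> ('', 'just_a_table')
```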
- remote_table = remote_database.substr(dot + 1); - remote_database = remote_database.substr(0, dot); - } - else + auto qualified_name = QualifiedTableName::parseFromString(remote_database); + if (qualified_name.database.empty()) { if (arg_num >= args.size()) { @@ -108,11 +102,15 @@ void TableFunctionRemote::parseArguments(const ASTPtr & ast_function, ContextPtr } else { + std::swap(qualified_name.database, qualified_name.table); args[arg_num] = evaluateConstantExpressionOrIdentifierAsLiteral(args[arg_num], context); - remote_table = args[arg_num]->as().value.safeGet(); + qualified_name.table = args[arg_num]->as().value.safeGet(); ++arg_num; } } + + remote_database = std::move(qualified_name.database); + remote_table = std::move(qualified_name.table); } /// Cluster function may have sharding key for insert diff --git a/src/TableFunctions/TableFunctionS3.cpp b/src/TableFunctions/TableFunctionS3.cpp index 9878ed72560..4faf3f15aa4 100644 --- a/src/TableFunctions/TableFunctionS3.cpp +++ b/src/TableFunctions/TableFunctionS3.cpp @@ -3,13 +3,13 @@ #if USE_AWS_S3 #include -#include #include #include #include #include #include #include +#include #include "registerTableFunctions.h" @@ -38,51 +38,75 @@ void TableFunctionS3::parseArguments(const ASTPtr & ast_function, ContextPtr con throw Exception("Table function '" + getName() + "' must have arguments.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); ASTs & args = args_func.at(0)->children; + StorageS3Configuration configuration; - if (args.size() < 3 || args.size() > 6) - throw Exception(message, ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - - for (auto & arg : args) - arg = evaluateConstantExpressionOrIdentifierAsLiteral(arg, context); - - /// Size -> argument indexes - static auto size_to_args = std::map> + if (auto named_collection = getURLBasedDataSourceConfiguration(args, context)) { - {3, {{"format", 1}, {"structure", 2}}}, - {4, {{"format", 1}, {"structure", 2}, {"compression_method", 3}}}, - {5, {{"access_key_id", 1}, {"secret_access_key", 2}, {"format", 3}, {"structure", 4}}}, - {6, {{"access_key_id", 1}, {"secret_access_key", 2}, {"format", 3}, {"structure", 4}, {"compression_method", 5}}} - }; + auto [common_configuration, storage_specific_args] = named_collection.value(); + configuration.set(common_configuration); - /// This argument is always the first - filename = args[0]->as().value.safeGet(); + for (const auto & [arg_name, arg_value] : storage_specific_args) + { + if (arg_name == "access_key_id") + configuration.access_key_id = arg_value.safeGet(); + else if (arg_name == "secret_access_key") + configuration.secret_access_key = arg_value.safeGet(); + else + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "Unknown key-value argument `{}` for StorageS3, expected: " + "url, [access_key_id, secret_access_key], name of used format, structure and [compression_method].", + arg_name); + } + } + else + { + if (args.size() < 3 || args.size() > 6) + throw Exception(message, ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); - auto & args_to_idx = size_to_args[args.size()]; + for (auto & arg : args) + arg = evaluateConstantExpressionOrIdentifierAsLiteral(arg, context); - if (args_to_idx.contains("format")) - format = args[args_to_idx["format"]]->as().value.safeGet(); + /// Size -> argument indexes + static auto size_to_args = std::map> + { + {3, {{"format", 1}, {"structure", 2}}}, + {4, {{"format", 1}, {"structure", 2}, {"compression_method", 3}}}, + {5, {{"access_key_id", 1}, {"secret_access_key", 2}, {"format", 3}, {"structure", 
4}}}, + {6, {{"access_key_id", 1}, {"secret_access_key", 2}, {"format", 3}, {"structure", 4}, {"compression_method", 5}}} + }; - if (args_to_idx.contains("structure")) - structure = args[args_to_idx["structure"]]->as().value.safeGet(); + /// This argument is always the first + configuration.url = args[0]->as().value.safeGet(); - if (args_to_idx.contains("compression_method")) - compression_method = args[args_to_idx["compression_method"]]->as().value.safeGet(); + auto & args_to_idx = size_to_args[args.size()]; - if (args_to_idx.contains("access_key_id")) - access_key_id = args[args_to_idx["access_key_id"]]->as().value.safeGet(); + if (args_to_idx.contains("format")) + configuration.format = args[args_to_idx["format"]]->as().value.safeGet(); - if (args_to_idx.contains("secret_access_key")) - secret_access_key = args[args_to_idx["secret_access_key"]]->as().value.safeGet(); + if (args_to_idx.contains("structure")) + configuration.structure = args[args_to_idx["structure"]]->as().value.safeGet(); + + if (args_to_idx.contains("compression_method")) + configuration.compression_method = args[args_to_idx["compression_method"]]->as().value.safeGet(); + + if (args_to_idx.contains("access_key_id")) + configuration.access_key_id = args[args_to_idx["access_key_id"]]->as().value.safeGet(); + + if (args_to_idx.contains("secret_access_key")) + configuration.secret_access_key = args[args_to_idx["secret_access_key"]]->as().value.safeGet(); + } + + s3_configuration = std::move(configuration); } ColumnsDescription TableFunctionS3::getActualTableStructure(ContextPtr context) const { - return parseColumnsListFromString(structure, context); + return parseColumnsListFromString(s3_configuration->structure, context); } StoragePtr TableFunctionS3::executeImpl(const ASTPtr & /*ast_function*/, ContextPtr context, const std::string & table_name, ColumnsDescription /*cached_columns*/) const { - Poco::URI uri (filename); + Poco::URI uri (s3_configuration->url); S3::URI s3_uri (uri); UInt64 max_single_read_retries = context->getSettingsRef().s3_max_single_read_retries; UInt64 min_upload_part_size = context->getSettingsRef().s3_min_upload_part_size; @@ -91,10 +115,10 @@ StoragePtr TableFunctionS3::executeImpl(const ASTPtr & /*ast_function*/, Context StoragePtr storage = StorageS3::create( s3_uri, - access_key_id, - secret_access_key, + s3_configuration->access_key_id, + s3_configuration->secret_access_key, StorageID(getDatabaseName(), table_name), - format, + s3_configuration->format, max_single_read_retries, min_upload_part_size, max_single_part_upload_size, @@ -105,7 +129,7 @@ StoragePtr TableFunctionS3::executeImpl(const ASTPtr & /*ast_function*/, Context context, /// No format_settings for table function S3 std::nullopt, - compression_method); + s3_configuration->compression_method); storage->startup(); diff --git a/src/TableFunctions/TableFunctionS3.h b/src/TableFunctions/TableFunctionS3.h index 1835fa3daa9..8d4c1391236 100644 --- a/src/TableFunctions/TableFunctionS3.h +++ b/src/TableFunctions/TableFunctionS3.h @@ -5,6 +5,7 @@ #if USE_AWS_S3 #include +#include namespace DB @@ -36,12 +37,7 @@ protected: ColumnsDescription getActualTableStructure(ContextPtr context) const override; void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; - String filename; - String format; - String structure; - String access_key_id; - String secret_access_key; - String compression_method = "auto"; + std::optional s3_configuration; }; class TableFunctionCOS : public TableFunctionS3 diff --git 
a/src/TableFunctions/TableFunctionURL.cpp b/src/TableFunctions/TableFunctionURL.cpp index a1fe142bea6..4b78862a269 100644 --- a/src/TableFunctions/TableFunctionURL.cpp +++ b/src/TableFunctions/TableFunctionURL.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -12,6 +13,48 @@ namespace DB { + +namespace ErrorCodes +{ + extern const int BAD_ARGUMENTS; +} + +void TableFunctionURL::parseArguments(const ASTPtr & ast_function, ContextPtr context) +{ + const auto & func_args = ast_function->as(); + if (!func_args.arguments) + throw Exception("Table function 'URL' must have arguments.", ErrorCodes::BAD_ARGUMENTS); + + URLBasedDataSourceConfiguration configuration; + if (auto with_named_collection = getURLBasedDataSourceConfiguration(func_args.arguments->children, context)) + { + auto [common_configuration, storage_specific_args] = with_named_collection.value(); + configuration.set(common_configuration); + + if (!storage_specific_args.empty()) + { + String illegal_args; + for (const auto & arg : storage_specific_args) + { + if (!illegal_args.empty()) + illegal_args += ", "; + illegal_args += arg.first; + } + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Unknown arguments {} for table function URL", illegal_args); + } + + filename = configuration.url; + format = configuration.format; + structure = configuration.structure; + compression_method = configuration.compression_method; + } + else + { + ITableFunctionFileLike::parseArguments(ast_function, context); + } +} + + StoragePtr TableFunctionURL::getStorage( const String & source, const String & format_, const ColumnsDescription & columns, ContextPtr global_context, const std::string & table_name, const String & compression_method_) const diff --git a/src/TableFunctions/TableFunctionURL.h b/src/TableFunctions/TableFunctionURL.h index fde361e8bbb..c35db9f9c8b 100644 --- a/src/TableFunctions/TableFunctionURL.h +++ b/src/TableFunctions/TableFunctionURL.h @@ -19,6 +19,9 @@ public: return name; } +protected: + void parseArguments(const ASTPtr & ast_function, ContextPtr context) override; + private: StoragePtr getStorage( const String & source, const String & format_, const ColumnsDescription & columns, ContextPtr global_context, diff --git a/tests/ci/ci_config.json b/tests/ci/ci_config.json index 2efb6faa614..6222e4f61bc 100644 --- a/tests/ci/ci_config.json +++ b/tests/ci/ci_config.json @@ -704,6 +704,18 @@ "clang-tidy": "disable", "with_coverage": false } + }, + "ClickHouse Keeper Jepsen": { + "required_build_properties": { + "compiler": "clang-13", + "package_type": "binary", + "build_type": "relwithdebuginfo", + "sanitizer": "none", + "bundled": "bundled", + "splitted": "unsplitted", + "clang-tidy": "disable", + "with_coverage": false + } } } } diff --git a/tests/clickhouse-test b/tests/clickhouse-test index b8b67181d9a..470b88de574 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -1,5 +1,7 @@ #!/usr/bin/env python3 +# pylint: disable=too-many-return-statements +import enum import shutil import sys import os @@ -8,6 +10,7 @@ import signal import re import copy import traceback +import math from argparse import ArgumentParser from typing import Tuple, Union, Optional, TextIO, Dict, Set, List @@ -42,19 +45,10 @@ except ImportError: DISTRIBUTED_DDL_TIMEOUT_MSG = "is executing longer than distributed_ddl_task_timeout" MESSAGES_TO_RETRY = [ - "DB::Exception: ZooKeeper session has been expired", - "Coordination::Exception: Session expired", - "Coordination::Exception: Connection loss", - 
"Coordination::Exception: Operation timeout", - "DB::Exception: Session expired", - "DB::Exception: Connection loss", - "DB::Exception: Operation timeout", - "Operation timed out", "ConnectionPoolWithFailover: Connection failed at try", "DB::Exception: New table appeared in database being dropped or detached. Try again", "is already started to be removing by another replica right now", "DB::Exception: Cannot enqueue query", - "Shutdown is called for table", # It happens in SYSTEM SYNC REPLICA query if session with ZooKeeper is being reinitialized. DISTRIBUTED_DDL_TIMEOUT_MSG # FIXME ] @@ -93,138 +87,30 @@ def get_db_engine(args, database_name): return "" # Will use default engine -def configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file): - testcase_args = copy.deepcopy(args) +def get_zookeeper_session_uptime(args): + try: + query = b"SELECT zookeeperSessionUptime()" - testcase_args.testcase_start_time = datetime.now() - testcase_basename = os.path.basename(case_file) - testcase_args.testcase_client = f"{testcase_args.client} --log_comment='{testcase_basename}'" + if args.replicated_database: + query = b"SELECT min(materialize(zookeeperSessionUptime())) " \ + b"FROM clusterAllReplicas('test_cluster_database_replicated', system.one) " - if testcase_args.database: - database = testcase_args.database - os.environ.setdefault("CLICKHOUSE_DATABASE", database) - os.environ.setdefault("CLICKHOUSE_TMP", suite_tmp_dir) - else: - # If --database is not specified, we will create temporary database with unique name - # And we will recreate and drop it for each test - def random_str(length=6): - alphabet = string.ascii_lowercase + string.digits - return ''.join(random.choice(alphabet) for _ in range(length)) - database = 'test_{suffix}'.format(suffix=random_str()) + clickhouse_proc = open_client_process(args.client) - with open(stderr_file, 'w') as stderr: - client_cmd = testcase_args.testcase_client + " " \ - + get_additional_client_options(args) + (stdout, _) = clickhouse_proc.communicate((query), timeout=20) - clickhouse_proc_create = open_client_process( - universal_newlines=True, - client_args=client_cmd, - stderr_file=stderr) - - try: - clickhouse_proc_create.communicate(("CREATE DATABASE " + database + get_db_engine(testcase_args, database)), timeout=testcase_args.timeout) - except TimeoutExpired: - total_time = (datetime.now() - testcase_args.testcase_start_time).total_seconds() - return clickhouse_proc_create, "", "Timeout creating database {} before test".format(database), total_time - - os.environ["CLICKHOUSE_DATABASE"] = database - # Set temporary directory to match the randomly generated database, - # because .sh tests also use it for temporary files and we want to avoid - # collisions. 
- testcase_args.test_tmp_dir = os.path.join(suite_tmp_dir, database) - os.mkdir(testcase_args.test_tmp_dir) - os.environ.setdefault("CLICKHOUSE_TMP", testcase_args.test_tmp_dir) - - testcase_args.testcase_database = database - - return testcase_args - -def run_single_test(args, ext, server_logs_level, client_options, case_file, stdout_file, stderr_file): - client = args.testcase_client - start_time = args.testcase_start_time - database = args.testcase_database - - # This is for .sh tests - os.environ["CLICKHOUSE_LOG_COMMENT"] = case_file - - params = { - 'client': client + ' --database=' + database, - 'logs_level': server_logs_level, - 'options': client_options, - 'test': case_file, - 'stdout': stdout_file, - 'stderr': stderr_file, - } - - # >> append to stderr (but not stdout since it is not used there), - # because there are also output of per test database creation - if not args.database: - pattern = '{test} > {stdout} 2>> {stderr}' - else: - pattern = '{test} > {stdout} 2> {stderr}' - - if ext == '.sql': - pattern = "{client} --send_logs_level={logs_level} --testmode --multiquery {options} < " + pattern - - command = pattern.format(**params) - - proc = Popen(command, shell=True, env=os.environ) - - while (datetime.now() - start_time).total_seconds() < args.timeout and proc.poll() is None: - sleep(0.01) - - need_drop_database = not args.database - if need_drop_database and args.no_drop_if_fail: - maybe_passed = (proc.returncode == 0) and (proc.stderr is None) and (proc.stdout is None or 'Exception' not in proc.stdout) - need_drop_database = not maybe_passed - - if need_drop_database: - with open(stderr_file, 'a') as stderr: - clickhouse_proc_create = open_client_process(client, universal_newlines=True, stderr_file=stderr) - - seconds_left = max(args.timeout - (datetime.now() - start_time).total_seconds(), 20) - - try: - drop_database_query = "DROP DATABASE " + database - if args.replicated_database: - drop_database_query += " ON CLUSTER test_cluster_database_replicated" - clickhouse_proc_create.communicate((drop_database_query), timeout=seconds_left) - except TimeoutExpired: - # kill test process because it can also hung - if proc.returncode is None: - try: - proc.kill() - except OSError as e: - if e.errno != ESRCH: - raise - - total_time = (datetime.now() - start_time).total_seconds() - return clickhouse_proc_create, "", f"Timeout dropping database {database} after test", total_time - - shutil.rmtree(args.test_tmp_dir) - - total_time = (datetime.now() - start_time).total_seconds() - - # Normalize randomized database names in stdout, stderr files. - os.system("LC_ALL=C sed -i -e 's/{test_db}/default/g' {file}".format(test_db=database, file=stdout_file)) - if args.hide_db_name: - os.system("LC_ALL=C sed -i -e 's/{test_db}/default/g' {file}".format(test_db=database, file=stderr_file)) - if args.replicated_database: - os.system("LC_ALL=C sed -i -e 's|/auto_{{shard}}||g' {file}".format(file=stdout_file)) - os.system("LC_ALL=C sed -i -e 's|auto_{{replica}}||g' {file}".format(file=stdout_file)) - - # Normalize hostname in stdout file. 
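Editor's note: the normalization block above (and the hostname normalization just below) rewrites the randomized database name, the host name, and the replicated-database macros inside the captured stdout/stderr so they can be compared against the reference files; it is being moved, not dropped, into TestCase.run_single_test() later in this patch. Those sed invocations do the equivalent of the following in-process sketch (shown only as an illustration, not what clickhouse-test actually runs):

``` python
import socket

# Sketch only: the same substitutions the sed calls perform, done in Python.
def normalize_output(path, test_db, replicated_database):
    with open(path, 'r', errors='replace') as f:
        text = f.read()
    text = text.replace(test_db, 'default')                 # randomized database name
    text = text.replace(socket.gethostname(), 'localhost')  # host name
    if replicated_database:
        text = text.replace('/auto_{shard}', '')            # replicated-database macros
        text = text.replace('auto_{replica}', '')
    with open(path, 'w') as f:
        f.write(text)
```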
- os.system("LC_ALL=C sed -i -e 's/{hostname}/localhost/g' {file}".format(hostname=socket.gethostname(), file=stdout_file)) - - stdout = open(stdout_file, 'rb').read() if os.path.exists(stdout_file) else b'' - stdout = str(stdout, errors='replace', encoding='utf-8') - stderr = open(stderr_file, 'rb').read() if os.path.exists(stderr_file) else b'' - stderr = str(stderr, errors='replace', encoding='utf-8') - - return proc, stdout, stderr, total_time + return int(stdout.decode('utf-8').strip()) + except: + return None -def need_retry(stdout, stderr): +def need_retry(args, stdout, stderr, total_time): + # Sometimes we may get unexpected exception like "Replica is readonly" or "Shutdown is called for table" + # instead of "Session expired" or "Connection loss" + # Retry if session was expired during test execution + session_uptime = get_zookeeper_session_uptime(args) + if session_uptime is not None and session_uptime < math.ceil(total_time): + return True return any(msg in stdout for msg in MESSAGES_TO_RETRY) or any(msg in stderr for msg in MESSAGES_TO_RETRY) @@ -343,110 +229,609 @@ def colored(text, args, color=None, on_color=None, attrs=None): return text +class TestStatus(enum.Enum): + FAIL = "FAIL" + UNKNOWN = "UNKNOWN" + OK = "OK" + SKIPPED = "SKIPPED" + + +class FailureReason(enum.Enum): + # FAIL reasons + TIMEOUT = "Timeout!" + SERVER_DIED = "server died" + EXIT_CODE = "return code: " + STDERR = "having stderror: " + EXCEPTION = "having having exception in stdout: " + RESULT_DIFF = "result differs with reference: " + TOO_LONG = "Test runs too long (> 60s). Make it faster." + + # SKIPPED reasons + DISABLED = "disabled" + SKIP = "skip" + NO_JINJA = "no jinja" + NO_ZOOKEEPER = "no zookeeper" + NO_SHARD = "no shard" + FAST_ONLY = "running fast tests only" + NO_LONG = "not running long tests" + REPLICATED_DB = "replicated-database" + BUILD = "not running for current build" + + # UNKNOWN reasons + NO_REFERENCE = "no reference file" + INTERNAL_ERROR = "Test internal error: " + + +class TestResult: + def __init__(self, case_name: str, status: TestStatus, reason: Optional[FailureReason], total_time: float, description: str): + self.case_name: str = case_name + self.status: TestStatus = status + self.reason: Optional[FailureReason] = reason + self.total_time: float = total_time + self.description: str = description + self.need_retry: bool = False + + def check_if_need_retry(self, args, stdout, stderr, runs_count): + if self.status != TestStatus.FAIL: + return + if not need_retry(args, stdout, stderr, self.total_time): + return + if MAX_RETRIES < runs_count: + return + self.need_retry = True + + +class TestCase: + @staticmethod + def get_reference_file(suite_dir, name): + """ + Returns reference file name for specified test + """ + + name = removesuffix(name, ".gen") + for ext in ['.reference', '.gen.reference']: + reference_file = os.path.join(suite_dir, name) + ext + if os.path.isfile(reference_file): + return reference_file + return None + + @staticmethod + def configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file): + testcase_args = copy.deepcopy(args) + + testcase_args.testcase_start_time = datetime.now() + testcase_basename = os.path.basename(case_file) + testcase_args.testcase_client = f"{testcase_args.client} --log_comment='{testcase_basename}'" + + if testcase_args.database: + database = testcase_args.database + os.environ.setdefault("CLICKHOUSE_DATABASE", database) + os.environ.setdefault("CLICKHOUSE_TMP", suite_tmp_dir) + else: + # If --database is not specified, we will 
create temporary database with unique name + # And we will recreate and drop it for each test + def random_str(length=6): + alphabet = string.ascii_lowercase + string.digits + return ''.join(random.choice(alphabet) for _ in range(length)) + + database = 'test_{suffix}'.format(suffix=random_str()) + + with open(stderr_file, 'w') as stderr: + client_cmd = testcase_args.testcase_client + " " \ + + get_additional_client_options(args) + + clickhouse_proc_create = open_client_process( + universal_newlines=True, + client_args=client_cmd, + stderr_file=stderr) + + try: + clickhouse_proc_create.communicate( + ("CREATE DATABASE " + database + get_db_engine(testcase_args, database)), + timeout=testcase_args.timeout) + except TimeoutExpired: + total_time = (datetime.now() - testcase_args.testcase_start_time).total_seconds() + return clickhouse_proc_create, "", "Timeout creating database {} before test".format( + database), total_time + + os.environ["CLICKHOUSE_DATABASE"] = database + # Set temporary directory to match the randomly generated database, + # because .sh tests also use it for temporary files and we want to avoid + # collisions. + testcase_args.test_tmp_dir = os.path.join(suite_tmp_dir, database) + os.mkdir(testcase_args.test_tmp_dir) + os.environ.setdefault("CLICKHOUSE_TMP", testcase_args.test_tmp_dir) + + testcase_args.testcase_database = database + + return testcase_args + + def __init__(self, suite, case: str, args, is_concurrent: bool): + self.case: str = case # case file name + self.tags: Set[str] = suite.all_tags[case] if case in suite.all_tags else set() + + self.case_file: str = os.path.join(suite.suite_path, case) + (self.name, self.ext) = os.path.splitext(case) + + file_suffix = ('.' + str(os.getpid())) if is_concurrent and args.test_runs > 1 else '' + self.reference_file = self.get_reference_file(suite.suite_path, self.name) + self.stdout_file = os.path.join(suite.suite_tmp_path, self.name) + file_suffix + '.stdout' + self.stderr_file = os.path.join(suite.suite_tmp_path, self.name) + file_suffix + '.stderr' + + self.testcase_args = None + self.runs_count = 0 + + # should skip test, should increment skipped_total, skip reason + def should_skip_test(self, suite) -> Optional[FailureReason]: + tags = self.tags + + if tags and ('disabled' in tags) and not args.disabled: + return FailureReason.DISABLED + + elif os.path.exists(os.path.join(suite.suite_path, self.name) + '.disabled') and not args.disabled: + return FailureReason.DISABLED + + elif args.skip and any(s in self.name for s in args.skip): + return FailureReason.SKIP + + elif not USE_JINJA and self.ext.endswith("j2"): + return FailureReason.NO_JINJA + + elif tags and (('zookeeper' in tags) or ('replica' in tags)) and not args.zookeeper: + return FailureReason.NO_ZOOKEEPER + + elif tags and (('shard' in tags) or ('distributed' in tags) or ('global' in tags)) and not args.shard: + return FailureReason.NO_SHARD + + elif tags and ('no-fasttest' in tags) and args.fast_tests_only: + return FailureReason.FAST_ONLY + + elif tags and (('long' in tags) or ('deadlock' in tags) or ('race' in tags)) and args.no_long: + # Tests for races and deadlocks usually are run in a loop for a significant amount of time + return FailureReason.NO_LONG + + elif tags and ('no-replicated-database' in tags) and args.replicated_database: + return FailureReason.REPLICATED_DB + + elif tags: + for build_flag in args.build_flags: + if 'no-' + build_flag in tags: + return FailureReason.BUILD + + return None + + def process_result_impl(self, proc, stdout: str, 
stderr: str, total_time: float): + description = "" + + if proc.returncode is None: + try: + proc.kill() + except OSError as e: + if e.errno != ESRCH: + raise + + if stderr: + description += stderr + return TestResult(self.name, TestStatus.FAIL, FailureReason.TIMEOUT, total_time, description) + + if proc.returncode != 0: + reason = FailureReason.EXIT_CODE + description += str(proc.returncode) + + if stderr: + description += "\n" + description += stderr + + # Stop on fatal errors like segmentation fault. They are sent to client via logs. + if ' ' in stderr: + reason = FailureReason.SERVER_DIED + + if self.testcase_args.stop \ + and ('Connection refused' in stderr or 'Attempt to read after eof' in stderr) \ + and 'Received exception from server' not in stderr: + reason = FailureReason.SERVER_DIED + + if os.path.isfile(self.stdout_file): + description += ", result:\n\n" + description += '\n'.join(open(self.stdout_file).read().split('\n')[:100]) + description += '\n' + + description += "\nstdout:\n{}\n".format(stdout) + return TestResult(self.name, TestStatus.FAIL, reason, total_time, description) + + if stderr: + description += "\n{}\n".format('\n'.join(stderr.split('\n')[:100])) + description += "\nstdout:\n{}\n".format(stdout) + return TestResult(self.name, TestStatus.FAIL, FailureReason.STDERR, total_time, description) + + if 'Exception' in stdout: + description += "\n{}\n".format('\n'.join(stdout.split('\n')[:100])) + return TestResult(self.name, TestStatus.FAIL, FailureReason.EXCEPTION, total_time, description) + + if '@@SKIP@@' in stdout: + skip_reason = stdout.replace('@@SKIP@@', '').rstrip("\n") + description += " - " + description += skip_reason + return TestResult(self.name, TestStatus.SKIPPED, FailureReason.SKIP, total_time, description) + + if self.reference_file is None: + return TestResult(self.name, TestStatus.UNKNOWN, FailureReason.NO_REFERENCE, total_time, description) + + result_is_different = subprocess.call(['diff', '-q', self.reference_file, self.stdout_file], stdout=PIPE) + + if result_is_different: + diff = Popen(['diff', '-U', str(self.testcase_args.unified), self.reference_file, self.stdout_file], stdout=PIPE, + universal_newlines=True).communicate()[0] + description += "\n{}\n".format(diff) + return TestResult(self.name, TestStatus.FAIL, FailureReason.RESULT_DIFF, total_time, description) + + if self.testcase_args.test_runs > 1 and total_time > 60 and 'long' not in self.name: + # We're in Flaky Check mode, check the run time as well while we're at it. 
+ return TestResult(self.name, TestStatus.FAIL, FailureReason.TOO_LONG, total_time, description) + + if os.path.exists(self.stdout_file): + os.remove(self.stdout_file) + if os.path.exists(self.stderr_file): + os.remove(self.stderr_file) + + return TestResult(self.name, TestStatus.OK, None, total_time, description) + + @staticmethod + def print_test_time(test_time) -> str: + if args.print_time: + return " {0:.2f} sec.".format(test_time) + else: + return '' + + def process_result(self, result: TestResult, messages): + description_full = messages[result.status] + description_full += self.print_test_time(result.total_time) + if result.reason is not None: + description_full += " - " + description_full += result.reason.value + + description_full += result.description + description_full += "\n" + + if result.status == TestStatus.FAIL: + description_full += 'Database: ' + self.testcase_args.testcase_database + + result.description = description_full + return result + + @staticmethod + def send_test_name_failed(suite: str, case: str) -> bool: + clickhouse_proc = open_client_process(args.client, universal_newlines=True) + + failed_to_check = False + + pid = os.getpid() + query = f"SELECT 'Running test {suite}/{case} from pid={pid}';" + + try: + clickhouse_proc.communicate((query), timeout=20) + except: + failed_to_check = True + + return failed_to_check or clickhouse_proc.returncode != 0 + + def run_single_test(self, server_logs_level, client_options): + args = self.testcase_args + client = args.testcase_client + start_time = args.testcase_start_time + database = args.testcase_database + + # This is for .sh tests + os.environ["CLICKHOUSE_LOG_COMMENT"] = self.case_file + + params = { + 'client': client + ' --database=' + database, + 'logs_level': server_logs_level, + 'options': client_options, + 'test': self.case_file, + 'stdout': self.stdout_file, + 'stderr': self.stderr_file, + } + + # >> append to stderr (but not stdout since it is not used there), + # because there are also output of per test database creation + if not args.database: + pattern = '{test} > {stdout} 2>> {stderr}' + else: + pattern = '{test} > {stdout} 2> {stderr}' + + if self.ext == '.sql': + pattern = "{client} --send_logs_level={logs_level} --testmode --multiquery {options} < " + pattern + + command = pattern.format(**params) + + proc = Popen(command, shell=True, env=os.environ) + + while (datetime.now() - start_time).total_seconds() < args.timeout and proc.poll() is None: + sleep(0.01) + + need_drop_database = not args.database + if need_drop_database and args.no_drop_if_fail: + maybe_passed = (proc.returncode == 0) and (proc.stderr is None) and ( + proc.stdout is None or 'Exception' not in proc.stdout) + need_drop_database = not maybe_passed + + if need_drop_database: + with open(self.stderr_file, 'a') as stderr: + clickhouse_proc_create = open_client_process(client, universal_newlines=True, stderr_file=stderr) + + seconds_left = max(args.timeout - (datetime.now() - start_time).total_seconds(), 20) + + try: + drop_database_query = "DROP DATABASE " + database + if args.replicated_database: + drop_database_query += " ON CLUSTER test_cluster_database_replicated" + clickhouse_proc_create.communicate((drop_database_query), timeout=seconds_left) + except TimeoutExpired: + # kill test process because it can also hung + if proc.returncode is None: + try: + proc.kill() + except OSError as e: + if e.errno != ESRCH: + raise + + total_time = (datetime.now() - start_time).total_seconds() + return clickhouse_proc_create, "", f"Timeout 
dropping database {database} after test", total_time + + shutil.rmtree(args.test_tmp_dir) + + total_time = (datetime.now() - start_time).total_seconds() + + # Normalize randomized database names in stdout, stderr files. + os.system("LC_ALL=C sed -i -e 's/{test_db}/default/g' {file}".format(test_db=database, file=self.stdout_file)) + if args.hide_db_name: + os.system( + "LC_ALL=C sed -i -e 's/{test_db}/default/g' {file}".format(test_db=database, file=self.stderr_file)) + if args.replicated_database: + os.system("LC_ALL=C sed -i -e 's|/auto_{{shard}}||g' {file}".format(file=self.stdout_file)) + os.system("LC_ALL=C sed -i -e 's|auto_{{replica}}||g' {file}".format(file=self.stdout_file)) + + # Normalize hostname in stdout file. + os.system("LC_ALL=C sed -i -e 's/{hostname}/localhost/g' {file}".format(hostname=socket.gethostname(), + file=self.stdout_file)) + + stdout = open(self.stdout_file, 'rb').read() if os.path.exists(self.stdout_file) else b'' + stdout = str(stdout, errors='replace', encoding='utf-8') + stderr = open(self.stderr_file, 'rb').read() if os.path.exists(self.stderr_file) else b'' + stderr = str(stderr, errors='replace', encoding='utf-8') + + return proc, stdout, stderr, total_time + + def run(self, args, suite, client_options, server_logs_level): + try: + skip_reason = self.should_skip_test(suite) + if skip_reason is not None: + return TestResult(self.name, TestStatus.SKIPPED, skip_reason, 0., "") + + if args.testname and self.send_test_name_failed(suite, self.case): + description = "\nServer does not respond to health check\n" + return TestResult(self.name, TestStatus.FAIL, FailureReason.SERVER_DIED, 0., description) + + self.runs_count += 1 + self.testcase_args = self.configure_testcase_args(args, self.case_file, suite.suite_tmp_path, self.stderr_file) + proc, stdout, stderr, total_time = self.run_single_test(server_logs_level, client_options) + + result = self.process_result_impl(proc, stdout, stderr, total_time) + result.check_if_need_retry(args, stdout, stderr, self.runs_count) + return result + except KeyboardInterrupt as e: + raise e + except: + exc_type, exc_value, tb = sys.exc_info() + exc_name = exc_type.__name__ + traceback_str = "\n".join(traceback.format_tb(tb, 10)) + description = f"{exc_name}\n{exc_value}\n{traceback_str}" + return TestResult(self.name, TestStatus.UNKNOWN, FailureReason.INTERNAL_ERROR, 0., description) + + +class TestSuite: + @staticmethod + def tests_in_suite_key_func(item: str) -> int: + if args.order == 'random': + return random.random() + + reverse = 1 if args.order == 'asc' else -1 + + if -1 == item.find('_'): + return 99998 + + prefix, _ = item.split('_', 1) + + try: + return reverse * int(prefix) + except ValueError: + return 99997 + + @staticmethod + def render_test_template(j2env, suite_dir, test_name): + """ + Render template for test and reference file if needed + """ + + if j2env is None: + return test_name + + test_base_name = removesuffix(test_name, ".sql.j2", ".sql") + + reference_file_name = test_base_name + ".reference.j2" + reference_file_path = os.path.join(suite_dir, reference_file_name) + if os.path.isfile(reference_file_path): + tpl = j2env.get_template(reference_file_name) + tpl.stream().dump(os.path.join(suite_dir, test_base_name) + ".gen.reference") + + if test_name.endswith(".sql.j2"): + tpl = j2env.get_template(test_name) + generated_test_name = test_base_name + ".gen.sql" + tpl.stream().dump(os.path.join(suite_dir, generated_test_name)) + return generated_test_name + + return test_name + + @staticmethod + def 
read_test_tags(suite_dir: str, all_tests: List[str]) -> Dict[str, Set[str]]: + def get_comment_sign(filename): + if filename.endswith('.sql') or filename.endswith('.sql.j2'): + return '--' + elif filename.endswith('.sh') or filename.endswith('.py') or filename.endswith('.expect'): + return '#' + else: + raise Exception(f'Unknown file_extension: {filename}') + + def parse_tags_from_line(line, comment_sign): + if not line.startswith(comment_sign): + return None + tags_str = line[len(comment_sign):].lstrip() + tags_prefix = "Tags:" + if not tags_str.startswith(tags_prefix): + return None + tags_str = tags_str[len(tags_prefix):] + tags = tags_str.split(',') + tags = {tag.strip() for tag in tags} + return tags + + def is_shebang(line): + return line.startswith('#!') + + def load_tags_from_file(filepath): + with open(filepath, 'r') as file: + try: + line = file.readline() + if is_shebang(line): + line = file.readline() + except UnicodeDecodeError: + return [] + return parse_tags_from_line(line, get_comment_sign(filepath)) + + all_tags = {} + start_time = datetime.now() + for test_name in all_tests: + tags = load_tags_from_file(os.path.join(suite_dir, test_name)) + if tags: + all_tags[test_name] = tags + elapsed = (datetime.now() - start_time).total_seconds() + if elapsed > 1: + print(f"Tags for suite {suite_dir} read in {elapsed:.2f} seconds") + return all_tags + + def __init__(self, args, suite_path: str, suite_tmp_path: str, suite: str): + self.args = args + self.suite_path: str = suite_path + self.suite_tmp_path: str = suite_tmp_path + self.suite: str = suite + + self.all_tests: List[str] = self.get_tests_list(self.tests_in_suite_key_func) + self.all_tags: Dict[str, Set[str]] = self.read_test_tags(self.suite_path, self.all_tests) + + self.sequential_tests = [] + self.parallel_tests = [] + for test_name in self.all_tests: + if self.is_sequential_test(test_name): + self.sequential_tests.append(test_name) + else: + self.parallel_tests.append(test_name) + + def is_sequential_test(self, test_name): + if args.sequential: + if any(s in test_name for s in args.sequential): + return True + + if test_name not in self.all_tags: + return False + + return ('no-parallel' in self.all_tags[test_name]) or ('sequential' in self.all_tags[test_name]) + + def get_tests_list(self, sort_key): + """ + Return list of tests file names to run + """ + + all_tests = list(self.get_selected_tests()) + all_tests = all_tests * self.args.test_runs + all_tests.sort(key=sort_key) + return all_tests + + def get_selected_tests(self): + """ + Find all files with tests, filter, render templates + """ + + j2env = jinja2.Environment( + loader=jinja2.FileSystemLoader(self.suite_path), + keep_trailing_newline=True, + ) if USE_JINJA else None + + for test_name in os.listdir(self.suite_path): + if not is_test_from_dir(self.suite_path, test_name): + continue + if self.args.test and not any(re.search(pattern, test_name) for pattern in self.args.test): + continue + if USE_JINJA and test_name.endswith(".gen.sql"): + continue + test_name = self.render_test_template(j2env, self.suite_path, test_name) + yield test_name + + @staticmethod + def readTestSuite(args, suite_dir_name: str): + def is_data_present(): + clickhouse_proc = open_client_process(args.client) + (stdout, stderr) = clickhouse_proc.communicate(b"EXISTS TABLE test.hits") + if clickhouse_proc.returncode != 0: + raise CalledProcessError(clickhouse_proc.returncode, args.client, stderr) + + return stdout.startswith(b'1') + + base_dir = os.path.abspath(args.queries) + tmp_dir = 
os.path.abspath(args.tmp) + suite_path = os.path.join(base_dir, suite_dir_name) + + suite_re_obj = re.search('^[0-9]+_(.*)$', suite_dir_name) + if not suite_re_obj: # skip .gitignore and so on + return None + + suite_tmp_path = os.path.join(tmp_dir, suite_dir_name) + if not os.path.exists(suite_tmp_path): + os.makedirs(suite_tmp_path) + + suite = suite_re_obj.group(1) + + if not os.path.isdir(suite_path): + return None + + if 'stateful' in suite and not args.no_stateful and not is_data_present(): + print("Won't run stateful tests because test data wasn't loaded.") + return None + if 'stateless' in suite and args.no_stateless: + print("Won't run stateless tests because they were manually disabled.") + return None + if 'stateful' in suite and args.no_stateful: + print("Won't run stateful tests because they were manually disabled.") + return None + + return TestSuite(args, suite_path, suite_tmp_path, suite) + + stop_time = None -exit_code = multiprocessing.Value("i", 0) -server_died = multiprocessing.Event() -stop_tests_triggered_lock = multiprocessing.Lock() -stop_tests_triggered = multiprocessing.Event() -queue = multiprocessing.Queue(maxsize=1) +exit_code = None +server_died = None +stop_tests_triggered_lock = None +stop_tests_triggered = None +queue = None +multiprocessing_manager = None +restarted_tests = None - -def print_test_time(test_time) -> str: - if args.print_time: - return " {0:.2f} sec.".format(test_time) - else: - return '' - - -# should skip test, should increment skipped_total, skip reason -def should_skip_test(name: str, test_ext: str, suite_dir: str, all_tags: Dict[str, Set[str]]) -> Tuple[bool, bool, str]: - tags = all_tags.get(name + test_ext) - - should_skip = False - increment_skip_count = False - skip_reason = '' - - if tags and ('disabled' in tags) and not args.disabled: - should_skip = True - increment_skip_count = False - skip_reason = 'disabled' - - elif os.path.exists(os.path.join(suite_dir, name) + '.disabled') and not args.disabled: - should_skip = True - increment_skip_count = False - skip_reason = 'disabled' - - elif args.skip and any(s in name for s in args.skip): - should_skip = True - increment_skip_count = True - skip_reason = 'skip' - - elif not USE_JINJA and test_ext.endswith("j2"): - should_skip = True - increment_skip_count = True - skip_reason = 'no jinja' - - elif tags and (('zookeeper' in tags) or ('replica' in tags)) and not args.zookeeper: - should_skip = True - increment_skip_count = True - skip_reason = 'no zookeeper' - - elif tags and (('shard' in tags) or ('distributed' in tags) or ('global' in tags)) and not args.shard: - should_skip = True - increment_skip_count = True - skip_reason = 'no shard' - - elif tags and ('no-fasttest' in tags) and args.fast_tests_only: - should_skip = True - increment_skip_count = True - skip_reason = 'running fast tests only' - - elif tags and (('long' in tags) or ('deadlock' in tags) or ('race' in tags)) and args.no_long: - # Tests for races and deadlocks usually are run in a loop for a significant amount of time - should_skip = True - increment_skip_count = True - skip_reason = 'not running long tests' - - elif tags and ('no-replicated-database' in tags) and args.replicated_database: - should_skip = True - increment_skip_count = True - skip_reason = 'replicated-database' - - elif tags: - for build_flag in args.build_flags: - if 'no-' + build_flag in tags: - should_skip = True - increment_skip_count = True - skip_reason = build_flag - break - - return should_skip, increment_skip_count, skip_reason - - -def 
send_test_name_failed(suite: str, case: str) -> bool: - clickhouse_proc = open_client_process(args.client, universal_newlines=True) - - failed_to_check = False - - pid = os.getpid() - query = f"SELECT 'Running test {suite}/{case} from pid={pid}';" - - try: - clickhouse_proc.communicate((query), timeout=20) - except: - failed_to_check = True - - return failed_to_check or clickhouse_proc.returncode != 0 - - -restarted_tests = [] # (test, stderr) - -# def run_tests_array(all_tests, num_tests, suite, suite_dir, suite_tmp_dir, all_tags): +# def run_tests_array(all_tests: List[str], num_tests: int, test_suite: TestSuite): def run_tests_array(all_tests_with_params): - all_tests, num_tests, suite, suite_dir, suite_tmp_dir, all_tags = all_tests_with_params + all_tests, num_tests, test_suite = all_tests_with_params global stop_time global exit_code global server_died + global restarted_tests OP_SQUARE_BRACKET = colored("[", args, attrs=['bold']) CL_SQUARE_BRACKET = colored("]", args, attrs=['bold']) @@ -456,10 +841,11 @@ def run_tests_array(all_tests_with_params): MSG_OK = OP_SQUARE_BRACKET + colored(" OK ", args, "green", attrs=['bold']) + CL_SQUARE_BRACKET MSG_SKIPPED = OP_SQUARE_BRACKET + colored(" SKIPPED ", args, "cyan", attrs=['bold']) + CL_SQUARE_BRACKET + MESSAGES = {TestStatus.FAIL: MSG_FAIL, TestStatus.UNKNOWN: MSG_UNKNOWN, TestStatus.OK: MSG_OK, TestStatus.SKIPPED: MSG_SKIPPED} + passed_total = 0 skipped_total = 0 failures_total = 0 - failures = 0 failures_chain = 0 start_time = datetime.now() @@ -470,7 +856,7 @@ def run_tests_array(all_tests_with_params): if num_tests > 0: about = 'about ' if is_concurrent else '' proc_name = multiprocessing.current_process().name - print(f"\nRunning {about}{num_tests} {suite} tests ({proc_name}).\n") + print(f"\nRunning {about}{num_tests} {test_suite.suite} tests ({proc_name}).\n") while True: if is_concurrent: @@ -492,182 +878,56 @@ def run_tests_array(all_tests_with_params): stop_tests() break - case_file = os.path.join(suite_dir, case) - (name, ext) = os.path.splitext(case) + test_case = TestCase(test_suite, case, args, is_concurrent) try: - status = '' + description = '' if not is_concurrent: sys.stdout.flush() - sys.stdout.write("{0:72}".format(removesuffix(name, ".gen", ".sql") + ": ")) + sys.stdout.write("{0:72}".format(removesuffix(test_case.name, ".gen", ".sql") + ": ")) # This flush is needed so you can see the test name of the long # running test before it will finish. But don't do it in parallel # mode, so that the lines don't mix. 
sys.stdout.flush() else: - status = "{0:72}".format(removesuffix(name, ".gen", ".sql") + ": ") + description = "{0:72}".format(removesuffix(test_case.name, ".gen", ".sql") + ": ") - skip_test, increment_skip_count, skip_reason = \ - should_skip_test(name, ext, suite_dir, all_tags) + while True: + test_result = test_case.run(args, test_suite, client_options, server_logs_level) + test_result = test_case.process_result(test_result, MESSAGES) + if not test_result.need_retry: + break + restarted_tests.append(test_result) - if skip_test: - status += MSG_SKIPPED + f" - {skip_reason}\n" - - if increment_skip_count: - skipped_total += 1 - else: - if args.testname and send_test_name_failed(suite, case): - failures += 1 - print("Server does not respond to health check") + if test_result.status == TestStatus.OK: + passed_total += 1 + failures_chain = 0 + elif test_result.status == TestStatus.FAIL: + failures_total += 1 + failures_chain += 1 + if test_result.reason == FailureReason.SERVER_DIED: server_died.set() stop_tests() - break - file_suffix = ('.' + str(os.getpid())) if is_concurrent and args.test_runs > 1 else '' - reference_file = get_reference_file(suite_dir, name) - stdout_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stdout' - stderr_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stderr' + elif test_result.status == TestStatus.SKIPPED: + skipped_total += 1 - testcase_args = configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file) - proc, stdout, stderr, total_time = run_single_test(testcase_args, ext, server_logs_level, client_options, case_file, stdout_file, stderr_file) + description += test_result.description - if proc.returncode is None: - try: - proc.kill() - except OSError as e: - if e.errno != ESRCH: - raise + if description and not description.endswith('\n'): + description += '\n' - failures += 1 - status += MSG_FAIL - status += print_test_time(total_time) - status += " - Timeout!\n" - if stderr: - status += stderr - status += 'Database: ' + testcase_args.testcase_database - else: - counter = 1 - while need_retry(stdout, stderr): - restarted_tests.append((case_file, stderr)) - testcase_args = configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file) - proc, stdout, stderr, total_time = run_single_test(testcase_args, ext, server_logs_level, client_options, case_file, stdout_file, stderr_file) - sleep(2**counter) - counter += 1 - if MAX_RETRIES < counter: - if args.replicated_database: - if DISTRIBUTED_DDL_TIMEOUT_MSG in stderr: - server_died.set() - break - - if proc.returncode != 0: - failures += 1 - failures_chain += 1 - status += MSG_FAIL - status += print_test_time(total_time) - status += ' - return code {}\n'.format(proc.returncode) - - if stderr: - status += stderr - - # Stop on fatal errors like segmentation fault. They are sent to client via logs. 
- if ' ' in stderr: - server_died.set() - - if testcase_args.stop \ - and ('Connection refused' in stderr or 'Attempt to read after eof' in stderr) \ - and 'Received exception from server' not in stderr: - server_died.set() - - if os.path.isfile(stdout_file): - status += ", result:\n\n" - status += '\n'.join( - open(stdout_file).read().split('\n')[:100]) - status += '\n' - - status += "\nstdout:\n{}\n".format(stdout) - status += 'Database: ' + testcase_args.testcase_database - - elif stderr: - failures += 1 - failures_chain += 1 - status += MSG_FAIL - status += print_test_time(total_time) - status += " - having stderror:\n{}\n".format( - '\n'.join(stderr.split('\n')[:100])) - status += "\nstdout:\n{}\n".format(stdout) - status += 'Database: ' + testcase_args.testcase_database - elif 'Exception' in stdout: - failures += 1 - failures_chain += 1 - status += MSG_FAIL - status += print_test_time(total_time) - status += " - having exception in stdout:\n{}\n".format( - '\n'.join(stdout.split('\n')[:100])) - status += 'Database: ' + testcase_args.testcase_database - elif '@@SKIP@@' in stdout: - skipped_total += 1 - skip_reason = stdout.replace('@@SKIP@@', '').rstrip("\n") - status += MSG_SKIPPED + f" - {skip_reason}\n" - elif reference_file is None: - status += MSG_UNKNOWN - status += print_test_time(total_time) - status += " - no reference file\n" - status += 'Database: ' + testcase_args.testcase_database - else: - result_is_different = subprocess.call(['diff', '-q', reference_file, stdout_file], stdout=PIPE) - - if result_is_different: - diff = Popen(['diff', '-U', str(testcase_args.unified), reference_file, stdout_file], stdout=PIPE, universal_newlines=True).communicate()[0] - failures += 1 - status += MSG_FAIL - status += print_test_time(total_time) - status += " - result differs with reference:\n{}\n".format(diff) - status += 'Database: ' + testcase_args.testcase_database - else: - if testcase_args.test_runs > 1 and total_time > 60 and 'long' not in name: - # We're in Flaky Check mode, check the run time as well while we're at it. - failures += 1 - failures_chain += 1 - status += MSG_FAIL - status += print_test_time(total_time) - status += " - Test runs too long (> 60s). Make it faster.\n" - status += 'Database: ' + testcase_args.testcase_database - else: - passed_total += 1 - failures_chain = 0 - status += MSG_OK - status += print_test_time(total_time) - status += "\n" - if os.path.exists(stdout_file): - os.remove(stdout_file) - if os.path.exists(stderr_file): - os.remove(stderr_file) - - if status and not status.endswith('\n'): - status += '\n' - - sys.stdout.write(status) + sys.stdout.write(description) sys.stdout.flush() except KeyboardInterrupt as e: print(colored("Break tests execution", args, "red")) stop_tests() raise e - except: - exc_type, exc_value, tb = sys.exc_info() - failures += 1 - - exc_name = exc_type.__name__ - traceback_str = "\n".join(traceback.format_tb(tb, 10)) - - print(f"{MSG_FAIL} - Test internal error: {exc_name}") - print(f"{exc_value}\n{traceback_str}") if failures_chain >= 20: stop_tests() break - failures_total = failures_total + failures - if failures_total > 0: print(colored(f"\nHaving {failures_total} errors! {passed_total} tests passed." f" {skipped_total} tests skipped. {(datetime.now() - start_time).total_seconds():.2f} s elapsed" @@ -708,18 +968,12 @@ def check_server_started(client, retry_count): sleep(0.5) continue - # FIXME Some old comment, maybe now CH supports Python3 ? 
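The rewritten `run_tests_array` above delegates retries to the new `TestCase` API: instead of the removed inline loop that re-ran a test with `sleep(2**counter)` until `MAX_RETRIES`, each run now returns a `TestResult` whose `need_retry` flag drives the loop, and retried results are collected in the shared `restarted_tests` list. A minimal sketch of that back-off pattern, kept standalone for illustration; `run_once` and `looks_transient` are hypothetical stand-ins for the runner's own helpers:

``` python
import time

MAX_RETRIES = 3  # the real runner defines its own constant


def run_with_retries(run_once, looks_transient):
    """Re-run a flaky check with exponential back-off.

    run_once() -> (returncode, stdout, stderr)   # hypothetical callable
    looks_transient(stdout, stderr) -> bool      # plays the role of need_retry()
    """
    restarted = []  # analogous to restarted_tests: stderr of every retried attempt
    attempt = 1
    while True:
        returncode, stdout, stderr = run_once()
        if not looks_transient(stdout, stderr) or attempt > MAX_RETRIES:
            return returncode, stdout, stderr, restarted
        restarted.append(stderr)
        time.sleep(2 ** attempt)  # 2s, 4s, 8s, mirroring sleep(2**counter) above
        attempt += 1
```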
- # We can't print this, because for some reason this is python 2, - # and args appeared in 3.3. To hell with it. - # print(''.join(clickhouse_proc.args)) - - # Other kind of error, fail. - code: int = clickhouse_proc.returncode print(f"\nClient invocation failed with code {code}:\n\ stdout: {stdout}\n\ - stderr: {stderr}") + stderr: {stderr}\n\ + args: {''.join(clickhouse_proc.args)}\n") sys.stdout.flush() @@ -816,23 +1070,6 @@ def suite_key_func(item: str) -> Union[int, Tuple[int, str]]: return 99997, '' -def tests_in_suite_key_func(item: str) -> int: - if args.order == 'random': - return random.random() - - reverse = 1 if args.order == 'asc' else -1 - - if -1 == item.find('_'): - return 99998 - - prefix, _ = item.split('_', 1) - - try: - return reverse * int(prefix) - except ValueError: - return 99997 - - def extract_key(key: str) -> str: return subprocess.getstatusoutput( args.extract_from_config + @@ -850,14 +1087,13 @@ def open_client_process( universal_newlines=True if universal_newlines else None) - -def do_run_tests(jobs, suite, suite_dir, suite_tmp_dir, all_tests, all_tags, parallel_tests, sequential_tests, parallel): - if jobs > 1 and len(parallel_tests) > 0: - print("Found", len(parallel_tests), "parallel tests and", len(sequential_tests), "sequential tests") +def do_run_tests(jobs, test_suite: TestSuite, parallel): + if jobs > 1 and len(test_suite.parallel_tests) > 0: + print("Found", len(test_suite.parallel_tests), "parallel tests and", len(test_suite.sequential_tests), "sequential tests") run_n, run_total = parallel.split('/') run_n = float(run_n) run_total = float(run_total) - tests_n = len(parallel_tests) + tests_n = len(test_suite.parallel_tests) if run_total > tests_n: run_total = tests_n @@ -866,15 +1102,15 @@ def do_run_tests(jobs, suite, suite_dir, suite_tmp_dir, all_tests, all_tags, par if jobs > run_total: run_total = jobs - batch_size = max(1, len(parallel_tests) // jobs) + batch_size = max(1, len(test_suite.parallel_tests) // jobs) parallel_tests_array = [] for _ in range(jobs): - parallel_tests_array.append((None, batch_size, suite, suite_dir, suite_tmp_dir, all_tags)) + parallel_tests_array.append((None, batch_size, test_suite)) with closing(multiprocessing.Pool(processes=jobs)) as pool: pool.map_async(run_tests_array, parallel_tests_array) - for suit in parallel_tests: + for suit in test_suite.parallel_tests: queue.put(suit) for _ in range(jobs): @@ -884,11 +1120,11 @@ def do_run_tests(jobs, suite, suite_dir, suite_tmp_dir, all_tests, all_tags, par pool.join() - run_tests_array((sequential_tests, len(sequential_tests), suite, suite_dir, suite_tmp_dir, all_tags)) - return len(sequential_tests) + len(parallel_tests) + run_tests_array((test_suite.sequential_tests, len(test_suite.sequential_tests), test_suite)) + return len(test_suite.sequential_tests) + len(test_suite.parallel_tests) else: - num_tests = len(all_tests) - run_tests_array((all_tests, num_tests, suite, suite_dir, suite_tmp_dir, all_tags)) + num_tests = len(test_suite.all_tests) + run_tests_array((test_suite.all_tests, num_tests, test_suite)) return num_tests @@ -913,89 +1149,12 @@ def removesuffix(text, *suffixes): return text -def render_test_template(j2env, suite_dir, test_name): - """ - Render template for test and reference file if needed - """ - - if j2env is None: - return test_name - - test_base_name = removesuffix(test_name, ".sql.j2", ".sql") - - reference_file_name = test_base_name + ".reference.j2" - reference_file_path = os.path.join(suite_dir, reference_file_name) - if 
os.path.isfile(reference_file_path): - tpl = j2env.get_template(reference_file_name) - tpl.stream().dump(os.path.join(suite_dir, test_base_name) + ".gen.reference") - - if test_name.endswith(".sql.j2"): - tpl = j2env.get_template(test_name) - generated_test_name = test_base_name + ".gen.sql" - tpl.stream().dump(os.path.join(suite_dir, generated_test_name)) - return generated_test_name - - return test_name - - -def get_selected_tests(suite_dir, patterns): - """ - Find all files with tests, filter, render templates - """ - - j2env = jinja2.Environment( - loader=jinja2.FileSystemLoader(suite_dir), - keep_trailing_newline=True, - ) if USE_JINJA else None - - for test_name in os.listdir(suite_dir): - if not is_test_from_dir(suite_dir, test_name): - continue - if patterns and not any(re.search(pattern, test_name) for pattern in patterns): - continue - if USE_JINJA and test_name.endswith(".gen.sql"): - continue - test_name = render_test_template(j2env, suite_dir, test_name) - yield test_name - - -def get_tests_list(suite_dir, patterns, test_runs, sort_key): - """ - Return list of tests file names to run - """ - - all_tests = list(get_selected_tests(suite_dir, patterns)) - all_tests = all_tests * test_runs - all_tests.sort(key=sort_key) - return all_tests - - -def get_reference_file(suite_dir, name): - """ - Returns reference file name for specified test - """ - - name = removesuffix(name, ".gen") - for ext in ['.reference', '.gen.reference']: - reference_file = os.path.join(suite_dir, name) + ext - if os.path.isfile(reference_file): - return reference_file - return None - - def main(args): global server_died global stop_time global exit_code global server_logs_level - - def is_data_present(): - clickhouse_proc = open_client_process(args.client) - (stdout, stderr) = clickhouse_proc.communicate(b"EXISTS TABLE test.hits") - if clickhouse_proc.returncode != 0: - raise CalledProcessError(clickhouse_proc.returncode, args.client, stderr) - - return stdout.startswith(b'1') + global restarted_tests if not check_server_started(args.client, args.server_check_retries): msg = "Server is not responding. Cannot execute 'SELECT 1' query. 
\ @@ -1043,13 +1202,17 @@ def main(args): def create_common_database(args, db_name): create_database_retries = 0 while create_database_retries < MAX_RETRIES: + start_time = datetime.now() + client_cmd = args.client + " " + get_additional_client_options(args) clickhouse_proc_create = open_client_process(client_cmd, universal_newlines=True) (stdout, stderr) = clickhouse_proc_create.communicate(("CREATE DATABASE IF NOT EXISTS " + db_name + get_db_engine(args, db_name))) - if not need_retry(stdout, stderr): + total_time = (datetime.now() - start_time).total_seconds() + + if not need_retry(args, stdout, stderr, total_time): break create_database_retries += 1 @@ -1064,46 +1227,11 @@ def main(args): if server_died.is_set(): break - suite_dir = os.path.join(base_dir, suite) - suite_re_obj = re.search('^[0-9]+_(.*)$', suite) - if not suite_re_obj: # skip .gitignore and so on + test_suite = TestSuite.readTestSuite(args, suite) + if test_suite is None: continue - suite_tmp_dir = os.path.join(tmp_dir, suite) - if not os.path.exists(suite_tmp_dir): - os.makedirs(suite_tmp_dir) - - suite = suite_re_obj.group(1) - - if os.path.isdir(suite_dir): - if 'stateful' in suite and not args.no_stateful and not is_data_present(): - print("Won't run stateful tests because test data wasn't loaded.") - continue - if 'stateless' in suite and args.no_stateless: - print("Won't run stateless tests because they were manually disabled.") - continue - if 'stateful' in suite and args.no_stateful: - print("Won't run stateful tests because they were manually disabled.") - continue - - all_tests = get_tests_list( - suite_dir, args.test, args.test_runs, tests_in_suite_key_func) - - all_tags = read_test_tags(suite_dir, all_tests) - - sequential_tests = [] - if args.sequential: - for test in all_tests: - if any(s in test for s in args.sequential): - sequential_tests.append(test) - else: - sequential_tests = collect_sequential_list(all_tags) - - sequential_tests_set = set(sequential_tests) - parallel_tests = [test for test in all_tests if test not in sequential_tests_set] - - total_tests_run += do_run_tests( - args.jobs, suite, suite_dir, suite_tmp_dir, all_tests, all_tags, parallel_tests, sequential_tests, args.parallel) + total_tests_run += do_run_tests(args.jobs, test_suite, args.parallel) if server_died.is_set(): exit_code.value = 1 @@ -1133,8 +1261,12 @@ def main(args): if len(restarted_tests) > 0: print("\nSome tests were restarted:\n") - for (test_case, stderr) in restarted_tests: - print(test_case + "\n" + stderr + "\n") + for test_result in restarted_tests: + print("\n{0:72}: ".format(test_result.case_name)) + # replace it with lowercase to avoid parsing retried tests as failed + for status in TestStatus: + test_result.description = test_result.description.replace(status.value, status.value.lower()) + print(test_result.description) if total_tests_run == 0: print("No tests were run.") @@ -1175,61 +1307,16 @@ def get_additional_client_options_url(args): return '' -def read_test_tags(suite_dir: str, all_tests: List[str]) -> Dict[str, Set[str]]: - def get_comment_sign(filename): - if filename.endswith('.sql') or filename.endswith('.sql.j2'): - return '--' - elif filename.endswith('.sh') or filename.endswith('.py') or filename.endswith('.expect'): - return '#' - else: - raise Exception(f'Unknown file_extension: {filename}') - - def parse_tags_from_line(line, comment_sign): - if not line.startswith(comment_sign): - return None - tags_str = line[len(comment_sign):].lstrip() - tags_prefix = "Tags:" - if not 
tags_str.startswith(tags_prefix): - return None - tags_str = tags_str[len(tags_prefix):] - tags = tags_str.split(',') - tags = {tag.strip() for tag in tags} - return tags - - def is_shebang(line): - return line.startswith('#!') - - def load_tags_from_file(filepath): - with open(filepath, 'r') as file: - try: - line = file.readline() - if is_shebang(line): - line = file.readline() - except UnicodeDecodeError: - return [] - return parse_tags_from_line(line, get_comment_sign(filepath)) - - all_tags = {} - start_time = datetime.now() - for test_name in all_tests: - tags = load_tags_from_file(os.path.join(suite_dir, test_name)) - if tags: - all_tags[test_name] = tags - elapsed = (datetime.now() - start_time).total_seconds() - if elapsed > 1: - print(f"Tags for suite {suite_dir} read in {elapsed:.2f} seconds") - return all_tags - - -def collect_sequential_list(all_tags: Dict[str, Set[str]]) -> List[str]: - res = [] - for test_name, tags in all_tags.items(): - if ('no-parallel' in tags) or ('sequential' in tags): - res.append(test_name) - return res - - if __name__ == '__main__': + stop_time = None + exit_code = multiprocessing.Value("i", 0) + server_died = multiprocessing.Event() + stop_tests_triggered_lock = multiprocessing.Lock() + stop_tests_triggered = multiprocessing.Event() + queue = multiprocessing.Queue(maxsize=1) + multiprocessing_manager = multiprocessing.Manager() + restarted_tests = multiprocessing_manager.list() + # Move to a new process group and kill it at exit so that we don't have any # infinite tests processes left # (new process group is required to avoid killing some parent processes) diff --git a/tests/integration/helpers/cluster.py b/tests/integration/helpers/cluster.py index d1240999274..6a02754f5e7 100644 --- a/tests/integration/helpers/cluster.py +++ b/tests/integration/helpers/cluster.py @@ -760,7 +760,7 @@ class ClickHouseCluster: hostname=None, env_variables=None, image="clickhouse/integration-test", tag=None, stay_alive=False, ipv4_address=None, ipv6_address=None, with_installed_binary=False, tmpfs=None, zookeeper_docker_compose_path=None, minio_certs_dir=None, use_keeper=True, - main_config_name="config.xml", users_config_name="users.xml", copy_common_configs=True): + main_config_name="config.xml", users_config_name="users.xml", copy_common_configs=True, config_root_name="yandex"): """Add an instance to the cluster. 
@@ -832,7 +832,8 @@ class ClickHouseCluster: main_config_name=main_config_name, users_config_name=users_config_name, copy_common_configs=copy_common_configs, - tmpfs=tmpfs or []) + tmpfs=tmpfs or [], + config_root_name=config_root_name) docker_compose_yml_dir = get_docker_compose_path() @@ -1802,7 +1803,7 @@ class ClickHouseInstance: main_config_name="config.xml", users_config_name="users.xml", copy_common_configs=True, hostname=None, env_variables=None, image="clickhouse/integration-test", tag="latest", - stay_alive=False, ipv4_address=None, ipv6_address=None, with_installed_binary=False, tmpfs=None): + stay_alive=False, ipv4_address=None, ipv6_address=None, with_installed_binary=False, tmpfs=None, config_root_name="yandex"): self.name = name self.base_cmd = cluster.base_cmd @@ -1875,6 +1876,7 @@ class ClickHouseInstance: self.ipv6_address = ipv6_address self.with_installed_binary = with_installed_binary self.is_up = False + self.config_root_name = config_root_name def is_built_with_sanitizer(self, sanitizer_name=''): @@ -2219,9 +2221,8 @@ class ClickHouseInstance: finally: sock.close() - @staticmethod - def dict_to_xml(dictionary): - xml_str = dict2xml(dictionary, wrap="yandex", indent=" ", newlines=True) + def dict_to_xml(self, dictionary): + xml_str = dict2xml(dictionary, wrap=self.config_root_name, indent=" ", newlines=True) return xml_str @property @@ -2304,15 +2305,22 @@ class ClickHouseInstance: dictionaries_dir = p.abspath(p.join(instance_config_dir, 'dictionaries')) os.mkdir(dictionaries_dir) + def write_embedded_config(name, dest_dir): + with open(p.join(HELPERS_DIR, name), 'r') as f: + data = f.read() + data = data.replace('yandex', self.config_root_name) + with open(p.join(dest_dir, name), 'w') as r: + r.write(data) logging.debug("Copy common configuration from helpers") # The file is named with 0_ prefix to be processed before other configuration overloads. 
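The `cluster.py` hunks above thread a new `config_root_name` parameter (default `yandex`) through `add_instance` and `ClickHouseInstance`, so generated and embedded configs can use `<clickhouse>` as the root element instead of the hard-coded `<yandex>`: `dict_to_xml` now wraps its output in `self.config_root_name`, and `write_embedded_config` rewrites the bundled helper configs accordingly. A small sketch of what the `dict2xml` call produces for the two root names; the helper below is illustrative only, not part of the repository:

``` python
from dict2xml import dict2xml  # same library the integration helpers use


def render_config(settings: dict, config_root_name: str = "yandex") -> str:
    # Mirrors ClickHouseInstance.dict_to_xml: the root tag follows config_root_name.
    return dict2xml(settings, wrap=config_root_name, indent="    ", newlines=True)


print(render_config({"max_connections": 2048}))
# Roughly:
# <yandex>
#     <max_connections>2048</max_connections>
# </yandex>

print(render_config({"max_connections": 2048}, config_root_name="clickhouse"))
# Roughly:
# <clickhouse>
#     <max_connections>2048</max_connections>
# </clickhouse>
```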
if self.copy_common_configs: - shutil.copy(p.join(HELPERS_DIR, '0_common_instance_config.xml'), self.config_d_dir) + write_embedded_config('0_common_instance_config.xml', self.config_d_dir) + + write_embedded_config('0_common_instance_users.xml', users_d_dir) - shutil.copy(p.join(HELPERS_DIR, '0_common_instance_users.xml'), users_d_dir) if len(self.custom_dictionaries_paths): - shutil.copy(p.join(HELPERS_DIR, '0_common_enable_dictionaries.xml'), self.config_d_dir) + write_embedded_config('0_common_enable_dictionaries.xml', self.config_d_dir) logging.debug("Generate and write macros file") macros = self.macros.copy() diff --git a/tests/integration/test_config_xml_main/configs/config.d/access_control.yaml b/tests/integration/test_config_xml_main/configs/config.d/access_control.yaml index d8ead517c86..a83e775e504 100644 --- a/tests/integration/test_config_xml_main/configs/config.d/access_control.yaml +++ b/tests/integration/test_config_xml_main/configs/config.d/access_control.yaml @@ -2,6 +2,6 @@ user_directories: users_xml: path: users.xml local_directory: - path: access/ + path: /var/lib/clickhouse/access/ "@replace": replace diff --git a/tests/integration/test_config_xml_main/configs/config.d/path.yaml b/tests/integration/test_config_xml_main/configs/config.d/path.yaml index 3e26e8906ee..7fd5b1a0478 100644 --- a/tests/integration/test_config_xml_main/configs/config.d/path.yaml +++ b/tests/integration/test_config_xml_main/configs/config.d/path.yaml @@ -1,18 +1,18 @@ path: - - ./ + - /var/lib/clickhouse - "@replace": replace tmp_path: - - ./tmp/ + - /var/lib/clickhouse/tmp/ - "@replace": replace user_files_path: - - ./user_files/ + - /var/lib/clickhouse/user_files/ - "@replace": replace format_schema_path: - - ./format_schemas/ + - /var/lib/clickhouse/format_schemas/ - "@replace": replace access_control_path: - - ./access/ + - /var/lib/clickhouse/access/ - "@replace": replace top_level_domains_path: - - ./top_level_domains/ + - /var/lib/clickhouse/top_level_domains/ - "@replace": replace diff --git a/tests/integration/test_config_xml_main/configs/config.xml b/tests/integration/test_config_xml_main/configs/config.xml index 3b5dab50ffe..a2c69c34e0a 100644 --- a/tests/integration/test_config_xml_main/configs/config.xml +++ b/tests/integration/test_config_xml_main/configs/config.xml @@ -1,5 +1,5 @@ - + trace /var/log/clickhouse-server/clickhouse-server.log @@ -64,10 +64,10 @@ 1000 - /var/lib/clickhouse/ + ./ - /var/lib/clickhouse/tmp/ - /var/lib/clickhouse/user_files/ + ./tmp/ + ./user_files/ @@ -274,4 +274,4 @@ false https://6f33034cfe684dd7a3ab9875e57b1c8d@o388870.ingest.sentry.io/5226277 - + diff --git a/tests/integration/test_config_xml_main/configs/embedded.xml b/tests/integration/test_config_xml_main/configs/embedded.xml index a66f57d1eb7..ba0df99dfe0 100644 --- a/tests/integration/test_config_xml_main/configs/embedded.xml +++ b/tests/integration/test_config_xml_main/configs/embedded.xml @@ -1,6 +1,6 @@ - + trace true @@ -37,4 +37,4 @@ - + diff --git a/tests/integration/test_config_xml_main/configs/users.xml b/tests/integration/test_config_xml_main/configs/users.xml index b473413bdfa..ac785c12577 100644 --- a/tests/integration/test_config_xml_main/configs/users.xml +++ b/tests/integration/test_config_xml_main/configs/users.xml @@ -1,5 +1,5 @@ - + 10000000000 @@ -16,4 +16,4 @@ default - + diff --git a/tests/integration/test_config_xml_main/test.py b/tests/integration/test_config_xml_main/test.py index 052f9adb01f..11efb5e283c 100644 --- a/tests/integration/test_config_xml_main/test.py +++ 
b/tests/integration/test_config_xml_main/test.py @@ -32,7 +32,7 @@ def test_xml_main_conf(): all_userd = ['configs/users.d/allow_introspection_functions.yaml', 'configs/users.d/log_queries.yaml'] - node = cluster.add_instance('node', base_config_dir='configs', main_configs=all_confd, user_configs=all_userd, with_zookeeper=False) + node = cluster.add_instance('node', base_config_dir='configs', main_configs=all_confd, user_configs=all_userd, with_zookeeper=False, config_root_name='clickhouse') try: cluster.start() diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/access_control.yaml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/access_control.yaml index ce2e23839ef..90a0a0ac3fb 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/access_control.yaml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/access_control.yaml @@ -2,6 +2,6 @@ user_directories: users_xml: path: users.yaml local_directory: - path: access/ + path: /var/lib/clickhouse/access/ "@replace": replace diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/keeper_port.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/keeper_port.xml index b21df47bc85..cee4d338231 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/keeper_port.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/keeper_port.xml @@ -1,4 +1,4 @@ - + 9181 1 @@ -20,4 +20,4 @@ - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/logging_no_rotate.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/logging_no_rotate.xml index 2c34585437b..e541c39aff4 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/logging_no_rotate.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/logging_no_rotate.xml @@ -1,8 +1,8 @@ - + never - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/metric_log.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/metric_log.xml index 0ca9f162416..ea829d15975 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/metric_log.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/metric_log.xml @@ -1,8 +1,8 @@ - + system metric_log
7500 1000
-
+ diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/part_log.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/part_log.xml index 6c6fc9c6982..ce9847a49fb 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/part_log.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/part_log.xml @@ -1,8 +1,8 @@ - + system part_log
7500
-
+ diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/path.yaml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/path.yaml index 3e26e8906ee..7fd5b1a0478 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/path.yaml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/path.yaml @@ -1,18 +1,18 @@ path: - - ./ + - /var/lib/clickhouse - "@replace": replace tmp_path: - - ./tmp/ + - /var/lib/clickhouse/tmp/ - "@replace": replace user_files_path: - - ./user_files/ + - /var/lib/clickhouse/user_files/ - "@replace": replace format_schema_path: - - ./format_schemas/ + - /var/lib/clickhouse/format_schemas/ - "@replace": replace access_control_path: - - ./access/ + - /var/lib/clickhouse/access/ - "@replace": replace top_level_domains_path: - - ./top_level_domains/ + - /var/lib/clickhouse/top_level_domains/ - "@replace": replace diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/query_masking_rules.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/query_masking_rules.xml index 5a854848f3d..690d62b9a2b 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/query_masking_rules.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/query_masking_rules.xml @@ -1,10 +1,10 @@ - + TOPSECRET.TOPSECRET [hidden] - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/test_cluster_with_incorrect_pw.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/test_cluster_with_incorrect_pw.xml index 109e35afc37..a5033abf154 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/test_cluster_with_incorrect_pw.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/test_cluster_with_incorrect_pw.xml @@ -1,4 +1,4 @@ - + @@ -18,4 +18,4 @@ - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.d/zookeeper.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.d/zookeeper.xml index 06ed7fcd39f..4fa529a6180 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.d/zookeeper.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.d/zookeeper.xml @@ -1,8 +1,8 @@ - + localhost 9181 - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/config.xml b/tests/integration/test_config_xml_yaml_mix/configs/config.xml index e6a2b6d5324..660e8d7937d 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/config.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/config.xml @@ -1,5 +1,5 @@ - + trace /var/log/clickhouse-server/clickhouse-server.log @@ -64,10 +64,10 @@ 1000 - /var/lib/clickhouse/ + ./ - /var/lib/clickhouse/tmp/ - /var/lib/clickhouse/user_files/ + ./tmp/ + ./user_files/ @@ -274,4 +274,4 @@ false https://6f33034cfe684dd7a3ab9875e57b1c8d@o388870.ingest.sentry.io/5226277 - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/embedded.xml b/tests/integration/test_config_xml_yaml_mix/configs/embedded.xml index a66f57d1eb7..ba0df99dfe0 100644 --- a/tests/integration/test_config_xml_yaml_mix/configs/embedded.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/embedded.xml @@ -1,6 +1,6 @@ - + trace true @@ -37,4 +37,4 @@ - + diff --git a/tests/integration/test_config_xml_yaml_mix/configs/users.d/allow_introspection_functions.xml b/tests/integration/test_config_xml_yaml_mix/configs/users.d/allow_introspection_functions.xml index b94e95bc043..cfde1b4525d 100644 --- 
a/tests/integration/test_config_xml_yaml_mix/configs/users.d/allow_introspection_functions.xml +++ b/tests/integration/test_config_xml_yaml_mix/configs/users.d/allow_introspection_functions.xml @@ -1,8 +1,8 @@ - + 1 - + diff --git a/tests/integration/test_config_xml_yaml_mix/test.py b/tests/integration/test_config_xml_yaml_mix/test.py index 90ee8a2dea5..86cd68b3378 100644 --- a/tests/integration/test_config_xml_yaml_mix/test.py +++ b/tests/integration/test_config_xml_yaml_mix/test.py @@ -32,7 +32,7 @@ def test_extra_yaml_mix(): 'configs/users.d/log_queries.yaml'] node = cluster.add_instance('node', base_config_dir='configs', main_configs=all_confd, user_configs=all_userd, with_zookeeper=False, - users_config_name="users.yaml", copy_common_configs=False) + users_config_name="users.yaml", copy_common_configs=False, config_root_name="clickhouse") try: cluster.start() diff --git a/tests/integration/test_config_yaml_full/configs/config.d/access_control.yaml b/tests/integration/test_config_yaml_full/configs/config.d/access_control.yaml index ce2e23839ef..90a0a0ac3fb 100644 --- a/tests/integration/test_config_yaml_full/configs/config.d/access_control.yaml +++ b/tests/integration/test_config_yaml_full/configs/config.d/access_control.yaml @@ -2,6 +2,6 @@ user_directories: users_xml: path: users.yaml local_directory: - path: access/ + path: /var/lib/clickhouse/access/ "@replace": replace diff --git a/tests/integration/test_config_yaml_full/configs/config.d/path.yaml b/tests/integration/test_config_yaml_full/configs/config.d/path.yaml index 3e26e8906ee..7fd5b1a0478 100644 --- a/tests/integration/test_config_yaml_full/configs/config.d/path.yaml +++ b/tests/integration/test_config_yaml_full/configs/config.d/path.yaml @@ -1,18 +1,18 @@ path: - - ./ + - /var/lib/clickhouse - "@replace": replace tmp_path: - - ./tmp/ + - /var/lib/clickhouse/tmp/ - "@replace": replace user_files_path: - - ./user_files/ + - /var/lib/clickhouse/user_files/ - "@replace": replace format_schema_path: - - ./format_schemas/ + - /var/lib/clickhouse/format_schemas/ - "@replace": replace access_control_path: - - ./access/ + - /var/lib/clickhouse/access/ - "@replace": replace top_level_domains_path: - - ./top_level_domains/ + - /var/lib/clickhouse/top_level_domains/ - "@replace": replace diff --git a/tests/integration/test_config_yaml_full/configs/config.yaml b/tests/integration/test_config_yaml_full/configs/config.yaml index 619a3735269..5958d463d21 100644 --- a/tests/integration/test_config_yaml_full/configs/config.yaml +++ b/tests/integration/test_config_yaml_full/configs/config.yaml @@ -48,9 +48,9 @@ total_memory_tracker_sample_probability: 0 uncompressed_cache_size: 8589934592 mark_cache_size: 5368709120 mmap_cache_size: 1000 -path: /var/lib/clickhouse/ -tmp_path: /var/lib/clickhouse/tmp/ -user_files_path: /var/lib/clickhouse/user_files/ +path: ./ +tmp_path: ./tmp +user_files_path: ./user_files/ ldap_servers: '' user_directories: users_xml: diff --git a/tests/integration/test_config_yaml_full/test.py b/tests/integration/test_config_yaml_full/test.py index bc4fa40384c..e8bf21754e0 100644 --- a/tests/integration/test_config_yaml_full/test.py +++ b/tests/integration/test_config_yaml_full/test.py @@ -31,7 +31,7 @@ def test_yaml_full_conf(): 'configs/users.d/log_queries.yaml'] node = cluster.add_instance('node', base_config_dir='configs', main_configs=all_confd, user_configs=all_userd, - with_zookeeper=False, main_config_name="config.yaml", users_config_name="users.yaml", copy_common_configs=False) + with_zookeeper=False, 
main_config_name="config.yaml", users_config_name="users.yaml", copy_common_configs=False, config_root_name="clickhouse") try: cluster.start() diff --git a/tests/integration/test_config_yaml_main/configs/config.d/access_control.xml b/tests/integration/test_config_yaml_main/configs/config.d/access_control.xml index b61f89bd904..7b2815b5ce9 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/access_control.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/access_control.xml @@ -1,4 +1,4 @@ - + @@ -7,7 +7,7 @@ - access/ + /var/lib/clickhouse/access/ - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/keeper_port.xml b/tests/integration/test_config_yaml_main/configs/config.d/keeper_port.xml index b21df47bc85..cee4d338231 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/keeper_port.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/keeper_port.xml @@ -1,4 +1,4 @@ - + 9181 1 @@ -20,4 +20,4 @@ - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/log_to_console.xml b/tests/integration/test_config_yaml_main/configs/config.d/log_to_console.xml index 227c53647f3..70d2f014380 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/log_to_console.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/log_to_console.xml @@ -1,7 +1,7 @@ - + true - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/logging_no_rotate.xml b/tests/integration/test_config_yaml_main/configs/config.d/logging_no_rotate.xml index 2c34585437b..e541c39aff4 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/logging_no_rotate.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/logging_no_rotate.xml @@ -1,8 +1,8 @@ - + never - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/macros.xml b/tests/integration/test_config_yaml_main/configs/config.d/macros.xml index 4902b12bc81..657082fe8ae 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/macros.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/macros.xml @@ -1,4 +1,4 @@ - + Hello, world! s1 @@ -6,4 +6,4 @@ /clickhouse/tables/{database}/{shard}/ table_{table} - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/metric_log.xml b/tests/integration/test_config_yaml_main/configs/config.d/metric_log.xml index 0ca9f162416..ea829d15975 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/metric_log.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/metric_log.xml @@ -1,8 +1,8 @@ - + system metric_log
7500 1000
-
+ diff --git a/tests/integration/test_config_yaml_main/configs/config.d/more_clusters.xml b/tests/integration/test_config_yaml_main/configs/config.d/more_clusters.xml index aecbf9e0ba7..ce88408876f 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/more_clusters.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/more_clusters.xml @@ -1,4 +1,4 @@ - + - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/part_log.xml b/tests/integration/test_config_yaml_main/configs/config.d/part_log.xml index 6c6fc9c6982..ce9847a49fb 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/part_log.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/part_log.xml @@ -1,8 +1,8 @@ - + system part_log
7500
-
+ diff --git a/tests/integration/test_config_yaml_main/configs/config.d/path.xml b/tests/integration/test_config_yaml_main/configs/config.d/path.xml index 466ed0d1663..25d8f6780d2 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/path.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/path.xml @@ -1,8 +1,8 @@ - - ./ - ./tmp/ - ./user_files/ - ./format_schemas/ - ./access/ - ./top_level_domains/ - + + /var/lib/clickhouse + /var/lib/clickhouse/tmp/ + /var/lib/clickhouse/user_files/ + /var/lib/clickhouse/format_schemas/ + /var/lib/clickhouse/access/ + /var/lib/clickhouse/top_level_domains/ + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/query_masking_rules.xml b/tests/integration/test_config_yaml_main/configs/config.d/query_masking_rules.xml index 5a854848f3d..690d62b9a2b 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/query_masking_rules.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/query_masking_rules.xml @@ -1,10 +1,10 @@ - + TOPSECRET.TOPSECRET [hidden] - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/tcp_with_proxy.xml b/tests/integration/test_config_yaml_main/configs/config.d/tcp_with_proxy.xml index 19046054c16..733eb2370ca 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/tcp_with_proxy.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/tcp_with_proxy.xml @@ -1,3 +1,3 @@ - + 9010 - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/test_cluster_with_incorrect_pw.xml b/tests/integration/test_config_yaml_main/configs/config.d/test_cluster_with_incorrect_pw.xml index 109e35afc37..a5033abf154 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/test_cluster_with_incorrect_pw.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/test_cluster_with_incorrect_pw.xml @@ -1,4 +1,4 @@ - + @@ -18,4 +18,4 @@ - + diff --git a/tests/integration/test_config_yaml_main/configs/config.d/text_log.xml b/tests/integration/test_config_yaml_main/configs/config.d/text_log.xml index 3699a23578c..dce4942d952 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/text_log.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/text_log.xml @@ -1,7 +1,7 @@ - + system text_log
7500
-
+ diff --git a/tests/integration/test_config_yaml_main/configs/config.d/zookeeper.xml b/tests/integration/test_config_yaml_main/configs/config.d/zookeeper.xml index 06ed7fcd39f..4fa529a6180 100644 --- a/tests/integration/test_config_yaml_main/configs/config.d/zookeeper.xml +++ b/tests/integration/test_config_yaml_main/configs/config.d/zookeeper.xml @@ -1,8 +1,8 @@ - + localhost 9181 - + diff --git a/tests/integration/test_config_yaml_main/configs/config.yaml b/tests/integration/test_config_yaml_main/configs/config.yaml index e5a36b1e49b..65cbb4b3e81 100644 --- a/tests/integration/test_config_yaml_main/configs/config.yaml +++ b/tests/integration/test_config_yaml_main/configs/config.yaml @@ -48,9 +48,9 @@ total_memory_tracker_sample_probability: 0 uncompressed_cache_size: 8589934592 mark_cache_size: 5368709120 mmap_cache_size: 1000 -path: /var/lib/clickhouse/ -tmp_path: /var/lib/clickhouse/tmp/ -user_files_path: /var/lib/clickhouse/user_files/ +path: ./ +tmp_path: ./tmp/ +user_files_path: ./user_files/ ldap_servers: '' user_directories: users_xml: diff --git a/tests/integration/test_config_yaml_main/configs/embedded.xml b/tests/integration/test_config_yaml_main/configs/embedded.xml index a66f57d1eb7..ba0df99dfe0 100644 --- a/tests/integration/test_config_yaml_main/configs/embedded.xml +++ b/tests/integration/test_config_yaml_main/configs/embedded.xml @@ -1,6 +1,6 @@ - + trace true @@ -37,4 +37,4 @@ - + diff --git a/tests/integration/test_config_yaml_main/configs/users.d/allow_introspection_functions.xml b/tests/integration/test_config_yaml_main/configs/users.d/allow_introspection_functions.xml index b94e95bc043..cfde1b4525d 100644 --- a/tests/integration/test_config_yaml_main/configs/users.d/allow_introspection_functions.xml +++ b/tests/integration/test_config_yaml_main/configs/users.d/allow_introspection_functions.xml @@ -1,8 +1,8 @@ - + 1 - + diff --git a/tests/integration/test_config_yaml_main/configs/users.d/log_queries.xml b/tests/integration/test_config_yaml_main/configs/users.d/log_queries.xml index 25261072ade..755c5478463 100644 --- a/tests/integration/test_config_yaml_main/configs/users.d/log_queries.xml +++ b/tests/integration/test_config_yaml_main/configs/users.d/log_queries.xml @@ -1,7 +1,7 @@ - + 1 - + diff --git a/tests/integration/test_config_yaml_main/test.py b/tests/integration/test_config_yaml_main/test.py index f4de16c35a2..bb4c8eb8f9f 100644 --- a/tests/integration/test_config_yaml_main/test.py +++ b/tests/integration/test_config_yaml_main/test.py @@ -32,7 +32,8 @@ def test_yaml_main_conf(): 'configs/users.d/log_queries.xml'] node = cluster.add_instance('node', base_config_dir='configs', main_configs=all_confd, user_configs=all_userd, - with_zookeeper=False, main_config_name="config.yaml", users_config_name="users.yaml", copy_common_configs=False) + with_zookeeper=False, main_config_name="config.yaml", users_config_name="users.yaml", + copy_common_configs=False, config_root_name="clickhouse") try: cluster.start() diff --git a/tests/integration/test_dictionaries_dependency_xml/test.py b/tests/integration/test_dictionaries_dependency_xml/test.py index cfd7d58d574..849fdf57980 100644 --- a/tests/integration/test_dictionaries_dependency_xml/test.py +++ b/tests/integration/test_dictionaries_dependency_xml/test.py @@ -6,7 +6,7 @@ DICTIONARY_FILES = ['configs/dictionaries/dep_x.xml', 'configs/dictionaries/dep_ 'configs/dictionaries/dep_z.xml'] cluster = ClickHouseCluster(__file__) -instance = cluster.add_instance('instance', dictionaries=DICTIONARY_FILES) +instance = 
cluster.add_instance('instance', dictionaries=DICTIONARY_FILES, stay_alive=True) @pytest.fixture(scope="module") @@ -73,3 +73,45 @@ def test_get_data(started_cluster): assert query("SELECT dictGetString('dep_x', 'a', toUInt64(4))") == "ether\n" assert query("SELECT dictGetString('dep_y', 'a', toUInt64(4))") == "ether\n" assert query("SELECT dictGetString('dep_z', 'a', toUInt64(4))") == "ether\n" + +def dependent_tables_assert(): + res = instance.query("select database || '.' || name from system.tables") + assert "system.join" in res + assert "default.src" in res + assert "dict.dep_y" in res + assert "lazy.log" in res + assert "test.d" in res + assert "default.join" in res + assert "a.t" in res + +def test_dependent_tables(started_cluster): + query = instance.query + query("create database lazy engine=Lazy(10)") + query("create database a") + query("create table lazy.src (n int, m int) engine=Log") + query("create dictionary a.d (n int default 0, m int default 42) primary key n " + "source(clickhouse(host 'localhost' port tcpPort() user 'default' table 'src' password '' db 'lazy'))" + "lifetime(min 1 max 10) layout(flat())") + query("create table system.join (n int, m int) engine=Join(any, left, n)") + query("insert into system.join values (1, 1)") + query("create table src (n int, m default joinGet('system.join', 'm', 1::int)," + "t default dictGetOrNull('a.d', 'm', toUInt64(3))," + "k default dictGet('a.d', 'm', toUInt64(4))) engine=MergeTree order by n") + query("create dictionary test.d (n int default 0, m int default 42) primary key n " + "source(clickhouse(host 'localhost' port tcpPort() user 'default' table 'src' password '' db 'default'))" + "lifetime(min 1 max 10) layout(flat())") + query("create table join (n int, m default dictGet('a.d', 'm', toUInt64(3))," + "k default dictGet('test.d', 'm', toUInt64(0))) engine=Join(any, left, n)") + query("create table lazy.log (n default dictGet(test.d, 'm', toUInt64(0))) engine=Log") + query("create table a.t (n default joinGet('system.join', 'm', 1::int)," + "m default dictGet('test.d', 'm', toUInt64(3))," + "k default joinGet(join, 'm', 1::int)) engine=MergeTree order by n") + + dependent_tables_assert() + instance.restart_clickhouse() + dependent_tables_assert() + query("drop database a") + query("drop database lazy") + query("drop table src") + query("drop table join") + query("drop table system.join") diff --git a/tests/integration/test_dictionaries_mysql/configs/named_collections.xml b/tests/integration/test_dictionaries_mysql/configs/named_collections.xml new file mode 100644 index 00000000000..0b591579247 --- /dev/null +++ b/tests/integration/test_dictionaries_mysql/configs/named_collections.xml @@ -0,0 +1,25 @@ + + + + root + clickhouse + mysql57 + 3306 + test + test_table
+
+ + postgres + mysecretpassword + postgres1 + + + root + clickhouse + mysql57 + 1111 + test + test_table
+
+
+
diff --git a/tests/integration/test_dictionaries_mysql/test.py b/tests/integration/test_dictionaries_mysql/test.py index fa3855d1e16..c1819923523 100644 --- a/tests/integration/test_dictionaries_mysql/test.py +++ b/tests/integration/test_dictionaries_mysql/test.py @@ -7,7 +7,7 @@ import time import logging DICTS = ['configs/dictionaries/mysql_dict1.xml', 'configs/dictionaries/mysql_dict2.xml'] -CONFIG_FILES = ['configs/remote_servers.xml'] +CONFIG_FILES = ['configs/remote_servers.xml', 'configs/named_collections.xml'] cluster = ClickHouseCluster(__file__) instance = cluster.add_instance('instance', main_configs=CONFIG_FILES, with_mysql=True, dictionaries=DICTS) @@ -157,6 +157,55 @@ def test_mysql_dictionaries_custom_query_partial_load_complex_key(started_cluste execute_mysql_query(mysql_connection, "DROP TABLE test.test_table_2;") +def test_predefined_connection_configuration(started_cluster): + mysql_connection = get_mysql_conn(started_cluster) + + execute_mysql_query(mysql_connection, "DROP TABLE IF EXISTS test.test_table") + execute_mysql_query(mysql_connection, "CREATE TABLE IF NOT EXISTS test.test_table (id Integer, value Integer);") + execute_mysql_query(mysql_connection, "INSERT INTO test.test_table VALUES (100, 200);") + + instance.query(''' + DROP DICTIONARY IF EXISTS dict; + CREATE DICTIONARY dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(MYSQL(NAME mysql1)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = instance.query("SELECT dictGetUInt32(dict, 'value', toUInt64(100))") + assert(int(result) == 200) + + instance.query(''' + DROP DICTIONARY dict; + CREATE DICTIONARY dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(MYSQL(NAME mysql2)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = instance.query_and_get_error("SELECT dictGetUInt32(dict, 'value', toUInt64(100))") + instance.query(''' + DROP DICTIONARY dict; + CREATE DICTIONARY dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(MYSQL(NAME unknown_collection)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = instance.query_and_get_error("SELECT dictGetUInt32(dict, 'value', toUInt64(100))") + + instance.query(''' + DROP DICTIONARY dict; + CREATE DICTIONARY dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(MYSQL(NAME mysql3 PORT 3306)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = instance.query("SELECT dictGetUInt32(dict, 'value', toUInt64(100))") + assert(int(result) == 200) + + def create_mysql_db(mysql_connection, name): with mysql_connection.cursor() as cursor: cursor.execute("DROP DATABASE IF EXISTS {}".format(name)) diff --git a/tests/integration/test_dictionaries_postgresql/configs/named_collections.xml b/tests/integration/test_dictionaries_postgresql/configs/named_collections.xml new file mode 100644 index 00000000000..676cab2087d --- /dev/null +++ b/tests/integration/test_dictionaries_postgresql/configs/named_collections.xml @@ -0,0 +1,34 @@ + + + + postgres + mysecretpassword + postgres1 + 5432 + clickhouse + test_table
+
+ + postgres + mysecretpassword + postgres1 + 5432 + clickhouse + test_table
+ test_schema +
+ + postgres + mysecretpassword + postgres1 + 1111 + postgres + test_table
+
+ + postgres + mysecretpassword + postgres1 + +
+
diff --git a/tests/integration/test_dictionaries_postgresql/test.py b/tests/integration/test_dictionaries_postgresql/test.py index 58a503bd571..8869e9112d1 100644 --- a/tests/integration/test_dictionaries_postgresql/test.py +++ b/tests/integration/test_dictionaries_postgresql/test.py @@ -8,7 +8,7 @@ from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT cluster = ClickHouseCluster(__file__) node1 = cluster.add_instance('node1', - main_configs=['configs/config.xml', 'configs/dictionaries/postgres_dict.xml'], + main_configs=['configs/config.xml', 'configs/dictionaries/postgres_dict.xml', 'configs/named_collections.xml'], with_postgres=True, with_postgres_cluster=True) postgres_dict_table_template = """ @@ -302,6 +302,73 @@ def test_postgres_schema(started_cluster): node1.query("DROP DICTIONARY IF EXISTS postgres_dict") +def test_predefined_connection_configuration(started_cluster): + conn = get_postgres_conn(ip=started_cluster.postgres_ip, port=started_cluster.postgres_port, database=True) + cursor = conn.cursor() + + cursor.execute('DROP TABLE IF EXISTS test_table') + cursor.execute('CREATE TABLE test_table (id integer, value integer)') + cursor.execute('INSERT INTO test_table SELECT i, i FROM generate_series(0, 99) as t(i)') + + node1.query(''' + CREATE DICTIONARY postgres_dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(POSTGRESQL(NAME postgres1)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = node1.query("SELECT dictGetUInt32(postgres_dict, 'value', toUInt64(99))") + assert(int(result.strip()) == 99) + + cursor.execute('DROP SCHEMA IF EXISTS test_schema CASCADE') + cursor.execute('CREATE SCHEMA test_schema') + cursor.execute('CREATE TABLE test_schema.test_table (id integer, value integer)') + cursor.execute('INSERT INTO test_schema.test_table SELECT i, 100 FROM generate_series(0, 99) as t(i)') + + node1.query(''' + DROP DICTIONARY postgres_dict; + CREATE DICTIONARY postgres_dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(POSTGRESQL(NAME postgres1 SCHEMA test_schema)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = node1.query("SELECT dictGetUInt32(postgres_dict, 'value', toUInt64(99))") + assert(int(result.strip()) == 100) + + node1.query(''' + DROP DICTIONARY postgres_dict; + CREATE DICTIONARY postgres_dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(POSTGRESQL(NAME postgres2)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = node1.query("SELECT dictGetUInt32(postgres_dict, 'value', toUInt64(99))") + assert(int(result.strip()) == 100) + + node1.query('DROP DICTIONARY postgres_dict') + node1.query(''' + CREATE DICTIONARY postgres_dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(POSTGRESQL(NAME postgres4)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = node1.query_and_get_error("SELECT dictGetUInt32(postgres_dict, 'value', toUInt64(99))") + + node1.query(''' + DROP DICTIONARY postgres_dict; + CREATE DICTIONARY postgres_dict (id UInt32, value UInt32) + PRIMARY KEY id + SOURCE(POSTGRESQL(NAME postgres1 PORT 5432)) + LIFETIME(MIN 1 MAX 2) + LAYOUT(HASHED()); + ''') + result = node1.query("SELECT dictGetUInt32(postgres_dict, 'value', toUInt64(99))") + assert(int(result.strip()) == 99) + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_max_suspicious_broken_parts/__init__.py b/tests/integration/test_max_suspicious_broken_parts/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git 
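The MySQL and PostgreSQL dictionary tests above exercise the new `NAME <collection>` syntax for dictionary sources: connection details are read from the named-collections section of the server config (the XML markup of the added `named_collections.xml` files is mangled in this rendering; only values such as hosts, ports and credentials survive), and any key written inline in `SOURCE(...)` overrides the stored value. That is why `SOURCE(MYSQL(NAME mysql3 PORT 3306))` succeeds although the `mysql3` collection intentionally stores port 1111, while collections with wrong or missing fields (`mysql2`, `postgres4`, or an unknown name) make `dictGet` fail. A condensed, annotated version of the MySQL case in the same integration-test style; it assumes the `instance` fixture and the `test.test_table` row `(100, 200)` created in the test above:

``` python
def check_named_collection_overrides(instance):
    # 'mysql1' stores a complete, correct connection description in the server config,
    # so the dictionary DDL does not need any host/port/credentials at all.
    instance.query('''
        DROP DICTIONARY IF EXISTS dict;
        CREATE DICTIONARY dict (id UInt32, value UInt32)
        PRIMARY KEY id
        SOURCE(MYSQL(NAME mysql1))
        LIFETIME(MIN 1 MAX 2)
        LAYOUT(HASHED());
    ''')
    assert int(instance.query("SELECT dictGetUInt32(dict, 'value', toUInt64(100))")) == 200

    # 'mysql3' deliberately stores the wrong port; an inline PORT overrides the stored value.
    instance.query('''
        DROP DICTIONARY dict;
        CREATE DICTIONARY dict (id UInt32, value UInt32)
        PRIMARY KEY id
        SOURCE(MYSQL(NAME mysql3 PORT 3306))
        LIFETIME(MIN 1 MAX 2)
        LAYOUT(HASHED());
    ''')
    assert int(instance.query("SELECT dictGetUInt32(dict, 'value', toUInt64(100))")) == 200
```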
a/tests/integration/test_max_suspicious_broken_parts/test.py b/tests/integration/test_max_suspicious_broken_parts/test.py new file mode 100644 index 00000000000..31f53fdbc3c --- /dev/null +++ b/tests/integration/test_max_suspicious_broken_parts/test.py @@ -0,0 +1,121 @@ +# pylint: disable=unused-argument +# pylint: disable=redefined-outer-name +# pylint: disable=line-too-long + +import pytest + +from helpers.client import QueryRuntimeException +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) +node = cluster.add_instance('node', stay_alive=True) + +@pytest.fixture(scope='module', autouse=True) +def start_cluster(): + try: + cluster.start() + yield cluster + finally: + cluster.shutdown() + +def break_part(table, part_name): + node.exec_in_container(['bash', '-c', f'rm /var/lib/clickhouse/data/default/{table}/{part_name}/columns.txt']) + +def remove_part(table, part_name): + node.exec_in_container(['bash', '-c', f'rm -r /var/lib/clickhouse/data/default/{table}/{part_name}']) + +def get_count(table): + return int(node.query(f'SELECT count() FROM {table}').strip()) + +def detach_table(table): + node.query(f'DETACH TABLE {table}') +def attach_table(table): + node.query(f'ATTACH TABLE {table}') + +def check_table(table): + rows = 900 + per_part_rows = 90 + + node.query(f'INSERT INTO {table} SELECT * FROM numbers(900)') + + assert get_count(table) == rows + + # break one part, and check that clickhouse will be alive + break_part(table, '0_1_1_0') + rows -= per_part_rows + detach_table(table) + attach_table(table) + assert get_count(table) == rows + + # break two parts, and check that clickhouse will not start + break_part(table, '1_2_2_0') + break_part(table, '2_3_3_0') + rows -= per_part_rows*2 + detach_table(table) + with pytest.raises(QueryRuntimeException): + attach_table(table) + + # now remove one part, and check + remove_part(table, '1_2_2_0') + attach_table(table) + assert get_count(table) == rows + + node.query(f'DROP TABLE {table}') + +def test_max_suspicious_broken_parts(): + node.query(""" + CREATE TABLE test_max_suspicious_broken_parts ( + key Int + ) + ENGINE=MergeTree + ORDER BY key + PARTITION BY key%10 + SETTINGS + max_suspicious_broken_parts = 1; + """) + check_table('test_max_suspicious_broken_parts') + +def test_max_suspicious_broken_parts_bytes(): + node.query(""" + CREATE TABLE test_max_suspicious_broken_parts_bytes ( + key Int + ) + ENGINE=MergeTree + ORDER BY key + PARTITION BY key%10 + SETTINGS + max_suspicious_broken_parts = 10, + /* one part takes ~751 byte, so we allow failure of one part with these limit */ + max_suspicious_broken_parts_bytes = 1000; + """) + check_table('test_max_suspicious_broken_parts_bytes') + +def test_max_suspicious_broken_parts__wide(): + node.query(""" + CREATE TABLE test_max_suspicious_broken_parts__wide ( + key Int + ) + ENGINE=MergeTree + ORDER BY key + PARTITION BY key%10 + SETTINGS + min_bytes_for_wide_part = 0, + max_suspicious_broken_parts = 1; + """) + check_table('test_max_suspicious_broken_parts__wide') + +def test_max_suspicious_broken_parts_bytes__wide(): + node.query(""" + CREATE TABLE test_max_suspicious_broken_parts_bytes__wide ( + key Int + ) + ENGINE=MergeTree + ORDER BY key + PARTITION BY key%10 + SETTINGS + min_bytes_for_wide_part = 0, + max_suspicious_broken_parts = 10, + /* one part takes ~750 byte, so we allow failure of one part with these limit */ + max_suspicious_broken_parts_bytes = 1000; + """) + check_table('test_max_suspicious_broken_parts_bytes__wide') diff --git 
a/tests/integration/test_mysql_database_engine/configs/named_collections.xml b/tests/integration/test_mysql_database_engine/configs/named_collections.xml new file mode 100644 index 00000000000..b5e0973d37a --- /dev/null +++ b/tests/integration/test_mysql_database_engine/configs/named_collections.xml @@ -0,0 +1,25 @@
+<clickhouse>
+    <named_collections>
+        <mysql1>
+            <user>root</user>
+            <password>clickhouse</password>
+            <host>mysql57</host>
+            <port>3306</port>
+            <database>test_database</database>
+            <table>test_table</table>
+        </mysql1>
+        <mysql2>
+            <user>postgres</user>
+            <password>mysecretpassword</password>
+            <host>postgres1</host>
+        </mysql2>
+        <mysql3>
+            <user>root</user>
+            <password>clickhouse</password>
+            <host>mysql57</host>
+            <port>1111</port>
+            <database>clickhouse</database>
+            <table>test_table</table>
+        </mysql3>
+    </named_collections>
+</clickhouse>
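A collection is referenced by name as the first argument of the MySQL database engine, and a single key can be overridden next to it. A rough sketch matching the test below (the second database name is illustrative):

    CREATE DATABASE test_database ENGINE = MySQL(mysql1);
    -- override one key from the collection at CREATE time
    CREATE DATABASE test_database_fixed_port ENGINE = MySQL(mysql1, port = 3306);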
diff --git a/tests/integration/test_mysql_database_engine/test.py b/tests/integration/test_mysql_database_engine/test.py index a093c2a0125..49206ab1abe 100644 --- a/tests/integration/test_mysql_database_engine/test.py +++ b/tests/integration/test_mysql_database_engine/test.py @@ -8,7 +8,7 @@ from helpers.client import QueryRuntimeException from helpers.cluster import ClickHouseCluster cluster = ClickHouseCluster(__file__) -clickhouse_node = cluster.add_instance('node1', main_configs=['configs/remote_servers.xml'], with_mysql=True) +clickhouse_node = cluster.add_instance('node1', main_configs=['configs/remote_servers.xml', 'configs/named_collections.xml'], with_mysql=True) @pytest.fixture(scope="module") @@ -404,3 +404,23 @@ def test_mysql_types(started_cluster, case_name, mysql_type, expected_ch_type, m execute_query(clickhouse_node, "SELECT value FROM mysql('mysql57:3306', '${mysql_db}', '${table_name}', 'root', 'clickhouse')", settings=clickhouse_query_settings) + + +def test_predefined_connection_configuration(started_cluster): + with contextlib.closing(MySQLNodeInstance('root', 'clickhouse', started_cluster.mysql_ip, started_cluster.mysql_port)) as mysql_node: + mysql_node.query("DROP DATABASE IF EXISTS test_database") + mysql_node.query("CREATE DATABASE test_database DEFAULT CHARACTER SET 'utf8'") + mysql_node.query('CREATE TABLE `test_database`.`test_table` ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;') + + clickhouse_node.query("DROP DATABASE IF EXISTS test_database") + clickhouse_node.query("CREATE DATABASE test_database ENGINE = MySQL(mysql1)") + clickhouse_node.query("INSERT INTO `test_database`.`test_table` select number from numbers(100)") + assert clickhouse_node.query("SELECT count() FROM `test_database`.`test_table`").rstrip() == '100' + + clickhouse_node.query("DROP DATABASE test_database") + clickhouse_node.query_and_get_error("CREATE DATABASE test_database ENGINE = MySQL(mysql2)") + clickhouse_node.query_and_get_error("CREATE DATABASE test_database ENGINE = MySQL(unknown_collection)") + clickhouse_node.query_and_get_error("CREATE DATABASE test_database ENGINE = MySQL(mysql1, 1)") + + clickhouse_node.query("CREATE DATABASE test_database ENGINE = MySQL(mysql1, port=3306)") + assert clickhouse_node.query("SELECT count() FROM `test_database`.`test_table`").rstrip() == '100' diff --git a/tests/integration/test_postgresql_database_engine/configs/named_collections.xml b/tests/integration/test_postgresql_database_engine/configs/named_collections.xml new file mode 100644 index 00000000000..f084ee373ec --- /dev/null +++ b/tests/integration/test_postgresql_database_engine/configs/named_collections.xml @@ -0,0 +1,23 @@ + + + + postgres + mysecretpassword + postgres1 + 5432 + postgres + + + postgres + mysecretpassword + postgres1 + + + postgres + mysecretpassword + postgres1 + 1111 + postgres + + + diff --git a/tests/integration/test_postgresql_database_engine/test.py b/tests/integration/test_postgresql_database_engine/test.py index 8768c4037a1..656f655cfb3 100644 --- a/tests/integration/test_postgresql_database_engine/test.py +++ b/tests/integration/test_postgresql_database_engine/test.py @@ -7,7 +7,7 @@ from helpers.test_tools import assert_eq_with_retry from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT cluster = ClickHouseCluster(__file__) -node1 = cluster.add_instance('node1', main_configs=[], with_postgres=True) +node1 = cluster.add_instance('node1', main_configs=["configs/named_collections.xml"], with_postgres=True) postgres_table_template = """ CREATE 
TABLE IF NOT EXISTS {} ( @@ -208,6 +208,36 @@ def test_postgresql_database_with_schema(started_cluster): node1.query("DROP DATABASE test_database") +def test_predefined_connection_configuration(started_cluster): + cursor = started_cluster.postgres_conn.cursor() + cursor.execute(f'DROP TABLE IF EXISTS test_table') + cursor.execute(f'CREATE TABLE test_table (a integer PRIMARY KEY, b integer)') + + node1.query("DROP DATABASE IF EXISTS postgres_database") + node1.query("CREATE DATABASE postgres_database ENGINE = PostgreSQL(postgres1)") + node1.query("INSERT INTO postgres_database.test_table SELECT number, number from numbers(100)") + assert (node1.query(f"SELECT count() FROM postgres_database.test_table").rstrip() == '100') + + cursor.execute('DROP SCHEMA IF EXISTS test_schema') + cursor.execute('CREATE SCHEMA test_schema') + cursor.execute('CREATE TABLE test_schema.test_table (a integer)') + + node1.query("DROP DATABASE IF EXISTS postgres_database") + node1.query("CREATE DATABASE postgres_database ENGINE = PostgreSQL(postgres1, schema='test_schema')") + node1.query("INSERT INTO postgres_database.test_table SELECT number from numbers(200)") + assert (node1.query(f"SELECT count() FROM postgres_database.test_table").rstrip() == '200') + + node1.query("DROP DATABASE IF EXISTS postgres_database") + node1.query_and_get_error("CREATE DATABASE postgres_database ENGINE = PostgreSQL(postgres1, 'test_schema')") + node1.query_and_get_error("CREATE DATABASE postgres_database ENGINE = PostgreSQL(postgres2)") + node1.query_and_get_error("CREATE DATABASE postgres_database ENGINE = PostgreSQL(unknown_collection)") + node1.query("CREATE DATABASE postgres_database ENGINE = PostgreSQL(postgres3, port=5432)") + assert (node1.query(f"SELECT count() FROM postgres_database.test_table").rstrip() == '100') + + node1.query("DROP DATABASE postgres_database") + cursor.execute(f'DROP TABLE test_table ') + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_postgresql_replica_database_engine/configs/log_conf.xml b/tests/integration/test_postgresql_replica_database_engine/configs/log_conf.xml index f9d15e572aa..6dc9fce900a 100644 --- a/tests/integration/test_postgresql_replica_database_engine/configs/log_conf.xml +++ b/tests/integration/test_postgresql_replica_database_engine/configs/log_conf.xml @@ -8,4 +8,22 @@ /var/log/clickhouse-server/stderr.log /var/log/clickhouse-server/stdout.log + + + postgres + mysecretpassword + postgres1 + 5432 + postgres_database + test_table
+        </postgres1>
+        <postgres2>
+            <user>postgres</user>
+            <password>mysecretpassword</password>
+            <host>postgres1</host>
+            <port>1111</port>
+            <database>postgres_database</database>
+            <table>test_table</table>
+        </postgres2>
+    </named_collections>
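The MaterializedPostgreSQL database engine consumes the postgres1 collection the same way; a minimal sketch mirroring the test below:

    CREATE DATABASE test_database ENGINE = MaterializedPostgreSQL(postgres1);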
diff --git a/tests/integration/test_postgresql_replica_database_engine/test.py b/tests/integration/test_postgresql_replica_database_engine/test.py index aa9be8e0244..d8bf29fa2dd 100644 --- a/tests/integration/test_postgresql_replica_database_engine/test.py +++ b/tests/integration/test_postgresql_replica_database_engine/test.py @@ -149,12 +149,10 @@ def check_tables_are_synchronized(table_name, order_by='key', postgres_database= def started_cluster(): try: cluster.start() - conn = get_postgres_conn(ip=cluster.postgres_ip, - port=cluster.postgres_port) + conn = get_postgres_conn(ip=cluster.postgres_ip, port=cluster.postgres_port) cursor = conn.cursor() create_postgres_db(cursor, 'postgres_database') - create_clickhouse_postgres_db(ip=cluster.postgres_ip, - port=cluster.postgres_port) + create_clickhouse_postgres_db(ip=cluster.postgres_ip, port=cluster.postgres_port) instance.query("DROP DATABASE IF EXISTS test_database") yield cluster @@ -1124,6 +1122,18 @@ def test_remove_table_from_replication(started_cluster): cursor.execute('drop table if exists postgresql_replica_{};'.format(i)) +def test_predefined_connection_configuration(started_cluster): + drop_materialized_db() + conn = get_postgres_conn(ip=started_cluster.postgres_ip, port=started_cluster.postgres_port, database=True) + cursor = conn.cursor() + cursor.execute(f'DROP TABLE IF EXISTS test_table') + cursor.execute(f'CREATE TABLE test_table (key integer PRIMARY KEY, value integer)') + + instance.query("CREATE DATABASE test_database ENGINE = MaterializedPostgreSQL(postgres1)") + check_tables_are_synchronized("test_table"); + drop_materialized_db() + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_redirect_url_storage/configs/named_collections.xml b/tests/integration/test_redirect_url_storage/configs/named_collections.xml new file mode 100644 index 00000000000..17cc701344a --- /dev/null +++ b/tests/integration/test_redirect_url_storage/configs/named_collections.xml @@ -0,0 +1,16 @@ + + + + +
+                <header>
+                    <name>Range</name>
+                    <value>bytes=0-1</value>
+                </header>
+                <header>
+                    <name>Access-Control-Request-Method</name>
+                    <value>PUT</value>
+                </header>
+            </headers>
+        </url1>
+    </named_collections>
+</clickhouse>
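The url1 collection supplies the extra HTTP headers, while the URL itself and the format are passed inline. A sketch of the table engine and table function forms used in the test below:

    CREATE TABLE WebHDFSStorageWithRedirect (id UInt32, name String, weight Float64)
    ENGINE = URL(url1, url='http://hdfs1:50070/webhdfs/v1/simple_storage?op=OPEN&namenoderpcaddress=hdfs1:9000&offset=0', format='TSV');

    SELECT * FROM url(url1, url='http://hdfs1:50070/webhdfs/v1/simple_storage?op=OPEN&namenoderpcaddress=hdfs1:9000&offset=0', format='TSV', structure='id UInt32, name String, weight Float64');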
diff --git a/tests/integration/test_redirect_url_storage/test.py b/tests/integration/test_redirect_url_storage/test.py index c99e5182c91..e2f1982f305 100644 --- a/tests/integration/test_redirect_url_storage/test.py +++ b/tests/integration/test_redirect_url_storage/test.py @@ -2,7 +2,7 @@ import pytest from helpers.cluster import ClickHouseCluster cluster = ClickHouseCluster(__file__) -node1 = cluster.add_instance('node1', with_zookeeper=False, with_hdfs=True) +node1 = cluster.add_instance('node1', main_configs=['configs/named_collections.xml'], with_zookeeper=False, with_hdfs=True) @pytest.fixture(scope="module") @@ -81,3 +81,17 @@ def test_url_with_redirect_allowed(started_cluster): node1.query( "create table WebHDFSStorageWithRedirect (id UInt32, name String, weight Float64) ENGINE = URL('http://hdfs1:50070/webhdfs/v1/simple_storage?op=OPEN&namenoderpcaddress=hdfs1:9000&offset=0', 'TSV')") assert node1.query("SET max_http_get_redirects=1; select * from WebHDFSStorageWithRedirect") == "1\tMark\t72.53\n" + + +def test_predefined_connection_configuration(started_cluster): + hdfs_api = started_cluster.hdfs_api + + hdfs_api.write_data("/simple_storage", "1\tMark\t72.53\n") + assert hdfs_api.read_data("/simple_storage") == "1\tMark\t72.53\n" + + node1.query("drop table if exists WebHDFSStorageWithRedirect") + node1.query( + "create table WebHDFSStorageWithRedirect (id UInt32, name String, weight Float64) ENGINE = URL(url1, url='http://hdfs1:50070/webhdfs/v1/simple_storage?op=OPEN&namenoderpcaddress=hdfs1:9000&offset=0', format='TSV')") + assert node1.query("SET max_http_get_redirects=1; select * from WebHDFSStorageWithRedirect") == "1\tMark\t72.53\n" + result = node1.query("SET max_http_get_redirects=1; select * from url(url1, url='http://hdfs1:50070/webhdfs/v1/simple_storage?op=OPEN&namenoderpcaddress=hdfs1:9000&offset=0', format='TSV', structure='id UInt32, name String, weight Float64')") + assert(result == "1\tMark\t72.53\n") diff --git a/tests/integration/test_rocksdb_options/configs/rocksdb.xml b/tests/integration/test_rocksdb_options/configs/rocksdb.xml new file mode 100644 index 00000000000..afb2d2776fc --- /dev/null +++ b/tests/integration/test_rocksdb_options/configs/rocksdb.xml @@ -0,0 +1,22 @@ + + + + + 8 + + + 2 + + + + test + + 10000 + + + 14 + +
+
+
+
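These settings are picked up when an EmbeddedRocksDB table is created: the top-level options apply to every such table, while the per-table block only matches a table named test (mirroring the removed YAML shown next). A minimal sketch, as exercised by the test below:

    CREATE TABLE test (key UInt64, value String)
    ENGINE = EmbeddedRocksDB
    PRIMARY KEY(key);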
diff --git a/tests/integration/test_rocksdb_options/configs/rocksdb.yml b/tests/integration/test_rocksdb_options/configs/rocksdb.yml deleted file mode 100644 index 363ead6f318..00000000000 --- a/tests/integration/test_rocksdb_options/configs/rocksdb.yml +++ /dev/null @@ -1,13 +0,0 @@ ---- -rocksdb: - options: - max_background_jobs: 8 - column_family_options: - num_levels: 2 - tables: - - table: - name: test - options: - max_open_files: 10000 - column_family_options: - max_bytes_for_level_base: 14 diff --git a/tests/integration/test_rocksdb_options/test.py b/tests/integration/test_rocksdb_options/test.py index 286528107b8..f7ed8071ca9 100644 --- a/tests/integration/test_rocksdb_options/test.py +++ b/tests/integration/test_rocksdb_options/test.py @@ -9,10 +9,10 @@ from helpers.cluster import ClickHouseCluster cluster = ClickHouseCluster(__file__) -node = cluster.add_instance('node', main_configs=['configs/rocksdb.yml'], stay_alive=True) +node = cluster.add_instance('node', main_configs=['configs/rocksdb.xml'], stay_alive=True) -@pytest.fixture(scope='module', autouse=True) +@pytest.fixture(scope='module') def start_cluster(): try: cluster.start() @@ -20,52 +20,52 @@ def start_cluster(): finally: cluster.shutdown() -def test_valid_options(): +def test_valid_options(start_cluster): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); DROP TABLE test; """) -def test_invalid_options(): - node.exec_in_container(['bash', '-c', "sed -i 's/max_background_jobs/no_such_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) +def test_invalid_options(start_cluster): + node.exec_in_container(['bash', '-c', "sed -i 's/max_background_jobs/no_such_option/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() with pytest.raises(QueryRuntimeException): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); """) - node.exec_in_container(['bash', '-c', "sed -i 's/no_such_option/max_background_jobs/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_option/max_background_jobs/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() -def test_table_valid_options(): +def test_table_valid_options(start_cluster): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); DROP TABLE test; """) -def test_table_invalid_options(): - node.exec_in_container(['bash', '-c', "sed -i 's/max_open_files/no_such_table_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) +def test_table_invalid_options(start_cluster): + node.exec_in_container(['bash', '-c', "sed -i 's/max_open_files/no_such_table_option/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() with pytest.raises(QueryRuntimeException): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); """) - node.exec_in_container(['bash', '-c', "sed -i 's/no_such_table_option/max_open_files/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_table_option/max_open_files/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() -def test_valid_column_family_options(): +def test_valid_column_family_options(start_cluster): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); DROP TABLE test; """) def test_invalid_column_family_options(): - 
node.exec_in_container(['bash', '-c', "sed -i 's/num_levels/no_such_column_family_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.exec_in_container(['bash', '-c', "sed -i 's/num_levels/no_such_column_family_option/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() with pytest.raises(QueryRuntimeException): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); """) - node.exec_in_container(['bash', '-c', "sed -i 's/no_such_column_family_option/num_levels/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_column_family_option/num_levels/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() def test_table_valid_column_family_options(): @@ -75,11 +75,11 @@ def test_table_valid_column_family_options(): """) def test_table_invalid_column_family_options(): - node.exec_in_container(['bash', '-c', "sed -i 's/max_bytes_for_level_base/no_such_table_column_family_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.exec_in_container(['bash', '-c', "sed -i 's/max_bytes_for_level_base/no_such_table_column_family_option/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() with pytest.raises(QueryRuntimeException): node.query(""" CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); """) - node.exec_in_container(['bash', '-c', "sed -i 's/no_such_table_column_family_option/max_bytes_for_level_base/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_table_column_family_option/max_bytes_for_level_base/g' /etc/clickhouse-server/config.d/rocksdb.xml"]) node.restart_clickhouse() diff --git a/tests/integration/test_storage_mongodb/configs/named_collections.xml b/tests/integration/test_storage_mongodb/configs/named_collections.xml new file mode 100644 index 00000000000..fcdcb85bd36 --- /dev/null +++ b/tests/integration/test_storage_mongodb/configs/named_collections.xml @@ -0,0 +1,12 @@ + + + + root + clickhouse + mongo1 + 27017 + test + simple_table
+        </mongo1>
+    </named_collections>
+</clickhouse>
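The mongo1 collection carries the host, port, credentials, database and collection name, so the table definition only needs the ClickHouse schema. A sketch matching the test below:

    CREATE TABLE simple_mongo_table (key UInt64, data String)
    ENGINE = MongoDB(mongo1);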
diff --git a/tests/integration/test_storage_mongodb/test.py b/tests/integration/test_storage_mongodb/test.py index aeb8419b734..1a5de353d7d 100644 --- a/tests/integration/test_storage_mongodb/test.py +++ b/tests/integration/test_storage_mongodb/test.py @@ -11,7 +11,7 @@ def started_cluster(request): try: cluster = ClickHouseCluster(__file__) node = cluster.add_instance('node', - main_configs=["configs_secure/config.d/ssl_conf.xml"], + main_configs=["configs_secure/config.d/ssl_conf.xml", "configs/named_collections.xml"], with_mongo=True, with_mongo_secure=request.param) cluster.start() @@ -124,3 +124,18 @@ def test_secure_connection(started_cluster): assert node.query("SELECT data from simple_mongo_table where key = 42") == hex(42 * 42) + '\n' node.query("DROP TABLE simple_mongo_table") simple_mongo_table.drop() + +@pytest.mark.parametrize('started_cluster', [False], indirect=['started_cluster']) +def test_predefined_connection_configuration(started_cluster): + mongo_connection = get_mongo_connection(started_cluster) + db = mongo_connection['test'] + db.add_user('root', 'clickhouse') + simple_mongo_table = db['simple_table'] + data = [] + for i in range(0, 100): + data.append({'key': i, 'data': hex(i * i)}) + simple_mongo_table.insert_many(data) + + node = started_cluster.instances['node'] + node.query("create table simple_mongo_table(key UInt64, data String) engine = MongoDB(mongo1)") + simple_mongo_table.drop() diff --git a/tests/integration/test_storage_mysql/configs/named_collections.xml b/tests/integration/test_storage_mysql/configs/named_collections.xml new file mode 100644 index 00000000000..45d28d521f7 --- /dev/null +++ b/tests/integration/test_storage_mysql/configs/named_collections.xml @@ -0,0 +1,25 @@ + + + + root + clickhouse + mysql57 + 3306 + clickhouse + test_table
+        </mysql1>
+        <mysql2>
+            <user>postgres</user>
+            <password>mysecretpassword</password>
+            <host>postgres1</host>
+        </mysql2>
+        <mysql3>
+            <user>root</user>
+            <password>clickhouse</password>
+            <host>mysql57</host>
+            <port>1111</port>
+            <database>clickhouse</database>
+            <table>test_table</table>
+        </mysql3>
+    </named_collections>
+</clickhouse>
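As with the database engine, the collection name is the first engine argument and engine-specific options may follow it. A short sketch based on the test below (the second table name is illustrative):

    CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32)
    ENGINE = MySQL(mysql1);

    CREATE TABLE test_table_replace (id UInt32, name String, age UInt32, money UInt32)
    ENGINE = MySQL(mysql1, replace_query = 1);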
diff --git a/tests/integration/test_storage_mysql/test.py b/tests/integration/test_storage_mysql/test.py index 39147916f0a..a81bd876073 100644 --- a/tests/integration/test_storage_mysql/test.py +++ b/tests/integration/test_storage_mysql/test.py @@ -10,7 +10,7 @@ from helpers.client import QueryRuntimeException cluster = ClickHouseCluster(__file__) -node1 = cluster.add_instance('node1', main_configs=['configs/remote_servers.xml'], with_mysql=True) +node1 = cluster.add_instance('node1', main_configs=['configs/remote_servers.xml', 'configs/named_collections.xml'], with_mysql=True) node2 = cluster.add_instance('node2', main_configs=['configs/remote_servers.xml'], with_mysql_cluster=True) node3 = cluster.add_instance('node3', main_configs=['configs/remote_servers.xml'], user_configs=['configs/users.xml'], with_mysql=True) @@ -246,7 +246,7 @@ def test_mysql_distributed(started_cluster): node2.query(''' CREATE TABLE test_replicas (id UInt32, name String, age UInt32, money UInt32) - ENGINE = MySQL(`mysql{2|3|4}:3306`, 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') + ENGINE = MySQL('mysql{2|3|4}:3306', 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') # Fill remote tables with different data to be able to check nodes = [node1, node2, node2, node2] @@ -255,13 +255,13 @@ def test_mysql_distributed(started_cluster): nodes[i-1].query(''' CREATE TABLE test_replica{} (id UInt32, name String, age UInt32, money UInt32) - ENGINE = MySQL(`mysql{}:3306`, 'clickhouse', 'test_replicas', 'root', 'clickhouse');'''.format(i, 57 if i==1 else i)) + ENGINE = MySQL('mysql{}:3306', 'clickhouse', 'test_replicas', 'root', 'clickhouse');'''.format(i, 57 if i==1 else i)) nodes[i-1].query("INSERT INTO test_replica{} (id, name) SELECT number, 'host{}' from numbers(10) ".format(i, i)) # test multiple ports parsing - result = node2.query('''SELECT DISTINCT(name) FROM mysql(`mysql{57|2|3}:3306`, 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') + result = node2.query('''SELECT DISTINCT(name) FROM mysql('mysql{57|2|3}:3306', 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') assert(result == 'host1\n' or result == 'host2\n' or result == 'host3\n') - result = node2.query('''SELECT DISTINCT(name) FROM mysql(`mysql57:3306|mysql2:3306|mysql3:3306`, 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') + result = node2.query('''SELECT DISTINCT(name) FROM mysql('mysql57:3306|mysql2:3306|mysql3:3306', 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') assert(result == 'host1\n' or result == 'host2\n' or result == 'host3\n') # check all replicas are traversed @@ -279,7 +279,7 @@ def test_mysql_distributed(started_cluster): node2.query(''' CREATE TABLE test_shards (id UInt32, name String, age UInt32, money UInt32) - ENGINE = ExternalDistributed('MySQL', `mysql{57|2}:3306,mysql{3|4}:3306`, 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') + ENGINE = ExternalDistributed('MySQL', 'mysql{57|2}:3306,mysql{3|4}:3306', 'clickhouse', 'test_replicas', 'root', 'clickhouse'); ''') # Check only one replica in each shard is used result = node2.query("SELECT DISTINCT(name) FROM test_shards ORDER BY name") @@ -322,7 +322,6 @@ CREATE TABLE {}(id UInt32, name String, age UInt32, money UInt32) ENGINE = MySQL conn.close() -# Check that limited connection_wait_timeout (via connection_pool_size=1) will throw. 
def test_settings_connection_wait_timeout(started_cluster): table_name = 'test_settings_connection_wait_timeout' node1.query(f'DROP TABLE IF EXISTS {table_name}') @@ -367,6 +366,59 @@ def test_settings_connection_wait_timeout(started_cluster): drop_mysql_table(conn, table_name) conn.close() + +def test_predefined_connection_configuration(started_cluster): + conn = get_mysql_conn(started_cluster, started_cluster.mysql_ip) + table_name = 'test_table' + drop_mysql_table(conn, table_name) + create_mysql_table(conn, table_name) + + node1.query(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(mysql1); + ''') + node1.query("INSERT INTO test_table (id, name, money) select number, toString(number), number from numbers(100)") + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + node1.query(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(mysql1, replace_query=1); + ''') + node1.query("INSERT INTO test_table (id, name, money) select number, toString(number), number from numbers(100)") + node1.query("INSERT INTO test_table (id, name, money) select number, toString(number), number from numbers(100)") + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + node1.query_and_get_error(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(mysql1, query=1); + ''') + node1.query_and_get_error(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(mysql1, replace_query=1, on_duplicate_clause='kek'); + ''') + node1.query_and_get_error(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(fff); + ''') + node1.query_and_get_error(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(mysql2); + ''') + + node1.query(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (id UInt32, name String, age UInt32, money UInt32) + ENGINE MySQL(mysql3, port=3306); + ''') + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + # Regression for (k, v) IN ((k, v)) def test_mysql_in(started_cluster): table_name = 'test_mysql_in' @@ -397,6 +449,7 @@ def test_mysql_in(started_cluster): drop_mysql_table(conn, table_name) conn.close() + if __name__ == '__main__': with contextmanager(started_cluster)() as cluster: for name, instance in list(cluster.instances.items()): diff --git a/tests/integration/test_storage_postgresql/configs/named_collections.xml b/tests/integration/test_storage_postgresql/configs/named_collections.xml new file mode 100644 index 00000000000..0fedfbe8207 --- /dev/null +++ b/tests/integration/test_storage_postgresql/configs/named_collections.xml @@ -0,0 +1,33 @@ + + + + postgres + mysecretpassword + postgres1 + 5432 + postgres + test_table
+        </postgres1>
+        <postgres2>
+            <user>postgres</user>
+            <password>mysecretpassword</password>
+            <host>postgres1</host>
+        </postgres2>
+        <postgres3>
+            <user>postgres</user>
+            <password>mysecretpassword</password>
+            <host>postgres1</host>
+            <port>1111</port>
+            <database>postgres</database>
+            <table>test_table</table>
+        </postgres3>
+        <postgres4>
+            <user>postgres</user>
+            <password>mysecretpassword</password>
+            <host>postgres1</host>
+            <port>5432</port>
+            <database>postgres</database>
+            <table>test_replicas</table>
+        </postgres4>
+    </named_collections>
+</clickhouse>
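A sketch of how these collections are used both as a table engine and through the postgresql table function, mirroring the test below (the on_conflict value comes from that test):

    CREATE TABLE test_table (a UInt32, b Int32)
    ENGINE = PostgreSQL(postgres1, on_conflict = 'ON CONFLICT DO NOTHING');

    SELECT count() FROM postgresql(postgres1);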
diff --git a/tests/integration/test_storage_postgresql/test.py b/tests/integration/test_storage_postgresql/test.py index bb0e284eac9..6f43036e64d 100644 --- a/tests/integration/test_storage_postgresql/test.py +++ b/tests/integration/test_storage_postgresql/test.py @@ -5,8 +5,8 @@ from multiprocessing.dummy import Pool from helpers.cluster import ClickHouseCluster cluster = ClickHouseCluster(__file__) -node1 = cluster.add_instance('node1', with_postgres=True) -node2 = cluster.add_instance('node2', with_postgres_cluster=True) +node1 = cluster.add_instance('node1', main_configs=['configs/named_collections.xml'], with_postgres=True) +node2 = cluster.add_instance('node2', main_configs=['configs/named_collections.xml'], with_postgres_cluster=True) @pytest.fixture(scope="module") @@ -18,7 +18,6 @@ def started_cluster(): finally: cluster.shutdown() - def test_postgres_select_insert(started_cluster): cursor = started_cluster.postgres_conn.cursor() table_name = 'test_many' @@ -243,9 +242,9 @@ def test_postgres_distributed(started_cluster): cursors[i].execute(f"""INSERT INTO test_replicas select i, 'host{i+1}' from generate_series(0, 99) as t(i);"""); # test multiple ports parsing - result = node2.query('''SELECT DISTINCT(name) FROM postgresql(`postgres{1|2|3}:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + result = node2.query('''SELECT DISTINCT(name) FROM postgresql('postgres{1|2|3}:5432', 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') assert(result == 'host1\n' or result == 'host2\n' or result == 'host3\n') - result = node2.query('''SELECT DISTINCT(name) FROM postgresql(`postgres2:5431|postgres3:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + result = node2.query('''SELECT DISTINCT(name) FROM postgresql('postgres2:5431|postgres3:5432', 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') assert(result == 'host3\n' or result == 'host2\n') # Create storage with with 3 replicas @@ -253,7 +252,7 @@ def test_postgres_distributed(started_cluster): node2.query(''' CREATE TABLE test_replicas (id UInt32, name String) - ENGINE = PostgreSQL(`postgres{2|3|4}:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + ENGINE = PostgreSQL('postgres{2|3|4}:5432', 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') # Check all replicas are traversed query = "SELECT name FROM (" @@ -269,12 +268,20 @@ def test_postgres_distributed(started_cluster): node2.query(''' CREATE TABLE test_shards (id UInt32, name String, age UInt32, money UInt32) - ENGINE = ExternalDistributed('PostgreSQL', `postgres{1|2}:5432,postgres{3|4}:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + ENGINE = ExternalDistributed('PostgreSQL', 'postgres{1|2}:5432,postgres{3|4}:5432', 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') # Check only one replica in each shard is used result = node2.query("SELECT DISTINCT(name) FROM test_shards ORDER BY name") assert(result == 'host1\nhost3\n') + node2.query(''' + CREATE TABLE test_shards2 + (id UInt32, name String, age UInt32, money UInt32) + ENGINE = ExternalDistributed('PostgreSQL', postgres4, description='postgres{1|2}:5432,postgres{3|4}:5432'); ''') + + result = node2.query("SELECT DISTINCT(name) FROM test_shards2 ORDER BY name") + assert(result == 'host1\nhost3\n') + # Check all replicas are traversed query = "SELECT name FROM (" for i in range (3): @@ -354,6 +361,69 @@ def test_postgres_on_conflict(started_cluster): cursor.execute(f'DROP 
TABLE {table} ') +def test_predefined_connection_configuration(started_cluster): + cursor = started_cluster.postgres_conn.cursor() + cursor.execute(f'DROP TABLE IF EXISTS test_table') + cursor.execute(f'CREATE TABLE test_table (a integer PRIMARY KEY, b integer)') + + node1.query(''' + DROP TABLE IF EXISTS test_table; + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(postgres1); + ''') + node1.query(f''' INSERT INTO test_table SELECT number, number from numbers(100)''') + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + node1.query(''' + DROP TABLE test_table; + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(postgres1, on_conflict='ON CONFLICT DO NOTHING'); + ''') + node1.query(f''' INSERT INTO test_table SELECT number, number from numbers(100)''') + node1.query(f''' INSERT INTO test_table SELECT number, number from numbers(100)''') + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + node1.query('DROP TABLE test_table;') + node1.query_and_get_error(''' + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(postgres1, 'ON CONFLICT DO NOTHING'); + ''') + node1.query_and_get_error(''' + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(postgres2); + ''') + node1.query_and_get_error(''' + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(unknown_collection); + ''') + + node1.query(''' + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(postgres1, port=5432, database='postgres', table='test_table'); + ''') + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + node1.query(''' + DROP TABLE test_table; + CREATE TABLE test_table (a UInt32, b Int32) + ENGINE PostgreSQL(postgres3, port=5432); + ''') + assert (node1.query(f"SELECT count() FROM test_table").rstrip() == '100') + + assert (node1.query(f"SELECT count() FROM postgresql(postgres1)").rstrip() == '100') + node1.query("INSERT INTO TABLE FUNCTION postgresql(postgres1, on_conflict='ON CONFLICT DO NOTHING') SELECT number, number from numbers(100)") + assert (node1.query(f"SELECT count() FROM postgresql(postgres1)").rstrip() == '100') + + cursor.execute('DROP SCHEMA IF EXISTS test_schema CASCADE') + cursor.execute('CREATE SCHEMA test_schema') + cursor.execute('CREATE TABLE test_schema.test_table (a integer)') + node1.query("INSERT INTO TABLE FUNCTION postgresql(postgres1, schema='test_schema', on_conflict='ON CONFLICT DO NOTHING') SELECT number from numbers(200)") + assert (node1.query(f"SELECT count() FROM postgresql(postgres1, schema='test_schema')").rstrip() == '200') + + cursor.execute('DROP SCHEMA test_schema CASCADE') + cursor.execute(f'DROP TABLE test_table ') + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_storage_s3/configs/named_collections.xml b/tests/integration/test_storage_s3/configs/named_collections.xml new file mode 100644 index 00000000000..5af6d84859a --- /dev/null +++ b/tests/integration/test_storage_s3/configs/named_collections.xml @@ -0,0 +1,9 @@ + + + + http://minio1:9001/root/test_table + minio + minio123 + + + diff --git a/tests/integration/test_storage_s3/test.py b/tests/integration/test_storage_s3/test.py index 95b823c8345..2f49b462d19 100644 --- a/tests/integration/test_storage_s3/test.py +++ b/tests/integration/test_storage_s3/test.py @@ -87,7 +87,7 @@ def started_cluster(): cluster = ClickHouseCluster(__file__) cluster.add_instance("restricted_dummy", 
main_configs=["configs/config_for_test_remote_host_filter.xml"], with_minio=True) - cluster.add_instance("dummy", with_minio=True, main_configs=["configs/defaultS3.xml"]) + cluster.add_instance("dummy", with_minio=True, main_configs=["configs/defaultS3.xml", "configs/named_collections.xml"]) cluster.add_instance("s3_max_redirects", with_minio=True, main_configs=["configs/defaultS3.xml"], user_configs=["configs/s3_max_redirects.xml"]) logging.info("Starting cluster...") @@ -735,3 +735,18 @@ def test_truncate_table(started_cluster): assert(len(list(minio.list_objects(started_cluster.minio_bucket, 'truncate/'))) == 0) assert instance.query("SELECT * FROM {}".format(name)) == "" + +def test_predefined_connection_configuration(started_cluster): + bucket = started_cluster.minio_bucket + instance = started_cluster.instances["dummy"] # type: ClickHouseInstance + name = "test_table" + + instance.query("drop table if exists {}".format(name)) + instance.query("CREATE TABLE {} (id UInt32) ENGINE = S3(s3_conf1, format='CSV')".format(name)) + + instance.query("INSERT INTO {} SELECT number FROM numbers(10)".format(name)) + result = instance.query("SELECT * FROM {}".format(name)) + assert result == instance.query("SELECT number FROM numbers(10)") + + result = instance.query("SELECT * FROM s3(s3_conf1, format='CSV', structure='id UInt32')") + assert result == instance.query("SELECT number FROM numbers(10)") diff --git a/tests/performance/fuse_sumcount.xml b/tests/performance/fuse_sumcount.xml index b2eb0e678e2..237edb1b970 100644 --- a/tests/performance/fuse_sumcount.xml +++ b/tests/performance/fuse_sumcount.xml @@ -6,7 +6,7 @@ Also test GROUP BY with and without keys, because they might have different optimizations. --> - 1 + 1 @@ -21,13 +21,13 @@ SELECT sum(number) FROM numbers(1000000000) FORMAT Null SELECT sum(number), count(number) FROM numbers(1000000000) FORMAT Null - SELECT sum(number), count(number) FROM numbers(1000000000) SETTINGS optimize_fuse_sum_count_avg = 0 FORMAT Null + SELECT sum(number), count(number) FROM numbers(1000000000) SETTINGS optimize_syntax_fuse_functions = 0 FORMAT Null SELECT sum(number), avg(number), count(number) FROM numbers(1000000000) FORMAT Null - SELECT sum(number), avg(number), count(number) FROM numbers(1000000000) SETTINGS optimize_fuse_sum_count_avg = 0 FORMAT Null + SELECT sum(number), avg(number), count(number) FROM numbers(1000000000) SETTINGS optimize_syntax_fuse_functions = 0 FORMAT Null SELECT sum(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 FORMAT Null SELECT sum(number), count(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 FORMAT Null - SELECT sum(number), count(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 SETTINGS optimize_fuse_sum_count_avg = 0 FORMAT Null + SELECT sum(number), count(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 SETTINGS optimize_syntax_fuse_functions = 0 FORMAT Null SELECT sum(number), avg(number), count(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 FORMAT Null - SELECT sum(number), avg(number), count(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 SETTINGS optimize_fuse_sum_count_avg = 0 FORMAT Null + SELECT sum(number), avg(number), count(number) FROM numbers(100000000) GROUP BY intHash32(number) % 1000 SETTINGS optimize_syntax_fuse_functions = 0 FORMAT Null diff --git a/tests/performance/sum.xml b/tests/performance/sum.xml index 075102998c6..57b879a360d 100644 --- a/tests/performance/sum.xml +++ 
b/tests/performance/sum.xml @@ -24,7 +24,7 @@ FROM numbers_mt(200000000) SETTINGS max_threads = 8 - SELECT sum(x) FROM nullfloat32 - SELECT sum(x::Nullable(Float64)) FROM nullfloat32 + SELECT sum(x) FROM nullfloat32 + SELECT sum(x::Nullable(Float64)) FROM nullfloat32 DROP TABLE IF EXISTS nullfloat32 diff --git a/tests/queries/0_stateless/01098_msgpack_format.reference b/tests/queries/0_stateless/01098_msgpack_format.reference index ad116a5ba91..384852f24a7 100644 --- a/tests/queries/0_stateless/01098_msgpack_format.reference +++ b/tests/queries/0_stateless/01098_msgpack_format.reference @@ -11,3 +11,25 @@ 2020-01-01 2020-01-02 2020-01-02 +{1:2,2:3} [{1:[1,2],2:[3,4]},{3:[5,6],4:[7,8]}] +{1:2,2:3} [{1:[1,2],2:[3,4]},{3:[5,6],4:[7,8]}] +42 42 42 ['42','42'] ['42','42'] +42 \N \N [NULL,'42',NULL] [NULL,'42',NULL] +42 42 42 ['42','42'] ['42','42'] +42 \N \N [NULL,'42',NULL] [NULL,'42',NULL] +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK +OK diff --git a/tests/queries/0_stateless/01098_msgpack_format.sh b/tests/queries/0_stateless/01098_msgpack_format.sh index fe1622aca3e..aa982c5478d 100755 --- a/tests/queries/0_stateless/01098_msgpack_format.sh +++ b/tests/queries/0_stateless/01098_msgpack_format.sh @@ -1,10 +1,14 @@ #!/usr/bin/env bash -# Tags: no-fasttest +# Tags: no-fasttest, no-parallel CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh + +USER_FILES_PATH=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}') + + $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS msgpack"; $CLICKHOUSE_CLIENT --query="CREATE TABLE msgpack (uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, int8 Int8, int16 Int16, int32 Int32, int64 Int64, float Float32, double Float64, string String, date Date, datetime DateTime('Europe/Moscow'), datetime64 DateTime64(3, 'Europe/Moscow'), array Array(UInt32)) ENGINE = Memory"; @@ -62,3 +66,66 @@ $CLICKHOUSE_CLIENT --query="SELECT * FROM msgpack"; $CLICKHOUSE_CLIENT --query="DROP TABLE msgpack"; + +$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS msgpack_map"; + +$CLICKHOUSE_CLIENT --query="CREATE TABLE msgpack_map (m Map(UInt64, UInt64), a Array(Map(UInt64, Array(UInt64)))) ENGINE=Memory()"; + +$CLICKHOUSE_CLIENT --query="INSERT INTO msgpack_map VALUES ({1 : 2, 2 : 3}, [{1 : [1, 2], 2 : [3, 4]}, {3 : [5, 6], 4 : [7, 8]}])"; + + +$CLICKHOUSE_CLIENT --query="SELECT * FROM msgpack_map FORMAT MsgPack" | $CLICKHOUSE_CLIENT --query="INSERT INTO msgpack_map FORMAT MsgPack"; + +$CLICKHOUSE_CLIENT --query="SELECT * FROM msgpack_map"; + +$CLICKHOUSE_CLIENT --query="DROP TABLE msgpack_map"; + + +$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS msgpack_lc_nullable"; + +$CLICKHOUSE_CLIENT --query="CREATE TABLE msgpack_lc_nullable (a LowCardinality(String), b Nullable(String), c LowCardinality(Nullable(String)), d Array(Nullable(String)), e Array(LowCardinality(Nullable(String)))) engine=Memory()"; + +$CLICKHOUSE_CLIENT --query="INSERT INTO msgpack_lc_nullable VALUES ('42', '42', '42', ['42', '42'], ['42', '42']), ('42', NULL, NULL, [NULL, '42', NULL], [NULL, '42', NULL])"; + + +$CLICKHOUSE_CLIENT --query="SELECT * FROM msgpack_lc_nullable FORMAT MsgPack" | $CLICKHOUSE_CLIENT --query="INSERT INTO msgpack_lc_nullable FORMAT MsgPack"; + +$CLICKHOUSE_CLIENT --query="SELECT * FROM msgpack_lc_nullable"; + +$CLICKHOUSE_CLIENT --query="DROP TABLE msgpack_lc_nullable"; + + +$CLICKHOUSE_CLIENT 
--query="SELECT toString(number) FROM numbers(10) FORMAT MsgPack" > $USER_FILES_PATH/data.msgpack + +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x UInt64')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Float32')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Array(UInt32)')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Map(UInt64, UInt64)')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; + + +$CLICKHOUSE_CLIENT --query="SELECT number FROM numbers(10) FORMAT MsgPack" > $USER_FILES_PATH/data.msgpack + +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Float32')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x String')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Array(UInt64)')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Map(UInt64, UInt64)')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; + + +$CLICKHOUSE_CLIENT --query="SELECT [number, number + 1] FROM numbers(10) FORMAT MsgPack" > $USER_FILES_PATH/data.msgpack + +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Float32')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x String')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x UInt64')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Map(UInt64, UInt64)')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; + + +$CLICKHOUSE_CLIENT --query="SELECT map(number, number + 1) FROM numbers(10) FORMAT MsgPack" > $USER_FILES_PATH/data.msgpack + +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Float32')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x String')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x UInt64')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; +$CLICKHOUSE_CLIENT --query="SELECT * FROM file('data.msgpack', 'MsgPack', 'x Array(UInt64)')" 2>&1 | grep -F -q "ILLEGAL_COLUMN" && echo 'OK' || echo 'FAIL'; + + +rm $USER_FILES_PATH/data.msgpack + diff --git a/tests/queries/0_stateless/01160_table_dependencies.reference b/tests/queries/0_stateless/01160_table_dependencies.reference new file mode 100644 index 00000000000..39a58b06076 --- /dev/null +++ b/tests/queries/0_stateless/01160_table_dependencies.reference @@ -0,0 +1,6 @@ +dict1 +dict2 +dict_src +join +s +t diff --git a/tests/queries/0_stateless/01160_table_dependencies.sh b/tests/queries/0_stateless/01160_table_dependencies.sh new file mode 100755 index 00000000000..149439f2981 --- /dev/null +++ 
b/tests/queries/0_stateless/01160_table_dependencies.sh @@ -0,0 +1,45 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + + +$CLICKHOUSE_CLIENT -q "drop table if exists dict_src;" +$CLICKHOUSE_CLIENT -q "drop dictionary if exists dict1;" +$CLICKHOUSE_CLIENT -q "drop dictionary if exists dict2;" +$CLICKHOUSE_CLIENT -q "drop table if exists join;" +$CLICKHOUSE_CLIENT -q "drop table if exists t;" + +$CLICKHOUSE_CLIENT -q "create table dict_src (n int, m int, s String) engine=MergeTree order by n;" + +$CLICKHOUSE_CLIENT -q "create dictionary dict1 (n int default 0, m int default 1, s String default 'qqq') +PRIMARY KEY n +SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() USER 'default' TABLE 'dict_src' PASSWORD '' DB '$CLICKHOUSE_DATABASE')) +LIFETIME(MIN 1 MAX 10) LAYOUT(FLAT());" + +$CLICKHOUSE_CLIENT -q "create table join(n int, m int default dictGet('$CLICKHOUSE_DATABASE.dict1', 'm', 42::UInt64)) engine=Join(any, left, n);" + +$CLICKHOUSE_CLIENT -q "create dictionary dict2 (n int default 0, m int DEFAULT 2, s String default 'asd') +PRIMARY KEY n +SOURCE(CLICKHOUSE(HOST 'localhost' PORT tcpPort() USER 'default' TABLE 'join' PASSWORD '' DB '$CLICKHOUSE_DATABASE')) +LIFETIME(MIN 1 MAX 10) LAYOUT(FLAT());" + +$CLICKHOUSE_CLIENT -q "create table s (x default joinGet($CLICKHOUSE_DATABASE.join, 'm', 42::int)) engine=Set" + +$CLICKHOUSE_CLIENT -q "create table t (n int, m int default joinGet($CLICKHOUSE_DATABASE.join, 'm', 42::int), +s String default dictGet($CLICKHOUSE_DATABASE.dict1, 's', 42::UInt64), x default in(1, $CLICKHOUSE_DATABASE.s)) engine=MergeTree order by n;" + +CLICKHOUSE_CLIENT_DEFAULT_DB=$(echo ${CLICKHOUSE_CLIENT} | sed 's/'"--database=${CLICKHOUSE_DATABASE}"'/--database=default/g') + +for _ in {1..10}; do + $CLICKHOUSE_CLIENT_DEFAULT_DB -q "detach database $CLICKHOUSE_DATABASE;" + $CLICKHOUSE_CLIENT_DEFAULT_DB -q "attach database $CLICKHOUSE_DATABASE;" +done +$CLICKHOUSE_CLIENT -q "show tables from $CLICKHOUSE_DATABASE;" + +$CLICKHOUSE_CLIENT -q "drop table dict_src;" +$CLICKHOUSE_CLIENT -q "drop dictionary dict1;" +$CLICKHOUSE_CLIENT -q "drop dictionary dict2;" +$CLICKHOUSE_CLIENT -q "drop table join;" +$CLICKHOUSE_CLIENT -q "drop table t;" diff --git a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference index 11c8996c021..bedf9e9a091 100644 --- a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference +++ b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference @@ -26,3 +26,20 @@ localhost 9000 0 0 0 localhost 9000 57 Code: 57. Error: Table default.never_throw already exists. (TABLE_ALREADY_EXISTS) 0 0 localhost 9000 0 1 0 localhost 1 \N \N 1 0 +distributed_ddl_queue +2 localhost 9000 test_shard_localhost CREATE TABLE default.none ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.none ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 57 Code: 57. DB::Error: Table default.none already exists. 
(TABLE_ALREADY_EXISTS) 1 1 +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.none ON CLUSTER test_unavailable_shard 1 localhost 1 Inactive \N \N \N \N +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.none ON CLUSTER test_unavailable_shard 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.throw ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.throw ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 57 Code: 57. DB::Error: Table default.throw already exists. (TABLE_ALREADY_EXISTS) 1 1 +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.throw ON CLUSTER test_unavailable_shard 1 localhost 1 Inactive \N \N \N \N +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.throw ON CLUSTER test_unavailable_shard 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.null_status ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.null_status ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 57 Code: 57. DB::Error: Table default.null_status already exists. (TABLE_ALREADY_EXISTS) 1 1 +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.null_status ON CLUSTER test_unavailable_shard 1 localhost 1 Inactive \N \N \N \N +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.null_status ON CLUSTER test_unavailable_shard 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.never_throw ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 0 1 1 +2 localhost 9000 test_shard_localhost CREATE TABLE default.never_throw ON CLUSTER test_shard_localhost (`n` int) ENGINE = Memory 1 localhost 9000 Finished 57 Code: 57. DB::Error: Table default.never_throw already exists. (TABLE_ALREADY_EXISTS) 1 1 +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.never_throw ON CLUSTER test_unavailable_shard 1 localhost 1 Inactive \N \N \N \N +2 localhost 9000 test_unavailable_shard DROP TABLE IF EXISTS default.never_throw ON CLUSTER test_unavailable_shard 1 localhost 9000 Finished 0 1 1 diff --git a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh index ce848e6835b..e4a23055ae6 100755 --- a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh +++ b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh @@ -12,6 +12,9 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) TMP_OUT=$(mktemp "$CURDIR/01175_distributed_ddl_output_mode_long.XXXXXX") trap 'rm -f ${TMP_OUT:?}' EXIT +TIMEOUT=300 +MIN_TIMEOUT=1 + # We execute a distributed DDL query with timeout 1 to check that one host is unavailable and will time out and other complete successfully. # But sometimes one second is not enough even for healthy host to succeed. Repeat the test in this case. 
function run_until_out_contains() @@ -19,55 +22,72 @@ function run_until_out_contains() PATTERN=$1 shift - for _ in {1..20} + for ((i=MIN_TIMEOUT; i<10; i++)) do - "$@" > "$TMP_OUT" 2>&1 + "$@" --distributed_ddl_task_timeout="$i" > "$TMP_OUT" 2>&1 if grep -q "$PATTERN" "$TMP_OUT" then - cat "$TMP_OUT" + cat "$TMP_OUT" | sed "s/distributed_ddl_task_timeout (=$i)/distributed_ddl_task_timeout (=$MIN_TIMEOUT)/g" break fi done } +RAND_COMMENT="01175_DDL_$RANDOM" +LOG_COMMENT="${CLICKHOUSE_LOG_COMMENT}_$RAND_COMMENT" + +CLICKHOUSE_CLIENT_WITH_SETTINGS=${CLICKHOUSE_CLIENT/--log_comment=\'${CLICKHOUSE_LOG_COMMENT}\'/--log_comment=\'${LOG_COMMENT}\'} +CLICKHOUSE_CLIENT_WITH_SETTINGS+=" --output_format_parallel_formatting=0 " +CLICKHOUSE_CLIENT_WITH_SETTINGS+=" --distributed_ddl_entry_format_version=2 " + +CLIENT=${CLICKHOUSE_CLIENT_WITH_SETTINGS} +CLIENT+=" --distributed_ddl_task_timeout=$TIMEOUT " $CLICKHOUSE_CLIENT -q "drop table if exists none;" $CLICKHOUSE_CLIENT -q "drop table if exists throw;" $CLICKHOUSE_CLIENT -q "drop table if exists null_status;" $CLICKHOUSE_CLIENT -q "drop table if exists never_throw;" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=none -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLIENT --distributed_ddl_output_mode=none -q "select value from system.settings where name='distributed_ddl_output_mode';" # Ok -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=none -q "create table none on cluster test_shard_localhost (n int) engine=Memory;" +$CLIENT --distributed_ddl_output_mode=none -q "create table none on cluster test_shard_localhost (n int) engine=Memory;" # Table exists -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=none -q "create table none on cluster test_shard_localhost (n int) engine=Memory;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" +$CLIENT --distributed_ddl_output_mode=none -q "create table none on cluster test_shard_localhost (n int) engine=Memory;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" # Timeout -run_until_out_contains 'There are 1 unfinished hosts' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=none -q "drop table if exists none on cluster test_unavailable_shard;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" +run_until_out_contains 'There are 1 unfinished hosts' $CLICKHOUSE_CLIENT_WITH_SETTINGS --distributed_ddl_output_mode=none -q "drop table if exists none on cluster test_unavailable_shard;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=throw -q "select value from system.settings where name='distributed_ddl_output_mode';" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=throw -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=throw -q "create table 
throw on cluster test_shard_localhost (n int) engine=Memory format Null;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" +$CLIENT --distributed_ddl_output_mode=throw -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLIENT --distributed_ddl_output_mode=throw -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" +$CLIENT --distributed_ddl_output_mode=throw -q "create table throw on cluster test_shard_localhost (n int) engine=Memory format Null;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" -run_until_out_contains 'There are 1 unfinished hosts' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=throw -q "drop table if exists throw on cluster test_unavailable_shard;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" +run_until_out_contains 'There are 1 unfinished hosts' $CLICKHOUSE_CLIENT_WITH_SETTINGS --distributed_ddl_output_mode=throw -q "drop table if exists throw on cluster test_unavailable_shard;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=null_status_on_timeout -q "select value from system.settings where name='distributed_ddl_output_mode';" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=null_status_on_timeout -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory;" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=null_status_on_timeout -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory format Null;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" +$CLIENT --distributed_ddl_output_mode=null_status_on_timeout -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLIENT --distributed_ddl_output_mode=null_status_on_timeout -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory;" +$CLIENT --distributed_ddl_output_mode=null_status_on_timeout -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory format Null;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" -run_until_out_contains '9000 0 ' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=null_status_on_timeout -q "drop table if exists null_status on cluster test_unavailable_shard;" +run_until_out_contains '9000 0 ' $CLICKHOUSE_CLIENT_WITH_SETTINGS --distributed_ddl_output_mode=null_status_on_timeout -q "drop table if exists null_status on cluster test_unavailable_shard;" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=never_throw -q "select value from system.settings where name='distributed_ddl_output_mode';" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=never_throw -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" -$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 
--distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=never_throw -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" +$CLIENT --distributed_ddl_output_mode=never_throw -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLIENT --distributed_ddl_output_mode=never_throw -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" +$CLIENT --distributed_ddl_output_mode=never_throw -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" -run_until_out_contains '9000 0 ' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=never_throw -q "drop table if exists never_throw on cluster test_unavailable_shard;" +run_until_out_contains '9000 0 ' $CLICKHOUSE_CLIENT_WITH_SETTINGS --distributed_ddl_output_mode=never_throw -q "drop table if exists never_throw on cluster test_unavailable_shard;" $CLICKHOUSE_CLIENT -q "drop table if exists none;" $CLICKHOUSE_CLIENT -q "drop table if exists throw;" $CLICKHOUSE_CLIENT -q "drop table if exists null_status;" $CLICKHOUSE_CLIENT -q "drop table if exists never_throw;" + +$CLICKHOUSE_CLIENT -q "select 'distributed_ddl_queue'" +$CLICKHOUSE_CLIENT -q "select entry_version, initiator_host, initiator_port, cluster, replaceRegexpOne(query, 'UUID \'[0-9a-f\-]{36}\' ', ''), abs(query_create_time - now()) < 600, + host, port, status, exception_code, replace(replaceRegexpOne(exception_text, ' \(version.*', ''), 'Exception', 'Error'), abs(query_finish_time - query_create_time - query_duration_ms/1000) <= 1 , query_duration_ms < 600000 + from system.distributed_ddl_queue + where arrayExists((key, val) -> key='log_comment' and val like '%$RAND_COMMENT%', mapKeys(settings), mapValues(settings)) + and arrayExists((key, val) -> key='distributed_ddl_task_timeout' and val in ('$TIMEOUT', '$MIN_TIMEOUT'), mapKeys(settings), mapValues(settings)) + order by entry, host, port, exception_code" diff --git a/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql b/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql index 4153dc632f3..55c9d3f63d3 100644 --- a/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql +++ b/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql @@ -1,4 +1,4 @@ -SELECT * FROM remote('127..2', 'a.'); -- { serverError 36 } +SELECT * FROM remote('127..2', 'a.'); -- { serverError 62 } -- Clear cache to avoid future errors in the logs SYSTEM DROP DNS CACHE diff --git a/tests/queries/0_stateless/01651_map_functions.sql b/tests/queries/0_stateless/01651_map_functions.sql index 997006ecdf3..bbaaf9bee84 100644 --- a/tests/queries/0_stateless/01651_map_functions.sql +++ b/tests/queries/0_stateless/01651_map_functions.sql @@ -8,14 +8,14 @@ select mapContains(a, 'name') from table_map; select mapContains(a, 'gender') from table_map; select mapContains(a, 'abc') from table_map; select mapContains(a, b) from table_map; -select mapContains(a, 10) from table_map; -- { serverError 43 } +select mapContains(a, 10) from table_map; -- { serverError 386 } select mapKeys(a) from table_map; drop table if exists table_map; CREATE TABLE table_map (a Map(UInt8, Int), b UInt8, c UInt32) engine = MergeTree order by tuple(); insert into table_map select map(number, number), number, number from numbers(1000, 3); 
select mapContains(a, b), mapContains(a, c), mapContains(a, 233) from table_map; -select mapContains(a, 'aaa') from table_map; -- { serverError 43 } +select mapContains(a, 'aaa') from table_map; -- { serverError 386 } select mapContains(b, 'aaa') from table_map; -- { serverError 43 } select mapKeys(a) from table_map; select mapValues(a) from table_map; diff --git a/tests/queries/0_stateless/01744_fuse_sum_count_aggregate.sql b/tests/queries/0_stateless/01744_fuse_sum_count_aggregate.sql index cad7b5803d4..4648889ca27 100644 --- a/tests/queries/0_stateless/01744_fuse_sum_count_aggregate.sql +++ b/tests/queries/0_stateless/01744_fuse_sum_count_aggregate.sql @@ -2,7 +2,7 @@ DROP TABLE IF EXISTS fuse_tbl; CREATE TABLE fuse_tbl(a Int8, b Int8) Engine = Log; INSERT INTO fuse_tbl SELECT number, number + 1 FROM numbers(1, 20); -SET optimize_fuse_sum_count_avg = 1; +SET optimize_syntax_fuse_functions = 1; SELECT sum(a), sum(b), count(b) from fuse_tbl; EXPLAIN SYNTAX SELECT sum(a), sum(b), count(b) from fuse_tbl; SELECT '---------NOT trigger fuse--------'; diff --git a/tests/queries/0_stateless/01956_fuse_quantile_optimization.reference b/tests/queries/0_stateless/01956_fuse_quantile_optimization.reference new file mode 100644 index 00000000000..defad422cad --- /dev/null +++ b/tests/queries/0_stateless/01956_fuse_quantile_optimization.reference @@ -0,0 +1,97 @@ +2016-06-15 23:00:00 2016-06-15 23:00:00 +2016-06-15 23:00:00 2016-06-15 23:00:00 +2016-06-15 23:00:00 2016-06-15 23:00:00 +2016-06-15 23:00:00 2016-06-15 23:00:00 2016-06-15 23:00:00 +30000 30000 30000 +30000 30000 30000 +2016-06-15 23:00:16 2016-06-15 23:00:16 2016-06-15 23:00:16 +2016-06-15 23:00:16 2016-06-15 23:00:16 2016-06-15 23:00:16 +2016-04-02 17:23:12 2016-04-02 17:23:12 2016-04-02 17:23:12 +---------After fuse result----------- +quantile: +SELECT + quantiles(0.2, 0.3)(d)[1], + quantiles(0.2, 0.3)(d)[2] +FROM datetime +2016-06-15 23:00:00 2016-06-15 23:00:00 +quantileDeterministic: +SELECT + quantilesDeterministic(0.2, 0.5)(d, 1)[1], + quantilesDeterministic(0.2, 0.5)(d, 1)[2] +FROM datetime +2016-06-15 23:00:00 2016-06-15 23:00:00 +quantileExact: +SELECT + quantilesExact(0.2, 0.5)(d)[1], + quantilesExact(0.2, 0.5)(d)[2] +FROM datetime +2016-06-15 23:00:00 2016-06-15 23:00:00 +quantileExactWeighted: +SELECT + quantilesExactWeighted(0.2, 0.4)(d, 1)[1], + quantilesExactWeighted(0.2, 0.4)(d, 1)[2], + quantileExactWeighted(0.3)(d, 2) +FROM datetime +2016-06-15 23:00:00 2016-06-15 23:00:00 2016-06-15 23:00:00 +quantileTiming: +SELECT + quantilesTiming(0.2, 0.3)(d)[1], + quantilesTiming(0.2, 0.3)(d)[2], + quantileTiming(0.2)(d + 1) +FROM datetime +30000 30000 30000 +quantileTimingWeighted: +SELECT + quantilesTimingWeighted(0.2, 0.3)(d, 1)[1], + quantilesTimingWeighted(0.2, 0.3)(d, 1)[2], + quantileTimingWeighted(0.2)(d, 2) +FROM datetime +30000 30000 30000 +quantileTDigest: +SELECT + quantilesTDigest(0.2, 0.3)(d)[1], + quantilesTDigest(0.2, 0.3)(d)[2], + quantileTDigest(0.2)(d + 1) +FROM datetime +2016-06-15 23:00:16 2016-06-15 23:00:16 2016-06-15 23:00:16 +quantileTDigestWeighted: +SELECT + quantilesTDigestWeighted(0.2, 0.3)(d, 1)[1], + quantilesTDigestWeighted(0.2, 0.3)(d, 1)[2], + quantileTDigestWeighted(0.4)(d, 2) +FROM datetime +2016-06-15 23:00:16 2016-06-15 23:00:16 2016-06-15 23:00:16 +quantileBFloat16: +SELECT + quantilesBFloat16(0.2, 0.3)(d)[1], + quantilesBFloat16(0.2, 0.3)(d)[2], + quantileBFloat16(0.4)(d + 1) +FROM datetime +2016-04-02 17:23:12 2016-04-02 17:23:12 2016-04-02 17:23:12 +quantileBFloat16Weighted: +SELECT + 
quantilesBFloat16Weighted(0.2, 0.3)(d, 1)[1], + quantilesBFloat16Weighted(0.2, 0.3)(d, 1)[2], + quantileBFloat16Weighted(0.2)(d, 2) +FROM datetime +2016-04-02 17:23:12 2016-04-02 17:23:12 2016-04-02 17:23:12 +SELECT + quantiles(0.2, 0.3, 0.2)(d)[1] AS k, + quantiles(0.2, 0.3, 0.2)(d)[2] +FROM datetime +ORDER BY quantiles(0.2, 0.3, 0.2)(d)[3] ASC +0 4 7.2 7.6 +1 5 8.2 8.6 +SELECT + b, + quantiles(0.5, 0.9, 0.95)(x)[1] AS a, + quantiles(0.5, 0.9, 0.95)(x)[2] AS y, + quantiles(0.5, 0.9, 0.95)(x)[3] +FROM +( + SELECT + number AS x, + number % 2 AS b + FROM numbers(10) +) +GROUP BY b diff --git a/tests/queries/0_stateless/01956_fuse_quantile_optimization.sql b/tests/queries/0_stateless/01956_fuse_quantile_optimization.sql new file mode 100644 index 00000000000..2a97c60882c --- /dev/null +++ b/tests/queries/0_stateless/01956_fuse_quantile_optimization.sql @@ -0,0 +1,73 @@ +DROP TABLE IF EXISTS datetime; +CREATE TABLE datetime (d DateTime('UTC')) ENGINE = Memory; +INSERT INTO datetime(d) VALUES(toDateTime('2016-06-15 23:00:00', 'UTC')) + +SET optimize_syntax_fuse_functions = true; + +SELECT quantile(0.2)(d), quantile(0.3)(d) FROM datetime; +SELECT quantileDeterministic(0.2)(d, 1), quantileDeterministic(0.5)(d, 1) FROM datetime; +SELECT quantileExact(0.2)(d), quantileExact(0.5)(d) FROM datetime; +SELECT quantileExactWeighted(0.2)(d, 1), quantileExactWeighted(0.4)(d, 1), quantileExactWeighted(0.3)(d, 2) FROM datetime; +SELECT quantileTiming(0.2)(d), quantileTiming(0.3)(d), quantileTiming(0.2)(d+1) FROM datetime; +SELECT quantileTimingWeighted(0.2)(d, 1), quantileTimingWeighted(0.3)(d, 1), quantileTimingWeighted(0.2)(d, 2) FROM datetime; +SELECT quantileTDigest(0.2)(d), quantileTDigest(0.3)(d), quantileTDigest(0.2)(d + 1) FROM datetime; +SELECT quantileTDigestWeighted(0.2)(d, 1), quantileTDigestWeighted(0.3)(d, 1), quantileTDigestWeighted(0.4)(d, 2) FROM datetime; +SELECT quantileBFloat16(0.2)(d), quantileBFloat16(0.3)(d), quantileBFloat16(0.4)(d + 1) FROM datetime; + + +SELECT '---------After fuse result-----------'; +SELECT 'quantile:'; +EXPLAIN SYNTAX SELECT quantile(0.2)(d), quantile(0.3)(d) FROM datetime; +SELECT quantile(0.2)(d), quantile(0.3)(d) FROM datetime; + +SELECT 'quantileDeterministic:'; +EXPLAIN SYNTAX SELECT quantileDeterministic(0.2)(d, 1), quantileDeterministic(0.5)(d, 1) FROM datetime; +SELECT quantileDeterministic(0.2)(d, 1), quantileDeterministic(0.5)(d, 1) FROM datetime; + +SELECT 'quantileExact:'; +EXPLAIN SYNTAX SELECT quantileExact(0.2)(d), quantileExact(0.5)(d) FROM datetime; +SELECT quantileExact(0.2)(d), quantileExact(0.5)(d) FROM datetime; + +SELECT 'quantileExactWeighted:'; +EXPLAIN SYNTAX SELECT quantileExactWeighted(0.2)(d, 1), quantileExactWeighted(0.4)(d, 1), quantileExactWeighted(0.3)(d, 2) FROM datetime; +SELECT quantileExactWeighted(0.2)(d, 1), quantileExactWeighted(0.4)(d, 1), quantileExactWeighted(0.3)(d, 2) FROM datetime; + +SELECT 'quantileTiming:'; +EXPLAIN SYNTAX SELECT quantileTiming(0.2)(d), quantileTiming(0.3)(d), quantileTiming(0.2)(d+1) FROM datetime; +SELECT quantileTiming(0.2)(d), quantileTiming(0.3)(d), quantileTiming(0.2)(d+1) FROM datetime; + +SELECT 'quantileTimingWeighted:'; +EXPLAIN SYNTAX SELECT quantileTimingWeighted(0.2)(d, 1), quantileTimingWeighted(0.3)(d, 1), quantileTimingWeighted(0.2)(d, 2) FROM datetime; +SELECT quantileTimingWeighted(0.2)(d, 1), quantileTimingWeighted(0.3)(d, 1), quantileTimingWeighted(0.2)(d, 2) FROM datetime; + +SELECT 'quantileTDigest:'; +EXPLAIN SYNTAX SELECT quantileTDigest(0.2)(d), quantileTDigest(0.3)(d), 
quantileTDigest(0.2)(d + 1) FROM datetime; +SELECT quantileTDigest(0.2)(d), quantileTDigest(0.3)(d), quantileTDigest(0.2)(d + 1) FROM datetime; + +SELECT 'quantileTDigestWeighted:'; +EXPLAIN SYNTAX SELECT quantileTDigestWeighted(0.2)(d, 1), quantileTDigestWeighted(0.3)(d, 1), quantileTDigestWeighted(0.4)(d, 2) FROM datetime; +SELECT quantileTDigestWeighted(0.2)(d, 1), quantileTDigestWeighted(0.3)(d, 1), quantileTDigestWeighted(0.4)(d, 2) FROM datetime; + +SELECT 'quantileBFloat16:'; +EXPLAIN SYNTAX SELECT quantileBFloat16(0.2)(d), quantileBFloat16(0.3)(d), quantileBFloat16(0.4)(d + 1) FROM datetime; +SELECT quantileBFloat16(0.2)(d), quantileBFloat16(0.3)(d), quantileBFloat16(0.4)(d + 1) FROM datetime; + +SELECT 'quantileBFloat16Weighted:'; +EXPLAIN SYNTAX SELECT quantileBFloat16Weighted(0.2)(d, 1), quantileBFloat16Weighted(0.3)(d, 1), quantileBFloat16Weighted(0.2)(d, 2) FROM datetime; +SELECT quantileBFloat16Weighted(0.2)(d, 1), quantileBFloat16Weighted(0.3)(d, 1), quantileBFloat16Weighted(0.2)(d, 2) FROM datetime; + +EXPLAIN SYNTAX SELECT quantile(0.2)(d) as k, quantile(0.3)(d) FROM datetime order by quantile(0.2)(d); + +SELECT b, quantile(0.5)(x) as a, quantile(0.9)(x) as y, quantile(0.95)(x) FROM (select number as x, number % 2 as b from numbers(10)) group by b; +EXPLAIN SYNTAX SELECT b, quantile(0.5)(x) as a, quantile(0.9)(x) as y, quantile(0.95)(x) FROM (select number as x, number % 2 as b from numbers(10)) group by b; + +-- fuzzer +SELECT quantileDeterministic(0.99)(1023) FROM datetime FORMAT Null; -- { serverError NUMBER_OF_ARGUMENTS_DOESNT_MATCH } +SELECT quantileTiming(0.5)(NULL, NULL, quantileTiming(-inf)(NULL), NULL) FROM datetime FORMAT Null; -- { serverError ILLEGAL_AGGREGATION } +SELECT quantileTDigest(NULL)(NULL, quantileTDigest(3.14)(NULL, d + NULL), 2.), NULL FORMAT Null; -- { serverError ILLEGAL_AGGREGATION } +SELECT quantile(1, 0.3)(d), quantile(0.3)(d) FROM datetime; -- { serverError NUMBER_OF_ARGUMENTS_DOESNT_MATCH } +SELECT quantile(quantileDeterministic('', '2.47')('0.02', '0.2', NULL), 0.9)(d), quantile(0.3)(d) FROM datetime; -- { serverError ILLEGAL_AGGREGATION } +SELECT quantileTimingWeighted([[[[['-214748364.8'], NULL]], [[[quantileTimingWeighted([[[[['-214748364.8'], NULL], '-922337203.6854775808'], [[['-214748364.7']]], NULL]])([NULL], NULL), '-214748364.7']]], NULL]])([NULL], NULL); -- { serverError ILLEGAL_AGGREGATION } +SELECT quantileTimingWeighted([quantileTimingWeighted(0.5)(1, 1)])(1, 1); -- { serverError ILLEGAL_AGGREGATION } + +DROP TABLE datetime; diff --git a/tests/queries/0_stateless/02015_async_inserts_1.sh b/tests/queries/0_stateless/02015_async_inserts_1.sh index 365d2e99b31..b4310f5101c 100755 --- a/tests/queries/0_stateless/02015_async_inserts_1.sh +++ b/tests/queries/0_stateless/02015_async_inserts_1.sh @@ -4,7 +4,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . 
"$CURDIR"/../shell_config.sh -url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=1" +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=1" ${CLICKHOUSE_CLIENT} -q "DROP TABLE IF EXISTS async_inserts" ${CLICKHOUSE_CLIENT} -q "CREATE TABLE async_inserts (id UInt32, s String) ENGINE = Memory" diff --git a/tests/queries/0_stateless/02015_async_inserts_2.sh b/tests/queries/0_stateless/02015_async_inserts_2.sh index 0eb11bb5219..90f5584d84e 100755 --- a/tests/queries/0_stateless/02015_async_inserts_2.sh +++ b/tests/queries/0_stateless/02015_async_inserts_2.sh @@ -4,7 +4,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=1" +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=1" ${CLICKHOUSE_CLIENT} -q "DROP TABLE IF EXISTS async_inserts" ${CLICKHOUSE_CLIENT} -q "CREATE TABLE async_inserts (id UInt32, s String) ENGINE = MergeTree ORDER BY id" diff --git a/tests/queries/0_stateless/02015_async_inserts_3.sh b/tests/queries/0_stateless/02015_async_inserts_3.sh index fe97354d3ac..9d85d81caac 100755 --- a/tests/queries/0_stateless/02015_async_inserts_3.sh +++ b/tests/queries/0_stateless/02015_async_inserts_3.sh @@ -4,7 +4,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=1" +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=1" ${CLICKHOUSE_CLIENT} -q "DROP TABLE IF EXISTS async_inserts" ${CLICKHOUSE_CLIENT} -q "CREATE TABLE async_inserts (id UInt32, v UInt32 DEFAULT id * id) ENGINE = Memory" diff --git a/tests/queries/0_stateless/02015_async_inserts_4.sh b/tests/queries/0_stateless/02015_async_inserts_4.sh index f8cc0aa0a48..65598923b96 100755 --- a/tests/queries/0_stateless/02015_async_inserts_4.sh +++ b/tests/queries/0_stateless/02015_async_inserts_4.sh @@ -4,7 +4,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=1" +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=1" ${CLICKHOUSE_CLIENT} -q "DROP USER IF EXISTS u_02015_allowed" ${CLICKHOUSE_CLIENT} -q "DROP USER IF EXISTS u_02015_denied" diff --git a/tests/queries/0_stateless/02015_async_inserts_5.sh b/tests/queries/0_stateless/02015_async_inserts_5.sh index e07e274d1d7..05ea876b101 100755 --- a/tests/queries/0_stateless/02015_async_inserts_5.sh +++ b/tests/queries/0_stateless/02015_async_inserts_5.sh @@ -4,7 +4,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . 
"$CURDIR"/../shell_config.sh -url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=1" +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=1" ${CLICKHOUSE_CLIENT} -q "DROP TABLE IF EXISTS async_inserts" ${CLICKHOUSE_CLIENT} -q "CREATE TABLE async_inserts (id UInt32, s String) ENGINE = MergeTree ORDER BY id SETTINGS parts_to_throw_insert = 1" diff --git a/tests/queries/0_stateless/02015_async_inserts_6.reference b/tests/queries/0_stateless/02015_async_inserts_6.reference new file mode 100644 index 00000000000..f3a80cd0cdf --- /dev/null +++ b/tests/queries/0_stateless/02015_async_inserts_6.reference @@ -0,0 +1,4 @@ +Code: 60 +Code: 73 +Code: 73 +Code: 16 diff --git a/tests/queries/0_stateless/02015_async_inserts_6.sh b/tests/queries/0_stateless/02015_async_inserts_6.sh new file mode 100755 index 00000000000..94091783081 --- /dev/null +++ b/tests/queries/0_stateless/02015_async_inserts_6.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=0" + +${CLICKHOUSE_CLIENT} -q "DROP TABLE IF EXISTS async_inserts" +${CLICKHOUSE_CLIENT} -q "CREATE TABLE async_inserts (id UInt32, s String) ENGINE = Memory" + +${CLICKHOUSE_CURL} -sS $url -d 'INSERT INTO async_inserts1 FORMAT JSONEachRow {"id": 1, "s": "a"}' \ + | grep -o "Code: 60" + +${CLICKHOUSE_CURL} -sS $url -d 'INSERT INTO async_inserts FORMAT BadFormat {"id": 1, "s": "a"}' \ + | grep -o "Code: 73" + +${CLICKHOUSE_CURL} -sS $url -d 'INSERT INTO async_inserts FORMAT Pretty {"id": 1, "s": "a"}' \ + | grep -o "Code: 73" + +${CLICKHOUSE_CURL} -sS $url -d 'INSERT INTO async_inserts (id, a) FORMAT JSONEachRow {"id": 1, "s": "a"}' \ + | grep -o "Code: 16" + +${CLICKHOUSE_CLIENT} -q "DROP TABLE async_inserts" diff --git a/tests/queries/0_stateless/02015_async_inserts_stress_long.sh b/tests/queries/0_stateless/02015_async_inserts_stress_long.sh index c11a1be8cef..f9a58818404 100755 --- a/tests/queries/0_stateless/02015_async_inserts_stress_long.sh +++ b/tests/queries/0_stateless/02015_async_inserts_stress_long.sh @@ -7,7 +7,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) function insert1() { - url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=0" + url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=0" while true; do ${CLICKHOUSE_CURL} -sS "$url" -d 'INSERT INTO async_inserts FORMAT CSV 1,"a" @@ -18,7 +18,7 @@ function insert1() function insert2() { - url="${CLICKHOUSE_URL}&async_insert_mode=1&wait_for_async_insert=0" + url="${CLICKHOUSE_URL}&async_insert=1&wait_for_async_insert=0" while true; do ${CLICKHOUSE_CURL} -sS "$url" -d 'INSERT INTO async_inserts FORMAT JSONEachRow {"id": 5, "s": "e"} {"id": 6, "s": "f"}' done diff --git a/tests/queries/0_stateless/02029_output_csv_null_representation.reference b/tests/queries/0_stateless/02029_output_csv_null_representation.reference new file mode 100644 index 00000000000..a5174f4424f --- /dev/null +++ b/tests/queries/0_stateless/02029_output_csv_null_representation.reference @@ -0,0 +1,4 @@ +# output_format_csv_null_representation should initially be \\N +"val1",\N,"val3" +# Changing output_format_csv_null_representation +"val1",∅,"val3" diff --git a/tests/queries/0_stateless/02029_output_csv_null_representation.sql b/tests/queries/0_stateless/02029_output_csv_null_representation.sql new file mode 100644 index 00000000000..772c6c89144 --- /dev/null +++ 
b/tests/queries/0_stateless/02029_output_csv_null_representation.sql @@ -0,0 +1,16 @@ +DROP TABLE IF EXISTS test_data; +CREATE TABLE test_data ( + col1 Nullable(String), + col2 Nullable(String), + col3 Nullable(String) +) ENGINE = Memory; + +INSERT INTO test_data VALUES ('val1', NULL, 'val3'); + +SELECT '# output_format_csv_null_representation should initially be \\N'; +SELECT * FROM test_data FORMAT CSV; + +SELECT '# Changing output_format_csv_null_representation'; +SET output_format_csv_null_representation = '∅'; +SELECT * FROM test_data FORMAT CSV; +SET output_format_csv_null_representation = '\\N'; diff --git a/tests/queries/0_stateless/2020_exponential_smoothing.reference b/tests/queries/0_stateless/2020_exponential_smoothing.reference new file mode 100644 index 00000000000..8ebf4c3c066 --- /dev/null +++ b/tests/queries/0_stateless/2020_exponential_smoothing.reference @@ -0,0 +1,130 @@ +1 0 0.5 +0 1 0.25 +0 2 0.125 +0 3 0.0625 +0 4 0.03125 +0 5 0.015625 +0 6 0.0078125 +0 7 0.00390625 +0 8 0.001953125 +0 9 0.0009765625 +1 0 0.067 +0 1 0.062 +0 2 0.058 +0 3 0.054 +0 4 0.051 +0 5 0.047 +0 6 0.044 +0 7 0.041 +0 8 0.038 +0 9 0.036 +0 0 0 +1 1 0.5 +2 2 1.25 +3 3 2.125 +4 4 3.0625 +5 5 4.03125 +6 6 5.015625 +7 7 6.0078125 +8 8 7.00390625 +9 9 8.001953125 +1 0 0.067 ███▎ +0 1 0.062 ███ +0 2 0.058 ██▊ +0 3 0.054 ██▋ +0 4 0.051 ██▌ +0 5 0.047 ██▎ +0 6 0.044 ██▏ +0 7 0.041 ██ +0 8 0.038 █▊ +0 9 0.036 █▋ +0 10 0.033 █▋ +0 11 0.031 █▌ +0 12 0.029 █▍ +0 13 0.027 █▎ +0 14 0.025 █▎ +0 15 0.024 █▏ +0 16 0.022 █ +0 17 0.021 █ +0 18 0.019 ▊ +0 19 0.018 ▊ +0 20 0.017 ▋ +0 21 0.016 ▋ +0 22 0.015 ▋ +0 23 0.014 ▋ +0 24 0.013 ▋ +1 25 0.079 ███▊ +1 26 0.14 ███████ +1 27 0.198 █████████▊ +1 28 0.252 ████████████▌ +1 29 0.302 ███████████████ +1 30 0.349 █████████████████▍ +1 31 0.392 ███████████████████▌ +1 32 0.433 █████████████████████▋ +1 33 0.471 ███████████████████████▌ +1 34 0.506 █████████████████████████▎ +1 35 0.539 ██████████████████████████▊ +1 36 0.57 ████████████████████████████▌ +1 37 0.599 █████████████████████████████▊ +1 38 0.626 ███████████████████████████████▎ +1 39 0.651 ████████████████████████████████▌ +1 40 0.674 █████████████████████████████████▋ +1 41 0.696 ██████████████████████████████████▋ +1 42 0.716 ███████████████████████████████████▋ +1 43 0.735 ████████████████████████████████████▋ +1 44 0.753 █████████████████████████████████████▋ +1 45 0.77 ██████████████████████████████████████▍ +1 46 0.785 ███████████████████████████████████████▎ +1 47 0.8 ███████████████████████████████████████▊ +1 48 0.813 ████████████████████████████████████████▋ +1 49 0.825 █████████████████████████████████████████▎ +1 0 0.5 █████████████████████████ +0 1 0.25 ████████████▌ +0 2 0.125 ██████▎ +0 3 0.062 ███ +0 4 0.031 █▌ +1 5 0.516 █████████████████████████▋ +0 6 0.258 ████████████▊ +0 7 0.129 ██████▍ +0 8 0.064 ███▏ +0 9 0.032 █▌ +1 10 0.516 █████████████████████████▋ +0 11 0.258 ████████████▊ +0 12 0.129 ██████▍ +0 13 0.065 ███▏ +0 14 0.032 █▌ +1 15 0.516 █████████████████████████▋ +0 16 0.258 ████████████▊ +0 17 0.129 ██████▍ +0 18 0.065 ███▏ +0 19 0.032 █▌ +1 20 0.516 █████████████████████████▋ +0 21 0.258 ████████████▊ +0 22 0.129 ██████▍ +0 23 0.065 ███▏ +0 24 0.032 █▌ +1 25 0.516 █████████████████████████▋ +0 26 0.258 ████████████▊ +0 27 0.129 ██████▍ +0 28 0.065 ███▏ +0 29 0.032 █▌ +1 30 0.516 █████████████████████████▋ +0 31 0.258 ████████████▊ +0 32 0.129 ██████▍ +0 33 0.065 ███▏ +0 34 0.032 █▌ +1 35 0.516 █████████████████████████▋ +0 36 0.258 ████████████▊ +0 37 0.129 ██████▍ +0 38 0.065 ███▏ 
+0 39 0.032 █▌ +1 40 0.516 █████████████████████████▋ +0 41 0.258 ████████████▊ +0 42 0.129 ██████▍ +0 43 0.065 ███▏ +0 44 0.032 █▌ +1 45 0.516 █████████████████████████▋ +0 46 0.258 ████████████▊ +0 47 0.129 ██████▍ +0 48 0.065 ███▏ +0 49 0.032 █▌ diff --git a/tests/queries/0_stateless/2020_exponential_smoothing.sql b/tests/queries/0_stateless/2020_exponential_smoothing.sql new file mode 100644 index 00000000000..a210225453a --- /dev/null +++ b/tests/queries/0_stateless/2020_exponential_smoothing.sql @@ -0,0 +1,32 @@ +SELECT number = 0 AS value, number AS time, exponentialMovingAverage(1)(value, time) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_smooth FROM numbers(10); +SELECT value, time, round(exp_smooth, 3) FROM (SELECT number = 0 AS value, number AS time, exponentialMovingAverage(10)(value, time) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_smooth FROM numbers(10)); + +SELECT number AS value, number AS time, exponentialMovingAverage(1)(value, time) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_smooth FROM numbers(10); + +SELECT + value, + time, + round(exp_smooth, 3), + bar(exp_smooth, 0, 1, 50) AS bar +FROM +( + SELECT + (number = 0) OR (number >= 25) AS value, + number AS time, + exponentialMovingAverage(10)(value, time) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_smooth + FROM numbers(50) +); + +SELECT + value, + time, + round(exp_smooth, 3), + bar(exp_smooth, 0, 1, 50) AS bar +FROM +( + SELECT + (number % 5) = 0 AS value, + number AS time, + exponentialMovingAverage(1)(value, time) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS exp_smooth + FROM numbers(50) +); diff --git a/tests/queries/0_stateless/2021_exponential_sum.reference b/tests/queries/0_stateless/2021_exponential_sum.reference new file mode 100644 index 00000000000..5bd77479cf7 --- /dev/null +++ b/tests/queries/0_stateless/2021_exponential_sum.reference @@ -0,0 +1,8 @@ +0.0009765625 +0.0009775171065493646 +0.0009775171065493646 +0.0009775171065493646 +0.0009775171065493646 +0.0009775171065493646 +0.0009775171065493646 +0.0009775171065493646 diff --git a/tests/queries/0_stateless/2021_exponential_sum.sql b/tests/queries/0_stateless/2021_exponential_sum.sql new file mode 100644 index 00000000000..8ab7638029c --- /dev/null +++ b/tests/queries/0_stateless/2021_exponential_sum.sql @@ -0,0 +1,9 @@ +-- Check that it is deterministic +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(10); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(100); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(1000); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(10000); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(100000); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(1000000); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(10000000); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM numbers_mt(100000000); diff --git 
a/tests/queries/0_stateless/2021_exponential_sum_shard.reference b/tests/queries/0_stateless/2021_exponential_sum_shard.reference new file mode 100644 index 00000000000..8453706a05a --- /dev/null +++ b/tests/queries/0_stateless/2021_exponential_sum_shard.reference @@ -0,0 +1,5 @@ +0.009775171065493644 +0.009775171065493644 +0.009775171065493644 +0.009775171065493644 +0.009775171065493644 diff --git a/tests/queries/0_stateless/2021_exponential_sum_shard.sql b/tests/queries/0_stateless/2021_exponential_sum_shard.sql new file mode 100644 index 00000000000..49fde0fe217 --- /dev/null +++ b/tests/queries/0_stateless/2021_exponential_sum_shard.sql @@ -0,0 +1,6 @@ +-- Check that it is deterministic +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM remote('127.0.0.{1..10}', numbers_mt(1000)); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM remote('127.0.0.{1..10}', numbers_mt(10000)); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM remote('127.0.0.{1..10}', numbers_mt(100000)); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM remote('127.0.0.{1..10}', numbers_mt(1000000)); +WITH number % 10 = 0 AS value, number AS time SELECT exponentialMovingAverage(1)(value, time) AS exp_smooth FROM remote('127.0.0.{1..10}', numbers_mt(10000000)); diff --git a/tests/queries/0_stateless/2021_map_bloom_filter_index.reference b/tests/queries/0_stateless/2021_map_bloom_filter_index.reference new file mode 100644 index 00000000000..4f0a04073ae --- /dev/null +++ b/tests/queries/0_stateless/2021_map_bloom_filter_index.reference @@ -0,0 +1,46 @@ +Map bloom filter mapKeys +Equals with existing key +0 {'K0':'V0'} +Equals with non existing key +Equals with non existing key and default value +0 {'K0':'V0'} +1 {'K1':'V1'} +Not equals with existing key +1 {'K1':'V1'} +Not equals with non existing key +0 {'K0':'V0'} +1 {'K1':'V1'} +Not equals with non existing key and default value +IN with existing key +0 {'K0':'V0'} +IN with non existing key +IN with non existing key and default value +0 {'K0':'V0'} +1 {'K1':'V1'} +NOT IN with existing key +1 {'K1':'V1'} +NOT IN with non existing key +0 {'K0':'V0'} +1 {'K1':'V1'} +NOT IN with non existing key and default value +MapContains with existing key +0 {'K0':'V0'} +MapContains with non existing key +MapContains with non existing key and default value +Has with existing key +0 {'K0':'V0'} +Has with non existing key +Has with non existing key and default value +Map bloom filter mapValues +IN with existing key +0 {'K0':'V0'} +IN with non existing key +IN with non existing key and default value +0 {'K0':'V0'} +1 {'K1':'V1'} +NOT IN with existing key +1 {'K1':'V1'} +NOT IN with non existing key +0 {'K0':'V0'} +1 {'K1':'V1'} +NOT IN with non existing key and default value diff --git a/tests/queries/0_stateless/2021_map_bloom_filter_index.sql b/tests/queries/0_stateless/2021_map_bloom_filter_index.sql new file mode 100644 index 00000000000..6e0c4f4a360 --- /dev/null +++ b/tests/queries/0_stateless/2021_map_bloom_filter_index.sql @@ -0,0 +1,80 @@ +DROP TABLE IF EXISTS map_test_index_map_keys; +CREATE TABLE map_test_index_map_keys +( + row_id UInt32, + map Map(String, String), + INDEX map_bloom_filter_keys mapKeys(map) TYPE bloom_filter GRANULARITY 1 +) Engine=MergeTree() ORDER BY row_id; + +INSERT INTO map_test_index_map_keys VALUES (0, 
{'K0':'V0'}), (1, {'K1':'V1'}); + +SELECT 'Map bloom filter mapKeys'; + +SELECT 'Equals with existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K0'] = 'V0'; +SELECT 'Equals with non existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K2'] = 'V2'; +SELECT 'Equals with non existing key and default value'; +SELECT * FROM map_test_index_map_keys WHERE map['K3'] = ''; +SELECT 'Not equals with existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K0'] != 'V0'; +SELECT 'Not equals with non existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K2'] != 'V2'; +SELECT 'Not equals with non existing key and default value'; +SELECT * FROM map_test_index_map_keys WHERE map['K3'] != ''; + +SELECT 'IN with existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K0'] IN 'V0'; +SELECT 'IN with non existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K2'] IN 'V2'; +SELECT 'IN with non existing key and default value'; +SELECT * FROM map_test_index_map_keys WHERE map['K3'] IN ''; +SELECT 'NOT IN with existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K0'] NOT IN 'V0'; +SELECT 'NOT IN with non existing key'; +SELECT * FROM map_test_index_map_keys WHERE map['K2'] NOT IN 'V2'; +SELECT 'NOT IN with non existing key and default value'; +SELECT * FROM map_test_index_map_keys WHERE map['K3'] NOT IN ''; + +SELECT 'MapContains with existing key'; +SELECT * FROM map_test_index_map_keys WHERE mapContains(map, 'K0'); +SELECT 'MapContains with non existing key'; +SELECT * FROM map_test_index_map_keys WHERE mapContains(map, 'K2'); +SELECT 'MapContains with non existing key and default value'; +SELECT * FROM map_test_index_map_keys WHERE mapContains(map, ''); + +SELECT 'Has with existing key'; +SELECT * FROM map_test_index_map_keys WHERE has(map, 'K0'); +SELECT 'Has with non existing key'; +SELECT * FROM map_test_index_map_keys WHERE has(map, 'K2'); +SELECT 'Has with non existing key and default value'; +SELECT * FROM map_test_index_map_keys WHERE has(map, ''); + +DROP TABLE map_test_index_map_keys; + +DROP TABLE IF EXISTS map_test_index_map_values; +CREATE TABLE map_test_index_map_values +( + row_id UInt32, + map Map(String, String), + INDEX map_bloom_filter_values mapValues(map) TYPE bloom_filter GRANULARITY 1 +) Engine=MergeTree() ORDER BY row_id; + +INSERT INTO map_test_index_map_values VALUES (0, {'K0':'V0'}), (1, {'K1':'V1'}); + +SELECT 'Map bloom filter mapValues'; + +SELECT 'IN with existing key'; +SELECT * FROM map_test_index_map_values WHERE map['K0'] IN 'V0'; +SELECT 'IN with non existing key'; +SELECT * FROM map_test_index_map_values WHERE map['K2'] IN 'V2'; +SELECT 'IN with non existing key and default value'; +SELECT * FROM map_test_index_map_values WHERE map['K3'] IN ''; +SELECT 'NOT IN with existing key'; +SELECT * FROM map_test_index_map_values WHERE map['K0'] NOT IN 'V0'; +SELECT 'NOT IN with non existing key'; +SELECT * FROM map_test_index_map_values WHERE map['K2'] NOT IN 'V2'; +SELECT 'NOT IN with non existing key and default value'; +SELECT * FROM map_test_index_map_values WHERE map['K3'] NOT IN ''; + +DROP TABLE map_test_index_map_values; diff --git a/tests/queries/0_stateless/2021_map_has.reference b/tests/queries/0_stateless/2021_map_has.reference new file mode 100644 index 00000000000..2c10bd62504 --- /dev/null +++ b/tests/queries/0_stateless/2021_map_has.reference @@ -0,0 +1,6 @@ +Non constant map +1 +0 +Constant map +1 +0 diff --git a/tests/queries/0_stateless/2021_map_has.sql 
b/tests/queries/0_stateless/2021_map_has.sql new file mode 100644 index 00000000000..84099058273 --- /dev/null +++ b/tests/queries/0_stateless/2021_map_has.sql @@ -0,0 +1,14 @@ +DROP TABLE IF EXISTS test_map; +CREATE TABLE test_map (value Map(String, String)) ENGINE=TinyLog; + +SELECT 'Non constant map'; +INSERT INTO test_map VALUES ({'K0':'V0'}); +SELECT has(value, 'K0') FROM test_map; +SELECT has(value, 'K1') FROM test_map; + +SELECT 'Constant map'; + +SELECT has(map('K0', 'V0'), 'K0') FROM system.one; +SELECT has(map('K0', 'V0'), 'K1') FROM system.one; + +DROP TABLE test_map; diff --git a/tests/queries/0_stateless/2022_array_full_text_bloom_filter_index.reference b/tests/queries/0_stateless/2022_array_full_text_bloom_filter_index.reference new file mode 100644 index 00000000000..f61dedd9bd2 --- /dev/null +++ b/tests/queries/0_stateless/2022_array_full_text_bloom_filter_index.reference @@ -0,0 +1,8 @@ +1 ['K1 K1'] ['K1 K1'] +2 ['K2 K2'] ['K2 K2'] +1 ['K1 K1'] ['K1 K1'] +2 ['K2 K2'] ['K2 K2'] +1 ['K1 K1'] ['K1 K1'] +2 ['K2 K2'] ['K2 K2'] +1 ['K1 K1'] ['K1 K1'] +2 ['K2 K2'] ['K2 K2'] diff --git a/tests/queries/0_stateless/2022_array_full_text_bloom_filter_index.sql b/tests/queries/0_stateless/2022_array_full_text_bloom_filter_index.sql new file mode 100644 index 00000000000..6a2a00674cb --- /dev/null +++ b/tests/queries/0_stateless/2022_array_full_text_bloom_filter_index.sql @@ -0,0 +1,42 @@ +DROP TABLE IF EXISTS bf_tokenbf_array_test; +DROP TABLE IF EXISTS bf_ngram_array_test; + +CREATE TABLE bf_tokenbf_array_test +( + row_id UInt32, + array Array(String), + array_fixed Array(FixedString(5)), + INDEX array_bf_tokenbf array TYPE tokenbf_v1(256,2,0) GRANULARITY 1, + INDEX array_fixed_bf_tokenbf array_fixed TYPE tokenbf_v1(256,2,0) GRANULARITY 1 +) Engine=MergeTree() ORDER BY row_id SETTINGS index_granularity = 2; + +CREATE TABLE bf_ngram_array_test +( + row_id UInt32, + array Array(String), + array_fixed Array(FixedString(5)), + INDEX array_ngram array TYPE ngrambf_v1(4,256,2,0) GRANULARITY 1, + INDEX array_fixed_ngram array_fixed TYPE ngrambf_v1(4,256,2,0) GRANULARITY 1 +) Engine=MergeTree() ORDER BY row_id SETTINGS index_granularity = 2; + +INSERT INTO bf_tokenbf_array_test VALUES (1, ['K1 K1'], ['K1 K1']), (2, ['K2 K2'], ['K2 K2']); +INSERT INTO bf_ngram_array_test VALUES (1, ['K1 K1'], ['K1 K1']), (2, ['K2 K2'], ['K2 K2']); + +SELECT * FROM bf_tokenbf_array_test WHERE has(array, 'K1 K1'); +SELECT * FROM bf_tokenbf_array_test WHERE has(array, 'K2 K2'); +SELECT * FROM bf_tokenbf_array_test WHERE has(array, 'K3 K3'); + +SELECT * FROM bf_tokenbf_array_test WHERE has(array_fixed, 'K1 K1'); +SELECT * FROM bf_tokenbf_array_test WHERE has(array_fixed, 'K2 K2'); +SELECT * FROM bf_tokenbf_array_test WHERE has(array_fixed, 'K3 K3'); + +SELECT * FROM bf_ngram_array_test WHERE has(array, 'K1 K1'); +SELECT * FROM bf_ngram_array_test WHERE has(array, 'K2 K2'); +SELECT * FROM bf_ngram_array_test WHERE has(array, 'K3 K3'); + +SELECT * FROM bf_ngram_array_test WHERE has(array_fixed, 'K1 K1'); +SELECT * FROM bf_ngram_array_test WHERE has(array_fixed, 'K2 K2'); +SELECT * FROM bf_ngram_array_test WHERE has(array_fixed, 'K3 K3'); + +DROP TABLE bf_tokenbf_array_test; +DROP TABLE bf_ngram_array_test; diff --git a/utils/github/cherrypick.py b/utils/github/cherrypick.py index 89072b316b2..8bedf54fefa 100644 --- a/utils/github/cherrypick.py +++ b/utils/github/cherrypick.py @@ -114,7 +114,7 @@ class CherryPick: 'Merge it only if you intend to backport changes to the target branch, otherwise just close it.\n' ) - 
git_prefix = ['git', '-C', repo_path, '-c', 'user.email=robot-clickhouse@yandex-team.ru', '-c', 'user.name=robot-clickhouse'] + git_prefix = ['git', '-C', repo_path, '-c', 'user.email=robot-clickhouse@clickhouse.com', '-c', 'user.name=robot-clickhouse'] pr_title = 'Backport #{number} to {target}: {title}'.format( number=self._pr['number'], target=self.target_branch, diff --git a/utils/release/release_lib.sh b/utils/release/release_lib.sh index efa41ae22ad..9f6c2285d93 100644 --- a/utils/release/release_lib.sh +++ b/utils/release/release_lib.sh @@ -168,7 +168,7 @@ function gen_changelog { -e "s/[@]VERSION_STRING[@]/$VERSION_STRING/g" \ -e "s/[@]DATE[@]/$CHDATE/g" \ -e "s/[@]AUTHOR[@]/$AUTHOR/g" \ - -e "s/[@]EMAIL[@]/$(whoami)@yandex-team.ru/g" \ + -e "s/[@]EMAIL[@]/$(whoami)@clickhouse.com/g" \ < $CHLOG.in > $CHLOG } diff --git a/website/blog/en/2021/clickhouse-inc.md b/website/blog/en/2021/clickhouse-inc.md index 41e0bd4a772..164e6fc545d 100644 --- a/website/blog/en/2021/clickhouse-inc.md +++ b/website/blog/en/2021/clickhouse-inc.md @@ -6,7 +6,7 @@ author: '[Alexey Milovidov](https://github.com/alexey-milovidov)' tags: ['company', 'incorporation', 'yandex', 'community'] --- -Today I’m happy to announce **ClickHouse Inc.**, the new home of ClickHouse. The development team has moved from Yandex and joined ClickHouse Inc. to continue building the fastest (and the greatest) analytical database management system. The company has received nearly $50M in Series A funding led by Index Ventures and Benchmark with participation by Yandex N.V. and others. I created ClickHouse, Inc. with two co-founders, [Yury Izrailevsky](https://www.linkedin.com/in/yuryizrailevsky/) and [Aaron Katz](https://www.linkedin.com/in/aaron-k-5762094/). I will continue to lead the development of ClickHouse as Chief Technology Officer (CTO), Yury will run product and engineering, and Aaron will be CEO. +Today I’m happy to announce **ClickHouse Inc.**, the new home of ClickHouse. The development team has moved from Yandex and joined ClickHouse Inc. to continue building the fastest (and the greatest) analytical database management system. The company has received nearly $50M in Series A funding led by Index Ventures and Benchmark with participation by Yandex N.V. and others. I created ClickHouse, Inc. with two co-founders, [Yury Izrailevsky](https://www.linkedin.com/in/yuryizrailevsky/) and [Aaron Katz](https://www.linkedin.com/in/aaron-katz-5762094/). I will continue to lead the development of ClickHouse as Chief Technology Officer (CTO), Yury will run product and engineering, and Aaron will be CEO. ## History of ClickHouse @@ -39,7 +39,7 @@ Lastly, ClickHouse was purpose-built from the beginning to: Yandex N.V. is the largest internet company in Europe and employs over 14,000 people. They develop search, advertisement, and e-commerce services, ride tech and food tech solutions, self-driving cars... and also ClickHouse with a team of 15 engineers. It is hard to believe that we have managed to build a world-class leading analytical DBMS with such a small team while leveraging the global community. While this was barely enough to keep up with the development of the open-source product, everyone understands that the potential of ClickHouse technology highly outgrows such a small team. 
-We decided to unite the resources: take the team of core ClickHouse developers, bring in a world-class business team led by [Aaron Katz](https://www.linkedin.com/in/aaron-k-5762094/) and a cloud engineering team led by [Yury Izrailevsky](https://www.linkedin.com/in/yuryizrailevsky/), keep the power of open source, add the investment from the leading VC funds, and make an international company 100% focused on ClickHouse. I’m thrilled to announce ClickHouse, Inc. +We decided to unite the resources: take the team of core ClickHouse developers, bring in a world-class business team led by [Aaron Katz](https://www.linkedin.com/in/aaron-katz-5762094/) and a cloud engineering team led by [Yury Izrailevsky](https://www.linkedin.com/in/yuryizrailevsky/), keep the power of open source, add the investment from the leading VC funds, and make an international company 100% focused on ClickHouse. I’m thrilled to announce ClickHouse, Inc. ## What’s Next? diff --git a/website/css/main.css b/website/css/main.css index 4e812b8fabc..9b676804eba 100644 --- a/website/css/main.css +++ b/website/css/main.css @@ -908,7 +908,6 @@ img { } ul { - color: #495057; list-style-type: square; padding-left: 1.25em; } diff --git a/website/images/logos/logo-benchmark-capital.png b/website/images/logos/logo-benchmark-capital.png index 38bbce23e0e..626599c6b35 100644 Binary files a/website/images/logos/logo-benchmark-capital.png and b/website/images/logos/logo-benchmark-capital.png differ diff --git a/website/js/base.js b/website/js/base.js index 1debd0f780c..52b801eb98f 100644 --- a/website/js/base.js +++ b/website/js/base.js @@ -1,7 +1,7 @@ (function () { Sentry.init({ dsn: 'https://2b95b52c943f4ad99baccab7a9048e4d@o388870.ingest.sentry.io/5246103', - environment: window.location.hostname === 'clickhouse.tech' ? 'prod' : 'test' + environment: window.location.hostname === 'clickhouse.com' ? 
'prod' : 'test' }); $(document).click(function (event) { var target = $(event.target); @@ -95,7 +95,7 @@ s.type = "text/javascript"; s.async = true; s.src = "/js/metrika.js"; - if (window.location.hostname.endsWith('clickhouse.tech')) { + if (window.location.hostname.endsWith('clickhouse.com')) { if (w.opera == "[object Opera]") { d.addEventListener("DOMContentLoaded", f, false); } else { diff --git a/website/robots.txt b/website/robots.txt index 2cecc12e311..f7d6bd76a33 100644 --- a/website/robots.txt +++ b/website/robots.txt @@ -1,5 +1,5 @@ User-agent: * Disallow: /cdn-cgi/ Allow: / -Host: https://clickhouse.tech -Sitemap: https://clickhouse.tech/sitemap-index.xml +Host: https://clickhouse.com +Sitemap: https://clickhouse.com/sitemap-index.xml diff --git a/website/sitemap-index.xml b/website/sitemap-index.xml index 3fbdd99d372..bcaf6a3fe19 100644 --- a/website/sitemap-index.xml +++ b/website/sitemap-index.xml @@ -1,24 +1,24 @@ - https://clickhouse.tech/docs/en/sitemap.xml + https://clickhouse.com/docs/en/sitemap.xml - https://clickhouse.tech/docs/zh/sitemap.xml + https://clickhouse.com/docs/zh/sitemap.xml - https://clickhouse.tech/docs/ru/sitemap.xml + https://clickhouse.com/docs/ru/sitemap.xml - https://clickhouse.tech/docs/ja/sitemap.xml + https://clickhouse.com/docs/ja/sitemap.xml - https://clickhouse.tech/blog/en/sitemap.xml + https://clickhouse.com/blog/en/sitemap.xml - https://clickhouse.tech/blog/ru/sitemap.xml + https://clickhouse.com/blog/ru/sitemap.xml - https://clickhouse.tech/sitemap-static.xml + https://clickhouse.com/sitemap-static.xml diff --git a/website/sitemap-static.xml b/website/sitemap-static.xml index 6d6b41e5827..b5b5f3aa0d5 100644 --- a/website/sitemap-static.xml +++ b/website/sitemap-static.xml @@ -1,19 +1,23 @@ - https://clickhouse.tech/ + https://clickhouse.com/ daily - - https://clickhouse.tech/benchmark/dbms/ + + https://clickhouse.com/company/ weekly - https://clickhouse.tech/benchmark/hardware/ + https://clickhouse.com/benchmark/dbms/ weekly - https://clickhouse.tech/codebrowser/html_report/ClickHouse/index.html + https://clickhouse.com/benchmark/hardware/ + weekly + + + https://clickhouse.com/codebrowser/html_report/ClickHouse/index.html daily diff --git a/website/templates/company/founders.html b/website/templates/company/founders.html index dbff295af1e..2f6100e28f7 100644 --- a/website/templates/company/founders.html +++ b/website/templates/company/founders.html @@ -1,6 +1,6 @@
- +

{{ _('Meet the Team') }}

@@ -11,7 +11,7 @@
- + @@ -21,11 +21,11 @@

{{ _('Co-Founder & President, Product and Engineering') }}

- +
- - + +

@@ -34,11 +34,11 @@

{{ _('Co-Founder & CEO') }}

- +

- - + +

@@ -47,7 +47,7 @@

{{ _('Co-Founder & CTO') }}

- +

diff --git a/website/templates/company/investors.html b/website/templates/company/investors.html index 4d6224e7603..0dd38085c83 100644 --- a/website/templates/company/investors.html +++ b/website/templates/company/investors.html @@ -14,7 +14,7 @@

{{ _('Board Member') }}

- +
diff --git a/website/templates/company/overview.html b/website/templates/company/overview.html index 98aed9ffd86..e5759227c86 100644 --- a/website/templates/company/overview.html +++ b/website/templates/company/overview.html @@ -2,7 +2,7 @@

- Creators of the online analytical processing (OLAP) database management system ClickHouse have announced their decision to officially incorporate as a company. The creator of ClickHouse, Alexey Milovidov (CTO), will be joined by co-founders and seasoned enterprise software executives, Yury Izrailevsky (President, Product and Engineering) and Aaron Katz (CEO), along with nearly $50M in Series A funding by Index Ventures and Benchmark. + Creators of the online analytical processing (OLAP) database management system ClickHouse have announced their decision to officially incorporate as a company. The creator of ClickHouse, Alexey Milovidov (CTO), will be joined by co-founders and seasoned enterprise software executives, Yury Izrailevsky (President, Product and Engineering) and Aaron Katz (CEO), along with nearly $50M in Series A funding led by Index Ventures and Benchmark.

diff --git a/website/templates/company/press.html b/website/templates/company/press.html index dc6f0f74cf0..8265b68b063 100644 --- a/website/templates/company/press.html +++ b/website/templates/company/press.html @@ -1,5 +1,4 @@ -{## -

+
@@ -10,26 +9,60 @@
-
-
+
+

{{ _('9/20/21') }}

- {{ _('ClickHouse, Inc. Announces Incorporation, Along With $50M In Series A Funding') }} + {{ _('ClickHouse, Inc. Announces Incorporation, Along With $50M In Series A Funding') }}

-

+

{{ _('New financing will allow the open source success to build a world-class, commercial-grade cloud solution that’s secure, compliant, and convenient for any customer to use.') }}

- {{ _('Read More') }} - + {{ _('Read More') }} +
+
+

+ {{ _('9/20/21') }} +

+

+ {{ _('Business Insider Exclusive') }} +

+

+ {{ _('The creators of the popular ClickHouse project just raised $50 million from Index and Benchmark to form a company that will take on Splunk and Druid in the white-hot data space.') }} +

+ {{ _('Read More') }} +
+
+

+ {{ _('9/20/21') }} +

+

+ {{ _('Index Ventures Perspective') }} +

+

+ {{ _('Our road to ClickHouse started like a good spy novel, with a complex series of introductions over Telegram, clandestine text conversations spanning months before we finally managed to meet the team “face-to-face” (aka over Zoom).') }} +

+ {{ _('Read More') }}
+
+

+ {{ _('9/20/21') }} +

+

+ {{ _('Yandex Perspective') }} +

+

+ {{ _('Yandex announces the spin off of ClickHouse, Inc., a pioneering company focused on open-source column-oriented database management systems (DBMS) for analytical processing. ') }} +

+ {{ _('Read More') }} +
-##} diff --git a/website/templates/company/team.html b/website/templates/company/team.html index 28e7b622302..8b4c4e26774 100644 --- a/website/templates/company/team.html +++ b/website/templates/company/team.html @@ -1,27 +1,27 @@
- +

{{ _('ClickHouse Team') }}

- -
+ + -
+

{{ _('Vitaly Baranov') }}

{{ _('Principal Software Engineer') }}

- +
- + @@ -31,10 +31,10 @@

{{ _('VP, Product') }}

- +
- + @@ -42,13 +42,13 @@ {{ _('Jason Chan') }}

- {{ _('Adviser, Security, Privacy & Compliance') }} + {{ _('Adviser, Security, Privacy & Compliance') }}

- +
- - + +

@@ -57,10 +57,10 @@

{{ _('Software Engineer') }}

- +

- + @@ -70,10 +70,10 @@

{{ _('Senior Director, Business Technology') }}

- +
- + @@ -83,11 +83,11 @@

{{ _('VP, Sales') }}

- +
{% if false %}
- +
@@ -97,12 +97,12 @@

{{ _('Account Executive') }}

- +
{% endif %}
- - + +

@@ -111,24 +111,24 @@

{{ _('Senior Software Engineer') }}

- +

- -
+ + -
+

{{ _('Nikolai Kochetov') }}

{{ _('Engineering Team Lead') }}

- +
{% if false %}
- +
@@ -138,12 +138,12 @@

{{ _('Senior Recruiter') }}

- +
{% endif %}
- - + +

@@ -152,11 +152,11 @@

{{ _('Software Engineer') }}

- +

{% if false %}
- +
@@ -164,14 +164,14 @@ {{ _('Claire Lucas') }}

- {{ _('Director, Global Business Strategy & Operations') }} + {{ _('Director, Global Business Strategy & Operations') }}

- +
{% endif %}
- - + +

@@ -180,11 +180,11 @@

{{ _('Software Engineer') }}

- +

- - + +

@@ -193,10 +193,10 @@

{{ _('Senior Software Engineer') }}

- +

- + @@ -204,13 +204,13 @@ {{ _('Thom O’Connor') }}

- {{ _('VP, Support & Services') }} + {{ _('VP, Support & Services') }}

- +
- - + +

@@ -219,10 +219,10 @@

{{ _('Senior Software Engineer') }}

- +

- + @@ -232,12 +232,11 @@

{{ _('Director, Global Learning') }}

- +
- - +

@@ -246,11 +245,11 @@

{{ _('Engineering Team Lead') }}

- +

- - + +

@@ -259,10 +258,10 @@

{{ _('Software Engineer') }}

- +

- + @@ -272,11 +271,11 @@

{{ _('VP, Operations') }}

- +
- - + +

@@ -285,10 +284,10 @@

{{ _('Software Engineer') }}

- +

- + @@ -298,7 +297,7 @@

{{ _('VP, EMEA') }}

- +
@@ -311,7 +310,7 @@

{{ _('Senior Technical Project Manager') }}

- +
diff --git a/website/templates/docs/nav.html b/website/templates/docs/nav.html index 98691bd79bd..4d57d282796 100644 --- a/website/templates/docs/nav.html +++ b/website/templates/docs/nav.html @@ -41,7 +41,7 @@ {% for code, name in config.extra.languages.items() %} - {{ name }} + {{ name }} {% endfor %}
diff --git a/website/templates/index/hero.html b/website/templates/index/hero.html index af082e73a0c..873bcf9487a 100644 --- a/website/templates/index/hero.html +++ b/website/templates/index/hero.html @@ -12,7 +12,7 @@

@@ -31,7 +31,7 @@

Introducing ClickHouse Inc.!
-

ClickHouse, Inc. Announces Incorporation, Along With $50M In Series A Funding. New financing will allow the open source success to build a world-class, commercial-grade cloud solution that’s secure, compliant, and convenient for any customer to use.

Read the Press Release diff --git a/website/templates/index/success.html b/website/templates/index/success.html index 3249eabc1ee..8ab32b06ac9 100644 --- a/website/templates/index/success.html +++ b/website/templates/index/success.html @@ -64,7 +64,7 @@

{{ _('Uber moved its logging platform to ClickHouse, increasing developer productivity and overall reliability of the platform while seeing 3x data compression, 10x performance increase, and a ½ reduction in hardware cost.') }}

- {{ _('Read the Case Study') }} + {{ _('Read the Case Study') }}
@@ -103,7 +103,7 @@

{{ _('eBay adopted ClickHouse for their real time OLAP events (Logs + Metrics) infrastructure. The simplified architecture with ClickHouse allowed them to reduce their DevOps activity and troubleshooting, reduced the overall infrastructure by 90%%, and they saw a stronger integration with Grafana and ClickHouse for visualization and alerting.') }}

- {{ _('Read the Case Study') }} + {{ _('Read the Case Study') }}
@@ -142,7 +142,7 @@

{{ _('Cloudflare was having challenges scaling their CitusDB-based system which had a high TCO and maintenance costs due to the complex architecture. By moving their HTTP analytics data to ClickHouse they were able to scale to 8M requests per second, deleted 10’s of thousands of lines of code, reduced their MTTR, and saw a 7x improvement on customer queries per second they could serve.') }}

- {{ _('Read the Case Study') }} + {{ _('Read the Case Study') }}
@@ -181,7 +181,7 @@

{{ _('Spotify\'s A/B Experimentation platform is serving thousands of sub-second queries per second on petabyte-scale datasets with ClickHouse. They reduced the amount of low-variance work by an order of magnitude and enabled feature teams to self-serve insights by introducing a unified SQL interface for Data Platform and tools for automatic decision making for Experimentation.') }}

- {{ _('Read the Case Study') }} + {{ _('Read the Case Study') }}
@@ -220,7 +220,7 @@

{{ _('ClickHouse helps serve the Client Analytics platform for reporting, deep data analysis as well as advanced data science to provide Deutsche Bank’s front office a clear view on their client\'s activity and profitability.') }}

- {{ _('Read the Case Study') }} + {{ _('Read the Case Study') }}