Merge branch 'master' into gb-use-null-analyzer-crashes

Nikolai Kochetov 2024-03-26 13:05:03 +01:00
commit 2dee605d52
300 changed files with 6679 additions and 3285 deletions


@@ -6,6 +6,7 @@ env:
PYTHONUNBUFFERED: 1
on: # yamllint disable-line rule:truthy
merge_group:
pull_request:
types:
- synchronize
@@ -29,6 +30,7 @@ jobs:
fetch-depth: 0 # to get version
filter: tree:0
- name: Labels check
if: ${{ github.event_name != 'merge_group' }}
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 run_check.py
@@ -56,16 +58,9 @@ jobs:
echo 'EOF'
} >> "$GITHUB_OUTPUT"
- name: Re-create GH statuses for skipped jobs if any
if: ${{ github.event_name != 'merge_group' }}
run: |
python3 "$GITHUB_WORKSPACE/tests/ci/ci.py" --infile ${{ runner.temp }}/ci_run_data.json --update-gh-statuses
- name: Style check early
# hack to run style check before the docker build job if possible (style-check image not changed)
if: contains(fromJson(steps.runconfig.outputs.CI_DATA).jobs_data.jobs_to_do, 'Style check early')
run: |
DOCKER_TAG=$(echo '${{ toJson(fromJson(steps.runconfig.outputs.CI_DATA).docker_data.images) }}' | tr -d '\n')
export DOCKER_TAG=$DOCKER_TAG
python3 ./tests/ci/style_check.py --no-push
python3 "$GITHUB_WORKSPACE/tests/ci/ci.py" --infile ${{ runner.temp }}/ci_run_data.json --post --job-name 'Style check'
BuildDockers:
needs: [RunConfig]
if: ${{ !failure() && !cancelled() && toJson(fromJson(needs.RunConfig.outputs.data).docker_data.missing_multi) != '[]' }}


@@ -1,10 +1,184 @@
### Table of Contents
**[ClickHouse release v24.3 LTS, 2024-03-26](#243)**<br/>
**[ClickHouse release v24.2, 2024-02-29](#242)**<br/>
**[ClickHouse release v24.1, 2024-01-30](#241)**<br/>
**[Changelog for 2023](https://clickhouse.com/docs/en/whats-new/changelog/2023/)**<br/>
# 2024 Changelog
### <a id="243"></a> ClickHouse release 24.3 LTS, 2024-03-26
#### Upgrade Notes
* The setting `allow_experimental_analyzer` is enabled by default and it switches the query analysis to a new implementation, which has better compatibility and feature completeness. The feature "analyzer" is considered beta instead of experimental. You can restore the old behavior by setting `compatibility` to `24.2` or by disabling the `allow_experimental_analyzer` setting. Watch the [video on YouTube](https://www.youtube.com/watch?v=zhrOYQpgvkk).
* ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow data type to use for the ClickHouse String data type: String or Binary. This is controlled by the settings `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`, and `output_format_arrow_string_as_string`. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster `lz4` compression method, which is why we set `zstd` by default. This is controlled by the settings `output_format_parquet_compression_method`, `output_format_orc_compression_method`, and `output_format_arrow_compression_method`. We changed the default to `zstd` for Parquet and ORC, but not for Arrow (which is intended for low-level usage). [#61817](https://github.com/ClickHouse/ClickHouse/pull/61817) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* In the new ClickHouse version, the functions `geoDistance`, `greatCircleDistance`, and `greatCircleAngle` will use the 64-bit double precision floating point data type for internal calculations and as the return type if all the arguments are Float64. This closes [#58476](https://github.com/ClickHouse/ClickHouse/issues/58476). In previous versions, these functions always used Float32. You can switch to the old behavior by setting `geo_distance_returns_float64_on_float64_arguments` to `false` or setting `compatibility` to `24.2` or earlier. [#61848](https://github.com/ClickHouse/ClickHouse/pull/61848) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Co-authored with [Geet Patel](https://github.com/geetptl).
* The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of [#55186](https://github.com/ClickHouse/ClickHouse/issues/55186) and [#45409](https://github.com/ClickHouse/ClickHouse/issues/45409). It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: `SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type`. To disable the usage of in-memory data parts, do `ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT`. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait until there are no in-memory data parts left, and continue the upgrade. [#61127](https://github.com/ClickHouse/ClickHouse/pull/61127) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Changed the column name from `duration_ms` to `duration_microseconds` in the `system.zookeeper` table to reflect that the duration is measured in microseconds. [#60774](https://github.com/ClickHouse/ClickHouse/pull/60774) ([Duc Canh Le](https://github.com/canhld94)).
* Reject incoming INSERT queries when the query-level settings `async_insert` and `deduplicate_blocks_in_dependent_materialized_views` are both enabled. This behaviour is controlled by the setting `throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert` and is enabled by default. This is a continuation of https://github.com/ClickHouse/ClickHouse/pull/59699 needed to unblock https://github.com/ClickHouse/ClickHouse/pull/59915. [#60888](https://github.com/ClickHouse/ClickHouse/pull/60888) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* The utility `clickhouse-copier` has been moved to a separate repository on GitHub: https://github.com/ClickHouse/copier. It is no longer included in the bundle but is still available as a separate download. This closes [#60734](https://github.com/ClickHouse/ClickHouse/issues/60734), [#60540](https://github.com/ClickHouse/ClickHouse/issues/60540), [#60250](https://github.com/ClickHouse/ClickHouse/issues/60250), [#52917](https://github.com/ClickHouse/ClickHouse/issues/52917), [#51140](https://github.com/ClickHouse/ClickHouse/issues/51140), [#47517](https://github.com/ClickHouse/ClickHouse/issues/47517), [#47189](https://github.com/ClickHouse/ClickHouse/issues/47189), [#46598](https://github.com/ClickHouse/ClickHouse/issues/46598), [#40257](https://github.com/ClickHouse/ClickHouse/issues/40257), [#36504](https://github.com/ClickHouse/ClickHouse/issues/36504), [#35485](https://github.com/ClickHouse/ClickHouse/issues/35485), [#33702](https://github.com/ClickHouse/ClickHouse/issues/33702), and [#26702](https://github.com/ClickHouse/ClickHouse/issues/26702).
* To increase compatibility with MySQL, the compatibility alias `locate` now accepts arguments `(needle, haystack[, start_pos])` by default. The previous behavior `(haystack, needle[, start_pos])` can be restored by setting `function_locate_has_mysql_compatible_argument_order = 0`. [#61092](https://github.com/ClickHouse/ClickHouse/pull/61092) ([Robert Schulze](https://github.com/rschu1ze)).
* By default, forbid `SimpleAggregateFunction` in the `ORDER BY` of `MergeTree` tables, just as `AggregateFunction` is already forbidden (the latter because its values are not comparable). Use `allow_suspicious_primary_key` to allow them. [#61399](https://github.com/ClickHouse/ClickHouse/pull/61399) ([Azat Khuzhin](https://github.com/azat)).
* The `Ordinary` database engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes [#52229](https://github.com/ClickHouse/ClickHouse/issues/52229). [#56942](https://github.com/ClickHouse/ClickHouse/pull/56942) ([shabroo](https://github.com/shabroo)).
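To make the `locate` change concrete, here is a minimal Python model of the two argument orders. It is a sketch, not ClickHouse code: the helper names `locate_mysql_order` and `locate_legacy_order` are hypothetical, and it assumes both variants return a 1-based position, or 0 when the needle is not found, as MySQL's `LOCATE` does. Only the order of `needle` and `haystack` differs between them.

```python
def _position(haystack: str, needle: str, start_pos: int = 1) -> int:
    """1-based position of needle in haystack, searching from start_pos; 0 if absent."""
    if start_pos < 1:
        # Assumption: non-positive start positions are treated as "not found".
        return 0
    idx = haystack.find(needle, start_pos - 1)
    return idx + 1 if idx >= 0 else 0


def locate_mysql_order(needle: str, haystack: str, start_pos: int = 1) -> int:
    """Default argument order since 24.3: locate(needle, haystack[, start_pos])."""
    return _position(haystack, needle, start_pos)


def locate_legacy_order(haystack: str, needle: str, start_pos: int = 1) -> int:
    """Order restored by function_locate_has_mysql_compatible_argument_order = 0."""
    return _position(haystack, needle, start_pos)


if __name__ == "__main__":
    print(locate_mysql_order("bar", "foobarbar"))     # 4
    print(locate_mysql_order("bar", "foobarbar", 5))  # 7
    print(locate_legacy_order("foobarbar", "bar"))    # 4
```

For example, `locate_mysql_order("bar", "foobarbar", 5)` returns 7, matching MySQL's `LOCATE('bar', 'foobarbar', 5)`.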
#### New Feature
* Support reading and writing backups as `tar` (in addition to `zip`). [#59535](https://github.com/ClickHouse/ClickHouse/pull/59535) ([josh-hildred](https://github.com/josh-hildred)).
* Implemented support for S3 Express buckets. [#59965](https://github.com/ClickHouse/ClickHouse/pull/59965) ([Nikita Taranov](https://github.com/nickitat)).
* Allow attaching parts from a different disk (using copy instead of hard link). [#60112](https://github.com/ClickHouse/ClickHouse/pull/60112) ([Unalian](https://github.com/Unalian)).
* Size-capped `Memory` tables, controlled by their settings `min_bytes_to_keep`, `max_bytes_to_keep`, `min_rows_to_keep`, and `max_rows_to_keep`. [#60612](https://github.com/ClickHouse/ClickHouse/pull/60612) ([Jake Bamrah](https://github.com/JakeBamrah)).
* Separate limits on the number of waiting and executing queries. Added a new server setting `max_waiting_queries` that limits the number of queries waiting due to `async_load_databases`. Existing limits on the number of executing queries no longer count waiting queries. [#61053](https://github.com/ClickHouse/ClickHouse/pull/61053) ([Sergei Trifonov](https://github.com/serxa)).
* Added a table `system.keywords`, which contains all the keywords from the parser. It is mostly needed for, and will be used for, better fuzzing and syntax highlighting. [#51808](https://github.com/ClickHouse/ClickHouse/pull/51808) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add support for `ATTACH PARTITION ALL`. [#61107](https://github.com/ClickHouse/ClickHouse/pull/61107) ([Kirill Nikiforov](https://github.com/allmazz)).
* Add a new function, `getClientHTTPHeader`. This closes [#54665](https://github.com/ClickHouse/ClickHouse/issues/54665). Co-authored with @lingtaolf. [#61820](https://github.com/ClickHouse/ClickHouse/pull/61820) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add `generate_series` as a table function (a compatibility alias for PostgreSQL to the existing `numbers` function). This function generates a table with an arithmetic progression of natural numbers. [#59390](https://github.com/ClickHouse/ClickHouse/pull/59390) ([divanik](https://github.com/divanik)).
* Added a mode for `topK`/`topKWeighted` that returns the count of each value and its error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)).
* Added function `toMillisecond`, which returns the millisecond component for values of type `DateTime` or `DateTime64`. [#60281](https://github.com/ClickHouse/ClickHouse/pull/60281) ([Shaun Struwig](https://github.com/Blargian)).
* Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make `/` redirect to the Play UI. [#60390](https://github.com/ClickHouse/ClickHouse/pull/60390) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
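The new `topK` mode reports a count and an error for each value. The classic Space-Saving algorithm shows where such an error comes from. The sketch below implements plain Space-Saving only (the ClickHouse documentation describes `topK` as based on a Filtered Space-Saving variant, which this code does not reproduce), and the function name `space_saving_top_k` and its `capacity` parameter are hypothetical. When a new value evicts the least frequent counter, it inherits that counter's count, and the inherited amount is recorded as the value's maximum possible overestimation.

```python
from typing import Hashable, Iterable, List, Tuple


def space_saving_top_k(
    stream: Iterable[Hashable], k: int, capacity: int
) -> List[Tuple[Hashable, int, int]]:
    """Plain Space-Saving: return up to k (value, count, error) tuples.

    `count` may overestimate the true frequency by at most `error`.
    """
    counters = {}  # value -> [count, error]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Evict the value with the smallest count. The newcomer inherits
            # that count, which becomes its maximum possible overestimation.
            victim = min(counters, key=lambda v: counters[v][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    # Stable sort: ties keep insertion order.
    ranked = sorted(counters.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(value, count, error) for value, (count, error) in ranked[:k]]


if __name__ == "__main__":
    print(space_saving_top_k("aaaabbbccd", k=3, capacity=3))
    # [('a', 4, 0), ('b', 3, 0), ('d', 3, 2)]
```

With `capacity=3`, the stream `aaaabbbccd` evicts `c` when `d` arrives, so `d` is reported with count 3 and error 2: its true count (1) lies between `count - error` and `count`.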
#### Performance Improvement
* Optimized function `dotProduct` to omit unnecessary and expensive memory copies. [#60928](https://github.com/ClickHouse/ClickHouse/pull/60928) ([Robert Schulze](https://github.com/rschu1ze)).
* 30x faster printing for 256-bit integers. [#61100](https://github.com/ClickHouse/ClickHouse/pull/61100) ([Raúl Marín](https://github.com/Algunenano)).
* If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting `primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns` with the value `0.9` by default, which means: for a composite primary key, if a column changes its value in at least 0.9 of all cases, the columns after it will not be loaded. [#60255](https://github.com/ClickHouse/ClickHouse/pull/60255) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve the performance of serialized aggregation method when involving multiple `Nullable` columns. [#55809](https://github.com/ClickHouse/ClickHouse/pull/55809) ([Amos Bird](https://github.com/amosbird)).
* Build JSON output lazily to improve the performance of ALL JOIN. [#58278](https://github.com/ClickHouse/ClickHouse/pull/58278) ([LiuNeng](https://github.com/liuneng1994)).
* Make HTTP/HTTPS connections to external services, such as AWS S3, reusable for all use cases, even when the response is 3xx or 4xx. [#58845](https://github.com/ClickHouse/ClickHouse/pull/58845) ([Sema Checherinda](https://github.com/CheSema)).
* Improvements to aggregate functions `argMin` / `argMax` / `any` / `anyLast` / `anyHeavy`, as well as `ORDER BY {u8/u16/u32/u64/i8/i16/i32/i64} LIMIT 1` queries. [#58640](https://github.com/ClickHouse/ClickHouse/pull/58640) ([Raúl Marín](https://github.com/Algunenano)).
* Trivial optimization for column filtering. Peak memory can be reduced to 44% of the original in some cases. [#59698](https://github.com/ClickHouse/ClickHouse/pull/59698) ([李扬](https://github.com/taiyang-li)).
* Execute `multiIf` function in a columnar fashion when the result type's underlying type is a number. [#60384](https://github.com/ClickHouse/ClickHouse/pull/60384) ([李扬](https://github.com/taiyang-li)).
* Faster (almost 2x) mutexes. [#60823](https://github.com/ClickHouse/ClickHouse/pull/60823) ([Azat Khuzhin](https://github.com/azat)).
* Drain multiple connections in parallel when a distributed query is finishing. [#60845](https://github.com/ClickHouse/ClickHouse/pull/60845) ([lizhuoyu5](https://github.com/lzydmxy)).
* Optimize data movement between columns of a Nullable number or a Nullable string, which improves some micro-benchmarks. [#60846](https://github.com/ClickHouse/ClickHouse/pull/60846) ([李扬](https://github.com/taiyang-li)).
* Operations with the filesystem cache will suffer less from the lock contention. [#61066](https://github.com/ClickHouse/ClickHouse/pull/61066) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Optimize array join and other JOINs by preventing a wrong compiler optimization. Closes [#61074](https://github.com/ClickHouse/ClickHouse/issues/61074). [#61075](https://github.com/ClickHouse/ClickHouse/pull/61075) ([李扬](https://github.com/taiyang-li)).
* If a query with a syntax error contained a `COLUMNS` matcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put into the AST. But the letter A in AST means "abstract", which means it should not contain heavyweight objects. Parts of the AST can be created and discarded during parsing, including during extensive backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. [#61543](https://github.com/ClickHouse/ClickHouse/pull/61543) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a new analyzer pass to optimize the IN operator for a single value. [#61564](https://github.com/ClickHouse/ClickHouse/pull/61564) ([LiuNeng](https://github.com/liuneng1994)).
* DNSResolver shuffles the set of resolved IPs, which is needed to utilize multiple AWS S3 endpoints uniformly. [#60965](https://github.com/ClickHouse/ClickHouse/pull/60965) ([Sema Checherinda](https://github.com/CheSema)).
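The primary-key memory optimization can be pictured as a scan over the sparse primary index. The following Python sketch assumes the ratio is measured over key prefixes between adjacent index marks; the function name `columns_to_keep` is hypothetical, and this is not the actual loading code in ClickHouse. Once a prefix of the key changes at (almost) every mark, the columns after it add little for range selection and need not be kept in memory.

```python
from typing import Sequence, Tuple


def columns_to_keep(index_rows: Sequence[Tuple], ratio: float = 0.9) -> int:
    """Return how many leading primary-key columns are worth keeping in memory.

    index_rows holds one tuple of key values per index mark (granule).
    """
    if not index_rows:
        return 0
    num_columns = len(index_rows[0])
    transitions = len(index_rows) - 1
    if transitions == 0:
        return num_columns
    for prefix_len in range(1, num_columns):
        # Count adjacent marks where the key prefix changes.
        changes = sum(
            1
            for prev, cur in zip(index_rows, index_rows[1:])
            if prev[:prefix_len] != cur[:prefix_len]
        )
        if changes / transitions >= ratio:
            # The prefix is (almost) unique per mark, so the suffix columns
            # rarely help to narrow down granule ranges.
            return prefix_len
    return num_columns


if __name__ == "__main__":
    print(columns_to_keep([(1, "a"), (2, "a"), (3, "b"), (4, "b")]))  # 1
    print(columns_to_keep([(1, "a"), (1, "b"), (1, "c"), (2, "a")]))  # 2
```

In the first call the leading column already changes at every mark, so only it is kept; in the second it changes at one mark out of three, so both columns stay.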
#### Experimental Feature
* Support parallel reading for Azure blob storage. This improves the performance of the experimental Azure object storage. [#61503](https://github.com/ClickHouse/ClickHouse/pull/61503) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add asynchronous WriteBuffer for Azure blob storage similar to S3. This improves the performance of the experimental Azure object storage. [#59929](https://github.com/ClickHouse/ClickHouse/pull/59929) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. [#61785](https://github.com/ClickHouse/ClickHouse/pull/61785) ([Daniel Pozo Escalona](https://github.com/danipozo)).
* Add a setting `parallel_replicas_allow_in_with_subquery = 1`, which allows subqueries for IN to work with parallel replicas. [#60950](https://github.com/ClickHouse/ClickHouse/pull/60950) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* A change for the "zero-copy" replication: all zero-copy locks related to a table have to be dropped when the table is dropped. The directory that contains these locks also has to be removed. [#57575](https://github.com/ClickHouse/ClickHouse/pull/57575) ([Sema Checherinda](https://github.com/CheSema)).
#### Improvement
* Use `MergeTree` as a default table engine. [#60524](https://github.com/ClickHouse/ClickHouse/pull/60524) ([Alexey Milovidov](https://github.com/alexey-milovidov))
* Enable `output_format_pretty_row_numbers` by default. It is better for usability. [#61791](https://github.com/ClickHouse/ClickHouse/pull/61791) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* In the previous version, some numbers in Pretty formats were not pretty enough. [#61794](https://github.com/ClickHouse/ClickHouse/pull/61794) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A long value in Pretty formats won't be cut if it is the single value in the result set, such as in the result of the `SHOW CREATE TABLE` query. [#61795](https://github.com/ClickHouse/ClickHouse/pull/61795) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Similarly to `clickhouse-local`, `clickhouse-client` will accept the `--output-format` option as a synonym for the `--format` option. This closes [#59848](https://github.com/ClickHouse/ClickHouse/issues/59848). [#61797](https://github.com/ClickHouse/ClickHouse/pull/61797) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* If stdout is a terminal and the output format is not specified, `clickhouse-client` and similar tools will use `PrettyCompact` by default, similarly to the interactive mode. `clickhouse-client` and `clickhouse-local` will handle command line arguments for input and output formats in a unified fashion. This closes [#61272](https://github.com/ClickHouse/ClickHouse/issues/61272). [#61800](https://github.com/ClickHouse/ClickHouse/pull/61800) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting, `output_format_pretty_highlight_digit_groups`. [#61802](https://github.com/ClickHouse/ClickHouse/pull/61802) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add ability to override initial INSERT settings via `SYSTEM FLUSH DISTRIBUTED`. [#61832](https://github.com/ClickHouse/ClickHouse/pull/61832) ([Azat Khuzhin](https://github.com/azat)).
* Enable processors profiling (time spent/in and out bytes for sorting, aggregation, ...) by default. [#61096](https://github.com/ClickHouse/ClickHouse/pull/61096) ([Azat Khuzhin](https://github.com/azat)).
* Support files without format extension in Filesystem database. [#60795](https://github.com/ClickHouse/ClickHouse/pull/60795) ([Kruglov Pavel](https://github.com/Avogar)).
* Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. [#60420](https://github.com/ClickHouse/ClickHouse/pull/60420) ([豪肥肥](https://github.com/HowePa)). We would appreciate it if you continue to write them correctly, e.g., `JSON` 😇, not `Json` 🤮, but we don't mind if you spell them as you prefer.
* Added `none_only_active` mode for `distributed_ddl_output_mode` setting. [#60340](https://github.com/ClickHouse/ClickHouse/pull/60340) ([Alexander Tokmakov](https://github.com/tavplubix)).
* The advanced dashboard has slightly better colors for multi-line graphs. [#60391](https://github.com/ClickHouse/ClickHouse/pull/60391) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. [#60692](https://github.com/ClickHouse/ClickHouse/pull/60692) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* While running the `MODIFY COLUMN` query for materialized views, check the inner table's structure to ensure every column exists. [#47427](https://github.com/ClickHouse/ClickHouse/pull/47427) ([sunny](https://github.com/sunny19930321)).
* String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes [#60726](https://github.com/ClickHouse/ClickHouse/issues/60726). [#60727](https://github.com/ClickHouse/ClickHouse/pull/60727) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow declaring Enums in the structure of external data for query processing (this is an immediate temporary table that you can provide for your query). [#57857](https://github.com/ClickHouse/ClickHouse/pull/57857) ([Duc Canh Le](https://github.com/canhld94)).
* Consider lightweight deleted rows when selecting parts to merge, so the disk size of the resulting part will be estimated better. [#58223](https://github.com/ClickHouse/ClickHouse/pull/58223) ([Zhuo Qiu](https://github.com/jewelzqiu)).
* Added comments for columns for more system tables. Continuation of https://github.com/ClickHouse/ClickHouse/pull/58356. [#59016](https://github.com/ClickHouse/ClickHouse/pull/59016) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Virtual columns can now be used in `PREWHERE`, which is worthwhile for non-const virtual columns like `_part_offset`. [#59033](https://github.com/ClickHouse/ClickHouse/pull/59033) ([Amos Bird](https://github.com/amosbird)). Improved the overall usability of virtual columns: they are allowed in `PREWHERE`, and built-in documentation for virtual columns is now available as column comments in the `DESCRIBE` query when the setting `describe_include_virtual_columns` is enabled. [#60205](https://github.com/ClickHouse/ClickHouse/pull/60205) ([Anton Popov](https://github.com/CurtizJ)).
* Instead of using a constant key, object storage now generates a key for determining the capability to remove objects. [#59495](https://github.com/ClickHouse/ClickHouse/pull/59495) ([Sema Checherinda](https://github.com/CheSema)).
* Allow "local" as object storage type instead of "local_blob_storage". [#60165](https://github.com/ClickHouse/ClickHouse/pull/60165) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Parallel flush of pending INSERT blocks of the Distributed engine on `DETACH`/server shutdown and `SYSTEM FLUSH DISTRIBUTED`. Parallelism works only if the table has a multi-disk policy (like everything in the Distributed engine right now). [#60225](https://github.com/ClickHouse/ClickHouse/pull/60225) ([Azat Khuzhin](https://github.com/azat)).
* Add a setting to force read-through cache for merges. [#60308](https://github.com/ClickHouse/ClickHouse/pull/60308) ([Kseniia Sumarokova](https://github.com/kssenii)).
* An improvement for the MySQL compatibility protocol. The issue [#57598](https://github.com/ClickHouse/ClickHouse/issues/57598) describes a difference from MySQL in transaction handling: a COMMIT/ROLLBACK issued when no transaction is active was reported as an error, contrary to MySQL behaviour. [#60338](https://github.com/ClickHouse/ClickHouse/pull/60338) ([PapaToemmsn](https://github.com/PapaToemmsn)).
* Function `substring` now has a new alias `byteSlice`. [#60494](https://github.com/ClickHouse/ClickHouse/pull/60494) ([Robert Schulze](https://github.com/rschu1ze)).
* Renamed server setting `dns_cache_max_size` to `dns_cache_max_entries` to reduce ambiguity. [#60500](https://github.com/ClickHouse/ClickHouse/pull/60500) ([Kirill Nikiforov](https://github.com/allmazz)).
* `SHOW INDEX | INDEXES | INDICES | KEYS` no longer sorts by the primary key columns (which was unintuitive). [#60514](https://github.com/ClickHouse/ClickHouse/pull/60514) ([Robert Schulze](https://github.com/rschu1ze)).
* Keeper improvement: abort during startup if an invalid snapshot is detected to avoid data loss. [#60537](https://github.com/ClickHouse/ClickHouse/pull/60537) ([Antonio Andelic](https://github.com/antonio2368)).
* Update tzdata to 2024a. [#60768](https://github.com/ClickHouse/ClickHouse/pull/60768) ([Raúl Marín](https://github.com/Algunenano)).
* Keeper improvement: support `leadership_expiry_ms` in Keeper's settings. [#60806](https://github.com/ClickHouse/ClickHouse/pull/60806) ([Brokenice0415](https://github.com/Brokenice0415)).
* Always infer exponential numbers in JSON formats regardless of the setting `input_format_try_infer_exponent_floats`. Add the setting `input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects`, which allows using the String type for ambiguous paths instead of throwing an exception during named Tuple inference from JSON objects. [#60808](https://github.com/ClickHouse/ClickHouse/pull/60808) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for the `START TRANSACTION` syntax typically used in MySQL, resolving https://github.com/ClickHouse/ClickHouse/discussions/60865. [#60886](https://github.com/ClickHouse/ClickHouse/pull/60886) ([Zach Naimon](https://github.com/ArctypeZach)).
* Add a flag for the full-sorting merge join algorithm to treat null as the biggest/smallest value, so that the behavior can be compatible with other SQL systems, like Apache Spark. [#60896](https://github.com/ClickHouse/ClickHouse/pull/60896) ([loudongfeng](https://github.com/loudongfeng)).
* Support detecting the output format by file extension in `clickhouse-client` and `clickhouse-local`. [#61036](https://github.com/ClickHouse/ClickHouse/pull/61036) ([豪肥肥](https://github.com/HowePa)).
* Update the memory limit at runtime when the Linux cgroups value changes. [#61049](https://github.com/ClickHouse/ClickHouse/pull/61049) ([Han Fei](https://github.com/hanfei1991)).
* Add the function `toUInt128OrZero`, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases `FROM_UNIXTIME` and `DATE_FORMAT` (they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. [#61114](https://github.com/ClickHouse/ClickHouse/pull/61114) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improvements for the access checks, allowing the revocation of unpossessed rights in case the target user doesn't have the revoked grants either. Example: `GRANT SELECT ON *.* TO user1; REVOKE SELECT ON system.* FROM user1;`. [#61115](https://github.com/ClickHouse/ClickHouse/pull/61115) ([pufit](https://github.com/pufit)).
* Fix `has()` function with `Nullable` column (fixes [#60214](https://github.com/ClickHouse/ClickHouse/issues/60214)). [#61249](https://github.com/ClickHouse/ClickHouse/pull/61249) ([Mikhail Koviazin](https://github.com/mkmkme)).
* Now it's possible to specify the attribute `merge="true"` in config substitutions for subtrees `<include from_zk="/path" merge="true">`. If this attribute is specified, ClickHouse will merge the subtree with the existing configuration; otherwise, the default behavior is to append the new content to the configuration. [#61299](https://github.com/ClickHouse/ClickHouse/pull/61299) ([alesapin](https://github.com/alesapin)).
* Add async metrics for virtual memory mappings: `VMMaxMapCount` & `VMNumMaps`. Closes [#60662](https://github.com/ClickHouse/ClickHouse/issues/60662). [#61354](https://github.com/ClickHouse/ClickHouse/pull/61354) ([Tuan Pham Anh](https://github.com/tuanpavn)).
* Use the `temporary_files_codec` setting in all places where we create temporary data, for example, external memory sorting and external memory GROUP BY. Previously, it worked only in the `partial_merge` JOIN algorithm. [#61456](https://github.com/ClickHouse/ClickHouse/pull/61456) ([Maksim Kita](https://github.com/kitaisreal)).
* Add a new setting `max_parser_backtracks`, which limits the complexity of query parsing. [#61502](https://github.com/ClickHouse/ClickHouse/pull/61502) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Less contention during dynamic resize of the filesystem cache. [#61524](https://github.com/ClickHouse/ClickHouse/pull/61524) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Disallow sharded mode of StorageS3 queue, because it will be rewritten. [#61537](https://github.com/ClickHouse/ClickHouse/pull/61537) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixed typo: from `use_leagcy_max_level` to `use_legacy_max_level`. [#61545](https://github.com/ClickHouse/ClickHouse/pull/61545) ([William Schoeffel](https://github.com/wiledusc)).
* Remove some duplicate entries in `system.blob_storage_log`. [#61622](https://github.com/ClickHouse/ClickHouse/pull/61622) ([YenchangChan](https://github.com/YenchangChan)).
* Added `current_user` function as a compatibility alias for MySQL. [#61770](https://github.com/ClickHouse/ClickHouse/pull/61770) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix inconsistent floating point aggregate function states in mixed x86-64 / ARM clusters [#60610](https://github.com/ClickHouse/ClickHouse/pull/60610) ([Harry Lee](https://github.com/HarryLeeIBM)).
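A merge join walks two inputs that must be sorted by the same rule, so it needs to know whether NULL keys sort first or last. The Python sketch below illustrates this; `merge_join_sorted` and its `nulls_biggest` flag are hypothetical names (the changelog entry for the full-sorting merge join does not name the actual setting), and NULL keys are assumed never to match, as in SQL equality joins. If the flag disagrees with how the inputs were actually sorted, matches can be silently missed.

```python
from typing import Any, Callable, List, Optional, Tuple


def null_aware_key(nulls_biggest: bool) -> Callable[[Optional[Any]], Tuple[int, Any]]:
    """Sort key that places None after (or before) every non-null value."""
    def key(value: Optional[Any]) -> Tuple[int, Any]:
        if value is None:
            return (1, 0) if nulls_biggest else (-1, 0)
        return (0, value)
    return key


def merge_join_sorted(left: List, right: List, nulls_biggest: bool) -> List[Tuple]:
    """Inner merge join of two key lists that are already sorted with the
    given NULL placement. NULL keys never match."""
    key = null_aware_key(nulls_biggest)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        elif left[i] is None:
            # Both sides are NULL: equal for ordering, but they never join.
            i += 1
            j += 1
        else:
            # Emit the cross product of the runs of equal keys on both sides.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end] == left[i]:
                i_end += 1
            while j_end < len(right) and right[j_end] == right[j]:
                j_end += 1
            out.extend((lv, rv) for lv in left[i:i_end] for rv in right[j:j_end])
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    print(merge_join_sorted([1, 2, 2, 3, None], [2, 3, 4, None], nulls_biggest=True))
    # [(2, 2), (2, 2), (3, 3)]
    print(merge_join_sorted([None, 1, 2], [2], nulls_biggest=True))
    # [] because the inputs put NULL first while the flag says NULL is biggest
```

With `nulls_biggest=False`, the second call finds the pair `(2, 2)`, because the assumed order now matches the inputs.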
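The `max_parser_backtracks` setting targets inputs on which naive backtracking re-parses the same text over and over. The toy recognizer below is not ClickHouse's parser; its grammar, the `Parser` class, and the `TooManyBacktracks` exception are hypothetical. Each failed alternative counts as one backtrack, and because every alternative re-parses the nested `atom`, the count roughly triples with each level of parentheses until the budget is exceeded.

```python
from typing import Optional


class TooManyBacktracks(Exception):
    """Raised when parsing exceeds the backtracking budget."""


class Parser:
    """Naive backtracking recognizer for a toy grammar:

        expr := atom '+' expr | atom '-' expr | atom
        atom := '(' expr ')' | DIGIT

    Every alternative re-parses `atom` from scratch (no memoization), so the
    work grows exponentially with the nesting depth of parentheses.
    """

    def __init__(self, text: str, max_backtracks: int) -> None:
        self.text = text
        self.max_backtracks = max_backtracks
        self.backtracks = 0

    def _backtrack(self) -> None:
        self.backtracks += 1
        if self.backtracks > self.max_backtracks:
            raise TooManyBacktracks(f"more than {self.max_backtracks} backtracks")

    def expr(self, pos: int) -> Optional[int]:
        """Return the position just after an expr starting at pos, or None."""
        for op in ("+", "-"):
            end = self.atom(pos)
            if end is not None and end < len(self.text) and self.text[end] == op:
                rest = self.expr(end + 1)
                if rest is not None:
                    return rest
            self._backtrack()  # this alternative failed: rewind to pos
        return self.atom(pos)

    def atom(self, pos: int) -> Optional[int]:
        if pos < len(self.text) and self.text[pos].isdigit():
            return pos + 1
        if pos < len(self.text) and self.text[pos] == "(":
            end = self.expr(pos + 1)
            if end is not None and end < len(self.text) and self.text[end] == ")":
                return end + 1
        return None


def parses(text: str, max_backtracks: int = 1000) -> bool:
    return Parser(text, max_backtracks).expr(0) == len(text)


if __name__ == "__main__":
    print(parses("(1+2)-3"))                # True
    print(parses("(" * 5 + "1" + ")" * 5))  # True (728 backtracks)
    try:
        parses("(" * 6 + "1" + ")" * 6)
    except TooManyBacktracks as exc:
        print("rejected:", exc)             # rejected: more than 1000 backtracks
```

With a budget of 1000, five levels of nesting (728 backtracks) still parse, while six levels (2186 backtracks if unbounded) are rejected early instead of consuming ever more time.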
#### Build/Testing/Packaging Improvement
* The real-time query profiler now works on AArch64. In previous versions, it worked only when a program didn't spend time inside a syscall. [#60807](https://github.com/ClickHouse/ClickHouse/pull/60807) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ClickHouse version has been added to docker labels. Closes [#54224](https://github.com/ClickHouse/ClickHouse/issues/54224). [#60949](https://github.com/ClickHouse/ClickHouse/pull/60949) ([Nikolay Monkov](https://github.com/nikmonkov)).
* Upgrade `prqlc` to 0.11.3. [#60616](https://github.com/ClickHouse/ClickHouse/pull/60616) ([Maximilian Roos](https://github.com/max-sixty)).
* Add generic query text fuzzer in `clickhouse-local`. [#61508](https://github.com/ClickHouse/ClickHouse/pull/61508) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix `finished_mutations_to_keep=0` for MergeTree (as the docs say, 0 means keep everything) [#60031](https://github.com/ClickHouse/ClickHouse/pull/60031) ([Azat Khuzhin](https://github.com/azat)).
* Something was wrong with the FINAL optimization, here is how the author describes it: "PartsSplitter invalid ranges for the same part". [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)).
* Something was wrong with Apache Hive, which is experimental and not supported. [#60262](https://github.com/ClickHouse/ClickHouse/pull/60262) ([shanfengp](https://github.com/Aed-p)).
* An improvement for experimental parallel replicas: force reanalysis if parallel replicas changed [#60362](https://github.com/ClickHouse/ClickHouse/pull/60362) ([Raúl Marín](https://github.com/Algunenano)).
* Fix usage of plain metadata type with new disks configuration option [#60396](https://github.com/ClickHouse/ClickHouse/pull/60396) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Don't allow setting `max_parallel_replicas` to 0, as it doesn't make sense [#60430](https://github.com/ClickHouse/ClickHouse/pull/60430) ([Kruglov Pavel](https://github.com/Avogar)).
* Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike [#60451](https://github.com/ClickHouse/ClickHouse/pull/60451) ([Kruglov Pavel](https://github.com/Avogar)).
* Avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
* Something was wrong with experimental KQL (Kusto) support: fix `max_query_size_for_kql_compound_operator` [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
* Keeper fix: add timeouts when waiting for commit logs [#60544](https://github.com/ClickHouse/ClickHouse/pull/60544) ([Antonio Andelic](https://github.com/antonio2368)).
* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)).
* Fix reading from MergeTree with non-deterministic functions in filter [#60586](https://github.com/ClickHouse/ClickHouse/pull/60586) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix logical error on bad compatibility setting value type [#60596](https://github.com/ClickHouse/ClickHouse/pull/60596) ([Kruglov Pavel](https://github.com/Avogar)).
* Make the PRQL panic handler robust [#60615](https://github.com/ClickHouse/ClickHouse/pull/60615) ([Maximilian Roos](https://github.com/max-sixty)).
* Fix `intDiv` for decimal and date arguments [#60672](https://github.com/ClickHouse/ClickHouse/pull/60672) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix: expand CTE in alter modify query [#60682](https://github.com/ClickHouse/ClickHouse/pull/60682) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory) [#60689](https://github.com/ClickHouse/ClickHouse/pull/60689) ([Azat Khuzhin](https://github.com/azat)).
* Fix "Invalid storage definition in metadata file" for parameterized views [#60708](https://github.com/ClickHouse/ClickHouse/pull/60708) ([Azat Khuzhin](https://github.com/azat)).
* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove wrong assertion in aggregate function quantileGK [#60740](https://github.com/ClickHouse/ClickHouse/pull/60740) ([李扬](https://github.com/taiyang-li)).
* Fix a bug with INSERT SELECT and `insert_deduplication_token` by setting the number of streams to 1 [#60745](https://github.com/ClickHouse/ClickHouse/pull/60745) ([Jordi Villar](https://github.com/jrdi)).
* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
* Fix toStartOfInterval [#60763](https://github.com/ClickHouse/ClickHouse/pull/60763) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)).
* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix a possible hang on error in HashedDictionaryParallelLoader [#60926](https://github.com/ClickHouse/ClickHouse/pull/60926) ([vdimir](https://github.com/vdimir)).
* Fix async RESTORE with Replicated database (experimental feature) [#60934](https://github.com/ClickHouse/ClickHouse/pull/60934) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix deadlock in async inserts to `Log` tables via native protocol [#61055](https://github.com/ClickHouse/ClickHouse/pull/61055) ([Anton Popov](https://github.com/CurtizJ)).
* Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary [#61196](https://github.com/ClickHouse/ClickHouse/pull/61196) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)).
* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix usage of session_token in S3 engine [#61234](https://github.com/ClickHouse/ClickHouse/pull/61234) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)).
* Fix bugs in show database [#61269](https://github.com/ClickHouse/ClickHouse/pull/61269) ([Raúl Marín](https://github.com/Algunenano)).
* Fix logical error in RabbitMQ storage with MATERIALIZED columns [#61320](https://github.com/ClickHouse/ClickHouse/pull/61320) ([vdimir](https://github.com/vdimir)).
* Fix CREATE OR REPLACE DICTIONARY [#61356](https://github.com/ClickHouse/ClickHouse/pull/61356) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix ATTACH query with external ON CLUSTER [#61365](https://github.com/ClickHouse/ClickHouse/pull/61365) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)).
* Fix an issue with splitting the actions DAG [#61458](https://github.com/ClickHouse/ClickHouse/pull/61458) ([Raúl Marín](https://github.com/Algunenano)).
* Fix finishing a failed RESTORE [#61466](https://github.com/ClickHouse/ClickHouse/pull/61466) ([Vitaly Baranov](https://github.com/vitlibar)).
* Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings [#61468](https://github.com/ClickHouse/ClickHouse/pull/61468) ([Raúl Marín](https://github.com/Algunenano)).
* Allow queuing in restore pool [#61475](https://github.com/ClickHouse/ClickHouse/pull/61475) ([Nikita Taranov](https://github.com/nickitat)).
* Fix an inconsistency when reading system.parts using UUID. [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)).
* Fix ALTER QUERY MODIFY SQL SECURITY [#61480](https://github.com/ClickHouse/ClickHouse/pull/61480) ([pufit](https://github.com/pufit)).
* Fix a crash in window view (experimental feature) [#61526](https://github.com/ClickHouse/ClickHouse/pull/61526) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix `repeat` with non-native integers [#61527](https://github.com/ClickHouse/ClickHouse/pull/61527) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix client's `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix crash in arrayPartialReverseSort [#61539](https://github.com/ClickHouse/ClickHouse/pull/61539) ([Raúl Marín](https://github.com/Algunenano)).
* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix `addDays` causing an error when used with DateTime64 [#61561](https://github.com/ClickHouse/ClickHouse/pull/61561) ([Shuai li](https://github.com/loneylee)).
* Disallow LowCardinality input type for JSONExtract [#61617](https://github.com/ClickHouse/ClickHouse/pull/61617) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix `system.part_log` for async insert with deduplication [#61620](https://github.com/ClickHouse/ClickHouse/pull/61620) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix a `Non-ready set` exception for system.parts. [#61666](https://github.com/ClickHouse/ClickHouse/pull/61666) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix actual_part_name for REPLACE_RANGE (`Entry actual part isn't empty yet`) [#61675](https://github.com/ClickHouse/ClickHouse/pull/61675) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix a sanitizer report in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
* Fix the RANGE window frame not being supported for Nullable columns. [#61766](https://github.com/ClickHouse/ClickHouse/pull/61766) ([YuanLiu](https://github.com/ditgittube)).
### <a id="242"></a> ClickHouse release 24.2, 2024-02-29
#### Backward Incompatible Change


@ -28,7 +28,6 @@ curl https://clickhouse.com/ | sh
* [Slack](https://clickhouse.com/slack) and [Telegram](https://telegram.me/clickhouse_en) allow chatting with ClickHouse users in real-time.
* [Blog](https://clickhouse.com/blog/) contains various ClickHouse-related articles, as well as announcements and reports about events.
* [Code Browser (github.dev)](https://github.dev/ClickHouse/ClickHouse) with syntax highlighting, powered by github.dev.
* [Static Analysis (SonarCloud)](https://sonarcloud.io/project/issues?resolved=false&id=ClickHouse_ClickHouse) proposes C++ quality improvements.
* [Contacts](https://clickhouse.com/company/contact) can help to get your questions answered if there are any.
## Monthly Release & Community Call


@ -30,7 +30,7 @@ if (SANITIZE)
elseif (SANITIZE STREQUAL "thread")
set (TSAN_FLAGS "-fsanitize=thread")
if (COMPILER_CLANG)
set (TSAN_FLAGS "${TSAN_FLAGS} -fsanitize-blacklist=${PROJECT_SOURCE_DIR}/tests/tsan_suppressions.txt")
set (TSAN_FLAGS "${TSAN_FLAGS} -fsanitize-ignorelist=${PROJECT_SOURCE_DIR}/tests/tsan_ignorelist.txt")
endif()
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${TSAN_FLAGS}")
@ -48,7 +48,7 @@ if (SANITIZE)
set(UBSAN_FLAGS "${UBSAN_FLAGS} -fno-sanitize=unsigned-integer-overflow")
endif()
if (COMPILER_CLANG)
set (UBSAN_FLAGS "${UBSAN_FLAGS} -fsanitize-blacklist=${PROJECT_SOURCE_DIR}/tests/ubsan_suppressions.txt")
set (UBSAN_FLAGS "${UBSAN_FLAGS} -fsanitize-ignorelist=${PROJECT_SOURCE_DIR}/tests/ubsan_ignorelist.txt")
endif()
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")

contrib/xxHash vendored

@ -1 +1 @@
Subproject commit 3078dc6039f8c0bffcb1904f81cfe6b2c3209435
Subproject commit bbb27a5efb85b92a0486cf361a8635715a53f6ba


@ -7,7 +7,7 @@ add_library(xxHash ${SRCS})
target_include_directories(xxHash SYSTEM BEFORE INTERFACE "${LIBRARY_DIR}")
# XXH_INLINE_ALL - Make all functions inline, with implementations being directly included within xxhash.h. Inlining functions is beneficial for speed on small keys.
# https://github.com/Cyan4973/xxHash/tree/v0.8.1#build-modifiers
# https://github.com/Cyan4973/xxHash/tree/v0.8.2#build-modifiers
target_compile_definitions(xxHash PUBLIC XXH_INLINE_ALL)
add_library(ch_contrib::xxHash ALIAS xxHash)


@ -0,0 +1,24 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.12.6.19-stable (40080a3c2a4) FIXME as compared to v23.12.5.81-stable (a0fbe3ae813)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Improve isolation of query cache entries under re-created users or role switches [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)).
* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)).
* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#61429](https://github.com/ClickHouse/ClickHouse/issues/61429):. [#61374](https://github.com/ClickHouse/ClickHouse/pull/61374) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#61486](https://github.com/ClickHouse/ClickHouse/issues/61486): ... [#61441](https://github.com/ClickHouse/ClickHouse/pull/61441) ([Max K.](https://github.com/maxknv)).
* Backported in [#61641](https://github.com/ClickHouse/ClickHouse/issues/61641):. [#61592](https://github.com/ClickHouse/ClickHouse/pull/61592) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#61811](https://github.com/ClickHouse/ClickHouse/issues/61811): ![Screenshot_20240323_025055](https://github.com/ClickHouse/ClickHouse/assets/18581488/ccaab212-a1d3-4dfb-8d56-b1991760b6bf). [#61801](https://github.com/ClickHouse/ClickHouse/pull/61801) ([Alexey Milovidov](https://github.com/alexey-milovidov)).


@ -0,0 +1,13 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.3.22.3-lts (04075bf96a1) FIXME as compared to v23.3.21.26-lts (d9672a3731f)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).


@ -0,0 +1,20 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.8.12.13-lts (bdbd0d87e5d) FIXME as compared to v23.8.11.28-lts (31879d2ab4c)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Improve isolation of query cache entries under re-created users or role switches [#58611](https://github.com/ClickHouse/ClickHouse/pull/58611) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#61428](https://github.com/ClickHouse/ClickHouse/issues/61428):. [#61374](https://github.com/ClickHouse/ClickHouse/pull/61374) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#61484](https://github.com/ClickHouse/ClickHouse/issues/61484): ... [#61441](https://github.com/ClickHouse/ClickHouse/pull/61441) ([Max K.](https://github.com/maxknv)).


@ -0,0 +1,32 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.1.8.22-stable (7fb8f96d3da) FIXME as compared to v24.1.7.18-stable (90925babd78)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix possible incorrect result of aggregate function `uniqExact` [#61257](https://github.com/ClickHouse/ClickHouse/pull/61257) ([Anton Popov](https://github.com/CurtizJ)).
* Fix consecutive keys optimization for nullable keys [#61393](https://github.com/ClickHouse/ClickHouse/pull/61393) ([Anton Popov](https://github.com/CurtizJ)).
* Fix bug when reading system.parts using UUID (issue 61220). [#61479](https://github.com/ClickHouse/ClickHouse/pull/61479) ([Dan Wu](https://github.com/wudanzy)).
* Fix client `-s` argument [#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix string search with const position [#61547](https://github.com/ClickHouse/ClickHouse/pull/61547) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix crash in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 [#61749](https://github.com/ClickHouse/ClickHouse/pull/61749) ([pufit](https://github.com/pufit)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#61431](https://github.com/ClickHouse/ClickHouse/issues/61431):. [#61374](https://github.com/ClickHouse/ClickHouse/pull/61374) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#61488](https://github.com/ClickHouse/ClickHouse/issues/61488): ... [#61441](https://github.com/ClickHouse/ClickHouse/pull/61441) ([Max K.](https://github.com/maxknv)).
* Backported in [#61642](https://github.com/ClickHouse/ClickHouse/issues/61642):. [#61592](https://github.com/ClickHouse/ClickHouse/pull/61592) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Backport [#61479](https://github.com/ClickHouse/ClickHouse/issues/61479) to 24.1: Fix bug when reading system.parts using UUID (issue 61220)."'. [#61775](https://github.com/ClickHouse/ClickHouse/pull/61775) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Speed up cctools building [#61011](https://github.com/ClickHouse/ClickHouse/pull/61011) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).


@ -55,9 +55,7 @@ To build using Homebrew's vanilla Clang compiler (the only **recommended** way):
cd ClickHouse
mkdir build
export PATH=$(brew --prefix llvm)/bin:$PATH
export CC=$(brew --prefix llvm)/bin/clang
export CXX=$(brew --prefix llvm)/bin/clang++
cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -S . -B build
cmake -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$(brew --prefix llvm)/bin/clang -DCMAKE_CXX_COMPILER=$(brew --prefix llvm)/bin/clang++ -S . -B build
cmake --build build
# The resulting binary will be created at: build/programs/clickhouse
```


@ -153,6 +153,26 @@ Upon successful build you get an executable file `ClickHouse/<build_dir>/program
ls -l programs/clickhouse
### Advanced Building Process {#advanced-building-process}
#### Minimal Build {#minimal-build}
If you are not interested in functionality provided by third-party libraries, you can further speed up the build using the following `cmake` option:
```
cmake -DENABLE_LIBRARIES=OFF
```
In case of problems with any of the development options, you are on your own!
#### Rust support {#rust-support}
Rust requires an internet connection. If you don't have one, you can disable Rust support:
```
cmake -DENABLE_RUST=OFF
```
## Running the Built Executable of ClickHouse {#running-the-built-executable-of-clickhouse}
To run the server under the current user, navigate to `ClickHouse/programs/server/` (located outside of `build`) and run:
@ -250,10 +270,3 @@ Most probably some of the builds will fail at first times. This is due to the fa
You can use GitHub integrated code browser [here](https://github.dev/ClickHouse/ClickHouse).
Also, you can browse sources on [GitHub](https://github.com/ClickHouse/ClickHouse) as usual.
If you are not interested in functionality provided by third-party libraries, you can further speed up the build using `cmake` options
```
-DENABLE_LIBRARIES=0
```
In case of problems with any of the development options, you are on your own!


@ -87,6 +87,7 @@ The BACKUP and RESTORE statements take a list of DATABASE and TABLE names, a des
- `structure_only`: if enabled, allows backing up or restoring only the CREATE statements, without the table data
- `storage_policy`: storage policy for the tables being restored. See [Using Multiple Block Devices for Data Storage](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes). This setting is only applicable to the `RESTORE` command. The specified storage policy applies only to tables with an engine from the `MergeTree` family.
- `s3_storage_class`: the storage class used for S3 backup. For example, `STANDARD`
- `azure_attempt_to_create_container`: when using Azure Blob Storage, whether to attempt to create the specified container if it doesn't exist. Default: true.
### Usage examples


@ -3,6 +3,7 @@ slug: /en/operations/monitoring
sidebar_position: 45
sidebar_label: Monitoring
description: You can monitor the utilization of hardware resources and also ClickHouse server metrics.
keywords: [monitoring, observability, advanced dashboard, dashboard, observability dashboard]
---
# Monitoring
@ -15,11 +16,11 @@ You can monitor:
- Utilization of hardware resources.
- ClickHouse server metrics.
## Built-in observability dashboard
## Built-in advanced observability dashboard
<img width="400" alt="Screenshot 2023-11-12 at 6 08 58 PM" src="https://github.com/ClickHouse/ClickHouse/assets/3936029/2bd10011-4a47-4b94-b836-d44557c7fdc1" />
ClickHouse comes with a built-in observability dashboard feature which can be accessed by `$HOST:$PORT/dashboard` (requires user and password) that shows the following metrics:
ClickHouse comes with a built-in advanced observability dashboard feature which can be accessed by `$HOST:$PORT/dashboard` (requires user and password) that shows the following metrics:
- Queries/second
- CPU usage (cores)
- Queries running


@ -867,3 +867,31 @@ Default value: `Never`
Persists virtual column `_block_number` on merges.
Default value: false.
## exclude_deleted_rows_for_part_size_in_merge {#exclude_deleted_rows_for_part_size_in_merge}
If enabled, the estimated actual size of data parts (i.e., excluding the rows that have been deleted through `DELETE FROM`) is used when selecting parts to merge. Note that this behavior is only triggered for data parts affected by `DELETE FROM` executed after this setting is enabled.
Possible values:
- true, false
Default value: false
**See Also**
- [load_existing_rows_count_for_old_parts](#load_existing_rows_count_for_old_parts) setting
## load_existing_rows_count_for_old_parts {#load_existing_rows_count_for_old_parts}
If enabled along with [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge), the number of deleted rows for existing data parts is calculated during table startup. Note that this may slow down table loading at startup.
Possible values:
- true, false
Default value: false
**See Also**
- [exclude_deleted_rows_for_part_size_in_merge](#exclude_deleted_rows_for_part_size_in_merge) setting
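To make the size estimate concrete, the sketch below assumes a simple proportional model: a part's on-disk size is scaled by the fraction of its rows that have not been deleted. `PartInfo` and `estimatedPartSize` are hypothetical names, not ClickHouse internals, and the exact formula ClickHouse applies is an assumption here.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical description of a data part (not a ClickHouse type).
struct PartInfo
{
    std::uint64_t bytes_on_disk = 0;
    std::uint64_t rows_count = 0;
    // Unknown (nullopt) when the deleted-row count was never computed,
    // e.g. for old parts when load_existing_rows_count_for_old_parts is off.
    std::optional<std::uint64_t> existing_rows_count;
};

// Size used when choosing parts to merge, under an assumed proportional model.
std::uint64_t estimatedPartSize(const PartInfo & part, bool exclude_deleted_rows)
{
    // Fall back to the raw size if the setting is off or the count is unknown.
    if (!exclude_deleted_rows || !part.existing_rows_count || part.rows_count == 0)
        return part.bytes_on_disk;
    if (*part.existing_rows_count >= part.rows_count)
        return part.bytes_on_disk;
    // Scale the on-disk size by the share of rows that still exist.
    return part.bytes_on_disk * *part.existing_rows_count / part.rows_count;
}
```

With this model, a 1000-byte part in which 75 of 100 rows were deleted would be treated as a 250-byte part when selecting merges.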


@ -3132,3 +3132,17 @@ Result:
│ (616.2931945826209,108.8825,115.6175) │
└───────────────────────────────────────┘
```
## getClientHTTPHeader
Get the value of an HTTP header.
If there is no such header or the current request is not performed via the HTTP interface, the function returns an empty string.
Certain HTTP headers (e.g., `Authentication` and `X-ClickHouse-*`) are restricted.
The function requires the setting `allow_get_client_http_header` to be enabled.
The setting is not enabled by default for security reasons, because some headers, such as `Cookie`, could contain sensitive information.
HTTP headers are case-sensitive for this function.
If the function is used in the context of a distributed query, it returns a non-empty result only on the initiator node.
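A rough C++ model of the rules listed above may help. `RequestContext`, `isRestrictedHeader`, and `getClientHTTPHeaderSketch` are hypothetical; in particular, throwing an exception for restricted headers (rather than, say, returning an empty string) and the order of the checks are assumptions, not documented behavior.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of a request context (not ClickHouse's API).
struct RequestContext
{
    bool is_http = false;
    std::map<std::string, std::string> http_headers; // keys compared case-sensitively
};

// Restricted headers named in the documentation: "Authentication" and "X-ClickHouse-*".
bool isRestrictedHeader(const std::string & name)
{
    return name == "Authentication" || name.rfind("X-ClickHouse-", 0) == 0;
}

std::string getClientHTTPHeaderSketch(const RequestContext & ctx, const std::string & name, bool allow_get_client_http_header)
{
    if (!allow_get_client_http_header)
        throw std::runtime_error("allow_get_client_http_header is disabled");
    if (isRestrictedHeader(name))
        throw std::runtime_error("header is restricted: " + name); // assumed behavior
    if (!ctx.is_http)
        return {}; // not an HTTP request: empty string
    auto it = ctx.http_headers.find(name); // exact, case-sensitive lookup
    return it == ctx.http_headers.end() ? std::string{} : it->second;
}
```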


@ -133,6 +133,8 @@ For the query to run successfully, the following conditions must be met:
- Both tables must have the same indices and projections.
- Both tables must have the same storage policy.
If both tables have the same storage policy, hard links are used to attach the partition. Otherwise, the data is copied.
## REPLACE PARTITION
``` sql


@ -180,10 +180,16 @@ SYSTEM STOP DISTRIBUTED SENDS [db.]<distributed_table_name> [ON CLUSTER cluster_
Forces ClickHouse to send data to cluster nodes synchronously. If any nodes are unavailable, ClickHouse throws an exception and stops query execution. You can retry the query until it succeeds, which will happen when all nodes are back online.
You can also override some settings via the `SETTINGS` clause. This can be useful to avoid temporary limitations, such as `max_concurrent_queries_for_all_users` or `max_memory_usage`.
``` sql
SYSTEM FLUSH DISTRIBUTED [db.]<distributed_table_name> [ON CLUSTER cluster_name]
SYSTEM FLUSH DISTRIBUTED [db.]<distributed_table_name> [ON CLUSTER cluster_name] [SETTINGS ...]
```
:::note
Each pending block is stored on disk with the settings from the initial INSERT query, which is why you may sometimes want to override them.
:::
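The override behavior can be pictured as a map merge in which values from the `SETTINGS` clause replace the ones stored with each pending block. This is a hypothetical sketch (`Settings` and `effectiveSettings` are illustrative names) that models settings as string pairs; it is not how ClickHouse actually stores them.

```cpp
#include <map>
#include <string>

// Hypothetical: settings modeled as name -> value strings.
using Settings = std::map<std::string, std::string>;

// Settings used to send one pending block: those saved with the original INSERT,
// with any value given in SYSTEM FLUSH DISTRIBUTED ... SETTINGS taking precedence.
Settings effectiveSettings(const Settings & saved_with_block, const Settings & flush_overrides)
{
    Settings result = saved_with_block;
    for (const auto & [name, value] : flush_overrides)
        result[name] = value; // the override wins
    return result;
}
```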
### START DISTRIBUTED SENDS
Enables background data distribution when inserting data into distributed tables.


@ -127,7 +127,7 @@ See the [deployment](docs/en/deployment-guides/terminology.md) documentation for
#### Verify that experimental transactions are enabled
Issue a `BEGIN TRANSACTION` followed by a `ROLLBACK` to verify that experimental transactions are enabled, and that ClickHouse Keeper is enabled as it is used to track transactions.
Issue a `BEGIN TRANSACTION` or `START TRANSACTION` followed by a `ROLLBACK` to verify that experimental transactions are enabled, and that ClickHouse Keeper is enabled as it is used to track transactions.
```sql
BEGIN TRANSACTION


@ -4,8 +4,8 @@ set (CLICKHOUSE_ODBC_BRIDGE_SOURCES
ColumnInfoHandler.cpp
IdentifierQuoteHandler.cpp
MainHandler.cpp
ODBCBlockInputStream.cpp
ODBCBlockOutputStream.cpp
ODBCSource.cpp
ODBCSink.cpp
ODBCBridge.cpp
ODBCHandlerFactory.cpp
PingHandler.cpp


@ -1,8 +1,8 @@
#include "MainHandler.h"
#include "validateODBCConnectionString.h"
#include "ODBCBlockInputStream.h"
#include "ODBCBlockOutputStream.h"
#include "ODBCSource.h"
#include "ODBCSink.h"
#include "getIdentifierQuote.h"
#include <DataTypes/DataTypeFactory.h>
#include <Formats/FormatFactory.h>


@ -1,4 +1,4 @@
#include "ODBCBlockOutputStream.h"
#include "ODBCSink.h"
#include <IO/WriteBufferFromString.h>
#include <Interpreters/Context.h>


@ -1,11 +1,10 @@
#include "ODBCBlockInputStream.h"
#include "ODBCSource.h"
#include <vector>
#include <IO/ReadBufferFromString.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeDateTime64.h>
#include <Common/assert_cast.h>
#include <IO/ReadHelpers.h>
#include <Common/logger_useful.h>
namespace DB


@ -0,0 +1,73 @@
#include <Analyzer/ColumnNode.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/Passes/ConvertInToEqualPass.h>
#include <Functions/equals.h>
#include <Functions/notEquals.h>
namespace DB
{
class ConvertInToEqualPassVisitor : public InDepthQueryTreeVisitorWithContext<ConvertInToEqualPassVisitor>
{
public:
using Base = InDepthQueryTreeVisitorWithContext<ConvertInToEqualPassVisitor>;
using Base::Base;
void enterImpl(QueryTreeNodePtr & node)
{
static const std::unordered_map<String, String> MAPPING = {
{"in", "equals"},
{"notIn", "notEquals"}
};
auto * func_node = node->as<FunctionNode>();
if (!func_node
|| !MAPPING.contains(func_node->getFunctionName())
|| func_node->getArguments().getNodes().size() != 2)
return ;
auto args = func_node->getArguments().getNodes();
auto * column_node = args[0]->as<ColumnNode>();
auto * constant_node = args[1]->as<ConstantNode>();
if (!column_node || !constant_node)
return ;
// IN with multiple values is not supported
if (constant_node->getValue().getType() == Field::Types::Which::Tuple
|| constant_node->getValue().getType() == Field::Types::Which::Array)
return ;
// `x IN NULL` is not equivalent to `x = NULL`
if (constant_node->getValue().isNull())
return ;
auto result_func_name = MAPPING.at(func_node->getFunctionName());
auto equal = std::make_shared<FunctionNode>(result_func_name);
QueryTreeNodes arguments{column_node->clone(), constant_node->clone()};
equal->getArguments().getNodes() = std::move(arguments);
FunctionOverloadResolverPtr resolver;
bool decimal_check_overflow = getContext()->getSettingsRef().decimal_check_overflow;
if (result_func_name == "equals")
{
resolver = createInternalFunctionEqualOverloadResolver(decimal_check_overflow);
}
else
{
resolver = createInternalFunctionNotEqualOverloadResolver(decimal_check_overflow);
}
try
{
equal->resolveAsFunction(resolver);
}
catch (...)
{
// When function resolver fails, we should not replace the function node
return;
}
node = equal;
}
};
void ConvertInToEqualPass::run(QueryTreeNodePtr & query_tree_node, ContextPtr context)
{
ConvertInToEqualPassVisitor visitor(std::move(context));
visitor.visit(query_tree_node);
}
}


@ -0,0 +1,27 @@
#pragma once
#include <Analyzer/IQueryTreePass.h>
namespace DB
{
/** Optimize `in` to `equals` if possible.
* 1. Convert IN with a single value to equals
* Example: SELECT * from test where x IN (1);
* Result: SELECT * from test where x = 1;
*
* 2. Convert NOT IN with a single value to notEquals
* Example: SELECT * from test where x NOT IN (1);
* Result: SELECT * from test where x != 1;
*
* If the value is NULL, a tuple, or an array, do not convert.
*/
class ConvertInToEqualPass final : public IQueryTreePass
{
public:
String getName() override { return "ConvertInToEqualPass"; }
String getDescription() override { return "Convert in to equal"; }
void run(QueryTreeNodePtr & query_tree_node, ContextPtr context) override;
};
}
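The rewrite rules of `ConvertInToEqualPass` can be illustrated with a toy expression model. Everything here (`Null`, `Value`, `Call`, `convertInToEqual`) is hypothetical; the sketch only mirrors the eligibility checks and omits the real pass's function overload resolution, whose failure also causes the rewrite to be skipped.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Toy expression model (hypothetical, not ClickHouse's query tree).
struct Null {};
// A vector stands in for a tuple/array constant with multiple values.
using Value = std::variant<Null, long long, std::vector<long long>>;

struct Call
{
    std::string function; // "in", "notIn", "equals", "notEquals", ...
    std::string column;
    Value constant;
};

// Rewrites `column IN/NOT IN single non-NULL constant`; anything else is left as is.
std::optional<Call> convertInToEqual(const Call & call)
{
    std::string target;
    if (call.function == "in")
        target = "equals";
    else if (call.function == "notIn")
        target = "notEquals";
    else
        return std::nullopt;

    // NULL: `x IN NULL` is not `x = NULL`; multiple values: not supported.
    if (!std::holds_alternative<long long>(call.constant))
        return std::nullopt;

    return Call{target, call.column, call.constant};
}
```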


@ -784,7 +784,12 @@ struct IdentifierResolveScope
/// and we should not convert `number` to nullable.
size_t found_nullable_group_by_key_in_scope = 0;
QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> nullable_join_columns;
/** It's possible that after a JOIN, a column in the projection has a type different from the column in the source table.
* (For example, after join_use_nulls, or when a USING column is cast to a supertype)
* However, the column in the projection still refers to the table as its source.
* This map is used to revert these columns back to their original columns in the source table.
*/
QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> join_columns_with_changed_types;
/// Use identifier lookup to result cache
bool use_identifier_lookup_to_result_cache = true;
@ -1315,7 +1320,8 @@ private:
if (!resolved_expression->getResultType()->equals(*new_result_type))
resolved_expression = buildCastFunction(resolved_expression, new_result_type, scope.context, true);
}
scope.nullable_join_columns[nullable_resolved_identifier] = resolved_identifier;
if (!nullable_resolved_identifier->isEqual(*resolved_identifier))
scope.join_columns_with_changed_types[nullable_resolved_identifier] = resolved_identifier;
return nullable_resolved_identifier;
}
return nullptr;
@ -1408,6 +1414,8 @@ private:
const NamesAndTypes & matched_columns,
const IdentifierResolveScope & scope);
void updateMatchedColumnsFromJoinUsing(QueryTreeNodesWithNames & result_matched_column_nodes_with_names, const QueryTreeNodePtr & source_table_expression, IdentifierResolveScope & scope);
QueryTreeNodesWithNames resolveQualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope);
QueryTreeNodesWithNames resolveUnqualifiedMatcher(QueryTreeNodePtr & matcher_node, IdentifierResolveScope & scope);
@ -2175,10 +2183,13 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
!nearest_query_scope)
{
auto constant_value = std::make_shared<ConstantValue>(std::move(scalar_value), scalar_type);
auto constant_node = std::make_shared<ConstantNode>(std::move(constant_value), node);
auto constant_node = std::make_shared<ConstantNode>(constant_value, node);
if (constant_node->getValue().isNull())
{
node = buildCastFunction(constant_node, constant_node->getResultType(), context);
node = std::make_shared<ConstantNode>(std::move(constant_value), node);
}
else
node = std::move(constant_node);
@ -3316,6 +3327,78 @@ QueryTreeNodePtr checkIsMissedObjectJSONSubcolumn(const QueryTreeNodePtr & left_
return {};
}
/// Used to replace columns that changed type because of JOIN to their original type
class ReplaceColumnsVisitor : public InDepthQueryTreeVisitor<ReplaceColumnsVisitor>
{
public:
explicit ReplaceColumnsVisitor(const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_, const ContextPtr & context_)
: replacement_map(replacement_map_)
, context(context_)
{}
/// Apply replacement transitively, because a column may change its type twice: once to get a supertype and then because of `join_use_nulls`
static QueryTreeNodePtr findTransitiveReplacement(QueryTreeNodePtr node, const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_)
{
auto it = replacement_map_.find(node);
QueryTreeNodePtr result_node = nullptr;
for (; it != replacement_map_.end(); it = replacement_map_.find(result_node))
{
if (result_node && result_node->isEqual(*it->second))
{
Strings map_dump;
for (const auto & [k, v]: replacement_map_)
map_dump.push_back(fmt::format("{} -> {} (is_equals: {}, is_same: {})",
k.node->dumpTree(), v->dumpTree(), k.node->isEqual(*v), k.node == v));
throw Exception(ErrorCodes::LOGICAL_ERROR, "Infinite loop in query tree replacement map: {}", fmt::join(map_dump, "; "));
}
chassert(it->second);
result_node = it->second;
}
return result_node;
}
void visitImpl(QueryTreeNodePtr & node)
{
if (auto replacement_node = findTransitiveReplacement(node, replacement_map))
node = replacement_node;
if (auto * function_node = node->as<FunctionNode>(); function_node && function_node->isResolved())
rerunFunctionResolve(function_node, context);
}
/// We want to re-run resolve for function _after_ its arguments are replaced
bool shouldTraverseTopToBottom() const { return false; }
bool needChildVisit(QueryTreeNodePtr & /* parent */, QueryTreeNodePtr & child)
{
/// Visit only expressions, but not subqueries
return child->getNodeType() == QueryTreeNodeType::IDENTIFIER
|| child->getNodeType() == QueryTreeNodeType::LIST
|| child->getNodeType() == QueryTreeNodeType::FUNCTION
|| child->getNodeType() == QueryTreeNodeType::COLUMN;
}
private:
const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map;
const ContextPtr & context;
};
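The transitive lookup in `findTransitiveReplacement` is easier to follow with a minimal sketch. It uses string keys as a stand-in for query tree nodes (`ReplacementMap` is hypothetical, not the real `QueryTreeNodePtrWithHashMap`). It also tracks every visited node so that any cycle is reported. The code above only checks whether the next replacement equals the current result.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for QueryTreeNodePtrWithHashMap<QueryTreeNodePtr>:
// maps a (changed) column to the column it originally came from.
using ReplacementMap = std::unordered_map<std::string, std::string>;

// Follow replacement links until no further replacement exists.
// Returns the last node of the chain, or "" if `node` has no replacement.
std::string findTransitiveReplacement(const std::string & node, const ReplacementMap & map)
{
    std::string result;
    std::unordered_set<std::string> visited{node};
    for (auto it = map.find(node); it != map.end(); it = map.find(result))
    {
        // Seeing a node twice means the map contains a cycle.
        if (!visited.insert(it->second).second)
            throw std::logic_error("Infinite loop in replacement map");
        result = it->second;
    }
    return result;
}
```

For example, a column that first became a supertype and then `Nullable` resolves back through both steps: `nullable(a) -> supertype(a) -> a`.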
/// Compare resolved identifiers, taking into account columns whose type changed because of JOIN (supertype or Nullable)
bool resolvedIdenfiersFromJoinAreEquals(
const QueryTreeNodePtr & left_resolved_identifier,
const QueryTreeNodePtr & right_resolved_identifier,
const IdentifierResolveScope & scope)
{
auto left_original_node = ReplaceColumnsVisitor::findTransitiveReplacement(left_resolved_identifier, scope.join_columns_with_changed_types);
const auto & left_resolved_to_compare = left_original_node ? left_original_node : left_resolved_identifier;
auto right_original_node = ReplaceColumnsVisitor::findTransitiveReplacement(right_resolved_identifier, scope.join_columns_with_changed_types);
const auto & right_resolved_to_compare = right_original_node ? right_original_node : right_resolved_identifier;
return left_resolved_to_compare->isEqual(*right_resolved_to_compare, IQueryTreeNode::CompareOptions{.compare_aliases = false});
}
QueryTreeNodePtr QueryAnalyzer::tryResolveIdentifierFromJoin(const IdentifierLookup & identifier_lookup,
const QueryTreeNodePtr & table_expression_node,
IdentifierResolveScope & scope)
@ -3450,9 +3533,13 @@ QueryTreeNodePtr QueryAnalyzer::tryResolveIdentifierFromJoin(const IdentifierLoo
auto & result_column = result_column_node->as<ColumnNode &>();
result_column.setColumnType(using_column_node.getColumnType());
const auto & join_using_left_column = using_expression_list.getNodes().at(0);
if (!result_column_node->isEqual(*join_using_left_column))
scope.join_columns_with_changed_types[result_column_node] = join_using_left_column;
resolved_identifier = std::move(result_column_node);
}
else if (left_resolved_identifier->isEqual(*right_resolved_identifier, IQueryTreeNode::CompareOptions{.compare_aliases = false}))
else if (resolvedIdenfiersFromJoinAreEquals(left_resolved_identifier, right_resolved_identifier, scope))
{
const auto & identifier_path_part = identifier_lookup.identifier.front();
auto * left_resolved_identifier_column = left_resolved_identifier->as<ColumnNode>();
@ -3528,6 +3615,9 @@ QueryTreeNodePtr QueryAnalyzer::tryResolveIdentifierFromJoin(const IdentifierLoo
auto left_resolved_column_clone = std::static_pointer_cast<ColumnNode>(left_resolved_column.clone());
left_resolved_column_clone->setColumnType(using_column_node_it->second->getColumnType());
resolved_identifier = std::move(left_resolved_column_clone);
if (!resolved_identifier->isEqual(*using_column_node_it->second))
scope.join_columns_with_changed_types[resolved_identifier] = using_column_node_it->second;
}
}
}
@ -3550,6 +3640,8 @@ QueryTreeNodePtr QueryAnalyzer::tryResolveIdentifierFromJoin(const IdentifierLoo
auto right_resolved_column_clone = std::static_pointer_cast<ColumnNode>(right_resolved_column.clone());
right_resolved_column_clone->setColumnType(using_column_node_it->second->getColumnType());
resolved_identifier = std::move(right_resolved_column_clone);
if (!resolved_identifier->isEqual(*using_column_node_it->second))
scope.join_columns_with_changed_types[resolved_identifier] = using_column_node_it->second;
}
}
}
@ -3559,9 +3651,17 @@ QueryTreeNodePtr QueryAnalyzer::tryResolveIdentifierFromJoin(const IdentifierLoo
if (scope.join_use_nulls)
{
auto projection_name_it = node_to_projection_name.find(resolved_identifier);
auto nullable_resolved_identifier = convertJoinedColumnTypeToNullIfNeeded(resolved_identifier, join_kind, resolved_side, scope);
if (nullable_resolved_identifier)
{
resolved_identifier = nullable_resolved_identifier;
/// Set the same projection name for new nullable node
if (projection_name_it != node_to_projection_name.end())
{
node_to_projection_name.emplace(resolved_identifier, projection_name_it->second);
}
}
}
return resolved_identifier;
@ -4220,6 +4320,95 @@ QueryAnalyzer::QueryTreeNodesWithNames QueryAnalyzer::getMatchedColumnNodesWithN
return matched_column_nodes_with_names;
}
bool hasTableExpressionInJoinTree(const QueryTreeNodePtr & join_tree_node, const QueryTreeNodePtr & table_expression)
{
QueryTreeNodes nodes_to_process;
nodes_to_process.push_back(join_tree_node);
while (!nodes_to_process.empty())
{
auto node_to_process = std::move(nodes_to_process.back());
nodes_to_process.pop_back();
if (node_to_process == table_expression)
return true;
if (node_to_process->getNodeType() == QueryTreeNodeType::JOIN)
{
const auto & join_node = node_to_process->as<JoinNode &>();
nodes_to_process.push_back(join_node.getLeftTableExpression());
nodes_to_process.push_back(join_node.getRightTableExpression());
}
}
return false;
}
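`hasTableExpressionInJoinTree` is an iterative depth-first search. It compares nodes by identity and descends only through JOIN nodes. The sketch below follows that approach, with a hypothetical `Node` type standing in for `JoinNode` and table expressions.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical minimal join tree: a node is either a table (no children)
// or a JOIN with both a left and a right child.
struct Node
{
    std::shared_ptr<Node> left;
    std::shared_ptr<Node> right;
    bool isJoin() const { return left && right; }
};
using NodePtr = std::shared_ptr<Node>;

// Iterative DFS using an explicit stack. Nodes are compared by pointer identity,
// and only JOIN nodes are expanded, as in hasTableExpressionInJoinTree.
bool hasTableExpressionInJoinTree(const NodePtr & root, const NodePtr & table)
{
    std::vector<NodePtr> stack{root};
    while (!stack.empty())
    {
        NodePtr node = std::move(stack.back());
        stack.pop_back();
        if (node == table)
            return true;
        if (node->isJoin())
        {
            stack.push_back(node->left);
            stack.push_back(node->right);
        }
    }
    return false;
}
```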
/// Columns resolved from a matcher can also match columns from JOIN USING.
/// In that case we update their type to the type of the column in the USING section.
/// TODO: This is not completely correct for qualified matchers: t1.* should be resolved to the left table column type,
/// but the planner does not distinguish such cases.
void QueryAnalyzer::updateMatchedColumnsFromJoinUsing(
QueryTreeNodesWithNames & result_matched_column_nodes_with_names,
const QueryTreeNodePtr & source_table_expression,
IdentifierResolveScope & scope)
{
auto * nearest_query_scope = scope.getNearestQueryScope();
auto * nearest_query_scope_query_node = nearest_query_scope ? nearest_query_scope->scope_node->as<QueryNode>() : nullptr;
/// If there is no parent query scope or the query scope does not have a join tree
if (!nearest_query_scope_query_node || !nearest_query_scope_query_node->getJoinTree())
{
throw Exception(ErrorCodes::UNSUPPORTED_METHOD,
"There are no table sources. In scope {}",
scope.scope_node->formatASTForErrorMessage());
}
const auto & join_tree = nearest_query_scope_query_node->getJoinTree();
const auto * join_node = join_tree->as<JoinNode>();
if (join_node && join_node->isUsingJoinExpression())
{
const auto & join_using_list = join_node->getJoinExpression()->as<ListNode &>();
const auto & join_using_nodes = join_using_list.getNodes();
for (auto & [matched_column_node, _] : result_matched_column_nodes_with_names)
{
auto & matched_column_node_typed = matched_column_node->as<ColumnNode &>();
const auto & matched_column_name = matched_column_node_typed.getColumnName();
for (const auto & join_using_node : join_using_nodes)
{
auto & join_using_column_node = join_using_node->as<ColumnNode &>();
const auto & join_using_column_name = join_using_column_node.getColumnName();
if (matched_column_name != join_using_column_name)
continue;
const auto & join_using_column_nodes_list = join_using_column_node.getExpressionOrThrow()->as<ListNode &>();
const auto & join_using_column_nodes = join_using_column_nodes_list.getNodes();
auto it = node_to_projection_name.find(matched_column_node);
if (hasTableExpressionInJoinTree(join_node->getLeftTableExpression(), source_table_expression))
matched_column_node = join_using_column_nodes.at(0);
else if (hasTableExpressionInJoinTree(join_node->getRightTableExpression(), source_table_expression))
matched_column_node = join_using_column_nodes.at(1);
else
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Cannot find column {} in JOIN USING section {}",
matched_column_node->dumpTree(), join_node->dumpTree());
matched_column_node = matched_column_node->clone();
if (it != node_to_projection_name.end())
node_to_projection_name.emplace(matched_column_node, it->second);
matched_column_node->as<ColumnNode &>().setColumnType(join_using_column_node.getResultType());
if (!matched_column_node->isEqual(*join_using_column_nodes.at(0)))
scope.join_columns_with_changed_types[matched_column_node] = join_using_column_nodes.at(0);
}
}
}
}
/** Resolve qualified tree matcher.
*
* First try to match qualified identifier to expression. If qualified identifier matched expression node then
@ -4337,6 +4526,8 @@ QueryAnalyzer::QueryTreeNodesWithNames QueryAnalyzer::resolveQualifiedMatcher(Qu
matched_columns,
scope);
updateMatchedColumnsFromJoinUsing(result_matched_column_nodes_with_names, table_expression_node, scope);
return result_matched_column_nodes_with_names;
}
@ -4472,6 +4663,8 @@ QueryAnalyzer::QueryTreeNodesWithNames QueryAnalyzer::resolveUnqualifiedMatcher(
matched_column_node = matched_column_node->clone();
matched_column_node->as<ColumnNode &>().setColumnType(join_using_column_node.getResultType());
if (!matched_column_node->isEqual(*join_using_column_nodes.at(0)))
scope.join_columns_with_changed_types[matched_column_node] = join_using_column_nodes.at(0);
table_expression_column_names_to_skip.insert(join_using_column_name);
matched_expression_nodes_with_column_names.emplace_back(std::move(matched_column_node), join_using_column_name);
@ -4591,7 +4784,9 @@ ProjectionNames QueryAnalyzer::resolveMatcher(QueryTreeNodePtr & matcher_node, I
node = nullable_node;
/// Set the same projection name for new nullable node
if (projection_name_it != node_to_projection_name.end())
{
node_to_projection_name.emplace(node, projection_name_it->second);
}
}
}
}
@ -7622,29 +7817,6 @@ void QueryAnalyzer::resolveQueryJoinTreeNode(QueryTreeNodePtr & join_tree_node,
scope.table_expressions_in_resolve_process.erase(join_tree_node.get());
}
class ReplaceColumnsVisitor : public InDepthQueryTreeVisitor<ReplaceColumnsVisitor>
{
public:
explicit ReplaceColumnsVisitor(const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map_, const ContextPtr & context_)
: replacement_map(replacement_map_)
, context(context_)
{}
void visitImpl(QueryTreeNodePtr & node)
{
if (auto it = replacement_map.find(node); it != replacement_map.end())
node = it->second;
if (auto * function_node = node->as<FunctionNode>())
rerunFunctionResolve(function_node, context);
}
bool shouldTraverseTopToBottom() const { return false; }
private:
const QueryTreeNodePtrWithHashMap<QueryTreeNodePtr> & replacement_map;
const ContextPtr & context;
};
/** Resolve query.
* This function modifies query node during resolve. It is the caller's responsibility to clone query node before resolve
* if it is needed for later use.
@ -7836,19 +8008,17 @@ void QueryAnalyzer::resolveQuery(const QueryTreeNodePtr & query_node, Identifier
{
resolveExpressionNode(prewhere_node, scope, false /*allow_lambda_expression*/, false /*allow_table_expression*/);
if (scope.join_use_nulls)
{
/** Expression in PREWHERE with JOIN should not be modified by join_use_nulls.
* Example: SELECT * FROM t1 JOIN t2 USING (id) PREWHERE b = 1
* Column `a` should be resolved from table and should not change its type to Nullable.
* More complicated example when column is somewhere inside an expression:
* SELECT a + 1 as b FROM t1 JOIN t2 USING (id) PREWHERE b = 1
* expression `a + 1 as b` in projection and in PREWHERE should have different `a`.
*/
prewhere_node = prewhere_node->clone();
ReplaceColumnsVisitor replace_visitor(scope.nullable_join_columns, scope.context);
replace_visitor.visit(prewhere_node);
}
/** Expressions in PREWHERE with JOIN should not change their type.
* Example: SELECT * FROM t1 JOIN t2 USING (a) PREWHERE a = 1
* Column `a` in PREWHERE should be resolved from the left table
* and should not change its type to Nullable or to the supertype of `a` from t1 and t2.
* Here's a more complicated example where the column is somewhere inside an expression:
* SELECT a + 1 as b FROM t1 JOIN t2 USING (id) PREWHERE b = 1
* The expression `a + 1 as b` in the projection and in PREWHERE should have different `a`.
*/
prewhere_node = prewhere_node->clone();
ReplaceColumnsVisitor replace_visitor(scope.join_columns_with_changed_types, scope.context);
replace_visitor.visit(prewhere_node);
}
if (query_node_typed.getWhere())

@ -148,6 +148,13 @@ void RemoveUnusedProjectionColumnsPass::run(QueryTreeNodePtr & query_tree_node,
for (auto & [query_or_union_node, used_columns] : visitor.query_or_union_node_to_used_columns)
{
/// Can't remove columns from a DISTINCT query, see example: 03023_remove_unused_column_distinct.sql
if (auto * query_node = query_or_union_node->as<QueryNode>())
{
if (query_node->isDistinct())
continue;
}
auto used_projection_indexes = convertUsedColumnNamesToUsedProjectionIndexes(query_or_union_node, used_columns);
updateUsedProjectionIndexes(query_or_union_node, used_projection_indexes);
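A small sketch shows why DISTINCT queries are skipped. Pruning an unused projection column changes how many rows DISTINCT keeps. The table `t`, its columns `a` and `b`, and both helper functions are hypothetical. The test referenced in the comment covers the actual case.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Row = std::pair<int, int>;  // (a, b)

// Rows kept by a hypothetical `SELECT DISTINCT a, b FROM t`.
std::size_t countDistinctAB(const std::vector<Row> & rows)
{
    return std::set<Row>(rows.begin(), rows.end()).size();
}

// Rows kept if the "unused" column b were pruned: `SELECT DISTINCT a FROM t`.
std::size_t countDistinctA(const std::vector<Row> & rows)
{
    std::set<int> keys;
    for (const auto & row : rows)
        keys.insert(row.first);
    return keys.size();
}
```

With rows `(1, 1), (1, 2), (2, 1)`, an outer `count()` over the DISTINCT subquery must see 3 rows, but pruning `b` would leave only 2.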

@ -28,6 +28,7 @@
#include <Analyzer/Passes/MultiIfToIfPass.h>
#include <Analyzer/Passes/IfConstantConditionPass.h>
#include <Analyzer/Passes/IfChainToMultiIfPass.h>
#include <Analyzer/Passes/ConvertInToEqualPass.h>
#include <Analyzer/Passes/OrderByTupleEliminationPass.h>
#include <Analyzer/Passes/NormalizeCountVariantsPass.h>
#include <Analyzer/Passes/AggregateFunctionsArithmericOperationsPass.h>
@ -263,6 +264,7 @@ void addQueryTreePasses(QueryTreePassManager & manager, bool only_analyze)
manager.addPass(std::make_unique<SumIfToCountIfPass>());
manager.addPass(std::make_unique<RewriteArrayExistsToHasPass>());
manager.addPass(std::make_unique<NormalizeCountVariantsPass>());
manager.addPass(std::make_unique<ConvertInToEqualPass>());
/// Should be before AggregateFunctionsArithmericOperationsPass
manager.addPass(std::make_unique<AggregateFunctionOfGroupByKeysPass>());

@ -40,6 +40,7 @@ public:
bool deduplicate_files = true;
bool allow_s3_native_copy = true;
bool use_same_s3_credentials_for_base_backup = false;
bool azure_attempt_to_create_container = true;
ReadSettings read_settings;
WriteSettings write_settings;
};

@ -140,12 +140,13 @@ BackupWriterAzureBlobStorage::BackupWriterAzureBlobStorage(
StorageAzureBlob::Configuration configuration_,
const ReadSettings & read_settings_,
const WriteSettings & write_settings_,
const ContextPtr & context_)
const ContextPtr & context_,
bool attempt_to_create_container)
: BackupWriterDefault(read_settings_, write_settings_, getLogger("BackupWriterAzureBlobStorage"))
, data_source_description{DataSourceType::ObjectStorage, ObjectStorageType::Azure, MetadataStorageType::None, configuration_.container, false, false}
, configuration(configuration_)
{
auto client_ptr = StorageAzureBlob::createClient(configuration, /* is_read_only */ false);
auto client_ptr = StorageAzureBlob::createClient(configuration, /* is_read_only */ false, attempt_to_create_container);
object_storage = std::make_unique<AzureObjectStorage>("BackupWriterAzureBlobStorage",
std::move(client_ptr),
StorageAzureBlob::createSettings(context_),
@ -275,10 +276,9 @@ std::unique_ptr<WriteBuffer> BackupWriterAzureBlobStorage::writeFile(const Strin
return std::make_unique<WriteBufferFromAzureBlobStorage>(
client,
key,
settings->max_single_part_upload_size,
settings->max_unexpected_write_error_retries,
DBMS_DEFAULT_BUFFER_SIZE,
write_settings);
write_settings,
settings);
}
void BackupWriterAzureBlobStorage::removeFile(const String & file_name)

@ -37,7 +37,7 @@ private:
class BackupWriterAzureBlobStorage : public BackupWriterDefault
{
public:
BackupWriterAzureBlobStorage(StorageAzureBlob::Configuration configuration_, const ReadSettings & read_settings_, const WriteSettings & write_settings_, const ContextPtr & context_);
BackupWriterAzureBlobStorage(StorageAzureBlob::Configuration configuration_, const ReadSettings & read_settings_, const WriteSettings & write_settings_, const ContextPtr & context_, bool attempt_to_create_container);
~BackupWriterAzureBlobStorage() override;
bool fileExists(const String & file_name) override;

@ -28,10 +28,13 @@ namespace ErrorCodes
M(Bool, deduplicate_files) \
M(Bool, allow_s3_native_copy) \
M(Bool, use_same_s3_credentials_for_base_backup) \
M(Bool, azure_attempt_to_create_container) \
M(Bool, read_from_filesystem_cache) \
M(UInt64, shard_num) \
M(UInt64, replica_num) \
M(Bool, check_parts) \
M(Bool, check_projection_parts) \
M(Bool, allow_backup_broken_projections) \
M(Bool, internal) \
M(String, host_id) \
M(OptionalUUID, backup_uuid)

@ -47,6 +47,9 @@ struct BackupSettings
/// Whether base backup to S3 should inherit credentials from the BACKUP query.
bool use_same_s3_credentials_for_base_backup = false;
/// Whether a new Azure container should be created if it does not exist (requires permissions at storage account level)
bool azure_attempt_to_create_container = true;
/// Allow using the filesystem cache in passive mode: benefit from the existing cache entries,
/// but don't put more entries into the cache.
bool read_from_filesystem_cache = true;
@ -62,6 +65,12 @@ struct BackupSettings
/// Check checksums of the data parts before writing them to a backup.
bool check_parts = true;
/// Check checksums of the projection data parts before writing them to a backup.
bool check_projection_parts = true;
/// Allow creating a backup with broken projections.
bool allow_backup_broken_projections = false;
/// Internal, should not be specified by user.
/// Whether this backup is a part of a distributed backup created by BACKUP ON CLUSTER.
bool internal = false;

@ -597,6 +597,7 @@ void BackupsWorker::doBackup(
backup_create_params.deduplicate_files = backup_settings.deduplicate_files;
backup_create_params.allow_s3_native_copy = backup_settings.allow_s3_native_copy;
backup_create_params.use_same_s3_credentials_for_base_backup = backup_settings.use_same_s3_credentials_for_base_backup;
backup_create_params.azure_attempt_to_create_container = backup_settings.azure_attempt_to_create_container;
backup_create_params.read_settings = getReadSettingsForBackup(context, backup_settings);
backup_create_params.write_settings = getWriteSettingsForBackup(context);
backup = BackupFactory::instance().createBackup(backup_create_params);

@ -86,7 +86,7 @@ void registerBackupEngineAzureBlobStorage(BackupFactory & factory)
if (args.size() == 3)
{
configuration.connection_url = args[0].safeGet<String>();
configuration.is_connection_string = true;
configuration.is_connection_string = !configuration.connection_url.starts_with("http");
configuration.container = args[1].safeGet<String>();
configuration.blob_path = args[2].safeGet<String>();
@ -147,7 +147,8 @@ void registerBackupEngineAzureBlobStorage(BackupFactory & factory)
auto writer = std::make_shared<BackupWriterAzureBlobStorage>(configuration,
params.read_settings,
params.write_settings,
params.context);
params.context,
params.azure_attempt_to_create_container);
return std::make_unique<BackupImpl>(
params.backup_info,
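In the updated three-argument branch, the first argument counts as a connection string unless it starts with `http`. Below is a minimal sketch of that check. The function name is hypothetical, and `rfind(prefix, 0) == 0` replaces `starts_with` so the sketch also builds as C++17.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring
//   configuration.is_connection_string = !configuration.connection_url.starts_with("http");
// Anything that does not start with "http" (this also covers "https")
// is treated as a connection string rather than a storage URL.
bool isConnectionString(const std::string & connection_url)
{
    return connection_url.rfind("http", 0) != 0;
}
```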

@ -132,14 +132,6 @@ public:
void setThrottler(const ThrottlerPtr &) override {}
private:
void initBlockInput();
void processOrdinaryQuery();
void processOrdinaryQueryWithProcessors();
void updateState();
bool pullBlock(Block & block);
void finishQuery();

@ -2,7 +2,6 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/Combinators/AggregateFunctionCombinatorFactory.h>
#include <Core/Settings.h>
#include <Columns/ColumnString.h>
#include <Common/typeid_cast.h>
#include <Common/Macros.h>

@ -2,6 +2,7 @@
#include <limits>
#include <optional>
#include <magic_enum.hpp>
#include <fmt/format.h>
#include <base/defines.h>
#include <base/scope_guard.h>
@ -201,9 +202,10 @@ void LoadTask::remove()
}
}
AsyncLoader::AsyncLoader(std::vector<PoolInitializer> pool_initializers, bool log_failures_, bool log_progress_)
AsyncLoader::AsyncLoader(std::vector<PoolInitializer> pool_initializers, bool log_failures_, bool log_progress_, bool log_events_)
: log_failures(log_failures_)
, log_progress(log_progress_)
, log_events(log_events_)
, log(getLogger("AsyncLoader"))
{
pools.reserve(pool_initializers.size());
@ -332,6 +334,8 @@ void AsyncLoader::schedule(const LoadJobSet & jobs_to_schedule)
ALLOW_ALLOCATIONS_IN_SCOPE;
scheduled_jobs.try_emplace(job);
job->scheduled(++last_job_id);
if (log_events)
LOG_DEBUG(log, "Schedule load job '{}' into {}", job->name, getPoolName(job->pool()));
});
}
@ -587,6 +591,14 @@ void AsyncLoader::finish(const LoadJobPtr & job, LoadStatus status, std::excepti
else if (status == LoadStatus::CANCELED)
job->canceled(reason);
if (log_events)
{
NOEXCEPT_SCOPE({
ALLOW_ALLOCATIONS_IN_SCOPE;
LOG_DEBUG(log, "Finish load job '{}' with status {}", job->name, magic_enum::enum_name(status));
});
}
Info & info = scheduled_jobs[job];
if (info.isReady())
{
@ -666,6 +678,14 @@ void AsyncLoader::prioritize(const LoadJobPtr & job, size_t new_pool_id, std::un
job->pool_id.store(new_pool_id);
if (log_events)
{
NOEXCEPT_SCOPE({
ALLOW_ALLOCATIONS_IN_SCOPE;
LOG_DEBUG(log, "Prioritize load job '{}': {} -> {}", job->name, old_pool.name, new_pool.name);
});
}
// Recurse into dependencies
for (const auto & dep : job->dependencies)
prioritize(dep, new_pool_id, lock);
@ -770,6 +790,9 @@ void AsyncLoader::wait(std::unique_lock<std::mutex> & job_lock, const LoadJobPtr
if (job->load_status != LoadStatus::PENDING) // Shortcut just to avoid incrementing ProfileEvents
return;
if (log_events)
LOG_DEBUG(log, "Wait load job '{}' in {}", job->name, getPoolName(job->pool_id));
if (job->on_waiters_increment)
job->on_waiters_increment(job);
@ -808,6 +831,20 @@ bool AsyncLoader::canWorkerLive(Pool & pool, std::unique_lock<std::mutex> &)
&& (!current_priority || *current_priority >= pool.priority);
}
void AsyncLoader::setCurrentPriority(std::unique_lock<std::mutex> &, std::optional<Priority> priority)
{
if (log_events && current_priority != priority)
{
NOEXCEPT_SCOPE({
ALLOW_ALLOCATIONS_IN_SCOPE;
LOG_DEBUG(log, "Change current priority: {} -> {}",
current_priority ? std::to_string(*current_priority) : "none",
priority ? std::to_string(*priority) : "none");
});
}
current_priority = priority;
}
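`setCurrentPriority` logs only when the priority actually changes, and it prints an unset priority as `none`. The sketch below reproduces that formatting. It assumes `Priority` can be represented by an integer (the real type may differ) and uses a hypothetical helper that returns the message instead of logging it.

```cpp
#include <cassert>
#include <optional>
#include <string>

using Priority = long;  // assumption: stand-in for the real Priority type

// Returns the message that would be logged, or an empty string when the
// priority does not change (setCurrentPriority logs nothing in that case).
std::string describePriorityChange(std::optional<Priority> current, std::optional<Priority> next)
{
    if (current == next)
        return {};
    auto show = [](std::optional<Priority> p) { return p ? std::to_string(*p) : std::string("none"); };
    return "Change current priority: " + show(current) + " -> " + show(next);
}
```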
void AsyncLoader::updateCurrentPriorityAndSpawn(std::unique_lock<std::mutex> & lock)
{
// Find current priority.
@ -818,7 +855,7 @@ void AsyncLoader::updateCurrentPriorityAndSpawn(std::unique_lock<std::mutex> & l
if (pool.isActive() && (!priority || *priority > pool.priority))
priority = pool.priority;
}
current_priority = priority;
setCurrentPriority(lock, priority);
// Spawn workers in all pools with current priority
for (Pool & pool : pools)
@ -828,12 +865,14 @@ void AsyncLoader::updateCurrentPriorityAndSpawn(std::unique_lock<std::mutex> & l
}
}
void AsyncLoader::spawn(Pool & pool, std::unique_lock<std::mutex> &)
void AsyncLoader::spawn(Pool & pool, std::unique_lock<std::mutex> & lock)
{
setCurrentPriority(lock, pool.priority); // canSpawnWorker() ensures this would not decrease current_priority
pool.workers++;
current_priority = pool.priority; // canSpawnWorker() ensures this would not decrease current_priority
NOEXCEPT_SCOPE({
ALLOW_ALLOCATIONS_IN_SCOPE;
if (log_events)
LOG_DEBUG(log, "Spawn loader worker #{} in {}", pool.workers, pool.name);
pool.thread_pool->scheduleOrThrowOnError([this, &pool] { worker(pool); });
});
}
@ -861,6 +900,13 @@ void AsyncLoader::worker(Pool & pool)
if (!canWorkerLive(pool, lock))
{
if (log_events)
{
NOEXCEPT_SCOPE({
ALLOW_ALLOCATIONS_IN_SCOPE;
LOG_DEBUG(log, "Stop worker in {}", pool.name);
});
}
if (--pool.workers == 0)
updateCurrentPriorityAndSpawn(lock); // It will spawn lower priority workers if needed
return;
@ -871,6 +917,14 @@ void AsyncLoader::worker(Pool & pool)
job = it->second;
pool.ready_queue.erase(it);
scheduled_jobs.find(job)->second.ready_seqno = 0; // This job is no longer in the ready queue
if (log_events)
{
NOEXCEPT_SCOPE({
ALLOW_ALLOCATIONS_IN_SCOPE;
LOG_DEBUG(log, "Execute load job '{}' in {}", job->name, pool.name);
});
}
}
ALLOW_ALLOCATIONS_IN_SCOPE;

@ -390,7 +390,7 @@ private:
};
public:
AsyncLoader(std::vector<PoolInitializer> pool_initializers, bool log_failures_, bool log_progress_);
AsyncLoader(std::vector<PoolInitializer> pool_initializers, bool log_failures_, bool log_progress_, bool log_events_);
// WARNING: all tasks instances should be destructed before associated AsyncLoader.
~AsyncLoader();
@ -470,6 +470,7 @@ private:
void wait(std::unique_lock<std::mutex> & job_lock, const LoadJobPtr & job);
bool canSpawnWorker(Pool & pool, std::unique_lock<std::mutex> & lock);
bool canWorkerLive(Pool & pool, std::unique_lock<std::mutex> & lock);
void setCurrentPriority(std::unique_lock<std::mutex> & lock, std::optional<Priority> priority);
void updateCurrentPriorityAndSpawn(std::unique_lock<std::mutex> & lock);
void spawn(Pool & pool, std::unique_lock<std::mutex> & lock);
void worker(Pool & pool);
@ -478,6 +479,7 @@ private:
// Logging
const bool log_failures; // Worker should log all exceptions caught from job functions.
const bool log_progress; // Periodically log total progress
const bool log_events; // Log all important events: job start/end, waits, prioritizations
LoggerPtr log;
mutable std::mutex mutex; // Guards all the fields below.

@ -1,22 +1,18 @@
#include "config.h"
#if USE_AWS_S3
#include <IO/WriteBufferFromS3.h>
#include "BufferAllocationPolicy.h"
#include <memory>
namespace
namespace DB
{
class FixedSizeBufferAllocationPolicy : public DB::WriteBufferFromS3::IBufferAllocationPolicy
class FixedSizeBufferAllocationPolicy : public BufferAllocationPolicy
{
const size_t buffer_size = 0;
size_t buffer_number = 0;
public:
explicit FixedSizeBufferAllocationPolicy(const DB::S3Settings::RequestSettings::PartUploadSettings & settings_)
: buffer_size(settings_.strict_upload_part_size)
explicit FixedSizeBufferAllocationPolicy(const BufferAllocationPolicy::Settings & settings_)
: buffer_size(settings_.strict_size)
{
chassert(buffer_size > 0);
}
@ -36,7 +32,7 @@ public:
};
class ExpBufferAllocationPolicy : public DB::WriteBufferFromS3::IBufferAllocationPolicy
class ExpBufferAllocationPolicy : public DB::BufferAllocationPolicy
{
const size_t first_size = 0;
const size_t second_size = 0;
@ -49,12 +45,12 @@ class ExpBufferAllocationPolicy : public DB::WriteBufferFromS3::IBufferAllocatio
size_t buffer_number = 0;
public:
explicit ExpBufferAllocationPolicy(const DB::S3Settings::RequestSettings::PartUploadSettings & settings_)
: first_size(std::max(settings_.max_single_part_upload_size, settings_.min_upload_part_size))
, second_size(settings_.min_upload_part_size)
, multiply_factor(settings_.upload_part_size_multiply_factor)
, multiply_threshold(settings_.upload_part_size_multiply_parts_count_threshold)
, max_size(settings_.max_upload_part_size)
explicit ExpBufferAllocationPolicy(const BufferAllocationPolicy::Settings & settings_)
: first_size(std::max(settings_.max_single_size, settings_.min_size))
, second_size(settings_.min_size)
, multiply_factor(settings_.multiply_factor)
, multiply_threshold(settings_.multiply_parts_count_threshold)
, max_size(settings_.max_size)
{
chassert(first_size > 0);
chassert(second_size > 0);
@ -92,16 +88,12 @@ public:
}
};
}
namespace DB
BufferAllocationPolicy::~BufferAllocationPolicy() = default;
BufferAllocationPolicyPtr BufferAllocationPolicy::create(BufferAllocationPolicy::Settings settings_)
{
WriteBufferFromS3::IBufferAllocationPolicy::~IBufferAllocationPolicy() = default;
WriteBufferFromS3::IBufferAllocationPolicyPtr WriteBufferFromS3::ChooseBufferPolicy(const S3Settings::RequestSettings::PartUploadSettings & settings_)
{
if (settings_.strict_upload_part_size > 0)
if (settings_.strict_size > 0)
return std::make_unique<FixedSizeBufferAllocationPolicy>(settings_);
else
return std::make_unique<ExpBufferAllocationPolicy>(settings_);
@ -109,4 +101,3 @@ WriteBufferFromS3::IBufferAllocationPolicyPtr WriteBufferFromS3::ChooseBufferPol
}
#endif

@ -0,0 +1,39 @@
#pragma once
#include "config.h"
#include "logger_useful.h"
#include <list>
namespace DB
{
class BufferAllocationPolicy;
using BufferAllocationPolicyPtr = std::unique_ptr<BufferAllocationPolicy>;
/// Buffer numbers start from 0
class BufferAllocationPolicy
{
public:
struct Settings
{
size_t strict_size = 0;
size_t min_size = 16 * 1024 * 1024;
size_t max_size = 5ULL * 1024 * 1024 * 1024;
size_t multiply_factor = 2;
size_t multiply_parts_count_threshold = 500;
size_t max_single_size = 32 * 1024 * 1024; /// Max size for a single buffer/block
};
virtual size_t getBufferNumber() const = 0;
virtual size_t getBufferSize() const = 0;
virtual void nextBuffer() = 0;
virtual ~BufferAllocationPolicy() = 0;
static BufferAllocationPolicyPtr create(Settings settings_);
};
}
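The diff shows the settings that `ExpBufferAllocationPolicy` reads, but not the body of `nextBuffer()`. The sketch below uses an assumed growth rule, not the actual implementation:

- Buffer 0 may hold a whole single-part upload, so its size is `max(max_single_size, min_size)`.
- Later buffers start at `min_size`.
- Every `multiply_parts_count_threshold` buffers, the size is multiplied by `multiply_factor`, capped at `max_size`.

`ExpSchedule` and the local `Settings` copy are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical copy of BufferAllocationPolicy::Settings with the same defaults.
struct Settings
{
    std::size_t strict_size = 0;
    std::size_t min_size = 16 * 1024 * 1024;
    std::size_t max_size = 5ULL * 1024 * 1024 * 1024;
    std::size_t multiply_factor = 2;
    std::size_t multiply_parts_count_threshold = 500;
    std::size_t max_single_size = 32 * 1024 * 1024;
};

// Assumed exponential schedule (not taken from the diff).
class ExpSchedule
{
public:
    explicit ExpSchedule(const Settings & s) : settings(s) {}

    std::size_t getBufferNumber() const { return buffer_number; }

    std::size_t getBufferSize() const
    {
        if (buffer_number == 0)
            return std::max(settings.max_single_size, settings.min_size);
        // Number of completed growth steps for buffers 1, 2, ...
        std::size_t steps = (buffer_number - 1) / settings.multiply_parts_count_threshold;
        std::size_t size = settings.min_size;
        for (std::size_t i = 0; i < steps && size < settings.max_size; ++i)
            size *= settings.multiply_factor;
        return std::min(size, settings.max_size);
    }

    void nextBuffer() { ++buffer_number; }

private:
    Settings settings;
    std::size_t buffer_number = 0;
};
```

When `strict_size > 0`, `BufferAllocationPolicy::create()` returns `FixedSizeBufferAllocationPolicy` instead, which uses `strict_size` for every buffer.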

@ -46,28 +46,45 @@ struct LastElementCacheStats
namespace columns_hashing_impl
{
template <typename Value, bool consecutive_keys_optimization_>
struct LastElementCache
struct LastElementCacheBase
{
static constexpr bool consecutive_keys_optimization = consecutive_keys_optimization_;
Value value;
bool empty = true;
bool found = false;
UInt64 misses = 0;
bool check(const Value & value_) const { return value == value_; }
template <typename Key>
bool check(const Key & key) const { return value.first == key; }
void onNewValue(bool is_found)
{
empty = false;
found = is_found;
++misses;
}
bool hasOnlyOneValue() const { return found && misses == 1; }
};
template <typename Data>
struct LastElementCache<Data, false>
template <typename Value, bool nullable> struct LastElementCache;
template <typename Value>
struct LastElementCache<Value, true> : public LastElementCacheBase
{
static constexpr bool consecutive_keys_optimization = false;
Value value{};
bool is_null = false;
template <typename Key>
bool check(const Key & key) const { return !is_null && value.first == key; }
bool check(const Value & rhs) const { return !is_null && value == rhs; }
};
template <typename Value>
struct LastElementCache<Value, false> : public LastElementCacheBase
{
Value value{};
template <typename Key>
bool check(const Key & key) const { return value.first == key; }
bool check(const Value & rhs) const { return value == rhs; }
};
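The `!is_null &&` guard in the nullable `LastElementCache` matters because `value` keeps the last non-NULL key after a NULL row. The simplified sketch below counts cache misses over a run of keys. It assumes 64-bit integer keys and uses `std::nullopt` for NULL. `NullableLastKeyCache` and `countMisses` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified consecutive-keys cache that treats NULL as a distinct cached state.
struct NullableLastKeyCache
{
    bool empty = true;
    bool is_null = false;
    int64_t value = 0;
    uint64_t misses = 0;

    // True if `key` (std::nullopt means NULL) equals the cached key.
    bool check(std::optional<int64_t> key) const
    {
        if (empty)
            return false;
        if (!key)
            return is_null;
        return !is_null && value == *key;  // a cached NULL never matches a real key
    }

    void onNewValue(std::optional<int64_t> key)
    {
        empty = false;
        is_null = !key.has_value();
        if (key)
            value = *key;  // after a NULL, `value` still holds the previous key
        ++misses;
    }
};

// Process a column row by row and return the number of cache misses.
uint64_t countMisses(const std::vector<std::optional<int64_t>> & column)
{
    NullableLastKeyCache cache;
    for (const auto & key : column)
        if (!cache.check(key))
            cache.onNewValue(key);
    return cache.misses;
}
```

Without the guard, the row `1` that follows the NULL run in `1, 1, NULL, NULL, 1, 2` would wrongly hit the cache, because `value` still holds `1`.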
template <typename Mapped>
@ -161,7 +178,7 @@ public:
using EmplaceResult = EmplaceResultImpl<Mapped>;
using FindResult = FindResultImpl<Mapped, need_offset>;
static constexpr bool has_mapped = !std::is_same_v<Mapped, void>;
using Cache = LastElementCache<Value, consecutive_keys_optimization>;
using Cache = LastElementCache<Value, nullable>;
static HashMethodContextPtr createContext(const HashMethodContext::Settings &) { return nullptr; }
@ -172,6 +189,15 @@ public:
{
if (isNullAt(row))
{
if constexpr (consecutive_keys_optimization)
{
if (!cache.is_null)
{
cache.onNewValue(true);
cache.is_null = true;
}
}
bool has_null_key = data.hasNullKeyData();
data.hasNullKeyData() = true;
@ -193,10 +219,21 @@ public:
{
if (isNullAt(row))
{
bool has_null_key = data.hasNullKeyData();
if constexpr (consecutive_keys_optimization)
{
if (!cache.is_null)
{
cache.onNewValue(has_null_key);
cache.is_null = true;
}
}
if constexpr (has_mapped)
return FindResult(&data.getNullKeyData(), data.hasNullKeyData(), 0);
return FindResult(&data.getNullKeyData(), has_null_key, 0);
else
return FindResult(data.hasNullKeyData(), 0);
return FindResult(has_null_key, 0);
}
}
@ -303,9 +340,10 @@ protected:
if constexpr (consecutive_keys_optimization)
{
cache.found = true;
cache.empty = false;
++cache.misses;
cache.onNewValue(true);
if constexpr (nullable)
cache.is_null = false;
if constexpr (has_mapped)
{
@ -346,17 +384,16 @@ protected:
if constexpr (consecutive_keys_optimization)
{
cache.found = it != nullptr;
cache.empty = false;
++cache.misses;
cache.onNewValue(it != nullptr);
if constexpr (nullable)
cache.is_null = false;
if constexpr (has_mapped)
{
cache.value.first = key;
if (it)
{
cache.value.second = it->getMapped();
}
}
else
{

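The nullable changes above let the consecutive-keys cache also serve a run of NULL keys. `is_null` records whether the cached key is NULL. `onNewValue()` centralizes the `empty`/`found`/`misses` bookkeeping that was previously repeated inline. The code below is a minimal, self-contained sketch of that idea, not the ClickHouse implementation. `LastKeyCache`, `checkKey`, `checkNull` and `countCacheRefreshes` are hypothetical names. Keys are simplified to `int`, with `std::nullopt` standing in for NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified "last element" cache for nullable keys.
struct LastKeyCache
{
    bool empty = true;      // nothing cached yet
    bool found = false;     // whether the cached key existed in the table
    bool is_null = false;   // whether the cached key is NULL
    int key = 0;            // cached key, meaningful only when !is_null
    std::size_t misses = 0; // how many times the cache was refreshed

    void onNewValue(bool is_found)
    {
        empty = false;
        found = is_found;
        ++misses;
    }

    // A non-NULL key hits only if the cached entry is non-NULL and equal.
    bool checkKey(int k) const { return !empty && !is_null && key == k; }
    // A NULL key hits only if the cached entry is NULL.
    bool checkNull() const { return !empty && is_null; }
    bool hasOnlyOneValue() const { return found && misses == 1; }
};

// Counts cache refreshes for a column of nullable keys; a run of equal keys
// (including a run of NULLs) costs a single refresh.
std::size_t countCacheRefreshes(const std::vector<std::optional<int>> & keys)
{
    LastKeyCache cache;
    for (const auto & k : keys)
    {
        if (!k)
        {
            if (!cache.checkNull())
            {
                cache.onNewValue(true); // assume the NULL key is present
                cache.is_null = true;
            }
        }
        else if (!cache.checkKey(*k))
        {
            cache.onNewValue(true); // assume every key is present
            cache.is_null = false;
            cache.key = *k;
        }
    }
    return cache.misses;
}
```

Before this change, a NULL row could not be represented in the cache. Tracking `is_null` explicitly means that switching between NULL and non-NULL keys refreshes the cache correctly instead of comparing against a stale key.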
@ -591,6 +591,7 @@
M(710, FAULT_INJECTED) \
M(711, FILECACHE_ACCESS_DENIED) \
M(712, TOO_MANY_MATERIALIZED_VIEWS) \
M(713, BROKEN_PROJECTION) \
M(714, UNEXPECTED_CLUSTER) \
M(715, CANNOT_DETECT_FORMAT) \
M(716, CANNOT_FORGET_PARTITION) \

@ -0,0 +1,19 @@
#include <Common/ExternalLoaderStatus.h>
#include <base/EnumReflection.h>
namespace DB
{
std::vector<std::pair<String, Int8>> getExternalLoaderStatusEnumAllPossibleValues()
{
std::vector<std::pair<String, Int8>> out;
out.reserve(magic_enum::enum_count<ExternalLoaderStatus>());
for (const auto & [value, str] : magic_enum::enum_entries<ExternalLoaderStatus>())
out.emplace_back(std::string{str}, static_cast<Int8>(value));
return out;
}
}

@ -1,30 +1,20 @@
#pragma once
#include <vector>
#include <base/EnumReflection.h>
#include <base/types.h>
namespace DB
{
enum class ExternalLoaderStatus : int8_t
{
NOT_LOADED, /// Object hasn't been tried to load. This is an initial state.
LOADED, /// Object has been loaded successfully.
FAILED, /// Object has been failed to load.
LOADING, /// Object is being loaded right now for the first time.
FAILED_AND_RELOADING, /// Object was failed to load before and it's being reloaded right now.
LOADED_AND_RELOADING, /// Object was loaded successfully before and it's being reloaded right now.
NOT_EXIST, /// Object with this name wasn't found in the configuration.
};
enum class ExternalLoaderStatus : int8_t
{
NOT_LOADED, /// Object hasn't been tried to load. This is an initial state.
LOADED, /// Object has been loaded successfully.
FAILED, /// Object has been failed to load.
LOADING, /// Object is being loaded right now for the first time.
FAILED_AND_RELOADING, /// Object was failed to load before and it's being reloaded right now.
LOADED_AND_RELOADING, /// Object was loaded successfully before and it's being reloaded right now.
NOT_EXIST, /// Object with this name wasn't found in the configuration.
};
inline std::vector<std::pair<String, Int8>> getStatusEnumAllPossibleValues()
{
std::vector<std::pair<String, Int8>> out;
out.reserve(magic_enum::enum_count<ExternalLoaderStatus>());
for (const auto & [value, str] : magic_enum::enum_entries<ExternalLoaderStatus>())
out.emplace_back(std::string{str}, static_cast<Int8>(value));
return out;
}
std::vector<std::pair<String, Int8>> getExternalLoaderStatusEnumAllPossibleValues();
}

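The diff moves the enum-to-pairs conversion out of the header. Previously it was the `inline` function `getStatusEnumAllPossibleValues`; it now lives in a `.cpp` file as `getExternalLoaderStatusEnumAllPossibleValues`. The list of `(name, value)` pairs it builds can be illustrated without `magic_enum`. The sketch below substitutes a hand-written name table, an assumption made only to keep it dependency-free, and uses `std::int8_t` in place of ClickHouse's `Int8`. Unlike `magic_enum::enum_entries`, such a table must be kept in sync with the enum by hand.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum class ExternalLoaderStatus : std::int8_t
{
    NOT_LOADED,
    LOADED,
    FAILED,
    LOADING,
    FAILED_AND_RELOADING,
    LOADED_AND_RELOADING,
    NOT_EXIST,
};

// Hand-written stand-in for magic_enum::enum_entries; order must match the enum.
constexpr std::array<std::pair<ExternalLoaderStatus, const char *>, 7> kStatusNames{{
    {ExternalLoaderStatus::NOT_LOADED, "NOT_LOADED"},
    {ExternalLoaderStatus::LOADED, "LOADED"},
    {ExternalLoaderStatus::FAILED, "FAILED"},
    {ExternalLoaderStatus::LOADING, "LOADING"},
    {ExternalLoaderStatus::FAILED_AND_RELOADING, "FAILED_AND_RELOADING"},
    {ExternalLoaderStatus::LOADED_AND_RELOADING, "LOADED_AND_RELOADING"},
    {ExternalLoaderStatus::NOT_EXIST, "NOT_EXIST"},
}};

// Builds (name, numeric value) pairs, e.g. for declaring an Enum8 column.
std::vector<std::pair<std::string, std::int8_t>> statusEnumValues()
{
    std::vector<std::pair<std::string, std::int8_t>> out;
    out.reserve(kStatusNames.size());
    for (const auto & [value, name] : kStatusNames)
        out.emplace_back(name, static_cast<std::int8_t>(value));
    return out;
}
```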
@ -1,11 +1,9 @@
#pragma once
#include <condition_variable>
#include <mutex>
#include <type_traits>
#include <variant>
#include <boost/noncopyable.hpp>
#include <condition_variable>
#include <Poco/Timespan.h>
#include <boost/noncopyable.hpp>
#include <Common/logger_useful.h>
#include <Common/Exception.h>
@ -17,6 +15,14 @@ namespace ProfileEvents
extern const Event ConnectionPoolIsFullMicroseconds;
}
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
}
/** A class from which you can inherit and get a pool of something. Used for database connection pools.
* Descendant class must provide a method for creating a new object to place in the pool.
*/
@ -29,22 +35,6 @@ public:
using ObjectPtr = std::shared_ptr<Object>;
using Ptr = std::shared_ptr<PoolBase<TObject>>;
enum class BehaviourOnLimit
{
/**
* Default behaviour - when limit on pool size is reached, callers will wait until object will be returned back in pool.
*/
Wait,
/**
* If no free objects in pool - allocate a new object, but not store it in pool.
* This behaviour is needed when we simply don't want to waste time waiting or if we cannot guarantee that query could be processed using fixed amount of connections.
* For example, when we read from table on s3, one GetObject request corresponds to the whole FileSystemCache segment. This segments are shared between different
* reading tasks, so in general case connection could be taken from pool by one task and returned back by another one. And these tasks are processed completely independently.
*/
AllocateNewBypassingPool,
};
private:
/** The object with the flag, whether it is currently used. */
@ -99,53 +89,37 @@ public:
Object & operator*() && = delete;
const Object & operator*() const && = delete;
Object * operator->() & { return castToObjectPtr(); }
const Object * operator->() const & { return castToObjectPtr(); }
Object & operator*() & { return *castToObjectPtr(); }
const Object & operator*() const & { return *castToObjectPtr(); }
Object * operator->() & { return &*data->data.object; }
const Object * operator->() const & { return &*data->data.object; }
Object & operator*() & { return *data->data.object; }
const Object & operator*() const & { return *data->data.object; }
/**
* Expire an object to make it reallocated later.
*/
void expire()
{
if (data.index() == 1)
std::get<1>(data)->data.is_expired = true;
data->data.is_expired = true;
}
bool isNull() const { return data.index() == 0 ? !std::get<0>(data) : !std::get<1>(data); }
bool isNull() const { return data == nullptr; }
PoolBase * getPool() const
{
if (!data)
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "Attempt to get pool from uninitialized entry");
return &data->data.pool;
}
private:
/**
* Plain object will be stored instead of PoolEntryHelper if fallback was made in get() (see BehaviourOnLimit::AllocateNewBypassingPool).
*/
std::variant<ObjectPtr, std::shared_ptr<PoolEntryHelper>> data;
std::shared_ptr<PoolEntryHelper> data;
explicit Entry(ObjectPtr && object) : data(std::move(object)) { }
explicit Entry(PooledObject & object) : data(std::make_shared<PoolEntryHelper>(object)) { }
auto castToObjectPtr() const
{
return std::visit(
[](const auto & ptr)
{
using T = std::decay_t<decltype(ptr)>;
if constexpr (std::is_same_v<ObjectPtr, T>)
return ptr.get();
else
return ptr->data.object.get();
},
data);
}
explicit Entry(PooledObject & object) : data(std::make_shared<PoolEntryHelper>(object)) {}
};
virtual ~PoolBase() = default;
/** Allocates the object.
* If 'behaviour_on_limit' is Wait - wait for free object in pool for 'timeout'. With 'timeout' < 0, the timeout is infinite.
* If 'behaviour_on_limit' is AllocateNewBypassingPool and there is no free object - a new object will be created but not stored in the pool.
*/
/** Allocates the object. Wait for free object in pool for 'timeout'. With 'timeout' < 0, the timeout is infinite. */
Entry get(Poco::Timespan::TimeDiff timeout)
{
std::unique_lock lock(mutex);
@ -176,9 +150,6 @@ public:
return Entry(*items.back());
}
if (behaviour_on_limit == BehaviourOnLimit::AllocateNewBypassingPool)
return Entry(allocObject());
Stopwatch blocked;
if (timeout < 0)
{
@ -213,8 +184,6 @@ private:
/** The maximum size of the pool. */
unsigned max_items;
BehaviourOnLimit behaviour_on_limit;
/** Pool. */
Objects items;
@ -225,8 +194,8 @@ private:
protected:
LoggerPtr log;
PoolBase(unsigned max_items_, LoggerPtr log_, BehaviourOnLimit behaviour_on_limit_ = BehaviourOnLimit::Wait)
: max_items(max_items_), behaviour_on_limit(behaviour_on_limit_), log(log_)
PoolBase(unsigned max_items_, LoggerPtr log_)
: max_items(max_items_), log(log_)
{
items.reserve(max_items);
}

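After this change, `Entry` holds only a `std::shared_ptr<PoolEntryHelper>`. As a result, `isNull()` becomes a plain null check, `expire()` no longer inspects a `std::variant`, and `getPool()` throws `LOGICAL_ERROR` on an uninitialized entry. The sketch below models that shape with a tiny pool. It is hypothetical in several respects:

* `SimplePool` and `tryGet()` are illustrative names, not the `PoolBase` API.
* The helper's destructor marks the object free again; this is an assumption about how release works.
* `tryGet()` returns a null entry instead of waiting for `timeout`.
* `std::logic_error` stands in for `DB::Exception`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified pool; not the ClickHouse PoolBase API.
template <typename T>
class SimplePool
{
    struct Pooled
    {
        std::shared_ptr<T> object;
        bool in_use = false;
    };

    // Shared by all copies of an Entry; the last copy's destruction frees the object.
    struct Helper
    {
        Pooled & pooled;
        SimplePool & pool;
        Helper(Pooled & pooled_, SimplePool & pool_) : pooled(pooled_), pool(pool_) { pooled.in_use = true; }
        ~Helper()
        {
            std::lock_guard lock(pool.mutex);
            pooled.in_use = false;
        }
    };

public:
    class Entry
    {
    public:
        Entry() = default; // uninitialized (null) entry
        bool isNull() const { return data == nullptr; }
        T & operator*() const { return *data->pooled.object; }
        SimplePool * getPool() const
        {
            if (!data)
                throw std::logic_error("Attempt to get pool from uninitialized entry");
            return &data->pool;
        }

    private:
        friend class SimplePool;
        explicit Entry(std::shared_ptr<Helper> data_) : data(std::move(data_)) {}
        std::shared_ptr<Helper> data;
    };

    explicit SimplePool(std::size_t max_items_) : max_items(max_items_) {}

    // Reuses a free object or allocates one below max_items; otherwise returns a
    // null Entry (the real get() would wait up to `timeout` instead).
    Entry tryGet()
    {
        std::lock_guard lock(mutex);
        for (auto & item : items)
            if (!item->in_use)
                return Entry(std::make_shared<Helper>(*item, *this));
        if (items.size() < max_items)
        {
            items.push_back(std::make_unique<Pooled>(Pooled{std::make_shared<T>()}));
            return Entry(std::make_shared<Helper>(*items.back(), *this));
        }
        return Entry{};
    }

private:
    std::size_t max_items;
    std::vector<std::unique_ptr<Pooled>> items; // unique_ptr keeps addresses stable
    std::mutex mutex;
};
```

Entries must not outlive their pool in this sketch, because the helper keeps a reference to it.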
@ -1,8 +1,6 @@
#include "config.h"
#if USE_AWS_S3
#include <IO/WriteBufferFromS3TaskTracker.h>
#include "ThreadPoolTaskTracker.h"
namespace ProfileEvents
{
@ -12,19 +10,19 @@ namespace ProfileEvents
namespace DB
{
WriteBufferFromS3::TaskTracker::TaskTracker(ThreadPoolCallbackRunner<void> scheduler_, size_t max_tasks_inflight_, LogSeriesLimiterPtr limitedLog_)
TaskTracker::TaskTracker(ThreadPoolCallbackRunner<void> scheduler_, size_t max_tasks_inflight_, LogSeriesLimiterPtr limitedLog_)
: is_async(bool(scheduler_))
, scheduler(scheduler_ ? std::move(scheduler_) : syncRunner())
, max_tasks_inflight(max_tasks_inflight_)
, limitedLog(limitedLog_)
{}
WriteBufferFromS3::TaskTracker::~TaskTracker()
TaskTracker::~TaskTracker()
{
safeWaitAll();
}
ThreadPoolCallbackRunner<void> WriteBufferFromS3::TaskTracker::syncRunner()
ThreadPoolCallbackRunner<void> TaskTracker::syncRunner()
{
return [](Callback && callback, int64_t) mutable -> std::future<void>
{
@ -35,7 +33,7 @@ ThreadPoolCallbackRunner<void> WriteBufferFromS3::TaskTracker::syncRunner()
};
}
void WriteBufferFromS3::TaskTracker::waitAll()
void TaskTracker::waitAll()
{
/// Exceptions are propagated
for (auto & future : futures)
@ -48,7 +46,7 @@ void WriteBufferFromS3::TaskTracker::waitAll()
finished_futures.clear();
}
void WriteBufferFromS3::TaskTracker::safeWaitAll()
void TaskTracker::safeWaitAll()
{
for (auto & future : futures)
{
@ -71,7 +69,7 @@ void WriteBufferFromS3::TaskTracker::safeWaitAll()
finished_futures.clear();
}
void WriteBufferFromS3::TaskTracker::waitIfAny()
void TaskTracker::waitIfAny()
{
if (futures.empty())
return;
@ -99,7 +97,7 @@ void WriteBufferFromS3::TaskTracker::waitIfAny()
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3WaitInflightLimitMicroseconds, watch.elapsedMicroseconds());
}
void WriteBufferFromS3::TaskTracker::add(Callback && func)
void TaskTracker::add(Callback && func)
{
/// All this fuzz is about 2 things. This is the most critical place of TaskTracker.
/// The first is not to fail insertion in the list `futures`.
@ -134,7 +132,7 @@ void WriteBufferFromS3::TaskTracker::add(Callback && func)
waitTilInflightShrink();
}
void WriteBufferFromS3::TaskTracker::waitTilInflightShrink()
void TaskTracker::waitTilInflightShrink()
{
if (!max_tasks_inflight)
return;
@ -166,11 +164,10 @@ void WriteBufferFromS3::TaskTracker::waitTilInflightShrink()
ProfileEvents::increment(ProfileEvents::WriteBufferFromS3WaitInflightLimitMicroseconds, watch.elapsedMicroseconds());
}
bool WriteBufferFromS3::TaskTracker::isAsync() const
bool TaskTracker::isAsync() const
{
return is_async;
}
}
#endif

@ -1,20 +1,16 @@
#pragma once
#include "config.h"
#include "threadPoolCallbackRunner.h"
#include "IO/WriteBufferFromS3.h"
#if USE_AWS_S3
#include "WriteBufferFromS3.h"
#include <Common/logger_useful.h>
#include "logger_useful.h"
#include <list>
namespace DB
{
/// That class is used only in WriteBufferFromS3 for now.
/// Therefore it declared as a part of WriteBufferFromS3.
/// TaskTracker takes a Callback which is run by scheduler in some external shared ThreadPool.
/// TaskTracker brings the methods waitIfAny, waitAll/safeWaitAll
/// to help with coordination of the running tasks.
@ -22,7 +18,7 @@ namespace DB
/// Basic exception safety is provided. If exception occurred the object has to be destroyed.
/// No thread safety is provided. Use this object with no concurrency.
class WriteBufferFromS3::TaskTracker
class TaskTracker
{
public:
using Callback = std::function<void()>;
@ -68,5 +64,3 @@ private:
};
}
#endif

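`TaskTracker` is now a standalone class rather than `WriteBufferFromS3::TaskTracker`. When no scheduler is passed, it falls back to `syncRunner()`. The body of `syncRunner()` is elided in this diff, so the sketch below shows one plausible shape for a synchronous runner. It runs the callback on the calling thread and returns an already-satisfied `std::future`, which lets `waitAll()`-style code treat synchronous and asynchronous execution uniformly. `Runner` is a hypothetical stand-in for `ThreadPoolCallbackRunner<void>`. The unnamed `int64_t` parameter from the diff's lambda signature is accepted and ignored.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <exception>
#include <functional>
#include <future>
#include <stdexcept>

using Callback = std::function<void()>;
// Hypothetical stand-in for ThreadPoolCallbackRunner<void>.
using Runner = std::function<std::future<void>(Callback &&, std::int64_t)>;

// Runs the callback immediately and returns a ready future. An exception thrown
// by the callback is stored in the future and rethrown by get(), just as it
// would be for a task executed on a thread pool.
Runner syncRunner()
{
    return [](Callback && callback, std::int64_t) -> std::future<void>
    {
        std::promise<void> promise;
        try
        {
            callback();
            promise.set_value();
        }
        catch (...)
        {
            promise.set_exception(std::current_exception());
        }
        return promise.get_future();
    };
}
```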
@ -191,7 +191,8 @@ namespace VolnitskyTraits
if (length_l != length_r)
return false;
assert(length_l >= 2 && length_r >= 2);
if (length_l < 2 || length_r < 2)
return false; /// Some part of the given ngram contains an invalid UTF-8 sequence.
chars.c0 = seq_l[seq_ngram_offset];
chars.c1 = seq_l[seq_ngram_offset + 1];
@ -253,7 +254,9 @@ namespace VolnitskyTraits
if (size_l != size_u)
return false;
assert(size_l >= 1 && size_u >= 1);
if (size_l == 0 || size_u == 0)
return false; /// Some part of the given ngram contains an invalid UTF-8 sequence.
chars.c1 = seq_l[0];
putNGramBase(n, offset);
@ -276,7 +279,8 @@ namespace VolnitskyTraits
if (size_l != size_u)
return false;
assert(size_l > seq_ngram_offset && size_u > seq_ngram_offset);
if (size_l <= seq_ngram_offset || size_u <= seq_ngram_offset)
return false; /// Some part of the given ngram contains an invalid UTF-8 sequence.
chars.c0 = seq_l[seq_ngram_offset];
putNGramBase(n, offset);
@ -302,10 +306,8 @@ namespace VolnitskyTraits
if (size_first_l != size_first_u || size_second_l != size_second_u)
return false;
assert(size_first_l > seq_ngram_offset);
assert(size_first_u > seq_ngram_offset);
assert(size_second_l > 0);
assert(size_second_u > 0);
if (size_first_l <= seq_ngram_offset || size_first_u <= seq_ngram_offset || size_second_l == 0 || size_second_u == 0)
return false;
auto c0l = first_l_seq[seq_ngram_offset];
auto c0u = first_u_seq[seq_ngram_offset];
@ -399,7 +401,7 @@ public:
if (fallback || fallback_searcher.force_fallback)
return;
hash = std::unique_ptr<VolnitskyTraits::Offset[]>(new VolnitskyTraits::Offset[VolnitskyTraits::hash_size]{});
hash = std::make_unique<VolnitskyTraits::Offset[]>(VolnitskyTraits::hash_size);
auto callback = [this](const VolnitskyTraits::Ngram ngram, const int offset) { return this->putNGramBase(ngram, offset); };
/// ssize_t is used here because unsigned can't be used with condition like `i >= 0`, unsigned always >= 0

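In the Volnitsky ngram helpers, several `assert()` calls on UTF-8 sequence lengths become early `return false`, with a comment that part of the ngram may contain an invalid UTF-8 sequence. An `assert` aborts in debug builds and is compiled out when `NDEBUG` is defined. Handling malformed input as an ordinary failure therefore avoids both a crash and unchecked reads. The sketch below illustrates the same pattern with hypothetical helpers `utf8SeqLength` and `tryGetSeqByte`; it is not ClickHouse's UTF-8 code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence announced by a lead byte, or 0 if the byte cannot
// start a sequence (e.g. a continuation byte 10xxxxxx). Simplified: it does not
// reject every invalid lead byte (such as 0xC0 or 0xF5..0xF7).
std::size_t utf8SeqLength(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;
}

// Reads byte `offset` of the UTF-8 sequence starting at s[0]. Instead of
// asserting that the sequence is long enough, it reports malformed or
// truncated input by returning false.
bool tryGetSeqByte(const std::string & s, std::size_t offset, unsigned char & out)
{
    if (s.empty())
        return false;
    const std::size_t length = utf8SeqLength(static_cast<unsigned char>(s[0]));
    if (length == 0 || length > s.size() || offset >= length)
        return false;
    out = static_cast<unsigned char>(s[offset]);
    return true;
}
```

The same hunk also replaces `new VolnitskyTraits::Offset[VolnitskyTraits::hash_size]{}` with `std::make_unique<VolnitskyTraits::Offset[]>(VolnitskyTraits::hash_size)`. Both forms value-initialize the array, so the hash table still starts zeroed.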
@ -812,7 +812,7 @@ void ZooKeeper::receiveEvent()
RequestInfo request_info;
ZooKeeperResponsePtr response;
UInt64 elapsed_ms = 0;
UInt64 elapsed_microseconds = 0;
maybeInjectRecvFault();
@ -875,8 +875,8 @@ void ZooKeeper::receiveEvent()
CurrentMetrics::sub(CurrentMetrics::ZooKeeperRequest);
}
elapsed_ms = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - request_info.time).count();
ProfileEvents::increment(ProfileEvents::ZooKeeperWaitMicroseconds, elapsed_ms);
elapsed_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - request_info.time).count();
ProfileEvents::increment(ProfileEvents::ZooKeeperWaitMicroseconds, elapsed_microseconds);
}
try
@ -935,7 +935,7 @@ void ZooKeeper::receiveEvent()
length, actual_length);
}
logOperationIfNeeded(request_info.request, response, /* finalize= */ false, elapsed_ms);
logOperationIfNeeded(request_info.request, response, /* finalize= */ false, elapsed_microseconds);
}
catch (...)
{
@ -954,7 +954,7 @@ void ZooKeeper::receiveEvent()
if (request_info.callback)
request_info.callback(*response);
logOperationIfNeeded(request_info.request, response, /* finalize= */ false, elapsed_ms);
logOperationIfNeeded(request_info.request, response, /* finalize= */ false, elapsed_microseconds);
}
catch (...)
{
@ -1048,14 +1048,14 @@ void ZooKeeper::finalize(bool error_send, bool error_receive, const String & rea
? Error::ZCONNECTIONLOSS
: Error::ZSESSIONEXPIRED;
response->xid = request_info.request->xid;
UInt64 elapsed_ms = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - request_info.time).count();
UInt64 elapsed_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - request_info.time).count();
if (request_info.callback)
{
try
{
request_info.callback(*response);
logOperationIfNeeded(request_info.request, response, true, elapsed_ms);
logOperationIfNeeded(request_info.request, response, true, elapsed_microseconds);
}
catch (...)
{
@ -1115,8 +1115,8 @@ void ZooKeeper::finalize(bool error_send, bool error_receive, const String & rea
try
{
info.callback(*response);
UInt64 elapsed_ms = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - info.time).count();
logOperationIfNeeded(info.request, response, true, elapsed_ms);
UInt64 elapsed_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - info.time).count();
logOperationIfNeeded(info.request, response, true, elapsed_microseconds);
}
catch (...)
{
@ -1498,7 +1498,7 @@ void ZooKeeper::setZooKeeperLog(std::shared_ptr<DB::ZooKeeperLog> zk_log_)
}
#ifdef ZOOKEEPER_LOG
void ZooKeeper::logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response, bool finalize, UInt64 elapsed_ms)
void ZooKeeper::logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response, bool finalize, UInt64 elapsed_microseconds)
{
auto maybe_zk_log = std::atomic_load(&zk_log);
if (!maybe_zk_log)
@ -1536,7 +1536,7 @@ void ZooKeeper::logOperationIfNeeded(const ZooKeeperRequestPtr & request, const
elem.event_time = event_time;
elem.address = socket_address;
elem.session_id = session_id;
elem.duration_ms = elapsed_ms;
elem.duration_microseconds = elapsed_microseconds;
if (request)
{
elem.thread_id = request->thread_id;

@ -343,7 +343,7 @@ private:
void flushWriteBuffer();
ReadBuffer & getReadBuffer();
void logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response = nullptr, bool finalize = false, UInt64 elapsed_ms = 0);
void logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response = nullptr, bool finalize = false, UInt64 elapsed_microseconds = 0);
void initFeatureFlags();

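The rename from `elapsed_ms` to `elapsed_microseconds` corrects a misleading name, not a computation. Every assignment already uses `std::chrono::duration_cast<std::chrono::microseconds>(...).count()`, and the value is added to `ZooKeeperWaitMicroseconds`. The minimal sketch below makes the unit explicit. It assumes a steady clock, since the clock type used by `ZooKeeper` is not shown here, and `elapsedMicroseconds` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock; // assumption: clock type is illustrative

// duration_cast<microseconds>(...).count() yields microseconds; a name ending in
// _ms would suggest milliseconds and be off by a factor of 1000.
std::uint64_t elapsedMicroseconds(Clock::time_point start, Clock::time_point now)
{
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(now - start).count());
}
```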
@ -107,6 +107,18 @@ namespace impl
(PRIORITY), _file_function.c_str(), __LINE__, _format_string); \
_channel->log(_poco_message); \
} \
catch (const Poco::Exception & logger_exception) \
{ \
::write(STDERR_FILENO, static_cast<const void *>(MESSAGE_FOR_EXCEPTION_ON_LOGGING), sizeof(MESSAGE_FOR_EXCEPTION_ON_LOGGING)); \
const std::string & logger_exception_message = logger_exception.message(); \
::write(STDERR_FILENO, static_cast<const void *>(logger_exception_message.data()), logger_exception_message.size()); \
} \
catch (const std::exception & logger_exception) \
{ \
::write(STDERR_FILENO, static_cast<const void *>(MESSAGE_FOR_EXCEPTION_ON_LOGGING), sizeof(MESSAGE_FOR_EXCEPTION_ON_LOGGING)); \
const char * logger_exception_message = logger_exception.what(); \
::write(STDERR_FILENO, static_cast<const void *>(logger_exception_message), strlen(logger_exception_message)); \
} \
catch (...) \
{ \
::write(STDERR_FILENO, static_cast<const void *>(MESSAGE_FOR_EXCEPTION_ON_LOGGING), sizeof(MESSAGE_FOR_EXCEPTION_ON_LOGGING)); \

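The logging macro now catches `Poco::Exception` and `std::exception` separately before the catch-all. It reports the failure with `::write(STDERR_FILENO, ...)` instead of going through the logger that just failed. A dedicated `Poco::Exception` branch is useful because, as far as we know, Poco's `what()` returns only the exception's class name, while `message()` carries the text. The sketch below reproduces the fallback pattern using only the standard library and POSIX `write(2)`. The names `logNoThrow`, `writeStderr` and `kLoggingFailed` are hypothetical, and the Poco branch is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <exception>
#include <functional>
#include <stdexcept>
#include <unistd.h>

// Hypothetical prefix; the real macro uses MESSAGE_FOR_EXCEPTION_ON_LOGGING.
constexpr char kLoggingFailed[] = "Exception while logging: ";

// write(2) involves no allocation and no logging machinery. The result is
// ignored on purpose: there is nowhere left to report a failure.
void writeStderr(const char * data, std::size_t size) noexcept
{
    [[maybe_unused]] const auto written = ::write(STDERR_FILENO, data, size);
}

// Calls `log_fn`; if it throws, prints a short diagnostic to stderr.
// Returns true if logging succeeded.
bool logNoThrow(const std::function<void()> & log_fn) noexcept
{
    try
    {
        log_fn();
        return true;
    }
    catch (const std::exception & e)
    {
        writeStderr(kLoggingFailed, sizeof(kLoggingFailed) - 1); // drop the NUL
        const char * what = e.what();
        writeStderr(what, std::strlen(what));
        writeStderr("\n", 1);
    }
    catch (...)
    {
        writeStderr(kLoggingFailed, sizeof(kLoggingFailed) - 1);
        writeStderr("unknown exception\n", 18);
    }
    return false;
}
```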
@ -50,7 +50,7 @@ struct AsyncLoaderTest
pcg64 rng{randomSeed()};
explicit AsyncLoaderTest(std::vector<Initializer> initializers)
: loader(getPoolInitializers(initializers), /* log_failures = */ false, /* log_progress = */ false)
: loader(getPoolInitializers(initializers), /* log_failures = */ false, /* log_progress = */ false, /* log_events = */ false)
{
loader.stop(); // All tests call `start()` manually to better control ordering
}

@ -17,6 +17,9 @@
#include <Core/ExternalTable.h>
#include <Poco/Net/MessageHeader.h>
#include <Parsers/ASTNameTypePair.h>
#include <Parsers/ParserCreateQuery.h>
#include <Parsers/parseQuery.h>
#include <base/scope_guard.h>
@ -28,6 +31,18 @@ namespace ErrorCodes
extern const int BAD_ARGUMENTS;
}
/// Parsing a list of types with `,` as separator. For example, `Int, Enum('foo'=1,'bar'=2), Double`
/// Used in `parseStructureFromTypesField`
class ParserTypeList : public IParserBase
{
protected:
const char * getName() const override { return "type pair list"; }
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override
{
return ParserList(std::make_unique<ParserDataType>(), std::make_unique<ParserToken>(TokenType::Comma), false)
.parse(pos, node, expected);
}
};
ExternalTableDataPtr BaseExternalTable::getData(ContextPtr context)
{
@ -55,23 +70,36 @@ void BaseExternalTable::clear()
void BaseExternalTable::parseStructureFromStructureField(const std::string & argument)
{
std::vector<std::string> vals;
splitInto<' ', ','>(vals, argument, true);
ParserNameTypePairList parser;
const auto * pos = argument.data();
String error;
ASTPtr columns_list_raw = tryParseQuery(parser, pos, pos + argument.size(), error, false, "", false, DBMS_DEFAULT_MAX_QUERY_SIZE, DBMS_DEFAULT_MAX_PARSER_DEPTH, DBMS_DEFAULT_MAX_PARSER_BACKTRACKS, true);
if (vals.size() % 2 != 0)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Odd number of attributes in section structure: {}", vals.size());
if (!columns_list_raw)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Error while parsing table structure: {}", error);
for (size_t i = 0; i < vals.size(); i += 2)
structure.emplace_back(vals[i], vals[i + 1]);
for (auto & child : columns_list_raw->children)
{
auto * column = child->as<ASTNameTypePair>();
if (column)
structure.emplace_back(column->name, column->type->getColumnNameWithoutAlias());
else
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Error while parsing table structure: expected column definition, got {}", child->formatForErrorMessage());
}
}
void BaseExternalTable::parseStructureFromTypesField(const std::string & argument)
{
std::vector<std::string> vals;
splitInto<' ', ','>(vals, argument, true);
ParserTypeList parser;
const auto * pos = argument.data();
String error;
ASTPtr type_list_raw = tryParseQuery(parser, pos, pos+argument.size(), error, false, "", false, DBMS_DEFAULT_MAX_QUERY_SIZE, DBMS_DEFAULT_MAX_PARSER_DEPTH, DBMS_DEFAULT_MAX_PARSER_BACKTRACKS, true);
for (size_t i = 0; i < vals.size(); ++i)
structure.emplace_back("_" + toString(i + 1), vals[i]);
if (!type_list_raw)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Error while parsing table structure: {}", error);
for (size_t i = 0; i < type_list_raw->children.size(); ++i)
structure.emplace_back("_" + toString(i + 1), type_list_raw->children[i]->getColumnNameWithoutAlias());
}
void BaseExternalTable::initSampleBlock()

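`parseStructureFromStructureField` and `parseStructureFromTypesField` previously split the argument on spaces and commas with `splitInto<' ', ','>`. That approach breaks on types that themselves contain commas, such as `Enum('foo'=1,'bar'=2)`. The diff replaces it with real parsers: `ParserNameTypePairList` and the new `ParserTypeList`. The sketch below only demonstrates the failure mode, plus a depth- and quote-aware split as an illustrative stand-in. It is not ClickHouse's parser and does not handle escaped quotes. `trim`, `naiveSplit` and `splitTopLevel` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Trims leading and trailing spaces.
std::string trim(const std::string & s)
{
    const auto begin = s.find_first_not_of(' ');
    if (begin == std::string::npos)
        return "";
    const auto end = s.find_last_not_of(' ');
    return s.substr(begin, end - begin + 1);
}

// Splits on every ',' (the old code also split on spaces, which is worse).
std::vector<std::string> naiveSplit(const std::string & s)
{
    std::vector<std::string> out;
    std::string current;
    for (char c : s)
    {
        if (c == ',')
        {
            out.push_back(trim(current));
            current.clear();
        }
        else
            current += c;
    }
    out.push_back(trim(current));
    return out;
}

// Splits only on commas outside parentheses and single quotes.
std::vector<std::string> splitTopLevel(const std::string & s)
{
    std::vector<std::string> out;
    std::string current;
    int depth = 0;
    bool in_quote = false;
    for (char c : s)
    {
        if (c == '\'')
            in_quote = !in_quote;
        else if (!in_quote && c == '(')
            ++depth;
        else if (!in_quote && c == ')')
            --depth;
        else if (!in_quote && depth == 0 && c == ',')
        {
            out.push_back(trim(current));
            current.clear();
            continue; // the separator itself is not part of any item
        }
        current += c;
    }
    out.push_back(trim(current));
    return out;
}
```

For the example from the `ParserTypeList` comment, `naiveSplit` yields four pieces, two of which are fragments of the `Enum` type. `splitTopLevel` yields the three intended types.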
@ -78,11 +78,17 @@ class IColumn;
M(UInt64, distributed_connections_pool_size, 1024, "Maximum number of connections with one remote server in the pool.", 0) \
M(UInt64, connections_with_failover_max_tries, 3, "The maximum number of attempts to connect to replicas.", 0) \
M(UInt64, s3_strict_upload_part_size, 0, "The exact size of part to upload during multipart upload to S3 (some implementations does not supports variable size parts).", 0) \
M(UInt64, azure_strict_upload_part_size, 0, "The exact size of part to upload during multipart upload to Azure blob storage.", 0) \
M(UInt64, s3_min_upload_part_size, 16*1024*1024, "The minimum size of part to upload during multipart upload to S3.", 0) \
M(UInt64, s3_max_upload_part_size, 5ull*1024*1024*1024, "The maximum size of part to upload during multipart upload to S3.", 0) \
M(UInt64, azure_min_upload_part_size, 16*1024*1024, "The minimum size of part to upload during multipart upload to Azure blob storage.", 0) \
M(UInt64, azure_max_upload_part_size, 5ull*1024*1024*1024, "The maximum size of part to upload during multipart upload to Azure blob storage.", 0) \
M(UInt64, s3_upload_part_size_multiply_factor, 2, "Multiply s3_min_upload_part_size by this factor each time s3_multiply_parts_count_threshold parts were uploaded from a single write to S3.", 0) \
M(UInt64, s3_upload_part_size_multiply_parts_count_threshold, 500, "Each time this number of parts was uploaded to S3, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor.", 0) \
M(UInt64, s3_max_inflight_parts_for_one_file, 20, "The maximum number of a concurrent loaded parts in multipart upload request. 0 means unlimited. You ", 0) \
M(UInt64, azure_upload_part_size_multiply_factor, 2, "Multiply azure_min_upload_part_size by this factor each time azure_multiply_parts_count_threshold parts were uploaded from a single write to Azure blob storage.", 0) \
M(UInt64, azure_upload_part_size_multiply_parts_count_threshold, 500, "Each time this number of parts was uploaded to Azure blob storage, azure_min_upload_part_size is multiplied by azure_upload_part_size_multiply_factor.", 0) \
M(UInt64, s3_max_inflight_parts_for_one_file, 20, "The maximum number of a concurrent loaded parts in multipart upload request. 0 means unlimited.", 0) \
M(UInt64, azure_max_inflight_parts_for_one_file, 20, "The maximum number of a concurrent loaded parts in multipart upload request. 0 means unlimited.", 0) \
M(UInt64, s3_max_single_part_upload_size, 32*1024*1024, "The maximum size of object to upload using singlepart upload to S3.", 0) \
M(UInt64, azure_max_single_part_upload_size, 100*1024*1024, "The maximum size of object to upload using singlepart upload to Azure blob storage.", 0) \
M(UInt64, azure_max_single_part_copy_size, 256*1024*1024, "The maximum size of object to copy using single part copy to Azure blob storage.", 0) \
@ -162,6 +168,7 @@ class IColumn;
M(Bool, allow_suspicious_indices, false, "Reject primary/secondary indexes and sorting keys with identical expressions", 0) \
M(Bool, allow_suspicious_ttl_expressions, false, "Reject TTL expressions that don't depend on any of table's columns. It indicates a user error most of the time.", 0) \
M(Bool, allow_suspicious_variant_types, false, "In CREATE TABLE statement allows specifying Variant type with similar variant types (for example, with different numeric or date types). Enabling this setting may introduce some ambiguity when working with values with similar types.", 0) \
M(Bool, allow_suspicious_primary_key, false, "Forbid suspicious PRIMARY KEY/ORDER BY for MergeTree (i.e. SimpleAggregateFunction)", 0) \
M(Bool, compile_expressions, false, "Compile some scalar functions and operators to native code.", 0) \
M(UInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled", 0) \
M(Bool, compile_aggregate_expressions, true, "Compile aggregate functions to native code.", 0) \
@ -861,6 +868,8 @@ class IColumn;
M(Bool, use_variant_as_common_type, false, "Use Variant as a result type for if/multiIf in case when there is no common type for arguments", 0) \
M(Bool, enable_order_by_all, true, "Enable sorting expression ORDER BY ALL.", 0) \
M(Bool, traverse_shadow_remote_data_paths, false, "Traverse shadow directory when query system.remote_data_paths", 0) \
M(Bool, geo_distance_returns_float64_on_float64_arguments, true, "If all four arguments to `geoDistance`, `greatCircleDistance`, `greatCircleAngle` functions are Float64, return Float64 and use double precision for internal calculations. In previous ClickHouse versions, the functions always returned Float32.", 0) \
M(Bool, allow_get_client_http_header, false, "Allow to use the function `getClientHTTPHeader` which lets to obtain a value of an the current HTTP request's header. It is not enabled by default for security reasons, because some headers, such as `Cookie`, could contain sensitive info. Note that the `X-ClickHouse-*` and `Authentication` headers are always restricted and cannot be obtained with this function.", 0) \
\
/** Experimental functions */ \
M(Bool, allow_experimental_materialized_postgresql_table, false, "Allows to use the MaterializedPostgreSQL table engine. Disabled by default, because this feature is experimental", 0) \
@ -1090,14 +1099,15 @@ class IColumn;
M(UInt64, output_format_pretty_max_rows, 10000, "Rows limit for Pretty formats.", 0) \
M(UInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.", 0) \
M(UInt64, output_format_pretty_max_value_width, 10000, "Maximum width of value to display in Pretty formats. If greater - it will be cut.", 0) \
M(UInt64, output_format_pretty_max_value_width_apply_for_single_value, false, "Only cut values (see the `output_format_pretty_max_value_width` setting) when it is not a single value in a block. Otherwise output it entirely, which is useful for the `SHOW CREATE TABLE` query.", 0) \
M(UInt64Auto, output_format_pretty_color, "auto", "Use ANSI escape sequences in Pretty formats. 0 - disabled, 1 - enabled, 'auto' - enabled if a terminal.", 0) \
M(String, output_format_pretty_grid_charset, "UTF-8", "Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default one).", 0) \
M(UInt64, output_format_parquet_row_group_size, 1000000, "Target row group size in rows.", 0) \
M(UInt64, output_format_parquet_row_group_size_bytes, 512 * 1024 * 1024, "Target row group size in bytes, before compression.", 0) \
M(Bool, output_format_parquet_string_as_string, false, "Use Parquet String type instead of Binary for String columns.", 0) \
M(Bool, output_format_parquet_string_as_string, true, "Use Parquet String type instead of Binary for String columns.", 0) \
M(Bool, output_format_parquet_fixed_string_as_fixed_byte_array, true, "Use Parquet FIXED_LENGTH_BYTE_ARRAY type instead of Binary for FixedString columns.", 0) \
M(ParquetVersion, output_format_parquet_version, "2.latest", "Parquet format version for output format. Supported versions: 1.0, 2.4, 2.6 and 2.latest (default)", 0) \
M(ParquetCompression, output_format_parquet_compression_method, "lz4", "Compression method for Parquet output format. Supported codecs: snappy, lz4, brotli, zstd, gzip, none (uncompressed)", 0) \
M(ParquetCompression, output_format_parquet_compression_method, "zstd", "Compression method for Parquet output format. Supported codecs: snappy, lz4, brotli, zstd, gzip, none (uncompressed)", 0) \
M(Bool, output_format_parquet_compliant_nested_types, true, "In parquet file schema, use name 'element' instead of 'item' for list elements. This is a historical artifact of Arrow library implementation. Generally increases compatibility, except perhaps with some old versions of Arrow.", 0) \
M(Bool, output_format_parquet_use_custom_encoder, false, "Use a faster Parquet encoder implementation.", 0) \
M(Bool, output_format_parquet_parallel_encoding, true, "Do Parquet encoding in multiple threads. Requires output_format_parquet_use_custom_encoder.", 0) \
@ -1138,7 +1148,7 @@ class IColumn;
\
M(Bool, output_format_enable_streaming, false, "Enable streaming in output formats that support it.", 0) \
M(Bool, output_format_write_statistics, true, "Write statistics about read rows, bytes, time elapsed in suitable output formats.", 0) \
M(Bool, output_format_pretty_row_numbers, false, "Add row numbers before each row for pretty output format", 0) \
M(Bool, output_format_pretty_row_numbers, true, "Add row numbers before each row for pretty output format", 0) \
M(Bool, output_format_pretty_highlight_digit_groups, true, "If enabled and if output is a terminal, highlight every digit corresponding to the number of thousands, millions, etc. with underline.", 0) \
M(UInt64, output_format_pretty_single_large_number_tip_threshold, 1'000'000, "Print a readable number tip on the right side of the table if the block consists of a single number which exceeds this value (except 0)", 0) \
M(Bool, insert_distributed_one_random_shard, false, "If setting is enabled, inserting into distributed table will choose a random shard to write when there is no sharding key", 0) \
@ -1149,12 +1159,12 @@ class IColumn;
M(Bool, output_format_arrow_low_cardinality_as_dictionary, false, "Enable output LowCardinality type as Dictionary Arrow type", 0) \
M(Bool, output_format_arrow_use_signed_indexes_for_dictionary, true, "Use signed integers for dictionary indexes in Arrow format", 0) \
M(Bool, output_format_arrow_use_64_bit_indexes_for_dictionary, false, "Always use 64 bit integers for dictionary indexes in Arrow format", 0) \
M(Bool, output_format_arrow_string_as_string, false, "Use Arrow String type instead of Binary for String columns", 0) \
M(Bool, output_format_arrow_string_as_string, true, "Use Arrow String type instead of Binary for String columns", 0) \
M(Bool, output_format_arrow_fixed_string_as_fixed_byte_array, true, "Use Arrow FIXED_SIZE_BINARY type instead of Binary for FixedString columns.", 0) \
M(ArrowCompression, output_format_arrow_compression_method, "lz4_frame", "Compression method for Arrow output format. Supported codecs: lz4_frame, zstd, none (uncompressed)", 0) \
\
M(Bool, output_format_orc_string_as_string, false, "Use ORC String type instead of Binary for String columns", 0) \
M(ORCCompression, output_format_orc_compression_method, "lz4", "Compression method for ORC output format. Supported codecs: lz4, snappy, zlib, zstd, none (uncompressed)", 0) \
M(Bool, output_format_orc_string_as_string, true, "Use ORC String type instead of Binary for String columns", 0) \
M(ORCCompression, output_format_orc_compression_method, "zstd", "Compression method for ORC output format. Supported codecs: lz4, snappy, zlib, zstd, none (uncompressed)", 0) \
M(UInt64, output_format_orc_row_index_stride, 10'000, "Target row index stride in ORC output format", 0) \
\
M(CapnProtoEnumComparingMode, format_capn_proto_enum_comparising_mode, FormatSettings::CapnProtoEnumComparingMode::BY_VALUES, "How to map ClickHouse Enum and CapnProto Enum", 0) \

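This hunk flips `output_format_arrow_string_as_string` and `output_format_orc_string_as_string` to `true`, so a ClickHouse `String` column is written with the format's string type rather than `Binary`. That string type is UTF-8 only, while a ClickHouse `String` may hold arbitrary bytes, so a value that is not well-formed UTF-8 ends up under a type that promises UTF-8. To make "well-formed" concrete, here is a minimal validator following the Unicode table of well-formed byte sequences. `isValidUTF8` is a hypothetical helper for illustration, not something this change adds, and nothing in the diff suggests the writers run such a check.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `s` is well-formed UTF-8, i.e. no overlong forms,
// no UTF-16 surrogates (U+D800..U+DFFF) and nothing above U+10FFFF.
bool isValidUTF8(std::string_view s)
{
    size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80)
        {
            ++i; // plain ASCII byte
            continue;
        }
        size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
        if (c >= 0xC2 && c <= 0xDF)
            len = 2;
        else if (c >= 0xE0 && c <= 0xEF)
        {
            len = 3;
            if (c == 0xE0) lo = 0xA0;      // reject overlong 3-byte forms
            else if (c == 0xED) hi = 0x9F; // reject surrogates
        }
        else if (c >= 0xF0 && c <= 0xF4)
        {
            len = 4;
            if (c == 0xF0) lo = 0x90;      // reject overlong 4-byte forms
            else if (c == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
        }
        else
            return false; // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if (i + len > s.size())
            return false; // truncated sequence
        const unsigned char second = static_cast<unsigned char>(s[i + 1]);
        if (second < lo || second > hi)
            return false;
        for (size_t k = 2; k < len; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if (cont < 0x80 || cont > 0xBF)
                return false; // every further byte must be a continuation byte
        }
        i += len;
    }
    return true;
}
```

Where strictness matters more than convenience, setting `output_format_arrow_string_as_string` (or its ORC/Parquet counterpart) back to `false` keeps the `Binary` mapping.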

@@ -97,6 +97,7 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"parallel_replicas_allow_in_with_subquery", false, true, "If true, subquery for IN will be executed on every follower replica"},
{"log_processors_profiles", false, true, "Enable by default"},
{"function_locate_has_mysql_compatible_argument_order", false, true, "Increase compatibility with MySQL's locate function."},
{"allow_suspicious_primary_key", true, false, "Forbid suspicious PRIMARY KEY/ORDER BY for MergeTree (i.e. SimpleAggregateFunction)"},
{"filesystem_cache_reserve_space_wait_lock_timeout_milliseconds", 1000, 1000, "Wait time to lock cache for space reservation in filesystem cache"},
{"max_parser_backtracks", 0, 1000000, "Limiting the complexity of parsing"},
{"analyzer_compatibility_join_using_top_level_identifier", false, false, "Force to resolve identifier in JOIN USING from projection"},
@@ -105,7 +106,22 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"keeper_retry_max_backoff_ms", 5000, 5000, "Max backoff timeout for general keeper operations"},
{"s3queue_allow_experimental_sharded_mode", false, false, "Enable experimental sharded mode of S3Queue table engine. It is experimental because it will be rewritten"},
{"merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability", 0.0, 0.0, "For testing of `PartsSplitter` - split read ranges into intersecting and non intersecting every time you read from MergeTree with the specified probability."},
{"allow_get_client_http_header", false, false, "Introduced a new function."},
{"output_format_pretty_row_numbers", false, true, "It is better for usability."},
{"output_format_pretty_max_value_width_apply_for_single_value", true, false, "Single values in Pretty formats won't be cut."},
{"output_format_parquet_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."},
{"output_format_orc_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."},
{"output_format_arrow_string_as_string", false, true, "ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases."},
{"output_format_parquet_compression_method", "lz4", "zstd", "Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools, such as 'duckdb', lack support for the faster `lz4` compression method, that's why we set zstd by default."},
{"output_format_orc_compression_method", "lz4", "zstd", "Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools, such as 'duckdb', lack support for the faster `lz4` compression method, that's why we set zstd by default."},
{"output_format_pretty_highlight_digit_groups", false, true, "If enabled and if output is a terminal, highlight every digit corresponding to the number of thousands, millions, etc. with underline."},
{"geo_distance_returns_float64_on_float64_arguments", false, true, "Increase the default precision."},
{"azure_max_inflight_parts_for_one_file", 20, 20, "The maximum number of concurrently loaded parts in a multipart upload request. 0 means unlimited."},
{"azure_strict_upload_part_size", 0, 0, "The exact size of part to upload during multipart upload to Azure blob storage."},
{"azure_min_upload_part_size", 16*1024*1024, 16*1024*1024, "The minimum size of part to upload during multipart upload to Azure blob storage."},
{"azure_max_upload_part_size", 5ull*1024*1024*1024, 5ull*1024*1024*1024, "The maximum size of part to upload during multipart upload to Azure blob storage."},
{"azure_upload_part_size_multiply_factor", 2, 2, "Multiply azure_min_upload_part_size by this factor each time azure_multiply_parts_count_threshold parts were uploaded from a single write to Azure blob storage."},
{"azure_upload_part_size_multiply_parts_count_threshold", 500, 500, "Each time this number of parts was uploaded to Azure blob storage, azure_min_upload_part_size is multiplied by azure_upload_part_size_multiply_factor."},
}},
{"24.2", {{"allow_suspicious_variant_types", true, false, "Don't allow creating Variant type with suspicious variants by default"},
{"validate_experimental_and_suspicious_types_inside_nested_types", false, true, "Validate usage of experimental and suspicious types inside nested types"},

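Each entry above is `{name, previous default, new default, reason}`, filed under the release that changed it. A history like this is what lets the `compatibility` setting bring back an older release's defaults. A minimal sketch of that lookup, assuming versions compare as `(major, minor)` pairs and that every change made after the requested version is reverted to its recorded previous value. `SettingChange`, `ChangesHistory` and `compatibilityOverrides` are hypothetical simplifications; a real implementation would presumably also leave alone any setting the user has set explicitly.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified record of one default change.
struct SettingChange
{
    std::string name;
    std::string previous_value;
    std::string new_value;
};

// Hypothetical version key: (major, minor), ordered lexicographically.
using Version = std::pair<int, int>;
using ChangesHistory = std::map<Version, std::vector<SettingChange>>;

// Defaults to override so that behaviour matches `compat`: every change introduced
// in a version newer than `compat` is reverted to its previous value.
// Walking from newest to oldest lets the oldest recorded change win for each name.
std::map<std::string, std::string> compatibilityOverrides(const ChangesHistory & history, Version compat)
{
    std::map<std::string, std::string> overrides;
    for (auto it = history.rbegin(); it != history.rend() && it->first > compat; ++it)
        for (const auto & change : it->second)
            overrides[change.name] = change.previous_value;
    return overrides;
}
```

For instance, with one 24.2 entry and two 24.3 entries, `compatibilityOverrides(history, {24, 2})` returns only the two 24.3 settings with their previous values, e.g. `output_format_arrow_string_as_string = false`.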

@@ -18,6 +18,16 @@ namespace ErrorCodes
extern const int UNKNOWN_UNION;
}
template <typename Type>
constexpr auto getEnumValues()
{
std::array<std::pair<std::string_view, Type>, magic_enum::enum_count<Type>()> enum_values{};
size_t index = 0;
for (auto value : magic_enum::enum_values<Type>())
enum_values[index++] = std::pair{magic_enum::enum_name(value), value};
return enum_values;
}
IMPLEMENT_SETTING_ENUM(LoadBalancing, ErrorCodes::UNKNOWN_LOAD_BALANCING,
{{"random", LoadBalancing::RANDOM},
{"nearest_hostname", LoadBalancing::NEAREST_HOSTNAME},


@@ -13,6 +13,108 @@
namespace DB
{
template <typename Type>
constexpr auto getEnumValues();
/// NOLINTNEXTLINE
#define DECLARE_SETTING_ENUM(ENUM_TYPE) \
DECLARE_SETTING_ENUM_WITH_RENAME(ENUM_TYPE, ENUM_TYPE)
/// NOLINTNEXTLINE
#define DECLARE_SETTING_ENUM_WITH_RENAME(NEW_NAME, ENUM_TYPE) \
struct SettingField##NEW_NAME##Traits \
{ \
using EnumType = ENUM_TYPE; \
using EnumValuePairs = std::pair<const char *, EnumType>[]; \
static const String & toString(EnumType value); \
static EnumType fromString(std::string_view str); \
}; \
\
using SettingField##NEW_NAME = SettingFieldEnum<ENUM_TYPE, SettingField##NEW_NAME##Traits>;
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \
IMPLEMENT_SETTING_ENUM_IMPL(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, EnumValuePairs, __VA_ARGS__)
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_AUTO_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME) \
IMPLEMENT_SETTING_ENUM_IMPL(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, , getEnumValues<EnumType>())
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_ENUM_IMPL(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, PAIRS_TYPE, ...) \
const String & SettingField##NEW_NAME##Traits::toString(typename SettingField##NEW_NAME::EnumType value) \
{ \
static const std::unordered_map<EnumType, String> map = [] { \
std::unordered_map<EnumType, String> res; \
for (const auto & [name, val] : PAIRS_TYPE __VA_ARGS__) \
res.emplace(val, name); \
return res; \
}(); \
auto it = map.find(value); \
if (it != map.end()) \
return it->second; \
throw Exception(ERROR_CODE_FOR_UNEXPECTED_NAME, \
"Unexpected value of " #NEW_NAME ":{}", std::to_string(std::underlying_type_t<EnumType>(value))); \
} \
\
typename SettingField##NEW_NAME::EnumType SettingField##NEW_NAME##Traits::fromString(std::string_view str) \
{ \
static const std::unordered_map<std::string_view, EnumType> map = [] { \
std::unordered_map<std::string_view, EnumType> res; \
for (const auto & [name, val] : PAIRS_TYPE __VA_ARGS__) \
res.emplace(name, val); \
return res; \
}(); \
auto it = map.find(str); \
if (it != map.end()) \
return it->second; \
String msg; \
bool need_comma = false; \
for (auto & name : map | boost::adaptors::map_keys) \
{ \
if (std::exchange(need_comma, true)) \
msg += ", "; \
msg += "'" + String{name} + "'"; \
} \
throw Exception(ERROR_CODE_FOR_UNEXPECTED_NAME, "Unexpected value of " #NEW_NAME ": '{}'. Must be one of [{}]", String{str}, msg); \
}
/// NOLINTNEXTLINE
#define DECLARE_SETTING_MULTI_ENUM(ENUM_TYPE) \
DECLARE_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, ENUM_TYPE)
/// NOLINTNEXTLINE
#define DECLARE_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, NEW_NAME) \
struct SettingField##NEW_NAME##Traits \
{ \
using EnumType = ENUM_TYPE; \
using EnumValuePairs = std::pair<const char *, EnumType>[]; \
static size_t getEnumSize(); \
static const String & toString(EnumType value); \
static EnumType fromString(std::string_view str); \
}; \
\
using SettingField##NEW_NAME = SettingFieldMultiEnum<ENUM_TYPE, SettingField##NEW_NAME##Traits>; \
using NEW_NAME##List = typename SettingField##NEW_NAME::ValueType;
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_MULTI_ENUM(ENUM_TYPE, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \
IMPLEMENT_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, ERROR_CODE_FOR_UNEXPECTED_NAME, __VA_ARGS__)
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_MULTI_ENUM_WITH_RENAME(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \
IMPLEMENT_SETTING_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, __VA_ARGS__)\
size_t SettingField##NEW_NAME##Traits::getEnumSize() {\
return std::initializer_list<std::pair<const char*, NEW_NAME>> __VA_ARGS__ .size();\
}
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_MULTI_AUTO_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME) \
IMPLEMENT_SETTING_AUTO_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME)\
size_t SettingField##NEW_NAME##Traits::getEnumSize() {\
return getEnumValues<EnumType>().size();\
}
enum class LoadBalancing
{
/// among replicas with a minimum number of errors selected randomly

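`IMPLEMENT_SETTING_ENUM_IMPL` expands into two lookups, each backed by a function-local static `std::unordered_map` that is built on first use from the name/value pairs (or, in the `AUTO` variants, from `getEnumValues<EnumType>()`, which relies on `magic_enum`). An unknown name raises an error listing every accepted value. The same pattern written out by hand for a hypothetical three-value enum, with `std::invalid_argument` standing in for ClickHouse's `Exception` and error codes:

```cpp
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a settings enum such as LoadBalancing.
enum class Balancing { Random, NearestHostname, InOrder };

// Hypothetical name/value table, playing the role of the macro's __VA_ARGS__ list.
constexpr std::pair<std::string_view, Balancing> kBalancingPairs[] = {
    {"random", Balancing::Random},
    {"nearest_hostname", Balancing::NearestHostname},
    {"in_order", Balancing::InOrder},
};

struct BalancingTraits
{
    static const std::string & toString(Balancing value)
    {
        // Built once, on first call, like the function-local statics in the macro.
        static const std::unordered_map<Balancing, std::string> map = [] {
            std::unordered_map<Balancing, std::string> res;
            for (const auto & pair : kBalancingPairs)
                res.emplace(pair.second, std::string(pair.first));
            return res;
        }();
        auto it = map.find(value);
        if (it != map.end())
            return it->second;
        throw std::invalid_argument("Unexpected value of Balancing: " + std::to_string(static_cast<int>(value)));
    }

    static Balancing fromString(std::string_view str)
    {
        static const std::unordered_map<std::string_view, Balancing> map = [] {
            std::unordered_map<std::string_view, Balancing> res;
            for (const auto & pair : kBalancingPairs)
                res.emplace(pair.first, pair.second);
            return res;
        }();
        auto it = map.find(str);
        if (it != map.end())
            return it->second;
        // List every accepted name in the error message.
        std::string msg;
        bool need_comma = false;
        for (const auto & pair : kBalancingPairs)
        {
            if (std::exchange(need_comma, true))
                msg += ", ";
            msg += "'" + std::string(pair.first) + "'";
        }
        throw std::invalid_argument("Unexpected value of Balancing: '" + std::string(str) + "'. Must be one of [" + msg + "]");
    }
};
```

One difference: the macro collects the accepted names by iterating the unordered map, so their order in the message is unspecified; the sketch iterates the pair table to keep it stable.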

@@ -7,9 +7,7 @@
#include <Core/MultiEnum.h>
#include <boost/range/adaptor/map.hpp>
#include <chrono>
#include <unordered_map>
#include <string_view>
#include <magic_enum.hpp>
namespace DB
@@ -380,79 +378,6 @@ void SettingFieldEnum<EnumT, Traits>::readBinary(ReadBuffer & in)
*this = Traits::fromString(SettingFieldEnumHelpers::readBinary(in));
}
template <typename Type>
constexpr auto getEnumValues()
{
std::array<std::pair<std::string_view, Type>, magic_enum::enum_count<Type>()> enum_values{};
size_t index = 0;
for (auto value : magic_enum::enum_values<Type>())
enum_values[index++] = std::pair{magic_enum::enum_name(value), value};
return enum_values;
}
/// NOLINTNEXTLINE
#define DECLARE_SETTING_ENUM(ENUM_TYPE) \
DECLARE_SETTING_ENUM_WITH_RENAME(ENUM_TYPE, ENUM_TYPE)
/// NOLINTNEXTLINE
#define DECLARE_SETTING_ENUM_WITH_RENAME(NEW_NAME, ENUM_TYPE) \
struct SettingField##NEW_NAME##Traits \
{ \
using EnumType = ENUM_TYPE; \
using EnumValuePairs = std::pair<const char *, EnumType>[]; \
static const String & toString(EnumType value); \
static EnumType fromString(std::string_view str); \
}; \
\
using SettingField##NEW_NAME = SettingFieldEnum<ENUM_TYPE, SettingField##NEW_NAME##Traits>;
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \
IMPLEMENT_SETTING_ENUM_IMPL(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, EnumValuePairs, __VA_ARGS__)
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_AUTO_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME) \
IMPLEMENT_SETTING_ENUM_IMPL(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, , getEnumValues<EnumType>())
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_ENUM_IMPL(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, PAIRS_TYPE, ...) \
const String & SettingField##NEW_NAME##Traits::toString(typename SettingField##NEW_NAME::EnumType value) \
{ \
static const std::unordered_map<EnumType, String> map = [] { \
std::unordered_map<EnumType, String> res; \
for (const auto & [name, val] : PAIRS_TYPE __VA_ARGS__) \
res.emplace(val, name); \
return res; \
}(); \
auto it = map.find(value); \
if (it != map.end()) \
return it->second; \
throw Exception(ERROR_CODE_FOR_UNEXPECTED_NAME, \
"Unexpected value of " #NEW_NAME ":{}", std::to_string(std::underlying_type_t<EnumType>(value))); \
} \
\
typename SettingField##NEW_NAME::EnumType SettingField##NEW_NAME##Traits::fromString(std::string_view str) \
{ \
static const std::unordered_map<std::string_view, EnumType> map = [] { \
std::unordered_map<std::string_view, EnumType> res; \
for (const auto & [name, val] : PAIRS_TYPE __VA_ARGS__) \
res.emplace(name, val); \
return res; \
}(); \
auto it = map.find(str); \
if (it != map.end()) \
return it->second; \
String msg; \
bool need_comma = false; \
for (auto & name : map | boost::adaptors::map_keys) \
{ \
if (std::exchange(need_comma, true)) \
msg += ", "; \
msg += "'" + String{name} + "'"; \
} \
throw Exception(ERROR_CODE_FOR_UNEXPECTED_NAME, "Unexpected value of " #NEW_NAME ": '{}'. Must be one of [{}]", String{str}, msg); \
}
// Mostly like SettingFieldEnum, but can have multiple enum values (or none) set at once.
template <typename Enum, typename Traits>
struct SettingFieldMultiEnum
@@ -543,42 +468,6 @@ void SettingFieldMultiEnum<EnumT, Traits>::readBinary(ReadBuffer & in)
parseFromString(SettingFieldEnumHelpers::readBinary(in));
}
/// NOLINTNEXTLINE
#define DECLARE_SETTING_MULTI_ENUM(ENUM_TYPE) \
DECLARE_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, ENUM_TYPE)
/// NOLINTNEXTLINE
#define DECLARE_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, NEW_NAME) \
struct SettingField##NEW_NAME##Traits \
{ \
using EnumType = ENUM_TYPE; \
using EnumValuePairs = std::pair<const char *, EnumType>[]; \
static size_t getEnumSize(); \
static const String & toString(EnumType value); \
static EnumType fromString(std::string_view str); \
}; \
\
using SettingField##NEW_NAME = SettingFieldMultiEnum<ENUM_TYPE, SettingField##NEW_NAME##Traits>; \
using NEW_NAME##List = typename SettingField##NEW_NAME::ValueType;
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_MULTI_ENUM(ENUM_TYPE, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \
IMPLEMENT_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, ERROR_CODE_FOR_UNEXPECTED_NAME, __VA_ARGS__)
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_MULTI_ENUM_WITH_RENAME(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \
IMPLEMENT_SETTING_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, __VA_ARGS__)\
size_t SettingField##NEW_NAME##Traits::getEnumSize() {\
return std::initializer_list<std::pair<const char*, NEW_NAME>> __VA_ARGS__ .size();\
}
/// NOLINTNEXTLINE
#define IMPLEMENT_SETTING_MULTI_AUTO_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME) \
IMPLEMENT_SETTING_AUTO_ENUM(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME)\
size_t SettingField##NEW_NAME##Traits::getEnumSize() {\
return getEnumValues<EnumType>().size();\
}
/// Setting field for specifying user-defined timezone. It is basically a string, but it needs validation.
struct SettingFieldTimezone
{


@@ -71,6 +71,17 @@ void applyMetadataChangesToCreateQuery(const ASTPtr & query, const StorageInMemo
query->replace(ast_create_query.refresh_strategy, metadata.refresh);
}
if (metadata.sql_security_type)
{
auto new_sql_security = std::make_shared<ASTSQLSecurity>();
new_sql_security->type = metadata.sql_security_type;
if (metadata.definer)
new_sql_security->definer = std::make_shared<ASTUserNameWithHost>(*metadata.definer);
ast_create_query.sql_security = std::move(new_sql_security);
}
/// MaterializedView, Dictionary are types of CREATE query without storage.
if (ast_create_query.storage)
{


@@ -41,33 +41,6 @@ enum class AttributeUnderlyingType : TypeIndexUnderlying
#undef map_item
#define CALL_FOR_ALL_DICTIONARY_ATTRIBUTE_TYPES(M) \
M(UInt8) \
M(UInt16) \
M(UInt32) \
M(UInt64) \
M(UInt128) \
M(UInt256) \
M(Int8) \
M(Int16) \
M(Int32) \
M(Int64) \
M(Int128) \
M(Int256) \
M(Decimal32) \
M(Decimal64) \
M(Decimal128) \
M(Decimal256) \
M(DateTime64) \
M(Float32) \
M(Float64) \
M(UUID) \
M(IPv4) \
M(IPv6) \
M(String) \
M(Array)
/// Min and max lifetimes for a dictionary or its entry
using DictionaryLifetime = ExternalLoadableLifetime;


@@ -8,7 +8,7 @@
#include <Common/Throttler.h>
#include <base/sleep.h>
#include <Common/ProfileEvents.h>
#include <IO/SeekableReadBuffer.h>
namespace ProfileEvents
{
@@ -27,7 +27,6 @@ namespace ErrorCodes
extern const int LOGICAL_ERROR;
}
ReadBufferFromAzureBlobStorage::ReadBufferFromAzureBlobStorage(
std::shared_ptr<const Azure::Storage::Blobs::BlobContainerClient> blob_container_client_,
const String & path_,
@@ -56,7 +55,6 @@ ReadBufferFromAzureBlobStorage::ReadBufferFromAzureBlobStorage(
}
}
void ReadBufferFromAzureBlobStorage::setReadUntilEnd()
{
if (read_until_position)
@@ -105,7 +103,7 @@ bool ReadBufferFromAzureBlobStorage::nextImpl()
auto handle_exception = [&, this](const auto & e, size_t i)
{
LOG_INFO(log, "Exception caught during Azure Read for file {} at attempt {}/{}: {}", path, i + 1, max_single_read_retries, e.Message);
LOG_DEBUG(log, "Exception caught during Azure Read for file {} at attempt {}/{}: {}", path, i + 1, max_single_read_retries, e.Message);
if (i + 1 == max_single_read_retries)
throw;
@@ -139,7 +137,6 @@ bool ReadBufferFromAzureBlobStorage::nextImpl()
return true;
}
off_t ReadBufferFromAzureBlobStorage::seek(off_t offset_, int whence)
{
if (offset_ == getPosition() && whence == SEEK_SET)
@@ -193,13 +190,11 @@ off_t ReadBufferFromAzureBlobStorage::seek(off_t offset_, int whence)
return offset;
}
off_t ReadBufferFromAzureBlobStorage::getPosition()
{
return offset - available();
}
void ReadBufferFromAzureBlobStorage::initialize()
{
if (initialized)
@@ -220,7 +215,7 @@ void ReadBufferFromAzureBlobStorage::initialize()
auto handle_exception = [&, this](const auto & e, size_t i)
{
LOG_INFO(log, "Exception caught during Azure Download for file {} at offset {} at attempt {}/{}: {}", path, offset, i + 1, max_single_download_retries, e.Message);
LOG_DEBUG(log, "Exception caught during Azure Download for file {} at offset {} at attempt {}/{}: {}", path, offset, i + 1, max_single_download_retries, e.Message);
if (i + 1 == max_single_download_retries)
throw;
@@ -262,6 +257,47 @@ size_t ReadBufferFromAzureBlobStorage::getFileSize()
return *file_size;
}
size_t ReadBufferFromAzureBlobStorage::readBigAt(char * to, size_t n, size_t range_begin, const std::function<bool(size_t)> & /*progress_callback*/) const
{
size_t initial_n = n;
size_t sleep_time_with_backoff_milliseconds = 100;
for (size_t i = 0; i < max_single_download_retries && n > 0; ++i)
{
size_t bytes_copied = 0;
try
{
Azure::Storage::Blobs::DownloadBlobOptions download_options;
download_options.Range = {static_cast<int64_t>(range_begin), n};
auto download_response = blob_client->Download(download_options);
std::unique_ptr<Azure::Core::IO::BodyStream> body_stream = std::move(download_response.Value.BodyStream);
bytes_copied = body_stream->ReadToCount(reinterpret_cast<uint8_t *>(to), body_stream->Length());
LOG_TEST(log, "AzureBlobStorage readBigAt read bytes {}", bytes_copied);
if (read_settings.remote_throttler)
read_settings.remote_throttler->add(bytes_copied, ProfileEvents::RemoteReadThrottlerBytes, ProfileEvents::RemoteReadThrottlerSleepMicroseconds);
}
catch (const Azure::Core::RequestFailedException & e)
{
LOG_DEBUG(log, "Exception caught during Azure Download for file {} at offset {} at attempt {}/{}: {}", path, range_begin, i + 1, max_single_download_retries, e.Message);
if (i + 1 == max_single_download_retries)
throw;
sleepForMilliseconds(sleep_time_with_backoff_milliseconds);
sleep_time_with_backoff_milliseconds *= 2;
}
range_begin += bytes_copied;
to += bytes_copied;
n -= bytes_copied;
}
return initial_n;
}
}
#endif
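`readBigAt` reads a byte range with a fixed number of attempts: a failed attempt sleeps and doubles the backoff (starting at 100 ms), and bytes already copied are kept, so the next attempt only requests what is still missing. A self-contained sketch of the same loop, with a hypothetical `RangeReader` callback in place of the Azure `Download` call and `std::runtime_error` in place of `Azure::Core::RequestFailedException`:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical ranged-read primitive: copies up to `len` bytes starting at `begin`
// into `to`, returns how many were copied, throws std::runtime_error on a transient failure.
using RangeReader = std::function<size_t(char * to, size_t len, size_t begin)>;

// Read [range_begin, range_begin + n) with bounded retries and exponential backoff.
// As in the diff, every attempt (successful or not) uses up one retry,
// and progress made by earlier attempts is kept.
size_t readRangeWithRetries(const RangeReader & read, char * to, size_t n, size_t range_begin,
                            size_t max_retries, std::chrono::milliseconds backoff = std::chrono::milliseconds(100))
{
    const size_t initial_n = n;
    for (size_t i = 0; i < max_retries && n > 0; ++i)
    {
        size_t bytes_copied = 0;
        try
        {
            bytes_copied = read(to, n, range_begin);
        }
        catch (const std::runtime_error &)
        {
            if (i + 1 == max_retries)
                throw; // out of attempts: surface the last error
            std::this_thread::sleep_for(backoff);
            backoff *= 2;
        }
        range_begin += bytes_copied;
        to += bytes_copied;
        n -= bytes_copied;
    }
    // Report what was actually read rather than assuming the whole range arrived.
    return initial_n - n;
}
```

Unlike the code in the diff, which returns `initial_n`, the sketch returns the number of bytes it actually copied, so a caller can detect a range that was cut short after the attempts ran out.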


@@ -44,6 +44,10 @@ public:
size_t getFileSize() override;
size_t readBigAt(char * to, size_t n, size_t range_begin, const std::function<bool(size_t)> & progress_callback) const override;
bool supportsReadAt() override { return true; }
private:
void initialize();


@@ -18,21 +18,48 @@ namespace ProfileEvents
namespace DB
{
struct WriteBufferFromAzureBlobStorage::PartData
{
Memory<> memory;
size_t data_size = 0;
std::string block_id;
};
BufferAllocationPolicyPtr createBufferAllocationPolicy(const AzureObjectStorageSettings & settings)
{
BufferAllocationPolicy::Settings allocation_settings;
allocation_settings.strict_size = settings.strict_upload_part_size;
allocation_settings.min_size = settings.min_upload_part_size;
allocation_settings.max_size = settings.max_upload_part_size;
allocation_settings.multiply_factor = settings.upload_part_size_multiply_factor;
allocation_settings.multiply_parts_count_threshold = settings.upload_part_size_multiply_parts_count_threshold;
allocation_settings.max_single_size = settings.max_single_part_upload_size;
return BufferAllocationPolicy::create(allocation_settings);
}
WriteBufferFromAzureBlobStorage::WriteBufferFromAzureBlobStorage(
std::shared_ptr<const Azure::Storage::Blobs::BlobContainerClient> blob_container_client_,
const String & blob_path_,
size_t max_single_part_upload_size_,
size_t max_unexpected_write_error_retries_,
size_t buf_size_,
const WriteSettings & write_settings_)
const WriteSettings & write_settings_,
std::shared_ptr<const AzureObjectStorageSettings> settings_,
ThreadPoolCallbackRunner<void> schedule_)
: WriteBufferFromFileBase(buf_size_, nullptr, 0)
, log(getLogger("WriteBufferFromAzureBlobStorage"))
, max_single_part_upload_size(max_single_part_upload_size_)
, max_unexpected_write_error_retries(max_unexpected_write_error_retries_)
, buffer_allocation_policy(createBufferAllocationPolicy(*settings_))
, max_single_part_upload_size(settings_->max_single_part_upload_size)
, max_unexpected_write_error_retries(settings_->max_unexpected_write_error_retries)
, blob_path(blob_path_)
, write_settings(write_settings_)
, blob_container_client(blob_container_client_)
, task_tracker(
std::make_unique<TaskTracker>(
std::move(schedule_),
settings_->max_inflight_parts_for_one_file,
limitedLog))
{
allocateBuffer();
}
@@ -77,62 +104,73 @@ void WriteBufferFromAzureBlobStorage::execWithRetry(std::function<void()> func,
void WriteBufferFromAzureBlobStorage::finalizeImpl()
{
auto block_blob_client = blob_container_client->GetBlockBlobClient(blob_path);
/// If there is only one block and size is less than or equal to max_single_part_upload_size
/// then we use single part upload instead of multi part upload
if (buffer_allocation_policy->getBufferNumber() == 1)
{
size_t data_size = size_t(position() - memory.data());
if (data_size <= max_single_part_upload_size)
{
Azure::Core::IO::MemoryBodyStream memory_stream(reinterpret_cast<const uint8_t *>(memory.data()), data_size);
execWithRetry([&](){ block_blob_client.Upload(memory_stream); }, max_unexpected_write_error_retries, data_size);
LOG_TRACE(log, "Committed single block for blob `{}`", blob_path);
return;
}
}
execWithRetry([this](){ next(); }, max_unexpected_write_error_retries);
if (tmp_buffer_write_offset > 0)
uploadBlock(tmp_buffer->data(), tmp_buffer_write_offset);
task_tracker->waitAll();
auto block_blob_client = blob_container_client->GetBlockBlobClient(blob_path);
execWithRetry([&](){ block_blob_client.CommitBlockList(block_ids); }, max_unexpected_write_error_retries);
LOG_TRACE(log, "Committed {} blocks for blob `{}`", block_ids.size(), blob_path);
}
void WriteBufferFromAzureBlobStorage::uploadBlock(const char * data, size_t size)
{
auto block_blob_client = blob_container_client->GetBlockBlobClient(blob_path);
const std::string & block_id = block_ids.emplace_back(getRandomASCIIString(64));
Azure::Core::IO::MemoryBodyStream memory_stream(reinterpret_cast<const uint8_t *>(data), size);
execWithRetry([&](){ block_blob_client.StageBlock(block_id, memory_stream); }, max_unexpected_write_error_retries, size);
tmp_buffer_write_offset = 0;
LOG_TRACE(log, "Staged block (id: {}) of size {} (blob path: {}).", block_id, size, blob_path);
}
WriteBufferFromAzureBlobStorage::MemoryBufferPtr WriteBufferFromAzureBlobStorage::allocateBuffer() const
{
return std::make_unique<Memory<>>(max_single_part_upload_size);
}
void WriteBufferFromAzureBlobStorage::nextImpl()
{
size_t size_to_upload = offset();
task_tracker->waitIfAny();
writePart();
allocateBuffer();
}
if (size_to_upload == 0)
void WriteBufferFromAzureBlobStorage::allocateBuffer()
{
buffer_allocation_policy->nextBuffer();
auto size = buffer_allocation_policy->getBufferSize();
if (buffer_allocation_policy->getBufferNumber() == 1)
size = std::min(size_t(DBMS_DEFAULT_BUFFER_SIZE), size);
memory = Memory(size);
WriteBuffer::set(memory.data(), memory.size());
}
void WriteBufferFromAzureBlobStorage::writePart()
{
auto data_size = size_t(position() - memory.data());
if (data_size == 0)
return;
if (!tmp_buffer)
tmp_buffer = allocateBuffer();
const std::string & block_id = block_ids.emplace_back(getRandomASCIIString(64));
std::shared_ptr<PartData> part_data = std::make_shared<PartData>(std::move(memory), data_size, block_id);
WriteBuffer::set(nullptr, 0);
size_t uploaded_size = 0;
while (uploaded_size != size_to_upload)
auto upload_worker = [this, part_data] ()
{
size_t memory_buffer_remaining_size = max_single_part_upload_size - tmp_buffer_write_offset;
if (memory_buffer_remaining_size == 0)
uploadBlock(tmp_buffer->data(), tmp_buffer->size());
auto block_blob_client = blob_container_client->GetBlockBlobClient(blob_path);
size_t size = std::min(memory_buffer_remaining_size, size_to_upload - uploaded_size);
memcpy(tmp_buffer->data() + tmp_buffer_write_offset, working_buffer.begin() + uploaded_size, size);
uploaded_size += size;
tmp_buffer_write_offset += size;
}
Azure::Core::IO::MemoryBodyStream memory_stream(reinterpret_cast<const uint8_t *>(part_data->memory.data()), part_data->data_size);
execWithRetry([&](){ block_blob_client.StageBlock(part_data->block_id, memory_stream); }, max_unexpected_write_error_retries, part_data->data_size);
if (tmp_buffer_write_offset == max_single_part_upload_size)
uploadBlock(tmp_buffer->data(), tmp_buffer->size());
if (write_settings.remote_throttler)
write_settings.remote_throttler->add(part_data->data_size, ProfileEvents::RemoteWriteThrottlerBytes, ProfileEvents::RemoteWriteThrottlerSleepMicroseconds);
};
if (write_settings.remote_throttler)
write_settings.remote_throttler->add(size_to_upload, ProfileEvents::RemoteWriteThrottlerBytes, ProfileEvents::RemoteWriteThrottlerSleepMicroseconds);
task_tracker->add(std::move(upload_worker));
}
}
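The new write path no longer copies into a fixed `max_single_part_upload_size` buffer; each part's buffer size comes from the `BufferAllocationPolicy` built by `createBufferAllocationPolicy`. Going by the setting descriptions in this change, `strict_upload_part_size` fixes every part to one size; otherwise parts start at `min_upload_part_size` and are multiplied by `upload_part_size_multiply_factor` each time `upload_part_size_multiply_parts_count_threshold` parts have been uploaded, never exceeding `max_upload_part_size`. A simplified model of that rule, with `PartSizePolicy` as a hypothetical name; it ignores details of the real policy, such as how `max_single_size` may affect the first buffer.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical, simplified model of the part-size rule described by the azure_* settings.
// Defaults mirror the defaults listed in the settings history above.
struct PartSizePolicy
{
    size_t strict_size = 0;                     // 0 = sizes are not fixed
    size_t min_size = 16 * 1024 * 1024;         // azure_min_upload_part_size
    size_t max_size = 5ull * 1024 * 1024 * 1024; // azure_max_upload_part_size
    size_t multiply_factor = 2;
    size_t multiply_parts_count_threshold = 500;

    // Buffer size for the part with 1-based index `part_number`.
    size_t sizeForPart(size_t part_number) const
    {
        if (strict_size != 0)
            return strict_size;
        size_t size = min_size;
        // One multiplication for every full `threshold` parts already produced.
        const size_t growth_steps = (part_number - 1) / multiply_parts_count_threshold;
        for (size_t i = 0; i < growth_steps && size < max_size; ++i)
            size *= multiply_factor;
        return std::min(size, max_size);
    }
};
```

On top of the policy, `allocateBuffer` caps the first buffer at `DBMS_DEFAULT_BUFFER_SIZE`, and `finalizeImpl` sends a single `Upload` when only that first buffer was used and it fits into `max_single_part_upload_size`; otherwise the staged blocks are committed with `CommitBlockList`.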


@@ -11,7 +11,9 @@
#include <IO/WriteSettings.h>
#include <azure/storage/blobs.hpp>
#include <azure/core/io/body_stream.hpp>
#include <Common/ThreadPoolTaskTracker.h>
#include <Common/BufferAllocationPolicy.h>
#include <Storages/StorageAzureBlob.h>
namespace Poco
{
@@ -21,6 +23,8 @@ class Logger;
namespace DB
{
class TaskTracker;
class WriteBufferFromAzureBlobStorage : public WriteBufferFromFileBase
{
public:
@@ -29,10 +33,10 @@ public:
WriteBufferFromAzureBlobStorage(
AzureClientPtr blob_container_client_,
const String & blob_path_,
size_t max_single_part_upload_size_,
size_t max_unexpected_write_error_retries_,
size_t buf_size_,
const WriteSettings & write_settings_);
const WriteSettings & write_settings_,
std::shared_ptr<const AzureObjectStorageSettings> settings_,
ThreadPoolCallbackRunner<void> schedule_ = {});
~WriteBufferFromAzureBlobStorage() override;
@@ -42,11 +46,19 @@ public:
void sync() override { next(); }
private:
struct PartData;
void writePart();
void allocateBuffer();
void finalizeImpl() override;
void execWithRetry(std::function<void()> func, size_t num_tries, size_t cost = 0);
void uploadBlock(const char * data, size_t size);
LoggerPtr log;
LogSeriesLimiterPtr limitedLog = std::make_shared<LogSeriesLimiter>(log, 1, 5);
BufferAllocationPolicyPtr buffer_allocation_policy;
const size_t max_single_part_upload_size;
const size_t max_unexpected_write_error_retries;
@@ -61,6 +73,10 @@ private:
size_t tmp_buffer_write_offset = 0;
MemoryBufferPtr allocateBuffer() const;
bool first_buffer=true;
std::unique_ptr<TaskTracker> task_tracker;
};
}
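Uploads are now scheduled through a `TaskTracker`: `nextImpl` first calls `waitIfAny`, then `writePart` hands the filled buffer to `task_tracker->add` as an upload task, and `finalizeImpl` calls `waitAll` before `CommitBlockList`, with `max_inflight_parts_for_one_file` (0 meaning unlimited) bounding concurrency. The sketch below models only the bounding behaviour with `std::async`. `BoundedTaskTracker` is hypothetical: it runs each task on its own thread rather than a shared pool and waits for the oldest task when the limit is hit, which may differ from how the real tracker chooses what to wait for.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <utility>

// Hypothetical, simplified stand-in for TaskTracker: at most `max_inflight` tasks
// run at once (0 = unlimited); add() waits for the oldest task when the limit is reached.
class BoundedTaskTracker
{
public:
    explicit BoundedTaskTracker(size_t max_inflight_) : max_inflight(max_inflight_) {}

    void add(std::function<void()> task)
    {
        if (max_inflight != 0 && inflight.size() >= max_inflight)
            waitOldest();
        inflight.push_back(std::async(std::launch::async, std::move(task)));
    }

    // Block until every scheduled task has finished; rethrows the first failure encountered.
    void waitAll()
    {
        while (!inflight.empty())
            waitOldest();
    }

private:
    void waitOldest()
    {
        std::future<void> future = std::move(inflight.front());
        inflight.pop_front();
        future.get(); // rethrows an exception thrown inside the task
    }

    size_t max_inflight;
    std::deque<std::future<void>> inflight;
};
```

Waiting for every task before committing matters here because `CommitBlockList` must only list blocks that have actually been staged.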


@@ -212,17 +212,23 @@ std::unique_ptr<BlobContainerClient> getAzureBlobContainerClient(
std::unique_ptr<AzureObjectStorageSettings> getAzureBlobStorageSettings(const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr context)
{
return std::make_unique<AzureObjectStorageSettings>(
config.getUInt64(config_prefix + ".max_single_part_upload_size", 100 * 1024 * 1024),
config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024),
config.getInt(config_prefix + ".max_single_read_retries", 3),
config.getInt(config_prefix + ".max_single_download_retries", 3),
config.getInt(config_prefix + ".list_object_keys_size", 1000),
config.getUInt64(config_prefix + ".max_upload_part_size", 5ULL * 1024 * 1024 * 1024),
config.getUInt64(config_prefix + ".max_single_part_copy_size", context->getSettings().azure_max_single_part_copy_size),
config.getBool(config_prefix + ".use_native_copy", false),
config.getUInt64(config_prefix + ".max_unexpected_write_error_retries", context->getSettings().azure_max_unexpected_write_error_retries)
);
std::unique_ptr<AzureObjectStorageSettings> settings = std::make_unique<AzureObjectStorageSettings>();
settings->max_single_part_upload_size = config.getUInt64(config_prefix + ".max_single_part_upload_size", context->getSettings().azure_max_single_part_upload_size);
settings->min_bytes_for_seek = config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024);
settings->max_single_read_retries = config.getInt(config_prefix + ".max_single_read_retries", 3);
settings->max_single_download_retries = config.getInt(config_prefix + ".max_single_download_retries", 3);
settings->list_object_keys_size = config.getInt(config_prefix + ".list_object_keys_size", 1000);
settings->min_upload_part_size = config.getUInt64(config_prefix + ".min_upload_part_size", context->getSettings().azure_min_upload_part_size);
settings->max_upload_part_size = config.getUInt64(config_prefix + ".max_upload_part_size", context->getSettings().azure_max_upload_part_size);
settings->max_single_part_copy_size = config.getUInt64(config_prefix + ".max_single_part_copy_size", context->getSettings().azure_max_single_part_copy_size);
settings->use_native_copy = config.getBool(config_prefix + ".use_native_copy", false);
settings->max_unexpected_write_error_retries = config.getUInt64(config_prefix + ".max_unexpected_write_error_retries", context->getSettings().azure_max_unexpected_write_error_retries);
settings->max_inflight_parts_for_one_file = config.getUInt64(config_prefix + ".max_inflight_parts_for_one_file", context->getSettings().azure_max_inflight_parts_for_one_file);
settings->strict_upload_part_size = config.getUInt64(config_prefix + ".strict_upload_part_size", context->getSettings().azure_strict_upload_part_size);
settings->upload_part_size_multiply_factor = config.getUInt64(config_prefix + ".upload_part_size_multiply_factor", context->getSettings().azure_upload_part_size_multiply_factor);
settings->upload_part_size_multiply_parts_count_threshold = config.getUInt64(config_prefix + ".upload_part_size_multiply_parts_count_threshold", context->getSettings().azure_upload_part_size_multiply_parts_count_threshold);
return settings;
}
}


@@ -265,10 +265,9 @@ std::unique_ptr<WriteBufferFromFileBase> AzureObjectStorage::writeObject( /// NO
return std::make_unique<WriteBufferFromAzureBlobStorage>(
client.get(),
object.remote_path,
settings.get()->max_single_part_upload_size,
settings.get()->max_unexpected_write_error_retries,
buf_size,
patchSettings(write_settings));
patchSettings(write_settings),
settings.get());
}
/// Remove file. Throws an exception if the file doesn't exist or is a directory.

@ -24,19 +24,29 @@ struct AzureObjectStorageSettings
int max_single_read_retries_,
int max_single_download_retries_,
int list_object_keys_size_,
size_t min_upload_part_size_,
size_t max_upload_part_size_,
size_t max_single_part_copy_size_,
bool use_native_copy_,
size_t max_unexpected_write_error_retries_)
size_t max_unexpected_write_error_retries_,
size_t max_inflight_parts_for_one_file_,
size_t strict_upload_part_size_,
size_t upload_part_size_multiply_factor_,
size_t upload_part_size_multiply_parts_count_threshold_)
: max_single_part_upload_size(max_single_part_upload_size_)
, min_bytes_for_seek(min_bytes_for_seek_)
, max_single_read_retries(max_single_read_retries_)
, max_single_download_retries(max_single_download_retries_)
, list_object_keys_size(list_object_keys_size_)
, min_upload_part_size(min_upload_part_size_)
, max_upload_part_size(max_upload_part_size_)
, max_single_part_copy_size(max_single_part_copy_size_)
, use_native_copy(use_native_copy_)
, max_unexpected_write_error_retries (max_unexpected_write_error_retries_)
, max_unexpected_write_error_retries(max_unexpected_write_error_retries_)
, max_inflight_parts_for_one_file(max_inflight_parts_for_one_file_)
, strict_upload_part_size(strict_upload_part_size_)
, upload_part_size_multiply_factor(upload_part_size_multiply_factor_)
, upload_part_size_multiply_parts_count_threshold(upload_part_size_multiply_parts_count_threshold_)
{
}
@ -52,6 +62,10 @@ struct AzureObjectStorageSettings
size_t max_single_part_copy_size = 256 * 1024 * 1024;
bool use_native_copy = false;
size_t max_unexpected_write_error_retries = 4;
size_t max_inflight_parts_for_one_file = 20;
size_t strict_upload_part_size = 0;
size_t upload_part_size_multiply_factor = 2;
size_t upload_part_size_multiply_parts_count_threshold = 500;
};
using AzureClient = Azure::Storage::Blobs::BlobContainerClient;

@ -167,6 +167,7 @@ FormatSettings getFormatSettings(const ContextPtr & context, const Settings & se
format_settings.pretty.max_column_pad_width = settings.output_format_pretty_max_column_pad_width;
format_settings.pretty.max_rows = settings.output_format_pretty_max_rows;
format_settings.pretty.max_value_width = settings.output_format_pretty_max_value_width;
format_settings.pretty.max_value_width_apply_for_single_value = settings.output_format_pretty_max_value_width_apply_for_single_value;
format_settings.pretty.highlight_digit_groups = settings.output_format_pretty_highlight_digit_groups;
format_settings.pretty.output_format_pretty_row_numbers = settings.output_format_pretty_row_numbers;
format_settings.pretty.output_format_pretty_single_large_number_tip_threshold = settings.output_format_pretty_single_large_number_tip_threshold;

@ -275,6 +275,7 @@ struct FormatSettings
UInt64 max_rows = 10000;
UInt64 max_column_pad_width = 250;
UInt64 max_value_width = 10000;
UInt64 max_value_width_apply_for_single_value = false;
bool highlight_digit_groups = true;
SettingFieldUInt64Auto color{"auto"};

@ -14,6 +14,8 @@ extract_into_parent_list(clickhouse_functions_sources dbms_sources
multiMatchAny.cpp
checkHyperscanRegexp.cpp
array/has.cpp
equals.cpp
notEquals.cpp
CastOverloadResolver.cpp
)
extract_into_parent_list(clickhouse_functions_headers dbms_headers

@ -348,6 +348,7 @@ public:
String getName() const override { return Name::name; }
bool useDefaultImplementationForNulls() const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
bool useDefaultImplementationForLowCardinalityColumns() const override { return false; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
{
@ -469,9 +470,6 @@ public:
else
return_type = json_return_type;
/// Top-level LowCardinality columns are processed outside JSON parser.
json_return_type = removeLowCardinality(json_return_type);
DataTypes argument_types;
argument_types.reserve(arguments.size());
for (const auto & argument : arguments)
@ -867,11 +865,9 @@ struct JSONExtractTree
explicit LowCardinalityFixedStringNode(const size_t fixed_length_) : fixed_length(fixed_length_) { }
bool insertResultToColumn(IColumn & dest, const Element & element) override
{
// If element is an object we delegate the insertion to JSONExtractRawImpl
if (element.isObject())
// For types other than string, delegate the insertion to JSONExtractRawImpl.
if (!element.isString())
return JSONExtractRawImpl<JSONParser>::insertResultToLowCardinalityFixedStringColumn(dest, element, fixed_length);
else if (!element.isString())
return false;
auto str = element.getString();
if (str.size() > fixed_length)
@ -1486,9 +1482,6 @@ public:
// We use insertResultToLowCardinalityFixedStringColumn in case we are inserting raw data in a Low Cardinality FixedString column
static bool insertResultToLowCardinalityFixedStringColumn(IColumn & dest, const Element & element, size_t fixed_length)
{
if (element.getObject().size() > fixed_length)
return false;
ColumnFixedString::Chars chars;
WriteBufferFromVector<ColumnFixedString::Chars> buf(chars, AppendModeTag());
traverse(element, buf);

@ -13,6 +13,11 @@ REGISTER_FUNCTION(Equals)
factory.registerFunction<FunctionEquals>();
}
FunctionOverloadResolverPtr createInternalFunctionEqualOverloadResolver(bool decimal_check_overflow)
{
return std::make_unique<FunctionToOverloadResolverAdaptor>(std::make_shared<FunctionEquals>(decimal_check_overflow));
}
template <>
ColumnPtr FunctionComparison<EqualsOp, NameEquals>::executeTupleImpl(
const ColumnsWithTypeAndName & x, const ColumnsWithTypeAndName & y, size_t tuple_size, size_t input_rows_count) const

src/Functions/equals.h

@ -0,0 +1,11 @@
#pragma once
#include <memory>
namespace DB
{
class IFunctionOverloadResolver;
using FunctionOverloadResolverPtr = std::shared_ptr<IFunctionOverloadResolver>;
FunctionOverloadResolverPtr createInternalFunctionEqualOverloadResolver(bool decimal_check_overflow);
}
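`equals.h` (like `notEquals.h`) exposes only a factory function plus a forward declaration of `IFunctionOverloadResolver`, so code that needs an internal `equals` does not have to include the heavy comparison-function headers. The single-file sketch below shows that header/implementation split with hypothetical names (`IComparator`, `createInternalEqualComparator`); in a real project the two marked parts would live in a `.h` and a `.cpp`.

```cpp
#include <cassert>
#include <memory>

// ---- "header" part: a forward declaration and a factory only ----
class IComparator;  // incomplete here; includers never need its definition
using ComparatorPtr = std::shared_ptr<IComparator>;
ComparatorPtr createInternalEqualComparator(bool check_overflow);

// ---- "implementation" part: full definitions, visible to this file only ----
class IComparator
{
public:
    virtual ~IComparator() = default;
    virtual bool compare(long long a, long long b) const = 0;
};

namespace
{
class EqualComparator final : public IComparator
{
public:
    explicit EqualComparator(bool check_overflow_) : check_overflow(check_overflow_) {}
    bool compare(long long a, long long b) const override { return a == b; }
    bool checksOverflow() const { return check_overflow; }

private:
    bool check_overflow;  // mirrors decimal_check_overflow; irrelevant for this integer toy
};
}

ComparatorPtr createInternalEqualComparator(bool check_overflow)
{
    return std::make_shared<EqualComparator>(check_overflow);
}
```

Callers hold only `ComparatorPtr`; changing `EqualComparator` recompiles one translation unit instead of every includer.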

@ -0,0 +1,96 @@
#include <Functions/IFunction.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <DataTypes/DataTypeString.h>
#include <Columns/ColumnString.h>
#include <Interpreters/Context.h>
#include <Core/Field.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int FUNCTION_NOT_ALLOWED;
}
namespace
{
class FunctionGetClientHTTPHeader : public IFunction, WithContext
{
public:
explicit FunctionGetClientHTTPHeader(ContextPtr context_)
: WithContext(context_)
{
if (!getContext()->getSettingsRef().allow_get_client_http_header)
throw Exception(ErrorCodes::FUNCTION_NOT_ALLOWED, "The function getClientHTTPHeader requires setting `allow_get_client_http_header` to be enabled.");
}
String getName() const override { return "getClientHTTPHeader"; }
bool useDefaultImplementationForConstants() const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo &) const override { return false; }
size_t getNumberOfArguments() const override
{
return 1;
}
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isString(arguments[0]))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "The argument of function {} must be String", getName());
return std::make_shared<DataTypeString>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
{
const ClientInfo & client_info = getContext()->getClientInfo();
const auto & source = arguments[0].column;
auto result = result_type->createColumn();
result->reserve(input_rows_count);
for (size_t row = 0; row < input_rows_count; ++row)
{
Field header;
source->get(row, header);
if (auto it = client_info.http_headers.find(header.get<String>()); it != client_info.http_headers.end())
result->insert(it->second);
else
result->insertDefault();
}
return result;
}
};
}
REGISTER_FUNCTION(GetClientHTTPHeader)
{
factory.registerFunction("getClientHTTPHeader",
[](ContextPtr context) { return std::make_shared<FunctionGetClientHTTPHeader>(context); },
FunctionDocumentation{
.description = R"(
Get the value of an HTTP header.
If there is no such header or the current request is not performed via the HTTP interface, the function returns an empty string.
Certain HTTP headers (e.g., `Authentication` and `X-ClickHouse-*`) are restricted.
The function requires the setting `allow_get_client_http_header` to be enabled.
The setting is not enabled by default for security reasons, because some headers, such as `Cookie`, could contain sensitive information.
HTTP headers are case-sensitive for this function.
If the function is used in the context of a distributed query, it returns a non-empty result only on the initiator node.
",
.syntax = "getClientHTTPHeader(name)",
.arguments = {{"name", "The HTTP header name (String)"}},
.returned_value = "The value of the header (String).",
.categories{"Miscellaneous"}});
}
}
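For each row, `executeImpl` looks the requested name up in `client_info.http_headers` and inserts the column default (an empty string) when the header is absent. The sketch below reproduces only that lookup, with a `std::map` standing in for the header container (an assumption; the real member type may differ) and a hypothetical `lookupHeaders` helper. It also shows the documented case sensitivity, which differs from HTTP itself, where field names are case-insensitive (RFC 9110).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-request header container.
using HeaderMap = std::map<std::string, std::string>;

// For each requested name, return the header value, or an empty string if it is absent.
std::vector<std::string> lookupHeaders(const HeaderMap & headers, const std::vector<std::string> & names)
{
    std::vector<std::string> result;
    result.reserve(names.size());
    for (const auto & name : names)
    {
        auto it = headers.find(name);  // exact, case-sensitive key comparison
        result.push_back(it != headers.end() ? it->second : std::string{});
    }
    return result;
}
```

A lookup for `x-tenant` therefore misses a header sent as `X-Tenant`, and both a missing header and a non-HTTP request surface as the same empty string.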

@ -7,7 +7,6 @@
#include <Functions/PerformanceAdaptors.h>
#include <Interpreters/castColumn.h>
#include <Common/TargetSpecific.h>
#include <base/range.h>
#include <cmath>
#include <numbers>
@ -42,121 +41,6 @@ namespace ErrorCodes
namespace
{
constexpr double PI = std::numbers::pi_v<double>;
constexpr float PI_F = std::numbers::pi_v<float>;
constexpr float RAD_IN_DEG = static_cast<float>(PI / 180.0);
constexpr float RAD_IN_DEG_HALF = static_cast<float>(PI / 360.0);
constexpr size_t COS_LUT_SIZE = 1024; // maxerr 0.00063%
constexpr float COS_LUT_SIZE_F = 1024.0f; // maxerr 0.00063%
constexpr size_t ASIN_SQRT_LUT_SIZE = 512;
constexpr size_t METRIC_LUT_SIZE = 1024;
/** Earth radius in meters using WGS84 authalic radius.
* We use this value to be consistent with H3 library.
*/
constexpr float EARTH_RADIUS = 6371007.180918475f;
constexpr float EARTH_DIAMETER = 2 * EARTH_RADIUS;
float cos_lut[COS_LUT_SIZE + 1]; /// cos(x) table
float asin_sqrt_lut[ASIN_SQRT_LUT_SIZE + 1]; /// asin(sqrt(x)) * earth_diameter table
float sphere_metric_lut[METRIC_LUT_SIZE + 1]; /// sphere metric, unitless: the distance in degrees for one degree across longitude depending on latitude
float sphere_metric_meters_lut[METRIC_LUT_SIZE + 1]; /// sphere metric: the distance in meters for one degree across longitude depending on latitude
float wgs84_metric_meters_lut[2 * (METRIC_LUT_SIZE + 1)]; /// ellipsoid metric: the distance in meters across one degree latitude/longitude depending on latitude
inline double sqr(double v)
{
return v * v;
}
inline float sqrf(float v)
{
return v * v;
}
void geodistInit()
{
for (size_t i = 0; i <= COS_LUT_SIZE; ++i)
cos_lut[i] = static_cast<float>(cos(2 * PI * i / COS_LUT_SIZE)); // [0, 2 * pi] -> [0, COS_LUT_SIZE]
for (size_t i = 0; i <= ASIN_SQRT_LUT_SIZE; ++i)
asin_sqrt_lut[i] = static_cast<float>(asin(
sqrt(static_cast<double>(i) / ASIN_SQRT_LUT_SIZE))); // [0, 1] -> [0, ASIN_SQRT_LUT_SIZE]
for (size_t i = 0; i <= METRIC_LUT_SIZE; ++i)
{
double latitude = i * (PI / METRIC_LUT_SIZE) - PI * 0.5; // [-pi / 2, pi / 2] -> [0, METRIC_LUT_SIZE]
/// Squared metric coefficients (for the distance in meters) on a tangent plane, for latitude and longitude (in degrees),
/// depending on the latitude (in radians).
/// https://github.com/mapbox/cheap-ruler/blob/master/index.js#L67
wgs84_metric_meters_lut[i * 2] = static_cast<float>(sqr(111132.09 - 566.05 * cos(2 * latitude) + 1.20 * cos(4 * latitude)));
wgs84_metric_meters_lut[i * 2 + 1] = static_cast<float>(sqr(111415.13 * cos(latitude) - 94.55 * cos(3 * latitude) + 0.12 * cos(5 * latitude)));
sphere_metric_meters_lut[i] = static_cast<float>(sqr((EARTH_DIAMETER * PI / 360) * cos(latitude)));
sphere_metric_lut[i] = static_cast<float>(sqr(cos(latitude)));
}
}
inline NO_SANITIZE_UNDEFINED size_t floatToIndex(float x)
{
/// Implementation specific behaviour on overflow or infinite value.
return static_cast<size_t>(x);
}
inline float geodistDegDiff(float f)
{
f = fabsf(f);
if (f > 180)
f = 360 - f;
return f;
}
inline float geodistFastCos(float x)
{
float y = fabsf(x) * (COS_LUT_SIZE_F / PI_F / 2.0f);
size_t i = floatToIndex(y);
y -= i;
i &= (COS_LUT_SIZE - 1);
return cos_lut[i] + (cos_lut[i + 1] - cos_lut[i]) * y;
}
inline float geodistFastSin(float x)
{
float y = fabsf(x) * (COS_LUT_SIZE_F / PI_F / 2.0f);
size_t i = floatToIndex(y);
y -= i;
i = (i - COS_LUT_SIZE / 4) & (COS_LUT_SIZE - 1); // cos(x - pi / 2) = sin(x), costable / 4 = pi / 2
return cos_lut[i] + (cos_lut[i + 1] - cos_lut[i]) * y;
}
/// fast implementation of asin(sqrt(x))
/// max error in floats 0.00369%, in doubles 0.00072%
inline float geodistFastAsinSqrt(float x)
{
if (x < 0.122f)
{
// distance under 4546 km, Taylor error under 0.00072%
float y = sqrtf(x);
return y + x * y * 0.166666666666666f + x * x * y * 0.075f + x * x * x * y * 0.044642857142857f;
}
if (x < 0.948f)
{
// distance under 17083 km, 512-entry LUT error under 0.00072%
x *= ASIN_SQRT_LUT_SIZE;
size_t i = floatToIndex(x);
return asin_sqrt_lut[i] + (asin_sqrt_lut[i + 1] - asin_sqrt_lut[i]) * (x - i);
}
return asinf(sqrtf(x)); // distance over 17083 km, just compute exact
}
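The three constants in the Taylor branch of `geodistFastAsinSqrt` (carried over unchanged into `Impl<T>::fastAsinSqrt`) come from the Maclaurin series of arcsine, rewritten in terms of $y = \sqrt{x}$ so that each odd power of $y$ becomes $x^k y$:

```latex
\begin{aligned}
\arcsin y &= \sum_{n \ge 0} \frac{(2n)!}{4^n (n!)^2 (2n+1)}\, y^{2n+1}
           = y + \tfrac{1}{6} y^3 + \tfrac{3}{40} y^5 + \tfrac{5}{112} y^7 + \tfrac{35}{1152} y^9 + \cdots \\
\arcsin\sqrt{x} &\approx y + \tfrac{1}{6}\, x y + \tfrac{3}{40}\, x^2 y + \tfrac{5}{112}\, x^3 y,
  \qquad y = \sqrt{x},\quad y^{2k+1} = x^k y
\end{aligned}
```

With $\tfrac{1}{6} \approx 0.1666667$, $\tfrac{3}{40} = 0.075$ and $\tfrac{5}{112} \approx 0.0446429$ these are the literals in the code. At the branch limit $x = 0.122$ ($y \approx 0.3493$) the omitted tail, led by $\tfrac{35}{1152}y^9 \approx 2.35\times10^{-6}$, sums to about $2.6\times10^{-6}$, roughly $0.00072\%$ of $\arcsin\sqrt{0.122} \approx 0.3568$, which agrees with the error quoted in the comment.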
enum class Method
{
SPHERE_DEGREES,
@ -164,18 +48,117 @@ enum class Method
WGS84_METERS,
};
}
constexpr size_t ASIN_SQRT_LUT_SIZE = 512;
constexpr size_t COS_LUT_SIZE = 1024; // maxerr 0.00063%
constexpr size_t METRIC_LUT_SIZE = 1024;
/// Earth radius in meters using WGS84 authalic radius.
/// We use this value to be consistent with H3 library.
constexpr double EARTH_RADIUS = 6371007.180918475;
constexpr double EARTH_DIAMETER = 2.0 * EARTH_RADIUS;
constexpr double PI = std::numbers::pi_v<double>;
template <typename T>
T sqr(T v) { return v * v; }
template <typename T>
struct Impl
{
T cos_lut[COS_LUT_SIZE + 1]; /// cos(x) table
T asin_sqrt_lut[ASIN_SQRT_LUT_SIZE + 1]; /// asin(sqrt(x)) table; callers scale the result (by EARTH_DIAMETER or 360 / pi)
T sphere_metric_lut[METRIC_LUT_SIZE + 1]; /// sphere metric, unitless: the distance in degrees for one degree across longitude depending on latitude
T sphere_metric_meters_lut[METRIC_LUT_SIZE + 1]; /// sphere metric: the distance in meters for one degree across longitude depending on latitude
T wgs84_metric_meters_lut[2 * (METRIC_LUT_SIZE + 1)]; /// ellipsoid metric: the distance in meters across one degree latitude/longitude depending on latitude
Impl()
{
for (size_t i = 0; i <= COS_LUT_SIZE; ++i)
cos_lut[i] = T(std::cos(2 * PI * static_cast<double>(i) / COS_LUT_SIZE)); // [0, 2 * pi] -> [0, COS_LUT_SIZE]
for (size_t i = 0; i <= ASIN_SQRT_LUT_SIZE; ++i)
asin_sqrt_lut[i] = T(std::asin(std::sqrt(static_cast<double>(i) / ASIN_SQRT_LUT_SIZE))); // [0, 1] -> [0, ASIN_SQRT_LUT_SIZE]
for (size_t i = 0; i <= METRIC_LUT_SIZE; ++i)
{
double latitude = i * (PI / METRIC_LUT_SIZE) - PI * 0.5; // [-pi / 2, pi / 2] -> [0, METRIC_LUT_SIZE]
/// Squared metric coefficients (for the distance in meters) on a tangent plane, for latitude and longitude (in degrees),
/// depending on the latitude (in radians).
/// https://github.com/mapbox/cheap-ruler/blob/master/index.js#L67
wgs84_metric_meters_lut[i * 2] = T(sqr(111132.09 - 566.05 * std::cos(2.0 * latitude) + 1.20 * std::cos(4.0 * latitude)));
wgs84_metric_meters_lut[i * 2 + 1] = T(sqr(111415.13 * std::cos(latitude) - 94.55 * std::cos(3.0 * latitude) + 0.12 * std::cos(5.0 * latitude)));
sphere_metric_meters_lut[i] = T(sqr((EARTH_DIAMETER * PI / 360) * std::cos(latitude)));
sphere_metric_lut[i] = T(sqr(std::cos(latitude)));
}
}
static inline NO_SANITIZE_UNDEFINED size_t toIndex(T x)
{
/// Implementation specific behaviour on overflow or infinite value.
return static_cast<size_t>(x);
}
static inline T degDiff(T f)
{
f = std::abs(f);
if (f > 180)
f = 360 - f;
return f;
}
inline T fastCos(T x)
{
T y = std::abs(x) * (T(COS_LUT_SIZE) / T(PI) / T(2.0));
size_t i = toIndex(y);
y -= i;
i &= (COS_LUT_SIZE - 1);
return cos_lut[i] + (cos_lut[i + 1] - cos_lut[i]) * y;
}
inline T fastSin(T x)
{
T y = std::abs(x) * (T(COS_LUT_SIZE) / T(PI) / T(2.0));
size_t i = toIndex(y);
y -= i;
i = (i - COS_LUT_SIZE / 4) & (COS_LUT_SIZE - 1); // cos(x - pi / 2) = sin(x), costable / 4 = pi / 2
return cos_lut[i] + (cos_lut[i + 1] - cos_lut[i]) * y;
}
/// fast implementation of asin(sqrt(x))
/// max error in floats 0.00369%, in doubles 0.00072%
inline T fastAsinSqrt(T x)
{
if (x < T(0.122))
{
// distance under 4546 km, Taylor error under 0.00072%
T y = std::sqrt(x);
return y + x * y * T(0.166666666666666) + x * x * y * T(0.075) + x * x * x * y * T(0.044642857142857);
}
if (x < T(0.948))
{
// distance under 17083 km, 512-entry LUT error under 0.00072%
x *= ASIN_SQRT_LUT_SIZE;
size_t i = toIndex(x);
return asin_sqrt_lut[i] + (asin_sqrt_lut[i + 1] - asin_sqrt_lut[i]) * (x - i);
}
return std::asin(std::sqrt(x)); /// distance is over 17083 km, just compute exact
}
};
template <typename T> Impl<T> impl;
DECLARE_MULTITARGET_CODE(
namespace
{
template <Method method>
float distance(float lon1deg, float lat1deg, float lon2deg, float lat2deg)
template <Method method, typename T>
T distance(T lon1deg, T lat1deg, T lon2deg, T lat2deg)
{
float lat_diff = geodistDegDiff(lat1deg - lat2deg);
float lon_diff = geodistDegDiff(lon1deg - lon2deg);
T lat_diff = impl<T>.degDiff(lat1deg - lat2deg);
T lon_diff = impl<T>.degDiff(lon1deg - lon2deg);
if (lon_diff < 13)
{
@ -187,51 +170,54 @@ float distance(float lon1deg, float lat1deg, float lon2deg, float lat2deg)
/// (Remember how a plane flies from Amsterdam to New York)
/// But if longitude is close but latitude is different enough, there is no difference between meridian and great circle line.
float latitude_midpoint = (lat1deg + lat2deg + 180) * METRIC_LUT_SIZE / 360; // [-90, 90] degrees -> [0, METRIC_LUT_SIZE] indexes
size_t latitude_midpoint_index = floatToIndex(latitude_midpoint) & (METRIC_LUT_SIZE - 1);
T latitude_midpoint = (lat1deg + lat2deg + 180) * METRIC_LUT_SIZE / 360; // [-90, 90] degrees -> [0, METRIC_LUT_SIZE] indexes
size_t latitude_midpoint_index = impl<T>.toIndex(latitude_midpoint) & (METRIC_LUT_SIZE - 1);
/// This is linear interpolation between two table items at index "latitude_midpoint_index" and "latitude_midpoint_index + 1".
float k_lat{};
float k_lon{};
T k_lat{};
T k_lon{};
if constexpr (method == Method::SPHERE_DEGREES)
{
k_lat = 1;
k_lon = sphere_metric_lut[latitude_midpoint_index]
+ (sphere_metric_lut[latitude_midpoint_index + 1] - sphere_metric_lut[latitude_midpoint_index]) * (latitude_midpoint - latitude_midpoint_index);
k_lon = impl<T>.sphere_metric_lut[latitude_midpoint_index]
+ (impl<T>.sphere_metric_lut[latitude_midpoint_index + 1] - impl<T>.sphere_metric_lut[latitude_midpoint_index]) * (latitude_midpoint - latitude_midpoint_index);
}
else if constexpr (method == Method::SPHERE_METERS)
{
k_lat = sqrf(EARTH_DIAMETER * PI_F / 360.0f);
k_lat = sqr(T(EARTH_DIAMETER) * T(PI) / T(360.0));
k_lon = sphere_metric_meters_lut[latitude_midpoint_index]
+ (sphere_metric_meters_lut[latitude_midpoint_index + 1] - sphere_metric_meters_lut[latitude_midpoint_index]) * (latitude_midpoint - latitude_midpoint_index);
k_lon = impl<T>.sphere_metric_meters_lut[latitude_midpoint_index]
+ (impl<T>.sphere_metric_meters_lut[latitude_midpoint_index + 1] - impl<T>.sphere_metric_meters_lut[latitude_midpoint_index]) * (latitude_midpoint - latitude_midpoint_index);
}
else if constexpr (method == Method::WGS84_METERS)
{
k_lat = wgs84_metric_meters_lut[latitude_midpoint_index * 2]
+ (wgs84_metric_meters_lut[(latitude_midpoint_index + 1) * 2] - wgs84_metric_meters_lut[latitude_midpoint_index * 2]) * (latitude_midpoint - latitude_midpoint_index);
k_lat = impl<T>.wgs84_metric_meters_lut[latitude_midpoint_index * 2]
+ (impl<T>.wgs84_metric_meters_lut[(latitude_midpoint_index + 1) * 2] - impl<T>.wgs84_metric_meters_lut[latitude_midpoint_index * 2]) * (latitude_midpoint - latitude_midpoint_index);
k_lon = wgs84_metric_meters_lut[latitude_midpoint_index * 2 + 1]
+ (wgs84_metric_meters_lut[(latitude_midpoint_index + 1) * 2 + 1] - wgs84_metric_meters_lut[latitude_midpoint_index * 2 + 1]) * (latitude_midpoint - latitude_midpoint_index);
k_lon = impl<T>.wgs84_metric_meters_lut[latitude_midpoint_index * 2 + 1]
+ (impl<T>.wgs84_metric_meters_lut[(latitude_midpoint_index + 1) * 2 + 1] - impl<T>.wgs84_metric_meters_lut[latitude_midpoint_index * 2 + 1]) * (latitude_midpoint - latitude_midpoint_index);
}
/// Metric on a tangent plane: it differs from Euclidean metric only by scale of coordinates.
return sqrtf(k_lat * lat_diff * lat_diff + k_lon * lon_diff * lon_diff);
return std::sqrt(k_lat * lat_diff * lat_diff + k_lon * lon_diff * lon_diff);
}
else
{
// points too far away; use haversine
/// Points are too far away: use Haversine.
float a = sqrf(geodistFastSin(lat_diff * RAD_IN_DEG_HALF))
+ geodistFastCos(lat1deg * RAD_IN_DEG) * geodistFastCos(lat2deg * RAD_IN_DEG) * sqrf(geodistFastSin(lon_diff * RAD_IN_DEG_HALF));
static constexpr T RAD_IN_DEG = T(PI / 180.0);
static constexpr T RAD_IN_DEG_HALF = T(PI / 360.0);
T a = sqr(impl<T>.fastSin(lat_diff * RAD_IN_DEG_HALF))
+ impl<T>.fastCos(lat1deg * RAD_IN_DEG) * impl<T>.fastCos(lat2deg * RAD_IN_DEG) * sqr(impl<T>.fastSin(lon_diff * RAD_IN_DEG_HALF));
if constexpr (method == Method::SPHERE_DEGREES)
return (360.0f / PI_F) * geodistFastAsinSqrt(a);
return (T(360.0) / T(PI)) * impl<T>.fastAsinSqrt(a);
else
return EARTH_DIAMETER * geodistFastAsinSqrt(a);
return T(EARTH_DIAMETER) * impl<T>.fastAsinSqrt(a);
}
}
@ -241,13 +227,24 @@ template <Method method>
class FunctionGeoDistance : public IFunction
{
public:
static constexpr auto name =
(method == Method::SPHERE_DEGREES) ? "greatCircleAngle"
: ((method == Method::SPHERE_METERS) ? "greatCircleDistance"
: "geoDistance");
explicit FunctionGeoDistance(ContextPtr context)
{
always_float32 = !context->getSettingsRef().geo_distance_returns_float64_on_float64_arguments;
}
private:
String getName() const override { return name; }
bool always_float32;
String getName() const override
{
if constexpr (method == Method::SPHERE_DEGREES)
return "greatCircleAngle";
if constexpr (method == Method::SPHERE_METERS)
return "greatCircleDistance";
else
return "geoDistance";
}
size_t getNumberOfArguments() const override { return 4; }
bool useDefaultImplementationForConstants() const override { return true; }
@ -255,22 +252,31 @@ private:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
for (const auto arg_idx : collections::range(0, arguments.size()))
bool has_float64 = false;
for (size_t arg_idx = 0; arg_idx < 4; ++arg_idx)
{
const auto * arg = arguments[arg_idx].get();
if (!isNumber(WhichDataType(arg)))
WhichDataType which(arguments[arg_idx]);
if (!isNumber(which))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument {} of function {}. "
"Must be numeric", arg->getName(), std::to_string(arg_idx + 1), getName());
"Must be numeric", arguments[arg_idx]->getName(), std::to_string(arg_idx + 1), getName());
if (which.isFloat64())
has_float64 = true;
}
return std::make_shared<DataTypeFloat32>();
if (has_float64 && !always_float32)
return std::make_shared<DataTypeFloat64>();
else
return std::make_shared<DataTypeFloat32>();
}
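The new `getReturnTypeImpl` returns `Float64` when any argument is `Float64` and `geo_distance_returns_float64_on_float64_arguments` is enabled, instead of always `Float32`. One way to see why that matters is the spacing of representable coordinates: near 180 degrees a `Float32` can only step by $2^{-16}$ degrees, about 1.7 m at the equator, while a `Float64` steps by nanometers. The helper below measures that gap; its name and the 111,320 m-per-degree scale (equatorial longitude, consistent with the cheap-ruler constants at latitude 0) are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed scale: meters per degree of longitude at the equator.
constexpr double meters_per_degree = 111320.0;

// Gap, in meters, between `deg` and the next larger value representable in type T.
template <typename T>
double coordinateStepMeters(T deg)
{
    T next = std::nextafter(deg, deg + T(1));
    return static_cast<double>(next - deg) * meters_per_degree;
}
```

`coordinateStepMeters(180.0f)` is roughly 1.7 m, whereas `coordinateStepMeters(180.0)` is about 3e-9 m, so inputs and results kept in `Float32` cannot resolve differences much below a meter at large longitudes.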
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
{
auto dst = ColumnVector<Float32>::create();
auto & dst_data = dst->getData();
dst_data.resize(input_rows_count);
bool returns_float64 = WhichDataType(result_type).isFloat64();
auto dst = result_type->createColumn();
auto arguments_copy = arguments;
for (auto & argument : arguments_copy)
@ -280,10 +286,24 @@ private:
argument.type = result_type;
}
const auto * col_lon1 = convertArgumentColumnToFloat32(arguments_copy, 0);
const auto * col_lat1 = convertArgumentColumnToFloat32(arguments_copy, 1);
const auto * col_lon2 = convertArgumentColumnToFloat32(arguments_copy, 2);
const auto * col_lat2 = convertArgumentColumnToFloat32(arguments_copy, 3);
if (returns_float64)
run<Float64>(arguments_copy, dst, input_rows_count);
else
run<Float32>(arguments_copy, dst, input_rows_count);
return dst;
}
template <typename T>
void run(const ColumnsWithTypeAndName & arguments, MutableColumnPtr & dst, size_t input_rows_count) const
{
const auto * col_lon1 = convertArgumentColumn<T>(arguments, 0);
const auto * col_lat1 = convertArgumentColumn<T>(arguments, 1);
const auto * col_lon2 = convertArgumentColumn<T>(arguments, 2);
const auto * col_lat2 = convertArgumentColumn<T>(arguments, 3);
auto & dst_data = assert_cast<ColumnVector<T> &>(*dst).getData();
dst_data.resize(input_rows_count);
for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
{
@ -291,20 +311,20 @@ private:
col_lon1->getData()[row_num], col_lat1->getData()[row_num],
col_lon2->getData()[row_num], col_lat2->getData()[row_num]);
}
return dst;
}
const ColumnFloat32 * convertArgumentColumnToFloat32(const ColumnsWithTypeAndName & arguments, size_t argument_index) const
template <typename T>
const ColumnVector<T> * convertArgumentColumn(const ColumnsWithTypeAndName & arguments, size_t argument_index) const
{
const auto * column_typed = checkAndGetColumn<ColumnFloat32>(arguments[argument_index].column.get());
const auto * column_typed = checkAndGetColumn<ColumnVector<T>>(arguments[argument_index].column.get());
if (!column_typed)
throw Exception(
ErrorCodes::ILLEGAL_COLUMN,
"Illegal type {} of argument {} of function {}. Must be Float32.",
"Illegal type {} of argument {} of function {}. Must be {}.",
arguments[argument_index].type->getName(),
argument_index + 1,
getName());
getName(),
TypeName<T>);
return column_typed;
}
@ -316,18 +336,19 @@ template <Method method>
class FunctionGeoDistance : public TargetSpecific::Default::FunctionGeoDistance<method>
{
public:
explicit FunctionGeoDistance(ContextPtr context) : selector(context)
explicit FunctionGeoDistance(ContextPtr context)
: TargetSpecific::Default::FunctionGeoDistance<method>(context), selector(context)
{
selector.registerImplementation<TargetArch::Default,
TargetSpecific::Default::FunctionGeoDistance<method>>();
TargetSpecific::Default::FunctionGeoDistance<method>>(context);
#if USE_MULTITARGET_CODE
selector.registerImplementation<TargetArch::AVX,
TargetSpecific::AVX::FunctionGeoDistance<method>>();
TargetSpecific::AVX::FunctionGeoDistance<method>>(context);
selector.registerImplementation<TargetArch::AVX2,
TargetSpecific::AVX2::FunctionGeoDistance<method>>();
TargetSpecific::AVX2::FunctionGeoDistance<method>>(context);
selector.registerImplementation<TargetArch::AVX512F,
TargetSpecific::AVX512F::FunctionGeoDistance<method>>();
TargetSpecific::AVX512F::FunctionGeoDistance<method>>(context);
#endif
}
@ -345,12 +366,13 @@ private:
ImplementationSelector<IFunction> selector;
};
}
REGISTER_FUNCTION(GeoDistance)
{
geodistInit();
factory.registerFunction<FunctionGeoDistance<Method::SPHERE_DEGREES>>();
factory.registerFunction<FunctionGeoDistance<Method::SPHERE_METERS>>();
factory.registerFunction<FunctionGeoDistance<Method::WGS84_METERS>>();
factory.registerFunction("greatCircleAngle", [](ContextPtr context) { return std::make_shared<FunctionGeoDistance<Method::SPHERE_DEGREES>>(std::move(context)); });
factory.registerFunction("greatCircleDistance", [](ContextPtr context) { return std::make_shared<FunctionGeoDistance<Method::SPHERE_METERS>>(std::move(context)); });
factory.registerFunction("geoDistance", [](ContextPtr context) { return std::make_shared<FunctionGeoDistance<Method::WGS84_METERS>>(std::move(context)); });
}
}

@ -14,4 +14,9 @@ REGISTER_FUNCTION(ScalarSubqueryResult)
factory.registerFunction<FunctionScalarSubqueryResult>();
}
REGISTER_FUNCTION(ActionName)
{
factory.registerFunction<FunctionActionName>();
}
}

@ -42,4 +42,18 @@ struct ScalarSubqueryResultName
using FunctionIdentity = FunctionIdentityBase<IdentityName>;
using FunctionScalarSubqueryResult = FunctionIdentityBase<ScalarSubqueryResultName>;
struct ActionNameName
{
static constexpr auto name = "__actionName";
};
class FunctionActionName : public FunctionIdentityBase<ActionNameName>
{
public:
using FunctionIdentityBase::FunctionIdentityBase;
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionActionName>(); }
size_t getNumberOfArguments() const override { return 2; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
};
}

@ -12,6 +12,11 @@ REGISTER_FUNCTION(NotEquals)
factory.registerFunction<FunctionNotEquals>();
}
FunctionOverloadResolverPtr createInternalFunctionNotEqualOverloadResolver(bool decimal_check_overflow)
{
return std::make_unique<FunctionToOverloadResolverAdaptor>(std::make_shared<FunctionNotEquals>(decimal_check_overflow));
}
template <>
ColumnPtr FunctionComparison<NotEqualsOp, NameNotEquals>::executeTupleImpl(
const ColumnsWithTypeAndName & x, const ColumnsWithTypeAndName & y, size_t tuple_size, size_t input_rows_count) const

src/Functions/notEquals.h

@ -0,0 +1,11 @@
#pragma once
#include <memory>
namespace DB
{
class IFunctionOverloadResolver;
using FunctionOverloadResolverPtr = std::shared_ptr<IFunctionOverloadResolver>;
FunctionOverloadResolverPtr createInternalFunctionNotEqualOverloadResolver(bool decimal_check_overflow);
}

@ -0,0 +1,11 @@
#include <IO/UncompressedCache.h>
namespace DB
{
template class CacheBase<UInt128, UncompressedCacheCell, UInt128TrivialHash, UncompressedSizeWeightFunction>;
UncompressedCache::UncompressedCache(const String & cache_policy, size_t max_size_in_bytes, double size_ratio)
: Base(cache_policy, max_size_in_bytes, 0, size_ratio)
{
}
}
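The diff moves the `UncompressedCache` constructor out of line and pairs an `extern template class CacheBase<...>` declaration in the header with a `template class CacheBase<...>` definition in `UncompressedCache.cpp`. This is C++ explicit instantiation: files that include the header are told not to implicitly instantiate the non-inline members of that specialization, and exactly one translation unit emits them, presumably to reduce compile time and object size. A single-file sketch with a hypothetical `SimpleCache` template (normally the two marked parts live in a header and in one `.cpp`):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

template <typename Key, typename Value>
class SimpleCache
{
public:
    explicit SimpleCache(size_t max_size_in_bytes_);
    size_t maxSize() const;

private:
    size_t max_size_in_bytes;
};

// Out-of-class (non-inline) member definitions, which `extern template` can suppress.
template <typename Key, typename Value>
SimpleCache<Key, Value>::SimpleCache(size_t max_size_in_bytes_) : max_size_in_bytes(max_size_in_bytes_) {}

template <typename Key, typename Value>
size_t SimpleCache<Key, Value>::maxSize() const { return max_size_in_bytes; }

// Header part: explicit instantiation declaration ("instantiated elsewhere").
extern template class SimpleCache<std::string, int>;

// Exactly one .cpp: explicit instantiation definition that emits the code.
template class SimpleCache<std::string, int>;
```

The declaration must precede the definition when both appear in one translation unit, as here.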

@ -33,6 +33,7 @@ struct UncompressedSizeWeightFunction
}
};
extern template class CacheBase<UInt128, UncompressedCacheCell, UInt128TrivialHash, UncompressedSizeWeightFunction>;
/** Cache of decompressed blocks for implementation of CachedCompressedReadBuffer. thread-safe.
*/
@ -42,8 +43,7 @@ private:
using Base = CacheBase<UInt128, UncompressedCacheCell, UInt128TrivialHash, UncompressedSizeWeightFunction>;
public:
UncompressedCache(const String & cache_policy, size_t max_size_in_bytes, double size_ratio)
: Base(cache_policy, max_size_in_bytes, 0, size_ratio) {}
UncompressedCache(const String & cache_policy, size_t max_size_in_bytes, double size_ratio);
/// Calculate key from path to file and offset.
static UInt128 hash(const String & path_to_file, size_t offset)

@ -79,7 +79,7 @@ inline char * writeVarInt(Int64 x, char * ostr)
return writeVarUInt(static_cast<UInt64>((x << 1) ^ (x >> 63)), ostr);
}
namespace impl
namespace varint_impl
{
template <bool check_eof>
@ -106,8 +106,8 @@ inline void readVarUInt(UInt64 & x, ReadBuffer & istr)
inline void readVarUInt(UInt64 & x, ReadBuffer & istr)
{
if (istr.buffer().end() - istr.position() >= 10)
return impl::readVarUInt<false>(x, istr);
return impl::readVarUInt<true>(x, istr);
return varint_impl::readVarUInt<false>(x, istr);
return varint_impl::readVarUInt<true>(x, istr);
}
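`writeVarInt` ZigZag-encodes a signed value with `(x << 1) ^ (x >> 63)` and then writes it as an unsigned varint (7 payload bits per byte, high bit meaning "more bytes follow"). `readVarUInt` takes the unchecked fast path when at least 10 bytes remain because a 64-bit value needs at most $\lceil 64/7 \rceil = 10$ bytes. The sketch below shows both pieces over a `std::vector` instead of ClickHouse's `ReadBuffer`/`WriteBuffer`; all function names are hypothetical. It shifts an unsigned copy of `x`, since left-shifting a negative signed value is undefined before C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ZigZag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... so small magnitudes get short varints.
uint64_t zigzagEncode(int64_t x)
{
    // x >> 63 is all ones for negative x (arithmetic shift, guaranteed since C++20).
    return (static_cast<uint64_t>(x) << 1) ^ static_cast<uint64_t>(x >> 63);
}

int64_t zigzagDecode(uint64_t u)
{
    return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1);
}

// Unsigned LEB128-style varint: 7 payload bits per byte, high bit set on all but the last byte.
void encodeVarUInt(uint64_t x, std::vector<uint8_t> & out)
{
    while (x >= 0x80)
    {
        out.push_back(static_cast<uint8_t>(x | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<uint8_t>(x));
}

// Returns the number of bytes consumed, or 0 if the input ends early
// or no terminating byte appears within the 10-byte maximum.
size_t decodeVarUInt(const std::vector<uint8_t> & in, size_t pos, uint64_t & x)
{
    x = 0;
    for (size_t i = 0; i < 10 && pos + i < in.size(); ++i)
    {
        uint8_t byte = in[pos + i];
        x |= static_cast<uint64_t>(byte & 0x7F) << (7 * i);
        if (!(byte & 0x80))
            return i + 1;
    }
    return 0;
}
```

For example, 300 encodes as the two bytes `0xAC 0x02`, and `UINT64_MAX` needs the full 10 bytes.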
inline void readVarUInt(UInt64 & x, std::istream & istr)

@ -4,8 +4,8 @@
#include "StdIStreamFromMemory.h"
#include "WriteBufferFromS3.h"
#include "WriteBufferFromS3TaskTracker.h"
#include <Common/ThreadPoolTaskTracker.h>
#include <Common/logger_useful.h>
#include <Common/ProfileEvents.h>
#include <Common/Throttler.h>
@ -72,6 +72,19 @@ struct WriteBufferFromS3::PartData
}
};
BufferAllocationPolicyPtr createBufferAllocationPolicy(const S3Settings::RequestSettings::PartUploadSettings & settings)
{
BufferAllocationPolicy::Settings allocation_settings;
allocation_settings.strict_size = settings.strict_upload_part_size;
allocation_settings.min_size = settings.min_upload_part_size;
allocation_settings.max_size = settings.max_upload_part_size;
allocation_settings.multiply_factor = settings.upload_part_size_multiply_factor;
allocation_settings.multiply_parts_count_threshold = settings.upload_part_size_multiply_parts_count_threshold;
allocation_settings.max_single_size = settings.max_single_part_upload_size;
return BufferAllocationPolicy::create(allocation_settings);
}
WriteBufferFromS3::WriteBufferFromS3(
std::shared_ptr<const S3::Client> client_ptr_,
@ -91,9 +104,9 @@ WriteBufferFromS3::WriteBufferFromS3(
, write_settings(write_settings_)
, client_ptr(std::move(client_ptr_))
, object_metadata(std::move(object_metadata_))
, buffer_allocation_policy(ChooseBufferPolicy(upload_settings))
, buffer_allocation_policy(createBufferAllocationPolicy(upload_settings))
, task_tracker(
std::make_unique<WriteBufferFromS3::TaskTracker>(
std::make_unique<TaskTracker>(
std::move(schedule_),
upload_settings.max_inflight_parts_for_one_file,
limitedLog))
@ -320,14 +333,6 @@ void WriteBufferFromS3::detachBuffer()
detached_part_data.push_back({std::move(buf), data_size});
}
void WriteBufferFromS3::allocateFirstBuffer()
{
const auto max_first_buffer = buffer_allocation_policy->getBufferSize();
const auto size = std::min(size_t(DBMS_DEFAULT_BUFFER_SIZE), max_first_buffer);
memory = Memory(size);
WriteBuffer::set(memory.data(), memory.size());
}
void WriteBufferFromS3::allocateBuffer()
{
buffer_allocation_policy->nextBuffer();
@ -340,6 +345,14 @@ void WriteBufferFromS3::allocateBuffer()
WriteBuffer::set(memory.data(), memory.size());
}
void WriteBufferFromS3::allocateFirstBuffer()
{
const auto max_first_buffer = buffer_allocation_policy->getBufferSize();
const auto size = std::min(size_t(DBMS_DEFAULT_BUFFER_SIZE), max_first_buffer);
memory = Memory(size);
WriteBuffer::set(memory.data(), memory.size());
}
void WriteBufferFromS3::setFakeBufferWhenPreFinalized()
{
WriteBuffer::set(fake_buffer_when_prefinalized, sizeof(fake_buffer_when_prefinalized));

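These hunks move buffer-size selection out of `WriteBufferFromS3` into a standalone `BufferAllocationPolicy`. The policy is built from `strict_size`, `min_size`, `max_size`, `multiply_factor`, `multiply_parts_count_threshold` and `max_single_size`. The policy class itself does not appear in this diff, so the following is only a sketch of a plausible growth rule under stated assumptions. If `strict_size` is non-zero, every buffer has that fixed size. Otherwise sizes start at `min_size` and are multiplied by `multiply_factor` after every `multiply_parts_count_threshold` buffers, capped at `max_size`. `SketchSettings` and `SketchPolicy` are hypothetical. The real `Common/BufferAllocationPolicy.h` may differ, for example in how it uses `max_single_size`, which this sketch omits.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for BufferAllocationPolicy::Settings (illustrative values).
struct SketchSettings
{
    size_t strict_size = 0;                      // 0 means "sizes are not fixed"
    size_t min_size = 16 * 1024 * 1024;
    size_t max_size = 5ULL * 1024 * 1024 * 1024;
    size_t multiply_factor = 2;
    size_t multiply_parts_count_threshold = 500; // must be > 0
};

// Assumed growth rule; not taken from the real implementation.
class SketchPolicy
{
public:
    explicit SketchPolicy(SketchSettings s)
        : settings(s), current_size(s.strict_size ? s.strict_size : s.min_size) {}

    size_t getBufferNumber() const { return buffer_number; }
    size_t getBufferSize() const { return current_size; }

    // Called once per new buffer, as allocateBuffer() does in the diff.
    void nextBuffer()
    {
        ++buffer_number;
        if (settings.strict_size)
            return;
        // Grow at the start of every new group of `threshold` buffers,
        // never beyond max_size.
        if (buffer_number > 1 && (buffer_number - 1) % settings.multiply_parts_count_threshold == 0)
            current_size = std::min(current_size * settings.multiply_factor, settings.max_size);
    }

private:
    SketchSettings settings;
    size_t buffer_number = 0;
    size_t current_size;
};
```

Per the diff, `allocateFirstBuffer()` additionally clamps the very first buffer to `std::min(DBMS_DEFAULT_BUFFER_SIZE, getBufferSize())`. As a result, small writes do not allocate a full part-sized buffer up front.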
@ -12,6 +12,8 @@
#include <Storages/StorageS3Settings.h>
#include <Common/threadPoolCallbackRunner.h>
#include <IO/S3/BlobStorageLogWriter.h>
#include <Common/ThreadPoolTaskTracker.h>
#include <Common/BufferAllocationPolicy.h>
#include <memory>
#include <vector>
@ -26,6 +28,8 @@ namespace DB
* Data is divided on chunks with size greater than 'minimum_upload_part_size'. Last chunk can be less than this threshold.
* Each chunk is written as a part to S3.
*/
class TaskTracker;
class WriteBufferFromS3 final : public WriteBufferFromFileBase
{
public:
@ -46,18 +50,6 @@ public:
std::string getFileName() const override { return key; }
void sync() override { next(); }
class IBufferAllocationPolicy
{
public:
virtual size_t getBufferNumber() const = 0;
virtual size_t getBufferSize() const = 0;
virtual void nextBuffer() = 0;
virtual ~IBufferAllocationPolicy() = 0;
};
using IBufferAllocationPolicyPtr = std::unique_ptr<IBufferAllocationPolicy>;
static IBufferAllocationPolicyPtr ChooseBufferPolicy(const S3Settings::RequestSettings::PartUploadSettings & settings_);
private:
/// Receives response from the server after sending all data.
void finalizeImpl() override;
@ -67,10 +59,10 @@ private:
struct PartData;
void hidePartialData();
void allocateFirstBuffer();
void reallocateFirstBuffer();
void detachBuffer();
void allocateBuffer();
void allocateFirstBuffer();
void setFakeBufferWhenPreFinalized();
S3::UploadPartRequest getUploadRequest(size_t part_number, PartData & data);
@ -94,7 +86,7 @@ private:
LoggerPtr log = getLogger("WriteBufferFromS3");
LogSeriesLimiterPtr limitedLog = std::make_shared<LogSeriesLimiter>(log, 1, 5);
IBufferAllocationPolicyPtr buffer_allocation_policy;
BufferAllocationPolicyPtr buffer_allocation_policy;
/// Upload in S3 is made in parts.
/// We initiate upload, then upload each part and get ETag as a response, and then finalizeImpl() upload with listing all our parts.
@ -119,7 +111,6 @@ private:
size_t total_size = 0;
size_t hidden_size = 0;
class TaskTracker;
std::unique_ptr<TaskTracker> task_tracker;
BlobStorageLogWriterPtr blob_log;

@ -19,7 +19,6 @@
#include <base/find_symbols.h>
#include <base/StringRef.h>
#include <base/DecomposedFloat.h>
#include <base/EnumReflection.h>
#include <Core/DecimalFunctions.h>
#include <Core/Types.h>

@ -741,7 +741,7 @@ Block ActionsDAG::updateHeader(Block header) const
catch (Exception & e)
{
if (e.code() == ErrorCodes::NOT_FOUND_COLUMN_IN_BLOCK)
e.addMessage(" in block {}", header.dumpStructure());
e.addMessage("in block {}", header.dumpStructure());
throw;
}

@ -5,6 +5,7 @@
#include <IO/WriteHelpers.h>
#include <Core/ProtocolDefines.h>
#include <base/getFQDNOrHostName.h>
#include <Poco/Net/HTTPRequest.h>
#include <unistd.h>
#include <Common/config_version.h>
@ -255,7 +256,29 @@ String toString(ClientInfo::Interface interface)
return "TCP_INTERSERVER";
}
return std::format("Unknown {}!\n", static_cast<int>(interface));
return std::format("Unknown server interface ({}).", static_cast<int>(interface));
}
void ClientInfo::setFromHTTPRequest(const Poco::Net::HTTPRequest & request)
{
http_method = ClientInfo::HTTPMethod::UNKNOWN;
if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_GET)
http_method = ClientInfo::HTTPMethod::GET;
else if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_POST)
http_method = ClientInfo::HTTPMethod::POST;
http_user_agent = request.get("User-Agent", "");
http_referer = request.get("Referer", "");
forwarded_for = request.get("X-Forwarded-For", "");
for (const auto & header : request)
{
/// These headers can contain authentication info and shouldn't be accessible by the user.
String key_lowercase = Poco::toLower(header.first);
if (key_lowercase.starts_with("x-clickhouse") || key_lowercase == "authentication")
continue;
http_headers[header.first] = header.second;
}
}
}

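`ClientInfo::setFromHTTPRequest` maps the request method to `GET`, `POST` or `UNKNOWN`. It also copies every request header into `http_headers`, except those whose lower-cased name starts with `x-clickhouse` or equals `authentication`. The sketch below reproduces only that header-filtering rule. It uses a plain vector of name/value pairs, so it runs without Poco. `filterHeaders` and `toLowerAscii` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

inline std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Mirrors the condition in the diff. Headers whose lower-cased name starts
// with "x-clickhouse" or equals "authentication" are dropped. All others
// are kept under their original spelling.
inline std::unordered_map<std::string, std::string>
filterHeaders(const std::vector<std::pair<std::string, std::string>> & headers)
{
    std::unordered_map<std::string, std::string> result;
    for (const auto & [name, value] : headers)
    {
        const std::string lower = toLowerAscii(name);
        // rfind(prefix, 0) == 0 is a pre-C++20 spelling of starts_with(prefix).
        if (lower.rfind("x-clickhouse", 0) == 0 || lower == "authentication")
            continue;
        result[name] = value;
    }
    return result;
}
```

The condition matches the literal name `authentication`. The standard HTTP credentials header is `Authorization`, which neither the diff's rule nor this sketch excludes.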
@ -7,6 +7,12 @@
#include <Common/VersionNumber.h>
#include <boost/algorithm/string/trim.hpp>
namespace Poco::Net
{
class HTTPRequest;
}
namespace DB
{
@ -93,6 +99,7 @@ public:
HTTPMethod http_method = HTTPMethod::UNKNOWN;
String http_user_agent;
String http_referer;
std::unordered_map<String, String> http_headers;
/// For mysql and postgresql
UInt64 connection_id = 0;
@ -135,6 +142,9 @@ public:
/// Initialize parameters on client initiating query.
void setInitialQuery();
/// Initialize parameters related to HTTP request.
void setFromHTTPRequest(const Poco::Net::HTTPRequest & request);
bool clientVersionEquals(const ClientInfo & other, bool compare_patch) const;
String getVersionStr() const;

@ -1219,7 +1219,7 @@ void Context::addWarningMessageAboutDatabaseOrdinary(const String & database_nam
/// We don't use getFlagsPath method, because it takes a shared lock.
auto convert_databases_flag = fs::path(shared->flags_path) / "convert_ordinary_to_atomic";
auto message = fmt::format("Server has databases (for example `{}`) with Ordinary engine, which was deprecated. "
"To convert this database to a new Atomic engine, please create a forcing flag {} and make sure that ClickHouse has write permission for it. "
"To convert this database to a new Atomic engine, create a flag {} and make sure that ClickHouse has write permission for it. "
"Example: sudo touch '{}' && sudo chmod 666 '{}'",
database_name,
convert_databases_flag.string(), convert_databases_flag.string(), convert_databases_flag.string());
@ -2490,7 +2490,8 @@ AsyncLoader & Context::getAsyncLoader() const
}
},
/* log_failures = */ true,
/* log_progress = */ true);
/* log_progress = */ true,
/* log_events = */ true);
});
return *shared->async_loader;
@ -4640,11 +4641,9 @@ void Context::setClientConnectionId(uint32_t connection_id_)
client_info.connection_id = connection_id_;
}
void Context::setHTTPClientInfo(ClientInfo::HTTPMethod http_method, const String & http_user_agent, const String & http_referer)
void Context::setHTTPClientInfo(const Poco::Net::HTTPRequest & request)
{
client_info.http_method = http_method;
client_info.http_user_agent = http_user_agent;
client_info.http_referer = http_referer;
client_info.setFromHTTPRequest(request);
need_recalculate_access = true;
}

@ -642,7 +642,7 @@ public:
void setClientInterface(ClientInfo::Interface interface);
void setClientVersion(UInt64 client_version_major, UInt64 client_version_minor, UInt64 client_version_patch, unsigned client_tcp_protocol_version);
void setClientConnectionId(uint32_t connection_id);
void setHTTPClientInfo(ClientInfo::HTTPMethod http_method, const String & http_user_agent, const String & http_referer);
void setHTTPClientInfo(const Poco::Net::HTTPRequest & request);
void setForwardedFor(const String & forwarded_for);
void setQueryKind(ClientInfo::QueryKind query_kind);
void setQueryKindInitial();

@ -1881,7 +1881,7 @@ void InterpreterCreateQuery::addColumnsDescriptionToCreateQueryIfNecessary(ASTCr
void InterpreterCreateQuery::processSQLSecurityOption(ContextPtr context_, ASTSQLSecurity & sql_security, bool is_attach, bool is_materialized_view)
{
/// If no SQL security is specified, apply default from default_*_view_sql_security setting.
if (!sql_security.type.has_value())
if (!sql_security.type)
{
SQLSecurityType default_security;

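The change from `!sql_security.type.has_value()` to `!sql_security.type` relies on the explicit `bool` conversion of `std::optional`. That conversion is true exactly when a value is present, so the two conditions are equivalent. The sketch assumes `type` is a `std::optional`, as the original `has_value()` call suggests. It uses a hypothetical enum standing in for `SQLSecurityType`.

```cpp
#include <optional>

// Hypothetical stand-in for SQLSecurityType.
enum class SketchSecurityType { INVOKER, DEFINER, NONE };

// Both forms test only whether a value is present, never what the value is.
inline bool missingViaHasValue(const std::optional<SketchSecurityType> & t) { return !t.has_value(); }
inline bool missingViaBool(const std::optional<SketchSecurityType> & t) { return !t; }
```

This is also the classic pitfall with `std::optional<bool>`: `!opt` asks whether a value is present, not whether it is `false`.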
@ -13,7 +13,6 @@
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ExpressionListParsers.h>
#include <Parsers/parseQuery.h>
#include <Parsers/FunctionParameterValuesVisitor.h>
#include <Access/Common/AccessFlags.h>
#include <Access/ContextAccess.h>
@ -73,7 +72,6 @@
#include <Processors/Transforms/FilterTransform.h>
#include <QueryPipeline/QueryPipelineBuilder.h>
#include <Storages/IStorage.h>
#include <Storages/MergeTree/MergeTreeWhereOptimizer.h>
#include <Storages/StorageDistributed.h>
#include <Storages/StorageDummy.h>
@ -85,7 +83,6 @@
#include <Core/ColumnNumbers.h>
#include <Core/Field.h>
#include <Core/ProtocolDefines.h>
#include <Functions/IFunction.h>
#include <Interpreters/Aggregator.h>
#include <Interpreters/IJoin.h>
#include <QueryPipeline/SizeLimits.h>

@ -59,6 +59,7 @@
#include <Storages/System/StorageSystemFilesystemCache.h>
#include <Parsers/ASTSystemQuery.h>
#include <Parsers/ASTCreateQuery.h>
#include <Parsers/ASTSetQuery.h>
#include <Processors/Sources/SourceFromSingleChunk.h>
#include <Common/ThreadFuzzer.h>
#include <base/coverage.h>
@ -1165,12 +1166,16 @@ void InterpreterSystemQuery::syncTransactionLog()
}
void InterpreterSystemQuery::flushDistributed(ASTSystemQuery &)
void InterpreterSystemQuery::flushDistributed(ASTSystemQuery & query)
{
getContext()->checkAccess(AccessType::SYSTEM_FLUSH_DISTRIBUTED, table_id);
SettingsChanges settings_changes;
if (query.query_settings)
settings_changes = query.query_settings->as<ASTSetQuery>()->changes;
if (auto * storage_distributed = dynamic_cast<StorageDistributed *>(DatabaseCatalog::instance().getTable(table_id, getContext()).get()))
storage_distributed->flushClusterNodesAllData(getContext());
storage_distributed->flushClusterNodesAllData(getContext(), settings_changes);
else
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Table {} is not distributed", table_id.getNameForLogs());
}

@ -341,6 +341,11 @@ bool MutationsInterpreter::Source::hasProjection(const String & name) const
return part && part->hasProjection(name);
}
bool MutationsInterpreter::Source::hasBrokenProjection(const String & name) const
{
return part && part->hasBrokenProjection(name);
}
bool MutationsInterpreter::Source::isCompactPart() const
{
return part && part->getType() == MergeTreeDataPartType::Compact;
@ -802,7 +807,7 @@ void MutationsInterpreter::prepare(bool dry_run)
{
mutation_kind.set(MutationKind::MUTATE_INDEX_STATISTIC_PROJECTION);
const auto & projection = projections_desc.get(command.projection_name);
if (!source.hasProjection(projection.name))
if (!source.hasProjection(projection.name) || source.hasBrokenProjection(projection.name))
{
for (const auto & column : projection.required_columns)
dependencies.emplace(column, ColumnDependency::PROJECTION);
@ -989,6 +994,13 @@ void MutationsInterpreter::prepare(bool dry_run)
if (!source.hasProjection(projection.name))
continue;
/// Always rebuild broken projections.
if (source.hasBrokenProjection(projection.name))
{
materialized_projections.insert(projection.name);
continue;
}
if (need_rebuild_projections)
{
materialized_projections.insert(projection.name);

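Together, the two `MutationsInterpreter::prepare` hunks change how broken projections are handled. When collecting column dependencies, a broken projection is treated like a missing one. When deciding which projections to materialize, a broken projection is always rebuilt, whatever the value of `need_rebuild_projections`. The sketch below is a simplified model of those two decisions. `PartState`, `needsProjectionInputColumns` and `shouldRebuildProjection` are hypothetical names, and the model ignores everything else the real function considers.

```cpp
#include <set>
#include <string>

// Hypothetical snapshot of what the interpreter knows about one data part.
// Broken projections are assumed to still be listed as present.
struct PartState
{
    std::set<std::string> projections;        // projections present in the part
    std::set<std::string> broken_projections; // subset marked as broken

    bool hasProjection(const std::string & name) const { return projections.count(name) > 0; }
    bool hasBrokenProjection(const std::string & name) const { return broken_projections.count(name) > 0; }
};

// First hunk: MATERIALIZE PROJECTION must read the projection's required
// columns when the projection is absent or broken.
inline bool needsProjectionInputColumns(const PartState & part, const std::string & name)
{
    return !part.hasProjection(name) || part.hasBrokenProjection(name);
}

// Second hunk: an existing projection is always rebuilt if broken. Otherwise
// this simplified model defers to need_rebuild_projections (the real code
// has further checks in that branch).
inline bool shouldRebuildProjection(const PartState & part, const std::string & name, bool need_rebuild_projections)
{
    if (!part.hasProjection(name))
        return false; // mirrors the `continue` for absent projections
    if (part.hasBrokenProjection(name))
        return true;  // "Always rebuild broken projections."
    return need_rebuild_projections;
}
```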
@ -126,6 +126,7 @@ public:
bool materializeTTLRecalculateOnly() const;
bool hasSecondaryIndex(const String & name) const;
bool hasProjection(const String & name) const;
bool hasBrokenProjection(const String & name) const;
bool isCompactPart() const;
void read(

@ -429,18 +429,12 @@ void Session::setClientConnectionId(uint32_t connection_id)
prepared_client_info->connection_id = connection_id;
}
void Session::setHTTPClientInfo(ClientInfo::HTTPMethod http_method, const String & http_user_agent, const String & http_referer)
void Session::setHTTPClientInfo(const Poco::Net::HTTPRequest & request)
{
if (session_context)
{
session_context->setHTTPClientInfo(http_method, http_user_agent, http_referer);
}
session_context->setHTTPClientInfo(request);
else
{
prepared_client_info->http_method = http_method;
prepared_client_info->http_user_agent = http_user_agent;
prepared_client_info->http_referer = http_referer;
}
prepared_client_info->setFromHTTPRequest(request);
}
void Session::setForwardedFor(const String & forwarded_for)

@ -65,7 +65,7 @@ public:
void setClientInterface(ClientInfo::Interface interface);
void setClientVersion(UInt64 client_version_major, UInt64 client_version_minor, UInt64 client_version_patch, unsigned client_tcp_protocol_version);
void setClientConnectionId(uint32_t connection_id);
void setHTTPClientInfo(ClientInfo::HTTPMethod http_method, const String & http_user_agent, const String & http_referer);
void setHTTPClientInfo(const Poco::Net::HTTPRequest & request);
void setForwardedFor(const String & forwarded_for);
void setQuotaClientKey(const String & quota_key);
void setConnectionClientVersion(UInt64 client_version_major, UInt64 client_version_minor, UInt64 client_version_patch, unsigned client_tcp_protocol_version);

Some files were not shown because too many files have changed in this diff.