Merge branch 'fix-idiotic-code' into ast-data-type

Alexey Milovidov 2024-07-25 19:06:15 +02:00
commit 4bf8d72a67
162 changed files with 1677 additions and 1610 deletions

View File

@ -9,19 +9,64 @@ on: # yamllint disable-line rule:truthy
- cron: '0 */6 * * *'
workflow_dispatch:
jobs:
RunConfig:
runs-on: [self-hosted, style-checker-aarch64]
outputs:
data: ${{ steps.runconfig.outputs.CI_DATA }}
steps:
- name: DebugInfo
uses: hmarr/debug-action@f7318c783045ac39ed9bb497e22ce835fdafbfe6
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:
clear-repository: true # to ensure correct digests
fetch-depth: 0 # to get version
filter: tree:0
- name: PrepareRunConfig
id: runconfig
run: |
echo "::group::configure CI run"
python3 "$GITHUB_WORKSPACE/tests/ci/ci.py" --configure --workflow "$GITHUB_WORKFLOW" --outfile ${{ runner.temp }}/ci_run_data.json
echo "::endgroup::"
echo "::group::CI run configure results"
python3 -m json.tool ${{ runner.temp }}/ci_run_data.json
echo "::endgroup::"
{
echo 'CI_DATA<<EOF'
cat ${{ runner.temp }}/ci_run_data.json
echo 'EOF'
} >> "$GITHUB_OUTPUT"
KeeperJepsenRelease:
uses: ./.github/workflows/reusable_simple_job.yml
needs: [RunConfig]
uses: ./.github/workflows/reusable_test.yml
with:
test_name: Jepsen keeper check
runner_type: style-checker
report_required: true
test_name: ClickHouse Keeper Jepsen
runner_type: style-checker-aarch64
data: ${{ needs.RunConfig.outputs.data }}
run_command: |
python3 jepsen_check.py keeper
# ServerJepsenRelease:
# uses: ./.github/workflows/reusable_simple_job.yml
# with:
# test_name: Jepsen server check
# runner_type: style-checker
# run_command: |
# cd "$REPO_COPY/tests/ci"
# python3 jepsen_check.py server
ServerJepsenRelease:
if: false # skip for server
needs: [RunConfig]
uses: ./.github/workflows/reusable_test.yml
with:
test_name: ClickHouse Server Jepsen
runner_type: style-checker-aarch64
data: ${{ needs.RunConfig.outputs.data }}
run_command: |
python3 jepsen_check.py server
CheckWorkflow:
if: ${{ !cancelled() }}
needs: [RunConfig, ServerJepsenRelease, KeeperJepsenRelease]
runs-on: [self-hosted, style-checker-aarch64]
steps:
- name: Check out repository code
uses: ClickHouse/checkout@v1
- name: Check Workflow results
run: |
export WORKFLOW_RESULT_FILE="/tmp/workflow_results.json"
cat >> "$WORKFLOW_RESULT_FILE" << 'EOF'
${{ toJson(needs) }}
EOF
python3 ./tests/ci/ci_buddy.py --check-wf-status

View File

@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v24.7, 2024-07-30](#247)**<br/>
**[ClickHouse release v24.6, 2024-07-01](#246)**<br/>
**[ClickHouse release v24.5, 2024-05-30](#245)**<br/>
**[ClickHouse release v24.4, 2024-04-30](#244)**<br/>
@ -9,6 +10,178 @@
# 2024 Changelog
### <a id="247"></a> ClickHouse release 24.7, 2024-07-30
#### Backward Incompatible Change
* Forbid `CREATE MATERIALIZED VIEW ... ENGINE Replicated*MergeTree POPULATE AS SELECT ...` with Replicated databases. [#63963](https://github.com/ClickHouse/ClickHouse/pull/63963) ([vdimir](https://github.com/vdimir)).
* `clickhouse-keeper-client` will only accept paths in string literals, such as `ls '/hello/world'`, not bare strings such as `ls /hello/world`. [#65494](https://github.com/ClickHouse/ClickHouse/pull/65494) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Metric `KeeperOutstandingRequets` was renamed to `KeeperOutstandingRequests`. [#66206](https://github.com/ClickHouse/ClickHouse/pull/66206) ([Robert Schulze](https://github.com/rschu1ze)).
* Remove `is_deterministic` field from the `system.functions` table. [#66630](https://github.com/ClickHouse/ClickHouse/pull/66630) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Function `tuple` will now try to construct named tuples in query (controlled by `enable_named_columns_in_function_tuple`). Introduce function `tupleNames` to extract names from tuples. [#54881](https://github.com/ClickHouse/ClickHouse/pull/54881) ([Amos Bird](https://github.com/amosbird)).
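A minimal sketch of the named-tuple change above; the setting name and the `tupleNames` function come from the entry, while the result shapes shown in the comments are assumptions rather than output quoted from this changelog:

```sql
-- With the setting enabled, aliases inside tuple() become element names.
SET enable_named_columns_in_function_tuple = 1;

SELECT
    tuple(1 AS a, 'x' AS b) AS t,   -- assumed to produce a named Tuple(a, b)
    tupleNames(t) AS names;         -- assumed to return ['a', 'b']
```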
#### New Feature
* Add `ASOF JOIN` support for `full_sorting_join` algorithm. [#55051](https://github.com/ClickHouse/ClickHouse/pull/55051) ([vdimir](https://github.com/vdimir)).
* Add new window function `percent_rank`. [#62747](https://github.com/ClickHouse/ClickHouse/pull/62747) ([lgbo](https://github.com/lgbo-ustc)).
* Support JWT authentication in `clickhouse-client` (will be available only in ClickHouse Cloud). [#62829](https://github.com/ClickHouse/ClickHouse/pull/62829) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Add SQL functions `changeYear`, `changeMonth`, `changeDay`, `changeHour`, `changeMinute`, `changeSecond`. For example, `SELECT changeMonth(toDate('2024-06-14'), 7)` returns date `2024-07-14` (see the example at the end of this section). [#63186](https://github.com/ClickHouse/ClickHouse/pull/63186) ([cucumber95](https://github.com/cucumber95)).
* Introduce startup scripts, which allow the execution of preconfigured queries at the startup stage. [#64889](https://github.com/ClickHouse/ClickHouse/pull/64889) ([pufit](https://github.com/pufit)).
* Support `accept_invalid_certificate` in the client's config to allow the client to connect over secure TCP to a server running with a self-signed certificate. It can be used as a shorthand for the corresponding `openSSL` client settings `verificationMode=none` + `invalidCertificateHandler.name=AcceptCertificateHandler`. [#65238](https://github.com/ClickHouse/ClickHouse/pull/65238) ([peacewalker122](https://github.com/peacewalker122)).
* Add `system.error_log`, which contains the history of error values from the `system.errors` table, periodically flushed to disk. [#65381](https://github.com/ClickHouse/ClickHouse/pull/65381) ([Pablo Marcos](https://github.com/pamarcos)).
* Add aggregate function `groupConcat`, roughly equivalent to `arrayStringConcat(groupArray(column), ',')`. It can receive 2 parameters: a string delimiter and the number of elements to be processed. [#65451](https://github.com/ClickHouse/ClickHouse/pull/65451) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add AzureQueue storage. [#65458](https://github.com/ClickHouse/ClickHouse/pull/65458) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add a new setting to disable/enable writing page index into parquet files. [#65475](https://github.com/ClickHouse/ClickHouse/pull/65475) ([lgbo](https://github.com/lgbo-ustc)).
* Introduce `logger.console_log_level` server config to control the log level to the console (if enabled). [#65559](https://github.com/ClickHouse/ClickHouse/pull/65559) ([Azat Khuzhin](https://github.com/azat)).
* Automatically append a wildcard `*` to the end of a directory path with table function `file`. [#66019](https://github.com/ClickHouse/ClickHouse/pull/66019) ([Zhidong (David) Guo](https://github.com/Gun9niR)).
* Add `--memory-usage` option to the client in non-interactive mode. [#66393](https://github.com/ClickHouse/ClickHouse/pull/66393) ([vdimir](https://github.com/vdimir)).
* Make an interactive client for clickhouse-disks, add local disk from the local directory. [#64446](https://github.com/ClickHouse/ClickHouse/pull/64446) ([Daniil Ivanik](https://github.com/divanik)).
* When a lightweight delete happens on a table with projection(s), users can choose either to throw an exception (by default) or to drop the projection. [#65594](https://github.com/ClickHouse/ClickHouse/pull/65594) ([jsc0218](https://github.com/jsc0218)).
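A short illustration of the date/time change functions introduced above; the first query repeats the example from the `changeMonth` entry, while the `changeHour` query and its commented result are assumptions by analogy, not taken from this changelog:

```sql
SELECT changeMonth(toDate('2024-06-14'), 7);               -- 2024-07-14, as stated above
SELECT changeHour(toDateTime('2024-06-14 10:00:00'), 23);  -- assumed: 2024-06-14 23:00:00
```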
#### Experimental Feature
* Change binary serialization of Variant data type: add `compact` mode to avoid writing the same discriminator multiple times for granules with single variant or with only NULL values. Add MergeTree setting `use_compact_variant_discriminators_serialization` that is enabled by default. Note that Variant type is still experimental and backward-incompatible change in serialization is ok. [#62774](https://github.com/ClickHouse/ClickHouse/pull/62774) ([Kruglov Pavel](https://github.com/Avogar)).
* Support RocksDB as a backend storage for Keeper. [#56626](https://github.com/ClickHouse/ClickHouse/pull/56626) ([Han Fei](https://github.com/hanfei1991)).
* Refactor JSONExtract functions, support more types including experimental Dynamic type. [#66046](https://github.com/ClickHouse/ClickHouse/pull/66046) ([Kruglov Pavel](https://github.com/Avogar)).
* Support null map subcolumn for Variant and Dynamic subcolumns. [#66178](https://github.com/ClickHouse/ClickHouse/pull/66178) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix reading dynamic subcolumns from an altered Memory table. Previously, if the `max_types` parameter of a Dynamic type was changed in a Memory table via ALTER, further subcolumn reads could return a wrong result. [#66066](https://github.com/ClickHouse/ClickHouse/pull/66066) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for `cluster_for_parallel_replicas` when using custom key parallel replicas. It allows you to use parallel replicas with custom key with MergeTree tables. [#65453](https://github.com/ClickHouse/ClickHouse/pull/65453) ([Antonio Andelic](https://github.com/antonio2368)).
#### Performance Improvement
* Enable `optimize_functions_to_subcolumns` by default. [#58661](https://github.com/ClickHouse/ClickHouse/pull/58661) ([Anton Popov](https://github.com/CurtizJ)).
* Replace int to string algorithm with a faster one (from a modified amdn/itoa to a modified jeaiii/itoa). [#61661](https://github.com/ClickHouse/ClickHouse/pull/61661) ([Raúl Marín](https://github.com/Algunenano)).
* Sizes of hash tables created by join (`parallel_hash` algorithm) are now collected and cached. This information will be used to preallocate space in hash tables for subsequent query executions and save time on hash table resizes. [#64553](https://github.com/ClickHouse/ClickHouse/pull/64553) ([Nikita Taranov](https://github.com/nickitat)).
* Optimized queries with an `ORDER BY` primary key and a highly selective `WHERE` condition by using buffering. It is controlled by the setting `read_in_order_use_buffering` (enabled by default) and may increase the memory usage of the query (see the example at the end of this section). [#64607](https://github.com/ClickHouse/ClickHouse/pull/64607) ([Anton Popov](https://github.com/CurtizJ)).
* Improve performance of loading `plain_rewritable` metadata. [#65634](https://github.com/ClickHouse/ClickHouse/pull/65634) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Attaching tables on read-only disks will use fewer resources by not loading outdated parts. [#65635](https://github.com/ClickHouse/ClickHouse/pull/65635) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support minmax hyperrectangle for Set indices. [#65676](https://github.com/ClickHouse/ClickHouse/pull/65676) ([AntiTopQuark](https://github.com/AntiTopQuark)).
* Unload primary index of outdated parts to reduce total memory usage. [#65852](https://github.com/ClickHouse/ClickHouse/pull/65852) ([Anton Popov](https://github.com/CurtizJ)).
* Functions `replaceRegexpAll` and `replaceRegexpOne` are now significantly faster if the pattern is trivial, i.e. contains no metacharacters, pattern classes, flags, grouping characters etc. (Thanks to Taiyang Li). [#66185](https://github.com/ClickHouse/ClickHouse/pull/66185) ([Robert Schulze](https://github.com/rschu1ze)).
* S3 requests: reduce retry time for queries, increase retry count for backups. 8.5 minutes and 100 retries for queries, 1.2 hours and 1000 retries for backup restore. [#65232](https://github.com/ClickHouse/ClickHouse/pull/65232) ([Sema Checherinda](https://github.com/CheSema)).
* Support query plan LIMIT optimization. Support LIMIT pushdown for PostgreSQL storage and table function. [#65454](https://github.com/ClickHouse/ClickHouse/pull/65454) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite `fallback_session_lifetime`. Added support for AZ-aware balancing. [#65570](https://github.com/ClickHouse/ClickHouse/pull/65570) ([Alexander Tokmakov](https://github.com/tavplubix)).
* DatabaseCatalog drops tables faster by using up to `database_catalog_drop_table_concurrency` threads. [#66065](https://github.com/ClickHouse/ClickHouse/pull/66065) ([Sema Checherinda](https://github.com/CheSema)).
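A hedged sketch of the buffering optimization referenced earlier in this section; the `hits` table, its primary key, and the filter are hypothetical, only the setting name and its default come from the entry:

```sql
-- Assumes a table whose primary key starts with EventTime and a highly selective filter.
SELECT *
FROM hits
WHERE UserID = 42
ORDER BY EventTime
LIMIT 100
SETTINGS read_in_order_use_buffering = 1;  -- enabled by default; spelled out here for clarity
```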
#### Improvement
* The setting `optimize_trivial_insert_select` is disabled by default. In most cases, disabling it should be beneficial. Nevertheless, if you are seeing slower INSERT SELECT or increased memory usage, you can enable it back or `SET compatibility = '24.6'`. [#58970](https://github.com/ClickHouse/ClickHouse/pull/58970) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Print stacktrace and diagnostic info if `clickhouse-client` or `clickhouse-local` crashes. [#61109](https://github.com/ClickHouse/ClickHouse/pull/61109) ([Alexander Tokmakov](https://github.com/tavplubix)).
* The result of `SHOW INDEX | INDEXES | INDICES | KEYS` was previously sorted by the primary key column names. Since this was unintuitive, the result is now sorted by the position of the primary key columns within the primary key. [#61131](https://github.com/ClickHouse/ClickHouse/pull/61131) ([Robert Schulze](https://github.com/rschu1ze)).
* Change how deduplication for Materialized Views works. Fixed many cases, such as: on the destination table, data split into 2 or more blocks was considered duplicate when those blocks were inserted in parallel; on the MV destination table, equal blocks were deduplicated, which happens when an MV often produces equal data for different input data due to aggregation; on the MV destination table, equal blocks coming from different MVs were deduplicated. [#61601](https://github.com/ClickHouse/ClickHouse/pull/61601) ([Sema Checherinda](https://github.com/CheSema)).
* Allow matching column names in a case-insensitive manner when reading JSON files (`input_format_json_ignore_key_case`). [#61750](https://github.com/ClickHouse/ClickHouse/pull/61750) ([kevinyhzou](https://github.com/KevinyhZou)).
* Support reading partitioned DeltaLake data. Infer the DeltaLake schema by reading metadata instead of data. [#63201](https://github.com/ClickHouse/ClickHouse/pull/63201) ([Kseniia Sumarokova](https://github.com/kssenii)).
* In composable protocols TLS layer accepted only `certificateFile` and `privateKeyFile` parameters. https://clickhouse.com/docs/en/operations/settings/composable-protocols. [#63985](https://github.com/ClickHouse/ClickHouse/pull/63985) ([Anton Ivashkin](https://github.com/ianton-ru)).
* Added profile event `SelectQueriesWithPrimaryKeyUsage` which indicates how many SELECT queries use the primary key to evaluate the WHERE clause. [#64492](https://github.com/ClickHouse/ClickHouse/pull/64492) ([0x01f](https://github.com/0xfei)).
* `StorageS3Queue` related fixes and improvements. Deduce the default value of `s3queue_processing_threads_num` from the number of physical CPU cores on the server (instead of the previous default value of 1). Set the default value of `s3queue_loading_retries` to 10. Fix a possible vague "Uncaught exception" in the exception column of `system.s3queue`. Do not increment the retry count on `MEMORY_LIMIT_EXCEEDED` exceptions. Move the files commit to a stage after the insertion into the table has fully finished, to avoid files being committed while not inserted. Add settings `s3queue_max_processed_files_before_commit`, `s3queue_max_processed_rows_before_commit`, `s3queue_max_processed_bytes_before_commit`, `s3queue_max_processing_time_sec_before_commit` to better control commit and flush time. [#65046](https://github.com/ClickHouse/ClickHouse/pull/65046) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support aliases in parametrized view function (only new analyzer). [#65190](https://github.com/ClickHouse/ClickHouse/pull/65190) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Updated to mask account key in logs in azureBlobStorage. [#65273](https://github.com/ClickHouse/ClickHouse/pull/65273) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Partition pruning for `IN` predicates when filter expression is a part of `PARTITION BY` expression. [#65335](https://github.com/ClickHouse/ClickHouse/pull/65335) ([Eduard Karacharov](https://github.com/korowa)).
* Add system tables with main information about all detached tables. [#65400](https://github.com/ClickHouse/ClickHouse/pull/65400) ([Konstantin Morozov](https://github.com/k-morozov)).
* `arrayMin`/`arrayMax` are now applicable to all comparable data types. [#65455](https://github.com/ClickHouse/ClickHouse/pull/65455) ([pn](https://github.com/chloro-pn)).
* Improved memory accounting for cgroups v2 to exclude the amount occupied by the page cache. [#65470](https://github.com/ClickHouse/ClickHouse/pull/65470) ([Nikita Taranov](https://github.com/nickitat)).
* Do not create format settings for each row when serializing chunks to insert to EmbeddedRocksDB table. [#65474](https://github.com/ClickHouse/ClickHouse/pull/65474) ([Duc Canh Le](https://github.com/canhld94)).
* Reduce `clickhouse-local` prompt to just `:)`. `getFQDNOrHostName()` takes too long on macOS, and we don't want a hostname in the prompt for `clickhouse-local` anyway. [#65510](https://github.com/ClickHouse/ClickHouse/pull/65510) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Avoid printing a message from jemalloc about per-CPU arenas on low-end virtual machines. [#65532](https://github.com/ClickHouse/ClickHouse/pull/65532) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Disable filesystem cache background download by default. It will be enabled back when we fix the issue with possible "Memory limit exceeded" because memory deallocation is done outside of query context (while buffer is allocated inside of query context) if we use background download threads. Plus we need to add a separate setting to define max size to download for background workers (currently it is limited by max_file_segment_size, which might be too big). [#65534](https://github.com/ClickHouse/ClickHouse/pull/65534) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add a new config option `<config_reload_interval_ms>`, which allows specifying how often ClickHouse reloads its config. [#65545](https://github.com/ClickHouse/ClickHouse/pull/65545) ([alesapin](https://github.com/alesapin)).
* Implement binary encoding for ClickHouse data types and add its specification to the docs. Use it in Dynamic binary serialization, and allow using it in the RowBinaryWithNamesAndTypes and Native formats under settings. [#65546](https://github.com/ClickHouse/ClickHouse/pull/65546) ([Kruglov Pavel](https://github.com/Avogar)).
* Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite `fallback_session_lifetime`. Added support for AZ-aware balancing. [#65570](https://github.com/ClickHouse/ClickHouse/pull/65570) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Server settings `compiled_expression_cache_size` and `compiled_expression_cache_elements_size` are now shown in `system.server_settings`. [#65584](https://github.com/ClickHouse/ClickHouse/pull/65584) ([Robert Schulze](https://github.com/rschu1ze)).
* Add support for user identification based on x509 SubjectAltName extension. [#65626](https://github.com/ClickHouse/ClickHouse/pull/65626) ([Anton Kozlov](https://github.com/tonickkozlov)).
* `clickhouse-local` will respect the `max_server_memory_usage` and `max_server_memory_usage_to_ram_ratio` from the configuration file. It will also set the max memory usage to 90% of the system memory by default, like `clickhouse-server` does. [#65697](https://github.com/ClickHouse/ClickHouse/pull/65697) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a script to backup your files to ClickHouse. [#65699](https://github.com/ClickHouse/ClickHouse/pull/65699) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* PostgreSQL sources now support query cancellation. [#65722](https://github.com/ClickHouse/ClickHouse/pull/65722) ([Maksim Kita](https://github.com/kitaisreal)).
* Make `allow_experimental_analyzer` controlled by the initiator for distributed queries. This ensures compatibility and correctness during operations in mixed-version clusters. [#65777](https://github.com/ClickHouse/ClickHouse/pull/65777) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Respect cgroup CPU limit in Keeper. [#65819](https://github.com/ClickHouse/ClickHouse/pull/65819) ([Antonio Andelic](https://github.com/antonio2368)).
* Allow using the `concat` function with empty arguments, e.g. `SELECT concat()` (see the example at the end of this section). [#65887](https://github.com/ClickHouse/ClickHouse/pull/65887) ([李扬](https://github.com/taiyang-li)).
* Allow controlling named collections in clickhouse-local. [#65973](https://github.com/ClickHouse/ClickHouse/pull/65973) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve Azure profile events. [#65999](https://github.com/ClickHouse/ClickHouse/pull/65999) ([alesapin](https://github.com/alesapin)).
* Support reading ORC files using the writer's time zone. [#66025](https://github.com/ClickHouse/ClickHouse/pull/66025) ([kevinyhzou](https://github.com/KevinyhZou)).
* Add settings to control the connection to PostgreSQL: setting `postgresql_connection_attempt_timeout` specifies the value passed to the `connect_timeout` parameter of the connection URL; setting `postgresql_connection_pool_retries` specifies the number of retries to establish a connection to the PostgreSQL endpoint. [#66232](https://github.com/ClickHouse/ClickHouse/pull/66232) ([Dmitry Novik](https://github.com/novikd)).
* Reduce inaccuracy of input_wait_elapsed_us/input_wait_elapsed_us/elapsed_us. [#66239](https://github.com/ClickHouse/ClickHouse/pull/66239) ([Azat Khuzhin](https://github.com/azat)).
* Improve FilesystemCache ProfileEvents. [#66249](https://github.com/ClickHouse/ClickHouse/pull/66249) ([zhukai](https://github.com/nauu)).
* Add settings to ignore ON CLUSTER clause in queries for named collection management with replicated storage. [#66288](https://github.com/ClickHouse/ClickHouse/pull/66288) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Function `generateSnowflakeID` now allows specifying a machine ID as a parameter to prevent collisions in large clusters. [#66374](https://github.com/ClickHouse/ClickHouse/pull/66374) ([ZAWA_ll](https://github.com/Zawa-ll)).
* Disable suspending on Ctrl+Z in interactive mode. This is a common trap and is not expected behavior for almost all users. I imagine only a few extreme power users could appreciate suspending terminal applications to the background, but I don't know any. [#66511](https://github.com/ClickHouse/ClickHouse/pull/66511) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add an option for validating the primary key type in dictionaries. Without this option, for simple layouts, any column type will be implicitly converted to UInt64. [#66595](https://github.com/ClickHouse/ClickHouse/pull/66595) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
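A minimal illustration of the `concat` entry above; the empty-string result for the zero-argument call is assumed from the description rather than quoted from it:

```sql
SELECT concat();            -- now allowed; assumed to return an empty string
SELECT concat('24', '.7');  -- '24.7', the usual behavior
```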
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix unexpected size of low cardinality column in function calls. [#65298](https://github.com/ClickHouse/ClickHouse/pull/65298) ([Raúl Marín](https://github.com/Algunenano)).
* Check cyclic dependencies on CREATE/REPLACE/RENAME/EXCHANGE queries and throw an exception if there is a cyclic dependency. Previously such cyclic dependencies could lead to a deadlock during server startup. Also fix some bugs in dependencies creation. [#65405](https://github.com/ClickHouse/ClickHouse/pull/65405) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash in maxIntersections. [#65689](https://github.com/ClickHouse/ClickHouse/pull/65689) ([Raúl Marín](https://github.com/Algunenano)).
* Fix the VALID UNTIL clause in the user definition resetting after a restart. [#66409](https://github.com/ClickHouse/ClickHouse/pull/66409) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix SHOW MERGES remaining time. [#66735](https://github.com/ClickHouse/ClickHouse/pull/66735) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* `Query was cancelled` might have been printed twice in clickhouse-client. This behaviour is fixed. [#66005](https://github.com/ClickHouse/ClickHouse/pull/66005) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fixed crash while using MaterializedMySQL with TABLE OVERRIDE that maps MySQL NULL field into ClickHouse not NULL field. [#54649](https://github.com/ClickHouse/ClickHouse/pull/54649) ([Filipp Ozinov](https://github.com/bakwc)).
* Fix logical error when PREWHERE expression read no columns and table has no adaptive index granularity (very old table). [#59173](https://github.com/ClickHouse/ClickHouse/pull/59173) ([Alexander Gololobov](https://github.com/davenger)).
* Fix bug with cancellation buffer when canceling a query. [#64478](https://github.com/ClickHouse/ClickHouse/pull/64478) ([Sema Checherinda](https://github.com/CheSema)).
* Fix filling parts columns from metadata (when columns.txt does not exist). [#64757](https://github.com/ClickHouse/ClickHouse/pull/64757) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash for `ALTER TABLE ... ON CLUSTER ... MODIFY SQL SECURITY`. [#64957](https://github.com/ClickHouse/ClickHouse/pull/64957) ([pufit](https://github.com/pufit)).
* Fix crash on destroying AccessControl: add explicit shutdown. [#64993](https://github.com/ClickHouse/ClickHouse/pull/64993) ([Vitaly Baranov](https://github.com/vitlibar)).
* Eliminate injective function in argument of functions `uniq*` recursively. This used to work correctly but was broken in the new analyzer. [#65140](https://github.com/ClickHouse/ClickHouse/pull/65140) ([Duc Canh Le](https://github.com/canhld94)).
* Fix unexpected projection name when query with CTE. [#65267](https://github.com/ClickHouse/ClickHouse/pull/65267) ([wudidapaopao](https://github.com/wudidapaopao)).
* Require `dictGet` privilege when accessing dictionaries via direct query or the `Dictionary` table engine. [#65359](https://github.com/ClickHouse/ClickHouse/pull/65359) ([Joe Lynch](https://github.com/joelynch)).
* Fix user-specific S3 auth with incremental backups. [#65481](https://github.com/ClickHouse/ClickHouse/pull/65481) ([Antonio Andelic](https://github.com/antonio2368)).
* Disable the `non-intersecting-parts` optimization for queries with `FINAL` when the `read-in-order` optimization is enabled, as this could lead to an incorrect query result. As a workaround, disable `do_not_merge_across_partitions_select_final` and `split_parts_ranges_into_intersecting_and_non_intersecting_final` until this fix is merged. [#65505](https://github.com/ClickHouse/ClickHouse/pull/65505) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix getting exception `Index out of bound for blob metadata` in case all files from list batch were filtered out. [#65523](https://github.com/ClickHouse/ClickHouse/pull/65523) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix NOT_FOUND_COLUMN_IN_BLOCK for deduplicate merge of projection. [#65573](https://github.com/ClickHouse/ClickHouse/pull/65573) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fixed bug in MergeJoin. Column in sparse serialisation might be treated as a column of its nested type though the required conversion wasn't performed. [#65632](https://github.com/ClickHouse/ClickHouse/pull/65632) ([Nikita Taranov](https://github.com/nickitat)).
* Fixed a bug where compatibility level '23.4' was not properly applied. [#65737](https://github.com/ClickHouse/ClickHouse/pull/65737) ([cw5121](https://github.com/cw5121)).
* Fix ODBC tables with nullable fields. [#65738](https://github.com/ClickHouse/ClickHouse/pull/65738) ([Rodolphe Dugé de Bernonville](https://github.com/RodolpheDuge)).
* Fix data race in `TCPHandler`, which could happen on fatal error. [#65744](https://github.com/ClickHouse/ClickHouse/pull/65744) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix invalid exceptions in function `parseDateTime` with `%F` and `%D` placeholders. [#65768](https://github.com/ClickHouse/ClickHouse/pull/65768) ([Antonio Andelic](https://github.com/antonio2368)).
* For queries that read from `PostgreSQL`, cancel the internal `PostgreSQL` query if the ClickHouse query is finished. Otherwise, `ClickHouse` query cannot be canceled until the internal `PostgreSQL` query is finished. [#65771](https://github.com/ClickHouse/ClickHouse/pull/65771) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix a bug in short circuit logic when old analyzer and dictGetOrDefault is used. [#65802](https://github.com/ClickHouse/ClickHouse/pull/65802) ([jsc0218](https://github.com/jsc0218)).
* Fix a bug that led EmbeddedRocksDB with TTL to write corrupted SST files. [#65816](https://github.com/ClickHouse/ClickHouse/pull/65816) ([Duc Canh Le](https://github.com/canhld94)).
* Functions `bitTest`, `bitTestAll`, and `bitTestAny` now return an error if the specified bit index is out-of-bounds [#65818](https://github.com/ClickHouse/ClickHouse/pull/65818) ([Pablo Marcos](https://github.com/pamarcos)).
* Setting `join_any_take_last_row` is supported in any query with hash join. [#65820](https://github.com/ClickHouse/ClickHouse/pull/65820) ([vdimir](https://github.com/vdimir)).
* Better handling of join conditions involving `IS NULL` checks (for example, `ON (a = b AND (a IS NOT NULL) AND (b IS NOT NULL)) OR ((a IS NULL) AND (b IS NULL))` is rewritten to `ON a <=> b`); fix incorrect optimization when conditions other than `IS NULL` are present. [#65835](https://github.com/ClickHouse/ClickHouse/pull/65835) ([vdimir](https://github.com/vdimir)).
* Functions `bitShiftLeft` and `bitShiftRight` now return an error for out-of-bounds shift positions (see the example at the end of this section). [#65838](https://github.com/ClickHouse/ClickHouse/pull/65838) ([Pablo Marcos](https://github.com/pamarcos)).
* Fix growing memory usage in S3Queue. [#65839](https://github.com/ClickHouse/ClickHouse/pull/65839) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix tie handling in `arrayAUC` to match sklearn. [#65840](https://github.com/ClickHouse/ClickHouse/pull/65840) ([gabrielmcg44](https://github.com/gabrielmcg44)).
* Fix possible issues with MySQL server protocol TLS connections. [#65917](https://github.com/ClickHouse/ClickHouse/pull/65917) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible issues with MySQL client protocol TLS connections. [#65938](https://github.com/ClickHouse/ClickHouse/pull/65938) ([Azat Khuzhin](https://github.com/azat)).
* Fix handling of `SSL_ERROR_WANT_READ`/`SSL_ERROR_WANT_WRITE` with zero timeout. [#65941](https://github.com/ClickHouse/ClickHouse/pull/65941) ([Azat Khuzhin](https://github.com/azat)).
* Add missing settings `input_format_csv_skip_first_lines/input_format_tsv_skip_first_lines/input_format_csv_try_infer_numbers_from_strings/input_format_csv_try_infer_strings_from_quoted_tuples` to the schema inference cache because they can change the resulting schema. This prevents incorrect schema inference results when these settings are changed. [#65980](https://github.com/ClickHouse/ClickHouse/pull/65980) ([Kruglov Pavel](https://github.com/Avogar)).
* The `_size` column in the s3 engine and s3 table function now denotes the size of a file inside the archive, not the size of the archive itself. [#65993](https://github.com/ClickHouse/ClickHouse/pull/65993) ([Daniil Ivanik](https://github.com/divanik)).
* Fix resolving dynamic subcolumns in analyzer, avoid reading the whole column on dynamic subcolumn reading. [#66004](https://github.com/ClickHouse/ClickHouse/pull/66004) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix config merging for from_env with replace overrides. [#66034](https://github.com/ClickHouse/ClickHouse/pull/66034) ([Azat Khuzhin](https://github.com/azat)).
* Fix a possible hanging in `GRPCServer` during shutdown. [#66061](https://github.com/ClickHouse/ClickHouse/pull/66061) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed several cases in function `has` with non-constant `LowCardinality` arguments. [#66088](https://github.com/ClickHouse/ClickHouse/pull/66088) ([Anton Popov](https://github.com/CurtizJ)).
* Fix for `groupArrayIntersect`. It had incorrect behavior in the `merge()` function. Also, fixed behavior in `deserialise()` for numeric and general data. [#66103](https://github.com/ClickHouse/ClickHouse/pull/66103) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fixed buffer overflow bug in `unbin`/`unhex` implementation. [#66106](https://github.com/ClickHouse/ClickHouse/pull/66106) ([Nikita Taranov](https://github.com/nickitat)).
* Disable the `merge-filters` optimization introduced in [#64760](https://github.com/ClickHouse/ClickHouse/issues/64760). It may cause an exception if optimization merges two filter expressions and does not apply a short-circuit evaluation. [#66126](https://github.com/ClickHouse/ClickHouse/pull/66126) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed an issue where the server failed to parse Avro files with arrays encoded with negative block sizes, which is allowed by the Avro specification. [#66130](https://github.com/ClickHouse/ClickHouse/pull/66130) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fixed a bug in ZooKeeper client: a session could get stuck in unusable state after receiving a hardware error from ZooKeeper. For example, this might happen due to "soft memory limit" in ClickHouse Keeper. [#66140](https://github.com/ClickHouse/ClickHouse/pull/66140) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix issue in SumIfToCountIfVisitor and signed integers. [#66146](https://github.com/ClickHouse/ClickHouse/pull/66146) ([Raúl Marín](https://github.com/Algunenano)).
* Fix rare case with missing data in the result of distributed query. [#66174](https://github.com/ClickHouse/ClickHouse/pull/66174) ([vdimir](https://github.com/vdimir)).
* Fix order of parsing metadata fields in StorageDeltaLake. [#66211](https://github.com/ClickHouse/ClickHouse/pull/66211) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Don't throw `TIMEOUT_EXCEEDED` for `none_only_active` mode of `distributed_ddl_output_mode`. [#66218](https://github.com/ClickHouse/ClickHouse/pull/66218) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix handling limit for `system.numbers_mt` when no index can be used. [#66231](https://github.com/ClickHouse/ClickHouse/pull/66231) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fixed how the ClickHouse server detects the maximum number of usable CPU cores as specified by cgroups v2 if the server runs in a container such as Docker. In more detail, containers often run their process in the root cgroup which has an empty name. In that case, ClickHouse ignored the CPU limits set by cgroups v2. [#66237](https://github.com/ClickHouse/ClickHouse/pull/66237) ([filimonov](https://github.com/filimonov)).
* Fix the `Not-ready set` error when a subquery with `IN` is used in the constraint. [#66261](https://github.com/ClickHouse/ClickHouse/pull/66261) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix error reporting while copying to S3 or AzureBlobStorage. [#66295](https://github.com/ClickHouse/ClickHouse/pull/66295) ([Vitaly Baranov](https://github.com/vitlibar)).
* Prevent watchdog from keeping descriptors of unlinked(rotated) log files. [#66334](https://github.com/ClickHouse/ClickHouse/pull/66334) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Fix a bug where LogicalExpressionOptimizerPass lost the logical type of a constant. [#66344](https://github.com/ClickHouse/ClickHouse/pull/66344) ([pn](https://github.com/chloro-pn)).
* Fix `Column identifier is already registered` error with `group_by_use_nulls=true` and new analyzer. [#66400](https://github.com/ClickHouse/ClickHouse/pull/66400) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix a possible incorrect result for queries joining and filtering tables with an external engine (like PostgreSQL), due to too aggressive filter pushdown. From now on, conditions from the WHERE section won't be sent to the external database in the case of an outer join with an external table. [#66402](https://github.com/ClickHouse/ClickHouse/pull/66402) ([vdimir](https://github.com/vdimir)).
* Added missing column materialization for cross join. [#66413](https://github.com/ClickHouse/ClickHouse/pull/66413) ([lgbo](https://github.com/lgbo-ustc)).
* Fix `Cannot find column` error for queries with constant expression in `GROUP BY` key and new analyzer enabled. [#66433](https://github.com/ClickHouse/ClickHouse/pull/66433) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Avoid possible logical error during import from Npy format in case of bad array nesting level, fix testing of other kinds of errors. [#66461](https://github.com/ClickHouse/ClickHouse/pull/66461) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix wrong `count()` result when there is a non-deterministic function in the predicate. [#66510](https://github.com/ClickHouse/ClickHouse/pull/66510) ([Duc Canh Le](https://github.com/canhld94)).
* Correctly track memory for `Allocator::realloc`. [#66548](https://github.com/ClickHouse/ClickHouse/pull/66548) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix reading of uninitialized memory when hashing empty tuples. [#66562](https://github.com/ClickHouse/ClickHouse/pull/66562) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix an invalid result for queries with `WINDOW`. This could happen when `PARTITION` columns have sparse serialization and window functions are executed in parallel. [#66579](https://github.com/ClickHouse/ClickHouse/pull/66579) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix removing named collections in local storage. [#66599](https://github.com/ClickHouse/ClickHouse/pull/66599) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix `column_length` is not updated in `ColumnTuple::insertManyFrom`. [#66626](https://github.com/ClickHouse/ClickHouse/pull/66626) ([lgbo](https://github.com/lgbo-ustc)).
* Fix `Unknown identifier` and `Column is not under aggregate function` errors for queries with the expression `(column IS NULL)`. The bug was triggered by [#65088](https://github.com/ClickHouse/ClickHouse/issues/65088), with the disabled analyzer only. [#66654](https://github.com/ClickHouse/ClickHouse/pull/66654) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix `Method getResultType is not supported for QUERY query node` error when scalar subquery was used as the first argument of IN (with new analyzer). [#66655](https://github.com/ClickHouse/ClickHouse/pull/66655) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. [#66659](https://github.com/ClickHouse/ClickHouse/pull/66659) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare case of stuck merge after drop column. [#66707](https://github.com/ClickHouse/ClickHouse/pull/66707) ([Raúl Marín](https://github.com/Algunenano)).
* Fix assertion `isUniqTypes` when doing INSERT SELECT from remote sources. [#66722](https://github.com/ClickHouse/ClickHouse/pull/66722) ([Sema Checherinda](https://github.com/CheSema)).
* Fix logical error in PrometheusRequestHandler. [#66621](https://github.com/ClickHouse/ClickHouse/pull/66621) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix `indexHint` function case found by fuzzer. [#66286](https://github.com/ClickHouse/ClickHouse/pull/66286) ([Anton Popov](https://github.com/CurtizJ)).
* Fix AST formatting of 'create table b empty as a'. [#64951](https://github.com/ClickHouse/ClickHouse/pull/64951) ([Michael Kolupaev](https://github.com/al13n321)).
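A sketch of the new out-of-bounds behavior for the bit functions mentioned earlier in this section; the exact error codes and bounds are not specified in this changelog and are assumptions:

```sql
SELECT bitTest(toUInt8(1), 0);         -- valid: bit index within 0..7
SELECT bitTest(toUInt8(1), 100);       -- now raises an error instead of returning a value
SELECT bitShiftLeft(toUInt8(1), 300);  -- assumed out-of-bounds shift position: also an error now
```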
#### Build/Testing/Packaging Improvement
* Instantiate template methods ahead in different .cpp files, avoid too large translation units during compiling. [#64818](https://github.com/ClickHouse/ClickHouse/pull/64818) ([lgbo](https://github.com/lgbo-ustc)).
### <a id="246"></a> ClickHouse release 24.6, 2024-07-01
#### Backward Incompatible Change

2
contrib/libunwind vendored

@ -1 +1 @@
Subproject commit fe854449e24bedfa26e38465b84374312dbd587f
Subproject commit a89d904befea07814628c6ce0b44083c4e149c62

View File

@ -23,15 +23,17 @@ RUN apt-get update \
# and MEMORY_LIMIT_EXCEEDED exceptions in Functional tests (total memory limit in Functional tests is ~55.24 GiB).
# TSAN will flush shadow memory when reaching this limit.
# It may cause false-negatives, but it's better than OOM.
RUN echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 abort_on_error=1 history_size=7 memory_limit_mb=46080 second_deadlock_stack=1'" >> /etc/environment
RUN echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment
RUN echo "MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1'" >> /etc/environment
RUN echo "LSAN_OPTIONS='suppressions=/usr/share/clickhouse-test/config/lsan_suppressions.txt'" >> /etc/environment
# max_allocation_size_mb is set to 32GB, so we have much bigger chance to run into memory limit than the limitation of the sanitizers
RUN echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 abort_on_error=1 history_size=7 memory_limit_mb=46080 second_deadlock_stack=1 max_allocation_size_mb=32768'" >> /etc/environment
RUN echo "UBSAN_OPTIONS='print_stacktrace=1 max_allocation_size_mb=32768'" >> /etc/environment
RUN echo "MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1 max_allocation_size_mb=32768'" >> /etc/environment
RUN echo "LSAN_OPTIONS='suppressions=/usr/share/clickhouse-test/config/lsan_suppressions.txt max_allocation_size_mb=32768'" >> /etc/environment
# Sanitizer options for current shell (not current, but the one that will be spawned on "docker run")
# (but w/o verbosity for TSAN, otherwise test.reference will not match)
ENV TSAN_OPTIONS='halt_on_error=1 abort_on_error=1 history_size=7 memory_limit_mb=46080 second_deadlock_stack=1'
ENV UBSAN_OPTIONS='print_stacktrace=1'
ENV MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1'
ENV TSAN_OPTIONS='halt_on_error=1 abort_on_error=1 history_size=7 memory_limit_mb=46080 second_deadlock_stack=1 max_allocation_size_mb=32768'
ENV UBSAN_OPTIONS='print_stacktrace=1 max_allocation_size_mb=32768'
ENV MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1 max_allocation_size_mb=32768'
ENV LSAN_OPTIONS='max_allocation_size_mb=32768'
# for external_symbolizer_path
RUN ln -s /usr/bin/llvm-symbolizer-${LLVM_VERSION} /usr/bin/llvm-symbolizer

View File

@ -55,7 +55,7 @@ CMPLNT_FR_TM Nullable(String)
```
:::tip
Most of the time the above command will let you know which fields in the input data are numeric, which are strings, and which are tuples. This is not always the case. Because ClickHouse is routinely used with datasets containing billions of records, there is a default number (100) of rows examined to [infer the schema](/docs/en/integrations/data-ingestion/data-formats/json.md#relying-on-schema-inference) in order to avoid parsing billions of rows to infer the schema. The response below may not match what you see, as the dataset is updated several times each year. Looking at the Data Dictionary you can see that CMPLNT_NUM is specified as text, and not numeric. By overriding the default of 100 rows for inference with the setting `SETTINGS input_format_max_rows_to_read_for_schema_inference=2000`
Most of the time the above command will let you know which fields in the input data are numeric, which are strings, and which are tuples. This is not always the case. Because ClickHouse is routinely used with datasets containing billions of records, there is a default number (100) of rows examined to [infer the schema](/en/integrations/data-formats/json/inference) in order to avoid parsing billions of rows to infer the schema. The response below may not match what you see, as the dataset is updated several times each year. Looking at the Data Dictionary you can see that CMPLNT_NUM is specified as text, and not numeric. By overriding the default of 100 rows for inference with the setting `SETTINGS input_format_max_rows_to_read_for_schema_inference=2000`
you can get a better idea of the content.
Note: as of version 22.5 the default is now 25,000 rows for inferring the schema, so only change the setting if you are on an older version or if you need more than 25,000 rows to be sampled.

View File

@ -18,10 +18,21 @@ Reloads all dictionaries that have been successfully loaded before.
By default, dictionaries are loaded lazily (see [dictionaries_lazy_load](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-dictionaries_lazy_load)), so instead of being loaded automatically at startup, they are initialized on first access through dictGet function or SELECT from tables with ENGINE = Dictionary. The `SYSTEM RELOAD DICTIONARIES` query reloads such dictionaries (LOADED).
Always returns `Ok.` regardless of the result of the dictionary update.
**Syntax**
```sql
SYSTEM RELOAD DICTIONARIES [ON CLUSTER cluster_name]
```
## RELOAD DICTIONARY
Completely reloads a dictionary `dictionary_name`, regardless of the state of the dictionary (LOADED / NOT_LOADED / FAILED).
Always returns `Ok.` regardless of the result of updating the dictionary.
``` sql
SYSTEM RELOAD DICTIONARY [ON CLUSTER cluster_name] dictionary_name
```
The status of the dictionary can be checked by querying the `system.dictionaries` table.
``` sql

View File

@ -1,36 +0,0 @@
---
slug: /en/sql-reference/table-functions/fuzzQuery
sidebar_position: 75
sidebar_label: fuzzQuery
---
# fuzzQuery
Perturbs the given query string with random variations.
``` sql
fuzzQuery(query[, max_query_length[, random_seed]])
```
**Arguments**
- `query` (String) - The source query to perform the fuzzing on.
- `max_query_length` (UInt64) - The maximum length the query can reach during the fuzzing process.
- `random_seed` (UInt64) - A random seed for producing stable results.
**Returned Value**
A table object with a single column containing perturbed query strings.
## Usage Example
``` sql
SELECT * FROM fuzzQuery('SELECT materialize(\'a\' AS key) GROUP BY key') LIMIT 2;
```
```
┌─query──────────────────────────────────────────────────────────┐
1. │ SELECT 'a' AS key GROUP BY key │
2. │ EXPLAIN PIPELINE compact = true SELECT 'a' AS key GROUP BY key │
└────────────────────────────────────────────────────────────────┘
```

View File

@ -9,10 +9,7 @@ namespace DB
class Client : public ClientBase
{
public:
Client()
{
fuzzer = QueryFuzzer(randomSeed(), &std::cout, &std::cerr);
}
Client() = default;
void initialize(Poco::Util::Application & self) override;

View File

@ -1,2 +1,2 @@
clickhouse_add_executable (validate-odbc-connection-string validate-odbc-connection-string.cpp ../validateODBCConnectionString.cpp)
target_link_libraries (validate-odbc-connection-string PRIVATE clickhouse_common_io)
target_link_libraries (validate-odbc-connection-string PRIVATE clickhouse_common_io clickhouse_common_config)

View File

@ -4124,9 +4124,7 @@ void QueryAnalyzer::resolveInterpolateColumnsNodeList(QueryTreeNodePtr & interpo
auto * column_to_interpolate = interpolate_node_typed.getExpression()->as<IdentifierNode>();
if (!column_to_interpolate)
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"INTERPOLATE can work only for identifiers, but {} is found",
throw Exception(ErrorCodes::LOGICAL_ERROR, "INTERPOLATE can work only for indentifiers, but {} is found",
interpolate_node_typed.getExpression()->formatASTForErrorMessage());
auto column_to_interpolate_name = column_to_interpolate->getIdentifier().getFullName();

View File

@ -308,9 +308,16 @@ public:
ClientBase::~ClientBase()
{
writeSignalIDtoSignalPipe(SignalListener::StopThread);
signal_listener_thread.join();
HandledSignals::instance().reset();
try
{
writeSignalIDtoSignalPipe(SignalListener::StopThread);
signal_listener_thread.join();
HandledSignals::instance().reset();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
ClientBase::ClientBase(

View File

@ -6,7 +6,6 @@
#include <Common/ProgressIndication.h>
#include <Common/InterruptListener.h>
#include <Common/ShellCommand.h>
#include <Common/QueryFuzzer.h>
#include <Common/Stopwatch.h>
#include <Common/DNSResolver.h>
#include <Core/ExternalTable.h>
@ -17,6 +16,7 @@
#include <Poco/SplitterChannel.h>
#include <Interpreters/Context.h>
#include <Client/Suggest.h>
#include <Client/QueryFuzzer.h>
#include <boost/program_options.hpp>
#include <Storages/StorageFile.h>
#include <Storages/SelectQueryInfo.h>

View File

@ -68,21 +68,22 @@ Field QueryFuzzer::getRandomField(int type)
{
case 0:
{
return bad_int64_values[fuzz_rand() % std::size(bad_int64_values)];
return bad_int64_values[fuzz_rand() % (sizeof(bad_int64_values)
/ sizeof(*bad_int64_values))];
}
case 1:
{
static constexpr double values[]
= {NAN, INFINITY, -INFINITY, 0., -0., 0.0001, 0.5, 0.9999,
1., 1.0001, 2., 10.0001, 100.0001, 1000.0001, 1e10, 1e20,
FLT_MIN, FLT_MIN + FLT_EPSILON, FLT_MAX, FLT_MAX + FLT_EPSILON}; return values[fuzz_rand() % std::size(values)];
FLT_MIN, FLT_MIN + FLT_EPSILON, FLT_MAX, FLT_MAX + FLT_EPSILON}; return values[fuzz_rand() % (sizeof(values) / sizeof(*values))];
}
case 2:
{
static constexpr UInt64 scales[] = {0, 1, 2, 10};
return DecimalField<Decimal64>(
bad_int64_values[fuzz_rand() % std::size(bad_int64_values)],
static_cast<UInt32>(scales[fuzz_rand() % std::size(scales)])
bad_int64_values[fuzz_rand() % (sizeof(bad_int64_values) / sizeof(*bad_int64_values))],
static_cast<UInt32>(scales[fuzz_rand() % (sizeof(scales) / sizeof(*scales))])
);
}
default:
@ -164,8 +165,7 @@ Field QueryFuzzer::fuzzField(Field field)
{
size_t pos = fuzz_rand() % arr.size();
arr.erase(arr.begin() + pos);
if (debug_stream)
*debug_stream << "erased\n";
std::cerr << "erased\n";
}
if (fuzz_rand() % 5 == 0)
@ -174,14 +174,12 @@ Field QueryFuzzer::fuzzField(Field field)
{
size_t pos = fuzz_rand() % arr.size();
arr.insert(arr.begin() + pos, fuzzField(arr[pos]));
if (debug_stream)
*debug_stream << fmt::format("inserted (pos {})\n", pos);
std::cerr << fmt::format("inserted (pos {})\n", pos);
}
else
{
arr.insert(arr.begin(), getRandomField(0));
if (debug_stream)
*debug_stream << "inserted (0)\n";
std::cerr << "inserted (0)\n";
}
}
@ -199,9 +197,7 @@ Field QueryFuzzer::fuzzField(Field field)
{
size_t pos = fuzz_rand() % arr.size();
arr.erase(arr.begin() + pos);
if (debug_stream)
*debug_stream << "erased\n";
std::cerr << "erased\n";
}
if (fuzz_rand() % 5 == 0)
@ -210,16 +206,12 @@ Field QueryFuzzer::fuzzField(Field field)
{
size_t pos = fuzz_rand() % arr.size();
arr.insert(arr.begin() + pos, fuzzField(arr[pos]));
if (debug_stream)
*debug_stream << fmt::format("inserted (pos {})\n", pos);
std::cerr << fmt::format("inserted (pos {})\n", pos);
}
else
{
arr.insert(arr.begin(), getRandomField(0));
if (debug_stream)
*debug_stream << "inserted (0)\n";
std::cerr << "inserted (0)\n";
}
}
@ -352,8 +344,7 @@ void QueryFuzzer::fuzzOrderByList(IAST * ast)
}
else
{
if (debug_stream)
*debug_stream << "No random column.\n";
std::cerr << "No random column.\n";
}
}
@ -387,8 +378,7 @@ void QueryFuzzer::fuzzColumnLikeExpressionList(IAST * ast)
if (col)
impl->children.insert(pos, col);
else
if (debug_stream)
*debug_stream << "No random column.\n";
std::cerr << "No random column.\n";
}
// We don't have to recurse here to fuzz the children, this is handled by
@ -1371,15 +1361,11 @@ void QueryFuzzer::fuzzMain(ASTPtr & ast)
collectFuzzInfoMain(ast);
fuzz(ast);
if (out_stream)
{
*out_stream << std::endl;
WriteBufferFromOStream ast_buf(*out_stream, 4096);
formatAST(*ast, ast_buf, false /*highlight*/);
ast_buf.finalize();
*out_stream << std::endl << std::endl;
}
std::cout << std::endl;
WriteBufferFromOStream ast_buf(std::cout, 4096);
formatAST(*ast, ast_buf, false /*highlight*/);
ast_buf.finalize();
std::cout << std::endl << std::endl;
}
}

View File

@ -35,31 +35,9 @@ struct ASTWindowDefinition;
* queries, so you want to feed it a lot of queries to get some interesting mix
* of them. Normally we feed SQL regression tests to it.
*/
class QueryFuzzer
struct QueryFuzzer
{
public:
explicit QueryFuzzer(pcg64 fuzz_rand_ = randomSeed(), std::ostream * out_stream_ = nullptr, std::ostream * debug_stream_ = nullptr)
: fuzz_rand(fuzz_rand_)
, out_stream(out_stream_)
, debug_stream(debug_stream_)
{
}
// This is the only function you have to call -- it will modify the passed
// ASTPtr to point to new AST with some random changes.
void fuzzMain(ASTPtr & ast);
ASTs getInsertQueriesForFuzzedTables(const String & full_query);
ASTs getDropQueriesForFuzzedTables(const ASTDropQuery & drop_query);
void notifyQueryFailed(ASTPtr ast);
static bool isSuitableForFuzzing(const ASTCreateQuery & create);
private:
pcg64 fuzz_rand;
std::ostream * out_stream = nullptr;
std::ostream * debug_stream = nullptr;
pcg64 fuzz_rand{randomSeed()};
// We add elements to expression lists with fixed probability. Some elements
// are so large, that the expected number of elements we add to them is
@ -88,6 +66,10 @@ private:
std::unordered_map<std::string, size_t> index_of_fuzzed_table;
std::set<IAST::Hash> created_tables_hashes;
// This is the only function you have to call -- it will modify the passed
// ASTPtr to point to new AST with some random changes.
void fuzzMain(ASTPtr & ast);
// Various helper functions follow, normally you shouldn't have to call them.
Field getRandomField(int type);
Field fuzzField(Field field);
@ -95,6 +77,9 @@ private:
ASTPtr getRandomExpressionList();
DataTypePtr fuzzDataType(DataTypePtr type);
DataTypePtr getRandomType();
ASTs getInsertQueriesForFuzzedTables(const String & full_query);
ASTs getDropQueriesForFuzzedTables(const ASTDropQuery & drop_query);
void notifyQueryFailed(ASTPtr ast);
void replaceWithColumnLike(ASTPtr & ast);
void replaceWithTableLike(ASTPtr & ast);
void fuzzOrderByElement(ASTOrderByElement * elem);
@ -117,6 +102,8 @@ private:
void addTableLike(ASTPtr ast);
void addColumnLike(ASTPtr ast);
void collectFuzzInfoRecurse(ASTPtr ast);
static bool isSuitableForFuzzing(const ASTCreateQuery & create);
};
}

View File

@ -59,6 +59,7 @@ static struct InitFiu
ONCE(execute_query_calling_empty_set_result_func_on_exception) \
ONCE(receive_timeout_on_table_status_response) \
REGULAR(keepermap_fail_drop_data) \
REGULAR(lazy_pipe_fds_fail_close) \
namespace FailPoints

View File

@ -1,19 +1,23 @@
#include <Common/PipeFDs.h>
#include <Common/Exception.h>
#include <Common/formatReadable.h>
#include <Common/FailPoint.h>
#include <Common/logger_useful.h>
#include <base/errnoToString.h>
#include <unistd.h>
#include <fcntl.h>
#include <string>
#include <algorithm>
namespace DB
{
namespace FailPoints
{
extern const char lazy_pipe_fds_fail_close[];
}
namespace ErrorCodes
{
extern const int CANNOT_PIPE;
@ -42,6 +46,11 @@ void LazyPipeFDs::open()
void LazyPipeFDs::close()
{
fiu_do_on(FailPoints::lazy_pipe_fds_fail_close,
{
throw Exception(ErrorCodes::CANNOT_PIPE, "Manually triggered exception on close");
});
for (int & fd : fds_rw)
{
if (fd < 0)

View File

@ -1,8 +1,5 @@
#pragma once
#include <cstddef>
namespace DB
{

View File

@ -33,7 +33,7 @@ namespace DB
namespace
{
#if defined(OS_LINUX)
thread_local size_t write_trace_iteration = 0;
//thread_local size_t write_trace_iteration = 0;
#endif
/// Even after timer_delete() the signal can be delivered,
/// since it does not do anything with pending signals.
@ -57,7 +57,7 @@ namespace
auto saved_errno = errno; /// We must restore previous value of errno in signal handler.
#if defined(OS_LINUX)
#if defined(OS_LINUX) && false //asdqwe
if (info)
{
int overrun_count = info->si_overrun;
@ -92,7 +92,7 @@ namespace
constexpr bool sanitizer = false;
#endif
asynchronous_stack_unwinding = true;
//asdqwe asynchronous_stack_unwinding = true;
if (sanitizer || 0 == sigsetjmp(asynchronous_stack_unwinding_signal_jump_buffer, 1))
{
stack_trace.emplace(signal_context);

View File

@ -605,7 +605,14 @@ void HandledSignals::reset()
HandledSignals::~HandledSignals()
{
reset();
try
{
reset();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
};
HandledSignals & HandledSignals::instance()

View File

@ -489,11 +489,22 @@ struct CacheEntry
using CacheEntryPtr = std::shared_ptr<CacheEntry>;
using StackTraceCache = std::map<StackTraceTriple, CacheEntryPtr, std::less<>>;
static constinit std::atomic<bool> can_use_cache = false;
using StackTraceCacheBase = std::map<StackTraceTriple, CacheEntryPtr, std::less<>>;
struct StackTraceCache : public StackTraceCacheBase
{
~StackTraceCache()
{
can_use_cache = false;
}
};
static StackTraceCache & cacheInstance()
{
static StackTraceCache cache;
can_use_cache = true;
return cache;
}
@ -503,6 +514,13 @@ String toStringCached(const StackTrace::FramePointers & pointers, size_t offset,
{
const StackTraceRefTriple key{pointers, offset, size};
if (!can_use_cache)
{
DB::WriteBufferFromOwnString out;
toStringEveryLineImpl(false, key, [&](std::string_view str) { out << str << '\n'; });
return out.str();
}
/// Calculation of stack trace text is extremely slow.
/// We use cache because otherwise the server could be overloaded by trash queries.
/// Note that this cache can grow unconditionally, but practically it should be small.
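
The change above guards the static stack-trace cache against use during static destruction: the cache's own destructor clears `can_use_cache`, and `toStringCached` falls back to the uncached slow path once the flag is down. A reduced sketch of the same idea, with hypothetical names and a plain `std::map` standing in for the real cache:

```cpp
#include <atomic>
#include <map>
#include <string>

static std::atomic<bool> can_use_cache{false};

// The cache is a function-local static; its destructor runs during static deinitialization
// and lowers the flag so later callers (e.g. from other destructors) skip the cache.
struct ShutdownAwareCache : std::map<int, std::string>
{
    ~ShutdownAwareCache() { can_use_cache = false; }
};

static ShutdownAwareCache & cacheInstance()
{
    static ShutdownAwareCache cache;
    can_use_cache = true; // safe to use from first construction until its destructor runs
    return cache;
}

static std::string lookup(int key)
{
    if (!can_use_cache)
        return "computed without the cache"; // slow path, safe during shutdown
    auto & cache = cacheInstance();
    auto [it, inserted] = cache.try_emplace(key, "computed once and cached");
    return it->second;
}

int main()
{
    cacheInstance();               // touch the cache once at startup so the flag is raised
    return lookup(42).empty() ? 1 : 0;
}
```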

View File

@ -1,14 +1,14 @@
clickhouse_add_executable (hashes_test hashes_test.cpp)
target_link_libraries (hashes_test PRIVATE clickhouse_common_io ch_contrib::cityhash)
target_link_libraries (hashes_test PRIVATE clickhouse_common_io clickhouse_common_config ch_contrib::cityhash)
if (TARGET OpenSSL::Crypto)
target_link_libraries (hashes_test PRIVATE OpenSSL::Crypto)
endif()
clickhouse_add_executable (sip_hash_perf sip_hash_perf.cpp)
target_link_libraries (sip_hash_perf PRIVATE clickhouse_common_io)
target_link_libraries (sip_hash_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (small_table small_table.cpp)
target_link_libraries (small_table PRIVATE clickhouse_common_io)
target_link_libraries (small_table PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (parallel_aggregation parallel_aggregation.cpp)
target_link_libraries (parallel_aggregation PRIVATE dbms clickhouse_functions)
@ -17,13 +17,13 @@ clickhouse_add_executable (parallel_aggregation2 parallel_aggregation2.cpp)
target_link_libraries (parallel_aggregation2 PRIVATE dbms clickhouse_functions)
clickhouse_add_executable (int_hashes_perf int_hashes_perf.cpp)
target_link_libraries (int_hashes_perf PRIVATE clickhouse_common_io)
target_link_libraries (int_hashes_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (compact_array compact_array.cpp)
target_link_libraries (compact_array PRIVATE clickhouse_common_io)
target_link_libraries (compact_array PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (radix_sort radix_sort.cpp)
target_link_libraries (radix_sort PRIVATE clickhouse_common_io ch_contrib::pdqsort)
target_link_libraries (radix_sort PRIVATE clickhouse_common_io clickhouse_common_config ch_contrib::pdqsort)
clickhouse_add_executable (arena_with_free_lists arena_with_free_lists.cpp)
target_link_libraries (arena_with_free_lists PRIVATE dbms)
@ -33,46 +33,46 @@ target_link_libraries (lru_hash_map_perf PRIVATE dbms)
if (OS_LINUX)
clickhouse_add_executable (thread_creation_latency thread_creation_latency.cpp)
target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io)
target_link_libraries (thread_creation_latency PRIVATE clickhouse_common_io clickhouse_common_config)
endif()
clickhouse_add_executable (array_cache array_cache.cpp)
target_link_libraries (array_cache PRIVATE clickhouse_common_io)
target_link_libraries (array_cache PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (space_saving space_saving.cpp)
target_link_libraries (space_saving PRIVATE clickhouse_common_io)
target_link_libraries (space_saving PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (integer_hash_tables_benchmark integer_hash_tables_benchmark.cpp)
target_link_libraries (integer_hash_tables_benchmark PRIVATE dbms ch_contrib::abseil_swiss_tables ch_contrib::sparsehash)
clickhouse_add_executable (cow_columns cow_columns.cpp)
target_link_libraries (cow_columns PRIVATE clickhouse_common_io)
target_link_libraries (cow_columns PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (cow_compositions cow_compositions.cpp)
target_link_libraries (cow_compositions PRIVATE clickhouse_common_io)
target_link_libraries (cow_compositions PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (stopwatch stopwatch.cpp)
target_link_libraries (stopwatch PRIVATE clickhouse_common_io)
target_link_libraries (stopwatch PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (symbol_index symbol_index.cpp)
target_link_libraries (symbol_index PRIVATE clickhouse_common_io)
target_link_libraries (symbol_index PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (chaos_sanitizer chaos_sanitizer.cpp)
target_link_libraries (chaos_sanitizer PRIVATE clickhouse_common_io)
target_link_libraries (chaos_sanitizer PRIVATE clickhouse_common_io clickhouse_common_config)
if (OS_LINUX)
clickhouse_add_executable (memory_statistics_os_perf memory_statistics_os_perf.cpp)
target_link_libraries (memory_statistics_os_perf PRIVATE clickhouse_common_io)
target_link_libraries (memory_statistics_os_perf PRIVATE clickhouse_common_io clickhouse_common_config)
endif()
clickhouse_add_executable (procfs_metrics_provider_perf procfs_metrics_provider_perf.cpp)
target_link_libraries (procfs_metrics_provider_perf PRIVATE clickhouse_common_io)
target_link_libraries (procfs_metrics_provider_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (average average.cpp)
target_link_libraries (average PRIVATE clickhouse_common_io)
target_link_libraries (average PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (shell_command_inout shell_command_inout.cpp)
target_link_libraries (shell_command_inout PRIVATE clickhouse_common_io)
target_link_libraries (shell_command_inout PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (executable_udf executable_udf.cpp)
target_link_libraries (executable_udf PRIVATE dbms)
@ -91,4 +91,4 @@ if (ENABLE_SSL)
endif()
clickhouse_add_executable (check_pointer_valid check_pointer_valid.cpp)
target_link_libraries (check_pointer_valid PRIVATE clickhouse_common_io)
target_link_libraries (check_pointer_valid PRIVATE clickhouse_common_io clickhouse_common_config)

View File

@ -1,2 +1,2 @@
clickhouse_add_executable (mysqlxx_pool_test mysqlxx_pool_test.cpp)
target_link_libraries (mysqlxx_pool_test PRIVATE mysqlxx)
target_link_libraries (mysqlxx_pool_test PRIVATE mysqlxx clickhouse_common_config)

View File

@ -1,2 +1,2 @@
clickhouse_add_executable (compressed_buffer compressed_buffer.cpp)
target_link_libraries (compressed_buffer PRIVATE clickhouse_common_io clickhouse_compression)
target_link_libraries (compressed_buffer PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression)

View File

@ -605,7 +605,7 @@ class IColumn;
M(Bool, optimize_if_chain_to_multiif, false, "Replace if(cond1, then1, if(cond2, ...)) chains to multiIf. Currently it's not beneficial for numeric types.", 0) \
M(Bool, optimize_multiif_to_if, true, "Replace 'multiIf' with only one condition to 'if'.", 0) \
M(Bool, optimize_if_transform_strings_to_enum, false, "Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail.", 0) \
M(Bool, optimize_functions_to_subcolumns, true, "Transform functions to subcolumns, if possible, to reduce amount of read data. E.g. 'length(arr)' -> 'arr.size0', 'col IS NULL' -> 'col.null' ", 0) \
M(Bool, optimize_functions_to_subcolumns, false, "Transform functions to subcolumns, if possible, to reduce amount of read data. E.g. 'length(arr)' -> 'arr.size0', 'col IS NULL' -> 'col.null' ", 0) \
M(Bool, optimize_using_constraints, false, "Use constraints for query optimization", 0) \
M(Bool, optimize_substitute_columns, false, "Use constraints for column substitution", 0) \
M(Bool, optimize_append_index, false, "Use constraints in order to append index condition (indexHint)", 0) \
@ -766,7 +766,7 @@ class IColumn;
M(UInt64, merge_tree_min_rows_for_concurrent_read_for_remote_filesystem, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized, when reading from remote filesystem.", 0) \
M(UInt64, merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem, (24 * 10 * 1024 * 1024), "If at least as many bytes are read from one file, the reading can be parallelized, when reading from remote filesystem.", 0) \
M(UInt64, remote_read_min_bytes_for_seek, 4 * DBMS_DEFAULT_BUFFER_SIZE, "Min bytes required for remote read (url, s3) to do seek, instead of read with ignore.", 0) \
M(UInt64, merge_tree_min_bytes_per_task_for_remote_reading, 4 * DBMS_DEFAULT_BUFFER_SIZE, "Min bytes to read per task.", 0) \
M(UInt64, merge_tree_min_bytes_per_task_for_remote_reading, 2 * DBMS_DEFAULT_BUFFER_SIZE, "Min bytes to read per task.", 0) ALIAS(filesystem_prefetch_min_bytes_for_single_read_task) \
M(Bool, merge_tree_use_const_size_tasks_for_remote_reading, true, "Whether to use constant size tasks for reading from a remote table.", 0) \
M(Bool, merge_tree_determine_task_size_by_prewhere_columns, true, "Whether to use only prewhere columns size to determine reading task size.", 0) \
M(UInt64, merge_tree_compact_parts_min_granules_to_multibuffer_read, 16, "Only available in ClickHouse Cloud", 0) \
@ -808,7 +808,6 @@ class IColumn;
M(UInt64, prefetch_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the prefetch buffer to read from the filesystem.", 0) \
M(UInt64, filesystem_prefetch_step_bytes, 0, "Prefetch step in bytes. Zero means `auto` - approximately the best prefetch step will be auto deduced, but might not be 100% the best. The actual value might be different because of setting filesystem_prefetch_min_bytes_for_single_read_task", 0) \
M(UInt64, filesystem_prefetch_step_marks, 0, "Prefetch step in marks. Zero means `auto` - approximately the best prefetch step will be auto deduced, but might not be 100% the best. The actual value might be different because of setting filesystem_prefetch_min_bytes_for_single_read_task", 0) \
M(UInt64, filesystem_prefetch_min_bytes_for_single_read_task, "2Mi", "Do not parallelize within one file read less than this amount of bytes. E.g. one reader will not receive a read task of size less than this amount. This setting is recommended to avoid spikes of time for aws getObject requests to aws", 0) \
M(UInt64, filesystem_prefetch_max_memory_usage, "1Gi", "Maximum memory usage for prefetches.", 0) \
M(UInt64, filesystem_prefetches_limit, 200, "Maximum number of prefetches. Zero means unlimited. A setting `filesystem_prefetches_max_memory_usage` is more recommended if you want to limit the number of prefetches", 0) \
\

View File

@ -63,7 +63,6 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{"output_format_native_encode_types_in_binary_format", false, false, "Added new setting to allow to write type names in binary format in Native output format"},
{"input_format_native_decode_types_in_binary_format", false, false, "Added new setting to allow to read type names in binary format in Native output format"},
{"read_in_order_use_buffering", false, true, "Use buffering before merging while reading in order of primary key"},
{"optimize_functions_to_subcolumns", false, true, "Enable optimization by default"},
{"enable_named_columns_in_function_tuple", false, true, "Generate named tuples in function tuple() when all names are unique and can be treated as unquoted identifiers."},
{"input_format_json_ignore_key_case", false, false, "Ignore json key case while read json field from string."},
{"optimize_trivial_insert_select", true, false, "The optimization does not make sense in many cases."},
@ -77,6 +76,7 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{"azure_sdk_max_retries", 10, 10, "Maximum number of retries in azure sdk"},
{"azure_sdk_retry_initial_backoff_ms", 10, 10, "Minimal backoff between retries in azure sdk"},
{"azure_sdk_retry_max_backoff_ms", 1000, 1000, "Maximal backoff between retries in azure sdk"},
{"merge_tree_min_bytes_per_task_for_remote_reading", 4194304, 2097152, "Value is unified with `filesystem_prefetch_min_bytes_for_single_read_task`"},
{"ignore_on_cluster_for_replicated_named_collections_queries", false, false, "Ignore ON CLUSTER clause for replicated named collections management queries."},
{"backup_restore_s3_retry_attempts", 1000,1000, "Setting for Aws::Client::RetryStrategy, Aws::Client does retries itself, 0 means no retries. It takes place only for backup/restore."},
{"postgresql_connection_attempt_timeout", 2, 2, "Allow to control 'connect_timeout' parameter of PostgreSQL connection."},
@ -147,7 +147,7 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{"default_table_engine", "None", "MergeTree", "Set default table engine to MergeTree for better usability"},
{"input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects", false, false, "Allow to use String type for ambiguous paths during named tuple inference from JSON objects"},
{"traverse_shadow_remote_data_paths", false, false, "Traverse shadow directory when query system.remote_data_paths."},
{"throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert", false, true, "Deduplication is dependent materialized view cannot work together with async inserts."},
{"throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert", false, true, "Deduplication in dependent materialized view cannot work together with async inserts."},
{"parallel_replicas_allow_in_with_subquery", false, true, "If true, subquery for IN will be executed on every follower replica"},
{"log_processors_profiles", false, true, "Enable by default"},
{"function_locate_has_mysql_compatible_argument_order", false, true, "Increase compatibility with MySQL's locate function."},

View File

@ -1,8 +1,8 @@
clickhouse_add_executable (string_pool string_pool.cpp)
target_link_libraries (string_pool PRIVATE clickhouse_common_io ch_contrib::sparsehash)
target_link_libraries (string_pool PRIVATE clickhouse_common_io clickhouse_common_config ch_contrib::sparsehash)
clickhouse_add_executable (field field.cpp)
target_link_libraries (field PRIVATE dbms)
clickhouse_add_executable (string_ref_hash string_ref_hash.cpp)
target_link_libraries (string_ref_hash PRIVATE clickhouse_common_io)
target_link_libraries (string_ref_hash PRIVATE clickhouse_common_io clickhouse_common_config)

View File

@ -146,10 +146,19 @@ BaseDaemon::BaseDaemon() = default;
BaseDaemon::~BaseDaemon()
{
writeSignalIDtoSignalPipe(SignalListener::StopThread);
signal_listener_thread.join();
HandledSignals::instance().reset();
SentryWriter::resetInstance();
try
{
writeSignalIDtoSignalPipe(SignalListener::StopThread);
signal_listener_thread.join();
HandledSignals::instance().reset();
SentryWriter::resetInstance();
}
catch (...)
{
tryLogCurrentException(&logger());
}
OwnSplitChannel::disableLogging();
}
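
Both `HandledSignals::~HandledSignals` and `BaseDaemon::~BaseDaemon` now wrap their teardown in try/catch, because destructors are implicitly `noexcept` and an escaping exception would call `std::terminate()`. A minimal sketch of the pattern, with a hypothetical stand-in for `tryLogCurrentException`:

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for tryLogCurrentException(): report and swallow, never rethrow.
static void tryLogCurrentException(const char * where) noexcept
{
    try { std::cerr << where << ": unhandled exception during shutdown\n"; }
    catch (...) {} // logging itself must not throw here
}

struct Daemon
{
    void stopSignalListener() { throw std::runtime_error("pipe already closed"); }

    ~Daemon()
    {
        // An exception escaping a destructor would abort the process; keep shutdown best-effort.
        try
        {
            stopSignalListener(); // may throw, e.g. if the signal pipe is already closed
        }
        catch (...)
        {
            tryLogCurrentException("Daemon::~Daemon");
        }
    }
};

int main() { Daemon d; } // exits normally instead of calling std::terminate()
```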

View File

@ -559,8 +559,11 @@ void DatabaseReplicated::createEmptyLogEntry(const ZooKeeperPtr & current_zookee
bool DatabaseReplicated::waitForReplicaToProcessAllEntries(UInt64 timeout_ms)
{
if (!ddl_worker || is_probably_dropped)
return false;
{
std::lock_guard lock{ddl_worker_mutex};
if (!ddl_worker || is_probably_dropped)
return false;
}
return ddl_worker->waitForReplicaToProcessAllEntries(timeout_ms);
}
@ -641,7 +644,10 @@ LoadTaskPtr DatabaseReplicated::startupDatabaseAsync(AsyncLoader & async_loader,
if (is_probably_dropped)
return;
ddl_worker = std::make_unique<DatabaseReplicatedDDLWorker>(this, getContext());
{
std::lock_guard lock{ddl_worker_mutex};
ddl_worker = std::make_unique<DatabaseReplicatedDDLWorker>(this, getContext());
}
ddl_worker->startup();
ddl_worker_initialized = true;
});
@ -671,92 +677,96 @@ void DatabaseReplicated::stopLoading()
DatabaseAtomic::stopLoading();
}
bool DatabaseReplicated::checkDigestValid(const ContextPtr & local_context, bool debug_check /* = true */) const
void DatabaseReplicated::dumpLocalTablesForDebugOnly(const ContextPtr & local_context) const
{
if (debug_check)
auto table_names = getAllTableNames(context.lock());
for (const auto & table_name : table_names)
{
/// Reduce number of debug checks
if (thread_local_rng() % 16)
return true;
auto ast_ptr = tryGetCreateTableQuery(table_name, local_context);
if (ast_ptr)
LOG_DEBUG(log, "[local] Table {} create query is {}", table_name, queryToString(ast_ptr));
else
LOG_DEBUG(log, "[local] Table {} has no create query", table_name);
}
LOG_TEST(log, "Current in-memory metadata digest: {}", tables_metadata_digest);
/// Database is probably being dropped
if (!local_context->getZooKeeperMetadataTransaction() && (!ddl_worker || !ddl_worker->isCurrentlyActive()))
return true;
UInt64 local_digest = 0;
{
std::lock_guard lock{mutex};
for (const auto & table : TSA_SUPPRESS_WARNING_FOR_READ(tables))
local_digest += getMetadataHash(table.first);
}
if (local_digest != tables_metadata_digest)
{
LOG_ERROR(log, "Digest of local metadata ({}) is not equal to in-memory digest ({})", local_digest, tables_metadata_digest);
return false;
}
/// Do not check digest in Keeper after internal subquery, it's probably not committed yet
if (local_context->isInternalSubquery())
return true;
/// It does not make sense to check the digest in Keeper while the replica is recovering
if (is_recovering)
return true;
String zk_digest = getZooKeeper()->get(replica_path + "/digest");
String local_digest_str = toString(local_digest);
if (zk_digest != local_digest_str)
{
LOG_ERROR(log, "Digest of local metadata ({}) is not equal to digest in Keeper ({})", local_digest_str, zk_digest);
return false;
}
return true;
}
void DatabaseReplicated::checkQueryValid(const ASTPtr & query, ContextPtr query_context) const
void DatabaseReplicated::dumpTablesInZooKeeperForDebugOnly() const
{
/// Replicas will set correct name of current database in query context (database name can be different on replicas)
if (auto * ddl_query = dynamic_cast<ASTQueryWithTableAndOutput *>(query.get()))
UInt32 max_log_ptr;
auto table_name_to_metadata = tryGetConsistentMetadataSnapshot(getZooKeeper(), max_log_ptr);
for (const auto & [table_name, create_table_query] : table_name_to_metadata)
{
if (ddl_query->getDatabase() != getDatabaseName())
throw Exception(ErrorCodes::UNKNOWN_DATABASE, "Database was renamed");
ddl_query->database.reset();
if (auto * create = query->as<ASTCreateQuery>())
auto query_ast = parseQueryFromMetadataInZooKeeper(table_name, create_table_query);
if (query_ast)
{
if (create->storage)
checkTableEngine(*create, *create->storage, query_context);
LOG_DEBUG(log, "[zookeeper] Table {} create query is {}", table_name, queryToString(query_ast));
}
else
{
LOG_DEBUG(log, "[zookeeper] Table {} has no create query", table_name);
}
}
}
if (create->targets)
void DatabaseReplicated::tryCompareLocalAndZooKeeperTablesAndDumpDiffForDebugOnly(const ContextPtr & local_context) const
{
UInt32 max_log_ptr;
auto table_name_to_metadata_in_zk = tryGetConsistentMetadataSnapshot(getZooKeeper(), max_log_ptr);
auto table_names_local = getAllTableNames(local_context);
if (table_name_to_metadata_in_zk.size() != table_names_local.size())
LOG_DEBUG(log, "Amount of tables in zk {} locally {}", table_name_to_metadata_in_zk.size(), table_names_local.size());
std::unordered_set<std::string> checked_tables;
for (const auto & table_name : table_names_local)
{
auto local_ast_ptr = tryGetCreateTableQuery(table_name, local_context);
if (table_name_to_metadata_in_zk.contains(table_name))
{
checked_tables.insert(table_name);
auto create_table_query_in_zk = table_name_to_metadata_in_zk[table_name];
auto zk_ast_ptr = parseQueryFromMetadataInZooKeeper(table_name, create_table_query_in_zk);
if (local_ast_ptr == nullptr && zk_ast_ptr == nullptr)
{
for (const auto & inner_table_engine : create->targets->getInnerEngines())
checkTableEngine(*create, *inner_table_engine, query_context);
LOG_DEBUG(log, "AST for table {} is the same (nullptr) in local and ZK", table_name);
}
else if (local_ast_ptr != nullptr && zk_ast_ptr != nullptr && queryToString(local_ast_ptr) != queryToString(zk_ast_ptr))
{
LOG_DEBUG(log, "AST differs for table {}, local {}, in zookeeper {}", table_name, queryToString(local_ast_ptr), queryToString(zk_ast_ptr));
}
else if (local_ast_ptr == nullptr)
{
LOG_DEBUG(log, "AST differs for table {}, local nullptr, in zookeeper {}", table_name, queryToString(zk_ast_ptr));
}
else if (zk_ast_ptr == nullptr)
{
LOG_DEBUG(log, "AST differs for table {}, local {}, in zookeeper nullptr", table_name, queryToString(local_ast_ptr));
}
else
{
LOG_DEBUG(log, "AST for table {} is the same in local and ZK", table_name);
}
}
}
if (const auto * query_alter = query->as<ASTAlterQuery>())
{
for (const auto & command : query_alter->command_list->children)
else
{
if (!isSupportedAlterTypeForOnClusterDDLQuery(command->as<ASTAlterCommand&>().type))
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Unsupported type of ALTER query");
if (local_ast_ptr == nullptr)
LOG_DEBUG(log, "Table {} exists locally, but missing in ZK", table_name);
else
LOG_DEBUG(log, "Table {} exists locally with AST {}, but missing in ZK", table_name, queryToString(local_ast_ptr));
}
}
if (auto * query_drop = query->as<ASTDropQuery>())
for (const auto & [table_name, table_metadata] : table_name_to_metadata_in_zk)
{
if (query_drop->kind == ASTDropQuery::Kind::Detach && query_context->getSettingsRef().database_replicated_always_detach_permanently)
query_drop->permanently = true;
if (query_drop->kind == ASTDropQuery::Kind::Detach && !query_drop->permanently)
throw Exception(ErrorCodes::INCORRECT_QUERY, "DETACH TABLE is not allowed for Replicated databases. "
"Use DETACH TABLE PERMANENTLY or SYSTEM RESTART REPLICA or set "
"database_replicated_always_detach_permanently to 1");
if (!checked_tables.contains(table_name))
{
auto zk_ast_ptr = parseQueryFromMetadataInZooKeeper(table_name, table_metadata);
if (zk_ast_ptr == nullptr)
LOG_DEBUG(log, "Table {} exists in ZK with AST {}, but missing locally", table_name, queryToString(zk_ast_ptr));
else
LOG_DEBUG(log, "Table {} exists in ZK, but missing locally", table_name);
}
}
}
@ -839,6 +849,107 @@ void DatabaseReplicated::checkTableEngine(const ASTCreateQuery & query, ASTStora
"to distinguish different shards and replicas");
}
bool DatabaseReplicated::checkDigestValid(const ContextPtr & local_context, bool debug_check /* = true */) const
{
if (debug_check)
{
/// Reduce number of debug checks
if (thread_local_rng() % 16)
return true;
}
LOG_TEST(log, "Current in-memory metadata digest: {}", tables_metadata_digest);
/// Database is probably being dropped
if (!local_context->getZooKeeperMetadataTransaction() && (!ddl_worker || !ddl_worker->isCurrentlyActive()))
return true;
UInt64 local_digest = 0;
{
std::lock_guard lock{mutex};
for (const auto & table : TSA_SUPPRESS_WARNING_FOR_READ(tables))
local_digest += getMetadataHash(table.first);
}
if (local_digest != tables_metadata_digest)
{
LOG_ERROR(log, "Digest of local metadata ({}) is not equal to in-memory digest ({})", local_digest, tables_metadata_digest);
#ifndef NDEBUG
dumpLocalTablesForDebugOnly(local_context);
dumpTablesInZooKeeperForDebugOnly();
tryCompareLocalAndZooKeeperTablesAndDumpDiffForDebugOnly(local_context);
#endif
return false;
}
/// Do not check digest in Keeper after internal subquery, it's probably not committed yet
if (local_context->isInternalSubquery())
return true;
/// It does not make sense to check the digest in Keeper while the replica is recovering
if (is_recovering)
return true;
String zk_digest = getZooKeeper()->get(replica_path + "/digest");
String local_digest_str = toString(local_digest);
if (zk_digest != local_digest_str)
{
LOG_ERROR(log, "Digest of local metadata ({}) is not equal to digest in Keeper ({})", local_digest_str, zk_digest);
#ifndef NDEBUG
dumpLocalTablesForDebugOnly(local_context);
dumpTablesInZooKeeperForDebugOnly();
tryCompareLocalAndZooKeeperTablesAndDumpDiffForDebugOnly(local_context);
#endif
return false;
}
return true;
}
void DatabaseReplicated::checkQueryValid(const ASTPtr & query, ContextPtr query_context) const
{
/// Replicas will set correct name of current database in query context (database name can be different on replicas)
if (auto * ddl_query = dynamic_cast<ASTQueryWithTableAndOutput *>(query.get()))
{
if (ddl_query->getDatabase() != getDatabaseName())
throw Exception(ErrorCodes::UNKNOWN_DATABASE, "Database was renamed");
ddl_query->database.reset();
if (auto * create = query->as<ASTCreateQuery>())
{
if (create->storage)
checkTableEngine(*create, *create->storage, query_context);
if (create->targets)
{
for (const auto & inner_table_engine : create->targets->getInnerEngines())
checkTableEngine(*create, *inner_table_engine, query_context);
}
}
}
if (const auto * query_alter = query->as<ASTAlterQuery>())
{
for (const auto & command : query_alter->command_list->children)
{
if (!isSupportedAlterTypeForOnClusterDDLQuery(command->as<ASTAlterCommand&>().type))
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Unsupported type of ALTER query");
}
}
if (auto * query_drop = query->as<ASTDropQuery>())
{
if (query_drop->kind == ASTDropQuery::Kind::Detach && query_context->getSettingsRef().database_replicated_always_detach_permanently)
query_drop->permanently = true;
if (query_drop->kind == ASTDropQuery::Kind::Detach && !query_drop->permanently)
throw Exception(ErrorCodes::INCORRECT_QUERY, "DETACH TABLE is not allowed for Replicated databases. "
"Use DETACH TABLE PERMANENTLY or SYSTEM RESTART REPLICA or set "
"database_replicated_always_detach_permanently to 1");
}
}
BlockIO DatabaseReplicated::tryEnqueueReplicatedDDL(const ASTPtr & query, ContextPtr query_context, QueryFlags flags)
{
waitDatabaseStarted();
@ -1253,7 +1364,7 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep
current_zookeeper->set(replica_path + "/digest", toString(tables_metadata_digest));
}
std::map<String, String> DatabaseReplicated::tryGetConsistentMetadataSnapshot(const ZooKeeperPtr & zookeeper, UInt32 & max_log_ptr)
std::map<String, String> DatabaseReplicated::tryGetConsistentMetadataSnapshot(const ZooKeeperPtr & zookeeper, UInt32 & max_log_ptr) const
{
return getConsistentMetadataSnapshotImpl(zookeeper, {}, /* max_retries= */ 10, max_log_ptr);
}
@ -1314,7 +1425,7 @@ std::map<String, String> DatabaseReplicated::getConsistentMetadataSnapshotImpl(
return table_name_to_metadata;
}
ASTPtr DatabaseReplicated::parseQueryFromMetadataInZooKeeper(const String & node_name, const String & query)
ASTPtr DatabaseReplicated::parseQueryFromMetadataInZooKeeper(const String & node_name, const String & query) const
{
ParserCreateQuery parser;
String description = "in ZooKeeper " + zookeeper_path + "/metadata/" + node_name;
@ -1411,6 +1522,7 @@ void DatabaseReplicated::renameDatabase(ContextPtr query_context, const String &
void DatabaseReplicated::stopReplication()
{
std::lock_guard lock{ddl_worker_mutex};
if (ddl_worker)
ddl_worker->shutdown();
}
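
The new `ddl_worker_mutex` makes creation and observation of the `ddl_worker` pointer race-free: startup publishes it under the lock, and `waitForReplicaToProcessAllEntries` / `stopReplication` check it under the same lock. A simplified sketch of that locking discipline (class and method names below are invented, not the real `DatabaseReplicated` interface):

```cpp
#include <memory>
#include <mutex>

struct DDLWorker
{
    void startup() {}
    void shutdown() {}
    bool waitForAllEntries() { return true; }
};

class ReplicatedDatabaseSketch
{
    std::mutex ddl_worker_mutex;            // guards creation and observation of the pointer
    std::unique_ptr<DDLWorker> ddl_worker;  // created lazily during startup, never reset afterwards

public:
    void startupAsync()
    {
        {
            std::lock_guard lock{ddl_worker_mutex};
            ddl_worker = std::make_unique<DDLWorker>();
        }
        ddl_worker->startup();              // safe without the lock: the pointer is never reset
    }

    bool waitForReplica()
    {
        {
            std::lock_guard lock{ddl_worker_mutex};
            if (!ddl_worker)
                return false;               // startup has not created the worker yet
        }
        return ddl_worker->waitForAllEntries();
    }

    void stopReplication()
    {
        std::lock_guard lock{ddl_worker_mutex};
        if (ddl_worker)
            ddl_worker->shutdown();
    }
};

int main()
{
    ReplicatedDatabaseSketch db;
    db.startupAsync();
    return db.waitForReplica() ? 0 : 1;
}
```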

View File

@ -109,14 +109,15 @@ private:
void checkQueryValid(const ASTPtr & query, ContextPtr query_context) const;
void checkTableEngine(const ASTCreateQuery & query, ASTStorage & storage, ContextPtr query_context) const;
void recoverLostReplica(const ZooKeeperPtr & current_zookeeper, UInt32 our_log_ptr, UInt32 & max_log_ptr);
std::map<String, String> tryGetConsistentMetadataSnapshot(const ZooKeeperPtr & zookeeper, UInt32 & max_log_ptr);
std::map<String, String> tryGetConsistentMetadataSnapshot(const ZooKeeperPtr & zookeeper, UInt32 & max_log_ptr) const;
std::map<String, String> getConsistentMetadataSnapshotImpl(const ZooKeeperPtr & zookeeper, const FilterByNameFunction & filter_by_table_name,
size_t max_retries, UInt32 & max_log_ptr) const;
ASTPtr parseQueryFromMetadataInZooKeeper(const String & node_name, const String & query);
ASTPtr parseQueryFromMetadataInZooKeeper(const String & node_name, const String & query) const;
String readMetadataFile(const String & table_name) const;
ClusterPtr getClusterImpl(bool all_groups = false) const;
@ -132,6 +133,11 @@ private:
UInt64 getMetadataHash(const String & table_name) const;
bool checkDigestValid(const ContextPtr & local_context, bool debug_check = true) const TSA_REQUIRES(metadata_mutex);
/// For debug purposes only, don't use in production code
void dumpLocalTablesForDebugOnly(const ContextPtr & local_context) const;
void dumpTablesInZooKeeperForDebugOnly() const;
void tryCompareLocalAndZooKeeperTablesAndDumpDiffForDebugOnly(const ContextPtr & local_context) const;
void waitDatabaseStarted() const override;
void stopLoading() override;
@ -149,6 +155,7 @@ private:
std::atomic_bool is_recovering = false;
std::atomic_bool ddl_worker_initialized = false;
std::unique_ptr<DatabaseReplicatedDDLWorker> ddl_worker;
std::mutex ddl_worker_mutex;
UInt32 max_log_ptr_at_creation = 0;
/// Usually operation with metadata are single-threaded because of the way replication works,

View File

@ -289,8 +289,13 @@ StoragePtr DatabaseWithOwnTablesBase::detachTableUnlocked(const String & table_n
tables.erase(it);
table_storage->is_detached = true;
if (!table_storage->isSystemStorage() && database_name != DatabaseCatalog::SYSTEM_DATABASE)
if (!table_storage->isSystemStorage()
&& database_name != DatabaseCatalog::SYSTEM_DATABASE
&& database_name != DatabaseCatalog::TEMPORARY_DATABASE)
{
LOG_TEST(log, "Counting detached table {} to database {}", table_name, database_name);
CurrentMetrics::sub(getAttachedCounterForStorage(table_storage));
}
auto table_id = table_storage->getStorageID();
if (table_id.hasUUID())
@ -334,8 +339,13 @@ void DatabaseWithOwnTablesBase::attachTableUnlocked(const String & table_name, c
/// non-Atomic database the is_detached is set to true before RENAME.
table->is_detached = false;
if (!table->isSystemStorage() && table_id.database_name != DatabaseCatalog::SYSTEM_DATABASE)
if (!table->isSystemStorage()
&& database_name != DatabaseCatalog::SYSTEM_DATABASE
&& database_name != DatabaseCatalog::TEMPORARY_DATABASE)
{
LOG_TEST(log, "Counting attached table {} to database {}", table_name, database_name);
CurrentMetrics::add(getAttachedCounterForStorage(table));
}
}
void DatabaseWithOwnTablesBase::shutdown()
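
The attach/detach hunks above extend the exclusion list for the attached-table metric so tables of the temporary database are skipped on both sides; the add and sub paths must apply the same predicate or the counter drifts. A toy sketch of that symmetry, assuming a hypothetical global counter (the temporary database name used below is an assumption, not taken from this diff):

```cpp
#include <atomic>
#include <string>

// Hypothetical global counter standing in for the CurrentMetrics attached-table counter.
static std::atomic<long> attached_tables{0};

// The same predicate must be used on attach and detach, otherwise the metric drifts:
// system tables and tables of the temporary database are never counted.
static bool isCounted(const std::string & database, bool is_system_storage)
{
    return !is_system_storage
        && database != "system"
        && database != "_temporary_and_external_tables";
}

void onAttach(const std::string & database, bool is_system_storage)
{
    if (isCounted(database, is_system_storage))
        ++attached_tables;
}

void onDetach(const std::string & database, bool is_system_storage)
{
    if (isCounted(database, is_system_storage))
        --attached_tables;
}

int main()
{
    onAttach("default", false);
    onAttach("_temporary_and_external_tables", false); // not counted, matching the patched check
    onDetach("default", false);
    return attached_tables.load() == 0 ? 0 : 1;
}
```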

View File

@ -261,7 +261,7 @@ std::optional<size_t> ReadBufferFromAzureBlobStorage::tryGetFileSize()
if (!file_size)
file_size = blob_client->GetProperties().Value.BlobSize;
return *file_size;
return file_size;
}
size_t ReadBufferFromAzureBlobStorage::readBigAt(char * to, size_t n, size_t range_begin, const std::function<bool(size_t)> & /*progress_callback*/) const

View File

@ -874,7 +874,9 @@ void DiskObjectStorageTransaction::writeFileUsingBlobWritingFunction(
/// Create metadata (see create_metadata_callback in DiskObjectStorageTransaction::writeFile()).
if (mode == WriteMode::Rewrite)
{
if (!object_storage.isWriteOnce() && metadata_storage.exists(path))
/// Otherwise we would leave orphaned blobs that nothing references.
/// WriteOnce storages are not affected by this issue.
if (!object_storage.isPlain() && metadata_storage.exists(path))
object_storage.removeObjectsIfExist(metadata_storage.getStorageObjects(path));
metadata_transaction->createMetadataFile(path, std::move(object_key), object_size);

View File

@ -58,7 +58,8 @@ TemporaryFileOnDisk::~TemporaryFileOnDisk()
if (!disk->exists(relative_path))
{
LOG_WARNING(getLogger("TemporaryFileOnDisk"), "Temporary path '{}' does not exist in '{}'", relative_path, disk->getPath());
if (show_warning_if_removed)
LOG_WARNING(getLogger("TemporaryFileOnDisk"), "Temporary path '{}' does not exist in '{}'", relative_path, disk->getPath());
return;
}

View File

@ -27,12 +27,19 @@ public:
/// Return relative path (without disk)
const String & getRelativePath() const { return relative_path; }
/// Sets whether the destructor should show a warning if the temporary file has already been removed.
/// By default a warning is shown.
void setShowWarningIfRemoved(bool show_warning_if_removed_) { show_warning_if_removed = show_warning_if_removed_; }
private:
DiskPtr disk;
/// Relative path in disk to the temporary file or directory
String relative_path;
/// Whether the destructor should show a warning if the temporary file has already been removed.
bool show_warning_if_removed = true;
CurrentMetrics::Increment metric_increment;
/// Specified if we know what for file is used (sort/aggregate/join).

View File

@ -18,6 +18,7 @@ namespace ErrorCodes
extern const int ILLEGAL_COLUMN;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int BAD_ARGUMENTS;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
struct Base58Encode
@ -135,7 +136,7 @@ public:
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.size() != 1)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Wrong number of arguments for function {}: 1 expected.", getName());
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Wrong number of arguments for function {}: 1 expected.", getName());
if (!isString(arguments[0].type))
throw Exception(

View File

@ -15,6 +15,7 @@ namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ILLEGAL_COLUMN;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
}
class FunctionChar : public IFunction
@ -36,7 +37,7 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
throw Exception(ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION,
"Number of arguments for function {} can't be {}, should be at least 1",
getName(), arguments.size());

View File

@ -59,19 +59,19 @@ public:
bool useDefaultImplementationForConstants() const override { return true; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
const ColumnPtr column = arguments[0].column;
if (const ColumnString * col = checkAndGetColumn<ColumnString>(column.get()))
{
auto col_res = ColumnString::create();
Impl::vector(col->getChars(), col->getOffsets(), col_res->getChars(), col_res->getOffsets());
Impl::vector(col->getChars(), col->getOffsets(), col_res->getChars(), col_res->getOffsets(), input_rows_count);
return col_res;
}
else if (const ColumnFixedString * col_fixed = checkAndGetColumn<ColumnFixedString>(column.get()))
{
auto col_res = ColumnFixedString::create(col_fixed->getN());
Impl::vectorFixed(col_fixed->getChars(), col_fixed->getN(), col_res->getChars());
Impl::vectorFixed(col_fixed->getChars(), col_fixed->getN(), col_res->getChars(), input_rows_count);
return col_res;
}
else

View File

@ -47,85 +47,54 @@ bool allArgumentsAreConstants(const ColumnsWithTypeAndName & args)
return true;
}
/// Replaces single low cardinality column in a function call by its dictionary
/// This can only happen after the arguments have been adapted in IFunctionOverloadResolver::getReturnType
/// as it's only possible if there is one low cardinality column and, optionally, const columns
ColumnPtr replaceLowCardinalityColumnsByNestedAndGetDictionaryIndexes(
ColumnsWithTypeAndName & args, bool can_be_executed_on_default_arguments, size_t input_rows_count)
{
/// We return the LC indexes so the LC can be reconstructed with the function result
size_t num_rows = input_rows_count;
ColumnPtr indexes;
size_t number_low_cardinality_columns = 0;
size_t last_low_cardinality = 0;
size_t number_const_columns = 0;
size_t number_full_columns = 0;
for (size_t i = 0; i < args.size(); i++)
/// Find first LowCardinality column and replace it to nested dictionary.
for (auto & column : args)
{
auto const & arg = args[i];
if (checkAndGetColumn<ColumnLowCardinality>(arg.column.get()))
if (const auto * low_cardinality_column = checkAndGetColumn<ColumnLowCardinality>(column.column.get()))
{
number_low_cardinality_columns++;
last_low_cardinality = i;
/// Single LowCardinality column is supported now.
if (indexes)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected single dictionary argument for function.");
const auto * low_cardinality_type = checkAndGetDataType<DataTypeLowCardinality>(column.type.get());
if (!low_cardinality_type)
throw Exception(ErrorCodes::LOGICAL_ERROR,
"Incompatible type for LowCardinality column: {}",
column.type->getName());
if (can_be_executed_on_default_arguments)
{
/// Normal case, when function can be executed on values' default.
column.column = low_cardinality_column->getDictionary().getNestedColumn();
indexes = low_cardinality_column->getIndexesPtr();
}
else
{
/// Special case when default value can't be used. Example: 1 % LowCardinality(Int).
/// LowCardinality always contains default, so 1 % 0 will throw exception in normal case.
auto dict_encoded = low_cardinality_column->getMinimalDictionaryEncodedColumn(0, low_cardinality_column->size());
column.column = dict_encoded.dictionary;
indexes = dict_encoded.indexes;
}
num_rows = column.column->size();
column.type = low_cardinality_type->getDictionaryType();
}
else if (checkAndGetColumn<ColumnConst>(arg.column.get()))
number_const_columns++;
else
number_full_columns++;
}
if (!number_low_cardinality_columns && !number_const_columns)
return nullptr;
if (number_full_columns > 0 || number_low_cardinality_columns > 1)
{
/// This should not be possible but currently there are multiple tests in CI failing because of it
/// TODO: Fix those cases, then enable this exception
#if 0
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected low cardinality types found. Low cardinality: {}. Full {}. Const {}",
number_low_cardinality_columns, number_full_columns, number_const_columns);
#else
return nullptr;
#endif
}
else if (number_low_cardinality_columns == 1)
{
auto & lc_arg = args[last_low_cardinality];
const auto * low_cardinality_type = checkAndGetDataType<DataTypeLowCardinality>(lc_arg.type.get());
if (!low_cardinality_type)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Incompatible type for LowCardinality column: {}", lc_arg.type->getName());
const auto * low_cardinality_column = checkAndGetColumn<ColumnLowCardinality>(lc_arg.column.get());
chassert(low_cardinality_column);
if (can_be_executed_on_default_arguments)
{
/// Normal case, when function can be executed on values' default.
lc_arg.column = low_cardinality_column->getDictionary().getNestedColumn();
indexes = low_cardinality_column->getIndexesPtr();
}
else
{
/// Special case when default value can't be used. Example: 1 % LowCardinality(Int).
/// LowCardinality always contains default, so 1 % 0 will throw exception in normal case.
auto dict_encoded = low_cardinality_column->getMinimalDictionaryEncodedColumn(0, low_cardinality_column->size());
lc_arg.column = dict_encoded.dictionary;
indexes = dict_encoded.indexes;
}
/// The new column will have a different number of rows, normally less but occasionally it might be more (NULL)
input_rows_count = lc_arg.column->size();
lc_arg.type = low_cardinality_type->getDictionaryType();
}
/// Change size of constants
/// Change size of constants.
for (auto & column : args)
{
if (const auto * column_const = checkAndGetColumn<ColumnConst>(column.column.get()))
{
column.column = ColumnConst::create(recursiveRemoveLowCardinality(column_const->getDataColumnPtr()), input_rows_count);
column.column = ColumnConst::create(recursiveRemoveLowCardinality(column_const->getDataColumnPtr()), num_rows);
column.type = recursiveRemoveLowCardinality(column.type);
}
}
@ -301,8 +270,6 @@ ColumnPtr IExecutableFunction::executeWithoutSparseColumns(const ColumnsWithType
bool can_be_executed_on_default_arguments = canBeExecutedOnDefaultArguments();
const auto & dictionary_type = res_low_cardinality_type->getDictionaryType();
/// The arguments should have been adapted in IFunctionOverloadResolver::getReturnType
/// So there is only one low cardinality column (and optionally some const columns) and no full column
ColumnPtr indexes = replaceLowCardinalityColumnsByNestedAndGetDictionaryIndexes(
columns_without_low_cardinality, can_be_executed_on_default_arguments, input_rows_count);
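
This hunk restores the single-LowCardinality-column logic in `replaceLowCardinalityColumnsByNestedAndGetDictionaryIndexes`: the function is executed over the dictionary only, and the saved indexes rebuild the full-size result afterwards. A toy model of that idea with plain vectors (not the real `ColumnLowCardinality` API):

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Reduced model of a LowCardinality column: unique values plus per-row indexes into them.
struct LowCardinalityColumn
{
    std::vector<std::string> dictionary; // unique values; position 0 holds the default value
    std::vector<size_t> indexes;         // one entry per row
};

// Run a per-value function on the dictionary only, then expand the result by the indexes.
// The expensive function executes once per distinct value, not once per row.
template <typename F>
std::vector<std::string> executeOnLowCardinality(const LowCardinalityColumn & col, F && func)
{
    std::vector<std::string> dict_result;
    dict_result.reserve(col.dictionary.size());
    for (const auto & value : col.dictionary)
        dict_result.push_back(func(value));

    std::vector<std::string> full_result;
    full_result.reserve(col.indexes.size());
    for (size_t idx : col.indexes)
        full_result.push_back(dict_result[idx]);
    return full_result;
}

int main()
{
    LowCardinalityColumn col{{"", "moscow", "berlin"}, {1, 2, 1, 1, 2}};
    auto upper = executeOnLowCardinality(col, [](const std::string & s)
    {
        std::string r = s;
        for (char & c : r)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return r;
    });
    return upper.size() == 5 ? 0 : 1;
}
```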

View File

@ -8,17 +8,19 @@ namespace DB
template <char not_case_lower_bound, char not_case_upper_bound>
struct LowerUpperImpl
{
static void vector(const ColumnString::Chars & data,
static void vector(
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t /*input_rows_count*/)
{
res_data.resize_exact(data.size());
res_offsets.assign(offsets);
array(data.data(), data.data() + data.size(), res_data.data());
}
static void vectorFixed(const ColumnString::Chars & data, size_t /*n*/, ColumnString::Chars & res_data)
static void vectorFixed(const ColumnString::Chars & data, size_t /*n*/, ColumnString::Chars & res_data, size_t /*input_rows_count*/)
{
res_data.resize_exact(data.size());
array(data.data(), data.data() + data.size(), res_data.data());
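
This and the following hunks thread `input_rows_count` through every string `Impl::vector` / `vectorFixed`, so the implementations stop re-deriving the row count from `offsets.size()`. A self-contained sketch of the updated interface shape, using simplified stand-ins for `ColumnString::Chars` and `Offsets` (the `ReverseImpl` below is illustrative, not one of the patched functions):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for ColumnString storage: a flat byte buffer plus end offsets per row,
// where every row is terminated by a trailing zero byte.
using Chars = std::vector<uint8_t>;
using Offsets = std::vector<size_t>;

struct ReverseImpl
{
    // New-style interface: the number of rows is passed in explicitly.
    static void vector(const Chars & data, const Offsets & offsets,
                       Chars & res_data, Offsets & res_offsets,
                       size_t input_rows_count)
    {
        res_data.resize(data.size());
        res_offsets.resize(input_rows_count);

        size_t prev_offset = 0;
        for (size_t i = 0; i < input_rows_count; ++i)
        {
            size_t length = offsets[i] - prev_offset - 1; // exclude the terminating zero
            for (size_t j = 0; j < length; ++j)
                res_data[prev_offset + j] = data[offsets[i] - 2 - j]; // copy bytes reversed
            res_data[offsets[i] - 1] = 0;                             // keep the terminator
            res_offsets[i] = offsets[i];
            prev_offset = offsets[i];
        }
    }
};

int main()
{
    Chars data = {'a', 'b', 'c', 0, 'x', 'y', 0};
    Offsets offsets = {4, 7};
    Chars res_data;
    Offsets res_offsets;
    ReverseImpl::vector(data, offsets, res_data, res_offsets, offsets.size());
    return res_data[0] == 'c' ? 0 : 1; // result buffer holds "cba\0yx\0"
}
```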

View File

@ -90,7 +90,8 @@ struct LowerUpperUTF8Impl
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
if (data.empty())
return;
@ -98,7 +99,7 @@ struct LowerUpperUTF8Impl
bool all_ascii = isAllASCII(data.data(), data.size());
if (all_ascii)
{
LowerUpperImpl<not_case_lower_bound, not_case_upper_bound>::vector(data, offsets, res_data, res_offsets);
LowerUpperImpl<not_case_lower_bound, not_case_upper_bound>::vector(data, offsets, res_data, res_offsets, input_rows_count);
return;
}
@ -107,7 +108,7 @@ struct LowerUpperUTF8Impl
array(data.data(), data.data() + data.size(), offsets, res_data.data());
}
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Functions lowerUTF8 and upperUTF8 cannot work with FixedString argument");
}

View File

@ -62,12 +62,13 @@ using Pos = const char *;
template <typename Extractor>
struct ExtractSubstringImpl
{
static void vector(const ColumnString::Chars & data, const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data, ColumnString::Offsets & res_offsets)
static void vector(
const ColumnString::Chars & data, const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data, ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
size_t size = offsets.size();
res_offsets.resize(size);
res_data.reserve(size * Extractor::getReserveLengthForElement());
res_offsets.resize(input_rows_count);
res_data.reserve(input_rows_count * Extractor::getReserveLengthForElement());
size_t prev_offset = 0;
size_t res_offset = 0;
@ -76,7 +77,7 @@ struct ExtractSubstringImpl
Pos start;
size_t length;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
Extractor::execute(reinterpret_cast<const char *>(&data[prev_offset]), offsets[i] - prev_offset - 1, start, length);
@ -99,7 +100,7 @@ struct ExtractSubstringImpl
res_data.assign(start, length);
}
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Column of type FixedString is not supported by this function");
}
@ -111,12 +112,13 @@ struct ExtractSubstringImpl
template <typename Extractor>
struct CutSubstringImpl
{
static void vector(const ColumnString::Chars & data, const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data, ColumnString::Offsets & res_offsets)
static void vector(
const ColumnString::Chars & data, const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data, ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
res_data.reserve(data.size());
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
size_t res_offset = 0;
@ -125,7 +127,7 @@ struct CutSubstringImpl
Pos start;
size_t length;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * current = reinterpret_cast<const char *>(&data[prev_offset]);
Extractor::execute(current, offsets[i] - prev_offset - 1, start, length);
@ -154,7 +156,7 @@ struct CutSubstringImpl
res_data.append(start + length, data.data() + data.size());
}
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Column of type FixedString is not supported by this function");
}

View File

@ -1,8 +1,8 @@
#pragma once
#include <base/find_symbols.h>
#include "domain.h"
#include "tldLookup.h"
#include <Functions/URL/domain.h>
#include <Functions/URL/tldLookup.h>
#include <Common/TLDListsHolder.h> /// TLDType
namespace DB

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include "fragment.h"
#include <Functions/FunctionStringToString.h>
#include <Functions/URL/fragment.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include "queryString.h"
#include <Functions/FunctionStringToString.h>
#include <Functions/URL/queryString.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include "queryStringAndFragment.h"
#include <Functions/FunctionStringToString.h>
#include <Functions/URL/queryStringAndFragment.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "ExtractFirstSignificantSubdomain.h"
#include <Functions/URL/ExtractFirstSignificantSubdomain.h>
namespace DB

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include "ExtractFirstSignificantSubdomain.h"
#include "FirstSignificantSubdomainCustomImpl.h"
#include <Functions/URL/ExtractFirstSignificantSubdomain.h>
#include <Functions/URL/FirstSignificantSubdomainCustomImpl.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "protocol.h"
#include <Functions/URL/protocol.h>
#include <base/find_symbols.h>

View File

@ -1,7 +1,7 @@
#include <base/hex.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include <base/find_symbols.h>
#include <base/hex.h>
namespace DB
@ -121,8 +121,10 @@ enum URLCodeStrategy
template <URLCodeStrategy code_strategy, bool space_as_plus>
struct CodeURLComponentImpl
{
static void vector(const ColumnString::Chars & data, const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data, ColumnString::Offsets & res_offsets)
static void vector(
const ColumnString::Chars & data, const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data, ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
if (code_strategy == encode)
{
@ -134,13 +136,12 @@ struct CodeURLComponentImpl
res_data.resize(data.size());
}
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
size_t res_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * src_data = reinterpret_cast<const char *>(&data[prev_offset]);
size_t src_size = offsets[i] - prev_offset;
@ -165,7 +166,7 @@ struct CodeURLComponentImpl
res_data.resize(res_offset);
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Column of type FixedString is not supported by URL functions");
}

View File

@ -1,5 +1,4 @@
#include "domain.h"
#include <Functions/URL/domain.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>

View File

@ -1,9 +1,10 @@
#pragma once
#include "protocol.h"
#include <base/find_symbols.h>
#include <cstring>
#include <Common/StringUtils.h>
#include <Functions/URL/protocol.h>
#include <base/find_symbols.h>
#include <cstring>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "domain.h"
#include <Functions/URL/domain.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "ExtractFirstSignificantSubdomain.h"
#include <Functions/URL/ExtractFirstSignificantSubdomain.h>
namespace DB

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include "ExtractFirstSignificantSubdomain.h"
#include "FirstSignificantSubdomainCustomImpl.h"
#include <Functions/URL/ExtractFirstSignificantSubdomain.h>
#include <Functions/URL/FirstSignificantSubdomainCustomImpl.h>
namespace DB

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "fragment.h"
#include <Functions/URL/fragment.h>
namespace DB
{

View File

@ -1,7 +1,7 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include <Functions/StringHelpers.h>
#include "path.h"
#include <Functions/URL/path.h>
#include <base/find_symbols.h>

View File

@ -1,7 +1,7 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include <Functions/StringHelpers.h>
#include "path.h"
#include <Functions/URL/path.h>
#include <base/find_symbols.h>
namespace DB

View File

@ -5,7 +5,7 @@
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnConst.h>
#include "domain.h"
#include <Functions/URL/domain.h>
namespace DB

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "protocol.h"
#include <Functions/URL/protocol.h>
namespace DB

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "queryString.h"
#include <Functions/URL/queryString.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "queryStringAndFragment.h"
#include <Functions/URL/queryStringAndFragment.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
#include "domain.h"
#include <Functions/URL/domain.h>
namespace DB
{

View File

@ -28,20 +28,20 @@ namespace
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
/// The result is never larger than the source,
/// because entities decode to shorter byte sequences.
/// Example: &#xx... decodes to a UTF-8 byte sequence no longer than 4 bytes.
res_data.resize(data.size());
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
size_t res_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * src_data = reinterpret_cast<const char *>(&data[prev_offset]);
size_t src_size = offsets[i] - prev_offset;
@ -55,7 +55,7 @@ namespace
res_data.resize(res_offset);
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Function decodeHTMLComponent cannot work with FixedString argument");
}
@ -64,7 +64,6 @@ namespace
static const int max_legal_unicode_value = 0x10FFFF;
static const int max_decimal_length_of_unicode_point = 7; /// 1114111
static size_t execute(const char * src, size_t src_size, char * dst)
{
const char * src_pos = src;

View File

@ -27,20 +27,20 @@ namespace
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
/// The result is never larger than the source,
/// because entities decode to shorter byte sequences.
/// Example: &#xx... decodes to a UTF-8 byte sequence no longer than 4 bytes.
res_data.resize(data.size());
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
size_t res_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * src_data = reinterpret_cast<const char *>(&data[prev_offset]);
size_t src_size = offsets[i] - prev_offset;
@ -54,7 +54,7 @@ namespace
res_data.resize(res_offset);
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Function decodeXMLComponent cannot work with FixedString argument");
}

View File

@ -25,17 +25,17 @@ namespace
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
/// 6 is the maximum size amplification (the maximum length of encoded entity: &quot;)
res_data.resize(data.size() * 6);
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
size_t res_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * src_data = reinterpret_cast<const char *>(&data[prev_offset]);
size_t src_size = offsets[i] - prev_offset;
@ -49,7 +49,7 @@ namespace
res_data.resize(res_offset);
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Function encodeXML cannot work with FixedString argument");
}

View File

@ -11,8 +11,8 @@ namespace DB
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int TOO_MANY_ARGUMENTS_FOR_FUNCTION;
}
@ -87,7 +87,7 @@ public:
return col_res;
}
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
throw Exception(ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION,
"Illegal number of UInt arguments of function {}: should be not more than 2 dimensions",
getName());
}

View File

@ -44,15 +44,15 @@ struct IdnaEncode
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
const size_t rows = offsets.size();
res_data.reserve(data.size()); /// just a guess, assuming the input is all-ASCII
res_offsets.reserve(rows);
res_offsets.reserve(input_rows_count);
size_t prev_offset = 0;
std::string ascii;
for (size_t row = 0; row < rows; ++row)
for (size_t row = 0; row < input_rows_count; ++row)
{
const char * value = reinterpret_cast<const char *>(&data[prev_offset]);
const size_t value_length = offsets[row] - prev_offset - 1;
@ -85,7 +85,7 @@ struct IdnaEncode
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Arguments of type FixedString are not allowed");
}
@ -99,15 +99,15 @@ struct IdnaDecode
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
const size_t rows = offsets.size();
res_data.reserve(data.size()); /// just a guess, assuming the input is all-ASCII
res_offsets.reserve(rows);
res_offsets.reserve(input_rows_count);
size_t prev_offset = 0;
std::string unicode;
for (size_t row = 0; row < rows; ++row)
for (size_t row = 0; row < input_rows_count; ++row)
{
const char * ascii = reinterpret_cast<const char *>(&data[prev_offset]);
const size_t ascii_length = offsets[row] - prev_offset - 1;
@ -124,7 +124,7 @@ struct IdnaDecode
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Arguments of type FixedString are not allowed");
}

View File

@ -9,10 +9,12 @@ namespace
struct InitcapImpl
{
static void vector(const ColumnString::Chars & data,
static void vector(
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t /*input_rows_count*/)
{
if (data.empty())
return;
@ -21,7 +23,7 @@ struct InitcapImpl
array(data.data(), data.data() + data.size(), res_data.data());
}
static void vectorFixed(const ColumnString::Chars & data, size_t /*n*/, ColumnString::Chars & res_data)
static void vectorFixed(const ColumnString::Chars & data, size_t /*n*/, ColumnString::Chars & res_data, size_t)
{
res_data.resize(data.size());
array(data.data(), data.data() + data.size(), res_data.data());

View File

@ -22,7 +22,8 @@ struct InitcapUTF8Impl
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t /*input_rows_count*/)
{
if (data.empty())
return;
@ -31,7 +32,7 @@ struct InitcapUTF8Impl
array(data.data(), data.data() + data.size(), offsets, res_data.data());
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Function initcapUTF8 cannot work with FixedString argument");
}

View File

@ -16,8 +16,8 @@ namespace DB
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int TOO_MANY_ARGUMENTS_FOR_FUNCTION;
}
#define EXTRACT_VECTOR(INDEX) \
@ -130,7 +130,7 @@ namespace ErrorCodes
MASK(8, 6, col6->getUInt(i)), \
MASK(8, 7, col7->getUInt(i))) \
\
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, \
throw Exception(ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION, \
"Illegal number of UInt arguments of function {}, max: 8", \
getName()); \

View File

@ -19,17 +19,19 @@ template <bool keep_names>
struct Impl
{
static constexpr auto name = keep_names ? "normalizeQueryKeepNames" : "normalizeQuery";
static void vector(const ColumnString::Chars & data,
static void vector(
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
res_data.reserve(data.size());
ColumnString::Offset prev_src_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
ColumnString::Offset curr_src_offset = offsets[i];
@ -43,7 +45,7 @@ struct Impl
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot apply function normalizeQuery to fixed string.");
}

View File

@ -84,7 +84,8 @@ struct NormalizeUTF8Impl
static void vector(const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
UErrorCode err = U_ZERO_ERROR;
@ -92,8 +93,7 @@ struct NormalizeUTF8Impl
if (U_FAILURE(err))
throw Exception(ErrorCodes::CANNOT_NORMALIZE_STRING, "Normalization failed (getNormalizer): {}", u_errorName(err));
size_t size = offsets.size();
res_offsets.resize(size);
res_offsets.resize(input_rows_count);
res_data.reserve(data.size() * 2);
@ -103,7 +103,7 @@ struct NormalizeUTF8Impl
PODArray<UChar> from_uchars;
PODArray<UChar> to_uchars;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
size_t from_size = offsets[i] - current_from_offset - 1;
@ -157,7 +157,7 @@ struct NormalizeUTF8Impl
res_data.resize(current_to_offset);
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot apply function normalizeUTF8 to fixed string.");
}

View File

@ -27,13 +27,13 @@ struct Impl
static void vector(
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
PaddedPODArray<UInt64> & res_data)
PaddedPODArray<UInt64> & res_data,
size_t input_rows_count)
{
size_t size = offsets.size();
res_data.resize(size);
res_data.resize(input_rows_count);
ColumnString::Offset prev_src_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
ColumnString::Offset curr_src_offset = offsets[i];
res_data[i] = normalizedQueryHash(
@ -77,15 +77,15 @@ public:
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
const ColumnPtr column = arguments[0].column;
if (const ColumnString * col = checkAndGetColumn<ColumnString>(column.get()))
{
auto col_res = ColumnUInt64::create();
typename ColumnUInt64::Container & vec_res = col_res->getData();
vec_res.resize(col->size());
Impl<keep_names>::vector(col->getChars(), col->getOffsets(), vec_res);
vec_res.resize(input_rows_count);
Impl<keep_names>::vector(col->getChars(), col->getOffsets(), vec_res, input_rows_count);
return col_res;
}
else

View File

@ -91,8 +91,6 @@ private:
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
const auto size = input_rows_count;
/// Prepare array of ellipses.
size_t ellipses_count = (arguments.size() - 2) / 4;
std::vector<Ellipse> ellipses(ellipses_count);
@ -141,13 +139,11 @@ private:
auto dst = ColumnVector<UInt8>::create();
auto & dst_data = dst->getData();
dst_data.resize(size);
dst_data.resize(input_rows_count);
size_t start_index = 0;
for (const auto row : collections::range(0, size))
{
for (size_t row = 0; row < input_rows_count; ++row)
dst_data[row] = isPointInEllipses(col_vec_x->getData()[row], col_vec_y->getData()[row], ellipses.data(), ellipses_count, start_index);
}
return dst;
}
@ -157,7 +153,7 @@ private:
const auto * col_const_y = assert_cast<const ColumnConst *> (col_y);
size_t start_index = 0;
UInt8 res = isPointInEllipses(col_const_x->getValue<Float64>(), col_const_y->getValue<Float64>(), ellipses.data(), ellipses_count, start_index);
return DataTypeUInt8().createColumnConst(size, res);
return DataTypeUInt8().createColumnConst(input_rows_count, res);
}
else
{

View File

@ -6,11 +6,11 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionStringToString.h>
# pragma clang diagnostic push
# pragma clang diagnostic ignored "-Wnewline-eof"
# include <ada/idna/punycode.h>
# include <ada/idna/unicode_transcoding.h>
# pragma clang diagnostic pop
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wnewline-eof"
#include <ada/idna/punycode.h>
#include <ada/idna/unicode_transcoding.h>
#pragma clang diagnostic pop
namespace DB
{
@ -38,16 +38,16 @@ struct PunycodeEncode
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
const size_t rows = offsets.size();
res_data.reserve(data.size()); /// just a guess, assuming the input is all-ASCII
res_offsets.reserve(rows);
res_offsets.reserve(input_rows_count);
size_t prev_offset = 0;
std::u32string value_utf32;
std::string value_puny;
for (size_t row = 0; row < rows; ++row)
for (size_t row = 0; row < input_rows_count; ++row)
{
const char * value = reinterpret_cast<const char *>(&data[prev_offset]);
const size_t value_length = offsets[row] - prev_offset - 1;
@ -72,7 +72,7 @@ struct PunycodeEncode
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Arguments of type FixedString are not allowed");
}
@ -86,16 +86,16 @@ struct PunycodeDecode
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
const size_t rows = offsets.size();
res_data.reserve(data.size()); /// just a guess, assuming the input is all-ASCII
res_offsets.reserve(rows);
res_offsets.reserve(input_rows_count);
size_t prev_offset = 0;
std::u32string value_utf32;
std::string value_utf8;
for (size_t row = 0; row < rows; ++row)
for (size_t row = 0; row < input_rows_count; ++row)
{
const char * value = reinterpret_cast<const char *>(&data[prev_offset]);
const size_t value_length = offsets[row] - prev_offset - 1;
@ -129,7 +129,7 @@ struct PunycodeDecode
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Arguments of type FixedString are not allowed");
}

View File

@ -24,6 +24,7 @@ namespace ErrorCodes
extern const int ILLEGAL_COLUMN;
extern const int BAD_ARGUMENTS;
extern const int LOGICAL_ERROR;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
namespace
@ -246,7 +247,7 @@ public:
{
auto desired = Distribution::getNumberOfArguments();
if (arguments.size() != desired && arguments.size() != desired + 1)
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Wrong number of arguments for function {}. Should be {} or {}",
getName(), desired, desired + 1);

View File

@ -55,19 +55,19 @@ public:
bool useDefaultImplementationForConstants() const override { return true; }
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t) const override
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
const ColumnPtr column = arguments[0].column;
if (const ColumnString * col = checkAndGetColumn<ColumnString>(column.get()))
{
auto col_res = ColumnString::create();
ReverseImpl::vector(col->getChars(), col->getOffsets(), col_res->getChars(), col_res->getOffsets());
ReverseImpl::vector(col->getChars(), col->getOffsets(), col_res->getChars(), col_res->getOffsets(), input_rows_count);
return col_res;
}
else if (const ColumnFixedString * col_fixed = checkAndGetColumn<ColumnFixedString>(column.get()))
{
auto col_res = ColumnFixedString::create(col_fixed->getN());
ReverseImpl::vectorFixed(col_fixed->getChars(), col_fixed->getN(), col_res->getChars());
ReverseImpl::vectorFixed(col_fixed->getChars(), col_fixed->getN(), col_res->getChars(), input_rows_count);
return col_res;
}
else

View File

@ -9,17 +9,18 @@ namespace DB
*/
struct ReverseImpl
{
static void vector(const ColumnString::Chars & data,
static void vector(
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
res_data.resize_exact(data.size());
res_offsets.assign(offsets);
size_t size = offsets.size();
ColumnString::Offset prev_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
for (size_t j = prev_offset; j < offsets[i] - 1; ++j)
res_data[j] = data[offsets[i] + prev_offset - 2 - j];
@ -28,12 +29,15 @@ struct ReverseImpl
}
}
static void vectorFixed(const ColumnString::Chars & data, size_t n, ColumnString::Chars & res_data)
static void vectorFixed(
const ColumnString::Chars & data,
size_t n,
ColumnString::Chars & res_data,
size_t input_rows_count)
{
res_data.resize_exact(data.size());
size_t size = data.size() / n;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
for (size_t j = i * n; j < (i + 1) * n; ++j)
res_data[j] = data[(i * 2 + 1) * n - j - 1];
}

View File

@ -23,25 +23,25 @@ namespace
*/
struct ReverseUTF8Impl
{
static void vector(const ColumnString::Chars & data,
static void vector(
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
bool all_ascii = isAllASCII(data.data(), data.size());
if (all_ascii)
{
ReverseImpl::vector(data, offsets, res_data, res_offsets);
ReverseImpl::vector(data, offsets, res_data, res_offsets, input_rows_count);
return;
}
res_data.resize(data.size());
res_offsets.assign(offsets);
size_t size = offsets.size();
ColumnString::Offset prev_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
ColumnString::Offset j = prev_offset;
while (j < offsets[i] - 1)
@ -73,7 +73,7 @@ struct ReverseUTF8Impl
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot apply function reverseUTF8 to fixed string.");
}

View File

@ -79,14 +79,14 @@ struct SoundexImpl
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
const size_t size = offsets.size();
res_data.resize(size * (length + 1));
res_offsets.resize(size);
res_data.resize(input_rows_count * (length + 1));
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * value = reinterpret_cast<const char *>(&data[prev_offset]);
const size_t value_length = offsets[i] - prev_offset - 1;
@ -98,7 +98,7 @@ struct SoundexImpl
}
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Column of type FixedString is not supported by soundex function");
}

View File

@ -128,16 +128,16 @@ struct ToValidUTF8Impl
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
const size_t offsets_size = offsets.size();
/// It can be larger than that, but we believe it is unlikely to happen.
res_data.resize(data.size());
res_offsets.resize(offsets_size);
res_offsets.resize(input_rows_count);
size_t prev_offset = 0;
WriteBufferFromVector<ColumnString::Chars> write_buffer(res_data);
for (size_t i = 0; i < offsets_size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
const char * haystack_data = reinterpret_cast<const char *>(&data[prev_offset]);
const size_t haystack_size = offsets[i] - prev_offset - 1;
@ -149,7 +149,7 @@ struct ToValidUTF8Impl
write_buffer.finalize();
}
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
[[noreturn]] static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Column of type FixedString is not supported by toValidUTF8 function");
}

View File

@ -43,10 +43,10 @@ public:
const ColumnString::Chars & data,
const ColumnString::Offsets & offsets,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets)
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
size_t size = offsets.size();
res_offsets.resize_exact(size);
res_offsets.resize_exact(input_rows_count);
res_data.reserve_exact(data.size());
size_t prev_offset = 0;
@ -55,7 +55,7 @@ public:
const UInt8 * start;
size_t length;
for (size_t i = 0; i < size; ++i)
for (size_t i = 0; i < input_rows_count; ++i)
{
execute(reinterpret_cast<const UInt8 *>(&data[prev_offset]), offsets[i] - prev_offset - 1, start, length);
@ -69,7 +69,7 @@ public:
}
}
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &)
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Functions trimLeft, trimRight and trimBoth cannot work with FixedString argument");
}

View File

@ -1,77 +1,77 @@
clickhouse_add_executable (read_buffer read_buffer.cpp)
target_link_libraries (read_buffer PRIVATE clickhouse_common_io)
target_link_libraries (read_buffer PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (read_buffer_perf read_buffer_perf.cpp)
target_link_libraries (read_buffer_perf PRIVATE clickhouse_common_io)
target_link_libraries (read_buffer_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (read_float_perf read_float_perf.cpp)
target_link_libraries (read_float_perf PRIVATE clickhouse_common_io)
target_link_libraries (read_float_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (write_buffer write_buffer.cpp)
target_link_libraries (write_buffer PRIVATE clickhouse_common_io)
target_link_libraries (write_buffer PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (write_buffer_perf write_buffer_perf.cpp)
target_link_libraries (write_buffer_perf PRIVATE clickhouse_common_io)
target_link_libraries (write_buffer_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (valid_utf8_perf valid_utf8_perf.cpp)
target_link_libraries (valid_utf8_perf PRIVATE clickhouse_common_io)
target_link_libraries (valid_utf8_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (valid_utf8 valid_utf8.cpp)
target_link_libraries (valid_utf8 PRIVATE clickhouse_common_io)
target_link_libraries (valid_utf8 PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (var_uint var_uint.cpp)
target_link_libraries (var_uint PRIVATE clickhouse_common_io)
target_link_libraries (var_uint PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (read_escaped_string read_escaped_string.cpp)
target_link_libraries (read_escaped_string PRIVATE clickhouse_common_io)
target_link_libraries (read_escaped_string PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (parse_int_perf parse_int_perf.cpp)
target_link_libraries (parse_int_perf PRIVATE clickhouse_common_io)
target_link_libraries (parse_int_perf PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (parse_int_perf2 parse_int_perf2.cpp)
target_link_libraries (parse_int_perf2 PRIVATE clickhouse_common_io)
target_link_libraries (parse_int_perf2 PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (read_write_int read_write_int.cpp)
target_link_libraries (read_write_int PRIVATE clickhouse_common_io)
target_link_libraries (read_write_int PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (o_direct_and_dirty_pages o_direct_and_dirty_pages.cpp)
target_link_libraries (o_direct_and_dirty_pages PRIVATE clickhouse_common_io)
target_link_libraries (o_direct_and_dirty_pages PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (io_operators io_operators.cpp)
target_link_libraries (io_operators PRIVATE clickhouse_common_io)
target_link_libraries (io_operators PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (write_int write_int.cpp)
target_link_libraries (write_int PRIVATE clickhouse_common_io)
target_link_libraries (write_int PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (zlib_buffers zlib_buffers.cpp)
target_link_libraries (zlib_buffers PRIVATE clickhouse_common_io)
target_link_libraries (zlib_buffers PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (lzma_buffers lzma_buffers.cpp)
target_link_libraries (lzma_buffers PRIVATE clickhouse_common_io)
target_link_libraries (lzma_buffers PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (limit_read_buffer limit_read_buffer.cpp)
target_link_libraries (limit_read_buffer PRIVATE clickhouse_common_io)
target_link_libraries (limit_read_buffer PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (limit_read_buffer2 limit_read_buffer2.cpp)
target_link_libraries (limit_read_buffer2 PRIVATE clickhouse_common_io)
target_link_libraries (limit_read_buffer2 PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (parse_date_time_best_effort parse_date_time_best_effort.cpp)
target_link_libraries (parse_date_time_best_effort PRIVATE clickhouse_common_io)
target_link_libraries (parse_date_time_best_effort PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (zlib_ng_bug zlib_ng_bug.cpp)
target_link_libraries (zlib_ng_bug PRIVATE ch_contrib::zlib clickhouse_common_io)
target_link_libraries (zlib_ng_bug PRIVATE ch_contrib::zlib clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (dragonbox_test dragonbox_test.cpp)
target_link_libraries (dragonbox_test PRIVATE ch_contrib::dragonbox_to_chars clickhouse_common_io)
target_link_libraries (dragonbox_test PRIVATE ch_contrib::dragonbox_to_chars clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (zstd_buffers zstd_buffers.cpp)
target_link_libraries (zstd_buffers PRIVATE clickhouse_common_io)
target_link_libraries (zstd_buffers PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (snappy_read_buffer snappy_read_buffer.cpp)
target_link_libraries (snappy_read_buffer PRIVATE clickhouse_common_io)
target_link_libraries (snappy_read_buffer PRIVATE clickhouse_common_io clickhouse_common_config)
clickhouse_add_executable (hadoop_snappy_read_buffer hadoop_snappy_read_buffer.cpp)
target_link_libraries (hadoop_snappy_read_buffer PRIVATE clickhouse_common_io)
target_link_libraries (hadoop_snappy_read_buffer PRIVATE clickhouse_common_io clickhouse_common_config)
if (TARGET ch_contrib::hdfs)
clickhouse_add_executable (read_buffer_from_hdfs read_buffer_from_hdfs.cpp)

View File

@ -2,34 +2,34 @@ clickhouse_add_executable (hash_map hash_map.cpp)
target_link_libraries (hash_map PRIVATE dbms clickhouse_functions ch_contrib::sparsehash)
clickhouse_add_executable (hash_map_lookup hash_map_lookup.cpp)
target_link_libraries (hash_map_lookup PRIVATE clickhouse_common_io clickhouse_compression)
target_link_libraries (hash_map_lookup PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression)
clickhouse_add_executable (hash_map3 hash_map3.cpp)
target_link_libraries (hash_map3 PRIVATE clickhouse_common_io clickhouse_compression ch_contrib::farmhash ch_contrib::metrohash)
target_link_libraries (hash_map3 PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression ch_contrib::farmhash ch_contrib::metrohash)
clickhouse_add_executable (hash_map_string hash_map_string.cpp)
target_link_libraries (hash_map_string PRIVATE clickhouse_common_io clickhouse_compression ch_contrib::sparsehash)
target_link_libraries (hash_map_string PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression ch_contrib::sparsehash)
clickhouse_add_executable (hash_map_string_2 hash_map_string_2.cpp)
target_link_libraries (hash_map_string_2 PRIVATE clickhouse_common_io clickhouse_compression)
target_link_libraries (hash_map_string_2 PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression)
clickhouse_add_executable (hash_map_string_3 hash_map_string_3.cpp)
target_link_libraries (hash_map_string_3 PRIVATE clickhouse_common_io clickhouse_compression ch_contrib::farmhash ch_contrib::metrohash)
target_link_libraries (hash_map_string_3 PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression ch_contrib::farmhash ch_contrib::metrohash)
clickhouse_add_executable (hash_map_string_small hash_map_string_small.cpp)
target_link_libraries (hash_map_string_small PRIVATE clickhouse_common_io clickhouse_compression ch_contrib::sparsehash)
target_link_libraries (hash_map_string_small PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression ch_contrib::sparsehash)
clickhouse_add_executable (string_hash_map string_hash_map.cpp)
target_link_libraries (string_hash_map PRIVATE clickhouse_common_io clickhouse_compression ch_contrib::sparsehash)
target_link_libraries (string_hash_map PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression ch_contrib::sparsehash)
clickhouse_add_executable (string_hash_map_aggregation string_hash_map.cpp)
target_link_libraries (string_hash_map_aggregation PRIVATE clickhouse_common_io clickhouse_compression)
target_link_libraries (string_hash_map_aggregation PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression)
clickhouse_add_executable (string_hash_set string_hash_set.cpp)
target_link_libraries (string_hash_set PRIVATE clickhouse_common_io clickhouse_compression)
target_link_libraries (string_hash_set PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression)
clickhouse_add_executable (two_level_hash_map two_level_hash_map.cpp)
target_link_libraries (two_level_hash_map PRIVATE clickhouse_common_io clickhouse_compression ch_contrib::sparsehash)
target_link_libraries (two_level_hash_map PRIVATE clickhouse_common_io clickhouse_common_config clickhouse_compression ch_contrib::sparsehash)
clickhouse_add_executable (jit_example jit_example.cpp)
target_link_libraries (jit_example PRIVATE dbms)

View File

@ -1048,14 +1048,14 @@ static std::tuple<ASTPtr, BlockIO> executeQueryImpl(
/// In case when the client had to retry some mini-INSERTs then they will be properly deduplicated
/// by the source tables. This functionality is controlled by a setting `async_insert_deduplicate`.
/// But then they will be glued together into a block and pushed through a chain of Materialized Views if any.
/// The process of forming such blocks is not deteministic so each time we retry mini-INSERTs the resulting
/// The process of forming such blocks is not deterministic so each time we retry mini-INSERTs the resulting
/// block may be concatenated differently.
/// That's why deduplication in dependent Materialized Views doesn't make sense in presence of async INSERTs.
if (settings.throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert &&
settings.deduplicate_blocks_in_dependent_materialized_views)
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
"Deduplication is dependent materialized view cannot work together with async inserts. "\
"Please disable eiher `deduplicate_blocks_in_dependent_materialized_views` or `async_insert` setting.");
"Deduplication in dependent materialized view cannot work together with async inserts. "\
"Please disable either `deduplicate_blocks_in_dependent_materialized_views` or `async_insert` setting.");
quota = context->getQuota();
if (quota)

View File

@ -16,8 +16,18 @@
namespace DB
{
static constinit std::atomic<bool> allow_logging{true};
void OwnSplitChannel::disableLogging()
{
allow_logging = false;
}
void OwnSplitChannel::log(const Poco::Message & msg)
{
if (!allow_logging)
return;
#ifndef WITHOUT_TEXT_LOG
auto logs_queue = CurrentThread::getInternalTextLogsQueue();

View File

@ -39,6 +39,8 @@ public:
void setLevel(const std::string & name, int level);
static void disableLogging();
private:
void logSplit(const Poco::Message & msg);
void tryLogSplit(const Poco::Message & msg);

View File

@ -1,7 +1,7 @@
set(SRCS)
clickhouse_add_executable(lexer lexer.cpp ${SRCS})
target_link_libraries(lexer PRIVATE clickhouse_parsers)
target_link_libraries(lexer PRIVATE clickhouse_parsers clickhouse_common_config)
clickhouse_add_executable(select_parser select_parser.cpp ${SRCS} "../../Server/ServerType.cpp")
target_link_libraries(select_parser PRIVATE dbms)

View File

@ -745,12 +745,7 @@ void addWithFillStepIfNeeded(QueryPlan & query_plan,
{
auto & interpolate_node_typed = interpolate_node->as<InterpolateNode &>();
PlannerActionsVisitor planner_actions_visitor(
planner_context,
/* use_column_identifier_as_action_node_name_, (default value)*/ true,
/// Prefer the INPUT to CONSTANT nodes (actions must be non constant)
/* always_use_const_column_for_constant_nodes */ false);
PlannerActionsVisitor planner_actions_visitor(planner_context);
auto expression_to_interpolate_expression_nodes = planner_actions_visitor.visit(*interpolate_actions_dag,
interpolate_node_typed.getExpression());
if (expression_to_interpolate_expression_nodes.size() != 1)

View File

@ -487,33 +487,16 @@ public:
return node;
}
[[nodiscard]] String addConstantIfNecessary(
const std::string & node_name, const ColumnWithTypeAndName & column, bool always_use_const_column_for_constant_nodes)
const ActionsDAG::Node * addConstantIfNecessary(const std::string & node_name, const ColumnWithTypeAndName & column)
{
chassert(column.column != nullptr);
auto it = node_name_to_node.find(node_name);
if (it != node_name_to_node.end() && (!always_use_const_column_for_constant_nodes || it->second->column))
return {node_name};
if (it != node_name_to_node.end())
{
/// There is a node with this name, but it doesn't have a column
/// This likely happens because we executed the query until WithMergeableState with a const node in the
/// WHERE clause and, as the results of headers are materialized, the column was removed
/// Let's add a new column and keep this
String dupped_name{node_name + "_dupped"};
if (node_name_to_node.find(dupped_name) != node_name_to_node.end())
return dupped_name;
const auto * node = &actions_dag.addColumn(column);
node_name_to_node[dupped_name] = node;
return dupped_name;
}
return it->second;
const auto * node = &actions_dag.addColumn(column);
node_name_to_node[node->result_name] = node;
return {node_name};
return node;
}
template <typename FunctionOrOverloadResolver>
@ -542,7 +525,7 @@ public:
}
private:
std::unordered_map<String, const ActionsDAG::Node *> node_name_to_node;
std::unordered_map<std::string_view, const ActionsDAG::Node *> node_name_to_node;
ActionsDAG & actions_dag;
QueryTreeNodePtr scope_node;
};
@ -550,11 +533,9 @@ private:
class PlannerActionsVisitorImpl
{
public:
PlannerActionsVisitorImpl(
ActionsDAG & actions_dag,
PlannerActionsVisitorImpl(ActionsDAG & actions_dag,
const PlannerContextPtr & planner_context_,
bool use_column_identifier_as_action_node_name_,
bool always_use_const_column_for_constant_nodes_);
bool use_column_identifier_as_action_node_name_);
ActionsDAG::NodeRawConstPtrs visit(QueryTreeNodePtr expression_node);
@ -614,18 +595,14 @@ private:
const PlannerContextPtr planner_context;
ActionNodeNameHelper action_node_name_helper;
bool use_column_identifier_as_action_node_name;
bool always_use_const_column_for_constant_nodes;
};
PlannerActionsVisitorImpl::PlannerActionsVisitorImpl(
ActionsDAG & actions_dag,
PlannerActionsVisitorImpl::PlannerActionsVisitorImpl(ActionsDAG & actions_dag,
const PlannerContextPtr & planner_context_,
bool use_column_identifier_as_action_node_name_,
bool always_use_const_column_for_constant_nodes_)
bool use_column_identifier_as_action_node_name_)
: planner_context(planner_context_)
, action_node_name_helper(node_to_node_name, *planner_context, use_column_identifier_as_action_node_name_)
, use_column_identifier_as_action_node_name(use_column_identifier_as_action_node_name_)
, always_use_const_column_for_constant_nodes(always_use_const_column_for_constant_nodes_)
{
actions_stack.emplace_back(actions_dag, nullptr);
}
@ -748,16 +725,17 @@ PlannerActionsVisitorImpl::NodeNameAndNodeMinLevel PlannerActionsVisitorImpl::vi
column.type = constant_type;
column.column = column.type->createColumnConst(1, constant_literal);
String final_name = actions_stack[0].addConstantIfNecessary(constant_node_name, column, always_use_const_column_for_constant_nodes);
actions_stack[0].addConstantIfNecessary(constant_node_name, column);
size_t actions_stack_size = actions_stack.size();
for (size_t i = 1; i < actions_stack_size; ++i)
{
auto & actions_stack_node = actions_stack[i];
actions_stack_node.addInputConstantColumnIfNecessary(final_name, column);
actions_stack_node.addInputConstantColumnIfNecessary(constant_node_name, column);
}
return {final_name, Levels(0)};
return {constant_node_name, Levels(0)};
}
PlannerActionsVisitorImpl::NodeNameAndNodeMinLevel PlannerActionsVisitorImpl::visitLambda(const QueryTreeNodePtr & node)
@ -886,16 +864,16 @@ PlannerActionsVisitorImpl::NodeNameAndNodeMinLevel PlannerActionsVisitorImpl::ma
else
column.column = std::move(column_set);
String final_name = actions_stack[0].addConstantIfNecessary(column.name, column, always_use_const_column_for_constant_nodes);
actions_stack[0].addConstantIfNecessary(column.name, column);
size_t actions_stack_size = actions_stack.size();
for (size_t i = 1; i < actions_stack_size; ++i)
{
auto & actions_stack_node = actions_stack[i];
actions_stack_node.addInputConstantColumnIfNecessary(final_name, column);
actions_stack_node.addInputConstantColumnIfNecessary(column.name, column);
}
return {final_name, Levels(0)};
return {column.name, Levels(0)};
}
PlannerActionsVisitorImpl::NodeNameAndNodeMinLevel PlannerActionsVisitorImpl::visitIndexHintFunction(const QueryTreeNodePtr & node)
@ -1032,19 +1010,14 @@ PlannerActionsVisitorImpl::NodeNameAndNodeMinLevel PlannerActionsVisitorImpl::vi
}
PlannerActionsVisitor::PlannerActionsVisitor(
const PlannerContextPtr & planner_context_,
bool use_column_identifier_as_action_node_name_,
bool always_use_const_column_for_constant_nodes_)
PlannerActionsVisitor::PlannerActionsVisitor(const PlannerContextPtr & planner_context_, bool use_column_identifier_as_action_node_name_)
: planner_context(planner_context_)
, use_column_identifier_as_action_node_name(use_column_identifier_as_action_node_name_)
, always_use_const_column_for_constant_nodes(always_use_const_column_for_constant_nodes_)
{}
ActionsDAG::NodeRawConstPtrs PlannerActionsVisitor::visit(ActionsDAG & actions_dag, QueryTreeNodePtr expression_node)
{
PlannerActionsVisitorImpl actions_visitor_impl(
actions_dag, planner_context, use_column_identifier_as_action_node_name, always_use_const_column_for_constant_nodes);
PlannerActionsVisitorImpl actions_visitor_impl(actions_dag, planner_context, use_column_identifier_as_action_node_name);
return actions_visitor_impl.visit(expression_node);
}

View File

@ -27,17 +27,11 @@ using PlannerContextPtr = std::shared_ptr<PlannerContext>;
* During actions build, there is special handling for following functions:
* 1. Aggregate functions are added in actions dag as INPUT nodes. Aggregate functions arguments are not added.
* 2. For function `in` and its variants, already collected sets from planner context are used.
* 3. When building actions that use CONSTANT nodes, by default we ignore pre-existing INPUTs if those don't have
* a column (a const column always has a column). This is for compatibility with previous headers. We disable this
* behaviour when we explicitly want to override CONSTANT nodes with the input (resolving InterpolateNode for example)
*/
class PlannerActionsVisitor
{
public:
explicit PlannerActionsVisitor(
const PlannerContextPtr & planner_context_,
bool use_column_identifier_as_action_node_name_ = true,
bool always_use_const_column_for_constant_nodes_ = true);
explicit PlannerActionsVisitor(const PlannerContextPtr & planner_context_, bool use_column_identifier_as_action_node_name_ = true);
/** Add actions necessary to calculate expression node into expression dag.
* Necessary actions are not added in actions dag output.
@ -48,7 +42,6 @@ public:
private:
const PlannerContextPtr planner_context;
bool use_column_identifier_as_action_node_name = true;
bool always_use_const_column_for_constant_nodes = true;
};
/** Calculate query tree expression node action dag name and add them into node to name map.

View File

@ -943,6 +943,8 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder(
PoolSettings pool_settings
{
.threads = num_streams,
.sum_marks = parts_with_ranges.getMarksCountAllParts(),
.min_marks_for_concurrent_read = info.min_marks_for_concurrent_read,
.preferred_block_size_bytes = settings.preferred_block_size_bytes,
.use_uncompressed_cache = info.use_uncompressed_cache,

View File

@ -56,7 +56,10 @@ namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int NOT_IMPLEMENTED;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
extern const int TOO_MANY_ARGUMENTS_FOR_FUNCTION;
}
// Interface for true window functions. It's not much of an interface, they just
@ -1710,7 +1713,7 @@ struct WindowFunctionExponentialTimeDecayedSum final : public StatefulWindowFunc
{
if (parameters_.size() != 1)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly one parameter", name_);
}
return applyVisitor(FieldVisitorConvertToNumber<Float64>(), parameters_[0]);
@ -1723,7 +1726,7 @@ struct WindowFunctionExponentialTimeDecayedSum final : public StatefulWindowFunc
{
if (argument_types.size() != 2)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly two arguments", name_);
}
@ -1807,7 +1810,7 @@ struct WindowFunctionExponentialTimeDecayedMax final : public WindowFunction
{
if (parameters_.size() != 1)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly one parameter", name_);
}
return applyVisitor(FieldVisitorConvertToNumber<Float64>(), parameters_[0]);
@ -1820,7 +1823,7 @@ struct WindowFunctionExponentialTimeDecayedMax final : public WindowFunction
{
if (argument_types.size() != 2)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly two arguments", name_);
}
@ -1882,7 +1885,7 @@ struct WindowFunctionExponentialTimeDecayedCount final : public StatefulWindowFu
{
if (parameters_.size() != 1)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly one parameter", name_);
}
return applyVisitor(FieldVisitorConvertToNumber<Float64>(), parameters_[0]);
@ -1895,7 +1898,7 @@ struct WindowFunctionExponentialTimeDecayedCount final : public StatefulWindowFu
{
if (argument_types.size() != 1)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly one argument", name_);
}
@ -1968,7 +1971,7 @@ struct WindowFunctionExponentialTimeDecayedAvg final : public StatefulWindowFunc
{
if (parameters_.size() != 1)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly one parameter", name_);
}
return applyVisitor(FieldVisitorConvertToNumber<Float64>(), parameters_[0]);
@ -1981,7 +1984,7 @@ struct WindowFunctionExponentialTimeDecayedAvg final : public StatefulWindowFunc
{
if (argument_types.size() != 2)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly two arguments", name_);
}
@ -2116,7 +2119,7 @@ struct WindowFunctionNtile final : public StatefulWindowFunction<NtileState>
: StatefulWindowFunction<NtileState>(name_, argument_types_, parameters_, std::make_shared<DataTypeUInt64>())
{
if (argument_types.size() != 1)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Function {} takes exactly one argument", name_);
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Function {} takes exactly one argument", name_);
auto type_id = argument_types[0]->getTypeId();
if (type_id != TypeIndex::UInt8 && type_id != TypeIndex::UInt16 && type_id != TypeIndex::UInt32 && type_id != TypeIndex::UInt64)
@ -2191,7 +2194,7 @@ namespace
if (!buckets)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of 'ntile' funtcion must be greater than zero");
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Argument of 'ntile' function must be greater than zero");
}
}
// new partition
@ -2404,7 +2407,7 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction
if (argument_types.size() > 3)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION,
"Function '{}' accepts at most 3 arguments, {} given",
name, argument_types.size());
}
@ -2414,7 +2417,7 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction
{
if (argument_types_.empty())
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION,
"Function {} takes at least one argument", name_);
}
@ -2504,7 +2507,7 @@ struct WindowFunctionNthValue final : public WindowFunction
{
if (argument_types_.size() != 2)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes exactly two arguments", name_);
}
@ -2578,7 +2581,7 @@ struct NonNegativeDerivativeParams
if (argument_types.size() != 2 && argument_types.size() != 3)
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Function {} takes 2 or 3 arguments", name_);
}

View File

@ -1,48 +1,48 @@
#include <Interpreters/AsynchronousInsertQueue.h>
#include <Interpreters/Squashing.h>
#include <Parsers/ASTInsertQuery.h>
#include <algorithm>
#include <exception>
#include <memory>
#include <mutex>
#include <vector>
#include <string_view>
#include <Poco/Net/NetException.h>
#include <Poco/Net/SocketAddress.h>
#include <Poco/Util/LayeredConfiguration.h>
#include <Common/CurrentThread.h>
#include <Common/Stopwatch.h>
#include <Common/NetException.h>
#include <Common/setThreadName.h>
#include <Common/OpenSSLHelpers.h>
#include <IO/Progress.h>
#include <vector>
#include <Access/AccessControl.h>
#include <Access/Credentials.h>
#include <Compression/CompressedReadBuffer.h>
#include <Compression/CompressedWriteBuffer.h>
#include <IO/ReadBufferFromPocoSocket.h>
#include <IO/WriteBufferFromPocoSocket.h>
#include <IO/LimitReadBuffer.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <Compression/CompressionFactory.h>
#include <Core/ExternalTable.h>
#include <Core/ServerSettings.h>
#include <Formats/NativeReader.h>
#include <Formats/NativeWriter.h>
#include <Interpreters/executeQuery.h>
#include <Interpreters/TablesStatus.h>
#include <IO/LimitReadBuffer.h>
#include <IO/Progress.h>
#include <IO/ReadBufferFromPocoSocket.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromPocoSocket.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/AsynchronousInsertQueue.h>
#include <Interpreters/InternalTextLogsQueue.h>
#include <Interpreters/OpenTelemetrySpanLog.h>
#include <Interpreters/Session.h>
#include <Interpreters/Squashing.h>
#include <Interpreters/TablesStatus.h>
#include <Interpreters/executeQuery.h>
#include <Parsers/ASTInsertQuery.h>
#include <Server/TCPServer.h>
#include <Storages/StorageReplicatedMergeTree.h>
#include <Storages/MergeTree/MergeTreeDataPartUUID.h>
#include <Storages/ObjectStorage/StorageObjectStorageCluster.h>
#include <Core/ExternalTable.h>
#include <Core/ServerSettings.h>
#include <Access/AccessControl.h>
#include <Access/Credentials.h>
#include <Compression/CompressionFactory.h>
#include <Common/logger_useful.h>
#include <Storages/StorageReplicatedMergeTree.h>
#include <Poco/Net/NetException.h>
#include <Poco/Net/SocketAddress.h>
#include <Poco/Util/LayeredConfiguration.h>
#include <Common/CurrentMetrics.h>
#include <Common/CurrentThread.h>
#include <Common/NetException.h>
#include <Common/OpenSSLHelpers.h>
#include <Common/Stopwatch.h>
#include <Common/logger_useful.h>
#include <Common/scope_guard_safe.h>
#include <Common/setThreadName.h>
#include <Common/thread_local_rng.h>
#include <fmt/format.h>
#include <Processors/Executors/PullingAsyncPipelineExecutor.h>
#include <Processors/Executors/PushingPipelineExecutor.h>
@ -61,6 +61,8 @@
#include <Common/config_version.h>
#include <fmt/format.h>
using namespace std::literals;
using namespace DB;
@ -961,8 +963,8 @@ void TCPHandler::processInsertQuery()
if (settings.throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert &&
settings.deduplicate_blocks_in_dependent_materialized_views)
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED,
"Deduplication is dependent materialized view cannot work together with async inserts. "\
"Please disable eiher `deduplicate_blocks_in_dependent_materialized_views` or `async_insert` setting.");
"Deduplication in dependent materialized view cannot work together with async inserts. "\
"Please disable either `deduplicate_blocks_in_dependent_materialized_views` or `async_insert` setting.");
auto result = processAsyncInsertQuery(*insert_queue);
if (result.status == AsynchronousInsertQueue::PushResult::OK)
@ -1036,6 +1038,17 @@ void TCPHandler::processOrdinaryQuery()
PullingAsyncPipelineExecutor executor(pipeline);
CurrentMetrics::Increment query_thread_metric_increment{CurrentMetrics::QueryThread};
/// The following may happen:
/// * current thread is holding the lock
/// * because of the exception we unwind the stack and call the destructor of `executor`
/// * the destructor calls cancel() and waits for all query threads to finish
/// * at the same time one of the query threads is trying to acquire the lock, e.g. inside `merge_tree_read_task_callback`
/// * deadlock
SCOPE_EXIT({
if (out_lock.owns_lock())
out_lock.unlock();
});
Block block;
while (executor.pull(block, interactive_delay / 1000))
{
@ -1079,8 +1092,7 @@ void TCPHandler::processOrdinaryQuery()
}
/// This lock wasn't acquired before and we make .lock() call here
/// so everything under this line is covered even together
/// with sendProgress() out of the scope
/// so everything under this line is covered.
out_lock.lock();
/** If data has run out, we will send the profiling data and total values to
@ -1107,6 +1119,7 @@ void TCPHandler::processOrdinaryQuery()
last_sent_snapshots.clear();
}
out_lock.lock();
sendProgress();
}
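The SCOPE_EXIT added in the hunk above exists precisely for the unwind path its comment describes: the guard is declared after the executor, so during stack unwinding it runs first and releases out_lock before the executor's destructor waits for query threads. A minimal self-contained sketch of the same pattern, with all names assumed (a toy Executor and a hand-written guard stand in for PullingAsyncPipelineExecutor and the SCOPE_EXIT macro):

// Illustrative sketch only: shows why the lock must be released before the
// executor's destructor joins a worker thread that needs the same mutex.
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <thread>

struct Executor
{
    std::thread worker;

    explicit Executor(std::mutex & shared_mutex)
        : worker([&shared_mutex]
          {
              std::lock_guard<std::mutex> guard(shared_mutex);   // a "query thread" touching shared state
          })
    {
    }

    ~Executor()
    {
        if (worker.joinable())
            worker.join();   // would deadlock if the caller still held the mutex
    }
};

int main()
{
    std::mutex out_mutex;
    std::unique_lock<std::mutex> out_lock(out_mutex);

    try
    {
        Executor executor(out_mutex);

        // Declared after `executor`, so during unwinding it runs first and releases
        // the lock before ~Executor() joins the worker.
        struct UnlockGuard
        {
            std::unique_lock<std::mutex> & lock;
            ~UnlockGuard() { if (lock.owns_lock()) lock.unlock(); }
        } unlock_guard{out_lock};

        throw std::runtime_error("simulated query failure");
    }
    catch (const std::exception & e)
    {
        std::cout << "caught: " << e.what() << '\n';
    }
}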

View File

@ -73,12 +73,7 @@ std::optional<String> DataPartStorageOnDiskBase::getRelativePathForPrefix(Logger
for (int try_no = 0; try_no < 10; ++try_no)
{
if (prefix.empty())
res = part_dir + (try_no ? "_try" + DB::toString(try_no) : "");
else if (prefix.ends_with("_"))
res = prefix + part_dir + (try_no ? "_try" + DB::toString(try_no) : "");
else
res = prefix + "_" + part_dir + (try_no ? "_try" + DB::toString(try_no) : "");
res = getPartDirForPrefix(prefix, detached, try_no);
if (!volume->getDisk()->exists(full_relative_path / res))
return res;
@ -101,6 +96,36 @@ std::optional<String> DataPartStorageOnDiskBase::getRelativePathForPrefix(Logger
return res;
}
String DataPartStorageOnDiskBase::getPartDirForPrefix(const String & prefix, bool detached, int try_no) const
{
/// This function joins `prefix`, the part name, and an attempt number, returning something like "<prefix>_<part_name>_<tryN>".
String res = prefix;
if (!prefix.empty() && !prefix.ends_with("_"))
res += "_";
/// During RESTORE temporary part directories are created with names like "tmp_restore_all_2_2_0-XXXXXXXX".
/// To detach such a directory we need to rename it replacing "tmp_restore_" with a specified prefix,
/// and a random suffix with an attempt number.
String part_name;
if (detached && part_dir.starts_with("tmp_restore_"))
{
part_name = part_dir.substr(strlen("tmp_restore_"));
size_t endpos = part_name.find('-');
if (endpos != String::npos)
part_name.erase(endpos, String::npos);
}
if (!part_name.empty())
res += part_name;
else
res += part_dir;
if (try_no)
res += "_try" + DB::toString(try_no);
return res;
}
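For illustration only (not part of the commit): a standalone copy of the getPartDirForPrefix logic above, with assumed example inputs, showing how a temporary RESTORE directory such as "tmp_restore_all_2_2_0-XXXXXXXX" becomes a destination directory name like "broken_all_2_2_0_try1" (the prefix "broken", the random suffix, and the attempt number in main() are illustrative values):

// Requires C++20 for starts_with/ends_with, matching the code in the diff.
#include <cstring>
#include <iostream>
#include <string>

std::string getPartDirForPrefix(const std::string & part_dir, const std::string & prefix, bool detached, int try_no)
{
    std::string res = prefix;
    if (!prefix.empty() && !prefix.ends_with("_"))
        res += "_";

    // For detached "tmp_restore_<part>-<random>" directories, keep only <part>.
    std::string part_name;
    if (detached && part_dir.starts_with("tmp_restore_"))
    {
        part_name = part_dir.substr(std::strlen("tmp_restore_"));
        size_t endpos = part_name.find('-');
        if (endpos != std::string::npos)
            part_name.erase(endpos, std::string::npos);
    }

    if (!part_name.empty())
        res += part_name;
    else
        res += part_dir;

    if (try_no)
        res += "_try" + std::to_string(try_no);
    return res;
}

int main()
{
    // Prints "broken_all_2_2_0_try1"
    std::cout << getPartDirForPrefix("tmp_restore_all_2_2_0-3f29a81c", "broken", /*detached=*/ true, /*try_no=*/ 1) << '\n';
}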
bool DataPartStorageOnDiskBase::looksLikeBrokenDetachedPartHasTheSameContent(const String & detached_part_path,
std::optional<String> & original_checksums_content,
std::optional<Strings> & original_files_list) const

View File

@ -148,6 +148,9 @@ private:
/// Actual file name may be the same as expected
/// or be the name of the file with packed data.
virtual NameSet getActualFileNamesOnDisk(const NameSet & file_names) const = 0;
/// Returns the destination path for the part directory while copying a detached part.
String getPartDirForPrefix(const String & prefix, bool detached, int try_no) const;
};
}

View File

@ -739,10 +739,25 @@ void IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool require_columns_checks
}
catch (...)
{
/// Don't scare people with broken part error
/// Don't scare people with broken part error if it's retryable.
if (!isRetryableException(std::current_exception()))
{
LOG_ERROR(storage.log, "Part {} is broken and needs manual correction", getDataPartStorage().getFullPath());
if (Exception * e = exception_cast<Exception *>(std::current_exception()))
{
/// Probably there is something wrong with files of this part.
/// So it can be helpful to add to the error message some information about those files.
String files_in_part;
for (auto it = getDataPartStorage().iterate(); it->isValid(); it->next())
files_in_part += fmt::format("{}{} ({} bytes)", (files_in_part.empty() ? "" : ", "), it->name(), getDataPartStorage().getFileSize(it->name()));
if (!files_in_part.empty())
e->addMessage("Part contains files: {}", files_in_part);
if (isEmpty())
e->addMessage("Part is empty");
}
}
// There could be conditions that data part to be loaded is broken, but some of meta infos are already written
// into metadata before exception, need to clean them all.
metadata_manager->deleteAll(/*include_projection*/ true);

View File

@ -5551,12 +5551,17 @@ public:
attachIfAllPartsRestored();
}
String getTemporaryDirectory(const DiskPtr & disk)
String getTemporaryDirectory(const DiskPtr & disk, const String & part_name)
{
std::lock_guard lock{mutex};
auto it = temp_dirs.find(disk);
if (it == temp_dirs.end())
it = temp_dirs.emplace(disk, std::make_shared<TemporaryFileOnDisk>(disk, "tmp/")).first;
auto it = temp_part_dirs.find(part_name);
if (it == temp_part_dirs.end())
{
auto temp_part_dir = std::make_shared<TemporaryFileOnDisk>(disk, fs::path{storage->getRelativeDataPath()} / ("tmp_restore_" + part_name + "-"));
/// Attaching parts will rename them so it's expected for a temporary part directory not to exist anymore in the end.
temp_part_dir->setShowWarningIfRemoved(false);
it = temp_part_dirs.emplace(part_name, temp_part_dir).first;
}
return it->second->getRelativePath();
}
@ -5574,7 +5579,7 @@ private:
storage->attachRestoredParts(std::move(parts));
parts.clear();
temp_dirs.clear();
temp_part_dirs.clear();
num_parts = 0;
}
@ -5583,7 +5588,7 @@ private:
size_t num_parts = 0;
size_t num_broken_parts = 0;
MutableDataPartsVector parts;
std::map<DiskPtr, std::shared_ptr<TemporaryFileOnDisk>> temp_dirs;
std::map<String /* part_name*/, std::shared_ptr<TemporaryFileOnDisk>> temp_part_dirs;
mutable std::mutex mutex;
};
@ -5636,9 +5641,11 @@ void MergeTreeData::restorePartFromBackup(std::shared_ptr<RestoredPartsHolder> r
String part_name = part_info.getPartNameAndCheckFormat(format_version);
auto backup = restored_parts_holder->getBackup();
/// Find all files of this part in the backup.
Strings filenames = backup->listFiles(part_path_in_backup, /* recursive= */ true);
/// Calculate the total size of the part.
UInt64 total_size_of_part = 0;
Strings filenames = backup->listFiles(part_path_in_backup, /* recursive= */ true);
fs::path part_path_in_backup_fs = part_path_in_backup;
for (const String & filename : filenames)
total_size_of_part += backup->getFileSize(part_path_in_backup_fs / filename);
@ -5648,11 +5655,9 @@ void MergeTreeData::restorePartFromBackup(std::shared_ptr<RestoredPartsHolder> r
/// Calculate paths, for example:
/// part_name = 0_1_1_0
/// part_path_in_backup = /data/test/table/0_1_1_0
/// tmp_dir = tmp/1aaaaaa
/// tmp_part_dir = tmp/1aaaaaa/data/test/table/0_1_1_0
/// temp_part_dir = /var/lib/clickhouse/data/test/table/tmp_restore_all_0_1_1_0-XXXXXXXX
auto disk = reservation->getDisk();
fs::path temp_dir = restored_parts_holder->getTemporaryDirectory(disk);
fs::path temp_part_dir = temp_dir / part_path_in_backup_fs.relative_path();
fs::path temp_part_dir = restored_parts_holder->getTemporaryDirectory(disk, part_name);
/// Subdirectories in the part's directory. It's used to restore projections.
std::unordered_set<String> subdirs;
@ -5679,22 +5684,25 @@ void MergeTreeData::restorePartFromBackup(std::shared_ptr<RestoredPartsHolder> r
reservation->update(reservation->getSize() - file_size);
}
if (auto part = loadPartRestoredFromBackup(disk, temp_part_dir.parent_path(), part_name, detach_if_broken))
if (auto part = loadPartRestoredFromBackup(part_name, disk, temp_part_dir, detach_if_broken))
restored_parts_holder->addPart(part);
else
restored_parts_holder->increaseNumBrokenParts();
}
MergeTreeData::MutableDataPartPtr MergeTreeData::loadPartRestoredFromBackup(const DiskPtr & disk, const String & temp_dir, const String & part_name, bool detach_if_broken) const
MergeTreeData::MutableDataPartPtr MergeTreeData::loadPartRestoredFromBackup(const String & part_name, const DiskPtr & disk, const String & temp_part_dir, bool detach_if_broken) const
{
MutableDataPartPtr part;
auto single_disk_volume = std::make_shared<SingleDiskVolume>(disk->getName(), disk, 0);
fs::path full_part_dir{temp_part_dir};
String parent_part_dir = full_part_dir.parent_path();
String part_dir_name = full_part_dir.filename();
/// Load this part from the directory `tmp_part_dir`.
/// Load this part from the directory `temp_part_dir`.
auto load_part = [&]
{
MergeTreeDataPartBuilder builder(*this, part_name, single_disk_volume, temp_dir, part_name);
MergeTreeDataPartBuilder builder(*this, part_name, single_disk_volume, parent_part_dir, part_dir_name);
builder.withPartFormatFromDisk();
part = std::move(builder).build();
part->version.setCreationTID(Tx::PrehistoricTID, nullptr);
@ -5709,7 +5717,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeData::loadPartRestoredFromBackup(cons
if (!part)
{
/// Make a fake data part only to copy its files to /detached/.
part = MergeTreeDataPartBuilder{*this, part_name, single_disk_volume, temp_dir, part_name}
part = MergeTreeDataPartBuilder{*this, part_name, single_disk_volume, parent_part_dir, part_dir_name}
.withPartStorageType(MergeTreeDataPartStorageType::Full)
.withPartType(MergeTreeDataPartType::Wide)
.build();

View File

@ -1473,7 +1473,7 @@ protected:
/// Restores the parts of this table from backup.
void restorePartsFromBackup(RestorerFromBackup & restorer, const String & data_path_in_backup, const std::optional<ASTs> & partitions);
void restorePartFromBackup(std::shared_ptr<RestoredPartsHolder> restored_parts_holder, const MergeTreePartInfo & part_info, const String & part_path_in_backup, bool detach_if_broken) const;
MutableDataPartPtr loadPartRestoredFromBackup(const DiskPtr & disk, const String & temp_dir, const String & part_name, bool detach_if_broken) const;
MutableDataPartPtr loadPartRestoredFromBackup(const String & part_name, const DiskPtr & disk, const String & temp_part_dir, bool detach_if_broken) const;
/// Attaches restored parts to the storage.
virtual void attachRestoredParts(MutableDataPartsVector && parts) = 0;

View File

@ -379,7 +379,6 @@ void MergeTreePrefetchedReadPool::fillPerThreadTasks(size_t threads, size_t sum_
for (const auto & part : per_part_statistics)
total_size_approx += part.sum_marks * part.approx_size_of_mark;
size_t min_prefetch_step_marks = pool_settings.min_marks_for_concurrent_read;
for (size_t i = 0; i < per_part_infos.size(); ++i)
{
auto & part_stat = per_part_statistics[i];
@ -394,29 +393,7 @@ void MergeTreePrefetchedReadPool::fillPerThreadTasks(size_t threads, size_t sum_
1, static_cast<size_t>(std::round(static_cast<double>(settings.filesystem_prefetch_step_bytes) / part_stat.approx_size_of_mark)));
}
/// This limit is important to avoid spikes of slow aws getObject requests when parallelizing within one file.
/// (The default is taken from here https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html).
if (part_stat.approx_size_of_mark
&& settings.filesystem_prefetch_min_bytes_for_single_read_task
&& part_stat.approx_size_of_mark < settings.filesystem_prefetch_min_bytes_for_single_read_task)
{
const size_t min_prefetch_step_marks_by_total_cols = static_cast<size_t>(
std::ceil(static_cast<double>(settings.filesystem_prefetch_min_bytes_for_single_read_task) / part_stat.approx_size_of_mark));
/// At least one task to start working on it right now and another one to prefetch in the meantime.
const size_t new_min_prefetch_step_marks = std::min<size_t>(min_prefetch_step_marks_by_total_cols, sum_marks / threads / 2);
if (min_prefetch_step_marks < new_min_prefetch_step_marks)
{
LOG_DEBUG(log, "Increasing min prefetch step from {} to {}", min_prefetch_step_marks, new_min_prefetch_step_marks);
min_prefetch_step_marks = new_min_prefetch_step_marks;
}
}
if (part_stat.prefetch_step_marks < min_prefetch_step_marks)
{
LOG_DEBUG(log, "Increasing prefetch step from {} to {}", part_stat.prefetch_step_marks, min_prefetch_step_marks);
part_stat.prefetch_step_marks = min_prefetch_step_marks;
}
part_stat.prefetch_step_marks = std::max(part_stat.prefetch_step_marks, per_part_infos[i]->min_marks_per_task);
LOG_DEBUG(
log,
@ -433,11 +410,10 @@ void MergeTreePrefetchedReadPool::fillPerThreadTasks(size_t threads, size_t sum_
LOG_DEBUG(
log,
"Sum marks: {}, threads: {}, min_marks_per_thread: {}, min prefetch step marks: {}, prefetches limit: {}, total_size_approx: {}",
"Sum marks: {}, threads: {}, min_marks_per_thread: {}, prefetches limit: {}, total_size_approx: {}",
sum_marks,
threads,
min_marks_per_thread,
min_prefetch_step_marks,
settings.filesystem_prefetches_limit,
total_size_approx);

Some files were not shown because too many files have changed in this diff.