diff --git a/CHANGELOG.md b/CHANGELOG.md
index 01bbafe2f16..a09e4d4fe24 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,10 +1,194 @@
### Table of Contents
+**[ClickHouse release v23.3 LTS, 2023-03-30](#233)**
**[ClickHouse release v23.2, 2023-02-23](#232)**
**[ClickHouse release v23.1, 2023-01-25](#231)**
**[Changelog for 2022](https://clickhouse.com/docs/en/whats-new/changelog/2022/)**
# 2023 Changelog +### ClickHouse release 23.3 LTS, 2023-03-30 + +#### Upgrade Notes +* The behavior of `*domain*RFC` and `netloc` functions is slightly changed: relaxed the set of symbols that are allowed in the URL authority for better conformance. [#46841](https://github.com/ClickHouse/ClickHouse/pull/46841) ([Azat Khuzhin](https://github.com/azat)). +* Prohibited creating tables based on KafkaEngine with DEFAULT/EPHEMERAL/ALIAS/MATERIALIZED statements for columns. [#47138](https://github.com/ClickHouse/ClickHouse/pull/47138) ([Aleksandr Musorin](https://github.com/AVMusorin)). +* An "asynchronous connection drain" feature is removed. Related settings and metrics are removed as well. It was an internal feature, so the removal should not affect users who had never heard about that feature. [#47486](https://github.com/ClickHouse/ClickHouse/pull/47486) ([Alexander Tokmakov](https://github.com/tavplubix)). +* Support 256-bit Decimal data type (more than 38 digits) in `arraySum`/`Min`/`Max`/`Avg`/`Product`, `arrayCumSum`/`CumSumNonNegative`, `arrayDifference`, array construction, IN operator, query parameters, `groupArrayMovingSum`, statistical functions, `min`/`max`/`any`/`argMin`/`argMax`, PostgreSQL wire protocol, MySQL table engine and function, `sumMap`, `mapAdd`, `mapSubtract`, `arrayIntersect`. Add support for big integers in `arrayIntersect`. Statistical aggregate functions involving moments (such as `corr` or various `TTest`s) will use `Float64` as their internal representation (they were using `Decimal128` before this change, but it was pointless), and these functions can return `nan` instead of `inf` in case of infinite variance. Some functions were allowed on `Decimal256` data types but returned `Decimal128` in previous versions - now it is fixed. This closes [#47569](https://github.com/ClickHouse/ClickHouse/issues/47569). This closes [#44864](https://github.com/ClickHouse/ClickHouse/issues/44864). This closes [#28335](https://github.com/ClickHouse/ClickHouse/issues/28335). [#47594](https://github.com/ClickHouse/ClickHouse/pull/47594) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Make backup_threads/restore_threads server settings (instead of user settings). [#47881](https://github.com/ClickHouse/ClickHouse/pull/47881) ([Azat Khuzhin](https://github.com/azat)). +* Do not allow const and non-deterministic secondary indices [#46839](https://github.com/ClickHouse/ClickHouse/pull/46839) ([Anton Popov](https://github.com/CurtizJ)). + +#### New Feature +* Add new mode for splitting the work on replicas using settings `parallel_replicas_custom_key` and `parallel_replicas_custom_key_filter_type`. If the cluster consists of a single shard with multiple replicas, up to `max_parallel_replicas` will be randomly picked and turned into shards. For each shard, a corresponding filter is added to the query on the initiator before being sent to the shard. If the cluster consists of multiple shards, it will behave the same as `sample_key` but with the possibility to define an arbitrary key. [#45108](https://github.com/ClickHouse/ClickHouse/pull/45108) ([Antonio Andelic](https://github.com/antonio2368)). +* An option to display partial result on cancel: Added query setting `stop_reading_on_first_cancel` allowing the canceled query (e.g. due to Ctrl-C) to return a partial result. [#45689](https://github.com/ClickHouse/ClickHouse/pull/45689) ([Alexey Perevyshin](https://github.com/alexX512)). 
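As an illustration of the partial-result-on-cancel setting just described, a minimal hypothetical session; the long-running query is invented and only gives Ctrl-C something to interrupt:

```sql
-- Return whatever has been computed so far if the query is cancelled,
-- e.g. by pressing Ctrl-C in clickhouse-client.
SET stop_reading_on_first_cancel = 1;

-- A deliberately long-running aggregation; cancelling it now yields the
-- partial sum accumulated up to that point instead of no result at all.
SELECT sum(number) FROM numbers_mt(100000000000);
```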
+* Added support for arbitrary table engines for temporary tables (except for Replicated and KeeperMap engines). Closes [#31497](https://github.com/ClickHouse/ClickHouse/issues/31497). [#46071](https://github.com/ClickHouse/ClickHouse/pull/46071) ([Roman Vasin](https://github.com/rvasin)). +* Add support for replication of user-defined SQL functions using centralized storage in Keeper. [#46085](https://github.com/ClickHouse/ClickHouse/pull/46085) ([Aleksei Filatov](https://github.com/aalexfvk)). +* Implement `system.server_settings` (similar to `system.settings`), which will contain server configurations. [#46550](https://github.com/ClickHouse/ClickHouse/pull/46550) ([pufit](https://github.com/pufit)). +* Support for the `UNDROP TABLE` query. Closes [#46811](https://github.com/ClickHouse/ClickHouse/issues/46811). [#47241](https://github.com/ClickHouse/ClickHouse/pull/47241) ([chen](https://github.com/xiedeyantu)). +* Allow separate grants for named collections (e.g. to be able to give `SHOW/CREATE/ALTER/DROP named collection` access only to certain collections, instead of all at once). Closes [#40894](https://github.com/ClickHouse/ClickHouse/issues/40894). Add new access type `NAMED_COLLECTION_CONTROL`, which is not given to the default user unless explicitly added to the user config (it is required to be able to do `GRANT ALL`); also, `show_named_collections` no longer has to be specified manually for the default user to have full access rights, as was the case in 23.2. [#46241](https://github.com/ClickHouse/ClickHouse/pull/46241) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Allow nested custom disks. Previously, custom disks supported only a flat disk structure. [#47106](https://github.com/ClickHouse/ClickHouse/pull/47106) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Introduce a function `widthBucket` (with a `WIDTH_BUCKET` alias for compatibility). [#42974](https://github.com/ClickHouse/ClickHouse/issues/42974). [#46790](https://github.com/ClickHouse/ClickHouse/pull/46790) ([avoiderboi](https://github.com/avoiderboi)). +* Add new functions `parseDateTime`/`parseDateTimeInJodaSyntax` that parse a string into a DateTime according to a specified format string: `parseDateTime` uses MySQL format syntax, `parseDateTimeInJodaSyntax` uses Joda syntax. [#46815](https://github.com/ClickHouse/ClickHouse/pull/46815) ([李扬](https://github.com/taiyang-li)). +* Use `dummy UInt8` for the default structure of table function `null`. Closes [#46930](https://github.com/ClickHouse/ClickHouse/issues/46930). [#47006](https://github.com/ClickHouse/ClickHouse/pull/47006) ([flynn](https://github.com/ucasfl)). +* Support date formats with a comma, like `Dec 15, 2021`, in the `parseDateTimeBestEffort` function. Closes [#46816](https://github.com/ClickHouse/ClickHouse/issues/46816). [#47071](https://github.com/ClickHouse/ClickHouse/pull/47071) ([chen](https://github.com/xiedeyantu)). +* Add settings `http_wait_end_of_query` and `http_response_buffer_size` that correspond to the URL params `wait_end_of_query` and `buffer_size` for the HTTP interface. This allows changing these settings in profiles. [#47108](https://github.com/ClickHouse/ClickHouse/pull/47108) ([Vladimir C](https://github.com/vdimir)). +* Add a `system.dropped_tables` table that shows tables that were dropped from `Atomic` databases but have not been completely removed yet. [#47364](https://github.com/ClickHouse/ClickHouse/pull/47364) ([chen](https://github.com/xiedeyantu)). +* Add `INSTR` as an alias of `positionCaseInsensitive` for MySQL compatibility.
Closes [#47529](https://github.com/ClickHouse/ClickHouse/issues/47529). [#47535](https://github.com/ClickHouse/ClickHouse/pull/47535) ([flynn](https://github.com/ucasfl)). +* Added `toDecimalString` function allowing to convert numbers to string with fixed precision. [#47838](https://github.com/ClickHouse/ClickHouse/pull/47838) ([Andrey Zvonov](https://github.com/zvonand)). +* Add a merge tree setting `max_number_of_mutations_for_replica`. It limits the number of part mutations per replica to the specified amount. Zero means no limit on the number of mutations per replica (the execution can still be constrained by other settings). [#48047](https://github.com/ClickHouse/ClickHouse/pull/48047) ([Vladimir C](https://github.com/vdimir)). +* Add Map-related function `mapFromArrays`, which allows us to create map from a pair of arrays. [#31125](https://github.com/ClickHouse/ClickHouse/pull/31125) ([李扬](https://github.com/taiyang-li)). +* Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats. This closes [#13541](https://github.com/ClickHouse/ClickHouse/issues/13541). [#47114](https://github.com/ClickHouse/ClickHouse/pull/47114) ([Kruglov Pavel](https://github.com/Avogar)). +* Add SSL User Certificate authentication to the native protocol. Closes [#47077](https://github.com/ClickHouse/ClickHouse/issues/47077). [#47596](https://github.com/ClickHouse/ClickHouse/pull/47596) ([Nikolay Degterinsky](https://github.com/evillique)). +* Add *OrNull() and *OrZero() variants for `parseDateTime`, add alias `str_to_date` for MySQL parity. [#48000](https://github.com/ClickHouse/ClickHouse/pull/48000) ([Robert Schulze](https://github.com/rschu1ze)). +* Added operator `REGEXP` (similar to operators "LIKE", "IN", "MOD" etc.) for better compatibility with MySQL [#47869](https://github.com/ClickHouse/ClickHouse/pull/47869) ([Robert Schulze](https://github.com/rschu1ze)). + +#### Performance Improvement +* Marks in memory are now compressed, using 3-6x less memory. [#47290](https://github.com/ClickHouse/ClickHouse/pull/47290) ([Michael Kolupaev](https://github.com/al13n321)). +* Backups for large numbers of files were unbelievably slow in previous versions. Not anymore. Now they are unbelievably fast. [#47251](https://github.com/ClickHouse/ClickHouse/pull/47251) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Introduced a separate thread pool for backup's IO operations. This will allow to scale it independently of other pools and increase performance. [#47174](https://github.com/ClickHouse/ClickHouse/pull/47174) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). Use MultiRead request and retries for collecting metadata at final stage of backup processing. [#47243](https://github.com/ClickHouse/ClickHouse/pull/47243) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). If a backup and restoring data are both in S3 then server-side copy should be used from now on. [#47546](https://github.com/ClickHouse/ClickHouse/pull/47546) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fixed excessive reading in queries with `FINAL`. [#47801](https://github.com/ClickHouse/ClickHouse/pull/47801) ([Nikita Taranov](https://github.com/nickitat)). +* Setting `max_final_threads` would be set to number of cores at server startup (by the same algorithm as we use for `max_threads`). This improves concurrency of `final` execution on servers with high number of CPUs. 
[#47915](https://github.com/ClickHouse/ClickHouse/pull/47915) ([Nikita Taranov](https://github.com/nickitat)). +* Allow executing reading pipeline for DIRECT dictionary with CLICKHOUSE source in multiple threads. To enable set `dictionary_use_async_executor=1` in `SETTINGS` section for source in `CREATE DICTIONARY` statement. [#47986](https://github.com/ClickHouse/ClickHouse/pull/47986) ([Vladimir C](https://github.com/vdimir)). +* Optimize one nullable key aggregate performance. [#45772](https://github.com/ClickHouse/ClickHouse/pull/45772) ([LiuNeng](https://github.com/liuneng1994)). +* Implemented lowercase `tokenbf_v1` index utilization for `hasTokenOrNull`, `hasTokenCaseInsensitive` and `hasTokenCaseInsensitiveOrNull`. [#46252](https://github.com/ClickHouse/ClickHouse/pull/46252) ([ltrk2](https://github.com/ltrk2)). +* Optimize functions `position` and `LIKE` by searching the first two chars using SIMD. [#46289](https://github.com/ClickHouse/ClickHouse/pull/46289) ([Jiebin Sun](https://github.com/jiebinn)). +* Optimize queries from the `system.detached_parts`, which could be significantly large. Added several sources with respect to the block size limitation; in each block an IO thread pool is used to calculate the part size, i.e. to make syscalls in parallel. [#46624](https://github.com/ClickHouse/ClickHouse/pull/46624) ([Sema Checherinda](https://github.com/CheSema)). +* Increase the default value of `max_replicated_merges_in_queue` for ReplicatedMergeTree tables from 16 to 1000. It allows faster background merge operation on clusters with a very large number of replicas, such as clusters with shared storage in ClickHouse Cloud. [#47050](https://github.com/ClickHouse/ClickHouse/pull/47050) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Updated `clickhouse-copier` to use `GROUP BY` instead of `DISTINCT` to get list of partitions. For large tables this reduced the select time from over 500s to under 1s. [#47386](https://github.com/ClickHouse/ClickHouse/pull/47386) ([Clayton McClure](https://github.com/cmcclure-twilio)). +* Fix performance degradation in `ASOF JOIN`. [#47544](https://github.com/ClickHouse/ClickHouse/pull/47544) ([Ongkong](https://github.com/ongkong)). +* Even more batching in Keeper. Avoid breaking batches on read requests to improve performance. [#47978](https://github.com/ClickHouse/ClickHouse/pull/47978) ([Antonio Andelic](https://github.com/antonio2368)). +* Allow PREWHERE for Merge with different DEFAULT expression for column. [#46831](https://github.com/ClickHouse/ClickHouse/pull/46831) ([Azat Khuzhin](https://github.com/azat)). + +#### Experimental Feature +* Parallel replicas: Improved the overall performance by better utilizing local replica. And forbid reading with parallel replicas from non-replicated MergeTree by default. [#47858](https://github.com/ClickHouse/ClickHouse/pull/47858) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Support filter push down to left table for JOIN with `Join`, `Dictionary` and `EmbeddedRocksDB` tables if the experimental Analyzer is enabled. [#47280](https://github.com/ClickHouse/ClickHouse/pull/47280) ([Maksim Kita](https://github.com/kitaisreal)). +* Now ReplicatedMergeTree with zero copy replication has less load to Keeper. [#47676](https://github.com/ClickHouse/ClickHouse/pull/47676) ([alesapin](https://github.com/alesapin)). 
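To make the `dictionary_use_async_executor` entry above more concrete, a sketch of where the setting goes; the dictionary name, column set, and source table are invented, and the remaining source parameters depend on the deployment:

```sql
-- Hypothetical DIRECT dictionary over a local ClickHouse table; the SETTINGS
-- clause inside the source enables multi-threaded reading for the CLICKHOUSE source.
CREATE DICTIONARY hypothetical_direct_dict
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'hypothetical_source_table' SETTINGS(dictionary_use_async_executor 1)))
LAYOUT(DIRECT());
```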
+* Fix create materialized view with MaterializedPostgreSQL [#40807](https://github.com/ClickHouse/ClickHouse/pull/40807) ([Maksim Buren](https://github.com/maks-buren630501)). + +#### Improvement +* Enable `input_format_json_ignore_unknown_keys_in_named_tuple` by default. [#46742](https://github.com/ClickHouse/ClickHouse/pull/46742) ([Kruglov Pavel](https://github.com/Avogar)). +* Allow to ignore errors while pushing to MATERIALIZED VIEW (add new setting `materialized_views_ignore_errors`, by default to `false`, but it is set to `true` for flushing logs to `system.*_log` tables unconditionally). [#46658](https://github.com/ClickHouse/ClickHouse/pull/46658) ([Azat Khuzhin](https://github.com/azat)). +* Track the file queue of distributed sends in memory. [#45491](https://github.com/ClickHouse/ClickHouse/pull/45491) ([Azat Khuzhin](https://github.com/azat)). +* Now `X-ClickHouse-Query-Id` and `X-ClickHouse-Timezone` headers are added to response in all queries via http protocol. Previously it was done only for `SELECT` queries. [#46364](https://github.com/ClickHouse/ClickHouse/pull/46364) ([Anton Popov](https://github.com/CurtizJ)). +* External tables from `MongoDB`: support for connection to a replica set via a URI with a host:port enum and support for the readPreference option in MongoDB dictionaries. Example URI: mongodb://db0.example.com:27017,db1.example.com:27017,db2.example.com:27017/?replicaSet=myRepl&readPreference=primary. [#46524](https://github.com/ClickHouse/ClickHouse/pull/46524) ([artem-yadr](https://github.com/artem-yadr)). +* This improvement should be invisible for users. Re-implement projection analysis on top of query plan. Added setting `query_plan_optimize_projection=1` to switch between old and new version. Fixes [#44963](https://github.com/ClickHouse/ClickHouse/issues/44963). [#46537](https://github.com/ClickHouse/ClickHouse/pull/46537) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Use parquet format v2 instead of v1 in output format by default. Add setting `output_format_parquet_version` to control parquet version, possible values `1.0`, `2.4`, `2.6`, `2.latest` (default). [#46617](https://github.com/ClickHouse/ClickHouse/pull/46617) ([Kruglov Pavel](https://github.com/Avogar)). +* It is now possible using new configuration syntax to configure Kafka topics with periods (`.`) in their name. [#46752](https://github.com/ClickHouse/ClickHouse/pull/46752) ([Robert Schulze](https://github.com/rschu1ze)). +* Fix heuristics that check hyperscan patterns for problematic repeats. [#46819](https://github.com/ClickHouse/ClickHouse/pull/46819) ([Robert Schulze](https://github.com/rschu1ze)). +* Don't report ZK node exists to system.errors when a block was created concurrently by a different replica. [#46820](https://github.com/ClickHouse/ClickHouse/pull/46820) ([Raúl Marín](https://github.com/Algunenano)). +* Increase the limit for opened files in `clickhouse-local`. It will be able to read from `web` tables on servers with a huge number of CPU cores. Do not back off reading from the URL table engine in case of too many opened files. This closes [#46852](https://github.com/ClickHouse/ClickHouse/issues/46852). [#46853](https://github.com/ClickHouse/ClickHouse/pull/46853) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Exceptions thrown when numbers cannot be parsed now have an easier-to-read exception message. [#46917](https://github.com/ClickHouse/ClickHouse/pull/46917) ([Robert Schulze](https://github.com/rschu1ze)). 
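For the Parquet output-format entry above, a small hypothetical example of pinning the format version instead of relying on the new `2.latest` default; the table name is invented:

```sql
-- Write the result as Parquet using an explicitly chosen format version.
SELECT *
FROM hypothetical_events
SETTINGS output_format_parquet_version = '2.6'
FORMAT Parquet
```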
+* Added update `system.backups` after every processed task to track the progress of backups. [#46989](https://github.com/ClickHouse/ClickHouse/pull/46989) ([Aleksandr Musorin](https://github.com/AVMusorin)). +* Allow types conversion in Native input format. Add settings `input_format_native_allow_types_conversion` that controls it (enabled by default). [#46990](https://github.com/ClickHouse/ClickHouse/pull/46990) ([Kruglov Pavel](https://github.com/Avogar)). +* Allow IPv4 in the `range` function to generate IP ranges. [#46995](https://github.com/ClickHouse/ClickHouse/pull/46995) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Improve exception message when it's impossible to make part move from one volume/disk to another. [#47032](https://github.com/ClickHouse/ClickHouse/pull/47032) ([alesapin](https://github.com/alesapin)). +* Support `Bool` type in `JSONType` function. Previously `Null` type was mistakenly returned for bool values. [#47046](https://github.com/ClickHouse/ClickHouse/pull/47046) ([Anton Popov](https://github.com/CurtizJ)). +* Use `_request_body` parameter to configure predefined HTTP queries. [#47086](https://github.com/ClickHouse/ClickHouse/pull/47086) ([Constantine Peresypkin](https://github.com/pkit)). +* Automatic indentation in the built-in UI SQL editor when Enter is pressed. [#47113](https://github.com/ClickHouse/ClickHouse/pull/47113) ([Alexey Korepanov](https://github.com/alexkorep)). +* Self-extraction with 'sudo' will attempt to set uid and gid of extracted files to running user. [#47116](https://github.com/ClickHouse/ClickHouse/pull/47116) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). +* Previously, the `repeat` function's second argument only accepted an unsigned integer type, which meant it could not accept values such as -1. This behavior differed from that of the Spark function. In this update, the repeat function has been modified to match the behavior of the Spark function. It now accepts the same types of inputs, including negative integers. Extensive testing has been performed to verify the correctness of the updated implementation. [#47134](https://github.com/ClickHouse/ClickHouse/pull/47134) ([KevinyhZou](https://github.com/KevinyhZou)). Note: the changelog entry was rewritten by ChatGPT. +* Remove `::__1` part from stacktraces. Display `std::basic_string ClickHouse release 23.2, 2023-02-23 #### Backward Incompatible Change diff --git a/CMakeLists.txt b/CMakeLists.txt index bd7de46474b..ce615c11f2b 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -568,7 +568,7 @@ if (NATIVE_BUILD_TARGETS COMMAND ${CMAKE_COMMAND} "-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}" "-DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}" - "-DENABLE_CCACHE=${ENABLE_CCACHE}" + "-DCOMPILER_CACHE=${COMPILER_CACHE}" # Avoid overriding .cargo/config.toml with native toolchain. 
"-DENABLE_RUST=OFF" "-DENABLE_CLICKHOUSE_SELF_EXTRACTING=${ENABLE_CLICKHOUSE_SELF_EXTRACTING}" diff --git a/cmake/ccache.cmake b/cmake/ccache.cmake index f0769f337d0..285c8efb9e1 100644 --- a/cmake/ccache.cmake +++ b/cmake/ccache.cmake @@ -1,5 +1,6 @@ # Setup integration with ccache to speed up builds, see https://ccache.dev/ +# Matches both ccache and sccache if (CMAKE_CXX_COMPILER_LAUNCHER MATCHES "ccache" OR CMAKE_C_COMPILER_LAUNCHER MATCHES "ccache") # custom compiler launcher already defined, most likely because cmake was invoked with like "-DCMAKE_CXX_COMPILER_LAUNCHER=ccache" or # via environment variable --> respect setting and trust that the launcher was specified correctly @@ -8,45 +9,65 @@ if (CMAKE_CXX_COMPILER_LAUNCHER MATCHES "ccache" OR CMAKE_C_COMPILER_LAUNCHER MA return() endif() -option(ENABLE_CCACHE "Speedup re-compilations using ccache (external tool)" ON) - -if (NOT ENABLE_CCACHE) - message(STATUS "Using ccache: no (disabled via configuration)") - return() +set(ENABLE_CCACHE "default" CACHE STRING "Deprecated, use COMPILER_CACHE=(auto|ccache|sccache|disabled)") +if (NOT ENABLE_CCACHE STREQUAL "default") + message(WARNING "The -DENABLE_CCACHE is deprecated in favor of -DCOMPILER_CACHE") +endif() + +set(COMPILER_CACHE "auto" CACHE STRING "Speedup re-compilations using the caching tools; valid options are 'auto' (ccache, then sccache), 'ccache', 'sccache', or 'disabled'") + +# It has pretty complex logic, because the ENABLE_CCACHE is deprecated, but still should +# control the COMPILER_CACHE +# After it will be completely removed, the following block will be much simpler +if (COMPILER_CACHE STREQUAL "ccache" OR (ENABLE_CCACHE AND NOT ENABLE_CCACHE STREQUAL "default")) + find_program (CCACHE_EXECUTABLE ccache) +elseif(COMPILER_CACHE STREQUAL "disabled" OR NOT ENABLE_CCACHE STREQUAL "default") + message(STATUS "Using *ccache: no (disabled via configuration)") + return() +elseif(COMPILER_CACHE STREQUAL "auto") + find_program (CCACHE_EXECUTABLE ccache sccache) +elseif(COMPILER_CACHE STREQUAL "sccache") + find_program (CCACHE_EXECUTABLE sccache) +else() + message(${RECONFIGURE_MESSAGE_LEVEL} "The COMPILER_CACHE must be one of (auto|ccache|sccache|disabled), given '${COMPILER_CACHE}'") endif() -find_program (CCACHE_EXECUTABLE ccache) if (NOT CCACHE_EXECUTABLE) - message(${RECONFIGURE_MESSAGE_LEVEL} "Using ccache: no (Could not find find ccache. To significantly reduce compile times for the 2nd, 3rd, etc. build, it is highly recommended to install ccache. To suppress this message, run cmake with -DENABLE_CCACHE=0)") + message(${RECONFIGURE_MESSAGE_LEVEL} "Using *ccache: no (Could not find find ccache or sccache. To significantly reduce compile times for the 2nd, 3rd, etc. build, it is highly recommended to install one of them. 
To suppress this message, run cmake with -DCOMPILER_CACHE=disabled)") return() endif() -execute_process(COMMAND ${CCACHE_EXECUTABLE} "-V" OUTPUT_VARIABLE CCACHE_VERSION) -string(REGEX REPLACE "ccache version ([0-9\\.]+).*" "\\1" CCACHE_VERSION ${CCACHE_VERSION}) +if (CCACHE_EXECUTABLE MATCHES "/ccache$") + execute_process(COMMAND ${CCACHE_EXECUTABLE} "-V" OUTPUT_VARIABLE CCACHE_VERSION) + string(REGEX REPLACE "ccache version ([0-9\\.]+).*" "\\1" CCACHE_VERSION ${CCACHE_VERSION}) -set (CCACHE_MINIMUM_VERSION 3.3) + set (CCACHE_MINIMUM_VERSION 3.3) -if (CCACHE_VERSION VERSION_LESS_EQUAL ${CCACHE_MINIMUM_VERSION}) - message(${RECONFIGURE_MESSAGE_LEVEL} "Using ccache: no (found ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION}), the minimum required version is ${CCACHE_MINIMUM_VERSION}") - return() -endif() + if (CCACHE_VERSION VERSION_LESS_EQUAL ${CCACHE_MINIMUM_VERSION}) + message(${RECONFIGURE_MESSAGE_LEVEL} "Using ccache: no (found ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION}), the minimum required version is ${CCACHE_MINIMUM_VERSION}") + return() + endif() -message(STATUS "Using ccache: ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION})") -set(LAUNCHER ${CCACHE_EXECUTABLE}) + message(STATUS "Using ccache: ${CCACHE_EXECUTABLE} (version ${CCACHE_VERSION})") + set(LAUNCHER ${CCACHE_EXECUTABLE}) -# Work around a well-intended but unfortunate behavior of ccache 4.0 & 4.1 with -# environment variable SOURCE_DATE_EPOCH. This variable provides an alternative -# to source-code embedded timestamps (__DATE__/__TIME__) and therefore helps with -# reproducible builds (*). SOURCE_DATE_EPOCH is set automatically by the -# distribution, e.g. Debian. Ccache 4.0 & 4.1 incorporate SOURCE_DATE_EPOCH into -# the hash calculation regardless they contain timestamps or not. This invalidates -# the cache whenever SOURCE_DATE_EPOCH changes. As a fix, ignore SOURCE_DATE_EPOCH. -# -# (*) https://reproducible-builds.org/specs/source-date-epoch/ -if (CCACHE_VERSION VERSION_GREATER_EQUAL "4.0" AND CCACHE_VERSION VERSION_LESS "4.2") - message(STATUS "Ignore SOURCE_DATE_EPOCH for ccache 4.0 / 4.1") - set(LAUNCHER env -u SOURCE_DATE_EPOCH ${CCACHE_EXECUTABLE}) + # Work around a well-intended but unfortunate behavior of ccache 4.0 & 4.1 with + # environment variable SOURCE_DATE_EPOCH. This variable provides an alternative + # to source-code embedded timestamps (__DATE__/__TIME__) and therefore helps with + # reproducible builds (*). SOURCE_DATE_EPOCH is set automatically by the + # distribution, e.g. Debian. Ccache 4.0 & 4.1 incorporate SOURCE_DATE_EPOCH into + # the hash calculation regardless they contain timestamps or not. This invalidates + # the cache whenever SOURCE_DATE_EPOCH changes. As a fix, ignore SOURCE_DATE_EPOCH. 
+ # + # (*) https://reproducible-builds.org/specs/source-date-epoch/ + if (CCACHE_VERSION VERSION_GREATER_EQUAL "4.0" AND CCACHE_VERSION VERSION_LESS "4.2") + message(STATUS "Ignore SOURCE_DATE_EPOCH for ccache 4.0 / 4.1") + set(LAUNCHER env -u SOURCE_DATE_EPOCH ${CCACHE_EXECUTABLE}) + endif() +elseif(CCACHE_EXECUTABLE MATCHES "/sccache$") + message(STATUS "Using sccache: ${CCACHE_EXECUTABLE}") + set(LAUNCHER ${CCACHE_EXECUTABLE}) endif() set (CMAKE_CXX_COMPILER_LAUNCHER ${LAUNCHER} ${CMAKE_CXX_COMPILER_LAUNCHER}) diff --git a/contrib/boost b/contrib/boost index 03d9ec9cd15..8fe7b3326ef 160000 --- a/contrib/boost +++ b/contrib/boost @@ -1 +1 @@ -Subproject commit 03d9ec9cd159d14bd0b17c05138098451a1ea606 +Subproject commit 8fe7b3326ef482ee6ecdf5a4f698f2b8c2780f98 diff --git a/contrib/llvm-project b/contrib/llvm-project index 4bfaeb31dd0..e0accd51793 160000 --- a/contrib/llvm-project +++ b/contrib/llvm-project @@ -1 +1 @@ -Subproject commit 4bfaeb31dd0ef13f025221f93c138974a3e0a22a +Subproject commit e0accd517933ebb44aff84bc8db448ffd8ef1929 diff --git a/docker/packager/binary/Dockerfile b/docker/packager/binary/Dockerfile index 62e6d47c183..fa860b2207f 100644 --- a/docker/packager/binary/Dockerfile +++ b/docker/packager/binary/Dockerfile @@ -69,13 +69,14 @@ RUN add-apt-repository ppa:ubuntu-toolchain-r/test --yes \ libc6 \ libc6-dev \ libc6-dev-arm64-cross \ + python3-boto3 \ yasm \ zstd \ && apt-get clean \ && rm -rf /var/lib/apt/lists # Download toolchain and SDK for Darwin -RUN wget -nv https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz +RUN curl -sL -O https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz # Architecture of the image when BuildKit/buildx is used ARG TARGETARCH @@ -97,7 +98,7 @@ ENV PATH="$PATH:/usr/local/go/bin" ENV GOPATH=/workdir/go ENV GOCACHE=/workdir/ -ARG CLANG_TIDY_SHA1=03644275e794b0587849bfc2ec6123d5ae0bdb1c +ARG CLANG_TIDY_SHA1=c191254ea00d47ade11d7170ef82fe038c213774 RUN curl -Lo /usr/bin/clang-tidy-cache \ "https://raw.githubusercontent.com/matus-chochlik/ctcache/$CLANG_TIDY_SHA1/clang-tidy-cache" \ && chmod +x /usr/bin/clang-tidy-cache diff --git a/docker/packager/binary/build.sh b/docker/packager/binary/build.sh index 24dca72e946..2cd0a011013 100755 --- a/docker/packager/binary/build.sh +++ b/docker/packager/binary/build.sh @@ -6,6 +6,7 @@ exec &> >(ts) ccache_status () { ccache --show-config ||: ccache --show-stats ||: + SCCACHE_NO_DAEMON=1 sccache --show-stats ||: } [ -O /build ] || git config --global --add safe.directory /build diff --git a/docker/packager/packager b/docker/packager/packager index 58dd299fd6d..7d022df52e6 100755 --- a/docker/packager/packager +++ b/docker/packager/packager @@ -5,13 +5,19 @@ import os import argparse import logging import sys -from typing import List +from pathlib import Path +from typing import List, Optional -SCRIPT_PATH = os.path.realpath(__file__) +SCRIPT_PATH = Path(__file__).absolute() IMAGE_TYPE = "binary" +IMAGE_NAME = f"clickhouse/{IMAGE_TYPE}-builder" -def check_image_exists_locally(image_name): +class BuildException(Exception): + pass + + +def check_image_exists_locally(image_name: str) -> bool: try: output = subprocess.check_output( f"docker images -q {image_name} 2> /dev/null", shell=True @@ -21,17 +27,17 @@ def check_image_exists_locally(image_name): return False -def pull_image(image_name): +def pull_image(image_name: str) -> bool: try: subprocess.check_call(f"docker pull {image_name}", shell=True) return True except 
subprocess.CalledProcessError: - logging.info(f"Cannot pull image {image_name}".format()) + logging.info("Cannot pull image %s", image_name) return False -def build_image(image_name, filepath): - context = os.path.dirname(filepath) +def build_image(image_name: str, filepath: Path) -> None: + context = filepath.parent build_cmd = f"docker build --network=host -t {image_name} -f {filepath} {context}" logging.info("Will build image with cmd: '%s'", build_cmd) subprocess.check_call( @@ -40,7 +46,7 @@ def build_image(image_name, filepath): ) -def pre_build(repo_path: str, env_variables: List[str]): +def pre_build(repo_path: Path, env_variables: List[str]): if "WITH_PERFORMANCE=1" in env_variables: current_branch = subprocess.check_output( "git branch --show-current", shell=True, encoding="utf-8" @@ -56,7 +62,9 @@ def pre_build(repo_path: str, env_variables: List[str]): # conclusion is: in the current state the easiest way to go is to force # unshallow repository for performance artifacts. # To change it we need to rework our performance tests docker image - raise Exception("shallow repository is not suitable for performance builds") + raise BuildException( + "shallow repository is not suitable for performance builds" + ) if current_branch != "master": cmd = ( f"git -C {repo_path} fetch --no-recurse-submodules " @@ -67,14 +75,14 @@ def pre_build(repo_path: str, env_variables: List[str]): def run_docker_image_with_env( - image_name, - as_root, - output, - env_variables, - ch_root, - ccache_dir, - docker_image_version, + image_name: str, + as_root: bool, + output_dir: Path, + env_variables: List[str], + ch_root: Path, + ccache_dir: Optional[Path], ): + output_dir.mkdir(parents=True, exist_ok=True) env_part = " -e ".join(env_variables) if env_part: env_part = " -e " + env_part @@ -89,10 +97,14 @@ def run_docker_image_with_env( else: user = f"{os.geteuid()}:{os.getegid()}" + ccache_mount = f"--volume={ccache_dir}:/ccache" + if ccache_dir is None: + ccache_mount = "" + cmd = ( - f"docker run --network=host --user={user} --rm --volume={output}:/output " - f"--volume={ch_root}:/build --volume={ccache_dir}:/ccache {env_part} " - f"{interactive} {image_name}:{docker_image_version}" + f"docker run --network=host --user={user} --rm {ccache_mount}" + f"--volume={output_dir}:/output --volume={ch_root}:/build {env_part} " + f"{interactive} {image_name}" ) logging.info("Will build ClickHouse pkg with cmd: '%s'", cmd) @@ -100,24 +112,25 @@ def run_docker_image_with_env( subprocess.check_call(cmd, shell=True) -def is_release_build(build_type, package_type, sanitizer): +def is_release_build(build_type: str, package_type: str, sanitizer: str) -> bool: return build_type == "" and package_type == "deb" and sanitizer == "" def parse_env_variables( - build_type, - compiler, - sanitizer, - package_type, - cache, - distcc_hosts, - clang_tidy, - version, - author, - official, - additional_pkgs, - with_coverage, - with_binaries, + build_type: str, + compiler: str, + sanitizer: str, + package_type: str, + cache: str, + s3_bucket: str, + s3_directory: str, + s3_rw_access: bool, + clang_tidy: bool, + version: str, + official: bool, + additional_pkgs: bool, + with_coverage: bool, + with_binaries: str, ): DARWIN_SUFFIX = "-darwin" DARWIN_ARM_SUFFIX = "-darwin-aarch64" @@ -243,32 +256,43 @@ def parse_env_variables( else: result.append("BUILD_TYPE=None") - if cache == "distcc": - result.append(f"CCACHE_PREFIX={cache}") + if not cache: + cmake_flags.append("-DCOMPILER_CACHE=disabled") - if cache: + if cache == "ccache": + 
cmake_flags.append("-DCOMPILER_CACHE=ccache") result.append("CCACHE_DIR=/ccache") result.append("CCACHE_COMPRESSLEVEL=5") result.append("CCACHE_BASEDIR=/build") result.append("CCACHE_NOHASHDIR=true") result.append("CCACHE_COMPILERCHECK=content") - cache_maxsize = "15G" - if clang_tidy: - # 15G is not enough for tidy build - cache_maxsize = "25G" + result.append("CCACHE_MAXSIZE=15G") - # `CTCACHE_DIR` has the same purpose as the `CCACHE_DIR` above. - # It's there to have the clang-tidy cache embedded into our standard `CCACHE_DIR` + if cache == "sccache": + cmake_flags.append("-DCOMPILER_CACHE=sccache") + # see https://github.com/mozilla/sccache/blob/main/docs/S3.md + result.append(f"SCCACHE_BUCKET={s3_bucket}") + sccache_dir = "sccache" + if s3_directory: + sccache_dir = f"{s3_directory}/{sccache_dir}" + result.append(f"SCCACHE_S3_KEY_PREFIX={sccache_dir}") + if not s3_rw_access: + result.append("SCCACHE_S3_NO_CREDENTIALS=true") + + if clang_tidy: + # `CTCACHE_DIR` has the same purpose as the `CCACHE_DIR` above. + # It's there to have the clang-tidy cache embedded into our standard `CCACHE_DIR` + if cache == "ccache": result.append("CTCACHE_DIR=/ccache/clang-tidy-cache") - result.append(f"CCACHE_MAXSIZE={cache_maxsize}") - - if distcc_hosts: - hosts_with_params = [f"{host}/24,lzo" for host in distcc_hosts] + [ - "localhost/`nproc`" - ] - result.append('DISTCC_HOSTS="' + " ".join(hosts_with_params) + '"') - elif cache == "distcc": - result.append('DISTCC_HOSTS="localhost/`nproc`"') + if s3_bucket: + # see https://github.com/matus-chochlik/ctcache#environment-variables + ctcache_dir = "clang-tidy-cache" + if s3_directory: + ctcache_dir = f"{s3_directory}/{ctcache_dir}" + result.append(f"CTCACHE_S3_BUCKET={s3_bucket}") + result.append(f"CTCACHE_S3_FOLDER={ctcache_dir}") + if not s3_rw_access: + result.append("CTCACHE_S3_NO_CREDENTIALS=true") if additional_pkgs: # NOTE: This are the env for packages/build script @@ -300,9 +324,6 @@ def parse_env_variables( if version: result.append(f"VERSION_STRING='{version}'") - if author: - result.append(f"AUTHOR='{author}'") - if official: cmake_flags.append("-DCLICKHOUSE_OFFICIAL_BUILD=1") @@ -312,14 +333,14 @@ def parse_env_variables( return result -def dir_name(name: str) -> str: - if not os.path.isabs(name): - name = os.path.abspath(os.path.join(os.getcwd(), name)) - return name +def dir_name(name: str) -> Path: + path = Path(name) + if not path.is_absolute(): + path = Path.cwd() / name + return path -if __name__ == "__main__": - logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s") +def parse_args() -> argparse.Namespace: parser = argparse.ArgumentParser( formatter_class=argparse.ArgumentDefaultsHelpFormatter, description="ClickHouse building script using prebuilt Docker image", @@ -331,7 +352,7 @@ if __name__ == "__main__": ) parser.add_argument( "--clickhouse-repo-path", - default=os.path.join(os.path.dirname(SCRIPT_PATH), os.pardir, os.pardir), + default=SCRIPT_PATH.parents[2], type=dir_name, help="ClickHouse git repository", ) @@ -361,17 +382,34 @@ if __name__ == "__main__": ) parser.add_argument("--clang-tidy", action="store_true") - parser.add_argument("--cache", choices=("ccache", "distcc", ""), default="") parser.add_argument( - "--ccache_dir", - default=os.getenv("HOME", "") + "/.ccache", + "--cache", + choices=("ccache", "sccache", ""), + default="", + help="ccache or sccache for objects caching; sccache uses only S3 buckets", + ) + parser.add_argument( + "--ccache-dir", + default=Path.home() / ".ccache", type=dir_name, 
help="a directory with ccache", ) - parser.add_argument("--distcc-hosts", nargs="+") + parser.add_argument( + "--s3-bucket", + help="an S3 bucket used for sscache and clang-tidy-cache", + ) + parser.add_argument( + "--s3-directory", + default="ccache", + help="an S3 directory prefix used for sscache and clang-tidy-cache", + ) + parser.add_argument( + "--s3-rw-access", + action="store_true", + help="if set, the build fails on errors writing cache to S3", + ) parser.add_argument("--force-build-image", action="store_true") parser.add_argument("--version") - parser.add_argument("--author", default="clickhouse", help="a package author") parser.add_argument("--official", action="store_true") parser.add_argument("--additional-pkgs", action="store_true") parser.add_argument("--with-coverage", action="store_true") @@ -387,34 +425,54 @@ if __name__ == "__main__": args = parser.parse_args() - image_name = f"clickhouse/{IMAGE_TYPE}-builder" + if args.additional_pkgs and args.package_type != "deb": + raise argparse.ArgumentTypeError( + "Can build additional packages only in deb build" + ) + + if args.cache != "ccache": + args.ccache_dir = None + + if args.with_binaries != "": + if args.package_type != "deb": + raise argparse.ArgumentTypeError( + "Can add additional binaries only in deb build" + ) + logging.info("Should place %s to output", args.with_binaries) + + if args.cache == "sccache": + if not args.s3_bucket: + raise argparse.ArgumentTypeError("sccache must have --s3-bucket set") + + return args + + +def main(): + logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s") + args = parse_args() ch_root = args.clickhouse_repo_path - if args.additional_pkgs and args.package_type != "deb": - raise Exception("Can build additional packages only in deb build") + dockerfile = ch_root / "docker/packager" / IMAGE_TYPE / "Dockerfile" + image_with_version = IMAGE_NAME + ":" + args.docker_image_version + if args.force_build_image: + build_image(image_with_version, dockerfile) + elif not ( + check_image_exists_locally(image_with_version) or pull_image(image_with_version) + ): + build_image(image_with_version, dockerfile) - if args.with_binaries != "" and args.package_type != "deb": - raise Exception("Can add additional binaries only in deb build") - - if args.with_binaries != "" and args.package_type == "deb": - logging.info("Should place %s to output", args.with_binaries) - - dockerfile = os.path.join(ch_root, "docker/packager", IMAGE_TYPE, "Dockerfile") - image_with_version = image_name + ":" + args.docker_image_version - if not check_image_exists_locally(image_name) or args.force_build_image: - if not pull_image(image_with_version) or args.force_build_image: - build_image(image_with_version, dockerfile) env_prepared = parse_env_variables( args.build_type, args.compiler, args.sanitizer, args.package_type, args.cache, - args.distcc_hosts, + args.s3_bucket, + args.s3_directory, + args.s3_rw_access, args.clang_tidy, args.version, - args.author, args.official, args.additional_pkgs, args.with_coverage, @@ -423,12 +481,15 @@ if __name__ == "__main__": pre_build(args.clickhouse_repo_path, env_prepared) run_docker_image_with_env( - image_name, + image_with_version, args.as_root, args.output_dir, env_prepared, ch_root, args.ccache_dir, - args.docker_image_version, ) logging.info("Output placed into %s", args.output_dir) + + +if __name__ == "__main__": + main() diff --git a/docker/test/fasttest/Dockerfile b/docker/test/fasttest/Dockerfile index 32546b71eb8..ffb13fc774d 100644 --- 
a/docker/test/fasttest/Dockerfile +++ b/docker/test/fasttest/Dockerfile @@ -20,12 +20,6 @@ RUN apt-get update \ zstd \ --yes --no-install-recommends -# Install CMake 3.20+ for Rust compilation -RUN apt purge cmake --yes -RUN wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null -RUN apt-add-repository 'deb https://apt.kitware.com/ubuntu/ focal main' -RUN apt update && apt install cmake --yes - RUN pip3 install numpy scipy pandas Jinja2 ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.4.20200302/clickhouse-odbc-1.1.4-Linux.tar.gz" diff --git a/docker/test/fasttest/run.sh b/docker/test/fasttest/run.sh index 086276bed55..e0e30a63bb4 100755 --- a/docker/test/fasttest/run.sh +++ b/docker/test/fasttest/run.sh @@ -16,7 +16,8 @@ export LLVM_VERSION=${LLVM_VERSION:-13} # it being undefined. Also read it as array so that we can pass an empty list # of additional variable to cmake properly, and it doesn't generate an extra # empty parameter. -read -ra FASTTEST_CMAKE_FLAGS <<< "${FASTTEST_CMAKE_FLAGS:-}" +# Read it as CMAKE_FLAGS to not lose exported FASTTEST_CMAKE_FLAGS on subsequential launch +read -ra CMAKE_FLAGS <<< "${FASTTEST_CMAKE_FLAGS:-}" # Run only matching tests. FASTTEST_FOCUS=${FASTTEST_FOCUS:-""} @@ -37,6 +38,13 @@ export FASTTEST_DATA export FASTTEST_OUT export PATH +function ccache_status +{ + ccache --show-config ||: + ccache --show-stats ||: + SCCACHE_NO_DAEMON=1 sccache --show-stats ||: +} + function start_server { set -m # Spawn server in its own process groups @@ -171,14 +179,14 @@ function run_cmake export CCACHE_COMPILERCHECK=content export CCACHE_MAXSIZE=15G - ccache --show-stats ||: + ccache_status ccache --zero-stats ||: mkdir "$FASTTEST_BUILD" ||: ( cd "$FASTTEST_BUILD" - cmake "$FASTTEST_SOURCE" -DCMAKE_CXX_COMPILER="clang++-${LLVM_VERSION}" -DCMAKE_C_COMPILER="clang-${LLVM_VERSION}" "${CMAKE_LIBS_CONFIG[@]}" "${FASTTEST_CMAKE_FLAGS[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/cmake_log.txt" + cmake "$FASTTEST_SOURCE" -DCMAKE_CXX_COMPILER="clang++-${LLVM_VERSION}" -DCMAKE_C_COMPILER="clang-${LLVM_VERSION}" "${CMAKE_LIBS_CONFIG[@]}" "${CMAKE_FLAGS[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/cmake_log.txt" ) } @@ -193,7 +201,7 @@ function build strip programs/clickhouse -o "$FASTTEST_OUTPUT/clickhouse-stripped" zstd --threads=0 "$FASTTEST_OUTPUT/clickhouse-stripped" fi - ccache --show-stats ||: + ccache_status ccache --evict-older-than 1d ||: ) } diff --git a/docker/test/util/Dockerfile b/docker/test/util/Dockerfile index 0ee426f4e4d..911cadc3c58 100644 --- a/docker/test/util/Dockerfile +++ b/docker/test/util/Dockerfile @@ -92,4 +92,17 @@ RUN mkdir /tmp/ccache \ && cd / \ && rm -rf /tmp/ccache +ARG TARGETARCH +ARG SCCACHE_VERSION=v0.4.1 +RUN arch=${TARGETARCH:-amd64} \ + && case $arch in \ + amd64) rarch=x86_64 ;; \ + arm64) rarch=aarch64 ;; \ + esac \ + && curl -Ls "https://github.com/mozilla/sccache/releases/download/$SCCACHE_VERSION/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl.tar.gz" | \ + tar xz -C /tmp \ + && mv "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl/sccache" /usr/bin \ + && rm "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl" -r + + COPY process_functional_tests_result.py / diff --git a/docs/en/development/build.md b/docs/en/development/build.md index 804aa8a3dc5..00a8a54f80a 100644 --- a/docs/en/development/build.md +++ b/docs/en/development/build.md @@ -85,7 +85,7 @@ The 
build requires the following components: - Git (is used only to checkout the sources, it’s not needed for the build) - CMake 3.15 or newer - Ninja -- C++ compiler: clang-14 or newer +- C++ compiler: clang-15 or newer - Linker: lld - Yasm - Gawk diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index 9f5b8dbda58..4d638964c6f 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -2769,7 +2769,7 @@ Default value: `120` seconds. ## cast_keep_nullable {#cast_keep_nullable} -Enables or disables keeping of the `Nullable` data type in [CAST](../../sql-reference/functions/type-conversion-functions.md/#type_conversion_function-cast) operations. +Enables or disables keeping of the `Nullable` data type in [CAST](../../sql-reference/functions/type-conversion-functions.md/#castx-t) operations. When the setting is enabled and the argument of `CAST` function is `Nullable`, the result is also transformed to `Nullable` type. When the setting is disabled, the result always has the destination type exactly. diff --git a/docs/en/sql-reference/statements/select/join.md b/docs/en/sql-reference/statements/select/join.md index 49bd2672874..ece60961aaf 100644 --- a/docs/en/sql-reference/statements/select/join.md +++ b/docs/en/sql-reference/statements/select/join.md @@ -47,6 +47,7 @@ The default join type can be overridden using [join_default_strictness](../../.. The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting. + **See also** - [join_algorithm](../../../operations/settings/settings.md#settings-join_algorithm) @@ -57,6 +58,8 @@ The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_ - [join_on_disk_max_files_to_merge](../../../operations/settings/settings.md#join_on_disk_max_files_to_merge) - [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) +Use the `cross_to_inner_join_rewrite` setting to define the behavior when ClickHouse fails to rewrite a `CROSS JOIN` as an `INNER JOIN`. The default value is `1`, which allows the join to continue but it will be slower. Set `cross_to_inner_join_rewrite` to `0` if you want an error to be thrown, and set it to `2` to not run the cross joins but instead force a rewrite of all comma/cross joins. If the rewriting fails when the value is `2`, you will receive an error message stating "Please, try to simplify `WHERE` section". + ## ON Section Conditions An `ON` section can contain several conditions combined using the `AND` and `OR` operators. Conditions specifying join keys must refer both left and right tables and must use the equality operator. Other conditions may use other logical operators but they must refer either the left or the right table of a query. diff --git a/programs/benchmark/Benchmark.cpp b/programs/benchmark/Benchmark.cpp index 994f9b7ac4d..466a0c194f7 100644 --- a/programs/benchmark/Benchmark.cpp +++ b/programs/benchmark/Benchmark.cpp @@ -34,6 +34,7 @@ #include #include #include +#include #include @@ -43,6 +44,12 @@ namespace fs = std::filesystem; * The tool emulates a case with fixed amount of simultaneously executing queries. 
*/ +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + namespace DB { @@ -103,7 +110,7 @@ public: settings(settings_), shared_context(Context::createShared()), global_context(Context::createGlobal(shared_context.get())), - pool(concurrency) + pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, concurrency) { const auto secure = secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable; size_t connections_cnt = std::max(ports_.size(), hosts_.size()); diff --git a/programs/copier/ClusterCopier.cpp b/programs/copier/ClusterCopier.cpp index 3a6261974d1..079c70596a6 100644 --- a/programs/copier/ClusterCopier.cpp +++ b/programs/copier/ClusterCopier.cpp @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -19,6 +20,12 @@ #include #include +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + namespace DB { @@ -192,7 +199,7 @@ void ClusterCopier::discoverTablePartitions(const ConnectionTimeouts & timeouts, { /// Fetch partitions list from a shard { - ThreadPool thread_pool(num_threads ? num_threads : 2 * getNumberOfPhysicalCPUCores()); + ThreadPool thread_pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads ? num_threads : 2 * getNumberOfPhysicalCPUCores()); for (const TaskShardPtr & task_shard : task_table.all_shards) thread_pool.scheduleOrThrowOnError([this, timeouts, task_shard]() diff --git a/src/Analyzer/Passes/LogicalExpressionOptimizerPass.cpp b/src/Analyzer/Passes/LogicalExpressionOptimizerPass.cpp index 3d65035f9fd..97669f3924f 100644 --- a/src/Analyzer/Passes/LogicalExpressionOptimizerPass.cpp +++ b/src/Analyzer/Passes/LogicalExpressionOptimizerPass.cpp @@ -153,9 +153,11 @@ private: } }; - if (const auto * lhs_literal = lhs->as()) + if (const auto * lhs_literal = lhs->as(); + lhs_literal && !lhs_literal->getValue().isNull()) add_equals_function_if_not_present(rhs, lhs_literal); - else if (const auto * rhs_literal = rhs->as()) + else if (const auto * rhs_literal = rhs->as(); + rhs_literal && !rhs_literal->getValue().isNull()) add_equals_function_if_not_present(lhs, rhs_literal); else or_operands.push_back(argument); diff --git a/src/Backups/BackupsWorker.cpp b/src/Backups/BackupsWorker.cpp index b82629eadf7..4572e481829 100644 --- a/src/Backups/BackupsWorker.cpp +++ b/src/Backups/BackupsWorker.cpp @@ -20,10 +20,19 @@ #include #include #include +#include #include #include +namespace CurrentMetrics +{ + extern const Metric BackupsThreads; + extern const Metric BackupsThreadsActive; + extern const Metric RestoreThreads; + extern const Metric RestoreThreadsActive; +} + namespace DB { @@ -153,8 +162,8 @@ namespace BackupsWorker::BackupsWorker(size_t num_backup_threads, size_t num_restore_threads, bool allow_concurrent_backups_, bool allow_concurrent_restores_) - : backups_thread_pool(num_backup_threads, /* max_free_threads = */ 0, num_backup_threads) - , restores_thread_pool(num_restore_threads, /* max_free_threads = */ 0, num_restore_threads) + : backups_thread_pool(CurrentMetrics::BackupsThreads, CurrentMetrics::BackupsThreadsActive, num_backup_threads, /* max_free_threads = */ 0, num_backup_threads) + , restores_thread_pool(CurrentMetrics::RestoreThreads, CurrentMetrics::RestoreThreadsActive, num_restore_threads, /* max_free_threads = */ 0, num_restore_threads) , log(&Poco::Logger::get("BackupsWorker")) , allow_concurrent_backups(allow_concurrent_backups_) , 
allow_concurrent_restores(allow_concurrent_restores_) diff --git a/src/Client/ReplxxLineReader.cpp b/src/Client/ReplxxLineReader.cpp index fa5ba8b8ba9..1979b37a94b 100644 --- a/src/Client/ReplxxLineReader.cpp +++ b/src/Client/ReplxxLineReader.cpp @@ -5,6 +5,7 @@ #include #include +#include #include #include #include diff --git a/src/Common/CurrentMetrics.cpp b/src/Common/CurrentMetrics.cpp index 5b20d98aa01..4c773048597 100644 --- a/src/Common/CurrentMetrics.cpp +++ b/src/Common/CurrentMetrics.cpp @@ -72,6 +72,64 @@ M(GlobalThreadActive, "Number of threads in global thread pool running a task.") \ M(LocalThread, "Number of threads in local thread pools. The threads in local thread pools are taken from the global thread pool.") \ M(LocalThreadActive, "Number of threads in local thread pools running a task.") \ + M(MergeTreeDataSelectExecutorThreads, "Number of threads in the MergeTreeDataSelectExecutor thread pool.") \ + M(MergeTreeDataSelectExecutorThreadsActive, "Number of threads in the MergeTreeDataSelectExecutor thread pool running a task.") \ + M(BackupsThreads, "Number of threads in the thread pool for BACKUP.") \ + M(BackupsThreadsActive, "Number of threads in thread pool for BACKUP running a task.") \ + M(RestoreThreads, "Number of threads in the thread pool for RESTORE.") \ + M(RestoreThreadsActive, "Number of threads in the thread pool for RESTORE running a task.") \ + M(IOThreads, "Number of threads in the IO thread pool.") \ + M(IOThreadsActive, "Number of threads in the IO thread pool running a task.") \ + M(ThreadPoolRemoteFSReaderThreads, "Number of threads in the thread pool for remote_filesystem_read_method=threadpool.") \ + M(ThreadPoolRemoteFSReaderThreadsActive, "Number of threads in the thread pool for remote_filesystem_read_method=threadpool running a task.") \ + M(ThreadPoolFSReaderThreads, "Number of threads in the thread pool for local_filesystem_read_method=threadpool.") \ + M(ThreadPoolFSReaderThreadsActive, "Number of threads in the thread pool for local_filesystem_read_method=threadpool running a task.") \ + M(BackupsIOThreads, "Number of threads in the BackupsIO thread pool.") \ + M(BackupsIOThreadsActive, "Number of threads in the BackupsIO thread pool running a task.") \ + M(DiskObjectStorageAsyncThreads, "Number of threads in the async thread pool for DiskObjectStorage.") \ + M(DiskObjectStorageAsyncThreadsActive, "Number of threads in the async thread pool for DiskObjectStorage running a task.") \ + M(StorageHiveThreads, "Number of threads in the StorageHive thread pool.") \ + M(StorageHiveThreadsActive, "Number of threads in the StorageHive thread pool running a task.") \ + M(TablesLoaderThreads, "Number of threads in the tables loader thread pool.") \ + M(TablesLoaderThreadsActive, "Number of threads in the tables loader thread pool running a task.") \ + M(DatabaseOrdinaryThreads, "Number of threads in the Ordinary database thread pool.") \ + M(DatabaseOrdinaryThreadsActive, "Number of threads in the Ordinary database thread pool running a task.") \ + M(DatabaseOnDiskThreads, "Number of threads in the DatabaseOnDisk thread pool.") \ + M(DatabaseOnDiskThreadsActive, "Number of threads in the DatabaseOnDisk thread pool running a task.") \ + M(DatabaseCatalogThreads, "Number of threads in the DatabaseCatalog thread pool.") \ + M(DatabaseCatalogThreadsActive, "Number of threads in the DatabaseCatalog thread pool running a task.") \ + M(DestroyAggregatesThreads, "Number of threads in the thread pool for destroy aggregate states.") \ + 
M(DestroyAggregatesThreadsActive, "Number of threads in the thread pool for destroy aggregate states running a task.") \ + M(HashedDictionaryThreads, "Number of threads in the HashedDictionary thread pool.") \ + M(HashedDictionaryThreadsActive, "Number of threads in the HashedDictionary thread pool running a task.") \ + M(CacheDictionaryThreads, "Number of threads in the CacheDictionary thread pool.") \ + M(CacheDictionaryThreadsActive, "Number of threads in the CacheDictionary thread pool running a task.") \ + M(ParallelFormattingOutputFormatThreads, "Number of threads in the ParallelFormattingOutputFormatThreads thread pool.") \ + M(ParallelFormattingOutputFormatThreadsActive, "Number of threads in the ParallelFormattingOutputFormatThreads thread pool running a task.") \ + M(ParallelParsingInputFormatThreads, "Number of threads in the ParallelParsingInputFormat thread pool.") \ + M(ParallelParsingInputFormatThreadsActive, "Number of threads in the ParallelParsingInputFormat thread pool running a task.") \ + M(MergeTreeBackgroundExecutorThreads, "Number of threads in the MergeTreeBackgroundExecutor thread pool.") \ + M(MergeTreeBackgroundExecutorThreadsActive, "Number of threads in the MergeTreeBackgroundExecutor thread pool running a task.") \ + M(AsynchronousInsertThreads, "Number of threads in the AsynchronousInsert thread pool.") \ + M(AsynchronousInsertThreadsActive, "Number of threads in the AsynchronousInsert thread pool running a task.") \ + M(StartupSystemTablesThreads, "Number of threads in the StartupSystemTables thread pool.") \ + M(StartupSystemTablesThreadsActive, "Number of threads in the StartupSystemTables thread pool running a task.") \ + M(AggregatorThreads, "Number of threads in the Aggregator thread pool.") \ + M(AggregatorThreadsActive, "Number of threads in the Aggregator thread pool running a task.") \ + M(DDLWorkerThreads, "Number of threads in the DDLWorker thread pool for ON CLUSTER queries.") \ + M(DDLWorkerThreadsActive, "Number of threads in the DDLWORKER thread pool for ON CLUSTER queries running a task.") \ + M(StorageDistributedThreads, "Number of threads in the StorageDistributed thread pool.") \ + M(StorageDistributedThreadsActive, "Number of threads in the StorageDistributed thread pool running a task.") \ + M(StorageS3Threads, "Number of threads in the StorageS3 thread pool.") \ + M(StorageS3ThreadsActive, "Number of threads in the StorageS3 thread pool running a task.") \ + M(MergeTreePartsLoaderThreads, "Number of threads in the MergeTree parts loader thread pool.") \ + M(MergeTreePartsLoaderThreadsActive, "Number of threads in the MergeTree parts loader thread pool running a task.") \ + M(MergeTreePartsCleanerThreads, "Number of threads in the MergeTree parts cleaner thread pool.") \ + M(MergeTreePartsCleanerThreadsActive, "Number of threads in the MergeTree parts cleaner thread pool running a task.") \ + M(SystemReplicasThreads, "Number of threads in the system.replicas thread pool.") \ + M(SystemReplicasThreadsActive, "Number of threads in the system.replicas thread pool running a task.") \ + M(RestartReplicaThreads, "Number of threads in the RESTART REPLICA thread pool.") \ + M(RestartReplicaThreadsActive, "Number of threads in the RESTART REPLICA thread pool running a task.") \ M(DistributedFilesToInsert, "Number of pending files to process for asynchronous insertion into Distributed tables. 
Number of files for every shard is summed.") \ M(BrokenDistributedFilesToInsert, "Number of files for asynchronous insertion into Distributed tables that has been marked as broken. This metric will starts from 0 on start. Number of files for every shard is summed.") \ M(TablesToDropQueueSize, "Number of dropped tables, that are waiting for background data removal.") \ diff --git a/src/Common/CurrentMetrics.h b/src/Common/CurrentMetrics.h index c184ee1e7f2..0ae16e2d08d 100644 --- a/src/Common/CurrentMetrics.h +++ b/src/Common/CurrentMetrics.h @@ -4,6 +4,7 @@ #include #include #include +#include #include /** Allows to count number of simultaneously happening processes or current value of some metric. @@ -73,7 +74,10 @@ namespace CurrentMetrics public: explicit Increment(Metric metric, Value amount_ = 1) - : Increment(&values[metric], amount_) {} + : Increment(&values[metric], amount_) + { + assert(metric < CurrentMetrics::end()); + } ~Increment() { diff --git a/src/Common/ErrorCodes.cpp b/src/Common/ErrorCodes.cpp index 549112f8c84..9abf3bba8ff 100644 --- a/src/Common/ErrorCodes.cpp +++ b/src/Common/ErrorCodes.cpp @@ -649,6 +649,7 @@ M(678, IO_URING_INIT_FAILED) \ M(679, IO_URING_SUBMIT_ERROR) \ M(690, MIXED_ACCESS_PARAMETER_TYPES) \ + M(691, UNKNOWN_ELEMENT_OF_ENUM) \ \ M(999, KEEPER_EXCEPTION) \ M(1000, POCO_EXCEPTION) \ diff --git a/src/Common/ThreadPool.cpp b/src/Common/ThreadPool.cpp index b742e2df06e..bcf4fce27f8 100644 --- a/src/Common/ThreadPool.cpp +++ b/src/Common/ThreadPool.cpp @@ -26,28 +26,37 @@ namespace CurrentMetrics { extern const Metric GlobalThread; extern const Metric GlobalThreadActive; - extern const Metric LocalThread; - extern const Metric LocalThreadActive; } static constexpr auto DEFAULT_THREAD_NAME = "ThreadPool"; template -ThreadPoolImpl::ThreadPoolImpl() - : ThreadPoolImpl(getNumberOfPhysicalCPUCores()) +ThreadPoolImpl::ThreadPoolImpl(Metric metric_threads_, Metric metric_active_threads_) + : ThreadPoolImpl(metric_threads_, metric_active_threads_, getNumberOfPhysicalCPUCores()) { } template -ThreadPoolImpl::ThreadPoolImpl(size_t max_threads_) - : ThreadPoolImpl(max_threads_, max_threads_, max_threads_) +ThreadPoolImpl::ThreadPoolImpl( + Metric metric_threads_, + Metric metric_active_threads_, + size_t max_threads_) + : ThreadPoolImpl(metric_threads_, metric_active_threads_, max_threads_, max_threads_, max_threads_) { } template -ThreadPoolImpl::ThreadPoolImpl(size_t max_threads_, size_t max_free_threads_, size_t queue_size_, bool shutdown_on_exception_) - : max_threads(max_threads_) +ThreadPoolImpl::ThreadPoolImpl( + Metric metric_threads_, + Metric metric_active_threads_, + size_t max_threads_, + size_t max_free_threads_, + size_t queue_size_, + bool shutdown_on_exception_) + : metric_threads(metric_threads_) + , metric_active_threads(metric_active_threads_) + , max_threads(max_threads_) , max_free_threads(std::min(max_free_threads_, max_threads)) , queue_size(queue_size_ ? std::max(queue_size_, max_threads) : 0 /* zero means the queue is unlimited */) , shutdown_on_exception(shutdown_on_exception_) @@ -324,8 +333,7 @@ template void ThreadPoolImpl::worker(typename std::list::iterator thread_it) { DENY_ALLOCATIONS_IN_SCOPE; - CurrentMetrics::Increment metric_all_threads( - std::is_same_v ? CurrentMetrics::GlobalThread : CurrentMetrics::LocalThread); + CurrentMetrics::Increment metric_pool_threads(metric_threads); /// Remove this thread from `threads` and detach it, that must be done before exiting from this worker. 
/// We can't wrap the following lambda function into `SCOPE_EXIT` because it requires `mutex` to be locked. @@ -383,8 +391,7 @@ void ThreadPoolImpl::worker(typename std::list::iterator thread_ try { - CurrentMetrics::Increment metric_active_threads( - std::is_same_v ? CurrentMetrics::GlobalThreadActive : CurrentMetrics::LocalThreadActive); + CurrentMetrics::Increment metric_active_pool_threads(metric_active_threads); job(); @@ -458,6 +465,22 @@ template class ThreadFromGlobalPoolImpl; std::unique_ptr GlobalThreadPool::the_instance; + +GlobalThreadPool::GlobalThreadPool( + size_t max_threads_, + size_t max_free_threads_, + size_t queue_size_, + const bool shutdown_on_exception_) + : FreeThreadPool( + CurrentMetrics::GlobalThread, + CurrentMetrics::GlobalThreadActive, + max_threads_, + max_free_threads_, + queue_size_, + shutdown_on_exception_) +{ +} + void GlobalThreadPool::initialize(size_t max_threads, size_t max_free_threads, size_t queue_size) { if (the_instance) diff --git a/src/Common/ThreadPool.h b/src/Common/ThreadPool.h index a1ca79a1e4b..b2f77f9693c 100644 --- a/src/Common/ThreadPool.h +++ b/src/Common/ThreadPool.h @@ -16,6 +16,7 @@ #include #include #include +#include #include /** Very simple thread pool similar to boost::threadpool. @@ -33,15 +34,25 @@ class ThreadPoolImpl { public: using Job = std::function; + using Metric = CurrentMetrics::Metric; /// Maximum number of threads is based on the number of physical cores. - ThreadPoolImpl(); + ThreadPoolImpl(Metric metric_threads_, Metric metric_active_threads_); /// Size is constant. Up to num_threads are created on demand and then run until shutdown. - explicit ThreadPoolImpl(size_t max_threads_); + explicit ThreadPoolImpl( + Metric metric_threads_, + Metric metric_active_threads_, + size_t max_threads_); /// queue_size - maximum number of running plus scheduled jobs. It can be greater than max_threads. Zero means unlimited. - ThreadPoolImpl(size_t max_threads_, size_t max_free_threads_, size_t queue_size_, bool shutdown_on_exception_ = true); + ThreadPoolImpl( + Metric metric_threads_, + Metric metric_active_threads_, + size_t max_threads_, + size_t max_free_threads_, + size_t queue_size_, + bool shutdown_on_exception_ = true); /// Add new job. Locks until number of scheduled jobs is less than maximum or exception in one of threads was thrown. 
/// If any thread was throw an exception, first exception will be rethrown from this method, @@ -96,6 +107,9 @@ private: std::condition_variable job_finished; std::condition_variable new_job_or_shutdown; + Metric metric_threads; + Metric metric_active_threads; + size_t max_threads; size_t max_free_threads; size_t queue_size; @@ -159,12 +173,11 @@ class GlobalThreadPool : public FreeThreadPool, private boost::noncopyable { static std::unique_ptr the_instance; - GlobalThreadPool(size_t max_threads_, size_t max_free_threads_, - size_t queue_size_, const bool shutdown_on_exception_) - : FreeThreadPool(max_threads_, max_free_threads_, queue_size_, - shutdown_on_exception_) - { - } + GlobalThreadPool( + size_t max_threads_, + size_t max_free_threads_, + size_t queue_size_, + bool shutdown_on_exception_); public: static void initialize(size_t max_threads = 10000, size_t max_free_threads = 1000, size_t queue_size = 10000); diff --git a/src/Common/examples/parallel_aggregation.cpp b/src/Common/examples/parallel_aggregation.cpp index bd252b330f3..cf7a3197fef 100644 --- a/src/Common/examples/parallel_aggregation.cpp +++ b/src/Common/examples/parallel_aggregation.cpp @@ -17,6 +17,7 @@ #include #include +#include using Key = UInt64; @@ -28,6 +29,12 @@ using Map = HashMap; using MapTwoLevel = TwoLevelHashMap; +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + struct SmallLock { std::atomic locked {false}; @@ -247,7 +254,7 @@ int main(int argc, char ** argv) std::cerr << std::fixed << std::setprecision(2); - ThreadPool pool(num_threads); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads); Source data(n); diff --git a/src/Common/examples/parallel_aggregation2.cpp b/src/Common/examples/parallel_aggregation2.cpp index 6c20f46ab0e..1b0ad760490 100644 --- a/src/Common/examples/parallel_aggregation2.cpp +++ b/src/Common/examples/parallel_aggregation2.cpp @@ -17,6 +17,7 @@ #include #include +#include using Key = UInt64; @@ -24,6 +25,12 @@ using Value = UInt64; using Source = std::vector; +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + template struct AggregateIndependent { @@ -274,7 +281,7 @@ int main(int argc, char ** argv) std::cerr << std::fixed << std::setprecision(2); - ThreadPool pool(num_threads); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads); Source data(n); diff --git a/src/Common/examples/thread_creation_latency.cpp b/src/Common/examples/thread_creation_latency.cpp index 351f709013a..2434759c968 100644 --- a/src/Common/examples/thread_creation_latency.cpp +++ b/src/Common/examples/thread_creation_latency.cpp @@ -6,6 +6,7 @@ #include #include #include +#include int value = 0; @@ -14,6 +15,12 @@ static void f() { ++value; } static void * g(void *) { f(); return {}; } +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + namespace DB { namespace ErrorCodes @@ -65,7 +72,7 @@ int main(int argc, char ** argv) test(n, "Create and destroy ThreadPool each iteration", [] { - ThreadPool tp(1); + ThreadPool tp(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 1); tp.scheduleOrThrowOnError(f); tp.wait(); }); @@ -86,7 +93,7 @@ int main(int argc, char ** argv) }); { - ThreadPool tp(1); + ThreadPool tp(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 1); test(n, "Schedule job for Threadpool each iteration", [&tp] { @@ -96,7 
+103,7 @@ int main(int argc, char ** argv) } { - ThreadPool tp(128); + ThreadPool tp(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 128); test(n, "Schedule job for Threadpool with 128 threads each iteration", [&tp] { diff --git a/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp b/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp index f5f14739e39..f93017129dd 100644 --- a/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp +++ b/src/Common/tests/gtest_thread_pool_concurrent_wait.cpp @@ -1,4 +1,5 @@ #include +#include #include @@ -7,6 +8,12 @@ */ +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + TEST(ThreadPool, ConcurrentWait) { auto worker = [] @@ -18,14 +25,14 @@ TEST(ThreadPool, ConcurrentWait) constexpr size_t num_threads = 4; constexpr size_t num_jobs = 4; - ThreadPool pool(num_threads); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_threads); for (size_t i = 0; i < num_jobs; ++i) pool.scheduleOrThrowOnError(worker); constexpr size_t num_waiting_threads = 4; - ThreadPool waiting_pool(num_waiting_threads); + ThreadPool waiting_pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_waiting_threads); for (size_t i = 0; i < num_waiting_threads; ++i) waiting_pool.scheduleOrThrowOnError([&pool] { pool.wait(); }); diff --git a/src/Common/tests/gtest_thread_pool_global_full.cpp b/src/Common/tests/gtest_thread_pool_global_full.cpp index 583d43be1bb..1b2ded9c7e1 100644 --- a/src/Common/tests/gtest_thread_pool_global_full.cpp +++ b/src/Common/tests/gtest_thread_pool_global_full.cpp @@ -2,10 +2,17 @@ #include #include +#include #include +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + /// Test what happens if local ThreadPool cannot create a ThreadFromGlobalPool. /// There was a bug: if local ThreadPool cannot allocate even a single thread, /// the job will be scheduled but never get executed. 
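Illustrative sketch (not part of the patch): the recurring change in these hunks is that every ThreadPool constructor now takes an explicit pair of CurrentMetrics, so each pool reports its own thread count and running-task count instead of everything being lumped under the shared GlobalThread/LocalThread metrics. A minimal call-site sketch, assuming the usual ClickHouse headers Common/ThreadPool.h and Common/CurrentMetrics.h:

    #include <Common/ThreadPool.h>
    #include <Common/CurrentMetrics.h>

    namespace CurrentMetrics
    {
        /// The metric pair is defined once in CurrentMetrics.cpp and referenced at the call site.
        extern const Metric LocalThread;
        extern const Metric LocalThreadActive;
    }

    void runJobs()
    {
        /// Before: ThreadPool pool(16);
        /// After: the pool names the metrics it increments, so system.metrics
        /// exposes per-pool thread totals and per-pool running tasks.
        ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, /* max_threads */ 16);

        for (size_t i = 0; i < 16; ++i)
            pool.scheduleOrThrowOnError([] { /* do some work */ });

        pool.wait();
    }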
@@ -27,7 +34,7 @@ TEST(ThreadPool, GlobalFull1) auto func = [&] { ++counter; while (counter != num_jobs) {} }; - ThreadPool pool(num_jobs); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, num_jobs); for (size_t i = 0; i < capacity; ++i) pool.scheduleOrThrowOnError(func); @@ -65,11 +72,11 @@ TEST(ThreadPool, GlobalFull2) std::atomic counter = 0; auto func = [&] { ++counter; while (counter != capacity + 1) {} }; - ThreadPool pool(capacity, 0, capacity); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, capacity, 0, capacity); for (size_t i = 0; i < capacity; ++i) pool.scheduleOrThrowOnError(func); - ThreadPool another_pool(1); + ThreadPool another_pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 1); EXPECT_THROW(another_pool.scheduleOrThrowOnError(func), DB::Exception); ++counter; diff --git a/src/Common/tests/gtest_thread_pool_loop.cpp b/src/Common/tests/gtest_thread_pool_loop.cpp index 15915044652..556c39df949 100644 --- a/src/Common/tests/gtest_thread_pool_loop.cpp +++ b/src/Common/tests/gtest_thread_pool_loop.cpp @@ -1,10 +1,17 @@ #include #include #include +#include #include +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + TEST(ThreadPool, Loop) { std::atomic res{0}; @@ -12,7 +19,7 @@ TEST(ThreadPool, Loop) for (size_t i = 0; i < 1000; ++i) { size_t threads = 16; - ThreadPool pool(threads); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, threads); for (size_t j = 0; j < threads; ++j) pool.scheduleOrThrowOnError([&] { ++res; }); pool.wait(); diff --git a/src/Common/tests/gtest_thread_pool_schedule_exception.cpp b/src/Common/tests/gtest_thread_pool_schedule_exception.cpp index 69362c34cd2..176e469d5ef 100644 --- a/src/Common/tests/gtest_thread_pool_schedule_exception.cpp +++ b/src/Common/tests/gtest_thread_pool_schedule_exception.cpp @@ -1,13 +1,20 @@ #include #include #include +#include #include +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + static bool check() { - ThreadPool pool(10); + ThreadPool pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, 10); /// The throwing thread. 
pool.scheduleOrThrowOnError([] { throw std::runtime_error("Hello, world!"); }); diff --git a/src/Core/ServerSettings.h b/src/Core/ServerSettings.h index e780424507c..efa41a5ec13 100644 --- a/src/Core/ServerSettings.h +++ b/src/Core/ServerSettings.h @@ -29,7 +29,6 @@ namespace DB M(Int32, max_connections, 1024, "Max server connections.", 0) \ M(UInt32, asynchronous_metrics_update_period_s, 1, "Period in seconds for updating asynchronous metrics.", 0) \ M(UInt32, asynchronous_heavy_metrics_update_period_s, 120, "Period in seconds for updating asynchronous metrics.", 0) \ - M(UInt32, max_threads_for_connection_collector, 10, "The maximum number of threads that will be used for draining connections asynchronously in a background upon finishing executing distributed queries.", 0) \ M(String, default_database, "default", "Default database name.", 0) \ M(String, tmp_policy, "", "Policy for storage with temporary data.", 0) \ M(UInt64, max_temporary_data_on_disk_size, 0, "The maximum amount of storage that could be used for external aggregation, joins or sorting., ", 0) \ diff --git a/src/DataTypes/EnumValues.cpp b/src/DataTypes/EnumValues.cpp index b0f51a54ccb..9df49e765a7 100644 --- a/src/DataTypes/EnumValues.cpp +++ b/src/DataTypes/EnumValues.cpp @@ -10,7 +10,7 @@ namespace ErrorCodes { extern const int SYNTAX_ERROR; extern const int EMPTY_DATA_PASSED; - extern const int BAD_ARGUMENTS; + extern const int UNKNOWN_ELEMENT_OF_ENUM; } template @@ -69,7 +69,7 @@ T EnumValues::getValue(StringRef field_name, bool try_treat_as_id) const } auto hints = this->getHints(field_name.toString()); auto hints_string = !hints.empty() ? ", maybe you meant: " + toString(hints) : ""; - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Unknown element '{}' for enum{}", field_name.toString(), hints_string); + throw Exception(ErrorCodes::UNKNOWN_ELEMENT_OF_ENUM, "Unknown element '{}' for enum{}", field_name.toString(), hints_string); } return it->getMapped(); } diff --git a/src/Databases/DatabaseOnDisk.cpp b/src/Databases/DatabaseOnDisk.cpp index 4d9e22bd15d..01afbdcaa57 100644 --- a/src/Databases/DatabaseOnDisk.cpp +++ b/src/Databases/DatabaseOnDisk.cpp @@ -17,14 +17,21 @@ #include #include #include +#include +#include +#include #include #include -#include #include -#include namespace fs = std::filesystem; +namespace CurrentMetrics +{ + extern const Metric DatabaseOnDiskThreads; + extern const Metric DatabaseOnDiskThreadsActive; +} + namespace DB { @@ -620,7 +627,7 @@ void DatabaseOnDisk::iterateMetadataFiles(ContextPtr local_context, const Iterat } /// Read and parse metadata in parallel - ThreadPool pool; + ThreadPool pool(CurrentMetrics::DatabaseOnDiskThreads, CurrentMetrics::DatabaseOnDiskThreadsActive); for (const auto & file : metadata_files) { pool.scheduleOrThrowOnError([&]() diff --git a/src/Databases/DatabaseOrdinary.cpp b/src/Databases/DatabaseOrdinary.cpp index 49250602132..0db16f80656 100644 --- a/src/Databases/DatabaseOrdinary.cpp +++ b/src/Databases/DatabaseOrdinary.cpp @@ -25,9 +25,16 @@ #include #include #include +#include namespace fs = std::filesystem; +namespace CurrentMetrics +{ + extern const Metric DatabaseOrdinaryThreads; + extern const Metric DatabaseOrdinaryThreadsActive; +} + namespace DB { @@ -99,7 +106,7 @@ void DatabaseOrdinary::loadStoredObjects( std::atomic dictionaries_processed{0}; std::atomic tables_processed{0}; - ThreadPool pool; + ThreadPool pool(CurrentMetrics::DatabaseOrdinaryThreads, CurrentMetrics::DatabaseOrdinaryThreadsActive); /// We must attach dictionaries before attaching 
tables /// because while we're attaching tables we may need to have some dictionaries attached diff --git a/src/Databases/TablesLoader.cpp b/src/Databases/TablesLoader.cpp index 5d66f49554d..177b32daa2e 100644 --- a/src/Databases/TablesLoader.cpp +++ b/src/Databases/TablesLoader.cpp @@ -8,8 +8,15 @@ #include #include #include +#include #include +namespace CurrentMetrics +{ + extern const Metric TablesLoaderThreads; + extern const Metric TablesLoaderThreadsActive; +} + namespace DB { @@ -31,12 +38,13 @@ void logAboutProgress(Poco::Logger * log, size_t processed, size_t total, Atomic } TablesLoader::TablesLoader(ContextMutablePtr global_context_, Databases databases_, LoadingStrictnessLevel strictness_mode_) -: global_context(global_context_) -, databases(std::move(databases_)) -, strictness_mode(strictness_mode_) -, referential_dependencies("ReferentialDeps") -, loading_dependencies("LoadingDeps") -, all_loading_dependencies("LoadingDeps") + : global_context(global_context_) + , databases(std::move(databases_)) + , strictness_mode(strictness_mode_) + , referential_dependencies("ReferentialDeps") + , loading_dependencies("LoadingDeps") + , all_loading_dependencies("LoadingDeps") + , pool(CurrentMetrics::TablesLoaderThreads, CurrentMetrics::TablesLoaderThreadsActive) { metadata.default_database = global_context->getCurrentDatabase(); log = &Poco::Logger::get("TablesLoader"); diff --git a/src/Dictionaries/CacheDictionaryUpdateQueue.cpp b/src/Dictionaries/CacheDictionaryUpdateQueue.cpp index 1fdaf10c57c..09d5bed18b8 100644 --- a/src/Dictionaries/CacheDictionaryUpdateQueue.cpp +++ b/src/Dictionaries/CacheDictionaryUpdateQueue.cpp @@ -2,8 +2,15 @@ #include +#include #include +namespace CurrentMetrics +{ + extern const Metric CacheDictionaryThreads; + extern const Metric CacheDictionaryThreadsActive; +} + namespace DB { @@ -26,7 +33,7 @@ CacheDictionaryUpdateQueue::CacheDictionaryUpdateQueue( , configuration(configuration_) , update_func(std::move(update_func_)) , update_queue(configuration.max_update_queue_size) - , update_pool(configuration.max_threads_for_updates) + , update_pool(CurrentMetrics::CacheDictionaryThreads, CurrentMetrics::CacheDictionaryThreadsActive, configuration.max_threads_for_updates) { for (size_t i = 0; i < configuration.max_threads_for_updates; ++i) update_pool.scheduleOrThrowOnError([this] { updateThreadFunction(); }); diff --git a/src/Dictionaries/HashedDictionary.cpp b/src/Dictionaries/HashedDictionary.cpp index d6c9ac50dbe..0e5d18363e9 100644 --- a/src/Dictionaries/HashedDictionary.cpp +++ b/src/Dictionaries/HashedDictionary.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include @@ -21,6 +22,11 @@ #include #include +namespace CurrentMetrics +{ + extern const Metric HashedDictionaryThreads; + extern const Metric HashedDictionaryThreadsActive; +} namespace { @@ -60,7 +66,7 @@ public: explicit ParallelDictionaryLoader(HashedDictionary & dictionary_) : dictionary(dictionary_) , shards(dictionary.configuration.shards) - , pool(shards) + , pool(CurrentMetrics::HashedDictionaryThreads, CurrentMetrics::HashedDictionaryThreadsActive, shards) , shards_queues(shards) { UInt64 backlog = dictionary.configuration.shard_load_queue_backlog; @@ -213,7 +219,7 @@ HashedDictionary::~HashedDictionary() return; size_t shards = std::max(configuration.shards, 1); - ThreadPool pool(shards); + ThreadPool pool(CurrentMetrics::HashedDictionaryThreads, CurrentMetrics::HashedDictionaryThreadsActive, shards); size_t hash_tables_count = 0; auto schedule_destroy = [&hash_tables_count, 
&pool](auto & container) diff --git a/src/Disks/IO/ThreadPoolReader.cpp b/src/Disks/IO/ThreadPoolReader.cpp index 18b283b0ff3..3a071d13122 100644 --- a/src/Disks/IO/ThreadPoolReader.cpp +++ b/src/Disks/IO/ThreadPoolReader.cpp @@ -61,6 +61,8 @@ namespace ProfileEvents namespace CurrentMetrics { extern const Metric Read; + extern const Metric ThreadPoolFSReaderThreads; + extern const Metric ThreadPoolFSReaderThreadsActive; } @@ -85,7 +87,7 @@ static bool hasBugInPreadV2() #endif ThreadPoolReader::ThreadPoolReader(size_t pool_size, size_t queue_size_) - : pool(pool_size, pool_size, queue_size_) + : pool(CurrentMetrics::ThreadPoolFSReaderThreads, CurrentMetrics::ThreadPoolFSReaderThreadsActive, pool_size, pool_size, queue_size_) { } diff --git a/src/Disks/IO/ThreadPoolRemoteFSReader.cpp b/src/Disks/IO/ThreadPoolRemoteFSReader.cpp index c2d3ee8b53d..1980f57c876 100644 --- a/src/Disks/IO/ThreadPoolRemoteFSReader.cpp +++ b/src/Disks/IO/ThreadPoolRemoteFSReader.cpp @@ -26,6 +26,8 @@ namespace ProfileEvents namespace CurrentMetrics { extern const Metric RemoteRead; + extern const Metric ThreadPoolRemoteFSReaderThreads; + extern const Metric ThreadPoolRemoteFSReaderThreadsActive; } namespace DB @@ -60,7 +62,7 @@ IAsynchronousReader::Result RemoteFSFileDescriptor::readInto(char * data, size_t ThreadPoolRemoteFSReader::ThreadPoolRemoteFSReader(size_t pool_size, size_t queue_size_) - : pool(pool_size, pool_size, queue_size_) + : pool(CurrentMetrics::ThreadPoolRemoteFSReaderThreads, CurrentMetrics::ThreadPoolRemoteFSReaderThreadsActive, pool_size, pool_size, queue_size_) { } diff --git a/src/Disks/ObjectStorages/DiskObjectStorage.cpp b/src/Disks/ObjectStorages/DiskObjectStorage.cpp index 44cb80558af..6143f0620b8 100644 --- a/src/Disks/ObjectStorages/DiskObjectStorage.cpp +++ b/src/Disks/ObjectStorages/DiskObjectStorage.cpp @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -17,6 +18,13 @@ #include #include +namespace CurrentMetrics +{ + extern const Metric DiskObjectStorageAsyncThreads; + extern const Metric DiskObjectStorageAsyncThreadsActive; +} + + namespace DB { @@ -38,7 +46,8 @@ class AsyncThreadPoolExecutor : public Executor public: AsyncThreadPoolExecutor(const String & name_, int thread_pool_size) : name(name_) - , pool(ThreadPool(thread_pool_size)) {} + , pool(CurrentMetrics::DiskObjectStorageAsyncThreads, CurrentMetrics::DiskObjectStorageAsyncThreadsActive, thread_pool_size) + {} std::future execute(std::function task) override { diff --git a/src/Functions/FunctionsCodingIP.cpp b/src/Functions/FunctionsCodingIP.cpp index fb54fb951d1..f10ea24f4a6 100644 --- a/src/Functions/FunctionsCodingIP.cpp +++ b/src/Functions/FunctionsCodingIP.cpp @@ -1129,7 +1129,9 @@ public: for (size_t i = 0; i < vec_res.size(); ++i) { - vec_res[i] = DB::parseIPv6whole(reinterpret_cast(&vec_src[prev_offset]), reinterpret_cast(buffer)); + vec_res[i] = DB::parseIPv6whole(reinterpret_cast(&vec_src[prev_offset]), + reinterpret_cast(&vec_src[offsets_src[i] - 1]), + reinterpret_cast(buffer)); prev_offset = offsets_src[i]; } diff --git a/src/IO/BackupIOThreadPool.cpp b/src/IO/BackupsIOThreadPool.cpp similarity index 59% rename from src/IO/BackupIOThreadPool.cpp rename to src/IO/BackupsIOThreadPool.cpp index 067fc54b1ae..0829553945a 100644 --- a/src/IO/BackupIOThreadPool.cpp +++ b/src/IO/BackupsIOThreadPool.cpp @@ -1,5 +1,12 @@ #include -#include "Core/Field.h" +#include +#include + +namespace CurrentMetrics +{ + extern const Metric BackupsIOThreads; + extern const Metric 
BackupsIOThreadsActive; +} namespace DB { @@ -18,7 +25,13 @@ void BackupsIOThreadPool::initialize(size_t max_threads, size_t max_free_threads throw Exception(ErrorCodes::LOGICAL_ERROR, "The BackupsIO thread pool is initialized twice"); } - instance = std::make_unique(max_threads, max_free_threads, queue_size, false /*shutdown_on_exception*/); + instance = std::make_unique( + CurrentMetrics::BackupsIOThreads, + CurrentMetrics::BackupsIOThreadsActive, + max_threads, + max_free_threads, + queue_size, + /* shutdown_on_exception= */ false); } ThreadPool & BackupsIOThreadPool::get() diff --git a/src/IO/IOThreadPool.cpp b/src/IO/IOThreadPool.cpp index 4014d00d8b8..98bb6ffe6a7 100644 --- a/src/IO/IOThreadPool.cpp +++ b/src/IO/IOThreadPool.cpp @@ -1,5 +1,12 @@ #include -#include "Core/Field.h" +#include +#include + +namespace CurrentMetrics +{ + extern const Metric IOThreads; + extern const Metric IOThreadsActive; +} namespace DB { @@ -18,7 +25,13 @@ void IOThreadPool::initialize(size_t max_threads, size_t max_free_threads, size_ throw Exception(ErrorCodes::LOGICAL_ERROR, "The IO thread pool is initialized twice"); } - instance = std::make_unique(max_threads, max_free_threads, queue_size, false /*shutdown_on_exception*/); + instance = std::make_unique( + CurrentMetrics::IOThreads, + CurrentMetrics::IOThreadsActive, + max_threads, + max_free_threads, + queue_size, + /* shutdown_on_exception= */ false); } ThreadPool & IOThreadPool::get() diff --git a/src/Interpreters/Aggregator.cpp b/src/Interpreters/Aggregator.cpp index b0bcea23449..d6fbf072d05 100644 --- a/src/Interpreters/Aggregator.cpp +++ b/src/Interpreters/Aggregator.cpp @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -59,6 +60,8 @@ namespace ProfileEvents namespace CurrentMetrics { extern const Metric TemporaryFilesForAggregation; + extern const Metric AggregatorThreads; + extern const Metric AggregatorThreadsActive; } namespace DB @@ -2397,7 +2400,7 @@ BlocksList Aggregator::convertToBlocks(AggregatedDataVariants & data_variants, b std::unique_ptr thread_pool; if (max_threads > 1 && data_variants.sizeWithoutOverflowRow() > 100000 /// TODO Make a custom threshold. && data_variants.isTwoLevel()) /// TODO Use the shared thread pool with the `merge` function. - thread_pool = std::make_unique(max_threads); + thread_pool = std::make_unique(CurrentMetrics::AggregatorThreads, CurrentMetrics::AggregatorThreadsActive, max_threads); if (data_variants.without_key) blocks.emplace_back(prepareBlockAndFillWithoutKey( @@ -2592,7 +2595,7 @@ void NO_INLINE Aggregator::mergeDataOnlyExistingKeysImpl( void NO_INLINE Aggregator::mergeWithoutKeyDataImpl( ManyAggregatedDataVariants & non_empty_data) const { - ThreadPool thread_pool{params.max_threads}; + ThreadPool thread_pool{CurrentMetrics::AggregatorThreads, CurrentMetrics::AggregatorThreadsActive, params.max_threads}; AggregatedDataVariantsPtr & res = non_empty_data[0]; @@ -3065,7 +3068,7 @@ void Aggregator::mergeBlocks(BucketToBlocks bucket_to_blocks, AggregatedDataVari std::unique_ptr thread_pool; if (max_threads > 1 && total_input_rows > 100000) /// TODO Make a custom threshold. 
- thread_pool = std::make_unique(max_threads); + thread_pool = std::make_unique(CurrentMetrics::AggregatorThreads, CurrentMetrics::AggregatorThreadsActive, max_threads); for (const auto & bucket_blocks : bucket_to_blocks) { diff --git a/src/Interpreters/AsynchronousInsertQueue.cpp b/src/Interpreters/AsynchronousInsertQueue.cpp index 590cbc9ba83..2dd2409442e 100644 --- a/src/Interpreters/AsynchronousInsertQueue.cpp +++ b/src/Interpreters/AsynchronousInsertQueue.cpp @@ -32,6 +32,8 @@ namespace CurrentMetrics { extern const Metric PendingAsyncInsert; + extern const Metric AsynchronousInsertThreads; + extern const Metric AsynchronousInsertThreadsActive; } namespace ProfileEvents @@ -130,7 +132,7 @@ AsynchronousInsertQueue::AsynchronousInsertQueue(ContextPtr context_, size_t poo : WithContext(context_) , pool_size(pool_size_) , queue_shards(pool_size) - , pool(pool_size) + , pool(CurrentMetrics::AsynchronousInsertThreads, CurrentMetrics::AsynchronousInsertThreadsActive, pool_size) { if (!pool_size) throw Exception(ErrorCodes::BAD_ARGUMENTS, "pool_size cannot be zero"); diff --git a/src/Interpreters/DDLWorker.cpp b/src/Interpreters/DDLWorker.cpp index f70c0010585..22bece0ef04 100644 --- a/src/Interpreters/DDLWorker.cpp +++ b/src/Interpreters/DDLWorker.cpp @@ -40,6 +40,12 @@ namespace fs = std::filesystem; +namespace CurrentMetrics +{ + extern const Metric DDLWorkerThreads; + extern const Metric DDLWorkerThreadsActive; +} + namespace DB { @@ -85,7 +91,7 @@ DDLWorker::DDLWorker( { LOG_WARNING(log, "DDLWorker is configured to use multiple threads. " "It's not recommended because queries can be reordered. Also it may cause some unknown issues to appear."); - worker_pool = std::make_unique(pool_size); + worker_pool = std::make_unique(CurrentMetrics::DDLWorkerThreads, CurrentMetrics::DDLWorkerThreadsActive, pool_size); } queue_dir = zk_root_dir; @@ -1084,7 +1090,7 @@ void DDLWorker::runMainThread() /// It will wait for all threads in pool to finish and will not rethrow exceptions (if any). /// We create new thread pool to forget previous exceptions. if (1 < pool_size) - worker_pool = std::make_unique(pool_size); + worker_pool = std::make_unique(CurrentMetrics::DDLWorkerThreads, CurrentMetrics::DDLWorkerThreadsActive, pool_size); /// Clear other in-memory state, like server just started. 
current_tasks.clear(); last_skipped_entry_name.reset(); @@ -1123,7 +1129,7 @@ void DDLWorker::runMainThread() initialized = false; /// Wait for pending async tasks if (1 < pool_size) - worker_pool = std::make_unique(pool_size); + worker_pool = std::make_unique(CurrentMetrics::DDLWorkerThreads, CurrentMetrics::DDLWorkerThreadsActive, pool_size); LOG_INFO(log, "Lost ZooKeeper connection, will try to connect again: {}", getCurrentExceptionMessage(true)); } else diff --git a/src/Interpreters/DDLWorker.h b/src/Interpreters/DDLWorker.h index 65ef4b440a1..6cf034edae8 100644 --- a/src/Interpreters/DDLWorker.h +++ b/src/Interpreters/DDLWorker.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include #include #include diff --git a/src/Interpreters/DatabaseCatalog.cpp b/src/Interpreters/DatabaseCatalog.cpp index ac9b8615fe5..f37e41614b0 100644 --- a/src/Interpreters/DatabaseCatalog.cpp +++ b/src/Interpreters/DatabaseCatalog.cpp @@ -38,6 +38,8 @@ namespace CurrentMetrics { extern const Metric TablesToDropQueueSize; + extern const Metric DatabaseCatalogThreads; + extern const Metric DatabaseCatalogThreadsActive; } namespace DB @@ -852,7 +854,7 @@ void DatabaseCatalog::loadMarkedAsDroppedTables() LOG_INFO(log, "Found {} partially dropped tables. Will load them and retry removal.", dropped_metadata.size()); - ThreadPool pool; + ThreadPool pool(CurrentMetrics::DatabaseCatalogThreads, CurrentMetrics::DatabaseCatalogThreadsActive); for (const auto & elem : dropped_metadata) { pool.scheduleOrThrowOnError([&]() diff --git a/src/Interpreters/InterpreterExplainQuery.cpp b/src/Interpreters/InterpreterExplainQuery.cpp index 3c225522cc4..3a381cd8dab 100644 --- a/src/Interpreters/InterpreterExplainQuery.cpp +++ b/src/Interpreters/InterpreterExplainQuery.cpp @@ -41,6 +41,7 @@ namespace ErrorCodes extern const int INVALID_SETTING_VALUE; extern const int UNKNOWN_SETTING; extern const int LOGICAL_ERROR; + extern const int NOT_IMPLEMENTED; } namespace @@ -386,6 +387,10 @@ QueryPipeline InterpreterExplainQuery::executeImpl() } case ASTExplainQuery::QueryTree: { + if (!getContext()->getSettingsRef().allow_experimental_analyzer) + throw Exception(ErrorCodes::NOT_IMPLEMENTED, + "EXPLAIN QUERY TREE is only supported with a new analyzer. 
Set allow_experimental_analyzer = 1."); + if (ast.getExplainedQuery()->as() == nullptr) throw Exception(ErrorCodes::INCORRECT_QUERY, "Only SELECT is supported for EXPLAIN QUERY TREE query"); diff --git a/src/Interpreters/InterpreterShowEngineQuery.cpp b/src/Interpreters/InterpreterShowEngineQuery.cpp index 5aae6ad5d28..8fd829f39ec 100644 --- a/src/Interpreters/InterpreterShowEngineQuery.cpp +++ b/src/Interpreters/InterpreterShowEngineQuery.cpp @@ -12,7 +12,7 @@ namespace DB BlockIO InterpreterShowEnginesQuery::execute() { - return executeQuery("SELECT * FROM system.table_engines", getContext(), true); + return executeQuery("SELECT * FROM system.table_engines ORDER BY name", getContext(), true); } } diff --git a/src/Interpreters/InterpreterShowTablesQuery.cpp b/src/Interpreters/InterpreterShowTablesQuery.cpp index 4e0dfdc9236..a631cb72722 100644 --- a/src/Interpreters/InterpreterShowTablesQuery.cpp +++ b/src/Interpreters/InterpreterShowTablesQuery.cpp @@ -28,7 +28,6 @@ InterpreterShowTablesQuery::InterpreterShowTablesQuery(const ASTPtr & query_ptr_ { } - String InterpreterShowTablesQuery::getRewrittenQuery() { const auto & query = query_ptr->as(); @@ -51,6 +50,9 @@ String InterpreterShowTablesQuery::getRewrittenQuery() if (query.limit_length) rewritten_query << " LIMIT " << query.limit_length; + /// (*) + rewritten_query << " ORDER BY name"; + return rewritten_query.str(); } @@ -69,6 +71,9 @@ String InterpreterShowTablesQuery::getRewrittenQuery() << DB::quote << query.like; } + /// (*) + rewritten_query << " ORDER BY cluster"; + if (query.limit_length) rewritten_query << " LIMIT " << query.limit_length; @@ -81,6 +86,9 @@ String InterpreterShowTablesQuery::getRewrittenQuery() rewritten_query << " WHERE cluster = " << DB::quote << query.cluster_str; + /// (*) + rewritten_query << " ORDER BY cluster"; + return rewritten_query.str(); } @@ -101,6 +109,9 @@ String InterpreterShowTablesQuery::getRewrittenQuery() << DB::quote << query.like; } + /// (*) + rewritten_query << " ORDER BY name, type, value "; + return rewritten_query.str(); } @@ -146,6 +157,9 @@ String InterpreterShowTablesQuery::getRewrittenQuery() else if (query.where_expression) rewritten_query << " AND (" << query.where_expression << ")"; + /// (*) + rewritten_query << " ORDER BY name "; + if (query.limit_length) rewritten_query << " LIMIT " << query.limit_length; @@ -176,5 +190,8 @@ BlockIO InterpreterShowTablesQuery::execute() return executeQuery(getRewrittenQuery(), getContext(), true); } +/// (*) Sorting is strictly speaking not necessary but 1. it is convenient for users, 2. SQL currently does not allow to +/// sort the output of SHOW otherwise (SELECT * FROM (SHOW ...) ORDER BY ... is rejected) and 3. some +/// SQL tests can take advantage of this. 
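The effect of the appended ORDER BY is easiest to see on the rewritten query itself. A simplified sketch of the pattern (illustrative only: the concrete system table and the WHERE/LIMIT handling follow the branches above, and the exact SELECT text is assumed rather than quoted from the interpreter):

    #include <sstream>
    #include <string>

    /// Roughly what getRewrittenQuery() produces for SHOW DATABASES after this change:
    /// a SELECT over a system table with a deterministic ORDER BY appended,
    /// so SHOW output is stable between runs and usable in SQL tests.
    std::string rewriteShowDatabases(const std::string & quoted_like_pattern)
    {
        std::ostringstream rewritten_query;
        rewritten_query << "SELECT name FROM system.databases";  /// assumed target table
        if (!quoted_like_pattern.empty())
            rewritten_query << " WHERE name LIKE " << quoted_like_pattern;
        rewritten_query << " ORDER BY name";  /// the ordering added by this patch
        return rewritten_query.str();
    }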
} diff --git a/src/Interpreters/InterpreterSystemQuery.cpp b/src/Interpreters/InterpreterSystemQuery.cpp index fb6b1635f28..cd3e5579e12 100644 --- a/src/Interpreters/InterpreterSystemQuery.cpp +++ b/src/Interpreters/InterpreterSystemQuery.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -64,6 +65,12 @@ #include "config.h" +namespace CurrentMetrics +{ + extern const Metric RestartReplicaThreads; + extern const Metric RestartReplicaThreadsActive; +} + namespace DB { @@ -685,7 +692,8 @@ void InterpreterSystemQuery::restartReplicas(ContextMutablePtr system_context) for (auto & guard : guards) guard.second = catalog.getDDLGuard(guard.first.database_name, guard.first.table_name); - ThreadPool pool(std::min(static_cast(getNumberOfPhysicalCPUCores()), replica_names.size())); + size_t threads = std::min(static_cast(getNumberOfPhysicalCPUCores()), replica_names.size()); + ThreadPool pool(CurrentMetrics::RestartReplicaThreads, CurrentMetrics::RestartReplicaThreadsActive, threads); for (auto & replica : replica_names) { diff --git a/src/Interpreters/executeQuery.cpp b/src/Interpreters/executeQuery.cpp index 74394879191..e60510dc5f3 100644 --- a/src/Interpreters/executeQuery.cpp +++ b/src/Interpreters/executeQuery.cpp @@ -1223,18 +1223,37 @@ void executeQuery( end = begin + parse_buf.size(); } - ASTPtr ast; - BlockIO streams; - - std::tie(ast, streams) = executeQueryImpl(begin, end, context, false, QueryProcessingStage::Complete, &istr); - auto & pipeline = streams.pipeline; - QueryResultDetails result_details { .query_id = context->getClientInfo().current_query_id, .timezone = DateLUT::instance().getTimeZone(), }; + /// Set the result details in case of any exception raised during query execution + SCOPE_EXIT({ + if (set_result_details == nullptr) + /// Either the result_details have been set in the flow below or the caller of this function does not provide this callback + return; + + try + { + set_result_details(result_details); + } + catch (...) + { + /// This exception can be ignored: + /// if the code reaches this point, an exception was already raised during query execution, + /// and that exception will be propagated to the outer caller, + /// so there's no need to report the exception thrown here. + } + }); + + ASTPtr ast; + BlockIO streams; + + std::tie(ast, streams) = executeQueryImpl(begin, end, context, false, QueryProcessingStage::Complete, &istr); + auto & pipeline = streams.pipeline; + std::unique_ptr compressed_buffer; try { @@ -1304,7 +1323,15 @@ void executeQuery( } if (set_result_details) - set_result_details(result_details); + { + /// The call to set_result_details itself might throw an exception, + /// in which case there's no need to call it again in the SCOPE_EXIT defined above. + /// So the callback is cleared before its execution. + auto set_result_details_copy = set_result_details; + set_result_details = nullptr; + + set_result_details_copy(result_details); + } if (pipeline.initialized()) { diff --git a/src/Interpreters/loadMetadata.cpp b/src/Interpreters/loadMetadata.cpp index 47ccff57419..83af2684322 100644 --- a/src/Interpreters/loadMetadata.cpp +++ b/src/Interpreters/loadMetadata.cpp @@ -16,16 +16,24 @@ #include #include -#include +#include #include -#include #include +#include + +#include #define ORDINARY_TO_ATOMIC_PREFIX ".tmp_convert." 
namespace fs = std::filesystem; +namespace CurrentMetrics +{ + extern const Metric StartupSystemTablesThreads; + extern const Metric StartupSystemTablesThreadsActive; +} + namespace DB { @@ -366,7 +374,7 @@ static void maybeConvertOrdinaryDatabaseToAtomic(ContextMutablePtr context, cons if (!tables_started) { /// It's not quite correct to run DDL queries while database is not started up. - ThreadPool pool; + ThreadPool pool(CurrentMetrics::StartupSystemTablesThreads, CurrentMetrics::StartupSystemTablesThreadsActive); DatabaseCatalog::instance().getSystemDatabase()->startupTables(pool, LoadingStrictnessLevel::FORCE_RESTORE); } @@ -461,7 +469,7 @@ void convertDatabasesEnginesIfNeed(ContextMutablePtr context) void startupSystemTables() { - ThreadPool pool; + ThreadPool pool(CurrentMetrics::StartupSystemTablesThreads, CurrentMetrics::StartupSystemTablesThreadsActive); DatabaseCatalog::instance().getSystemDatabase()->startupTables(pool, LoadingStrictnessLevel::FORCE_RESTORE); } diff --git a/src/Parsers/IParser.h b/src/Parsers/IParser.h index dbd292bcd9f..d53b58baa7c 100644 --- a/src/Parsers/IParser.h +++ b/src/Parsers/IParser.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include #include diff --git a/src/Processors/Formats/IRowInputFormat.cpp b/src/Processors/Formats/IRowInputFormat.cpp index 81c818e3334..948e6f4cb82 100644 --- a/src/Processors/Formats/IRowInputFormat.cpp +++ b/src/Processors/Formats/IRowInputFormat.cpp @@ -28,6 +28,7 @@ namespace ErrorCodes { extern const int CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING; extern const int CANNOT_PARSE_IPV4; extern const int CANNOT_PARSE_IPV6; + extern const int UNKNOWN_ELEMENT_OF_ENUM; } @@ -48,7 +49,8 @@ bool isParseError(int code) || code == ErrorCodes::INCORRECT_DATA /// For some ReadHelpers || code == ErrorCodes::CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING || code == ErrorCodes::CANNOT_PARSE_IPV4 - || code == ErrorCodes::CANNOT_PARSE_IPV6; + || code == ErrorCodes::CANNOT_PARSE_IPV6 + || code == ErrorCodes::UNKNOWN_ELEMENT_OF_ENUM; } IRowInputFormat::IRowInputFormat(Block header, ReadBuffer & in_, Params params_) diff --git a/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp b/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp index cd1c1b096aa..c743b2c1766 100644 --- a/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp +++ b/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp @@ -565,6 +565,11 @@ void AvroRowOutputFormat::write(const Columns & columns, size_t row_num) void AvroRowOutputFormat::finalizeImpl() { + /// If the file writer wasn't created yet, create it here to write the file prefix/suffix, + /// so that even without actual data the output is a valid Avro file + if (!file_writer_ptr) + createFileWriter(); + file_writer_ptr->close(); } diff --git a/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h b/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h index 7ea19f01e01..790d05e83dd 100644 --- a/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h +++ b/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h @@ -7,7 +7,8 @@ #include #include #include -#include "IO/WriteBufferFromString.h" +#include +#include #include #include #include @@ -17,6 +18,12 @@ #include #include +namespace CurrentMetrics +{ + extern const Metric ParallelFormattingOutputFormatThreads; + extern const Metric ParallelFormattingOutputFormatThreadsActive; +} + namespace DB { @@ -74,7 +81,7 @@ public: explicit ParallelFormattingOutputFormat(Params params) : IOutputFormat(params.header, params.out) , 
internal_formatter_creator(params.internal_formatter_creator) - , pool(params.max_threads_for_parallel_formatting) + , pool(CurrentMetrics::ParallelFormattingOutputFormatThreads, CurrentMetrics::ParallelFormattingOutputFormatThreadsActive, params.max_threads_for_parallel_formatting) { LOG_TEST(&Poco::Logger::get("ParallelFormattingOutputFormat"), "Parallel formatting is being used"); diff --git a/src/Processors/Formats/Impl/ParallelParsingInputFormat.h b/src/Processors/Formats/Impl/ParallelParsingInputFormat.h index 03fb2d650dc..97df9308dbf 100644 --- a/src/Processors/Formats/Impl/ParallelParsingInputFormat.h +++ b/src/Processors/Formats/Impl/ParallelParsingInputFormat.h @@ -5,14 +5,21 @@ #include #include #include +#include +#include #include #include #include #include -#include #include +namespace CurrentMetrics +{ + extern const Metric ParallelParsingInputFormatThreads; + extern const Metric ParallelParsingInputFormatThreadsActive; +} + namespace DB { @@ -94,7 +101,7 @@ public: , min_chunk_bytes(params.min_chunk_bytes) , max_block_size(params.max_block_size) , is_server(params.is_server) - , pool(params.max_threads) + , pool(CurrentMetrics::ParallelParsingInputFormatThreads, CurrentMetrics::ParallelParsingInputFormatThreadsActive, params.max_threads) { // One unit for each thread, including segmentator and reader, plus a // couple more units so that the segmentation thread doesn't spuriously diff --git a/src/Processors/QueryPlan/FilterStep.cpp b/src/Processors/QueryPlan/FilterStep.cpp index dc837446a96..9137b2d1998 100644 --- a/src/Processors/QueryPlan/FilterStep.cpp +++ b/src/Processors/QueryPlan/FilterStep.cpp @@ -14,7 +14,7 @@ static ITransformingStep::Traits getTraits(const ActionsDAGPtr & expression, con bool preserves_sorting = expression->isSortingPreserved(header, sort_description, remove_filter_column ? filter_column_name : ""); if (remove_filter_column) { - preserves_sorting &= find_if( + preserves_sorting &= std::find_if( begin(sort_description), end(sort_description), [&](const auto & column_desc) { return column_desc.column_name == filter_column_name; }) diff --git a/src/Processors/QueryPlan/Optimizations/useDataParallelAggregation.cpp b/src/Processors/QueryPlan/Optimizations/useDataParallelAggregation.cpp index 13a749bd0b6..dc801d10514 100644 --- a/src/Processors/QueryPlan/Optimizations/useDataParallelAggregation.cpp +++ b/src/Processors/QueryPlan/Optimizations/useDataParallelAggregation.cpp @@ -153,6 +153,10 @@ bool isPartitionKeySuitsGroupByKey( if (group_by_actions->hasArrayJoin() || group_by_actions->hasStatefulFunctions() || group_by_actions->hasNonDeterministic()) return false; + + /// We are interested only in calculations required to obtain group by keys (and not aggregate function arguments for example). 
+ group_by_actions->removeUnusedActions(aggregating.getParams().keys); + const auto & gb_key_required_columns = group_by_actions->getRequiredColumnsNames(); const auto & partition_actions = reading.getStorageMetadata()->getPartitionKey().expression->getActionsDAG(); @@ -202,7 +206,7 @@ size_t tryAggregatePartitionsIndependently(QueryPlan::Node * node, QueryPlan::No return 0; if (!reading->willOutputEachPartitionThroughSeparatePort() - && isPartitionKeySuitsGroupByKey(*reading, expression_step->getExpression(), *aggregating_step)) + && isPartitionKeySuitsGroupByKey(*reading, expression_step->getExpression()->clone(), *aggregating_step)) { if (reading->requestOutputEachPartitionThroughSeparatePort()) aggregating_step->skipMerging(); diff --git a/src/Processors/Transforms/AggregatingTransform.h b/src/Processors/Transforms/AggregatingTransform.h index 3abd2ac3346..048b69adae6 100644 --- a/src/Processors/Transforms/AggregatingTransform.h +++ b/src/Processors/Transforms/AggregatingTransform.h @@ -6,6 +6,13 @@ #include #include #include +#include + +namespace CurrentMetrics +{ + extern const Metric DestroyAggregatesThreads; + extern const Metric DestroyAggregatesThreadsActive; +} namespace DB { @@ -84,7 +91,10 @@ struct ManyAggregatedData // Aggregation states destruction may be very time-consuming. // In the case of a query with LIMIT, most states won't be destroyed during conversion to blocks. // Without the following code, they would be destroyed in the destructor of AggregatedDataVariants in the current thread (i.e. sequentially). - const auto pool = std::make_unique(variants.size()); + const auto pool = std::make_unique( + CurrentMetrics::DestroyAggregatesThreads, + CurrentMetrics::DestroyAggregatesThreadsActive, + variants.size()); for (auto && variant : variants) { diff --git a/src/Storages/Hive/StorageHive.cpp b/src/Storages/Hive/StorageHive.cpp index 85e6341eb5a..71830e8bc86 100644 --- a/src/Storages/Hive/StorageHive.cpp +++ b/src/Storages/Hive/StorageHive.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -40,6 +41,11 @@ #include #include +namespace CurrentMetrics +{ + extern const Metric StorageHiveThreads; + extern const Metric StorageHiveThreadsActive; +} namespace DB { @@ -844,7 +850,7 @@ HiveFiles StorageHive::collectHiveFiles( Int64 hive_max_query_partitions = context_->getSettings().max_partitions_to_read; /// Mutext to protect hive_files, which maybe appended in multiple threads std::mutex hive_files_mutex; - ThreadPool pool{max_threads}; + ThreadPool pool{CurrentMetrics::StorageHiveThreads, CurrentMetrics::StorageHiveThreadsActive, max_threads}; if (!partitions.empty()) { for (const auto & partition : partitions) diff --git a/src/Storages/MeiliSearch/StorageMeiliSearch.cpp b/src/Storages/MeiliSearch/StorageMeiliSearch.cpp index 62a6c471070..fe8cc74f577 100644 --- a/src/Storages/MeiliSearch/StorageMeiliSearch.cpp +++ b/src/Storages/MeiliSearch/StorageMeiliSearch.cpp @@ -99,7 +99,7 @@ Pipe StorageMeiliSearch::read( for (const auto & el : query_params->children) { auto str = el->getColumnName(); - auto it = find(str.begin(), str.end(), '='); + auto it = std::find(str.begin(), str.end(), '='); if (it == str.end()) throw Exception(ErrorCodes::BAD_QUERY_PARAMETER, "meiliMatch function must have parameters of the form \'key=value\'"); diff --git a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h index 5c47d20865b..a27fb18c0fe 100644 --- a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h +++ 
b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h @@ -21,6 +21,12 @@ #include +namespace CurrentMetrics +{ + extern const Metric MergeTreeBackgroundExecutorThreads; + extern const Metric MergeTreeBackgroundExecutorThreadsActive; +} + namespace DB { namespace ErrorCodes @@ -255,6 +261,7 @@ public: , max_tasks_count(max_tasks_count_) , metric(metric_) , max_tasks_metric(max_tasks_metric_, 2 * max_tasks_count) // active + pending + , pool(CurrentMetrics::MergeTreeBackgroundExecutorThreads, CurrentMetrics::MergeTreeBackgroundExecutorThreadsActive) { if (max_tasks_count == 0) throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "Task count for MergeTreeBackgroundExecutor must not be zero"); diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index fef6ccbdaeb..f9848b572f9 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -117,6 +118,10 @@ namespace ProfileEvents namespace CurrentMetrics { extern const Metric DelayedInserts; + extern const Metric MergeTreePartsLoaderThreads; + extern const Metric MergeTreePartsLoaderThreadsActive; + extern const Metric MergeTreePartsCleanerThreads; + extern const Metric MergeTreePartsCleanerThreadsActive; } @@ -1567,7 +1572,7 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) } } - ThreadPool pool(disks.size()); + ThreadPool pool(CurrentMetrics::MergeTreePartsLoaderThreads, CurrentMetrics::MergeTreePartsLoaderThreadsActive, disks.size()); std::vector parts_to_load_by_disk(disks.size()); for (size_t i = 0; i < disks.size(); ++i) @@ -2299,7 +2304,7 @@ void MergeTreeData::clearPartsFromFilesystemImpl(const DataPartsVector & parts_t /// Parallel parts removal. size_t num_threads = std::min(settings->max_part_removal_threads, parts_to_remove.size()); std::mutex part_names_mutex; - ThreadPool pool(num_threads); + ThreadPool pool(CurrentMetrics::MergeTreePartsCleanerThreads, CurrentMetrics::MergeTreePartsCleanerThreadsActive, num_threads); bool has_zero_copy_parts = false; if (settings->allow_remote_fs_zero_copy_replication && dynamic_cast(this) != nullptr) diff --git a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp index ff8862f0f36..2039912106c 100644 --- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp +++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp @@ -35,6 +35,7 @@ #include #include +#include #include #include #include @@ -46,6 +47,12 @@ #include +namespace CurrentMetrics +{ + extern const Metric MergeTreeDataSelectExecutorThreads; + extern const Metric MergeTreeDataSelectExecutorThreadsActive; +} + namespace DB { @@ -1077,7 +1084,10 @@ RangesInDataParts MergeTreeDataSelectExecutor::filterPartsByPrimaryKeyAndSkipInd else { /// Parallel loading of data parts. 
- ThreadPool pool(num_threads); + ThreadPool pool( + CurrentMetrics::MergeTreeDataSelectExecutorThreads, + CurrentMetrics::MergeTreeDataSelectExecutorThreadsActive, + num_threads); for (size_t part_index = 0; part_index < parts.size(); ++part_index) pool.scheduleOrThrowOnError([&, part_index, thread_group = CurrentThread::getGroup()] diff --git a/src/Storages/StorageDistributed.cpp b/src/Storages/StorageDistributed.cpp index 0be4ae3a79f..4dddd22fae5 100644 --- a/src/Storages/StorageDistributed.cpp +++ b/src/Storages/StorageDistributed.cpp @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -126,6 +127,12 @@ namespace ProfileEvents extern const Event DistributedDelayedInsertsMilliseconds; } +namespace CurrentMetrics +{ + extern const Metric StorageDistributedThreads; + extern const Metric StorageDistributedThreadsActive; +} + namespace DB { @@ -1436,7 +1443,7 @@ void StorageDistributed::initializeFromDisk() const auto & disks = data_volume->getDisks(); /// Make initialization for large number of disks parallel. - ThreadPool pool(disks.size()); + ThreadPool pool(CurrentMetrics::StorageDistributedThreads, CurrentMetrics::StorageDistributedThreadsActive, disks.size()); for (const DiskPtr & disk : disks) { diff --git a/src/Storages/StorageMerge.cpp b/src/Storages/StorageMerge.cpp index 0f7b47255f6..00ccfebff25 100644 --- a/src/Storages/StorageMerge.cpp +++ b/src/Storages/StorageMerge.cpp @@ -655,7 +655,9 @@ QueryPipelineBuilderPtr ReadFromMerge::createSources( if (real_column_names.empty()) real_column_names.push_back(ExpressionActions::getSmallestColumn(storage_snapshot->metadata->getColumns().getAllPhysical()).name); - QueryPlan plan; + /// Steps for reading from child tables should have the same lifetime as the current step + /// because `builder` can have references to them (mainly for EXPLAIN PIPELINE). + QueryPlan & plan = child_plans.emplace_back(); StorageView * view = dynamic_cast(storage.get()); if (!view || allow_experimental_analyzer) diff --git a/src/Storages/StorageMerge.h b/src/Storages/StorageMerge.h index 4bc47375047..d53dcd34f5f 100644 --- a/src/Storages/StorageMerge.h +++ b/src/Storages/StorageMerge.h @@ -159,6 +159,10 @@ private: StoragePtr storage_merge; StorageSnapshotPtr merge_storage_snapshot; + /// Store read plan for each child table. + /// It's needed to guarantee lifetime for child steps to be the same as for this step. 
+ std::vector child_plans; + SelectQueryInfo query_info; ContextMutablePtr context; QueryProcessingStage::Enum common_processed_stage; diff --git a/src/Storages/StorageS3.cpp b/src/Storages/StorageS3.cpp index e17860af288..c7d2915f32f 100644 --- a/src/Storages/StorageS3.cpp +++ b/src/Storages/StorageS3.cpp @@ -29,7 +29,6 @@ #include #include #include -#include #include #include @@ -53,8 +52,10 @@ #include +#include #include #include +#include #include #include @@ -65,6 +66,12 @@ namespace fs = std::filesystem; +namespace CurrentMetrics +{ + extern const Metric StorageS3Threads; + extern const Metric StorageS3ThreadsActive; +} + namespace ProfileEvents { extern const Event S3DeleteObjects; @@ -150,7 +157,7 @@ public: , object_infos(object_infos_) , read_keys(read_keys_) , request_settings(request_settings_) - , list_objects_pool(1) + , list_objects_pool(CurrentMetrics::StorageS3Threads, CurrentMetrics::StorageS3ThreadsActive, 1) , list_objects_scheduler(threadPoolCallbackRunner(list_objects_pool, "ListObjects")) { if (globbed_uri.bucket.find_first_of("*?{") != globbed_uri.bucket.npos) @@ -574,7 +581,7 @@ StorageS3Source::StorageS3Source( , requested_virtual_columns(requested_virtual_columns_) , file_iterator(file_iterator_) , download_thread_num(download_thread_num_) - , create_reader_pool(1) + , create_reader_pool(CurrentMetrics::StorageS3Threads, CurrentMetrics::StorageS3ThreadsActive, 1) , create_reader_scheduler(threadPoolCallbackRunner(create_reader_pool, "CreateS3Reader")) { reader = createReader(); diff --git a/src/Storages/System/StorageSystemReplicas.cpp b/src/Storages/System/StorageSystemReplicas.cpp index 65878d356f4..240d452fe29 100644 --- a/src/Storages/System/StorageSystemReplicas.cpp +++ b/src/Storages/System/StorageSystemReplicas.cpp @@ -7,12 +7,19 @@ #include #include #include -#include #include #include +#include +#include #include +namespace CurrentMetrics +{ + extern const Metric SystemReplicasThreads; + extern const Metric SystemReplicasThreadsActive; +} + namespace DB { @@ -160,7 +167,7 @@ Pipe StorageSystemReplicas::read( if (settings.max_threads != 0) thread_pool_size = std::min(thread_pool_size, static_cast(settings.max_threads)); - ThreadPool thread_pool(thread_pool_size); + ThreadPool thread_pool(CurrentMetrics::SystemReplicasThreads, CurrentMetrics::SystemReplicasThreadsActive, thread_pool_size); for (size_t i = 0; i < tables_size; ++i) { diff --git a/src/Storages/tests/gtest_transform_query_for_external_database.cpp b/src/Storages/tests/gtest_transform_query_for_external_database.cpp index 3eff0a8ba70..5c1442ece11 100644 --- a/src/Storages/tests/gtest_transform_query_for_external_database.cpp +++ b/src/Storages/tests/gtest_transform_query_for_external_database.cpp @@ -106,22 +106,9 @@ private: StorageID(db_name, table_name), ColumnsDescription{tab.columns}, ConstraintsDescription{}, String{})); } DatabaseCatalog::instance().attachDatabase(database->getDatabaseName(), database); - // DatabaseCatalog::instance().attachDatabase("system", mockSystemDatabase()); context->setCurrentDatabase("test"); } - - DatabasePtr mockSystemDatabase() - { - DatabasePtr database = std::make_shared("system", context); - auto tab = TableWithColumnNamesAndTypes(createDBAndTable("one", "system"), { {"dummy", std::make_shared()} }); - database->attachTable(context, tab.table.table, - std::make_shared( - StorageID(tab.table.database, tab.table.table), - ColumnsDescription{tab.columns}, ConstraintsDescription{}, String{})); - - return database; - } }; static void checkOld( diff --git 
a/tests/ci/build_check.py b/tests/ci/build_check.py index 4db601be803..a829069985d 100644 --- a/tests/ci/build_check.py +++ b/tests/ci/build_check.py @@ -6,15 +6,12 @@ import json import os import sys import time -from shutil import rmtree from typing import List, Tuple -from ccache_utils import get_ccache_if_not_exists, upload_ccache from ci_config import CI_CONFIG, BuildConfig from commit_status_helper import get_commit_filtered_statuses, get_commit from docker_pull_helper import get_image_with_version from env_helper import ( - CACHES_PATH, GITHUB_JOB, IMAGES_PATH, REPO_COPY, @@ -54,7 +51,6 @@ def get_packager_cmd( output_path: str, build_version: str, image_version: str, - ccache_path: str, official: bool, ) -> str: package_type = build_config["package_type"] @@ -72,8 +68,9 @@ def get_packager_cmd( if build_config["tidy"] == "enable": cmd += " --clang-tidy" - cmd += " --cache=ccache" - cmd += f" --ccache_dir={ccache_path}" + cmd += " --cache=sccache" + cmd += " --s3-rw-access" + cmd += f" --s3-bucket={S3_BUILDS_BUCKET}" if "additional_pkgs" in build_config and build_config["additional_pkgs"]: cmd += " --additional-pkgs" @@ -314,29 +311,12 @@ def main(): if not os.path.exists(build_output_path): os.makedirs(build_output_path) - ccache_path = os.path.join(CACHES_PATH, build_name + "_ccache") - - logging.info("Will try to fetch cache for our build") - try: - get_ccache_if_not_exists( - ccache_path, s3_helper, pr_info.number, TEMP_PATH, pr_info.release_pr - ) - except Exception as e: - # In case there are issues with ccache, remove the path and do not fail a build - logging.info("Failed to get ccache, building without it. Error: %s", e) - rmtree(ccache_path, ignore_errors=True) - - if not os.path.exists(ccache_path): - logging.info("cache was not fetched, will create empty dir") - os.makedirs(ccache_path) - packager_cmd = get_packager_cmd( build_config, os.path.join(REPO_COPY, "docker/packager"), build_output_path, version.string, image_version, - ccache_path, official_flag, ) @@ -352,13 +332,8 @@ def main(): subprocess.check_call( f"sudo chown -R ubuntu:ubuntu {build_output_path}", shell=True ) - subprocess.check_call(f"sudo chown -R ubuntu:ubuntu {ccache_path}", shell=True) logging.info("Build finished with %s, log path %s", success, log_path) - # Upload the ccache first to have the least build time in case of problems - logging.info("Will upload cache") - upload_ccache(ccache_path, s3_helper, pr_info.number, TEMP_PATH) - # FIXME performance performance_urls = [] performance_path = os.path.join(build_output_path, "performance.tar.zst") diff --git a/tests/ci/fast_test_check.py b/tests/ci/fast_test_check.py index a5bb64889d1..54367f70b3f 100644 --- a/tests/ci/fast_test_check.py +++ b/tests/ci/fast_test_check.py @@ -11,7 +11,6 @@ from typing import List, Tuple from github import Github -from ccache_utils import get_ccache_if_not_exists, upload_ccache from clickhouse_helper import ( ClickHouseHelper, mark_flaky_tests, @@ -22,7 +21,7 @@ from commit_status_helper import ( update_mergeable_check, ) from docker_pull_helper import get_image_with_version -from env_helper import CACHES_PATH, TEMP_PATH +from env_helper import S3_BUILDS_BUCKET, TEMP_PATH from get_robot_token import get_best_robot_token from pr_info import FORCE_TESTS_LABEL, PRInfo from report import TestResults, read_test_results @@ -38,24 +37,22 @@ NAME = "Fast test" csv.field_size_limit(sys.maxsize) -def get_fasttest_cmd( - workspace, output_path, ccache_path, repo_path, pr_number, commit_sha, image -): +def 
get_fasttest_cmd(workspace, output_path, repo_path, pr_number, commit_sha, image): return ( f"docker run --cap-add=SYS_PTRACE " + "--network=host " # required to get access to IAM credentials f"-e FASTTEST_WORKSPACE=/fasttest-workspace -e FASTTEST_OUTPUT=/test_output " f"-e FASTTEST_SOURCE=/ClickHouse --cap-add=SYS_PTRACE " + f"-e FASTTEST_CMAKE_FLAGS='-DCOMPILER_CACHE=sccache' " f"-e PULL_REQUEST_NUMBER={pr_number} -e COMMIT_SHA={commit_sha} " f"-e COPY_CLICKHOUSE_BINARY_TO_OUTPUT=1 " + f"-e SCCACHE_BUCKET={S3_BUILDS_BUCKET} -e SCCACHE_S3_KEY_PREFIX=ccache/sccache " f"--volume={workspace}:/fasttest-workspace --volume={repo_path}:/ClickHouse " - f"--volume={output_path}:/test_output " - f"--volume={ccache_path}:/fasttest-workspace/ccache {image}" + f"--volume={output_path}:/test_output {image}" ) -def process_results( - result_folder: str, -) -> Tuple[str, str, TestResults, List[str]]: +def process_results(result_folder: str) -> Tuple[str, str, TestResults, List[str]]: test_results = [] # type: TestResults additional_files = [] # Just upload all files from result_folder. @@ -129,21 +126,6 @@ def main(): if not os.path.exists(output_path): os.makedirs(output_path) - if not os.path.exists(CACHES_PATH): - os.makedirs(CACHES_PATH) - subprocess.check_call(f"sudo chown -R ubuntu:ubuntu {CACHES_PATH}", shell=True) - cache_path = os.path.join(CACHES_PATH, "fasttest") - - logging.info("Will try to fetch cache for our build") - ccache_for_pr = get_ccache_if_not_exists( - cache_path, s3_helper, pr_info.number, temp_path, pr_info.release_pr - ) - upload_master_ccache = ccache_for_pr in (-1, 0) - - if not os.path.exists(cache_path): - logging.info("cache was not fetched, will create empty dir") - os.makedirs(cache_path) - repo_path = os.path.join(temp_path, "fasttest-repo") if not os.path.exists(repo_path): os.makedirs(repo_path) @@ -151,7 +133,6 @@ def main(): run_cmd = get_fasttest_cmd( workspace, output_path, - cache_path, repo_path, pr_info.number, pr_info.sha, @@ -172,7 +153,6 @@ def main(): logging.info("Run failed") subprocess.check_call(f"sudo chown -R ubuntu:ubuntu {temp_path}", shell=True) - subprocess.check_call(f"sudo chown -R ubuntu:ubuntu {cache_path}", shell=True) test_output_files = os.listdir(output_path) additional_logs = [] @@ -202,12 +182,6 @@ def main(): else: state, description, test_results, additional_logs = process_results(output_path) - logging.info("Will upload cache") - upload_ccache(cache_path, s3_helper, pr_info.number, temp_path) - if upload_master_ccache: - logging.info("Will upload a fallback cache for master") - upload_ccache(cache_path, s3_helper, 0, temp_path) - ch_helper = ClickHouseHelper() mark_flaky_tests(ch_helper, NAME, test_results) diff --git a/tests/integration/test_backup_restore_on_cluster/test_disallow_concurrency.py b/tests/integration/test_backup_restore_on_cluster/test_disallow_concurrency.py index b04eaed4bb8..1fb1f807245 100644 --- a/tests/integration/test_backup_restore_on_cluster/test_disallow_concurrency.py +++ b/tests/integration/test_backup_restore_on_cluster/test_disallow_concurrency.py @@ -125,19 +125,29 @@ def test_concurrent_backups_on_same_node(): .query(f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name} ASYNC") .split("\t")[0] ) - assert_eq_with_retry( - nodes[0], - f"SELECT status FROM system.backups WHERE status == 'CREATING_BACKUP' AND id = '{id}'", - "CREATING_BACKUP", + + status = ( + nodes[0] + .query(f"SELECT status FROM system.backups WHERE id == '{id}'") + .rstrip("\n") ) - assert "Concurrent backups not supported" in 
nodes[0].query_and_get_error( + assert status in ["CREATING_BACKUP", "BACKUP_CREATED"] + + error = nodes[0].query_and_get_error( f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name}" ) + expected_errors = [ + "Concurrent backups not supported", + f"Backup {backup_name} already exists", + ] + assert any([expected_error in error for expected_error in expected_errors]) assert_eq_with_retry( nodes[0], - f"SELECT status FROM system.backups WHERE status == 'BACKUP_CREATED' AND id = '{id}'", + f"SELECT status FROM system.backups WHERE id = '{id}'", "BACKUP_CREATED", + sleep_time=2, + retry_count=50, ) # This restore part is added to confirm creating an internal backup & restore work @@ -161,18 +171,29 @@ def test_concurrent_backups_on_different_nodes(): .query(f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name} ASYNC") .split("\t")[0] ) - assert_eq_with_retry( - nodes[1], - f"SELECT status FROM system.backups WHERE status == 'CREATING_BACKUP' AND id = '{id}'", - "CREATING_BACKUP", + + status = ( + nodes[1] + .query(f"SELECT status FROM system.backups WHERE id == '{id}'") + .rstrip("\n") ) - assert "Concurrent backups not supported" in nodes[0].query_and_get_error( + assert status in ["CREATING_BACKUP", "BACKUP_CREATED"] + + error = nodes[0].query_and_get_error( f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name}" ) + expected_errors = [ + "Concurrent backups not supported", + f"Backup {backup_name} already exists", + ] + assert any([expected_error in error for expected_error in expected_errors]) + assert_eq_with_retry( nodes[1], - f"SELECT status FROM system.backups WHERE status == 'BACKUP_CREATED' AND id = '{id}'", + f"SELECT status FROM system.backups WHERE id = '{id}'", "BACKUP_CREATED", + sleep_time=2, + retry_count=50, ) @@ -181,22 +202,7 @@ def test_concurrent_restores_on_same_node(): backup_name = new_backup_name() - id = ( - nodes[0] - .query(f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name} ASYNC") - .split("\t")[0] - ) - assert_eq_with_retry( - nodes[0], - f"SELECT status FROM system.backups WHERE status == 'CREATING_BACKUP' AND id = '{id}'", - "CREATING_BACKUP", - ) - - assert_eq_with_retry( - nodes[0], - f"SELECT status FROM system.backups WHERE status == 'BACKUP_CREATED' AND id = '{id}'", - "BACKUP_CREATED", - ) + nodes[0].query(f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name}") nodes[0].query( f"DROP TABLE tbl ON CLUSTER 'cluster' NO DELAY", @@ -218,22 +224,21 @@ def test_concurrent_restores_on_same_node(): ) assert status in ["RESTORING", "RESTORED"] - concurrent_error = nodes[0].query_and_get_error( + error = nodes[0].query_and_get_error( f"RESTORE TABLE tbl ON CLUSTER 'cluster' FROM {backup_name}" ) - expected_errors = [ "Concurrent restores not supported", "Cannot restore the table default.tbl because it already contains some data", ] - assert any( - [expected_error in concurrent_error for expected_error in expected_errors] - ) + assert any([expected_error in error for expected_error in expected_errors]) assert_eq_with_retry( nodes[0], f"SELECT status FROM system.backups WHERE id == '{restore_id}'", "RESTORED", + sleep_time=2, + retry_count=50, ) @@ -242,22 +247,7 @@ def test_concurrent_restores_on_different_node(): backup_name = new_backup_name() - id = ( - nodes[0] - .query(f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name} ASYNC") - .split("\t")[0] - ) - assert_eq_with_retry( - nodes[0], - f"SELECT status FROM system.backups WHERE status == 'CREATING_BACKUP' AND id = '{id}'", - "CREATING_BACKUP", - ) - - assert_eq_with_retry( - nodes[0], - 
f"SELECT status FROM system.backups WHERE status == 'BACKUP_CREATED' AND id = '{id}'", - "BACKUP_CREATED", - ) + nodes[0].query(f"BACKUP TABLE tbl ON CLUSTER 'cluster' TO {backup_name}") nodes[0].query( f"DROP TABLE tbl ON CLUSTER 'cluster' NO DELAY", @@ -279,20 +269,19 @@ def test_concurrent_restores_on_different_node(): ) assert status in ["RESTORING", "RESTORED"] - concurrent_error = nodes[1].query_and_get_error( + error = nodes[1].query_and_get_error( f"RESTORE TABLE tbl ON CLUSTER 'cluster' FROM {backup_name}" ) - expected_errors = [ "Concurrent restores not supported", "Cannot restore the table default.tbl because it already contains some data", ] - assert any( - [expected_error in concurrent_error for expected_error in expected_errors] - ) + assert any([expected_error in error for expected_error in expected_errors]) assert_eq_with_retry( nodes[0], f"SELECT status FROM system.backups WHERE id == '{restore_id}'", "RESTORED", + sleep_time=2, + retry_count=50, ) diff --git a/tests/queries/0_stateless/00453_cast_enum.sql b/tests/queries/0_stateless/00453_cast_enum.sql index 384db50c7c4..023e7233acf 100644 --- a/tests/queries/0_stateless/00453_cast_enum.sql +++ b/tests/queries/0_stateless/00453_cast_enum.sql @@ -12,6 +12,6 @@ INSERT INTO cast_enums SELECT 2 AS type, toDate('2017-01-01') AS date, number AS SELECT type, date, id FROM cast_enums ORDER BY type, id; -INSERT INTO cast_enums VALUES ('wrong_value', '2017-01-02', 7); -- { clientError 36 } +INSERT INTO cast_enums VALUES ('wrong_value', '2017-01-02', 7); -- { clientError 691 } DROP TABLE IF EXISTS cast_enums; diff --git a/tests/queries/0_stateless/00849_multiple_comma_join_2.sql b/tests/queries/0_stateless/00849_multiple_comma_join_2.sql index db8b27c4d4d..51bf5a2ede1 100644 --- a/tests/queries/0_stateless/00849_multiple_comma_join_2.sql +++ b/tests/queries/0_stateless/00849_multiple_comma_join_2.sql @@ -62,49 +62,49 @@ SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explai --- EXPLAIN QUERY TREE SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2 WHERE t1.a = t2.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2 WHERE t1.a = t2.a SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2 WHERE t1.b = t2.b); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2 WHERE t1.b = t2.b SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3 WHERE t1.a = t2.a AND t1.a = t3.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3 WHERE t1.a = t2.a AND t1.a = t3.a SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3 WHERE t1.b = t2.b AND t1.b = t3.b); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3 WHERE t1.b = t2.b AND t1.b = t3.b SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t1.a = t3.a AND t1.a = t4.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t1.a = t3.a AND t1.a = t4.a SETTINGS allow_experimental_analyzer = 1); 
SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t1.b = t2.b AND t1.b = t3.b AND t1.b = t4.b); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t1.b = t2.b AND t1.b = t3.b AND t1.b = t4.b SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t2.a = t1.a AND t2.a = t3.a AND t2.a = t4.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t2.a = t1.a AND t2.a = t3.a AND t2.a = t4.a SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t3.a = t1.a AND t3.a = t2.a AND t3.a = t4.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t3.a = t1.a AND t3.a = t2.a AND t3.a = t4.a SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t4.a = t1.a AND t4.a = t2.a AND t4.a = t3.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t4.a = t1.a AND t4.a = t2.a AND t4.a = t3.a SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.a = t3.a AND t3.a = t4.a); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 WHERE t1.a = t2.a AND t2.a = t3.a AND t3.a = t4.a SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2, t3, t4 SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1 CROSS JOIN t2 CROSS JOIN t3 CROSS JOIN t4); + EXPLAIN QUERY TREE SELECT t1.a FROM t1 CROSS JOIN t2 CROSS JOIN t3 CROSS JOIN t4 SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2 CROSS JOIN t3); + EXPLAIN QUERY TREE SELECT t1.a FROM t1, t2 CROSS JOIN t3 SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1 JOIN t2 USING a CROSS JOIN t3); + EXPLAIN QUERY TREE SELECT t1.a FROM t1 JOIN t2 USING a CROSS JOIN t3 SETTINGS allow_experimental_analyzer = 1); SELECT countIf(explain like '%COMMA%' OR explain like '%CROSS%'), countIf(explain like '%INNER%') FROM ( - EXPLAIN QUERY TREE SELECT t1.a FROM t1 JOIN t2 ON t1.a = t2.a CROSS JOIN t3); + EXPLAIN QUERY TREE SELECT t1.a FROM t1 JOIN t2 ON t1.a = t2.a CROSS JOIN t3 SETTINGS allow_experimental_analyzer = 1); INSERT INTO t1 values (1,1), (2,2), (3,3), (4,4); INSERT INTO t2 values (1,1), (1, Null); diff --git a/tests/queries/0_stateless/01293_show_clusters.reference b/tests/queries/0_stateless/01293_show_clusters.reference index c62f8cdfa2d..9f8a44ebd0a 100644 --- a/tests/queries/0_stateless/01293_show_clusters.reference +++ 
b/tests/queries/0_stateless/01293_show_clusters.reference @@ -1,2 +1,3 @@ test_shard_localhost -test_shard_localhost 1 1 1 localhost ::1 9000 1 default +test_cluster_one_shard_two_replicas 1 1 1 127.0.0.1 127.0.0.1 9000 1 default +test_cluster_one_shard_two_replicas 1 1 2 127.0.0.2 127.0.0.2 9000 0 default diff --git a/tests/queries/0_stateless/01293_show_clusters.sh b/tests/queries/0_stateless/01293_show_clusters.sh index 2fdf17ec25e..ae027210383 100755 --- a/tests/queries/0_stateless/01293_show_clusters.sh +++ b/tests/queries/0_stateless/01293_show_clusters.sh @@ -6,4 +6,5 @@ CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT -q "show clusters like 'test_shard%' limit 1" # cluster,shard_num,shard_weight,replica_num,host_name,host_address,port,is_local,user,default_database[,errors_count,slowdowns_count,estimated_recovery_time] -$CLICKHOUSE_CLIENT -q "show cluster 'test_shard_localhost'" | cut -f-10 +# use a cluster with static IPv4 +$CLICKHOUSE_CLIENT -q "show cluster 'test_cluster_one_shard_two_replicas'" | cut -f-10 diff --git a/tests/queries/0_stateless/01293_show_settings.reference b/tests/queries/0_stateless/01293_show_settings.reference index f11956e1893..f053387d1c5 100644 --- a/tests/queries/0_stateless/01293_show_settings.reference +++ b/tests/queries/0_stateless/01293_show_settings.reference @@ -3,6 +3,6 @@ connect_timeout Seconds 10 connect_timeout_with_failover_ms Milliseconds 2000 connect_timeout_with_failover_secure_ms Milliseconds 3000 external_storage_connect_timeout_sec UInt64 10 +filesystem_prefetch_max_memory_usage UInt64 1073741824 max_untracked_memory UInt64 1048576 memory_profiler_step UInt64 1048576 -filesystem_prefetch_max_memory_usage UInt64 1073741824 diff --git a/tests/queries/0_stateless/01300_group_by_other_keys_having.sql b/tests/queries/0_stateless/01300_group_by_other_keys_having.sql index d209c5f24e3..911f61a62e2 100644 --- a/tests/queries/0_stateless/01300_group_by_other_keys_having.sql +++ b/tests/queries/0_stateless/01300_group_by_other_keys_having.sql @@ -1,5 +1,5 @@ set optimize_group_by_function_keys = 1; - +set allow_experimental_analyzer = 1; -- { echoOn } SELECT round(avg(log(2) * number), 6) AS k FROM numbers(10000000) GROUP BY (number % 2) * (number % 3), number % 3, number % 2 HAVING avg(log(2) * number) > 3465735.3 ORDER BY k; diff --git a/tests/queries/0_stateless/01310_enum_comparison.sql b/tests/queries/0_stateless/01310_enum_comparison.sql index 26901a61b2b..ed63911e698 100644 --- a/tests/queries/0_stateless/01310_enum_comparison.sql +++ b/tests/queries/0_stateless/01310_enum_comparison.sql @@ -3,4 +3,4 @@ INSERT INTO enum VALUES ('hello'); SELECT count() FROM enum WHERE x = 'hello'; SELECT count() FROM enum WHERE x = 'world'; -SELECT count() FROM enum WHERE x = 'xyz'; -- { serverError 36 } +SELECT count() FROM enum WHERE x = 'xyz'; -- { serverError 691 } diff --git a/tests/queries/0_stateless/01323_redundant_functions_in_order_by.reference b/tests/queries/0_stateless/01323_redundant_functions_in_order_by.reference index 91a96eb68a3..bf184d142ec 100644 --- a/tests/queries/0_stateless/01323_redundant_functions_in_order_by.reference +++ b/tests/queries/0_stateless/01323_redundant_functions_in_order_by.reference @@ -65,6 +65,7 @@ QUERY id: 0 SORT id: 12, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 7, column_name: number, result_type: UInt64, source_id: 8 + SETTINGS allow_experimental_analyzer=1 SELECT groupArray(x) FROM ( @@ -98,6 +99,7 @@ QUERY id: 0 SORT id: 12, sort_direction: ASCENDING, with_fill: 
0 EXPRESSION COLUMN id: 7, column_name: number, result_type: UInt64, source_id: 8 + SETTINGS allow_experimental_analyzer=1 SELECT groupArray(x) FROM ( @@ -139,6 +141,7 @@ QUERY id: 0 SORT id: 15, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 7, column_name: number, result_type: UInt64, source_id: 8 + SETTINGS allow_experimental_analyzer=1 SELECT key, a, @@ -200,6 +203,7 @@ QUERY id: 0 SORT id: 25, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 26, column_name: key, result_type: UInt64, source_id: 5 + SETTINGS allow_experimental_analyzer=1 SELECT key, a @@ -225,6 +229,7 @@ QUERY id: 0 SORT id: 7, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 4, column_name: a, result_type: UInt8, source_id: 3 + SETTINGS allow_experimental_analyzer=1 SELECT key, a @@ -257,6 +262,7 @@ QUERY id: 0 LIST id: 11, nodes: 2 COLUMN id: 2, column_name: key, result_type: UInt64, source_id: 3 COLUMN id: 4, column_name: a, result_type: UInt8, source_id: 3 + SETTINGS allow_experimental_analyzer=1 QUERY id: 0 PROJECTION COLUMNS key UInt64 @@ -279,6 +285,7 @@ QUERY id: 0 SORT id: 10, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 2, column_name: key, result_type: UInt64, source_id: 3 + SETTINGS allow_experimental_analyzer=1 QUERY id: 0 PROJECTION COLUMNS t1.id UInt64 @@ -307,6 +314,7 @@ QUERY id: 0 SORT id: 14, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 15, column_name: id, result_type: UInt64, source_id: 5 + SETTINGS allow_experimental_analyzer=1 [0,1,2] [0,1,2] [0,1,2] diff --git a/tests/queries/0_stateless/01323_redundant_functions_in_order_by.sql b/tests/queries/0_stateless/01323_redundant_functions_in_order_by.sql index 338c1345052..738ad581e3d 100644 --- a/tests/queries/0_stateless/01323_redundant_functions_in_order_by.sql +++ b/tests/queries/0_stateless/01323_redundant_functions_in_order_by.sql @@ -20,25 +20,25 @@ SELECT key, a FROM test ORDER BY key, a, exp(key + a) SETTINGS allow_experimenta SELECT key, a FROM test ORDER BY key, exp(key + a); SELECT key, a FROM test ORDER BY key, exp(key + a) SETTINGS allow_experimental_analyzer=1; EXPLAIN SYNTAX SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY x, exp(x)); -EXPLAIN QUERY TREE run_passes=1 SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY x, exp(x)); +EXPLAIN QUERY TREE run_passes=1 SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY x, exp(x)) settings allow_experimental_analyzer=1; EXPLAIN SYNTAX SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY x, exp(exp(x))); -EXPLAIN QUERY TREE run_passes=1 SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY x, exp(exp(x))); +EXPLAIN QUERY TREE run_passes=1 SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY x, exp(exp(x))) settings allow_experimental_analyzer=1; EXPLAIN SYNTAX SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY exp(x), x); -EXPLAIN QUERY TREE run_passes=1 SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY exp(x), x); +EXPLAIN QUERY TREE run_passes=1 SELECT groupArray(x) from (SELECT number as x FROM numbers(3) ORDER BY exp(x), x) settings allow_experimental_analyzer=1; EXPLAIN SYNTAX SELECT * FROM (SELECT number + 2 AS key FROM numbers(4)) s FULL JOIN test t USING(key) ORDER BY s.key, t.key; -EXPLAIN QUERY TREE run_passes=1 SELECT * FROM (SELECT number + 2 AS key FROM numbers(4)) s FULL JOIN test t USING(key) ORDER BY s.key, t.key; +EXPLAIN QUERY TREE run_passes=1 SELECT 
* FROM (SELECT number + 2 AS key FROM numbers(4)) s FULL JOIN test t USING(key) ORDER BY s.key, t.key settings allow_experimental_analyzer=1; EXPLAIN SYNTAX SELECT key, a FROM test ORDER BY key, a, exp(key + a); -EXPLAIN QUERY TREE run_passes=1 SELECT key, a FROM test ORDER BY key, a, exp(key + a); +EXPLAIN QUERY TREE run_passes=1 SELECT key, a FROM test ORDER BY key, a, exp(key + a) settings allow_experimental_analyzer=1; EXPLAIN SYNTAX SELECT key, a FROM test ORDER BY key, exp(key + a); -EXPLAIN QUERY TREE run_passes=1 SELECT key, a FROM test ORDER BY key, exp(key + a); -EXPLAIN QUERY TREE run_passes=1 SELECT key FROM test GROUP BY key ORDER BY avg(a), key; +EXPLAIN QUERY TREE run_passes=1 SELECT key, a FROM test ORDER BY key, exp(key + a) settings allow_experimental_analyzer=1; +EXPLAIN QUERY TREE run_passes=1 SELECT key FROM test GROUP BY key ORDER BY avg(a), key settings allow_experimental_analyzer=1; DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 (id UInt64) ENGINE = MergeTree() ORDER BY id; CREATE TABLE t2 (id UInt64) ENGINE = MergeTree() ORDER BY id; -EXPLAIN QUERY TREE run_passes=1 SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id ORDER BY t1.id, t2.id; +EXPLAIN QUERY TREE run_passes=1 SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id ORDER BY t1.id, t2.id settings allow_experimental_analyzer=1; set optimize_redundant_functions_in_order_by = 0; diff --git a/tests/queries/0_stateless/01402_cast_nullable_string_to_enum.sql b/tests/queries/0_stateless/01402_cast_nullable_string_to_enum.sql index b8b5370515a..1d445412381 100644 --- a/tests/queries/0_stateless/01402_cast_nullable_string_to_enum.sql +++ b/tests/queries/0_stateless/01402_cast_nullable_string_to_enum.sql @@ -5,8 +5,8 @@ SELECT CAST(CAST(NULL AS Nullable(String)) AS Nullable(Enum8('Hello' = 1))); SELECT CAST(CAST(NULL AS Nullable(FixedString(1))) AS Nullable(Enum8('Hello' = 1))); -- empty string still not acceptable -SELECT CAST(CAST('' AS Nullable(String)) AS Nullable(Enum8('Hello' = 1))); -- { serverError 36 } -SELECT CAST(CAST('' AS Nullable(FixedString(1))) AS Nullable(Enum8('Hello' = 1))); -- { serverError 36 } +SELECT CAST(CAST('' AS Nullable(String)) AS Nullable(Enum8('Hello' = 1))); -- { serverError 691 } +SELECT CAST(CAST('' AS Nullable(FixedString(1))) AS Nullable(Enum8('Hello' = 1))); -- { serverError 691 } -- non-Nullable Enum() still not acceptable SELECT CAST(CAST(NULL AS Nullable(String)) AS Enum8('Hello' = 1)); -- { serverError 349 } diff --git a/tests/queries/0_stateless/02481_analyzer_optimize_grouping_sets_keys.sql b/tests/queries/0_stateless/02481_analyzer_optimize_grouping_sets_keys.sql index b51233f734c..de9208ef009 100644 --- a/tests/queries/0_stateless/02481_analyzer_optimize_grouping_sets_keys.sql +++ b/tests/queries/0_stateless/02481_analyzer_optimize_grouping_sets_keys.sql @@ -1,3 +1,5 @@ +set allow_experimental_analyzer = 1; + EXPLAIN QUERY TREE run_passes=1 SELECT avg(log(2) * number) AS k FROM numbers(10000000) GROUP BY GROUPING SETS (((number % 2) * (number % 3)), number % 3, number % 2) diff --git a/tests/queries/0_stateless/02564_query_id_header.reference b/tests/queries/0_stateless/02564_query_id_header.reference index 413e8929f36..fa56fc23e3e 100644 --- a/tests/queries/0_stateless/02564_query_id_header.reference +++ b/tests/queries/0_stateless/02564_query_id_header.reference @@ -20,3 +20,7 @@ DROP TABLE t_query_id_header < Content-Type: text/plain; charset=UTF-8 < X-ClickHouse-Query-Id: query_id < X-ClickHouse-Timezone: timezone +BAD SQL +< Content-Type: text/plain; 
charset=UTF-8 +< X-ClickHouse-Query-Id: query_id +< X-ClickHouse-Timezone: timezone diff --git a/tests/queries/0_stateless/02564_query_id_header.sh b/tests/queries/0_stateless/02564_query_id_header.sh index 67ddbcfcc46..7184422a030 100755 --- a/tests/queries/0_stateless/02564_query_id_header.sh +++ b/tests/queries/0_stateless/02564_query_id_header.sh @@ -28,3 +28,4 @@ run_and_check_headers "INSERT INTO t_query_id_header VALUES (1)" run_and_check_headers "EXISTS TABLE t_query_id_header" run_and_check_headers "SELECT * FROM t_query_id_header" run_and_check_headers "DROP TABLE t_query_id_header" +run_and_check_headers "BAD SQL" diff --git a/tests/queries/0_stateless/02675_is_ipv6_function_fix.reference b/tests/queries/0_stateless/02675_is_ipv6_function_fix.reference new file mode 100644 index 00000000000..573541ac970 --- /dev/null +++ b/tests/queries/0_stateless/02675_is_ipv6_function_fix.reference @@ -0,0 +1 @@ +0 diff --git a/tests/queries/0_stateless/02675_is_ipv6_function_fix.sql b/tests/queries/0_stateless/02675_is_ipv6_function_fix.sql new file mode 100644 index 00000000000..c28b4a5dc2d --- /dev/null +++ b/tests/queries/0_stateless/02675_is_ipv6_function_fix.sql @@ -0,0 +1 @@ +SELECT isIPv6String('1234::1234:'); \ No newline at end of file diff --git a/tests/queries/0_stateless/02681_aggregation_by_partitions_bug.reference b/tests/queries/0_stateless/02681_aggregation_by_partitions_bug.reference new file mode 100644 index 00000000000..749fce669df --- /dev/null +++ b/tests/queries/0_stateless/02681_aggregation_by_partitions_bug.reference @@ -0,0 +1 @@ +1000000 diff --git a/tests/queries/0_stateless/02681_aggregation_by_partitions_bug.sql b/tests/queries/0_stateless/02681_aggregation_by_partitions_bug.sql new file mode 100644 index 00000000000..32b4b55076b --- /dev/null +++ b/tests/queries/0_stateless/02681_aggregation_by_partitions_bug.sql @@ -0,0 +1,10 @@ +-- Tags: no-random-merge-tree-settings + +set max_threads = 16; + +create table t(a UInt32) engine=MergeTree order by tuple() partition by a % 16; + +insert into t select * from numbers_mt(1e6); + +set allow_aggregate_partitions_independently=1, force_aggregate_partitions_independently=1; +select count(distinct a) from t; diff --git a/tests/queries/0_stateless/02702_allow_skip_errors_enum.reference b/tests/queries/0_stateless/02702_allow_skip_errors_enum.reference new file mode 100644 index 00000000000..f9264f7fbd3 --- /dev/null +++ b/tests/queries/0_stateless/02702_allow_skip_errors_enum.reference @@ -0,0 +1,2 @@ +Hello +World diff --git a/tests/queries/0_stateless/02702_allow_skip_errors_enum.sh b/tests/queries/0_stateless/02702_allow_skip_errors_enum.sh new file mode 100755 index 00000000000..e68f5517d52 --- /dev/null +++ b/tests/queries/0_stateless/02702_allow_skip_errors_enum.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + +$CLICKHOUSE_CLIENT --multiquery --query "DROP TABLE IF EXISTS t; CREATE TABLE t (x Enum('Hello' = 1, 'World' = 2)) ENGINE = Memory;" +$CLICKHOUSE_CLIENT --input_format_allow_errors_num 1 --query "INSERT INTO t FORMAT CSV" < #include #include +#include #include #include "Stats.h" @@ -15,6 +16,12 @@ using Ports = std::vector; using Strings = std::vector; +namespace CurrentMetrics +{ + extern const Metric LocalThread; + extern const Metric LocalThreadActive; +} + class Runner { public: @@ -27,7 +34,7 @@ public: bool continue_on_error_, size_t max_iterations_) : concurrency(concurrency_) - , pool(concurrency) + , pool(CurrentMetrics::LocalThread, CurrentMetrics::LocalThreadActive, concurrency) , hosts_strings(hosts_strings_) , generator(getGenerator(generator_name)) , max_time(max_time_)