ClickHouse/docs/changelogs/v21.9.1.8000-prestable.md

sidebar_position: 1
sidebar_label: 2022

2022 Changelog

ClickHouse release v21.9.1.8000-prestable FIXME as compared to v21.8.1.7409-prestable

Backward Incompatible Change

  • Fix a bad cast that may happen in some sophisticated queries with column aliases identical to the names of expressions. This fixes #25447. This fixes #26914. This fix may introduce backward incompatibility: if there are different expressions with identical names, an exception will be thrown. It may break some rare cases when enable_optimize_predicate_expression is set. #26639 (Alexey Milovidov).
  • Under clickhouse-local, always treat local addresses with a port as remote. #26736 (Raúl Marín).
  • Do not allow applying a parametric aggregate function with the -Merge combinator to an aggregate function state if the state was produced by an aggregate function with different parameters. For example, the state of fooState(42)(x) cannot be finalized with fooMerge(s) or fooMerge(123)(s); parameters must be specified explicitly, as in fooMerge(42)(s), and must be equal. This does not affect some special aggregate functions like quantile and sequence* that use parameters for finalization only. #26847 (Alexander Tokmakov).
  • Do not output trailing zeros in text representation of Decimal types. Example: 1.23 will be printed instead of 1.230000 for decimal with scale 6. This closes #15794. It may introduce slight incompatibility if your applications somehow relied on the trailing zeros. Serialization in output formats can be controlled with the setting output_format_decimal_trailing_zeros. Implementation of toString and casting to String is changed unconditionally. #27680 (Alexey Milovidov).
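
The Decimal change above can be illustrated with a small model. This is a minimal Python sketch, not ClickHouse code: the function name format_decimal and its trailing_zeros flag are hypothetical and only mimic the behaviour described for output_format_decimal_trailing_zeros. It assumes a Decimal is stored as an unscaled integer plus a scale, and that a value with an all-zero fraction is printed without a decimal point.

```python
def format_decimal(unscaled: int, scale: int, trailing_zeros: bool = False) -> str:
    """Format an unscaled integer with the given scale as decimal text.

    Hypothetical model of the changelog entry: with trailing_zeros=False
    the fractional part is stripped of trailing zeros.
    """
    sign = "-" if unscaled < 0 else ""
    # Left-pad so there is at least one digit before the decimal point.
    digits = str(abs(unscaled)).rjust(scale + 1, "0")
    if scale:
        whole, frac = digits[:-scale], digits[-scale:]
    else:
        whole, frac = digits, ""
    if not trailing_zeros:
        frac = frac.rstrip("0")
    return sign + whole + ("." + frac if frac else "")


# 1.23 stored with scale 6 is the unscaled integer 1230000.
print(format_decimal(1230000, 6))                       # new default style
print(format_decimal(1230000, 6, trailing_zeros=True))  # previous style
```

Applications that parse the old fixed-width text (for example, by splitting on a known number of fractional digits) are the ones most likely to notice the difference.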

New Feature

  • Implement window function nth_value(expr, N) that returns the value of the Nth row of the window frame. #26334 (Zuo, RuoYu).
  • Add functions that return the (initial_)query_id of the current query. This closes #23682. #26410 (Alexey Boykov).
  • Introduce syntax for here documents. Example: SELECT $doc$VALUE$doc$. #26671 (Maksim Kita).
  • New functions currentProfiles(), enabledProfiles(), defaultProfiles(). #26714 (Vitaly Baranov).
  • Add new functions currentRoles(), enabledRoles(), defaultRoles(). #26780 (Vitaly Baranov).
  • Support cluster macros inside the table functions cluster and clusterAllReplicas. #26913 (Vadim Volodin).
  • Added support for custom query for MySQL, PostgreSQL, ClickHouse, JDBC, Cassandra dictionary source. Closes #1270. #26995 (Maksim Kita).
  • Add column default_database to system.users. #27054 (kevin wan).
  • Added bitmapSubsetOffsetLimit(bitmap, offset, cardinality_limit) function. It creates a subset of the bitmap that starts at position offset and contains at most cardinality_limit elements. #27234 (DHBin).
  • Add support for bzip2 compression method for import/export. Closes #22428. #27377 (Nikolay Degterinsky).
  • Add replicated storage of users, roles, row policies, quotas and settings profiles through ZooKeeper (experimental). #27426 (Kevin Michel).
  • Add "tupleToNameValuePairs", a function that turns a named tuple into an array of pairs. #27505 (Braulio Valdivielso Martínez).
  • Enable using constants from WITH and SELECT in aggregate function parameters. Closes #10945. #27531 (abel-cheng).
  • Added ComplexKeyRangeHashed dictionary. Closes #22029. #27629 (Maksim Kita).
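
For bitmapSubsetOffsetLimit above, a rough Python model may help with the argument order. This is a sketch under assumptions, not the ClickHouse implementation: it treats the bitmap as a set of integers in ascending order, takes offset as a count of elements to skip (not a value), and returns at most cardinality_limit elements. The Python function name is illustrative only.

```python
def bitmap_subset_offset_limit(bitmap, offset, cardinality_limit):
    """Model: skip `offset` smallest elements, keep the next `cardinality_limit`."""
    # A bitmap holds distinct values; iterate them in ascending order.
    ordered = sorted(set(bitmap))
    return ordered[offset:offset + cardinality_limit]


values = list(range(10)) + [100, 200]
print(bitmap_subset_offset_limit(values, 3, 4))
print(bitmap_subset_offset_limit(values, 10, 5))
```

When fewer than cardinality_limit elements remain after the offset, the model simply returns the shorter remainder.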

Performance Improvement

  • Compile aggregate functions groupBitOr, groupBitAnd, groupBitXor. #26161 (Maksim Kita).
  • Compile columns with Enum types. #26237 (Maksim Kita).
  • Don't build sets for indices when analyzing a query. #26365 (Raúl Marín).
  • Improve latency of short queries that require reading from tables with a large number of columns. #26371 (Anton Popov).
  • Share file descriptors in concurrent reads of the same files. There is no noticeable performance difference on Linux. But the number of opened files will be significantly (10..100 times) lower on typical servers and it makes operations easier. See #26214. #26768 (Alexey Milovidov).
  • Specialize date time related comparison to achieve better performance. This fixes #27083. #27122 (Amos Bird).
  • Improve the performance of fast queries when max_execution_time=0 by reducing the number of clock_gettime system calls. #27325 (filimonov).
  • Reduce the number of clock_gettime syscalls, which may improve performance for some types of fast queries. #27492 (filimonov).

Improvement

  • Add error id (like BAD_ARGUMENTS) to exception messages. This closes #25862. #26172 (Alexey Milovidov).
  • Remove the GLOBAL keyword for IN when a scalar function is passed. In previous versions, if the user specified GLOBAL IN f(x), an exception was thrown. #26217 (Amos Bird).
  • Apply aggressive IN index analysis for projections so that better projection candidate can be selected. #26218 (Amos Bird).
  • Convert timestamp and timestamptz data types to DateTime64 in the PostgreSQL engine. #26234 (jasine).
  • Check for non-deterministic functions in keys, including constant expressions like now(), today(). This closes #25875. This closes #11333. #26235 (Alexey Milovidov).
  • Don't throw exception when querying system.detached_parts table if there is custom disk configuration and detached directory does not exist on some disks. This closes #26078. #26236 (Alexey Milovidov).
  • Add information about column sizes in system.columns table for Log and TinyLog tables. This closes #9001. #26241 (Nikolay Degterinsky).
  • Added output_format_avro_string_column_pattern setting to put specified String columns to Avro as string instead of default bytes. Implements #22414. #26245 (Ilya Golshtein).
  • Check the hash function at table creation, not at sampling. Add a setting in MergeTreeSettings: if a table was created with an incorrect sampling column but sampling is never used, disable this setting to start the server without an exception. #26256 (zhaoyu).
  • Make toTimeZone monotonic when the time zone is a constant value, to support partition pruning when using SQL like:. #26261 (huangzhaowei).
  • When a client connects to the server, it receives information about all warnings that the server has already collected. (This can be disabled with the --no-warnings option.) #26282 (Filatenkov Artur).
  • Add a setting function_range_max_elements_in_block to tune the safety threshold for data volume generated by function range. This closes #26303. #26305 (Alexey Milovidov).
  • Control the execution period of clearing old temporary directories with a parameter that has a default value. #26212. #26313 (fastio).
  • Allow to reuse connections of shards among different clusters. It also avoids creating new connections when using cluster table function. #26318 (Amos Bird).
  • Add events to profile calls to sleep / sleepEachRow. #26320 (Raúl Marín).
  • Save server address in history URLs in web UI if it differs from the origin of web UI. This closes #26044. #26322 (Alexey Milovidov).
  • Add ability to set Distributed directory monitor settings via CREATE TABLE (e.g. CREATE TABLE dist (key Int) Engine=Distributed(cluster, db, table) SETTINGS monitor_batch_inserts=1 and similar). #26336 (Azat Khuzhin).
  • Fix behaviour with non-existing host in user allowed host list. #26368 (ianton-ru).
  • Added comments for the code written in https://github.com/ClickHouse/ClickHouse/pull/24206; the code has been improved in several places. #26377 (Vitaly Baranov).
  • Enable use_hedged_requests setting that allows to mitigate tail latencies on large clusters. #26380 (Alexey Milovidov).
  • Updated protobuf to 3.17.3. Changelogs are available on https://github.com/protocolbuffers/protobuf/releases. #26424 (Ilya Yatsishin).
  • Follow-up to https://github.com/ClickHouse/ClickHouse/pull/26377. The encryption algorithm must now be specified explicitly if it is not the default (aes_128_ctr):. #26465 (Vitaly Baranov).
  • Apply LIMIT on the shards for queries like SELECT * FROM dist ORDER BY key LIMIT 10 with distributed_push_down_limit=1. Avoid running Distinct/LIMIT BY steps for queries like SELECT DISTINCT sharding_key FROM dist ORDER BY key. Now distributed_push_down_limit is respected by the optimize_distributed_group_by_sharding_key optimization. #26466 (Azat Khuzhin).
  • Allow creating executable dictionaries (ExecutableDictionarySource, ExecutablePoolDictionarySource) with a DDL query when using clickhouse-local. Closes #22355. #26510 (Maksim Kita).
  • Add round-robin support for clickhouse-benchmark (it does not differ from the regular multi host/port run except for statistics report). #26607 (Azat Khuzhin).
  • Improve use of the Kafka engine on high-performance machines; this can reduce the workload on the query node. #26642 (feihengye).
  • Avoid hanging clickhouse-benchmark if connection fails (e.g. on EMFILE). #26656 (Azat Khuzhin).
  • Fix excessive (x2) connect attempts with skip_unavailable_shards. #26658 (Azat Khuzhin).
  • Improve handling of KILL QUERY requests. #26675 (Raúl Marín).
  • SET PROFILE now applies constraints too if they're set for a passed profile. #26730 (Vitaly Baranov).
  • Support multiple keys for encrypted disk. Display error message if the key is probably wrong. (see https://github.com/ClickHouse/ClickHouse/pull/26465#issuecomment-882015970). #26733 (Vitaly Baranov).
  • Remove an unnecessary thrown exception. #26740 (Caspian).
  • Watchdog is disabled in docker by default. Fix for not handling ctrl+c. #26757 (Mikhail f. Shiryaev).
  • Changing default roles affects new sessions only. #26759 (Vitaly Baranov).
  • Less verbose internal RocksDB logs. This closes #26252. #26789 (Alexey Milovidov).
  • Expose rocksdb statistics via system.rocksdb table. Read rocksdb options from ClickHouse config (rocksdb/rocksdb_TABLE keys). #26821 (Azat Khuzhin).
  • Updated extractAllGroupsHorizontal - upper limit on the number of matches per row can be set via optional third argument. ... #26961 (Vasily Nemkov).
  • Now functions can be shard-level constants, which means if it's executed in the context of some distributed table, it generates a normal column, otherwise it produces a constant value. Notable functions are: hostName(), tcpPort(), version(), buildId(), uptime(), etc. #27020 (Amos Bird).
  • Improve compatibility with non-whole-minute timezone offsets. #27080 (Raúl Marín).
  • Enable distributed_push_down_limit by default. #27104 (Azat Khuzhin).
  • Improve the existence check and the empty-string node check when clickhouse-keeper creates a znode. #27125 (小路).
  • Add compression for INTO OUTFILE that automatically chooses the compression algorithm. Closes #3473. #27134 (Filatenkov Artur).
  • Add a new metric called MaxPushedDDLEntryID, which is the maximum DDL entry ID that the current node has pushed to ZooKeeper. #27174 (Fuwang Hu).
  • Allow to pass query settings via server URI in Web UI. #27177 (kolsys).
  • Added column replica_is_active, which maps each replica name to whether that replica is active, to the system.replicas table. Closes #27138. #27180 (Maksim Kita).
  • Try recording query_kind even when query fails to start. #27182 (Amos Bird).
  • Mark window functions as ready for general use. Remove the allow_experimental_window_functions setting. #27184 (Alexander Kuzmenkov).
  • Memory client in client. #27191 (Filatenkov Artur).
  • Support schema for the PostgreSQL database engine. Closes #27166. #27198 (Kseniia Sumarokova).
  • Split global mutex into individual regexp construction. This helps avoid huge regexp construction blocking other related threads. Not sure how to properly test the improvement. #27211 (Amos Bird).
  • Add 10 seconds cache for S3 proxy resolver. #27216 (ianton-ru).
  • Add a new format of the minmax data skipping index for proper Nullable support. #27250 (Azat Khuzhin).
  • Memory consumed by bitmap aggregate functions is now taken into account for memory limits. This closes #26555. #27252 (Alexey Milovidov).
  • Add two settings max_hyperscan_regexp_length and max_hyperscan_regexp_total_length to prevent huge regexp being used in hyperscan related functions, such as multiMatchAny. #27378 (Amos Bird).
  • Add setting log_formatted_queries to log additional formatted query into system.query_log. It's useful for normalized query analysis because functions like normalizeQuery and normalizeQueryKeepNames don't parse/format queries in order to achieve better performance. #27380 (Amos Bird).
  • Add Cast function for internal usage, which will not preserve type nullability, but non-internal cast will preserve according to setting cast_keep_nullable. Closes #12636. #27382 (Kseniia Sumarokova).
  • Send response with error message if HTTP port is not set and user tries to send HTTP request to TCP port. #27385 (Braulio Valdivielso Martínez).
  • Use bytes instead of strings for binary data in the GRPC protocol. #27431 (Vitaly Baranov).
  • Log client IP address if authentication fails. #27514 (Misko Lee).
  • Disable arrayJoin on partition expressions. #27648 (Raúl Marín).
  • Allow query parameters to be passed in the body of HTTP requests. #27706 (Hermano Lustosa).
  • Remove duplicate index analysis and avoid possible invalid limit checks during projection analysis. #27742 (Amos Bird).
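
The distributed_push_down_limit entries above rely on a simple property: for ORDER BY key LIMIT n, each shard only needs to return its own first n rows, because no row outside a shard's local top n can appear in the global top n. The following Python sketch is a conceptual model of that idea only; the function names are hypothetical and it does not reflect how ClickHouse executes the query.

```python
import heapq
from itertools import islice


def top_n_with_pushdown(shards, n):
    """Each shard sorts and truncates locally; the initiator merges the results."""
    per_shard = [sorted(rows)[:n] for rows in shards]
    # heapq.merge lazily merges already-sorted inputs in ascending order.
    return list(islice(heapq.merge(*per_shard), n))


def top_n_without_pushdown(shards, n):
    """Every shard sends all rows; the initiator sorts and truncates."""
    return sorted(row for rows in shards for row in rows)[:n]


shards = [[5, 1, 9, 3], [2, 8, 7], [6, 4, 0]]
print(top_n_with_pushdown(shards, 3))
print(top_n_without_pushdown(shards, 3))
```

Both functions produce the same rows; the difference is that with push-down each shard sends at most n rows to the initiator instead of its full result.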

Bug Fix

  • Fix potential crash if more than one untuple expression is used. #26179 (Alexey Milovidov).
  • Remove excessive newline in thread_name column in system.stack_trace table. This fixes #24124. #26210 (Alexey Milovidov).
  • Fix logical error on join with totals. Closes #26017. #26250 (Vladimir C).
  • Fix zstd decompression in case there are escape sequences at the end of internal buffer. Closes #26013. #26314 (Kseniia Sumarokova).
  • Fixed rare bug in lost replica recovery that may cause replicas to diverge. #26321 (Alexander Tokmakov).
  • Fix optimize_distributed_group_by_sharding_key for multiple columns (leads to incorrect result with optimize_skip_unused_shards=1/allow_nondeterministic_optimize_skip_unused_shards=1 and multiple columns in sharding key expression). #26353 (Azat Khuzhin).
  • Fix possible crash when logging in as a dropped user. This PR fixes #26073. #26363 (Vitaly Baranov).
  • Fix infinite non-joined block stream in partial_merge_join. Closes #26325. #26374 (Vladimir C).
  • Now, a scalar subquery always returns a Nullable result if its type can be Nullable. This is needed because in case of an empty subquery its result should be Null. Previously, it was possible to get an error about incompatible types (type deduction does not execute the scalar subquery, and it could use a non-nullable type). A scalar subquery with an empty result which can't be converted to Nullable (like Array or Tuple) now throws an error. Fixes #25411. #26423 (Nikolai Kochetov).
  • Fix some fuzzed msan crash. Fixes #22517. #26428 (Nikolai Kochetov).
  • Fix broken name resolution after rewriting column aliases. This fixes #26432. #26475 (Amos Bird).
  • Fix issues with CREATE DICTIONARY query if dictionary name or database name was quoted. Closes #26491. #26508 (Maksim Kita).
  • Fix crash in rabbitmq shutdown in case rabbitmq setup was not started. Closes #26504. #26529 (Kseniia Sumarokova).
  • Update chown cmd check in clickhouse-server docker entrypoint. It fixes the bug that cluster pod restart failed (or timeout) on kubernetes. #26545 (Ky Li).
  • Fix incorrect function names of groupBitmapAnd/Or/Xor. This fixes. #26557 (Amos Bird).
  • Fix history file conversion if file is empty. #26589 (Azat Khuzhin).
  • Fix potential nullptr dereference in window functions. This fixes #25276. #26668 (Alexander Kuzmenkov).
  • ParallelFormattingOutputFormat: Use mutex to handle the join to the collector_thread (https://github.com/ClickHouse/ClickHouse/issues/26694). #26703 (Raúl Marín).
  • Sometimes SET ROLE could work incorrectly, this PR fixes that. #26707 (Vitaly Baranov).
  • Do not remove data on ReplicatedMergeTree table shutdown to avoid creating data to metadata inconsistency. #26716 (nvartolomei).
  • Add event_time_microseconds value for REMOVE_PART in system.part_log. In previous versions it was not set. #26720 (Azat Khuzhin).
  • Aggregate function parameters might be lost when applying some combinators causing exceptions like Conversion from AggregateFunction(topKArray, Array(String)) to AggregateFunction(topKArray(10), Array(String)) is not supported. It's fixed. Fixes #26196 and #26433. #26814 (Alexander Tokmakov).
  • Fix library-bridge ids load. #26834 (Kseniia Sumarokova).
  • Fix error Missing columns: 'xxx' when DEFAULT column references other non materialized column without DEFAULT expression. Fixes #26591. #26900 (alesapin).
  • Fix reading of custom TLDs (stops processing with lower buffer or bigger file). #26948 (Azat Khuzhin).
  • Fix "Unknown column name" error with multiple JOINs in some cases. Closes #26899. #26957 (Vladimir C).
  • Now partition ID in queries like ALTER TABLE ... PARTITION ID xxx is validated for correctness. Fixes #25718. #26963 (alesapin).
  • [RFC] Fix possible stuck mutation due to race with DROP_RANGE. #27002 (Azat Khuzhin).
  • Fixed cache, complex_key_cache, ssd_cache, complex_key_ssd_cache configuration parsing. Options allow_read_expired_keys, max_update_queue_size, update_queue_push_timeout_milliseconds, query_wait_timeout_milliseconds were not parsed for dictionaries with non cache type. #27032 (Maksim Kita).
  • Fix synchronization in GRPCServer. This PR fixes #27024. #27064 (Vitaly Baranov).
  • In rare cases system.detached_parts table might contain incorrect information for some parts, it's fixed. Fixes #27114. #27183 (Alexander Tokmakov).
  • Fix on-disk format breakage for secondary indices over Nullable column (no stable release had been affected). #27197 (Azat Khuzhin).
  • Fix column structure in merge join. Closes #27091. #27217 (Vladimir C).
  • In case of ambiguity, lambda functions prefer their arguments to other aliases or identifiers. #27235 (Raúl Marín).
  • Fix mutation stuck on invalid partitions in non-replicated MergeTree. #27248 (Azat Khuzhin).
  • Fix distributed_group_by_no_merge=2+distributed_push_down_limit=1 or optimize_distributed_group_by_sharding_key=1 with LIMIT BY and LIMIT OFFSET. #27249 (Azat Khuzhin).
  • Fix errors like Expected ColumnLowCardinality, gotUInt8 or Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnLowCardinality for some queries with LowCardinality in PREWHERE. Fixes #23515. #27298 (Nikolai Kochetov).
  • Fix Cannot find column error for queries with sampling. Was introduced in #24574. Fixes #26522. #27301 (Nikolai Kochetov).
  • Fix MySQL protocol when using parallel formats (CSV / TSV). #27326 (Raúl Marín).
  • Fixed incorrect validation of partition id for MergeTree tables that were created with old syntax. #27328 (Alexander Tokmakov).
  • Fix incorrect result for query with row-level security, prewhere and LowCardinality filter. Fixes #27179. #27329 (Nikolai Kochetov).
  • /proc/info contains metrics like. #27361 (Mike Kot).
  • Fix distributed queries with zero shards and aggregation. #27427 (Azat Khuzhin).
  • Fix metric BackgroundMessageBrokerSchedulePoolTask, which was possibly mistyped. #27452 (Ben).
  • Fix crash during projection materialization when some parts contain missing columns. This fixes #27512. #27528 (Amos Bird).
  • Fixed underflow of the time value when constructing it from components. Closes #27193. #27605 (Vasily Nemkov).
  • After setting max_memory_usage* to non-zero value it was not possible to reset it back to 0 (unlimited). It's fixed. #27638 (Alexander Tokmakov).
  • Fixed another case of Unexpected merged part ... intersecting drop range ... error. #27656 (Alexander Tokmakov).
  • Fix postgresql table function resulting in non-closing connections. Closes #26088. #27662 (Kseniia Sumarokova).
  • Fix bad type cast when functions like arrayHas are applied to arrays of LowCardinality of Nullable of different non-numeric types like DateTime and DateTime64. In previous versions bad cast occurs. In new version it will lead to exception. This closes #26330. #27682 (Alexey Milovidov).
  • Fix column filtering with union distinct in subquery. Closes #27578. #27689 (Kseniia Sumarokova).
  • After https://github.com/ClickHouse/ClickHouse/pull/26384. To execute GRANT WITH REPLACE OPTION now the current user should have GRANT OPTION for access rights it's going to grant AND for access rights it's going to revoke. #27701 (Vitaly Baranov).
  • After https://github.com/ClickHouse/ClickHouse/pull/25687. Add backquotes for the default database shown in CREATE USER. #27702 (Vitaly Baranov).
  • Remove duplicated source files in CMakeLists.txt in arrow-cmake. #27736 (李扬).
  • Fix possible crash when asynchronous connection draining is enabled and hedged connection is disabled. #27774 (Amos Bird).
  • Prevent crashes for some formats when NULL (tombstone) message was coming from Kafka. Closes #19255. #27794 (filimonov).
  • Fix a rare bug in DROP PART which can lead to the error Unexpected merged part intersects drop range. #27807 (alesapin).
  • Fix a couple of bugs that may cause replicas to diverge. #27808 (Alexander Tokmakov).

Build/Testing/Packaging Improvement

Other

NO CL ENTRY

  • NO CL ENTRY: 'Modify code comments'. #26265 (xiedeyantu).
  • NO CL ENTRY: 'Revert "Datatype Date32, support range 1925 to 2283"'. #26352 (Alexey Milovidov).
  • NO CL ENTRY: 'Fix CURR_DATABASE empty for 01034_move_partition_from_table_zookeeper.sh'. #27164 (小路).
  • NO CL ENTRY: 'DOCSUP-12413: macros support in functions cluster and clusterAllReplicas'. #27759 (olgarev).
  • NO CL ENTRY: 'Revert "less sys calls #2: make vdso work again"'. #27829 (Alexey Milovidov).
  • NO CL ENTRY: 'Revert "Do not miss exceptions from the ThreadPool"'. #27844 (Alexey Milovidov).

NOT FOR CHANGELOG / INSIGNIFICANT