2022 Changelog

ClickHouse release v21.6.1.6891-prestable FIXME as compared to v21.5.1.6601-prestable

New Feature

  • Add projection support for MergeTree* tables. #20202 (Amos Bird).
  • Add back indexHint function. This is for #21238. This reverts https://github.com/ClickHouse/ClickHouse/pull/9542. This fixes #9540. #21304 (Amos Bird).
  • Add aggregate function sumCount. This function returns a tuple of two fields: sum and count. #21337 (hexiaoting).
  • Added less secure IMDS credentials provider for S3 which works under docker correctly. #21852 (Vladimir Chebotarev).
  • New aggregate function deltaSumTimestamp for summing the difference between consecutive rows while maintaining ordering during merge by storing timestamps. #21888 (Russ Frank).
  • LDAP: implemented user DN detection functionality to use when mapping Active Directory groups to ClickHouse roles. #22228 (Denis Glazachev).
  • Introduce a new function: arrayProduct, which accepts an array as its parameter and returns the product of all the elements in the array. Closes #21613. #22242 (hexiaoting).
  • Add setting indexes (boolean, disabled by default) to EXPLAIN PIPELINE query. When enabled, shows used indexes, number of filtered parts and granules for every index applied. Supported for MergeTree* tables. #22352 (Nikolai Kochetov).
  • Add setting json (boolean, 0 by default) for EXPLAIN PLAN query. When enabled, query output will be a single JSON row. It is recommended to use TSVRaw format to avoid unnecessary escaping. #23082 (Nikolai Kochetov).
  • Added SYSTEM QUERY RELOAD MODEL, SYSTEM QUERY RELOAD MODELS. Closes #18722. #23182 (Maksim Kita).
  • Made progress bar for LocalServer and united it for Client and Local. #23196 (Egor Savin).
  • Support DDL dictionaries for DatabaseMemory. Closes #22354. Added support for DETACH DICTIONARY PERMANENTLY. Added support for EXCHANGE DICTIONARIES for Atomic database engine. Added support for moving dictionaries between databases using RENAME DICTIONARY. #23436 (Maksim Kita).
  • Allow globs {...}, which act like shards, and failover options with separator | for URL table function. Closes #17181. #23446 (Kseniia Sumarokova).
  • If insert_null_as_default = 1, insert default values instead of NULL in INSERT ... SELECT and INSERT ... SELECT ... UNION ALL ... queries. Closes #22832. #23524 (Kseniia Sumarokova).
  • Implement table comments. Closes #23225. #23548 (flynn).
  • Introduce a new function: arrayProduct, which accepts an array as its parameter and returns the product of all the elements in the array. Closes #21613. #23782 (Maksim Kita).
  • Add postgres-like cast operator (::). E.g.: [1, 2]::Array(UInt8), 0.1::Decimal(4, 4), number::UInt16. #23871 (Anton Popov).
  • ... #23910 (Xiang Zhou).
  • Add function splitByRegexp. #24077 (abel-cheng).
  • Add thread_name column in system.stack_trace. This closes #23256. #24124 (abel-cheng).
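
Several of the features above can be tried with queries along the following lines. This is a minimal sketch rather than an excerpt from the referenced pull requests: the table name comment_demo and the sample literals are hypothetical, the system.tables comment column is assumed to be exposed by the table comments change, and output formatting may differ between versions.

```sql
-- sumCount returns a tuple (sum, count); for the numbers 0..9 the sum is 45 and the count is 10.
SELECT sumCount(number) FROM numbers(10);

-- arrayProduct multiplies all elements of the array (1 * 2 * 3 * 4 = 24).
SELECT arrayProduct([1, 2, 3, 4]);

-- PostgreSQL-like cast operator, using the examples given in the entry above.
SELECT [1, 2]::Array(UInt8), 0.1::Decimal(4, 4);
SELECT number::UInt16 FROM numbers(3);

-- splitByRegexp(regexp, string) splits the string on every match of the regular expression.
SELECT splitByRegexp('\\d+', 'a12bc23de345f');

-- EXPLAIN with json = 1 returns the plan as a single JSON row;
-- TSVRaw avoids unnecessary escaping, as the entry above recommends.
EXPLAIN json = 1 SELECT sum(number) FROM numbers(10) FORMAT TSVRaw;

-- Table comment on a hypothetical table, then reading it back
-- (assumes the comment is exposed in system.tables).
CREATE TABLE comment_demo (x UInt8) ENGINE = Memory COMMENT 'Temporary demo table';
SELECT name, comment FROM system.tables WHERE name = 'comment_demo';

-- thread_name column in system.stack_trace.
SELECT thread_name, thread_id FROM system.stack_trace LIMIT 5;
```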

Performance Improvement

  • Enable compile_expressions setting by default. When this setting is enabled, compositions of simple functions and operators will be compiled to native code with LLVM at runtime. #8482 (Alexey Milovidov).
  • The ORC input format now reads data stripe by stripe instead of loading the entire table into memory at once, which consumed a lot of memory for huge files. #23102 (Chao Ma).
  • Update re2 library. Performance of regular expressions matching is improved. Also this PR adds compatibility with gcc-11. #24196 (Raúl Marín).
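
Since compile_expressions is now on by default, it may be useful to know how to inspect or override it per session. The sketch below assumes an interactive session; the sample query is only an illustration of a composition of simple functions and operators, not a benchmark from the pull request.

```sql
-- Show the current value and whether it was changed from the default.
SELECT name, value, changed FROM system.settings WHERE name = 'compile_expressions';

-- An expression built from simple functions and operators,
-- the kind that can be compiled to native code at runtime.
SELECT sum(number * 2 + intDiv(number, 3)) FROM numbers(10000000);

-- Disable the setting for the current session, for example to compare timings.
SET compile_expressions = 0;
```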

Improvement

  • Support Array data type for inserting and selecting data in Arrow, Parquet and ORC formats. #21770 (taylor12805).
  • Add settings external_storage_max_read_rows and external_storage_max_read_bytes for MySQL table engine, dictionary source and MaterializeMySQL minor data fetches. #22697 (TCeason).
  • Retries on HTTP connection drops in S3. #22988 (Vladimir Chebotarev).
  • Fix the case when a progress bar in interactive mode in clickhouse-client that appears in the middle of the data may overwrite some parts of the visible data in the terminal. This closes #19283. #23050 (Alexey Milovidov).
  • Added possibility to restore MergeTree parts to 'detached' directory for DiskS3. #23112 (Pavel Kovalenko).
  • Skip unavailable replicas when writing to distributed tables. #23152 (Amos Bird).
  • Support LowCardinality nullability with join_use_nulls, close #15101. #23237 (Vladimir C).
  • Disable settings use_hedged_requests and async_socket_for_remote because there is evidence that they may cause issues. #23261 (Alexey Milovidov).
  • Fixed quantile(s)TDigest. Added special handling of singleton centroids according to tdunning/t-digest 3.2+. Also a bug with over-compression of centroids in implementation of earlier version of the algorithm was fixed. #23314 (Vladimir Chebotarev).
  • Allow the user to specify an empty string instead of a database name for MySQL storage. The default database will be used for queries. In previous versions this worked for SELECT queries, and now support for INSERT has also been added. This closes #19281. This can be useful when working with Sphinx or other MySQL-compatible foreign databases. #23319 (Alexey Milovidov).
  • Disable min_bytes_to_use_mmap_io by default. #23322 (Azat Khuzhin).
  • If a user mistakenly set max_distributed_connections to zero, every query to a Distributed table threw an exception with a message containing "logical error". But this is expected behaviour, not a logical error, so the exception message was slightly incorrect. It also triggered checks in our CI environment that ensure that no logical errors ever happen. Now max_distributed_connections misconfigured to zero is treated as the minimum possible value (one). #23348 (Azat Khuzhin).
  • Keep default timezone on DateTime operations if it was not provided explicitly. For example, if you add one second to a value of DateTime type without timezone it will remain DateTime without timezone. In previous versions the value of default timezone was placed to the returned data type explicitly so it becomes DateTime('something'). This closes #4854. #23392 (Alexey Milovidov).
  • Previously, MySQL 5.7.9 was not supported due to SQL incompatibility. Now leave MySQL parameter verification to the MaterializeMySQL. #23413 (TCeason).
  • Possibility to change S3 disk settings in runtime via new SYSTEM RESTART DISK SQL command. #23429 (Pavel Kovalenko).
  • Respect lock_acquire_timeout_for_background_operations for OPTIMIZE. #23623 (Azat Khuzhin).
  • Make big integers production ready. Add support for UInt128 data type. Fix known issues with the Decimal256 data type. Support big integers in dictionaries. Support gcd/lcm functions for big integers. Support big integers in array search and conditional functions. Support LowCardinality(UUID). Support big integers in generateRandom table function and clickhouse-obfuscator. Fix error with returning UUID from scalar subqueries. This fixes #7834. This fixes #23936. This fixes #4176. This fixes #24018. This fixes #17828. Backward incompatible change: values of UUID type cannot be compared with integer. For example, instead of writing uuid != 0 type uuid != '00000000-0000-0000-0000-000000000000'. #23631 (Alexey Milovidov).
  • Add _partition_value virtual column to MergeTree table family. It can be used to prune partition in a deterministic way. It's needed to implement partition matcher for mutations. #23673 (Amos Bird).
  • Enable async_socket_for_remote by default. #23683 (Nikolai Kochetov).
  • Previously, when the ZooKeeper session of some ReplicatedMergeTree tables had expired, selecting table metadata from system.tables with select_sequential_consistency enabled threw the error: Session expired (Session expired): While executing Tables. #23793 (Fuwang Hu).
  • Added region parameter for S3 storage and disk. #23846 (Vladimir Chebotarev).
  • Allow configuring different log levels for different logging channels. Closes #19569. #23857 (filimonov).
  • Add broken_data_files/broken_data_compressed_bytes into system.distribution_queue. Add metric for the number of files for asynchronous insertion into Distributed tables that have been marked as broken (BrokenDistributedFilesToInsert). #23885 (Azat Khuzhin).
  • Allow adding specific queue settings via table setting rabbitmq_queue_settings_list. (Closes #23737 and #23918). Allow the user to control all RabbitMQ setup: if table setting rabbitmq_queue_consume is set to 1, the RabbitMQ table engine will only connect to the specified queue and will not perform any RabbitMQ consumer-side setup like declaring exchanges, queues, bindings. (Closes #21757). Add proper cleanup when a RabbitMQ table is dropped: delete queues which the table has declared and all bound exchanges, if they were created by the table. #23887 (Kseniia Sumarokova).
  • Measure found rate (the percentage for which the value was found) for dictionaries (see found_rate in system.dictionaries). #23916 (Azat Khuzhin).
  • Add hints for Enum names. Closes #17112. #23919 (flynn).
  • Add support for HTTP compression (determined by Content-Encoding HTTP header) in http dictionary source. This fixes #8912. #23946 (Filatenkov Artur).
  • Preallocate support for hashed/sparse_hashed dictionaries. #23979 (Azat Khuzhin).
  • Support specifying table schema for postgresql dictionary source. Closes #23958. #23980 (Kseniia Sumarokova).
  • Log information about OS name, kernel version and CPU architecture on server startup. #23988 (Azat Khuzhin).
  • Enable DateTime64 to be a version column in ReplacingMergeTree. #23992 (kevin wan).
  • Add support for ORDER BY WITH FILL with DateTime64. #24016 (kevin wan).
  • Now prefer_column_name_to_alias = 1 will also favor column names for group by, having and order by. This fixes #23882. #24022 (Amos Bird).
  • Do not acquire lock for total_bytes/total_rows for Buffer engine. #24066 (Azat Khuzhin).
  • Flush Buffer tables before shutting down tables (within one database), to avoid discarding blocks because the underlying table had already been detached (and the Destination table default.a_data_01870 doesn't exist. Block of data is discarded error in the log). #24067 (Azat Khuzhin).
  • Preserve dictionaries until storage shutdown (this will avoid possible external dictionary 'DICT' not found errors at server shutdown during final Buffer flush). #24068 (Azat Khuzhin).
  • Update zstd to v1.5.0. #24135 (Raúl Marín).
  • Fix crash when memory allocation fails in simdjson. https://github.com/simdjson/simdjson/pull/1567 . Mark as improvement because it's a rare bug. #24147 (Amos Bird).
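
Some of the improvements above are easiest to see through a query. The following is a sketch under stated assumptions: the table events, its uuid column and its partitioning are hypothetical (events is assumed to be a MergeTree table), while the system table columns are the ones named in the entries above.

```sql
-- UUID values can no longer be compared with integers;
-- compare against a UUID string instead of writing uuid != 0.
SELECT count() FROM events WHERE uuid != '00000000-0000-0000-0000-000000000000';

-- Adding one second to a DateTime without an explicit timezone
-- keeps the result type as plain DateTime.
SELECT toTypeName(toDateTime('2021-01-01 00:00:00') + 1);

-- _partition_value virtual column of the MergeTree family.
SELECT DISTINCT _partition_value FROM events;

-- Found rate of dictionary lookups.
SELECT name, found_rate FROM system.dictionaries;

-- Broken files pending asynchronous insertion into Distributed tables.
SELECT database, table, broken_data_files, broken_data_compressed_bytes
FROM system.distribution_queue;
```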

Bug Fix

  • This PR fixes a crash on shutdown which happened because currentConnections() could return zero while some connections were still alive. #23154 (Vitaly Baranov).
  • QueryAliasVisitor to prefer alias for ASTWithAlias if subquery was optimized to constant. Fixes #22924. Fixes #10401. #23191 (Maksim Kita).
  • Fixed Not found column error when selecting from MaterializeMySQL with condition on key column. Fixes #22432. #23200 (Alexander Tokmakov).
  • Fixed the behavior when disabling input_format_with_names_use_header setting discards all the input with CSVWithNames format. This fixes #22406. #23202 (Nikita Mikhaylov).
  • Add type conversion for optimize_skip_unused_shards_rewrite_in (fixes use-of-uninitialized-value with optimize_skip_unused_shards_rewrite_in). #23219 (Azat Khuzhin).
  • Fixed simple key dictionary from DDL creation if primary key is not first attribute. Fixes #23236. #23262 (Maksim Kita).
  • Fixed very rare (distributed) race condition between creation and removal of ReplicatedMergeTree tables. It might cause exceptions like node doesn't exist on attempt to create replicated table. Fixes #21419. #23294 (Alexander Tokmakov).
  • Fixed very rare race condition on background cleanup of old blocks. It might cause a block not to be deduplicated if it's too close to the end of deduplication window. #23301 (Alexander Tokmakov).
  • Fix possible crash if an unknown packet was received from a remote query (with async_socket_for_remote enabled). Maybe fixes #21167. #23309 (Nikolai Kochetov).
  • Don't relax NOT conditions during partition pruning. This fixes #23305 and #21539. #23310 (Amos Bird).
  • Fix possible Block structure mismatch error for queries with UNION which could possibly happen after filter-push-down optimization. Fixes #23029. #23359 (Nikolai Kochetov).
  • Fix incompatible constant expression generation during partition pruning based on virtual columns. This fixes https://github.com/ClickHouse/ClickHouse/pull/21401#discussion_r611888913. #23366 (Amos Bird).
  • ORDER BY with COLLATE was not working correctly if the column is in primary key (or is a monotonic function of it) and the setting optimize_read_in_order is not turned off. This closes #22379. Workaround for older versions: turn the setting optimize_read_in_order off. #23375 (Alexey Milovidov).
  • Remove support for argMin and argMax for single Tuple argument. The code was not memory-safe. The feature was added by mistake and it is confusing for people. These functions can be reintroduced under different names later. This fixes #22384 and reverts #17359. #23393 (Alexey Milovidov).
  • Allow moving more conditions to PREWHERE, as it was before version 21.1. An insufficient number of moved conditions could lead to worse performance. #23397 (Anton Popov).
  • Kafka storage may support parquet format messages. #23412 (Chao Ma).
  • Kafka storage may support arrow and arrowstream format messages. #23415 (Chao Ma).
  • Fixed Cannot unlink file error on unsuccessful creation of ReplicatedMergeTree table with multidisk configuration. This closes #21755. #23433 (Alexander Tokmakov).
  • Bug fix for deltaSum aggregate function in counter reset case ... #23437 (Russ Frank).
  • Fix bug that does not allow cast from empty array literal, to array with dimensions greater than 1. Closes #14476. #23456 (Maksim Kita).
  • Fix corner cases in vertical merges with ReplacingMergeTree. In rare cases they could lead to merge failures with exceptions like Incomplete granules are not allowed while blocks are granules size. #23459 (Anton Popov).
  • When modifying a column's default value without specifying the data type, if the column is used as a ReplacingMergeTree parameter (like column b in the example below), the server would core dump:
      CREATE TABLE alter_test (a Int32, b DateTime) ENGINE = ReplacingMergeTree(b) ORDER BY a;
      ALTER TABLE alter_test MODIFY COLUMN `b` DEFAULT now();
    The server threw the following error:
      2021.04.22 09:48:00.685317 [ 2607 ] {} <Trace> BaseDaemon: Received signal 11
      2021.04.22 09:48:00.686110 [ 2705 ] {} <Fatal> BaseDaemon: ########################################
      2021.04.22 09:48:00.686336 [ 2705 ] {} <Fatal> BaseDaemon: (version 21.6.1.1, build id: 6459E84DFCF8E778546C5AD2FFE91B3AD71E1B1B) (from thread 2619) (no query) Received signal Segmentation fault (11)
      2021.04.22 09:48:00.686572 [ 2705 ] {} <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Address not mapped to object.
      2021.04.22 09:48:00.686686 [ 2705 ] {} <Fatal> BaseDaemon: Stack trace: 0x1c2585d7 0x1c254f66 0x1bb7e403 0x1bb58923 0x1bb56a85 0x1c6840ef 0x1c691148 0x2061a05c 0x2061a8e4 0x20775a03 0x207722bd 0x20771048 0x7f6e5c25be25 0x7f6e5bd81bad
      2021.04.22 09:48:02.283045 [ 2705 ] {} <Fatal> BaseDaemon: 4. /mnt/disk4/hewenting/ClickHouse/src/src/Storages/MergeTree/MergeTreeData.cpp:1449: DB::(anonymous namespace)::checkVersionColumnTypesConversion(DB::IDataType const*, DB::IDataType const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) @ 0x1c2585d7 in /mnt/disk4/hewenting/ClickHouse/build-dbgsrc-clang-dev-nested/programs/clickhouse-server
      2021.04.22 09:48:03.714451 [ 2705 ] {} <Fatal> BaseDaemon: 5. /mnt/disk4/hewenting/ClickHouse/src/src/Storages/MergeTree/MergeTreeData.cpp:1582: DB::MergeTreeData::checkAlterIsPossible(DB::AlterCommands const&, std::__1::shared_ptr<DB::Context>) const @ 0x1c254f66 in /mnt/disk4/hewenting/ClickHouse/build-dbgsrc-clang-dev-nested/programs/clickhouse-server
      2021.04.22 09:48:04.692949 [ 2705 ] {} <Fatal> BaseDaemon: 6. /mnt/disk4/hewenting/ClickHouse/src/src/Interpreters/InterpreterAlterQuery.cpp:144: DB::InterpreterAlterQuery::execute() @ 0x1bb7e403 in /mnt/disk4/hewenting/ClickHouse/build-dbgsrc-clang-dev-nested/programs/clickhouse-server
    #23483 (hexiaoting).
  • Fix columns function when multiple joins in select query. Closes #22736. #23501 (Maksim Kita).
  • Fix restart / stop command hanging. Closes #20214. #23552 (filimonov).
  • Fix misinterpretation of some LIKE expressions with escape sequences. #23610 (Alexey Milovidov).
  • Fixed server fault when inserting data through HTTP caused an exception. This fixes #23512. #23643 (Nikita Mikhaylov).
  • Added an exception in case of completely the same values in both samples in aggregate function mannWhitneyUTest. This fixes #23646. #23654 (Nikita Mikhaylov).
  • Fixed a bug in recovery of a stale ReplicatedMergeTree replica. Some metadata updates could be ignored by the stale replica if an ALTER query was executed during downtime of the replica. #23742 (Alexander Tokmakov).
  • Avoid possible "Cannot schedule a task" error (in case an exception had occurred) on INSERT into Distributed. #23744 (Azat Khuzhin).
  • Fix heap_use_after_free when reading from hdfs if Values format is used. #23761 (Kseniia Sumarokova).
  • Fix crash when PREWHERE and row policy filter are both in effect with empty result. #23763 (Amos Bird).
  • Fixed remote JDBC bridge timeout connection issue. Closes #9609. #23771 (Maksim Kita).
  • Fix CLEAR COLUMN not working when the column is referenced by a materialized view. Closes #23764. #23781 (flynn).
  • Fix error Can't initialize pipeline with empty pipe for queries with GLOBAL IN/JOIN and use_hedged_requests. Fixes #23431. #23805 (Nikolai Kochetov).
  • Better handling of URIs in PocoHTTPClient. Fixed a bug with URLs containing the + symbol: data with such keys could not be read previously. #23822 (Vladimir Chebotarev).
  • HashedDictionary complex key update field initial load fix. Closes #23800. #23824 (Maksim Kita).
  • Better handling of HTTP errors in PocoHTTPClient. Response bodies of HTTP errors were being ignored earlier. #23844 (Vladimir Chebotarev).
  • Fix distributed_group_by_no_merge=2 with GROUP BY and aggregate function wrapped into regular function (had been broken in #23546). Throw exception in case of someone trying to use distributed_group_by_no_merge=2 with window functions. Disable optimize_distributed_group_by_sharding_key for queries with window functions. #23906 (Azat Khuzhin).
  • Fix implementation of connection pool of PostgreSQL engine. Closes #23897. #23909 (Kseniia Sumarokova).
  • Fix keys metrics accounting for CACHE() dictionary with duplicates in the source (leads to DictCacheKeysRequestedMiss overflows). #23929 (Azat Khuzhin).
  • Fix SIGSEGV for external GROUP BY and overflow row (i.e. queries like SELECT FROM GROUP BY WITH TOTALS SETTINGS max_bytes_before_external_group_by>0, max_rows_to_group_by>0, group_by_overflow_mode='any', totals_mode='before_having'). #23962 (Azat Khuzhin).
  • Some ALTER PARTITION queries might cause Part A intersects previous part B and Unexpected merged part C intersecting drop range D errors in replication queue. It's fixed. Fixes #23296. #23997 (Alexander Tokmakov).
  • Fix crash in MergeJoin, close #24010. #24013 (Vladimir C).
  • now64() supports optional timezone argument ... #24091 (Vasily Nemkov).
  • Fixed using const DateTime value vs DateTime64 column in WHERE. ... #24100 (Vasily Nemkov).
  • Bug: explain pipeline with select xxx final shows wrong pipeline. Example: dell123 :) explain pipeline select z from prewhere_move_select_final final; #24116 (hexiaoting).
  • Fix a rare bug that could lead to a partially initialized table that can serve write requests (insert/alter/so on). Now such tables will be in readonly mode. #24122 (alesapin).
  • Fix race condition which could happen in RBAC under a heavy load. This PR fixes #24090, #24134. #24176 (Vitaly Baranov).
  • Updating a nested column with a constant condition would crash the server. Example setup: CREATE TABLE test_wide_nested (id Int, info.id Array(Int), info.name Array(String), info.age Array(Int)) ENGINE = MergeTree ORDER BY tuple() SETTINGS min_bytes_for_wide_part = 0; set mutations_sync = 1; #24183 (hexiaoting).
  • Fix abnormal server termination due to hdfs becoming not accessible during query execution. Closes #24117. #24191 (Kseniia Sumarokova).
  • Fix a typo in StorageMemory. This bug was introduced in #15127. Closes #24192. #24193 (张中南).
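
A few of the fixes above correspond to short queries that can serve as quick checks after upgrading. This is only a sketch: the timezone name, the table names_demo and its column s are hypothetical, and the statements are not taken from the linked issues.

```sql
-- Cast from an empty array literal to an array type with more than one dimension.
SELECT CAST([] AS Array(Array(String)));

-- now64 with a precision and the optional timezone argument.
SELECT now64(3, 'Europe/Moscow');

-- Workaround for older versions with ORDER BY ... COLLATE on a primary key column:
-- turn optimize_read_in_order off, as suggested in the entry above.
SET optimize_read_in_order = 0;
SELECT s FROM names_demo ORDER BY s COLLATE 'en';
```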

Build/Testing/Packaging Improvement

  • Adding Map type tests in TestFlows. #21087 (vzakaznikov).
  • Testflows tests for DateTime64 Extended Range. #22729 (Andrey Zvonov).
  • CMake failed with settings such as the following: -DENABLE_CASSANDRA=OFF -DENABLE_AMQPCPP=ON ... #22984 (Ben).
  • Add simple tool for benchmarking [Zoo]Keeper. #23038 (alesapin).
  • Remove a source of nondeterminism from build. Now builds at different points in time will produce byte-identical binaries. Partially addressed #22113. #23559 (Alexey Milovidov).
  • Avoid possible build dependency on locale and filesystem order. This allows reproducible builds. #23600 (Alexey Milovidov).
  • Always enable asynchronous-unwind-tables explicitly. It may fix query profiler on AArch64. #23602 (Alexey Milovidov).
  • Fix Memory Sanitizer report in GRPC library. This closes #19234. #23615 (Alexey Milovidov).
  • Window functions tests in TestFlows. #23704 (vzakaznikov).
  • Adds support for building on Solaris-derived operating systems. #23746 (bnaecker).
  • Update librdkafka 1.6.0-RC3 to 1.6.1. #23874 (filimonov).
  • Enabling running of all TestFlows modules in parallel. #23942 (vzakaznikov).
  • Fixing window functions distributed tests by moving to a deterministic sharding key. #23975 (vzakaznikov).
  • Add more benchmarks for hash tables, including the Swiss Table from Google (that appeared to be slower than ClickHouse hash map in our specific usage scenario). #24111 (Maksim Kita).
  • Support building on Illumos. #24144 (bnaecker).

Other

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT

New Feature #14893