
2022 Changelog

ClickHouse release v22.5.1.2079-stable (df0cb06209) FIXME as compared to v22.4.1.2305-prestable (77a82cc090)

Backward Incompatible Change

  • Updated the BoringSSL module to the official FIPS-compliant version. This makes ClickHouse FIPS-compliant. #35914 (Deleted user).
  • Background merges, mutations and OPTIMIZE no longer increment the SelectedRows and SelectedBytes metrics. They still increment MergedRows and MergedUncompressedBytes, as before. #37040 (Nikolai Kochetov).

New Feature

  • Added the MeiliSearch storage engine and table function. #33332 (Mikhail Artemenko).
  • Added support for GROUPING SETS in the GROUP BY clause, as a follow-up to #33186. This implementation supports parallel processing of grouping sets. #33631 (Dmitry Novik).
  • Implemented according to the design described in #19627#issuecomment-1068772646. #35318 (徐炘).
  • Added the SYSTEM SYNC DATABASE REPLICA query, which synchronizes table metadata inside a Replicated database. It is needed because this synchronization is otherwise asynchronous. #35944 (Nikita Mikhaylov).
  • Collations in CREATE TABLE are now parsed and either cause an exception or are ignored. Closes #35892. #36271 (yuuch).
  • Added JSONLines and NDJSON as aliases for JSONEachRow. Closes #36303. #36327 (flynn).
  • parts_to_delay_insert and parts_to_throw_insert can now be set as query-level settings. When defined, they override the table-level settings. #36371 (Memo).
  • Temporary tables can now show total rows and total bytes (#36401). #36439 (chen).
  • Added a new hash function, wyHash64. #36467 (olevino).
  • Added the nth_value window function. #36601 (Nikolay).
  • Added the MySQLDump input format. It reads all data from the INSERT queries that belong to one table in the dump. If the dump contains more than one table, data is read from the first one by default. #36667 (Kruglov Pavel).
  • Added a new diagnostics tool built as a single binary. #36705 (Dale McDiarmid).
  • Added a system table that records remote file accesses made by queries, which helps users analyze the causes of performance fluctuations when storage and compute are separated. A record is generated whenever a query reads a segment of a remote file. The read type is one of READ_FROM_FS_AND_DOWNLOADED_TO_CACHE, READ_FROM_CACHE and READ_FROM_FS_BYPASSING_CACHE, which indicates whether the query read the segment from the cache or from the remote file. #36802 (Han Shukai).
  • Added the h3Line, h3Distance and h3HexRing functions. #37030 (Bharat Nallan).
  • Related issue - #35101. #37033 (qieqieplus).
  • Added system.certificates table. #37142 (Yakov Olkhovskiy).
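
To make the GROUPING SETS semantics concrete: a query such as SELECT a, b, sum(value) FROM t GROUP BY GROUPING SETS ((a), (b)) returns the concatenation of GROUP BY a and GROUP BY b. The C++ sketch below models that with standard containers only. Row, GroupingSet, GroupedRow and groupingSetsSum are hypothetical names, filling keys that are not in the current set with the type's default value (an empty string) is our assumption about the output shape, and the sketch does not reflect how ClickHouse parallelizes the sets.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input row: two key columns (a, b) and a value column.
struct Row {
    std::string a;
    std::string b;
    long long value;
};

// Which key columns a grouping set uses, e.g. (a) or (b).
struct GroupingSet {
    bool use_a;
    bool use_b;
};

// One output row. A key that is not part of the grouping set is filled with
// the type's default value (an empty string here); this is an assumption.
struct GroupedRow {
    std::string a;
    std::string b;
    long long sum;
};

// Computes sum(value) for every grouping set and concatenates the results,
// i.e. the UNION ALL of one GROUP BY per set.
std::vector<GroupedRow> groupingSetsSum(const std::vector<Row>& rows,
                                        const std::vector<GroupingSet>& sets) {
    assert(!sets.empty() && "GROUPING SETS needs at least one set");
    std::vector<GroupedRow> result;
    for (const GroupingSet& set : sets) {
        // Each set is aggregated independently; std::map keeps the output
        // order deterministic.
        std::map<std::pair<std::string, std::string>, long long> groups;
        for (const Row& row : rows) {
            std::pair<std::string, std::string> key{
                set.use_a ? row.a : std::string{},
                set.use_b ? row.b : std::string{}};
            groups[key] += row.value;
        }
        for (const auto& [key, sum] : groups) {
            result.push_back(GroupedRow{key.first, key.second, sum});
        }
    }
    return result;
}
```

Because each grouping set is aggregated independently of the others, the sets are natural units for parallel processing.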

Performance Improvement

  • Improved performance of ORDER BY, MergeJoin and insertion into MergeTree by JIT-compiling the comparator for sort columns. #34469 (Maksim Kita).
  • Increased the inlining threshold as a first step. Follow-up commits will apply inlining only where it has shown better query performance, so that compile time and binary size do not grow while the program is still optimized. #34544 (Daniel Kutenin).
  • Added a transformation of OR chains of LIKE conditions into multiMatchAny. It will be enabled once there is more confidence that it works correctly. #34932 (Daniel Kutenin).
  • Rewrite SELECT countDistinct(a) FROM t into SELECT count(1) FROM (SELECT a FROM t GROUP BY a). #35993 (zhanglistar).
  • Changed the structure of system.asynchronous_metric_log so that it takes about 10 times less space. This closes #36357. The event_time_microseconds field was removed because it is useless. #36360 (Alexey Milovidov).
  • Inserting the right table's rows into the default HashJoin is not thread-safe, so it runs in a single thread. When the right table is large, the join is slow and CPU utilization is low. #36415 (lgbo).
  • Improved performance of reading from the File storage and the file table function when the path contains globs and the matched directory contains a large number of files. #36647 (Anton Popov).
  • Applied parallel parsing to the HiveText input format, which can speed up HiveText parsing by 2x when reading a local file. #36650 (李扬).
  • Improves performance of file descriptor cache by narrowing mutex scopes. #36682 (Anton Kozlov).
  • Improved the WATCH query in WindowView: (1) reduced the latency of providing query results by signaling fire_condition; (2) made query cancellation (Ctrl-C) faster by checking isCancelled() more frequently. #37226 (vxider).
  • Improved performance of the avg and sum aggregate functions when used without a GROUP BY expression. #37257 (Maksim Kita).
  • Improve performance of unary arithmetic functions (bitCount, bitNot, abs, intExp2, intExp10, negate, roundAge, roundDuration, roundToExp2, sign) using dynamic dispatch. #37289 (Maksim Kita).
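
The countDistinct rewrite listed above is valid because the number of distinct values of a equals the number of groups that GROUP BY a produces, so counting the rows of the grouped subquery gives the same answer. A minimal C++ sketch of both sides using standard containers (the function names are hypothetical; this is not ClickHouse code). It uses a non-Nullable column; we have not checked how the rewrite treats NULLs, which countDistinct skips but GROUP BY keeps as a group.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <unordered_set>
#include <vector>

// Left side: countDistinct(a), computed with a hash set of seen values.
std::size_t countDistinct(const std::vector<int>& column) {
    std::unordered_set<int> seen(column.begin(), column.end());
    return seen.size();
}

// Right side: SELECT count(1) FROM (SELECT a FROM t GROUP BY a).
// The inner query yields one row per group; the outer query counts those rows.
std::size_t countOfGroupBy(const std::vector<int>& column) {
    std::map<int, std::size_t> groups;  // key -> number of rows in the group
    for (int value : column) {
        ++groups[value];
    }
    return groups.size();  // count(1) over one row per group
}
```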

Improvement

  • clickhouse-client now shows a proper hint when --file is used without a preceding --external. Closes #34747. #34765 (李扬).
  • Added support for specifying content_type in predefined and static HTTP handler config. #34916 (Roman Nikonov).
  • Implemented support for a partial GROUP BY key in optimize_aggregation_in_order. #35111 (Azat Khuzhin).
  • Added detection of Nullable values in Protobuf using Google wrappers. #35149 (Jakub Kuklis).
  • If the required amount of memory becomes available before the selected query is stopped, all waiting queries continue execution. Now no query is stopped if memory is freed before the selected query learns about its cancellation. #35637 (Dmitry Novik).
  • Enable memory overcommit by default. #35921 (Dmitry Novik).
  • Refactored the code for schema inference with globs. The next file matching the glob is tried only when it makes sense (previously, the next file was tried after any error). This also fixes #36317. #36205 (Kruglov Pavel).
  • Improved schema inference for JSON objects. #36207 (Kruglov Pavel).
  • Added support for force recovery, which allows reconfiguring a cluster without a quorum. #36258 (Antonio Andelic).
  • A local interpreter is created when a query is executed on the localhost replica. However, when executing a query on multiple replicas, replicas relied on an existing connection to talk to the coordinator. Now the localhost replica can talk to the coordinator directly within the same process. #36281 (Nikita Mikhaylov).
  • Show names of erroneous files in case of parsing errors while executing table functions file, s3 and url. #36314 (Anton Popov).
  • The number of threads for background operations (merges, mutations, moves and fetches) can now be increased at runtime if it is specified in the top-level config. #36425 (Nikita Mikhaylov).
  • clickhouse-benchmark can read auth from environment variables. #36497 (Anton Kozlov).
  • Allow names of tuple elements that start with digits. #36544 (Anton Popov).
  • Allow file descriptors in table function file if it is run in clickhouse-local. #36562 (wuxiaobai24).
  • Allow casting columns of type Object(...) to Object(Nullable(...)). #36564 (awakeljw).
  • Cleanup CSS in Play UI. The pixels are more evenly placed. Better usability for long content in table cells. #36569 (Alexey Milovidov).
  • Metrics for time spent reading from S3 are now calculated correctly. Closes #35483. #36572 (Alexey Milovidov).
  • Improved the SYSTEM DROP FILESYSTEM CACHE query with a <path> option and a FORCE option. #36639 (Kseniia Sumarokova).
  • Add is_all_data_sent column into system.processes, and improve internal testing hardening check based on it. #36649 (Azat Khuzhin).
  • Date/time conversion functions that produce a time before 1970-01-01 00:00:00 in timezones with partial-hour or partial-minute offsets now saturate to zero instead of overflowing. This continues https://github.com/ClickHouse/ClickHouse/pull/29953 and addresses https://github.com/ClickHouse/ClickHouse/pull/29953#discussion_r800550280. It is marked as an improvement because this is implementation-defined behavior (and a very rare case) that we are allowed to change. #36656 (Amos Bird).
  • Allow cancelling a query while still keeping a proper query ID in MySQLHandler. #36699 (Amos Bird).
  • Properly cancel INSERT queries in clickhouse-client/clickhouse-local. #36710 (Azat Khuzhin).
  • Allow cluster macro in s3Cluster table function. #36726 (Vadim Volodin).
  • Added user_defined_path config setting. #36753 (Maksim Kita).
  • Allow executing hash functions with arguments of type Array(Tuple(..)). #36812 (Anton Popov).
  • Added a warning when clickhouse-server is run with log level "test". The log level "test" was added recently and cannot be used in production due to inevitable, unavoidable, fatal and life-threatening performance degradation. #36824 (Alexey Milovidov).
  • Play UI: If there is one row in result and more than a few columns, display the result vertically. Continuation of #36811. #36842 (Alexey Milovidov).
  • Add extra diagnostic info (if applicable) when sending exception to other server. #36872 (Alexander Tokmakov).
  • After #36425, settings like background_fetches_pool_size became obsolete and can appear in the top-level config, but ClickHouse threw an exception like: Error updating configuration from '/etc/clickhouse-server/config.xml' config.: Code: 137. DB::Exception: A setting 'background_fetches_pool_size' appeared at top level in config /etc/clickhouse-server/config.xml. This is fixed. #36917 (Nikita Mikhaylov).
  • Finalize write buffers in case of an exception to avoid doing it in destructors. This hopefully fixes #36907. #36979 (Kruglov Pavel).
  • Play UI: Nullable numbers will be aligned to the right in table cells. This closes #36982. #36988 (Alexey Milovidov).
  • Implemented a new mode of handling row policies. It can be enabled in the main configuration and allows users without permissive row policies to read rows. #36997 (Vitaly Baranov).
  • Fixed a bug that could lead to forgotten outdated parts in MergeTree family table engines after filesystem failures during part removal. Before the fix, such parts were removed only after the first server restart. #37014 (alesapin).
  • Modified the query div in play.html so that it can be extended beyond 200px in height. For very long queries it is helpful to enlarge the textarea element, but because the div had a fixed height, the enlarged textarea hid the data div underneath. With this fix, resizing the textarea pushes the data div down or up so that the textarea does not hide it. #37051 (guyco87).
  • Improved reading from cache. #37054 (Kseniia Sumarokova).
  • Fixed progress indication for INSERT SELECT in clickhouse-local, for any query, and for file progress in the client; file progress is now more accurate. #37075 (Kseniia Sumarokova).
  • Disabled the log_query_threads setting by default. It controls the logging of statistics about every thread participating in query execution. Since asynchronous reads were introduced, the total number of distinct thread IDs has become too large, and logging into query_thread_log has become too heavy. #37077 (Alexey Milovidov).
  • Option compatibility_ignore_auto_increment_in_create_table allows ignoring AUTO_INCREMENT keyword in a column declaration to simplify migration from MySQL. #37178 (Igor Nikonov).
  • Added an implicit cast for the second argument of the h3kRing function to improve usability. Closes #35432. #37189 (Maksim Kita).
  • Limited the maximum number of partitions that can be queried for each Hive table, to avoid resource overruns. #37281 (lgbo).
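
The date/time saturation change listed above can be pictured as a clamped conversion from a signed seconds count to the unsigned 32-bit value that DateTime stores. An illustrative case of ours (not necessarily the exact case the change targets): rounding 1970-01-01 00:10:00 UTC down to the start of the hour in Asia/Kolkata (UTC+05:30) gives 05:00 local time, which is 1969-12-31 23:30:00 UTC, i.e. -1800 seconds. The helper toDateTimeSaturating is hypothetical, and clamping at the upper bound is our own addition so that the sketch is total.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts signed seconds since the epoch to an unsigned
// 32-bit DateTime value, saturating instead of wrapping around. A plain
// static_cast<std::uint32_t>(-1800) would yield 4294965496 (a date in 2106);
// saturation yields 0 (1970-01-01 00:00:00 UTC).
std::uint32_t toDateTimeSaturating(std::int64_t seconds) {
    constexpr std::int64_t max_value = std::numeric_limits<std::uint32_t>::max();
    if (seconds < 0) {
        return 0;  // before the epoch: saturate to zero
    }
    if (seconds > max_value) {
        // Assumption: clamp the top as well (not stated in the changelog).
        return std::numeric_limits<std::uint32_t>::max();
    }
    return static_cast<std::uint32_t>(seconds);
}
```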

Bug Fix

  • The Version ID, if present, is extracted from the URI, and the URI is reassembled without it; the AWS HTTP URI object is then configured with the request. Unit tests are in gtest_s3_uri. Closes #31221. #34571 (Saad Ur Rahman).
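
The Version ID handling above amounts to a small URI transformation: versionId is the query parameter that Amazon S3 uses to select a specific object version, and the change extracts it and reassembles the URI without it. The C++ below is a hypothetical helper (extractVersionId and SplitUri are our names, not ClickHouse's or the AWS SDK's); it deliberately ignores percent-encoding and URI fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Result of the split: the URI without the versionId parameter and, if the
// parameter was present, its value.
struct SplitUri {
    std::string uri;
    std::optional<std::string> version_id;
};

// Hypothetical helper: removes "versionId=..." from the query string and
// reassembles the remaining parameters in their original order.
SplitUri extractVersionId(const std::string& uri) {
    const std::size_t question = uri.find('?');
    if (question == std::string::npos) {
        return {uri, std::nullopt};  // no query string at all
    }
    const std::string base = uri.substr(0, question);
    const std::string query = uri.substr(question + 1);
    const std::string prefix = "versionId=";

    std::string kept;  // query parameters other than versionId
    std::optional<std::string> version_id;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string param = query.substr(start, end - start);
        if (param.compare(0, prefix.size(), prefix) == 0) {
            version_id = param.substr(prefix.size());
        } else if (!param.empty()) {
            if (!kept.empty()) kept += '&';
            kept += param;
        }
        start = end + 1;
    }
    return {kept.empty() ? base : base + "?" + kept, version_id};
}
```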

Bug Fix (user-visible misbehavior in official stable or prestable release)

  • The ilike() function on FixedString columns could have returned wrong results (i.e. matched less than it should). #37117 (Robert Schulze).
  • Fix implicit cast for optimize_skip_unused_shards_rewrite_in. #37153 (Azat Khuzhin).
  • Enable enable_global_with_statement for subqueries. Closes #37141. #37166 (Vladimir C).
  • Now WindowView WATCH EVENTS query will not be terminated due to the nonempty Chunk created in WindowViewSource.h:58. #37182 (vxider).
  • Fix "Cannot create column of type Set" for distributed queries with LIMIT BY. #37193 (Azat Khuzhin).
  • Fix possible overflow during OvercommitRatio comparison. #37197 (Dmitry Novik).
  • Update max_fired_watermark only after blocks are actually fired, to avoid deleting data that has not been fired yet. #37225 (vxider).
  • Kafka does not need group.id at the producer stage. The console log showed a warning describing this issue: 2022.05.15 17:59:13.270227 [ 137 ] {} <Warning> StorageKafka (topic-name): [rdk:CONFWARN] [thrd:app]: Configuration property group.id is a consumer property and will be ignored by this producer instance. #37228 (Mark Andreev).
  • Fixed the MySQL database engine to be compatible with the binary(0) data type. #37232 (zzsmdfj).
  • Fix execution of mutations in tables that contain columns of type Object. Using subcolumns of type Object in the WHERE expression of UPDATE or DELETE queries is not allowed yet, nor is manipulating (DROP, MODIFY) separate subcolumns. Fixes #37205. #37266 (Anton Popov).
  • Fix Nullable(String) to Nullable(Bool/IPv4/IPv6) conversion. Closes #37221. #37270 (Kruglov Pavel).
  • Fix the system.opentelemetry_span_log attribute.values alias so that it refers to values instead of keys. #37275 (Aleksandr Razumov).
  • Fix possible deadlock in OvercommitTracker during logging. Fixes #37272. #37299 (Dmitry Novik).
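
The OvercommitRatio entry above concerns an overflow during a comparison. One common way such an overflow arises is comparing two ratios by cross-multiplication, where products of large memory amounts exceed 64 bits. The C++ sketch below is our illustration of that class of bug and one remedy (widening to 128 bits); Ratio and ratioLess are hypothetical names, and we have not checked that the actual fix takes this form. Note that __int128 is a GCC/Clang extension, not standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ratio: committed / soft_limit for one query.
struct Ratio {
    std::int64_t committed;
    std::int64_t soft_limit;  // assumed to be > 0
};

// Compares a.committed / a.soft_limit < b.committed / b.soft_limit without
// division, by cross-multiplying. With positive denominators this preserves
// the order. The products can exceed 2^63, so they are computed in 128 bits.
bool ratioLess(const Ratio& a, const Ratio& b) {
    assert(a.soft_limit > 0 && b.soft_limit > 0);
    const __int128 lhs = static_cast<__int128>(a.committed) * b.soft_limit;
    const __int128 rhs = static_cast<__int128>(b.committed) * a.soft_limit;
    return lhs < rhs;
}
```

With 64-bit products, values such as 2^40 bytes committed against a 2^30-byte limit would already overflow (2^70), which is why the widening matters.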

Bug Fix (user-visible misbehaviour in official stable or prestable release)

  • Fix the substring function returning a result of wrong length when offset and length are negative constants and s is not constant. #33861 (RogerYK).
  • ZSTD support for Arrow was accidentally not being built. This fixes #35283. #35486 (Sean Lafferty).
  • Fix ALTER DROP COLUMN of nested column with compact parts (i.e. ALTER TABLE x DROP COLUMN n, when there is column n.d). #35797 (Azat Khuzhin).
  • Fix insertion of complex JSONs with nested arrays to columns of type Object. #36077 (Anton Popov).
  • Queries with aliases inside special operators returned parsing error (was broken in 22.1). Example: SELECT substring('test' AS t, 1, 1). #36167 (Maksim Kita).
  • Fix dictionary reload for ClickHouseDictionarySource if it contains scalar subqueries. #36390 (lthaooo).
  • Fix nullptr dereference in JOIN and COLUMNS matcher. This fixes #36416 and is related to https://github.com/ClickHouse/ClickHouse/pull/36417. #36430 (Amos Bird).
  • Fix bug in s3Cluster schema inference that led to not all data being read in a SELECT from s3Cluster. The bug appeared in https://github.com/ClickHouse/ClickHouse/pull/35544. #36434 (Kruglov Pavel).
  • Fixed a bug where the server might fail to start if it could not resolve the hostname of an external ClickHouse dictionary. Fixes #36451. #36463 (Alexander Tokmakov).
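  • The following code segment demonstrates the bug in RangeGenerator:

    ```cpp
    #include <iostream>
    // RangeGenerator is ClickHouse's internal class; its header is not shown here.

    int main() {
        RangeGenerator g{1230, 100};
        std::cout << g.totalRanges() << std::endl;
        int count = 0;
        while (g.nextRange())
            ++count;
        std::cout << "count:" << count << std::endl;
        return 0;
    }
    ```

    #36469 (李扬).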
  • Fix clickhouse-benchmark json report results. #36473 (Tian Xinhui).
  • Add missing enum values in system.session_log table. Closes #36474. #36480 (Memo).
  • Fix possible exception with unknown packet from server in client. #36481 (Kseniia Sumarokova).
  • Fix usage of executable user defined functions in GROUP BY. Previously, executable user defined functions could not be used as expressions in GROUP BY. Closes #36448. #36486 (Maksim Kita).
  • Closes #33906. #36489 (awakeljw).
  • Fix hostname sanity checks for Keeper cluster configuration. Add keeper_server.host_checks_enabled config to enable/disable those checks. #36492 (Antonio Andelic).
  • Fix offset update in ReadBufferFromEncryptedFile, which could cause undefined behaviour. #36493 (Kseniia Sumarokova).
  • Fix Missing column exception which could happen while using INTERPOLATE with ENGINE = MergeTree table. #36549 (Yakov Olkhovskiy).
  • Fix a crash when formatting a column whose default expression following EPHEMERAL is not a literal. Closes #36618. #36633 (flynn).
  • Fix merges of wide parts with type Object. #36637 (Anton Popov).
  • Fixed parsing of query settings in CREATE query when engine is not specified. Fixes https://github.com/ClickHouse/ClickHouse/pull/34187#issuecomment-1103812419. #36642 (Alexander Tokmakov).
  • Fix possible heap-use-after-free in schema inference. Closes #36661. #36679 (Kruglov Pavel).
  • Fix server restart if cache configuration changed. #36685 (Kseniia Sumarokova).
  • While working on a previous PR, tests (stateless tests, flaky check (address, actions)) were found to time out. Testing locally could also trigger sporadic system deadlocks. The problem still existed in the latest master source code. #36697 (Han Shukai).
  • Fix server reload on port change (do not wait for current connections from query context). #36700 (Azat Khuzhin).
  • Fix vertical merges in wide parts. Previously, the exception "There is no column" could be thrown during a merge. #36707 (Anton Popov).
  • During testing, a cache class was found to be initialized twice, which threw an exception. Although the cause is not clear, ClickHouse may contain code paths that load a disk repeatedly, so this situation is now handled explicitly. #36737 (Han Shukai).
  • Fix a bug in groupBitmapAndState/groupBitmapOrState/groupBitmapXorState on distributed tables. #36739 (Zhang Yifan).
  • Fix timeouts in Hedged requests. A connection hanging right after sending a remote query could lead to waiting forever. #36749 (Kruglov Pavel).
  • Fix insertion to columns of type Object from multiple files, e.g. via table function file with globs. #36762 (Anton Popov).
  • Fix some issues with async reads from remote filesystems that happened when reading LowCardinality columns. #36763 (Kseniia Sumarokova).
  • Fix creation of tables with flatten_nested = 0. Previously unflattened Nested columns could be flattened after server restart. #36803 (Anton Popov).
  • Fix incorrect cast in cached buffer from remote fs. #36809 (Kseniia Sumarokova).
  • Remove function groupArraySorted which has a bug. #36822 (Alexey Milovidov).
  • Fix fire in window view with hop window #34044. #36861 (vxider).
  • Fix current_size count in cache. #36887 (Kseniia Sumarokova).
  • Fix incorrect query result when doing constant aggregation. This fixes #36728. #36888 (Amos Bird).
  • Fix bug in clickhouse-keeper which can lead to corrupted compressed log files in case of small load and restarts. #36910 (alesapin).
  • Fix bugs when using multiple columns in WindowView by adding converting actions that make it possible to call writeIntoWindowView with a slightly different schema. #36928 (vxider).
  • Fix issue: #36671. #36929 (李扬).
  • Fix a hang when dropping the source table in WindowView. Closes #35678. #36967 (vxider).
  • Fixed logical error on TRUNCATE query in Replicated database. Fixes #33747. #36976 (Alexander Tokmakov).
  • Fix sending external tables data in HedgedConnections with max_parallel_replicas != 1. #36981 (Kruglov Pavel).
  • Fixed a problem with infinities in quantileTDigest. Fixes #32107. #37021 (Vladimir Chebotarev).
  • Fix LowCardinality->ArrowDictionary invalid output when type of indexes is not UInt8. Closes #36832. #37043 (Kruglov Pavel).
  • Fix in-order GROUP BY (optimize_aggregation_in_order=1) with *Array (groupArrayArray/...) aggregate functions. #37046 (Azat Khuzhin).
  • Fixed performance degradation of some INSERT SELECT queries with implicit aggregation. Fixes #36792. #37047 (Alexander Tokmakov).
  • Fix optimize_aggregation_in_order with prefix GROUP BY and *Array aggregate functions. #37050 (Azat Khuzhin).
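
The Hedged requests timeout fix above addresses a wait that could last forever when a connection hangs. The general remedy, bounding the wait with a deadline, can be sketched in standard C++ with std::condition_variable::wait_for. ReplyState, waitForReply and markReplyReceived are hypothetical names, and this is not how HedgedConnections is implemented.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for "the remote replica has answered".
struct ReplyState {
    std::mutex mutex;
    std::condition_variable cv;
    bool reply_received = false;
};

// Waits for a reply but gives up after `timeout` instead of blocking forever
// if the connection hangs. Returns true if the reply arrived in time.
bool waitForReply(ReplyState& state, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(state.mutex);
    // The predicate overload handles spurious wakeups and, on timeout,
    // returns the predicate's current value.
    return state.cv.wait_for(lock, timeout, [&state] { return state.reply_received; });
}

// Called by the thread that receives the reply.
void markReplyReceived(ReplyState& state) {
    {
        std::lock_guard<std::mutex> lock(state.mutex);
        state.reply_received = true;
    }
    state.cv.notify_all();
}
```

A caller that gets false back can then treat the replica as unresponsive, for example by trying another one, rather than waiting indefinitely.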
