
---
sidebar_position: 1
sidebar_label: 2022
---

# 2022 Changelog

### ClickHouse release v22.12.1.1752-stable (688e488e93) FIXME as compared to v22.11.1.1360-stable (0d211ed198)

#### Backward Incompatible Change

* Fixed a backward incompatibility in the (de)serialization of states of the min, max, any*, argMin, and argMax aggregate functions with a String argument. The incompatibility was introduced in https://github.com/ClickHouse/ClickHouse/pull/41431 and affects the 22.9, 22.10 and 22.11 branches (fixed since 22.9.6, 22.10.4 and 22.11.2 respectively). Some minor releases of the 22.3, 22.7 and 22.8 branches are also affected: 22.3.13...22.3.14 (fixed since 22.3.15), 22.8.6...22.8.9 (fixed since 22.8.10), 22.7.6 and newer (will not be fixed in 22.7; we recommend upgrading from 22.7.* to 22.8.10 or newer). This release note does not concern users that have never used affected versions. Incompatible versions append an extra '\0' to strings when reading states of the aggregate functions mentioned above. For example, if an older version saved the state of anyState('foobar') to state_column, then an incompatible version will print 'foobar\0' on anyMerge(state_column). Incompatible versions also write states of the aggregate functions without a trailing '\0'. Newer versions (that have the fix) can correctly read data written by all versions, including incompatible versions, except for one corner case: if an incompatible version saved a state with a string that actually ends with a null character, then a newer version will trim the trailing '\0' when reading the state of the affected aggregate function. For example, if an incompatible version saved the state of anyState('abrac\0dabra\0') to state_column, then newer versions will print 'abrac\0dabra' on anyMerge(state_column). The issue also affects distributed queries when an incompatible version works in a cluster together with older or newer versions; see the sketch after this list. #43038 (Raúl Marín).
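
To make the behavior concrete, a minimal sketch (the table name is illustrative, and the commented results depend on which version writes and which version reads the state):

```sql
-- Persist an aggregate function state.
CREATE TABLE state_table
(
    state AggregateFunction(any, String)
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO state_table SELECT anyState('foobar');

-- An incompatible version prints 'foobar\0' here;
-- fixed versions print 'foobar'.
SELECT anyMerge(state) FROM state_table;
```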

#### New Feature

  • Add "grace_hash" join_algorithm. #38191 (BigRedEye).
  • Merging on initiator now uses the same memory bound approach as merging of local aggregation results if enable_memory_bound_merging_of_aggregation_results is set. #40879 (Nikita Taranov).
  • Add BSONEachRow input/output format. In this format, ClickHouse formats/parses each row as a separated BSON Document and each column is formatted/parsed as a single BSON field with column name as a key. #42033 (mark-polokhov).
  • close: #37631. #42265 (刘陶峰).
  • Added multiplyDecimal and divideDecimal functions for decimal operations with fixed precision. #42438 (Andrey Zvonov).
  • Added system.moves table with list of currently moving parts. #42660 (Sergei Trifonov).
  • Keeper feature: add support for embedded Prometheus endpoint. #43087 (Antonio Andelic).
  • Added age function to calculate difference between two dates or dates with time values expressed as number of full units. Close #41115. #43123 (Roman Vasin).
  • Add settings max_streams_for_merge_tree_reading and allow_asynchronous_read_from_io_pool_for_merge_tree. Setting max_streams_for_merge_tree_reading limits the number of reading streams for MergeTree tables. Setting allow_asynchronous_read_from_io_pool_for_merge_tree enables background I/O pool to read from MergeTree tables. This may increase performance for I/O bound queries if used together with max_streams_to_max_threads_ratio or max_streams_for_merge_tree_reading. #43260 (Nikolai Kochetov).
  • Add the expression of the index on data_skipping_indices system table. #43308 (Guillaume Tassery).
  • New hash function xxh3 added. Also performance of xxHash32 and xxHash64 improved on arm thanks to library update. #43411 (Nikita Taranov).
    • Temporary data (for external sorting, aggregation, and JOINs) can share storage with the filesystem cache for remote disks and evict it, close #42158. #43457 (Vladimir C).
  • Add column engine_full to system table databases so that users can access whole engine definition of database via system tables. #43468 (凌涛).
  • Add password complexity rules and checks for creating a new user. #43719 (Nikolay Degterinsky).
  • Add function concatWithSeparator , like concat_ws in spark. #43749 (李扬).
  • Added constraints for merge tree settings. #43903 (Sergei Trifonov).
  • Support numeric literals with _ as separator. #43925 (jh0x).
  • Add a new setting input_format_json_read_objects_as_strings that allows to parse nested JSON objects into Strings in all JSON input formats. This setting is disable by default. #44052 (Kruglov Pavel).
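
A few of the new features above are easiest to see as queries. This is a hedged sketch (results in comments assume a v22.12 server):

```sql
-- Enable the new grace_hash join algorithm (#38191).
SET join_algorithm = 'grace_hash';

-- age(): difference between two dates in full units (#43123).
SELECT age('year', toDate('2020-01-01'), toDate('2022-06-01'));  -- 2

-- Decimal arithmetic with fixed precision (#42438).
SELECT multiplyDecimal(toDecimal64(14.5, 2), toDecimal64(2.0, 1));  -- 29

-- concatWithSeparator, analogous to concat_ws in Spark (#43749).
SELECT concatWithSeparator('-', 'Hello', 'World');  -- 'Hello-World'

-- Numeric literals with '_' as a separator (#43925).
SELECT 1_000_000 AS million;

-- Parse nested JSON objects as Strings in JSON input formats (#44052).
SET input_format_json_read_objects_as_strings = 1;
```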

#### Performance Improvement

* The optimization is now skipped if max_size_to_preallocate_for_aggregation is set to a value that is too small. The default value of this setting has been increased to 10^8; see the sketch below. #43945 (Nikita Taranov).
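
For reference, the threshold can be adjusted per session; a minimal sketch (10^8 is the new default mentioned above):

```sql
-- Allow hash-table preallocation for aggregations of up to 10^8 keys.
SET max_size_to_preallocate_for_aggregation = 100000000;
```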

#### Improvement

* Support numeric literals with underscores. Closes #28967. #39129 (unbyte).
* Add FROM table SELECT column syntax. #41095 (Nikolay Degterinsky).
* Change how the following queries delete parts: TRUNCATE TABLE, ALTER TABLE DROP PART, ALTER TABLE DROP PARTITION. Now these queries make empty parts that cover the old parts. This makes the TRUNCATE query work without an exclusive lock, so concurrent reads are not blocked. These queries are also durable now: if a request succeeds, no resurrected parts appear later. Note that atomicity is achieved only within the transaction scope. #41145 (Sema Checherinda).
* SET param_x queries no longer require manual string serialization for the value of the parameter. For example, the query SET param_a = '[\'a\', \'b\']' can now be written as SET param_a = ['a', 'b']; see the examples after this list. #41874 (Nikolay Degterinsky).
* filesystemAvailable and related functions support an optional argument with a disk name, and filesystemFree is changed to filesystemUnreserved. Closes #35076. #42064 (flynn).
* Increased the default value of search_limit to 256 and added an LDAP server config option to change it to an arbitrary value. Closes #42276. #42461 (Vasily Nemkov).
* Add cosine distance for Annoy. #42778 (Filatenkov Artur).
* Also allow removing sensitive information from exception messages. Resolves #41418. #42940 (filimonov).
* Keeper improvement: add the 4lw command rqld, which can manually assign a node as leader. #43026 (JackyWoo).
* Apply connection timeout settings for Distributed async INSERT from the query. #43156 (Azat Khuzhin).
* The unhex function supports FixedString arguments; see the examples after this list. Closes #42369. #43207 (DR).
* Priority is given to deleting completely expired parts. Related to #42869. #43222 (zhongyuankai).
* Follow-up to https://github.com/ClickHouse/ClickHouse/pull/42484. Mask sensitive information in logs better; mask secret parts in the output of the queries SHOW CREATE TABLE and SELECT FROM system.tables. Also resolves #41418. #43227 (Vitaly Baranov).
* Enable compression of marks and primary key. #43288 (SmitaRKulkarni).
* Resolves #38075. Previously, async insert did not support deduplication, because multiple small inserts coexist in one part, which corresponds to multiple block ids. The solution is straightforward: 1. mark offsets for every insert in every chunk; 2. calculate multiple block_ids when the sink receives a chunk; 3. acquire the block number lock with these block_ids: 3.1. if that fails, remove the duplicate insert(s) and duplicate block_id(s) from the block and recalculate offsets again; 3.2. if it succeeds, commit the block_ids and other items to Keeper (a. if that fails, do 3.1; b. if it succeeds, everything succeeds). #43304 (Han Fei).
* More precise and reactive CPU load indication in the client. #43307 (Sergei Trifonov).
* Restrict default access to named collections for users defined in the config. A user must have explicit show_named_collections = 1 to be able to see them. #43325 (Kseniia Sumarokova).
* Support reading of subcolumns of nested types from the S3 storage and the s3 table function with the Parquet, Arrow and ORC formats. #43329 (chen).
* Add table_uuid to system.parts. #43404 (Azat Khuzhin).
* Added a client option to display the number of locally processed rows in non-interactive mode (--print-num-processed-rows). #43407 (jh0x).
* Show read rows while reading from stdin in the client. Closes #43423. #43442 (Kseniia Sumarokova).
* Keeper improvement: try syncing logs to disk in parallel with replication. #43450 (Antonio Andelic).
* Show a progress bar while reading from the s3 table function / engine. #43454 (Kseniia Sumarokova).
* The progress bar now shows both read and written rows. #43496 (Ilya Yatsishin).
* Implement the aggregation-in-order optimization on top of the query plan. It is enabled by default (but works only together with optimize_aggregation_in_order, which is disabled by default). Set query_plan_aggregation_in_order = 0 to use the previous AST-based version. #43592 (Nikolai Kochetov).
* Allow sending profile events with trace_type = 'ProfileEvent' to system.trace_log on each increment, with the current stack, the profile event name, and the value of the increment. It can be enabled by the setting trace_profile_events and used to debug the performance of queries; see the examples after this list. #43639 (Anton Popov).
* Keeper improvement: requests are batched more often. The batching can be controlled with the new setting max_requests_quick_batch_size. #43686 (Antonio Andelic).
* Added the possibility to use an array as the second parameter of the cutURLParameter function; see the examples after this list. Closes #6827. #43788 (Roman Vasin).
* Implement referential dependencies and use them to create tables in the correct order while restoring from a backup. #43834 (Vitaly Baranov).
* Add a new setting input_format_max_binary_string_size to limit the string size in the RowBinary format. #43842 (Kruglov Pavel).
* Support queries like SHOW FULL TABLES .... #43910 (Filatenkov Artur).
* When ClickHouse requests a remote HTTP server and it returns an error, the numeric HTTP code was not displayed correctly in the exception message. Closes #43919. #43920 (Alexey Milovidov).
* The settings merge_tree_min_rows_for_concurrent_read_for_remote_filesystem/merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem did not respect adaptive granularity. Fat rows did not decrease the number of read rows (as is done for merge_tree_min_rows_for_concurrent_read/merge_tree_min_bytes_for_concurrent_read), which could lead to high memory usage. #43965 (Nikolai Kochetov).
* Support optimize_if_transform_strings_to_enum in the new analyzer. #43999 (Antonio Andelic).
* Upgrade the DeflateQpl compression codec implemented in a previous PR (details: https://github.com/ClickHouse/ClickHouse/pull/39494). This patch improves the codec in the following aspects: 1. upgrade Intel® Query Processing Library (QPL) from v0.2.0 to v0.3.0; 2. improve the CMake file to fix QPL build issues for v0.3.0; 3. link the QPL library with libaccel-config at build time instead of loading it at runtime via dlopen (as in QPL v0.2.0); 4. fix a log print issue in CompressionCodecDeflateQpl.cpp. #44024 (jasperzhu).
* Follow-up to https://github.com/ClickHouse/ClickHouse/pull/43834. Fix review issues; dependencies from the Distributed table engine and from the cluster() function are also considered now, as well as dependencies of a dictionary defined without host & port specified. #44158 (Vitaly Baranov).
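
Several of the improvements above are easiest to see in SQL. A hedged sketch (URLs and values are illustrative):

```sql
-- Typed query parameters without manual string serialization (#41874).
SET param_a = ['a', 'b'];
SELECT {a:Array(String)};

-- unhex with a FixedString argument (#43207).
SELECT unhex(toFixedString('666f6f', 6));  -- 'foo'

-- cutURLParameter with an array of parameter names (#43788).
SELECT cutURLParameter('http://example.com/?a=1&b=2&c=3', ['a', 'c']);
-- 'http://example.com/?b=2'

-- Send profile events to system.trace_log for performance debugging (#43639).
SET trace_profile_events = 1;

-- MySQL-compatible table listing (#43910).
SHOW FULL TABLES;
```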

#### Bug Fix

* Fix mutations not making progress when checksums do not match between replicas (e.g. caused by a change in the data format on an upgrade). #36877 (nvartolomei).
* Fix skip_unavailable_shards not working with the hdfsCluster table function; see the sketch after this list. #43236 (chen).
* Fix question mark wildcard support in s3. Closes #42731. #43253 (chen).
* Fix the functions arrayFirstOrNull and arrayLastOrNull when the array is Nullable. #43274 (Duc Canh Le).
* A new ZooKeeper path called "async_blocks" is created for replicated tables in #43304. However, for tables created in older versions this path does not exist and caused errors when doing partition operations. This change creates the node when initializing the replicated tree. It also adds a flag async_insert_deduplicate, with a false default value, to control whether to use this feature; as mentioned in #38075, the feature is not yet fully finished, so it is turned off by default. #44223 (Han Fei).
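
A sketch of the two behavior changes above (the cluster name, path, and schema are illustrative):

```sql
-- skip_unavailable_shards now also applies to the hdfsCluster table function (#43236).
SELECT count()
FROM hdfsCluster('my_cluster', 'hdfs://namenode:9000/data/*.parquet', 'Parquet', 'id UInt64')
SETTINGS skip_unavailable_shards = 1;

-- Deduplication for async inserts is gated behind a new flag, off by default (#44223).
SET async_insert = 1, async_insert_deduplicate = 1;
```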

#### Build/Testing/Packaging Improvement

#### Bug Fix (user-visible misbehavior in official stable release)

* Fixed being unable to log in (because of a failure to create a session_log entry) in a rare case of messed-up settings profiles. ... #42641 (Vasily Nemkov).
* Fix incorrect UserTimeMicroseconds/SystemTimeMicroseconds accounting. #42791 (Azat Khuzhin).
* Do not suppress exceptions in the web disk. Fix retries for the web disk. #42800 (Azat Khuzhin).
* Fixed a race condition between inserts and dropping MVs. #43161 (AlfVII).
* Fixed a bug which could lead to a deadlock while using asynchronous inserts. #43233 (Anton Popov).
* An additional check on zero uncompressed size was added to CompressionCodecDelta. #43255 (Nikita Taranov).
* Fix an issue reported while trying to read a Parquet file from S3 into ClickHouse. #43297 (Arthur Passos).
* Fix a bad cast from a LowCardinality column when using short-circuit function execution. Proper fix of https://github.com/ClickHouse/ClickHouse/pull/42937. #43311 (Kruglov Pavel).
* Fixed queries with SAMPLE BY with the prewhere optimization on tables using the Merge engine. #43315 (Antonio Andelic).
* Fix DESCRIBE for the deltaLake and hudi table functions. #43323 (Antonio Andelic).
* Check and compare the content of the format_version file in MergeTreeData so tables can be loaded even if the storage policy was changed. #43328 (Antonio Andelic).
* Fix a possible (very unlikely) "No column to rollback" logical error during INSERT into Buffer. #43336 (Azat Khuzhin).
* Fix a bug that allowed FunctionParser to parse an unlimited number of round brackets into one function if allow_function_parameters is set. #43350 (Nikolay Degterinsky).
* MaterializeMySQL: support the DDL drop table t1, t2 and be compatible with most of the MySQL DROP DDL. #43366 (zzsmdfj).
* Fix possible Cannot create non-empty column with type Nothing in the functions if/multiIf. Closes #43356. #43368 (Kruglov Pavel).
* Fix a bug when a row-level filter uses the default value of a column. #43387 (Alexander Gololobov).
* A query with DISTINCT + LIMIT BY + LIMIT could return fewer rows than expected. Fixes #43377. #43410 (Igor Nikonov).
* Fix sumMap() for Nullable(Decimal()). #43414 (Azat Khuzhin).
* Fix date_diff() for hour/minute on macOS; see the example after this list. Closes #42742. #43466 (zzsmdfj).
* Fix incorrect memory accounting because of merges/mutations. #43516 (Azat Khuzhin).
* Substitute UDFs in the CREATE query to avoid failures during loading at startup. Additionally, UDFs can now be used as DEFAULT expressions for columns. #43539 (Antonio Andelic).
* Correctly report errors in queries even when the multiple-JOINs optimization is taking place. #43583 (Salvatore).
* Fixed primary key analysis with conditions involving toString(enum). #43596 (Nikita Taranov).
* Ensure consistency when the copier updates status and attach_is_done in Keeper after the partition attach is done. #43602 (lizhuoyu5).
* During recovery of a lost replica, there could be a situation where we need to atomically swap two table names (use EXCHANGE), but previously we tried to use two RENAME queries instead, which obviously failed and, moreover, failed the whole recovery process of the database replica. #43628 (Nikita Mikhaylov).
* Fix the s3Cluster function returning a NOT_FOUND_COLUMN_IN_BLOCK error. Closes #43534. #43629 (chen).
* Optimized the number of List requests to ZooKeeper when selecting a part to merge. Previously it could produce thousands of requests in some cases. Fixes #43647. #43675 (Alexander Tokmakov).
* Fix a possible logical error 'Array sizes mismatched' while parsing a JSON object with arrays that have the same key names but different nesting levels. Closes #43569. #43693 (Kruglov Pavel).
* Fixed a possible exception in case of a distributed GROUP BY with an alias column among the aggregation keys. #43709 (Nikita Taranov).
* Fix a bug which can lead to broken projections if zero-copy replication is enabled and used. #43764 (alesapin).
* Fix using multipart upload for large S3 objects in AWS S3. #43824 (ianton-ru).
* Fixed ALTER ... RESET SETTING with ON CLUSTER. It could be applied to one replica only. Fixes #43843. #43848 (Elena Torró).
* Keeper fix: throw if the interserver port for Raft is already in use. Fix a segfault in Prometheus when the Raft server failed to initialize. #43984 (Antonio Andelic).
* Fix ORDER BY with a positional argument in case of unneeded column pruning. Closes #43964. #43987 (Kseniia Sumarokova).
* Fixed an exception when a subquery contains HAVING but doesn't contain an actual aggregation. #44051 (Nikita Taranov).
* Fix a race in S3 multipart upload. This race could cause the error Part number must be an integer between 1 and 10000, inclusive. (S3_ERROR) while restoring from a backup. #44065 (Vitaly Baranov).
* Fix undefined behavior in the quantiles function, which might lead to uninitialized memory. Found by fuzzer. This closes #44066. #44067 (Alexey Milovidov).
* Prevent dropping a nested column if it creates an empty part. #44159 (Antonio Andelic).
* Fix LOGICAL_ERROR in the case when a fetch of a part was stopped while fetching a projection to the disk, with zero-copy replication enabled. #44173 (Anton Popov).
* Fix possible Bad cast from type DB::IAST const* to DB::ASTLiteral const*. Closes #44191. #44192 (Kruglov Pavel).
* Prevent the ReadonlyReplica metric from having negative values. #44220 (Antonio Andelic).
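
For instance, the date_diff fix above concerns results like the following; a sketch (the fix makes the result consistent across platforms):

```sql
-- dateDiff for hour/minute units previously returned wrong values on macOS (#43466).
SELECT dateDiff('hour',
                toDateTime('2022-12-01 00:00:00'),
                toDateTime('2022-12-01 05:30:00'));  -- 5
```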

#### Build Improvement

* Fixed endian issues in hex string conversion on s390x (which is not supported by ClickHouse). #41245 (Harry Lee).
* ... toDateTime64 conversion generates a wrong time on z builds; add a bit_cast swap fix to support toDateTime64 on the s390x platform. #42847 (Suzy Wang).
* ... s390x support for IP coding functions. #43078 (Suzy Wang).
* Fix a byte order issue of wide integers for s390x. #43228 (Harry Lee).
* Fixed an endian issue in bloom filter serialization for s390x. #43642 (Harry Lee).
* Fixed setting TCP_KEEPIDLE of client connections for s390x. #43850 (Harry Lee).
* Fix an endian issue in StringHashTable for s390x. #44049 (Harry Lee).

#### NO CL ENTRY

#### NOT FOR CHANGELOG / INSIGNIFICANT