2022 Changelog

ClickHouse release v22.10.1.1877-stable (98ab5a3c18) as compared to v22.9.1.2603-stable (3030d4c7ff)

Backward Incompatible Change

New Feature

  • Add support for Rust code in ClickHouse, with the BLAKE3 hash-function library as an example. #33435 (BoloniniD).
  • Initial implementation of the Kusto Query Language (MVP). #37961 (Yong Wang).
  • Support limiting temporary data stored on disk using the settings max_temporary_data_on_disk_size_for_user and max_temporary_data_on_disk_size_for_query. #40893 (Vladimir C).
  • Support hashing of Java integer types in javaHash. #41131 (JackyWoo).
  • Support an in-house OpenSSL build, similar to the BoringSSL submodule. The build flag ENABLE_CH_BUNDLE_BORINGSSL selects between BoringSSL and OpenSSL; by default, the in-house BoringSSL build is used. #41142 (MeenaRenganathan22).
  • Composable protocol configuration is added. #41198 (Yakov Olkhovskiy).
  • Add OpenTelemetry support to ON CLUSTER DDL (requires distributed_ddl_entry_format_version to be set to 4). #41484 (Frank Chen).
  • Add setting format_json_object_each_row_column_for_object_name to write/parse object name as column value in JSONObjectEachRow format. #41703 (Kruglov Pavel).
  • Add Morton coding (ZCurve) encode/decode functions. #41753 (Constantine Peresypkin).
  • Implement support for different UUID binary formats, including the two most prevalent ones: the default big-endian format and Microsoft's mixed-endian format, as specified in RFC 4122. #42108 (ltrk2).
  • Added an aggregate function analysisOfVariance (anova) to perform a statistical test over several groups of normally distributed observations to find out whether all groups have the same mean or not. Original PR #37872. #42131 (Nikita Mikhaylov).
  • Add support for SET setting_name = DEFAULT. #42187 (Filatenkov Artur).
  • Add URL functions that conform to the RFC. Functions include: cutToFirstSignificantSubdomainCustomRFC, cutToFirstSignificantSubdomainCustomWithWWWRFC, cutToFirstSignificantSubdomainRFC, cutToFirstSignificantSubdomainWithWWWRFC, domainRFC, domainWithoutWWWRFC, firstSignificantSubdomainCustomRFC, firstSignificantSubdomainRFC, portRFC, topLevelDomainRFC. #42274 (Quanfa Fu).
  • Added functions (randUniform, randNormal, randLogNormal, randExponential, randChiSquared, randStudentT, randFisherF, randBernoulli, randBinomial, randNegativeBinomial, randPoisson) to generate random values according to the specified distributions. This closes #21834. #42411 (Nikita Mikhaylov).
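
Morton coding (a Z-order curve) maps multi-dimensional coordinates to a single integer by interleaving the bits of each coordinate. The following is a minimal Python sketch of 2D encoding and decoding, assuming that bit i of the first coordinate lands at position 2*i and bit i of the second at position 2*i + 1. The helpers `morton_encode_2d` and `morton_decode_2d` are hypothetical names for illustration, not ClickHouse's implementation, and the bit order ClickHouse uses may differ.

```python
def morton_encode_2d(x: int, y: int, bits: int = 32) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order code.

    Assumption for this sketch: bit i of x goes to position 2*i and
    bit i of y goes to position 2*i + 1.
    """
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode_2d(code: int, bits: int = 32) -> tuple[int, int]:
    """Inverse of morton_encode_2d: even bits go back to x, odd bits to y."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# Points that are close in 2D tend to receive close codes, which is what
# makes the Z-order useful for sorting and range pruning.
for point in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]:
    print(point, morton_encode_2d(*point))
```

Encoding and decoding are exact inverses for coordinates that fit in `bits` bits, so a round trip returns the original point.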

Performance Improvement

  • Implement an operator precedence element parser to resolve stack overflow issues and reduce the required stack size. #34892 (Nikolay Degterinsky).
  • The DISTINCT in order optimization now leverages the sorting properties of data streams. This enables reading in order for DISTINCT where applicable (previously, it was necessary to specify ORDER BY for the columns in DISTINCT). #41014 (Igor Nikonov).
  • ColumnVector: optimize UInt8 index with AVX512VBMI. #41247 (Guo Wangyang).
  • Performance experiments with SSB (Star Schema Benchmark) on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring a 2.95x improvement in the geometric mean of QPS across all subcases. #41675 (Zhiguo Zhou).
  • Fixed slowness in JSONExtract with LowCardinality(String) tuples. #41726 (AlfVII).
  • Add ldapr capabilities to AArch64 builds. This is supported on Graviton 2+, Azure, and GCP instances. Compiler support appeared only recently, in clang-15. #41778 (Daniel Kutenin).
  • Improve performance of string comparisons when one argument is an empty constant string. #41870 (Jiebin Sun).
  • Optimize insertFrom of ColumnAggregateFunction to share the aggregate state in some cases. #41960 (flynn).
  • Relax the "Too many parts" threshold. This closes #6551. Now ClickHouse will allow more parts in a partition if the average part size is large enough (at least 10 GiB). This allows up to petabytes of data in a single partition of a single table on a single server, which is possible using disk shelves or object storage. #42002 (Alexey Milovidov).
  • Make writing to AzureBlobStorage more efficient (respect max_single_part_upload_size instead of writing one block per buffer). The inefficiency was mentioned in #41754. #42041 (Kseniia Sumarokova).
  • Make thread ids in the process list and query_log unique to avoid waste. #42180 (Alexey Milovidov).
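
The DISTINCT in order optimization relies on a general property: when rows arrive sorted by a prefix of the DISTINCT columns, a row can only duplicate rows that share its prefix, so per-prefix state can be discarded as soon as the prefix changes. The Python sketch below illustrates that idea under this assumption; `distinct_in_order` is a hypothetical helper, not ClickHouse's implementation.

```python
from typing import Hashable, Iterable, Iterator, Optional, Tuple

Row = Tuple[Hashable, ...]


def distinct_in_order(rows: Iterable[Row], prefix_len: int) -> Iterator[Row]:
    """Yield each distinct row once, in input order.

    Assumes `rows` are sorted by their first `prefix_len` columns. Memory use is
    bounded by the number of distinct rows inside one prefix group, not by the
    number of distinct rows overall.
    """
    current_prefix: Optional[Row] = None
    seen: set = set()
    for row in rows:
        prefix = row[:prefix_len]
        if prefix != current_prefix:
            # The input is sorted by the prefix, so rows from earlier groups
            # can never reappear; their state can be dropped.
            current_prefix = prefix
            seen.clear()
        if row not in seen:
            seen.add(row)
            yield row


rows = [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "a")]
print(list(distinct_in_order(rows, prefix_len=1)))
```

With `prefix_len=0` the function degenerates to an ordinary hash-based DISTINCT over the whole input, which is the behaviour needed when no sorting can be assumed.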

Improvement

  • Added new infrastructure for query analysis and planning under the allow_experimental_analyzer setting. #31796 (Maksim Kita).
  • Support an (EXPLAIN SELECT ...) expression in a subquery. Queries like SELECT * FROM (EXPLAIN PIPELINE SELECT col FROM TABLE ORDER BY col) are now valid. #40630 (Vladimir C).
  • Previously, changing async_insert_max_data_size or async_insert_busy_timeout_ms in the scope of a query made no sense, which led to a bad user experience, e.g. when a user inserts data rarely and has no access to the server config to tune the default settings. #40668 (Nikita Mikhaylov).
  • Embedded Keeper will always start in the background, allowing ClickHouse to start without achieving quorum. #40991 (Antonio Andelic).
  • Improvements for reading from remote filesystems; the thread pool size for reads/writes is now configurable. Closes #41070. #41011 (Kseniia Sumarokova).
  • Made re-establishing a new connection more reactive when the previous one expires. Previously, a task ran every minute by default, so a table could stay in read-only state for about that long. #41092 (Nikita Mikhaylov).
  • Support all combinator combinations in WindowTransform/arrayReduce*/initializeAggregation/aggregate function versioning. Previously, combinators like ForEach/Resample/Map didn't work in these places; using them led to exceptions like 'State function ... inserts results into non-state column'. #41107 (Kruglov Pavel).
  • Now projections can be used with zero copy replication. #41147 (alesapin).
  • Add function tryDecrypt, which returns NULL when decryption fails (e.g. when decrypting with an incorrect key) instead of throwing an exception. #41206 (Duc Canh Le).
  • Add the unreserved_space column to the system.disks table to check how much space is not taken by reservations per disk. #41254 (filimonov).
  • Support S3 authorization headers from AST arguments. #41261 (Kseniia Sumarokova).
  • Add setting 'allow_implicit_no_password' that forbids creating a user with no password unless 'IDENTIFIED WITH no_password' is explicitly specified. #41341 (Nikolay Degterinsky).
  • Keeper improvement: add support for uploading snapshots to S3. S3 information can be defined inside keeper_server.s3_snapshot. #41342 (Antonio Andelic).
  • Add support for MultiRead in Keeper and internal ZooKeeper client. #41410 (Antonio Andelic).
  • Add support for comparing Decimal types with floating-point literals in the IN operator. #41544 (liang.huang).
  • Allow readable size values in cache config. #41688 (Kseniia Sumarokova).
  • Check the file path for path traversal attacks in the errors logger for input formats. #41694 (Kruglov Pavel).
  • ClickHouse could cache stale DNS entries for some period of time (15 seconds by default) until the cache was updated asynchronously. During this period, ClickHouse could still try to establish a connection and produce errors. This behaviour is fixed. #41707 (Nikita Mikhaylov).
  • Add interactive history search with fzf-like utility (fzf/sk) for clickhouse-client/clickhouse-local (note you can use FZF_DEFAULT_OPTS/SKIM_DEFAULT_OPTIONS to additionally configure the behavior). #41730 (Azat Khuzhin).
  • When the client connects to a secure server with an invalid certificate, it is allowed to proceed only with the '--accept-certificate' flag. #41743 (Yakov Olkhovskiy).
  • Add function "tryBase58Decode()", similar to the existing function "tryBase64Decode()". #41824 (Robert Schulze).
  • Improve feedback when replacing partition with different primary key. Fixes #34798. #41838 (Salvatore).
  • Switch back from the clickhouse su command to sudo -u in start, in order to respect limits in /etc/security/limits.conf. #41847 (Eugene Konkov).
  • Fix parallel parsing: segmentator now checks max_block_size. #41852 (Vitaly Baranov).
  • Don't report the TABLE_IS_DROPPED exception, so that a table is skipped in case it was just dropped. #41908 (AlfVII).
  • Improve option enable_extended_results_for_datetime_functions to return results of type DateTime64 for functions toStartOfDay, toStartOfHour, toStartOfFifteenMinutes, toStartOfTenMinutes, toStartOfFiveMinutes, toStartOfMinute and timeSlot. #41910 (Roman Vasin).
  • Improve DateTime type inference for text formats. It now respects the setting date_time_input_format and doesn't try to infer datetimes from numbers as timestamps. Closes #41389. Closes #42206. #41912 (Kruglov Pavel).
  • Remove confusing warning when inserting with perform_ttl_move_on_insert=false. #41980 (Vitaly Baranov).
  • Allow user to write countState(*) similar to count(*). This closes #9338. #41983 (Amos Bird).
  • Added an option to specify an arbitrary string as an environment name in the Sentry config for more convenient reports. #42037 (Nikita Mikhaylov).
  • Added system table asynchronous_insert_log. It contains information about asynchronous inserts (including results of queries in fire-and-forget mode, with wait_for_async_insert=0) for better introspection. #42040 (Anton Popov).
  • Fix parsing of out-of-range Date values from CSV. #42044 (Andrey Zvonov).
  • parseDateTimeBestEffort now supports a comma between date and time. Closes #42038. #42049 (flynn).
  • Add support for methods lz4, bz2, snappy in 'Accept-Encoding'. #42071 (Nikolay Degterinsky).
  • Various minor fixes for BLAKE3 function. #42073 (BoloniniD).
  • Improved stale replica recovery process for ReplicatedMergeTree. If a lost replica has some parts that are absent on a healthy replica but should appear in the future according to the healthy replica's replication queue, the lost replica will keep such parts instead of detaching them. #42134 (Alexander Tokmakov).
  • Support BACKUP to S3 with as-is path/data structure. #42232 (Azat Khuzhin).
  • Allow Date32 arguments for the date_diff function. Fix an issue in date_diff with DateTime64 arguments when the start date is before the Unix epoch and the end date is after it. #42308 (Roman Vasin).
  • When uploading big parts to MinIO, 'Complete Multipart Upload' can take a long time. MinIO sends heartbeats every 10 seconds (see https://github.com/minio/minio/pull/7198), but ClickHouse times out earlier because the default send/receive timeout is set to 5 seconds. #42321 (filimonov).
  • Add S3 as a new type of the destination of backups. #42333 (Vitaly Baranov).
  • Fix a rare invalid cast of aggregate state types with complex types such as Decimal. This fixes #42408. #42417 (Amos Bird).
  • Support skipping the cache completely (both downloading to the cache and reading cached data) when the requested read range exceeds the threshold defined by the cache setting bypass_cache_threashold (requires enable_bypass_cache_with_threshold to be enabled). #42418 (Han Shukai).
  • Merge parts if every part in the range is older than a certain threshold. The threshold can be set by using min_age_to_force_merge_seconds. This closes #35836. #42423 (Antonio Andelic).
  • Enabled CompiledExpressionCache in clickhouse-local. #42477 (AlfVII).
  • Remove support for the {database} macro from the client's prompt. It was displayed incorrectly if the database was unspecified and it was not updated on USE statements. This closes #25891. #42508 (Alexey Milovidov).
  • Allow Date32 arguments for the dateName function. #42554 (Roman Vasin).
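
The path traversal check for the input-format errors logger follows a common defensive pattern: resolve the candidate path and verify that it still lies inside an allowed base directory. Below is a minimal Python sketch of that pattern, assuming a hypothetical base directory and a hypothetical helper name (`is_path_within`); it is not the check ClickHouse actually performs.

```python
import os


def is_path_within(base_dir: str, user_path: str) -> bool:
    """Return True if user_path, resolved relative to base_dir, stays inside base_dir.

    realpath() collapses ".." components and resolves symlinks, so inputs such as
    "../../etc/passwd" or an absolute path outside base_dir are rejected.
    """
    base = os.path.realpath(base_dir)
    # os.path.join discards `base` when user_path is absolute, which is exactly
    # the case the containment check below must catch.
    target = os.path.realpath(os.path.join(base, user_path))
    return os.path.commonpath([base, target]) == base


base = "/srv/clickhouse/user_files"  # hypothetical base directory
for candidate in ["errors.log", "logs/errors.log", "../../etc/passwd", "/etc/passwd"]:
    print(candidate, is_path_within(base, candidate))
```

Comparing whole path components with `commonpath`, rather than using a plain string prefix test, avoids accepting a sibling such as `/srv/clickhouse/user_files_evil`.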

Bug Fix

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in official stable or prestable release)

  • Several fixes for DiskWeb. #41652 (Kseniia Sumarokova).
  • Fix an issue where docker run fails if "https_port" is not present in the config. #41693 (Yakov Olkhovskiy).
  • Mutations were not cancelled properly on server shutdown or a SYSTEM STOP MERGES query, and cancellation could take a long time. This is fixed. #41699 (Alexander Tokmakov).
  • Fix wrong result of queries with ORDER BY or GROUP BY by columns from a prefix of the sorting key wrapped into monotonic functions, with the "read in order" optimization enabled (settings optimize_read_in_order and optimize_aggregation_in_order). #41701 (Anton Popov).
  • Fix possible crash in SELECT from Merge table with enabled optimize_monotonous_functions_in_order_by setting. Fixes #41269. #41740 (Nikolai Kochetov).
  • Fixed "Part ... intersects part ..." error that might happen in extremely rare cases if replica was restarted just after detaching some part as broken. #41741 (Alexander Tokmakov).
  • Don't allow creating or altering MergeTree tables with the virtual column name _row_exists, which is reserved for lightweight delete. Fixed #41716. #41763 (Jianmei Zhang).
  • Fix a bug where CORS headers were missing in some HTTP responses. #41792 (Frank Chen).
  • 22.9 might fail to start up a ReplicatedMergeTree table if that table was created by version 20.3 or older and was never altered. This is fixed. Fixes #41742. #41796 (Alexander Tokmakov).
  • When batch sending failed for some reason, it could not be recovered automatically; if it was not processed in time, failures accumulated and the printed error message grew longer and longer, which caused the HTTP thread to block. #41813 (zhongyuankai).
  • Fix compact parts with compressed marks setting. Fixes #41783 and #41746. #41823 (alesapin).
  • Old versions of the Replicated database don't have a special marker in [Zoo]Keeper. We need to check only whether the node contains some obscure data instead of the special marker. #41875 (Nikita Mikhaylov).
  • Fix possible exception in fs cache. #41884 (Kseniia Sumarokova).
  • Fix use_environment_credentials for s3 table function. #41970 (Kseniia Sumarokova).
  • Fixed "Directory already exists and is not empty" error on detaching broken part that might prevent ReplicatedMergeTree table from starting replication. Fixes #40957. #41981 (Alexander Tokmakov).
  • toDateTime64() now returns the same output with negative integer and float arguments. #42025 (Robert Schulze).
  • Fix write into AzureBlobStorage. Partially closes #41754. #42034 (Kseniia Sumarokova).
  • Fix the bzip2 decoding issue for specific bzip2 files. #42046 (Nikolay Degterinsky).
  • Fix SQL function "toLastDayOfMonth()" with setting "enable_extended_results_for_datetime_functions = 1" at the beginning of the extended range (January 1900). Fix SQL function "toRelativeWeekNum()" with setting "enable_extended_results_for_datetime_functions = 1" at the end of the extended range (December 2299). Improve the performance of SQL functions "toISOYear()", "toFirstDayNumOfISOYearIndex()" and "toYearWeekOfNewyearMode()" by avoiding unnecessary index arithmetic. #42084 (Roman Vasin).
  • The maximum number of fetches for each table was accidentally set to 8 while the pool size could be bigger. Now the maximum number of fetches per table is equal to the pool size. #42090 (Nikita Mikhaylov).
  • A table might be shut down and a dictionary might be detached before checking whether it can be dropped without breaking dependencies between tables. This is fixed. Fixes #41982. #42106 (Alexander Tokmakov).
  • Fix a severe inefficiency of remote_filesystem_read_method=read with filesystem cache. Closes #42125. #42129 (Kseniia Sumarokova).
  • Fix possible timeout exception for distributed queries with use_hedged_requests=0. #42130 (Azat Khuzhin).
  • Fixed a minor bug in function runningDifference when used with the Date32 type. Previously Date was used, which could cause logical errors like 'Bad cast from type DB::ColumnVector<int> to DB::ColumnVector<unsigned short>'. #42143 (Alfred Xu).
  • Fix reusing of files > 4GB from base backup. #42146 (Azat Khuzhin).
  • Fix DISTINCT in order failing with LOGICAL_ERROR if the first column in the sorting key contains a function. #42186 (Igor Nikonov).
  • Fix a bug with projections and the aggregate_functions_null_for_empty setting. This bug is very rare and appears only if you enable the aggregate_functions_null_for_empty setting in the server's config. This closes #41647. #42198 (Alexey Milovidov).
  • Fix a bug which prevented ClickHouse from starting when the background_pool_size setting is set in the default profile but background_merges_mutations_concurrency_ratio is not. #42315 (nvartolomei).
  • ALTER UPDATE of attached part (with columns different from table schema) could create an invalid columns.txt metadata on disk. Reading from such part could fail with errors or return invalid data. Fixes #42161. #42319 (Nikolai Kochetov).
  • The setting additional_table_filters was not applied to Distributed storage. Fixes #41692. #42322 (Nikolai Kochetov).
  • Fix a data race in query finish/cancel. This closes #42346. #42362 (Alexey Milovidov).
  • This reverts #40217 which introduced a regression in date/time functions. #42367 (Alexey Milovidov).
  • Fix assert cast in JOIN on a falsy condition. Closes #42380. #42407 (Vladimir C).
  • Fix buffer overflow in the processing of Decimal data types. This closes #42451. #42465 (Alexey Milovidov).
  • AggregateFunctionQuantile now correctly works with UInt128 columns. Previously, the quantile state interpreted UInt128 columns as Int128 which could have led to incorrect results. #42473 (Antonio Andelic).
  • Fix bad_assert during INSERT into Annoy indexes over non-Float32 columns. #42485 (Robert Schulze).
  • This closes #42453. #42573 (Alexey Milovidov).
  • Fix function arrayElement with type Map with Nullable values and Nullable index. #42623 (Anton Popov).
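
The AggregateFunctionQuantile fix concerns reading unsigned 128-bit values as signed: every UInt128 value at or above 2^127 becomes a negative Int128 under two's-complement reinterpretation, which breaks the ordering that quantile computations depend on. The Python sketch below only simulates that reinterpretation with hypothetical helpers (`as_int128`, `median_low`); it does not reproduce ClickHouse's quantile state. Python integers are arbitrary-precision, so the 128-bit wrap-around is applied explicitly.

```python
UINT128_MAX = 2**128 - 1


def as_int128(value: int) -> int:
    """Reinterpret a UInt128 bit pattern as a two's-complement Int128."""
    if not 0 <= value <= UINT128_MAX:
        raise ValueError("value does not fit in UInt128")
    return value - 2**128 if value >= 2**127 else value


def median_low(xs: list[int]) -> int:
    """Lower median of a non-empty list, a simple stand-in for a quantile at level 0.5."""
    return sorted(xs)[(len(xs) - 1) // 2]


values = [1, 2**127 - 1, 2**127, UINT128_MAX]

# Correct unsigned order versus the order produced by the signed reinterpretation:
# the two largest unsigned values move to the front.
print(sorted(values))
print(sorted(values, key=as_int128))

print(median_low(values))                          # computed on unsigned values
print(median_low([as_int128(v) for v in values]))  # computed on reinterpreted values
```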

Bug Fix (user-visible misbehaviour in official stable or prestable release)

  • Fix unexpected table loading error when partition key contains alias function names during server upgrade. #36379 (Amos Bird).

Build Improvement

NO CL ENTRY

  • NO CL ENTRY: 'Revert "Disable parallel s3 multipart upload for part moves."'. #41681 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "Attempt to fix abort from parallel parsing"'. #42545 (Nikolai Kochetov).
  • NO CL ENTRY: 'Revert "Low cardinality cases moved to the function for its corresponding type"'. #42633 (Anton Popov).

NOT FOR CHANGELOG / INSIGNIFICANT