
Table of Contents

ClickHouse release v23.2, 2023-02-23
ClickHouse release v23.1, 2023-01-25
Changelog for 2022

2023 Changelog

ClickHouse release 23.2, 2023-02-23

Backward Incompatible Change

  • Extend function "toDayOfWeek()" (alias: "DAYOFWEEK") with a mode argument that encodes whether the week starts on Monday or Sunday and whether counting starts at 0 or 1. For consistency with other date time functions, the mode argument was inserted between the time and the time zone arguments. This breaks existing usage of the (previously undocumented) 2-argument syntax "toDayOfWeek(time, time_zone)". A fix is to rewrite the function into "toDayOfWeek(time, 0, time_zone)". #45233 (Robert Schulze).
  • Rename setting max_query_cache_size to filesystem_cache_max_download_size. #45614 (Kseniia Sumarokova).
  • By default, the default user no longer has permission for the SHOW NAMED COLLECTION access type. As a result, the default user can no longer grant ALL to other users as it could before, which makes this change backward incompatible. #46010 (Kseniia Sumarokova).
  • If the SETTINGS clause is specified before the FORMAT clause, the settings will be applied to formatting as well. #46003 (Azat Khuzhin).
  • Remove support for the setting materialized_postgresql_allow_automatic_update (which was turned off by default). #46106 (Kseniia Sumarokova).
  • Slightly improve performance of countDigits on realistic datasets. This closes #44518. In previous versions, countDigits(0) returned 0; now it returns 1, which is more correct and follows the existing documentation. #46187 (Alexey Milovidov).
  • Disallow creation of new columns compressed by a combination of codecs "Delta" or "DoubleDelta" followed by codecs "Gorilla" or "FPC". This can be bypassed using setting "allow_suspicious_codecs = true". #45652 (Robert Schulze).
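The mode argument of toDayOfWeek can be modeled with a short Python sketch. This is not ClickHouse code, and the name to_day_of_week is hypothetical. Mode 0 reproduces the previous default result (Monday = 1 ... Sunday = 7), which is why toDayOfWeek(time, 0, time_zone) is the drop-in replacement for the old two-argument call. The encoding of modes 1 to 3 below is an assumption taken from the function's current documentation and should be verified there.

```python
from datetime import date


def to_day_of_week(d: date, mode: int = 0) -> int:
    """Hypothetical model of toDayOfWeek(d, mode).

    Assumed encoding (verify against the ClickHouse docs):
      0: week starts Monday, 1-based (Monday = 1 ... Sunday = 7)
      1: week starts Monday, 0-based (Monday = 0 ... Sunday = 6)
      2: week starts Sunday, 0-based (Sunday = 0 ... Saturday = 6)
      3: week starts Sunday, 1-based (Sunday = 1 ... Saturday = 7)
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError(f"mode not covered by this sketch: {mode}")
    iso = d.isoweekday()  # Python convention: Monday = 1 ... Sunday = 7
    week_starts_monday = mode in (0, 1)
    one_based = mode in (0, 3)
    # Zero-based position of the day within the chosen week layout.
    offset = iso - 1 if week_starts_monday else iso % 7
    return offset + 1 if one_based else offset


# 2023-02-23 (the 23.2 release date) was a Thursday.
release = date(2023, 2, 23)
print([to_day_of_week(release, m) for m in range(4)])  # [4, 3, 4, 5]
```

Only the two booleans (which day starts the week, and whether counting starts at 0 or 1) vary between modes, which matches how the entry describes the mode argument.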

New Feature

  • Add StorageIceberg and the table function iceberg to access Iceberg tables stored on S3. #45384 (flynn).
  • Allow configuring storage as SETTINGS disk = '<disk_name>' (instead of storage_policy) and with explicit disk creation SETTINGS disk = disk(type=s3, ...). #41976 (Kseniia Sumarokova).
  • Expose ProfileEvents counters in system.part_log. #38614 (Bharat Nallan).
  • Extend the existing ReplacingMergeTree engine to allow duplicate insertions. It combines the capabilities of ReplacingMergeTree and CollapsingMergeTree in one MergeTree engine. Deleted data is not returned when queried, but it is not removed from disk either. #41005 (youennL-cs).
  • Add generateULID function. Closes #36536. #44662 (Nikolay Degterinsky).
  • Add the corrMatrix aggregate function, which calculates the correlation for every pair of columns. In addition, since the aggregate functions covarSamp and covarPop are similar to corr, covarSampMatrix and covarPopMatrix are added as well. Closes #44587. #44680 (FFFFFFFHHHHHHH).
  • Introduce arrayShuffle function for random array permutations. #45271 (Joanna Hulboj).
  • Support the types FIXED_SIZE_BINARY in Arrow and FIXED_LENGTH_BYTE_ARRAY in Parquet, and map them to FixedString. Add settings output_format_parquet_fixed_string_as_fixed_byte_array/output_format_arrow_fixed_string_as_fixed_byte_array to control the default output type for FixedString. Closes #45326. #45340 (Kruglov Pavel).
  • Add a new column last_exception_time to system.replication_queue. #45457 (Frank Chen).
  • Add two new functions which allow for user-defined keys/seeds with SipHash{64,128}. #45513 (Salvatore Mesoraca).
  • Allow a three-argument version of the table function format. Closes #45808. #45873 (FFFFFFFHHHHHHH).
  • Add JodaTime format support for 'x','w','S'. Refer to https://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html. #46073 (zk_kiger).
  • Support window function ntile.
  • Add setting final to implicitly apply the FINAL modifier to every table. #40945 (Arthur Passos).
  • Added arrayPartialSort and arrayPartialReverseSort functions. #46296 (Joanna Hulboj).
  • The new http parameter client_protocol_version allows setting a client protocol version for HTTP responses using the Native format. #40397. #46360 (Geoff Genz).
  • Add the new function regexpExtract, similar to Spark's REGEXP_EXTRACT function, for compatibility. It is similar to the existing function extract. #46469 (李扬).
  • Add new function JSONArrayLength, which returns the number of elements in the outermost JSON array. The function returns NULL if the input JSON string is invalid. #46631 (李扬).
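The window function ntile listed above splits the ordered rows of a partition into a given number of buckets. The Python sketch below models the usual SQL-standard assignment for a single partition: bucket sizes differ by at most one, and the earlier buckets receive the extra rows. It assumes ClickHouse's ntile follows that standard behavior; it is not derived from the ClickHouse implementation.

```python
def ntile(num_rows: int, buckets: int) -> list[int]:
    """Return the 1-based bucket number for each of num_rows ordered rows."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    base, extra = divmod(num_rows, buckets)
    result: list[int] = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each get one additional row.
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


print(ntile(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

When a partition has fewer rows than buckets, only the first num_rows buckets are used, for example ntile(2, 5) yields [1, 2].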

Performance Improvement

  • The new logic applies when the PREWHERE condition is a conjunction of multiple conditions (cond1 AND cond2 AND ... ). It groups conditions that require reading the same columns into steps. After each step, the corresponding part of the full condition is computed and the resulting rows may be filtered. This allows reading fewer rows in the next steps, saving IO bandwidth and computation. This logic is disabled by default for now. It will be enabled by default in a future release once it is known to have no regressions, so testing it is highly encouraged. It is controlled by 2 settings: "enable_multiple_prewhere_read_steps" and "move_all_conditions_to_prewhere". #46140 (Alexander Gololobov).
  • Added an option to aggregate partitions independently if the table partition key and the GROUP BY key are compatible. Controlled by the setting allow_aggregate_partitions_independently. Disabled by default because of limited applicability (please refer to the docs). #45364 (Nikita Taranov).
  • Allow using Vertical merge algorithm with parts in Compact format. This will allow ClickHouse server to use much less memory for background operations. This closes #46084. #45681 #46282 (Anton Popov).
  • Optimize Parquet reader by using batch reader. #45878 (LiuNeng).
  • Add new local_filesystem_read_method method io_uring based on the asynchronous Linux io_uring subsystem, improving read performance almost universally compared to the default pread method. #38456 (Saulius Valatka).
  • Rewrite aggregate functions with an if expression as argument when logically equivalent. For example, avg(if(cond, col, null)) can be rewritten to avgIf(col, cond). This improves performance. #44730 (李扬).
  • Improve lower/upper function performance with avx512 instructions. #37894 (yaqi-zhao).
  • Remove the limitation that on systems with >=32 cores and SMT disabled ClickHouse uses only half of the cores (the case when you disable Hyper Threading in BIOS). #44973 (Robert Schulze).
  • Improve performance of function multiIf through columnar execution, with a 2.3x speedup. #45296 (李扬).
  • Add fast path for function position when the needle is empty. #45382 (李扬).
  • Enable query_plan_remove_redundant_sorting optimization by default. Optimization implemented in #45420. #45567 (Igor Nikonov).
  • Increased HTTP Transfer Encoding chunk size to improve performance of large queries using the HTTP interface. #45593 (Geoff Genz).
  • Fixed performance of short SELECT queries that read from tables with large number of Array/Map/Nested columns. #45630 (Anton Popov).
  • Improve performance of filtering for big integers and decimal types. #45949 (李扬).
  • Reduce the overhead of obtaining the filter from ColumnNullable(UInt8), improving overall query performance. To evaluate the impact of this change, the TPC-H benchmark was run with column types changed from non-nullable to nullable, using the QPS of its queries as the performance indicator. #45962 (Zhiguo Zhou).
  • Make the _part and _partition_id virtual columns have the LowCardinality(String) type. Closes #45964. #45975 (flynn).
  • Improve the performance of Decimal conversion when the scale does not change. #46095 (Alexey Milovidov).
  • Allow increasing prefetching when reading data. #46168 (Kseniia Sumarokova).
  • Rewrite arrayExists(x -> x = 1, arr) into has(arr, 1), which improves performance by 1.34x. #46188 (李扬).
  • Fix excessive memory usage for vertical merges on non-remote disks. Respect max_insert_delayed_streams_for_parallel_write for remote disks. #46275 (Nikolai Kochetov).
  • Update zstd to v1.5.4. It has some minor improvements in performance and compression ratio. If you run replicas with different versions of ClickHouse, you may see reasonable error messages like "Data after merge/mutation is not byte-identical to data on another replicas." with an explanation. These messages are OK and you should not worry. #46280 (Raúl Marín).
  • Fix performance degradation caused by #39737. #46309 (Alexey Milovidov).
  • The replicas_status handle will answer quickly even in case of a large replication queue. #46310 (Alexey Milovidov).
  • Add avx512 support for aggregate function sum, function unary arithmetic, function comparison. #37870 (zhao zhou).
  • Rewrote the code around marks distribution and the overall coordination of the reading in order to achieve the maximum performance improvement. This closes #34527. #43772 (Nikita Mikhaylov).
  • Remove redundant DISTINCT clauses in query (subqueries). Implemented on top of query plan. It does similar optimization as optimize_duplicate_order_by_and_distinct regarding DISTINCT clauses. Can be enabled via query_plan_remove_redundant_distinct setting. Related to #42648. #44176 (Igor Nikonov).
  • A few query rewrite optimizations: sumIf(123, cond) -> 123 * countIf(1, cond), sum(if(cond, 123, 0)) -> 123 * countIf(cond), sum(if(cond, 0, 123)) -> 123 * countIf(not(cond)) #44728 (李扬).
  • Improved how memory-bound merging and aggregation in order interact on top of the query plan. Previously, ClickHouse fell back to explicit sorting for aggregation in order (AIO) in some cases when it was not actually needed. #45892 (Nikita Taranov).
  • Concurrent merges are scheduled using round-robin by default to ensure fair and starvation-free operation. Previously in heavily overloaded shards, big merges could possibly be starved by smaller merges due to the use of strict priority scheduling. Added background_merges_mutations_scheduling_policy server config option to select scheduling algorithm (round_robin or shortest_task_first). #46247 (Sergei Trifonov).
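The query rewrites above (sumIf/sum(if(...)) into a constant times countIf, and arrayExists(x -> x = 1, arr) into has(arr, 1)) are valid only because both forms compute the same value. The Python sketch below checks those identities on plain lists with randomized inputs. It ignores NULL handling and integer overflow, so it illustrates the algebra behind the rewrites rather than ClickHouse's actual rewrite pass; all function names are hypothetical.

```python
import random


def sum_if_const(conds: list[bool], k: int) -> int:
    # Original form: sum(if(cond, k, 0))
    return sum(k if c else 0 for c in conds)


def sum_if_const_negated(conds: list[bool], k: int) -> int:
    # Original form: sum(if(cond, 0, k))
    return sum(0 if c else k for c in conds)


def count_if(conds: list[bool]) -> int:
    # countIf(cond): number of rows where cond holds
    return sum(1 for c in conds if c)


def array_exists_eq(arr: list[int], value: int) -> bool:
    # Original form: arrayExists(x -> x = value, arr)
    return any(x == value for x in arr)


def has(arr: list[int], value: int) -> bool:
    # Rewritten form: has(arr, value)
    return value in arr


random.seed(0)
for _ in range(1000):
    conds = [random.random() < 0.5 for _ in range(random.randint(0, 20))]
    arr = [random.randint(0, 3) for _ in range(random.randint(0, 5))]
    assert sum_if_const(conds, 123) == 123 * count_if(conds)
    assert sum_if_const_negated(conds, 123) == 123 * count_if([not c for c in conds])
    assert array_exists_eq(arr, 1) == has(arr, 1)
print("all rewrites agree")
```

The rewritten forms avoid evaluating a per-row if expression or lambda, which is where the speedups reported in these entries come from.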

Improvement

  • Enable retries for INSERT by default in case of ZooKeeper session loss. We already use it in production. #46308 (Alexey Milovidov).
  • Add ability to ignore unknown keys in JSON object for named tuples (input_format_json_ignore_unknown_keys_in_named_tuple). #45678 (Azat Khuzhin).
  • Support moving WHERE conditions on sorting key expressions to PREWHERE for queries with FINAL. #38893. #38950 (hexiaoting).
  • Add new metrics for backups: num_processed_files and processed_files_size, which describe the actual number and total size of processed files. #42244 (Aleksandr).
  • Added retries on interserver DNS errors. #43179 (Anton Kozlov).
  • Keeper improvement: try preallocating space on the disk to avoid undefined out-of-space issues. Introduce setting max_log_file_size for the maximum size of Keeper's Raft log files. #44370 (Antonio Andelic).
  • Optimize the replica delay API logic when the replica is read-only. #45148 (mateng915).
  • Ask for the password in clickhouse-client interactively in a case when the empty password is wrong. Closes #46702. #46730 (Nikolay Degterinsky).
  • Mark Gorilla compression on columns of non-Float* type as suspicious. #45376 (Robert Schulze).
  • Show replica name that is executing a merge in the postpone_reason column. #45458 (Frank Chen).
  • Save exception stack trace in part_log. #45459 (Frank Chen).
  • The regexp_tree dictionary is polished and now it is compatible with https://github.com/ua-parser/uap-core. #45631 (Han Fei).
  • Updated checking of SYSTEM SYNC REPLICA. Resolves #45508. #45648 (SmitaRKulkarni).
  • Rename setting replication_alter_partitions_sync to alter_sync. #45659 (Antonio Andelic).
  • The generateRandom table function and the engine now support LowCardinality data types. This is useful for testing, for example you can write INSERT INTO table SELECT * FROM generateRandom() LIMIT 1000. This is needed to debug #45590. #45661 (Alexey Milovidov).
  • The experimental query result cache now provides more modular configuration settings. #45679 (Robert Schulze).
  • Renamed "query result cache" to "query cache". #45682 (Robert Schulze).
  • Add the SYSTEM SYNC FILE CACHE command, which performs the sync syscall. #8921. #45685 (DR).
  • Add a new S3 setting allow_head_object_request. This PR makes the use of the GetObjectAttributes request instead of HeadObject, introduced in https://github.com/ClickHouse/ClickHouse/pull/45288, optional (and disabled by default). #45701 (Vitaly Baranov).
  • Add the ability to override connection settings based on connection names. This means you no longer have to store the password for each connection separately: you can simply put everything into ~/.clickhouse-client/config.xml and even use different history files for different connections, which can also be useful. #45715 (Azat Khuzhin).
  • Arrow format: support the duration type. Closes #45669. #45750 (flynn).
  • Extend the logging in the Query Cache to improve investigations of the caching behavior. #45751 (Robert Schulze).
  • The query cache's server-level settings are now reconfigurable at runtime. #45758 (Robert Schulze).
  • Hide password in logs when a table function's arguments are specified with a named collection. #45774 (Vitaly Baranov).
  • Improve internal S3 client to correctly deduce regions and redirections for different types of URLs. #45783 (Antonio Andelic).
  • Add support for Map, IPv4 and IPv6 types in generateRandom. Mostly useful for testing. #45785 (Raúl Marín).
  • Support empty/notEmpty for IP types. #45799 (Yakov Olkhovskiy).
  • The column num_processed_files was split into two columns: num_files (for BACKUP) and files_read (for RESTORE). The column processed_files_size was split into two columns: total_size (for BACKUP) and bytes_read (for RESTORE). #45800 (Vitaly Baranov).
  • Add support for SHOW ENGINES query for MySQL compatibility. #45859 (Filatenkov Artur).
  • Improved how the obfuscator deals with queries. #45867 (Raúl Marín).
  • Improve behaviour of conversion into Date for boundary value 65535 (2149-06-06). #46042 #45914 (Joanna Hulboj).
  • Add setting check_referential_table_dependencies to check referential dependencies on DROP TABLE. This PR solves #38326. #45936 (Vitaly Baranov).
  • Fix tupleElement to return NULL when its argument is NULL. Closes #45894. #45952 (flynn).
  • Throw an error when no files match the S3 wildcard. Closes #45587. #45957 (chen).
  • Use cluster state data to check concurrent backup/restore. #45982 (SmitaRKulkarni).
  • ClickHouse Client: use "exact" matching for fuzzy search, which ignores case correctly and uses a more appropriate algorithm for matching SQL queries. #46000 (Azat Khuzhin).
  • Forbid wrong create View syntax CREATE View X TO Y AS SELECT. Closes #4331. #46043 (flynn).
  • Storages of the Log family now support the storage_policy setting. Closes #43421. #46044 (flynn).
  • Improve JSONColumns format when the result is empty. Closes #46024. #46053 (flynn).
  • Add reference implementation for SipHash128. #46065 (Salvatore Mesoraca).
  • Add a new metric to record the number of allocations and the allocated bytes when using mmap. #46068 (李扬).
  • Previously, for functions like leftPad, rightPad, leftPadUTF8 and rightPadUTF8, the second argument length had to be UInt8|16|32|64|128|256. This was too strict for ClickHouse users and inconsistent with similar functions like arrayResize, substring and so on; this change relaxes the restriction. #46103 (李扬).
  • Fix an assertion in the welchTTest function in debug builds when the resulting statistic is NaN. Unified the behavior with other similar functions. Change the behavior of studentTTest to return NaN instead of throwing an exception, because the previous behavior was inconvenient. This closes #41176. This closes #42162. #46141 (Alexey Milovidov).
  • More convenient usage of big integers and ORDER BY WITH FILL. Allow using plain integers for start and end points in WITH FILL when ORDER BY big (128-bit and 256-bit) integers. Fix the wrong result for big integers with negative start or end points. This closes #16733. #46152 (Alexey Milovidov).
  • Add parts, active_parts and total_marks columns to system.tables, as requested in an issue. #46161 (attack204).
  • Functions "multi[Fuzzy]Match(Any|AnyIndex|AllIndices)" now reject regexes which will likely evaluate very slowly in vectorscan. #46167 (Robert Schulze).
  • When insert_null_as_default is enabled and a column has no defined default value, the default value of the column's type will be used. This PR also fixes using default values for NULLs in LowCardinality columns. #46171 (Kruglov Pavel).
  • Prefer explicitly defined access keys for S3 clients. If use_environment_credentials is set to true, and the user has provided the access key through query or config, they will be used instead of the ones from the environment variable. #46191 (Antonio Andelic).
  • Add an alias "DATE_FORMAT()" for function "formatDateTime()" to improve compatibility with MySQL's SQL dialect, and extend function formatDateTime with substitutions "a", "b", "c", "h", "i", "k", "l", "r", "s", "W". DATE_FORMAT formats a time according to the given format string. The format is a constant expression, so you cannot have multiple formats for a single result column. #46302 (Jake Bamrah).
  • Add ProfileEvents and CurrentMetrics about the callback tasks for parallel replicas (s3Cluster and MergeTree tables). #46313 (Alexey Milovidov).
  • Add support for DELETE and UPDATE for tables using KeeperMap storage engine. #46330 (Antonio Andelic).
  • Allow writing RENAME queries with query parameters. Resolves #45778. #46407 (Nikolay Degterinsky).
  • Fix parameterized SELECT queries with REPLACE transformer. Resolves #33002. #46420 (Nikolay Degterinsky).
  • Exclude the internal database used for temporary/external tables from the calculation of asynchronous metric "NumberOfDatabases". This makes the behavior consistent with system table "system.databases". #46435 (Robert Schulze).
  • Added the last_exception_time column to the distribution_queue table. #46564 (Aleksandr).
  • Support for IN clause with parameter in parameterized views. #46583 (SmitaRKulkarni).
  • Do not load named collections on server startup (load them on first access instead). #46607 (Kseniia Sumarokova).
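The boundary value in the Date conversion entry above follows from how the Date type is stored: as a day count since 1970-01-01 in an unsigned 16-bit integer (this comes from the Date type documentation, not from the entry itself). The largest UInt16 value, 65535, therefore corresponds to 2149-06-06, which can be checked with Python's datetime module. The helper name day_number_to_date is hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
UINT16_MAX = 2**16 - 1  # 65535, the largest day number a UInt16 can hold


def day_number_to_date(days: int) -> date:
    """Map a Date day number (days since 1970-01-01) to a calendar date."""
    if not 0 <= days <= UINT16_MAX:
        raise ValueError("day number does not fit in UInt16")
    return EPOCH + timedelta(days=days)


print(day_number_to_date(UINT16_MAX))   # 2149-06-06
print((date(2149, 6, 6) - EPOCH).days)  # 65535
```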

Build/Testing/Packaging Improvement

  • Introduce GWP-ASan implemented by the LLVM runtime. This closes #27039. #45226 (Han Fei).
  • We want to make our tests less stable and more flaky: add randomization for merge tree settings in tests. #38983 (Anton Popov).
  • Enable HDFS support on PowerPC, which fixes the following functional tests: 02113_hdfs_assert.sh, 02244_hdfs_cluster.sql and 02368_cancel_write_into_hdfs.sh. #44949 (MeenaRenganathan22).
  • Add systemd.service file for clickhouse-keeper. Fixes #44293. #45568 (Mikhail f. Shiryaev).
  • ClickHouse's fork of poco was moved from "contrib/" to "base/poco/". #46075 (Robert Schulze).
  • Add an option for clickhouse-watchdog to restart the child process. This is of limited use. #46312 (Alexey Milovidov).
  • If the environment variable CLICKHOUSE_DOCKER_RESTART_ON_EXIT is set to 1, the Docker container will run clickhouse-server as a child instead of the first process, and restart it when it exits. #46391 (Alexey Milovidov).
  • Fix Systemd service file. #46461 (SuperDJY).
  • Raised the minimum Clang version needed to build ClickHouse from 12 to 15. #46710 (Robert Schulze).
  • Upgrade Intel QPL from v0.3.0 to v1.0.0. Build libaccel-config and link it statically to the QPL library instead of dynamically. #45809 (jasperzhu).

Bug Fix (user-visible misbehavior in official stable or prestable release)

  • Flush data exactly by rabbitmq_flush_interval_ms or by rabbitmq_max_block_size in StorageRabbitMQ. Closes #42389. Closes #45160. #44404 (Kseniia Sumarokova).
  • Use PODArray to render in the sparkBar function, so memory usage can be controlled. Closes #44467. #44489 (Duc Canh Le).
  • Fix functions quantilesExactExclusive and quantilesExactInclusive returning unsorted array elements. #45379 (wujunfu).
  • Fix uncaught exception in HTTPHandler when open telemetry is enabled. #45456 (Frank Chen).
  • Don't infer Dates from 8-digit numbers. It could lead to wrong data being read. #45581 (Kruglov Pavel).
  • Fixes to correctly use odbc_bridge_use_connection_pooling setting. #45591 (Bharat Nallan).
  • When the callback in the cache is called, it is possible that this cache is destructed. To keep it safe, we capture members by value. It's also safe for task schedule because it will be deactivated before storage is destroyed. Resolve #45548. #45601 (Han Fei).
  • Fix data corruption when codecs Delta or DoubleDelta are combined with codec Gorilla. #45615 (Robert Schulze).
  • Correctly check types when using N-gram bloom filter index to avoid invalid reads. #45617 (Antonio Andelic).
  • A couple of segfaults have been reported around c-ares. They were introduced in my previous pull requests. I have fixed them with the help of Alexander Tokmakov. #45629 (Arthur Passos).
  • Fix key description when encountering duplicate primary keys. This can happen in projections. See #45590 for details. #45686 (Amos Bird).
  • Set compression method and level for backup. Closes #45690. #45737 (Pradeep Chhetri).
  • Use select_query_typed.limitByOffset() instead of select_query_typed.limitOffset(). #45817 (刘陶峰).
  • When using the experimental analyzer, queries like SELECT number FROM numbers(100) LIMIT 10 OFFSET 10; returned wrong results (an empty result for this query). This was caused by an unnecessary offset step added by the planner. #45822 (刘陶峰).
  • Backward compatibility - allow implicit narrowing conversion from UInt64 to IPv4 - required for "INSERT ... VALUES ..." expression. #45865 (Yakov Olkhovskiy).
  • Fix the IPv6 parser for mixed IPv4 addresses with a missing first octet (like ::.1.2.3). #45871 (Yakov Olkhovskiy).
  • Add the query_kind column to the system.processes table and the SHOW PROCESSLIST query. Remove duplicate code. This fixes a bug: the global configuration parameter max_concurrent_select_queries was not respected for queries with INTERSECT or EXCEPT chains. #45872 (Alexey Milovidov).
  • Fix crash in a function stochasticLinearRegression. Found by WingFuzz. #45985 (Nikolai Kochetov).
  • Fix crash in SELECT queries with INTERSECT and EXCEPT modifiers that read data from tables with enabled sparse columns (controlled by setting ratio_of_defaults_for_sparse_serialization). #45987 (Anton Popov).
  • Fix read in order optimization for DESC sorting with FINAL, close #45815. #46009 (Vladimir C).
  • Fix reading of non-existing nested columns with multiple levels in compact parts. #46045 (Azat Khuzhin).
  • Fix elapsed column in system.processes (10x error). #46047 (Azat Khuzhin).
  • Follow-up fix for Replace domain IP types (IPv4, IPv6) with native https://github.com/ClickHouse/ClickHouse/pull/43221. #46087 (Yakov Olkhovskiy).
  • Fix environment variable substitution in the configuration when a parameter already has a value. This closes #46131. This closes #9547. #46144 (pufit).
  • Fix incorrect predicate push down with grouping sets. Closes #45947. #46151 (flynn).
  • Fix a possible "pipeline stuck" error in full_sorting_merge join with constant keys. #46175 (Vladimir C).
  • Never rewrite tuple functions as literals during formatting to avoid incorrect results. #46232 (Salvatore Mesoraca).
  • Fix possible out of bounds error while reading LowCardinality(Nullable) in Arrow format. #46270 (Kruglov Pavel).
  • Fix SYSTEM UNFREEZE queries failing with the exception CANNOT_PARSE_INPUT_ASSERTION_FAILED. #46325 (Aleksei Filatov).
  • Fix possible crash which can be caused by an integer overflow while deserializing aggregating state of a function that stores HashTable. #46349 (Nikolai Kochetov).
  • Fix possible LOGICAL_ERROR in asynchronous inserts with invalid data sent in format VALUES. #46350 (Anton Popov).
  • Fixed a LOGICAL_ERROR on an attempt to execute ALTER ... MOVE PART ... TO TABLE. This type of query was never actually supported. #46359 (Alexander Tokmakov).
  • Fix s3Cluster schema inference in parallel distributed insert select when parallel_distributed_insert_select is enabled. #46381 (Kruglov Pavel).
  • Fix queries like ALTER TABLE ... UPDATE nested.arr1 = nested.arr2 ..., where arr1 and arr2 are fields of the same Nested column. #46387 (Anton Popov).
  • The scheduler may fail to schedule a task. If that happens, the whole MultiPartUpload should be aborted and UploadHelper must wait for already scheduled tasks. #46451 (Dmitry Novik).
  • Fix PREWHERE for Merge with different default types (fixes some NOT_FOUND_COLUMN_IN_BLOCK errors when the default type for the column differs; also allow PREWHERE when the type of the column is the same across tables, and prohibit it only if it differs). #46454 (Azat Khuzhin).
  • Fix a crash that could happen when constant values are used in ORDER BY. Fixes #46466. #46493 (Nikolai Kochetov).
  • Do not throw an exception if the disk setting was specified at query level but storage_policy was specified in the merge tree settings section of the config. The disk setting will override the setting from the config. #46533 (Kseniia Sumarokova).
  • Fix an invalid processing of constant LowCardinality argument in function arrayMap. This bug could lead to a segfault in release, and logical error Bad cast in debug build. #46569 (Alexey Milovidov).
  • Fixes #46557. #46611 (Alexander Gololobov).
  • Fix endless restarts of clickhouse-server systemd unit if server cannot start within 1m30sec (Disable timeout logic for starting clickhouse-server from systemd service). #46613 (Azat Khuzhin).
  • Memory buffers allocated during asynchronous inserts were deallocated in the global context, and MemoryTracker counters for the corresponding user and query were not updated correctly. That led to false positive OOM exceptions. #46622 (Dmitry Novik).
  • Updated to not clear on_expression from table_join, as it is used by future analyze runs. Resolves #45185. #46487 (SmitaRKulkarni).

ClickHouse release 23.1, 2023-01-26

Upgrade Notes

  • The SYSTEM RESTART DISK query becomes a no-op. #44647 (alesapin).
  • The PREALLOCATE option for HASHED/SPARSE_HASHED dictionaries becomes a no-op. #45388 (Azat Khuzhin). It does not give significant advantages anymore.
  • Disallow Gorilla codec on columns of non-Float32 or non-Float64 type. #45252 (Robert Schulze). It was pointless and led to inconsistencies.
  • Parallel quorum inserts might work incorrectly with *MergeTree tables created with the deprecated syntax. Therefore, parallel quorum inserts support is completely disabled for such tables. It does not affect tables created with a new syntax. #45430 (Alexander Tokmakov).
  • Use the GetObjectAttributes request instead of the HeadObject request to get the size of an object in AWS S3. This change fixes handling endpoints without explicit regions after updating the AWS SDK, for example. #45288 (Vitaly Baranov). AWS S3 and Minio are tested, but keep in mind that various S3-compatible services (GCS, R2, B2) may have subtle incompatibilities. This change also may require you to adjust the ACL to allow the GetObjectAttributes request.
  • Forbid paths in timezone names. For example, a timezone name like /usr/share/zoneinfo/Asia/Aden is not allowed; the IANA timezone database name like Asia/Aden should be used. #44225 (Kruglov Pavel).
  • Queries combining equijoin and constant expressions (e.g., JOIN ON t1.x = t2.x AND 1 = 1) are forbidden due to incorrect results. #44016 (Vladimir C).

New Feature

  • Add a dictionary source that extracts keys by traversing a tree of regular expressions. It can be used for User-Agent parsing. #40878 (Vage Ogannisian). #43858 (Han Fei).
  • Added parametrized view functionality, now it's possible to specify query parameters for the View table engine. resolves #40907. #41687 (SmitaRKulkarni).
  • Add quantileInterpolatedWeighted/quantilesInterpolatedWeighted functions. #38252 (Bharat Nallan).
  • ARRAY JOIN support for the Map type, like the function "explode" in Spark. #43239 (李扬).
  • Support SQL standard binary and hex string literals. #43785 (Mo Xuan).
  • Allow formatting DateTime in Joda-Time style. Refer to the Joda-Time docs. #43818 (李扬).
  • Implemented a fractional second formatter (%f) for formatDateTime. #44060 (ltrk2). #44497 (Alexander Gololobov).
  • Added age function to calculate the difference between two dates or dates with time values expressed as the number of full units. Closes #41115. #44421 (Robert Schulze).
  • Add Null source for dictionaries. Closes #44240. #44502 (mayamika).
  • Allow configuring the S3 storage class with the s3_storage_class configuration option, for example <s3_storage_class>STANDARD/INTELLIGENT_TIERING</s3_storage_class>. Closes #44443. #44707 (chen).
  • Insert default values in case of missing elements in JSON object while parsing named tuple. Add setting input_format_json_defaults_for_missing_elements_in_named_tuple that controls this behaviour. Closes #45142#issuecomment-1380153217. #45231 (Kruglov Pavel).
  • Record server startup time in ProfileEvents (ServerStartupMilliseconds). Resolves #43188. #45250 (SmitaRKulkarni).
  • Refactor and improve the streaming engines Kafka/RabbitMQ/NATS, add support for all formats, and refactor formats a bit:
    - Fix producing messages in row-based formats with suffixes/prefixes. Now every message is formatted completely with all delimiters and can be parsed back using the input format.
    - Support block-based formats like Native, Parquet, ORC, etc. Every block is formatted as a separate message. The number of rows in one message depends on the block size, so you can control it via the setting max_block_size.
    - Add new engine settings kafka_max_rows_per_message/rabbitmq_max_rows_per_message/nats_max_rows_per_message. They control the number of rows formatted in one message in row-based formats. Default value: 1.
    - Fix high memory consumption in the NATS table engine.
    - Support arbitrary binary data in the NATS producer (previously it worked only with strings that contained \0 at the end).
    - Add missing Kafka/RabbitMQ/NATS engine settings to the documentation.
    - Refactor producing and consuming in Kafka/RabbitMQ/NATS, separating it from WriteBuffers/ReadBuffers semantics.
    - Refactor output formats: remove the per-row callbacks used in Kafka/RabbitMQ/NATS (callbacks are no longer used there), allow using IRowOutputFormat directly, clarify row end and row between delimiters, and make it possible to reset the output format to start formatting again.
    - Add a proper implementation of the formatRow function (a bonus after the formats refactoring). #42777 (Kruglov Pavel).
  • Support reading/writing Nested tables as List of Struct in CapnProto format. Read/write Decimal32/64 as Int32/64. Closes #43319. #43379 (Kruglov Pavel).
  • Added a message_format_string column to system.text_log. The column contains a pattern that was used to format the message. #44543 (Alexander Tokmakov). This allows various analytics over the ClickHouse logs.
  • Try to autodetect headers with column names (and maybe types) for CSV/TSV/CustomSeparated input formats. Add settings input_format_tsv/csv/custom_detect_header that enable this behaviour (enabled by default). Closes #44640. #44953 (Kruglov Pavel).
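The age function added in this release reports a difference as a number of full units, so any partial unit is dropped. The Python sketch below models that rule for fixed-length units only. The name age mirrors the SQL function, but this is not ClickHouse code: calendar units such as month, quarter and year are left out, and the sketch assumes start <= end.

```python
from datetime import datetime, timedelta

# Fixed-length units only; month/quarter/year need calendar arithmetic.
_UNIT_LENGTH = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def age(unit: str, start: datetime, end: datetime) -> int:
    """Number of full `unit`s elapsed between start and end (start <= end)."""
    if unit not in _UNIT_LENGTH:
        raise ValueError(f"unit not covered by this sketch: {unit}")
    if end < start:
        raise ValueError("this sketch assumes start <= end")
    # timedelta // timedelta floors to an int, discarding the partial unit.
    return (end - start) // _UNIT_LENGTH[unit]


start = datetime(2018, 1, 1, 22, 30)
end = datetime(2018, 1, 2, 23, 0)
print(age("hour", start, end))  # 24 (24.5 hours elapsed)
print(age("day", start, end))   # 1
```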

Experimental Feature

  • Add an experimental inverted index as a new secondary index type for efficient text search. #38667 (larryluogit).
  • Add experimental query result cache. #43797 (Robert Schulze).
  • Added extendable and configurable scheduling subsystem for IO requests (not yet integrated with IO code itself). #41840 (Sergei Trifonov). This feature does nothing at all, enjoy.
  • Added SYSTEM DROP DATABASE REPLICA that removes metadata of a dead replica of a Replicated database. Resolves #41794. #42807 (Alexander Tokmakov).
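The idea behind the experimental inverted index (a secondary index that maps terms to the rows containing them, so a text search only has to look at candidate rows) can be shown with a toy Python version. It splits text on whitespace and keeps posting lists as in-memory sets; the tokenizer, data structures and function names are assumptions for illustration and do not reflect how ClickHouse builds or stores the index.

```python
from collections import defaultdict


def build_inverted_index(rows: list[str]) -> dict[str, set[int]]:
    """Map each lowercase whitespace-separated token to the row numbers containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Rows containing every token of the query (an AND search)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Copy the first posting list so the index itself is not mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


rows = [
    "ClickHouse release notes",
    "inverted index for text search",
    "text search in ClickHouse",
]
index = build_inverted_index(rows)
print(sorted(search_all(index, "text search")))        # [1, 2]
print(sorted(search_all(index, "clickhouse search")))  # [2]
```

Intersecting posting lists is what lets a multi-term search skip rows that cannot match, which is the efficiency the entry refers to.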

Performance Improvement

  • Do not load inactive parts at startup of MergeTree tables. #42181 (Anton Popov).
  • Improved latency of reading from storage S3 and table function s3 with large numbers of small files. Now settings remote_filesystem_read_method and remote_filesystem_read_prefetch take effect while reading from storage S3. #43726 (Anton Popov).
  • Optimization for reading struct fields in Parquet/ORC files. Only the required fields are loaded. #44484 (lgbo).
  • The two-level aggregation algorithm was mistakenly disabled for queries over the HTTP interface. It has been re-enabled, which leads to a major performance improvement. #45450 (Nikolai Kochetov).
  • Added mmap support for StorageFile, which should improve the performance of clickhouse-local. #43927 (pufit).
  • Added sharding support in HashedDictionary to allow parallel load (almost linear scaling based on number of shards). #40003 (Azat Khuzhin).
  • Speed up query parsing. #42284 (Raúl Marín).
  • Always replace the OR chain expr = x1 OR ... OR expr = xN with expr IN (x1, ..., xN) when expr is a LowCardinality column. The setting optimize_min_equality_disjunction_chain_length is ignored in this case. #42889 (Guo Wangyang).
  • Slightly improve performance by optimizing the code around ThreadStatus. #43586 (Zhiguo Zhou).
  • Optimize the column-wise ternary logic evaluation by enabling auto-vectorization. In a microbenchmark, we observed a peak performance gain of 21x on the ICX device (Intel Xeon Platinum 8380 CPU). #43669 (Zhiguo Zhou).
  • Avoid acquiring read locks in the system.tables table if possible. #43840 (Raúl Marín).
  • Optimize ThreadPool. Performance experiments with SSB (Star Schema Benchmark) on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change decreases lock contention on ThreadPoolImpl::mutex by 75%, increasing CPU utilization and improving overall performance by 2.4%. #44308 (Zhiguo Zhou).
  • Now the optimisation for predicting the hash table size is applied only if the cached hash table size is sufficiently large (thresholds were determined empirically and hardcoded). #44455 (Nikita Taranov).
  • Small performance improvement for asynchronous reading from remote filesystems. #44868 (Kseniia Sumarokova).
  • Add fast path for: - col like '%%'; - col like '%'; - col not like '%'; - match(col, '.*'). #45244 (李扬).
  • Slightly improve happy path optimisation in filtering (WHERE clause). #45289 (Nikita Taranov).
  • Provide monotonicity info for toUnixTimestamp64* to enable more algebraic optimizations for index analysis. #44116 (Nikita Taranov).
  • Allow the configuration of temporary data for query processing (spilling to disk) to cooperate with the filesystem cache (taking up the space from the cache disk) #43972 (Vladimir C). This mainly improves ClickHouse Cloud, but can be used for self-managed setups as well, if you know what to do.
  • Make the system.replicas table fetch replica statuses in parallel. Closes #43918. #43998 (Nikolay Degterinsky).
  • Optimize memory consumption during backup to S3: files are now copied to S3 directly without using WriteBufferFromS3 (which could use a lot of memory). #45188 (Vitaly Baranov).
  • Add a cache for async block ids. This reduces the number of requests to ZooKeeper when deduplication of async inserts is enabled. #45106 (Han Fei).
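The idea behind the LIKE fast path can be sketched as follows, as an illustration rather than ClickHouse's implementation: a pattern made up only of `%` wildcards matches every non-NULL string, so `col like '%'` is constant true and `col not like '%'` is constant false, and no per-row matching is needed. The helpers below are hypothetical, and `like_to_regex` is a simplified LIKE-to-regex translation that ignores backslash escapes, used only for the general (slow) path.

```python
import re
from typing import List


def like_matches_everything(pattern: str) -> bool:
    # '%' matches any sequence of characters, including the empty one,
    # so a non-empty pattern made only of '%' matches every string.
    return pattern != "" and pattern.strip("%") == ""


def like_to_regex(pattern: str) -> str:
    # Simplified translation: '%' -> '.*', '_' -> '.', escapes are ignored.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def eval_like(values: List[str], pattern: str, negate: bool = False) -> List[bool]:
    if like_matches_everything(pattern):
        # Fast path: the result does not depend on the row values.
        return [not negate] * len(values)
    # General path: match every value; DOTALL lets '%' span newlines too.
    regex = re.compile(like_to_regex(pattern), re.DOTALL)
    return [(regex.fullmatch(v) is not None) != negate for v in values]
```

Both paths agree on the trivial patterns; the fast path simply skips compiling and running the matcher for each row.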

Improvement

  • Use the structure of the insertion table in generateRandom when it is called without arguments. #45239 (Kruglov Pavel).
  • Allow implicit conversion of floats stored in string fields of JSON to integers in JSONExtract functions. E.g. JSONExtract('{"a": "1000.111"}', 'a', 'UInt64') -> 1000; previously it returned 0. #45432 (Anton Popov).
  • Added fields supports_parallel_parsing and supports_parallel_formatting to table system.formats for better introspection. #45499 (Anton Popov).
  • Improve reading CSV fields in CustomSeparated/Template formats. Closes #42352. Closes #39620. #43332 (Kruglov Pavel).
  • Unify query elapsed time measurements. #43455 (Raúl Marín).
  • Improve automatic usage of the structure from the insertion table in table functions file/hdfs/s3 when virtual columns are present in a select query. This fixes the possible errors Block structure mismatch or number of columns mismatch. #43695 (Kruglov Pavel).
  • Add support for signed arguments in the function range. Fixes #43333. #43733 (sanyu).
  • Remove redundant sorting, for example, sorting related to ORDER BY clauses in subqueries. Implemented on top of the query plan. It performs an optimization similar to optimize_duplicate_order_by_and_distinct for ORDER BY clauses, but is more generic, since it applies to any redundant sorting step (not only those caused by an ORDER BY clause) and to subqueries of any depth. Related to #42648. #43905 (Igor Nikonov).
  • Add the ability to disable deduplication of files for BACKUP (for backups without deduplication ATTACH can be used instead of full RESTORE). For example BACKUP foo TO S3(...) SETTINGS deduplicate_files=0 (default deduplicate_files=1). #43947 (Azat Khuzhin).
  • Refactor and improve schema inference for text formats. Add new setting schema_inference_make_columns_nullable that controls making result types Nullable (enabled by default). #44019 (Kruglov Pavel).
  • Better support for PROXYv1 protocol. #44135 (Yakov Olkhovskiy).
  • Add information about the latest part check by cleanup threads into system.parts table. #44244 (Dmitry Novik).
  • Disable table functions in readonly mode for inserts. #44290 (SmitaRKulkarni).
  • Add a setting simultaneous_parts_removal_limit to allow limiting the number of parts being processed by one iteration of CleanupThread. #44461 (Dmitry Novik).
  • Do not initialize ReadBufferFromS3 when only virtual columns are needed in a query. This may help with #44246. #44493 (chen).
  • Prevent duplicate column name hints. Closes #44130. #44519 (Joanna Hulboj).
  • Allow macro substitution in endpoint of disks. Resolve #40951. #44533 (SmitaRKulkarni).
  • Improve schema inference when input_format_json_read_object_as_string is enabled. #44546 (Kruglov Pavel).
  • Add a user-level setting database_replicated_allow_replicated_engine_arguments which allows banning the creation of ReplicatedMergeTree tables with arguments in DatabaseReplicated. #44566 (alesapin).
  • Prevent users from mistakenly specifying zero (invalid) value for index_granularity. This closes #44536. #44578 (Alexey Milovidov).
  • Added the ability to set the path to the service keytab file via the keytab parameter in the kerberos section of config.xml. #44594 (Roman Vasin).
  • Use already written part of the query for fuzzy search (pass to the skim library, which is written in Rust and linked statically to ClickHouse). #44600 (Azat Khuzhin).
  • Enable input_format_json_read_objects_as_strings by default to be able to read nested JSON objects while JSON Object type is experimental. #44657 (Kruglov Pavel).
  • Improvement for deduplication of async inserts: when users perform duplicate async inserts, they are now deduplicated in memory before Keeper is queried. #44682 (Han Fei).
  • Input/output Avro format will parse the bool type as the ClickHouse Bool type. #44684 (Kruglov Pavel).
  • Support Bool type in Arrow/Parquet/ORC. Closes #43970. #44698 (Kruglov Pavel).
  • Don't greedily parse beyond the quotes when reading UUIDs - it may lead to mistakenly successful parsing of incorrect data. #44686 (Raúl Marín).
  • Infer UInt64 in case of Int64 overflow and fix some transforms in schema inference. #44696 (Kruglov Pavel).
  • Previously dependency resolving inside Replicated database was done in a hacky way, and now it's done right using an explicit graph. #44697 (Nikita Mikhaylov).
  • Fix output_format_pretty_row_numbers not preserving the counter across blocks. Closes #44815. #44832 (flynn).
  • Don't report errors in system.errors due to parts being merged concurrently with the background cleanup process. #44874 (Raúl Marín).
  • Optimize and fix metrics for Distributed async INSERT. #44922 (Azat Khuzhin).
  • Added settings to disallow concurrent backups and restores. Resolves #43891. Implementation:
      - Added server-level settings to disallow concurrent backups and restores, which are read and set when BackupWorker is created in Context.
      - Settings are set to true by default.
      - Before starting a backup or restore, a check was added to see whether any other backups/restores are running. For internal requests, it checks whether the request comes from the self node using backup_uuid.
    #45072 (SmitaRKulkarni).
  • Add <storage_policy> config parameter for system logs. #45320 (Stig Bakken).
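The JSONExtract conversion change above can be illustrated with a minimal Python emulation of its example. `json_extract_uint64` is a hypothetical helper, not a ClickHouse API. The sketch assumes that the fractional part is truncated, which is consistent with `1000.111 -> 1000` but is not spelled out in the entry; negative and out-of-range values are not modeled.

```python
import json
import math


def json_extract_uint64(doc: str, key: str) -> int:
    # Hypothetical emulation of JSONExtract(doc, key, 'UInt64').
    value = json.loads(doc)[key]
    if isinstance(value, str):
        # New behavior: a float stored as a JSON string is parsed as a number
        # instead of yielding 0.
        value = float(value)
    # Assumption: the fractional part is truncated toward zero.
    return math.trunc(value)


print(json_extract_uint64('{"a": "1000.111"}', "a"))  # prints 1000
```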

Build/Testing/Packaging Improvement

  • Statically link with the skim library (it is written in Rust) for fuzzy search in clickhouse client/local history. #44239 (Azat Khuzhin).
  • We removed support for shared linking because of Rust. Actually, Rust is only an excuse for this removal, and we wanted to remove it nevertheless. #44828 (Alexey Milovidov).
  • Remove the dependency on the adduser tool from the packages, because we don't use it. This fixes #44934. #45011 (Alexey Milovidov).
  • The SQLite library is updated to the latest version. It is used for the SQLite database and table integration engines. Also, fixed a false-positive TSan report. This closes #45027. #45031 (Alexey Milovidov).
  • CRC-32 changes to address the WeakHash collision issue in PowerPC. #45144 (MeenaRenganathan22).
  • Update aws-c* submodules #43020 (Vitaly Baranov).
  • Automatically merge green backport PRs and green approved PRs #41110 (Mikhail f. Shiryaev).
  • Introduce a website for the status of ClickHouse CI. Source.

Bug Fix

  • Replace domain IP types (IPv4, IPv6) with native. #43221 (Yakov Olkhovskiy). It automatically fixes some missing implementations in the code.
  • Fix the backup process if mutations get killed during the backup process. #45351 (Vitaly Baranov).
  • Fix the Invalid number of rows in Chunk exception message. #41404. #42126 (Alexander Gololobov).
  • Fix possible use of an uninitialized value after executing expressions after sorting. Closes #43386. #43635 (Kruglov Pavel).
  • Better handling of NULL in aggregate combinators, fix possible segfault/logical error while using an obscure optimization optimize_rewrite_sum_if_to_count_if. Closes #43758. #43813 (Kruglov Pavel).
  • Fix CREATE USER/ROLE query settings constraints. #43993 (Nikolay Degterinsky).
  • Fixed bug with non-parsable default value for EPHEMERAL column in table metadata. #44026 (Yakov Olkhovskiy).
  • Fix parsing of bad version from compatibility setting. #44224 (Kruglov Pavel).
  • Bring interval subtraction from datetime in line with addition. #44241 (ltrk2).
  • Remove limits on the maximum size of the result for view. #44261 (lizhuoyu5).
  • Fix possible logical error in cache if do_not_evict_index_and_mrk_files=1. Closes #42142. #44268 (Kseniia Sumarokova).
  • Fix possible too early cache write interruption in write-through cache (caching could be stopped because of a false assumption when it should not have been). #44289 (Kseniia Sumarokova).
  • Fix possible crash in the case when function IN with constant arguments was used as a constant argument together with LowCardinality. Fixes #44221. #44346 (Nikolai Kochetov).
  • Fix support for complex parameters (like arrays) of parametric aggregate functions. This closes #30975. The aggregate function sumMapFiltered was unusable in distributed queries before this change. #44358 (Alexey Milovidov).
  • Fix reading ObjectId in BSON schema inference. #44382 (Kruglov Pavel).
  • Fix race which can lead to premature temp parts removal before merge finishes in ReplicatedMergeTree. This issue could lead to errors like No such file or directory: xxx. Fixes #43983. #44383 (alesapin).
  • Some invalid SYSTEM ... ON CLUSTER queries worked in an unexpected way if a cluster name was not specified. It's fixed, now invalid queries throw SYNTAX_ERROR as they should. Fixes #44264. #44387 (Alexander Tokmakov).
  • Fix reading Map type in ORC format. #44400 (Kruglov Pavel).
  • Fix reading columns that are not present in input data in Parquet/ORC formats. Previously it could lead to error INCORRECT_NUMBER_OF_COLUMNS. Closes #44333. #44405 (Kruglov Pavel).
  • Previously the bar function used the same '▋' (U+258B "Left five eighths block") character to display both 5/8 and 6/8 bars. This change corrects this behavior by using '▊' (U+258A "Left three quarters block") for displaying 6/8 bar. #44410 (Alexander Gololobov).
  • Placing profile settings after profile settings constraints in the configuration file made constraints ineffective. #44411 (Konstantin Bogdanov).
  • Fix SYNTAX_ERROR while running EXPLAIN AST INSERT queries with data. Closes #44207. #44413 (save-my-heart).
  • Fix reading bool value with CRLF in CSV format. Closes #44401. #44442 (Kruglov Pavel).
  • Don't execute and/or/if/multiIf on a LowCardinality dictionary, so the result type cannot be LowCardinality. It could lead to the error Illegal column ColumnLowCardinality in some cases. Fixes #43603. #44469 (Kruglov Pavel).
  • Fix mutations with the setting max_streams_for_merge_tree_reading. #44472 (Anton Popov).
  • Fix potential null pointer dereference with GROUPING SETS in ASTSelectQuery::formatImpl (#43049). #44479 (Robert Schulze).
  • Validate types in table function arguments, CAST function arguments, JSONAsObject schema inference according to settings. #44501 (Kruglov Pavel).
  • Fix IN function with LowCardinality and const column, close #44503. #44506 (Duc Canh Le).
  • Fixed a bug in the normalization of a DEFAULT expression in CREATE TABLE statement. The second argument of the function in (or the right argument of operator IN) might be replaced with the result of its evaluation during CREATE query execution. Fixes #44496. #44547 (Alexander Tokmakov).
  • Projections do not work in presence of WITH ROLLUP, WITH CUBE and WITH TOTALS. In previous versions, a query produced an exception instead of skipping the usage of projections. This closes #44614. This closes #42772. #44615 (Alexey Milovidov).
  • Async blocks were not cleaned up because the function that gets all blocks sorted by time did not return async blocks. #44651 (Han Fei).
  • Fix LOGICAL_ERROR The top step of the right pipeline should be ExpressionStep for JOIN with subquery, UNION, and TOTALS. Fixes #43687. #44673 (Nikolai Kochetov).
  • Avoid std::out_of_range exception in the Executable table engine. #44681 (Kruglov Pavel).
  • Do not apply optimize_syntax_fuse_functions to quantiles on AST, close #44712. #44713 (Vladimir C).
  • Fix bug with wrong type in Merge table and PREWHERE, close #43324. #44716 (Vladimir C).
  • Fix a possible crash during shutdown (while destroying TraceCollector). Fixes #44757. #44758 (Nikolai Kochetov).
  • Fix a possible crash in distributed query processing. The crash could happen if a query with totals or extremes returned an empty result and there are mismatched types in the Distributed and the local tables. Fixes #44738. #44760 (Nikolai Kochetov).
  • Fix fsync for fetches (min_compressed_bytes_to_fsync_after_fetch)/small files (ttl.txt, columns.txt) in mutations (min_rows_to_fsync_after_merge/min_compressed_bytes_to_fsync_after_merge). #44781 (Azat Khuzhin).
  • A rare race condition was possible when querying the system.parts or system.parts_columns tables in the presence of parts being moved between disks. Introduced in #41145. #44809 (Alexey Milovidov).
  • Fix the error Context has expired which could appear with enabled projections optimization. Can be reproduced for queries with specific functions, like dictHas/dictGet which use context in runtime. Fixes #44844. #44850 (Nikolai Kochetov).
  • A fix for Cannot read all data error which could happen while reading LowCardinality dictionary from remote fs. Fixes #44709. #44875 (Nikolai Kochetov).
  • Ignore cases when hardware monitor sensors cannot be read instead of showing a full exception message in logs. #44895 (Raúl Marín).
  • Use max_delay_to_insert value in case the calculated time to delay INSERT exceeds the setting value. Related to #44902. #44916 (Igor Nikonov).
  • Fix error Different order of columns in UNION subquery for queries with UNION. Fixes #44866. #44920 (Nikolai Kochetov).
  • The delay for INSERT could be calculated incorrectly, which could lead to always using the max_delay_to_insert setting as the delay instead of a correct value. Now a simple formula is used: max_delay_to_insert * (parts_over_threshold/max_allowed_parts_over_threshold), i.e. the delay grows proportionally to the number of parts over the threshold. Closes #44902. #44954 (Igor Nikonov).
  • Fix alter table TTL error when a wide part has the lightweight delete mask. #44959 (Mingliang Pan).
  • Follow-up fix for Replace domain IP types (IPv4, IPv6) with native #43221. #45024 (Yakov Olkhovskiy).
  • Follow-up fix for Replace domain IP types (IPv4, IPv6) with native #43221. #45043 (Yakov Olkhovskiy).
  • A buffer overflow was possible in the parser. Found by fuzzer. #45047 (Alexey Milovidov).
  • Fix possible cannot-read-all-data error in storage FileLog. Closes #45051, #38257. #45057 (Kseniia Sumarokova).
  • Memory efficient aggregation (setting distributed_aggregation_memory_efficient) is disabled when grouping sets are present in the query. #45058 (Nikita Taranov).
  • Fix RANGE_HASHED dictionary to count range columns as part of the primary key during updates when update_field is specified. Closes #44588. #45061 (Maksim Kita).
  • Fix error Cannot capture column for LowCardinality captured argument of nested lambda. Fixes #45028. #45065 (Nikolai Kochetov).
  • Fix the wrong query result of additional_table_filters (additional filter was not applied) in case the minmax/count projection is used. #45133 (Nikolai Kochetov).
  • Fixed bug in histogram function accepting negative values. #45147 (simpleton).
  • Fix wrong column nullability in StorageJoin, close #44940. #45184 (Vladimir C).
  • Fix background_fetches_pool_size settings reload (increase at runtime). #45189 (Raúl Marín).
  • Correctly process SELECT queries on KV engines (e.g. KeeperMap, EmbeddedRocksDB) using IN on the key with subquery producing different type. #45215 (Antonio Andelic).
  • Fix logical error in SEMI JOIN & join_use_nulls in some cases, close #45163, close #45209. #45230 (Vladimir C).
  • Fix heap-use-after-free in reading from s3. #45253 (Kruglov Pavel).
  • Fix bug when the Avro Union type is ['null', Nested type], closes #45275. Fix a bug where the bytes type was incorrectly inferred as Float. #45276 (flynn).
  • Throw a correct exception when explicit PREWHERE cannot be used with a table using the storage engine Merge. #45319 (Antonio Andelic).
  • Fix self-extracting ClickHouse failing to decompress under WSL1 Ubuntu due to an inconsistency: /proc/self/maps reports a 32-bit inode for the file, while stat reports a 64-bit inode. #45339 (Yakov Olkhovskiy).
  • Fix race in Distributed table startup (that could lead to processing file of async INSERT multiple times). #45360 (Azat Khuzhin).
  • Fix a possible crash while reading from storage S3 and table function s3 in the case when ListObject request has failed. #45371 (Anton Popov).
  • Fix SELECT ... FROM system.dictionaries exception when there is a dictionary with a bad structure (e.g. incorrect type in XML config). #45399 (Aleksei Filatov).
  • Fix s3Cluster schema inference when structure from insertion table is used in INSERT INTO ... SELECT * FROM s3Cluster queries. #45422 (Kruglov Pavel).
  • Fix bug in JSON/BSONEachRow parsing with HTTP that could lead to using default values for some columns instead of values from data. #45424 (Kruglov Pavel).
  • Fixed bug (Code: 632. DB::Exception: Unexpected data ... after parsed IPv6 value ...) with typed parsing of IP types from text source. #45425 (Yakov Olkhovskiy).
  • Add a check for empty regular expressions. Closes #45297. #45428 (Han Fei).
  • Fix a possible query hang (most likely in distributed queries). #45448 (Azat Khuzhin).
  • Fix possible deadlock with allow_asynchronous_read_from_io_pool_for_merge_tree enabled in case of exception from ThreadPool::schedule. #45481 (Nikolai Kochetov).
  • Fix possible in-use table after DETACH. #45493 (Azat Khuzhin).
  • Fix rare abort in the case when a query is canceled and parallel parsing was used during its execution. #45498 (Anton Popov).
  • Fix a race between Distributed table creation and INSERT into it (could lead to CANNOT_LINK during INSERT into the table). #45502 (Azat Khuzhin).
  • Add proper default (SLRU) to cache policy getter. Closes #45514. #45524 (Kseniia Sumarokova).
  • Disallow ARRAY JOIN in mutations. Closes #42637. #44447 (SmitaRKulkarni).
  • Fix for qualified asterisks with alias table name and column transformer. Resolves #44736. #44755 (SmitaRKulkarni).
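The INSERT delay formula from #44954, together with the cap introduced in #44916, can be written out as a small function. This is a sketch rather than the server code. The function name is hypothetical, and reading parts_over_threshold as the number of parts above parts_to_delay_insert (and max_allowed_parts_over_threshold as parts_to_throw_insert minus parts_to_delay_insert) is an assumption. The result is in the same unit as max_delay_to_insert.

```python
def insert_delay(max_delay_to_insert: float,
                 parts_over_threshold: int,
                 max_allowed_parts_over_threshold: int) -> float:
    """Delay for an INSERT that grows linearly with the parts over the threshold."""
    if parts_over_threshold <= 0:
        # At or below the threshold: no delay.
        return 0.0
    if max_allowed_parts_over_threshold <= 0:
        # Assumption: treat a degenerate configuration as "maximum delay".
        return float(max_delay_to_insert)
    ratio = parts_over_threshold / max_allowed_parts_over_threshold
    # The delay never exceeds max_delay_to_insert (see #44916).
    return max_delay_to_insert * min(ratio, 1.0)
```

With max_delay_to_insert = 1 and max_allowed_parts_over_threshold = 300, being 150 parts over the threshold yields a delay of 0.5, and the delay stops growing once the ratio reaches 1.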

Changelog for 2022