ClickHouse/CHANGELOG.md
2022-02-17 19:05:04 +03:00

ClickHouse release v22.2, 2022-02-17 (draft)

Upgrade Notes

  • Applying data skipping indexes for queries with FINAL may produce incorrect results. In this release we disabled data skipping indexes by default for queries with FINAL (a new setting use_skip_indexes_if_final is introduced and disabled by default). #34243 (Azat Khuzhin).

New Feature

  • Functions for text classification: language and charset detection. See #23271. #33314 (Nikolay Degterinsky).
  • Add memory overcommit to MemoryTracker. Added guaranteed settings for memory limits which represent soft memory limits. When the hard memory limit is reached, MemoryTracker tries to cancel the most overcommitted query. New setting memory_usage_overcommit_max_wait_microseconds specifies how long queries may wait for another query to stop. Closes #28375. #31182 (Dmitry Novik).
  • Allow to create new files on insert for File/S3/HDFS engines. Allow to overwrite a file in HDFS. Throw an exception on an attempt to overwrite a file in S3 by default. Throw an exception on an attempt to append data to a file in formats that have a suffix. Closes #31640 Closes #31622 Closes #23862 Closes #15022 Closes #16674. #33302 (Kruglov Pavel).
  • Add a setting that allows a user to provide their own deduplication semantics in MergeTree/ReplicatedMergeTree. If provided, it's used instead of the data digest to generate the block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can avoid the same inserted data being deduplicated. This closes: #7461. #32304 (Igor Nikonov).
  • Add support of DEFAULT keyword for INSERT statements. Closes #6331. #33141 (Andrii Buriachevskyi).
  • EPHEMERAL column specifier is added to CREATE TABLE query. Closes #9436. #34424 (yakov-olkhovskiy).
  • Support IF EXISTS clause for TTL expr TO [DISK|VOLUME] [IF EXISTS] 'xxx' feature. Parts will be moved to disk or volume only if it exists on replica, so MOVE TTL rules will be able to behave differently on replicas according to the existing storage policies. Resolves #34455. #34504 (Anton Popov).
  • Allow setting a default table engine and creating tables without specifying ENGINE. #34187 (Ilya Yatsishin).
  • Add table function format(format_name, data). #34125 (Kruglov Pavel).
  • Detect format in clickhouse-local by file name even in the case when it is passed to stdin. #33829 (Kruglov Pavel).
  • Add schema inference for values table function. Closes #33811. #34017 (Kruglov Pavel).
  • Dynamic reload of server TLS certificates on config reload. Closes #15764. #15765 (johnskopis). #31257 (Filatenkov Artur).
  • Now ReplicatedMergeTree can recover data when some of its disks are broken. #13544 (Amos Bird).
  • Fault-tolerant connections in clickhouse-client: clickhouse-client ... --host host1 --host host2 --port port2 --host host3 --port port --host host4. #34490 (Kruglov Pavel). #33824 (Filippov Denis).
  • Add DEGREES and RADIANS functions for MySQL compatibility. #33769 (Bharat Nallan).
  • Add h3ToCenterChild function. #33313 (Bharat Nallan). Add new h3 miscellaneous functions: edgeLengthKm, exactEdgeLengthKm, exactEdgeLengthM, exactEdgeLengthRads, numHexagons. #33621 (Bharat Nallan).
  • Add function bitSlice to extract bit subsequences from String/FixedString. #33360 (RogerYK).
  • Implemented meanZTest aggregate function. #33354 (achimbab).
  • Add confidence intervals to T-tests aggregate functions. #33260 (achimbab).
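
The custom deduplication setting in the list above can be pictured with a small Python model. This is only a sketch under stated assumptions: `block_id`, `DedupLog` and the SHA-256 digest are hypothetical stand-ins and do not reproduce ClickHouse internals. The entry itself only states that a user-provided value is used instead of the data digest to generate the block ID.

```python
import hashlib


def block_id(rows, token=None):
    """Return a block ID: the user token if given, else a digest of the data.

    Hypothetical helper; the real block ID format is not modelled here.
    """
    source = token if token is not None else repr(rows)
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class DedupLog:
    """Toy model of a table that skips blocks whose ID was already seen."""

    def __init__(self):
        self.seen_ids = set()
        self.rows = []

    def insert(self, rows, token=None):
        bid = block_id(rows, token)
        if bid in self.seen_ids:
            return False  # treated as a duplicate, nothing is written
        self.seen_ids.add(bid)
        self.rows.extend(rows)
        return True
```

In this model, inserting the same rows twice without a token is deduplicated, while passing a fresh token with each insert lets identical data through, which is the use case the entry describes.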

Performance Improvement

  • Support optimize_read_in_order if prefix of sorting key is already sorted. E.g. if the table has sorting key ORDER BY (a, b) and the query has WHERE a = const ORDER BY b clauses, data is now read in order of the sorting key instead of being fully sorted. #32748 (Anton Popov).
  • Improve performance of partitioned insert into table functions URL, S3, File, HDFS. Closes #34348. #34510 (Maksim Kita).
  • Multiple performance improvements of clickhouse-keeper. #34484 #34587 (zhanglistar).
  • FlatDictionary improve performance of dictionary data load. #33871 (Maksim Kita).
  • Improve performance of mapPopulateSeries function. Closes #33944. #34318 (Maksim Kita).
  • _file and _path virtual columns (in file-like table engines) are made LowCardinality - it will make queries for multiple files faster. Closes #34300. #34317 (flynn).
  • Speed up loading of data parts. It was not parallelized before: the setting part_loading_threads did not have effect. See #4699. #34310 (alexey-milovidov).
  • Improve performance of LineAsString format. This closes #34303. #34306 (alexey-milovidov).
  • Optimize quantilesExact{Low,High} to use nth_element instead of sort. #34287 (Danila Kutenin).
  • Slightly improve performance of Regexp format. #34202 (alexey-milovidov).
  • Minor improvement to potential hot-path in ExecuteScalarSubqueriesMatcher::visit, where std::set<String> was constructed on every function invocation. #34128 (Federico Rodriguez).
  • Make ORDER BY tuple almost as fast as ORDER BY columns. We have special optimizations for multiple column ORDER BY: https://github.com/ClickHouse/ClickHouse/pull/10831 . It's beneficial to also apply to tuple columns. #34060 (Amos Bird).
  • Rework and reintroduce the scalar subqueries cache to Materialized Views execution. #33958 (Raúl Marín).
  • Slightly improve performance of ORDER BY by adding x86-64 AVX-512 support for memcmpSmall functions to accelerate memory comparison. It works only if you compile ClickHouse by yourself. #33706 (hanqf-git).
  • Improve range_hashed dictionary performance when a key has a lot of intervals. Fixes #23821. #33516 (Maksim Kita).
  • For inserts and merges into S3, write files in parallel whenever possible (TODO: check if it's merged). #33291 (Nikolai Kochetov).
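
The optimize_read_in_order entry in the list above relies on a simple property: rows stored in (a, b) order are already ordered by b once a is fixed to a constant. The Python sketch below only illustrates that property; `read_in_order` and the sample rows are hypothetical and say nothing about how ClickHouse reads data parts.

```python
def is_sorted(values):
    """True if the sequence is in non-decreasing order."""
    return all(x <= y for x, y in zip(values, values[1:]))


def read_in_order(rows_sorted_by_ab, a_const):
    """Keep rows with a == a_const from (a, b)-ordered rows, without re-sorting."""
    return [b for a, b in rows_sorted_by_ab if a == a_const]


# Rows as they would be laid out under a sorting key (a, b).
table = sorted([(2, 9), (1, 5), (2, 1), (1, 3), (2, 4), (3, 0)])

# WHERE a = 2 ORDER BY b: the filtered b values come out ordered already.
result = read_in_order(table, 2)
```

No sort step is needed after the filter, which is why a full sort can be skipped for such queries.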

Improvement

  • Support asynchronous inserts in clickhouse-client for queries with inlined data. #34267 (Anton Popov).

  • Functions dictGet, dictHas implicitly cast key argument to dictionary key structure, if they are different. #33672 (Maksim Kita).

  • Added #! and # as a recognised start of a single line comment. Reference to task #34138. #34230 (Aaron Katz).

  • Allow to write s3(url, access_key_id, secret_access_key) (autodetect of data format and table structure, but with explicit credentials). #34503 (Kruglov Pavel).

  • Added sending of the output format back to client like it's done in HTTP protocol as suggested in #34362. Closes #34362. #34499 (Vitaly Baranov).

  • Send ProfileEvents statistics in case of INSERT SELECT query (to display query metrics in clickhouse-client for this type of queries). #34498 (Dmitry Novik).

  • Recognize .jsonl extension for JSONEachRow format. #34496 (Kruglov Pavel).

  • Improve schema inference in clickhouse-local. Allow to write just clickhouse-local -q "select * from table" < data.format. #34495 (Kruglov Pavel).

  • Privileges CREATE/ALTER/DROP ROW POLICY now can be granted on a table or on database.* as well as globally *.*. #34489 (Vitaly Baranov).

  • Allow allow_experimental_projection_optimization by default. #34456 (Nikolai Kochetov).

  • Allow to export arbitrarily large files to S3. Add two new settings: s3_upload_part_size_multiply_factor and s3_upload_part_size_multiply_parts_count_threshold. Now each time s3_upload_part_size_multiply_parts_count_threshold parts are uploaded to S3 from a single query, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor. Fixes #34244. #34422 (alesapin).

  • Allow to skip not found (404) URLs for globs when using URL storage / table function. Also closes #34359. #34392 (Kseniia Sumarokova).

  • Default input and output formats for clickhouse-local that can be overridden by --input-format and --output-format. Close #30631. #34352 (李扬).

  • Add options max_query_size and max_parser_depth for clickhouse-format. Closes #30528. #34349 (李扬).

  • Better handling of pre-inputs before client start. This is for #34308. #34336 (Amos Bird).

  • REGEXP_MATCHES and REGEXP_REPLACE function aliases for compatibility with PostgreSQL. Close #30885. #34334 (李扬).

  • Some servers expect a User-Agent header in their HTTP requests. A User-Agent header entry has been added to HTTP requests of the form: User-Agent: ClickHouse/VERSION_STRING. #34330 (Saad Ur Rahman).

  • Cancel merges before acquiring table lock for TRUNCATE query to avoid DEADLOCK_AVOIDED error in some cases. Fixes #34302. #34304 (tavplubix).

  • Change severity of the "Cancelled merging parts" message in logs, because it's not an error. This closes #34148. #34232 (alexey-milovidov).

  • Add ability to compose PostgreSQL-style cast operator :: with ArrayElement and TupleElement. #34229 (Nikolay Degterinsky).

  • Recognize YYYYMMDD-hhmmss format in parseDateTimeBestEffort function. This closes #34206. #34208 (alexey-milovidov).

  • Allow carriage return in the middle of the line while parsing by Regexp format. This closes #34200. #34205 (alexey-milovidov).

  • Allow to parse dictionary PRIMARY KEY as PRIMARY KEY (id, value), previously supported only PRIMARY KEY id, value. Closes #34135. #34141 (Maksim Kita).

  • Maxsplit argument for splitByChar. Closes #34081. #34140 (李扬).

  • Improving the experience of multiple line editing for clickhouse-client. This is a follow-up of https://github.com/ClickHouse/ClickHouse/pull/31123. #34114 (Amos Bird).

  • Add UUID support in MsgPack input/output format. #34065 (Kruglov Pavel).

  • Tracing context is now propagated from GRPC client metadata. #34064 (andremarianiello).

  • Support ON CLUSTER clause for all types of SYSTEM queries. #34005 (小路).

  • Fix memory accounting for queries that use less than max_untracked_memory. #34001 (Azat Khuzhin).

  • Fixed UTF-8 string case-insensitive search when lowercase and uppercase characters are represented by different number of bytes. Example is ẞ and ß. This closes #7334. #33992 (Harry Lee).

  • Detect format and schema from stdin in clickhouse-local. #33960 (Kruglov Pavel).

  • RangeHashedDictionary improvements. Improve performance of load time if there are multiple attributes. Allow to create without attributes. Added option convert_null_range_bound_to_open to specify the strategy when interval start and end have Nullable type; it is true by default. Closes #29791. Allow to specify Float, Decimal, DateTime64, Int128, Int256, UInt128, UInt256 as range types. RangeHashedDictionary added support for range values that extend Int64 type. Closes #28322. Added option range_lookup_strategy to specify range lookup type (min or max); it is min by default. Closes #21647. Fixed allocated bytes calculations. Fixed type name in system.dictionaries in case of ComplexKeyHashedDictionary. #33927 (Maksim Kita).

  • FlatDictionary, HashedDictionary, HashedArrayDictionary added support for creating with empty attributes, with support of read all keys, and dictHas. Fixes #33820. #33918 (Maksim Kita).

  • Added support for DateTime64 data type in dictionaries. #33914 (Maksim Kita).

  • Fix disk using the same path. Closes #29072. #33905 (zhongyuankai).

  • Try every resolved ip address while getting S3 proxy. #33862 (Nikolai Kochetov).

  • Support EXPLAIN for CREATE FUNCTION queries, for example: explain ast create function mycast AS (n) -> cast(n as String); EXPLAIN AST CREATE FUNCTION mycast AS n -> CAST(n, 'String'). #33819 (李扬).

  • Added support for cast from Map(Key, Value) to Array(Tuple(Key, Value)). #33794 (Maksim Kita).

  • Add some improvements and fixes for Bool data type. Fixes #33244. #33737 (Kruglov Pavel).

  • Enable stream to table join in WindowView. #33729 (vxider).

  • Parse and store OpenTelemetry trace-id in big-endian order. #33723 (Frank Chen).

  • Improvement for fromUnixTimestamp64 family functions. They now accept any integer value that can be converted to Int64. This closes: #14648. #33505 (Andrey Zvonov).

  • Add function addressToLineWithInlines. Close #26211. #33467 (SuperDJY).

  • Support SET, YEAR, TIME and GEOMETRY data types in MaterializedMySQL (experimental feature). Fixes #18091, #21536, #26361. #33429 (zzsmdfj).

  • Replace _shard_num via constants (from #7624) with shardNum() function (from #27020), to avoid possible issues (like those that had been found in #16947). #33392 (Azat Khuzhin).

  • Enable binary arithmetic (plus, minus, multiply, division, least, greatest) between Decimal and Float. #33355 (flynn).

  • Respect cgroup limits for CPU quota. #33342 (JaySon).

  • Improve keeper performance and fix several memory leaks. #33329 (alesapin).

  • Add new keeper setting min_session_timeout_ms. Now keeper server will determine client session timeout according to min_session_timeout_ms and session_timeout_ms settings. #33288 (JackyWoo).

  • Added UUID data type support for functions hex, bin. #32170 (Frank Chen).
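
The part-size growth rule from the s3_upload_part_size_multiply_factor entry in the list above can be sketched as follows. This is a minimal Python model and not ClickHouse code: `plan_part_sizes` is a hypothetical helper whose parameters mirror the three settings named in that entry, and the numbers in the usage note are illustrative, not ClickHouse defaults.

```python
def plan_part_sizes(total_bytes, min_part_size, multiply_factor, parts_count_threshold):
    """Return the list of part sizes used to upload total_bytes.

    The part size starts at min_part_size and is multiplied by
    multiply_factor each time parts_count_threshold more parts are uploaded.
    """
    sizes = []
    part_size = min_part_size
    remaining = total_bytes
    while remaining > 0:
        chunk = min(part_size, remaining)  # the last part may be shorter
        sizes.append(chunk)
        remaining -= chunk
        if len(sizes) % parts_count_threshold == 0:
            part_size *= multiply_factor
    return sizes
```

For example, `plan_part_sizes(100, 10, 2, 3)` yields three parts of 10, three parts of 20 and a final part of 10. Since S3 limits a multipart upload to 10,000 parts, growing the part size this way is what lets the total size exceed 10,000 times the minimum part size.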

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehaviour in official stable or prestable release)

  • Add Debug workflow to get variables for all actions on demand. Fix lack of pr_info.number for some edge case. #34644 (Mikhail f. Shiryaev).
  • Fixed the assertion in case of using allow_experimental_parallel_reading_from_replicas with max_parallel_replicas equals to 1. This fixes #34525. #34613 (Nikita Mikhaylov).
  • Fix bug of round/roundBankers, close #33267. #34562 (李扬).
  • In case of cancellation, S3 and HDFS canceled only the current reader but continued to execute the initial query. Fixes #34301. Relates to #34397. #34539 (Dmitry Novik).
  • Fix exception Chunk should have AggregatedChunkInfo in MergingAggregatedTransform (in case of optimize_aggregation_in_order=1 and distributed_aggregation_memory_efficient=0). Fixes #34526. #34532 (Anton Popov).
  • Fix comparison between integers and floats in index analysis. Previously it could lead to skipping some granules for reading by mistake. Fixes #34493. #34528 (Anton Popov).
  • Fix compression support in URL engine. #34524 (Frank Chen).
  • Fix possible error 'file_size: Operation not supported'. #34479 (Kruglov Pavel).
  • Add missing lock for storage. Fixes possible race with table deletion. #34416 (Kseniia Sumarokova).
  • Fix possible error 'Cannot convert column Function to mask' in short circuit function evaluation. Closes #34171. #34415 (Kruglov Pavel).
  • Fix segfault in schema inference from url. Closes #34147. #34405 (Kruglov Pavel).
  • For SQLUserDefinedFunctions change privilege level from DATABASE to GLOBAL. Closes #34281. #34404 (Maksim Kita).
  • Fix wrong engine syntax in result of SHOW CREATE DATABASE query for databases with engine Memory. This closes #34335. #34345 (alexey-milovidov).
  • Try to fix rare bug while reading of empty arrays, which could lead to Data compressed with different methods error. #34327 (Anton Popov).
  • Fix various issues when projection is enabled by default. Each issue is described in separate commit. This is for #33678 . This fixes #34273. #34305 (Amos Bird).
  • Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (tavplubix).
  • Fix progress bar width. It was incorrectly rounded to integer number of characters. #34275 (alexey-milovidov).
  • Fix current_user/current_address for interserver mode (before this patch current_user/current_address were preserved from the previous query). #34263 (Azat Khuzhin).
  • Fix memory leak in case of some Exception during query processing with optimize_aggregation_in_order=1. #34234 (Azat Khuzhin).
  • Fix reading of subcolumns with dots in their names. In particular fixed reading of Nested columns, if their element names contain dots (e.g Nested(`keys.name` String, `keys.id` UInt64, values UInt64)). #34228 (Anton Popov).
  • Fix metric Query, which shows number of executing queries. In last several releases it was always 0. #34224 (Anton Popov).
  • Fix schema inference for table function s3. #34186 (Kruglov Pavel).
  • Fix rare and benign race condition in HDFS, S3 and URL storage engines which can lead to additional connections. #34172 (alesapin).
  • Fix bug which can rarely lead to error "Cannot read all data" while reading LowCardinality columns of MergeTree table engines family which stores data on remote file system like S3. #34139 (alesapin).
  • Fix inserts to distributed tables in case of change of native protocol. The last change was in version 22.1, so there may be some failures of inserts to distributed tables after upgrade to that version. #34132 (Anton Popov).
  • Fix possible data race in StorageFile that was introduced in https://github.com/ClickHouse/ClickHouse/pull/33960. Closes #34111. #34113 (Kruglov Pavel).
  • Fixed minor race condition that might cause "intersecting parts" error in extremely rare cases after ZooKeeper connection loss. #34096 (tavplubix).
  • Fix asynchronous inserts with Native format. #34068 (Anton Popov).
  • Fix parallel_view_processing=0 not working when inserting into a table using VALUES. Fix view_duration_ms in the query_views_log not being set correctly for materialized views. #34067 (Raúl Marín).
  • Fix bug which led to inability for server to start when both replicated access storage and keeper are used. Introduced two settings for keeper socket timeout instead of settings from default user: keeper_server.socket_receive_timeout_sec and keeper_server.socket_send_timeout_sec. Fixes #33973. #33988 (alesapin).
  • Fix segfault while parsing ORC file with corrupted footer. Closes #33797. #33984 (Kruglov Pavel).
  • Fix parsing IPv6 from query parameter and fix IPv6 to string conversion. Closes #33928. #33971 (Kruglov Pavel).
  • Fix crash while reading of nested tuples. Fixes #33838. #33956 (Anton Popov).
  • Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to Not found columns exception. #33938 (Anton Popov).
  • Fix parsing ZK metadata: now metadata from ZooKeeper is compared with local metadata in canonical form. #33933 (sunny).
  • Aggregate function combinator -If did not correctly process Nullable filter argument. This closes #27073. #33920 (alexey-milovidov).
  • Fix potential race condition when doing remote disk read. cc @Jokser. #33912 (Amos Bird).
  • Fix crash if sql user defined function is created with lambda with non identifier arguments. Closes #33866. #33868 (Maksim Kita).
  • Fix usage of sparse columns (which can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization). #33849 (Anton Popov).
  • Fixed replica is not readonly logical error on SYSTEM RESTORE REPLICA query when replica is actually readonly. Fixes #33806. #33847 (tavplubix).
  • Fix memory leak in clickhouse-keeper in case of compression is used (default). #33840 (Azat Khuzhin).
  • Fix KeyCondition with no common types available. #33833 (Amos Bird).
  • Fix schema inference for JSONEachRow and JSONCompactEachRow. #33830 (Kruglov Pavel).
  • Fix usage of external dictionaries with Redis source and large number of keys. #33804 (Anton Popov).
  • Fix bug in client that led to 'Connection reset by peer' in server. Closes #33309. #33790 (Kruglov Pavel).
  • Fix parsing query INSERT INTO ... VALUES SETTINGS ... (...), ... #33776 (Kruglov Pavel).
  • Fix bug of check table when creating data part with wide format and projection. #33774 (李扬).
  • Fix tiny race between count() and INSERT/merges/... in MergeTree (it is possible to return incorrect number of rows for SELECT with optimize_trivial_count_query). #33753 (Azat Khuzhin).
  • Throw exception when storage hdfs list directory failed. #33724 (LiuNeng).
  • Fix mutation when table contains projections. This fixes #33010 . This fixes #33275 . #33679 (Amos Bird).
  • Correctly determine current database if CREATE TEMPORARY TABLE AS SELECT is queried inside a named HTTP session. This is a very rare use case. This closes #8340. #33676 (alexey-milovidov).
  • Allow some queries with sorting, LIMIT BY, ARRAY JOIN and lambda functions. This closes #7462. #33675 (alexey-milovidov).
  • Fix bug in zero copy replication which led to data duplication in case of TTL move. Fixes #33643. #33642 (alesapin).
  • Fix Chunk should have AggregatedChunkInfo in GroupingAggregatedTransform (in case of optimize_aggregation_in_order=1). #33637 (Azat Khuzhin).
  • Fix error Bad cast from type ... to DB::DataTypeArray which may happen when table has Nested column with dots in name, and default value is generated for it (e.g. during insert, when column is not listed). Continuation of #28762. #33588 (Alexey Pavlenko).
  • TODO. #33492 (huzhichengdd).
  • Create a function escapeForLDAPFilter and use it to escape characters '(' and ')' in a final_user_dn variable. #33401 (IlyaTsoi).
  • Fix lz4 compression for output. Closes #31421. #31862 (Kruglov Pavel).
  • Add HashMethodSingleLowCardinalityColumn::findKey to avoid a crash. #34506 (DR).
  • Fix inserting to temporary tables via gRPC. This PR fixes #34347, issue #2. #34364 (Vitaly Baranov).
  • This PR fixes #19429. #34225 (Vitaly Baranov).
  • This PR fixes #18206. #33977 (Vitaly Baranov).
  • This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). #33574 (Vitaly Baranov).
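
The -If combinator entry in the list above concerns a Nullable filter argument. The Python sketch below shows one reading of the intended behaviour, under the assumption that a NULL condition should not select the row (as in a SQL WHERE clause); `sum_if` is a hypothetical stand-in for an aggregate such as sumIf, with `None` playing the role of NULL.

```python
def sum_if(values, conditions):
    """Sum the values whose condition is true; None (NULL) never matches."""
    total = 0
    for value, cond in zip(values, conditions):
        if cond is None:
            continue  # assumption: a NULL condition is treated as "not matched"
        if cond:
            total += value
    return total
```

With values `[1, 2, 3, 4]` and conditions `[1, None, 0, 1]`, only the first and last rows are aggregated.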

ClickHouse release v22.1, 2022-01-18

Upgrade Notes

  • The functions left and right were previously implemented in the parser and are now full-featured. Distributed queries with left or right functions without aliases may throw an exception if the cluster contains different versions of clickhouse-server. If you are upgrading your cluster and encounter this error, you should finish upgrading your cluster to ensure all nodes have the same version. Also you can add aliases (AS something) to the columns in your queries to avoid this issue. #33407 (alexey-milovidov).
  • Resource usage by scalar subqueries is fully accounted since this version. With this change, rows read in scalar subqueries are now reported in the query_log. If the scalar subquery is cached (repeated or called for several rows) the rows read are only counted once. This change allows KILLing queries and reporting progress while they are executing scalar subqueries. #32271 (Raúl Marín).

New Feature

  • Implement data schema inference for input formats. Allow to skip structure (or write just auto) in table functions file, url, s3, hdfs and in parameters of clickhouse-local . Allow to skip structure in create query for table engines File, HDFS, S3, URL, Merge, Buffer, Distributed and ReplicatedMergeTree (if we add new replicas). #32455 (Kruglov Pavel).
  • Detect format by file extension in file/hdfs/s3/url table functions and HDFS/S3/URL table engines and also for SELECT INTO OUTFILE and INSERT FROM INFILE. #33565 (Kruglov Pavel). Close #30918. #33443 (OnePiece).
  • A tool for collecting diagnostics data if you need support. #33175 (Alexander Burmak).
  • Automatic cluster discovery via Zoo/Keeper. It allows to add replicas to the cluster without changing configuration on every server. #31442 (vdimir).
  • Implement Hive table engine to access Apache Hive from ClickHouse. This implements: #29245. #31104 (taiyang-li).
  • Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate dependency (measure of association) between categorical values. All these functions are using cross-tab (histogram on pairs) for implementation. You can imagine it like a correlation coefficient but for any discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
  • Added table function hdfsCluster which allows processing files from HDFS in parallel from many nodes in a specified cluster, similarly to s3Cluster. #32400 (Zhichang Yu).
  • Adding support for disks backed by Azure Blob Storage, in a similar way it has been done for disks backed by AWS S3. #31505 (Jakub Kuklis).
  • Allow COMMENT in CREATE VIEW (for all VIEW kinds). #31062 (Vasily Nemkov).
  • Dynamically reinitialize listening ports and protocols when configuration changes. #30549 (Kevin Michel).
  • Added left, right, leftUTF8, rightUTF8 functions. Fix error in implementation of substringUTF8 function with negative offset (offset from the end of string). #33407 (alexey-milovidov).
  • Add new functions for H3 coordinate system: h3HexAreaKm2, h3CellAreaM2, h3CellAreaRads2. #33479 (Bharat Nallan).
  • Add MONTHNAME function. #33436 (usurai).
  • Added function arrayLast. Closes #33390. #33415 Added function arrayLastIndex. #33465 (Maksim Kita).
  • Add function decodeURLFormComponent slightly different to decodeURLComponent. Close #10298. #33451 (SuperDJY).
  • Allow to split GraphiteMergeTree rollup rules for plain/tagged metrics (optional rule_type field). #33494 (Michail Safronov).
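
The cramersV entry in the list above describes a measure of association built on a cross-tab of value pairs. The sketch below computes the textbook Cramér's V from such a cross-tab in Python. It is an illustration of the statistic only, not the ClickHouse implementation: `cramers_v` is a hypothetical helper, and returning 0.0 for degenerate input is a choice made for this sketch.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Textbook Cramér's V for a sequence of (x, y) categorical pairs."""
    pairs = list(pairs)
    n = len(pairs)
    cells = Counter(pairs)                      # cross-tab: count per (x, y)
    row_totals = Counter(x for x, _ in pairs)   # count per x
    col_totals = Counter(y for _, y in pairs)   # count per y
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0  # sketch-specific choice for empty or single-category input
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = cells.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))
```

Fully dependent categories give a value of 1, and categories whose cross-tab matches the independent expectation give 0, which matches the "correlation coefficient for discrete values" intuition in the entry.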

Performance Improvement

  • Support moving conditions to PREWHERE (setting optimize_move_to_prewhere) for tables of Merge engine if all of its underlying tables support PREWHERE. #33300 (Anton Popov).
  • More efficient handling of globs for URL storage. Now you can easily query million URLs in parallel with retries. Closes #32866. #32907 (Kseniia Sumarokova).
  • Avoid exponential backtracking in parser. This closes #20158. #33481 (alexey-milovidov).
  • Abuse of untuple function was leading to exponential complexity of query analysis (found by fuzzer). This closes #33297. #33445 (alexey-milovidov).
  • Reduce allocated memory for dictionaries with string attributes. #33466 (Maksim Kita).
  • Slight performance improvement of reinterpret function. #32587 (alexey-milovidov).
  • Insignificant change. In extremely rare cases when a data part is lost on every replica, after merging of some data parts, the subsequent queries may skip fewer partitions during partition pruning. This hardly affects anything. #32220 (Azat Khuzhin).
  • Improve clickhouse-keeper writing performance by optimizing the size calculation logic. #32366 (zhanglistar).
  • Optimize single part projection materialization. This closes #31669. #31885 (Amos Bird).
  • Improve query performance of system tables. #33312 (OnePiece).
  • Optimize selecting of MergeTree parts that can be moved between volumes. #33225 (OnePiece).
  • Fix sparse_hashed dict performance with sequential keys (wrong hash function). #32536 (Azat Khuzhin).

Experimental Feature

  • Parallel reading from multiple replicas within a shard during distributed query without using sample key. To enable this, set allow_experimental_parallel_reading_from_replicas = 1 and max_parallel_replicas to any number. This closes #26748. #29279 (Nikita Mikhaylov).
  • Implemented sparse serialization. It can reduce usage of disk space and improve performance of some queries for columns which contain a lot of default (zero) values. It can be enabled by setting ratio_for_sparse_serialization. Sparse serialization will be chosen dynamically for a column if its ratio of the number of default values to the number of all values is above that threshold. Serialization (default or sparse) will be fixed for every column in a part, but may vary between parts. #22535 (Anton Popov).
  • Add "TABLE OVERRIDE" feature for customizing MaterializedMySQL table schemas. #32325 (Stig Bakken).
  • Add EXPLAIN TABLE OVERRIDE query. #32836 (Stig Bakken).
  • Support TABLE OVERRIDE clause for MaterializedPostgreSQL. RFC: #31480. #32749 (Kseniia Sumarokova).
  • Change ZooKeeper path for zero-copy marks for shared data. Note that "zero-copy replication" is a non-production feature (in early stages of development) that you shouldn't use anyway. But if you have used it, keep this change in mind. #32061 (ianton-ru).
  • Events clause support for WINDOW VIEW watch query. #32607 (vxider).
  • Fix ACL with explicit digit hash in clickhouse-keeper: now the behavior is consistent with ZooKeeper and the generated digest is always accepted. #33249 (小路). #33246.
  • Fix unexpected projection removal when detaching parts. #32067 (Amos Bird).
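
The sparse serialization entry in the list above chooses a format per column from the ratio of default values. The Python sketch below models that decision and one possible sparse layout (positions plus non-default values). Everything here is an assumption for illustration: the function names, the `ratio_threshold` parameter and the layout are hypothetical and do not describe the actual on-disk format.

```python
def choose_serialization(values, ratio_threshold, default=0):
    """Pick "sparse" when the share of default values is above the threshold."""
    if not values:
        return "default"
    ratio = sum(1 for v in values if v == default) / len(values)
    return "sparse" if ratio > ratio_threshold else "default"


def encode_sparse(values, default=0):
    """Store the column size, positions of non-default values, and those values."""
    offsets = [i for i, v in enumerate(values) if v != default]
    return len(values), offsets, [values[i] for i in offsets]


def decode_sparse(size, offsets, non_defaults, default=0):
    """Rebuild the full column from its sparse form."""
    column = [default] * size
    for position, value in zip(offsets, non_defaults):
        column[position] = value
    return column
```

A column such as `[0, 0, 7, 0, 0, 0, 3, 0]` has a default ratio of 0.75, so with a threshold of 0.5 it would be stored sparsely as two positions and two values.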

Improvement

  • Now date time conversion functions that generate time before 1970-01-01 00:00:00 will be saturated to zero instead of overflowing. #29953 (Amos Bird). It also fixes a bug in index analysis if date truncation function would yield result before the Unix epoch.
  • Always display resource usage (total CPU usage, total RAM usage and max RAM usage per host) in client. #33271 (alexey-milovidov).
  • Improve Bool type serialization and deserialization, check the range of values. #32984 (Kruglov Pavel).
  • If an invalid setting is defined using the SET query or using the query parameters in the HTTP request, error message will contain suggestions that are similar to the invalid setting string (if any exists). #32946 (Antonio Andelic).
  • Support hints for mistyped setting names for clickhouse-client and clickhouse-local. Closes #32237. #32841 (凌涛).
  • Allow to use virtual columns in Materialized Views. Close #11210. #33482 (OnePiece).
  • Add config to disable IPv6 in clickhouse-keeper if needed. This closes #33381. #33450 (Wu Xueyang).
  • Add more info to system.build_options about current git revision. #33431 (taiyang-li).
  • clickhouse-local: track memory under --max_memory_usage_in_client option. #33341 (Azat Khuzhin).
  • Allow negative intervals in function intervalLengthSum. Their length will be added as well. This closes #33323. #33335 (alexey-milovidov).
  • LineAsString can be used as output format. This closes #30919. #33331 (Sergei Trifonov).
  • Support <secure/> in cluster configuration, as an alternative form of <secure>1</secure>. Close #33270. #33330 (SuperDJY).
  • Pressing Ctrl+C twice will terminate clickhouse-benchmark immediately without waiting for in-flight queries. This closes #32586. #33303 (alexey-milovidov).
  • Support Unix timestamp with milliseconds in parseDateTimeBestEffort function. #33276 (Ben).
  • Allow to cancel query while reading data from external table in the formats: Arrow / Parquet / ORC - it failed to be cancelled in case of big files and setting input_format_allow_seeks as false. Closes #29678. #33238 (Kseniia Sumarokova).
  • If table engine supports SETTINGS clause, allow to pass the settings as key-value or via config. Add this support for MySQL. #33231 (Kseniia Sumarokova).
  • Correctly prevent Nullable primary keys if necessary. This is for #32780. #33218 (Amos Bird).
  • Add retry for PostgreSQL connections in case nothing has been fetched yet. Closes #33199. #33209 (Kseniia Sumarokova).
  • Validate config keys for external dictionaries. #33095. #33130 (Kseniia Sumarokova).
  • Send profile info inside clickhouse-local. Closes #33093. #33097 (Kseniia Sumarokova).
  • Short circuit evaluation: support for function throwIf. Closes #32969. #32973 (Maksim Kita).
  • (This only happens in unofficial builds). Fixed segfault when inserting data into compressed Decimal, String, FixedString and Array columns. This closes #32939. #32940 (N. Kolotov).
  • Added support for specifying subquery as SQL user defined function. Example: CREATE FUNCTION test AS () -> (SELECT 1). Closes #30755. #32758 (Maksim Kita).
  • Improve gRPC compression support for #28671. #32747 (Vitaly Baranov).
  • Flush all In-Memory data parts when WAL is not enabled while shutdown server or detaching table. #32742 (nauta).
  • Allow to control connection timeouts for MySQL (previously was supported only for dictionary source). Closes #16669. Previously default connect_timeout was rather small, now it is configurable. #32734 (Kseniia Sumarokova).
  • Support authSource option for storage MongoDB. Closes #32594. #32702 (Kseniia Sumarokova).
  • Support Date32 type in generateRandom table function. #32643 (nauta).
  • Add settings max_concurrent_select_queries and max_concurrent_insert_queries to control concurrent queries by query kind. Close #3575. #32609 (SuperDJY).
  • Improve handling nested structures with missing columns while reading data in Protobuf format. Follow-up to https://github.com/ClickHouse/ClickHouse/pull/31988. #32531 (Vitaly Baranov).
  • Allow empty credentials for MongoDB engine. Closes #26267. #32460 (Kseniia Sumarokova).
  • Disable some optimizations for window functions that may lead to exceptions. Closes #31535. Closes #31620. #32453 (Kseniia Sumarokova).
  • Allow to connect to MongoDB 5.0. Closes #31483. #32416 (Kseniia Sumarokova).
  • Enable comparison between Decimal and Float. Closes #22626. #31966 (flynn).
  • Added settings command_read_timeout, command_write_timeout for StorageExecutable, StorageExecutablePool, ExecutableDictionary, ExecutablePoolDictionary, ExecutableUserDefinedFunctions. Setting command_read_timeout controls timeout for reading data from command stdout in milliseconds. Setting command_write_timeout controls timeout for writing data to command stdin in milliseconds. Added setting command_termination_timeout for ExecutableUserDefinedFunction, ExecutableDictionary, StorageExecutable. Added setting execute_direct for ExecutableUserDefinedFunction, by default true. Added setting execute_direct for ExecutableDictionary, ExecutablePoolDictionary, by default false. #30957 (Maksim Kita).
  • Bitmap aggregate functions will give correct result for out of range argument instead of wraparound. #33127 (DR).
  • Fix parsing incorrect queries with FROM INFILE statement. #33521 (Kruglov Pavel).
  • Don't allow to write into S3 if path contains globs. #33142 (Kruglov Pavel).
  • --echo option was not used by clickhouse-client in batch mode with single query. #32843 (N. Kolotov).
  • Use --database option for clickhouse-local. #32797 (Kseniia Sumarokova).
  • Fix surprisingly bad code in SQL ordinary function file. Now it supports symlinks. #32640 (alexey-milovidov).
  • Update modification_time for data part in system.parts after part movement. #32964. #32965 (save-my-heart).
  • Potential issue, cannot be exploited: integer overflow may happen in array resize. #33024 (varadarajkumar).

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in official stable or prestable release)

  • Several fixes for format parsing. This is relevant if clickhouse-server is open for write access to adversary. Specially crafted input data for Native format may lead to reading uninitialized memory or crash. #33050 (Heena Bansal). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee). Fix null pointer dereference in LowCardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
  • ClickHouse Keeper handler will correctly remove operation when response sent. #32988 (JackyWoo).
  • Potential off-by-one miscalculation of quotas: quota limit was not reached, but the limit was exceeded. This fixes #31174. #31656 (sunny).
  • Fixed CASTing from String to IPv4 or IPv6 and back. Fixed error message in case of failed conversion. #29224 (Dmitry Novik) #27914 (Vasily Nemkov).
  • Fixed an exception like Unknown aggregate function nothing during an execution on a remote server. This fixes #16689. #26074 (hexiaoting).
  • Fix wrong database for JOIN without explicit database in distributed queries (Fixes: #10471). #33611 (Azat Khuzhin).
  • Fix segfault in Apache Avro format that appears after the second insert into file. #33566 (Kruglov Pavel).
  • Fix segfault in Apache Arrow format if schema contains Dictionary type. Closes #33507. #33529 (Kruglov Pavel).
  • Out of band offset and limit settings may be applied incorrectly for views. Closes #33289. #33518 (hexiaoting).
  • Fix an exception Block structure mismatch which may happen during insertion into table with default nested LowCardinality column. Fixes #33028. #33504 (Nikolai Kochetov).
  • Fix dictionary expressions for range_hashed range min and range max attributes when created using DDL. Closes #30809. #33478 (Maksim Kita).
  • Fix possible use-after-free for INSERT into Materialized View with concurrent DROP (Azat Khuzhin).
  • Do not try to read past EOF (to work around a bug in the Linux kernel). This bug can be reproduced on kernels (3.14..5.9) and requires index_granularity_bytes=0 (i.e. adaptive index granularity turned off). #33372 (Azat Khuzhin).
  • The commands SYSTEM SUSPEND and SYSTEM ... THREAD FUZZER missed access control. It is fixed. Author: Kevin Michel. #33333 (alexey-milovidov).
  • Fix when COMMENT for dictionaries does not appear in system.tables, system.dictionaries. Allow to modify the comment for Dictionary engine. Closes #33251. #33261 (Maksim Kita).
  • Add asynchronous inserts (with enabled setting async_insert) to query log. Previously such queries didn't appear in the query log. #33239 (Anton Popov).
  • Fix sending WHERE 1 = 0 expressions for external databases query. Closes #33152. #33214 (Kseniia Sumarokova).
  • Fix DDL validation for MaterializedPostgreSQL. Fix setting materialized_postgresql_allow_automatic_update. Closes #29535. #33200 (Kseniia Sumarokova). Make sure unused replication slots are always removed. Found in #26952. #33187 (Kseniia Sumarokova). Fix MaterializedPostgreSQL detach/attach (removing / adding to replication) tables with non-default schema. Found in #29535. #33179 (Kseniia Sumarokova). Fix DROP MaterializedPostgreSQL database. #33468 (Kseniia Sumarokova).
  • The metric StorageBufferBytes sometimes was miscalculated. #33159 (xuyatian).
  • Fix error Invalid version for SerializationLowCardinality key column in case of reading from LowCardinality column with local_filesystem_read_prefetch or remote_filesystem_read_prefetch enabled. #33046 (Nikolai Kochetov).
  • Fix s3 table function reading empty file. Closes #33008. #33037 (Kseniia Sumarokova).
  • Fix Context leak in case of cancel_http_readonly_queries_on_client_close (i.e. leaking of external tables that had been uploaded to the server and other resources). #32982 (Azat Khuzhin).
  • Fix wrong tuple output in CSV format in case of custom csv delimiter. #32981 (Kruglov Pavel).
  • Fix HDFS URL check that didn't allow using HA namenode address. Bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/31042. #32976 (Kruglov Pavel).
  • Fix throwing exception like positional argument out of bounds for non-positional arguments. Closes #31173. #32961 (Kseniia Sumarokova).
  • Fix UB in case of unexpected EOF during filling a set from HTTP query (i.e. if the client interrupted in the middle, i.e. timeout 0.15s curl -Ss -F 's=@t.csv;' 'http://127.0.0.1:8123/?s_structure=key+Int&query=SELECT+dummy+IN+s' and with large enough t.csv). #32955 (Azat Khuzhin).
  • Fix a regression in replaceRegexpAll function. The function worked incorrectly when matched substring was empty. This closes #32777. This closes #30245. #32945 (alexey-milovidov).
  • Fix ORC format stripe reading. #32929 (kreuzerkrieg).
  • topKWeightedState failed for some input types. #32487. #32914 (vdimir).
  • Fix exception Single chunk is expected from view inner query (LOGICAL_ERROR) in materialized view. Fixes #31419. #32862 (Nikolai Kochetov).
  • Fix optimization with lazy seek for async reads from remote filesystems. Closes #32803. #32835 (Kseniia Sumarokova).
  • MergeTree table engine might silently skip some mutations if there are too many running mutations or in case of high memory consumption, it's fixed. Fixes #17882. #32814 (tavplubix).
  • Avoid reusing the scalar subquery cache when processing MV blocks. This fixes a bug when the scalar query references the source table, but it means that all scalar subqueries in the MV definition will be calculated for each block. #32811 (Raúl Marín).
  • Server might fail to start if database with MySQL engine cannot connect to MySQL server, it's fixed. Fixes #14441. #32802 (tavplubix).
  • Fix crash when using fuzzBits function. Closes #32737. #32755 (SuperDJY).
  • Fix error Column is not under aggregate function in case of MV with GROUP BY (list of columns) (which is parsed as GROUP BY tuple(...)) over Kafka/RabbitMQ. Fixes #32668 and #32744. #32751 (Nikolai Kochetov).
  • Fix ALTER TABLE ... MATERIALIZE TTL query with TTL ... DELETE WHERE ... and TTL ... GROUP BY ... modes. #32695 (Anton Popov).
  • Fix optimize_read_in_order optimization in case when table engine is Distributed or Merge and its underlying MergeTree tables have a monotonic function in prefix of sorting key. #32670 (Anton Popov).
  • Fix LOGICAL_ERROR exception when the target of a materialized view is a JOIN or a SET table. #32669 (Raúl Marín).
  • Inserting into S3 with multipart upload to Google Cloud Storage may trigger abort. #32504. #32649 (vdimir).
  • Fix possible exception at RabbitMQ storage startup by delaying channel creation. #32584 (Kseniia Sumarokova).
  • Fix table lifetime (i.e. possible use-after-free) in case of parallel DROP TABLE and INSERT. #32572 (Azat Khuzhin).
  • Fix async inserts with formats CustomSeparated, Template, Regexp, MsgPack and JSONAsString. Previously the async inserts with these formats didn't read any data. #32530 (Kruglov Pavel).
  • Fix groupBitmapAnd function on distributed table. #32529 (minhthucdao).
  • Fix crash in JOIN found by fuzzer. Closes #32458. #32508 (vdimir).
  • Proper handling of the case with Apache Arrow column duplication. #32507 (Dmitriy Mokhnatkin).
  • Fix issue with ambiguous query formatting in distributed queries that led to errors when some table columns were named ALL or DISTINCT. This closes #32391. #32490 (alexey-milovidov).
  • Fix failures in queries that are trying to use skipping indices, which are not materialized yet. Fixes #32292 and #30343. #32359 (Anton Popov).
  • Fix broken SELECT query when there are more than 2 row policies on the same column, starting from the second query in the same session. #31606. #32291 (SuperDJY).
  • Fix fractional unix timestamp conversion to DateTime64, fractional part was reversed for negative unix timestamps (before 1970-01-01). #32240 (Ben).
  • Some entries of replication queue might hang for temporary_directories_lifetime (1 day by default) with Directory tmp_merge_<part_name> or Part ... (state Deleting) already exists, but it will be deleted soon or similar error. It's fixed. Fixes #29616. #32201 (tavplubix).
  • Fix parsing of APPLY lambda column transformer which could lead to client/server crash. #32138 (Kruglov Pavel).
  • Fix base64Encode adding trailing bytes on small strings. #31797 (Kevin Michel).
  • Fix possible crash (or incorrect result) in case of LowCardinality arguments of window function. Fixes #31114. #31888 (Nikolai Kochetov).
  • Fix hang on command DROP TABLE system.query_log sync. #33293 (zhanghuajie).
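
As an illustration of the DateTime64 entry above (fractional unix timestamps before 1970-01-01), here is a minimal Python sketch of the general pitfall. It does not reproduce ClickHouse's actual code or the exact defect fixed in #32240; the function names `to_ticks` and `naive_ticks` and the scale of 3 are assumptions made for this example only. It shows how handling the whole and fractional parts separately can flip the sign of the fraction for negative values.

```python
from decimal import Decimal


def to_ticks(ts: str, scale: int = 3) -> int:
    """Convert a decimal unix timestamp to integer ticks of 10**-scale seconds.

    Scaling the whole value at once keeps the sign of the fractional part.
    """
    return int((Decimal(ts) * (10 ** scale)).to_integral_value())


def naive_ticks(ts: str, scale: int = 3) -> int:
    """Hypothetical buggy variant: splits the value and adds the fraction back.

    int() truncates toward zero, so for negative input the fraction should be
    subtracted, not added. Adding it moves the result in the wrong direction.
    """
    value = Decimal(ts)
    whole = int(value)                # -1.25 -> -1
    frac = abs(value - whole)         # 0.25, sign is lost here
    return whole * 10 ** scale + int(frac * 10 ** scale)


if __name__ == "__main__":
    for ts in ("1.25", "-1.25"):
        print(ts, to_ticks(ts), naive_ticks(ts))
```

For positive input both functions agree; for `-1.25` the naive variant lands 500 ticks later than the correct value, which is the kind of discrepancy a fix for negative timestamps has to remove.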

Changelog for 2021