ClickHouse/docs/changelogs/v21.3.2.5-lts.md
2022-05-25 00:05:54 +02:00

ClickHouse release v21.3.2.5-lts FIXME as compared to v21.2.1.5869-prestable

Backward Incompatible Change

  • Now all case-insensitive function names will be lower-cased during query analysis. This is needed for projection query routing. #20174 (Amos Bird).
  • Creating MergeTree tables in the old syntax with a table TTL is no longer allowed, because the TTL was simply ignored. Attaching old tables is still possible. #20282 (alesapin).

New Feature

  • Add experimental Replicated database engine. It replicates DDL queries across multiple hosts. #16193 (Alexander Tokmakov).
  • Distributed query deduplication is a followup to #16033 and partially resolves the proposal #13574. #17348 (xjewer).
  • Add the ability to backup/restore metadata files for DiskS3. #18377 (Pavel Kovalenko).
  • Included in the pull request. #18508 (PHO).
  • Added file() function to read a file as a String. This closes #18851. #19204 (keenwolf).
  • Tables with MergeTree* engine now have two new table-level settings for query concurrency control. The setting max_concurrent_queries limits the number of concurrently executed queries related to this table. The setting min_marks_to_honor_max_concurrent_queries applies the previous setting only if the query reads at least this number of marks. #19544 (Amos Bird).
  • Mentioned in #18454. Add function htmlOrxmlCoarseParse with support for: parsing <script></script>; parsing <style></style>; parsing <![CDATA[]]>; white space collapse; parsing any <content> format; HyperScan to support SIMD. Everything is done in a single pass. #19600 (zlx19950903).
  • Add quota type QUERY_SELECTS and QUERY_INSERTS. #19603 (JackyWoo).
  • ExecutableDictionarySource added implicit_key option. Fixes #14527. #19677 (Maksim Kita).
  • Added Server Side Encryption Customer Keys (the x-amz-server-side-encryption-customer-(key/md5) header) support in S3 client. See the link. Closes #19428. #19748 (Vladimir Chebotarev).
  • Function reinterpretAs updated to support big integers. Fixes #19691. #19858 (Maksim Kita).
  • Add setting insert_shard_id to support inserting data into a specific shard through a Distributed table. #19961 (flynn).
  • Added timezoneOffset(datetime) function, which returns the offset from UTC in seconds. This closes #19850. #19962 (keenwolf).
  • New event_time_microseconds column in system.part_log table. #20027 (Bharat Nallan).
  • Add aggregate function deltaSum for summing the differences between consecutive rows. #20057 (Russ Frank).
  • Add two settings to delay or throw error during insertion when there are too many inactive parts. This is useful when server fails to clean up parts quickly enough. #20178 (Amos Bird).
  • Add file engine settings: engine_file_empty_if_not_exists and engine_file_truncate_on_insert. #20620 (M0r64n).
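
The deltaSum entry above can be illustrated with a minimal Python sketch. It is an illustration only, not ClickHouse's implementation: the name delta_sum and the skip_negative flag are hypothetical, and the flag defaults to skipping negative differences on the assumption that this matches the function's documented behaviour.

```python
from typing import Iterable


def delta_sum(values: Iterable[float], skip_negative: bool = True) -> float:
    """Sum the differences between consecutive values.

    Hypothetical helper for illustration; not ClickHouse code.
    skip_negative=True (an assumption about deltaSum's behaviour) drops
    negative steps, which suits counters that occasionally reset.
    """
    total = 0
    previous = None
    for value in values:
        if previous is not None:
            diff = value - previous
            # Keep the step unless it is negative and we were asked to skip those.
            if diff >= 0 or not skip_negative:
                total += diff
        previous = value
    return total


print(delta_sum([1, 2, 3]))                          # 2
print(delta_sum([1, 2, 3, 0, 3, 4, 2, 3]))           # 7
print(delta_sum([1, 2, 3, 0], skip_negative=False))  # -1
```

With skip_negative=False the result collapses to last value minus first value, which is why skipping negative steps is the more useful behaviour for monotonically growing counters.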

Performance Improvement

  • Add parallel select final for one part with level>0 when do_not_merge_across_partitions_select_final setting is 1. #19375 (Kruglov Pavel).
  • Improved performance of bitmap columns during joins. #19407 (templarzq).
  • Partially reimplement the HTTP server so that it makes fewer copies of incoming and outgoing data. This gives up to a 1.5x performance improvement when inserting long records over HTTP. #19516 (Ivan).
  • Improve performance of aggregate functions by more strict aliasing. #19946 (Alexey Milovidov).
  • Fix the case when DataType parser may have exponential complexity (found by fuzzer). This closes #20096. #20132 (Alexey Milovidov).
  • Add compress setting for Memory tables. If it's enabled the table will use less RAM. On some machines and datasets it can also work faster on SELECT, but it is not always the case. This closes #20093. Note: there are reasons why Memory tables can work slower than MergeTree: (1) lack of compression (2) static size of blocks (3) lack of indices and prewhere... #20168 (Alexey Milovidov).
  • Do not squash blocks too much on INSERT SELECT if inserting into Memory table. In previous versions inefficient data representation was created in Memory table after INSERT SELECT. This closes #13052. #20169 (Alexey Milovidov).
  • Improved performance of aggregation by several fixed size fields (unconfirmed). #20454 (Alexey Milovidov).
  • Speed up reading from Memory tables in extreme cases (when reading speed is in order of 50 GB/sec) by simplification of pipeline and (consequently) less lock contention in pipeline scheduling. #20468 (Alexey Milovidov).
  • Improve performance of GROUP BY multiple fixed size keys. #20472 (Alexey Milovidov).
  • The setting distributed_aggregation_memory_efficient is enabled by default. It will lower memory usage and improve performance of distributed queries. #20599 (Alexey Milovidov).
  • Slightly better code in aggregation. #20978 (Alexey Milovidov).
  • Add back intDiv/modulo vectorConstant specializations for better performance. This fixes #21293. The regression was introduced in https://github.com/ClickHouse/ClickHouse/pull/18145. #21307 (Amos Bird).

Improvement

  • Fix creation of TTL in cases when its expression is a function and is the same as the ORDER BY key. It is now allowed to set custom aggregation for primary key columns in TTL with GROUP BY. Backward incompatible: for primary key columns that are not in GROUP BY and are not set explicitly, the function any is now applied instead of max when the TTL expires. Also, if you use TTL with WHERE or GROUP BY, you may see exceptions at merges during a rolling update. #15450 (Anton Popov).
  • Hedged requests for remote queries. When the setting use_hedged_requests is enabled (the default), a query may establish multiple connections to different replicas. A new connection is opened if the existing connection(s) to replica(s) were not established within hedged_connection_timeout or no data was received within receive_data_timeout. The query uses the first connection that sends a non-empty progress packet (or data packet, if allow_changing_replica_until_first_data_packet is set); the other connections are cancelled. Queries with max_parallel_replicas > 1 are supported. #19291 (Kruglov Pavel).
  • Print inline frames for fatal stacktraces. #19317 (Ivan).
  • Do not silently ignore write errors. #19451 (Azat Khuzhin).
  • Added support for PREWHERE when tables have row-level security expressions specified. #19576 (Denis Glazachev).
  • Add IStoragePolicy interface. #19608 (Ernest Zaslavsky).
  • Add ability to throttle INSERT into Distributed based on the amount of pending bytes for async send (bytes_to_delay_insert/max_delay_to_insert and bytes_to_throw_insert settings for the Distributed engine have been added). #19673 (Azat Khuzhin).
  • Move conditions that are not related to JOIN to the WHERE clause. #18720. #19685 (hexiaoting).
  • Add separate config directive for Buffer profile. #19721 (Azat Khuzhin).
  • Show MaterializeMySQL tables in system.parts. #19770 (Stig Bakken).
  • Initialize MaxDDLEntryID to the last value after restarting. Before this PR, MaxDDLEntryID will remain zero until a new DDLTask is processed. #19924 (Amos Bird).
  • Add conversion of block structure for INSERT into Distributed tables if it does not match. #19947 (Azat Khuzhin).
  • If the user calls the JSONExtract function with Float32 type requested, allow inaccurate conversion to the result type. For example, the number 0.1 in JSON is double precision and is not representable in Float32, but the user still wants to get it. Previous versions returned 0 for a non-Nullable type and NULL for a Nullable type to indicate that the conversion is imprecise. The logic was 100% correct but it was surprising to users and led to questions. This closes #13962. #19960 (Alexey Milovidov).
  • The value of MYSQL_OPT_RECONNECT option can now be controlled by "opt_reconnect" parameter in the config section of mysql replica. #19998 (Alexander Kazakov).
  • Return DiskType instead of String in IDisk::getType() as in the rest of storage interfaces. #19999 (Ernest Zaslavsky).
  • Fix data race in executable dictionary that was possible only on misuse (when the script returns data ignoring its input). #20045 (Alexey Milovidov).
  • Show full details of MaterializeMySQL tables in system.tables. #20051 (Stig Bakken).
  • Support path IN queries on system.zookeeper. #20105 (小路).
  • Quota changes: (1) SHOW TABLES is now considered as one query in the quota calculations, not two queries; (2) SYSTEM queries now consume quota; (3) fix calculation of interval's end in quota consumption. #20106 (Vitaly Baranov).
  • Fix toDateTime64(toDate()/toDateTime()) for DateTime64. Implement DateTime64 clamping to match DateTime behaviour. #20131 (Azat Khuzhin).
  • The setting access_management is now configurable on startup by providing CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT, defaults to disabled (0) which was the prior value. #20139 (Marquitos).
  • Updated CacheDictionary, ComplexCacheDictionary, SSDCacheDictionary, SSDComplexKeyDictionary to use LRUHashMap as underlying index. #20164 (Maksim Kita).
  • Support all native integer types in bitmap functions. #20171 (Amos Bird).
  • Normalize count(constant), sum(1) to count(). This is needed for projection query routing. #20175 (Amos Bird).
  • Perform algebraic optimizations of arithmetic expressions inside avg aggregate function. close #20092. #20183 (flynn).
  • Lockless SYSTEM FLUSH DISTRIBUTED. #20215 (Azat Khuzhin).
  • Implicit conversion from integer to Decimal type might succeed even if the integer value does not fit into the Decimal type. Now it throws ARGUMENT_OUT_OF_BOUND. #20232 (Alexander Tokmakov).
  • Do not allow early constant folding of explicitly forbidden functions. #20303 (Azat Khuzhin).
  • Make FQDN and other DNS related functions work correctly in alpine images. #20336 (filimonov).
  • Fixed race between execution of distributed DDL tasks and cleanup of DDL queue. Now DDL task cannot be removed from ZooKeeper if there are active workers. Fixes #20016. #20448 (Alexander Tokmakov).
  • Improved serialization for data types combined of Arrays and Tuples. Improved matching enum data types to protobuf enum type. Fixed serialization of the Map data type. Omitted values are now set by default. #20506 (Vitaly Baranov).
  • https://github.com/ClickHouse/ClickHouse/issues/20576. #20596 (Kseniia Sumarokova).
  • Function 'reinterpretAs(x, Type)' renamed into 'reinterpret(x, Type)'. #20611 (Maksim Kita).
  • When loading the config for a MySQL source, ClickHouse will now randomize the list of replicas with the same priority to ensure round-robin logic when picking a MySQL endpoint. This closes #20629. #20632 (Alexander Kazakov).
  • Do only merging of sorted blocks on initiator with distributed_group_by_no_merge. #20882 (Azat Khuzhin).
  • Fill only requested columns when querying system.parts & system.parts_columns. Closes #19570. #21035 (Anmol Arora).
  • Usability improvement: more consistent DateTime64 parsing: recognize the case when unix timestamp with subsecond resolution is specified as scaled integer (like 1111111111222 instead of 1111111111.222). This closes #13194. #21053 (Alexey Milovidov).
  • MySQL dictionary source will now retry unexpected connection failures (Lost connection to MySQL server during query) which sometimes happen on SSL/TLS connections. #21237 (Alexander Kazakov).
  • Forbid to drop a column if it's referenced by materialized view. Closes #21164. #21303 (flynn).
  • Provide better compatibility for MySQL clients: (1) MySQL JDBC; (2) mycli. #21367 (Amos Bird).
  • Case-insensitive compression methods for table functions. Also fixed LZMA compression method which was checked in upper case. #21416 (Vladimir Chebotarev).
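
The integer-to-Decimal entry above can be pictured with a short Python sketch. It assumes the bound is the documented value range of Decimal(P, S), where P is the total number of decimal digits and S of them are fractional, so an integer fits only if its absolute value is below 10^(P-S). The function name and the exception class are hypothetical stand-ins, and the real check in ClickHouse may be based on the width of the underlying integer type instead.

```python
class ArgumentOutOfBound(ValueError):
    """Hypothetical stand-in for ClickHouse's ARGUMENT_OUT_OF_BOUND error."""


def int_to_decimal(value: int, precision: int, scale: int) -> int:
    """Return the scaled integer that would back a Decimal(precision, scale).

    Raises ArgumentOutOfBound when the integer part needs more than
    precision - scale digits (an assumed bound, see the note above).
    """
    if not 0 <= scale <= precision:
        raise ValueError("scale must be between 0 and precision")
    limit = 10 ** (precision - scale)
    if abs(value) >= limit:
        raise ArgumentOutOfBound(
            f"{value} does not fit into Decimal({precision}, {scale})"
        )
    # Store the value as an integer count of 10**-scale units.
    return value * 10 ** scale


print(int_to_decimal(123, 9, 2))  # 12300, i.e. 123.00

try:
    int_to_decimal(10 ** 7, 9, 2)  # needs 8 integer digits, only 7 available
except ArgumentOutOfBound as error:
    print("rejected:", error)
```

Raising instead of silently wrapping or truncating mirrors the intent of the change: a value that cannot be represented is reported rather than stored incorrectly.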

Bug Fix

  • Background thread which executes ON CLUSTER queries might hang waiting for dropped replicated table to do something. It's fixed. #19684 (yiguolei).
  • Fix a bug where moving pieces to the destination table could fail when multiple clickhouse-copier instances are launched. #19743 (madianjun).
  • Fix clickhouse-client abort exception when executing only a SELECT. #19790 (李扬).
  • Fix starting the server with tables having default expressions containing dictGet(). Allow getting return type of dictGet() without loading dictionary. #19805 (Vitaly Baranov).
  • Deadlock was possible if system.text_log is enabled. This fixes #19874. #19875 (Alexey Milovidov).
  • BloomFilter index crash fix. Fixes #19757. #19884 (Maksim Kita).
  • Fix a segfault in function fromModifiedJulianDay when the argument type is Nullable(T) for any integral types other than Int32. #19959 (PHO).
  • EmbeddedRocksDB is an experimental storage. Fix the issue with lack of proper type checking. Simplified code. This closes #19967. #19972 (Alexey Milovidov).
  • Prevent "Connection refused" in docker during initialization script execution. #20012 (filimonov).
  • MaterializeMySQL: Fix replication for statements that update several tables. #20066 (Håvard Kvålen).
  • Fix the case when, calculating the modulo of division of a negative number by a small divisor, the resulting data type was not large enough to accommodate the negative result. This closes #20052. #20067 (Alexey Milovidov).
  • The MongoDB table engine now establishes connection only when it's going to read data. ATTACH TABLE won't try to connect anymore. #20110 (Vitaly Baranov).
  • Fix server crash after a query with the if function whose then/else branches return a Tuple type that contains an Array or another complex type. Fixes #18356. #20133 (alesapin).
  • Fix toMinute function to handle special timezones correctly. #20149 (keenwolf).
  • Fixes #19314. #20156 (Ivan).
  • Fix CTE when using in INSERT SELECT. This fixes #20187, fixes #20195. #20211 (Amos Bird).
  • Fix rare server crash on config reload during the shutdown. Fixes #19689. #20224 (alesapin).
  • Fix exception during vertical merge for MergeTree table engines family which don't allow to perform vertical merges. Fixes #20259. #20279 (alesapin).
  • Fixed the behavior when in case of broken JSON we tried to read the whole file into memory which leads to exception from the allocator. Fixes #19719. #20286 (Nikita Mikhaylov).
  • Forbid DROP or RENAME of the version column of *CollapsingMergeTree and ReplacingMergeTree table engines. #20300 (alesapin).
  • Fix too frequent retries of failed background tasks for the ReplicatedMergeTree table engines family. This could lead to too verbose logging and increased CPU load. Fixes #20203. #20335 (alesapin).
  • Fix incorrect result of binary operations between two constant decimals of different scale. Fixes #20283. #20339 (Maksim Kita).
  • Fix null dereference with join_use_nulls=1. #20344 (Azat Khuzhin).
  • Avoid invalid dereference in RANGE_HASHED() dictionary. #20345 (Azat Khuzhin).
  • Check if table function view is used in expression list and throw an error. This fixes #20342. #20350 (Amos Bird).
  • Fix LOGICAL_ERROR for join_use_nulls=1 when JOIN contains const from SELECT. #20461 (Azat Khuzhin).
  • Fix abnormal server termination when http client goes away. #20464 (Azat Khuzhin).
  • Fix infinite loop when propagating WITH aliases to subqueries. This fixes #20388. #20476 (Amos Bird).
  • Fix function transform, which did not work properly for floating point keys. Closes #20460. #20479 (flynn).
  • Add proper checks while parsing directory names for async INSERT (fixes SIGSEGV). #20498 (Azat Khuzhin).
  • Fix crash which could happen if an unknown packet was received from a remote query (was introduced in #17868). #20547 (Azat Khuzhin).
  • Fix the number of threads for scalar subqueries and subqueries for index (after #19007 single thread was always used). Fixes #20457, #20512. #20550 (Nikolai Kochetov).
  • Fixed inconsistent behavior of dictionary in case of queries where we look for absent keys in dictionary. #20578 (Nikita Mikhaylov).
  • Fix subquery with UNION DISTINCT and LIMIT clause. Closes #20597. #20610 (flynn).
  • Backported in #21571: force_drop_table flag didn't work for MATERIALIZED VIEW, it's fixed. Fixes #18943. #20626 (Alexander Tokmakov).
  • Fix usage of -Distinct combinator with -State combinator in aggregate functions. #20866 (Anton Popov).
  • USE database; query did not work when using MySQL 5.7 client to connect to ClickHouse server, it's fixed. Fixes #18926. #20878 (Alexander Tokmakov).
  • Fix 'Empty task was returned from async task queue' on query cancellation. #20881 (Azat Khuzhin).
  • Closes #9969. Fixed Brotli http compression error, which reproduced for large data sizes, slightly complicated structure and with json output format. Update Brotli to the latest version to include the "fix rare access to uninitialized data in ring-buffer". #20991 (Kseniia Sumarokova).
  • Fixed the behaviour where ALTER MODIFY COLUMN created a mutation that was known to fail. #21007 (Anton Popov).
  • Out of bound memory access was possible when formatting specifically crafted out of range value of type DateTime64. This closes #20494. This closes #20543. #21023 (Alexey Milovidov).
  • Fix default_replica_path and default_replica_name values having no effect on Replicated(*)MergeTree engines when the engine needs to specify other parameters. #21060 (mxzlxy).
  • Fix type mismatch issue when using LowCardinality keys in joinGet. This fixes #21114. #21117 (Amos Bird).
  • Fix the metadata leak when the Replicated*MergeTree with custom (non default) ZooKeeper cluster is dropped. #21119 (fastio).
  • Fix bug related to casting Tuple to Map. Closes #21029. #21120 (hexiaoting).
  • Fix input_format_null_as_default taking effect when types are nullable. This fixes #21116. #21121 (Amos Bird).
  • Fixes #21112. Fixed bug that could cause duplicates with insert query (if one of the callbacks came a little too late). #21138 (Kseniia Sumarokova).
  • Mutations are now allowed only for table engines that support them (MergeTree family, Memory, MaterializedView). Other engines will report a clearer error. Fixes #21168. #21183 (alesapin).
  • Fix crash in EXPLAIN for query with UNION. Fixes #20876, #21170. #21246 (flynn).
  • Fix bug with join_use_nulls and joining TOTALS from subqueries. This closes #19362 and #21137. #21248 (Vladimir C).
  • Fix redundant reconnects to ZooKeeper and the possibility of two active sessions for a single clickhouse server. Both problems introduced in #14678. #21264 (alesapin).
  • Now ALTER MODIFY COLUMN queries will correctly affect changes in partition key, skip indices, TTLs, and so on. Fixes #13675. #21334 (alesapin).
  • Fix error Bad cast from type ... to DB::ColumnLowCardinality while inserting into table with LowCardinality column from Values format. Fixes #21140. #21357 (Nikolai Kochetov).
  • Fix SIGSEGV for distributed queries on failures. #21434 (Azat Khuzhin).
  • Backported in #21610: Fixed race on SSL object inside SecureSocket in Poco. #21456 (Nikita Mikhaylov).
  • Fix a deadlock in ALTER DELETE mutations for non replicated MergeTree table engines when the predicate contains the table itself. Fixes #20558. #21477 (alesapin).
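
The entry above on binary operations between constant decimals of different scale concerns scale alignment in fixed-point arithmetic. The following Python sketch shows the general technique under stated assumptions: the Fixed type and both helper functions are hypothetical, the result-scale rules (larger scale for addition, sum of scales for multiplication) are the usual fixed-point conventions, and none of this is taken from the ClickHouse source.

```python
from typing import NamedTuple


class Fixed(NamedTuple):
    """Fixed-point decimal: real value = unscaled / 10**scale."""
    unscaled: int
    scale: int


def fixed_add(a: Fixed, b: Fixed) -> Fixed:
    """Add two fixed-point values after aligning them to the larger scale."""
    scale = max(a.scale, b.scale)
    a_aligned = a.unscaled * 10 ** (scale - a.scale)
    b_aligned = b.unscaled * 10 ** (scale - b.scale)
    return Fixed(a_aligned + b_aligned, scale)


def fixed_multiply(a: Fixed, b: Fixed) -> Fixed:
    """Multiply: unscaled values multiply and scales add, no alignment needed."""
    return Fixed(a.unscaled * b.unscaled, a.scale + b.scale)


# 1.5 (scale 1) + 0.25 (scale 2) = 1.75 (scale 2)
print(fixed_add(Fixed(15, 1), Fixed(25, 2)))       # Fixed(unscaled=175, scale=2)
# 1.5 (scale 1) * 0.25 (scale 2) = 0.375 (scale 3)
print(fixed_multiply(Fixed(15, 1), Fixed(25, 2)))  # Fixed(unscaled=375, scale=3)
```

Adding the unscaled integers without aligning them first would give 15 + 25 = 40, which is wrong at either scale; that class of mistake is what makes mixed-scale operations easy to get wrong.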

Build/Testing/Packaging Improvement

Experimental feature

  • Introduce experimental support for window functions, enabled with allow_experimental_window_functions = 1. This is a preliminary, alpha-quality implementation that is not suitable for production use and will change in backward-incompatible ways in future releases. Please see the documentation for the list of supported features. #20337 (Alexander Kuzmenkov).

NO CL ENTRY