2022-05-25 00:05:54 +02:00


ClickHouse release v21.4.1.6422-prestable FIXME as compared to v21.3.1.6185-prestable

Backward Incompatible Change

  • Now replicas that are processing the ALTER TABLE ATTACH PART[ITION] command search in their detached/ folders before fetching the data from other replicas. As an implementation detail, a new command ATTACH_PART is introduced in the replicated log. Parts are searched and compared by their checksums. #18978 (Mike Kot).
  • Column keys in table system.dictionaries was replaced with columns key.names and key.types. Columns key.names, key.types, attribute.names and attribute.types in the system.dictionaries table do not require the dictionary to be loaded. #21884 (Maksim Kita).
  • Fix cutToFirstSignificantSubdomainCustom()/firstSignificantSubdomainCustom() returning wrong result for 3+ level domains present in custom top-level domain list. For input domains matching these custom top-level domains, the third-level domain was considered to be the first significant one. This is now fixed. This change may introduce incompatibility if the function is used in e.g. the sharding key. #21946 (Azat Khuzhin).
  • The toStartOfInterval function will align hour intervals to midnight (in previous versions they were aligned to the start of the Unix epoch). For example, toStartOfInterval(x, INTERVAL 11 HOUR) will split every day into three intervals: 00:00:00..10:59:59, 11:00:00..21:59:59 and 22:00:00..23:59:59. This behaviour is more suited for practical needs. This closes #9510. #22060 (Alexey Milovidov).
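The new midnight-aligned bucketing for hour intervals can be sketched in Python as below. This is not ClickHouse code: the helper start_of_hour_interval is hypothetical, it works on naive datetime values (so it ignores time zones and DST, assuming every day is exactly 24 hours), and it only models interval lengths from 1 to 24 hours.

```python
from datetime import datetime, timedelta


def start_of_hour_interval(ts: datetime, hours: int) -> datetime:
    """Start of the `hours`-long interval containing `ts`, counted from midnight.

    Intervals restart at 00:00:00 each day, so the last one may be shorter.
    Assumes naive datetimes (24-hour days) and 1 <= hours <= 24.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Index of the interval within the day that `ts` falls into (0, 1, 2, ...).
    bucket = ts.hour // hours
    return midnight + timedelta(hours=bucket * hours)


# INTERVAL 11 HOUR splits a day into buckets starting at 00:00, 11:00 and 22:00.
for hour in (0, 10, 11, 21, 22, 23):
    print(hour, start_of_hour_interval(datetime(2021, 3, 1, hour, 30), 11).time())
```

Because the bucket index is taken from the hour of day rather than from seconds since the epoch, the third interval (22:00:00..23:59:59) is only two hours long, matching the example in the entry above.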

New Feature

  • Extended range of DateTime64 to properly support dates from year 1925 to 2283. Improved support of DateTime around zero date (1970-01-01). ... #9404 (Vasily Nemkov).
  • Added support of Kerberos authentication for preconfigured users and HTTP requests (GSS-SPNEGO). #14995 (Denis Glazachev).
  • Zero-copy replication for ReplicatedMergeTree over S3 storage. #16240 (ianton-ru).
  • Support dictHas function for RangeHashedDictionary. Fixes #6680. #19816 (Maksim Kita).
  • Support implicit key type conversion for JOIN. Closes #18567. #19885 (Vladimir C).
  • Allow customizing timeouts for http connections used for replication independently from other http timeouts. #20088 (nvartolomei).
  • Added async update in ComplexKeyCache, SSDCache, SSDComplexKeyCache dictionaries. Added support for Nullable type in Cache, ComplexKeyCache, SSDCache, SSDComplexKeyCache dictionaries. Added support for multiple attributes fetch with dictGet, dictGetOrDefault functions. Fixes #21517. #20595 (Maksim Kita).
  • Added Grant, Revoke and System values of query_kind column for corresponding queries in system.query_log ... #21102 (Vasily Nemkov).
  • Added new SQL command ALTER TABLE 'table_name' UNFREEZE [PARTITION 'part_expr'] WITH NAME 'backup_name'. #21142 (Pavel Kovalenko).
  • Added ExecutablePool dictionary source. Close #14528. #21321 (Maksim Kita).
  • Add function isIPAddressInRange to test if an IPv4 or IPv6 address is contained in a given CIDR network prefix. #21329 (PHO).
  • Add _partition_id virtual column for MergeTree* engines. Allow to prune partitions by _partition_id. Add partitionID() function to calculate partition id string. #21401 (Amos Bird).
  • Add new column slowdowns_count to system.clusters. When using hedged requests, it shows how many times we switched to another replica because this replica was responding slowly. Also show actual value of errors_count in system.clusters. #21480 (Kruglov Pavel).
  • Add option --backslash for clickhouse-format, which can add a backslash at the end of each line of the formatted query. #21494 (flynn).
  • Add new optional clause GRANTEES for CREATE/ALTER USER commands. #21641 (Vitaly Baranov).
  • Add ctime option to zookeeper-dump-tree. It allows dumping node creation time. #21842 (Ilya).
  • Functions 'dictGet', 'dictHas' use current database name if it is not specified for dictionaries created with DDL. Closes #21632. #21859 (Maksim Kita).
  • Support Nullable type for PolygonDictionary attribute. #21890 (Maksim Kita).
  • Added table function dictionary. It works the same way as Dictionary engine. Closes #21560. #21910 (Maksim Kita).
  • Add function timezoneOf that returns the timezone name of DateTime or DateTime64 data types. This does not close #9959. Fix inconsistencies in function names: add aliases timezone and timeZone as well as toTimezone and toTimeZone and timezoneOf and timeZoneOf. #22001 (Alexey Milovidov).
  • Add prefer_column_name_to_alias setting to use original column names instead of aliases. It is needed to be more compatible with common databases' aliasing rules. This is for #9715 and #9887. #22044 (Amos Bird).
  • Improved performance of dictGetHierarchy, dictIsIn functions. Added functions dictGetChildren(dictionary, key), dictGetDescendants(dictionary, key, level). Function dictGetChildren returns all children as an array of indexes. It is the inverse transformation of dictGetHierarchy. Function dictGetDescendants returns all descendants as if dictGetChildren was applied level times recursively. Zero level value is equivalent to infinity. Closes #14656. #22096 (Maksim Kita).
  • Added function dictGetOrNull. It works like dictGet, but returns NULL if the key was not found in the dictionary. Closes #22375. #22413 (Maksim Kita).
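How dictGetChildren and dictGetDescendants walk a hierarchy downward (the reverse direction of dictGetHierarchy) can be illustrated with a Python sketch over a flat key-to-parent-key mapping. This is not ClickHouse's implementation: the helper names are hypothetical, the hierarchy is assumed to be acyclic, and a positive level is assumed to collect all descendants up to that many generations, breadth-first.

```python
from collections import defaultdict
from typing import Dict, List


def children_index(parents: Dict[int, int]) -> Dict[int, List[int]]:
    """Invert a key -> parent_key mapping into parent_key -> [child keys]."""
    index: Dict[int, List[int]] = defaultdict(list)
    for key in sorted(parents):
        index[parents[key]].append(key)
    return index


def get_children(parents: Dict[int, int], key: int) -> List[int]:
    """Direct children of `key` (the dictGetChildren idea)."""
    return list(children_index(parents).get(key, []))


def get_descendants(parents: Dict[int, int], key: int, level: int = 0) -> List[int]:
    """Descendants of `key`, collected breadth-first (the dictGetDescendants idea).

    level=0 means no depth limit; level=N stops after N generations.
    The hierarchy is assumed to be acyclic.
    """
    index = children_index(parents)
    result: List[int] = []
    frontier = [key]
    depth = 0
    while frontier and (level == 0 or depth < level):
        # Expand one generation: children of every node in the current frontier.
        frontier = [child for node in frontier for child in index.get(node, [])]
        result.extend(frontier)
        depth += 1
    return result


# Flat hierarchy: key -> parent key (0 is the root).
parents = {1: 0, 2: 1, 3: 1, 4: 2}
print(get_descendants(parents, 0))     # every level below the root
print(get_descendants(parents, 0, 1))  # one level, same as get_children
```

With level 0 the loop only stops once a generation has no children, which is why a zero level behaves like infinity; with level 1 the result coincides with the direct children.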

Performance Improvement

  • Support parallel parsing for CSVWithNames and TSVWithNames formats. This closes #21085. #21149 (Nikita Mikhaylov).
  • Improved performance by replacing memcpy with another implementation. This closes #18583. #21520 (Alexey Milovidov).
  • Supported parallel formatting in clickhouse-local and everywhere else. #21630 (Nikita Mikhaylov).
  • Optimize performance of queries like SELECT ... FINAL ... WHERE. Now, in queries with FINAL, columns that are in the sorting key are allowed to be moved to PREWHERE. ... #21830 (foolchi).
  • Faster GROUP BY with small max_rows_to_group_by and group_by_overflow_mode='any'. #21856 (Nikolai Kochetov).
  • Avoid unnecessary data copy when using codec NONE. Please note that codec NONE is mostly useless - it's recommended to always use compression (LZ4 is the default). Despite the common belief, disabling compression may not improve performance (the opposite effect is possible). The NONE codec is useful in some cases: - when data is incompressible; - for synthetic benchmarks. #22145 (Alexey Milovidov).
  • Add cache for files read with min_bytes_to_use_mmap_io setting. It makes a significant (2x and more) performance improvement when the value of the setting is small by avoiding frequent mmap/munmap calls and the consequent page faults. Note that mmap IO has major drawbacks that make it less reliable in production (e.g. hangs or SIGBUS on faulty disks; less controllable memory usage). Nevertheless, it is good in benchmarks. #22206 (Alexey Milovidov).
  • Enable read with mmap IO for file ranges from 64 MiB (the setting min_bytes_to_use_mmap_io). It may lead to moderate performance improvement. #22326 (Alexey Milovidov).

Improvement

  • Introduce a new merge tree setting min_bytes_to_rebalance_partition_over_jbod which allows assigning new parts to different disks of a JBOD volume in a balanced way. #16481 (Amos Bird).
  • Improve performance of aggregation in order of sorting key (with enabled setting optimize_aggregation_in_order). #19401 (Anton Popov).
  • MaterializeMySQL: add minmax skipping index for _version column. #20382 (Stig Bakken).
  • Do not create empty parts on INSERT when the optimize_on_insert setting is enabled. Fixes #20304. #20387 (Kruglov Pavel).
  • MaterializeMySQL: Attempt to reconnect to MySQL if the connection is lost. #20961 (Håvard Kvålen).
  • Improve support of integer keys in data type Map. #21157 (Anton Popov).
  • Improve clickhouse-format to not throw an exception when there are extra spaces or a comment after the last query, and to throw an exception early with a readable message when formatting ASTInsertQuery with data. #21311 (flynn).
  • Age and Precision in graphite rollup configs should increase from retention to retention. Now it's checked and the wrong config raises an exception. #21496 (Mikhail f. Shiryaev).
  • Add setting optimize_skip_unused_shards_limit to limit the number of sharding key values for optimize_skip_unused_shards. #21512 (Azat Khuzhin).
  • Add aliases simpleJSONExtract/simpleJSONHas to visitParam/visitParamExtract{UInt, Int, Bool, Float, Raw, String}. Fixes #21383. #21519 (fastio).
  • Add last_error_time/last_error_message/last_error_stacktrace/remote columns for system.errors. #21529 (Azat Khuzhin).
  • If PODArray was instantiated with an element size that is neither a fraction nor a multiple of 16, buffer overflow was possible. No bugs in current releases exist. #21533 (Alexey Milovidov).
  • Propagate query and session settings for distributed DDL queries. Set distributed_ddl_entry_format_version to 2 to enable this. Added distributed_ddl_output_mode setting. Supported modes: none, throw (default), null_status_on_timeout and never_throw. Miscellaneous fixes and improvements for Replicated database engine. #21535 (Alexander Tokmakov).
  • Update clusters only if their configurations were updated. #21685 (Kruglov Pavel).
  • Support replica priority for postgres dictionary source. #21710 (Kseniia Sumarokova).
  • Closes #21701. Support non-default table schema for postgres storage/table-function. #21711 (Kseniia Sumarokova).
  • Better formatting for Array and Map data types in Web UI. #21798 (Alexey Milovidov).
  • DiskS3 (experimental feature under development). Fixed a bug that made it impossible to move a directory if the destination is not empty and a cache disk is used. #21837 (Pavel Kovalenko).
  • Add connection pool for PostgreSQL table/database engine and dictionary source. Should fix #21444. #21839 (Kseniia Sumarokova).
  • Add profile event HedgedRequestsChangeReplica, change read data timeout from sec to ms. #21886 (Kruglov Pavel).
  • Support RANGE OFFSET frame for floating point types. Implement lagInFrame/leadInFrame window functions, which are analogous to lag/lead, but respect the window frame. They are identical when the frame is between unbounded preceding and unbounded following. This closes #5485. #21895 (Alexander Kuzmenkov).
  • Show path to data directory of EmbeddedRocksDB tables in system tables. #21903 (Alexander Tokmakov).
  • Supported replication_alter_partitions_sync=1 setting for moving partitions from a helper table to the destination. Decreased default timeouts. Fixes #21911. #21912 (jasong).
  • If the partition key of a MergeTree table does not include Date or DateTime columns but includes exactly one DateTime64 column, expose its values in the min_time and max_time columns in system.parts and system.parts_columns tables. Add min_time and max_time columns to system.parts_columns table (there was an inconsistency with the system.parts table). This closes #18244. #22011 (Alexey Milovidov).
  • Add option strict_increase to windowFunnel function to calculate each event once (resolve #21835). #22025 (Vladimir C).
  • Added case insensitive aliases for CONNECTION_ID() and VERSION() functions. This fixes #22028. #22042 (Eugene Klimov).
  • Update used version of simdjson to 0.9.1. This fixes #21984. #22057 (Vitaly Baranov).
  • Convert system.errors.stack_trace from String into Array(UInt64) (This should decrease overhead for the errors collecting). #22058 (Azat Khuzhin).
  • If a tuple of NULLs, e.g. (NULL, NULL), is on the left-hand side of the IN operator with tuples of non-NULLs on the right-hand side, e.g. SELECT (NULL, NULL) IN ((0, 0), (3, 1)), return 0 instead of throwing an exception about incompatible types. The expression may also appear due to optimization of something like SELECT (NULL, NULL) = (8, 0) OR (NULL, NULL) = (3, 2) OR (NULL, NULL) = (0, 0) OR (NULL, NULL) = (3, 1). This closes #22017. #22063 (Alexey Milovidov).
  • Added possibility to migrate existing S3 disk to the schema with backup-restore capabilities. #22070 (Pavel Kovalenko).
  • Add case-insensitive history search/navigation and subword movement features to clickhouse-client. #22105 (Amos Bird).
  • Add current_database column to system.processes table. It contains the current database of the query. #22365 (Alexander Kuzmenkov).
  • Fix MSan report for function range with UInt256 argument (support for large integers is experimental). This closes #22157. #22387 (Alexey Milovidov).
  • Fix error Directory tmp_fetch_XXX already exists which could happen after failed fetch part. Delete temporary fetch directory if it already exists. Fixes #14197. #22411 (nvartolomei).
  • Better exception message in client in case of exception while server is writing blocks. In previous versions client may get misleading message like Data compressed with different methods. #22427 (Alexey Milovidov).
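The difference between lagInFrame and a plain lag over the same rows can be illustrated with a small Python sketch of a single partition and a ROWS frame. The helper lag_in_frame and its parameters are hypothetical, not ClickHouse APIs; the sketch assumes the frame ends at or after the current row and uses None as a stand-in for the default value that is returned when the offset row falls outside the frame.

```python
from typing import List, Optional, Sequence


def lag_in_frame(
    values: Sequence[float],
    offset: int = 1,
    frame_preceding: Optional[int] = None,
    default: Optional[float] = None,
) -> List[Optional[float]]:
    """Value `offset` rows before each row, but only if that row is in the frame.

    frame_preceding=None models ROWS BETWEEN UNBOUNDED PRECEDING AND ...;
    an integer N models ROWS BETWEEN N PRECEDING AND ... The frame end is
    assumed to be at or after the current row.
    """
    result: List[Optional[float]] = []
    for i in range(len(values)):
        # First row index that belongs to the current row's frame.
        if frame_preceding is None:
            frame_start = 0
        else:
            frame_start = max(0, i - frame_preceding)
        j = i - offset
        # Rows before the frame start (or before the partition) yield `default`.
        result.append(values[j] if j >= frame_start else default)
    return result


rows = [10.0, 20.0, 30.0, 40.0]
# Unbounded frame: behaves like a plain lag by two rows.
print(lag_in_frame(rows, offset=2))
# Frame of 1 PRECEDING: the row two back is always outside the frame.
print(lag_in_frame(rows, offset=2, frame_preceding=1))
```

With an unbounded frame the only limit is the start of the partition, which is why lagInFrame and lag agree in that case; a narrower frame hides rows that lag would still return.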

Bug Fix

  • Fixed open behavior of remote host filter in the case when there is a remote_url_allow_hosts section in the configuration but no entries in it. ⚠️ please add a note about potential issue when upgrading - @alexey-milovidov. #20058 (Vladimir Chebotarev).
  • Fix name clashes in PredicateRewriteVisitor. It caused incorrect WHERE filtration after full join. Close #20497. #20622 (Vladimir C).
  • force_drop_table flag didn't work for MATERIALIZED VIEW, it's fixed. Fixes #18943. #20626 (Alexander Tokmakov).
  • Fix official website documentation that introduced the cluster secret feature. #21331 (Chao Ma).
  • Fix receive and send timeouts and non-blocking read in secure socket. #21429 (Kruglov Pavel).
  • Fix Avro format parsing for Kafka. Fixes #21437. #21438 (Ilya Golshtein).
  • Fixed race on SSL object inside SecureSocket in Poco. #21456 (Nikita Mikhaylov).
  • Fix S3 table holding old credentials after a config update. #21457 (Pervakov Grigorii).
  • Fix table function clusterAllReplicas returning wrong _shard_num. Closes #21481. #21498 (flynn).
  • When ::poll() returns rc == 1, it could be a request or it could be a response. #21544 (小路).
  • If a query has a constant WHERE condition and the optimize_skip_unused_shards setting is enabled, all shards may be skipped and the query could return an incorrect empty result. #21550 (Amos Bird).
  • Fix possible error Cannot find column when optimize_skip_unused_shards is enabled and zero shards are used. #21579 (Azat Khuzhin).
  • std::terminate was called if there was an error writing data into S3. #21624 (Vladimir C).
  • Remove unknown columns from joined table in WHERE for queries to external database engines (MySQL, PostgreSQL). Closes #14614, closes #19288 (dup), closes #19645 (dup). #21640 (Vladimir C).
  • Fix fsync_part_directory for horizontal merge. #21642 (Azat Khuzhin).
  • Fix distributed requests cancellation (for example simple select from multiple shards with limit, i.e. select * from remote('127.{2,3}', system.numbers) limit 100) with async_socket_for_remote=1. #21643 (Azat Khuzhin).
  • Add type conversion for StorageJoin (previously led to SIGSEGV). #21646 (Azat Khuzhin).
  • Start accepting connections after DDLWorker and dictionaries initialization. #21676 (Azat Khuzhin).
  • Fix SIGSEGV on non-existing attributes from ip_trie with access_to_key_from_attributes. #21692 (Azat Khuzhin).
  • Fix function arrayElement with type Map for constant integer arguments. #21699 (Anton Popov).
  • Fix concurrent OPTIMIZE and DROP for ReplicatedMergeTree. #21716 (Azat Khuzhin).
  • Fix bug for ReplicatedMergeTree table engines when ALTER MODIFY COLUMN query doesn't change the type of a decimal column if its size (32 bit or 64 bit) doesn't change. #21728 (alesapin).
  • Reverted S3 connection pools. #21737 (Vladimir Chebotarev).
  • Fix adding parts whose names already exist in the destination table in query MOVE PARTITION TO TABLE with non-replicated MergeTree tables. #21760 (ygrek).
  • Fix scalar subquery index analysis. This fixes #21717, which was introduced in https://github.com/ClickHouse/ClickHouse/pull/18896. #21766 (Amos Bird).
  • Fix possible crashes in aggregate functions with combinator Distinct, while using two-level aggregation. This is a follow-up fix of https://github.com/ClickHouse/ClickHouse/pull/18365. Can only be reproduced in a production environment. No test case available yet. cc @CurtizJ. #21818 (Amos Bird).
  • Better error handling and logging in WriteBufferFromS3. #21836 (Pavel Kovalenko).
  • Fix incorrect query result (and possible crash) which could happen when WHERE or HAVING condition is pushed before GROUP BY. Fixes #21773. #21841 (Nikolai Kochetov).
  • Fix deadlock in first catboost model execution. Closes #13832. #21844 (Kruglov Pavel).
  • Fix wrong ORDER BY results when a query contains window functions, and optimization for reading in primary key order is applied. Fixes #21828. #21915 (Alexander Kuzmenkov).
  • Fix reading the HTTP POST request with "multipart/form-data" content type. #21936 (Ivan).
  • Prevent hedged connections overlaps (Unknown packet 9 from server error). #21941 (Azat Khuzhin).
  • Reverted #15454 that may cause significant increase in memory usage while loading external dictionaries of hashed type. This closes #21935. #21948 (Maksim Kita).
  • In rare cases, merge for CollapsingMergeTree may create a granule with index_granularity + 1 rows. Because of this, the internal check added in #18928 (affects 21.2 and 21.3) may fail with the error Incomplete granules are not allowed while blocks are granules size. This error did not allow parts to merge. #21976 (Nikolai Kochetov).
  • The function decrypt was lacking a check for the minimal size of data encrypted in AEAD mode. This closes #21897. #22064 (Alexey Milovidov).
  • Docker entrypoint: avoid chown of . in case when LOG_PATH is empty. Closes #22100. #22102 (filimonov).
  • Disable async_socket_for_remote/use_hedged_requests for buggy linux kernels. #22109 (Azat Khuzhin).
  • Fix waiting for OPTIMIZE and ALTER queries for ReplicatedMergeTree table engines. Now the query will not hang when the table was detached or restarted. #22118 (alesapin).
  • Fix the background thread pool name. #22122 (fastio).
  • Fix error Invalid number of rows in Chunk in JOIN with TOTALS and arrayJoin. Closes #19303. #22129 (Vladimir C).
  • Fix docker entrypoint in case http_port is not in the config. #22132 (Ewout).
  • Fix uncaught exception in InterserverIOHTTPHandler. #22146 (Azat Khuzhin).
  • Use finalize() over next() for nested writers. #22147 (Azat Khuzhin).
  • Fix query cancellation with use_hedged_requests=0 and async_socket_for_remote=1. #22183 (Azat Khuzhin).
  • Fix exception which may happen when SELECT has a constant WHERE condition and the source table has columns whose names are digits. #22270 (LiuNeng).
  • Now ClickHouse will not throw a LOGICAL_ERROR exception when we try to mutate an already covered part. Fixes #22013. #22291 (alesapin).
  • Fixed bug in S3 zero-copy replication for hybrid storage. #22378 (ianton-ru).
  • Add (missing) memory accounting in parallel parsing routines. In previous versions OOM was possible when the resultset contains very large blocks of data. This closes #22008. #22425 (Alexey Milovidov).
  • Remove socket from epoll before cancelling packet receiver in HedgedConnections to prevent possible race. I hope it fixes #22161. #22443 (Kruglov Pavel).

Build/Testing/Packaging Improvement

Other

  • Update tests for hedged requests. #21998 (Kruglov Pavel).
  • Don't set the same timeouts in ReadBufferFromPocoSocket/WriteBufferFromPocoSocket in nextImpl because it causes a race. #22343 (Kruglov Pavel).

NO CL ENTRY

New Feature (datasketches support in clickhouse #14893)