sidebar_position: 1
sidebar_label: 2023

2023 Changelog

ClickHouse release v23.12.1.1368-stable (a2faa65b08), as compared to v23.11.1.2711-stable (05bc8ef1e0)

Backward Incompatible Change

  • Fix the check for non-deterministic functions in TTL expressions. Previously, you could create a TTL expression with non-deterministic functions in some cases, which could lead to undefined behavior later. This fixes #37250. Disallow TTL expressions that don't depend on any columns of a table by default. Such expressions can be allowed again with SET allow_suspicious_ttl_expressions = 1 or SET compatibility = '23.11'. Closes #37286. #51858 (Alexey Milovidov).
  • Remove function arrayFold because it has a bug. This closes #57816. This closes #57458. #57836 (Alexey Milovidov).
  • Remove the feature of is_deleted row in ReplacingMergeTree and the CLEANUP modifier for the OPTIMIZE query. This fixes #57930. This closes #54988. This closes #54570. This closes #50346. This closes #47579. The feature has to be removed because it is not good. We have to remove it as quickly as possible, because there is no other option. #57932 (Alexey Milovidov).
  • The MergeTree setting clean_deleted_rows is deprecated and no longer has any effect. The CLEANUP keyword for OPTIMIZE is not allowed by default (unless allow_experimental_replacing_merge_with_cleanup is enabled). #58267 (Alexander Tokmakov).
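
As a rough illustration of the TTL change listed above, the sketch below uses a hypothetical table named events (the table and its columns are assumptions, not part of the changelog). Only the two SET statements are taken from the entry itself.

```sql
-- Hypothetical table for illustration only.
-- A TTL expression that depends on a column of the table is still accepted.
CREATE TABLE events
(
    event_time DateTime,
    payload String
)
ENGINE = MergeTree
ORDER BY event_time
TTL event_time + INTERVAL 30 DAY;

-- TTL expressions that do not depend on any column are rejected by default.
-- Either of these settings (quoted from the entry above) allows them again:
SET allow_suspicious_ttl_expressions = 1;
-- or
SET compatibility = '23.11';
```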

New Feature

  • Allow disabling of HEAD request before GET request. #54602 (Fionera).
  • Add an HTTP endpoint for checking if Keeper is ready to accept traffic. #55876 (Konstantin Bogdanov).
  • Add 'union' mode for schema inference. In this mode, a schema is inferred from each file and the resulting table schema is the union of the schemas of all files. The mode of schema inference is controlled by the setting schema_inference_mode, which has 2 possible values: default and union. Closes #55428. #55892 (Kruglov Pavel).
  • Add a new setting input_format_csv_try_infer_numbers_from_strings that allows inferring numbers from strings in CSV format. Closes #56455. #56859 (Kruglov Pavel).
  • Refreshable materialized views. #56946 (Michael Kolupaev).
  • Add more warnings on the number of databases and tables. #57375 (凌涛).
  • Added a new mutation command ALTER TABLE <table> APPLY DELETED MASK, which forces the mask written by lightweight deletes to be applied and removes rows marked as deleted from disk. #57433 (Anton Popov).
  • Added a new SQL function sqid to generate Sqids (https://sqids.org/), example: SELECT sqid(125, 126). #57512 (Robert Schulze).
  • Dictionary with HASHED_ARRAY (and COMPLEX_KEY_HASHED_ARRAY) layout supports SHARDS similarly to HASHED. #57544 (vdimir).
  • Add asynchronous metrics for total primary key bytes and total allocated primary key bytes in memory. #57551 (Bharat Nallan).
  • Table system.dropped_tables_parts contains parts of system.dropped_tables tables (dropped but not yet removed tables). #57555 (Yakov Olkhovskiy).
  • Add FORMAT_BYTES as an alias for formatReadableSize. #57592 (Bharat Nallan).
  • Add SHA512_256 function. #57645 (Bharat Nallan).
  • Allow passing optional SESSION_TOKEN to s3 table function. #57850 (Shani Elharrar).
  • Clause ORDER BY now supports specifying ALL, meaning that ClickHouse sorts by all columns in the SELECT clause. Example: SELECT col1, col2 FROM tab WHERE [...] ORDER BY ALL. #57875 (zhongyuankai).
  • Added functions for punycode encoding/decoding: punycodeEncode() and punycodeDecode(). #57969 (Robert Schulze).
  • Implement PASTE JOIN, which allows users to join tables without an ON clause. Example: SELECT * FROM (SELECT number AS a FROM numbers(2)) AS t1 PASTE JOIN (SELECT number AS a FROM numbers(2) ORDER BY a DESC) AS t2. #57995 (Yarik Briukhovetskyi).
  • A handler /binary opens a visual viewer of symbols inside the ClickHouse binary. #58211 (Alexey Milovidov).
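
A few of the features listed above can be tried with statements like the following. This is a minimal sketch: the table tab, its columns col1 and col2, and the sample string are placeholders, and no output is shown because it depends on the data and the server.

```sql
-- ORDER BY ALL: sort by all columns in the SELECT clause.
SELECT col1, col2 FROM tab ORDER BY ALL;

-- PASTE JOIN: combine two subqueries without an ON clause
-- (same query as in the entry above, only reformatted).
SELECT *
FROM (SELECT number AS a FROM numbers(2)) AS t1
PASTE JOIN (SELECT number AS a FROM numbers(2) ORDER BY a DESC) AS t2;

-- Generate a Sqid from two numbers.
SELECT sqid(125, 126);

-- Punycode encoding followed by decoding of a sample string.
SELECT punycodeDecode(punycodeEncode('München'));

-- FORMAT_BYTES is an alias for formatReadableSize.
SELECT FORMAT_BYTES(1024), formatReadableSize(1024);

-- Switch schema inference to the new 'union' mode.
SET schema_inference_mode = 'union';

-- Apply the lightweight-delete mask and remove deleted rows from disk.
ALTER TABLE tab APPLY DELETED MASK;
```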

Performance Improvement

  • Copy between S3 disks using S3 server-side copy instead of copying through a buffer. This improves BACKUP/RESTORE operations and the clickhouse-disks copy command. #56744 (MikhailBurdukov).
  • HashJoin respects the setting max_joined_block_size_rows and does not produce large blocks for ALL JOIN. #56996 (vdimir).
  • Release memory for aggregation earlier. This may avoid unnecessary external aggregation. #57691 (Nikolai Kochetov).
  • Improve performance of string serialization. #57717 (Maksim Kita).
  • Support trivial count optimization for Merge-engine tables. #57867 (skyoct).
  • Optimized aggregation in some cases. #57872 (Anton Popov).
  • The hasAny() function can now take advantage of the full-text skipping indices. #57878 (Jpnock).
  • Function if(cond, then, else) (and its alias cond ? then : else) was optimized to use branch-free evaluation. #57885 (zhanglistar).
  • Extract non-intersecting part ranges from MergeTree tables during FINAL processing. This avoids additional FINAL logic for these non-intersecting part ranges. When the number of duplicate values with the same primary key is low, performance will be almost the same as without FINAL. Improve reading performance for MergeTree FINAL when the do_not_merge_across_partitions_select_final setting is set. #58120 (Maksim Kita).
  • MergeTree automatically derives the do_not_merge_across_partitions_select_final setting if the partition key expression contains only columns from the primary key expression. #58218 (Maksim Kita).
  • Speed up MIN and MAX for native types. #58231 (Raúl Marín).
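
To make the two FINAL-related entries above more concrete, here is a sketch with a hypothetical table visits whose partition key uses only a column that is also part of the primary key. The table definition is an assumption for illustration; whether the setting is derived automatically for a particular table should be checked against the server in use.

```sql
-- Hypothetical table: the partition key toYYYYMM(day) contains only `day`,
-- which is also a column of the primary key (day, id).
CREATE TABLE visits
(
    day Date,
    id UInt64,
    hits UInt32
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (day, id);

-- The setting can also be passed explicitly for a single query.
SELECT day, id, hits
FROM visits FINAL
SETTINGS do_not_merge_across_partitions_select_final = 1;
```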

Improvement

  • Make inserts into Distributed tables handle updated cluster configuration properly. Previously, when the list of cluster nodes was dynamically updated, the Directory Monitor of the Distributed table did not detect the new node on its own. #42826 (zhongyuankai).
  • Replace --no-system-tables with loading virtual tables of system database lazily. #55271 (Azat Khuzhin).
  • clickhouse-test prints the case serial number, current time, and case name for each test case. #55710 (guoxiaolong).
  • Do not allow creating replicated table with inconsistent merge params. #56833 (Duc Canh Le).
  • Implement SLRU cache policy for filesystem cache. #57076 (Kseniia Sumarokova).
  • Show uncompressed size in system.tables, obtained from data parts' checksums #56618. #57186 (Chen Lixiang).
  • Add skip_unavailable_shards as a setting for Distributed tables that is similar to the corresponding query-level setting. Closes #43666. #57218 (Gagan Goel).
  • Function substring() (aliases: substr, mid) can now be used with Enum types. Previously, the first function argument had to be a value of type String or FixedString. This improves compatibility with 3rd party tools such as Tableau via MySQL interface. #57277 (Serge Klochkov).
  • Better hints when a table doesn't exist. #57342 (Bharat Nallan).
  • Allow overriding the max_partition_size_to_drop and max_table_size_to_drop server settings at query time. #57452 (Jordi Villar).
  • Add support for read-only flag when connecting to the ZooKeeper server (fixes #53749). #57479 (Mikhail Koviazin).
  • Fix possible distributed sends stuck due to "No such file or directory" (during recovering batch from disk). Fix possible issues with error_count from system.distribution_queue (in case of distributed_directory_monitor_max_sleep_time_ms >5min). Introduce profile event to track async INSERT failures - DistributedAsyncInsertionFailures. #57480 (Azat Khuzhin).
  • The limit for the number of connections per endpoint for background fetches was raised from 15 to the value of the background_fetches_pool_size setting. The MergeTree-level setting replicated_max_parallel_fetches_for_host became obsolete. The MergeTree-level settings replicated_fetches_http_connection_timeout, replicated_fetches_http_send_timeout and replicated_fetches_http_receive_timeout were moved to the server level. The setting keep_alive_timeout was added to the list of server-level settings. #57523 (Nikita Mikhaylov).
  • It is now possible to refer to ALIAS column in index (non-primary-key) definitions (issue #55650). Example: CREATE TABLE tab(col UInt32, col_alias ALIAS col + 1, INDEX idx (col_alias) TYPE minmax) ENGINE = MergeTree ORDER BY col;. #57546 (Robert Schulze).
  • Function format() now supports arbitrary argument types (instead of only String and FixedString arguments). This makes it possible to evaluate, for example, SELECT format('The {0} to all questions is {1}', 'answer', 42). #57549 (Robert Schulze).
  • Support PostgreSQL generated columns and default column values in MaterializedPostgreSQL. Closes #40449. #57568 (Kseniia Sumarokova).
  • Allow to apply some filesystem cache config settings changes without server restart. #57578 (Kseniia Sumarokova).
  • Handle the SIGABRT case when getting a PostgreSQL table structure with an empty array. #57618 (Mike Kot (Михаил Кот)).
  • The first argument of the date_trunc() function is now case-insensitive. Both cases are supported: SELECT date_trunc('day', now()) and SELECT date_trunc('DAY', now()). #57624 (Yarik Briukhovetskyi).
  • Expose the total number of errors that occurred since the last server restart as a ClickHouseErrorMetric_ALL metric. #57627 (Nikita Mikhaylov).
  • Allow nodes in config with from_env/from_zk and a non-empty element with replace=1. #57628 (Azat Khuzhin).
  • Generate malformed output that cannot be parsed as JSON. #57646 (Julia Kartseva).
  • Consider lightweight deleted rows when selecting parts to merge if enabled. #57648 (Zhuo Qiu).
  • Make querying system.filesystem_cache not memory intensive. #57687 (Kseniia Sumarokova).
  • Allow IPv6 to UInt128 conversion and binary arithmetic. #57707 (Yakov Olkhovskiy).
  • Support negative positional arguments. Closes #57736. #57741 (flynn).
  • Add a setting for async inserts deduplication cache -- how long we wait for cache update. Deprecate setting async_block_ids_cache_min_update_interval_ms. Now cache is updated only in case of conflicts. #57743 (alesapin).
  • sleep() function now can be cancelled with KILL QUERY. #57746 (Vitaly Baranov).
  • Slightly better inference of unnamed tuples in JSON formats. #57751 (Kruglov Pavel).
  • Refactor UserDefinedSQL* classes to make it possible to add SQL UDF storages which are different from ZooKeeper and Disk. #57752 (Natasha Chizhonkova).
  • Forbid CREATE TABLE ... AS SELECT queries for Replicated table engines in Replicated database because they are broken. Reference #35408. #57796 (Nikolay Degterinsky).
  • Fix and improve query transformation for external databases: all compatible predicates are now obtained recursively. #57888 (flynn).
  • Support dynamic reloading of filesystem cache size. Closes #57866. #57897 (Kseniia Sumarokova).
  • Fix system.stack_trace for threads with blocked SIGRTMIN. #57907 (Azat Khuzhin).
  • Added a new setting readonly, which can be used to specify that an S3 disk is read-only. It can be useful for creating a table on a read-only disk of type s3_plain. #57977 (Pengyuan Bian).
  • Support keeper failures in quorum check. #57986 (Raúl Marín).
  • Add max/peak RSS (MemoryResidentMax) into system.asynchronous_metrics. #58095 (Azat Khuzhin).
  • Fix system.stack_trace for threads with blocked SIGRTMIN (and also send signal to the threads only if it is not blocked to avoid waiting storage_system_stack_trace_pipe_read_timeout_ms when it does not make any sense). #58136 (Azat Khuzhin).
  • Allow users to use S3 links (https:// and s3://) without specifying the region when it is not the default one. Also find the correct region if the user specified the wrong one. #58148 (Yarik Briukhovetskyi).
  • clickhouse-format --obfuscate will know about Settings, MergeTreeSettings, and time zones and keep their names unchanged. #58179 (Alexey Milovidov).
  • Added an explicit finalize() function in ZipArchiveWriter. Simplify overly complicated code in ZipArchiveWriter. Fixes #58074. #58202 (Vitaly Baranov).
  • The primary key analysis in MergeTree tables will now be applied to predicates that include the virtual column _part_offset (optionally with _part). This feature can serve as a poor man's secondary index. #58224 (Amos Bird).
  • Make caches with the same path use the same cache objects. This behaviour existed before, but was broken in https://github.com/ClickHouse/ClickHouse/pull/48805 (in 23.4). If caches with the same path have different sets of cache settings, an exception is thrown, because this is not allowed. #58264 (Kseniia Sumarokova).
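
Several of the improvements listed above can be checked with short queries. The first three statements come from the entries themselves; the Enum value in the substring() call is a made-up sample used only for illustration.

```sql
-- ALIAS column referenced in a non-primary-key index definition.
CREATE TABLE tab
(
    col UInt32,
    col_alias ALIAS col + 1,
    INDEX idx (col_alias) TYPE minmax
)
ENGINE = MergeTree
ORDER BY col;

-- format() with a non-String argument.
SELECT format('The {0} to all questions is {1}', 'answer', 42);

-- date_trunc() accepts the unit in either case.
SELECT date_trunc('day', now()), date_trunc('DAY', now());

-- substring() applied to an Enum value (sample Enum definition is an assumption).
SELECT substring(CAST('hello' AS Enum8('hello' = 1, 'world' = 2)), 1, 3);
```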

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT