ClickHouse/docs/changelogs/v23.7.1.2470-stable.md

sidebar_position: 1
sidebar_label: 2023

2023 Changelog

ClickHouse release v23.7.1.2470-stable (a70127baec) as compared to v23.6.1.1524-stable (d1c7e13d08)

Backward Incompatible Change

  • Add NAMED COLLECTION access type (aliases USE NAMED COLLECTION, NAMED COLLECTION USAGE). This change is backward incompatible because the access type is disabled by default (its parent access type NAMED COLLECTION ADMIN is disabled by default as well). Proposed in #50277. To grant it, use GRANT NAMED COLLECTION ON collection_name TO user or GRANT NAMED COLLECTION ON * TO user. Giving these grants requires named_collection_admin in the config (it was previously named named_collection_control, which remains as an alias). #50625 (Kseniia Sumarokova).
  • Fix a typo in the system.parts column name last_removal_attemp_time. It is now named last_removal_attempt_time. #52104 (filimonov).
  • Bump the default distributed_ddl_entry_format_version to 5 (enables passing through opentelemetry and initial_query_id). After a downgrade, existing distributed DDL entries cannot be processed (but note that usually there should be no such unprocessed entries). #52128 (Azat Khuzhin).
  • Check projection metadata the same way we check ordinary metadata. This change may prevent the server from starting in case there was a table with an invalid projection. An example is a projection that created positional columns in PK (e.g. projection p (select * order by 1, 4) which is not allowed in table PK and can cause a crash during insert/merge). Drop such projections before the update. Fixes #52353. #52361 (Nikolai Kochetov).
  • The experimental feature hashid is removed due to a bug. The quality of implementation was questionable at the start, and it didn't get through the experimental status. This closes #52406. #52449 (Alexey Milovidov).
  • The function toDecimalString is removed due to subpar implementation quality. This closes #52407. #52450 (Alexey Milovidov).

New Feature

  • Implement KQL-style formatting for Interval. #45671 (ltrk2).
  • Support ZooKeeper reconfig command for CH Keeper with incremental reconfiguration which can be enabled via keeper_server.enable_reconfiguration setting. Support adding servers, removing servers, and changing server priorities. #49450 (Mike Kot).
  • Kafka connector can fetch avro schema from schema registry with basic authentication using url-encoded credentials. #49664 (Ilya Golshtein).
  • Add function arrayJaccardIndex which computes the Jaccard similarity between two arrays. #50076 (FFFFFFFHHHHHHH).
  • Added support for prql as a query language. #50686 (János Benjamin Antal).
  • Add a column is_obsolete to system.settings and similar tables. Closes #50819. #50826 (flynn).
  • Implement support for encrypted elements in the configuration file. Encrypted text can now be used in leaf elements of the configuration file. The text is encrypted using encryption codecs from the <encryption_codecs> section. #50986 (Roman Vasin).
  • Just a new request of #49483. #51013 (lgbo).
  • Add SYSTEM STOP LISTEN query. Closes #47972. #51016 (Nikolay Degterinsky).
  • Add input_format_csv_allow_variable_number_of_columns options. #51273 (Dmitry Kardymon).
  • Add function substring_index, as in Spark or MySQL. #51472 (李扬).
  • Show stats for jemalloc bins. Example: SELECT *, size * (nmalloc - ndalloc) AS allocated_bytes FROM system.jemalloc_bins WHERE allocated_bytes > 0 ORDER BY allocated_bytes DESC LIMIT 10. #51674 (Alexander Gololobov).
  • Add RowBinaryWithDefaults format with extra byte before each column for using column default value. Closes #50854. #51695 (Kruglov Pavel).
  • Added default_temporary_table_engine setting. Same as default_table_engine but for temporary tables. #51292. #51708 (velavokr).
  • Added new initcap / initcapUTF8 functions which convert the first letter of each word to upper case and the rest to lower case. #51735 (Dmitry Kardymon).
  • CREATE TABLE now supports PRIMARY KEY syntax in column definitions. Columns are added to the primary index in the order in which they are defined. #51881 (Ilya Yatsishin).
  • Added the possibility to use date and time format specifiers in log and error log file names, either in config files (log and errorlog tags) or command line arguments (--log-file and --errorlog-file). #51945 (Victor Krasnov).
  • Added Peak Memory Usage (for query) to client final statistics, and to http header. #51946 (Dmitry Kardymon).
  • Added new hasSubsequence() (+CaseInsensitive + UTF8 versions) functions. #52050 (Dmitry Kardymon).
  • Add array_agg as alias of groupArray for PostgreSQL compatibility. Closes #52100. #52135 (flynn).
  • Add any_value as a compatibility alias for any aggregate function. Closes #52140. #52147 (flynn).
  • Add aggregate function array_concat_agg for compatibility with BigQuery. It is an alias of groupArrayArray. Closes #52139. #52149 (flynn).
  • Add OCTET_LENGTH as an alias to length. Closes #52153. #52176 (FFFFFFFHHHHHHH).
  • Re-add SipHash keyed functions. #52206 (Salvatore Mesoraca).
  • Added firstLine function to extract the first line from the multi-line string. This closes #51172. #52209 (Mikhail Koviazin).
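
The semantics of a few of the functions listed above can be sketched in Python. This is an illustrative model only, not ClickHouse's implementation: the Python function names are hypothetical, the handling of two empty arrays and of an empty delimiter are assumptions, and substring_index follows the MySQL behaviour that the entry refers to.

```python
def array_jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two lists treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty arrays yield 0.0 in this sketch.
        return 0.0
    return len(set_a & set_b) / len(union)


def substring_index(s, delim, count):
    """MySQL-style SUBSTRING_INDEX.

    count > 0: everything before the count-th delimiter from the left.
    count < 0: everything after the |count|-th delimiter from the right.
    """
    if count == 0 or not delim:
        # Assumption: an empty delimiter or a zero count gives an empty string.
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def has_subsequence(haystack, needle):
    """True if the characters of needle appear in haystack in the same order."""
    remaining = iter(haystack)
    # `ch in remaining` consumes the iterator up to the first match,
    # so each needle character must be found after the previous one.
    return all(ch in remaining for ch in needle)
```

For example, `substring_index("www.clickhouse.com", ".", 2)` keeps the text before the second dot, while a count of `-2` keeps the text after the second dot counted from the right.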

Performance Improvement

  • Enable move_all_conditions_to_prewhere and enable_multiple_prewhere_read_steps settings by default. #46365 (Alexander Gololobov).
  • Improves performance of some queries by tuning allocator. #46416 (Azat Khuzhin).
  • Writing Parquet files is 10x faster because it is now multi-threaded. It is almost as fast as reading. #49367 (Michael Kolupaev).
  • Enable automatic selection of the sparse serialization format by default. It improves performance. The format is supported since version 22.1. After this change, downgrading to versions older than 22.1 might not be possible. You can turn off the usage of the sparse serialization format by providing the ratio_of_defaults_for_sparse_serialization = 1 setting for your MergeTree tables. #49631 (Alexey Milovidov).
  • MergeTreePrefetchedReadPool now uses fixed-size tasks, as MergeTreeReadPool does. S3 requests now also use a connection pool. #49732 (Nikita Taranov).
  • More pushdown to the right side of join. #50532 (Nikita Taranov).
  • Improve grace_hash join by reserving hash table's size (resubmit). #50875 (lgbo).
  • Waiting on lock in OpenedFileCache could be noticeable sometimes. We sharded it into multiple sub-maps (each with its own lock) to avoid contention. #51341 (Nikita Taranov).
  • Remove duplicate condition in functionunixtimestamp64.h. #51857 (lcjh).
  • The idea is that conditions with PK columns are likely to be used in PK analysis and will not contribute much more to PREWHERE filtering. #51958 (Alexander Gololobov).
  • Add a rewriter for both the old and the new analyzer, and add the setting optimize_uniq_to_count, which defaults to 0. #52004 (JackyWoo).
  • The performance experiments of OnTime on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring an improvement of 11.6% to the QPS of the query Q8 while having no impact on others. #52036 (Zhiguo Zhou).
  • Enable allow_vertical_merges_from_compact_to_wide_parts by default. It will save memory usage during merges. #52295 (Alexey Milovidov).
  • Fix incorrect projection analysis which invalidates primary keys. This issue only exists when query_plan_optimize_primary_key = 1 and query_plan_optimize_projection = 1. This fixes #48823. This fixes #51173. #52308 (Amos Bird).
  • Reduce the number of syscalls in FileCache::loadMetadata. #52435 (Raúl Marín).
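
The OpenedFileCache entry above describes splitting one locked map into several sub-maps, each with its own lock. A minimal Python sketch of that general technique follows. The class name, the shard count and the factory callback are hypothetical; the real cache is part of the ClickHouse server and its details are not given in this changelog.

```python
import threading


class ShardedCache:
    """A key/value cache split into independently locked sub-maps."""

    def __init__(self, num_shards=16):
        # Each shard is a (dict, lock) pair, so threads that touch
        # different shards do not wait on each other.
        self._shards = [({}, threading.Lock()) for _ in range(num_shards)]

    def _shard_for(self, key):
        # The key's hash selects the shard; only that shard is locked.
        return self._shards[hash(key) % len(self._shards)]

    def get_or_create(self, key, factory):
        entries, lock = self._shard_for(key)
        with lock:
            if key not in entries:
                entries[key] = factory(key)
            return entries[key]
```

With a single lock, every lookup serializes on it. With N shards, two lookups contend only when their keys hash to the same shard.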

Improvement

  • Added query SYSTEM FLUSH ASYNC INSERT QUEUE which flushes all pending asynchronous inserts to the destination tables. Added a server-side setting async_insert_queue_flush_on_shutdown (true by default) which determines whether to flush queue of asynchronous inserts on graceful shutdown. Setting async_insert_threads is now a server-side setting. #49160 (Anton Popov).
  • Don't show messages about 16 EiB free space in logs, as they don't make sense. This closes #49320. #49342 (Alexey Milovidov).
  • Properly check the limit for the sleepEachRow function. Add a setting function_sleep_max_microseconds_per_block. This is needed for generic query fuzzer. #49343 (Alexey Milovidov).
  • Fix two issues, for example in select geohashEncode(120.2, number::Float64) from numbers(10). #50066 (李扬).
  • Add support for external disks in Keeper for storing snapshots and logs. #50098 (Antonio Andelic).
  • Add support for multi-directory selection ({}) globs. #50559 (Andrey Zvonov).
  • Allow a strict lower boundary for file segment size by downloading the remaining data in the background. The minimum size of a file segment (if the actual file size is bigger) is configured by the cache configuration setting boundary_alignment, 4Mi by default. The number of background threads is configured by the cache configuration setting background_download_threads, 2 by default. max_file_segment_size was also increased from 8Mi to 32Mi in this PR. #51000 (Kseniia Sumarokova).
  • Allow filtering HTTP headers with http_forbid_headers section in config. Both exact matching and regexp filters are available. #51038 (Nikolay Degterinsky).
  • Add a new alias for the function current_database and a new function current_schemas (#50727). #51076 (Pedro Riera).
  • Log async insert flush queries into system.query_log. #51160 (Raúl Marín).
  • Decreased default timeouts for S3 from 30 seconds to 3 seconds, and for other HTTP from 180 seconds to 30 seconds. #51171 (Michael Kolupaev).
  • Use read_bytes/total_bytes_to_read for progress bar in s3/file/url/... table functions for better progress indication. #51286 (Kruglov Pavel).
  • Functions date_diff() and age() now support millisecond/microsecond units and work with microsecond precision. #51291 (Dmitry Kardymon).
  • Allow SQL standard FETCH without OFFSET. See https://antonz.org/sql-fetch/. #51293 (Alexey Milovidov).
  • Improve parsing of path in clickhouse-keeper-client. #51359 (Azat Khuzhin).
  • A third-party product depending on ClickHouse (Gluten: Plugin to Double SparkSQL's Performance) had a bug. This fix avoids heap overflow in that third-party product while reading from HDFS. #51386 (李扬).
  • Fix checking error caused by uninitialized class members. #51418 (李扬).
  • Add ability to disable native copy for S3 (setting for BACKUP/RESTORE allow_s3_native_copy, and s3_allow_native_copy for s3/s3_plain disks). #51448 (Azat Khuzhin).
  • Add column primary_key_size to system.parts table to show compressed primary key size on disk. Closes #51400. #51496 (Yarik Briukhovetskyi).
  • Allow running clickhouse-local without procfs, without home directory existing, and without name resolution plugins from glibc. #51518 (Alexey Milovidov).
  • Correct the message shown when modifying the storage policy. See https://github.com/clickhouse/clickhouse/issues/51516. #51519 (xiaolei565).
  • Support DROP FILESYSTEM CACHE <cache_name> KEY <key> [ OFFSET <offset>]. #51547 (Kseniia Sumarokova).
  • Allow specifying a disk name for custom disks. Previously, custom disks used an internally generated disk name. Now a name can be set with disk = disk_<name>(...) (e.g. the disk will have the name name). #51552 (Kseniia Sumarokova).
  • Add placeholder %a for the full filename in the rename_files_after_processing setting. #51603 (Kruglov Pavel).
  • Add column modification time into system.parts_columns. #51685 (Azat Khuzhin).
  • Add new setting input_format_csv_use_default_on_bad_values to CSV format that allows to insert default value when parsing of a single field failed. #51716 (KevinyhZou).
  • Added a crash log flush to the disk after the unexpected crash. #51720 (Alexey Gerasimchuck).
  • Fix behavior in dashboard page where errors unrelated to authentication are not shown. Also fix 'overlapping' chart behavior. #51744 (Zach Naimon).
  • Allow UUID to UInt128 conversion. #51765 (Dmitry Kardymon).
  • Added support for function range of Nullable arguments. #51767 (Dmitry Kardymon).
  • Convert conditions like toYear(x) = c to c1 <= x < c2. #51795 (Han Fei).
  • Improve MySQL compatibility of statement SHOW INDEX. #51796 (Robert Schulze).
  • Fix use_structure_from_insertion_table_in_table_functions not working with MATERIALIZED and ALIAS columns. Closes #51817. Closes #51019. #51825 (flynn).
  • Introduce a table setting wait_for_unique_parts_send_before_shutdown_ms, which specifies how long a replica waits before closing the interserver handler for replicated sends. Also fix an inconsistency in the shutdown order of tables and interserver handlers: the server now shuts down tables first and only afterwards shuts down interserver handlers. #51851 (alesapin).
  • CacheDictionary request only unique keys from source. Closes #51762. #51853 (Maksim Kita).
  • Fixed settings not applied for explain query when format provided. #51859 (Nikita Taranov).
  • Allow SETTINGS before FORMAT in DESCRIBE TABLE query for compatibility with SELECT query. Closes #51544. #51899 (Nikolay Degterinsky).
  • Var-int encoded integers (e.g. used by the native protocol) can now use the full 64-bit range. 3rd party clients are advised to update their var-int code accordingly. #51905 (Robert Schulze).
  • Update certificates when they change without the need to manually SYSTEM RELOAD CONFIG. #52030 (Mike Kot).
  • Added the allow_create_index_without_type setting, which allows ignoring ADD INDEX queries without a specified TYPE. Standard SQL queries will just succeed without changing the table schema. #52056 (Ilya Yatsishin).
  • Fixed crash when mysqlxx::Pool::Entry is used after it was disconnected. #52063 (Val Doroshchuk).
  • CREATE TABLE ... AS SELECT .. is now supported in MaterializedMySQL. #52067 (Val Doroshchuk).
  • Introduced automatic conversion of text types to utf8 for MaterializedMySQL. #52084 (Val Doroshchuk).
  • Add alias for functions today (now available under the curdate/current_date names) and now (current_timestamp). #52106 (Lloyd-Pottiger).
  • Log messages are written to text_log from the beginning. #52113 (Dmitry Kardymon).
  • Previously, when an HTTP endpoint had multiple IP addresses and the first of them was unreachable, a timeout exception was thrown. Session creation now handles all resolved endpoints. #52116 (Aleksei Filatov).
  • Support async_deduplication_token for async insert. #52136 (Han Fei).
  • The Avro input format now supports Union with a single type. Closes #52131. #52137 (flynn).
  • Add setting optimize_use_implicit_projections to disable implicit projections (currently only min_max_count projection). This is defaulted to false until #52075 is fixed. #52152 (Amos Bird).
  • It was possible to use the function hasToken to cause an infinite loop. This possibility is now removed. This closes #52156. #52160 (Alexey Milovidov).
  • Upgrade Intel QPL from v1.1.0 to v1.2.0, upgrade Intel accel-config from v3.5 to v4.0, and fix an issue where a Device IOTLB miss had a big performance impact on IAA accelerators. #52180 (jasperzhu).
  • Functions date_diff() and age() now support millisecond/microsecond units and work with microsecond precision. #52181 (Dmitry Kardymon).
  • Create ZK ancestors optimistically. #52195 (Raúl Marín).
  • Fix #50582. Avoid the Not found column ... in block error in some cases of reading in-order and constants. #52259 (Chen768959).
  • Check whether S2 geo primitives are invalid as early as possible on ClickHouse side. This closes: #27090. #52260 (Nikita Mikhaylov).
  • Now unquoted utf-8 strings are supported in DDL for MaterializedMySQL. #52318 (Val Doroshchuk).
  • Add back missing projection QueryAccessInfo when query_plan_optimize_projection = 1. This fixes #50183. This fixes #50093. #52327 (Amos Bird).
  • Add new setting disable_url_encoding that allows to disable decoding/encoding path in uri in URL engine. #52337 (Kruglov Pavel).
  • When ZooKeeperRetriesControl rethrows an error, it's more useful to see its original stack trace, not the one from ZooKeeperRetriesControl itself. #52347 (Vitaly Baranov).
  • Now double quoted comments are supported in MaterializedMySQL. #52355 (Val Doroshchuk).
  • Wait for zero copy replication lock even if some disks don't support it. #52376 (Raúl Marín).
  • Now it's possible to specify min (memory_profiler_sample_min_allocation_size) and max (memory_profiler_sample_max_allocation_size) size for allocations to be tracked with sampling memory profiler. #52419 (alesapin).
  • The session_timezone setting is demoted to experimental. #52445 (Alexey Milovidov).
  • Now interserver port will be closed only after tables are shut down. #52498 (alesapin).
  • Added field refcount to system.remote_data_paths table. #52518 (Anton Popov).
  • New setting merge_tree_determine_task_size_by_prewhere_columns added. If set to true only sizes of the columns from PREWHERE section will be considered to determine reading task size. Otherwise all the columns from query are considered. #52606 (Nikita Taranov).
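
For third-party client authors affected by the var-int entry above, the following Python sketch shows an unsigned var-int codec that covers the full 64-bit range. It assumes the common LEB128-style layout (7 payload bits per byte, least significant group first, high bit set on every byte except the last), which this changelog does not spell out, and the function names are hypothetical. Under that layout a full 64-bit value needs up to 10 bytes.

```python
MAX_VARINT_BYTES = 10  # ceil(64 / 7) bytes are needed for a 64-bit value


def encode_var_uint(value):
    """Encode an unsigned 64-bit integer as a var-int (assumed LEB128 layout)."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit into 64 bits")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte: continuation bit cleared
            return bytes(out)


def decode_var_uint(data):
    """Decode a var-int from the start of data.

    Returns (value, number_of_bytes_consumed).
    """
    result = 0
    for i, byte in enumerate(data[:MAX_VARINT_BYTES]):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result & 0xFFFFFFFFFFFFFFFF, i + 1
    raise ValueError("truncated or overlong var-int")
```

A decoder that stops after 9 bytes can represent at most 63 bits, so a client should check that both its encoder and its decoder accept a tenth byte.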

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT