2024 Changelog

ClickHouse release v24.5.1.1763-stable (647c154a94) as compared to v24.4.1.2088-stable (6d4b31322d)

Backward Incompatible Change

  • Renamed "inverted indexes" to "full-text indexes", which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before upgrading and re-create them after the upgrade. #62884 (Robert Schulze).
  • Usage of the functions neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, and runningDifference is deprecated because they are error-prone. Proper window functions should be used instead. To re-enable them, set allow_deprecated_functions=1. #63132 (Nikita Taranov).
  • Queries from system.columns will work faster if there is a large number of columns, but many databases or tables are not granted for SHOW TABLES. Note that in previous versions, if you grant SHOW COLUMNS to individual columns without granting SHOW TABLES to the corresponding tables, the system.columns table will show these columns, but in a new version, it will skip the table entirely. Remove trace log messages "Access granted" and "Access denied" that slowed down queries. #63439 (Alexey Milovidov).
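
The changelog entry for the deprecated running* functions only says they are error-prone. One documented cause is that they compute within a single data block, so the result can change with how rows happen to be split into blocks. The Python sketch below is a toy model of that effect, not ClickHouse code; the function names and the block split are hypothetical and chosen only for illustration.

```python
from typing import List


def running_difference_per_block(blocks: List[List[int]]) -> List[int]:
    """Toy model of a block-local running difference: state resets per block."""
    out: List[int] = []
    for block in blocks:
        prev = None  # the previous value is forgotten at every block boundary
        for value in block:
            out.append(0 if prev is None else value - prev)
            prev = value
    return out


def windowed_difference(values: List[int]) -> List[int]:
    """Difference to the previous row over the whole ordered result set."""
    return [0 if i == 0 else values[i] - values[i - 1] for i in range(len(values))]


rows = [1, 4, 9, 16]
one_block = running_difference_per_block([rows])
two_blocks = running_difference_per_block([rows[:2], rows[2:]])
window = windowed_difference(rows)

print(one_block)   # [0, 3, 5, 7]
print(two_blocks)  # [0, 3, 0, 7]  (the difference 9 - 4 is lost at the boundary)
print(window)      # [0, 3, 5, 7]
```

A window function is defined over the ordered partition rather than over whatever block is being processed, which is why it does not show the boundary artifact seen in `two_blocks`.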

New Feature

  • Provide support for AzureBlobStorage function in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If use_workload_identity parameter is set in config, workload identity is used for authentication. #57881 (Vinay Suryadevara).
  • Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on the rocksdb built-in memtable. This helps increase import speed, especially for long-running insert queries into StorageEmbeddedRocksDB tables. Also, introduce StorageEmbeddedRocksDB table settings. #59163 (Duc Canh Le).
  • User can now parse CRLF with TSV format using a setting input_format_tsv_crlf_end_of_line. Closes #56257. #59747 (Shaun Struwig).
  • Adds the Form Format to read/write a single record in the application/x-www-form-urlencoded format. #60199 (Shaun Struwig).
  • Added possibility to compress in CROSS JOIN. #60459 (p1rattttt).
  • New setting input_format_force_null_for_omitted_fields that forces NULL values for omitted fields. #60887 (Constantine Peresypkin).
  • Support joins with inequality conditions that involve columns from both the left and right tables, e.g. t1.y < t2.y. To enable, SET allow_experimental_join_condition = 1. #60920 (lgbo).
  • Previously, the s3 storage and the s3 table function did not support selecting from archive files. It is now possible to iterate over files inside archives in S3. #62259 (Daniil Ivanik).
  • Support for conditional function clamp. #62377 (skyoct).
  • Add npy output format. #62430 (豪肥肥).
  • Added SQL functions generateUUIDv7, generateUUIDv7ThreadMonotonic, generateUUIDv7NonMonotonic (with different monotonicity/performance trade-offs) to generate version 7 UUIDs aka. timestamp-based UUIDs with random component. Also added a new function UUIDToNum to extract bytes from a UUID and a new function UUIDv7ToDateTime to extract timestamp component from a UUID version 7. #62852 (Alexey Petrunyaka).
  • Backported in #64307: Implement Dynamic data type that allows to store values of any type inside it without knowing all of them in advance. Dynamic type is available under a setting allow_experimental_dynamic_type. Reference: #54864. #63058 (Kruglov Pavel).
  • Introduce bulk loading to StorageEmbeddedRocksDB by creating and ingesting SST files instead of relying on the rocksdb built-in memtable. This helps increase import speed, especially for long-running insert queries into StorageEmbeddedRocksDB tables. Also, introduce StorageEmbeddedRocksDB table settings. #63324 (Duc Canh Le).
  • Raw as a synonym for TSVRaw. #63394 (Unalian).
  • Added possibility to do cross join in temporary file if size exceeds limits. #63432 (p1rattttt).
  • On Linux and macOS, if the program has STDOUT redirected to a file with a compression extension, use the corresponding compression method instead of no compression (making it behave similarly to INTO OUTFILE). #63662 (v01dXYZ).
  • Change warning on high number of attached tables to differentiate tables, views and dictionaries. #64180 (Francisco J. Jurado Moreno).
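
To make the "timestamp-based UUID with random component" description of version 7 UUIDs concrete, here is a minimal Python sketch of the bit layout from RFC 9562 and of extracting the timestamp again. It is an illustration only: `make_uuid_v7` and `uuid_v7_to_datetime` are hypothetical helper names, and the sketch does not attempt to reproduce how generateUUIDv7, its monotonic variants, or UUIDv7ToDateTime are implemented inside ClickHouse.

```python
import datetime as dt
import os
import uuid

EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def make_uuid_v7(unix_ms: int) -> uuid.UUID:
    """Build a version 7 UUID: 48-bit Unix timestamp in ms plus random bits."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0xFFF                 # 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: rand_a
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: rand_b
    return uuid.UUID(int=value)


def uuid_v7_to_datetime(u: uuid.UUID) -> dt.datetime:
    """Recover the millisecond timestamp stored in the top 48 bits."""
    unix_ms = u.int >> 80
    return EPOCH + dt.timedelta(milliseconds=unix_ms)


example = make_uuid_v7(1_700_000_000_123)
print(example.version)               # 7
print(uuid_v7_to_datetime(example))  # 2023-11-14 22:13:20.123000+00:00
```

Because the timestamp occupies the most significant bits, UUIDs generated in different milliseconds sort by creation time, which is the property that makes version 7 attractive for keys.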

Performance Improvement

  • Skip merging of newly created projection blocks during INSERT-s. #59405 (Nikita Taranov).
  • Process string functions XXXUTF8 'asciily' if the input strings consist only of ASCII characters. Inspired by https://github.com/apache/doris/pull/29799. Overall speedup of 1.07x~1.62x. Note that peak memory usage decreased in some cases. #61632 (李扬).
  • Improved performance of selection ({}) globs in StorageS3. #62120 (Andrey Zvonov).
  • HostResolver stored each IP address several times. If a remote host has several IPs and for some reason (firewall rules, for example) access is allowed on some IPs and forbidden on others, then only the first record of a forbidden IP was marked as failed, and on each try these IPs had a chance to be chosen (and fail again). Even with that fixed, the DNS cache was dropped every 120 seconds, so the IPs could be chosen again. #62652 (Anton Ivashkin).
  • Add a new configuration prefer_merge_sort_block_bytes to control memory usage and speed up sorting by 2 times during merges when there are many columns. #62904 (LiuNeng).
  • clickhouse-local will start faster. In previous versions, it was not deleting temporary directories by mistake. Now it will. This closes #62941. #63074 (Alexey Milovidov).
  • Micro-optimizations for the new analyzer. #63429 (Raúl Marín).
  • Index analysis will work if DateTime is compared to DateTime64. This closes #63441. #63443 (Alexey Milovidov).
  • Index analysis will work if DateTime is compared to DateTime64. This closes #63441. #63532 (Raúl Marín).
  • Speed up indices of type set a little (around 1.5 times) by removing garbage. #64098 (Alexey Milovidov).
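
The ASCII fast path for the XXXUTF8 string functions rests on a simple observation: in pure-ASCII input every character is exactly one byte, so character-level work reduces to byte-level work. The Python sketch below shows the idea for a length function; `length_utf8` is a hypothetical name, and this is an assumption-laden illustration of the principle, not the C++ implementation from the pull request.

```python
def length_utf8(data: bytes) -> int:
    """Count code points in UTF-8 bytes, with a fast path for pure-ASCII input."""
    if data.isascii():
        # Every ASCII character is exactly one byte, so no decoding is needed.
        return len(data)
    # General path: count bytes that are not continuation bytes (10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


print(length_utf8(b"hello"))                    # 5
print(length_utf8("привет".encode("utf-8")))    # 6
```

The fast path pays for one scan to check for non-ASCII bytes and then skips per-character decoding entirely, which is where a speedup for mostly-ASCII data would come from.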

Improvement

  • Maps can now have Float32, Float64, Array(T), Map(K,V) and Tuple(T1, T2, ...) as keys. Closes #54537. #59318 (李扬).
  • Multiline strings with border preservation and column width change. #59940 (Volodyachan).
  • Make rabbitmq nack broken messages. Closes #45350. #60312 (Kseniia Sumarokova).
  • Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes #60460. #60468 (Alexey Milovidov).
  • Distinct messages for s3 error 'no key' for cases disk and storage. #61108 (Sema Checherinda).
  • Less contention in filesystem cache (part 4). Allow to keep filesystem cache not filled to the limit by doing additional eviction in the background (controlled by keep_free_space_size(elements)_ratio). This allows to release pressure from space reservation for queries (on tryReserve method). Also this is done in a lock free way as much as possible, e.g. should not block normal cache usage. #61250 (Kseniia Sumarokova).
  • The progress bar will work for trivial queries with LIMIT from system.zeros, system.zeros_mt (it already works for system.numbers and system.numbers_mt), and the generateRandom table function. As a bonus, if the total number of records is greater than the max_rows_to_read limit, it will throw an exception earlier. This closes #58183. #61823 (Alexey Milovidov).
  • YAML Merge Key support. #62685 (Azat Khuzhin).
  • Enhance error message when non-deterministic function is used with Replicated source. #62896 (Grégoire Pineau).
  • Fix interserver secret for Distributed over Distributed from remote. #63013 (Azat Khuzhin).
  • Allow using clickhouse-local and its shortcuts clickhouse and ch with a query or queries file as a positional argument. Examples: ch "SELECT 1", ch --param_test Hello "SELECT {test:String}", ch query.sql. This closes #62361. #63081 (Alexey Milovidov).
  • Support configuration substitutions from YAML files. #63106 (Eduard Karacharov).
  • Add TTL information in system parts_columns table. #63200 (litlig).
  • Keep previous data in terminal after picking from skim suggestions. #63261 (FlameFactory).
  • Field widths are now calculated correctly, ignoring ANSI escape sequences. #63270 (Shaun Struwig).
  • Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. #63365 (Julia Kartseva).
  • Support English-style Unicode quotes, e.g. “Hello”, world. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes #58634. #63381 (Alexey Milovidov).
  • Allowed to create MaterializedMySQL database without connection to MySQL. #63397 (Kirill).
  • Remove copying data when writing to filesystem cache. #63401 (Kseniia Sumarokova).
  • Update the usage of error code NUMBER_OF_ARGUMENTS_DOESNT_MATCH by more accurate error codes when appropriate. #63406 (Yohann Jardin).
  • os_user and client_hostname are now correctly set up for queries for command line suggestions in clickhouse-client. This closes #63430. #63433 (Alexey Milovidov).
  • Fixed tabulation from line numbering, correct handling of length when moving a line if the value has a tab, added tests. #63493 (Volodyachan).
  • Add the aggregate_function_group_array_has_limit_size setting to support discarding data in some scenarios. #63516 (zhongyuankai).
  • Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than max_retries_before_automatic_recovery (100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. #63549 (Alexander Tokmakov).
  • Automatically correct max_block_size=0 to default value. #63587 (Antonio Andelic).
  • Account for failed files in s3queue_tracked_file_ttl_sec and s3queue_tracked_files_limit for StorageS3Queue. #63638 (Kseniia Sumarokova).
  • Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address #52086. #63656 (Zimu Li).
  • Enable truncate operation for object storage disks. #63693 (MikhailBurdukov).
  • The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. #63786 (Nikita Mikhaylov).
  • Allow trailing commas in the columns list in the INSERT query. For example, INSERT INTO test (a, b, c, ) VALUES .... #63803 (Alexey Milovidov).
  • Better exception messages for the Regexp format. #63804 (Alexey Milovidov).
  • Allow trailing commas in the Values format. For example, this query is allowed: INSERT INTO test (a, b, c) VALUES (4, 5, 6,);. #63810 (Alexey Milovidov).
  • Clickhouse disks have to read server setting to obtain actual metadata format version. #63831 (Sema Checherinda).
  • Disable pretty format restrictions (output_format_pretty_max_rows/output_format_pretty_max_value_width) when stdout is not TTY. #63942 (Azat Khuzhin).
  • Exception handling now works when ClickHouse is used inside AWS Lambda. Author: Alexey Coolnev. #64014 (Alexey Milovidov).
  • Throw CANNOT_DECOMPRESS instead of CORRUPTED_DATA on invalid compressed data passed via HTTP. #64036 (vdimir).
  • A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes #61993. #64084 (Alexey Milovidov).
  • Now backups with azure blob storage will use multicopy. #64116 (alesapin).
  • Add metrics, logs, and thread names around parts filtering with indices. #64130 (Alexey Milovidov).
  • Allow to use native copy for azure even with different containers. #64154 (alesapin).
  • Finally enable native copy for azure. #64182 (alesapin).
  • Ignore allow_suspicious_primary_key on ATTACH and verify on ALTER. #64202 (Azat Khuzhin).
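
The field-width fix above concerns values that contain ANSI escape sequences: the escape bytes change color or style but occupy no columns on screen, so they must not count toward the width. A minimal Python sketch of that idea follows. The regular expression covers only CSI sequences (ESC followed by "["), and `visible_width` is a hypothetical helper, not the function ClickHouse uses.

```python
import re

# CSI sequences such as "\x1b[1m" or "\x1b[0;31m":
# ESC "[" parameter bytes, intermediate bytes, then one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def visible_width(text: str) -> int:
    """Number of characters left after removing CSI escape sequences."""
    return len(ANSI_CSI.sub("", text))


print(visible_width("\x1b[1mbold\x1b[0m"))         # 4
print(visible_width("\x1b[0;31mred\x1b[0m text"))  # 8
```

This sketch counts code points rather than terminal columns, so wide characters (for example CJK) would need additional handling that is out of scope here.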

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

  • Fix making backup when multiple shards are used. This PR fixes #56566. #57684 (Vitaly Baranov).
  • Fix passing projections/indexes from CREATE query into inner table of MV. #59183 (Azat Khuzhin).
  • Fix boundRatio incorrect merge. #60532 (Tao Wang).
  • Fix crash when using some functions with low-cardinality columns. #61966 (Michael Kolupaev).
  • Fix queries with FINAL giving wrong results when the table does not use adaptive granularity. #62432 (Duc Canh Le).
  • Improve the detection of cgroups v2 memory controller in unusual locations. This fixes a warning that the cgroup memory observer was disabled because no cgroups v1 or v2 current memory file could be found. #62903 (Robert Schulze).
  • Fix subsequent use of external tables in client. #62964 (Azat Khuzhin).
  • Fix crash with untuple and unresolved lambda. #63131 (Raúl Marín).
  • Fix a bug which could lead the server to accept connections before it is actually loaded. #63181 (alesapin).
  • Fix intersect parts when restart after drop range. #63202 (Han Fei).
  • Fix a misbehavior when SQL security defaults don't load for old tables during server startup. #63209 (pufit).
  • JOIN filter push down filled join fix. Closes #63228. #63234 (Maksim Kita).
  • Fix infinite loop while listing objects in Azure blob storage. #63257 (Julia Kartseva).
  • CROSS join can be executed with any value of the join_algorithm setting. Closes #62431. #63273 (vdimir).
  • Fixed a potential crash caused by a no space left error when temporary data in the cache is used. #63346 (vdimir).
  • Fix bug which could potentially lead to rare LOGICAL_ERROR during SELECT query with message: Unexpected return type from materialize. Expected type_XXX. Got type_YYY. Introduced in #59379. #63353 (alesapin).
  • Fix X-ClickHouse-Timezone header returning wrong timezone when using session_timezone as query level setting. #63377 (Andrey Zvonov).
  • Fix debug assert when using grouping WITH ROLLUP and LowCardinality types. #63398 (Raúl Marín).
  • Fix logical errors in queries with GROUPING SETS and WHERE and group_by_use_nulls = true, close #60538. #63405 (vdimir).
  • Fix backup of projection part in case projection was removed from table metadata, but part still has projection. #63426 (Kseniia Sumarokova).
  • Fix 'Every derived table must have its own alias' error for MYSQL dictionary source, close #63341. #63481 (vdimir).
  • Insert QueryFinish on AsyncInsertFlush with no data. #63483 (Raúl Marín).
  • Fix system.query_log.used_dictionaries logging. #63487 (Eduard Karacharov).
  • Avoid segfault in MergeTreePrefetchedReadPool while fetching projection parts. #63513 (Antonio Andelic).
  • Fix rabbitmq heap-use-after-free found by clang-18, which can happen if an error is thrown from RabbitMQ during initialization of exchange and queues. #63515 (Kseniia Sumarokova).
  • Fix crash on exit with sentry enabled (due to openssl destroyed before sentry). #63548 (Azat Khuzhin).
  • Fix support for Array and Map with Keyed hashing functions and materialized keys. #63628 (Salvatore Mesoraca).
  • Fixed Parquet filter pushdown not working with Analyzer. #63642 (Michael Kolupaev).
  • It is forbidden to convert MergeTree to replicated if the zookeeper path for this table already exists. #63670 (Kirill).
  • Read only the necessary columns from VIEW (new analyzer). Closes #62594. #63688 (Maksim Kita).
  • Fix rare case with missing data in the result of distributed query. #63691 (vdimir).
  • Fix #63539. Forbid WINDOW redefinition in new analyzer. #63694 (Dmitry Novik).
  • Flatten_nested is broken with replicated database. #63695 (Nikolai Kochetov).
  • Fix SIZES_OF_COLUMNS_DOESNT_MATCH error for queries with arrayJoin function in WHERE. Fixes #63653. #63722 (Nikolai Kochetov).
  • Fix Not found column and CAST AS Map from array requires nested tuple of 2 elements exceptions for distributed queries which use Map(Nothing, Nothing) type. Fixes #63637. #63753 (Nikolai Kochetov).
  • Fix possible ILLEGAL_COLUMN error in partial_merge join, close #37928. #63755 (vdimir).
  • query_plan_remove_redundant_distinct can break queries with WINDOW FUNCTIONS (when allow_experimental_analyzer is on). Fixes #62820. #63776 (Igor Nikonov).
  • Fix possible crash with SYSTEM UNLOAD PRIMARY KEY. #63778 (Raúl Marín).
  • Fix a query with a duplicating cycling alias. Fixes #63320. #63791 (Nikolai Kochetov).
  • Fixed performance degradation of parsing data formats in INSERT query. This closes #62918. This partially reverts #42284, which breaks the original design and introduces more problems. #63801 (Alexey Milovidov).
  • Add 'endpoint_subpath' S3 URI setting to allow plain_rewritable disks to share the same endpoint. #63806 (Julia Kartseva).
  • Fix queries using parallel read buffer (e.g. with max_download_thread > 0) getting stuck when threads cannot be allocated. #63814 (Antonio Andelic).
  • Allow JOIN filter push down to both streams if only single equivalent column is used in query. Closes #63799. #63819 (Maksim Kita).
  • Remove the data from all disks after DROP with the Lazy database engines. Without these changes, orphaned data will remain on the disks. #63848 (MikhailBurdukov).
  • Fix incorrect select query result when parallel replicas were used to read from a Materialized View. #63861 (Nikita Taranov).
  • Fixes in the find_super_nodes and find_big_family commands of keeper-client: do not fail on ZNONODE errors; find super nodes inside super nodes; properly calculate subtree node count. #63862 (Alexander Gololobov).
  • Fix an error Database name is empty for remote queries with lambdas over the cluster with modified default database. Fixes #63471. #63864 (Nikolai Kochetov).
  • Fix SIGSEGV due to CPU/Real (query_profiler_real_time_period_ns/query_profiler_cpu_time_period_ns) profiler (has been an issue since 2022, that leads to periodic server crashes, especially if you were using distributed engine). #63865 (Azat Khuzhin).
  • Fixed EXPLAIN CURRENT TRANSACTION query. #63926 (Anton Popov).
  • Fix analyzer - IN function with arbitrary deep sub-selects in materialized view to use insertion block. #63930 (Yakov Olkhovskiy).
  • Allow ALTER TABLE .. MODIFY|RESET SETTING and ALTER TABLE .. MODIFY COMMENT for plain_rewritable disk. #63933 (Julia Kartseva).
  • Fix Recursive CTE with distributed queries. Closes #63790. #63939 (Maksim Kita).
  • Fix resolve of unqualified COLUMNS matcher. Preserve the input columns order and forbid usage of unknown identifiers. #63962 (Dmitry Novik).
  • Fix the Not found column error for queries with skip_unused_shards = 1, LIMIT BY, and the new analyzer. Fixes #63943. #63983 (Nikolai Kochetov).
  • (Low-quality third-party Kusto Query Language). Resolve Client Abortion Issue When Using KQL Table Function in Interactive Mode. #63992 (Yong Wang).
  • Backported in #64356: Fix a Cyclic aliases error for cyclic aliases of different type (expression and function). Fixes #63205. #63993 (Nikolai Kochetov).
  • Deserialize untrusted binary inputs in a safer way. #64024 (Robert Schulze).
  • Do not throw Storage doesn't support FINAL error for remote queries over non-MergeTree tables with final = true and new analyzer. Fixes #63960. #64037 (Nikolai Kochetov).
  • Add missing settings to recoverLostReplica. #64040 (Raúl Marín).
  • Fix unwind on SIGSEGV on aarch64 (due to small stack for signal). #64058 (Azat Khuzhin).
  • Backported in #64324: This fix will use a proper redefined context with the correct definer for each individual view in the query pipeline. Closes #63777. #64079 (pufit).
  • Backported in #64384: Fix analyzer: "Not found column" error is fixed when using INTERPOLATE. #64096 (Yakov Olkhovskiy).
  • Fix azure backup writing multipart blocks as 1mb (read buffer size) instead of max_upload_part_size. #64117 (Kseniia Sumarokova).
  • Backported in #64541: Fix creating backups to S3 buckets with different credentials from the disk containing the file. #64153 (Antonio Andelic).
  • Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. #64174 (Raúl Marín).
  • Backported in #64332: The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. #64199 (Robert Schulze).
  • Ignore text_log config when using Keeper. #64218 (Antonio Andelic).
  • Backported in #64692: Fix Query Tree size validation. Closes #63701. #64377 (Dmitry Novik).
  • Backported in #64411: Fix Logical error: Bad cast for Buffer table with PREWHERE. Fixes #64172. #64388 (Nikolai Kochetov).
  • Backported in #64625: Fix an error Cannot find column in distributed queries with constant CTE in the GROUP BY key. #64519 (Nikolai Kochetov).
  • Backported in #64682: Fix #64612. Do not rewrite aggregation if -If combinator is already used. #64638 (Dmitry Novik).
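
The query cache fix above (identical queries against different databases are now treated as different) comes down to what goes into the cache key. The toy Python cache below illustrates why the database has to be part of the key. `ToyQueryCache` and its method are hypothetical and deliberately simplistic; how ClickHouse actually builds its cache key is not described in this changelog.

```python
from typing import Callable, Dict, Tuple


class ToyQueryCache:
    """Toy result cache keyed by (database, query text)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def get_or_compute(self, database: str, query: str,
                       compute: Callable[[], str]) -> str:
        # Including the database in the key keeps "SELECT * FROM t" in db_a
        # from being answered with a result cached for db_b.
        key = (database, query)
        if key not in self._entries:
            self._entries[key] = compute()
        return self._entries[key]


cache = ToyQueryCache()
query = "SELECT * FROM t"
from_a = cache.get_or_compute("db_a", query, lambda: "rows from db_a.t")
from_b = cache.get_or_compute("db_b", query, lambda: "rows from db_b.t")
again_a = cache.get_or_compute("db_a", query, lambda: "recomputed")

print(from_a)   # rows from db_a.t
print(from_b)   # rows from db_b.t
print(again_a)  # rows from db_a.t (served from the cache)
```

If the key were the query text alone, the second call would return the result cached for db_a, which is the kind of cross-database leak the entry describes.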

CI Fix or Improvement (changelog entry is not required)

Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT