ClickHouse/docs/changelogs/v24.4.1.2088-stable.md

sidebar_position: 1
sidebar_label: 2024

2024 Changelog

ClickHouse release v24.4.1.2088-stable (6d4b31322d) FIXME as compared to v24.3.1.2672-lts (2c5c589a88)

Backward Incompatible Change

  • Disallow setting max_parallel_replicas to 0, as it doesn't make sense. Setting it to 0 could lead to unexpected logical errors. Closes #60140. #61201 (Kruglov Pavel).
  • clickhouse-odbc-bridge and clickhouse-library-bridge are separate packages. This closes #61677. #62114 (Alexey Milovidov).
  • Remove support for INSERT WATCH query (part of the experimental LIVE VIEW feature). #62382 (Alexey Milovidov).
  • Remove optimize_monotonous_functions_in_order_by setting. #63004 (Raúl Marín).

New Feature

  • Support dropping multiple tables at the same time, e.g. DROP TABLE a, b, c;. #58705 (zhongyuankai).
  • Table engines are now grantable; this does not affect the behavior of existing users. #60117 (jsc0218).
  • Added a rewritable S3 disk which supports INSERT operations and does not require locally stored metadata. #61116 (Julia Kartseva).
  • For convenience, SELECT * FROM numbers() now works the same way as SELECT * FROM system.numbers, i.e. without a limit. #61969 (YenchangChan).
  • Modifying Memory table settings through ALTER MODIFY SETTING is now supported, e.g. ALTER TABLE memory MODIFY SETTING min_rows_to_keep = 100, max_rows_to_keep = 1000;. #62039 (zhongyuankai).
  • The analyzer now supports recursive CTEs. #62074 (Maksim Kita).
  • The analyzer now supports the QUALIFY clause. Closes #47819. #62619 (Maksim Kita).
  • Added role query parameter to the HTTP interface. It works similarly to SET ROLE x, applying the role before the statement is executed. This allows for overcoming the limitation of the HTTP interface, as multiple statements are not allowed, and it is not possible to send both SET ROLE x and the statement itself at the same time. It is possible to set multiple roles that way, e.g., ?role=x&role=y, which will be an equivalent of SET ROLE x, y. #62669 (Serge Klochkov).
  • Add SYSTEM UNLOAD PRIMARY KEY. #62738 (Pablo Marcos).
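
As a rough illustration of the role query parameter described above, the following Python sketch builds a request URL that repeats role once per role. The host, port, role names and query are placeholders chosen for the example; only the ?role=x&role=y shape comes from the entry itself.

```python
from urllib.parse import urlencode


def build_query_url(base_url, query, roles):
    """Build an HTTP-interface URL that repeats `role` once per role.

    `base_url`, `query` and `roles` are illustrative inputs, not values
    taken from the changelog.
    """
    # A list of pairs (rather than a dict) keeps the repeated `role` keys.
    params = [("role", role) for role in roles]
    params.append(("query", query))
    return f"{base_url}/?{urlencode(params)}"


# Hypothetical local server; roles x and y mirror the example in the entry.
url = build_query_url("http://localhost:8123", "SELECT 1", ["x", "y"])
print(url)
```

Passing the roles this way is meant to be equivalent to running SET ROLE x, y before the statement, which a single HTTP request could not otherwise express.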

Performance Improvement

  • Reduce overhead of the mutations for SELECTs (v2). #60856 (Azat Khuzhin).
  • More frequently invoked functions in PODArray are now force-inlined. #61144 (李扬).
  • JOIN filter push down improvements using equivalent sets. #61216 (Maksim Kita).
  • Enabled fast Parquet encoder by default (output_format_parquet_use_custom_encoder). #62088 (Michael Kolupaev).
  • ... When all required fields are read, skip all remaining fields directly, which can save a lot of comparisons. #62210 (lgbo).
  • Functions splitByChar and splitByRegexp were sped up significantly. #62392 (李扬).
  • Improve trivial insert select from files in file/s3/hdfs/url/... table functions. Add separate max_parsing_threads setting to control the number of threads used in parallel parsing. #62404 (Kruglov Pavel).
  • Support parallel write buffer for AzureBlobStorage managed by setting azure_allow_parallel_part_upload. #62534 (SmitaRKulkarni).
  • Functions to_utc_timestamp and from_utc_timestamp are now about 2x faster. #62583 (KevinyhZou).
  • Functions parseDateTimeOrNull, parseDateTimeOrZero, parseDateTimeInJodaSyntaxOrNull and parseDateTimeInJodaSyntaxOrZero now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values. #62634 (LiuNeng).
  • SELECTs against system.query_cache are now noticeably faster when the query cache contains lots of entries (e.g. more than 100,000). #62671 (Robert Schulze).
  • Add a QueryPlan optimization that converts OUTER JOIN to INNER JOIN if the filter after the JOIN always filters out default values. The optimization can be controlled with the setting query_plan_convert_outer_join_to_inner_join, which is enabled by default. #62907 (Maksim Kita).
  • Enable optimize_rewrite_sum_if_to_count_if by default. #62929 (Raúl Marín).
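
The OUTER JOIN to INNER JOIN conversion listed above rests on a simple observation, sketched here in Python as a toy model rather than ClickHouse's actual implementation. The sketch assumes that unmatched left rows are filled with a default value (0 here) and uses made-up sample data; if the filter applied after the join rejects that default, the rows contributed only by the outer part disappear and both joins give the same result.

```python
def left_join(left, right, default=0):
    """LEFT JOIN on the first field; unmatched left rows get `default`."""
    out = []
    for lid, lval in left:
        matches = [rval for rid, rval in right if rid == lid]
        if matches:
            out.extend((lid, lval, rval) for rval in matches)
        else:
            out.append((lid, lval, default))
    return out


def inner_join(left, right):
    """INNER JOIN on the first field."""
    return [
        (lid, lval, rval)
        for lid, lval in left
        for rid, rval in right
        if rid == lid
    ]


# Hypothetical sample data: id 2 has no match on the right side.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, 10), (3, 30), (3, 5)]


def keep(row):
    """Filter on the right-side value; it is always false for the default 0."""
    return row[2] > 7


outer_filtered = [row for row in left_join(left, right) if keep(row)]
inner_filtered = [row for row in inner_join(left, right) if keep(row)]
print(outer_filtered == inner_filtered)
```

A filter that accepts the default value (for example row[2] >= 0) would keep the unmatched row for id 2, so the rewrite would not be valid in that case.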

Improvement

  • Introduce separate consumer/producer tags for the Kafka configuration. This avoids warnings from librdkafka that consumer properties were specified for producer instances and vice versa (e.g. Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance). Closes: #58983. #58956 (Aleksandr Musorin).
  • Added value1, value2, ..., value10 columns to system.text_log. These columns contain values that were used to format the message. #59619 (Alexey Katsman).
  • Add a setting first_day_of_week which affects the first day of the week considered by functions toStartOfInterval(..., INTERVAL ... WEEK). This allows for consistency with function toStartOfWeek which defaults to Sunday as the first day of the week. #60598 (Jordi Villar).
  • Added persistent virtual column _block_offset, which stores the original row number within the block that was assigned at insert. Persistence of column _block_offset can be enabled by setting enable_block_offset_column. Added virtual column _part_data_version, which contains either the min block number or the mutation version of the part. Persistent virtual column _block_number is not considered experimental anymore. #60676 (Anton Popov).
  • Less contention in filesystem cache (part 3): execute removal from filesystem without lock on space reservation attempt. #61163 (Kseniia Sumarokova).
  • Functions date_diff and age now calculate their result at nanosecond instead of microsecond precision. They now also offer nanosecond (or nanoseconds or ns) as a possible value for the unit parameter. #61409 (Austin Kothig).
  • Now marks are not loaded for wide parts during merges. #61551 (Anton Popov).
  • Reload certificate chain during certificate reload. #61671 (Pervakov Grigorii).
  • Speed up dynamic resize of filesystem cache. #61723 (Kseniia Sumarokova).
  • Add TRUNCATE ALL TABLES. #61862 (豪肥肥).
  • Try to prevent #60432 by not allowing a table to be attached if there is an active replica for that replica path. #61876 (Arthur Passos).
  • Add a setting input_format_json_throw_on_bad_escape_sequence; disabling it allows saving bad escape sequences in JSON input formats. #61889 (Kruglov Pavel).
  • Userspace page cache works with static web storage (disk(type = web)) now. Use client setting use_page_cache_for_disks_without_file_cache=1 to enable. #61911 (Michael Kolupaev).
  • Implement input() for clickhouse-local. #61923 (Azat Khuzhin).
  • Fix logical-error when undoing quorum insert transaction. #61953 (Han Fei).
  • StorageJoin with strictness ANY is consistent after reload. When several rows with the same key are inserted, the first one will have higher priority (before, it was chosen randomly upon table loading). Closes #51027. #61972 (vdimir).
  • Automatically infer Nullable column types from Apache Arrow schema. #61984 (Maksim Kita).
  • Allow cancelling the parallel merge of aggregate states during aggregation. Example: uniqExact. #61992 (Maksim Kita).
  • Don't treat Bool and number variants as suspicious in Variant type. #61999 (Kruglov Pavel).
  • Use system.keywords to fill in the suggestions and also use them in all places internally. #62000 (Nikita Mikhaylov).
  • Implement better conversion from String to Variant using parsing. #62005 (Kruglov Pavel).
  • Support Variant in JSONExtract functions. #62014 (Kruglov Pavel).
  • Dictionary source with INVALIDATE_QUERY is not reloaded twice on startup. #62050 (vdimir).
  • OPTIMIZE FINAL for ReplicatedMergeTree now will wait for currently active merges to finish and then reattempt to schedule a final merge. This will put it more in line with ordinary MergeTree behaviour. #62067 (Nikita Taranov).
  • When reading data from a Hive text file, the first line of the file was used to determine the number of input fields, and this number sometimes does not match the number of columns defined in the Hive table. For example, if the Hive table is defined with 3 columns, like test_tbl(a Int32, b Int32, c Int32), but the first line of the text file has only 2 fields, the number of input fields was resized to 2. If the next line of the text file has 3 fields, the third field could not be read and was set to the default value 0, which is incorrect. #62086 (KevinyhZou).
  • CREATE AS copies the comment. #62117 (Pablo Marcos).
  • The syntax highlighting while typing in the client will work on the syntax level (previously, it worked on the lexer level). #62123 (Alexey Milovidov).
  • Fix an issue where the primary index is not used when a redundant = 1 or = 0 is added after a boolean expression involving the primary key. For example, both SELECT * FROM <table> WHERE <primary-key> IN (<value>) = 1 and SELECT * FROM <table> WHERE <primary-key> NOT IN (<value>) = 0 performed a full table scan, even though the primary index can be used. #62142 (josh-hildred).
  • Add query progress to table zookeeper. #62152 (JackyWoo).
  • Add ability to turn on trace collector (Real and CPU) server-wide. #62189 (alesapin).
  • Added setting lightweight_deletes_sync (default value: 2, i.e. wait for all replicas synchronously). It is similar to setting mutations_sync but affects only the behaviour of lightweight deletes. #62195 (Anton Popov).
  • Distinguish booleans and integers while parsing values for custom settings: SET custom_a = true; SET custom_b = 1;. #62206 (Vitaly Baranov).
  • Support S3 access through AWS Private Link Interface endpoints. Closes #60021, #31074 and #53761. #62208 (Arthur Passos).
  • The client has to send the header 'Keep-Alive: timeout=X' to the server. If the client receives a response from the server with that header, it has to use the value from the server. It is also better for the client not to use a connection that is nearly expired, in order to avoid a connection close race. #62249 (Sema Checherinda).
  • Added nanosecond, microsecond and millisecond units for date_trunc. #62335 (Misz606).
  • Do not create a directory for UDF in clickhouse-client if it does not exist. This closes #59597. #62366 (Alexey Milovidov).
  • The query cache now no longer caches results of queries against system tables (system.*, information_schema.*, INFORMATION_SCHEMA.*). #62376 (Robert Schulze).
  • MOVE PARTITION TO TABLE query can be delayed or can throw TOO_MANY_PARTS exception to avoid exceeding limits on the part count. The same settings and limits are applied as for the INSERT query (see max_parts_in_total, parts_to_delay_insert, parts_to_throw_insert, inactive_parts_to_throw_insert, inactive_parts_to_delay_insert, max_avg_part_size_for_too_many_parts, min_delay_to_insert_ms and max_delay_to_insert settings). #62420 (Sergei Trifonov).
  • Added the missing hostname column to system table blob_storage_log. #62456 (Jayme Bird).
  • Changed the default installation directory on macOS from /usr/bin to /usr/local/bin. This is necessary because Apple's System Integrity Protection introduced with macOS El Capitan (2015) prevents writing into /usr/bin, even with sudo. #62489 (haohang).
  • Make transform always return the first match. #62518 (Raúl Marín).
  • For consistency with other system tables, system.backup_log now has a column event_time. #62541 (Jayme Bird).
  • Avoid evaluating table DEFAULT expressions while executing RESTORE. #62601 (Vitaly Baranov).
  • Return a stream of chunks from system.remote_data_paths instead of accumulating the whole result in one big chunk. This consumes less memory, shows intermediate progress and allows the query to be cancelled. #62613 (Alexander Gololobov).
  • S3 storage and backups also need the same default keep alive settings as s3 disk. #62648 (Sema Checherinda).
  • Table system.backup_log now has the "default" sorting key which is event_date, event_time, the same as for other _log table engines. #62667 (Nikita Mikhaylov).
  • Mark type Variant as comparable so it can be used in primary key. #62693 (Kruglov Pavel).
  • Add librdkafka's client identifier to log messages to be able to differentiate log messages from different consumers of a single table. #62813 (János Benjamin Antal).
  • Allow special macros {uuid} and {database} in a Replicated database ZooKeeper path. #62818 (Vitaly Baranov).
  • Allow quota key with different auth scheme in HTTP requests. #62842 (Kseniia Sumarokova).
  • Remove experimental tag from Replicated database engine. Now it is in Beta stage. #62937 (Justin de Guzman).
  • Reduce the verbosity of command line argument --help in clickhouse client and clickhouse local. The previous output is now generated by --help --verbose. #62973 (Yarik Briukhovetskyi).
  • Close session if user's valid_until is reached. #63046 (Konstantin Bogdanov).
  • log_bin_use_v1_row_events was removed in MySQL 8.3, fix #60479. #63101 (Eugene Klimov).
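
The Keep-Alive entry in the list above describes a small client-side rule that can be sketched in Python. This is a hedged model of that rule only, not the ClickHouse client code: the function names and the one-second safety margin are hypothetical, introduced purely for illustration.

```python
import re


def parse_keep_alive_timeout(header_value):
    """Return the timeout in seconds from a Keep-Alive value, or None."""
    match = re.search(r"timeout\s*=\s*(\d+)", header_value)
    return int(match.group(1)) if match else None


def effective_timeout(client_timeout, server_header):
    """Prefer the server's timeout when the response carried the header."""
    if server_header is None:
        return client_timeout
    server_timeout = parse_keep_alive_timeout(server_header)
    return client_timeout if server_timeout is None else server_timeout


def can_reuse(idle_seconds, timeout, safety_margin=1.0):
    """Skip connections that are about to expire (margin is an assumption)."""
    return idle_seconds < timeout - safety_margin


# The client asked for 30 s, but the server answered with timeout=10.
timeout = effective_timeout(30, "timeout=10, max=100")
print(timeout, can_reuse(8.5, timeout), can_reuse(9.5, timeout))
```

Refusing connections inside the margin trades a few extra reconnects for a lower chance of sending a request on a socket the server is closing at the same moment.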

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

CI Fix or Improvement (changelog entry is not required)

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT