ClickHouse/docs/changelogs/v24.4.1.2088-stable.md
2024-05-30 16:58:36 +02:00

sidebar_position: 1
sidebar_label: 2024

2024 Changelog

ClickHouse release v24.4.1.2088-stable (6d4b31322d) FIXME as compared to v24.3.1.2672-lts (2c5c589a88)

Backward Incompatible Change

  • Disallow setting max_parallel_replicas to 0, as it doesn't make sense. Setting it to 0 could lead to unexpected logical errors. Closes #60140. #61201 (Kruglov Pavel).
  • clickhouse-odbc-bridge and clickhouse-library-bridge are separate packages. This closes #61677. #62114 (Alexey Milovidov).
  • Remove support for INSERT WATCH query (part of the experimental LIVE VIEW feature). #62382 (Alexey Milovidov).
  • Remove optimize_monotonous_functions_in_order_by setting. #63004 (Raúl Marín).

New Feature

  • Support dropping multiple tables at the same time, e.g. DROP TABLE a, b, c;. #58705 (zhongyuankai).
  • Table engine is grantable now, and it won't affect existing users' behavior. #60117 (jsc0218).
  • Added a rewritable S3 disk which supports INSERT operations and does not require locally stored metadata. #61116 (Julia Kartseva).
  • For convenience, SELECT * FROM numbers() now works in the same way as SELECT * FROM system.numbers, without a limit. #61969 (YenchangChan).
  • Modifying memory table settings through ALTER MODIFY SETTING is now supported. ALTER TABLE memory MODIFY SETTING min_rows_to_keep = 100, max_rows_to_keep = 1000;. #62039 (zhongyuankai).
  • The analyzer now supports recursive CTEs. #62074 (Maksim Kita).
  • The analyzer now supports the QUALIFY clause. Closes #47819. #62619 (Maksim Kita).
  • Added role query parameter to the HTTP interface. It works similarly to SET ROLE x, applying the role before the statement is executed. This allows for overcoming the limitation of the HTTP interface, as multiple statements are not allowed, and it is not possible to send both SET ROLE x and the statement itself at the same time. It is possible to set multiple roles that way, e.g., ?role=x&role=y, which will be an equivalent of SET ROLE x, y. #62669 (Serge Klochkov).
  • Add SYSTEM UNLOAD PRIMARY KEY. #62738 (Pablo Marcos).
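
As an illustration of the role query parameter described above (#62669), here is a minimal client-side sketch of how a request URL with several roles could be assembled. The helper name build_query_url, the host and the sample query are assumptions made for this example; only the repeated role parameter (?role=x&role=y) comes from the entry itself.

```python
from urllib.parse import urlencode

def build_query_url(base_url, query, roles):
    """Build an HTTP-interface URL that applies the given roles.

    Hypothetical helper: each role becomes its own `role` parameter,
    which the changelog describes as equivalent to SET ROLE x, y.
    """
    params = [("role", role) for role in roles]
    params.append(("query", query))
    return f"{base_url}/?{urlencode(params)}"

# Assumed endpoint: a local server on the default HTTP port 8123.
url = build_query_url("http://localhost:8123", "SELECT 1", ["x", "y"])
print(url)
```

Repeating the parameter keeps each request self-contained, which matters because the HTTP interface does not accept multiple statements in one request.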

Performance Improvement

  • Reduce overhead of the mutations for SELECTs (v2). #60856 (Azat Khuzhin).
  • More frequently invoked functions in PODArray are now force-inlined. #61144 (李扬).
  • JOIN filter push down improvements using equivalent sets. #61216 (Maksim Kita).
  • Enabled fast Parquet encoder by default (output_format_parquet_use_custom_encoder). #62088 (Michael Kolupaev).
  • ... When all required fields are read, skip all remaining fields directly, which can save many comparisons. #62210 (lgbo).
  • Functions splitByChar and splitByRegexp were sped up significantly. #62392 (李扬).
  • Improve trivial insert select from files in file/s3/hdfs/url/... table functions. Add separate max_parsing_threads setting to control the number of threads used in parallel parsing. #62404 (Kruglov Pavel).
  • Support parallel write buffer for AzureBlobStorage managed by setting azure_allow_parallel_part_upload. #62534 (SmitaRKulkarni).
  • Functions to_utc_timestamp and from_utc_timestamp are now about 2x faster. #62583 (KevinyhZou).
  • Functions parseDateTimeOrNull, parseDateTimeOrZero, parseDateTimeInJodaSyntaxOrNull and parseDateTimeInJodaSyntaxOrZero now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values. #62634 (LiuNeng).
  • SELECTs against system.query_cache are now noticeably faster when the query cache contains lots of entries (e.g. more than 100.000). #62671 (Robert Schulze).
  • Add a QueryPlan optimization that converts OUTER JOIN to INNER JOIN if the filter after the JOIN always filters out default values. The optimization is controlled by setting query_plan_convert_outer_join_to_inner_join, enabled by default. #62907 (Maksim Kita).
  • Enable optimize_rewrite_sum_if_to_count_if by default. #62929 (Raúl Marín).
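
To see why the OUTER JOIN to INNER JOIN conversion above (#62907) is safe, the following toy model may help. It is not ClickHouse code: rows are plain tuples, unmatched right-side values are filled with a default of 0 (assuming the default join_use_nulls = 0 behaviour), and the filter r_val > 0 stands in for any filter that always rejects the default value. All function names are hypothetical.

```python
def inner_join(left, right):
    """Pair every left row with every right row that has the same id."""
    return [(l_id, l_val, r_val)
            for l_id, l_val in left
            for r_id, r_val in right
            if l_id == r_id]

def left_join(left, right, default=0):
    """Like inner_join, but unmatched left rows get a default right value."""
    rows = []
    for l_id, l_val in left:
        matches = [r_val for r_id, r_val in right if r_id == l_id]
        if matches:
            rows.extend((l_id, l_val, r_val) for r_val in matches)
        else:
            rows.append((l_id, l_val, default))
    return rows

def keeps(row):
    # A filter that is always false for the default value 0.
    return row[2] > 0

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, 10), (3, 0), (3, 7)]

outer_then_filter = [row for row in left_join(left, right) if keeps(row)]
inner_then_filter = [row for row in inner_join(left, right) if keeps(row)]
print(outer_then_filter == inner_then_filter)
```

The only rows a LEFT JOIN adds over an INNER JOIN are the default-filled ones, so a filter that discards every default-filled row makes the two results identical.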

Improvement

  • Introduce separate consumer/producer tags for the Kafka configuration. This avoids warnings from librdkafka that consumer properties were specified for producer instances and vice versa (e.g. Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance). Closes: #58983. #58956 (Aleksandr Musorin).
  • Added value1, value2, ..., value10 columns to system.text_log. These columns contain values that were used to format the message. #59619 (Alexey Katsman).
  • Add a setting first_day_of_week which affects the first day of the week considered by functions toStartOfInterval(..., INTERVAL ... WEEK). This allows for consistency with function toStartOfWeek which defaults to Sunday as the first day of the week. #60598 (Jordi Villar).
  • Added persistent virtual column _block_offset, which stores the original row number in the block that was assigned at insert. Persistence of column _block_offset can be enabled by setting enable_block_offset_column. Added virtual column _part_data_version, which contains either the min block number or the mutation version of the part. Persistent virtual column _block_number is not considered experimental anymore. #60676 (Anton Popov).
  • Less contention in filesystem cache (part 3): execute removal from filesystem without lock on space reservation attempt. #61163 (Kseniia Sumarokova).
  • Functions date_diff and age now calculate their result at nanosecond instead of microsecond precision. They now also offer nanosecond (or nanoseconds or ns) as a possible value for the unit parameter. #61409 (Austin Kothig).
  • Now marks are not loaded for wide parts during merges. #61551 (Anton Popov).
  • Reload certificate chain during certificate reload. #61671 (Pervakov Grigorii).
  • Speed up dynamic resize of filesystem cache. #61723 (Kseniia Sumarokova).
  • Add TRUNCATE ALL TABLES. #61862 (豪肥肥).
  • Try to prevent #60432 by not allowing a table to be attached if there is an active replica for that replica path. #61876 (Arthur Passos).
  • Add a setting input_format_json_throw_on_bad_escape_sequence, disabling it allows saving bad escape sequences in JSON input formats. #61889 (Kruglov Pavel).
  • Userspace page cache works with static web storage (disk(type = web)) now. Use client setting use_page_cache_for_disks_without_file_cache=1 to enable. #61911 (Michael Kolupaev).
  • Implement input() for clickhouse-local. #61923 (Azat Khuzhin).
  • Fix logical-error when undoing quorum insert transaction. #61953 (Han Fei).
  • StorageJoin with strictness ANY is consistent after reload. When several rows with the same key are inserted, the first one will have higher priority (before, it was chosen randomly upon table loading). close #51027. #61972 (vdimir).
  • Automatically infer Nullable column types from Apache Arrow schema. #61984 (Maksim Kita).
  • Allow to cancel parallel merge of aggregate states during aggregation. Example: uniqExact. #61992 (Maksim Kita).
  • Don't treat Bool and number variants as suspicious in Variant type. #61999 (Kruglov Pavel).
  • Use system.keywords to fill in the suggestions and also use them in all places internally. #62000 (Nikita Mikhaylov).
  • Implement better conversion from String to Variant using parsing. #62005 (Kruglov Pavel).
  • Support Variant in JSONExtract functions. #62014 (Kruglov Pavel).
  • Dictionary source with INVALIDATE_QUERY is not reloaded twice on startup. #62050 (vdimir).
  • OPTIMIZE FINAL for ReplicatedMergeTree now will wait for currently active merges to finish and then reattempt to schedule a final merge. This will put it more in line with ordinary MergeTree behaviour. #62067 (Nikita Taranov).
  • When reading data from a Hive text file, the first line of the file was used to determine the number of input fields, and sometimes the number of fields in the first line does not match the Hive table definition. For example, if the Hive table is defined with 3 columns, like test_tbl(a Int32, b Int32, c Int32), but the first line of the text file has only 2 fields, the number of input fields was resized to 2; if the next line had 3 fields, the third field could not be read and was set to the default value 0, which is incorrect. #62086 (KevinyhZou).
  • CREATE AS copies the comment. #62117 (Pablo Marcos).
  • The syntax highlighting while typing in the client will work on the syntax level (previously, it worked on the lexer level). #62123 (Alexey Milovidov).
  • Fix an issue where the primary index was not used when a redundant = 1 or = 0 was added after a boolean expression involving the primary key. For example, both SELECT * FROM <table> WHERE <primary-key> IN (<value>) = 1 and SELECT * FROM <table> WHERE <primary-key> NOT IN (<value>) = 0 performed a full table scan even though the primary index could be used. #62142 (josh-hildred).
  • Add query progress to table zookeeper. #62152 (JackyWoo).
  • Add ability to turn on trace collector (Real and CPU) server-wide. #62189 (alesapin).
  • Added setting lightweight_deletes_sync (default value: 2, wait for all replicas synchronously). It is similar to setting mutations_sync but affects only the behaviour of lightweight deletes. #62195 (Anton Popov).
  • Distinguish booleans and integers while parsing values for custom settings: SET custom_a = true; SET custom_b = 1;. #62206 (Vitaly Baranov).
  • Support S3 access through AWS Private Link Interface endpoints. Closes #60021, #31074 and #53761. #62208 (Arthur Passos).
  • Client has to send header 'Keep-Alive: timeout=X' to the server. If a client receives a response from the server with that header, client has to use the value from the server. Also for a client it is better not to use a connection which is nearly expired in order to avoid connection close race. #62249 (Sema Checherinda).
  • Added nanosecond, microsecond and millisecond units for date_trunc. #62335 (Misz606).
  • Do not create a directory for UDF in clickhouse-client if it does not exist. This closes #59597. #62366 (Alexey Milovidov).
  • The query cache now no longer caches results of queries against system tables (system.*, information_schema.*, INFORMATION_SCHEMA.*). #62376 (Robert Schulze).
  • MOVE PARTITION TO TABLE query can be delayed or can throw TOO_MANY_PARTS exception to avoid exceeding limits on the part count. The same settings and limits are applied as for the INSERT query (see max_parts_in_total, parts_to_delay_insert, parts_to_throw_insert, inactive_parts_to_throw_insert, inactive_parts_to_delay_insert, max_avg_part_size_for_too_many_parts, min_delay_to_insert_ms and max_delay_to_insert settings). #62420 (Sergei Trifonov).
  • Added the missing hostname column to system table blob_storage_log. #62456 (Jayme Bird).
  • Changed the default installation directory on macOS from /usr/bin to /usr/local/bin. This is necessary because Apple's System Integrity Protection introduced with macOS El Capitan (2015) prevents writing into /usr/bin, even with sudo. #62489 (haohang).
  • Make transform always return the first match. #62518 (Raúl Marín).
  • For consistency with other system tables, system.backup_log now has a column event_time. #62541 (Jayme Bird).
  • Avoid evaluating table DEFAULT expressions while executing RESTORE. #62601 (Vitaly Baranov).
  • Return a stream of chunks from system.remote_data_paths instead of accumulating the whole result in one big chunk. This consumes less memory, shows intermediate progress and makes it possible to cancel the query. #62613 (Alexander Gololobov).
  • S3 storage and backups also need the same default keep alive settings as s3 disk. #62648 (Sema Checherinda).
  • Table system.backup_log now has the "default" sorting key which is event_date, event_time, the same as for other _log table engines. #62667 (Nikita Mikhaylov).
  • Mark type Variant as comparable so it can be used in primary key. #62693 (Kruglov Pavel).
  • Add librdkafka's client identifier to log messages to be able to differentiate log messages from different consumers of a single table. #62813 (János Benjamin Antal).
  • Allow special macros {uuid} and {database} in a Replicated database ZooKeeper path. #62818 (Vitaly Baranov).
  • Allow quota key with different auth scheme in HTTP requests. #62842 (Kseniia Sumarokova).
  • Remove experimental tag from Replicated database engine. Now it is in Beta stage. #62937 (Justin de Guzman).
  • Reduce the verbosity of command line argument --help in clickhouse client and clickhouse local. The previous output is now generated by --help --verbose. #62973 (Yarik Briukhovetskyi).
  • Close session if user's valid_until is reached. #63046 (Konstantin Bogdanov).
  • log_bin_use_v1_row_events was removed in MySQL 8.3, fix #60479. #63101 (Eugene Klimov).
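
The Keep-Alive entry above (#62249) describes two client-side rules: prefer the timeout announced by the server, and avoid reusing a connection that is close to expiring. The sketch below illustrates that logic only; the function names and the one-second safety margin are assumptions for this example and are not taken from the ClickHouse implementation.

```python
import re

def effective_timeout(client_timeout, server_header):
    """Return the keep-alive timeout the client should honour, in seconds.

    If the server's Keep-Alive header value (e.g. "timeout=10, max=100")
    carries a timeout, it takes precedence over the client's own value.
    """
    if server_header:
        match = re.search(r"timeout\s*=\s*(\d+)", server_header)
        if match:
            return int(match.group(1))
    return client_timeout

def is_reusable(idle_seconds, timeout, margin=1.0):
    """Treat a connection as reusable only if it is not nearly expired.

    `margin` is a hypothetical safety window that avoids racing with the
    server closing the connection.
    """
    return idle_seconds < timeout - margin

timeout = effective_timeout(30, "timeout=10, max=100")
print(timeout, is_reusable(5, timeout), is_reusable(9.5, timeout))
```

Dropping a connection slightly early costs one extra handshake, whereas sending a request on a connection the server is about to close can fail the request outright.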

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

  • Fix parser error when using COUNT(*) with FILTER clause. #61357 (Duc Canh Le).
  • Fix logical error 'Unexpected return type from materialize. Expected Nullable. Got UInt8' while using group_by_use_nulls with analyzer and materialize/constant in grouping set. Closes #61531. #61567 (Kruglov Pavel).
  • Fix data race between MOVE PARTITION query and merges resulting in intersecting parts. #61610 (János Benjamin Antal).
  • TBD. #61720 (Kruglov Pavel).
  • Search for MergeTree to ReplicatedMergeTree conversion flag at the correct location for tables with custom storage policy. #61769 (Kirill).
  • Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch that leads to crashes. #61867 (Azat Khuzhin).
  • Fix skipping escape sequence parsing errors during JSON data parsing while using input_format_allow_errors_num/ratio settings. #61883 (Kruglov Pavel).
  • Fix writing exception message in output format in HTTP when http_wait_end_of_query is used. Closes #55101. #61951 (Kruglov Pavel).
  • This PR reverts https://github.com/ClickHouse/ClickHouse/pull/61617 and fixes the problem with usage of LowCardinality columns together with JSONExtract function. Previously the user could receive either an incorrect result or a logical error. #61957 (Nikita Mikhaylov).
  • Fix crash in the Merge engine if a row policy does not have an expression. #61971 (Ilya Golshtein).
  • Implemented preFinalize, updated finalizeImpl & destructor of WriteBufferAzureBlobStorage to avoid having an uncaught exception in the destructor. #61988 (SmitaRKulkarni).
  • Fix CREATE TABLE w/o columns definition for ReplicatedMergeTree (columns will be obtained from replica). #62040 (Azat Khuzhin).
  • Fix optimize_skip_unused_shards_rewrite_in for composite sharding key (could lead to NOT_FOUND_COLUMN_IN_BLOCK and TYPE_MISMATCH). #62047 (Azat Khuzhin).
  • ReadWriteBufferFromHTTP now sets the correct Host header when redirected. #62068 (Sema Checherinda).
  • Fix external tables being unable to parse data type Bool. #62115 (Duc Canh Le).
  • Revert "Merge pull request #61564 from liuneng1994/optimize_in_single_value". The feature is broken and can't be disabled individually. #62135 (Raúl Marín).
  • Fix override of MergeTree virtual columns. #62180 (Raúl Marín).
  • Fix query parameter resolution with allow_experimental_analyzer enabled. Closes #62113. #62186 (Dmitry Novik).
  • This PR makes RESTORE ON CLUSTER wait for each ReplicatedMergeTree table to stop being readonly before attaching any restored parts to it. Earlier it didn't wait and it could try to attach some parts at nearly the same time as checking other replicas during the table's startup. In rare cases some parts could be not attached at all during RESTORE ON CLUSTER because of that issue. #62207 (Vitaly Baranov).
  • Fix crash on CREATE TABLE with INDEX containing SQL UDF in expression, close #62134. #62225 (vdimir).
  • Fix generateRandom with NULL in the seed argument. Fixes #62092. #62248 (Nikolai Kochetov).
  • Fix buffer overflow when DISTINCT is used with constant values. #62250 (Antonio Andelic).
  • When some index columns are not loaded into memory for some parts of a *MergeTree table, queries with FINAL might produce wrong results. Now we explicitly choose only the common prefix of index columns for all parts to avoid this issue. #62268 (Nikita Taranov).
  • Fix inability to address parametrized view in SELECT queries via aliases. #62274 (Dmitry Novik).
  • Fix name resolution in case when identifier is resolved to an executed scalar subquery. #62281 (Dmitry Novik).
  • Fix argMax with nullable non native numeric column. #62285 (Raúl Marín).
  • Fix BACKUP and RESTORE of a materialized view in Ordinary database. #62295 (Vitaly Baranov).
  • Fix data race on scalars in Context. #62305 (Kruglov Pavel).
  • Fix displaying of materialized_view primary_key in system.tables. Previously it was shown empty even when a CREATE query included PRIMARY KEY. #62319 (Murat Khairulin).
  • Do not build a multithreaded insert pipeline for engines without max_insert_threads support. Fix inserted rows order in queries like INSERT INTO FUNCTION file/s3(...) SELECT * FROM <table> ORDER BY col. #62333 (vdimir).
  • Resolve positional arguments only on the initiator node. Closes #62289. #62362 (flynn).
  • Fix filter pushdown from additional_table_filters in Merge engine in analyzer. Closes #62229. #62398 (Kruglov Pavel).
  • Fix Unknown expression or table expression identifier error for GLOBAL IN table queries (with new analyzer). Fixes #62286. #62409 (Nikolai Kochetov).
  • Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write. Closes #61492. #62425 (Kruglov Pavel).
  • Fix backup restore path for AzureBlobStorage to include specified blob path. #62447 (SmitaRKulkarni).
  • Fixed rare bug in SimpleSquashingChunksTransform that may lead to a loss of the last chunk of data in a stream. #62451 (Nikita Taranov).
  • Fix excessive memory usage for queries with nested lambdas. Fixes #62036. #62462 (Nikolai Kochetov).
  • Fix validation of special columns (ver, is_deleted, sign) in MergeTree engines on table creation and alter queries. Fixes #62463. #62498 (János Benjamin Antal).
  • Avoid crash when reading protobuf with recursive types. #62506 (Raúl Marín).
  • Fix #62459. #62524 (helifu).
  • Fix an error LIMIT expression must be constant in queries with constant expression in LIMIT/OFFSET which contains scalar subquery. Fixes #62294. #62567 (Nikolai Kochetov).
  • Fix segmentation fault when using Hive table engine. Reference #62154, #62560. #62578 (Nikolay Degterinsky).
  • Fix memory leak in groupArraySorted. Fix #62536. #62597 (Antonio Andelic).
  • Fix crash in largestTriangleThreeBuckets. #62646 (Raúl Marín).
  • Fix tumble[Start,End] and hop[Start,End] functions for resolutions bigger than a day. #62705 (Jordi Villar).
  • Fix argMin/argMax combinator state. #62708 (Raúl Marín).
  • Fix temporary data in cache failing because of a small value of setting filesystem_cache_reserve_space_wait_lock_timeout_milliseconds. Introduced a separate setting temporary_data_in_cache_reserve_space_wait_lock_timeout_milliseconds. #62715 (Kseniia Sumarokova).
  • Fixed crash in table function mergeTreeIndex after offloading some of the columns from suffix of primary key. #62762 (Anton Popov).
  • Fix size checks when updating materialized nested columns ( fixes #62731 ). #62773 (Eliot Hautefeuille).
  • Fix an error when FINAL is not applied when specified in CTE (new analyzer). Fixes #62779. #62811 (Duc Canh Le).
  • Fixed crash in function formatRow with JSON format in queries executed via the HTTP interface. #62840 (Anton Popov).
  • Fix failure to start when storage account URL has trailing slash. #62850 (Daniel Pozo Escalona).
  • Fixed bug in GCD codec implementation that may lead to server crashes. #62853 (Nikita Taranov).
  • Fix incorrect key analysis when LowCardinality(Nullable) keys appear in the middle of a hyperrectangle. This fixes #62848. #62866 (Amos Bird).
  • When function fromUnixTimestampInJodaSyntax was used to convert an Int64 or UInt64 input value to DateTime, it sometimes returned a wrong result, because the function first converted the input to UInt32 and the input value may exceed the maximum value of the UInt32 type. For example, for a table test_tbl(a Int64, b UInt64) with a row (10262736196, 10262736196), converting the values with fromUnixTimestampInJodaSyntax gave a wrong result. #62901 (KevinyhZou).
  • Disable optimize_rewrite_aggregate_function_with_if for sum(nullable). #62912 (Raúl Marín).
  • Fix the Unexpected return type error for queries that read from StorageBuffer with PREWHERE when the source table has different types. Fixes #62545. #62916 (Nikolai Kochetov).
  • Fix temporary data in cache incorrect behaviour in case creation of cache key base directory fails with no space left on device. #62925 (Kseniia Sumarokova).
  • Fixed server crash on IPv6 gRPC client connection. #62978 (Konstantin Bogdanov).
  • Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches. #62987 (Azat Khuzhin).
  • Fix terminate with uncaught exception in temporary data in cache. #62998 (Kseniia Sumarokova).
  • Fix optimize_rewrite_aggregate_function_with_if implicit cast. #62999 (Raúl Marín).
  • Fix possible crash after unsuccessful RESTORE. This PR fixes #62985. #63040 (Vitaly Baranov).
  • Fix Not found column in block error for distributed queries with server-side constants in GROUP BY key. Fixes #62682. #63047 (Nikolai Kochetov).
  • Fix incorrect judgement of monotonicity of function abs. #63097 (Duc Canh Le).
  • Sanity check: Clamp values instead of throwing. #63119 (Raúl Marín).
  • Setting server_name might help with recently reported SSL handshake error when connecting to MongoDB Atlas: Poco::Exception. Code: 1000, e.code() = 0, SSL Exception: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR. #63122 (Alexander Gololobov).
  • The wire protocol version check for MongoDB used to try accessing "config" database, but this can fail if the user doesn't have permissions for it. The fix is to use the database name provided by user. #63126 (Alexander Gololobov).
  • Fix a bug when SQL SECURITY statement appears in all CREATE queries if the server setting ignore_empty_sql_security_in_create_view_query=true https://github.com/ClickHouse/ClickHouse/pull/63134. #63136 (pufit).
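
The fromUnixTimestampInJodaSyntax entry above (#62901) can be made concrete with the value from its example. The sketch below assumes that the narrowing to UInt32 keeps only the low 32 bits, as an unsigned integer cast typically does; the changelog does not state the exact mechanism, so treat this as an illustration of why the result was wrong rather than a reproduction of the old behaviour.

```python
from datetime import datetime, timezone

UINT32_MAX = 2**32 - 1

def truncate_to_uint32(value):
    """Keep only the low 32 bits (assumed narrowing behaviour)."""
    return value & UINT32_MAX

original = 10262736196                    # value from the changelog example
truncated = truncate_to_uint32(original)  # 10262736196 - 2 * 2**32 = 1672801604

# Interpret both values as Unix timestamps in UTC.
expected = datetime.fromtimestamp(original, tz=timezone.utc)
wrong = datetime.fromtimestamp(truncated, tz=timezone.utc)
print(original > UINT32_MAX, expected.year, wrong.year)
```

Under this assumption the two timestamps land centuries apart, which is the kind of silent error the fix removes.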

CI Fix or Improvement (changelog entry is not required)

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT