Table of Contents
ClickHouse release v24.4, 2024-04-30
ClickHouse release v24.3 LTS, 2024-03-26
ClickHouse release v24.2, 2024-02-29
ClickHouse release v24.1, 2024-01-30
Changelog for 2023
2024 Changelog
ClickHouse release 24.4, 2024-04-30
Upgrade Notes
- `clickhouse-odbc-bridge` and `clickhouse-library-bridge` are now separate packages. This closes #61677. #62114 (Alexey Milovidov).
- Don't allow setting `max_parallel_replicas` (for the experimental parallel reading from replicas) to `0`, as it doesn't make sense. Closes #60140. #61201 (Kruglov Pavel).
- Remove support for the `INSERT WATCH` query (part of the deprecated `LIVE VIEW` feature). #62382 (Alexey Milovidov).
- Removed the `optimize_monotonous_functions_in_order_by` setting. #63004 (Raúl Marín).
- Remove experimental tag from the `Replicated` database engine. Now it is in Beta stage. #62937 (Justin de Guzman).
New Feature
- Support recursive CTEs. #62074 (Maksim Kita).
- Support `QUALIFY` clause. Closes #47819. #62619 (Maksim Kita).
- Table engines are grantable now, and it won't affect existing users' behavior. #60117 (jsc0218).
- Added a rewritable S3 disk which supports INSERT operations and does not require locally stored metadata. #61116 (Julia Kartseva). The main use case is for system tables.
- The syntax highlighting while typing in the client will work on the syntax level (previously, it worked on the lexer level). #62123 (Alexey Milovidov).
- Support dropping multiple tables at the same time, like `DROP TABLE a, b, c;`. #58705 (zhongyuankai).
- Modifying memory table settings through `ALTER MODIFY SETTING` is now supported. Example: `ALTER TABLE memory MODIFY SETTING min_rows_to_keep = 100, max_rows_to_keep = 1000;`. #62039 (zhongyuankai).
- Added the `role` query parameter to the HTTP interface. It works similarly to `SET ROLE x`, applying the role before the statement is executed. This overcomes a limitation of the HTTP interface: multiple statements are not allowed, so it is not possible to send both `SET ROLE x` and the statement itself at the same time. Multiple roles can be set this way, e.g., `?role=x&role=y`, which is equivalent to `SET ROLE x, y`. #62669 (Serge Klochkov).
- Add `SYSTEM UNLOAD PRIMARY KEY` to free up the memory used by a table's primary key. #62738 (Pablo Marcos).
- Added `value1`, `value2`, ..., `value10` columns to `system.text_log`. These columns contain the values that were used to format the message. #59619 (Alexey Katsman).
- Added persistent virtual column `_block_offset`, which stores the original row number in the block that was assigned at insert. Persistence of column `_block_offset` can be enabled by the MergeTree setting `enable_block_offset_column`. Added virtual column `_part_data_version`, which contains either the minimum block number or the mutation version of the part. Persistent virtual column `_block_number` is not considered experimental anymore. #60676 (Anton Popov).
- Add a setting `input_format_json_throw_on_bad_escape_sequence`; disabling it allows saving bad escape sequences in JSON input formats. #61889 (Kruglov Pavel).
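Because the `role` parameter can be repeated, a client only has to append one `role` key per role to the request URL. Below is a minimal Python sketch of building such a URL with the standard library. The host and port (`localhost:8123`), the use of the `query` URL parameter to carry the statement, and the helper name `build_query_url` are assumptions made for illustration. Only the repeated `role` parameter comes from the entry above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url: str, query: str, roles: list[str]) -> str:
    """Build an HTTP interface URL that applies `roles` before `query` runs.

    Repeating the `role` key (?role=x&role=y) is meant to be the
    per-request equivalent of `SET ROLE x, y`.
    """
    params = [("query", query)] + [("role", r) for r in roles]
    return f"{base_url}/?{urlencode(params)}"


url = build_query_url("http://localhost:8123", "SELECT currentRoles()", ["x", "y"])
print(url)
# Both values of the repeated key survive, in order.
print(parse_qs(urlsplit(url).query)["role"])  # ['x', 'y']
```

Passing a list of pairs to `urlencode` instead of a dict keeps both `role` entries. A dict would collapse them into a single key.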
Performance Improvement
- JOIN filter push down improvements using equivalent sets. #61216 (Maksim Kita).
- Convert OUTER JOIN to INNER JOIN optimization if the filter after JOIN always filters default values. Optimization can be controlled with setting `query_plan_convert_outer_join_to_inner_join`, enabled by default. #62907 (Maksim Kita).
- Improvement for AWS S3. The client has to send the header `Keep-Alive: timeout=X` to the server. If the client receives a response from the server with that header, it has to use the value from the server. It is also better for the client not to use a connection that is nearly expired, to avoid a connection close race. #62249 (Sema Checherinda).
- Reduce overhead of the mutations for SELECTs (v2). #60856 (Azat Khuzhin).
- More frequently invoked functions in PODArray are now force-inlined. #61144 (李扬).
- Speed up parsing of JSON by skipping the rest of the object when all required columns are read. #62210 (lgbo).
- Improve trivial insert select from files in file/s3/hdfs/url/... table functions. Add separate max_parsing_threads setting to control the number of threads used in parallel parsing. #62404 (Kruglov Pavel).
- Functions `to_utc_timestamp` and `from_utc_timestamp` are now about 2x faster. #62583 (KevinyhZou).
- Functions `parseDateTimeOrNull`, `parseDateTimeOrZero`, `parseDateTimeInJodaSyntaxOrNull` and `parseDateTimeInJodaSyntaxOrZero` now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values. #62634 (LiuNeng).
- SELECTs against `system.query_cache` are now noticeably faster when the query cache contains lots of entries (e.g. more than 100.000). #62671 (Robert Schulze).
- Less contention in filesystem cache (part 3): execute removal from filesystem without lock on space reservation attempt. #61163 (Kseniia Sumarokova).
- Speed up dynamic resize of filesystem cache. #61723 (Kseniia Sumarokova).
- Dictionary source with `INVALIDATE_QUERY` is not reloaded twice on startup. #62050 (vdimir).
- Fix an issue where the primary index was not used when a redundant `= 1` or `= 0` was added after a boolean expression involving the primary key. For example, both `SELECT * FROM <table> WHERE <primary-key> IN (<value>) = 1` and `SELECT * FROM <table> WHERE <primary-key> NOT IN (<value>) = 0` performed a full table scan, although the primary index could be used. #62142 (josh-hildred).
- Return a stream of chunks from `system.remote_data_paths` instead of accumulating the whole result in one big chunk. This reduces memory consumption, shows intermediate progress, and allows the query to be cancelled. #62613 (Alexander Gololobov).
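The AWS S3 keep-alive entry describes a client-side rule. Parse the `timeout` value from a `Keep-Alive` response header, prefer the server's value over the client's own default, and skip connections that are close to expiry. Below is a hedged Python sketch of that rule. The helper names, the 10-second default and the 1-second safety margin are illustrative assumptions, not ClickHouse's actual implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


def parse_keep_alive_timeout(header_value: str) -> Optional[int]:
    """Extract N from a header value such as 'timeout=N, max=100'."""
    for part in header_value.split(","):
        key, _, value = part.strip().partition("=")
        if key.lower() == "timeout" and value.strip().isdigit():
            return int(value.strip())
    return None


@dataclass
class PooledConnection:
    timeout_s: float = 10.0  # hypothetical client default
    last_used: float = field(default_factory=time.monotonic)

    def on_response(self, headers: dict[str, str]) -> None:
        # If the server announced its own keep-alive timeout, use it instead.
        # (Exact-case header lookup keeps the sketch short.)
        server_timeout = parse_keep_alive_timeout(headers.get("Keep-Alive", ""))
        if server_timeout is not None:
            self.timeout_s = server_timeout
        self.last_used = time.monotonic()

    def reusable(self, now: float, margin_s: float = 1.0) -> bool:
        # Skip connections that are nearly expired: the server may close
        # them while the next request is in flight (a close race).
        return now - self.last_used < self.timeout_s - margin_s


conn = PooledConnection()
conn.on_response({"Keep-Alive": "timeout=5, max=100"})
print(conn.timeout_s)                       # 5
print(conn.reusable(conn.last_used + 3.0))  # True: idle 3 s < 5 - 1
print(conn.reusable(conn.last_used + 4.5))  # False: too close to expiry
```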
Experimental Feature
- Support parallel write buffer for Azure Blob Storage, managed by the setting `azure_allow_parallel_part_upload`. #62534 (SmitaRKulkarni).
- Userspace page cache works with static web storage (`disk(type = web)`) now. Use the client setting `use_page_cache_for_disks_without_file_cache=1` to enable it. #61911 (Michael Kolupaev).
- Don't treat Bool and number variants as suspicious in the `Variant` type. #61999 (Kruglov Pavel).
- Implement better conversion from String to `Variant` using parsing. #62005 (Kruglov Pavel).
- Support `Variant` in JSONExtract functions. #62014 (Kruglov Pavel).
- Mark type `Variant` as comparable so it can be used in the primary key. #62693 (Kruglov Pavel).
Improvement
- For convenience, `SELECT * FROM numbers()` will work in the same way as `SELECT * FROM system.numbers`, without a limit. #61969 (YenchangChan).
- Introduce separate consumer/producer tags for the Kafka configuration. This avoids warnings from librdkafka (a bad C library with a lot of bugs) that consumer properties were specified for producer instances and vice versa (e.g. `Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance`). Closes: #58983. #58956 (Aleksandr Musorin).
- Functions `date_diff` and `age` now calculate their result at nanosecond instead of microsecond precision. They now also offer `nanosecond` (or `nanoseconds` or `ns`) as a possible value for the `unit` parameter. #61409 (Austin Kothig).
- Added nano-, micro-, and millisecond units for `date_trunc`. #62335 (Misz606).
- Reload certificate chain during certificate reload. #61671 (Pervakov Grigorii).
- Try to prevent an error #60432 by not allowing a table to be attached if there is an active replica for that replica path. #61876 (Arthur Passos).
- Implement support for `input` for `clickhouse-local`. #61923 (Azat Khuzhin).
- `Join` table engine with strictness `ANY` is consistent after reload. When several rows with the same key are inserted, the first one will have higher priority (before, it was chosen randomly upon table loading). Closes #51027. #61972 (vdimir).
- Automatically infer Nullable column types from Apache Arrow schema. #61984 (Maksim Kita).
- Allow cancelling the parallel merge of aggregate states during aggregation. Example: `uniqExact`. #61992 (Maksim Kita).
- Use `system.keywords` to fill in the suggestions and also use them in all places internally. #62000 (Nikita Mikhaylov).
- `OPTIMIZE FINAL` for `ReplicatedMergeTree` now will wait for currently active merges to finish and then reattempt to schedule a final merge. This will put it more in line with ordinary `MergeTree` behaviour. #62067 (Nikita Taranov).
- When reading data from a Hive text file, the number of input fields was resized based on the first line of the file. If the first line had fewer fields than the Hive table definition (e.g., the table is defined as `test_tbl(a Int32, b Int32, c Int32)`, but the first line of the text file has only 2 fields), the input fields were resized to 2. If a later line had 3 fields, the third field could not be read and was set to the default value 0, which is incorrect. #62086 (KevinyhZou).
- `CREATE AS` copies the table's comment. #62117 (Pablo Marcos).
- Add query progress to table `zookeeper`. #62152 (JackyWoo).
- Add ability to turn on trace collector (Real and CPU) server-wide. #62189 (alesapin).
- Added setting `lightweight_deletes_sync` (default value: 2 - wait all replicas synchronously). It is similar to setting `mutations_sync` but affects only the behaviour of lightweight deletes. #62195 (Anton Popov).
- Distinguish booleans and integers while parsing values for custom settings: `SET custom_a = true; SET custom_b = 1;`. #62206 (Vitaly Baranov).
- Support S3 access through AWS Private Link Interface endpoints. Closes #60021, #31074 and #53761. #62208 (Arthur Passos).
- Do not create a directory for UDF in clickhouse-client if it does not exist. This closes #59597. #62366 (Alexey Milovidov).
- The query cache no longer caches results of queries against system tables (`system.*`, `information_schema.*`, `INFORMATION_SCHEMA.*`). #62376 (Robert Schulze).
- `MOVE PARTITION TO TABLE` query can be delayed or can throw `TOO_MANY_PARTS` exception to avoid exceeding limits on the part count. The same settings and limits are applied as for the `INSERT` query (see `max_parts_in_total`, `parts_to_delay_insert`, `parts_to_throw_insert`, `inactive_parts_to_throw_insert`, `inactive_parts_to_delay_insert`, `max_avg_part_size_for_too_many_parts`, `min_delay_to_insert_ms` and `max_delay_to_insert` settings). #62420 (Sergei Trifonov).
- Changed the default installation directory on macOS from `/usr/bin` to `/usr/local/bin`. This is necessary because Apple's System Integrity Protection introduced with macOS El Capitan (2015) prevents writing into `/usr/bin`, even with `sudo`. #62489 (haohang).
- Make `transform` always return the first match. #62518 (Raúl Marín).
- Added the missing `hostname` column to system table `blob_storage_log`. #62456 (Jayme Bird).
- For consistency with other system tables, `system.backup_log` now has a column `event_time`. #62541 (Jayme Bird).
- Table `system.backup_log` now has the "default" sorting key which is `event_date, event_time`, the same as for other `_log` table engines. #62667 (Nikita Mikhaylov).
- Avoid evaluating table DEFAULT expressions while executing `RESTORE`. #62601 (Vitaly Baranov).
- S3 storage and backups also need the same default keep alive settings as s3 disk. #62648 (Sema Checherinda).
- Add librdkafka's (that infamous C library, which has a lot of bugs) client identifier to log messages to be able to differentiate log messages from different consumers of a single table. #62813 (János Benjamin Antal).
- Allow special macros `{uuid}` and `{database}` in a Replicated database ZooKeeper path. #62818 (Vitaly Baranov).
- Allow quota key with different auth scheme in HTTP requests. #62842 (Kseniia Sumarokova).
- Reduce the verbosity of command line argument `--help` in `clickhouse client` and `clickhouse local`. The previous output is now generated by `--help --verbose`. #62973 (Yarik Briukhovetskyi).
- `log_bin_use_v1_row_events` was removed in MySQL 8.3, and we adjust the experimental `MaterializedMySQL` engine for it #60479. #63101 (Eugene Klimov). Author: Nikolay Yankin.
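The new reload behaviour of the `Join` engine with `ANY` strictness reduces to a "first inserted row wins" rule for each key. Below is a minimal Python sketch of that priority rule using an ordinary dict. The function name and the `(key, value)` row layout are hypothetical. The sketch models only the rule, not the storage engine.

```python
from typing import Any, Iterable


def build_any_join_map(rows: Iterable[tuple[Any, Any]]) -> dict[Any, Any]:
    """Map each join key to the FIRST row inserted with that key.

    Later rows with a duplicate key are ignored, so rebuilding the map
    from the same insertion order always gives the same result.
    """
    result: dict[Any, Any] = {}
    for key, value in rows:
        result.setdefault(key, value)  # keeps the existing (earlier) value
    return result


inserted = [(1, "a"), (2, "b"), (1, "c")]
print(build_any_join_map(inserted))  # {1: 'a', 2: 'b'}
```

The result depends only on insertion order. That matches the point of the fix: before it, the surviving row was chosen randomly when the table was loaded.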
Build/Testing/Packaging Improvement
- Vendor in Rust dependencies, so the Rust code (that we use for minor features for hype and lulz) can be built in a sane way, similarly to C++. #62297 (Raúl Marín).
- ClickHouse now uses OpenSSL 3.2 instead of BoringSSL. #59870 (Robert Schulze). Note that OpenSSL has a generally worse engineering culture (such as a non-zero number of sanitizer reports that we had to patch, a complex build system with generated files, etc.) but has better compatibility.
- Ignore DROP queries in stress test with 1/2 probability, use TRUNCATE instead of ignoring DROP in upgrade check for Memory/JOIN tables. #61476 (Kruglov Pavel).
- Remove from the Keeper Docker image the volumes at /etc/clickhouse-keeper and /var/log/clickhouse-keeper. #61683 (Tristan).
- Add tests for all issues which are no longer relevant with Analyzer being enabled by default. Closes: #55794 Closes: #49472 Closes: #44414 Closes: #13843 Closes: #55803 Closes: #48308 Closes: #45535 Closes: #44365 Closes: #44153 Closes: #42399 Closes: #27115 Closes: #23162 Closes: #15395 Closes: #15411 Closes: #14978 Closes: #17319 Closes: #11813 Closes: #13210 Closes: #23053 Closes: #37729 Closes: #32639 Closes: #9954 Closes: #41964 Closes: #54317 Closes: #7520 Closes: #36973 Closes: #40955 Closes: #19687 Closes: #23104 Closes: #21584 Closes: #23344 Closes: #22627 Closes: #10276 Closes: #19687 Closes: #4567 Closes: #17710 Closes: #11068 Closes: #24395 Closes: #23416 Closes: #23162 Closes: #25655 Closes: #11757 Closes: #6571 Closes: #4432 Closes: #8259 Closes: #9233 Closes: #14699 Closes: #27068 Closes: #28687 Closes: #28777 Closes: #29734 Closes: #61238 Closes: #33825 Closes: #35608 Closes: #29838 Closes: #35652 Closes: #36189 Closes: #39634 Closes: #47432 Closes: #54910 Closes: #57321 Closes: #59154 Closes: #61014 Closes: #61950 Closes: #55647 Closes: #61947. #62185 (Nikita Mikhaylov).
- Add more tests from issues which are no longer relevant or fixed by analyzer. Closes: #58985 Closes: #59549 Closes: #36963 Closes: #39453 Closes: #56521 Closes: #47552 Closes: #56503 Closes: #59101 Closes: #50271 Closes: #54954 Closes: #56466 Closes: #11000 Closes: #10894 Closes: https://github.com/ClickHouse/ClickHouse/issues/448 Closes: #8030 Closes: #32139 Closes: #47288 Closes: #50705 Closes: #54511 Closes: #55466 Closes: #58500 Closes: #39923 Closes: #39855 Closes: #4596 Closes: #47422 Closes: #33000 Closes: #14739 Closes: #44039 Closes: #8547 Closes: #22923 Closes: #23865 Closes: #29748 Closes: #4222. #62457 (Nikita Mikhaylov).
- Fixed build errors when OpenSSL is linked dynamically (note: this is generally unsupported and only required for IBM's s390x platforms). #62888 (Harry Lee).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix logical-error when undoing quorum insert transaction. #61953 (Han Fei).
- Fix parser error when using COUNT(*) with FILTER clause #61357 (Duc Canh Le).
- Fix logical error in `group_by_use_nulls` + grouping sets + analyzer + materialize/constant #61567 (Kruglov Pavel).
- Cancel merges before removing moved parts #61610 (János Benjamin Antal).
- Fix abort in Apache Arrow #61720 (Kruglov Pavel).
- Search for `convert_to_replicated` flag at the correct path corresponding to the specific disk #61769 (Kirill).
- Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch #61867 (Azat Khuzhin).
- Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats #61883 (Kruglov Pavel).
- Fix writing exception message in output format in HTTP when http_wait_end_of_query is used #61951 (Kruglov Pavel).
- Proper fix for LowCardinality together with JSONExtract functions #61957 (Nikita Mikhaylov).
- Fix crash in the Merge engine if a row policy does not have an expression #61971 (Ilya Golshtein).
- Fix WriteBufferAzureBlobStorage destructor uncaught exception #61988 (SmitaRKulkarni).
- Fix CREATE TABLE without columns definition for ReplicatedMergeTree #62040 (Azat Khuzhin).
- Fix optimize_skip_unused_shards_rewrite_in for composite sharding key #62047 (Azat Khuzhin).
- ReadWriteBufferFromHTTP: set the correct Host header when redirected #62068 (Sema Checherinda).
- Fix external tables being unable to parse data type Bool #62115 (Duc Canh Le).
- Analyzer: Fix query parameter resolution #62186 (Dmitry Novik).
- Fix restoring parts while readonly #62207 (Vitaly Baranov).
- Fix crash in index definition containing SQL UDF #62225 (vdimir).
- Fixing NULL random seed for generateRandom with analyzer. #62248 (Nikolai Kochetov).
- Correctly handle const columns in Distinct Transform #62250 (Antonio Andelic).
- Fix Parts Splitter for queries with the FINAL modifier #62268 (Nikita Taranov).
- Analyzer: Fix alias to parametrized view resolution #62274 (Dmitry Novik).
- Analyzer: Fix name resolution from parent scopes #62281 (Dmitry Novik).
- Fix argMax with nullable non native numeric column #62285 (Raúl Marín).
- Fix BACKUP and RESTORE of a materialized view in Ordinary database #62295 (Vitaly Baranov).
- Fix data race on scalars in Context #62305 (Kruglov Pavel).
- Fix primary key in materialized view #62319 (Murat Khairulin).
- Do not build multithread insert pipeline for tables without support #62333 (vdimir).
- Fix analyzer with positional arguments in distributed query #62362 (flynn).
- Fix filter pushdown from additional_table_filters in Merge engine in analyzer #62398 (Kruglov Pavel).
- Fix GLOBAL IN table queries with analyzer. #62409 (Nikolai Kochetov).
- Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write #62425 (Kruglov Pavel).
- Fix backup restore path for AzureBlobStorage #62447 (SmitaRKulkarni).
- Fix SimpleSquashingChunksTransform #62451 (Nikita Taranov).
- Fix capture of nested lambda. #62462 (Nikolai Kochetov).
- Avoid crash when reading protobuf with recursive types #62506 (Raúl Marín).
- Fix a bug when moving a partition from a table to the same table #62524 (helifu).
- Fix scalar subquery in LIMIT #62567 (Nikolai Kochetov).
- Fix segfault in the experimental and unsupported Hive engine, which we don't like anyway #62578 (Nikolay Degterinsky).
- Fix memory leak in groupArraySorted #62597 (Antonio Andelic).
- Fix crash in largestTriangleThreeBuckets #62646 (Raúl Marín).
- Fix tumble[Start,End] and hop[Start,End] for bigger resolutions #62705 (Jordi Villar).
- Fix argMin/argMax combinator state #62708 (Raúl Marín).
- Fix temporary data in cache failing because of cache lock contention optimization #62715 (Kseniia Sumarokova).
- Fix crash in function `mergeTreeIndex` #62762 (Anton Popov).
- fix: update: nested materialized columns: size check fixes #62773 (Eliot Hautefeuille).
- Fix FINAL modifier not being respected in CTE with analyzer #62811 (Duc Canh Le).
- Fix crash in function `formatRow` with `JSON` format and HTTP interface #62840 (Anton Popov).
- Azure: fix building final url from endpoint object #62850 (Daniel Pozo Escalona).
- Fix GCD codec #62853 (Nikita Taranov).
- Fix LowCardinality(Nullable) key in hyperrectangle #62866 (Amos Bird).
- Fix fromUnixTimestampInJodaSyntax when the input value is beyond UInt32 #62901 (KevinyhZou).
- Disable optimize_rewrite_aggregate_function_with_if for sum(nullable) #62912 (Raúl Marín).
- Fix PREWHERE for StorageBuffer with different source table column types. #62916 (Nikolai Kochetov).
- Fix temporary data in cache incorrectly processing failure of cache key directory creation #62925 (Kseniia Sumarokova).
- gRPC: fix crash on IPv6 peer connection #62978 (Konstantin Bogdanov).
- Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches #62987 (Azat Khuzhin).
- Fix terminate with uncaught exception in temporary data in cache #62998 (Kseniia Sumarokova).
- Fix optimize_rewrite_aggregate_function_with_if implicit cast #62999 (Raúl Marín).
- Fix unhandled exception in ~RestorerFromBackup #63040 (Vitaly Baranov).
- Do not remove server constants from GROUP BY key for secondary query. #63047 (Nikolai Kochetov).
- Fix incorrect judgement of monotonicity of function abs #63097 (Duc Canh Le).
- Set server name for SSL handshake in MongoDB engine #63122 (Alexander Gololobov).
- Use user specified db instead of "config" for MongoDB wire protocol version check #63126 (Alexander Gololobov).
ClickHouse release 24.3 LTS, 2024-03-27
Upgrade Notes
- The setting `allow_experimental_analyzer` is enabled by default and it switches the query analysis to a new implementation, which has better compatibility and feature completeness. The feature "analyzer" is considered beta instead of experimental. You can restore the old behavior by setting `compatibility` to `24.2` or by disabling the `allow_experimental_analyzer` setting. Watch the video on YouTube.
- ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow data type to use for the ClickHouse String data type: String or Binary. This is controlled by the settings `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`, `output_format_arrow_string_as_string`. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster `lz4` compression method, which is why we set `zstd` by default. This is controlled by the settings `output_format_parquet_compression_method`, `output_format_orc_compression_method`, and `output_format_arrow_compression_method`. We changed the default to `zstd` for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). #61817 (Alexey Milovidov).
- In the new ClickHouse version, the functions `geoDistance`, `greatCircleDistance`, and `greatCircleAngle` will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes #58476. In previous versions, the functions always used Float32. You can switch to the old behavior by setting `geo_distance_returns_float64_on_float64_arguments` to `false` or setting `compatibility` to `24.2` or earlier. #61848 (Alexey Milovidov). Co-authored with Geet Patel.
- The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of #55186 and #45409. It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: `SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type`. To disable the usage of in-memory data parts, do `ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT`. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait until there are no in-memory data parts and continue the upgrade. #61127 (Alexey Milovidov).
- Changed the column name from `duration_ms` to `duration_microseconds` in the `system.zookeeper` table to reflect the reality that the duration is in the microsecond resolution. #60774 (Duc Canh Le).
- Reject incoming INSERT queries when the query-level settings `async_insert` and `deduplicate_blocks_in_dependent_materialized_views` are enabled together. This behaviour is controlled by the setting `throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert` and enabled by default. This is a continuation of https://github.com/ClickHouse/ClickHouse/pull/59699 needed to unblock https://github.com/ClickHouse/ClickHouse/pull/59915. #60888 (Nikita Mikhaylov).
- Utility `clickhouse-copier` is moved to a separate repository on GitHub: https://github.com/ClickHouse/copier. It is no longer included in the bundle but is still available as a separate download. This closes: #60734 This closes: #60540 This closes: #60250 This closes: #52917 This closes: #51140 This closes: #47517 This closes: #47189 This closes: #46598 This closes: #40257 This closes: #36504 This closes: #35485 This closes: #33702 This closes: #26702.
- To increase compatibility with MySQL, the compatibility alias `locate` now accepts arguments `(needle, haystack[, start_pos])` by default. The previous behavior `(haystack, needle[, start_pos])` can be restored by setting `function_locate_has_mysql_compatible_argument_order = 0`. #61092 (Robert Schulze).
- Forbid `SimpleAggregateFunction` in `ORDER BY` of `MergeTree` tables by default, as `AggregateFunction` already is (although the latter is forbidden because it is not comparable); use `allow_suspicious_primary_key` to allow them. #61399 (Azat Khuzhin).
- The `Ordinary` database engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes #52229. #56942 (shabroo).
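The only thing that changes for `locate` is which argument is the needle. Below is a Python sketch of the MySQL-style semantics the new default follows: positions are 1-based and 0 means "not found". The `mysql_order` flag is a hypothetical stand-in for the `function_locate_has_mysql_compatible_argument_order` setting. Corner cases (empty needle, non-positive `start_pos`) are not modeled and may differ from ClickHouse.

```python
def locate(a: str, b: str, start_pos: int = 1, mysql_order: bool = True) -> int:
    """Return the 1-based position of needle in haystack, or 0 if absent.

    mysql_order=True  -> arguments are (needle, haystack)  [new default]
    mysql_order=False -> arguments are (haystack, needle)  [previous order]
    """
    needle, haystack = (a, b) if mysql_order else (b, a)
    if start_pos < 1:
        raise ValueError("start_pos is 1-based")  # edge case not modeled here
    idx = haystack.find(needle, start_pos - 1)  # str.find is 0-based
    return idx + 1 if idx >= 0 else 0


print(locate("lo", "hello world"))                     # 4
print(locate("hello world", "lo", mysql_order=False))  # 4
print(locate("o", "hello world", start_pos=6))         # 8
print(locate("xyz", "hello world"))                    # 0
```

Queries written for the old order get the same result only if the arguments are swapped or the setting is turned off. That is why this appears under Upgrade Notes.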
New Feature
- Support reading and writing backups as `tar` (in addition to `zip`). #59535 (josh-hildred).
- Implemented support for S3 Express buckets. #59965 (Nikita Taranov).
- Allow to attach parts from a different disk (using copy instead of hard link). #60112 (Unalian).
- Size-capped `Memory` tables: controlled by their settings, `min_bytes_to_keep`, `max_bytes_to_keep`, `min_rows_to_keep` and `max_rows_to_keep`. #60612 (Jake Bamrah).
- Separate limits on number of waiting and executing queries. Added new server setting `max_waiting_queries` that limits the number of queries waiting due to `async_load_databases`. Existing limits on number of executing queries no longer count waiting queries. #61053 (Sergei Trifonov).
- Added a table `system.keywords` which contains all the keywords from the parser. Mostly needed and will be used for better fuzzing and syntax highlighting. #51808 (Nikita Mikhaylov).
- Add support for `ATTACH PARTITION ALL`. #61107 (Kirill Nikiforov).
- Add a new function, `getClientHTTPHeader`. This closes #54665. Co-authored with @lingtaolf. #61820 (Alexey Milovidov).
- Add `generate_series` as a table function (compatibility alias for PostgreSQL to the existing `numbers` function). This function generates a table with an arithmetic progression of natural numbers. #59390 (divanik).
- A mode for `topK`/`topKWeighted` that returns the count of each value and its error. #54508 (UnamedRus).
- Added function `toMillisecond` which returns the millisecond component for values of type `DateTime` or `DateTime64`. #60281 (Shaun Struwig).
- Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make `/` redirect to the Play UI. #60390 (Alexey Milovidov).
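"Millisecond component" means the sub-second part of a timestamp, in the range 0 to 999, not milliseconds since the epoch. Below is a Python analogue using `datetime`, which has microsecond precision. It is not ClickHouse code, and the helper name is hypothetical. For a plain `DateTime`, which has no sub-second part, the result would presumably be 0.

```python
from datetime import datetime


def millisecond_component(ts: datetime) -> int:
    """Return the millisecond part (0-999) of a timestamp, analogous to toMillisecond."""
    return ts.microsecond // 1000  # drop the sub-millisecond digits


ts = datetime(2024, 3, 26, 12, 30, 45, 123456)
print(millisecond_component(ts))  # 123
```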
Performance Improvement
- Optimized function `dotProduct` to omit unnecessary and expensive memory copies. #60928 (Robert Schulze).
- 30x faster printing for 256-bit integers. #61100 (Raúl Marín).
- If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting `primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns` with the value `0.9` by default, which means: for a composite primary key, if a column changes its value for at least 0.9 of all the times, the next columns after it will not be loaded. #60255 (Alexey Milovidov).
- Improve the performance of serialized aggregation methods when involving multiple `Nullable` columns. #55809 (Amos Bird).
- Build the JOIN output lazily to improve the performance of ALL JOIN. #58278 (LiuNeng).
- Make HTTP/HTTPS connections with external services, such as AWS S3, reusable for all use cases, even when the response is 3xx or 4xx. #58845 (Sema Checherinda).
- Improvements to aggregate functions `argMin`/`argMax`/`any`/`anyLast`/`anyHeavy`, as well as `ORDER BY {u8/u16/u32/u64/i8/i16/i32/i64} LIMIT 1` queries. #58640 (Raúl Marín).
- Trivial optimization for column's filter. Peak memory can be reduced to 44% of the original in some cases. #59698 (李扬).
- Execute `multiIf` function in a columnar fashion when the result type's underlying type is a number. #60384 (李扬).
- Faster (almost 2x) mutexes. #60823 (Azat Khuzhin).
- Drain multiple connections in parallel when a distributed query is finishing. #60845 (lizhuoyu5).
- Optimize data movement between columns of a Nullable number or a Nullable string, which improves some micro-benchmarks. #60846 (李扬).
- Operations with the filesystem cache will suffer less from the lock contention. #61066 (Alexey Milovidov).
- Optimize array join and other JOINs by preventing a wrong compiler optimization. Closes #61074. #61075 (李扬).
- If a query with a syntax error contained the `COLUMNS` matcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put into the AST. But the letter A in AST means "abstract", which means it should not contain heavyweight objects. Parts of the AST can be created and discarded during parsing, including during extensive backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. #61543 (Alexey Milovidov).
- Add a new analyzer pass to optimize the IN operator for a single value. #61564 (LiuNeng).
- DNSResolver shuffles the set of resolved IPs, which is needed to uniformly utilize multiple endpoints of AWS S3. #60965 (Sema Checherinda).
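One way to read the rule behind `primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns` is as follows. Walk the primary key columns from left to right over the index marks. Once the key prefix up to some column changes between adjacent marks in at least the given fraction of cases, the columns after it add little and need not be kept in memory. Below is a Python sketch of that reading on in-memory tuples. The function name, the per-mark input and the exact way the ratio is counted are assumptions, not the server's code.

```python
from typing import Sequence


def columns_to_keep(marks: Sequence[tuple], threshold: float = 0.9) -> int:
    """Return how many leading primary key columns are worth keeping.

    `marks` holds the primary key value at each index mark (one tuple per
    mark). For each prefix length k, measure how often the prefix changes
    between adjacent marks; once that ratio reaches `threshold`, the
    columns after position k are treated as useless and dropped.
    """
    if not marks:
        return 0
    num_columns = len(marks[0])
    pairs = len(marks) - 1
    if pairs == 0:
        return num_columns
    for k in range(1, num_columns + 1):
        changes = sum(1 for prev, cur in zip(marks, marks[1:]) if prev[:k] != cur[:k])
        if changes / pairs >= threshold:
            return k
    return num_columns


# The first column is almost unique per mark, so the second is not needed.
print(columns_to_keep([(i, i % 3) for i in range(100)]))  # 1
# The first column rarely changes, so the second column is still useful.
print(columns_to_keep([(i // 10, i) for i in range(100)]))  # 2
```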
Experimental Feature
- Support parallel reading for Azure blob storage. This improves the performance of the experimental Azure object storage. #61503 (SmitaRKulkarni).
- Add asynchronous WriteBuffer for Azure blob storage similar to S3. This improves the performance of the experimental Azure object storage. #59929 (SmitaRKulkarni).
- Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. #61785 (Daniel Pozo Escalona).
- Add a setting `parallel_replicas_allow_in_with_subquery = 1` which allows subqueries for IN to work with parallel replicas. #60950 (Nikolai Kochetov).
- A change for the "zero-copy" replication: all zero copy locks related to a table have to be dropped when the table is dropped. The directory which contains these locks has to be removed also. #57575 (Sema Checherinda).
Improvement
- Use `MergeTree` as a default table engine. #60524 (Alexey Milovidov).
- Enable `output_format_pretty_row_numbers` by default. It is better for usability. #61791 (Alexey Milovidov).
- In the previous version, some numbers in Pretty formats were not pretty enough. #61794 (Alexey Milovidov).
- A long value in Pretty formats won't be cut if it is the single value in the resultset, such as in the result of the `SHOW CREATE TABLE` query. #61795 (Alexey Milovidov).
- Similarly to `clickhouse-local`, `clickhouse-client` will accept the `--output-format` option as a synonym for the `--format` option. This closes #59848. #61797 (Alexey Milovidov).
- If stdout is a terminal and the output format is not specified, `clickhouse-client` and similar tools will use `PrettyCompact` by default, similarly to the interactive mode. `clickhouse-client` and `clickhouse-local` will handle command line arguments for input and output formats in a unified fashion. This closes #61272. #61800 (Alexey Milovidov).
- Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting, `output_format_pretty_highlight_digit_groups`. #61802 (Alexey Milovidov).
- Add ability to override initial INSERT settings via `SYSTEM FLUSH DISTRIBUTED`. #61832 (Azat Khuzhin).
- Enable processors profiling (time spent/in and out bytes for sorting, aggregation, ...) by default. #61096 (Azat Khuzhin).
- Support files without format extension in Filesystem database. #60795 (Kruglov Pavel).
- Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. #60420 (豪肥肥). I'd appreciate it if you continue to write it correctly, e.g., `JSON` 😇, not `Json` 🤮, but we don't mind if you spell it as you prefer.
- Added `none_only_active` mode for `distributed_ddl_output_mode` setting. #60340 (Alexander Tokmakov).
- The advanced dashboard has slightly better colors for multi-line graphs. #60391 (Alexey Milovidov).
- The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. #60692 (Alexey Milovidov).
- While running the `MODIFY COLUMN` query for materialized views, check the inner table's structure to ensure every column exists. #47427 (sunny).
- String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes #60726. #60727 (Alexey Milovidov).
- Allow declaring Enums in the structure of external data for query processing (this is an immediate temporary table that you can provide for your query). #57857 (Duc Canh Le).
- Consider lightweight deleted rows when selecting parts to merge, so the disk size of the resulting part will be estimated better. #58223 (Zhuo Qiu).
- Added comments for columns for more system tables. Continuation of https://github.com/ClickHouse/ClickHouse/pull/58356. #59016 (Nikita Mikhaylov).
- Now we can use virtual columns in `PREWHERE`. It's worthwhile for non-const virtual columns like `_part_offset`. #59033 (Amos Bird). Improved overall usability of virtual columns: built-in documentation is now available for virtual columns as a column comment in the `DESCRIBE` query when the setting `describe_include_virtual_columns` is enabled. #60205 (Anton Popov).
- Instead of using a constant key, object storage now generates a key for determining the capability to remove objects. #59495 (Sema Checherinda).
- Allow "local" as object storage type instead of "local_blob_storage". #60165 (Kseniia Sumarokova).
- Parallel flush of pending INSERT blocks of the Distributed engine on `DETACH`/server shutdown and `SYSTEM FLUSH DISTRIBUTED`. (Parallelism will work only if you have a multi-disk policy for a table, like everything in the Distributed engine right now.) #60225 (Azat Khuzhin).
- Add a setting to force read-through cache for merges. #60308 (Kseniia Sumarokova).
- An improvement for the MySQL compatibility protocol. Issue #57598 describes a difference in transaction handling: a COMMIT/ROLLBACK issued when no transaction is active was reported as an error, contrary to MySQL behaviour. #60338 (PapaToemmsn).
- Function `substring` now has a new alias `byteSlice`. #60494 (Robert Schulze).
- Renamed server setting `dns_cache_max_size` to `dns_cache_max_entries` to reduce ambiguity. #60500 (Kirill Nikiforov).
- `SHOW INDEX | INDEXES | INDICES | KEYS` no longer sorts by the primary key columns (which was unintuitive). #60514 (Robert Schulze).
- Keeper improvement: abort during startup if an invalid snapshot is detected to avoid data loss. #60537 (Antonio Andelic).
- Update tzdata to 2024a. #60768 (Raúl Marín).
- Keeper improvement: support `leadership_expiry_ms` in Keeper's settings. #60806 (Brokenice0415).
- Always infer exponential numbers in JSON formats regardless of the setting `input_format_try_infer_exponent_floats`. Add the setting `input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects` that allows using the String type for ambiguous paths instead of throwing an exception during named Tuples inference from JSON objects. #60808 (Kruglov Pavel).
- Add support for the `START TRANSACTION` syntax typically used in MySQL, resolving https://github.com/ClickHouse/ClickHouse/discussions/60865. #60886 (Zach Naimon).
- Add a flag for the full-sorting merge join algorithm to treat null as biggest/smallest, so the behavior can be compatible with other SQL systems, like Apache Spark. #60896 (loudongfeng).
- Support detecting the output format by file extension in `clickhouse-client` and `clickhouse-local`. #61036 (豪肥肥).
- Update the memory limit at runtime when the Linux cgroups value changes. #61049 (Han Fei).
- Add the function `toUInt128OrZero`, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases `FROM_UNIXTIME` and `DATE_FORMAT` (they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. #61114 (Alexey Milovidov).
- Improvements for the access checks, allowing the revocation of unpossessed rights in case the target user doesn't have the revoking grants either. Example: `GRANT SELECT ON *.* TO user1; REVOKE SELECT ON system.* FROM user1;`. #61115 (pufit).
- Fix `has()` function with `Nullable` column (fixes #60214). #61249 (Mikhail Koviazin).
- Now it's possible to specify the attribute `merge="true"` in config substitutions for subtrees `<include from_zk="/path" merge="true">`. If this attribute is specified, ClickHouse will merge the subtree with the existing configuration; otherwise, the default behavior is to append the new content to the configuration. #61299 (alesapin).
- Add async metrics for virtual memory mappings: `VMMaxMapCount` & `VMNumMaps`. Closes #60662. #61354 (Tuan Pham Anh).
- Use the `temporary_files_codec` setting in all places where we create temporary data, for example external memory sorting and external memory GROUP BY. Before, it worked only in the `partial_merge` JOIN algorithm. #61456 (Maksim Kita).
- Add a new setting `max_parser_backtracks` which allows limiting the complexity of query parsing. #61502 (Alexey Milovidov).
- Less contention during dynamic resize of the filesystem cache. #61524 (Kseniia Sumarokova).
- Disallow sharded mode of StorageS3 queue, because it will be rewritten. #61537 (Kseniia Sumarokova).
- Fixed typo: from `use_leagcy_max_level` to `use_legacy_max_level`. #61545 (William Schoeffel).
- Remove some duplicate entries in `system.blob_storage_log`. #61622 (YenchangChan).
- Added `current_user` function as a compatibility alias for MySQL. #61770 (Yarik Briukhovetskyi).
- Fix inconsistent floating point aggregate function states in mixed x86-64 / ARM clusters #60610 (Harry Lee).
Build/Testing/Packaging Improvement
- The real-time query profiler now works on AArch64. In previous versions, it worked only when a program didn't spend time inside a syscall. #60807 (Alexey Milovidov).
- ClickHouse version has been added to docker labels. Closes #54224. #60949 (Nikolay Monkov).
- Upgrade `prqlc` to 0.11.3. #60616 (Maximilian Roos).
- Add generic query text fuzzer in `clickhouse-local`. #61508 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix `finished_mutations_to_keep=0` for MergeTree (as the docs say, 0 means keep everything) #60031 (Azat Khuzhin).
- Something was wrong with the FINAL optimization, here is how the author describes it: "PartsSplitter invalid ranges for the same part". #60041 (Maksim Kita).
- Something was wrong with Apache Hive, which is experimental and not supported. #60262 (shanfengp).
- An improvement for experimental parallel replicas: force reanalysis if parallel replicas changed #60362 (Raúl Marín).
- Fix usage of plain metadata type with new disks configuration option #60396 (Kseniia Sumarokova).
- Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike #60451 (Kruglov Pavel).
- Avoid calculation of scalar subqueries for CREATE TABLE. #60464 (Nikolai Kochetov).
- Fix deadlock in parallel parsing when lots of rows are skipped due to errors #60516 (Kruglov Pavel).
- Something was wrong with experimental KQL (Kusto) support: fix `max_query_size_for_kql_compound_operator`: #60534 (Yong Wang).
- Keeper fix: add timeouts when waiting for commit logs #60544 (Antonio Andelic).
- Don't output number tips for date types #60577 (Raúl Marín).
- Fix reading from MergeTree with non-deterministic functions in filter #60586 (Kruglov Pavel).
- Fix logical error on bad compatibility setting value type #60596 (Kruglov Pavel).
- fix(prql): Robust panic handler #60615 (Maximilian Roos).
- Fix `intDiv` for decimal and date arguments #60672 (Yarik Briukhovetskyi).
- Fix: expand CTE in alter modify query #60682 (Yakov Olkhovskiy).
- Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory) #60689 (Azat Khuzhin).
- Fix "Invalid storage definition in metadata file" for parameterized views #60708 (Azat Khuzhin).
- Fix buffer overflow in CompressionCodecMultiple #60731 (Alexey Milovidov).
- Remove nonsense from SQL/JSON #60738 (Alexey Milovidov).
- Remove wrong assertion in aggregate function quantileGK #60740 (李扬).
- Fix insert-select + insert_deduplication_token bug by setting streams to 1 #60745 (Jordi Villar).
- Prevent setting custom metadata headers on unsupported multipart upload operations #60748 (Francisco J. Jurado Moreno).
- Fix toStartOfInterval #60763 (Andrey Zvonov).
- Fix crash in arrayEnumerateRanked #60764 (Raúl Marín).
- Fix crash when using input() in INSERT SELECT JOIN #60765 (Kruglov Pavel).
- Fix crash with different allow_experimental_analyzer value in subqueries #60770 (Dmitry Novik).
- Remove recursion when reading from S3 #60849 (Antonio Andelic).
- Fix possible stuck on error in HashedDictionaryParallelLoader #60926 (vdimir).
- Fix async RESTORE with Replicated database (experimental feature) #60934 (Antonio Andelic).
- Fix deadlock in async inserts to `Log` tables via native protocol #61055 (Anton Popov).
- Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary #61196 (Kruglov Pavel).
- Fix multiple bugs in groupArraySorted #61203 (Raúl Marín).
- Fix Keeper reconfig for standalone binary #61233 (Antonio Andelic).
- Fix usage of session_token in S3 engine #61234 (Kruglov Pavel).
- Fix possible incorrect result of aggregate function `uniqExact` #61257 (Anton Popov).
- Fix bugs in show database #61269 (Raúl Marín).
- Fix logical error in RabbitMQ storage with MATERIALIZED columns #61320 (vdimir).
- Fix CREATE OR REPLACE DICTIONARY #61356 (Vitaly Baranov).
- Fix ATTACH query with external ON CLUSTER #61365 (Nikolay Degterinsky).
- Fix consecutive keys optimization for nullable keys #61393 (Anton Popov).
- Fix an issue with actions DAG split #61458 (Raúl Marín).
- Fix finishing a failed RESTORE #61466 (Vitaly Baranov).
- Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings #61468 (Raúl Marín).
- Allow queuing in restore pool #61475 (Nikita Taranov).
- Fix an inconsistency when reading system.parts using UUID. #61479 (Dan Wu).
- Fix ALTER QUERY MODIFY SQL SECURITY #61480 (pufit).
- Fix a crash in window view (experimental feature) #61526 (Alexey Milovidov).
- Fix `repeat` with non-native integers #61527 (Antonio Andelic).
- Fix client's `-s` argument #61530 (Mikhail f. Shiryaev).
- Fix crash in arrayPartialReverseSort #61539 (Raúl Marín).
- Fix string search with const position #61547 (Antonio Andelic).
- Fix addDays causing an error when used with DateTime64 #61561 (Shuai li).
- Disallow LowCardinality input type for JSONExtract #61617 (Julia Kartseva).
- Fix `system.part_log` for async insert with deduplication #61620 (Antonio Andelic).
- Fix a `Non-ready set` exception for system.parts. #61666 (Nikolai Kochetov).
- Fix actual_part_name for REPLACE_RANGE (`Entry actual part isn't empty yet`) #61675 (Alexander Tokmakov).
- Fix a sanitizer report in `multiSearchAllPositionsCaseInsensitiveUTF8` for incorrect UTF-8 #61749 (pufit).
- Fix an observation that the RANGE frame is not supported for Nullable columns. #61766 (YuanLiu).
ClickHouse release 24.2, 2024-02-29
Backward Incompatible Change
- Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. #59385 (Kruglov Pavel).
- Add sanity check for number of threads and block sizes. #60138 (Raúl Marín).
- Don't infer floats in exponential notation by default. Add a setting `input_format_try_infer_exponent_floats` that will restore previous behaviour (disabled by default). Closes #59476. #59500 (Kruglov Pavel).
- Allow alter operations to be surrounded by parentheses. The emission of parentheses can be controlled by the `format_alter_operations_with_parentheses` config. By default, in formatted queries the parentheses are emitted as we store the formatted alter operations in some places as metadata (e.g.: mutations). The new syntax clarifies some of the queries where alter operations end in a list. E.g.: `ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN c` cannot be parsed properly with the old syntax. In the new syntax the query `ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c)` is obvious. Older versions are not able to read the new syntax, therefore using the new syntax might cause issues if newer and older versions of ClickHouse are mixed in a single cluster. #59532 (János Benjamin Antal).
- Fix for the materialized view security issue, which allowed a user to insert into a table without the required grants for that. The fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with `Not enough privileges`. To address this problem, the release introduces a new feature of SQL security for views https://clickhouse.com/docs/en/sql-reference/statements/create/view#sql_security. #54901 #60439 (pufit).
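To make the `input_format_try_infer_exponent_floats` change concrete, here is a toy Python model of numeric type inference for a single text field. It is a sketch under stated assumptions, not ClickHouse's inference code: the regular expressions and the function name are hypothetical, real inference covers many more types (dates, booleans, nulls), and it assumes a field that is not recognized as a number falls back to `String`. Per the 24.3 entry above, JSON formats later stopped depending on this setting.

```python
import re

# Hypothetical, simplified token patterns (not ClickHouse's actual rules).
INTEGER = re.compile(r"[+-]?\d+")
PLAIN_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")
EXPONENT_FLOAT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+")


def infer_type(token: str, try_infer_exponent_floats: bool = False) -> str:
    """Toy model of numeric type inference for one text field."""
    if INTEGER.fullmatch(token):
        return "Int64"
    if PLAIN_FLOAT.fullmatch(token):
        return "Float64"
    if EXPONENT_FLOAT.fullmatch(token):
        # Exponential notation counts as a number only when the setting is on;
        # otherwise the field is assumed to fall back to String.
        return "Float64" if try_infer_exponent_floats else "String"
    return "String"


print(infer_type("1e10"))                                  # String
print(infer_type("1e10", try_infer_exponent_floats=True))  # Float64
```

One motivation for the stricter default is that identifiers such as `1e10` may be codes rather than numbers; inferring them as Float64 would silently change their text.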
New Feature
- Added new syntax which allows specifying a definer user in View/Materialized View. This allows executing selects/inserts from views without explicit grants for underlying tables, so a View will encapsulate the grants. #54901 #60439 (pufit).
- Try to detect file format automatically during schema inference if it's unknown in `file/s3/hdfs/url/azureBlobStorage` engines. Closes #50576. #59092 (Kruglov Pavel).
- Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: `async_insert_poll_timeout_ms`, `async_insert_use_adaptive_busy_timeout`, `async_insert_busy_timeout_min_ms`, `async_insert_busy_timeout_max_ms`, `async_insert_busy_timeout_increase_rate`, `async_insert_busy_timeout_decrease_rate`. #58486 (Julia Kartseva).
- Allow setting up a quota for maximum sequential login failures. #54737 (Alexey Gerasimchuck).
- A new aggregate function `groupArrayIntersect`. Follow-up to #49862. #59598 (Yarik Briukhovetskyi).
- Backup & Restore support for `AzureBlobStorage`. Resolves #50747. #56988 (SmitaRKulkarni).
- The user can now specify the template string directly in the query using `format_schema_rows_template` as an alternative to `format_template_row`. Closes #31363. #59088 (Shaun Struwig).
- Implemented automatic conversion of merge tree tables of different kinds to the replicated engine. Create an empty `convert_to_replicated` file in the table's data directory (`/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/`) and that table will be converted automatically on the next server start. #57798 (Kirill).
- Added query `ALTER TABLE table FORGET PARTITION partition` that removes ZooKeeper nodes related to an empty partition. #59507 (Sergei Trifonov). This is an expert-level feature.
- Support JWT credentials file for the NATS table engine. #59543 (Nickolaj Jepsen).
- Implemented `system.dns_cache` table, which can be useful for debugging DNS issues. #59856 (Kirill Nikiforov).
- The codec `LZ4HC` will accept a new level 2, which is faster than the previous minimum level 3, at the expense of less compression. In previous versions, `LZ4HC(2)` and less was the same as `LZ4HC(3)`. Author: Cyan4973. #60090 (Alexey Milovidov).
- Implemented `system.dns_cache` table, which can be useful for debugging DNS issues. New server setting `dns_cache_max_size`. #60257 (Kirill Nikiforov).
- Support single-argument version for the `merge` table function, as `merge(['db_name', ] 'tables_regexp')`. #60372 (豪肥肥).
- Support negative positional arguments. Closes #57736. #58292 (flynn).
- Support specifying a set of permitted users for specific S3 settings in config using the `user` key. #60144 (Antonio Andelic).
- Added table function `mergeTreeIndex`. It represents the contents of index and marks files of `MergeTree` tables. It can be used for introspection. Syntax: `mergeTreeIndex(database, table, [with_marks = true])` where `database.table` is an existing table with the `MergeTree` engine. #58140 (Anton Popov).
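Negative positional arguments presumably count from the end of the SELECT list, so in `SELECT a, b, c FROM t ORDER BY -1` the `-1` would refer to `c`. The Python sketch below shows only that index arithmetic; the function name and the exact out-of-bounds behaviour are assumptions for illustration, not taken from the PR.

```python
def resolve_position(position: int, select_list: list[str]) -> str:
    """Map a 1-based positional argument (e.g. ORDER BY 2 or ORDER BY -1)
    to a SELECT-list expression. Negative positions count from the end."""
    n = len(select_list)
    # Position 0 and positions beyond the list length are rejected.
    if position == 0 or abs(position) > n:
        raise ValueError(f"positional argument {position} is out of bounds for {n} columns")
    index = position - 1 if position > 0 else n + position
    return select_list[index]


columns = ["a", "b", "c"]
print(resolve_position(2, columns))   # b
print(resolve_position(-1, columns))  # c
```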
Experimental Feature
- Added function `seriesOutliersDetectTukey` to detect outliers in series data using Tukey's fences algorithm. #58632 (Bhavna Jindal). Keep in mind that the behavior will be changed in the next patch release.
- Add function `variantType` that returns Enum with variant type name for each row. #59398 (Kruglov Pavel).
- Support `LEFT JOIN`, `ALL INNER JOIN`, and simple subqueries for parallel replicas (only with analyzer). New setting `parallel_replicas_prefer_local_join` chooses local `JOIN` execution (by default) vs `GLOBAL JOIN`. All tables should exist on every replica from `cluster_for_parallel_replicas`. New settings `min_external_table_block_size_rows` and `min_external_table_block_size_bytes` are used to squash small blocks that are sent for temporary tables (only with analyzer). #58916 (Nikolai Kochetov).
- Allow concurrent table creation in the `Replicated` database while adding or recovering a new replica. #59277 (Konstantin Bogdanov).
- Implement comparison operator for `Variant` values and proper Field inserting into `Variant` column. Don't allow creating a `Variant` type with similar variant types by default (allowed under the setting `allow_suspicious_variant_types`). Closes #59996. Closes #59850. #60198 (Kruglov Pavel).
- Disable parallel replicas JOIN with CTE (not analyzer) #59239 (Raúl Marín).
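Tukey's fences mark a value as an outlier when it lies below Q1 - k·IQR or above Q3 + k·IQR, where IQR = Q3 - Q1. The Python sketch below illustrates the algorithm behind `seriesOutliersDetectTukey`. It is not the SQL function itself: it returns booleans rather than whatever scores the function returns, the quantile definition (linear interpolation) is one of several common choices, and k = 1.5 with the 0.25/0.75 quartiles are the textbook values, not defaults confirmed by this changelog.

```python
def quantile(sorted_values: list[float], q: float) -> float:
    """Quantile by linear interpolation between closest ranks
    (one of several common definitions, chosen here for simplicity)."""
    pos = (len(sorted_values) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def tukey_outliers(series: list[float], k: float = 1.5,
                   q_low: float = 0.25, q_high: float = 0.75) -> list[bool]:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if not series:
        return []
    ordered = sorted(series)
    q1 = quantile(ordered, q_low)
    q3 = quantile(ordered, q_high)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the series in the result.
    return [x < lower or x > upper for x in series]


print(tukey_outliers([1, 2, 3, 4, 5, 100]))  # [False, False, False, False, False, True]
```

Because the fences are built from quartiles rather than the mean and standard deviation, one extreme value (100 here) barely moves them.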
Performance Improvement
- Primary key will use less amount of memory. #60049 (Alexey Milovidov).
- Improve memory usage for primary key and some other operations. #60050 (Alexey Milovidov).
- The tables' primary keys will be loaded in memory lazily on first access. This is controlled by the new MergeTree setting `primary_key_lazy_load`, which is on by default. This provides several advantages: it will not be loaded for tables that are not used; if there is not enough memory, an exception will be thrown on first use instead of at server startup. The disadvantage is that the latency of loading the primary key will be paid on the first query rather than before accepting connections; this theoretically may introduce a thundering-herd problem. This closes #11188. #60093 (Alexey Milovidov).
- Vectorized distance functions used in vector search. #58866 (Robert Schulze).
- Vectorized function `dotProduct` which is useful for vector search. #60202 (Robert Schulze).
- Add short-circuit ability for `dictGetOrDefault` function. Closes #52098. #57767 (jsc0218).
- Keeper improvement: cache only a certain amount of logs in-memory controlled by `latest_logs_cache_size_threshold` and `commit_logs_cache_size_threshold`. #59460 (Antonio Andelic).
- Keeper improvement: reduce size of data node even more. #59592 (Antonio Andelic).
- Continue optimizing branch misses of the `if` function when the result type is `Float*/Decimal*/*Int*`, follow-up of https://github.com/ClickHouse/ClickHouse/pull/57885. #59148 (李扬).
- Optimize the `if` function when the input type is `Map`; the speed-up is up to ~10x. #59413 (李扬).
- Improve performance of the `Int8` type by implementing strict aliasing (we already have it for `UInt8` and all other integer types). #59485 (Raúl Marín).
- Optimize performance of conditional sum/avg for bigint and big decimal types by reducing branch misses. #59504 (李扬).
- Improve performance of SELECTs with active mutations. #59531 (Azat Khuzhin).
- Optimized function `isNotNull` with AVX2. #59621 (李扬).
- Improve ASOF JOIN performance for sorted or almost sorted data. #59731 (Maksim Kita).
- The previous default value of 1 MB for `async_insert_max_data_size` appeared to be too small. The new one is 10 MiB. #59536 (Nikita Mikhaylov).
- Use multiple threads while reading the metadata of tables from a backup while executing the RESTORE command. #60040 (Vitaly Baranov).
- Now if `StorageBuffer` has more than 1 shard (`num_layers` > 1), background flush will happen simultaneously for all shards in multiple threads. #60111 (alesapin).
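The `primary_key_lazy_load` entry describes load-on-first-access. The Python sketch below shows the general pattern with double-checked locking, so that concurrent first queries trigger only one load. The class and loader names are hypothetical, and this is not how ClickHouse implements it in C++. If the loader raises, nothing is cached and the error reaches the caller on first use, which matches the advantage described in the entry.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyIndex(Generic[T]):
    """Load a value on first access, then reuse it. Stands in for a table's
    primary key index; the name is hypothetical."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._loaded = False
        self._value: Optional[T] = None

    def get(self) -> T:
        if not self._loaded:  # fast path once loaded
            with self._lock:
                # Re-check under the lock so concurrent first callers load once.
                if not self._loaded:
                    # If the loader raises, nothing is cached and the error
                    # surfaces to the caller on this (first) use.
                    self._value = self._loader()
                    self._loaded = True
        return self._value  # type: ignore[return-value]


def load_primary_key() -> list[int]:
    print("loading primary key")  # runs only on first access
    return [0, 8192, 16384]       # illustrative values


pk = LazyIndex(load_primary_key)  # nothing loaded yet ("server startup")
pk.get()                          # prints "loading primary key"
pk.get()                          # served from the cached value
```

The lock serializes only the first concurrent accesses. Later reads take the unlocked fast path, which limits (but does not remove) the thundering-herd cost mentioned in the entry.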
Improvement
- When the output format is `Pretty` and a block consists of a single numeric value which exceeds one million, a readable number will be printed to the right of the table. #60379 (rogeryk).
- Added settings `split_parts_ranges_into_intersecting_and_non_intersecting_final` and `split_intersecting_parts_ranges_into_layers_final`. These settings are needed to disable optimizations for queries with `FINAL` and are needed for debugging only. #59705 (Maksim Kita). Actually not only for that: they can also lower memory usage at the expense of performance.
- Rename the setting `extract_kvp_max_pairs_per_row` to `extract_key_value_pairs_max_pairs_per_row`. The issue (unnecessary abbreviation in the setting name) was introduced in https://github.com/ClickHouse/ClickHouse/pull/43606. Fix the documentation of this setting. #59683 (Alexey Milovidov). #59960 (jsc0218).
- Running `ALTER COLUMN MATERIALIZE` on a column with a `DEFAULT` or `MATERIALIZED` expression now precisely follows the semantics. #58023 (Duc Canh Le).
- Enabled an exponential backoff logic for errors during mutations. It will reduce the CPU usage, memory usage and log file sizes. #58036 (MikhailBurdukov).
- Add improvement to count the `InitialQuery` Profile Event. #58195 (Unalian).
- Allow defining `volume_priority` in `storage_configuration`. #58533 (Andrey Zvonov).
- Add support for the `Date32` type in the `T64` codec. #58738 (Hongbin Ma).
- Allow trailing commas in types with several items. #59119 (Aleksandr Musorin).
- Settings for the Distributed table engine can now be specified in the server configuration file (similar to MergeTree settings), e.g. `<distributed> <flush_on_detach>false</flush_on_detach> </distributed>`. #59291 (Azat Khuzhin).
- Retry disconnects and expired sessions when reading `system.zookeeper`. This is helpful when reading many rows from the `system.zookeeper` table, especially in the presence of fault-injected disconnects. #59388 (Alexander Gololobov).
- Do not interpret numbers with leading zeroes as octals when `input_format_values_interpret_expressions=0`. #59403 (Joanna Hulboj).
- At startup and whenever config files are changed, ClickHouse updates the hard memory limits of its total memory tracker. These limits are computed based on various server settings and cgroups limits (on Linux). Previously, the path `/sys/fs/cgroup/memory.max` (for cgroups v2) was hard-coded. As a result, cgroup v2 memory limits configured for nested groups (hierarchies), e.g. `/sys/fs/cgroup/my/nested/group/memory.max`, were ignored. This is now fixed. The behavior of v1 memory limits remains unchanged. #59435 (Robert Schulze).
- New profile events added to observe the time spent on calculating PK/projections/secondary indices during `INSERT`-s. #59436 (Nikita Taranov).
- Allow defining a starting point for S3Queue with Ordered mode at creation using the setting `s3queue_last_processed_path`. #59446 (Kseniia Sumarokova).
- Made comments for system tables also available in `system.tables` in `clickhouse-local`. #59493 (Nikita Mikhaylov).
- `system.zookeeper` table: previously the whole result was accumulated in memory and returned as one big chunk. This change should help to reduce memory consumption when reading many rows from `system.zookeeper`, allow showing intermediate progress (how many rows have been read so far) and avoid hitting the connection timeout when the result set is big. #59545 (Alexander Gololobov).
- Now the dashboard understands both compressed and uncompressed state of the URL's #hash (backward compatibility). Continuation of #59124. #59548 (Amos Bird).
- Bumped Intel QPL (used by codec `DEFLATE_QPL`) from v1.3.1 to v1.4.0. Also fixed a bug in the polling timeout mechanism: we observed that in some cases the timeout won't work properly, and if a timeout happens, IAA and CPU may process the buffer concurrently. So far, we'd better make sure the IAA codec status is not QPL_STS_BEING_PROCESSED, then fall back to the SW codec. #59551 (jasperzhu).
- Do not show a warning about the server version in ClickHouse Cloud because ClickHouse Cloud handles seamless upgrades automatically. #59657 (Alexey Milovidov).
- After self-extraction, the temporary binary is moved instead of copied. #59661 (Yakov Olkhovskiy).
- Fix stack unwinding on Apple macOS. This closes #53653. #59690 (Nikita Mikhaylov).
- Check for stack overflow in parsers even if the user misconfigured the `max_parser_depth` setting to a very high value. This closes #59622. #59697 (Alexey Milovidov). #60434
- Unify XML and SQL created named collection behaviour in Kafka storage. #59710 (Pervakov Grigorii).
- In case `merge_max_block_size_bytes` is small enough and tables contain wide rows (strings or tuples), background merges may get stuck in an endless loop. This behaviour is fixed. Follow-up for https://github.com/ClickHouse/ClickHouse/pull/59340. #59812 (Nikita Mikhaylov).
- Allow uuid in replica_path if CREATE TABLE explicitly has it. #59908 (Azat Khuzhin).
- Add column `metadata_version` of ReplicatedMergeTree table in the `system.tables` system table. #59942 (Maksim Kita).
- Keeper improvement: send only Keeper related metrics/events for Prometheus. #59945 (Antonio Andelic).
- The dashboard will display metrics across different ClickHouse versions even if the structure of system tables has changed after the upgrade. #59967 (Alexey Milovidov).
- Allow loading AZ info from a file. #59976 (Konstantin Bogdanov).
- Keeper improvement: add retries on failures for Disk related operations. #59980 (Antonio Andelic).
- Add new config setting `backups.remove_backup_files_after_failure`: `<clickhouse> <backups> <remove_backup_files_after_failure>true</remove_backup_files_after_failure> </backups> </clickhouse>`. #60002 (Vitaly Baranov).
- When copying an S3 file on GCP, fall back to a buffer copy in case GCP returned `Internal Error` with the `GATEWAY_TIMEOUT` HTTP error code. #60164 (Maksim Kita).
- Short circuit execution for `ULIDStringToDateTime`. #60211 (Juan Madurga).
- Added `query_id` column for tables `system.backups` and `system.backup_log`. Added error stacktrace to `error` column. #60220 (Maksim Kita).
- Connections through the MySQL port now automatically run with setting `prefer_column_name_to_alias = 1` to support QuickSight out-of-the-box. Also, settings `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` are now enabled by default, also affecting only MySQL connections. This increases compatibility with more BI tools. #60365 (Robert Schulze).
- Fix a race condition in JavaScript code leading to duplicate charts on top of each other. #60392 (Alexey Milovidov).
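The mutation entry above mentions exponential backoff on errors. The Python sketch below shows the usual shape: the delay doubles with each consecutive failure and is capped. The changelog does not state ClickHouse's base delay, cap, or growth factor, so `base_ms` and `max_ms` are placeholders and the function name is hypothetical.

```python
def backoff_delay_ms(failed_attempts: int, base_ms: int = 100, max_ms: int = 60_000) -> int:
    """Delay before retrying a failed task: doubles with each consecutive
    failure and is capped at max_ms. base_ms and max_ms are illustrative."""
    if failed_attempts <= 0:
        return 0  # no failures yet, so no waiting
    return min(max_ms, base_ms * 2 ** (failed_attempts - 1))


print([backoff_delay_ms(n) for n in range(5)])  # [0, 100, 200, 400, 800]
```

Capping the delay keeps a persistently failing mutation from being retried too rarely. The doubling is what cuts the CPU, memory, and log volume that tight retry loops would otherwise produce.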
Build/Testing/Packaging Improvement
- Added builds and tests with coverage collection with introspection. Continuation of #56102. #58792 (Alexey Milovidov).
- Update the Rust toolchain in `corrosion-cmake` when the CMake cross-compilation toolchain variable is set. #59309 (Aris Tritas).
- Add some fuzzing to ASTLiterals. #59383 (Raúl Marín).
- If you want to run initdb scripts every time the ClickHouse container starts, you should set the environment variable `CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS`. #59808 (Alexander Nikolaev).
- Remove the ability to disable generic ClickHouse components (like server/client/...), but keep it for some that require extra libraries (like ODBC or keeper). #59857 (Azat Khuzhin).
- Query fuzzer will fuzz SETTINGS inside queries. #60087 (Alexey Milovidov).
- Add support for building ClickHouse with clang-19 (master). #60448 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix a "Non-ready set" error in TTL WHERE. #57430 (Nikolai Kochetov).
- Fix a bug in the `quantilesGK` function #58216 (李扬).
- Fix a wrong behavior with `intDiv` for Decimal arguments #59243 (Yarik Briukhovetskyi).
- Fix `translate` with FixedString input #59356 (Raúl Marín).
- Fix digest calculation in Keeper #59439 (Antonio Andelic).
- Fix stacktraces for binaries without debug symbols #59444 (Azat Khuzhin).
- Fix `ASTAlterCommand::formatImpl` in case of column specific settings… #59445 (János Benjamin Antal).
- Fix `SELECT * FROM [...] ORDER BY ALL` with Analyzer #59462 (zhongyuankai).
- Fix possible uncaught exception during distributed query cancellation #59487 (Azat Khuzhin).
- Make MAX use the same rules as permutation for complex types #59498 (Raúl Marín).
- Fix corner case when passing `update_insert_deduplication_token_in_dependent_materialized_views` #59544 (Jordi Villar).
- Fix incorrect result of arrayElement / map on empty value #59594 (Raúl Marín).
- Fix crash in topK when merging empty states #59603 (Raúl Marín).
- Fix distributed table with a constant sharding key #59606 (Vitaly Baranov).
- Fix KQL issue found by WingFuzz #59626 (Yong Wang).
- Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer #59630 (Vitaly Baranov).
- Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor #59658 (Raúl Marín).
- Fix query start time on non initial queries #59662 (Raúl Marín).
- Validate types of arguments for `minmax` skipping index #59733 (Anton Popov).
- Fix leftPad / rightPad function with FixedString input #59739 (Raúl Marín).
- Fix AST fuzzer issue in function `countMatches` #59752 (Robert Schulze).
- RabbitMQ: fix having neither acked nor nacked messages #59775 (Kseniia Sumarokova).
- Fix StorageURL doing some of the query execution in single thread #59833 (Michael Kolupaev).
- S3Queue: fix uninitialized value #59897 (Kseniia Sumarokova).
- Fix parsing of partition expressions surrounded by parens #59901 (János Benjamin Antal).
- Fix crash in JSONColumnsWithMetadata format over HTTP #59925 (Kruglov Pavel).
- Do not rewrite sum to count if the return value differs in Analyzer #59926 (Azat Khuzhin).
- UniqExactSet read crash fix #59928 (Maksim Kita).
- ReplicatedMergeTree invalid metadata_version fix #59946 (Maksim Kita).
- Fix data race in `StorageDistributed` #59987 (Nikita Taranov).
- Docker: run init scripts when option is enabled rather than disabled #59991 (jktng).
- Fix INSERT into `SQLite` with single quote (by escaping single quotes with a quote instead of backslash) #60015 (Azat Khuzhin).
- Fix several logical errors in `arrayFold` #60022 (Raúl Marín).
- Fix optimize_uniq_to_count removing the column alias #60026 (Raúl Marín).
- Fix possible exception from S3Queue table on drop #60036 (Kseniia Sumarokova).
- Fix formatting of NOT with single literals #60042 (Raúl Marín).
- Use max_query_size from context in DDLLogEntry instead of hardcoded 4096 #60083 (Kruglov Pavel).
- Fix inconsistent formatting of queries containing tables named `table`. Fix wrong formatting of queries with `UNION ALL`, `INTERSECT`, and `EXCEPT` when their structure wasn't linear. This closes #52349. Fix wrong formatting of `SYSTEM` queries, including `SYSTEM ... DROP FILESYSTEM CACHE`, `SYSTEM ... REFRESH/START/STOP/CANCEL/TEST VIEW`, `SYSTEM ENABLE/DISABLE FAILPOINT`. Fix formatting of parameterized DDL queries. Fix the formatting of the `DESCRIBE FILESYSTEM CACHE` query. Fix incorrect formatting of the `SET param_...` (a query setting a parameter). Fix incorrect formatting of `CREATE INDEX` queries. Fix inconsistent formatting of `CREATE USER` and similar queries. Fix inconsistent formatting of `CREATE SETTINGS PROFILE`. Fix incorrect formatting of `ALTER ... MODIFY REFRESH`. Fix inconsistent formatting of window functions if frame offsets were expressions. Fix inconsistent formatting of `RESPECT NULLS` and `IGNORE NULLS` if they were used after a function that implements an operator (such as `plus`). Fix idiotic formatting of `SYSTEM SYNC REPLICA ... LIGHTWEIGHT FROM ...`. Fix inconsistent formatting of invalid queries with `GROUP BY GROUPING SETS ... WITH ROLLUP/CUBE/TOTALS`. Fix inconsistent formatting of `GRANT CURRENT GRANTS`. Fix inconsistent formatting of `CREATE TABLE (... COLLATE)`. Additionally, I fixed the incorrect formatting of `EXPLAIN` in subqueries (#60102). Fixed incorrect formatting of lambda functions (#60012). Added a check so there is no way to miss these abominations in the future. #60095 (Alexey Milovidov).
- Fix inconsistent formatting of explain in subqueries #60102 (Alexey Milovidov).
- Fix cosineDistance crash with Nullable #60150 (Raúl Marín).
- Allow casting of bools in string representation to true bools #60160 (Robert Schulze).
- Fix `system.s3queue_log` #60166 (Kseniia Sumarokova).
- Fix arrayReduce with nullable aggregate function name #60188 (Raúl Marín).
- Hide sensitive info for `S3Queue` #60233 (Kseniia Sumarokova).
- Fix HTTP exception codes. #60252 (Austin Kothig).
- S3Queue: fix a bug (also fixes flaky test_storage_s3_queue/test.py::test_shards_distributed) #60282 (Kseniia Sumarokova).
- Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6 #60359 (Kruglov Pavel).
- Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments #60453 (Raúl Marín).
- Fixed a minor bug that prevented distributed table queries sent from either KQL or PRQL dialect clients from being executed on replicas. #59674. #60470 (Alexey Milovidov) #59674 (Austin Kothig).
ClickHouse release 24.1, 2024-01-30
Backward Incompatible Change
- The setting `print_pretty_type_names` is turned on by default. You can turn it off to keep the old behavior or `SET compatibility = '23.12'`. #57726 (Alexey Milovidov).
- The MergeTree setting `clean_deleted_rows` is deprecated; it has no effect anymore. The `CLEANUP` keyword for `OPTIMIZE` is not allowed by default (unless `allow_experimental_replacing_merge_with_cleanup` is enabled). #58316 (Alexander Tokmakov).
- The function `reverseDNSQuery` is no longer available. This closes #58368. #58369 (Alexey Milovidov).
- Enable various changes to improve the access control in the configuration file. These changes affect the behavior, so check the `access_control_improvements` section in `config.xml`. In case you are not confident, keep the values in the configuration file as they were in the previous version. #58584 (Alexey Milovidov).
- Improve the operation of `sumMapFiltered` with NaN values. NaN values are now placed at the end (instead of randomly) and considered different from any values. `-0` is now also treated as equal to `0`; since 0 values are discarded, `-0` values are discarded too. #58959 (Raúl Marín).
- The function `visibleWidth` will behave according to the docs. In previous versions, it simply counted code points after string serialization, like the `lengthUTF8` function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set `function_visible_width_behavior` to `0`, or set `compatibility` to `23.12` or lower. #59022 (Alexey Milovidov).
- `Kusto` dialect is disabled until these two bugs are fixed: #59037 and #59036. #59305 (Alexey Milovidov). Any attempt to use `Kusto` will result in an exception.
- More efficient implementation of the `FINAL` modifier no longer guarantees preserving the order even if `max_threads = 1`. If you counted on the previous behavior, set `enable_vertical_final` to 0 or `compatibility` to `23.12`.
New Feature
- Implement Variant data type that represents a union of other data types. Type `Variant(T1, T2, ..., TN)` means that each row of this type has a value of either type `T1` or `T2` or ... or `TN` or none of them (`NULL` value). Variant type is available under a setting `allow_experimental_variant_type`. Reference: #54864. #58047 (Kruglov Pavel).
- Certain settings (currently `min_compress_block_size` and `max_compress_block_size`) can now be specified at column-level where they take precedence over the corresponding table-level setting. Example: `CREATE TABLE tab (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840)) ENGINE = MergeTree ORDER BY tuple();`. #55201 (Duc Canh Le).
- Add `quantileDD` aggregate function as well as the corresponding `quantilesDD` and `medianDD`. It is based on the DDSketch https://www.vldb.org/pvldb/vol12/p2195-masson.pdf. #56342 (Srikanth Chekuri).
- Allow configuring any kind of object storage with any kind of metadata type. #58357 (Kseniia Sumarokova).
- Added `null_status_on_timeout_only_active` and `throw_only_active` modes for `distributed_ddl_output_mode` that allow avoiding waiting for inactive replicas. #58350 (Alexander Tokmakov).
- Add function `arrayShingles` to compute subarrays, e.g. `arrayShingles([1, 2, 3, 4, 5], 3)` returns `[[1,2,3],[2,3,4],[3,4,5]]`. #58396 (Zheng Miao).
- Added functions `punycodeEncode`, `punycodeDecode`, `idnaEncode` and `idnaDecode` which are useful for translating international domain names to an ASCII representation according to the IDNA standard. #58454 (Robert Schulze).
- Added string similarity functions `damerauLevenshteinDistance`, `jaroSimilarity` and `jaroWinklerSimilarity`. #58531 (Robert Schulze).
- Add two settings: `output_format_compression_level` to change the output compression level, and `output_format_compression_zstd_window_log` to explicitly set the compression window size and enable long-range mode for zstd compression if the output compression method is `zstd`. They apply to `INTO OUTFILE` and when writing to table functions `file`, `url`, `hdfs`, `s3`, and `azureBlobStorage`. #58539 (Duc Canh Le).
- Automatically disable ANSI escape sequences in Pretty formats if the output is not a terminal. Add new `auto` mode to setting `output_format_pretty_color`. #58614 (Shaun Struwig).
- Added function `sqidDecode` which decodes Sqids. #58544 (Robert Schulze).
- Allow reading Bool values into String in JSON input formats. It's done under a setting `input_format_json_read_bools_as_strings` that is enabled by default. #58561 (Kruglov Pavel).
- Added function `seriesDecomposeSTL` which decomposes a time series into a season, a trend and a residual component. #57078 (Bhavna Jindal).
- Introduced MySQL Binlog Client for MaterializedMySQL: One binlog connection for many databases. #57323 (Val Doroshchuk).
- Intel QuickAssist Technology (QAT) provides hardware-accelerated compression and cryptography. ClickHouse got a new compression codec `ZSTD_QAT` which utilizes QAT for zstd compression. The codec uses Intel's QATlib and Intel's QAT ZSTD Plugin. Right now, only compression can be accelerated in hardware (a software fallback kicks in in case QAT could not be initialized); decompression always runs in software. #57509 (jasperzhu).
- Implement a new way of generating object storage keys for s3 disks. The format can now be defined in `re2` regex syntax with the `key_template` option in the disk description. #57663 (Sema Checherinda).
- Table `system.dropped_tables_parts` contains parts of `system.dropped_tables` tables (dropped but not yet removed tables). #58038 (Yakov Olkhovskiy).
- Add setting `max_materialized_views_size_for_table` to limit the number of materialized views attached to a table. #58068 (zhongyuankai).
- `clickhouse-format` improvements: support INSERT queries with `VALUES`; support comments (use `--comments` to output them); support `--max_line_length` option to format only long queries in multiline. #58246 (vdimir).
- Attach all system tables in `clickhouse-local`, including `system.parts`. This closes #58312. #58359 (Alexey Milovidov).
- Support for `Enum` data types in function `transform`. This closes #58241. #58360 (Alexey Milovidov).
- Add table `system.database_engines`. #58390 (Bharat Nallan). Allow registering database engines independently in the codebase. #58365 (Bharat Nallan). Allow registering interpreters independently. #58443 (Bharat Nallan).
- Added `FROM <Replicas>` modifier for `SYSTEM SYNC REPLICA LIGHTWEIGHT` query. The `FROM` modifier ensures that we wait for fetches and drop-ranges only for the specified source replicas, as well as any replica not in ZooKeeper or with an empty `source_replica`. #58393 (Jayme Bird).
- Added setting `update_insert_deduplication_token_in_dependent_materialized_views`. This setting allows updating the insert deduplication token with the table identifier during insert in dependent materialized views. Closes #59165. #59238 (Maksim Kita).
- Added statement `SYSTEM RELOAD ASYNCHRONOUS METRICS` which updates the asynchronous metrics. Mostly useful for testing and development. #53710 (Robert Schulze).
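The following is a rough Python sketch of the semantics of two of the new functions. It is not ClickHouse's C++ implementation. `array_shingles` is a hypothetical helper that reproduces the `arrayShingles([1, 2, 3, 4, 5], 3)` example from the entry above. Rejecting lengths outside `1..len(arr)` is an assumption about the edge-case behavior. The punycode/IDNA helpers (also hypothetical names) rely on Python's standard `punycode` and `idna` codecs. The `idna` codec implements IDNA 2003, so it may disagree with ClickHouse's `idnaEncode` on some inputs (for example, in how `ß` is treated).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def array_shingles(arr: Sequence[T], length: int) -> List[List[T]]:
    """Return all contiguous subarrays ("shingles") of `length` elements, in order.

    Hypothetical helper mirroring the arrayShingles example; raising on
    out-of-range lengths is an assumption of this sketch.
    """
    if not 1 <= length <= len(arr):
        raise ValueError("length must be between 1 and len(arr)")
    return [list(arr[i:i + length]) for i in range(len(arr) - length + 1)]


def punycode_encode(label: str) -> str:
    # Raw Punycode (RFC 3492) of a single label, without the "xn--" prefix.
    return label.encode("punycode").decode("ascii")


def punycode_decode(encoded: str) -> str:
    return encoded.encode("ascii").decode("punycode")


def idna_encode(domain: str) -> str:
    # Python's "idna" codec (IDNA 2003): each non-ASCII label becomes "xn--" + Punycode.
    return domain.encode("idna").decode("ascii")


def idna_decode(domain: str) -> str:
    return domain.encode("ascii").decode("idna")


print(array_shingles([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(punycode_encode("münchen"))          # mnchen-3ya
print(idna_encode("münchen.de"))           # xn--mnchen-3ya.de
```

The sliding-window comprehension produces `len(arr) - length + 1` shingles, which matches the three subarrays in the changelog example.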
Performance Improvement
- Coordination for parallel replicas is rewritten for better parallelism and cache locality. It has been tested for linear scalability on hundreds of replicas. It also got support for reading in order. #57968 (Nikita Taranov).
- Replace HTTP outgoing buffering with the native ClickHouse buffers. Add bytes counting metrics for interfaces. #56064 (Yakov Olkhovskiy).
- Large aggregation states of `uniqExact` will be merged in parallel in distributed queries. #59009 (Nikita Taranov).
- Lower memory usage after reading from `MergeTree` tables. #59290 (Anton Popov).
- Lower memory usage in vertical merges. #59340 (Anton Popov).
- Avoid huge memory consumption during Keeper startup for more cases. #58455 (Antonio Andelic).
- Keeper improvement: reduce Keeper's memory usage for stored nodes. #59002 (Antonio Andelic).
- More cache-friendly `FINAL` implementation. Note on the behaviour change: previously, queries with the `FINAL` modifier that read with a single stream (e.g. `max_threads = 1`) produced sorted output without an explicitly provided `ORDER BY` clause. This is no longer guaranteed when `enable_vertical_final = true` (which is the default). #54366 (Duc Canh Le).
- Bypass extra copying in `ReadBufferFromIStream` which is used, e.g., for reading from S3. #56961 (Nikita Taranov).
- Optimize array element function when input is Array(Map)/Array(Array(Num))/Array(Array(String))/Array(BigInt)/Array(Decimal). The previous implementations did more allocations than needed. The speedup is up to ~6x, especially when the input type is Array(Map). #56403 (李扬).
- Read column once while reading more than one subcolumn from it in compact parts. #57631 (Kruglov Pavel).
- Rewrite the AST of `sum(column + constant)` function. This is available as an optimization pass for Analyzer #57853 (Jiebin Sun).
- The evaluation of function `match` now utilizes skipping indices `ngrambf_v1` and `tokenbf_v1`. #57882 (凌涛).
- The evaluation of function `match` now utilizes inverted indices. #58284 (凌涛).
- MergeTree `FINAL` does not compare rows from the same non-L0 part. #58142 (Duc Canh Le).
- Speed up iota calls (filling array with consecutive numbers). #58271 (Raúl Marín).
- Speed up MIN/MAX for non-numeric types. #58334 (Raúl Marín).
- Optimize the combination of filters (like in multi-stage PREWHERE) with BMI2/SSE intrinsics #58800 (Zhiguo Zhou).
- Use one thread less in `clickhouse-local`. #58968 (Alexey Milovidov).
- Improve the `multiIf` function performance when the type is Nullable. #57745 (KevinyhZou).
- Add `SYSTEM JEMALLOC PURGE` for purging unused jemalloc pages, and `SYSTEM JEMALLOC [ ENABLE | DISABLE | FLUSH ] PROFILE` for controlling the jemalloc profile if the profiler is enabled. Add jemalloc-related 4LW commands in Keeper: `jmst` for dumping jemalloc stats, and `jmfp`, `jmep`, `jmdp` for controlling the jemalloc profile if the profiler is enabled. #58665 (Antonio Andelic).
- Lower memory consumption in backups to S3. #58962 (Vitaly Baranov).
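The filter-combination optimization for multi-stage PREWHERE (#58800) can be pictured with a small Python sketch. The sketch assumes that a later stage's filter is computed only for the rows that passed the earlier one, so it has one entry per set bit of the first filter. Under that assumption, combining the two filters means scattering the second mask into those bit positions. The BMI2 `PDEP` instruction does exactly that for a 64-bit word. `combine_filters`, `pdep`, and `to_word` are hypothetical names used for illustration; they are not ClickHouse functions.

```python
from typing import List


def combine_filters(first: List[bool], second: List[bool]) -> List[bool]:
    """Reference semantics: second[k] decides the k-th row that passed `first`."""
    if sum(first) != len(second):
        raise ValueError("second filter needs one entry per row that passed the first")
    later = iter(second)
    return [next(later) if passed else False for passed in first]


def pdep(src: int, mask: int) -> int:
    """Software model of BMI2 PDEP: deposit the low bits of `src`
    into the set-bit positions of `mask`, lowest position first."""
    result = 0
    src_bit = 1
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of mask
        if src & src_bit:
            result |= lowest
        src_bit <<= 1
        mask &= mask - 1           # clear that bit and move on
    return result


def to_word(bits: List[bool]) -> int:
    # Pack booleans into an integer, row 0 in the least significant bit.
    return sum(1 << i for i, b in enumerate(bits) if b)


first = [True, False, True, True, False, True]
second = [True, True, False, True]
combined = combine_filters(first, second)
print(combined)  # [True, False, True, False, False, True]
# The row-by-row result equals one PDEP over the packed words.
assert to_word(combined) == pdep(to_word(second), to_word(first))
```

The row-by-row version states the meaning. The word-level version shows why a single hardware instruction can replace a per-row loop for each block of 64 rows.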
Improvement
- Added comments (brief descriptions) to all columns of system tables. There are several reasons for this:
  - We use system tables a lot, and sometimes it could be very difficult for a developer to understand the purpose and the meaning of a particular column.
  - We change system tables a lot (adding new ones or modifying existing ones), and the documentation for them is always outdated. For example, the documentation page for `system.parts` misses a lot of columns.
  - We would like to eventually generate documentation directly from ClickHouse. #58356 (Nikita Mikhaylov).
- Allow queries without aliases for subqueries for `PASTE JOIN`. #58654 (Yarik Briukhovetskyi).
- Enable `MySQL`/`MariaDB` integration on macOS. This closes #21191. #46316 (Alexey Milovidov) (Robert Schulze).
- Disable `max_rows_in_set_to_optimize_join` by default. #56396 (vdimir).
- Add `<host_name>` config parameter that allows avoiding resolving hostnames in ON CLUSTER DDL queries and Replicated database engines. This mitigates the possibility of the queue being stuck in case of a change in cluster definition. Closes #57573. #57603 (Nikolay Degterinsky).
- Increase `load_metadata_threads` to 16 for the filesystem cache. It will make the server start up faster. #57732 (Alexey Milovidov).
- Add ability to throttle merges/mutations (`max_mutations_bandwidth_for_server`/`max_merges_bandwidth_for_server`). #57877 (Azat Khuzhin).
- Replaced undocumented (boolean) column `is_hot_reloadable` in system table `system.server_settings` by (Enum8) column `changeable_without_restart` with possible values `No`, `Yes`, `IncreaseOnly` and `DecreaseOnly`. Also documented the column. #58029 (skyoct).
- Cluster discovery supports setting username and password. Closes #58063. #58123 (vdimir).
- Support query parameters in `ALTER TABLE ... PART`. #58297 (Azat Khuzhin).
- Create consumers for Kafka tables on the fly (but keep them for some period since last use, controlled by `kafka_consumers_pool_ttl_ms`). This should fix the problem with statistics for `system.kafka_consumers` (statistics that are not consumed when nobody reads from the Kafka table, which leads to a live memory leak and slow table detach), and this PR also enables stats for `system.kafka_consumers` by default again. #58310 (Azat Khuzhin).
- `sparkBar` as an alias to `sparkbar`. #58335 (凌涛).
- Avoid sending `ComposeObject` requests after upload to `GCS`. #58343 (Azat Khuzhin).
- Correctly handle keys with dot in the name in configuration XMLs. #58354 (Azat Khuzhin).
- Make function `format` return constant on constant arguments. This closes #58355. #58358 (Alexey Milovidov).
- Add a setting `max_estimated_execution_time` to separate `max_execution_time` and `max_estimated_execution_time`. #58402 (Zhang Yifan).
- Provide a hint when an invalid database engine name is used. #58444 (Bharat Nallan).
- Add settings for better control of the index type in Arrow dictionaries. Use a signed integer type for indexes by default, as Arrow recommends. Closes #57401. #58519 (Kruglov Pavel).
- Implement #58575: support the `CLICKHOUSE_PASSWORD_FILE` environment variable when running the docker image. #58583 (Eyal Halpern Shalev).
- When executing some queries which require a lot of streams for reading data, the error `"Paste JOIN requires sorted tables only"` was previously thrown. Now the number of streams is resized to 1 in that case. #58608 (Yarik Briukhovetskyi).
- Better message for INVALID_IDENTIFIER error. #58703 (Yakov Olkhovskiy).
- Improved handling of signed numeric literals in normalizeQuery. #58710 (Salvatore Mesoraca).
- Support Point data type for MySQL. #58721 (Kseniia Sumarokova).
- When comparing a Float32 column and a const string, read the string as Float32 (instead of Float64). #58724 (Raúl Marín).
- Improve S3 compatibility, add ECloud EOS storage support. #58786 (xleoken).
- Allow `KILL QUERY` to cancel backups / restores. This PR also makes running backups and restores visible in `system.processes`. Also, there is now a new setting in the server configuration, `shutdown_wait_backups_and_restores` (default=true), which makes the server either wait on shutdown for all running backups and restores to finish or just cancel them. #58804 (Vitaly Baranov).
- Avro format now supports the ZSTD codec. Closes #58735. #58805 (flynn).
- MySQL interface gained support for `net_write_timeout` and `net_read_timeout` settings. `net_write_timeout` is translated into the native `send_timeout` ClickHouse setting and, similarly, `net_read_timeout` into `receive_timeout`. Fixed an issue where it was possible to set MySQL `sql_select_limit` setting only if the entire statement was in upper case. #58835 (Serge Klochkov).
- A better exception message on a name conflict when creating a dictionary and a table with the same name. #58841 (Yarik Briukhovetskyi).
- Make sure that for custom (created from SQL) disks either `filesystem_caches_path` (a common directory prefix for all filesystem caches) or `custom_cached_disks_base_directory` (a common directory prefix for only filesystem caches created from custom disks) is specified in server config. `custom_cached_disks_base_directory` has higher priority for custom disks over `filesystem_caches_path`, which is used if the former one is absent. Filesystem cache setting `path` must lie inside that directory, otherwise an exception will be thrown preventing the disk from being created. This will not affect disks created on an older version before the server was upgraded: in that case the exception will not be thrown, so that the server can start successfully. `custom_cached_disks_base_directory` is added to the default server config as `/var/lib/clickhouse/caches/`. Closes #57825. #58869 (Kseniia Sumarokova).
- MySQL interface gained compatibility with `SHOW WARNINGS`/`SHOW COUNT(*) WARNINGS` queries, though the returned result is always an empty set. #58929 (Serge Klochkov).
- Skip unavailable replicas when executing parallel distributed `INSERT SELECT`. #58931 (Alexander Tokmakov).
- Display a word-descriptive log level when structured log formatting in JSON is enabled. #58936 (Tim Liou).
- MySQL interface gained support for `CAST(x AS SIGNED)` and `CAST(x AS UNSIGNED)` statements via data type aliases: `SIGNED` for Int64, and `UNSIGNED` for UInt64. This improves compatibility with BI tools such as Looker Studio. #58954 (Serge Klochkov).
- Change working directory to the data path in docker container. #58975 (cangyin).
- Added setting for Azure Blob Storage `azure_max_unexpected_write_error_retries`, which can also be set from config under the azure section. #59001 (SmitaRKulkarni).
- Allow server to start with broken data lake table. Closes #58625. #59080 (Kseniia Sumarokova).
- Allow ignoring schema evolution in the `Iceberg` table engine and reading all data using the schema specified by the user on table creation or the latest schema parsed from metadata on table creation. This is done under a setting `iceberg_engine_ignore_schema_evolution` that is disabled by default. Note that enabling this setting can lead to incorrect results, since with an evolved schema all data files will be read using the same schema. #59133 (Kruglov Pavel).
- Prohibit mutable operations (`INSERT`/`ALTER`/`OPTIMIZE`/...) on read-only/write-once storages with a proper `TABLE_IS_READ_ONLY` error (to avoid leftovers). Avoid leaving leftovers on write-once disks (`format_version.txt`) on `CREATE`/`ATTACH`. Ignore `DROP` for `ReplicatedMergeTree` (as for `MergeTree`). Fix iterating over `s3_plain` (`MetadataStorageFromPlainObjectStorage::iterateDirectory`). Note that the read-only disk is `web`, and the write-once disk is `s3_plain`. #59170 (Azat Khuzhin).
- Fix bug in the experimental `_block_number` column which could lead to a logical error during a complex combination of `ALTER`s and `merge`s. Fixes #56202. Replaces #58601. #59295 (alesapin).
- Play UI understands when an exception is returned inside JSON. Adjustment for #52853. #59303 (Alexey Milovidov).
- `/binary` HTTP handler allows specifying user, host, and optionally, password in the query string. #59311 (Alexey Milovidov).
- Support backups for compressed in-memory tables. This closes #57893. #59315 (Alexey Milovidov).
- Support the `FORMAT` clause in `BACKUP` and `RESTORE` queries. #59338 (Vitaly Baranov).
- Function `concatWithSeparator` now supports arbitrary argument types (instead of only `String` and `FixedString` arguments). For example, `SELECT concatWithSeparator('.', 'number', 1)` now returns `number.1`. #59341 (Robert Schulze).
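The cache-path rule for custom disks described in the Improvement list above (#58869) can be sketched in Python as follows. This is an illustration under stated assumptions, not ClickHouse's code. `resolve_cache_base` and `check_cache_path` are hypothetical names. A relative `path` value is assumed to be resolved against the base directory. The exemption for disks created before an upgrade is not modeled.

```python
import posixpath
from typing import Optional


def resolve_cache_base(custom_cached_disks_base_directory: Optional[str],
                       filesystem_caches_path: Optional[str]) -> str:
    """Pick the base directory for caches of custom disks.

    custom_cached_disks_base_directory wins; filesystem_caches_path is the fallback.
    """
    base = custom_cached_disks_base_directory or filesystem_caches_path
    if not base:
        raise ValueError("neither custom_cached_disks_base_directory "
                         "nor filesystem_caches_path is configured")
    return posixpath.normpath(base)


def check_cache_path(cache_path: str, base: str) -> str:
    """Return the normalized cache path, or raise if it escapes `base`."""
    base = posixpath.normpath(base)
    # Assumption: a relative path is interpreted relative to the base directory.
    full = posixpath.normpath(posixpath.join(base, cache_path))
    # commonpath compares whole components, so "/caches2" does not match "/caches".
    if posixpath.commonpath([base, full]) != base:
        raise ValueError(f"cache path {full!r} must lie inside {base!r}")
    return full


base = resolve_cache_base(None, "/var/lib/clickhouse/caches/")
print(check_cache_path("s3_cache", base))  # /var/lib/clickhouse/caches/s3_cache
```

Normalizing before the containment check is what stops a value such as `../escape` from slipping outside the base directory.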
Build/Testing/Packaging Improvement
- Improve aliases for clickhouse binary (now `ch`/`clickhouse` is `clickhouse-local` or `clickhouse` depending on the arguments) and add bash completion for new aliases. #58344 (Azat Khuzhin).
- Add settings changes check to CI to check that all settings changes are reflected in settings changes history. #58555 (Kruglov Pavel).
- Use tables directly attached from S3 in stateful tests. #58791 (Alexey Milovidov).
- Save the whole `fuzzer.log` as an archive instead of the last 100k lines. `tail -n 100000` often removes lines with table definitions. #58821 (Dmitry Novik).
- Enable Rust on macOS with Aarch64. This adds fuzzy search in the client with skim and the PRQL language; since few people are likely to host ClickHouse on Darwin, it is mostly useful for fuzzy search in the client. #59272 (Azat Khuzhin).
- Fix aggregation issue in mixed x86_64 and ARM clusters #59132 (Harry Lee).
Bug Fix (user-visible misbehavior in an official stable release)
- Add join keys conversion for nested LowCardinality #51550 (vdimir).
- Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) #56132 (Kruglov Pavel).
- Fix a bug with projections and the `aggregate_functions_null_for_empty` setting during insertion. #56944 (Amos Bird).
- Fixed potential exception due to stale profile UUID #57263 (Vasily Nemkov).
- Fix working with read buffers in StreamingFormatExecutor #57438 (Kruglov Pavel).
- Ignore MVs with dropped target table during pushing to views #57520 (Kruglov Pavel).
- Eliminate possible race between ALTER_METADATA and MERGE_PARTS #57755 (Azat Khuzhin).
- Fix the expressions order bug in group by with rollup #57786 (Chen768959).
- A fix for the obsolete "zero-copy" replication feature: Fix lost blobs after dropping a replica with broken detached parts #58333 (Alexander Tokmakov).
- Allow users to work with symlinks in user_files_path #58447 (Duc Canh Le).
- Fix a crash when graphite table does not have an agg function #58453 (Duc Canh Le).
- Delay reading from StorageKafka to allow multiple reads in materialized views #58477 (János Benjamin Antal).
- Fix a stupid case of intersecting parts #58482 (Alexander Tokmakov).
- Disable MergeTreePrefetchedReadPool for LIMIT-only queries #58505 (Maksim Kita).
- Enable ordinary databases during restoration #58520 (Jihyuk Bok).
- Fix Apache Hive threadpool reading for ORC/Parquet/... #58537 (sunny).
- Hide credentials in `system.backup_log`'s `base_backup_name` column #58550 (Daniel Pozo Escalona).
- `toStartOfInterval`: rounding for milli- and microsecond values #58557 (Yarik Briukhovetskyi).
- Disable `max_joined_block_rows` in ConcurrentHashJoin #58595 (vdimir).
- Fix join using nullable in the old analyzer #58596 (vdimir).
- `makeDateTime64`: Allow non-const fraction argument #58597 (Robert Schulze).
- Fix possible NULL dereference during symbolizing inline frames #58607 (Azat Khuzhin).
- Improve isolation of query cache entries under re-created users or role switches #58611 (Robert Schulze).
- Fix broken partition key analysis when doing projection optimization #58638 (Amos Bird).
- Query cache: Fix per-user quota #58731 (Robert Schulze).
- Fix stream partitioning in parallel window functions #58739 (Dmitry Novik).
- Fix double destroy call on exception throw in addBatchLookupTable8 #58745 (Raúl Marín).
- Don't process requests in Keeper during shutdown #58765 (Antonio Andelic).
- Fix a null pointer dereference in `SlabsPolygonIndex::find` #58771 (Yarik Briukhovetskyi).
- Fix JSONExtract function for LowCardinality(Nullable) columns #58808 (vdimir).
- A fix for unexpected accumulation of memory usage while creating a huge number of tables by CREATE and DROP. #58831 (Maksim Kita).
- Allow multiple reads from FileLog storage in materialized views #58877 (János Benjamin Antal).
- Restriction for the access key id for s3. #58900 (MikhailBurdukov).
- Fix possible crash in clickhouse-local during loading suggestions #58907 (Kruglov Pavel).
- Fix crash when `indexHint` is used #58911 (Dmitry Novik).
- Fix StorageURL forgetting headers on server restart #58933 (Michael Kolupaev).
- Analyzer: fix storage replacement with insertion block #58958 (Yakov Olkhovskiy).
- Fix seek in ReadBufferFromZipArchive #58966 (Michael Kolupaev).
- A fix for experimental inverted indices (don't use in production): `DROP INDEX` of inverted index now removes all relevant files from persistence #59040 (mochi).
- Fix data race on query_factories_info #59049 (Kseniia Sumarokova).
- Disable "Too many redirects" error retry #59099 (skyoct).
- Fix not started database shutdown deadlock #59137 (Sergei Trifonov).
- Fix: LIMIT BY and LIMIT in distributed query #59153 (Igor Nikonov).
- Fix crash with nullable timezone for `toString` #59190 (Yarik Briukhovetskyi).
- Fix abort in iceberg metadata on bad file paths #59275 (Kruglov Pavel).
- Fix architecture name in select of Rust target #59307 (p1rattttt).
- Fix a logical error about "not-ready set" for querying from `system.tables` with a subquery in the IN clause. #59351 (Nikolai Kochetov).