Table of Contents
ClickHouse release v24.1, 2024-01-30
Changelog for 2023
2024 Changelog
ClickHouse release 24.1, 2024-01-30
Backward Incompatible Change
- The setting `print_pretty_type_names` is turned on by default. To keep the old behavior, turn it off or run `SET compatibility = '23.12'`. #57726 (Alexey Milovidov).
- The MergeTree setting `clean_deleted_rows` is deprecated and no longer has any effect. The `CLEANUP` keyword for `OPTIMIZE` is not allowed by default (unless `allow_experimental_replacing_merge_with_cleanup` is enabled). #58316 (Alexander Tokmakov).
- The function `reverseDNSQuery` is no longer available. This closes #58368. #58369 (Alexey Milovidov).
- Enable various changes to improve the access control in the configuration file. These changes affect the behavior, so check the `access_control_improvements` section in `config.xml`. If you are not confident, keep the values in the configuration file as they were in the previous version. #58584 (Alexey Milovidov).
- Improve the operation of `sumMapFiltered` with NaN values. NaN values are now placed at the end (instead of randomly) and are considered different from any other value. `-0` is now also treated as equal to `0`; since 0 values are discarded, `-0` values are discarded too. #58959 (Raúl Marín).
- The function `visibleWidth` now behaves according to the docs. In previous versions, it simply counted code points after string serialization, like the `lengthUTF8` function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set `function_visible_width_behavior` to `0`, or set `compatibility` to `23.12` or lower. #59022 (Alexey Milovidov).
- The `Kusto` dialect is disabled until these two bugs are fixed: #59037 and #59036. Any attempt to use `Kusto` will result in an exception. #59305 (Alexey Milovidov).
- The more efficient implementation of the `FINAL` modifier no longer guarantees preserving the order even if `max_threads = 1`. If you relied on the previous behavior, set `enable_vertical_final` to 0 or `compatibility` to `23.12`.
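To illustrate the `visibleWidth` change above, here is a rough Python sketch that contrasts counting code points (roughly what the old, `lengthUTF8`-like behavior did) with counting display columns. The width rules here are a simplification and an assumption, not ClickHouse's exact algorithm: combining marks and format characters count as 0 columns, East Asian wide and full-width characters count as 2, and tabs and DEL are not handled at all.

```python
import unicodedata


def approx_visible_width(s: str) -> int:
    """Approximate display width of a string (simplified, assumed rules).

    - combining marks and format characters (e.g. zero-width space) take 0 columns;
    - East Asian wide / full-width characters take 2 columns;
    - every other code point takes 1 column.
    Tab and DEL handling are deliberately left out of this sketch.
    """
    width = 0
    for ch in s:
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero-width: combining mark or format character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or full-width character
        else:
            width += 1
    return width


for text in ["abc", "e\u0301", "日本", "a\u200bb"]:
    # len() counts code points, which is what the old behavior effectively did.
    print(repr(text), "code points:", len(text), "width:", approx_visible_width(text))
```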
New Feature
- Implement the Variant data type, which represents a union of other data types. The type `Variant(T1, T2, ..., TN)` means that each row of this type has a value of either type `T1` or `T2` or ... or `TN`, or none of them (a `NULL` value). The Variant type is available under the setting `allow_experimental_variant_type`. Reference: #54864. #58047 (Kruglov Pavel).
- Certain settings (currently `min_compress_block_size` and `max_compress_block_size`) can now be specified at column level, where they take precedence over the corresponding table-level setting. Example: `CREATE TABLE tab (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840)) ENGINE = MergeTree ORDER BY tuple();`. #55201 (Duc Canh Le).
- Add the `quantileDD` aggregate function as well as the corresponding `quantilesDD` and `medianDD`. It is based on DDSketch: https://www.vldb.org/pvldb/vol12/p2195-masson.pdf. #56342 (Srikanth Chekuri).
- Allow configuring any kind of object storage with any kind of metadata type. #58357 (Kseniia Sumarokova).
- Added the `null_status_on_timeout_only_active` and `throw_only_active` modes for `distributed_ddl_output_mode`, which avoid waiting for inactive replicas. #58350 (Alexander Tokmakov).
- Add the function `arrayShingles` to compute subarrays, e.g. `arrayShingles([1, 2, 3, 4, 5], 3)` returns `[[1,2,3],[2,3,4],[3,4,5]]`. #58396 (Zheng Miao).
- Added the functions `punycodeEncode`, `punycodeDecode`, `idnaEncode` and `idnaDecode`, which are useful for translating international domain names to an ASCII representation according to the IDNA standard. #58454 (Robert Schulze).
- Added the string similarity functions `damerauLevenshteinDistance`, `jaroSimilarity` and `jaroWinklerSimilarity`. #58531 (Robert Schulze).
- Add two settings: `output_format_compression_level` to change the output compression level, and `output_format_compression_zstd_window_log` to explicitly set the compression window size and enable long-range mode for zstd compression if the output compression method is `zstd`. They apply to `INTO OUTFILE` and when writing to the table functions `file`, `url`, `hdfs`, `s3`, and `azureBlobStorage`. #58539 (Duc Canh Le).
- Automatically disable ANSI escape sequences in Pretty formats if the output is not a terminal. Add a new `auto` mode to the setting `output_format_pretty_color`. #58614 (Shaun Struwig).
- Added the function `sqidDecode`, which decodes Sqids. #58544 (Robert Schulze).
- Allow reading Bool values into String in JSON input formats. This is done under the setting `input_format_json_read_bools_as_strings`, which is enabled by default. #58561 (Kruglov Pavel).
- Added the function `seriesDecomposeSTL`, which decomposes a time series into a season, a trend and a residual component. #57078 (Bhavna Jindal).
- Introduced MySQL Binlog Client for MaterializedMySQL: one binlog connection for many databases. #57323 (Val Doroshchuk).
- Intel QuickAssist Technology (QAT) provides hardware-accelerated compression and cryptography. ClickHouse got a new compression codec `ZSTD_QAT`, which uses QAT for zstd compression. The codec uses Intel's QATlib and Intel's QAT ZSTD Plugin. Right now, only compression can be accelerated in hardware (a software fallback kicks in if QAT could not be initialized); decompression always runs in software. #57509 (jasperzhu).
- Implement a new way of generating object storage keys for s3 disks. The format can now be defined in `re2` regex syntax with the `key_template` option in the disk description. #57663 (Sema Checherinda).
- The table `system.dropped_tables_parts` contains parts of `system.dropped_tables` tables (dropped but not yet removed tables). #58038 (Yakov Olkhovskiy).
- Add the setting `max_materialized_views_size_for_table` to limit the number of materialized views attached to a table. #58068 (zhongyuankai).
- `clickhouse-format` improvements: support INSERT queries with `VALUES`; support comments (use `--comments` to output them); support the `--max_line_length` option to format only long queries in multiline. #58246 (vdimir).
- Attach all system tables in `clickhouse-local`, including `system.parts`. This closes #58312. #58359 (Alexey Milovidov).
- Support for `Enum` data types in the function `transform`. This closes #58241. #58360 (Alexey Milovidov).
- Add the table `system.database_engines`. #58390 (Bharat Nallan). Allow registering database engines independently in the codebase. #58365 (Bharat Nallan). Allow registering interpreters independently. #58443 (Bharat Nallan).
- Added the `FROM <Replicas>` modifier for the `SYSTEM SYNC REPLICA LIGHTWEIGHT` query. The `FROM` modifier ensures we wait for fetches and drop-ranges only for the specified source replicas, as well as any replica not in ZooKeeper or with an empty source_replica. #58393 (Jayme Bird).
- Added the setting `update_insert_deduplication_token_in_dependent_materialized_views`. This setting allows updating the insert deduplication token with the table identifier during insert in dependent materialized views. Closes #59165. #59238 (Maksim Kita).
- Added the statement `SYSTEM RELOAD ASYNCHRONOUS METRICS`, which updates the asynchronous metrics. Mostly useful for testing and development. #53710 (Robert Schulze).
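The `arrayShingles` entry above comes with a concrete example. The following Python function is a minimal sketch of the same sliding-window semantics, not ClickHouse's implementation; rejecting an out-of-range length is an assumption of this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def array_shingles(arr: Sequence[T], length: int) -> List[List[T]]:
    """Return every contiguous subarray ("shingle") of `arr` with the given length."""
    if length < 1 or length > len(arr):
        # Assumption: treat an out-of-range length as an error in this sketch.
        raise ValueError("length must be between 1 and len(arr)")
    # Slide a window of `length` elements over the input, one position at a time.
    return [list(arr[i:i + length]) for i in range(len(arr) - length + 1)]


print(array_shingles([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```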
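`quantileDD` is based on DDSketch. The Python sketch below shows the paper's core idea for positive values only. Each value is mapped to a logarithmic bucket with ratio gamma = (1 + alpha) / (1 - alpha), so the value reported for a bucket is within relative error alpha of every value in that bucket. The `DDSketch` class and its methods are hypothetical illustrations, not ClickHouse's code, and they omit the bucket collapsing and the zero/negative value handling that a real implementation needs.

```python
import math
from collections import Counter


class DDSketch:
    """Minimal DDSketch for positive values (illustrative sketch only)."""

    def __init__(self, relative_accuracy: float = 0.01) -> None:
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets: Counter = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, x: float) -> None:
        if x <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the range (gamma^(i-1), gamma^i].
        i = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[i] += 1
        self.count += 1

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("empty sketch")
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.count - 1)  # 0-based rank of the requested value
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # This estimate is within relative error alpha of any value in bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise AssertionError("unreachable: rank is always below count")


sketch = DDSketch(relative_accuracy=0.01)
for v in range(1, 1001):
    sketch.add(v)
print(sketch.quantile(0.5))  # an estimate within 1% of the exact median rank value (500)
```

Only the bucket counts are stored, so the memory usage depends on the range of the values rather than on how many values were added.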
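Among the new string similarity functions above, `jaroSimilarity` and `jaroWinklerSimilarity` follow well-known textbook definitions. The Python sketch below uses the standard Jaro formula and the common Winkler variant (prefix scale 0.1, prefix capped at 4 characters, no boost threshold). These parameters are assumptions; ClickHouse's exact parameters and edge-case handling are not stated in this changelog.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if they are equal and not too far apart.
    match_distance = max(0, max(len1, len2) // 2 - 1)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - match_distance)
        hi = min(i + match_distance + 1, len2)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (half-transpositions).
    half_transpositions = 0
    j = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


print(jaro_similarity("MARTHA", "MARHTA"))          # about 0.944
print(jaro_winkler_similarity("MARTHA", "MARHTA"))  # about 0.961
```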
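For a feel of what the new `punycodeEncode`/`idnaEncode` functions above do, Python's standard library ships codecs for the same transformations. The built-in `punycode` codec implements raw RFC 3492 Punycode (no `xn--` prefix). The built-in `idna` codec implements the older IDNA 2003 rules per label and adds the `xn--` prefix. Because of that difference in IDNA version, this snippet is only an approximation of ClickHouse's behavior, and results can differ for some inputs.

```python
# Raw Punycode (RFC 3492): encodes a single label without the "xn--" prefix.
label = "münchen"
encoded = label.encode("punycode")
print(encoded)                     # b'mnchen-3ya'
print(encoded.decode("punycode"))  # münchen

# IDNA (Python's codec follows IDNA 2003): converts each label and adds "xn--".
domain = "münchen.de"
ascii_domain = domain.encode("idna")
print(ascii_domain)                # b'xn--mnchen-3ya.de'
print(ascii_domain.decode("idna")) # münchen.de
```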
Performance Improvement
- Coordination for parallel replicas is rewritten for better parallelism and cache locality. It has been tested for linear scalability on hundreds of replicas. It also got support for reading in order. #57968 (Nikita Taranov).
- Replace HTTP outgoing buffering with the native ClickHouse buffers. Add byte-counting metrics for interfaces. #56064 (Yakov Olkhovskiy).
- Large aggregation states of `uniqExact` will be merged in parallel in distributed queries. #59009 (Nikita Taranov).
- Lower memory usage after reading from `MergeTree` tables. #59290 (Anton Popov).
- Lower memory usage in vertical merges. #59340 (Anton Popov).
- Avoid huge memory consumption during Keeper startup for more cases. #58455 (Antonio Andelic).
- Keeper improvement: reduce Keeper's memory usage for stored nodes. #59002 (Antonio Andelic).
- More cache-friendly `FINAL` implementation. Note on the behaviour change: previously, queries with the `FINAL` modifier that read with a single stream (e.g. `max_threads = 1`) produced sorted output without an explicitly provided `ORDER BY` clause. This is no longer guaranteed when `enable_vertical_final = true` (which is the default). #54366 (Duc Canh Le).
- Bypass extra copying in `ReadBufferFromIStream`, which is used, e.g., for reading from S3. #56961 (Nikita Taranov).
- Optimize the array element function when the input is Array(Map)/Array(Array(Num))/Array(Array(String))/Array(BigInt)/Array(Decimal). The previous implementations did more allocations than needed. The speedup is up to ~6x, especially when the input type is Array(Map). #56403 (李扬).
- Read a column only once when reading more than one of its subcolumns in compact parts. #57631 (Kruglov Pavel).
- Rewrite the AST of the `sum(column + constant)` function. This is available as an optimization pass for the Analyzer. #57853 (Jiebin Sun).
- The evaluation of the function `match` now utilizes the skipping indices `ngrambf_v1` and `tokenbf_v1`. #57882 (凌涛).
- The evaluation of the function `match` now utilizes inverted indices. #58284 (凌涛).
- MergeTree `FINAL` does not compare rows from the same non-L0 part. #58142 (Duc Canh Le).
- Speed up iota calls (filling an array with consecutive numbers). #58271 (Raúl Marín).
- Speed up MIN/MAX for non-numeric types. #58334 (Raúl Marín).
- Optimize the combination of filters (like in multi-stage PREWHERE) with BMI2/SSE intrinsics #58800 (Zhiguo Zhou).
- Use one thread less in `clickhouse-local`. #58968 (Alexey Milovidov).
- Improve the `multiIf` function performance when the type is Nullable. #57745 (KevinyhZou).
- Add `SYSTEM JEMALLOC PURGE` for purging unused jemalloc pages, and `SYSTEM JEMALLOC [ ENABLE | DISABLE | FLUSH ] PROFILE` for controlling the jemalloc profile if the profiler is enabled. Add jemalloc-related 4LW commands in Keeper: `jmst` for dumping jemalloc stats, and `jmfp`, `jmep`, `jmdp` for controlling the jemalloc profile if the profiler is enabled. #58665 (Antonio Andelic).
- Lower memory consumption in backups to S3. #58962 (Vitaly Baranov).
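The filter-combination entry above (#58800) relies on a common bit trick. When a second filter is evaluated only on the rows that survived a first filter, its bits must be scattered back to the positions of those surviving rows. The BMI2 `PDEP` instruction does exactly this in one step. The Python sketch below emulates `PDEP` in software to show the idea. It assumes both filters are represented as bitmasks, and it is not ClickHouse's actual code.

```python
def pdep(src: int, mask: int) -> int:
    """Software emulation of the BMI2 PDEP instruction.

    Deposits the low bits of `src`, in order, into the set-bit positions of `mask`.
    """
    result = 0
    bit = 0  # index of the next bit of `src` to deposit
    m = mask
    while m:
        lowest = m & -m  # isolate the lowest set bit of the mask
        if (src >> bit) & 1:
            result |= lowest
        bit += 1
        m &= m - 1  # clear the lowest set bit
    return result


def combine_filters(first: int, second: int) -> int:
    """Combine a row filter with a second filter that was evaluated only on surviving rows."""
    return pdep(second, first)


# Bit i represents row i. The first filter keeps rows 0, 1 and 3.
first_filter = 0b1011
# The second filter only sees those 3 rows and keeps the 1st and 3rd of them.
second_filter = 0b101
print(bin(combine_filters(first_filter, second_filter)))  # 0b1001: rows 0 and 3 survive
```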
Improvement
- Added comments (brief descriptions) to all columns of system tables. There are several reasons for this:
  - We use system tables a lot, and sometimes it can be very difficult for a developer to understand the purpose and the meaning of a particular column.
  - We change system tables a lot (adding new ones or modifying existing ones), and their documentation is always outdated. For example, take a look at the documentation page for `system.parts`: it misses a lot of columns.
  - We would like to eventually generate documentation directly from ClickHouse. #58356 (Nikita Mikhaylov).
- Allow queries without aliases for subqueries for `PASTE JOIN`. #58654 (Yarik Briukhovetskyi).
- Enable `MySQL`/`MariaDB` integration on macOS. This closes #21191. #46316 (Alexey Milovidov) (Robert Schulze).
- Disable `max_rows_in_set_to_optimize_join` by default. #56396 (vdimir).
- Add the `<host_name>` config parameter, which allows avoiding resolving hostnames in ON CLUSTER DDL queries and Replicated database engines. This reduces the chance of the queue getting stuck when the cluster definition changes. Closes #57573. #57603 (Nikolay Degterinsky).
- Increase `load_metadata_threads` to 16 for the filesystem cache. This makes the server start up faster. #57732 (Alexey Milovidov).
- Add the ability to throttle merges/mutations (`max_mutations_bandwidth_for_server`/`max_merges_bandwidth_for_server`). #57877 (Azat Khuzhin).
- Replaced the undocumented (boolean) column `is_hot_reloadable` in the system table `system.server_settings` with the (Enum8) column `changeable_without_restart`, with possible values `No`, `Yes`, `IncreaseOnly` and `DecreaseOnly`. Also documented the column. #58029 (skyoct).
- Cluster discovery supports setting a username and password; closes #58063. #58123 (vdimir).
- Support query parameters in `ALTER TABLE ... PART`. #58297 (Azat Khuzhin).
- Create consumers for Kafka tables on the fly (but keep them for some period after they were last used, controlled by `kafka_consumers_pool_ttl_ms`). This should fix the problem with statistics for `system.kafka_consumers` (consumers that do not consume while nobody reads from the Kafka table, which leads to a live memory leak and slow table detach). This PR also enables stats for `system.kafka_consumers` by default again. #58310 (Azat Khuzhin).
- Add `sparkBar` as an alias to `sparkbar`. #58335 (凌涛).
- Avoid sending `ComposeObject` requests after upload to `GCS`. #58343 (Azat Khuzhin).
- Correctly handle keys with a dot in the name in configuration XMLs. #58354 (Azat Khuzhin).
- Make the function `format` return a constant on constant arguments. This closes #58355. #58358 (Alexey Milovidov).
- Add the setting `max_estimated_execution_time` to separate `max_execution_time` and `max_estimated_execution_time`. #58402 (Zhang Yifan).
- Provide a hint when an invalid database engine name is used. #58444 (Bharat Nallan).
- Add settings for better control of the index type in Arrow dictionaries. Use a signed integer type for indexes by default, as Arrow recommends. Closes #57401. #58519 (Kruglov Pavel).
- Support the `CLICKHOUSE_PASSWORD_FILE` environment variable when running the Docker image. Implements #58575. #58583 (Eyal Halpern Shalev).
- Previously, some queries that require a lot of streams for reading data threw the error `"Paste JOIN requires sorted tables only"`. Now the number of streams is resized to 1 in that case. #58608 (Yarik Briukhovetskyi).
- Better message for the INVALID_IDENTIFIER error. #58703 (Yakov Olkhovskiy).
- Improved handling of signed numeric literals in normalizeQuery. #58710 (Salvatore Mesoraca).
- Support Point data type for MySQL. #58721 (Kseniia Sumarokova).
- When comparing a Float32 column and a const string, read the string as Float32 (instead of Float64). #58724 (Raúl Marín).
- Improve S3 compatibility, add ECloud EOS storage support. #58786 (xleoken).
- Allow `KILL QUERY` to cancel backups / restores. This PR also makes running backups and restores visible in `system.processes`. There is also a new server configuration setting, `shutdown_wait_backups_and_restores` (default=true). It controls whether the server waits on shutdown for all running backups and restores to finish, or cancels them. #58804 (Vitaly Baranov).
- The Avro format now supports the ZSTD codec. Closes #58735. #58805 (flynn).
- The MySQL interface gained support for the `net_write_timeout` and `net_read_timeout` settings. `net_write_timeout` is translated into the native ClickHouse setting `send_timeout` and, similarly, `net_read_timeout` into `receive_timeout`. Fixed an issue where the MySQL `sql_select_limit` setting could be set only if the entire statement was in upper case. #58835 (Serge Klochkov).
- A better exception message when creating a dictionary and a table with the same name conflicts. #58841 (Yarik Briukhovetskyi).
- For custom disks (created from SQL), the server config must now specify either `filesystem_caches_path` (a common directory prefix for all filesystem caches) or `custom_cached_disks_base_directory` (a common directory prefix for only filesystem caches created from custom disks). For custom disks, `custom_cached_disks_base_directory` has higher priority than `filesystem_caches_path`, which is used if the former is absent. The filesystem cache setting `path` must lie inside that directory; otherwise an exception is thrown and the disk is not created. This does not affect disks created on an older version before the server was upgraded: in that case, the exception is not thrown, so the server can still start successfully. `custom_cached_disks_base_directory` is added to the default server config as `/var/lib/clickhouse/caches/`. Closes #57825. #58869 (Kseniia Sumarokova).
- The MySQL interface gained compatibility with `SHOW WARNINGS`/`SHOW COUNT(*) WARNINGS` queries, though the returned result is always an empty set. #58929 (Serge Klochkov).
- Skip unavailable replicas when executing parallel distributed `INSERT SELECT`. #58931 (Alexander Tokmakov).
- Display a word-descriptive log level when structured log formatting in JSON is enabled. #58936 (Tim Liou).
- The MySQL interface gained support for `CAST(x AS SIGNED)` and `CAST(x AS UNSIGNED)` statements via the data type aliases `SIGNED` for Int64 and `UNSIGNED` for UInt64. This improves compatibility with BI tools such as Looker Studio. #58954 (Serge Klochkov).
- Change the working directory to the data path in the Docker container. #58975 (cangyin).
- Added the setting `azure_max_unexpected_write_error_retries` for Azure Blob Storage. It can also be set from the config under the azure section. #59001 (SmitaRKulkarni).
- Allow the server to start with a broken data lake table. Closes #58625. #59080 (Kseniia Sumarokova).
- Allow ignoring schema evolution in the `Iceberg` table engine and reading all data using the schema specified by the user on table creation, or the latest schema parsed from metadata on table creation. This is done under the setting `iceberg_engine_ignore_schema_evolution`, which is disabled by default. Note that enabling this setting can lead to incorrect results: with an evolved schema, all data files will be read using the same schema. #59133 (Kruglov Pavel).
- Prohibit mutable operations (`INSERT`/`ALTER`/`OPTIMIZE`/...) on read-only/write-once storages with a proper `TABLE_IS_READ_ONLY` error (to avoid leftovers). Avoid leaving leftovers on write-once disks (`format_version.txt`) on `CREATE`/`ATTACH`. Ignore `DROP` for `ReplicatedMergeTree` (as for `MergeTree`). Fix iterating over `s3_plain` (`MetadataStorageFromPlainObjectStorage::iterateDirectory`). Note that the `web` disk is read-only and the `s3_plain` disk is write-once. #59170 (Azat Khuzhin).
- Fix a bug in the experimental `_block_number` column that could lead to a logical error during a complex combination of `ALTER`s and `merge`s. Fixes #56202. Replaces #58601. #59295 (alesapin).
- Play UI understands when an exception is returned inside JSON. Adjustment for #52853. #59303 (Alexey Milovidov).
- The `/binary` HTTP handler allows specifying user, host, and, optionally, password in the query string. #59311 (Alexey Milovidov).
- Support backups for compressed in-memory tables. This closes #57893. #59315 (Alexey Milovidov).
- Support the `FORMAT` clause in `BACKUP` and `RESTORE` queries. #59338 (Vitaly Baranov).
- The function `concatWithSeparator` now supports arbitrary argument types (instead of only `String` and `FixedString` arguments). For example, `SELECT concatWithSeparator('.', 'number', 1)` now returns `number.1`. #59341 (Robert Schulze).
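The `concatWithSeparator` change above amounts to "convert each argument to text, then join". Here is a minimal Python sketch of that idea under the assumption that Python's `str()` stands in for ClickHouse's text serialization. NULL handling and number formatting are not modeled and may differ from ClickHouse.

```python
def concat_with_separator(sep: str, *args: object) -> str:
    """Join arguments of arbitrary types with `sep`, converting each to text first."""
    # Assumption: str() approximates ClickHouse's conversion of each argument to String.
    return sep.join(str(arg) for arg in args)


print(concat_with_separator(".", "number", 1))  # number.1
```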
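The `max_mutations_bandwidth_for_server`/`max_merges_bandwidth_for_server` entry above adds server-wide throttling of merges and mutations. A token bucket is one common way to enforce such a limit. The hypothetical `TokenBucketThrottler` below is a generic sketch, not ClickHouse's own throttler. It assumes a bytes-per-second limit and takes the clock as an explicit argument so that it is deterministic and testable.

```python
class TokenBucketThrottler:
    """Hypothetical throttler: at most `bytes_per_second` on average, bursts up to `burst_bytes`."""

    def __init__(self, bytes_per_second: float, burst_bytes: float) -> None:
        self.rate = bytes_per_second
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time (seconds) of the previous call

    def delay_for(self, nbytes: int, now: float) -> float:
        """Account for `nbytes` transferred at time `now`; return how long to sleep."""
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # In debt: wait until the deficit has been refilled.
        return -self.tokens / self.rate


throttler = TokenBucketThrottler(bytes_per_second=1_000_000, burst_bytes=1_000_000)
print(throttler.delay_for(500_000, now=0.0))    # 0.0: fits into the initial burst
print(throttler.delay_for(1_500_000, now=0.0))  # 1.0: one second of debt to pay off
```

Letting the bucket go negative and returning a sleep time, instead of blocking inside `delay_for`, keeps the sketch free of real sleeping. A caller would sleep for the returned duration before continuing.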
Build/Testing/Packaging Improvement
- Improve aliases for the clickhouse binary (now `ch`/`clickhouse` is `clickhouse-local` or `clickhouse` depending on the arguments) and add bash completion for the new aliases. #58344 (Azat Khuzhin).
- Add a settings changes check to CI to verify that all settings changes are reflected in the settings changes history. #58555 (Kruglov Pavel).
- Use tables directly attached from S3 in stateful tests. #58791 (Alexey Milovidov).
- Save the whole `fuzzer.log` as an archive instead of the last 100k lines. `tail -n 100000` often removes lines with table definitions. #58821 (Dmitry Novik).
- Enable Rust on macOS with AArch64. This adds fuzzy search in the client (with skim) and the PRQL language. I don't think there are people who host ClickHouse on Darwin, so, I would say, it is mostly for fuzzy search in the client. #59272 (Azat Khuzhin).
- Fix aggregation issue in mixed x86_64 and ARM clusters #59132 (Harry Lee).
Bug Fix (user-visible misbehavior in an official stable release)
- Add join keys conversion for nested LowCardinality #51550 (vdimir).
- Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) #56132 (Kruglov Pavel).
- Fix a bug with projections and the `aggregate_functions_null_for_empty` setting during insertion. #56944 (Amos Bird).
- Fixed potential exception due to stale profile UUID #57263 (Vasily Nemkov).
- Fix working with read buffers in StreamingFormatExecutor #57438 (Kruglov Pavel).
- Ignore MVs with dropped target table during pushing to views #57520 (Kruglov Pavel).
- Eliminate possible race between ALTER_METADATA and MERGE_PARTS #57755 (Azat Khuzhin).
- Fix the expressions order bug in group by with rollup #57786 (Chen768959).
- A fix for the obsolete "zero-copy" replication feature: Fix lost blobs after dropping a replica with broken detached parts #58333 (Alexander Tokmakov).
- Allow users to work with symlinks in user_files_path #58447 (Duc Canh Le).
- Fix a crash when graphite table does not have an agg function #58453 (Duc Canh Le).
- Delay reading from StorageKafka to allow multiple reads in materialized views #58477 (János Benjamin Antal).
- Fix a stupid case of intersecting parts #58482 (Alexander Tokmakov).
- Disable MergeTreePrefetchedReadPool for LIMIT-only queries #58505 (Maksim Kita).
- Enable ordinary databases during restoration #58520 (Jihyuk Bok).
- Fix Apache Hive threadpool reading for ORC/Parquet/... #58537 (sunny).
- Hide credentials in the `base_backup_name` column of `system.backup_log` #58550 (Daniel Pozo Escalona).
- Fix `toStartOfInterval` rounding for millisecond and microsecond values #58557 (Yarik Briukhovetskyi).
- Disable `max_joined_block_rows` in ConcurrentHashJoin #58595 (vdimir).
- Fix join using nullable in the old analyzer #58596 (vdimir).
- `makeDateTime64`: Allow non-const fraction argument #58597 (Robert Schulze).
- Fix possible NULL dereference during symbolizing inline frames #58607 (Azat Khuzhin).
- Improve isolation of query cache entries under re-created users or role switches #58611 (Robert Schulze).
- Fix broken partition key analysis when doing projection optimization #58638 (Amos Bird).
- Query cache: Fix per-user quota #58731 (Robert Schulze).
- Fix stream partitioning in parallel window functions #58739 (Dmitry Novik).
- Fix double destroy call on exception throw in addBatchLookupTable8 #58745 (Raúl Marín).
- Don't process requests in Keeper during shutdown #58765 (Antonio Andelic).
- Fix a null pointer dereference in `SlabsPolygonIndex::find` #58771 (Yarik Briukhovetskyi).
- Fix JSONExtract function for LowCardinality(Nullable) columns #58808 (vdimir).
- A fix for unexpected accumulation of memory usage while creating a huge number of tables by CREATE and DROP. #58831 (Maksim Kita).
- Allow multiple reads from FileLog storage in materialized views #58877 (János Benjamin Antal).
- Restriction for the access key id for s3. #58900 (MikhailBurdukov).
- Fix possible crash in clickhouse-local during loading suggestions #58907 (Kruglov Pavel).
- Fix crash when `indexHint` is used #58911 (Dmitry Novik).
- Fix StorageURL forgetting headers on server restart #58933 (Michael Kolupaev).
- Analyzer: fix storage replacement with insertion block #58958 (Yakov Olkhovskiy).
- Fix seek in ReadBufferFromZipArchive #58966 (Michael Kolupaev).
- A fix for experimental inverted indices (don't use in production): `DROP INDEX` of an inverted index now removes all relevant files from persistence #59040 (mochi).
- Fix data race on query_factories_info #59049 (Kseniia Sumarokova).
- Disable "Too many redirects" error retry #59099 (skyoct).
- Fix not started database shutdown deadlock #59137 (Sergei Trifonov).
- Fix: LIMIT BY and LIMIT in distributed query #59153 (Igor Nikonov).
- Fix crash with nullable timezone for `toString` #59190 (Yarik Briukhovetskyi).
- Fix abort in iceberg metadata on bad file paths #59275 (Kruglov Pavel).
- Fix architecture name in select of Rust target #59307 (p1rattttt).
- Fix a logical error about "not-ready set" when querying `system.tables` with a subquery in the IN clause. #59351 (Nikolai Kochetov).
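The `toStartOfInterval` rounding fix above concerns millisecond and microsecond values. The changelog does not describe the exact rounding problem. As a reference point, here is a sketch that assumes the intended semantics are "round down to the start of the interval", applied to the raw DateTime64 tick count. The function name and the tick-based representation are illustrative assumptions, not ClickHouse's API.

```python
def to_start_of_interval(ticks: int, interval_ticks: int) -> int:
    """Round a DateTime64 tick count down to the start of its interval.

    `ticks` counts units of the column's scale since the epoch (for example,
    milliseconds for DateTime64(3)); `interval_ticks` is the interval length in
    the same units.
    """
    if interval_ticks <= 0:
        raise ValueError("interval must be positive")
    # Floor division rounds towards negative infinity, so in this sketch
    # pre-epoch (negative) values are also rounded down, not towards zero.
    return (ticks // interval_ticks) * interval_ticks


# DateTime64(3) value 1_700_000_000_123 ms, rounded to a 100 ms interval.
print(to_start_of_interval(1_700_000_000_123, 100))  # 1700000000100
# DateTime64(6) value (microseconds) rounded to a 1 ms interval (1000 us).
print(to_start_of_interval(1_700_000_000_123_456, 1000))  # 1700000000123000
```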
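The "LIMIT BY and LIMIT in distributed query" fix above relies on the order of the two operations. `LIMIT n BY key` must be applied before the overall `LIMIT`, and swapping them gives different results. The changelog does not describe the exact bug. The Python sketch below only illustrates the required semantics on a single, already-ordered row stream, using a hypothetical helper.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")


def limit_by_then_limit(rows: Iterable[Row], key: Callable[[Row], Hashable],
                        per_key: int, limit: int) -> List[Row]:
    """Keep at most `per_key` rows per key (LIMIT n BY key), then at most `limit` rows."""
    seen: defaultdict = defaultdict(int)
    out: List[Row] = []
    for row in rows:
        k = key(row)
        if seen[k] < per_key:  # LIMIT BY: skip rows once a key has reached its quota
            seen[k] += 1
            out.append(row)
            if len(out) == limit:  # LIMIT: stop after enough rows have passed LIMIT BY
                break
    return out


rows = [("a", 1), ("a", 2), ("a", 3), ("b", 1)]
print(limit_by_then_limit(rows, key=lambda r: r[0], per_key=1, limit=2))  # [('a', 1), ('b', 1)]
# Applying LIMIT 2 first would keep [('a', 1), ('a', 2)], and LIMIT 1 BY would then leave only [('a', 1)].
```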