ClickHouse release v22.1, 2022-01-18
Upgrade Notes
- The functions left and right were previously implemented in the parser and are now full-featured. Distributed queries with left or right functions without aliases may throw an exception if the cluster contains different versions of clickhouse-server. If you are upgrading your cluster and encounter this error, finish upgrading the cluster so that all nodes run the same version. Alternatively, add aliases (AS something) to the columns in your queries to avoid this issue. #33407 (alexey-milovidov).
- Resource usage by scalar subqueries is fully accounted for since this version. With this change, rows read in scalar subqueries are now reported in the query_log. If the scalar subquery is cached (repeated or called for several rows), the rows read are only counted once. This change allows killing queries (KILL QUERY) and reporting progress while they are executing scalar subqueries. #32271 (Raúl Marín).
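To illustrate the accounting rule for cached scalar subqueries, here is a minimal Python sketch. It is not ClickHouse code: QueryProgress and ScalarSubqueryCache are hypothetical names, and the sketch only models the rule stated above, that a cached scalar subquery contributes its rows read once, however many times its value is reused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class QueryProgress:
    """Hypothetical counters for one outer query (what would end up in query_log)."""
    read_rows: int = 0


@dataclass
class ScalarSubqueryCache:
    """Hypothetical cache: each distinct scalar subquery runs, and is charged, once."""
    progress: QueryProgress
    results: Dict[str, object] = field(default_factory=dict)

    def evaluate(self, sql: str, run: Callable[[], Tuple[object, int]]) -> object:
        # Cache hit: reuse the stored value and do not count its rows again.
        if sql in self.results:
            return self.results[sql]
        # Cache miss: execute the subquery; `run` returns (value, rows_read).
        value, rows_read = run()
        # Charge the rows to the outer query so they show up in its totals.
        self.progress.read_rows += rows_read
        self.results[sql] = value
        return value


# Usage: the same scalar subquery referenced for 5 rows of the outer query.
progress = QueryProgress()
cache = ScalarSubqueryCache(progress)
for _ in range(5):
    cache.evaluate("SELECT max(x) FROM t", lambda: (42, 1000))
print(progress.read_rows)  # 1000
```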
New Feature
- Implement data schema inference for input formats. Allow skipping the structure (or writing just auto) in the table functions file, url, s3, hdfs and in the parameters of clickhouse-local. Allow skipping the structure in the CREATE query for the table engines File, HDFS, S3, URL, Merge, Buffer, Distributed and ReplicatedMergeTree (when adding new replicas). #32455 (Kruglov Pavel).
- Detect format by file extension in the file/hdfs/s3/url table functions and the HDFS/S3/URL table engines, and also for SELECT INTO OUTFILE and INSERT FROM INFILE. #33565 (Kruglov Pavel). Close #30918. #33443 (OnePiece).
- A tool for collecting diagnostics data if you need support. #33175 (Alexander Burmak).
- Automatic cluster discovery via Zoo/Keeper. It allows adding replicas to the cluster without changing the configuration on every server. #31442 (vdimir).
- Implement the Hive table engine to access Apache Hive from ClickHouse. This implements #29245. #31104 (taiyang-li).
- Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate the dependency (measure of association) between categorical values. All these functions use a cross-tab (histogram on pairs) for their implementation. You can think of them as a correlation coefficient, but for any discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
- Added table function hdfsCluster, which allows processing files from HDFS in parallel from many nodes in a specified cluster, similarly to s3Cluster. #32400 (Zhichang Yu).
- Add support for disks backed by Azure Blob Storage, in a similar way to disks backed by AWS S3. #31505 (Jakub Kuklis).
- Allow COMMENT in CREATE VIEW (for all VIEW kinds). #31062 (Vasily Nemkov).
- Dynamically reinitialize listening ports and protocols when configuration changes. #30549 (Kevin Michel).
- Added left, right, leftUTF8, rightUTF8 functions. Fix error in the implementation of the substringUTF8 function with negative offset (offset from the end of the string). #33407 (alexey-milovidov).
- Add new functions for the H3 coordinate system: h3HexAreaKm2, h3CellAreaM2, h3CellAreaRads2. #33479 (Bharat Nallan).
- Add MONTHNAME function. #33436 (usurai).
- Added function arrayLast. Closes #33390. #33415. Added function arrayLastIndex. #33465 (Maksim Kita).
- Add function decodeURLFormComponent, slightly different from decodeURLComponent. Close #10298. #33451 (SuperDJY).
- Allow splitting GraphiteMergeTree rollup rules for plain/tagged metrics (optional rule_type field). #33494 (Michail Safronov).
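The changelog does not give formulas for these functions. As a rough sketch, assuming the textbook definition of Cramér's V (Pearson's chi-squared statistic over the cross-tab, divided by n * (min(r, c) - 1), then square-rooted), the following Python function builds the histogram on pairs and computes the statistic. It is not ClickHouse's implementation, which may differ in details such as the handling of degenerate tables; cramers_v is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def cramers_v(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Textbook Cramér's V computed from a cross-tab of (x, y) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    cross_tab = Counter(pairs)                # histogram on pairs
    x_counts = Counter(x for x, _ in pairs)   # row totals
    y_counts = Counter(y for _, y in pairs)   # column totals

    # Pearson's chi-squared statistic over every cell of the table,
    # including cells that never occurred (observed count 0).
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = cross_tab.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # choice of this sketch: one category carries no association
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v([("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]))  # 1.0
print(cramers_v([("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]))  # 0.0
```

A value near 1 means one column almost determines the other; a value near 0 means the cross-tab looks like independent columns.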
Performance Improvement
- Support moving conditions to PREWHERE (setting optimize_move_to_prewhere) for tables of the Merge engine if all its underlying tables support PREWHERE. #33300 (Anton Popov).
- More efficient handling of globs for URL storage. Now you can easily query a million URLs in parallel with retries. Closes #32866. #32907 (Kseniia Sumarokova).
- Avoid exponential backtracking in parser. This closes #20158. #33481 (alexey-milovidov).
- Abuse of the untuple function was leading to exponential complexity of query analysis (found by fuzzer). This closes #33297. #33445 (alexey-milovidov).
- Reduce allocated memory for dictionaries with string attributes. #33466 (Maksim Kita).
- Slight performance improvement of the reinterpret function. #32587 (alexey-milovidov).
- Non-significant change. In extremely rare cases when a data part is lost on every replica, after merging some data parts, subsequent queries may skip fewer partitions during partition pruning. This hardly affects anything. #32220 (Azat Khuzhin).
- Improve clickhouse-keeper write performance by optimizing the size calculation logic. #32366 (zhanglistar).
- Optimize single part projection materialization. This closes #31669. #31885 (Amos Bird).
- Improve query performance of system tables. #33312 (OnePiece).
- Optimize selecting of MergeTree parts that can be moved between volumes. #33225 (OnePiece).
- Fix sparse_hashed dict performance with sequential keys (wrong hash function). #32536 (Azat Khuzhin).
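The URL glob item above can be pictured with a small Python sketch of the expansion step only. It assumes just two pattern forms, {a,b,...} alternatives and ascending numeric ranges {N..M}, and ignores wildcards, zero padding, parallel fetching and retries; expand_url_globs and expand_group are hypothetical helpers, not a ClickHouse API.

```python
import itertools
import re
from typing import List

# One brace group such as "{1..3}" or "{a,b,c}"; the capture is the inside.
BRACE_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_group(body: str) -> List[str]:
    """Expand the inside of a single brace group into its alternatives."""
    numeric = re.fullmatch(r"(\d+)\.\.(\d+)", body)
    if numeric:
        lo, hi = int(numeric.group(1)), int(numeric.group(2))
        return [str(i) for i in range(lo, hi + 1)]  # ascending ranges only
    return body.split(",")


def expand_url_globs(pattern: str) -> List[str]:
    """Return every concrete URL described by `pattern` (cartesian product)."""
    # re.split with a capturing group alternates literal text and group bodies.
    parts = BRACE_GROUP.split(pattern)
    choices = [
        [part] if i % 2 == 0 else expand_group(part)
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_url_globs("http://example.com/{a,b}/data_{1..2}.csv"))
# ['http://example.com/a/data_1.csv', 'http://example.com/a/data_2.csv',
#  'http://example.com/b/data_1.csv', 'http://example.com/b/data_2.csv']
```

Each generated URL could then be fetched independently, which is what makes parallel reading with per-URL retries possible.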
Experimental Feature
- Parallel reading from multiple replicas within a shard during a distributed query without using a sample key. To enable this, set allow_experimental_parallel_reading_from_replicas = 1 and max_parallel_replicas to any number. This closes #26748. #29279 (Nikita Mikhaylov).
- Implemented sparse serialization. It can reduce disk space usage and improve the performance of some queries for columns that contain a lot of default (zero) values. It can be enabled with the setting ratio_for_sparse_serialization. Sparse serialization will be chosen dynamically for a column if its ratio of default values to all values is above that threshold. The serialization (default or sparse) is fixed for every column in a part, but may vary between parts. #22535 (Anton Popov).
- Add "TABLE OVERRIDE" feature for customizing MaterializedMySQL table schemas. #32325 (Stig Bakken).
- Add EXPLAIN TABLE OVERRIDE query. #32836 (Stig Bakken).
- Support TABLE OVERRIDE clause for MaterializedPostgreSQL. RFC: #31480. #32749 (Kseniia Sumarokova).
- Change ZooKeeper path for zero-copy marks for shared data. Note that "zero-copy replication" is a non-production feature (in early stages of development) that you shouldn't use anyway. But if you have used it, keep this change in mind. #32061 (ianton-ru).
- Events clause support for WINDOW VIEW watch query. #32607 (vxider).
- Fix ACL with explicit digit hash in clickhouse-keeper: now the behavior is consistent with ZooKeeper and the generated digest is always accepted. #33249 (小路). #33246.
- Fix unexpected projection removal when detaching parts. #32067 (Amos Bird).
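A minimal Python sketch of the per-column decision described for sparse serialization, assuming "above that threshold" means strictly greater than ratio_for_sparse_serialization and that the default value is 0. Each call models one column within one part. The SparseColumn layout (row offsets of non-default values plus the values themselves) is a hypothetical illustration, not ClickHouse's on-disk format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class SparseColumn:
    """Hypothetical sparse layout: only non-default values and their row numbers."""
    size: int            # total number of rows
    offsets: List[int]   # row numbers that hold a non-default value
    values: List[int]    # those non-default values, in row order


def default_ratio(column: Sequence[int], default: int = 0) -> float:
    """Share of rows equal to the default value."""
    if not column:
        return 0.0
    return sum(1 for v in column if v == default) / len(column)


def choose_serialization(
    column: Sequence[int], threshold: float, default: int = 0
) -> Union[SparseColumn, List[int]]:
    """Pick sparse storage if the default ratio is above the threshold."""
    if default_ratio(column, default) > threshold:
        offsets = [i for i, v in enumerate(column) if v != default]
        return SparseColumn(len(column), offsets, [column[i] for i in offsets])
    return list(column)  # ordinary (dense) serialization


def densify(sparse: SparseColumn, default: int = 0) -> List[int]:
    """Rebuild the full column from the sparse form."""
    out = [default] * sparse.size
    for row, value in zip(sparse.offsets, sparse.values):
        out[row] = value
    return out


part = choose_serialization([0, 0, 0, 5, 0, 0, 0, 0], threshold=0.5)
print(part)  # SparseColumn(size=8, offsets=[3], values=[5])
```

Because the choice depends on the data actually present, two parts of the same table can end up with different serializations for the same column.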
Improvement
- Now date time conversion functions that generate time before 1970-01-01 00:00:00 will be saturated to zero instead of overflowing. #29953 (Amos Bird). It also fixes a bug in index analysis if a date truncation function would yield a result before the Unix epoch.
- Always display resource usage (total CPU usage, total RAM usage and max RAM usage per host) in client. #33271 (alexey-milovidov).
- Improve Bool type serialization and deserialization, check the range of values. #32984 (Kruglov Pavel).
- If an invalid setting is defined using the SET query or using the query parameters in the HTTP request, the error message will contain suggestions that are similar to the invalid setting string (if any exist). #32946 (Antonio Andelic).
- Support hints for mistyped setting names for clickhouse-client and clickhouse-local. Closes #32237. #32841 (凌涛).
- Allow using virtual columns in Materialized Views. Close #11210. #33482 (OnePiece).
- Add config to disable IPv6 in clickhouse-keeper if needed. This closes #33381. #33450 (Wu Xueyang).
- Add more info to system.build_options about the current git revision. #33431 (taiyang-li).
- clickhouse-local: track memory under the --max_memory_usage_in_client option. #33341 (Azat Khuzhin).
- Allow negative intervals in function intervalLengthSum. Their length will be added as well. This closes #33323. #33335 (alexey-milovidov).
- LineAsString can be used as output format. This closes #30919. #33331 (Sergei Trifonov).
- Support <secure/> in cluster configuration, as an alternative form of <secure>1</secure>. Close #33270. #33330 (SuperDJY).
- Pressing Ctrl+C twice will terminate clickhouse-benchmark immediately without waiting for in-flight queries. This closes #32586. #33303 (alexey-milovidov).
- Support Unix timestamp with milliseconds in parseDateTimeBestEffort function. #33276 (Ben).
- Allow cancelling a query while reading data from an external table in the formats Arrow/Parquet/ORC. Previously such a query could not be cancelled in the case of big files with the setting input_format_allow_seeks set to false. Closes #29678. #33238 (Kseniia Sumarokova).
- If a table engine supports the SETTINGS clause, allow passing the settings as key-value pairs or via config. Add this support for MySQL. #33231 (Kseniia Sumarokova).
- Correctly prevent Nullable primary keys if necessary. This is for #32780. #33218 (Amos Bird).
- Add retry for PostgreSQL connections in case nothing has been fetched yet. Closes #33199. #33209 (Kseniia Sumarokova).
- Validate config keys for external dictionaries. #33095. #33130 (Kseniia Sumarokova).
- Send profile info inside clickhouse-local. Closes #33093. #33097 (Kseniia Sumarokova).
- Short circuit evaluation: support for function throwIf. Closes #32969. #32973 (Maksim Kita).
- (This only happens in unofficial builds). Fixed segfault when inserting data into compressed Decimal, String, FixedString and Array columns. This closes #32939. #32940 (N. Kolotov).
- Added support for specifying a subquery as an SQL user-defined function. Example: CREATE FUNCTION test AS () -> (SELECT 1). Closes #30755. #32758 (Maksim Kita).
- Improve gRPC compression support for #28671. #32747 (Vitaly Baranov).
- Flush all In-Memory data parts when WAL is not enabled while shutting down the server or detaching a table. #32742 (nauta).
- Allow controlling connection timeouts for MySQL (previously supported only for the dictionary source). Closes #16669. Previously the default connect_timeout was rather small; now it is configurable. #32734 (Kseniia Sumarokova).
- Support authSource option for storage MongoDB. Closes #32594. #32702 (Kseniia Sumarokova).
- Support Date32 type in generateRandom table function. #32643 (nauta).
- Add settings max_concurrent_select_queries and max_concurrent_insert_queries to control concurrent queries by query kind. Close #3575. #32609 (SuperDJY).
- Improve handling of nested structures with missing columns while reading data in Protobuf format. Follow-up to https://github.com/ClickHouse/ClickHouse/pull/31988. #32531 (Vitaly Baranov).
- Allow empty credentials for MongoDB engine. Closes #26267. #32460 (Kseniia Sumarokova).
- Disable some optimizations for window functions that may lead to exceptions. Closes #31535. Closes #31620. #32453 (Kseniia Sumarokova).
- Allow connecting to MongoDB 5.0. Closes #31483. #32416 (Kseniia Sumarokova).
- Enable comparison between Decimal and Float. Closes #22626. #31966 (flynn).
- Added settings command_read_timeout, command_write_timeout for StorageExecutable, StorageExecutablePool, ExecutableDictionary, ExecutablePoolDictionary, ExecutableUserDefinedFunctions. Setting command_read_timeout controls the timeout for reading data from command stdout in milliseconds. Setting command_write_timeout controls the timeout for writing data to command stdin in milliseconds. Added setting command_termination_timeout for ExecutableUserDefinedFunction, ExecutableDictionary, StorageExecutable. Added setting execute_direct for ExecutableUserDefinedFunction, true by default. Added setting execute_direct for ExecutableDictionary, ExecutablePoolDictionary, false by default. #30957 (Maksim Kita).
- Bitmap aggregate functions will give the correct result for an out-of-range argument instead of wrapping around. #33127 (DR).
- Fix parsing incorrect queries with FROM INFILE statement. #33521 (Kruglov Pavel).
- Don't allow writing into S3 if the path contains globs. #33142 (Kruglov Pavel).
- The --echo option was not used by clickhouse-client in batch mode with a single query. #32843 (N. Kolotov).
- Use --database option for clickhouse-local. #32797 (Kseniia Sumarokova).
- Fix surprisingly bad code in SQL ordinary function file. Now it supports symlinks. #32640 (alexey-milovidov).
- Update modification_time for a data part in system.parts after part movement #32964. #32965 (save-my-heart).
- Potential issue, cannot be exploited: integer overflow may happen in array resize. #33024 (varadarajkumar).
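A sketch of how an "unknown setting" error can carry suggestions for similar names. It uses Python's difflib.get_close_matches as a stand-in similarity measure; ClickHouse's own matching logic may differ, and KNOWN_SETTINGS and unknown_setting_error are hypothetical names with an illustrative message format.

```python
import difflib
from typing import Sequence

# Hypothetical, tiny list of known setting names for the example.
KNOWN_SETTINGS = ["max_threads", "max_memory_usage", "max_block_size", "async_insert"]


def unknown_setting_error(name: str, known: Sequence[str] = KNOWN_SETTINGS) -> str:
    """Build an error message, appending close matches when there are any."""
    # Up to 3 candidates whose similarity ratio is at least 0.6.
    hints = difflib.get_close_matches(name, known, n=3, cutoff=0.6)
    message = f"Unknown setting '{name}'"
    if hints:
        message += ". Maybe you meant: " + ", ".join(hints)
    return message


print(unknown_setting_error("max_threds"))
# Unknown setting 'max_threds'. Maybe you meant: max_threads
```

The cutoff keeps the hint list empty for names that resemble nothing, so the message degrades to a plain error in that case.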
Build/Testing/Packaging Improvement
- Add packages, functional tests and Docker builds for AArch64 (ARM) version of ClickHouse. #32911 (Mikhail f. Shiryaev). #32415
- Prepare ClickHouse to be built with musl-libc. It is not enabled by default. #33134 (alexey-milovidov).
- Make the installation script work on FreeBSD. This closes #33384. #33418 (alexey-milovidov).
- Add actionlint for GitHub Actions workflows and verify workflow files via act --list to check the correct workflow syntax. #33612 (Mikhail f. Shiryaev).
- Add more tests for the nullable primary key feature. Add more tests with different types and merge tree kinds, plus randomly generated data. #33228 (Amos Bird).
- Add a simple tool to visualize flaky tests in web browser. #33185 (alexey-milovidov).
- Enable hermetic build for shared builds. This is mainly for developers. #32968 (Amos Bird).
- Update libc++ and libc++abi to the latest version. #32484 (Raúl Marín).
- Added integration test for external .NET client (ClickHouse.Client). #23230 (Oleg V. Kozlyuk).
- Inject git information into the clickhouse binary, so the source code revision can easily be obtained from the binary. #33124 (taiyang-li).
- Remove obsolete code from ConfigProcessor. Yandex specific code is not used anymore. The code contained one minor defect. This defect was reported by Mallik Hassan in #33032. This closes #33032. #33026 (alexey-milovidov).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Several fixes for format parsing. This is relevant if clickhouse-server is open for write access to an adversary. Specifically crafted input data for the Native format may lead to reading uninitialized memory or a crash. #33050 (Heena Bansal). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee). Fix null pointer dereference in LowCardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
- ClickHouse Keeper handler will correctly remove operation when response sent. #32988 (JackyWoo).
- Potential off-by-one miscalculation of quotas: quota limit was not reached, but the limit was exceeded. This fixes #31174. #31656 (sunny).
- Fixed CASTing from String to IPv4 or IPv6 and back. Fixed error message in case of failed conversion. #29224 (Dmitry Novik) #27914 (Vasily Nemkov).
- Fixed an exception like "Unknown aggregate function nothing" during an execution on a remote server. This fixes #16689. #26074 (hexiaoting).
- Fix wrong database for JOIN without explicit database in distributed queries (Fixes: #10471). #33611 (Azat Khuzhin).
- Fix segfault in Apache Avro format that appears after the second insert into file. #33566 (Kruglov Pavel).
- Fix segfault in Apache Arrow format if the schema contains Dictionary type. Closes #33507. #33529 (Kruglov Pavel).
- Out of band offset and limit settings may be applied incorrectly for views. Close #33289 #33518 (hexiaoting).
- Fix an exception "Block structure mismatch" which may happen during insertion into a table with a default nested LowCardinality column. Fixes #33028. #33504 (Nikolai Kochetov).
- Fix dictionary expressions for range_hashed range min and range max attributes when created using DDL. Closes #30809. #33478 (Maksim Kita).
- Fix possible use-after-free for INSERT into Materialized View with concurrent DROP (Azat Khuzhin).
- Do not try to read past EOF (to work around a bug in the Linux kernel). This bug can be reproduced on kernels 3.14..5.9 and requires index_granularity_bytes=0 (i.e. adaptive index granularity turned off). #33372 (Azat Khuzhin).
- The commands SYSTEM SUSPEND and SYSTEM ... THREAD FUZZER missed access control. It is fixed. Author: Kevin Michel. #33333 (alexey-milovidov).
- Fix COMMENT for dictionaries not appearing in system.tables and system.dictionaries. Allow modifying the comment for the Dictionary engine. Closes #33251. #33261 (Maksim Kita).
- Add asynchronous inserts (with enabled setting async_insert) to the query log. Previously such queries didn't appear in the query log. #33239 (Anton Popov).
- Fix sending WHERE 1 = 0 expressions for external databases query. Closes #33152. #33214 (Kseniia Sumarokova).
- Fix DDL validation for MaterializedPostgreSQL. Fix setting materialized_postgresql_allow_automatic_update. Closes #29535. #33200 (Kseniia Sumarokova). Make sure unused replication slots are always removed. Found in #26952. #33187 (Kseniia Sumarokova). Fix MaterializedPostgreSQL detach/attach (removing / adding to replication) of tables with non-default schema. Found in #29535. #33179 (Kseniia Sumarokova). Fix DROP MaterializedPostgreSQL database. #33468 (Kseniia Sumarokova).
- The metric StorageBufferBytes was sometimes miscalculated. #33159 (xuyatian).
- Fix error "Invalid version for SerializationLowCardinality key column" in case of reading from a LowCardinality column with local_filesystem_read_prefetch or remote_filesystem_read_prefetch enabled. #33046 (Nikolai Kochetov).
- Fix s3 table function reading empty file. Closes #33008. #33037 (Kseniia Sumarokova).
- Fix Context leak in case of cancel_http_readonly_queries_on_client_close (i.e. leaking of external tables that had been uploaded to the server and other resources). #32982 (Azat Khuzhin).
- Fix wrong tuple output in CSV format in case of custom CSV delimiter. #32981 (Kruglov Pavel).
- Fix HDFS URL check that didn't allow using HA namenode address. Bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/31042. #32976 (Kruglov Pavel).
- Fix throwing exception like positional argument out of bounds for non-positional arguments. Closes #31173#event-5789668239. #32961 (Kseniia Sumarokova).
- Fix UB in case of unexpected EOF during filling a set from HTTP query (i.e. if the client was interrupted in the middle, e.g. timeout 0.15s curl -Ss -F 's=@t.csv;' 'http://127.0.0.1:8123/?s_structure=key+Int&query=SELECT+dummy+IN+s' with a large enough t.csv). #32955 (Azat Khuzhin).
- Fix a regression in replaceRegexpAll function. The function worked incorrectly when the matched substring was empty. This closes #32777. This closes #30245. #32945 (alexey-milovidov).
- Fix ORC format stripe reading. #32929 (kreuzerkrieg).
- topKWeightedState failed for some input types. #32487. #32914 (vdimir).
- Fix exception "Single chunk is expected from view inner query (LOGICAL_ERROR)" in materialized view. Fixes #31419. #32862 (Nikolai Kochetov).
- Fix optimization with lazy seek for async reads from remote filesystems. Closes #32803. #32835 (Kseniia Sumarokova).
- MergeTree table engine might silently skip some mutations if there are too many running mutations or in case of high memory consumption; it's fixed. Fixes #17882. #32814 (tavplubix).
- Avoid reusing the scalar subquery cache when processing MV blocks. This fixes a bug when the scalar query references the source table, but it means that all scalar subqueries in the MV definition will be calculated for each block. #32811 (Raúl Marín).
- Server might fail to start if a database with MySQL engine cannot connect to the MySQL server; it's fixed. Fixes #14441. #32802 (tavplubix).
- Fix crash when using fuzzBits function, close #32737. #32755 (SuperDJY).
- Fix error "Column is not under aggregate function" in case of MV with GROUP BY (list of columns) (which is parsed as GROUP BY tuple(...)) over Kafka/RabbitMQ. Fixes #32668 and #32744. #32751 (Nikolai Kochetov).
- Fix ALTER TABLE ... MATERIALIZE TTL query with TTL ... DELETE WHERE ... and TTL ... GROUP BY ... modes. #32695 (Anton Popov).
- Fix optimize_read_in_order optimization in case the table engine is Distributed or Merge and its underlying MergeTree tables have a monotonic function in the prefix of the sorting key. #32670 (Anton Popov).
- Fix LOGICAL_ERROR exception when the target of a materialized view is a JOIN or a SET table. #32669 (Raúl Marín).
- Inserting into S3 with multipart upload to Google Cloud Storage may trigger abort. #32504. #32649 (vdimir).
- Fix possible exception at RabbitMQ storage startup by delaying channel creation. #32584 (Kseniia Sumarokova).
- Fix table lifetime (i.e. possible use-after-free) in case of parallel DROP TABLE and INSERT. #32572 (Azat Khuzhin).
- Fix async inserts with formats CustomSeparated, Template, Regexp, MsgPack and JSONAsString. Previously, async inserts with these formats didn't read any data. #32530 (Kruglov Pavel).
- Fix groupBitmapAnd function on distributed table. #32529 (minhthucdao).
- Fix crash in JOIN found by fuzzer, close #32458. #32508 (vdimir).
- Proper handling of the case with Apache Arrow column duplication. #32507 (Dmitriy Mokhnatkin).
- Fix issue with ambiguous query formatting in distributed queries that led to errors when some table columns were named ALL or DISTINCT. This closes #32391. #32490 (alexey-milovidov).
- Fix failures in queries that are trying to use skipping indices, which are not materialized yet. Fixes #32292 and #30343. #32359 (Anton Popov).
- Fix broken SELECT query when there are more than 2 row policies on the same column, starting from the second query in the same session. #31606. #32291 (SuperDJY).
- Fix fractional unix timestamp conversion to DateTime64: the fractional part was reversed for negative unix timestamps (before 1970-01-01). #32240 (Ben).
- Some entries of the replication queue might hang for temporary_directories_lifetime (1 day by default) with "Directory tmp_merge_<part_name>" or "Part ... (state Deleting) already exists, but it will be deleted soon" or a similar error. It's fixed. Fixes #29616. #32201 (tavplubix).
- Fix parsing of APPLY lambda column transformer which could lead to client/server crash. #32138 (Kruglov Pavel).
- Fix base64Encode adding trailing bytes on small strings. #31797 (Kevin Michel).
- Fix possible crash (or incorrect result) in case of LowCardinality arguments of window function. Fixes #31114. #31888 (Nikolai Kochetov).
- Fix hang up with command DROP TABLE system.query_log sync. #33293 (zhanghuajie).
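The DateTime64 fix for negative fractional timestamps comes down to splitting a scaled value into whole seconds and a fraction. The Python sketch below is not ClickHouse's code; it contrasts a truncating split with a floor-based one on -1.25 s stored as -125 ticks at scale 2, as one plausible way such a "reversed" fraction can appear. It assumes the fraction must always be non-negative, and to_datetime only supports scales up to 6 (microsecond precision). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def split_ticks(ticks: int, scale: int) -> Tuple[int, int]:
    """Floor-based split: ticks == whole * 10**scale + fraction, 0 <= fraction < 10**scale."""
    # Python's divmod floors toward negative infinity, which gives exactly this.
    return divmod(ticks, 10 ** scale)


def naive_split(ticks: int, scale: int) -> Tuple[int, int]:
    """Truncating split: whole seconds rounded toward zero, fraction taken from |ticks|."""
    d = 10 ** scale
    whole = abs(ticks) // d * (-1 if ticks < 0 else 1)
    fraction = abs(ticks) % d
    return whole, fraction


def to_datetime(ticks: int, scale: int) -> datetime:
    """Convert scaled ticks since the Unix epoch to a UTC datetime (scale <= 6)."""
    whole, fraction = split_ticks(ticks, scale)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=whole, microseconds=fraction * 10 ** (6 - scale))


print(naive_split(-125, 2))   # (-1, 25)
print(split_ticks(-125, 2))   # (-2, 75)
print(to_datetime(-125, 2))   # 1969-12-31 23:59:58.750000+00:00
```

With truncation, -1.25 s becomes -1 s plus 0.25, which lands on 23:59:59.25 instead of the correct 23:59:58.75; for non-negative timestamps both splits agree.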