ClickHouse release v21.7.1.7283-prestable FIXME as compared to v21.6.1.6891-prestable
Backward Incompatible Change
- Improved performance of queries with explicitly defined large sets. Added compatibility setting `legacy_column_name_of_tuple_literal`. It makes sense to set it to `true` while doing a rolling update of a cluster from a version lower than 21.7 to any higher version. Otherwise, distributed queries with explicitly defined sets in the `IN` clause may fail during the update. #25371 (Anton Popov).
- Forward/backward incompatible change of the maximum buffer size in clickhouse-keeper. It is better to do it now (before production) than later. #25421 (alesapin).
New Feature
- Add support for VFS over HDFS. #11058 (overshov).
- Provides a way to restore replicated table when the data is (possibly) present, but the ZooKeeper metadata is lost. Resolves #13458. #13652 (Mike Kot).
- Implement `sequenceNextNode()` function useful for flow analysis. #19766 (achimbab).
- Added YAML configuration support to configuration loader. This closes #3607. #21858 (BoloniniD).
- Added dateName function. #23085 (Daniil Kondratyev).
- Add `quantileBFloat16` aggregate function as well as the corresponding `quantilesBFloat16` and `medianBFloat16`. It is a very simple and fast quantile estimator with a relative error of no more than 0.390625%. This closes #16641. #23204 (Ivan Novitskiy).
- Support `ALTER DELETE` queries for the `Join` table engine. #23260 (foolchi).
- Add a new boolean setting `prefer_global_in_and_join` which turns all IN/JOIN into GLOBAL IN/JOIN by default. #23434 (Amos Bird).
- Add `bitPositionsToArray` function. #23843 (kevin wan).
- Add aggregate function `segmentLengthSum`. #24250 (flynn).
- Support structs and maps in Arrow/Parquet/ORC and dictionaries in Arrow input/output formats. Introduce new setting `output_format_arrow_low_cardinality_as_dictionary`. #24341 (Kruglov Pavel).
- Support `compile_expression` setting for AARCH64. #24342 (Maksim Kita).
- Now clickhouse-keeper supports ZooKeeper-like `digest` ACLs. #24448 (alesapin).
- Implement the `h3ToGeo` function. #24867 (Bharat Nallan).
- Now query_log has two new columns: `initial_query_start_time` / `initial_query_start_time_microsecond` that record the starting time of a distributed query if any. #25022 (Amos Bird).
- Added support for the Array type in dictionaries. #25119 (Maksim Kita).
- Add `toJSONString` function to serialize columns to their JSON representations. #25164 (Amos Bird).
- A ClickHouse database created with MaterializeMySQL now contains all column comments from the materialized MySQL database. #25199 (Storozhuk Kostiantyn).
- Added function `dateName`. Author [Daniil Kondratyev] (@dankondr). #25372 (Maksim Kita).
- Added function `bitPositionsToArray`. Closes #23792. Author [Kevin Wan] (@MaxWk). #25394 (Maksim Kita).
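With round-to-nearest, a binary format with 8 significant bits has a relative error of at most 2^-8 = 0.390625%. That matches the bound quoted for `quantileBFloat16`: bfloat16 keeps 1 sign bit, 8 exponent bits and 7 stored mantissa bits, plus the implicit leading bit. The Python sketch below illustrates the idea under stated assumptions. It rounds each value to the nearest bfloat16 (round half to even), counts equal values, and walks the sorted buckets until the requested level is reached. This is not ClickHouse's implementation. The helper names `to_bfloat16` and `quantile_bf16` are hypothetical, and the rounding mode ClickHouse actually uses is an assumption here.

```python
import struct
from collections import Counter


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even).

    Hypothetical sketch: assumes x is finite and within float32 range;
    NaN and infinity handling are omitted.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the upper 16 bits of a float32; round on the dropped half.
    lsb = (bits >> 16) & 1  # lowest bit that survives the truncation
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def quantile_bf16(values, level: float) -> float:
    """Approximate quantile over bfloat16-rounded values (hypothetical helper)."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    counts = Counter(to_bfloat16(v) for v in values)
    if not counts:
        raise ValueError("no values")
    threshold = level * sum(counts.values())
    running = 0
    for value in sorted(counts):  # walk buckets in ascending order
        running += counts[value]
        if running >= threshold:
            return value
    return max(counts)  # fallback; not reached for level <= 1


print(quantile_bf16(range(1, 101), 0.5))  # 50.0
```

Integers up to 256 are exact in bfloat16, so the median of 1..100 comes out exactly. Counting identical rounded values keeps the state small, because many inputs collapse into the same bucket.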
Performance Improvement
- (remove from changelog) Integrate and test experimental compression libraries. Will be available under the flag `allow_experimental_codecs`. This closes #16775. #17847 (Abi Palagashvili).
- Add exponential backoff to reschedule read attempt in case RabbitMQ queues are empty. Closes #24340. #24415 (Kseniia Sumarokova).
- An index of type `bloom_filter` can be used for expressions with the `hasAny` function with constant arrays. This closes #24291. #24900 (Vasily Nemkov).
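A Bloom filter can only answer "definitely absent" or "possibly present" for a value. For `hasAny(col, [c1, ..., cn])`, a granule can therefore be skipped only when every constant is definitely absent from it. The Python sketch below shows that reasoning with a toy Bloom filter. The class, its sizes, the SHA-256-based hashing and the `granule_may_match_has_any` helper are hypothetical and do not reflect ClickHouse's index format.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: m bits, k hash functions derived from SHA-256."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def may_contain(self, item) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def granule_may_match_has_any(granule_filter: ToyBloomFilter, constants) -> bool:
    """hasAny(col, constants) can be true in a granule only if at least one
    constant might be stored there; otherwise the granule can be skipped."""
    return any(granule_filter.may_contain(c) for c in constants)


# Index the array elements stored in one granule.
granule = ToyBloomFilter()
for row in ([1, 2], [3], [5, 8]):
    for element in row:
        granule.add(element)

print(granule_may_match_has_any(granule, [8, 100]))  # True: 8 was added
```

Because a Bloom filter has no false negatives, this check never skips a granule that really matches. False positives only cost some extra reading.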
Improvement
- Fix Zero-Copy replication with several S3 volumes (Fixes #22679). #22864 (ianton-ru).
- Add ability to push down LIMIT for distributed queries. #23027 (Azat Khuzhin).
- Respect `insert_allow_materialized_columns` (allows materialized columns) for INSERT into `Distributed` table. #23349 (Azat Khuzhin).
- Here will be listed all the bugs that I am going to fix in this PR. #23518 (Nikita Mikhaylov).
- Display progress for File table engine in clickhouse-local and on INSERT query in clickhouse-client when data is passed to stdin. Closes #18209. #23656 (Kseniia Sumarokova).
- Handle column name clashes for the Join storage. Closes #20309. #23769 (Vladimir C).
- Add ability to split distributed batch on failures (i.e. due to memory limits, corruptions), under `distributed_directory_monitor_split_batch_on_failure` (OFF by default). #23864 (Azat Khuzhin).
- Add standalone `clickhouse-keeper` symlink to the main `clickhouse` binary. Now it's possible to run coordination without the main clickhouse server. #24059 (alesapin).
- Suppress exceptions from logger code. #24069 (Azat Khuzhin).
- Use global settings for queries to `VIEW`. Fixed the behavior when queries to `VIEW` used local settings, which led to errors if the settings on `CREATE VIEW` and `SELECT` were different. From now on, `VIEW` won't use these modified settings, but you can still pass additional settings in the `SETTINGS` section of the `CREATE VIEW` query. Closes #20551. #24095 (Vladimir C).
- Add settings (`connection_auto_close`/`connection_max_tries`/`connection_pool_size`) for MySQL storage engine. #24146 (Azat Khuzhin).
- Fix trailing whitespaces in FROM clause with subqueries in multiline mode, and also change the output of the queries slightly to be more human friendly. #24151 (Azat Khuzhin).
- Recognize IPv4 addresses like `127.0.1.1` as local. This is controversial and closes #23504. Michael Filimonov will test this feature. #24316 (Alexey Milovidov).
- Fix IPv6 addresses resolving (i.e. fixes `select * from remote('[::1]', system.one)`). #24319 (Azat Khuzhin).
- Now query_log has two new columns: `initial_query_start_time` / `initial_query_start_time_microsecond` that record the starting time of a distributed query if any. #24388 (Amos Bird).
- Rewrite more columns to possible alias expressions. This may enable better optimization, such as projections. #24405 (Amos Bird).
- Added optimization that transforms some functions to reading of subcolumns to reduce the amount of read data. E.g., statement `col IS NULL` is transformed to reading of subcolumn `col.null`. Optimization can be enabled by setting `optimize_functions_to_subcolumns`. #24406 (Anton Popov).
- Fix a data race on Keeper shutdown. #24412 (alesapin).
- Support postgres schema for insert queries. Closes #24149. #24413 (Kseniia Sumarokova).
- If an SSDDictionary is created with a DDL query, it can be created only inside the user_files directory. #24466 (Maksim Kita).
- Make String-to-Int parser stricter so that `toInt64('+')` will throw. #24475 (Amos Bird).
- Add merge tree setting `max_parts_to_merge_at_once` which limits the number of parts that can be merged in the background at once. Doesn't affect `OPTIMIZE FINAL` query. Fixes #1820. #24496 (alesapin).
- Avoid hiding errors like `Limit for rows or bytes to read exceeded` for scalar subqueries. #24545 (nvartolomei).
- Add two Replicated*MergeTree settings: `max_replicated_fetches_network_bandwidth` and `max_replicated_sends_network_bandwidth` which allow limiting the maximum speed of replicated fetches/sends for a table. Add two server-wide settings (in the `default` user profile): `max_replicated_fetches_network_bandwidth_for_server` and `max_replicated_sends_network_bandwidth_for_server` which limit the maximum speed of replication for all tables. The settings are not followed perfectly accurately. Turned off by default. Fixes #1821. #24573 (alesapin).
- Respect `max_distributed_connections` for `insert_distributed_sync` (otherwise, for huge clusters and sync insert, it may run out of `max_thread_pool_size`). #24754 (Azat Khuzhin).
- Fixed a bug in `Replicated` database engine that might rarely cause some replica to skip an enqueued DDL query. #24805 (Alexander Tokmakov).
- Some queries require multi-pass semantic analysis. Try reusing built sets for `IN` in this case. #24874 (Amos Bird).
- Allow `not in` operator to be used in partition pruning. #24894 (Amos Bird).
- Improved logging of S3 errors, no more double spaces in case of empty keys and buckets. #24897 (Vladimir Chebotarev).
- For distributed queries, when `optimize_skip_unused_shards=1`, allow skipping shards with a condition like `(sharding key) IN (one-element-tuple)`. (Tuples with many elements were supported. A tuple with a single element did not work because it is parsed as a literal.) #24930 (Amos Bird).
- Detect Linux version at runtime (for working nested epoll, which is required for `async_socket_for_remote`/`use_hedged_requests`; otherwise remote queries may get stuck). #25067 (Azat Khuzhin).
- Increase size of background schedule pool to 128 (`background_schedule_pool_size` setting). It allows avoiding replication queue hangs on a slow ZooKeeper connection. #25072 (alesapin).
- Fix `topLevelDomain()` for IDN hosts (i.e. `example.рф`); previously it returned an empty string for such hosts. #25103 (Azat Khuzhin).
- On server start, parts with an incorrect partition ID are now never removed, but always detached. #25070. #25166 (Nikolai Kochetov).
- Correct memory tracking in aggregate function `topK`. This closes #25259. #25260 (Alexey Milovidov).
- Use separate `clickhouse-bridge` group and user for bridge processes. Set oom_score_adj so the bridges will be the first subjects for the OOM killer. Set maximum RSS to 1 GiB. Closes #23861. #25280 (Kseniia Sumarokova).
- Update prompt in `clickhouse-client` and display a message when reconnecting. This closes #10577. #25281 (Alexey Milovidov).
- Add support for function `if` with Decimal and Int types on its branches. This closes #20549. This closes #10142. #25283 (Alexey Milovidov).
- Add settings `http_max_fields`, `http_max_field_name_size`, `http_max_field_value_size`. #25296 (Ivan).
- Add `==` operator on time conditions for `sequenceMatch` and `sequenceCount` functions. For example: `sequenceMatch('(?1)(?t==1)(?2)')(time, data = 1, data = 2)`. #25299 (Christophe Kalenzaga).
- Support Interval for LowCardinality. Closes #21730. #25410 (Vladimir C).
- Flatbuffers library updated to v2.0.0. List of improvements: https://github.com/google/flatbuffers/releases/tag/v2.0.0. #25474 (Ilya Yatsishin).
- Drop replicas from dirname for internal_replication=true (allows INSERT into Distributed with a cluster of any number of replicas; before, only 15 replicas were supported, and anything more failed with ENAMETOOLONG while creating the directory for async blocks). #25513 (Azat Khuzhin).
- Resolve the actual port number bound when a user requests any available port from the operating system. #25569 (bnaecker).
- Improve startup time of Distributed engine. #25663 (Azat Khuzhin).
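The replicated fetch/send bandwidth settings listed above (`max_replicated_fetches_network_bandwidth` and related) are described as not followed perfectly accurately. One common way to build such a limit is to compare the bytes transferred so far with the elapsed time, and to sleep whenever the transfer is ahead of schedule. The Python sketch below shows that idea. The `Throttler` class, its injected clock and the example numbers are hypothetical and are not taken from ClickHouse.

```python
import time
from typing import Callable


class Throttler:
    """Keep the average transfer rate at or below max_bytes_per_sec.

    The limit is checked only after each chunk is recorded, so short bursts
    can exceed it; only the average rate is held to the limit.
    """

    def __init__(self, max_bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        if max_bytes_per_sec <= 0:
            raise ValueError("limit must be positive")
        self.limit = max_bytes_per_sec
        self.clock = clock
        self.start = clock()
        self.total_bytes = 0

    def delay_after(self, nbytes: int) -> float:
        """Record nbytes sent/received and return how long to sleep (seconds)."""
        self.total_bytes += nbytes
        elapsed = self.clock() - self.start
        # Time the transfer should have taken at exactly the limit.
        expected = self.total_bytes / self.limit
        return max(0.0, expected - elapsed)


# Deterministic usage with a fake clock instead of time.monotonic.
now = [0.0]
throttler = Throttler(1000, clock=lambda: now[0])
print(throttler.delay_after(500))  # 0.5: ahead of schedule
now[0] = 0.5
print(throttler.delay_after(500))  # 0.5: 1000 bytes should take 1.0 s
now[0] = 2.0
print(throttler.delay_after(100))  # 0.0: behind schedule, no sleep
```

A caller would pass the returned value to `time.sleep` after each chunk. Because only the average rate is enforced, individual chunks can briefly run faster than the limit. That is one plausible reason such limits are approximate.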
Bug Fix
- Fix the bug in failover behavior when Engine=Kafka was not able to start consumption if the same consumer had an empty assignment previously. Closes #21118. #21267 (filimonov).
- Fix waiting of automatic dropping of empty parts. It could lead to the background pool filling up and replication getting stuck. #23315 (Anton Popov).
- Column cardinality in JOIN output is now the same as in the input. Closes #23351, closes #20315. #24061 (Vladimir C).
- Use old modulo function version when used in partition key. Closes #23508. #24157 (Kseniia Sumarokova).
- Set `max_threads = 1` to fix mutation failure of StorageMemory. Closes #24274. #24275 (flynn).
- Allow empty HTTP headers. Fixes #23901. #24285 (Ivan).
- Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). Now the inner table is moved to the new database together with the Materialized View. Fixes #23926. #24309 (Alexander Tokmakov).
- Fix drop partition with intersecting fake parts. In rare cases there might be parts with a mutation version greater than the current block number. #24321 (Amos Bird).
- In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes #23905. #24399 (Ivan).
- Fixed the deadlock that can happen during LDAP role (re)mapping, when LDAP group is mapped to a nonexistent local role. #24431 (Denis Glazachev).
- Fix incorrect monotonicity of toWeek function. This fixes #24422. This bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/5212, and was exposed later by a smarter partition pruner. #24446 (Amos Bird).
- In the current ClickHouse version, the total_writes.bytes counter decreases too much during the buffer flush. This leads to counter overflow, and totalBytes returns something around 17.44 EB some time after the flush. This PR should fix it. ... #24450 (DimasKovas).
- Fixed the behavior when query `SYSTEM RESTART REPLICA` or `SYSTEM SYNC REPLICA` is being processed infinitely. This was detected on a server with an extremely small amount of RAM. #24457 (Nikita Mikhaylov).
- Fix usage of tuples in `CREATE .. AS SELECT` queries. #24464 (Anton Popov).
- Enable reading of subcolumns for distributed tables. #24472 (Anton Popov).
- Disallow building uniqXXXXStates of other aggregation states. #24523 (Raúl Marín).
- Fixed bug in deserialization of random generator state which might cause some data types such as `AggregateFunction(groupArraySample(N), T)` to behave in a non-deterministic way. #24538 (Alexander Tokmakov).
- Fix bug which can lead to ZooKeeper client hanging inside clickhouse-server. #24721 (alesapin).
- If ZooKeeper connection was lost and replica was cloned after restoring the connection, its replication queue might contain outdated entries. It's fixed.
- Fixed crash when replication queue contains intersecting virtual parts. It may rarely happen if some data part was lost. Print error in log instead of terminating. #24777 (Alexander Tokmakov).
- Fix bug when exception `Mutation was killed` can be thrown to the client on mutation wait when the mutation is not loaded into memory yet. #24809 (alesapin).
- Allow NULL values in postgresql protocol. Closes #22622. #24857 (Kseniia Sumarokova).
- Fix "Missing columns" exception when joining Distributed Materialized View. #24870 (Azat Khuzhin).
- Fix extremely rare bug on low-memory servers which can lead to the inability to perform merges without restart. Possibly fixes #24603. #24872 (alesapin).
- Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3. #24885 (Pavel Kovalenko).
- Fixed bug with declaring S3 disk at root of bucket. Earlier, it reported an error: `[heather] 2021.05.10 02:11:11.932234 [ 72790 ] {2ff80b7b-ec53-41cb-ac35-19bb390e1759} executeQuery: Code: 36, e.displayText() = DB::Exception: Key name is empty in path style S3 URI: (http://172.17.0.2/bucket/) (version 21.6.1.1) (from 127.0.0.1:47994) (in query: SELECT policy_name FROM system.storage_policies), Stack trace (when copying this message, always include the lines below):`. #24898 (Vladimir Chebotarev).
- Fix possible heap-buffer-overflow in Arrow. #24922 (Kruglov Pavel).
- Fix limit/offset settings for distributed queries (ignore on the remote nodes). #24940 (Azat Khuzhin).
- Fix extremely rare error `Tagging already tagged part` in replication queue during concurrent `alter move/replace partition`. Possibly fixes #22142. #24961 (alesapin).
- Fix serialization of split nested messages in Protobuf format. This PR fixes #24647. #25000 (Vitaly Baranov).
- Fix potential crash when calculating aggregate function states by aggregation of aggregate function states of other aggregate functions (not a practical use case). See #24523. #25015 (Alexey Milovidov).
- Distinguish KILL MUTATION for different tables (fixes unexpected `Cancelled mutating parts` error). #25025 (Azat Khuzhin).
- Fix wrong result when using aggregate projection with non-empty `GROUP BY` key to execute query with `GROUP BY` by empty key. #25055 (Amos Bird).
- Fix bug which allows creating tables with columns referencing themselves like `a UInt32 ALIAS a + 1` or `b UInt32 MATERIALIZED b`. Fixes #24910, #24292. #25059 (alesapin).
- Fix bug with constant maps in mapContains that led to error `empty column was returned by function mapContains`. Closes #25077. #25080 (Kruglov Pavel).
- Fix crash in query with cross join and `joined_subquery_requires_alias = 0`. Fixes #24011. #25082 (Nikolai Kochetov).
- Fix possible parts loss after updating up to 21.5 in case table used `UUID` in partition key. (It is not recommended to use `UUID` in partition key.) Fixes #25070. #25127 (Nikolai Kochetov).
- Do not use table's projection for `SELECT` with `FINAL`. It is not supported yet. #25163 (Amos Bird).
- Fixed an error which occurred while inserting a subset of columns using CSVWithNames format. Fixes #25129. #25169 (Nikita Mikhaylov).
- Fix TOCTOU error in installation script. #25277 (Alexey Milovidov).
- Fix incorrect behaviour and UBSan report in big integers. In previous versions `CAST(1e19 AS UInt128)` returned zero. #25279 (Alexey Milovidov).
- Fix joinGetOrNull with not-nullable columns. This fixes #24261. #25288 (Amos Bird).
- Fix error `Bad cast from type DB::ColumnLowCardinality to DB::ColumnVector<char8_t>` for queries where `LowCardinality` argument was used for IN (this bug appeared in 21.6). Fixes #25187. #25290 (Nikolai Kochetov).
- Fix logical error `Cannot sum Array/Tuple in min/maxMap`. #25298 (Kruglov Pavel).
- Support `SimpleAggregateFunction(LowCardinality)` for `SummingMergeTree`. Fixes #25134. #25300 (Nikolai Kochetov).
- On ZooKeeper connection loss, `ReplicatedMergeTree` table might wait for background operations to complete before trying to reconnect. It's fixed, now background operations are stopped forcefully. #25306 (Alexander Tokmakov).
- Fix the possibility of non-deterministic behaviour of the `quantileDeterministic` function and similar. This closes #20480. #25313 (Alexey Milovidov).
- Fix lost `WHERE` condition in expression-push-down optimization of query plan (setting `query_plan_filter_push_down = 1` by default). Fixes #25368. #25370 (Nikolai Kochetov).
- Fix `REPLACE` column transformer when used in DDL by correctly quoting the formatted query. This fixes #23925. #25391 (Amos Bird).
- Fix segfault when sharding_key is absent in task config for copier. #25419 (Nikita Mikhaylov).
- Fix excessive underscore before the names of the preprocessed configuration files. #25431 (Vitaly Baranov).
- Fix conversion of datetime with timezone for MySQL, PostgreSQL, ODBC. Closes #5057. #25528 (Kseniia Sumarokova).
- Fix segfault in `Arrow` format when using `Decimal256`. Add arrow `Decimal256` support. #25531 (Kruglov Pavel).
- Fixed the case when conversion of postgres arrays sometimes resulted in String data type instead of an n-dimensional array, because `attndims` works incorrectly in some cases. Closes #24804. #25538 (Kseniia Sumarokova).
- Fix wrong totals for query `WITH TOTALS` and `WITH FILL`. Fixes #20872. #25539 (Anton Popov).
- Fix error `Key expression contains comparison between inconvertible types` for queries with `ARRAY JOIN` in case the array is used in the primary key. Fixes #8247. #25546 (Anton Popov).
- Fix bug which can lead to intersecting parts after merges with TTL: `Part all_40_40_0 is covered by all_40_40_1 but should be merged into all_40_41_1. This shouldn't happen often.`. #25549 (alesapin).
- Fix restore S3 table. #25601 (ianton-ru).
- Fix null pointer dereference in `EXPLAIN AST` without query. #25631 (Nikolai Kochetov).
- `REPLACE PARTITION` might be ignored in rare cases if the source partition was empty. It's fixed. Fixes #24869. #25665 (Alexander Tokmakov).
- Fixed `No such file or directory` error on moving `Distributed` table between databases. Fixes #24971. #25667 (Alexander Tokmakov).
- Fix MySQL `select user()` returning an empty result. Fixes #25683. #25697 (sundyli).
- Fix data race when querying `system.clusters` while reloading the cluster configuration at the same time. #25737 (Amos Bird).
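The fix for tables with columns that reference themselves (`a UInt32 ALIAS a + 1`, `b UInt32 MATERIALIZED b`) comes down to rejecting cycles among column default expressions. A standard way to detect such cycles is a depth-first search with three colours, sketched in Python below. The dependency-map input format and the function name `has_cyclic_defaults` are hypothetical, and this is not ClickHouse's code. The sketch also flags indirect cycles such as `x -> y -> x`; the changelog entry does not say whether ClickHouse's check covers those.

```python
def has_cyclic_defaults(deps: dict[str, set[str]]) -> bool:
    """Return True if column default expressions form a cycle.

    deps maps each column that has an ALIAS/MATERIALIZED/DEFAULT expression
    to the columns that expression references. Columns absent from deps are
    ordinary stored columns and cannot be part of a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour = {col: WHITE for col in deps}

    def visit(col: str) -> bool:
        colour[col] = GREY
        for ref in deps[col]:
            if ref not in deps:
                continue  # plain column: no expression to follow
            if colour[ref] == GREY:
                return True  # back edge: ref is on the current path
            if colour[ref] == WHITE and visit(ref):
                return True
        colour[col] = BLACK
        return False

    return any(colour[col] == WHITE and visit(col) for col in deps)


# `a UInt32 ALIAS a + 1` references itself, so it would be rejected.
print(has_cyclic_defaults({"a": {"a"}}))  # True
print(has_cyclic_defaults({"c": {"d"}}))  # False: d is a plain column
```

Three colours distinguish a column that is still on the current search path (a cycle) from one that has already been fully checked (safe to reuse). Each column is therefore visited once.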
Build/Testing/Packaging Improvement
- Ubuntu 20.04 is now used to run integration tests, docker-compose version used to run integration tests is updated to 1.28.2. Environment variables now take effect on docker-compose. Rework test_dictionaries_all_layouts_separate_sources to allow parallel run. #20393 (Ilya Yatsishin).
- Add libfuzzer tests for YAMLParser class. #24480 (BoloniniD).
- Adding support to save clickhouse server logs in TestFlows check. #24504 (vzakaznikov).
- Integration tests configuration has special treatment for dictionaries. Removed remaining dictionaries manual setup. #24728 (Ilya Yatsishin).
- Add integration test cases to cover JDBC bridge. #25047 (Zhichun Wu).
- Disabling extended precision data types TestFlows tests. #25125 (vzakaznikov).
- Fix using Yandex dockerhub registries for TestFlows. #25133 (vzakaznikov).
- Adding `leadInFrame` and `lagInFrame` window functions TestFlows tests. #25144 (vzakaznikov).
- Enable build with s3 module in osx #25217. #25218 (kevin wan).
- Increase LDAP verification cooldown performance tests timeout to 600 sec. #25374 (vzakaznikov).
- Enabling TestFlows RBAC tests. #25498 (vzakaznikov).
- Add CI check for darwin-aarch64 cross-compilation. #25560 (Ivan).
- Changed CSS theme to dark for better code highlighting. #25682 (Mike Kot).
Other
- Introduce ASTTableIdentifier into the code. #16401 (Ivan).
- Use std::filesystem instead of Poco::File. #23657 (Kseniia Sumarokova).
- Fix init script so it does not print 'usage' message for each 'status' command run. #25046 (Denis Korenevskiy).
- Fix cron.d task so it does not spam with email messages about current service status. #25050 (Denis Korenevskiy).
NO CL ENTRY
- NO CL ENTRY: 'Revert "Pass Settings to aggregate function creator"'. #24524 (Vladimir C).
- NO CL ENTRY: 'Revert "Add initial_query_start_time to query log"'. #25021 (Alexey Milovidov).
- NO CL ENTRY: 'Revert "Add run-id option to integration tests"'. #25526 (alesapin).
- NO CL ENTRY: 'Revert "Implement h3ToGeo function"'. #25593 (Alexander Tokmakov).
Testing Improvement
- Add join related options to stress tests. #25200 (Vladimir C).