ClickHouse release v21.9.1.8000-prestable FIXME as compared to v21.8.1.7409-prestable
Backward Incompatible Change
- Fix the issue that in case of some sophisticated query with column aliases identical to the names of expressions, bad cast may happen. This fixes #25447. This fixes #26914. This fix may introduce backward incompatibility: if there are different expressions with identical names, exception will be thrown. It may break some rare cases when `enable_optimize_predicate_expression` is set. #26639 (Alexey Milovidov).
- Under clickhouse-local, always treat local addresses with a port as remote. #26736 (Raúl Marín).
- Do not allow to apply parametric aggregate function with `-Merge` combinator to aggregate function state if state was produced by aggregate function with different parameters. For example, state of `fooState(42)(x)` cannot be finalized with `fooMerge(s)` or `fooMerge(123)(s)`; parameters must be specified explicitly like `fooMerge(42)(s)` and must be equal. It does not affect some special aggregate functions like `quantile` and `sequence*` that use parameters for finalization only. #26847 (Alexander Tokmakov).
- Do not output trailing zeros in text representation of `Decimal` types. Example: `1.23` will be printed instead of `1.230000` for decimal with scale 6. This closes #15794. It may introduce slight incompatibility if your applications somehow relied on the trailing zeros. Serialization in output formats can be controlled with the setting `output_format_decimal_trailing_zeros`. Implementation of `toString` and casting to String is changed unconditionally. #27680 (Alexey Milovidov).
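
The `Decimal` text-representation change can be illustrated with a small Python model. This is only a sketch of the described behaviour, not ClickHouse code: the function name `format_decimal` and its `trailing_zeros` flag are hypothetical stand-ins for the server-side formatting and the `output_format_decimal_trailing_zeros` setting, and it assumes a decimal is held as an unscaled integer plus a scale.

```python
def format_decimal(unscaled: int, scale: int, trailing_zeros: bool = False) -> str:
    """Render unscaled * 10**-scale as text (hypothetical model, not ClickHouse code)."""
    sign = "-" if unscaled < 0 else ""
    # Pad so there is at least one digit before the decimal point.
    digits = str(abs(unscaled)).rjust(scale + 1, "0")
    if scale:
        whole, frac = digits[:-scale], digits[-scale:]
    else:
        whole, frac = digits, ""
    if not trailing_zeros:
        # New default described above: drop zeros at the end of the fraction.
        frac = frac.rstrip("0")
    return sign + whole + ("." + frac if frac else "")


new_style = format_decimal(1230000, 6)                       # "1.23"
old_style = format_decimal(1230000, 6, trailing_zeros=True)  # "1.230000"
```

Applications that compared the textual output byte for byte are the ones likely to notice the difference between the two results.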
New Feature
- Implement window function `nth_value(expr, N)` that returns the value of the Nth row of the window frame. #26334 (Zuo, RuoYu).
- Add functions that return the (initial_)query_id of the current query. This closes #23682. #26410 (Alexey Boykov).
- Introduce syntax for here documents. Example: `SELECT $doc$VALUE$doc$`. #26671 (Maksim Kita).
- New functions `currentProfiles()`, `enabledProfiles()`, `defaultProfiles()`. #26714 (Vitaly Baranov).
- Add new functions `currentRoles()`, `enabledRoles()`, `defaultRoles()`. #26780 (Vitaly Baranov).
- Supported cluster macros inside table functions 'cluster' and 'clusterAllReplicas'. #26913 (Vadim Volodin).
- Added support for custom query for MySQL, PostgreSQL, ClickHouse, JDBC, Cassandra dictionary source. Closes #1270. #26995 (Maksim Kita).
- Add column `default_database` to `system.users`. #27054 (kevin wan).
- Added `bitmapSubsetOffsetLimit(bitmap, offset, cardinality_limit)` function. It creates a subset of the bitmap, limiting the result to `cardinality_limit` elements starting at offset `offset`. #27234 (DHBin).
- Add support for `bzip2` compression method for import/export. Closes #22428. #27377 (Nikolay Degterinsky).
- Add replicated storage of user, roles, row policies, quotas and settings profiles through ZooKeeper (experimental). #27426 (Kevin Michel).
- Add "tupleToNameValuePairs", a function that turns a named tuple into an array of pairs. #27505 (Braulio Valdivielso Martínez).
- Enable using constants from WITH and SELECT in aggregate function parameters. Closes #10945. #27531 (abel-cheng).
- Added ComplexKeyRangeHashed dictionary. Closes #22029. #27629 (Maksim Kita).
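
As a rough illustration of the `bitmapSubsetOffsetLimit(bitmap, offset, cardinality_limit)` entry above, the following Python sketch models a bitmap as a set of integers. It assumes that `offset` counts positions in ascending element order starting from zero; that reading, and the helper name `bitmap_subset_offset_limit`, are assumptions made for illustration rather than the server implementation.

```python
def bitmap_subset_offset_limit(bitmap, offset, cardinality_limit):
    """Model of the described semantics: skip `offset` elements in ascending
    order, then keep at most `cardinality_limit` elements (assumed behaviour)."""
    ordered = sorted(set(bitmap))  # a bitmap holds distinct integers
    return ordered[offset:offset + cardinality_limit]


subset = bitmap_subset_offset_limit([0, 1, 2, 3, 10, 20, 30, 100, 200, 500], 3, 4)
# subset == [3, 10, 20, 30]
```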
Performance Improvement
- Compile aggregate functions `groupBitOr`, `groupBitAnd`, `groupBitXor`. #26161 (Maksim Kita).
- Compile columns with `Enum` types. #26237 (Maksim Kita).
- Vectorize the SUM of Nullable integer types with native representation (David Manzanares, Raúl Marín). #26248 (Raúl Marín).
- Don't build sets for indices when analyzing a query. #26365 (Raúl Marín).
- Improve latency of short queries that require reading from tables with a large number of columns. #26371 (Anton Popov).
- Share file descriptors in concurrent reads of the same files. There is no noticeable performance difference on Linux. But the number of opened files will be significantly (10..100 times) lower on typical servers and it makes operations easier. See #26214. #26768 (Alexey Milovidov).
- Specialize date time related comparison to achieve better performance. This fixes #27083. #27122 (Amos Bird).
- Improve the performance of fast queries when `max_execution_time=0` by reducing the number of `clock_gettime` system calls. #27325 (filimonov).
- Fewer `clock_gettime` syscalls, which may improve performance for some types of fast queries. #27492 (filimonov).
Improvement
- Add error id (like `BAD_ARGUMENTS`) to exception messages. This closes #25862. #26172 (Alexey Milovidov).
- Remove GLOBAL keyword for IN when scalar function is passed. In previous versions, if user specified `GLOBAL IN f(x)` exception was thrown. #26217 (Amos Bird).
- Apply aggressive IN index analysis for projections so that better projection candidate can be selected. #26218 (Amos Bird).
- Convert `timestamp` and `timestamptz` data types to DateTime64 in postgres engine. #26234 (jasine).
- Check for non-deterministic functions in keys, including constant expressions like `now()`, `today()`. This closes #25875. This closes #11333. #26235 (Alexey Milovidov).
- Don't throw exception when querying `system.detached_parts` table if there is custom disk configuration and `detached` directory does not exist on some disks. This closes #26078. #26236 (Alexey Milovidov).
- Add information about column sizes in `system.columns` table for `Log` and `TinyLog` tables. This closes #9001. #26241 (Nikolay Degterinsky).
- Added `output_format_avro_string_column_pattern` setting to put specified String columns to Avro as string instead of default bytes. Implements #22414. #26245 (Ilya Golshtein).
- Add `system.warnings` table to collect warnings about server configuration. #26246 (Filatenkov Artur).
- Check the hash function at table creation, not at sampling. Add a setting in MergeTreeSettings: if a table was created with an incorrect sampling column but sampling is never used, disable this setting to start the server without an exception. #26256 (zhaoyu).
- Make `toTimeZone` monotonic when the time zone is a constant value, to support partition pruning when using SQL like:. #26261 (huangzhaowei).
- When the client connects to the server, it receives information about all warnings already collected by the server. (This can be disabled with the option `--no-warnings`.) #26282 (Filatenkov Artur).
- Add a setting `function_range_max_elements_in_block` to tune the safety threshold for data volume generated by function `range`. This closes #26303. #26305 (Alexey Milovidov).
- Control the execution period of clearing old temporary directories with a parameter that has a default value. #26212. #26313 (fastio).
- Allow to reuse connections of shards among different clusters. It also avoids creating new connections when using `cluster` table function. #26318 (Amos Bird).
- Add events to profile calls to sleep / sleepEachRow. #26320 (Raúl Marín).
- Save server address in history URLs in web UI if it differs from the origin of web UI. This closes #26044. #26322 (Alexey Milovidov).
- Add ability to set Distributed directory monitor settings via CREATE TABLE (i.e. `CREATE TABLE dist (key Int) Engine=Distributed(cluster, db, table) SETTINGS monitor_batch_inserts=1` and similar). #26336 (Azat Khuzhin).
- Fix behaviour with non-existing host in user allowed host list. #26368 (ianton-ru).
- Added comments for the code written in https://github.com/ClickHouse/ClickHouse/pull/24206; the code has been improved in several places. #26377 (Vitaly Baranov).
- Enable `use_hedged_requests` setting that allows to mitigate tail latencies on large clusters. #26380 (Alexey Milovidov).
- Updated protobuf to 3.17.3. Changelogs are available on https://github.com/protocolbuffers/protobuf/releases. #26424 (Ilya Yatsishin).
- After https://github.com/ClickHouse/ClickHouse/pull/26377. Encryption algorithm now should be specified explicitly if it's not default (`aes_128_ctr`):. #26465 (Vitaly Baranov).
- Apply `LIMIT` on the shards for queries like `SELECT * FROM dist ORDER BY key LIMIT 10` w/ `distributed_push_down_limit=1`. Avoid running `Distinct`/`LIMIT BY` steps for queries like `SELECT DISTINCT shading_key FROM dist ORDER BY key`. Now `distributed_push_down_limit` is respected by `optimize_distributed_group_by_sharding_key` optimization. #26466 (Azat Khuzhin).
- Set client query kind for mysql and postgresql handler. #26498 (anneji-dev).
- Executable dictionaries (ExecutableDictionarySource, ExecutablePoolDictionarySource) enable creation with DDL query using clickhouse-local. Closes #22355. #26510 (Maksim Kita).
- Add round-robin support for clickhouse-benchmark (it does not differ from the regular multi host/port run except for statistics report). #26607 (Azat Khuzhin).
- Improve use of the Kafka engine on high-performance machines, which can reduce the workload of the query node. #26642 (feihengye).
- Avoid hanging clickhouse-benchmark if connection fails (i.e. on EMFILE). #26656 (Azat Khuzhin).
- Fix excessive (x2) connect attempts with skip_unavailable_shards. #26658 (Azat Khuzhin).
- `mapPopulateSeries` function supports `Map` type. #26663 (Ildus Kurbangaliev).
- Improve handling of KILL QUERY requests. #26675 (Raúl Marín).
- SET PROFILE now applies constraints too if they're set for a passed profile. #26730 (Vitaly Baranov).
- Support multiple keys for encrypted disk. Display error message if the key is probably wrong. (see https://github.com/ClickHouse/ClickHouse/pull/26465#issuecomment-882015970). #26733 (Vitaly Baranov).
- Remove unnecessary exception thrown. #26740 (Caspian).
- Watchdog is disabled in docker by default. Fix for not handling ctrl+c. #26757 (Mikhail f. Shiryaev).
- Changing default roles affects new sessions only. #26759 (Vitaly Baranov).
- Less verbose internal RocksDB logs. This closes #26252. #26789 (Alexey Milovidov).
- Expose rocksdb statistics via system.rocksdb table. Read rocksdb options from ClickHouse config (`rocksdb`/`rocksdb_TABLE` keys). #26821 (Azat Khuzhin).
- Updated extractAllGroupsHorizontal - upper limit on the number of matches per row can be set via optional third argument. ... #26961 (Vasily Nemkov).
- Now functions can be shard-level constants, which means if it's executed in the context of some distributed table, it generates a normal column, otherwise it produces a constant value. Notable functions are: `hostName()`, `tcpPort()`, `version()`, `buildId()`, `uptime()`, etc. #27020 (Amos Bird).
- Merge join correctly handles empty set in the right. #27078 (Vladimir C).
- Improve compatibility with non-whole-minute timezone offsets. #27080 (Raúl Marín).
- Enable distributed_push_down_limit by default. #27104 (Azat Khuzhin).
- Improved the existence condition judgment and empty string node judgment when clickhouse-keeper creates znode. #27125 (小路).
- Add compression for `INTO OUTFILE` that automatically chooses the compression algorithm. Closes #3473. #27134 (Filatenkov Artur).
- Add a new metric called MaxPushedDDLEntryID, which is the maximum DDL entry ID that the current node has pushed to ZooKeeper. #27174 (Fuwang Hu).
- Allow to pass query settings via server URI in Web UI. #27177 (kolsys).
- Added column `replica_is_active` to table `system.replicas`; it maps each replica name to whether that replica is active. Closes #27138. #27180 (Maksim Kita).
- Try recording `query_kind` even when query fails to start. #27182 (Amos Bird).
- Mark window functions as ready for general use. Remove the `allow_experimental_window_functions` setting. #27184 (Alexander Kuzmenkov).
- Memory client in client. #27191 (Filatenkov Artur).
- Support schema for postgres database engine. Closes #27166. #27198 (Kseniia Sumarokova).
- Split global mutex into individual regexp construction. This helps avoid huge regexp construction blocking other related threads. Not sure how to proper test the improvement. #27211 (Amos Bird).
- Add 10 seconds cache for S3 proxy resolver. #27216 (ianton-ru).
- Add new index data skipping minmax index format for proper Nullable support. #27250 (Azat Khuzhin).
- Memory consumed by bitmap aggregate functions now is taken into account for memory limits. This closes #26555. #27252 (Alexey Milovidov).
- Add two settings `max_hyperscan_regexp_length` and `max_hyperscan_regexp_total_length` to prevent huge regexp being used in hyperscan related functions, such as `multiMatchAny`. #27378 (Amos Bird).
- Add setting `log_formatted_queries` to log additional formatted query into `system.query_log`. It's useful for normalized query analysis because functions like `normalizeQuery` and `normalizeQueryKeepNames` don't parse/format queries in order to achieve better performance. #27380 (Amos Bird).
- Add Cast function for internal usage, which will not preserve type nullability, but non-internal cast will preserve according to setting cast_keep_nullable. Closes #12636. #27382 (Kseniia Sumarokova).
- Send response with error message if HTTP port is not set and user tries to send HTTP request to TCP port. #27385 (Braulio Valdivielso Martínez).
- Use bytes instead of strings for binary data in the GRPC protocol. #27431 (Vitaly Baranov).
- Log client IP address if authentication fails. #27514 (Misko Lee).
- Disable arrayJoin on partition expressions. #27648 (Raúl Marín).
- Add `FROM INFILE` command. #27655 (Filatenkov Artur).
- Enables query parameters to be passed in the body of http requests. #27706 (Hermano Lustosa).
- Remove duplicate index analysis and avoid possible invalid limit checks during projection analysis. #27742 (Amos Bird).
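
The `distributed_push_down_limit` entries above rely on the fact that, for `ORDER BY key LIMIT n`, each shard only needs to return its own first `n` rows. The Python sketch below models that idea with plain lists; the function name `query_with_pushdown` and the list-based "shards" are hypothetical and say nothing about how ClickHouse implements the optimization.

```python
import heapq
from itertools import islice


def query_with_pushdown(shards, limit):
    """Model of ORDER BY key LIMIT n with the limit applied on every shard."""
    # Each shard sorts its own rows and sends back at most `limit` of them.
    partials = [sorted(rows)[:limit] for rows in shards]
    # The initiator merges the sorted partial results and applies LIMIT again.
    return list(islice(heapq.merge(*partials), limit))


shards = [[5, 1, 9, 12], [3, 7, 2], [8, 4, 6, 10, 11]]
result = query_with_pushdown(shards, 5)
# Same answer as sorting all rows in one place, but less data is transferred.
```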
Bug Fix
- Fix potential crash if more than one `untuple` expression is used. #26179 (Alexey Milovidov).
- Remove excessive newline in `thread_name` column in `system.stack_trace` table. This fixes #24124. #26210 (Alexey Milovidov).
- Fix logical error on join with totals, close #26017. #26250 (Vladimir C).
- Fix zstd decompression in case there are escape sequences at the end of internal buffer. Closes #26013. #26314 (Kseniia Sumarokova).
- Fixed rare bug in lost replica recovery that may cause replicas to diverge. #26321 (Alexander Tokmakov).
- Fix `optimize_distributed_group_by_sharding_key` for multiple columns (leads to incorrect result w/ `optimize_skip_unused_shards=1`/`allow_nondeterministic_optimize_skip_unused_shards=1` and multiple columns in sharding key expression). #26353 (Azat Khuzhin).
- Fix possible crash when login as dropped user. This PR fixes #26073. #26363 (Vitaly Baranov).
- Fix infinite non joined block stream in `partial_merge_join`, close #26325. #26374 (Vladimir C).
- Now, scalar subquery always returns `Nullable` result if its type can be `Nullable`. It is needed because in case of empty subquery its result should be `Null`. Previously, it was possible to get error about incompatible types (type deduction does not execute scalar subquery, and it could use not-nullable type). Scalar subquery with empty result which can't be converted to `Nullable` (like `Array` or `Tuple`) now throws error. Fixes #25411. #26423 (Nikolai Kochetov).
- Fix some fuzzed msan crash. Fixes #22517. #26428 (Nikolai Kochetov).
- Fix broken name resolution after rewriting column aliases. This fixes #26432. #26475 (Amos Bird).
- Fix issues with `CREATE DICTIONARY` query if dictionary name or database name was quoted. Closes #26491. #26508 (Maksim Kita).
- Fix crash in rabbitmq shutdown in case rabbitmq setup was not started. Closes #26504. #26529 (Kseniia Sumarokova).
- Update `chown` cmd check in clickhouse-server docker entrypoint. It fixes the bug that cluster pod restart failed (or timeout) on kubernetes. #26545 (Ky Li).
- Fix incorrect function names of groupBitmapAnd/Or/Xor. This fixes. #26557 (Amos Bird).
- Fix history file conversion if file is empty. #26589 (Azat Khuzhin).
- Fix potential nullptr dereference in window functions. This fixes #25276. #26668 (Alexander Kuzmenkov).
- ParallelFormattingOutputFormat: Use mutex to handle the join to the collector_thread (https://github.com/ClickHouse/ClickHouse/issues/26694). #26703 (Raúl Marín).
- Sometimes SET ROLE could work incorrectly, this PR fixes that. #26707 (Vitaly Baranov).
- Do not remove data on ReplicatedMergeTree table shutdown to avoid creating data to metadata inconsistency. #26716 (nvartolomei).
- Add `event_time_microseconds` value for `REMOVE_PART` in `system.part_log`. In previous versions it was not set. #26720 (Azat Khuzhin).
- Aggregate function parameters might be lost when applying some combinators causing exceptions like `Conversion from AggregateFunction(topKArray, Array(String)) to AggregateFunction(topKArray(10), Array(String)) is not supported`. It's fixed. Fixes #26196 and #26433. #26814 (Alexander Tokmakov).
- Fix library-bridge ids load. #26834 (Kseniia Sumarokova).
- Fix error `Missing columns: 'xxx'` when `DEFAULT` column references other non materialized column without `DEFAULT` expression. Fixes #26591. #26900 (alesapin).
- Fix reading of custom TLDs (stops processing with lower buffer or bigger file). #26948 (Azat Khuzhin).
- Fix "Unknown column name" error with multiple JOINs in some cases, close #26899. #26957 (Vladimir C).
- Now partition ID in queries like `ALTER TABLE ... PARTITION ID xxx` is validated for correctness. Fixes #25718. #26963 (alesapin).
- [RFC] Fix possible stuck mutation due to race with DROP_RANGE. #27002 (Azat Khuzhin).
- Fixed `cache`, `complex_key_cache`, `ssd_cache`, `complex_key_ssd_cache` configuration parsing. Options `allow_read_expired_keys`, `max_update_queue_size`, `update_queue_push_timeout_milliseconds`, `query_wait_timeout_milliseconds` were not parsed for dictionaries with non `cache` type. #27032 (Maksim Kita).
- Fix synchronization in GRPCServer. This PR fixes #27024. #27064 (Vitaly Baranov).
- Fix uninitialized memory in functions `multiSearch*` with empty array, close #27169. #27181 (Vladimir C).
- In rare cases `system.detached_parts` table might contain incorrect information for some parts, it's fixed. Fixes #27114. #27183 (Alexander Tokmakov).
- Fix on-disk format breakage for secondary indices over Nullable column (no stable release had been affected). #27197 (Azat Khuzhin).
- Fix column structure in merge join, close #27091. #27217 (Vladimir C).
- In case of ambiguity, lambda functions prefer its arguments to other aliases or identifiers. #27235 (Raúl Marín).
- Fix mutation stuck on invalid partitions in non-replicated MergeTree. #27248 (Azat Khuzhin).
- Fix `distributed_group_by_no_merge=2`+`distributed_push_down_limit=1` or `optimize_distributed_group_by_sharding_key=1` with `LIMIT BY` and `LIMIT OFFSET`. #27249 (Azat Khuzhin).
- Fix errors like `Expected ColumnLowCardinality, gotUInt8` or `Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnLowCardinality` for some queries with `LowCardinality` in `PREWHERE`. Fixes #23515. #27298 (Nikolai Kochetov).
- Fix `Cannot find column` error for queries with sampling. Was introduced in #24574. Fixes #26522. #27301 (Nikolai Kochetov).
- Fix Mysql protocol when using parallel formats (CSV / TSV). #27326 (Raúl Marín).
- Fixed incorrect validation of partition id for MergeTree tables that were created with old syntax. #27328 (Alexander Tokmakov).
- Fix incorrect result for query with row-level security, prewhere and LowCardinality filter. Fixes #27179. #27329 (Nikolai Kochetov).
- /proc/info contains metrics like. #27361 (Mike Kot).
- Fix distributed queries with zero shards and aggregation. #27427 (Azat Khuzhin).
- Fix metric BackgroundMessageBrokerSchedulePoolTask, maybe mistyped. #27452 (Ben).
- Fix crash during projection materialization when some parts contain missing columns. This fixes #27512. #27528 (Amos Bird).
- Fixed underflow of the time value when constructing it from components. Closes #27193. #27605 (Vasily Nemkov).
- After setting `max_memory_usage*` to non-zero value it was not possible to reset it back to 0 (unlimited). It's fixed. #27638 (Alexander Tokmakov).
- Fix bug with aliased column in `Distributed` table. #27652 (Vladimir C).
- Fixed another case of `Unexpected merged part ... intersecting drop range ...` error. #27656 (Alexander Tokmakov).
- Fix postgresql table function resulting in non-closing connections. Closes #26088. #27662 (Kseniia Sumarokova).
- Fix bad type cast when functions like `arrayHas` are applied to arrays of LowCardinality of Nullable of different non-numeric types like `DateTime` and `DateTime64`. In previous versions bad cast occurs. In new version it will lead to exception. This closes #26330. #27682 (Alexey Milovidov).
- Fix column filtering with union distinct in subquery. Closes #27578. #27689 (Kseniia Sumarokova).
- After https://github.com/ClickHouse/ClickHouse/pull/26384. To execute `GRANT WITH REPLACE OPTION` now the current user should have `GRANT OPTION` for access rights it's going to grant AND for access rights it's going to revoke. #27701 (Vitaly Baranov).
- After https://github.com/ClickHouse/ClickHouse/pull/25687. Add backquotes for the default database shown in CREATE USER. #27702 (Vitaly Baranov).
- Remove duplicated source files in CMakeLists.txt in arrow-cmake. #27736 (李扬).
- Fix possible crash when asynchronous connection draining is enabled and hedged connection is disabled. #27774 (Amos Bird).
- Prevent crashes for some formats when NULL (tombstone) message was coming from Kafka. Closes #19255. #27794 (filimonov).
- Fix a rare bug in `DROP PART` which can lead to the error `Unexpected merged part intersects drop range`. #27807 (alesapin).
- Fix a couple of bugs that may cause replicas to diverge. #27808 (Alexander Tokmakov).
Build/Testing/Packaging Improvement
- Update RocksDB to 2021-07-16 master. #26411 (Alexey Milovidov).
- `clickhouse-test` supports SQL tests with Jinja2 templates. #26579 (Vladimir C).
- Fix /clickhouse/window functions/tests/non distributed/errors/error window function in join. #26744 (vzakaznikov).
- Enabling RBAC TestFlows tests and crossing out new fails. #26747 (vzakaznikov).
- Tests: Fix CLICKHOUSE_CLIENT_SECURE with the default config. #26901 (Raúl Marín).
- Fix linking of auxiliary programs when using dynamic libraries. #26958 (Raúl Marín).
- Add CMake options to build with or without specific CPU instruction set. This is for #17469 and #27509. #27508 (Alexey Milovidov).
- Add support for build with `clang-13`. This closes #27705. #27714 (Alexey Milovidov).
- Improve support for build with `clang-13`. #27777 (Sergei Semin).
Other
- Rename `MaterializeMySQL` to `MaterializedMySQL`. #26822 (Alexander Tokmakov).
NO CL ENTRY
- NO CL ENTRY: 'Modify code comments'. #26265 (xiedeyantu).
- NO CL ENTRY: 'Revert "Datatype Date32, support range 1925 to 2283"'. #26352 (Alexey Milovidov).
- NO CL ENTRY: 'Fix CURR_DATABASE empty for 01034_move_partition_from_table_zookeeper.sh'. #27164 (小路).
- NO CL ENTRY: 'DOCSUP-12413: macros support in functions cluster and clusterAllReplicas'. #27759 (olgarev).
- NO CL ENTRY: 'Revert "less sys calls #2: make vdso work again"'. #27829 (Alexey Milovidov).
- NO CL ENTRY: 'Revert "Do not miss exceptions from the ThreadPool"'. #27844 (Alexey Milovidov).