ClickHouse/docs/changelogs/v22.2.1.2139-prestable.md
2022-05-16 14:46:05 +02:00

ClickHouse release v22.2.1.2139-prestable FIXME as compared to v22.1.1.2542-prestable

New Feature

  • (Not ready for production, put into experimental features) Add memory overcommit to MemoryTracker. Added guaranteed settings for memory limits which represent soft memory limits. When the hard memory limit is reached, MemoryTracker tries to cancel the most overcommitted query. New setting memory_usage_overcommit_max_wait_microseconds specifies how long queries may wait for another query to stop. Closes #28375. #31182 (Dmitry Novik).
  • The setting allows a user to provide their own deduplication semantics in MergeTree/ReplicatedMergeTree. If provided, it's used instead of the data digest to generate the block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can avoid the same inserted data being deduplicated. This closes #7461. #32304 (Igor Nikonov).
  • Add support of DEFAULT keyword for INSERT statements. Closes #6331. #33141 (Andrii Buriachevskyi).
  • Add confidence intervals to ttests. #33260 (achimbab).
  • Allow to create new files on insert for File/S3/HDFS engines. Allow to overwrite file in HDFS. Throw an exception in attempt to overwrite a file in S3 by default. Throw an exception in attempt to append data to file in formats that have suffix. Closes #31640 Closes #31622 Closes #23862 Closes #15022 Closes #16674. #33302 (Kruglov Pavel).
  • Add h3ToCenterChild function. #33313 (Bharat Nallan).
  • Merge functions for text classification. See #23271. #33314 (Nikolay Degterinsky).
  • Implemented meanZTest. #33354 (achimbab).
  • Add new h3 miscellaneous functions: edgeLengthKm, exactEdgeLengthKm, exactEdgeLengthM, exactEdgeLengthRads, numHexagons. #33621 (Bharat Nallan).
  • Add DEGREES and RADIANS functions. #33769 (Bharat Nallan).
  • Parameter --host can accept multiple hosts. In case of unavailability of one of them, the client will try to connect to the next one. #33824 (Filippov Denis).
  • Detect format in clickhouse-local by file name. #33829 (Kruglov Pavel).
  • Add a new method expire() in PoolBase which is used to reallocate an invalid object in the pool. #34076 (lgbo).
  • Add table function format(format_name, data). #34125 (Kruglov Pavel).
  • Allow to create default table engine. #34187 (Ilya Yatsishin).
  • EPHEMERAL column specifier is added to CREATE TABLE query. Closes #9436. #34424 (Yakov Olkhovskiy).
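The custom deduplication entry in the list above (#32304) can be pictured with a toy model. This is a minimal sketch under stated assumptions, not ClickHouse's implementation: `ToyTable`, `block_id` and the `dedup_token` parameter are hypothetical names invented for illustration, and SHA-256 merely stands in for whatever digest the server computes.

```python
import hashlib
from typing import Optional


def block_id(data: bytes, dedup_token: Optional[str] = None) -> str:
    """Return a toy block ID: derived from the token if given, else from the data."""
    if dedup_token is not None:
        # Hypothetical: a user-supplied token replaces the data digest.
        source = b"token:" + dedup_token.encode("utf-8")
    else:
        source = b"data:" + data
    return hashlib.sha256(source).hexdigest()


class ToyTable:
    """Keeps inserted blocks and drops any block whose ID was already seen."""

    def __init__(self) -> None:
        self.seen_ids = set()
        self.blocks = []

    def insert(self, data: bytes, dedup_token: Optional[str] = None) -> bool:
        """Insert a block; return False if it was deduplicated."""
        bid = block_id(data, dedup_token)
        if bid in self.seen_ids:
            return False
        self.seen_ids.add(bid)
        self.blocks.append(data)
        return True


table = ToyTable()
first = table.insert(b"1,2,3")                          # stored
retry = table.insert(b"1,2,3")                          # same digest: dropped
forced = table.insert(b"1,2,3", dedup_token="batch-2")  # unique token: stored
print(first, retry, forced, len(table.blocks))
```

In this model the token cuts both ways: a unique token lets identical data through, while reusing a token causes even different data to be treated as a duplicate.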

Performance Improvement

Improvement

  • Now ReplicatedMergeTree can recover data when some of its disks are broken. #13544 (Amos Bird).
  • Dynamic reload of server TLS certificates on config reload. Closes #15764. #15765 (johnskopis).
  • Merge #15765 (Dynamic reload of server TLS certificates on config reload) cc @johnskopis. #31257 (Filatenkov Artur).
  • Added UUID data type support for functions hex, bin. #32170 (Frank Chen).
  • Support optimize_read_in_order if prefix of sorting key is already sorted. E.g. if the table has sorting key ORDER BY (a, b) and the query has WHERE a = const ORDER BY b clauses, reading in order of the sorting key is now applied instead of a full sort. #32748 (Anton Popov).
  • Add new keeper setting min_session_timeout_ms. Now keeper server will determine client session timeout according to min_session_timeout_ms and session_timeout_ms settings. #33288 (JackyWoo).
  • Improve keeper performance and fix several memory leaks. #33329 (alesapin).
  • Respect cgroup limits for CPU quota. #33342 (JaySon).
  • Enable binary arithmetic (plus, minus, multiply, division, least, greatest) between Decimal and Float. #33355 (flynn).
  • Replace _shard_num via constants (from #7624) with shardNum() function (from #27020), to avoid possible issues (like those that had been found in #16947). #33392 (Azat Khuzhin).
  • Support SET, YEAR, TIME and GEOMETRY data types in MaterializedMySQL. Fixes #18091, #21536, #26361. #33429 (zzsmdfj).
  • Add function addressToLineWithInlines. Closes #26211. #33467 (SuperDJY).
  • Improvement for fromUnixTimestamp64 family functions. They now accept any integer value that can be converted to Int64. This closes #14648. #33505 (Andrey Zvonov).
  • Functions dictGet, dictHas implicitly cast key argument to dictionary key structure, if they are different. #33672 (Maksim Kita).
  • Parse and store OpenTelemetry trace-id in big-endian order. #33723 (Frank Chen).
  • Enable stream to table join in WindowView. #33729 (vxider).
  • Create parent directories in DiskS3::restoreFileOperations method. #33730 (ianton-ru).
  • Add some improvements and fixes for Bool data type. Fixes #33244. #33737 (Kruglov Pavel).
  • Added support for cast from Map(Key, Value) to Array(Tuple(Key, Value)). #33794 (Maksim Kita).
  • Support EXPLAIN for CREATE FUNCTION queries. For example, the query explain ast create function mycast AS (n) -> cast(n as String); is echoed by the client as EXPLAIN AST CREATE FUNCTION mycast AS n -> CAST(n, 'String'). #33819 (李扬).
  • Try every resolved ip address while getting S3 proxy. #33862 (Nikolai Kochetov).
  • FlatDictionary improve performance of dictionary data load. #33871 (Maksim Kita).
  • Fix disk using the same path. Closes #29072. #33905 (zhongyuankai).
  • Dictionaries added support for DateTime64. #33914 (Maksim Kita).
  • FlatDictionary, HashedDictionary, HashedArrayDictionary added support for creating with empty attributes, with support of read all keys, and dictHas. Fixes #33820. #33918 (Maksim Kita).
  • RangeHashedDictionary improvements. Improve performance of load time if there are multiple attributes. Allow to create without attributes. Added option convert_null_range_bound_to_open (true by default) to specify the strategy when interval start and end have Nullable type. Closes #29791. Allow to specify Float, Decimal, DateTime64, Int128, Int256, UInt128, UInt256 as range types. RangeHashedDictionary added support for range values that extend Int64 type. Closes #28322. Added option range_lookup_strategy to specify the range lookup type, min or max (min by default). Closes #21647. Fixed allocated bytes calculations. Fixed type name in system.dictionaries in case of ComplexKeyHashedDictionary. #33927 (Maksim Kita).
  • Fix getauxval() in glibc-compatibility, this should fix vsyscalls after setenv (i.e. timezone is set in config), and LSan (and also fix some leaks that had been found by LSan). #33957 (Azat Khuzhin).
  • Detect format and schema from stdin in clickhouse-local. #33960 (Kruglov Pavel).
  • Fixed UTF-8 string case-insensitive search when lowercase and uppercase characters are represented by different number of bytes. Example is ẞ and ß. This closes #7334. #33992 (Harry Lee).
  • Fix memory accounting for queries that use less than max_untracked_memory. #34001 (Azat Khuzhin).
  • Support ON CLUSTER clause for all types of SYSTEM queries. #34005 (小路).
  • Add schema inference for values() table function. Closes #33811. #34017 (Kruglov Pavel).
  • Tracing context is now propagated from GRPC client metadata. #34064 (andremarianiello).
  • Add UUID support in MsgPack input/output format. #34065 (Kruglov Pavel).
  • Improving the experience of multiple line editing for clickhouse-client. This is a follow-up of https://github.com/ClickHouse/ClickHouse/pull/31123. #34114 (Amos Bird).
  • Add maxsplit argument for splitByChar. Closes #34081. #34140 (李扬).
  • Allow to parse dictionary PRIMARY KEY as PRIMARY KEY (id, value), previously supported only PRIMARY KEY id, value. Closes #34135. #34141 (Maksim Kita).
  • Allow carriage return in the middle of the line while parsing by Regexp format. This closes #34200. #34205 (Alexey Milovidov).
  • Recognize YYYYMMDD-hhmmss format in parseDateTimeBestEffort function. This closes #34206. #34208 (Alexey Milovidov).
  • Add ability to compose PostgreSQL-style cast operator :: with ArrayElement and TupleElement. #34229 (Nikolay Degterinsky).
  • Added #! and # as a recognised start of a single line comment. Reference to task #34138. #34230 (Aaron Katz).
  • Change severity of the "Cancelled merging parts" message in logs, because it's not an error. This closes #34148. #34232 (Alexey Milovidov).
  • Applying data skipping indexes for queries with FINAL may produce incorrect results. Disable data skipping indexes by default for queries with FINAL (introduce new use_skip_indexes_if_final setting and disable it by default). #34243 (Azat Khuzhin).
  • Support asynchronous inserts in clickhouse-client for queries with inlined data. #34267 (Anton Popov).
  • Cancel merges before acquiring table lock for TRUNCATE query to avoid DEADLOCK_AVOIDED error in some cases. Fixes #34302. #34304 (Alexander Tokmakov).
  • Some servers expect a User-Agent header in their HTTP requests. A User-Agent header entry has been added to HTTP requests of the form: User-Agent: ClickHouse/VERSION_STRING. #34330 (Saad Ur Rahman).
  • REGEXP_MATCHES and REGEXP_REPLACE function aliases for compatibility with PostgreSQL. Close #30885. #34334 (李扬).
  • Better handle pre-inputs before client start. This is for #34308. #34336 (Amos Bird).
  • Add options max_query_size and max_parser_depth for clickhouse-format. Closes #30528. #34349 (李扬).
  • Default input and output formats that can be overridden by --input-format and --output-format. Closes #30631. #34352 (李扬).
  • Allow to skip not found urls for globs when using URL storage / table function. Also closes #34359. #34392 (Kseniia Sumarokova).
  • Add two new settings: s3_upload_part_size_multiply_factor and s3_upload_part_size_multiply_parts_count_threshold. Now each time s3_upload_part_size_multiply_parts_count_threshold parts are uploaded to S3 from a single query, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor. Fixes #34244. #34422 (alesapin).
  • Enable allow_experimental_projection_optimization by default. #34456 (Nikolai Kochetov).
  • Privileges CREATE/ALTER/DROP ROW POLICY now can be granted on a table or on database.* as well as globally *.*. #34489 (Vitaly Baranov).
  • Refactor client fault tolerant connection (https://github.com/ClickHouse/ClickHouse/pull/33824#issuecomment-1033690860). The new way to use it: clickhouse-client ... --host host1 --host host2 --port port2 --host host3 --port port --host host4. #34490 (Kruglov Pavel).
  • Improve schema inference in clickhouse-local. Allow to write just clickhouse-local -q "select * from table" < data.format. #34495 (Kruglov Pavel).
  • Support .jsonl extension for JSONEachRow format. #34496 (Kruglov Pavel).
  • Send ProfileEvents statistics in case of INSERT SELECT query. #34498 (Dmitry Novik).
  • Added sending of the output format back to client like it's done in HTTP protocol as suggested in #34362. Closes #34362. #34499 (Vitaly Baranov).
  • Allow to write s3(url, access_key_id, secret_access_key). #34503 (Kruglov Pavel).
  • Support IF EXISTS clause for TTL expr TO [DISK|VOLUME] [IF EXISTS] 'xxx' feature. Parts will be moved to disk or volume only if it exists on replica, so MOVE TTL rules will be able to behave differently on replicas according to the existing storage policies. Resolves #34455. #34504 (Anton Popov).
  • Minor improvement: no need to clone log entry. #34587 (zhanglistar).
  • Slightly improve performance in case of filtering by sparse columns (which can be enabled by setting ratio_of_defaults_for_sparse_serialization in MergeTree tables). #34601 (Anton Popov).
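The S3 upload part size entry in the list above (#34422) describes a simple growth rule, sketched below. This is one reading of the changelog wording, not the server code: the function name `upload_part_size` is hypothetical, and the numbers (16 MiB, factor 2, threshold 1000) are illustrative values rather than confirmed defaults.

```python
MIB = 1024 * 1024


def upload_part_size(part_index: int, min_part_size: int,
                     multiply_factor: int, parts_count_threshold: int) -> int:
    """Size of the part with the given 0-based index within one query.

    Assumed rule: every time `parts_count_threshold` parts have been
    uploaded, the part size is multiplied by `multiply_factor`.
    """
    if part_index < 0:
        raise ValueError("part_index must be non-negative")
    if parts_count_threshold <= 0:
        raise ValueError("parts_count_threshold must be positive")
    growth_steps = part_index // parts_count_threshold
    return min_part_size * multiply_factor ** growth_steps


# Illustrative values only (assumptions, not confirmed defaults).
sizes_in_mib = [
    upload_part_size(i, 16 * MIB, 2, 1000) // MIB
    for i in (0, 999, 1000, 2000)
]
print(sizes_in_mib)
```

Growing the part size this way keeps early parts small while allowing a very large upload to stay under a fixed limit on the number of parts.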

Bug Fix

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehaviour in official stable or prestable release)

  • Fix lz4 compression for output. Closes #31421. #31862 (Kruglov Pavel).
  • Create a function escapeForLDAPFilter and use it to escape characters '(' and ')' in a final_user_dn variable. #33401 (IlyaTsoi).
  • TODO. #33492 (huzhichengdd).
  • Fix error Bad cast from type ... to DB::DataTypeArray which may happen when table has Nested column with dots in name, and default value is generated for it (e.g. during insert, when column is not listed). Continuation of #28762. #33588 (Alexey Pavlenko).
  • Fix Chunk should have AggregatedChunkInfo in GroupingAggregatedTransform (in case of optimize_aggregation_in_order=1). #33637 (Azat Khuzhin).
  • Fix bug in zero copy replication which led to data duplication in case of TTL move. Fixes #33643. #33642 (alesapin).
  • Allow some queries with sorting, LIMIT BY, ARRAY JOIN and lambda functions. This closes #7462. #33675 (Alexey Milovidov).
  • Correctly determine current database if CREATE TEMPORARY TABLE AS SELECT is queried inside a named HTTP session. This is a very rare use case. This closes #8340. #33676 (Alexey Milovidov).
  • Fix mutation when table contains projections. This fixes #33010. This fixes #33275. #33679 (Amos Bird).
  • Throw exception when storage hdfs list directory failed. #33724 (LiuNeng).
  • Fix tiny race between count() and INSERT/merges/... in MergeTree (it is possible to return incorrect number of rows for SELECT with optimize_trivial_count_query). #33753 (Azat Khuzhin).
  • Fix bug of check table when creating data part with wide format and projection. #33774 (李扬).
  • Fix parsing query INSERT INTO ... VALUES SETTINGS ... (...), ... #33776 (Kruglov Pavel).
  • Fix bug in client that led to 'Connection reset by peer' in server. Closes #33309. #33790 (Kruglov Pavel).
  • Fix usage of external dictionaries with Redis source and large number of keys. #33804 (Anton Popov).
  • Fix schema inference for JSONEachRow and JSONCompactEachRow. #33830 (Kruglov Pavel).
  • Fix KeyCondition with no common types available. #33833 (Amos Bird).
  • Fix memory leak in clickhouse-keeper in case of compression is used (default). #33840 (Azat Khuzhin).
  • Fixed replica is not readonly logical error on SYSTEM RESTORE REPLICA query when replica is actually readonly. Fixes #33806. #33847 (Alexander Tokmakov).
  • Fix usage of sparse columns (which can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization). #33849 (Anton Popov).
  • Fix crash if sql user defined function is created with lambda with non identifier arguments. Closes #33866. #33868 (Maksim Kita).
  • Fix potential race condition when doing remote disk read. cc @Jokser. #33912 (Amos Bird).
  • Aggregate function combinator -If did not correctly process Nullable filter argument. This closes #27073. #33920 (Alexey Milovidov).
  • Fix parsing ZK metadata: metadata from ZooKeeper is now compared with local metadata in canonical form. #33933 (sunny).
  • Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to Not found columns exception. #33938 (Anton Popov).
  • Fix crash while reading of nested tuples. Fixes #33838. #33956 (Anton Popov).
  • Fix parsing IPv6 from query parameter and fix IPv6 to string conversion. Closes #33928. #33971 (Kruglov Pavel).
  • Fix segfault while parsing ORC file with corrupted footer. Closes #33797. #33984 (Kruglov Pavel).
  • Fix bug which led to inability for server to start when both replicated access storage and keeper are used. Introduced two settings for keeper socket timeout instead of settings from default user: keeper_server.socket_receive_timeout_sec and keeper_server.socket_send_timeout_sec. Fixes #33973. #33988 (alesapin).
  • Fixes parallel_view_processing=0 not working when inserting into a table using VALUES. Fixes view_duration_ms in the query_views_log not being set correctly for materialized views. #34067 (Raúl Marín).
  • Fix asynchronous inserts with Native format. #34068 (Anton Popov).
  • Fixed minor race condition that might cause "intersecting parts" error in extremely rare cases after ZooKeeper connection loss. #34096 (Alexander Tokmakov).
  • Fix possible data race in StorageFile that was introduced in https://github.com/ClickHouse/ClickHouse/pull/33960. Closes #34111. #34113 (Kruglov Pavel).
  • Fix inserts to distributed tables in case of change of native protocol. The last change was in version 22.1, so there may be some failures of inserts to distributed tables after upgrade to that version. #34132 (Anton Popov).
  • Fix bug which can rarely lead to error "Cannot read all data" while reading LowCardinality columns of MergeTree table engines family which stores data on remote file system like S3. #34139 (alesapin).
  • Fix rare and benign race condition in HDFS, S3 and URL storage engines which can lead to additional connections. #34172 (alesapin).
  • Fix schema inference for table function s3. #34186 (Kruglov Pavel).
  • Fix metric Query, which shows number of executing queries. In last several releases it was always 0. #34224 (Anton Popov).
  • Fix reading of subcolumns with dots in their names. In particular fixed reading of Nested columns, if their element names contain dots (e.g Nested(`keys.name` String, `keys.id` UInt64, values UInt64)). #34228 (Anton Popov).
  • Fix memory leak in case of some Exception during query processing with optimize_aggregation_in_order=1. #34234 (Azat Khuzhin).
  • Fix current_user/current_address for interserver mode (Before this patch current_user/current_address will be preserved from the previous query). #34263 (Azat Khuzhin).
  • Fix progress bar width. It was incorrectly rounded to integer number of characters. #34275 (Alexey Milovidov).
  • Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (Alexander Tokmakov).
  • Fix various issues when projection is enabled by default. Each issue is described in separate commit. This is for #33678. This fixes #34273. #34305 (Amos Bird).
  • Try to fix rare bug while reading of empty arrays, which could lead to Data compressed with different methods error. #34327 (Anton Popov).
  • Fix wrong engine syntax in result of SHOW CREATE DATABASE query for databases with engine Memory. This closes #34335. #34345 (Alexey Milovidov).
  • For SQLUserDefinedFunctions change privilege level from DATABASE to GLOBAL. Closes #34281. #34404 (Maksim Kita).
  • Fix segfault in schema inference from url. Closes #34147. #34405 (Kruglov Pavel).
  • Fix possible error 'Cannot convert column Function to mask' in short circuit function evaluation. Closes #34171. #34415 (Kruglov Pavel).
  • Add missing lock for storage. Fixes possible race with table deletion. #34416 (Kseniia Sumarokova).
  • Fix possible error 'file_size: Operation not supported'. #34479 (Kruglov Pavel).
  • Fix compression support in URL engine. #34524 (Frank Chen).
  • Fix comparison between integers and floats in index analysis. Previously it could lead to skipping some granules for reading by mistake. Fixes #34493. #34528 (Anton Popov).
  • Fix exception Chunk should have AggregatedChunkInfo in MergingAggregatedTransform (in case of optimize_aggregation_in_order=1 and distributed_aggregation_memory_efficient=0). Fixes #34526. #34532 (Anton Popov).
  • On cancellation, S3 and HDFS cancelled only the current reader but continued to execute the initial query. Fixes #34301. Relates to #34397. #34539 (Dmitry Novik).
  • Fix bug in round/roundBankers. Closes #33267. #34562 (李扬).
  • Fixed the assertion in case of using allow_experimental_parallel_reading_from_replicas with max_parallel_replicas equal to 1. This fixes #34525. #34613 (Nikita Mikhaylov).
  • Add Debug workflow to get variables for all actions on demand. Fix lack of pr_info.number for some edge case. #34644 (Mikhail f. Shiryaev).
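One entry in the list above (#34528) fixes comparison between integers and floats in index analysis, where a wrong answer could cause granules to be skipped by mistake. The changelog does not say what exactly went wrong, so the sketch below only illustrates one general hazard of this kind: converting a large integer to a float before comparing loses precision. All function names are hypothetical, and the granule check is a toy, not ClickHouse's KeyCondition logic.

```python
from typing import Callable


def naive_greater(int_value: int, float_value: float) -> bool:
    """Lossy: converts the integer to a 64-bit float before comparing."""
    return float(int_value) > float_value


def exact_greater(int_value: int, float_value: float) -> bool:
    """Exact: Python compares int and float by their mathematical values."""
    return int_value > float_value


def may_skip_granule(max_key: int, threshold: float,
                     greater: Callable[[int, float], bool]) -> bool:
    """For the condition `key > threshold`, a granule can be skipped
    only if even its largest key does not satisfy the condition."""
    return not greater(max_key, threshold)


max_key = 2**53 + 1          # not representable exactly as a 64-bit float
threshold = float(2**53)

wrongly_skipped = may_skip_granule(max_key, threshold, naive_greater)
correctly_kept = not may_skip_granule(max_key, threshold, exact_greater)
print(wrongly_skipped, correctly_kept)
```

With the lossy comparison the granule holding the key 2**53 + 1 is skipped even though that key satisfies the condition; the exact comparison keeps it.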

NO CL CATEGORY

NO CL ENTRY