Alexey Milovidov e0252db8d4 No prestable
2023-03-27 12:19:32 +02:00



2022 Changelog

ClickHouse release v22.8.1.2097-lts (09a2ff8843) FIXME as compared to v22.7.1.2484-stable (f4f05ec786)

Backward Incompatible Change

  • Make the cache composable, allow certain files (e.g. idx, mrk) not to be evicted, and delete the old cache version. It is now possible to configure a cache over an Azure Blob Storage disk, a Local disk, a StaticWeb disk, etc. This PR is marked backward incompatible because the cache configuration has changed, and the config file must be updated for the cache to work. The old cache will still be used with the new configuration, and the server will start up fine with the old cache configuration. Closes #36140. Closes #37889. #36171 (Kseniia Sumarokova).
  • All relevant dictionary sources now respect the remote_url_allow_hosts setting. This was already done for HTTP, Cassandra and Redis; ClickHouse, MongoDB, MySQL and PostgreSQL have now been added. The host is checked only for dictionaries created from DDL. #39184 (Nikolai Kochetov).
  • Extended the range of Date32 and DateTime64 to support dates from the year 1900 to 2299. In previous versions, the supported interval was only from the year 1925 to 2283. The implementation uses the proleptic Gregorian calendar (which is conformant with ISO 8601:2004, clause 3.2.1 The Gregorian calendar) instead of accounting for historical transitions from the Julian to the Gregorian calendar. This change affects implementation-specific behavior for out-of-range arguments: for example, where previous versions clamped the value 1899-01-01 to 1925-01-01, the new version clamps it to 1900-01-01. It also changes the result of rounding with toStartOfInterval by up to one quarter if you pass INTERVAL 3 QUARTER, because the intervals are counted from an implementation-specific point in time. Closes #28216, improves #38393. #39425 (Roman Vasin).
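
The clamping rule in the Date32/DateTime64 entry can be modeled in a few lines. What follows is a minimal Python sketch, not ClickHouse code. It assumes the supported range is exactly 1900-01-01 to 2299-12-31, because the entry only says "from the year 1900 to 2299". The name clamp_date32 is hypothetical. Python's datetime.date also uses the proleptic Gregorian calendar, so no Julian-to-Gregorian transition comes into play.

```python
from datetime import date

# Assumed bounds: the entry gives the years 1900 to 2299; the exact first/last days are an assumption.
DATE32_MIN = date(1900, 1, 1)
DATE32_MAX = date(2299, 12, 31)


def clamp_date32(value: date) -> date:
    """Clamp an out-of-range date into [DATE32_MIN, DATE32_MAX] (hypothetical helper).

    datetime.date uses the proleptic Gregorian calendar, so, as in the entry,
    no Julian-to-Gregorian transition is applied.
    """
    return max(DATE32_MIN, min(value, DATE32_MAX))


if __name__ == "__main__":
    for d in (date(1899, 1, 1), date(2022, 8, 1), date(2400, 1, 1)):
        print(d, "->", clamp_date32(d))
```

Under these assumptions, 1899-01-01 maps to 1900-01-01, matching the example in the entry, and in-range dates pass through unchanged.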

New Feature

  • Added the setting exact_rows_before_limit (0/1). When enabled, ClickHouse provides an exact value for the rows_before_limit_at_least statistic, at the cost of having to read the data before the limit completely. This closes #6613. #25333 (kevin wan).
  • Add the SLRU cache policy for the uncompressed cache and the marks cache. #34651 (alexX512).
  • Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator available in the upcoming generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). Its goal is to speed up common analytics operations such as data (de)compression and filtering. ClickHouse gained the new "DeflateQpl" compression codec, which uses the Intel® IAA offloading technology to provide a high-performance DEFLATE implementation. The codec uses the Intel® Query Processing Library (QPL), which abstracts access to the hardware accelerator and falls back to a software implementation when the accelerator is not available. DEFLATE generally provides higher compression ratios than ClickHouse's default LZ4 codec and, as a result, less disk I/O and lower main memory consumption. #36654 (jasperzhu), #39494 (Robert Schulze).
  • Add the concurrent_threads_soft_limit parameter to improve performance under high RPS by limiting the total number of threads for all queries. #37285 (Roman Vasin).
  • Added concurrency control logic to limit the total number of concurrent threads created by queries. #37558 (Sergei Trifonov).
  • Added support for parallel distributed INSERT SELECT into tables with the Distributed and Replicated engines (#34670). #39107 (Nikita Mikhaylov).
  • Add new settings to control schema inference from text formats: input_format_try_infer_dates (try to infer dates from strings), input_format_try_infer_datetimes (try to infer datetimes from strings), input_format_try_infer_integers (try to infer Int64 instead of Float64) and input_format_json_try_infer_numbers_from_strings (try to infer numbers from JSON strings in JSON formats). #39186 (Kruglov Pavel).
  • Add JSON-formatted log output to the console, to allow easier ingestion and querying in log analysis tools. #39277 (Mallik Hassan).
  • Add the function nowInBlock, which allows getting the current time during long-running and continuous queries. Closes #39522. Note: there are no now64InBlock or todayInBlock functions. #39533 (Alexey Milovidov).
  • Add result_rows and result_bytes to progress reports (X-ClickHouse-Summary). #39567 (Raúl Marín).
  • Add the ability to specify settings for the executable() table function. #39681 (Constantine Peresypkin).
  • Implemented automatic conversion of the database engine from Ordinary to Atomic. Create an empty convert_ordinary_to_atomic file in the flags directory, and all Ordinary databases will be converted automatically on the next server start. Resolves #39546. #39933 (Alexander Tokmakov).
  • Add the new setting schema_inference_hints, which allows specifying structure hints for specific columns in schema inference. Closes #39569. #40068 (Kruglov Pavel).
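
The SLRU entry names the policy without describing it. As a rough illustration, here is a textbook segmented LRU in Python, not the ClickHouse implementation. New keys enter a probationary LRU segment, and a second access promotes them to a protected segment. When the protected segment overflows, its least recently used entry is demoted back to the probationary segment rather than evicted. The class SLRUCache is hypothetical, and its capacities count entries rather than bytes.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class SLRUCache:
    """Textbook segmented LRU (hypothetical sketch): probationary + protected segments.

    Capacities count entries, not bytes. Each OrderedDict keeps its least
    recently used key first.
    """

    def __init__(self, probationary_size: int, protected_size: int) -> None:
        self.probationary_size = probationary_size
        self.protected_size = protected_size
        self.probationary = OrderedDict()  # keys seen once
        self.protected = OrderedDict()     # keys accessed again after insertion

    def get(self, key: Hashable) -> Optional[object]:
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency in the protected segment
            return self.protected[key]
        if key in self.probationary:
            value = self.probationary.pop(key)  # second access: promote
            self._insert_protected(key, value)
            return value
        return None  # miss

    def put(self, key: Hashable, value: object) -> None:
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
            return
        self.probationary[key] = value  # new keys start in the probationary segment
        self.probationary.move_to_end(key)
        self._trim_probationary()

    def _insert_protected(self, key: Hashable, value: object) -> None:
        self.protected[key] = value
        if len(self.protected) > self.protected_size:
            # Demote the protected LRU entry to the probationary MRU end instead of evicting it.
            old_key, old_value = self.protected.popitem(last=False)
            self.probationary[old_key] = old_value
            self._trim_probationary()

    def _trim_probationary(self) -> None:
        while len(self.probationary) > self.probationary_size:
            self.probationary.popitem(last=False)  # evict the probationary LRU entry


if __name__ == "__main__":
    cache = SLRUCache(probationary_size=2, protected_size=1)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # second access promotes "a" to the protected segment
    cache.put("c", 3)
    cache.put("d", 4)  # probationary overflow evicts "b", not "a"
    print(cache.get("a"), cache.get("b"))
```

Entries touched only once, for example by a large scan, cycle through the probationary segment and cannot push out entries that were accessed repeatedly. This is the usual motivation for SLRU over plain LRU.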

Performance Improvement

  • Deduce how to sort from the input stream's sort description, and skip sorting if the input stream is already sorted. #38719 (Igor Nikonov).
  • DISTINCT in order with ORDER BY significantly improves memory usage, and also query execution time, if the DISTINCT columns match (or form a prefix of) the ORDER BY columns. #39432 (Igor Nikonov).
  • Use the local node as the first priority to get the structure of a remote table when executing cluster and similar table functions. #39440 (Mingliang Pan).
  • Use DistinctSortedTransform only when the sort description is applicable to the DISTINCT columns; otherwise fall back to the ordinary DISTINCT implementation. This allows fewer checks during DistinctSortedTransform execution. #39528 (Igor Nikonov).
  • DistinctSortedTransform did not take advantage of sorting, i.e. it worked like the ordinary DISTINCT implementation. The fix reduces memory usage significantly. #39538 (Igor Nikonov).
  • ColumnVector: optimize filter with AVX512VBMI2 compress store. #39633 (Guo Wangyang).
  • KeyCondition: optimize applyFunction in multi-thread scenario. #39812 (Guo Wangyang).
  • For systems with AVX512 VBMI2, this PR improves performance by approximately 6% for SSB benchmark queries 3.1, 3.2 and 3.3 (SF=100). Tested on a 2-socket Intel Ice Lake Xeon 8380 system. #40033 (Robert Schulze).
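
The DISTINCT entries rely on the fact that sorted input places equal keys next to each other. The Python sketch below illustrates that idea; it is not the DistinctSortedTransform code. It assumes rows are tuples and that the input is sorted by its first sorted_prefix_len columns. The set of seen rows only has to cover the current group of equal prefixes, so memory is bounded by the largest group rather than by the total number of distinct rows. The name distinct_in_order is hypothetical.

```python
from itertools import groupby
from typing import Hashable, Iterable, Iterator, Tuple

Row = Tuple[Hashable, ...]


def distinct_in_order(rows: Iterable[Row], sorted_prefix_len: int) -> Iterator[Row]:
    """Yield each distinct row once, in input order (hypothetical helper).

    Assumes `rows` is sorted by its first `sorted_prefix_len` columns, so rows
    sharing that prefix are contiguous. Only the current group's rows are
    remembered; the set is discarded when the prefix changes.
    """
    for _prefix, group in groupby(rows, key=lambda row: row[:sorted_prefix_len]):
        seen = set()  # per-group state, released at the group boundary
        for row in group:
            if row not in seen:
                seen.add(row)
                yield row


if __name__ == "__main__":
    # Sorted by the first column only; DISTINCT over both columns.
    rows = [(1, "a"), (1, "b"), (1, "a"), (2, "b"), (2, "b")]
    print(list(distinct_in_order(rows, sorted_prefix_len=1)))
```

With sorted_prefix_len=0, every row falls into one group and the function degenerates to an ordinary hash-based DISTINCT. This loosely mirrors the fallback described in #39528.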

Improvement

  • Change the way the primary key is analyzed for MergeTree. #25563 (Nikolai Kochetov).
  • Improved the structure of the DDL query result table for the Replicated database (separate columns with shard and replica name, clearer status). CREATE TABLE ... ON CLUSTER queries can be normalized on the initiator first if distributed_ddl_entry_format_version is set to 3 (the default value). This means that ON CLUSTER queries may not work if the initiator does not belong to the cluster specified in the query. Fixes #37318, #39500. The ON CLUSTER clause is ignored if the database is Replicated and the cluster name equals the database name. Related to #35570. Miscellaneous minor fixes for the Replicated database engine. Metadata consistency is checked when starting up a Replicated database, and replica recovery starts if the local metadata does not match the metadata in Keeper. Resolves #24880. #37198 (Alexander Tokmakov).
  • Support the SQL-standard DELETE FROM syntax on MergeTree tables, with a lightweight delete implementation for the MergeTree family. #37893 (Jianmei Zhang).
  • timeSlots now works with DateTime64; subsecond duration and slot size are available when working with DateTime64. #37951 (Andrey Zvonov).
  • Add a cache for schema inference for the file/s3/hdfs/url table functions. Schema inference is now performed only on the first query to a file; all subsequent queries to the same file use the schema from the cache if the data has not changed. Add the system table system.schema_inference_cache with all schemas currently in the cache, and the system query SYSTEM DROP SCHEMA CACHE [FOR FILE/S3/HDFS/URL] to drop schemas from the cache. #38286 (Kruglov Pavel).
  • Simplified the function registration macro interface (FUNCTION_REGISTER*) to eliminate the step of adding and calling an extern function in registerFunctions.cpp; this also makes incremental builds of a new function faster. #38615 (Li Yin).
  • Added support for LEFT SEMI and LEFT ANTI direct joins with RocksDB. #38956 (Vladimir C).
  • Resolves #37490. #39054 (SmitaRKulkarni).
  • Store Keeper API version inside a predefined path. #39096 (Antonio Andelic).
  • entrypoint.sh in the Docker image now creates all folders it finds in the config for a multi-disk setup and runs chown on them (#17717). #39121 (Nikita Mikhaylov).
  • Add profile events for fsync. #39179 (Azat Khuzhin).
  • Add an optional second argument to the ordinary function file(path[, default]); the function returns this value when the file does not exist. #39218 (Nikolay Degterinsky).
  • Some small fixes for reading via HTTP; allow retrying partial content when a 200 OK response is received. #39244 (Kseniia Sumarokova).
  • Improved Base58 encoding/decoding. #39292 (Andrey Zvonov).
  • Normalize AggregateFunction types and state representations, because optimizations like https://github.com/ClickHouse/ClickHouse/pull/35788 treat count(not null columns) as count(), which might confuse distributed interpreters with the following error: Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported. #39420 (Amos Bird).
  • Improved memory usage during memory efficient merging of aggregation results. #39429 (Nikita Taranov).
  • Support queries CREATE TEMPORARY TABLE ... (<list of columns>) AS .... #39462 (Kruglov Pavel).
  • Add support for !/* (exclamation/asterisk) in custom TLDs (cutToFirstSignificantSubdomainCustom()/cutToFirstSignificantSubdomainCustomWithWWW()/firstSignificantSubdomainCustom()). #39496 (Azat Khuzhin).
  • Rework and simplify the system.backups table: remove the internal column, allow the user to set the ID of an operation, and add the columns num_files, uncompressed_size, compressed_size, start_time, end_time. #39503 (Vitaly Baranov).
  • Refactored some code and removed duplicate code. #39509 (Simon Liu).
  • Add support for TLS connections to NATS. Implements #39525. #39527 (Constantine Peresypkin).
  • clickhouse-obfuscator (a tool for database obfuscation for testing and load generation) now has the new --save and --load parameters to work with pre-trained models. This closes #39534. #39541 (Alexey Milovidov).
  • Fix incorrect behavior of log rotation during restart. #39558 (Nikolay Degterinsky).
  • Improve bytes to bits mask transform for SSE/AVX/AVX512. #39586 (Guo Wangyang).
  • Add formats PrettyMonoBlock, PrettyNoEscapesMonoBlock, PrettyCompactNoEscapes, PrettyCompactNoEscapesMonoBlock, PrettySpaceNoEscapes, PrettySpaceMonoBlock, PrettySpaceNoEscapesMonoBlock. #39646 (Kruglov Pavel).
  • Fix building aggregate projections when external aggregation is on. Marked as an improvement because the case is rare and there is an easy workaround via changing settings. This fixes #39667. #39671 (Amos Bird).
  • Allow executing hash functions with arguments of type Map. #39685 (Anton Popov).
  • Add a configuration parameter to hide addresses in stack traces. It may improve security a little, but it is generally harmful and should not be used. #39690 (Alexey Milovidov).
  • Change the prefix size of AggregateFunctionDistinct to make sure the nested function data is memory-aligned. #39696 (Pxl).
  • Properly escape credentials passed to the clickhouse-diagnostic tool. #39707 (Dale McDiarmid).
  • Keeper improvement: create a snapshot on exit. This can be controlled with the config option keeper_server.create_snapshot_on_exit (true by default). #39755 (Antonio Andelic).
  • Support primary key analysis for row_policy_filter and additional_filter. It also helps fix issues like #37454. #39826 (Amos Bird).
  • Parameters are now transferred in Query packets right after the query text in the same serialisation format as the settings. #39906 (Nikita Taranov).
  • Fix two usability issues in the Play UI: it was not pixel-perfect on iPad due to parasitic border radius and margins, and the progress indication was not displayed after the first query. This closes #39957. This closes #39960. #39961 (Alexey Milovidov).
  • Play UI: add row numbers; add cell selection on click; add hysteresis for table cells. #39962 (Alexey Milovidov).
  • The client will show server-side elapsed time. This is important for the performance comparison of ClickHouse services in remote datacenters. This closes #38070. See also this for motivation. #39968 (Alexey Milovidov).
  • Adds parseDateTime64BestEffortUS, parseDateTime64BestEffortUSOrNull, parseDateTime64BestEffortUSOrZero functions, closing #37492. #40015 (Tanya Bragin).
  • Add an observer mode to the (Zoo)Keeper cluster discovery feature. In this mode, the node itself does not belong to the cluster. #40035 (Vladimir C).
  • Play UI: recognize the Tab key in the textarea without breaking Tab navigation. #40053 (Alexey Milovidov).
  • Extend processors_profile_log with more information such as input rows. #40121 (Amos Bird).
  • Update tzdata to 2022b to support the new time zone changes. See https://github.com/google/cctz/pull/226. Chile's 2022 DST start is delayed from September 4 to September 11. Iran plans to stop observing DST permanently after it falls back on 2022-09-21. There are also corrections to the historical time zone data for Asia/Tehran: Iran adopted standard time in 1935, not 1946; in 1977 it observed DST from 03-21 23:00 to 10-20 24:00; its 1978 transitions were on 03-24 and 08-05, not 03-20 and 10-20; and its spring 1979 transition was on 05-27, not 03-21 (https://data.iana.org/time-zones/tzdb/NEWS). #40184 (Alexey Milovidov).
  • Display server-side time in clickhouse-benchmark by default if it is available (since ClickHouse version 22.8). This is needed to correctly compare the performance of clouds. This behavior can be changed with the new --client-side-time command line option. The --randomize command line option changes from --randomize 1 to a form without an argument. #40193 (Alexey Milovidov).
  • Add counters (ProfileEvents) for cases when a query complexity limit has been set and reached (separate counters for overflow_mode = break and throw). For example, if you have set max_rows_to_read with read_overflow_mode = 'break', the value of the OverflowBreak counter lets you distinguish incomplete results. #40205 (Alexey Milovidov).
  • Fix memory accounting in case of MEMORY_LIMIT_EXCEEDED errors (previously, [peak] memory usage took failed allocations into account). #40249 (Azat Khuzhin).
  • Add current metrics for fs cache: FilesystemCacheSize and FilesystemCacheElements. #40260 (Kseniia Sumarokova).
  • Add support for LARGE_BINARY/LARGE_STRING with Arrow (Closes #32401). #40293 (Josh Taylor).
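
This sketch relates to the timeSlots entry (#37951). As documented for DateTime, timeSlots(StartTime, Duration[, Size]) returns the points of the interval starting at StartTime and lasting Duration seconds, rounded down to multiples of Size seconds. Size defaults to 1800. The Python sketch below models that rule with decimal.Decimal seconds, so that subsecond durations and slot sizes, as now allowed with DateTime64, stay exact. It is an illustration under these assumptions, not the ClickHouse implementation. It assumes non-negative timestamps, and time_slots is a hypothetical name.

```python
from decimal import Decimal
from typing import List


def time_slots(start: Decimal, duration: Decimal, size: Decimal = Decimal(1800)) -> List[Decimal]:
    """Multiples of `size` from floor(start / size) * size up to start + duration (hypothetical).

    Timestamps are seconds as Decimal, so fractional values stay exact.
    Assumes start >= 0: Decimal's // truncates toward zero, which equals
    floor division only for non-negative operands.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    current = (start // size) * size  # round the start down to a slot boundary
    end = start + duration
    slots: List[Decimal] = []
    while current <= end:  # the end point is inclusive
        slots.append(current)
        current += size
    return slots


if __name__ == "__main__":
    # 12:20:00 as seconds since midnight, lasting 600 s, with the default 30-minute slots.
    print([str(s) for s in time_slots(Decimal(44400), Decimal(600))])
    # Subsecond start, duration and slot size.
    print([str(s) for s in time_slots(Decimal("10.25"), Decimal("1.5"), Decimal("0.5"))])
```

The first call yields slots at 43200 and 45000 seconds after midnight, i.e. 12:00:00 and 12:30:00. This is consistent with the documented example for a DateTime of 2012-01-01 12:20:00 and a 600-second duration.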

Bug Fix

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in official stable release)

  • Fixed query hanging for SELECT with ORDER BY WITH FILL with different date/time types. #37849 (Yakov Olkhovskiy).
  • Fix ORDER BY that matches a projection's ORDER BY (previously it simply returned an unsorted result). #38725 (Azat Khuzhin).
  • Do not optimise functions in GROUP BY statements if they shadow one of the table columns or expressions. Fixes #37032. #39103 (Anton Kozlov).
  • Fix wrong table name in logs after RENAME TABLE. This fixes #38018. #39227 (Amos Bird).
  • Fix positional arguments in case of columns pruning when optimising the query. Closes #38433. #39293 (Kseniia Sumarokova).
  • Fix a bug in schema inference for empty messages in the Protobuf/CapnProto formats that allowed creating a column with an empty Tuple type. Closes #39051. Add 2 new settings, input_format_{protobuf/capnproto}_skip_fields_with_unsupported_types_in_schema_inference, that allow skipping fields with unsupported types during schema inference for the Protobuf and CapnProto formats. #39357 (Kruglov Pavel).
  • Fix segmentation fault on CREATE WINDOW VIEW .. ON CLUSTER ... INNER. Closes #39363. #39384 (Kseniia Sumarokova).
  • Fix WriteBuffer finalization when an INSERT INTO FUNCTION query is cancelled. Proper version of https://github.com/ClickHouse/ClickHouse/pull/39396, which was reverted. #39458 (Kruglov Pavel).
  • Fix storing of columns of type Object in sparse serialization. #39464 (Anton Popov).
  • Fix possible "Not found column in block" exception when using projections. This closes #39469. #39470 (小路).
  • Fix LOGICAL_ERROR on race between DROP and INSERT with materialized views. #39477 (Azat Khuzhin).
  • Fix a data race and possible heap-buffer-overflow in the Avro format. Closes #39094. Closes #33652. #39498 (Kruglov Pavel).
  • Fix rare bug in asynchronous reading (with setting local_filesystem_read_method='pread_threadpool') with enabled O_DIRECT (enabled by setting min_bytes_to_use_direct_io). #39506 (Anton Popov).
  • Fixes "Code: 49. DB::Exception: FunctionFactory: the function name '' is not unique. (LOGICAL_ERROR)" observed on FreeBSD when starting clickhouse. #39551 (Alexander Gololobov).
  • Fix bug with maxsplit argument for splitByChar, which was not working correctly. #39552 (filimonov).
  • Fixed CREATE/DROP INDEX queries with ON CLUSTER or a Replicated database and ReplicatedMergeTree. They used to be executed on all replicas (causing an error or a stuck DDL queue). Fixes #39511. #39565 (Alexander Tokmakov).
  • Fix the "column not found" error for push down with JOIN. Closes #39505. #39575 (Vladimir C).
  • Fix the wrong REGEXP_REPLACE alias. This fixes https://github.com/ClickHouse/ClickBench/issues/9. #39592 (Alexey Milovidov).
  • Fixed the point of origin for exponential decay window functions: it is now the last value in the window. Previously, decay was calculated by the formula exp((t - curr_row_t) / decay_length), which is incorrect when the right boundary of the window is not CURRENT ROW. It was changed to exp((t - last_row_t) / decay_length). There is no change in results for windows with ROWS BETWEEN ... AND CURRENT ROW. #39593 (Vladimir Chebotaryov).
  • Fix Decimal division overflow, which can be detected based on the operands' scale. #39600 (Andrey Zvonov).
  • Fix the settings output_format_arrow_string_as_string and output_format_arrow_low_cardinality_as_dictionary when used in combination. Closes #39624. #39647 (Kruglov Pavel).
  • Fixed a bug in default database resolution in distributed table reads. #39674 (Anton Kozlov).
  • SELECT might read data of a dropped table if the cache for mmap IO was used, the database engine was Ordinary, and a new table was created with the same name as the dropped one. This is fixed. #39708 (Alexander Tokmakov).
  • Fix possible error Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got ColumnLowCardinality. Fixes #38460. #39716 (Arthur Passos).
  • Field names in the meta section of JSON format were erroneously double escaped. This closes #39693. #39747 (Alexey Milovidov).
  • Fix wrong index analysis with tuples and operator IN, which could lead to wrong query result. #39752 (Anton Popov).
  • Fix EmbeddedRocksDB filtering by key using params. #39757 (Antonio Andelic).
  • Fix error Invalid number of columns in chunk pushed to OutputPort, which was caused by the ARRAY JOIN optimization. Fixes #39164. #39799 (Nikolai Kochetov).
  • Fix CANNOT_READ_ALL_DATA exception with local_filesystem_read_method=pread_threadpool. According to the man page, this bug affected only Linux kernel versions 5.9 and 5.10. #39800 (Anton Popov).
  • Fix quota_key application on connect. #39874 (Yakov Olkhovskiy).
  • Fix query exceptions such as DB::Exception: Cannot open file /media/ssd1/fordata/clickhouse/data/data/perf/perf_log_local_v3_1/20220618_17233_17238_1/namespace.dict.bin, errno: 24, strerror: Too many open files. #39886 (Fangyuan Deng).
  • Fix broken NFS mkdir for root-squashed volumes. #39898 (Constantine Peresypkin).
  • Remove dictionaries from prometheus metrics on DETACH/DROP. #39926 (Azat Khuzhin).
  • Fix read of StorageFile with virtual columns. Closes #39907. #39943 (flynn).
  • Fix big memory usage during fetches. Fixes #39915. #39990 (Nikolai Kochetov).
  • Fix HashMethodOneNumber getting a wrong key value when the column is const. #40020 (Duc Canh Le).
  • Fixed "Part directory doesn't exist" and "tmp_<part_name> ... No such file or directory" errors during a too slow INSERT or a too long merge/mutation. Also fixed an issue that could cause some replication queue entries to get stuck without any errors or warnings in the logs if a previous attempt to fetch a part failed but the tmp-fetch_<part_name> directory was not cleaned up. #40031 (Alexander Tokmakov).
  • Fix rare cases of parsing of arrays of tuples in format Values. #40034 (Anton Popov).
  • Fixes ArrowColumn format Dictionary(X) & Dictionary(Nullable(X)) conversion to ClickHouse LowCardinality(X) & LowCardinality(Nullable(X)) respectively. #40037 (Arthur Passos).
  • Fix potential deadlock in WriteBufferFromS3 during task scheduling failure. #40070 (Maksim Kita).
  • Fix a bug in collectFilesToSkip() by adding the correct file extension (.idx or .idx2) for indexes to be recalculated, to avoid wrong hard links. Fixed #39896. #40095 (Jianmei Zhang).
  • Fix a reported segmentation fault with CaresPTRResolver::resolve in the stack trace. #40134 (Arthur Passos).
  • Fix a very rare case of incorrect behavior of array subscript operator. This closes #28720. #40185 (Alexey Milovidov).
  • Fix insufficient argument check for encryption functions (found by query fuzzer). This closes #39987. #40194 (Alexey Milovidov).
  • Fix the case when the order of columns can be incorrect if the IN operator is used with a table with ENGINE = Set containing multiple columns. This fixes #13014. #40225 (Alexey Milovidov).
  • Fix possible segfault in the CapnProto input format. This bug was found and sent through the ClickHouse bug-bounty program by kiojj. #40241 (Kruglov Pavel).
  • Avoid continuously growing memory consumption of the pattern cache when using the functions multi(Fuzzy)Match(Any|AllIndices|AnyIndex)(). #40264 (Robert Schulze).
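
The exponential decay fix (#39593) is easiest to see numerically. This Python sketch applies the two formulas quoted in that entry to a small frame of (value, t) pairs, using the old origin curr_row_t and the new origin last_row_t. It assumes the frame is ordered by t, so its last element is the last row. decayed_sum and old_and_new are hypothetical helpers that weight each value by the quoted factor. They are not ClickHouse's exponentialTimeDecayed* implementation.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (value, t)


def decayed_sum(frame: Sequence[Row], origin_t: float, decay_length: float) -> float:
    """Sum of value * exp((t - origin_t) / decay_length) over the frame (hypothetical helper)."""
    return sum(value * math.exp((t - origin_t) / decay_length) for value, t in frame)


def old_and_new(frame: Sequence[Row], curr_row_t: float, decay_length: float) -> Tuple[float, float]:
    """Return (result with the old origin, result with the new origin)."""
    last_row_t = frame[-1][1]  # the frame is assumed to be ordered by t
    return (
        decayed_sum(frame, curr_row_t, decay_length),  # before: exp((t - curr_row_t) / decay_length)
        decayed_sum(frame, last_row_t, decay_length),  # after: exp((t - last_row_t) / decay_length)
    )


if __name__ == "__main__":
    rows: List[Row] = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
    current = 1  # index of the current row (t = 1.0)
    # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: both origins coincide.
    print(old_and_new(rows[: current + 1], rows[current][1], 1.0))
    # ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: the results differ.
    print(old_and_new(rows, rows[current][1], 1.0))
```

In the second frame, the old formula gives the following row (t = 2) a weight of e, which is greater than 1. Anchoring at last_row_t keeps every weight at or below 1.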

Build

  • Fix a build error on macOS. Compiling src/Common/waitForPid.cpp failed with error: identifier '__kevp__' is reserved because it starts with '__' [-Werror,-Wreserved-identifier]. The call EV_SET(&change, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL) at waitForPid.cpp:112 expands the EV_SET macro from the macOS SDK header sys/event.h (line 108), which declares struct kevent *__kevp__ = (kevp);. #39493 (小路).

Build Improvement

  • Fixed an Endian issue in BitHelpers for s390x. #39656 (Harry Lee).
  • Implement a piece of code related to SipHash for s390x architecture (which is not supported by ClickHouse). #39732 (Harry Lee).
  • Fixed an Endian issue in Coordination snapshot code for s390x architecture (which is not supported by ClickHouse). #39931 (Harry Lee).
  • Fixed Endian issues in Codec code for s390x architecture (which is not supported by ClickHouse). #40008 (Harry Lee).
  • Fixed Endian issues in reading/writing BigEndian binary data in ReadHelpers and WriteHelpers code for s390x architecture (which is not supported by ClickHouse). #40179 (Harry Lee).
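
The s390x entries do not describe the individual bugs. As a generic illustration only, and not the actual ReadHelpers/WriteHelpers code, the Python sketch below contrasts two ways of reading a big-endian UInt32. The first reads it portably. The second is a hypothetical pattern that loads the bytes in host order and then always byte-swaps, which silently assumes a little-endian host. The host byte order is passed explicitly so that both cases can be checked on any machine. Both function names are hypothetical.

```python
from typing import Literal

ByteOrder = Literal["little", "big"]


def read_be_uint32(buf: bytes) -> int:
    """Portable: interpret the first 4 bytes as big-endian on any host."""
    return int.from_bytes(buf[:4], "big")


def read_be_uint32_swap_always(buf: bytes, host_order: ByteOrder) -> int:
    """Hypothetical buggy pattern: native load followed by an unconditional byte swap."""
    native = int.from_bytes(buf[:4], host_order)    # what a raw memory load would see
    swapped = native.to_bytes(4, host_order)[::-1]  # reverse the byte order
    return int.from_bytes(swapped, host_order)


if __name__ == "__main__":
    data = bytes([0x00, 0x00, 0x01, 0x02])  # 258 encoded as a big-endian UInt32
    print(read_be_uint32(data))
    print(read_be_uint32_swap_always(data, "little"))  # correct on a little-endian host such as x86-64
    print(read_be_uint32_swap_always(data, "big"))     # wrong on a big-endian host such as s390x
```

The portable version states the wire format (big-endian) instead of relying on the host's byte order, which is why it gives the same answer everywhere.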

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT

Support cte statement for antlr4 syntax file