ClickHouse/docs/changelogs/v23.3.1.2823-lts.md
2024-05-23 13:54:45 +02:00

sidebar_position: 1
sidebar_label: 2023

2023 Changelog

ClickHouse release v23.3.1.2823-lts (46e85357ce) FIXME as compared to v23.2.1.2537-stable (52bf836e03)

Backward Incompatible Change

  • Relax symbols that are allowed in URL authority in domainRFC()/netloc(). #46841 (Azat Khuzhin).
  • Prohibit creating tables based on KafkaEngine with DEFAULT/EPHEMERAL/ALIAS/MATERIALIZED statements for columns. #47138 (Aleksandr Musorin).
  • An "asynchronous connection drain" feature is removed. Related settings and metrics are removed as well. It was an internal feature, so the removal should not affect users who had never heard about that feature. #47486 (Alexander Tokmakov).
  • Support 256-bit Decimal data type (more than 38 digits) in arraySum/Min/Max/Avg/Product, arrayCumSum/CumSumNonNegative, arrayDifference, array construction, IN operator, query parameters, groupArrayMovingSum, statistical functions, min/max/any/argMin/argMax, PostgreSQL wire protocol, MySQL table engine and function, sumMap, mapAdd, mapSubtract, arrayIntersect. Add support for big integers in arrayIntersect. Statistical aggregate functions involving moments (such as corr or various TTests) will use Float64 as their internal representation (they were using Decimal128 before this change, but it was pointless), and these functions can return nan instead of inf in case of infinite variance. Some functions were allowed on Decimal256 data types but returned Decimal128 in previous versions - now it is fixed. This closes #47569. This closes #44864. This closes #28335. #47594 (Alexey Milovidov).
  • Make backup_threads/restore_threads server settings. #47881 (Azat Khuzhin).
  • Fix the isIPv6String function, which could return a false positive for an incorrect IPv6 address. For example, 1234::1234: was considered a valid IPv6 address. #47895 (Nikolay Degterinsky).
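
The isIPv6String entry above concerns inputs that end in a single dangling colon. As a rough illustration only, the sketch below uses Python's standard ipaddress module rather than ClickHouse's parser, and the helper name is_ipv6_string is hypothetical. A strict validator of this kind accepts the well-formed address and rejects the malformed one:

```python
import ipaddress


def is_ipv6_string(text: str) -> bool:
    """Return True only if `text` parses as a complete IPv6 address.

    Hypothetical helper for illustration; it relies on Python's own
    parser and is not how ClickHouse implements isIPv6String.
    """
    try:
        ipaddress.IPv6Address(text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


well_formed = is_ipv6_string("1234::1234")   # valid compressed form
dangling = is_ipv6_string("1234::1234:")     # trailing single colon
```

The point of the comparison is that a trailing colon is only legal as part of `::`, so the second input should never be reported as valid.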

New Feature

  • Add new mode for splitting the work on replicas using settings parallel_replicas_custom_key and parallel_replicas_custom_key_filter_type. If the cluster consists of a single shard with multiple replicas, up to max_parallel_replicas will be randomly picked and turned into shards. For each shard, a corresponding filter is added to the query on the initiator before being sent to the shard. If the cluster consists of multiple shards, it will behave the same as sample_key but with the possibility to define an arbitrary key. #45108 (Antonio Andelic).
  • Added query setting partial_result_on_first_cancel allowing the canceled query (e.g. due to Ctrl-C) to return a partial result. #45689 (Alexey Perevyshin).
  • Added support for arbitrary table engines for temporary tables, except for Replicated and KeeperMap engines. Partially closes #31497. #46071 (Roman Vasin).
  • Add replication of user-defined SQL functions using ZooKeeper. #46085 (Aleksei Filatov).
  • Implement system.server_settings (similar to system.settings), which will contain server configurations. #46550 (pufit).
  • Introduce the function WIDTH_BUCKET. #42974. #46790 (avoiderboi).
  • Add new functions parseDateTime and parseDateTimeInJodaSyntax, which parse a string according to a specified format string. parseDateTime parses a string to a datetime using MySQL syntax; parseDateTimeInJodaSyntax uses Joda syntax. #46815 (李扬).
  • Use dummy UInt8 for default structure of table function null. Closes #46930. #47006 (flynn).
  • Support the Dec 15, 2021 date format in the parseDateTimeBestEffort function. Closes #46816. #47071 (chen).
  • Add function ULIDStringToDateTime(). Closes #46945. #47087 (Nikolay Degterinsky).
  • Add settings http_wait_end_of_query and http_response_buffer_size that correspond to the URL params wait_end_of_query and buffer_size for the HTTP interface. #47108 (Vladimir C).
  • Support for UNDROP TABLE query. Closes #46811. #47241 (chen).
  • Add system.marked_dropped_tables table that shows tables that were dropped from Atomic databases but were not completely removed yet. #47364 (chen).
  • Add INSTR as alias of positionCaseInsensitive for MySQL compatibility. Closes #47529. #47535 (flynn).
  • Added the toDecimalString function, which converts numbers to strings with fixed precision. #47838 (Andrey Zvonov).
  • Added operator "REGEXP" (similar to operators "LIKE", "IN", "MOD" etc.) for better compatibility with MySQL. #47869 (Robert Schulze).
  • Allow executing the reading pipeline for a DIRECT dictionary with a CLICKHOUSE source in multiple threads. To enable it, set dictionary_use_async_executor=1 in the SETTINGS section for the source in the CREATE DICTIONARY statement. #47986 (Vladimir C).
  • Add merge tree setting max_number_of_mutations_for_replica. It limits the number of part mutations per replica to the specified amount. Zero means no limit on the number of mutations per replica (the execution can still be constrained by other settings). #48047 (Vladimir C).
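
The parallel_replicas_custom_key entry above describes turning replicas into virtual shards, each with its own filter on the custom key. The following is a minimal sketch of that idea, assuming a modulo-style filter. The CRC32 hash, the function names, and the sample rows are illustrative assumptions and do not reflect the filter expression ClickHouse actually generates:

```python
import zlib
from collections import defaultdict


def replica_for_key(key, replica_count: int) -> int:
    """Pick a replica index for a key (assumed rule: hash(key) % replica_count)."""
    digest = zlib.crc32(str(key).encode("utf-8"))  # deterministic stand-in hash
    return digest % replica_count


def split_rows(rows, key_column: str, replica_count: int) -> dict:
    """Group rows by the replica whose filter would match them."""
    shards = defaultdict(list)
    for row in rows:
        shards[replica_for_key(row[key_column], replica_count)].append(row)
    return dict(shards)


# Hypothetical data: 12 rows keyed by user_id, split across 3 replicas.
rows = [{"user_id": i, "value": i * 10} for i in range(12)]
virtual_shards = split_rows(rows, "user_id", 3)
```

Because every key maps to exactly one replica index, each row is read by one replica only, which is the property that lets the initiator merge the partial results without double counting.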

Performance Improvement

  • Optimize aggregation performance for a single nullable key. #45772 (LiuNeng).
  • Implemented lowercase tokenbf_v1 index utilization for hasTokenOrNull, hasTokenCaseInsensitive and hasTokenCaseInsensitiveOrNull. #46252 (ltrk2).
  • Optimize the generic SIMD StringSearcher by searching first two chars. #46289 (Jiebin Sun).
  • system.detached_parts could be significantly large. Added several sources that respect the block size limitation; in each block, an IO thread pool is used to calculate part sizes, i.e. to make syscalls in parallel. #46624 (Sema Checherinda).
  • Increase the default value of max_replicated_merges_in_queue for ReplicatedMergeTree tables from 16 to 1000. It allows faster background merge operation on clusters with a very large number of replicas, such as clusters with shared storage in ClickHouse Cloud. #47050 (Alexey Milovidov).
  • Backups for large numbers of files were unbelievably slow in previous versions. #47251 (Alexey Milovidov).
  • Support filter push down to left table for JOIN with StorageJoin, StorageDictionary, StorageEmbeddedRocksDB. #47280 (Maksim Kita).
  • Marks in memory are now compressed, using 3-6x less memory. #47290 (Michael Kolupaev).
  • Updated copier to use group by instead of distinct to get list of partitions. For large tables this reduced the select time from over 500s to under 1s. #47386 (Clayton McClure).
  • Address https://github.com/clickhouse/clickhouse/issues/46453. Bisecting marked https://github.com/clickhouse/clickhouse/pull/35525 as the bad change; this PR aims to reverse the changes in that PR. #47544 (Ongkong).
  • Fixed excessive reading in queries with FINAL. #47801 (Nikita Taranov).
  • The max_final_threads setting is now set to the number of cores at server startup (by the same algorithm as we use for max_threads). This improves concurrency of FINAL execution on servers with a high number of CPUs. #47915 (Nikita Taranov).
  • Avoid breaking batches on read requests to improve performance. #47978 (Antonio Andelic).
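
The StringSearcher entry above mentions searching by the first two characters. The sketch below is a scalar Python illustration of that filtering idea only: it uses no SIMD, the function name is hypothetical, and it is not the ClickHouse implementation. A candidate position is compared in full only when both leading characters already match:

```python
def find_with_two_char_filter(haystack: str, needle: str) -> int:
    """Return the first index of `needle` in `haystack`, or -1 if absent."""
    if not needle:
        return 0
    if len(needle) == 1:
        return haystack.find(needle)

    first, second = needle[0], needle[1]
    last_start = len(haystack) - len(needle)
    for i in range(last_start + 1):
        # Cheap two-character check rejects most positions early.
        if haystack[i] == first and haystack[i + 1] == second:
            # Full comparison only for the surviving candidates.
            if haystack.startswith(needle, i):
                return i
    return -1


position = find_with_two_char_filter("abracadabra", "cad")
```

Checking two characters instead of one reduces the number of false candidates that reach the full comparison, which is the motivation the entry hints at.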

Improvement

  • Add the map-related function mapFromArrays, which creates a map from a pair of arrays. #31125 (李扬).
  • Rewrite distributed sends to avoid using filesystem as a queue, use in-memory queue instead. #45491 (Azat Khuzhin).
  • Allow separate grants for named collections (e.g. to give SHOW/CREATE/ALTER/DROP named collection access only to certain collections instead of all at once). Closes #40894. Add a new access type NAMED_COLLECTION_CONTROL, which is not given to the default user unless explicitly added to the user config (it is required to be able to do GRANT ALL). Also, show_named_collections no longer has to be specified manually for the default user to have full access rights, as was the case in 23.2. #46241 (Kseniia Sumarokova).
  • Now X-ClickHouse-Query-Id and X-ClickHouse-Timezone headers are added to response in all queries via http protocol. Previously it was done only for SELECT queries. #46364 (Anton Popov).
  • Support for connection to a replica set via a URI with a host:port enum and support for the readPreference option in MongoDB dictionaries. Example URI: mongodb://db0.example.com:27017,db1.example.com:27017,db2.example.com:27017/?replicaSet=myRepl&readPreference=primary. #46524 (artem-yadr).
  • Re-implement projection analysis on top of query plan. Added setting query_plan_optimize_projection=1 to switch between old and new version. Fixes #44963. #46537 (Nikolai Kochetov).
  • Use parquet format v2 instead of v1 in output format by default. Add setting output_format_parquet_version to control parquet version, possible values v1_0, v2_4, v2_6, v2_latest (default). #46617 (Kruglov Pavel).
  • Not for changelog - part of #42648. #46632 (Yakov Olkhovskiy).
  • Allow ignoring errors while pushing to a MATERIALIZED VIEW (add new setting materialized_views_ignore_errors, false by default, but it is set to true unconditionally for flushing logs to system.*_log tables). #46658 (Azat Khuzhin).
  • Enable input_format_json_ignore_unknown_keys_in_named_tuple by default. #46742 (Kruglov Pavel).
  • It is now possible to configure Kafka topics with periods in their name using the new configuration syntax. #46752 (Robert Schulze).
  • Fix heuristics that check hyperscan patterns for problematic repeats. #46819 (Robert Schulze).
  • Don't report ZK node exists to system.errors when a block was created concurrently by a different replica. #46820 (Raúl Marín).
  • Allow PREWHERE for Merge with different DEFAULT expression for column. #46831 (Azat Khuzhin).
  • Increase the limit for opened files in clickhouse-local. It will be able to read from web tables on servers with a huge number of CPU cores. Do not back off reading from the URL table engine in case of too many opened files. This closes #46852. #46853 (Alexey Milovidov).
  • Exceptions thrown when numbers cannot be parsed now have an easier-to-read exception message. #46917 (Robert Schulze).
  • Update system.backups after every processed task. #46989 (Aleksandr Musorin).
  • Allow type conversion in the Native input format. Add setting input_format_native_allow_types_conversion that controls it (enabled by default). #46990 (Kruglov Pavel).
  • Allow IPv4 in the range function to generate IP ranges. #46995 (Yakov Olkhovskiy).
  • Role changes were sometimes not propagated before https://github.com/ClickHouse/ClickHouse/pull/46772. This PR just adds tests. #47002 (Ilya Golshtein).
  • Improve exception message when it's impossible to make part move from one volume/disk to another. #47032 (alesapin).
  • Support Bool type in JSONType function. Previously Null type was mistakenly returned for bool values. #47046 (Anton Popov).
  • Use _request_body parameter to configure predefined http queries. #47086 (Constantine Peresypkin).
  • Remove logging of custom disk structure. #47103 (Kseniia Sumarokova).
  • Allow nested custom disks. Previously custom disks supported only flat disk structure. #47106 (Kseniia Sumarokova).
  • Automatic indentation in the built-in UI SQL editor when Enter is pressed. #47113 (Alexey Korepanov).
  • Allow controlling compression in Parquet/ORC/Arrow output formats, and support more compression methods for input formats. This closes #13541. #47114 (Kruglov Pavel).
  • Self-extraction with 'sudo' will attempt to set uid and gid of extracted files to running user. #47116 (Yakov Olkhovskiy).
  • Previously the second argument of the function repeat had to be an unsigned integer type, so it could not accept a negative integer value such as -1. This differed from the Spark function; the behavior now matches Spark. #47134 (KevinyhZou).
  • Remove ::__1 part from stacktraces. Display std::basic_string<char, ... as String in stacktraces. #47171 (Mike Kot).
  • Introduced a separate thread pool for backup IO operations. This allows scaling it independently of other pools and increases performance. #47174 (Nikita Mikhaylov).
  • Reimplement interserver mode to avoid replay attacks (note that the change is backward compatible with older servers). #47213 (Azat Khuzhin).
  • Make function optimizeregularexpression recognize re groups and refine regexp tree dictionary. #47218 (Han Fei).
  • Use MultiRead request and retries for collecting metadata at final stage of backup processing. #47243 (Nikita Mikhaylov).
  • Keeper improvement: Add new 4LW clrs to clean resources used by Keeper (e.g. release unused memory). #47256 (Antonio Andelic).
  • Add optional arguments to codecs DoubleDelta(bytes_size), Gorilla(bytes_size), FPC(level, float_size); this allows using these codecs without a column type in clickhouse-compressor. Fix possible aborts and arithmetic errors in clickhouse-compressor with these codecs. Fixes: https://github.com/ClickHouse/ClickHouse/discussions/47262. #47271 (Kruglov Pavel).
  • Add support for big int types to runningDifference() function. Closes #47194. #47322 (Nikolay Degterinsky).
  • PostgreSQL replication has been adjusted to use "FROM ONLY" clause while performing initial synchronization. This prevents double-fetching the same data in case the target PostgreSQL database uses table inheritance. #47387 (Maksym Sobolyev).
  • Add an expiration window for S3 credentials that have an expiration time to avoid ExpiredToken errors in some edge cases. It can be controlled with expiration_window_seconds config, the default is 120 seconds. #47423 (Antonio Andelic).
  • Support Decimals and Date32 in Avro format. #47434 (Kruglov Pavel).
  • Do not start the server if an interrupted conversion from Ordinary to Atomic was detected, print a better error message with troubleshooting instructions. #47487 (Alexander Tokmakov).
  • Add a new column kind to system.opentelemetry_span_log. This column holds the value of SpanKind defined in OpenTelemetry. #47499 (Frank Chen).
  • If a backup and restoring data are both in S3 then server-side copy should be used from now on. #47546 (Vitaly Baranov).
  • Add SSL User Certificate authentication to the native protocol. Closes #47077. #47596 (Nikolay Degterinsky).
  • Allow reading/writing nested arrays in Protobuf with only the root field name as the column name. Previously the column name had to contain all nested field names (like a.b.c Array(Array(Array(UInt32)))); now you can use just a Array(Array(Array(UInt32))). #47650 (Kruglov Pavel).
  • Added an optional STRICT modifier for SYSTEM SYNC REPLICA which makes the query wait for replication queue to become empty (just like it worked before https://github.com/ClickHouse/ClickHouse/pull/45648). #47659 (Alexander Tokmakov).
  • Improve the names of some span logs. #47667 (Frank Chen).
  • ReplicatedMergeTree with zero-copy replication now puts less load on ZooKeeper. #47676 (alesapin).
  • Prevent using too long chains of aggregate function combinators (they can lead to slow queries in the analysis stage). This closes #47715. #47716 (Alexey Milovidov).
  • Support subqueries in parameterized views. Resolves #46741. Implementation: pass the parameter is_create_parameterized_view to subquery processing. Testing: added a test case with a subquery for a parameterized view. #47725 (SmitaRKulkarni).
  • Fix memory leak in MySQL integration (reproduces with connection_auto_close=1). #47732 (Kseniia Sumarokova).
  • The AST fuzzer now supports fuzzing EXPLAIN queries. #47803 (flynn).
  • Fixed the error message printed when Decimal parameters are incorrect. #47812 (Yu Feng).
  • Add X-ClickHouse-Query-Id to the HTTP response when a query fails to execute. #47813 (Frank Chen).
  • The AST fuzzer now randomly converts SELECT queries to EXPLAIN queries. #47852 (flynn).
  • Improved overall performance by better utilizing the local replica, and forbid reading with parallel replicas from non-replicated MergeTree by default. #47858 (Nikita Mikhaylov).
  • More accurate CPU usage indication for client: account for usage in some long-living server threads (Segmentator) and do regular CPU accounting for every thread. #47870 (Sergei Trifonov).
  • The setting exact_rows_before_limit makes rows_before_limit_at_least accurately reflect the number of rows returned before the limit is reached. This pull request addresses issues encountered when the query involves distributed processing across multiple shards or sorting operations; prior to this update, these scenarios did not work as intended. #47874 (Amos Bird).
  • ThreadPool metrics introspection. #47880 (Azat Khuzhin).
  • Add WriteBufferFromS3Microseconds and WriteBufferFromS3RequestsErrors profile events. #47885 (Antonio Andelic).
  • Add --link and --noninteractive (-y) options to clickhouse install. Closes #47750. #47887 (Nikolay Degterinsky).
  • Fix decimal-256 text output issue on s390x. #47932 (MeenaRenganathan22).
  • Fixed UNKNOWN_TABLE exception when attaching to a materialized view that has dependent tables that are not available. This might be useful when trying to restore state from a backup. #47975 (MikhailBurdukov).
  • Fix case when (optional) path is not added to encrypted disk configuration. #47981 (Kseniia Sumarokova).
  • Add *OrNull() and *OrZero() variants for parseDateTime(), add alias "str_to_date" for MySQL parity. #48000 (Robert Schulze).
  • Improve the code around background_..._pool_size settings reading. It should be configured via the main server configuration file. #48055 (filimonov).
  • Support CTEs in parameterized views. Implementation: allow query parameters while evaluating scalar subqueries. Testing: added a test case with a CTE for a parameterized view. #48065 (SmitaRKulkarni).
  • Add NOSIGN keyword for S3 table function and storage engine to avoid signing requests with provided credentials. Add no_sign_request config for all functionalities using S3. #48092 (Antonio Andelic).
  • Support big integers (U)Int128/(U)Int256, Map with any key type and DateTime64 with any precision (not only 3 and 6). #48119 (Kruglov Pavel).
  • Support more ClickHouse types in MsgPack format: (U)Int128/(U)Int256, Enum8(16), Date32, Decimal(32|64|128|256), Tuples. #48124 (Kruglov Pavel).
  • The output of some SHOW ... statements is now sorted. #48127 (Robert Schulze).
  • Allow skipping errors related to unknown enum values in row input formats. #48133 (Alexey Milovidov).
  • Add allow_distributed_ddl_queries option to disallow distributed DDL queries for the cluster in the config. #48171 (Aleksei Filatov).
  • Determine the hosts' order in SHOW CLUSTER query, a followup for #48127 and #46240. #48235 (Mikhail f. Shiryaev).
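
The S3 credentials entry above introduces an expiration window controlled by expiration_window_seconds, with a default of 120 seconds. A minimal sketch of how such a window could be applied follows. The function name and the exact comparison rule are assumptions for illustration and are not taken from the ClickHouse source:

```python
from datetime import datetime, timedelta, timezone

# Default taken from the changelog entry; everything else here is hypothetical.
DEFAULT_EXPIRATION_WINDOW_SECONDS = 120


def should_refresh(expires_at: datetime, now: datetime,
                   window_seconds: int = DEFAULT_EXPIRATION_WINDOW_SECONDS) -> bool:
    """Treat credentials as expired once `now` enters the window before expiry."""
    return now >= expires_at - timedelta(seconds=window_seconds)


now = datetime(2023, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
expiring_soon = should_refresh(now + timedelta(seconds=60), now)   # inside the window
still_fresh = should_refresh(now + timedelta(minutes=10), now)     # well outside it
```

Refreshing slightly early means a request that starts just before the nominal expiry does not carry a token that lapses mid-flight, which is the kind of edge case behind ExpiredToken errors.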

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

Build Improvement

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT

Testing Improvement

  • Fixed functional test 02534_keyed_siphash and 02552_siphash128_reference for s390x. #47615 (Harry Lee).