ClickHouse/docs/changelogs/v23.11.1.2711-stable.md
2024-05-23 13:54:45 +02:00

2023 Changelog

ClickHouse release v23.11.1.2711-stable (05bc8ef1e0) FIXME as compared to v23.10.1.1976-stable (13adae0e42)

Backward Incompatible Change

  • Formatters %l/%k/%c in function parseDateTime() are now able to parse hours/months without leading zeros, e.g. select parseDateTime('2023-11-26 8:14', '%F %k:%i') now works. Set parsedatetime_parse_without_leading_zeros = 0 to restore the previous behavior which required two digits. Function formatDateTime is now also able to print hours/months without leading zeros. This is controlled by setting formatdatetime_format_without_leading_zeros but off by default to not break existing use cases. #55872 (Azat Khuzhin).
  • You can no longer use the aggregate function avgWeighted with arguments of type Decimal. Workaround: convert arguments to Float64. This closes #43928. This closes #31768. This closes #56435. If you have used this function inside materialized views or projections with Decimal arguments, contact support@clickhouse.com. Fixed an error in the aggregate function sumMap, which made it about 1.5 to 2 times slower. It does not matter because the function is garbage anyway. This closes #54955. This closes #53134. This closes #55148. Fixed a bug in function groupArraySample: it used the same random seed when more than one aggregate state was generated in a query. #56350 (Alexey Milovidov).
  • The default ClickHouse server configuration file has enabled access_management (user manipulation by SQL queries) and named_collection_control (manipulation of named collection by SQL queries) for the default user by default. This closes #56482. #56619 (Alexey Milovidov).
  • Multiple improvements for RESPECT/IGNORE NULLS. #57189 (Raúl Marín).
  • Remove optimization optimize_move_functions_out_of_any. #57190 (Raúl Marín).
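The leading-zero change to parseDateTime() described in the first entry above can be illustrated with a small Python model. This is only a sketch of the described semantics, not ClickHouse's parser: the function parse_hour and its flag allow_without_leading_zeros are hypothetical names, with the flag standing in for the setting parsedatetime_parse_without_leading_zeros.

```python
import re


def parse_hour(text: str, allow_without_leading_zeros: bool = True) -> int:
    """Parse an hour field, optionally requiring exactly two digits.

    Hypothetical model of the %k formatter: with the flag on, "8" and "08"
    both parse; with the flag off, only the two-digit form is accepted.
    """
    pattern = r"\d{1,2}" if allow_without_leading_zeros else r"\d{2}"
    if re.fullmatch(pattern, text) is None:
        raise ValueError(f"cannot parse hour from {text!r}")
    hour = int(text)
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return hour


print(parse_hour("8"))                                      # new default behavior
print(parse_hour("08", allow_without_leading_zeros=False))  # previous behavior
```

With the flag off, parse_hour("8", allow_without_leading_zeros=False) raises ValueError, which mirrors the previous behavior that required two digits.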

New Feature

  • Added server setting async_load_databases for asynchronous loading of databases and tables. Speeds up the server start time. Applies to databases with Ordinary, Atomic and Replicated engines. Their tables load metadata asynchronously. Query to a table increases the priority of the load job and waits for it to be done. Added table system.async_loader. #49351 (Sergei Trifonov).
  • Add function extractPlainRanges to KeyCondition; add some useful functions to Range; add PlainRanges, which represents a series of ordered, non-overlapping ranges; add NumbersRangedSource, which can accurately return user-selected numbers. #50909 (JackyWoo).
  • Add system table blob_storage_log. #52918 (vdimir).
  • Use statistics to order prewhere conditions better. #53240 (Han Fei).
  • Added a new aggregate function groupArraySorted(n)(value) which returns an array with the n first values from a field value sorted by itself. #53562 (Yarik Briukhovetskyi).
  • Added support for compression in the Keeper protocol. It can be enabled in ClickHouse with the flag use_compression inside zookeeper. Resolves #49507. #54957 (SmitaRKulkarni).
  • Add ClickHouse setting to disable tunneling for HTTPS requests over HTTP proxy. #55033 (Arthur Passos).
  • Introduce the feature storage_metadata_write_full_object_key. If it is set as true then metadata files are written with new format VERSION_FULL_OBJECT_KEY. With that format CH stores full remote object key in the metadata file. #55566 (Sema Checherinda).
  • Add new settings and syntax to protect named collections' fields from being overridden. This is meant to prevent a malicious user from obtaining unauthorized access to secrets. #55782 (Salvatore Mesoraca).
  • Add hostname column to all system log tables. #55894 (Bharat Nallan).
  • Add CHECK ALL TABLES query. #56022 (vdimir).
  • Added function fromDaysSinceYearZero() which is similar to MySQL's FROM_DAYS. E.g. SELECT fromDaysSinceYearZero(739136) returns 2023-09-08. #56088 (Joanna Hulboj).
  • Implemented a series period detection method using FFT from the pocketFFT library. #56171 (Bhavna Jindal).
  • Add an external Python tool to view backups and to extract information from them without using ClickHouse. #56268 (Vitaly Baranov).
  • ... #56275 (Alexey Milovidov).
  • Add a new setting preferred_projection_name. If it is set to a non-empty string, the specified projection is used if possible. #56309 (Yarik Briukhovetskyi).
  • S3 adaptive timeouts: the first attempt is made with low send and receive timeouts. #56314 (Sema Checherinda).
  • Add 4-letter command for yielding/resigning leadership (https://github.com/ClickHouse/ClickHouse/issues/56352). #56354 (Pradeep Chhetri).
  • Added a new SQL function, "arrayRandomSample(arr, k)" which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. "SELECT arrayReduce('groupArraySample(3)', range(10))". #56416 (Robert Schulze).
  • Added support for float16 type data to use in .npy files. Closes #56344. #56424 (Yarik Briukhovetskyi).
  • Added system view information_schema.statistics for better compatibility with Tableau Online. #56425 (Serge Klochkov).
  • Add function getClientHTTPHeader for fetching header values set in the HTTP request. #56488 (凌涛).
  • Add a new table function named fuzzJSON with rows containing perturbed versions of the source JSON string with random variations. #56490 (Julia Kartseva).
  • Add system.symbols table useful for introspection of the binary. #56548 (Alexey Milovidov).
  • Add 4-letter command for yielding/resigning leadership. #56620 (Pradeep Chhetri).
  • Configurable dashboards. Queries for charts are now loaded using a query, which by default uses a new system.dashboards table. #56771 (Sergei Trifonov).
  • Introduce fileCluster table function. #56868 (Andrey Zvonov).
  • Add _size virtual column with file size in bytes to s3/file/hdfs/url/azureBlobStorage engines. #57126 (Kruglov Pavel).
  • Expose the number of errors that occurred on a server since the last restart through the Prometheus endpoint. #57209 (Nikita Mikhaylov).
  • Added a new SQL function sqid to generate Sqids (https://sqids.org/), example: SELECT sqid(125, 126). #57442 (awakeljw).
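The fromDaysSinceYearZero() example in the list above (739136 maps to 2023-09-08) can be reproduced with Python's standard library. This is a sketch under one assumption: day 0 is 0000-01-01 in the proleptic Gregorian calendar, so the value is offset by 365 from Python's date ordinal, where day 1 is 0001-01-01. The helper name from_days_since_year_zero is hypothetical and is not ClickHouse code.

```python
from datetime import date

# Assumption: day 0 is 0000-01-01 (proleptic Gregorian, year 0 has 366 days),
# while Python's ordinal 1 is 0001-01-01. Hence ordinal = days - 365.
_YEAR_ZERO_OFFSET = 365


def from_days_since_year_zero(days: int) -> date:
    """Convert a day count since year zero to a calendar date.

    Python's date type starts at 0001-01-01, so inputs below 366
    raise ValueError in this sketch.
    """
    return date.fromordinal(days - _YEAR_ZERO_OFFSET)


print(from_days_since_year_zero(739136))  # 2023-09-08
```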

Performance Improvement

  • Support parallel evaluation of window functions. Fixes #34688. #39631 (Dmitry Novik).
  • Increase the default value of max_concurrent_queries from 100 to 1000. This makes sense when there is a large number of connecting clients, which are slowly sending or receiving data, so the server is not limited by CPU, or when the number of CPU cores is larger than 100. Also, enable the concurrency control by default, and set the desired number of query processing threads in total as twice the number of CPU cores. It improves performance in scenarios with a very large number of concurrent queries. #46927 (Alexey Milovidov).
  • Fixed filtering by IN(...) condition for Merge table engine. #54905 (Nikita Taranov).
  • An improvement which takes place when cache is full and there are big reads. #55158 (Kseniia Sumarokova).
  • Add ability to disable checksums for S3 to avoid excessive input file read (this new behavior could be enabled with s3_disable_checksum=true). #55559 (Azat Khuzhin).
  • Now we read synchronously from remote tables when data is in page cache (like we do for local tables). It is faster, doesn't require synchronisation inside thread pool, doesn't hesitate to do seek-s on local fs and reduces cpu wait. #55841 (Nikita Taranov).
  • ... This PR follows #55929 and brings about a 30% speedup by reducing the reserved memory and the number of resize calls. #55957 (lgbo).
  • The performance experiments of OnTime on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring the improvements of 7.4%, 5.9%, 4.7%, 3.0%, and 4.6% to the QPS of the query Q2, Q3, Q4, Q5 and Q6 respectively while having no impact on others. #56079 (Zhiguo Zhou).
  • Limit the number of threads busy inside the query profiler. If there are more - they will skip profiling. #56105 (Alexey Milovidov).
  • Decrease the number of virtual function calls in WindowTransform. #56120 (Maksim Kita).
  • Allow recursive tuple field pruning in ORC to speed up scanning. #56122 (李扬).
  • Add countRows support for the Npy data format. With setting optimize_count_from_files=1, queries like select count() from file(data.npy) now run much faster because the results are cached. #56304 (Yarik Briukhovetskyi).
  • Queries with aggregation and a large number of streams will use less memory during the plan's construction. #57074 (Alexey Milovidov).
  • Improve performance of executing queries for use cases with many users. #57106 (Andrej Hoos).
  • Trivial improvement on array join, reuse some intermediate results. #57183 (李扬).
  • There are cases when stack unwinding was slow. #57221 (Alexey Milovidov).
  • Now we use default read pool for reading from external storage when max_streams = 1. It is beneficial when read prefetches are enabled. #57334 (Nikita Taranov).

Improvement

  • Engine Merge filters the records according to the row policies of the underlying tables. #50209 (Ilya Golshtein).
  • Add a setting max_execution_time_leaf to limit the execution time on shard for distributed query, and timeout_overflow_mode_leaf to control the behaviour if timeout happens. #51823 (Duc Canh Le).
  • Fix possible postgresql logical replication conversion_error when using MaterializedPostgreSQL. #53721 (takakawa).
  • Set background_fetches_pool_size to 16, background_schedule_pool_size to 512 that is better for production usage with frequent small insertions. #54327 (Denny Crane).
  • When reading data from a CSV file in which a line ends with '\r' that is not followed by '\n', ClickHouse throws the exception Cannot parse CSV format: found \r (CR) not followed by \n (LF). Line must end by \n (LF) or \r\n (CR LF) or \n\r. In ClickHouse, a CSV line must end with \n, \r\n or \n\r, so \r must be followed by \n, but in some situations the CSV input data is abnormal and \r is at the end of a line. #54340 (KevinyhZou).
  • Update arrow library to release-13.0.0 that supports new encodings. Closes #44505. #54800 (Kruglov Pavel).
  • Improve performance of ON CLUSTER queries by removing heavy system calls to get all network interfaces when looking for local ip address in the DDL entry hosts list. #54909 (Duc Canh Le).
  • Keeper improvement: improve memory-usage during startup by delaying log preprocessing. #55660 (Antonio Andelic).
  • Fixed accounting of memory allocated before attaching thread to a query or a user. #56089 (Nikita Taranov).
  • ClickHouse keeper reports its running availability zone at /keeper/availability-zone path, when running on AWS environment. #56104 (Jianfei Hu).
  • Add support for LARGE_LIST with Arrow. #56118 (edef).
  • Improved performance of glob matching for file and hdfs storages. #56141 (Andrey Zvonov).
  • Allow manual compaction of EmbeddedRocksDB via OPTIMIZE query. #56225 (Azat Khuzhin).
  • Posting lists in inverted indexes are now compressed which reduces their size by 10-30%. #56226 (Harry Lee).
  • Add ability to specify BlockBasedTableOptions for EmbeddedRocksDB. #56264 (Azat Khuzhin).
  • SHOW COLUMNS now displays MySQL's equivalent data type name when the connection was made through the MySQL protocol. Previously, this was the case when setting use_mysql_types_in_show_columns = 1. The setting is retained but made obsolete. #56277 (Robert Schulze).
  • Fixed possible The local set of parts of table doesn't look like the set of parts in ZooKeeper error if server was restarted just after TRUNCATE or DROP PARTITION. #56282 (Alexander Tokmakov).
  • Parallelise BackupEntriesCollector. #56312 (Kseniia Sumarokova).
  • Fixed handling of non-const query strings in functions formatQuery()/ formatQuerySingleLine(). Also added OrNull variants of both functions that return a NULL when a query cannot be parsed instead of throwing an exception. #56327 (Robert Schulze).
  • Support creating and materializing an index in the same ALTER query, and support MODIFY TTL and MATERIALIZE TTL in the same query. Closes #55651. #56331 (flynn).
  • Enable adding new disk to storage configuration without restart. #56367 (Duc Canh Le).
  • Allow backup of materialized view with dropped inner table instead of failing the backup. #56387 (Kseniia Sumarokova).
  • Queries to system.replicas initiate requests to ZooKeeper when certain columns are queried. When there are thousands of tables, these requests might produce a considerable load on ZooKeeper. If there are multiple simultaneous queries to system.replicas, they make the same requests multiple times. The change is to "deduplicate" requests from concurrent queries. #56420 (Alexander Gololobov).
  • Add transition from reading key to reading quoted key when double quotes are found. #56423 (Arthur Passos).
  • Fix transfer query to MySQL compatible query. #56456 (flynn).
  • Add support for backing up and restoring tables using KeeperMap engine. #56460 (Antonio Andelic).
  • A 404 response for CompleteMultipartUpload has to be rechecked. The operation could have completed on the server even if the client got a timeout or another network error, in which case the next retry of CompleteMultipartUpload receives a 404 response. If the object key exists, the operation is considered successful. #56475 (Sema Checherinda).
  • Enable the HTTP OPTIONS method by default - it simplifies requesting ClickHouse from a web browser. #56483 (Alexey Milovidov).
  • The value for dns_max_consecutive_failures was changed by mistake in #46550 - this is reverted and adjusted to a better value. Also, increased the HTTP keep-alive timeout to a reasonable value from production. #56485 (Alexey Milovidov).
  • Load base backups lazily (a base backup won't be loaded until it's needed). Also add some log message and profile events for backups. #56516 (Vitaly Baranov).
  • Setting query_cache_store_results_of_queries_with_nondeterministic_functions (with values false or true) was marked obsolete. It was replaced by setting query_cache_nondeterministic_function_handling, a three-valued enum that controls how the query cache handles queries with non-deterministic functions: a) throw an exception (default behavior), b) save the non-deterministic query result regardless, or c) ignore, i.e. don't throw an exception and don't cache the result. #56519 (Robert Schulze).
  • Rewrite equality with is null check in JOIN ON section. Analyzer only. #56538 (vdimir).
  • Function concat now supports arbitrary argument types (instead of only String and FixedString arguments). This makes it behave more like MySQL's concat implementation. For example, SELECT concat('ab', 42) now returns ab42. #56540 (Serge Klochkov).
  • Allow getting cache configuration from 'named_collection' section in config or from sql created named collection. #56541 (Kseniia Sumarokova).
  • Update query_masking_rules when reloading the config (#56449). #56573 (Mikhail Koviazin).
  • Make removeoutdatedtables() less aggressive with unsuccessful postgres connection. #56609 (jsc0218).
  • The current setting takes too long to connect to PostgreSQL when the URL is not right, so the relevant query gets stuck and is cancelled. #56648 (jsc0218).
  • ClickHouse keeper reports its running availability zone at /keeper/availability-zone path. This can be configured via <availability_zone><value>us-west-1a</value></availability_zone>. #56715 (Jianfei Hu).
  • Do not allow tables on different replicas to have different aggregate functions in SimpleAggregateFunction columns. #56724 (Duc Canh Le).
  • Add support for the well-known Protobuf types in the Protobuf format. #56741 (János Benjamin Antal).
  • Keeper improvement: disable compressed logs by default in Keeper. #56763 (Antonio Andelic).
  • Add config setting wait_dictionaries_load_at_startup. #56782 (Vitaly Baranov).
  • There was a potential vulnerability in previous ClickHouse versions: if a user has connected and unsuccessfully tried to authenticate with the "interserver secret" method, the server didn't terminate the connection immediately but continued to receive and ignore the leftover packets from the client. While these packets are ignored, they are still parsed, and if they use a compression method with another known vulnerability, it will lead to exploitation of it without authentication. This issue was found with ClickHouse Bug Bounty Program by https://twitter.com/malacupa. #56794 (Alexey Milovidov).
  • Fetching a part now waits until that part is fully committed on the remote replica. It is better not to send a part in the PreActive state. In the case of zero copy this is a mandatory restriction. #56808 (Sema Checherinda).
  • Implement user-level setting alter_move_to_space_execute_async which allows executing queries ALTER TABLE ... MOVE PARTITION|PART TO DISK|VOLUME asynchronously. The size of the pool for background executions is controlled by background_move_pool_size. The default behavior is synchronous execution. Fixes #47643. #56809 (alesapin).
  • Allow filtering by engine when scanning system.tables, which avoids unnecessary (potentially time-consuming) connections. #56813 (jsc0218).
  • Show total_bytes and total_rows in system tables for RocksDB storage. #56816 (Aleksandr Musorin).
  • Allow basic commands in ALTER for TEMPORARY tables. #56892 (Sergey).
  • LZ4 compression: buffer the compressed block in the rare case when the output buffer capacity is not enough to write the compressed block directly to it. #56938 (Sema Checherinda).
  • Add metrics for the number of queued jobs, which is useful for the IO thread pool. #56958 (Alexey Milovidov).
  • Add a setting for the PostgreSQL table engine in the config file, add a check for the setting, and add documentation for it. #56959 (Peignon Melvyn).
  • Run interpreter with only_analyze flag in getsampleblock method. #56972 (Mikhail Artemenko).
  • Add a new MergeTree setting add_implicit_sign_column_constraint_for_collapsing_engine (disabled by default). When enabled, it adds an implicit CHECK constraint for CollapsingMergeTree tables that restricts the value of the Sign column to be only -1 or 1. #56701. #56986 (Kevin Mingtarja).
  • Function concat() can now be called with a single argument, e.g., SELECT concat('abc'). This makes its behavior more consistent with MySQL's concat implementation. #57000 (Serge Klochkov).
  • Signs all x-amz-* headers as required by AWS S3 docs. #57001 (Arthur Passos).
  • Function fromDaysSinceYearZero (alias: FROM_DAYS) can now be used with unsigned and signed integer types (previously, it had to be an unsigned integer). This improves compatibility with 3rd party tools such as Tableau Online. #57002 (Serge Klochkov).
  • Add system.s3queue_log to default config. #57036 (Kseniia Sumarokova).
  • Change the default for wait_dictionaries_load_at_startup to true, and use this setting only if dictionaries_lazy_load is false. #57133 (Vitaly Baranov).
  • Check dictionary source type on creation even if dictionaries_lazy_load is enabled. #57134 (Vitaly Baranov).
  • Plan-level optimizations can now be enabled/disabled individually. Previously, it was only possible to disable them all. The setting which previously did that (query_plan_enable_optimizations) is retained and can still be used to disable all optimizations. #57152 (Robert Schulze).
  • The server's exit code will correspond to the exception code. For example, if the server cannot start due to memory limit, it will exit with the code 241 = MEMORY_LIMIT_EXCEEDED. In previous versions, the exit code for exceptions was always 70 = Poco::Util::ExitCode::EXIT_SOFTWARE. #57153 (Alexey Milovidov).
  • Do not demangle and symbolize stack frames from __functional c++ header. #57201 (Mike Kot).
  • It is now possible to refer to ALIAS column in index (non-primary-key) definitions (issue #55650). Example: CREATE TABLE tab(col UInt32, col_alias ALIAS col + 1, INDEX idx (col_alias) TYPE minmax) ENGINE = MergeTree ORDER BY col;. #57220 (flynn).
  • HTTP server page /dashboard now supports charts with multiple lines. #57236 (Sergei Trifonov).
  • Memory amounts can now be specified with suffixes (K, M, G, T, E). Closes #56879. #57273 (Yarik Briukhovetskyi).
  • Bumped Intel QPL (used by codec DEFLATE_QPL) from v1.2.0 to v1.3.1. Also fixed a bug in the case of BOF (Block On Fault) = 0: page faults are now handled by falling back to the SW path. #57291 (jasperzhu).
  • Make alter materialized view non experimental and deprecate allow_experimental_alter_materialized_view_structure setting. Fixes #15206. #57311 (alesapin).
  • Increase default replicated_deduplication_window of MergeTree settings from 100 to 1k. #57335 (sichenzhao).
  • Stop using INCONSISTENT_METADATA_FOR_BACKUP that much. If possible prefer to continue scanning instead of stopping and starting the scanning for backup from the beginning. #57385 (Vitaly Baranov).
  • Introduce the limit for the maximum number of table projections (default 25). #57491 (Julia Kartseva).
  • Enable async_block_ids_cache by default for async_inserts deduplication. #57513 (alesapin).
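The three modes of query_cache_nondeterministic_function_handling described in the list above can be modelled as a small decision function. This is a sketch of the described behavior only: the enum NondeterministicHandling, the function should_cache and the exception type are hypothetical names and do not correspond to ClickHouse internals.

```python
from enum import Enum


class NondeterministicHandling(Enum):
    """Hypothetical model of the three setting values named in the entry."""
    THROW = "throw"
    SAVE = "save"
    IGNORE = "ignore"


def should_cache(
    has_nondeterministic_functions: bool,
    mode: NondeterministicHandling = NondeterministicHandling.THROW,
) -> bool:
    """Return True if the query result would be stored in the query cache."""
    if not has_nondeterministic_functions:
        return True  # deterministic queries are unaffected by the setting
    if mode is NondeterministicHandling.THROW:
        # Default behavior: refuse the query instead of caching it.
        raise ValueError("query uses non-deterministic functions")
    # SAVE caches the result regardless; IGNORE runs the query without caching.
    return mode is NondeterministicHandling.SAVE


print(should_cache(True, NondeterministicHandling.SAVE))
print(should_cache(True, NondeterministicHandling.IGNORE))
```

The obsolete boolean setting could express only two of these outcomes, which is presumably why it was replaced by a three-valued enum.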

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT