2022 Changelog

ClickHouse release v22.6.1.1985-stable (7000c4e003) FIXME as compared to v22.5.1.2079-stable (df0cb06209)

Backward Incompatible Change

  • Changed how settings of type seconds are parsed so that they support floating-point values (for example, max_execution_time=0.5). Infinity or NaN values throw an exception. #37187 (Raúl Marín).
  • Changed the binary serialization format for columns of the experimental Object type. The new format is easier for third-party clients to implement. #37482 (Anton Popov).
  • Turned on the setting output_format_json_named_tuples_as_objects by default. It serializes named tuples as JSON objects in JSON formats. #37756 (Anton Popov).
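
As a rough illustration of the first entry, here is a minimal Python sketch of parsing a seconds-typed setting as a float while rejecting Infinity and NaN. The helper name parse_seconds_setting is hypothetical; this is not ClickHouse's parser, only a model of the behavior the entry describes.

```python
import math


def parse_seconds_setting(text: str) -> float:
    """Parse a seconds-typed setting value such as "0.5" (hypothetical helper).

    Fractional values are accepted; Infinity and NaN are rejected with an
    exception, mirroring the behavior described in the changelog entry.
    """
    value = float(text)  # accepts "0.5", "10", "1e-3", and also "inf"/"nan"
    if math.isinf(value) or math.isnan(value):
        raise ValueError(f"seconds setting must be finite, got {text!r}")
    return value


# max_execution_time=0.5 is now a valid value.
print(parse_seconds_setting("0.5"))
```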

New Feature

  • Added SYSTEM UNFREEZE query that deletes the whole backup regardless of whether the corresponding table has been deleted. #36424 (Vadim Volodin).
  • Adds H3 unidirectional edge functions. #36843 (Bharat Nallan).
  • Add merge_reason column to system.part_log table. #36912 (Sema Checherinda).
  • Enabled POPULATE for WindowView. #36945 (vxider).
  • Add new columnar JSON formats: JSONColumns, JSONCompactColumns, JSONColumnsWithMetadata. Closes #36338. Closes #34509. #36975 (Kruglov Pavel).
  • Add support for calculating hashids from unsigned integers. #37013 (Michael Nutt).
  • Add GROUPING function. Closes #19426. #37163 (Dmitry Novik).
  • Added ALTER TABLE ... MODIFY QUERY support for WindowView. #37188 (vxider).
  • Changed the behavior of the ENGINE syntax in WindowView to match MaterializedView. #37214 (vxider).
  • SALT is now allowed for CREATE USER IDENTIFIED WITH sha256_hash. #37377 (Yakov Olkhovskiy).
  • Implemented changing the comment of a ReplicatedMergeTree table. #37416 (Vasily Nemkov).
  • Add support for Maps and Records in Avro format. Add a new setting input_format_avro_null_as_default that allows inserting NULL as the default value in Avro format. Closes #18925. Closes #37378. Closes #32899. #37525 (Kruglov Pavel).
  • Add two new settings, input_format_csv_skip_first_lines and input_format_tsv_skip_first_lines, to skip a specified number of lines at the beginning of the file in CSV/TSV formats. #37537 (Kruglov Pavel).
  • The showCertificate() function shows the current server's SSL certificate. #37540 (Yakov Olkhovskiy).
  • Implemented the FPC algorithm for floating-point data compression. #37553 (Mikhail Guzov).
  • HTTP source for Data Dictionaries in Named Collections is supported. #37581 (Yakov Olkhovskiy).
  • Allow inserting into system.zookeeper. This resolves #22130. #37596 (Han Fei).
  • Added a new window function nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL x SECOND]). #37628 (Andrey Zvonov).
  • Executable user-defined functions now support parameters. Example: SELECT test_function(parameters)(arguments). Closes #37578. #37720 (Maksim Kita).
  • Added an OpenTelemetry trace visualization tool based on d3js. #37810 (Sergei Trifonov).
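
Based on the nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL x SECOND]) signature above, a minimal Python sketch of what such a window function computes per row. It assumes rows are already sorted by timestamp, that the optional INTERVAL defaults to one second, that the first row yields 0, and that negative rates (typically counter resets) are clamped to 0. The function name is hypothetical; consult the ClickHouse documentation for the exact semantics.

```python
from typing import List, Sequence


def non_negative_derivative(
    metrics: Sequence[float],
    timestamps: Sequence[float],
    interval_seconds: float = 1.0,
) -> List[float]:
    """Per-row rate of change of `metrics`, scaled to `interval_seconds`.

    Sketch only: assumes rows are ordered by timestamp, the first row
    yields 0, and negative rates (e.g. after a counter reset) become 0.
    """
    if len(metrics) != len(timestamps):
        raise ValueError("metrics and timestamps must have the same length")
    result: List[float] = []
    for i in range(len(metrics)):
        if i == 0:
            result.append(0.0)
            continue
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            # Assumption: a non-increasing timestamp yields 0 rather than an error.
            result.append(0.0)
            continue
        rate = (metrics[i] - metrics[i - 1]) / dt * interval_seconds
        result.append(max(0.0, rate))
    return result


# A counter sampled every 10 s that resets between the 3rd and 4th sample.
print(non_negative_derivative([100, 150, 250, 20], [0, 10, 20, 30]))
# → [0.0, 5.0, 10.0, 0.0]
```

Passing interval_seconds=60 would express the same rates per minute, which is roughly what INTERVAL 60 SECOND is meant to do in the SQL form.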

Performance Improvement

  • Improve performance of inserts into MergeTree when ORDER BY contains multiple columns. #35762 (Maksim Kita).
  • Apply read method 'threadpool' for StorageHive. #36328 (李扬).
  • Data parts are now split into layers, and the layers (rather than whole parts) are distributed among threads, making the execution of queries with FINAL more data-parallel. #36396 (Nikita Taranov).
  • Load marks for only necessary columns when reading wide parts. #36879 (Anton Kozlov).
  • When all the columns to read are partition keys, construct the columns from the file's row count without actually reading the Hive file. #37103 (lgbo).
  • Fix performance of dictGetDescendants and dictGetChildren functions: create a temporary parent-to-children hierarchical index per query, not per function call during the query. Allow specifying BIDIRECTIONAL for HIERARCHICAL attributes; the dictionary then maintains the parent-to-children index in memory, so dictGetDescendants and dictGetChildren do not create a temporary index per query. Closes #32481. #37148 (Maksim Kita).
  • Improve performance and memory usage when selecting a subset of columns for formats Native, Protobuf, CapnProto, JSONEachRow, TSKV, and all formats with suffixes WithNames/WithNamesAndTypes. Previously, when selecting only a subset of columns from files in these formats, all columns were read and stored in memory. Now only the required columns are read. This PR enables the setting input_format_skip_unknown_fields by default, because otherwise selecting a subset of columns would throw an exception. #37192 (Kruglov Pavel).
  • Improve performance of sorting by a single column. #37195 (Maksim Kita).
  • Support multiple disks for caching Hive files. #37279 (lgbo).
  • Improved performance of array norm and distance functions by 2x-4x. #37394 (Alexander Gololobov).
  • Improve performance of number comparison functions using dynamic dispatch. #37399 (Maksim Kita).
  • Improve performance of ORDER BY with LIMIT. #37481 (Maksim Kita).
  • Improve performance of hasAll function using dynamic dispatch infrastructure. #37484 (Maksim Kita).
  • Improve performance of greatCircleAngle, greatCircleDistance, geoDistance functions. #37524 (Maksim Kita).
  • Optimized the internal caching of re2 patterns which occur e.g. in LIKE and MATCH functions. #37544 (Robert Schulze).
  • Improve filter bitmask generator function all in one with avx512 instructions. #37588 (yaqi-zhao).
  • Improved performance of aggregation when sparse columns (which can be enabled by the experimental setting ratio_of_defaults_for_sparse_serialization in MergeTree tables) are used as arguments in aggregate functions. #37617 (Anton Popov).
  • Optimize function COALESCE with two arguments. #37666 (Anton Popov).
  • Replace multiIf with if when multiIf has only one condition, because the function if is faster. #37695 (Anton Popov).
  • Destruction of aggregate states may now be posted to a thread pool. For queries with LIMIT and large states this gives a significant speedup; e.g., select uniq(number) from numbers_mt(1e7) group by number limit 100 became around 2.5x faster. #37855 (Nikita Taranov).
  • Improve performance of single column sorting using sorting queue specializations. #37990 (Maksim Kita).
  • Fix excessive CPU usage in the background when there are a lot of tables. #38028 (Maksim Kita).
  • Improve performance of not function using dynamic dispatch. #38058 (Maksim Kita).
  • Added numerous NEON accelerated paths for main libraries. #38093 (Daniel Kutenin).
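
The multiIf entry above describes a simple expression rewrite: multiIf(c1, t1, c2, t2, ..., else) with exactly one condition has three arguments and means the same as if(c1, t1, else). A minimal Python sketch over a hypothetical tuple-based expression tree (not ClickHouse's actual query tree classes) could look like this:

```python
from typing import Any, Tuple, Union

# Hypothetical expression tree: a function call is a tuple (name, *args);
# anything else (a column name or a literal) is a leaf.
Expr = Union[Tuple[Any, ...], str, int, float]


def rewrite_single_condition_multiif(expr: Expr) -> Expr:
    """Recursively replace multiIf(cond, then, else) with if(cond, then, else)."""
    if not isinstance(expr, tuple):
        return expr  # leaves are returned unchanged
    name, *args = expr
    args = [rewrite_single_condition_multiif(a) for a in args]
    # multiIf(c1, t1, c2, t2, ..., else): three arguments mean one condition.
    if name == "multiIf" and len(args) == 3:
        return ("if", *args)
    return (name, *args)


query = ("multiIf", ("greater", "x", 0), "y", "z")
print(rewrite_single_condition_multiif(query))
# → ('if', ('greater', 'x', 0), 'y', 'z')
```

Calls with two or more conditions are left untouched, since only the single-condition case maps directly onto if.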

Improvement

  • Add a separate CLUSTER grant (and the access_control_improvements.on_cluster_queries_require_cluster_grant configuration directive, which defaults to false for backward compatibility). #35767 (Azat Khuzhin).
  • Add a self-extracting executable (#34755). #35775 (Filatenkov Artur).
  • Added support for schema inference for hdfsCluster. #35812 (Nikita Mikhaylov).
  • Add the disks tool with the following commands. #36060 (Artyom Yurkov).
    • ls: list files on disk.
    • C: set the config file.
    • list-disks: list disk names.
    • disk: set the disk to work with by name.
    • help: print a help message.
    • copy: copy data on disk from from_path to to_path.
    • link: create a hardlink on disk from from_path to to_path.
    • list: list files on disk.
    • move: move a file or directory on disk from from_path to to_path.
    • read: read a file on disk from from_path to to_path or to stdout.
    • remove: remove a file or directory on disk with all its children.
    • write: write a file on disk from from_path or stdin to to_path.
  • Implement the least_used load-balancing algorithm for disks inside a volume (multi-disk configuration). #36686 (Azat Khuzhin).
  • Changes to the HTTP endpoint. #36884 (Raúl Marín).
    • Return the full stats in the X-ClickHouse-Summary header when send_progress_in_http_headers=0 (previously it returned all zeros).
    • Return the X-ClickHouse-Exception-Code header when progress has already been sent (send_progress_in_http_headers=1).
    • Return HTTP_REQUEST_TIMEOUT (408) instead of HTTP_INTERNAL_SERVER_ERROR (500) on TIMEOUT_EXCEEDED errors.
  • Allow a user to inspect grants from granted roles. #36941 (nvartolomei).
  • Do not calculate an integral numerically but use CDF functions instead. This will speed up execution and will increase the precision. This fixes #36714. #36953 (Nikita Mikhaylov).
  • Add a default implementation for Nothing in functions. Now most functions return a column of type Nothing if one of their arguments is Nothing. This also solves a problem with functions like arrayMap/arrayFilter and similar when they receive an empty array as an argument. Previously, queries like select arrayMap(x -> 2 * x, []); failed because the function inside the lambda could not work with type Nothing; now such queries return an empty array of type Array(Nothing). Also add support for arrays of nullable types in functions like arrayFilter/arrayFill. Previously, queries like select arrayFilter(x -> x % 2, [1, NULL]) failed; now they work (if the result of the lambda is NULL, the value is not included in the result). Closes #37000. #37048 (Kruglov Pavel).
  • If a shard has a local replica, we now create a local plan and a plan to read from all remote replicas. They share an initiator that coordinates reading. #37204 (Nikita Mikhaylov).
  • Removed an unnecessary write-copy step in CompressedWriteBuffer::nextImpl() that happened frequently when inserting data. Before: (1) compress "working_buffer" into "compressed_buffer"; (2) copy it into "out". After: compress "working_buffer" directly into "out". #37242 (jasperzhu).
  • Support non-constant SQL functions (NOT) (I)LIKE and MATCH. #37251 (Robert Schulze).
  • The client tries every IP address returned by DNS resolution until a connection succeeds. #37273 (Yakov Olkhovskiy).
  • Server startup is no longer aborted if the configuration option "mark_cache_size" is not explicitly set. #37326 (Robert Schulze).
  • Allow using the String type instead of Binary in Arrow/Parquet/ORC formats. This PR introduces 3 new settings for it: output_format_arrow_string_as_string, output_format_parquet_string_as_string, output_format_orc_string_as_string. The default value for all of them is false. #37327 (Kruglov Pavel).
  • Apply the setting input_format_max_rows_to_read_for_schema_inference to the total number of rows read from all files in a glob. Previously, the setting was applied to each file in the glob separately, so with a huge number of NULLs we could read the first input_format_max_rows_to_read_for_schema_inference rows from each file and get nothing. Also increase the default value of this setting to 25000. #37332 (Kruglov Pavel).
  • Allow specifying NULL/NOT NULL right after the type in a column declaration. #37337 (Igor Nikonov).
  • Optimize getting the read buffer for file segments in the PARTIALLY_DOWNLOADED state. #37338 (chen).
  • Allow pruning the list of files via virtual columns such as _file and _path when reading from S3. This is for #37174 and #23494. #37356 (Amos Bird).
  • Try to improve short circuit functions processing to fix problems with stress tests. #37384 (Kruglov Pavel).
  • Closes #37395. #37415 (Memo).
  • Fix extremely rare deadlock during part fetch in zero-copy replication. Fixes #37423. #37424 (metahys).
  • Don't allow creating a storage with an unknown data format. #37450 (Kruglov Pavel).
  • Set global_memory_usage_overcommit_max_wait_microseconds default value to 5 seconds. Add info about OvercommitTracker to OOM exception message. Add MemoryOvercommitWaitTimeMicroseconds profile event. #37460 (Dmitry Novik).
  • Play UI: Keep controls in place when the page is scrolled horizontally. This makes editing comfortable even if the table is wide and has been scrolled far to the right. The feature was proposed by Maksym Tereshchenko from CaspianDB. #37470 (Alexey Milovidov).
  • Now more filters can be pushed down for join. #37472 (Amos Bird).
  • Modify the query div in play.html to be extendable beyond 20% height. For very long queries it is helpful to extend the textarea element, but because the div had a fixed height, the extended textarea hid the data div underneath. With this fix, extending the textarea pushes the data div down/up so that the extended textarea doesn't hide it. It also keeps the query box width at 100% even when the user adjusts the size of the query textarea. #37488 (guyco87).
  • Modify the query div in play.html to be extendable beyond 20% height. For very long queries it is helpful to extend the textarea element, but because the div had a fixed height, the extended textarea hid the data div underneath. With this fix, extending the textarea pushes the data div down/up so that the extended textarea doesn't hide it. It also keeps the query box width at 100% even when the user adjusts the size of the query textarea. #37504 (guyco87).
  • Previously, ClickHouse downloaded all remote files to the local cache (even if they were read only once), which frequently caused IO on the local hard disk. In some scenarios this IO is unnecessary and can easily degrade performance. As shown in the figure below, when running SSB Q1-Q4, the cache degraded performance. #37516 (Han Shukai).
  • Added ProfileEvents for introspection of the type of written (inserted or merged) parts (Inserted{Wide/Compact/InMemory}Parts, MergedInto{Wide/Compact/InMemory}Parts). Added column part_type to system.part_log. Resolves #37495. #37536 (Anton Popov).
  • clickhouse-keeper improvement: move broken logs to a timestamped folder. #37565 (Antonio Andelic).
  • Do not write columns expired by TTL after subsequent merges (previously, only the first merge/optimize of the part skipped the columns expired by TTL; all subsequent ones wrote them). #37570 (Azat Khuzhin).
  • More precise result of the dumpColumnStructure miscellaneous function in the presence of LowCardinality or Sparse columns. In previous versions, this function converted the argument to a full column before returning the result. This is needed to provide an answer in #6935. #37633 (Alexey Milovidov).
  • keeper: store only unique session IDs for watches. #37641 (Azat Khuzhin).
  • Fix possible "Cannot write to finalized buffer". #37645 (Azat Khuzhin).
  • Add setting support_batch_delete for DiskS3 to disable multiobject delete calls, which Google Cloud Storage doesn't support. #37659 (Fred Wulff).
  • Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS. Closes #37360. #37667 (Dmitry Novik).
  • Add an option to disable connection pooling in ODBC bridge. #37705 (Anton Kozlov).
  • ... LIKE patterns with trailing escape symbol ('\') are now disallowed (as mandated by the SQL standard). #37764 (Robert Schulze).
  • Fix stacktraces collection on ARM. Closes #37044. Closes #15638. #37797 (Maksim Kita).
  • Functions dictGetHierarchy, dictIsIn, dictGetChildren, dictGetDescendants now support a nullable HIERARCHICAL attribute in dictionaries. Closes #35521. #37805 (Maksim Kita).
  • Expose BoringSSL version related info in the system.build_options table. #37850 (Bharat Nallan).
  • Limiting the maximum cache usage per query can effectively prevent cache pool contamination. #37859 (Han Shukai).
  • Now clickhouse-server removes delete_tmp directories on server start. Fixes #26503. #37906 (alesapin).
  • Clean up broken detached parts after timeout. Closes #25195. #37975 (Kseniia Sumarokova).
  • In the MergeTree family of table engines, parts that failed to move are now removed instantly. #37994 (alesapin).
  • If the setting always_fetch_merged_part is enabled for ReplicatedMergeTree, merges now try to find parts on other replicas less often, reducing the load on [Zoo]Keeper. #37995 (alesapin).
  • Add implicit grants with grant option too. For example GRANT CREATE TABLE ON test.* TO A WITH GRANT OPTION now allows A to execute GRANT CREATE VIEW ON test.* TO B. #38017 (Vitaly Baranov).
  • Do not display -0.0 CPU time in clickhouse-client. It can appear due to rounding errors. This closes #38003. This closes #38038. #38064 (Alexey Milovidov).
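
To make the arrayFilter change described above concrete, here is a Python sketch of the NULL handling, with None standing in for NULL: an element is kept only when the lambda returns a non-zero, non-NULL value. The helpers array_filter and null_safe_mod are hypothetical; ClickHouse evaluates lambdas as vectorized column functions, not element by element as shown here.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def array_filter(
    predicate: Callable[[Optional[T]], Optional[int]], values: List[Optional[T]]
) -> List[Optional[T]]:
    """Keep elements whose predicate result is non-zero; a NULL result drops them."""
    result = []
    for v in values:
        outcome = predicate(v)
        # A NULL (None) predicate result is treated like false: the element is skipped.
        if outcome is not None and outcome != 0:
            result.append(v)
    return result


def null_safe_mod(x: Optional[int], m: int) -> Optional[int]:
    """Emulate SQL semantics where NULL % m is NULL."""
    return None if x is None else x % m


# Models arrayFilter(x -> x % 2, [1, NULL]), which now returns [1].
print(array_filter(lambda x: null_safe_mod(x, 2), [1, None]))
# → [1]
```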

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in official stable release)

  • Fix GROUP BY AggregateFunction (i.e. you GROUP BY by the column that has AggregateFunction type). #37093 (Azat Khuzhin).
  • Fix possible heap-use-after-free error when reading system.projection_parts and system.projection_parts_columns. This fixes #37184. #37185 (Amos Bird).
  • Fix addDependency in WindowView. This bug can be reproduced like #37237. #37224 (vxider).
  • Move addDependency from the constructor to startup() to avoid adding a dependency on a dropped table. Fixes #37237. #37243 (vxider).
  • Fix inserting defaults for missing values in columnar formats. Previously missing columns were filled with defaults for types, not for columns. #37253 (Kruglov Pavel).
  • Fix some cases of inserting nested arrays into columns of type Object. #37305 (Anton Popov).
  • Fix unexpected errors with a clash of constant strings in aggregate function, prewhere and join. Closes #36891. #37336 (Vladimir C).
  • Fix projections with GROUP/ORDER BY in query and optimize_aggregation_in_order (before the result was incorrect since only finish sorting was performed). #37342 (Azat Khuzhin).
  • Fixed error with symbols in key name in S3. Fixes #33009. #37344 (Vladimir Chebotarev).
  • Throw an exception when GROUPING SETS used with ROLLUP or CUBE. #37367 (Dmitry Novik).
  • Fix LOGICAL_ERROR in getMaxSourcePartsSizeForMerge during merges (in case non-standard, greater values of background_pool_size/background_merges_mutations_concurrency_ratio were specified in config.xml (the new way) rather than in users.xml (the deprecated way)). #37413 (Azat Khuzhin).
  • Stop removing UTF-8 BOM in RowBinary format. #37428 (Paul Loyd).
  • clickhouse-keeper bugfix: fix force recovery for single node cluster. #37440 (Antonio Andelic).
  • Fix logical error in normalizeUTF8 functions. Closes #37298. #37443 (Maksim Kita).
  • Fix named tuples output in ORC/Arrow/Parquet formats. #37458 (Kruglov Pavel).
  • Fix optimization of monotonous functions in ORDER BY clause in presence of GROUPING SETS. Fixes #37401. #37493 (Dmitry Novik).
  • Fix error on joining with dictionary on some conditions. Closes #37386. #37530 (Vladimir C).
  • Prohibit optimize_aggregation_in_order with GROUPING SETS (fixes LOGICAL_ERROR). #37542 (Azat Khuzhin).
  • Fix wrong dump information of ActionsDAG. #37587 (zhanglistar).
  • Fix converting types for UNION queries (may produce LOGICAL_ERROR). #37593 (Azat Khuzhin).
  • Fix WITH FILL modifier with negative intervals in STEP clause. Fixes #37514. #37600 (Anton Popov).
  • Fix illegal joinGet array usage when join_use_nulls = 1. This fixes #37562. #37650 (Amos Bird).
  • Fix column count mismatch in cross join. Closes #37561. #37653 (Vladimir C).
  • Fix segmentation fault in SHOW CREATE TABLE from a MySQL database when it is configured with named collections. Closes #37683. #37690 (Kseniia Sumarokova).
  • Fix RabbitMQ storage not being able to start up on server restart if the storage was created without a SETTINGS clause. Closes #37463. #37691 (Kseniia Sumarokova).
  • Fixed DateTime64 fractional seconds behavior prior to Unix epoch. #37697 (Andrey Zvonov).
  • Disallow CREATE/DROP of SQL user-defined functions in readonly mode. Closes #37280. #37699 (Maksim Kita).
  • Fix formatting of Nullable arguments for executable user defined functions. Closes #35897. #37711 (Maksim Kita).
  • Fix optimization enabled by setting optimize_monotonous_functions_in_order_by in distributed queries. Fixes #36037. #37724 (Anton Popov).
  • Fix SELECT ... INTERSECT and EXCEPT SELECT statements with constant string types. #37738 (Antonio Andelic).
  • Fix crash of FunctionHashID, closes #37735. #37742 (flynn).
  • Fix possible logical error: Invalid Field get from type UInt64 to type Float64 in values table function. Closes #37602. #37754 (Kruglov Pavel).
  • Fix possible segfault in schema inference in case of exception in SchemaReader constructor. Closes #37680. #37760 (Kruglov Pavel).
  • Fix setting cast_ipv4_ipv6_default_on_conversion_error for internal cast function. Closes #35156. #37761 (Maksim Kita).
  • Octal literals are not supported. #37765 (Yakov Olkhovskiy).
  • Fix toString error for DataTypeDate32. #37775 (LiuNeng).
  • The clickhouse-keeper setting dead_session_check_period_ms was converted into microseconds (multiplied by 1000), which led to dead sessions being cleaned up only after several minutes (instead of 500ms). #37824 (Michael Lex).
  • Fix possible "No more packets are available" for distributed queries (when async_socket_for_remote/use_hedged_requests is disabled). #37826 (Azat Khuzhin).
  • Do not drop the inner target table when executing ALTER TABLE ... MODIFY QUERY in WindowView. #37879 (vxider).
  • Fix directory ownership of coordination dir in clickhouse-keeper Docker image. Fixes #37914. #37915 (James Maidment).
  • Fix dictionary custom query with update field and {condition}. Closes #33746. #37947 (Maksim Kita).
  • Fix possible incorrect result of SELECT ... WITH FILL in the case when ORDER BY should be applied after WITH FILL result (e.g. for outer query). Incorrect result was caused by optimization for ORDER BY expressions (#35623). Closes #37904. #37959 (Yakov Olkhovskiy).
  • Add missing default columns when pushing to the target table in WindowView, fix #37815. #37965 (vxider).
  • Fixed a stack overflow issue that would cause compilation to fail. #37996 (Han Shukai).
  • When enable_filesystem_query_cache_limit was enabled, the exception "Reserved cache size exceeds the remaining cache size" could be thrown. #38004 (chen).
  • A query containing ORDER BY ... WITH FILL could generate extra rows when multiple WITH FILL columns were present. #38074 (Yakov Olkhovskiy).
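
The dead_session_check_period_ms entry is a unit-conversion bug. A back-of-the-envelope Python sketch, assuming (as the "several minutes" figure suggests, though the entry does not say so explicitly) that the value multiplied by 1000 was then still treated as milliseconds. The function effective_period_seconds is a hypothetical model, not keeper code.

```python
DEAD_SESSION_CHECK_PERIOD_MS = 500  # the intended period from the entry


def effective_period_seconds(period_ms: int, buggy: bool) -> float:
    """Return the real check period in seconds under a hypothetical bug model.

    Assumption: the value is multiplied by 1000 ("converted to microseconds")
    but is then still interpreted as milliseconds.
    """
    value = period_ms * 1000 if buggy else period_ms
    return value / 1000  # interpret the value as milliseconds


print(effective_period_seconds(DEAD_SESSION_CHECK_PERIOD_MS, buggy=False))  # 0.5
print(effective_period_seconds(DEAD_SESSION_CHECK_PERIOD_MS, buggy=True))   # 500.0
```

500 seconds is a little over 8 minutes, which matches the order of magnitude the entry reports.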

Bug Fix (user-visible misbehaviour in official stable release)

  • Fix converting types for UNION queries (may produce LOGICAL_ERROR). #34775 (Azat Khuzhin).
  • A TTL merge might never be scheduled again if BackgroundExecutor was busy: merges_with_ttl_counter is increased in selectPartsToMerge(), the merge task is then ignored because BackgroundExecutor is busy, and merges_with_ttl_counter is never decreased. #36387 (lthaooo).
  • Fix overridden setting value of normalize_function_names. #36937 (李扬).
  • Fix exponential time-decaying window functions: they now respect the window boundaries. #36944 (Vladimir Chebotarev).
  • Fix parsing of DateTime64 from the string '1969-12-31 23:59:59.123'. Closes #36994. #37039 (李扬).
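
Both DateTime64 entries concern fractional seconds before the Unix epoch. A minimal Python sketch of the intended arithmetic, assuming a DateTime64 value is an integer count of 10^-scale second ticks since the epoch (UTC here): the whole-second part of '1969-12-31 23:59:59.123' is -1 while the fraction is +0.123 s, so the result is -1 * 1000 + 123 = -877 ticks at scale 3. A naive conversion that gives the fraction the sign of the whole part would instead produce -1123; this only illustrates the pitfall and is not a description of the original bug. datetime64_ticks is a hypothetical helper.

```python
from datetime import datetime, timezone


def datetime64_ticks(text: str, scale: int = 3) -> int:
    """Convert 'YYYY-MM-DD hh:mm:ss[.fff]' (UTC) to ticks of 10**-scale seconds."""
    whole_part, _, frac_part = text.partition(".")
    dt = datetime.strptime(whole_part, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    whole_seconds = int(dt.timestamp())  # -1 for 1969-12-31 23:59:59 UTC
    # Pad or truncate the fractional digits to exactly `scale` digits.
    fraction = int((frac_part + "0" * scale)[:scale])
    # The fraction always moves the value forward in time, even before the epoch.
    return whole_seconds * 10**scale + fraction


# 1969-12-31 23:59:59.123 is 0.877 s before the epoch.
print(datetime64_ticks("1969-12-31 23:59:59.123"))
# → -877
```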

NO CL ENTRY

  • NO CL ENTRY: 'Revert "Fix mutations in tables with columns of type Object"'. #37355 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "Remove height restrictions from the query div in play web tool, and m..."'. #37501 (Alexey Milovidov).
  • NO CL ENTRY: 'Revert "Add support for preprocessing ZooKeeper operations in clickhouse-keeper"'. #37534 (Antonio Andelic).
  • NO CL ENTRY: 'Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part"'. #37545 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "RFC: Fix converting types for UNION queries (may produce LOGICAL_ERROR)"'. #37582 (Dmitry Novik).
  • NO CL ENTRY: 'Revert "Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part""'. #37598 (alesapin).
  • NO CL ENTRY: 'Revert "Implemented changing comment to a ReplicatedMergeTree table"'. #37627 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "Remove resursive submodules"'. #37774 (Alexey Milovidov).
  • NO CL ENTRY: 'Revert "Fix possible segfault in schema inference"'. #37785 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "Revert "Fix possible segfault in schema inference""'. #37787 (Kruglov Pavel).
  • NO CL ENTRY: 'Add more Rust client libraries to documentation'. #37880 (Paul Loyd).
  • NO CL ENTRY: 'Revert "Fix errors of CheckTriviallyCopyableMove type"'. #37902 (Anton Popov).
  • NO CL ENTRY: 'Revert "Don't try to kill empty list of containers in integration/runner"'. #38001 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "add d3js based trace visualizer as gantt chart"'. #38043 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "Add backoff to merges in replicated queue if always_fetch_merged_part is enabled"'. #38082 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "More parallel execution for queries with FINAL"'. #38094 (Alexander Tokmakov).
  • NO CL ENTRY: 'Revert "Revert "add d3js based trace visualizer as gantt chart""'. #38129 (Alexey Milovidov).

NOT FOR CHANGELOG / INSIGNIFICANT