2022 Changelog

ClickHouse release v21.12.1.9017-prestable FIXME as compared to v21.11.1.8636-prestable

Backward Incompatible Change

  • Add custom null representation support for TSV/CSV input formats. Fix deserializing Nullable(String) in TSV/CSV/JSONCompactStringsEachRow/JSONStringsEachRow input formats. Rename output_format_csv_null_representation and output_format_tsv_null_representation to format_csv_null_representation and format_tsv_null_representation respectively. #30497 (Kruglov Pavel).
  • Return unquoted string in JSON_VALUE. Closes #27965. #31008 (Kseniia Sumarokova).
  • Do not allow direct select for Kafka/RabbitMQ/FileLog. It can be enabled by setting stream_like_engine_allow_direct_select. Direct select will not be allowed, even if enabled by the setting, when a materialized view is attached. For Kafka and RabbitMQ, direct select, if allowed, will not commit messages by default. To enable commits with direct select, the user must use the storage-level setting kafka{rabbitmq}_commit_on_select=1 (default 0). cc @filimonov. #31053 (Kseniia Sumarokova).
  • A "leader election" mechanism is removed from ReplicatedMergeTree, because multiple leaders are supported since 20.6. If you are upgrading from older version and some replica with old version is a leader, then server will fail to start after upgrade. Stop replicas with old version to make new version start. After that it will not be possible to downgrade to version older than 20.6. #32140 (Alexander Tokmakov).
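As an illustration of what a custom null representation means for CSV input, here is a minimal Python sketch. It is not ClickHouse's implementation: the function name `read_csv_with_null` and its `null_representation` parameter are hypothetical, chosen only to mirror the idea behind format_csv_null_representation. Fields whose text equals the configured token are read as NULL (Python `None`).

```python
import csv
import io
from typing import List, Optional


def read_csv_with_null(
    text: str, null_representation: str = r"\N"
) -> List[List[Optional[str]]]:
    """Parse CSV text, mapping fields equal to null_representation to None.

    Hypothetical helper for illustration only; it does not distinguish
    quoted from unquoted fields, which a real parser may need to do.
    """
    reader = csv.reader(io.StringIO(text))
    return [
        [None if field == null_representation else field for field in row]
        for row in reader
    ]


# Treat the literal token NULL as a missing value.
rows = read_csv_with_null("1,NULL\n2,abc\n", null_representation="NULL")
```

With the token set to `NULL`, the second field of the first row is read as `None`, while `abc` stays a regular string.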

New Feature

Performance Improvement

Improvement

  • Enable clang -fstrict-vtable-pointers, -fwhole-program-vtables compile options. #20151 (Maksim Kita).
  • Skipping mutations of different partitions in StorageMergeTree. #21326 (Vladimir Chebotarev).
  • Closes #12552. Allow versioning of aggregate function states. #24820 (Kseniia Sumarokova).
  • Add optimizations for constant conditions in JOIN ON, ref #26928. #27021 (Vladimir C).
  • Add support for Identifier table and database query parameters. Closes #27226. #28668 (Nikolay Degterinsky).
  • Allow to specify one or any number of PostgreSQL schemas for one MaterializedPostgreSQL database. Closes #28901. Closes #29324. #28933 (Kseniia Sumarokova).
  • Make reading from HTTP retriable. Closes #29696. #29894 (Kseniia Sumarokova).
  • Add support for parallel reading from multiple files and support globs in FROM INFILE clause. #30135 (Filatenkov Artur).
    • Refactor formats TSV, TSVRaw, CSV and JSONCompactEachRow, JSONCompactStringsEachRow: remove code duplication and add a base interface for formats with -WithNames and -WithNamesAndTypes suffixes.
    • Add formats CSVWithNamesAndTypes, TSVRawWithNames, TSVRawWithNamesAndTypes, JSONCompactEachRowWithNames, JSONCompactStringsEachRowWithNames, RowBinaryWithNames.
    • Support parallel parsing for formats TSVWithNamesAndTypes, TSVRaw(WithNames/WithNamesAndTypes), CSVWithNamesAndTypes, JSONCompactEachRow(WithNames/WithNamesAndTypes), JSONCompactStringsEachRow(WithNames/WithNamesAndTypes).
    • Support column mapping and type checking for RowBinaryWithNamesAndTypes format.
    • Add setting input_format_with_types_use_header, which specifies whether to check that the types written in <format_name>WithNamesAndTypes format match the table structure.
    • Add setting input_format_csv_empty_as_default and use it in CSV format instead of input_format_defaults_for_omitted_fields (because that setting shouldn't control csv_empty_as_default).
    • Fix usage of setting input_format_defaults_for_omitted_fields (it was used only as csv_empty_as_default, but it should control calculation of default expressions for omitted fields).
    • Fix Nullable input/output in TSVRaw format, make this format fully compatible with inserting into TSV.
    • Fix inserting NULLs in LowCardinality(Nullable) when input_format_null_as_default is enabled (previously default values were inserted instead of actual NULLs).
    • Fix string deserialization in JSONStringsEachRow/JSONCompactStringsEachRow formats (strings were parsed only until the first '\n' or '\t').
    • Add ability to use Raw escaping rule in Template input format.
    • Add diagnostic info for JSONCompactEachRow(WithNames/WithNamesAndTypes) input format.
    • Fix bug with parallel parsing of -WithNames formats when setting min_chunk_bytes_for_parallel_parsing is less than the number of bytes in a single row. #30178 (Kruglov Pavel).
  • Avro format works against Kafka. Setting output_format_avro_rows_in_file added. #30351 (Ilya Golshtein).
  • Implement the commands BACKUP and RESTORE for the Log family. #30688 (Vitaly Baranov).
  • Fix possible "The local set of parts of X doesn't look like the set of parts in ZooKeeper" error (if DROP fails during removing znodes from zookeeper). #30826 (Azat Khuzhin).
  • For clickhouse-local or clickhouse-client if there is --interactive option with --query or --queries-file, then first execute them like in non-interactive and then start interactive mode. #30851 (Kseniia Sumarokova).
  • Added \l, \d, \c aliases like in MySQL. #30876 (Pavel Medvedev).
  • Fix --verbose option in clickhouse-local interactive mode and allow logging into file. #30881 (Kseniia Sumarokova).
  • Support INTERVAL type in STEP clause for WITH FILL modifier. #30927 (Anton Popov).
  • Reduce memory usage when reading with s3 / url / hdfs formats Parquet, ORC, Arrow (controlled by setting input_format_allow_seeks, enabled by default). Also add setting remote_read_min_bytes_for_seek to control seeks. Closes #10461. Closes #16857. #30936 (Kseniia Sumarokova).
  • Add settings merge_tree_min_rows_for_concurrent_read_for_remote_filesystem and merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem. #30970 (Kseniia Sumarokova).
  • Do not allow to drop a table or dictionary if some tables or dictionaries depend on it. #30977 (Alexander Tokmakov).
  • Only grab AlterLock when we do alter command. Let's see if the assumption is correct. #31010 (Amos Bird).
  • The local session inside a ClickHouse dictionary source won't send its events to the session log anymore. This fixes a possible deadlock (tsan alert) on shutdown. Also this PR fixes flaky test_dictionaries_dependency_xml/. #31013 (Vitaly Baranov).
  • Cancel vertical merges when partition is dropped. This is a follow-up of https://github.com/ClickHouse/ClickHouse/pull/25684 and https://github.com/ClickHouse/ClickHouse/pull/30996. #31057 (Amos Bird).
  • Support IF EXISTS modifier for RENAME DATABASE/TABLE/DICTIONARY query. If this directive is used, one will not get an error if the DATABASE/TABLE/DICTIONARY to be renamed doesn't exist. #31081 (victorgao).
  • Function name normalization for ALTER queries. This helps avoid metadata mismatch between creating table with indices/projections and adding indices/projections via alter commands. This is a follow-up PR of https://github.com/ClickHouse/ClickHouse/pull/20174. Marked as an improvement because there are no bug reports and the scenario is rather rare. #31095 (Amos Bird).
  • Enable multiline editing in clickhouse-client by default. This addresses #31121 . #31123 (Amos Bird).
  • Use DiskPtr instead of OS's file system API in class IDiskRemote in order to get more extensibility. Closes #31117. #31136 (Yangkuan Liu).
  • Now every replica will send to client only incremental information about profile events counters. #31155 (Dmitry Novik).
    • Syntax changed so now backup engine should be set explicitly: BACKUP ... TO Disk('backups', 'path\').
    • Changed the format of backup's metadata, now it's in XML.
    • Backup of a whole database now works. #31178 (Vitaly Baranov).
  • Improved backoff for background cleanup tasks in MergeTree. Settings merge_tree_clear_old_temporary_directories_interval_seconds and merge_tree_clear_old_parts_interval_seconds moved from user settings to merge tree settings. #31180 (Alexander Tokmakov).
  • Optimize function mapContains to reading of subcolumn key with enabled setting optimize_functions_to_subcolumns. #31218 (Anton Popov).
  • If some obsolete setting is changed show warning in system.warnings. #31252 (Alexander Tokmakov).
  • Optimize function tupleElement to reading of subcolumn with enabled setting optimize_functions_to_subcolumns. #31261 (Anton Popov).
  • Initial user's roles are used now to find row policies, see #31080. #31262 (Vitaly Baranov).
  • Previously progress was shown only for numbers table function, not for numbers_mt. Now for numbers_mt it is also shown. #31318 (Kseniia Sumarokova).
  • Return a fake CREATE query when executing SHOW CREATE TABLE on system tables. #31391 (SuperDJY).
  • MaterializedMySQL now handles CREATE TABLE ... LIKE ... DDL queries. #31410 (Stig Bakken).
  • Default value of http_send_timeout and http_receive_timeout settings changed from 1800 (30 minutes) to 180 (3 minutes). #31450 (Alexander Tokmakov).
  • Throw an exception if there is some garbage after field in JSONCompactStrings(EachRow) format. #31455 (Kruglov Pavel).
  • Fix waiting for the editor during interactive query editing (waitpid() returns -1 on SIGWINCH, so EDITOR and clickhouse-local/clickhouse-client ran concurrently). #31456 (Azat Khuzhin).
  • Add --pager support for clickhouse-local. #31457 (Azat Khuzhin).
  • Better analysis for min/max/count projection. Now, with enabled allow_experimental_projection_optimization, virtual min/max/count projection can be used together with columns from partition key. #31474 (Amos Bird).
  • Use shard and replica name from Replicated database arguments when expanding macros in ReplicatedMergeTree arguments if these macros are not defined in config. Closes #31471. #31488 (Alexander Tokmakov).
  • Better exception message when users.xml cannot be loaded due to bad password hash. This closes #24126. #31557 (Vitaly Baranov).
  • Improve the max_execution_time checks. Fixed some cases when timeout checks do not happen and query could run too long. #31636 (Raúl Marín).
  • Add bindings for navigating through history (instead of lines/history). #31641 (Azat Khuzhin).
  • Always re-render prompt while navigating history in clickhouse-client. This will improve usability of manipulating very long queries that don't fit on screen. #31675 (Alexey Milovidov).
  • Allow to use named collections configuration for kafka and rabbitmq engines (the same way as for other integration table engines). #31691 (Kseniia Sumarokova).
  • ClickHouse dictionary source supports named connections. Closes #31705. #31749 (Kseniia Sumarokova).
  • MaterializedMySQL: Fix issue with table named 'table'. #31781 (Håvard Kvålen).
  • Recreate system.*_log tables in case of different engine/partition_by. #31824 (Azat Khuzhin).
  • Fix the issue that LowCardinality of Int256 cannot be created. #31832 (Alexey Milovidov).
  • Support PostgreSQL style ALTER MODIFY COLUMN. #32003 (SuperDJY).
  • Remove excessive DESC TABLE requests for remote() (in case of remote('127.1', system.one) (i.e. identifier as the db.table instead of string) there was excessive DESC TABLE request). #32019 (Azat Khuzhin).
  • Fix a bug where the opentelemetry span log duration was zero at the query level if the query threw an exception. #32038 (Frank Chen).
  • Added ClickHouse exception and exception_code fields to opentelemetry span log. #32040 (Frank Chen).
  • Allow a user configured hdfs_replication parameter for DiskHdfs and StorageHdfs. Closes #32039. #32049 (leosunli).
  • Allow to write + before Float32/Float64 values. #32079 (Kruglov Pavel).
  • Return Content-Type 'application/json' for JSONEachRow format if output_format_json_array_of_rows is enabled. #32112 (Frank Chen).
  • Now clickhouse-keeper refuses to start or apply configuration changes when they contain duplicated IDs or endpoints. Fixes #31339. #32121 (alesapin).
  • Added update_field support for RangeHashedDictionary, ComplexKeyRangeHashedDictionary. #32185 (Maksim Kita).
  • Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats. Previously we could skip only quoted strings, now we can skip values of any type. #32204 (Kruglov Pavel).
  • Use Content-Type: application/x-ndjson (http://ndjson.org/) for output format JSONEachRow. #32223 (Dmitriy Dorofeev).
  • Support default expression for storage hdfs and optimize fetching when source is column oriented. #32256 (李扬).
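The two Content-Type entries above (application/json when output_format_json_array_of_rows is enabled, application/x-ndjson otherwise) suggest how an HTTP client might choose a decoder for JSONEachRow output. The following is a minimal client-side sketch under that assumption; the function name `parse_rows` is hypothetical and is not part of any ClickHouse client library.

```python
import json
from typing import Any, Dict, List


def parse_rows(body: str, content_type: str) -> List[Dict[str, Any]]:
    """Decode a JSONEachRow response body according to its Content-Type.

    Hypothetical helper for illustration; real clients may handle more
    media types and stream the body instead of loading it at once.
    """
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        # Array-of-rows output: the whole body is a single JSON array.
        return json.loads(body)
    if media_type == "application/x-ndjson":
        # Newline-delimited JSON: one object per non-empty line.
        return [json.loads(line) for line in body.split("\n") if line.strip()]
    raise ValueError(f"unexpected Content-Type: {content_type!r}")
```

Splitting on "\n" only (rather than on every Unicode line boundary) keeps string values intact if they contain other line-separator characters.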

Bug Fix

  • Memory amount was incorrectly estimated when ClickHouse is run in containers with cgroup limits. #31157 (Pavel Medvedev).
  • Fix SHOW GRANTS when partial revokes are used. This PR fixes #31138. #31249 (Vitaly Baranov).
  • Fix the case where a quota was reported as exceeded although the limit had not been reached. This PR fixes #31174. #31337 (sunny).
  • Fix skipping columns while writing protobuf. This PR fixes #31160, see the comment #31160#issuecomment-980595318. #31988 (Vitaly Baranov).
  • Fix bug in removing unneeded columns in a subquery: if the query contains an aggregate function without GROUP BY, do not remove it even if it is unneeded. #32289 (dongyifeng).
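To illustrate why container limits matter for the memory estimate mentioned in the first entry of this list, here is a minimal sketch. It assumes a cgroup v2 style limit value, where the `memory.max` file holds either the word `max` (no limit) or a byte count. The function `effective_memory_limit` is hypothetical and does not reproduce the actual ClickHouse fix.

```python
def effective_memory_limit(host_total_bytes: int, cgroup_memory_max: str) -> int:
    """Return the memory a process may use given a cgroup v2 memory.max value.

    Hypothetical helper for illustration: "max" means the cgroup imposes
    no limit, otherwise the value is a limit in bytes.
    """
    value = cgroup_memory_max.strip()
    if value == "max":
        return host_total_bytes
    # The usable amount can never exceed the physical memory of the host.
    return min(host_total_bytes, int(value))
```

For a host with 16 GiB of RAM and a 1 GiB container limit, this returns 1 GiB rather than the host total.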

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehaviour in official stable or prestable release)

  • Fix some corner cases with intersect/except. Closes #30803. #30965 (Kseniia Sumarokova).
  • Skip max_partition_size_to_drop check in case of ATTACH PARTITION ... FROM and MOVE PARTITION ... #30995 (Amr Alaa).
  • Fix bug which broke select queries if they happened after dropping materialized view. Found in #30691. #30997 (Kseniia Sumarokova).
  • Using formatRow function with non-row formats led to segfault. Don't allow to use this function with such formats (because it doesn't make sense). #31001 (Kruglov Pavel).
  • Fix JSONValue/Query with quoted identifiers. This allows to have spaces in json path. Closes #30971. #31003 (Kseniia Sumarokova).
  • Fix possible assert in hdfs table function/engine, add test. #31036 (Kruglov Pavel).
  • Fix abort in debug server and DB::Exception: std::out_of_range: basic_string error in release server in case of bad hdfs url by adding additional check of hdfs url structure. #31042 (Kruglov Pavel).
  • Fix StorageMerge with aliases and where (it did not work before at all). Closes #28802. #31044 (Kseniia Sumarokova).
  • Rewrite right distributed table in local join. Solves #25809. #31105 (abel-cheng).
  • Fix bug in Keeper which can lead to inability to start when some coordination logs were lost and the snapshot is fresher than the latest log. #31150 (alesapin).
  • Remove not like function into RPNElement. #31169 (sundyli).
  • Resolve nullptr in STS credentials provider for S3. #31409 (Vladimir Chebotarev).
  • Fix bug with group by and positional arguments. Closes #31280#issuecomment-968696186. #31420 (Kseniia Sumarokova).
  • Fix progress for short INSERT SELECT queries. #31510 (Azat Khuzhin).
    • Disable partial_merge_join_left_table_buffer_bytes until the bug in this optimization is fixed (see #31009).
    • Remove redundant option partial_merge_join_optimizations. #31528 (Vladimir C).
  • Fix invalid generated JSON when only column names contain invalid UTF-8 sequences. #31534 (Kevin Michel).
  • All non-x86 builds were broken, because we don't have tests for them. This closes #31417. This closes #31524. #31574 (Alexey Milovidov).
  • Fix misaligned sparkbars, see #26175#issuecomment-960353867. #31624 (小路).
  • RENAME TABLE query worked incorrectly on attempt to rename a DDL dictionary in Ordinary database, it's fixed. #31638 (Alexander Tokmakov).
  • Fixed null pointer exception in MATERIALIZE COLUMN. #31679 (Vladimir Chebotarev).
  • Settings input_format_allow_errors_num and input_format_allow_errors_ratio did not work for parsing of domain types, such as IPv4, it's fixed. Fixes #31686. #31697 (Alexander Tokmakov).
  • Fixed function ngrams when string contains UTF-8 characters. #31706 (yandd).
  • Fix exception on some of the applications of decrypt function on Nullable columns. This closes #31662. This closes #31426. #31707 (Alexey Milovidov).
  • Fixed "there are no such cluster here" error on execution of ON CLUSTER query if specified cluster name is name of Replicated database. #31723 (Alexander Tokmakov).
  • Fix race in JSONEachRowWithProgress output format when data and lines with progress are mixed in output. #31736 (Kruglov Pavel).
  • Fixed rare segfault on concurrent ATTACH PARTITION queries. #31738 (Alexander Tokmakov).
  • Fix disabling query profiler (with query_profiler_real_time_period_ns>0/query_profiler_cpu_time_period_ns>0, the query profiler could stay enabled even after the query finished). #31740 (Azat Khuzhin).
  • Fix group by / order by / limit by aliases with positional arguments enabled. Closes #31173. #31741 (Kseniia Sumarokova).
  • Fix usage of Buffer table engine with type Map. Fixes #30546. #31742 (Anton Popov).
  • Fix crash with empty result on odbc query. Closes #31465. #31766 (Kseniia Sumarokova).
  • Fix crash when function dictGet with type is used for dictionary attribute when type is Nullable. Fixes #30980. #31800 (Maksim Kita).
  • Fix possible assertion ../src/IO/ReadBuffer.h:58: bool DB::ReadBuffer::next(): Assertion '!hasPendingData()' failed. in TSKV format. #31804 (Kruglov Pavel).
  • Fix recursive user defined functions crash. Closes #30856. #31820 (Maksim Kita).
  • Fix invalid cast of nullable type when nullable primary key is used. This fixes #31075. #31823 (Amos Bird).
  • Fix reading from MergeTree tables with enabled use_uncompressed_cache. #31826 (Anton Popov).
  • Fix a bug about function transform with decimal args. #31839 (Shuai li).
    • Change configuration path from keeper_server.session_timeout_ms to keeper_server.coordination_settings.session_timeout_ms when constructing a KeeperTCPHandler.
    • Same with operation_timeout. #31859 (JackyWoo).
  • Fix functions empty and notEmpty with arguments of UUID type. Fixes #31819. #31883 (Anton Popov).
  • Some GET_PART entry might hang in replication queue if part is lost on all replicas and there are no other parts in the same partition. It's fixed in cases when partition key contains only columns of integer types or Date[Time]. Fixes #31485. #31887 (Alexander Tokmakov).
  • Fix FileLog engine unnecessarily creating the metadata directory when table creation fails. Fixes #31962. #31967 (flynn).
  • MaterializedMySQL: Fix rare corruption of DECIMAL data. #31990 (Håvard Kvålen).
  • Fixed Directory ... already exists and is not empty error when detaching part. #32063 (Alexander Tokmakov).
  • Fix CREATE TABLE of Join storage with multiple settings that include persistency. Closes #31680. #32066 (SuperDJY).
  • Fix CAST from Nullable with cast_keep_nullable (previously it failed with a PARAMETER_OUT_OF_BOUND error, e.g. for toUInt32OrDefault(toNullable(toUInt32(1)))). #32080 (Azat Khuzhin).
  • Dictionaries: fix cases when {condition} does not work for custom database queries. #32117 (Maksim Kita).
  • Number of active replicas might be determined incorrectly when inserting with quorum if setting replicated_can_become_leader is disabled on some replicas. It's fixed. #32157 (Alexander Tokmakov).
  • XML dictionaries identifiers, used in table create query, can be qualified to default_database during upgrade to newer version. Closes #31963. #32187 (Maksim Kita).
  • Fix parsing error while NaN deserializing for Nullable(Float) for Quoted escaping rule. #32190 (Kruglov Pavel).
  • Fix window view parser. #32232 (vxider).
  • Server might fail to start with Cannot attach 1 tables due to cyclic dependencies error if Dictionary table looks at XML-dictionary with the same name, it's fixed. Fixes #31315. #32288 (Alexander Tokmakov).
  • Fixed crash with SIGFPE in aggregate function avgWeighted with Decimal argument. Fixes #32053. #32303 (Alexander Tokmakov).
  • Fix ALTER ... MATERIALIZE COLUMN ... queries in case when data type of default expression is not equal to the data type of column. #32348 (Anton Popov).
  • Fixed the behavior when mutations that have nothing to do are stuck (with enabled setting empty_result_for_aggregation_by_empty_set). #32358 (Nikita Mikhaylov).

Build

  • Support compiling on ARM machines with parameter "-DENABLE_TESTS=OFF". #31007 (zhanghuajie).

Improvement (changelog entry is not required)

  • Rename setting value read_threadpool to threadpool for setting remote_filesystem_read_method. #31224 (Kseniia Sumarokova).

NO CL ENTRY

NOT FOR CHANGELOG / INSIGNIFICANT