ClickHouse release v20.10.1.4881-prestable FIXME as compared to v20.9.1.4585-prestable

Backward Incompatible Change

  • Add support for nested multiline comments /* comment /* comment */ */ in SQL. This conforms to the SQL standard. #14655 (Alexey Milovidov).
  • Change default value of format_regexp_escaping_rule setting (it's related to Regexp format) to Raw (meaning the whole subpattern is read as a value) to make the behaviour closer to what users expect. #15426 (Alexey Milovidov).
  • Make multiple_joins_rewriter_version obsolete. Remove first version of joins rewriter. #15472 (Artem Zuikov).
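
To illustrate what nested multiline comments mean for a parser, here is a minimal Python sketch. It is not ClickHouse's actual parser: the function name `strip_nested_comments` is hypothetical, and the sketch deliberately ignores string literals and `--` line comments. It only shows the depth-counting idea, where each `/*` opens a level and each `*/` closes one.

```python
def strip_nested_comments(sql: str) -> str:
    """Remove /* ... */ comments, honouring nesting (illustrative only)."""
    out = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(sql):
        pair = sql[i:i + 2]
        if pair == "/*":
            depth += 1          # a nested opener increases the depth
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1          # a closer only ends one level
            i += 2
        else:
            if depth == 0:      # keep characters outside any comment
                out.append(sql[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_nested_comments("SELECT /* comment /* comment */ */ 1"))
```

A parser without nesting support would end the comment at the first `*/` and leave the trailing `*/` in the query text, which is why this change is listed as backward incompatible.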

New Feature

  • Supporting MySQL types: decimal (as ClickHouse Decimal) and datetime with sub-second precision (as DateTime64). ... #11512 (Vasily Nemkov).
  • Allow to turn on fsync on inserts, merges and fetches. #11948 (Anton Popov).
  • Secure inter-cluster query execution (with initial_user as current query user). #13156 (Azat Khuzhin).
  • Allow user to specify settings for ReplicatedMergeTree* storage in <replicated_merge_tree> section of config file. It works similarly to <merge_tree> section. For ReplicatedMergeTree* storages settings from <merge_tree> and <replicated_merge_tree> are applied together, but settings from <replicated_merge_tree> have higher priority. Added system.replicated_merge_tree_settings table. #13573 (Amos Bird).
  • Add new feature: format LineAsString that accepts a sequence of lines separated by newlines, spaces and/or commas. #13846 (hexiaoting).
  • New query complexity limit settings max_rows_to_read_leaf, max_bytes_to_read_leaf for distributed queries to limit max rows/bytes read on the leaf nodes. Limit is applied for local reads only, excluding the final merge stage on the root node. #14221 (Roman Khavronenko).
  • Add JSONStrings formats which output data in arrays of strings. #14333 (hcz).
  • Now insert statements can have asterisk (or variants) with column transformers in the column list. #14453 (Amos Bird).
  • Added a script to import git repository to ClickHouse. #14471 (Alexey Milovidov).
  • Add the ability to specify TTL ... RECOMPRESS codec_name for MergeTree table engines family. #14494 (alesapin).
  • Add event_time_microseconds to system.asynchronous_metric_log & system.metric_log tables. #14514 (Bharat Nallan).
  • Add new feature: SHOW DATABASES LIKE 'xxx'. #14521 (hexiaoting).
  • Support decimal data type for MaterializedMySQL. #14535 (Winter Zhang).
  • Allow configurable NULL representation for TSV output format. It is controlled by the setting output_format_tsv_null_representation which is \N by default. This closes #9375. Note that the setting only controls output format and \N is the only supported NULL representation for TSV input format. #14586 (Kruglov Pavel).
  • Add new feature: format LineAsString that accepts a sequence of lines separated by newlines, spaces and/or commas. #14703 (Nikita Mikhaylov).
  • Added formatReadableQuantity function. It makes big numbers easier for humans to read. #14725 (Artem Hnilov).
  • Add the ability to remove column properties and table TTLs. Introduced queries ALTER TABLE MODIFY COLUMN col_name REMOVE what_to_remove and ALTER TABLE REMOVE TTL. Both operations are lightweight and executed at the metadata level. #14742 (alesapin).
  • Introduce event_time_microseconds field to system.text_log, system.trace_log, system.query_log and system.query_thread_log tables. #14760 (Bharat Nallan).
  • Now we support WITH <identifier> AS (subquery) ... to introduce named subqueries in the query context. This closes #2416. This closes #4967. #14771 (Amos Bird).
  • Allow to omit arguments for Replicated table engine if defaults are specified in config. #14791 (vxider).
  • Add table function null('structure'). #14797 (vxider).
  • Added query obfuscation tool. It allows to share more queries for better testing. This closes #15268. #15321 (Alexey Milovidov).
  • Added format RawBLOB. It is intended for inputting or outputting a single value without any escaping or delimiters. This closes #15349. #15364 (Alexey Milovidov).
  • fix 15350. #15443 (flynn).
  • Introduce enable_global_with_statement setting which propagates the first select's WITH statements to other select queries at the same level, and makes aliases in WITH statements visible to subqueries. #15451 (Amos Bird).
  • Add the reinterpretAsUUID function that allows to convert a big-endian byte string to UUID. #15480 (Alexander Kuzmenkov).
  • Add parallel quorum inserts. This closes #15601. #15601 (Latysheva Alexandra).
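
As a rough illustration of the "big-endian byte string to UUID" conversion mentioned for reinterpretAsUUID, the following Python sketch uses the standard `uuid` module, whose `UUID(bytes=...)` constructor reads 16 bytes in big-endian order. The helper name `bytes_to_uuid_big_endian` is hypothetical, and rejecting inputs that are not exactly 16 bytes is an assumption of this sketch, not a statement about how ClickHouse treats such inputs.

```python
import uuid


def bytes_to_uuid_big_endian(raw: bytes) -> uuid.UUID:
    """Interpret 16 bytes as a UUID, most significant byte first."""
    if len(raw) != 16:
        # Assumption of this sketch: only exact 16-byte inputs are accepted.
        raise ValueError("expected exactly 16 bytes")
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    print(bytes_to_uuid_big_endian(bytes(range(16))))
```

The first byte of the input becomes the first two hex digits of the textual UUID, which is what big-endian ordering implies.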

Performance Improvement

  • Enable compact parts by default for small parts. This will allow to process frequent inserts slightly more efficiently (4..100 times). #11913 (Alexey Milovidov).
  • Improve performance of 256-bit types using (u)int64_t as base type for wide integers. Original wide integers use 8-bit types as base. #14859 (Artem Zuikov).
  • Only mlock code segment when starting clickhouse-server. In previous versions, all mapped regions were locked in memory, including debug info. Debug info is usually split into a separate file, but if it isn't, this led to +2..3 GiB memory usage. #14929 (Alexey Milovidov).
  • We used to choose fixed key method to group by one fixed string. It's unnecessary since we have StringHashTable which does a similar packedFix optimization for FixedString columns. And we should use low_cardinality_key_fixed_string if possible. #15034 (Amos Bird).
  • Fix DateTime <op> DateTime mistakenly choosing the slow generic implementation. This fixes #15153. #15178 (Amos Bird).
  • Use one S3 DeleteObjects request instead of multiple DeleteObject requests in a loop. No functional changes, so covered by existing tests like integration/test_log_family_s3. #15238 (ianton-ru).
  • Faster 256-bit multiplication. #15418 (Artem Zuikov).
  • Improve quantileTDigest performance. This fixes #2668. #15542 (Kruglov Pavel).
  • Explicitly use a temporary disk to store vertical merge temporary data. #15639 (Pervakov Grigorii).

Improvement

  • When duplicate block is written to replica where it does not exist locally (has not been fetched from replicas), don't ignore it and write locally to achieve the same effect as if it was successfully replicated. #11684 (Alexey Milovidov).
  • Support custom codecs in compact parts. #12183 (Anton Popov).
  • Now joinGet supports multi-key lookup. Continuation of #12418. #13015 (Amos Bird).
  • For INSERTs with inline data in VALUES format, support semicolon as the data terminator, in addition to the new line. Closes #12288. #13192 (Alexander Kuzmenkov).
  • SYSTEM RELOAD CONFIG now throws an exception if failed to reload and continues using the previous users.xml. The background periodic reloading also continues using the previous users.xml if failed to reload. #14492 (Vitaly Baranov).
  • Add an option to skip access checks for DiskS3. #14497 (Pavel Kovalenko).
  • ClickHouse treats partition expr and key expr differently. Partition expr is used to construct a minmax index containing related columns, while primary key expr is stored as an expr. Sometimes a user might partition a table at coarser levels, such as partition by i / 1000. However, binary operators are not monotonic and this PR tries to fix that. It might also benefit other use cases. #14513 (Amos Bird).
  • Fix some trailing whitespaces in query format. #14595 (Azat Khuzhin).
  • Add QueryMemoryLimitExceeded event. This closes #14589. #14647 (fastio).
  • Fixed the backward-incompatible change by providing the options to build without debug info for functions. #14657 (Mike Kot).
  • Dynamically reload ZooKeeper config. #14678 (sundyli).
  • Allow parallel execution of distributed DDL. #14684 (Azat Khuzhin).
  • Fix potential memory leak caused by zookeeper exists watch. #14693 (hustnn).
  • Fixed "Packet payload is not fully read" error in MaterializeMySQL database engine. #14696 (BohuTANG).
  • Fix crash in bitShiftLeft() when called with negative big integer. #14697 (Artem Zuikov).
  • Add merge_algorithm to system.merges table to improve merging inspections. #14705 (Amos Bird).
  • Less unneeded code generated by DecimalBinaryOperation template in FunctionBinaryArithmetic. #14743 (Artem Zuikov).
  • Now columns can be used to wrap over a list of columns and apply column transformers afterwards. #14775 (Amos Bird).
  • Support disabling persistency for StorageJoin and StorageSet. This feature is controlled by the setting disable_set_and_join_persistency. This resolves #6318. #14776 (vxider).
  • Construct query_start_time and query_start_time_microseconds from the same timespec. #14831 (Bharat Nallan).
  • Allow using multi-volume storage configuration in storage Distributed. #14839 (Pavel Kovalenko).
  • Show subqueries for SET and JOIN in EXPLAIN result. #14856 (Nikolai Kochetov).
  • Provide a load_balancing_first_offset query setting to explicitly state what the first replica is. It's used together with FIRST_OR_RANDOM load balancing strategy, which allows to control replicas workload. #14867 (Amos Bird).
  • Fixed excessive settings constraint violation when running SELECT with SETTINGS from a distributed table. #14876 (Amos Bird).
  • Allow to drop Replicated table if previous drop attempt failed due to ZooKeeper session expiration. This fixes #11891. #14926 (Alexey Milovidov).
  • Avoid deadlock when executing INSERT SELECT into itself from a table with TinyLog or Log table engines. This closes #6802. #14962 (Alexey Milovidov).
  • Ignore key constraints when doing mutations. Without this PR, it's not possible to do mutations when force_index_by_date = 1 or force_primary_key = 1. #14973 (Amos Bird).
  • Add option to disable TTL move on data part insert. #15000 (Pavel Kovalenko).
  • Enable Atomic database engine by default. #15003 (Alexander Tokmakov).
  • Proper exception message for wrong number of arguments of CAST. This closes #13992. #15029 (Alexey Milovidov).
  • Add the ability to specify specialized codecs like Delta, T64, etc. for columns with subtypes. Implements #12551, fixes #11397, fixes #4609. #15089 (alesapin).
  • Added optimize setting to EXPLAIN PLAN query. If enabled, query plan level optimisations are applied. Enabled by default. #15201 (Nikolai Kochetov).
  • Do not allow connections to ClickHouse server until all scripts in /docker-entrypoint-initdb.d/ are executed. #15244 (Aleksei Kozharin).
  • fix 15264. #15285 (flynn).
  • Unfold {database}, {table} and {uuid} macros in zookeeper_path on replicated table creation. Do not allow RENAME TABLE if it may break zookeeper_path after server restart. Fixes #6917. #15348 (Alexander Tokmakov).
  • Add support for "Raw" column format for Regexp format. It allows to simply extract subpatterns as a whole without any escaping rules. #15363 (Alexey Milovidov).
  • Now it's possible to change the type of version column for VersionedCollapsingMergeTree with ALTER query. #15442 (alesapin).
  • Wait for DROP/DETACH TABLE to actually finish if NO DELAY or SYNC is specified for Atomic database. #15448 (Alexander Tokmakov).
  • Pass through *_for_user settings via Distributed with cluster-secure. #15551 (Azat Khuzhin).
  • Use experimental pass manager by default. #15608 (Daniel Kutenin).
  • Implement force_data_skipping_indices setting. #15642 (Azat Khuzhin).

Bug Fix

  • Fix currentDatabase() function not being usable in ON CLUSTER DDL queries. #14211 (Winter Zhang).
  • Fixed the incorrect sorting order of Nullable column. This fixes #14344. #14495 (Nikita Mikhaylov).
  • Fix executable dictionary source hang. In previous versions, when using some formats (e.g. JSONEachRow) data was not fed to a child process until it had output at least something. This closes #1697. This closes #2455. #14525 (Alexey Milovidov).
  • Fix a bug when converting Nullable String to Enum. Introduced by https://github.com/ClickHouse/ClickHouse/pull/12745. This fixes #14435. #14530 (Amos Bird).
  • Fix rare segfaults in functions with combinator -Resample, which could appear in result of overflow with very large parameters. #14562 (Anton Popov).
  • Cleanup data directory after Zookeeper exceptions during CreateQuery for StorageReplicatedMergeTree Engine. #14563 (Bharat Nallan).
  • Added the checker as neither calling lc->isNullable() nor calling ls->getDictionaryPtr()->isNullable() would return the correct result. #14591 (Mike Kot).
  • Fix wrong Decimal multiplication result caused by wrong decimal scale of result column. #14603 (Artem Zuikov).
  • Stuff the query into ASTFunction's argument list so that we don't break the presumptions of some AST visitors. This fixes #14608. #14611 (Amos Bird).
  • Fix bug when ALTER UPDATE mutation with Nullable column in assignment expression and constant value (like UPDATE x = 42) leads to incorrect value in column or segfault. Fixes #13634, #14045. #14646 (alesapin).
  • Fixed missed default database name in metadata of materialized view when executing ALTER ... MODIFY QUERY. #14664 (Alexander Tokmakov).
  • Replace column transformer should replace identifiers with cloned ASTs. This fixes #14695. #14734 (Amos Bird).
  • Fix wrong monotonicity detection for shrunk Int -> Int cast of signed types. It might lead to incorrect query result. This bug is unveiled in #14513. #14783 (Amos Bird).
  • Fix unreleased bug for LineAsString Format. #14842 (hexiaoting).
  • Fix a problem where the server may get stuck on startup while talking to ZooKeeper, if the configuration files have to be fetched from ZK (using the from_zk include option). This fixes #14814. #14843 (Alexander Kuzmenkov).
  • Fix rare error in SELECT queries when the queried column has a DEFAULT expression which depends on another column that also has DEFAULT, is not present in the select query, and does not exist on disk. Partially fixes #14531. #14845 (alesapin).
  • Fixed bug in parsing MySQL binlog events, which causes Attempt to read after eof and Packet payload is not fully read in MaterializeMySQL database engine. #14852 (Winter Zhang).
  • Fixed segfault in CacheDictionary #14837. #14879 (Nikita Mikhaylov).
  • Fix SIGSEGV for an attempt to INSERT into StorageFile(fd). #14887 (Azat Khuzhin).
  • Fix the issue when some invocations of extractAllGroups function may trigger "Memory limit exceeded" error. This fixes #13383. #14889 (Alexey Milovidov).
  • Fixed .metadata.tmp File exists error when using MaterializeMySQL database engine. #14898 (Winter Zhang).
  • Publish CPU frequencies per logical core in system.asynchronous_metrics. This fixes #14923. #14924 (Alexander Kuzmenkov).
  • Fix to make predicate push down work when subquery contains finalizeAggregation function. Fixes #14847. #14937 (filimonov).
  • Update jemalloc to fix possible issues with percpu arena. #14957 (Azat Khuzhin).
  • Now settings number_of_free_entries_in_pool_to_execute_mutation and number_of_free_entries_in_pool_to_lower_max_size_of_merge can be equal to background_pool_size. #14975 (alesapin).
  • Fix crash in RIGHT or FULL JOIN with join_algorithm='auto' when memory limit is exceeded and we should change HashJoin to MergeJoin. #15002 (Artem Zuikov).
  • Fixed Cannot rename ... errno: 22, strerror: Invalid argument error on DDL query execution in Atomic database when running clickhouse-server in docker on Mac OS. #15024 (Alexander Tokmakov).
  • If function bar was called with specifically crafted arguments, buffer overflow was possible. This closes #13926. #15028 (Alexey Milovidov).
  • We already use padded comparison between String and FixedString (https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/FunctionsComparison.h#L333). This PR applies the same logic to field comparison which corrects the usage of FixedString as primary keys. This fixes #14908. #15033 (Amos Bird).
  • Fixes Data compressed with different methods in join_algorithm='auto'. Keep LowCardinality as type for left table join key in join_algorithm='partial_merge'. #15088 (Artem Zuikov).
  • Adjust decimals field size in mysql column definition packet. #15152 (maqroll).
  • Fix bug in table engine Buffer which doesn't allow to insert data of new structure into Buffer after ALTER query. Fixes #15117. #15192 (alesapin).
  • Fix instance crash when using joinGet with LowCardinality types. This fixes #15214. #15220 (Amos Bird).
  • Fix 'Unknown identifier' in GROUP BY when query has JOIN over Merge table. #15242 (Artem Zuikov).
  • Fix MSan report in QueryLog. Uninitialized memory can be used for the field memory_usage. #15258 (Alexey Milovidov).
  • Fix hang of queries with a lot of subqueries to the same table of MySQL engine. Previously, if there were more than 16 subqueries to the same MySQL table in a query, it hung forever. #15299 (Anton Popov).
  • Fix rare race condition on server startup when system.logs are enabled. #15300 (alesapin).
  • Fix race condition during MergeTree table rename and background cleanup. #15304 (alesapin).
  • Fix bug where queries like SELECT toStartOfDay(today()) fail complaining about empty time_zone argument. #15319 (Bharat Nallan).
  • Fixed compression in S3 storage. #15376 (Vladimir Chebotarev).
  • Fix multiple occurrences of column transformers in a select query. #15378 (Amos Bird).
  • Fix exception (no query context) thrown when attaching a MySQL database engine. Fixes #15365. #15384 (Winter Zhang).
  • Report proper error when the second argument of boundingRatio aggregate function has a wrong type. #15407 (detailyang).
  • Fix bug with event subscription in DDLWorker which rarely may lead to query hangs in ON CLUSTER. Introduced in #13450. #15477 (alesapin).
  • Throw an error when a single parameter is passed to ReplicatedMergeTree instead of ignoring it. #15516 (nvartolomei).
  • Fix Missing columns errors when selecting columns which are absent in data, but depend on other columns which are also absent in data. Fixes #15530. #15532 (alesapin).
  • Fix bug when ILIKE operator stops being case insensitive if LIKE with the same pattern was executed. #15536 (alesapin).
  • Mutation might hang waiting for some non-existent part after MOVE or REPLACE PARTITION or, in rare cases, after DETACH or DROP PARTITION. It's fixed. #15537 (Alexander Tokmakov).
  • Fix 'Database doesn't exist.' in queries with IN and Distributed table when there's no database on initiator. #15538 (Artem Zuikov).
  • Significantly reduce memory usage in AggregatingInOrderTransform/optimize_aggregation_in_order. #15543 (Azat Khuzhin).
  • Prevent the possibility of error message Could not calculate available disk space (statvfs), errno: 4, strerror: Interrupted system call. This fixes #15541. #15557 (Alexey Milovidov).
  • Query is finished faster in case of exception. Cancel execution on remote replicas if exception happens. #15578 (Azat Khuzhin).
  • Fixed Element ... is not a constant expression error when using JSON* function result in VALUES, LIMIT or right side of IN operator. #15589 (Alexander Tokmakov).
  • Fix the order of destruction for resources in ReadFromStorage step of query plan. It might cause crashes in rare cases. Possibly connected with #15610. #15645 (Nikolai Kochetov).
  • Proper error handling during insert into MergeTree with S3. #15657 (Pavel Kovalenko).
  • Fix race condition in AMQP-CPP. #15667 (alesapin).
  • Fix rare race condition in dictionaries and tables from MySQL. #15686 (alesapin).
  • Fixed too low default value of max_replicated_logs_to_keep setting, which might cause replicas to become lost too often. Improve lost replica recovery process by choosing the most up-to-date replica to clone. Also do not remove old parts from lost replica, detach them instead. #15701 (Alexander Tokmakov).
  • Fix error Cannot find column which may happen at insertion into MATERIALIZED VIEW if the query for the MV contains ARRAY JOIN. #15717 (Nikolai Kochetov).
  • Fix some cases of queries, in which only virtual columns are selected. Previously Not found column _nothing in block exception may be thrown. Fixes #12298. #15756 (Anton Popov).

Build/Testing/Packaging Improvement

  • Control CI builds configuration from the ClickHouse repository. #14547 (alesapin).
  • Now ClickHouse uses gcc-10 for the release build. Fixes #11138. #14609 (alesapin).
  • Attempt to make performance test more reliable. It is done by remapping the executable memory of the process on the fly with madvise to use transparent huge pages - it can lower the number of iTLB misses which is the main source of instabilities in performance tests. #14685 (Alexey Milovidov).
  • In CMake files: moved some options' descriptions' parts to comments above; replaced 0 -> OFF, 1 -> ON in options default values; added some descriptions and links to docs to the options; replaced FUZZER option (there is another option ENABLE_FUZZING which also enables same functionality); removed ENABLE_GTEST_LIBRARY option as there is ENABLE_TESTS. #14711 (Mike Kot).
  • Speed up build a little by removing unused headers. #14714 (Alexey Milovidov).
  • Fix build failure in OSX. #14761 (Winter Zhang).
  • Attempt to speed up build a little. #14808 (Alexey Milovidov).
  • Now we use clang-11 to build ClickHouse in CI. #14846 (alesapin).
  • Fix unstable MaterializeMySQL empty transaction test case found in CI. Fixes #14809. #14854 (Winter Zhang).
  • Reformat and cleanup code in all integration test *.py files. #14864 (Bharat Nallan).
  • Fixing tests/integration/test_distributed_over_live_view/test.py. #14892 (vzakaznikov).
  • Switch from clang-tidy-10 to clang-tidy-11. #14922 (Alexey Milovidov).
  • Convert to python3. This closes #14886. #15007 (Azat Khuzhin).
  • Make performance test more stable and representative by splitting test runs and profile runs. #15027 (Alexey Milovidov).
  • Maybe fix MSan report in base64 (on servers with AVX-512). This fixes #14006. #15030 (Alexey Milovidov).
  • Don't allow any C++ translation unit to build for more than 10 minutes or to use more than 10 GB of memory. This fixes #14925. #15060 (Alexey Milovidov).
  • Now all test images use llvm-symbolizer-11. #15069 (alesapin).
  • Split huge test test_dictionaries_all_layouts_and_sources into smaller ones. #15110 (Nikita Mikhaylov).
  • Added a script to perform hardware benchmark in a single command. #15115 (Alexey Milovidov).
  • Fix CMake options forwarding in fast test script. Fixes error in #14711. #15155 (alesapin).
  • Improvements in CI docker images: get rid of ZooKeeper and single script for test configs installation. #15215 (alesapin).
  • Now we use clang-11 for production ClickHouse build. #15239 (alesapin).
  • Allow to run AArch64 version of clickhouse-server without configs. This facilitates #15174. #15266 (Alexey Milovidov).
  • Fail early in functional tests if server failed to respond. This closes #15262. #15267 (Alexey Milovidov).
  • fix bug for build error: #15272. #15297 (hexiaoting).
  • fix bug for building query_db_generator.cpp. #15353 (hexiaoting).
  • Allow to build with llvm-11. #15366 (Alexey Milovidov).
  • Switch binary builds (Linux, Darwin, AArch64, FreeBSD) to clang-11. #15622 (Ilya Yatsishin).
  • Fix build of one miscellaneous example tool on Mac OS. Note that we don't build examples on Mac OS in our CI (we build only ClickHouse binary), so there is zero chance it will not break again. This fixes #15804. #15808 (Alexey Milovidov).

Other

NO CL ENTRY