ClickHouse/docs/changelogs/v22.3.1.1262-prestable.md
2022-05-10 11:25:38 +02:00


ClickHouse release v22.3.1.1262-prestable as compared to v22.2.1.2139-prestable

Backward Incompatible Change

  • Changed overflow behavior of the toDatetime function: when the date string is very large, it will be converted to 1970. #32898 (HaiBo Li).
  • Make arrayCompact behave like other higher-order functions: perform compaction not on the lambda function results but on the original array. If you use nontrivial lambda functions in arrayCompact, you may restore the old behaviour by wrapping the arrayCompact argument in arrayMap. Closes #34010 #18535 #14778. #34795 (Alexandre Snarskii).
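
As a hedged illustration of the migration described above (a sketch only; intDiv and the sample values are illustrative, not from the changelog):

```sql
-- Post-22.3 semantics: the lambda decides which consecutive elements are
-- duplicates, but elements of the original array are returned.
SELECT arrayCompact(x -> intDiv(x, 10), [11, 12, 25, 26]);

-- To restore the pre-22.3 behaviour (compaction of the lambda results),
-- wrap the argument in arrayMap:
SELECT arrayCompact(arrayMap(x -> intDiv(x, 10), [11, 12, 25, 26])); -- [1, 2]
```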

New Feature

  • New data type Object(<schema_format>), which supports storing semi-structured data (for now, JSON only). Data is written to such types as a string. Then all paths are extracted according to the format of the semi-structured data and written as separate columns, in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. data.key1.key2, or with the cast operator: data.key1.key2::Int64. #23932 (Anton Popov).
  • Support authentication of users connected via SSL by their X.509 certificate. #31484 (eungenue).
  • Add three functions for the Map data type (related to issue #30715): 1. mapReplace(map1, map2) replaces values for keys in map1 with the values of the corresponding keys in map2 and adds keys from map2 that don't exist in map1. 2. mapFilter. 3. mapMap. mapFilter and mapMap are higher-order functions that accept two arguments: the first is a lambda function with a (k, v) pair, the second is a Map-type column. #33698 (hexiaoting).
  • Add a local cache for the S3 disk. Closes #28961. #33717 (Kseniia Sumarokova).
  • Implement DateTime64 conversion from and to Arrow columns, which closes #8280 and closes #28574. #34561 (李扬).
  • Add CPU/memory metrics for clickhouse-local. Closes #34545. #34605 (李扬).
  • Support schema inference for inserting into table functions file/hdfs/s3/url. #34732 (Kruglov Pavel).
  • New settings <allow_plaintext_password> and <allow_no_password> are added to the server configuration; they toggle the insecure authentication types plaintext_password and no_password. By default both are set to true, which means these authentication types are allowed. #34738 (Heena Bansal).
  • Add a new table function hive, used as follows: hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>'); for example: SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', '`id` Nullable(String), `score` Nullable(Int32), `day` Nullable(String)', 'day'). #34946 (lgbo).
  • When logging in with clickhouse-client, if the user and password are not specified on the command line or in the configuration file, they are taken from the CLICKHOUSE_USER and CLICKHOUSE_PASSWORD environment variables. Closes #34538. #34947 (DR).
  • Added date_time_input_format = 'best_effort_us'. Closes #34799. #34982 (WenYao).
  • Changed the Play UI to select a theme by the following priority: the 'theme' GET parameter, then 'theme' in localStorage, then the OS preference (which didn't work before). #35068 (peledni).
  • Add the database_replicated_allow_only_replicated_engine setting. When enabled, only Replicated tables are allowed to be created in a Replicated database. #35214 (Nikolai Kochetov).
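
A minimal sketch of one of the new higher-order map functions listed above, assuming the (lambda, map) argument order given in the entry; the keys and values are illustrative:

```sql
-- mapFilter keeps only the entries for which the lambda over (key, value)
-- returns a truthy result.
SELECT mapFilter((k, v) -> v > 1, map('a', 1, 'b', 2, 'c', 3));
-- Should keep only the entries with value > 1: {'b':2,'c':3}.
```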

Performance Improvement

  • Calling std::distance on a large list degrades performance, so a version field is stored in each Node instead, at the cost of more memory (about 80 MB per 10 million nodes). The logic is: every node in the list has a version of type size_t, set in the insert or insertOrReplace method from the class member variable current_version. The version is only increased in the enableSnapshot method. When taking a snapshot, snapshotSizeWithVersion is called to get the snapshot size and version (snapshot_up_to_version). When traversing the list, a node whose version is less than or equal to snapshot_up_to_version is protected from deletion; if the node's version is greater than snapshot_up_to_version, we can do anything to it. #34486 (zhanglistar).
  • Compaction of the log store in NuRaft needs to acquire an inner lock that is also used in the normal commit process, so useless logs are now deleted in the compact method of the Changelog class in a background thread. See details on: 1707a7572a/src/handle_commit.cxx (L560). #34534 (zhanglistar).
  • Don't hold the latest snapshot in memory; instead, read the snapshot from disk when needed. Sequential reading reaches 200+ MB/s even on HDD using the mmap system call. Snapshot data is written directly to disk with compression, without holding the original and compressed data in memory. #34584 (zhanglistar).
  • Improve MergeTree insert performance by replacing std::stable_sort with pdqsort. #34750 (Maksim Kita).
  • Improve the performance of the ANY aggregation function by acting over batches. #34760 (Raúl Marín).
  • Improve performance of the detectCharset and detectLanguageUnknown functions. Improve performance of DirectDictionary if the dictionary source is ClickHouse. Improve performance of processing queries with large IN sections. #34888 (Maksim Kita).
  • Reduce locking on connections by using atomic statistics. Note that the statistics are approximate. #35010 (zhanglistar).

Improvement

  • Make the znode ctime and mtime consistent between servers. #33441 (小路).
  • Hold time lock while assigning tasks to clear old temporary directories in StorageMergeTree. #34025 (Amos Bird).
  • When large files were written with s3 table function or table engine, the content type on the files was mistakenly set to application/xml due to a bug in the AWS SDK. This closes #33964. #34433 (Alexey Milovidov).
  • Improve schema inference with globs in the File/S3/HDFS/URL engines. Try the next path for schema inference in case of an error. #34465 (Kruglov Pavel).
  • Improve the OpenTelemetry span logs for INSERT operations on distributed tables. #34480 (Frank Chen).
  • MaterializedMySQL supports the materialized_mysql_tables_list setting (a comma-separated list of MySQL database tables to be replicated by the MaterializedMySQL database engine; default value: empty list, which means all tables will be replicated), mentioned at #32977. #34487 (zzsmdfj).
  • This PR changes restrictive row policies a bit to make them an easier alternative to permissive policies in simple cases. If only restrictive policies exist for a particular table (without permissive policies), users will be able to see some rows. Also, SHOW CREATE ROW POLICY will always show AS permissive or AS restrictive in the row policy's definition. #34596 (Vitaly Baranov).
  • Add the encodeURLComponent and encodeURLFormComponent functions. Closes #31092. #34607 (zzsmdfj).
  • Now you can read the system.zookeeper table without restrictions on path, or using a LIKE expression. Such reads can generate quite a heavy load on ZooKeeper, so to enable this ability you have to enable the setting allow_unrestricted_reads_from_keeper. #34609 (Sergei Trifonov).
  • Some refactoring and improvements of asynchronous and remote buffer related code, separated into individual commits. #34629 (Amos Bird).
  • ExecutableUserDefinedFunctions now allow specifying argument names. This is necessary for formats where the argument name is part of the serialization, like Native or JSONEachRow. Closes #34604. #34653 (Maksim Kita).
  • Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. #34684 (Kruglov Pavel).
  • Do not reset logging configured via --log-file/--errorlog-file in case of an empty logger.log/logger.errorlog. #34718 (Amos Bird).
  • Support remote()/cluster() for parallel_distributed_insert_select=2. #34728 (Azat Khuzhin).
  • Add name hints for data skipping indices. Closes #29698. #34764 (flynn).
  • Now ALTER TABLE DROP COLUMN columnX queries for MergeTree table engines will work instantly when columnX is ALIAS column. Fixes #34660. #34786 (alesapin).
  • In previous versions, the progress bar in clickhouse-client could jump forward to near 50% for no reason. This closes #34324. #34801 (Alexey Milovidov).
  • Fix reading only columns which user asked for. Closes #34163. #34849 (Kseniia Sumarokova).
  • Implement MemoryStatisticsOS for FreeBSD. #34902 (Alexandre Snarskii).
  • Allow to open empty sqlite db file if it does not exist. Closes #33367. #34907 (Kseniia Sumarokova).
  • Allow LowCardinality strings for ngrambf_v1/tokenbf_v1 indexes. Closes #21865. #34911 (Lars Hiller Eidnes).
  • Ignore per-column TTL in CREATE TABLE AS if new table engine does not support it (i.e. if the engine is not of MergeTree family). #34938 (Azat Khuzhin).
  • Use connection pool for hive metastore client. #34940 (lgbo).
  • Currently, if the user changes the settings of the system tables there will be tons of logs and ClickHouse will rename the tables every minute. This fixes #34929. #34949 (Nikita Mikhaylov).
  • Remove unnecessary columns when reading Parquet/ORC files. #34954 (lgbo).
  • For the random-access read buffer in Hive, the first read now uses the original read buffer instead of the local file. When reading a Parquet/ORC file, the read buffer seeks to the end of the file, which blocks until the local file finishes downloading and makes the whole process slow. #34957 (lgbo).
  • Add more sanity checks for the Keeper configuration: mixing localhost and non-local servers is no longer allowed, and the internal Raft port and the Keeper client port must differ. #35004 (alesapin).
  • Functions dictGetHierarchy, dictIsIn, dictGetChildren, dictGetDescendants support implicit key cast and constant arguments. Closes #34970. #35027 (Maksim Kita).
  • Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes #34787. #35032 (Alexey Milovidov).
  • Avoid possible MEMORY_LIMIT_EXCEEDED during INSERT into Buffer with AggregateFunction. #35072 (Azat Khuzhin).
  • Support view() for parallel_distributed_insert_select. #35132 (Azat Khuzhin).
  • Add setting to lower column case when reading parquet/ORC file. #35145 (shuchaome).
  • Do not retry non-retriable errors. Closes #35161. #35172 (Kseniia Sumarokova).
  • Added disk_name to system.part_log. #35178 (Artyom Yurkov).
  • Currently, ClickHouse validates hosts defined under <remote_url_allow_hosts> for the URL and remote table functions. This PR extends the RemoteHostFilter to the MySQL and PostgreSQL table functions. #35191 (Heena Bansal).
  • Sometimes it is not enough to distinguish the query hierarchy only by is_initial_query in system.query_log and system.processes, so distributed_depth is added. #35207 (李扬).
  • Support test mode for clickhouse-local. #35264 (Kseniia Sumarokova).
  • Return const for function getMacro if not in distributed query. Close #34727. #35289 (李扬).
  • Reload remote_url_allow_hosts after config update. #35294 (Nikolai Kochetov).
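
The unrestricted system.zookeeper reads mentioned above can be sketched as follows (assuming a configured ZooKeeper/Keeper; the path is illustrative):

```sql
-- Must be enabled explicitly, since such scans put a heavy load on Keeper.
SET allow_unrestricted_reads_from_keeper = 1;

-- Previously an exact `path = '...'` condition was required; now a LIKE
-- expression (or no path filter at all) is also accepted.
SELECT name, path FROM system.zookeeper WHERE path LIKE '/clickhouse/%';
```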

Bug Fix

  • Ignore obsolete grants in ATTACH GRANT statements. This PR fixes #34815. #34855 (Vitaly Baranov).
  • When the inner read buffer's size is too small, NEED_MORE_INPUT in HadoopSnappyDecoder will run multiple times (>=3) for one compressed block, which makes the input data be copied into the wrong place in HadoopSnappyDecoder::buffer. #35116 (lgbo).

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehaviour in official stable or prestable release)

  • Fix distributed subquery max_query_size limitation inconsistency. #34078 (Chao Ma).
  • Fix incorrect trivial count result when part movement feature is used #34089. #34385 (nvartolomei).
  • Stop selecting a part for mutation when another replica has already updated the /log for the ReplicatedMergeTree engine. #34633 (Jianmei Zhang).
  • Fix allow_experimental_projection_optimization with enable_global_with_statement (before, it could lead to a Stack size too large error in case of multiple expressions in the WITH clause, and it also executed scalar subqueries again and again, so now it will be more optimal). #34650 (Azat Khuzhin).
  • Fix serialization/printing for system queries RELOAD MODEL, RELOAD FUNCTION, RESTART DISK when used ON CLUSTER. Closes #34514. #34696 (Maksim Kita).
  • Fix ENOENT with fsync_part_directory and Vertical merge. #34739 (Azat Khuzhin).
  • Fix a bug in H3 functions with const columns that caused queries to fail. #34743 (Bharat Nallan).
  • Fix possible failures in S2 functions when queries contain const columns. #34745 (Bharat Nallan).
  • Fix bugs with GROUP BY on multiple columns in WindowView. #34859 (vxider).
  • Support executing DDLs like CREATE USER on a cross-replicated cluster. #34860 (Jianmei Zhang).
  • Fix asynchronous inserts to table functions. Fixes #34864. #34866 (Anton Popov).
  • Fix possible "Part directory doesn't exist" during INSERT. #34876 (Azat Khuzhin).
  • Fix postgres datetime64 conversion. Closes #33364. #34910 (Kseniia Sumarokova).
  • Avoid busy polling in keeper while searching for changelog files to delete. #34931 (Azat Khuzhin).
  • Fix an unexpected result when using IN in a WHERE clause in a Hive query. #34945 (lgbo).
  • Fix wrong schema inference for unquoted dates in CSV. Closes #34768. #34961 (Kruglov Pavel).
  • Fix possible rare error Cannot push block to port which already has data. Avoid pushing to port with data inside DelayedSource. #34993 (Nikolai Kochetov).
  • Fix possible segfault in filelog. Closes #30749. #34996 (Kseniia Sumarokova).
  • Fix an unexpected result when using a -State aggregate function in a window frame. #34999 (metahys).
  • Fix possible exception Reading for MergeTree family tables must be done with last position boundary. Closes #34979. #35001 (Kseniia Sumarokova).
  • Fix reading from system.asynchronous_inserts table if there exists asynchronous insert into table function. #35050 (Anton Popov).
  • Fix missing alias after function is optimized to subcolumn when setting optimize_functions_to_subcolumns is enabled. Closes #33798. #35079 (qieqieplus).
  • Avoid possible deadlock on server shutdown. #35081 (Azat Khuzhin).
  • Fixed the "update_lag" external dictionary configuration option being unusable with the error message Unexpected key `update_lag` in dictionary source configuration. #35089 (Jason Chu).
  • Fix issue #31469. #35118 (zzsmdfj).
  • Fix optimize_skip_unused_shards_rewrite_in for signed columns and negative values. #35134 (Azat Khuzhin).
  • Fixed incorrect translation of YAML config to XML. #35135 (Miel Donkers).
  • Fix partition pruning error when non-monotonic function is used with IN operator. This fixes #35136. #35146 (Amos Bird).
  • Fix MaterializedPostgreSQL adding a new table to replication (ATTACH TABLE) after it was manually removed (DETACH TABLE). Closes #33800. Closes #34922. Closes #34315. #35158 (Kseniia Sumarokova).
  • Fix MaterializedPostgreSQL table overrides for PARTITION BY, etc. Closes #35048. #35162 (Kseniia Sumarokova).
  • Schema inference didn't work properly in the case of INSERT INTO FUNCTION s3(...) FROM ...: it tried to read the schema from the s3 file instead of from the SELECT query. #35176 (Kruglov Pavel).
  • Fix error in query with WITH TOTALS in case if HAVING returned empty result. This fixes #33711. #35186 (Amos Bird).
  • Make the functions cast(value, 'IPv4') and cast(value, 'IPv6') behave the same as the toIPv4 and toIPv6 functions. Changed behavior for incorrect IP addresses passed into the functions toIPv4 and toIPv6: an exception is now raised for an invalid IP address, whereas these functions previously returned a default value. Added functions IPv4StringToNumOrDefault, IPv4StringToNumOrNull, IPv6StringToNumOrDefault, IPv6StringToNumOrNull, toIPv4OrDefault, toIPv4OrNull, toIPv6OrDefault, toIPv6OrNull. The functions IPv4StringToNumOrDefault, toIPv4OrDefault, and toIPv6OrDefault should be used if previous logic relied on IPv4StringToNum, toIPv4, and toIPv6 returning a default value for an invalid address. Added the setting cast_ipv4_ipv6_default_on_conversion_error; if this setting is enabled, the IP address conversion functions will behave as before. Closes #22825. Closes #5799. Closes #35156. #35240 (Maksim Kita).
  • Wait for IDiskRemote thread pool properly. #35257 (Azat Khuzhin).
  • Fix CHECK TABLE query in case when sparse columns are enabled in table. #35274 (Anton Popov).
  • Fix possible Abort while using Brotli compression with a small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35281 (Kruglov Pavel).
  • Fix possible segfault in JSONEachRow schema inference. #35291 (Kruglov Pavel).
  • Fix possible Assertion 'position() != working_buffer.end()' failed while using lzma compression with small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35295 (Kruglov Pavel).
  • Fix possible segfault while using lz4 compression with a small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35296 (Kruglov Pavel).
  • Fix possible Assertion 'position() != working_buffer.end()' failed while using bzip2 compression with small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35300 (Kruglov Pavel).
  • Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes #35312. #35313 (Kseniia Sumarokova).
  • Fix bug in S3 zero-copy replication which can lead to errors like Found parts with the same min block and with the same max block as the missing part after concurrent fetch/drop table. #35348 (alesapin).
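
The IP-conversion behaviour change in this section can be sketched as follows (a hedged example; the expected results follow the entry's description of the new functions):

```sql
-- toIPv4('not an ip') now raises an exception instead of returning a default.
-- The new variants keep the non-throwing behaviour explicit:
SELECT toIPv4OrNull('not an ip');     -- NULL
SELECT toIPv4OrDefault('not an ip');  -- 0.0.0.0

-- Or restore the old, non-throwing behaviour for conversions:
SET cast_ipv4_ipv6_default_on_conversion_error = 1;
```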

NO CL ENTRY