2023 Changelog

ClickHouse release v23.6.1.1524-stable (d1c7e13d08) as compared to v23.5.1.3174-stable (2fec796e73)

Backward Incompatible Change

New Feature

  • Add setting session_timezone, which is used as the default time zone for the session when none is explicitly specified. #44149 (Andrey Zvonov).
  • Added overlay database engine and representation of a directory as a database. This commit adds 4 databases:
    1. DatabaseOverlay: implements the IDatabase interface. Allows combining multiple databases, such as FileSystem and Memory. Internally, it stores a vector of pointers to other databases and proxies requests to them in turn until one executes successfully.
    2. DatabaseFilesystem: allows read-only interaction with files stored on the file system. Internally, it uses TableFunctionFile to implicitly load a file when a user requests the table. The result of the TableFunctionFile call is cached to provide quick access.
    3. DatabaseS3: allows read-only interaction with s3 storage. It uses TableFunctionS3 to implicitly load a table from s3.
    4. DatabaseHDFS: allows interaction with hdfs storage. It uses TableFunctionHDFS to implicitly load a table from hdfs. #48821 (alekseygolub).
  • Add a new setting named use_mysql_types_in_show_columns to alter the SHOW COLUMNS SQL statement to display MySQL equivalent types when a client is connected via the MySQL compatibility port. #49577 (Thomas Panetti).
  • Added option --rename_files_after_processing <pattern>. This closes #34207. #49626 (alekseygolub).
  • Add TableFunctionRedis, the Redis table engine, and RedisCommon, which contains Redis-related tools and types; support pushing equals and in filters down into Redis. #50150 (JackyWoo).
  • Allow to skip empty files in file/s3/url/hdfs table functions using settings s3_skip_empty_files, hdfs_skip_empty_files, engine_file_skip_empty_files, engine_url_skip_empty_files. #50364 (Kruglov Pavel).
  • clickhouse-client can now be called with a connection string instead of "--host", "--port", "--user" etc. #50689 (Alexey Gerasimchuck).
  • Codec DEFLATE_QPL is now controlled via the server setting "enable_deflate_qpl_codec" (default: false) instead of the setting "allow_experimental_codecs". This marks DEFLATE_QPL as non-experimental. #50775 (Robert Schulze).
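
The proxying behaviour described for DatabaseOverlay in the list above (try each underlying database in turn until one succeeds) can be illustrated with a small model. This is only a sketch in Python, not ClickHouse code: the names OverlayDatabase, MemoryDatabase, TableNotFound and get_table are hypothetical and chosen for illustration.

```python
class TableNotFound(Exception):
    """Raised when a database cannot provide the requested table."""


class MemoryDatabase:
    """Hypothetical stand-in for one underlying database (a plain dict)."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def get_table(self, name):
        try:
            return self._tables[name]
        except KeyError:
            raise TableNotFound(name) from None


class OverlayDatabase:
    """Hypothetical overlay: proxies each request to the underlying
    databases in order and returns the first successful result."""

    def __init__(self, databases):
        self._databases = list(databases)

    def get_table(self, name):
        for db in self._databases:
            try:
                return db.get_table(name)
            except TableNotFound:
                continue  # this database does not have it, try the next one
        raise TableNotFound(name)


first = MemoryDatabase({"events": "events from first"})
second = MemoryDatabase({"events": "events from second",
                         "users": "users from second"})
overlay = OverlayDatabase([first, second])

print(overlay.get_table("events"))  # served by the first database
print(overlay.get_table("users"))   # falls through to the second database
```

In this model the order of the underlying databases decides which one answers when several can serve the same name. Whether ClickHouse resolves such conflicts the same way is not stated in the entry above.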

Performance Improvement

  • Improve performance with enabled QueryProfiler using thread-local timer_id instead of global object. #48778 (Jiebin Sun).
  • Rewrite CapnProto input/output format to improve its performance. Map column names and CapnProto fields case-insensitively, fix reading/writing of nested structure fields. #49752 (Kruglov Pavel).
  • Optimize parquet write performance for parallel threads. #50102 (Hongbin Ma).
  • Disable parallelize_output_from_storages for processing MATERIALIZED VIEWs and storages with one block only. #50214 (Azat Khuzhin).
  • Merge PR https://github.com/ClickHouse/ClickHouse/pull/46558 (Avoid processing already sorted data). Avoid block permutation during sort if the block is already sorted. #50697 (Maksim Kita).
  • Earlier PRs (#50062, #50307) proposed an optimization that transforms predicates using toYear/toYYYYMM into an equivalent but converter-free form. This transformation can bring a significant performance improvement to some workloads, such as SSB. However, as issue #50628 indicated, those two PRs introduced problems that could result in incomplete query results, and as a result they were reverted by #50629. #50951 (Zhiguo Zhou).
  • Make multiple list requests to ZooKeeper in parallel to speed up reading from system.zookeeper table. #51042 (Alexander Gololobov).
  • Speed up initialization of DateTime lookup tables for time zones. This should reduce startup/connect time of clickhouse client, especially in debug builds, as it is rather heavy. #51347 (Alexander Gololobov).
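
The "converter-free form" mentioned in the toYear/toYYYYMM entry above can be illustrated with a small model: a predicate that applies a conversion to every value is replaced by a comparison of the raw value against bounds computed once. This is only a sketch in Python under that reading of the entry. The function names are hypothetical and it does not reproduce the actual ClickHouse rewrite.

```python
from datetime import date, timedelta


def year_to_date_range(year):
    """Return the inclusive (first_day, last_day) range covered by a year."""
    return date(year, 1, 1), date(year, 12, 31)


def matches_with_converter(d, year):
    # Original form: a conversion is applied to each value before comparing.
    return d.year == year


def matches_converter_free(d, year):
    # Rewritten form: the raw value is compared against precomputed bounds.
    first_day, last_day = year_to_date_range(year)
    return first_day <= d <= last_day


# Check that both forms agree for every day in a window around the year.
start = date(2022, 1, 1)
days = [start + timedelta(days=i) for i in range(3 * 366)]
assert all(
    matches_with_converter(d, 2023) == matches_converter_free(d, 2023)
    for d in days
)
```

The range form compares the stored value directly, which is the kind of predicate an index over that column can typically use. The function form has to be evaluated on each value first.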

Improvement

  • Allow to cast IPv6 to IPv4 address for CIDR ::ffff:0:0/96 (IPv4-mapped addresses). #49759 (Yakov Olkhovskiy).
  • Update MongoDB protocol to support MongoDB 5.1 version and newer. Support for the versions with the old protocol (<3.6) is preserved. Closes #45621, #49879. #50061 (Nikolay Degterinsky).
  • Improved scheduling of merge selecting and cleanup tasks in ReplicatedMergeTree. The tasks will not be executed too frequently when there's nothing to merge or clean up. Added settings max_merge_selecting_sleep_ms, merge_selecting_sleep_slowdown_factor, max_cleanup_delay_period and cleanup_thread_preferred_points_per_iteration. It should close #31919. #50107 (Alexander Tokmakov).
  • Support parallel replicas with the analyzer. #50441 (Raúl Marín).
  • Add setting input_format_max_bytes_to_read_for_schema_inference to limit the number of bytes to read in schema inference. Closes #50577. #50592 (Kruglov Pavel).
  • Respect setting input_format_as_default in schema inference. #50602 (Kruglov Pavel).
  • Push filters down through cross joins. #50605 (Han Fei).
  • An up-to-date lz4 version is used now. #50621 (Nikita Taranov).
  • Allow to skip trailing empty lines in CSV/TSV/CustomSeparated formats via settings input_format_csv_skip_trailing_empty_lines, input_format_tsv_skip_trailing_empty_lines and input_format_custom_skip_trailing_empty_lines (disabled by default). Closes #49315. #50635 (Kruglov Pavel).
  • Functions "toDateOrDefault|OrNull()" and "accurateCastOrDefault|OrNull" now correctly parse numeric arguments. #50709 (Dmitry Kardymon).
  • Previously, the CSV input format could not parse CSV files that use whitespace or \t as the field delimiter, although these delimiters are supported in Spark. #50712 (KevinyhZou).
  • Settings number_of_mutations_to_delay and number_of_mutations_to_throw are enabled by default now with values 500 and 1000 respectively. #50726 (Anton Popov).
  • Keeper improvement: add feature flags for Keeper API. Each feature flag can be disabled or enabled by defining it under keeper_server.feature_flags config. E.g. to enable CheckNotExists request, keeper_server.feature_flags.check_not_exists should be set to 1 on Keeper. #50796 (Antonio Andelic).
  • The dashboard correctly shows missing values. This closes #50831. #50832 (Alexey Milovidov).
  • CGroups metrics related to CPU are replaced with one metric, CGroupMaxCPU for better usability. The Normalized CPU usage metrics will be normalized to CGroups limits instead of the total number of CPUs when they are set. This closes #50836. #50835 (Alexey Milovidov).
  • Relax the thresholds for "too many parts" to be more modern. Return the backpressure during long-running insert queries. #50856 (Alexey Milovidov).
  • Added the possibility to use date and time arguments in syslog timestamp format in functions parseDateTimeBestEffort*() and parseDateTime64BestEffort*(). #50925 (Victor Krasnov).
  • Suggest using APPEND or TRUNCATE for INTO OUTFILE when file exists. #50950 (alekar).
  • Add embedded keeper-client to standalone keeper binary. #50964 (pufit).
  • Command line parameter "--password" in clickhouse-client can now be specified only once. #50966 (Alexey Gerasimchuck).
  • Fix data lakes slowness because of synchronous head requests. (Related to Iceberg/Deltalake/Hudi being slow with a lot of files). #50976 (Kseniia Sumarokova).
  • Use hash_of_all_files from system.parts to check identity of parts during on-cluster backups. #50997 (Vitaly Baranov).
  • In the system table zookeeper_connection, connected_time identifies the time when the connection was established (in standard format), and session_uptime_elapsed_seconds is added, which gives the duration of the established connection session (in seconds). #51026 (郭小龙).
  • Show halves of checksums in system.parts, system.projection_parts and in error messages in the correct order. #51040 (Vitaly Baranov).
  • Do not replicate ALTER PARTITION queries and mutations through Replicated database if it has only one shard and the underlying table is ReplicatedMergeTree. #51049 (Alexander Tokmakov).
  • Improve the progress bar for file/s3/hdfs/url table functions by using chunk size from source data and using incremental total size counting in each thread. Fix the progress bar for *Cluster functions. This closes #47250. #51088 (Kruglov Pavel).
  • Add total_bytes_to_read to Progress packet in TCP protocol for better Progress bar. #51158 (Kruglov Pavel).
  • Better checking of data parts on disks with filesystem cache. #51164 (Anton Popov).
  • Disable cache setting do_not_evict_index_and_mark_files (it was enabled in 23.5). #51222 (Kseniia Sumarokova).
  • Fix occasionally incorrect current_elements_num in the filesystem cache. #51242 (Kseniia Sumarokova).
  • Add random sleep before merges/mutations execution to split load more evenly between replicas in case of zero-copy replication. #51282 (alesapin).
  • The function transform, as well as CASE with value matching, now supports all data types. This closes #29730, #32387, #50827, #31336 and #40493. #51351 (Alexey Milovidov).
  • We have found a bug in LLVM that makes the usage of the compile_expressions setting unsafe. It is now disabled by default. #51368 (Alexey Milovidov).
  • Issue #50220 reports a core dump in grace_hash join. We finally reproduced the exception locally and found that the issue is related to a failure to create a temporary file. Somehow this is triggered in https://github.com/ClickHouse/ClickHouse/pull/49816 https://github.com/ClickHouse/ClickHouse/pull/49483. #51382 (lgbo).
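
As background for the IPv6-to-IPv4 cast entry above: an address inside ::ffff:0:0/96 carries an IPv4 address in its low 32 bits, which is what makes the cast well defined for that range. The sketch below shows the idea with Python's standard ipaddress module. The helper name ipv6_to_ipv4 is hypothetical, and returning None for addresses outside the range is a choice made for this sketch, not a statement about how ClickHouse behaves.

```python
import ipaddress

# The IPv4-mapped range named in the changelog entry.
MAPPED_RANGE = ipaddress.ip_network("::ffff:0:0/96")


def ipv6_to_ipv4(text):
    """Return the embedded IPv4 address, or None if not IPv4-mapped."""
    addr = ipaddress.IPv6Address(text)
    if addr not in MAPPED_RANGE:
        return None
    # The low 32 bits of an IPv4-mapped address hold the IPv4 address.
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(ipv6_to_ipv4("::ffff:192.0.2.1"))  # 192.0.2.1
print(ipv6_to_ipv4("2001:db8::1"))       # None
```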

Build/Testing/Packaging Improvement

Bug Fix (user-visible misbehavior in an official stable release)

NOT FOR CHANGELOG / INSIGNIFICANT