ClickHouse release v22.3.1.1262-prestable FIXME as compared to v22.2.1.2139-prestable
Backward Incompatible Change
- Improve overflow handling in the toDateTime function: when the date string is very large, it will be converted to 1970. #32898 (HaiBo Li).
- Make arrayCompact behave as other higher-order functions: perform compaction not of lambda function results but on the original array. If you are using nontrivial lambda functions in arrayCompact, you may restore the old behaviour by wrapping arrayCompact arguments into arrayMap. Closes #34010 #18535 #14778. #34795 (Alexandre Snarskii).
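The arrayCompact change above can be illustrated outside ClickHouse. The following Python sketch models the described semantics only; it is not ClickHouse code. The helper names `array_compact_old` and `array_compact_new` are hypothetical, and the new behaviour is assumed to keep the original elements at positions where the lambda result differs from the previous one.

```python
from itertools import groupby


def compact(values):
    """Collapse runs of consecutive equal values into a single value."""
    return [key for key, _ in groupby(values)]


def array_compact_old(func, arr):
    """Assumed old behaviour: compact the lambda results themselves."""
    return compact([func(x) for x in arr])


def array_compact_new(func, arr):
    """Assumed new behaviour: compare lambda results, keep original elements."""
    result = []
    previous_key = None
    for index, item in enumerate(arr):
        key = func(item)
        if index == 0 or key != previous_key:
            result.append(item)
        previous_key = key
    return result


values = [1, 3, 2, 4, 5]
parity = lambda x: x % 2
print(array_compact_old(parity, values))  # [1, 0, 1]
print(array_compact_new(parity, values))  # [1, 2, 5]
```

Under these assumptions, `array_compact_old(f, arr)` is the same as compacting `[f(x) for x in arr]`, which mirrors the suggested arrayMap wrapping.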
New Feature
- New data type `Object(<schema_format>)`, which supports storing of semi-structured data (for now JSON only). Data is written to such types as a string. Then all paths are extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. `data.key1.key2`, or with the cast operator: `data.key1.key2::Int64`. #23932 (Anton Popov).
- Support authentication of users connected via SSL by their X.509 certificate. #31484 (eungenue).
- Related to issue #30715. Add three functions for the map data type: (1) mapReplace(map1, map2), which replaces values for keys in map1 with the values of the corresponding keys in map2 and adds keys from map2 that don't exist in map1; (2) mapFilter; (3) mapMap. mapFilter and mapMap are higher-order functions that accept two arguments: the first is a lambda function with a k, v pair, and the second is a map type column. #33698 (hexiaoting).
- Add local cache for disk s3. Closes #28961. #33717 (Kseniia Sumarokova).
- Implement DateTime64 transform from and to arrow column, which closes #8280 and closes #28574. #34561 (李扬).
- Add cpu/mem metric for clickhouse-local. Close #34545. #34605 (李扬).
- Support schema inference for inserting into table functions file/hdfs/s3/url. #34732 (Kruglov Pavel).
- New settings `<allow_plaintext_password>` and `<allow_no_password>` are added to the server configuration; they turn the insecure authentication types plaintext_password and no_password on or off. By default both are set to true, which means the plaintext_password and no_password authentication types are allowed. #34738 (Heena Bansal).
- Add new table function `hive`, used as follows: `hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>')`, for example ``SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', '`id` Nullable(String), `score` Nullable(Int32), `day` Nullable(String)', 'day')``. #34946 (lgbo).
- Added date_time_input_format = 'best_effort_us'. Closes #34799. #34982 (WenYao).
- Changed the Play UI to select a theme by the following priority: (1) the 'theme' GET parameter; (2) 'theme' in localStorage; (3) the OS preference (this didn't work before). #35068 (peledni).
- Add `database_replicated_allow_only_replicated_engine` setting. When enabled, it is only allowed to create `Replicated` tables in a `Replicated` database. #35214 (Nikolai Kochetov).
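The mapReplace entry above (#33698) describes a merge where the second map wins on shared keys. A minimal Python sketch of that description, using plain dictionaries as a stand-in for the map data type (the name `map_replace` is hypothetical and this is not the ClickHouse implementation):

```python
def map_replace(map1, map2):
    """Return a copy of map1 with values overridden and keys added from map2."""
    merged = dict(map1)   # copy so the inputs stay untouched
    merged.update(map2)   # keys from map2 replace or extend those of map1
    return merged


print(map_replace({"a": 1, "b": 2}, {"b": 20, "c": 30}))
# {'a': 1, 'b': 20, 'c': 30}
```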
Performance Improvement
- Calling std::distance on a large list decreases performance, so a version stored in each node is used instead, at the cost of more memory (about 80 MB per 10 million nodes). The logic is: every node in the list has a version of type size_t, set in the insert or insertOrReplace method from the class member variable current_version. The version is only increased in the enableSnapshot method. When taking a snapshot, call snapshotSizeWithVersion to get the snapshot size and version (snapshot_up_to_version). When traversing the list, if a node's version is less than or equal to snapshot_up_to_version, the node is protected from deletion; if the node's version is greater than snapshot_up_to_version, we can do anything to it. #34486 (zhanglistar).
- Compaction of the log store in NuRaft needs to acquire an inner lock which is also used in the normal commit process, so we delete useless logs in the `compact` method of the Changelog class in a background thread. See details on: 1707a7572a/src/handle_commit.cxx (L560). #34534 (zhanglistar).
- Don't hold the latest snapshot in memory; instead, read the snapshot when needed. Sequential reading is fast, reaching 200+ MBps even on HDD, using the mmap system call. Write snapshot data directly to disk using a compression method, without holding the original data and compressed data in memory. #34584 (zhanglistar).
- Improve MergeTree insert performance by replacing std::stable_sort with pdqsort. #34750 (Maksim Kita).
- Improve the performance of the `ANY` aggregation function by acting over batches. #34760 (Raúl Marín).
- Improve performance of `detectCharset`, `detectLanguageUnknown` functions. Improve performance of `DirectDictionary` if dictionary source is `ClickHouse`. Improve performance of processing queries with large `IN` section. #34888 (Maksim Kita).
- Less locking on connection by using an atomic stat. Note that it is an approximate stat. #35010 (zhanglistar).
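The node versioning scheme described in #34486 above can be sketched as follows. This is a simplified Python model under stated assumptions, not the Keeper implementation: the class `SnapshotableList`, the `active` flag, and the snake_case method names are hypothetical, and the snapshot size is recorded when the snapshot is enabled so that no list traversal is needed to obtain it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    key: str
    value: str
    version: int
    active: bool = True  # False once erased but still needed by a snapshot


class SnapshotableList:
    def __init__(self):
        self.nodes = []
        self.current_version = 0
        self.snapshot_up_to_version = None  # None means no snapshot in progress
        self.snapshot_size = 0

    def insert(self, key, value):
        # Every new node is stamped with the current version.
        self.nodes.append(Node(key, value, self.current_version))

    def enable_snapshot(self):
        # Freeze everything stamped so far; later inserts get a bigger version.
        self.snapshot_up_to_version = self.current_version
        self.snapshot_size = len(self.nodes)
        self.current_version += 1

    def snapshot_size_with_version(self):
        # Size and version were recorded up front, so no traversal is needed.
        return self.snapshot_size, self.snapshot_up_to_version

    def erase(self, key):
        for position, node in enumerate(self.nodes):
            if node.key == key and node.active:
                protected = (
                    self.snapshot_up_to_version is not None
                    and node.version <= self.snapshot_up_to_version
                )
                if protected:
                    node.active = False  # keep the node for the snapshot reader
                else:
                    del self.nodes[position]  # newer than the snapshot: free to drop
                return True
        return False

    def snapshot_keys(self):
        # Nodes visible to the snapshot are those at or below its version.
        return [n.key for n in self.nodes if n.version <= self.snapshot_up_to_version]


lst = SnapshotableList()
lst.insert("a", "1")
lst.insert("b", "2")
lst.enable_snapshot()
lst.insert("c", "3")
lst.erase("a")  # protected: version 0 <= snapshot_up_to_version
lst.erase("c")  # removed: version 1 > snapshot_up_to_version
print(lst.snapshot_size_with_version())  # (2, 0)
print(lst.snapshot_keys())  # ['a', 'b']
```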
Improvement
- Make the znode ctime and mtime consistent between servers. #33441 (小路).
- Hold time lock while assigning tasks to clear old temporary directories in StorageMergeTree. #34025 (Amos Bird).
- When large files were written with the `s3` table function or table engine, the content type on the files was mistakenly set to `application/xml` due to a bug in the AWS SDK. This closes #33964. #34433 (Alexey Milovidov).
- Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. #34465 (Kruglov Pavel).
- Improve the opentelemetry span logs for INSERT operation on distributed table. #34480 (Frank Chen).
- MaterializedMySQL supports the materialized_mysql_tables_list setting (a comma-separated list of MySQL database tables that will be replicated by the MaterializedMySQL database engine; default value: empty list, which means all tables will be replicated), mentioned at #32977. #34487 (zzsmdfj).
- This PR changes restrictive row policies a bit to make them an easier alternative to permissive policies in easy cases. If for a particular table only restrictive policies exist (without permissive policies), users will be able to see some rows. Also, `SHOW CREATE ROW POLICY` will always show `AS permissive` or `AS restrictive` in the row policy's definition. #34596 (Vitaly Baranov).
- Add `encodeURLComponent`, `encodeURLFormComponent` functions. Closes #31092. #34607 (zzsmdfj).
- Now you can read the `system.zookeeper` table without restrictions on path or using a `like` expression. These reads can generate quite a heavy load for ZooKeeper, so to enable this ability you have to enable the setting `allow_unrestricted_reads_from_keeper`. #34609 (Sergei Trifonov).
- Some refactoring and improvement over async and remote buffer related stuff. Separated in each commit. #34629 (Amos Bird).
- ExecutableUserDefinedFunctions allow specifying argument names. This is necessary for formats where the argument name is part of serialization, like `Native`, `JSONEachRow`. Closes #34604. #34653 (Maksim Kita).
- Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. #34684 (Kruglov Pavel).
- Do not reset logging that is configured via --log-file/--errorlog-file in case of empty logger.log/logger.errorlog. #34718 (Amos Bird).
- Support `remote()`/`cluster()` for `parallel_distributed_insert_select=2`. #34728 (Azat Khuzhin).
- Add name hints for data skipping indices. Closes #29698. #34764 (flynn).
- Now `ALTER TABLE DROP COLUMN columnX` queries for `MergeTree` table engines will work instantly when `columnX` is an `ALIAS` column. Fixes #34660. #34786 (alesapin).
- In previous versions the progress bar in clickhouse-client could jump forward near 50% for no reason. This closes #34324. #34801 (Alexey Milovidov).
- Fix reading only columns which user asked for. Closes #34163. #34849 (Kseniia Sumarokova).
- Implement MemoryStatisticsOS for FreeBSD. #34902 (Alexandre Snarskii).
- Allow to open empty sqlite db file if it does not exist. Closes #33367. #34907 (Kseniia Sumarokova).
- Allow LowCardinality strings for ngrambf_v1/tokenbf_v1 indexes. Closes #21865. #34911 (Lars Hiller Eidnes).
- Ignore per-column `TTL` in `CREATE TABLE AS` if the new table engine does not support it (i.e. if the engine is not of the `MergeTree` family). #34938 (Azat Khuzhin).
- Use connection pool for hive metastore client. #34940 (lgbo).
- Currently, if the user changes the settings of the system tables there will be tons of logs and ClickHouse will rename the tables every minute. This fixes #34929. #34949 (Nikita Mikhaylov).
- Remove unnecessary columns for reading parquet/orc files. #34954 (lgbo).
- For random access readbuffer in hive, the first read of the readbuffer will use the original readbuffer instead of the local file. When we read a parquet/orc format file, the readbuffer seeks to the end of the file, which will be blocked until the local file finishes downloading, and makes the whole process slow. #34957 (lgbo).
- Add more sanity checks for keeper configuration: now mixing of localhost and non-local servers is not allowed, also add checks for same value of internal raft port and keeper client port. #35004 (alesapin).
- Functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants` support implicit key cast and constant arguments. Closes #34970. #35027 (Maksim Kita).
- Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes #34787. #35032 (Alexey Milovidov).
- Avoid possible `MEMORY_LIMIT_EXCEEDED` during `INSERT` into `Buffer` with `AggregateFunction`. #35072 (Azat Khuzhin).
- Support `view()` for `parallel_distributed_insert_select`. #35132 (Azat Khuzhin).
- Add setting to lower column case when reading parquet/ORC file. #35145 (shuchaome).
- Do not retry non-retriable errors. Closes #35161. #35172 (Kseniia Sumarokova).
- Added disk_name to system.part_log. #35178 (Artyom Yurkov).
- Currently, ClickHouse validates hosts defined under <remote_url_allow_hosts> for URL and Remote table functions. This PR extends the RemoteHostFilter to MySQL and PostgreSQL table functions. #35191 (Heena Bansal).
- Sometimes it is not enough to distinguish query hierarchy only by is_initial_query in system.query_log and system.processes, so distributed_depth is needed. #35207 (李扬).
- Support test mode for clickhouse-local. #35264 (Kseniia Sumarokova).
- Return const for function getMacro if not in distributed query. Close #34727. #35289 (李扬).
- Reload `remote_url_allow_hosts` after config update. #35294 (Nikolai Kochetov).
Bug Fix
- Ignore obsolete grants in ATTACH GRANT statements. This PR fixes #34815. #34855 (Vitaly Baranov).
- When the inner readbuffer's buffer size is too small, NEED_MORE_INPUT in `HadoopSnappyDecoder` will run multiple times (>=3) for one compressed block. This makes the input data be copied into the wrong place in `HadoopSnappyDecoder::buffer`. #35116 (lgbo).
Build/Testing/Packaging Improvement
- Randomize some settings in functional tests. This closes #32268. #34092 (Kruglov Pavel).
- NA. #34513 (vzakaznikov).
- Debian package clickhouse-test.deb removed completely. CI uses tests from the repository, and standalone testing via deb package is no longer supported. #34606 (Ilya Yatsishin).
- Set timeout 40 minutes for fast tests. #34624 (Mikhail f. Shiryaev).
- Drop PVS test from CI. #34680 (Mikhail f. Shiryaev).
- Limit DWARF version for debug info by 4 max, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. #34777 (Alexey Milovidov).
- Improve CI scripts arguments. #34792 (Mikhail f. Shiryaev).
- Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. #34793 (Mikhail f. Shiryaev).
- Separate smaller clickhouse-keeper build. #35031 (alesapin).
- Clion has the following problems "The breakpoint will not currently be hit. No executable code is associated with this line". #35179 (小路).
- Add an ability to build stripped binaries with cmake. #35196 (alesapin).
Bug Fix (user-visible misbehaviour in official stable or prestable release)
- Fix distributed subquery max_query_size limitation inconsistency. #34078 (Chao Ma).
- Fix incorrect trivial count result when part movement feature is used #34089. #34385 (nvartolomei).
- Stop selecting parts to mutate when another replica has already updated the /log for the ReplicatedMergeTree engine. #34633 (Jianmei Zhang).
- Fix `allow_experimental_projection_optimization` with `enable_global_with_statement` (before, it could lead to a `Stack size too large` error in case of multiple expressions in the `WITH` clause, and it also executed scalar subqueries again and again, so now it will be more optimal). #34650 (Azat Khuzhin).
- Fix serialization/printing for system queries `RELOAD MODEL`, `RELOAD FUNCTION`, `RESTART DISK` when used `ON CLUSTER`. Closes #34514. #34696 (Maksim Kita).
- Fix ENOENT with fsync_part_directory and Vertical merge. #34739 (Azat Khuzhin).
- Fix bug for h3 funcs containing const columns which cause queries to fail. #34743 (Bharat Nallan).
- Fix possible failures in S2 functions when queries contain const columns. #34745 (Bharat Nallan).
- Fix bugs for multiple columns group by in WindowView. #34859 (vxider).
- Support DDLs like CREATE USER to be executed on cross replicated cluster. #34860 (Jianmei Zhang).
- Fix asynchronous inserts to table functions. Fixes #34864. #34866 (Anton Popov).
- Fix possible "Part directory doesn't exist" during `INSERT`. #34876 (Azat Khuzhin).
- Fix postgres datetime64 conversion. Closes #33364. #34910 (Kseniia Sumarokova).
- Avoid busy polling in keeper while searching for changelog files to delete. #34931 (Azat Khuzhin).
- Fix unexpected result when using `in` in `where` in hive query. #34945 (lgbo).
- Fix wrong schema inference for unquoted dates in CSV. Closes #34768. #34961 (Kruglov Pavel).
- Fix possible rare error `Cannot push block to port which already has data`. Avoid pushing to port with data inside `DelayedSource`. #34993 (Nikolai Kochetov).
- Fix possible segfault in filelog. Closes #30749. #34996 (Kseniia Sumarokova).
- Fix unexpected result when using a -state type aggregate function in window frame. #34999 (metahys).
- Fix possible exception `Reading for MergeTree family tables must be done with last position boundary`. Closes #34979. #35001 (Kseniia Sumarokova).
- Fix reading from `system.asynchronous_inserts` table if there exists asynchronous insert into table function. #35050 (Anton Popov).
- Fix missing alias after function is optimized to subcolumn when setting `optimize_functions_to_subcolumns` is enabled. Closes #33798. #35079 (qieqieplus).
- Avoid possible deadlock on server shutdown. #35081 (Azat Khuzhin).
- Fixed the "update_lag" external dictionary configuration option being unusable with the error message ``Unexpected key `update_lag` in dictionary source configuration``. #35089 (Jason Chu).
- Fix issue: #31469. #35118 (zzsmdfj).
- Fix `optimize_skip_unused_shards_rewrite_in` for signed columns and negative values. #35134 (Azat Khuzhin).
- Fixed the incorrect translation of YAML config to XML. #35135 (Miel Donkers).
- Fix partition pruning error when non-monotonic function is used with IN operator. This fixes #35136. #35146 (Amos Bird).
- Fix materialized PostgreSQL adding a new table to replication (ATTACH TABLE) after manually removing it (DETACH TABLE). Closes #33800. Closes #34922. Closes #34315. #35158 (Kseniia Sumarokova).
- Fix materialized postgres `table overrides` for partition by, etc. Closes #35048. #35162 (Kseniia Sumarokova).
- Schema inference didn't work properly in case of `INSERT INTO FUNCTION s3(...) FROM ...`: it tried to read the schema from the s3 file instead of from the select query. #35176 (Kruglov Pavel).
- Fix `replaceRegexpAll`, close #35117. #35182 (Vladimir C).
- Fix error in query with `WITH TOTALS` in case if `HAVING` returned empty result. This fixes #33711. #35186 (Amos Bird).
- Fix reading port from config, close #34776. #35193 (Vladimir C).
- Make functions `cast(value, 'IPv4')`, `cast(value, 'IPv6')` behave the same as the `toIPv4`, `toIPv6` functions. Changed behavior for an incorrect IP address passed into functions `toIPv4`, `toIPv6`: now, if an invalid IP address is passed into these functions, an exception will be raised; before, these functions returned the default value. Added functions `IPv4StringToNumOrDefault`, `IPv4StringToNumOrNull`, `IPv6StringToNumOrDefault`, `IPv6StringOrNull`, `toIPv4OrDefault`, `toIPv4OrNull`, `toIPv6OrDefault`, `toIPv6OrNull`. Functions `IPv4StringToNumOrDefault`, `toIPv4OrDefault`, `toIPv6OrDefault` should be used if previous logic relied on `IPv4StringToNum`, `toIPv4`, `toIPv6` returning the default value for an invalid address. Added setting `cast_ipv4_ipv6_default_on_conversion_error`; if this setting is enabled, IP address conversion functions will behave as before. Closes #22825. Closes #5799. Closes #35156. #35240 (Maksim Kita).
- Wait for IDiskRemote thread pool properly. #35257 (Azat Khuzhin).
- Fix `CHECK TABLE` query in case when sparse columns are enabled in table. #35274 (Anton Popov).
- Fix possible Abort while using Brotli compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35281 (Kruglov Pavel).
- Fix possible segfault in JSONEachRow schema inference. #35291 (Kruglov Pavel).
- Fix possible `Assertion 'position() != working_buffer.end()' failed` while using lzma compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35295 (Kruglov Pavel).
- Fix possible segfault while using lz4 compression with a small max_read_buffer_size setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35296 (Kruglov Pavel).
- Fix possible `Assertion 'position() != working_buffer.end()' failed` while using bzip2 compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35300 (Kruglov Pavel).
- Fix partial merge join duplicate rows bug, close #31009. #35311 (Vladimir C).
- Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes #35312. #35313 (Kseniia Sumarokova).
- Fix bug in S3 zero-copy replication which can lead to errors like `Found parts with the same min block and with the same max block as the missing part` after concurrent fetch/drop table. #35348 (alesapin).
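The IP address conversion change listed above (#35240) distinguishes three behaviours for invalid input: raise, return a default, or return null. The Python sketch below models that distinction with the standard `ipaddress` module; it is an analogy, not ClickHouse code. The function names are hypothetical snake_case stand-ins for `toIPv4`, `toIPv4OrDefault`, and `toIPv4OrNull`, and the default value `0.0.0.0` is an assumption.

```python
import ipaddress

IPV4_DEFAULT = ipaddress.IPv4Address("0.0.0.0")  # assumed default value


def to_ipv4(text):
    """Strict conversion: an invalid address raises ValueError."""
    return ipaddress.IPv4Address(text)


def to_ipv4_or_default(text):
    """Return the default address instead of raising on invalid input."""
    try:
        return to_ipv4(text)
    except ValueError:
        return IPV4_DEFAULT


def to_ipv4_or_null(text):
    """Return None instead of raising on invalid input."""
    try:
        return to_ipv4(text)
    except ValueError:
        return None


print(to_ipv4("192.168.0.1"))           # 192.168.0.1
print(to_ipv4_or_default("not an ip"))  # 0.0.0.0
print(to_ipv4_or_null("not an ip"))     # None
```

Code that relied on the old silent default would, in this model, switch from `to_ipv4` to `to_ipv4_or_default`.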
NO CL ENTRY
- NO CL ENTRY: '[ImgBot] Optimize images'. #34590 (imgbot[bot]).
- NO CL ENTRY: 'Revert "Allow restrictive row policies without permissive"'. #34782 (Vitaly Baranov).
- NO CL ENTRY: 'Revert "Remove "bugs" that do not exist anymore"'. #35241 (Alexey Milovidov).
- NO CL ENTRY: 'Revert "Change timezone in Docker"'. #35243 (Alexey Milovidov).
- NO CL ENTRY: 'Revert "Fix 00900_long_parquet_load"'. #35301 (Vladimir C).