Merge branch 'master' into keeper-client-fix-parser

pufit 2023-06-29 10:16:13 -04:00 committed by GitHub
commit 61b62e3330
GPG Key ID: 4AEE18F83AFDEB23
63 changed files with 1024 additions and 144 deletions

View File

@ -1,4 +1,5 @@
### Table of Contents ### Table of Contents
**[ClickHouse release v23.6, 2023-06-30](#236)**<br/>
**[ClickHouse release v23.5, 2023-06-08](#235)**<br/> **[ClickHouse release v23.5, 2023-06-08](#235)**<br/>
**[ClickHouse release v23.4, 2023-04-26](#234)**<br/> **[ClickHouse release v23.4, 2023-04-26](#234)**<br/>
**[ClickHouse release v23.3 LTS, 2023-03-30](#233)**<br/> **[ClickHouse release v23.3 LTS, 2023-03-30](#233)**<br/>
@ -8,6 +9,107 @@
# 2023 Changelog # 2023 Changelog
### <a id="236"></a> ClickHouse release 23.6, 2023-06-29
#### Backward Incompatible Change
* Delete feature `do_not_evict_index_and_mark_files` in the fs cache. This feature was only making things worse. [#51253](https://github.com/ClickHouse/ClickHouse/pull/51253) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Remove ALTER support for experimental LIVE VIEW. [#51287](https://github.com/ClickHouse/ClickHouse/pull/51287) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Decrease the default values for `http_max_field_value_size` and `http_max_field_name_size` to 128 KiB. [#51163](https://github.com/ClickHouse/ClickHouse/pull/51163) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CGroups metrics related to CPU are replaced with one metric, `CGroupMaxCPU` for better usability. The `Normalized` CPU usage metrics will be normalized to CGroups limits instead of the total number of CPUs when they are set. This closes [#50836](https://github.com/ClickHouse/ClickHouse/issues/50836). [#50835](https://github.com/ClickHouse/ClickHouse/pull/50835) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature
* Added `Overlay` database engine to combine multiple databases into one. Added `Filesystem` database engine to represent a directory in the filesystem as a set of implicitly available tables with auto-detected formats and structures. A new `S3` database engine allows read-only interaction with S3 storage by representing a prefix as a set of tables. A new `HDFS` database engine allows interaction with HDFS storage in the same way. [#48821](https://github.com/ClickHouse/ClickHouse/pull/48821) ([alekseygolub](https://github.com/alekseygolub)).
* The function `transform` as well as `CASE` with value matching started to support all data types. This closes [#29730](https://github.com/ClickHouse/ClickHouse/issues/29730). This closes [#32387](https://github.com/ClickHouse/ClickHouse/issues/32387). This closes [#50827](https://github.com/ClickHouse/ClickHouse/issues/50827). This closes [#31336](https://github.com/ClickHouse/ClickHouse/issues/31336). This closes [#40493](https://github.com/ClickHouse/ClickHouse/issues/40493). [#51351](https://github.com/ClickHouse/ClickHouse/pull/51351) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added option `--rename_files_after_processing <pattern>`. This closes [#34207](https://github.com/ClickHouse/ClickHouse/issues/34207). [#49626](https://github.com/ClickHouse/ClickHouse/pull/49626) ([alekseygolub](https://github.com/alekseygolub)).
* Add support for `APPEND` modifier in `INTO OUTFILE` clause. Suggest using `APPEND` or `TRUNCATE` for `INTO OUTFILE` when file exists. [#50950](https://github.com/ClickHouse/ClickHouse/pull/50950) ([alekar](https://github.com/alekar)).
* Add table engine `Redis` and table function `redis`. It allows querying external Redis servers. [#50150](https://github.com/ClickHouse/ClickHouse/pull/50150) ([JackyWoo](https://github.com/JackyWoo)).
* Allow to skip empty files in file/s3/url/hdfs table functions using settings `s3_skip_empty_files`, `hdfs_skip_empty_files`, `engine_file_skip_empty_files`, `engine_url_skip_empty_files`. [#50364](https://github.com/ClickHouse/ClickHouse/pull/50364) ([Kruglov Pavel](https://github.com/Avogar)).
* Add a new setting named `use_mysql_types_in_show_columns` to alter the `SHOW COLUMNS` SQL statement to display MySQL equivalent types when a client is connected via the MySQL compatibility port. [#49577](https://github.com/ClickHouse/ClickHouse/pull/49577) ([Thomas Panetti](https://github.com/tpanetti)).
* `clickhouse-client` can now be called with a connection string instead of `--host`, `--port`, `--user`, etc. [#50689](https://github.com/ClickHouse/ClickHouse/pull/50689) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Add setting `session_timezone`, which is used as the default timezone for the session when none is explicitly specified. [#44149](https://github.com/ClickHouse/ClickHouse/pull/44149) ([Andrey Zvonov](https://github.com/zvonand)).
* Codec DEFLATE_QPL is now controlled via server setting "enable_deflate_qpl_codec" (default: false) instead of setting "allow_experimental_codecs". This marks DEFLATE_QPL non-experimental. [#50775](https://github.com/ClickHouse/ClickHouse/pull/50775) ([Robert Schulze](https://github.com/rschu1ze)).
#### Performance Improvement
* Improved scheduling of merge selecting and cleanup tasks in `ReplicatedMergeTree`. The tasks will not be executed too frequently when there's nothing to merge or cleanup. Added settings `max_merge_selecting_sleep_ms`, `merge_selecting_sleep_slowdown_factor`, `max_cleanup_delay_period` and `cleanup_thread_preferred_points_per_iteration`. It should close [#31919](https://github.com/ClickHouse/ClickHouse/issues/31919). [#50107](https://github.com/ClickHouse/ClickHouse/pull/50107) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Push filters down through cross join. [#50605](https://github.com/ClickHouse/ClickHouse/pull/50605) ([Han Fei](https://github.com/hanfei1991)).
* Improve performance with enabled QueryProfiler using thread-local timer_id instead of global object. [#48778](https://github.com/ClickHouse/ClickHouse/pull/48778) ([Jiebin Sun](https://github.com/jiebinn)).
* Rewrite CapnProto input/output format to improve its performance. Map column names and CapnProto fields case insensitive, fix reading/writing of nested structure fields. [#49752](https://github.com/ClickHouse/ClickHouse/pull/49752) ([Kruglov Pavel](https://github.com/Avogar)).
* Optimize parquet write performance for parallel threads. [#50102](https://github.com/ClickHouse/ClickHouse/pull/50102) ([Hongbin Ma](https://github.com/binmahone)).
* Disable `parallelize_output_from_storages` for processing MATERIALIZED VIEWs and storages with one block only. [#50214](https://github.com/ClickHouse/ClickHouse/pull/50214) ([Azat Khuzhin](https://github.com/azat)).
* Merge PR [#46558](https://github.com/ClickHouse/ClickHouse/pull/46558). Avoid block permutation during sort if the block is already sorted. [#50697](https://github.com/ClickHouse/ClickHouse/pull/50697) ([Alexey Milovidov](https://github.com/alexey-milovidov), [Maksim Kita](https://github.com/kitaisreal)).
* Make multiple list requests to ZooKeeper in parallel to speed up reading from system.zookeeper table. [#51042](https://github.com/ClickHouse/ClickHouse/pull/51042) ([Alexander Gololobov](https://github.com/davenger)).
* Speedup initialization of DateTime lookup tables for time zones. This should reduce startup/connect time of clickhouse-client especially in debug build as it is rather heavy. [#51347](https://github.com/ClickHouse/ClickHouse/pull/51347) ([Alexander Gololobov](https://github.com/davenger)).
* Fix data lakes slowness because of synchronous head requests. (Related to Iceberg/Deltalake/Hudi being slow with a lot of files). [#50976](https://github.com/ClickHouse/ClickHouse/pull/50976) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not replicate `ALTER PARTITION` queries and mutations through `Replicated` database if it has only one shard and the underlying table is `ReplicatedMergeTree`. [#51049](https://github.com/ClickHouse/ClickHouse/pull/51049) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### Experimental Feature
* Support parallel replicas with the analyzer. [#50441](https://github.com/ClickHouse/ClickHouse/pull/50441) ([Raúl Marín](https://github.com/Algunenano)).
* Add random sleep before large merges/mutations execution to split load more evenly between replicas in case of zero-copy replication. [#51282](https://github.com/ClickHouse/ClickHouse/pull/51282) ([alesapin](https://github.com/alesapin)).
#### Improvement
* Relax the thresholds for "too many parts" to be more modern. Return the backpressure during long-running insert queries. [#50856](https://github.com/ClickHouse/ClickHouse/pull/50856) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow to cast IPv6 to IPv4 address for CIDR ::ffff:0:0/96 (IPv4-mapped addresses). [#49759](https://github.com/ClickHouse/ClickHouse/pull/49759) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update MongoDB protocol to support MongoDB 5.1 version and newer. Support for the versions with the old protocol (<3.6) is preserved. Closes [#45621](https://github.com/ClickHouse/ClickHouse/issues/45621), [#49879](https://github.com/ClickHouse/ClickHouse/issues/49879). [#50061](https://github.com/ClickHouse/ClickHouse/pull/50061) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add setting `input_format_max_bytes_to_read_for_schema_inference` to limit the number of bytes to read in schema inference. Closes [#50577](https://github.com/ClickHouse/ClickHouse/issues/50577). [#50592](https://github.com/ClickHouse/ClickHouse/pull/50592) ([Kruglov Pavel](https://github.com/Avogar)).
* Respect setting `input_format_null_as_default` in schema inference. [#50602](https://github.com/ClickHouse/ClickHouse/pull/50602) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow to skip trailing empty lines in CSV/TSV/CustomSeparated formats via settings `input_format_csv_skip_trailing_empty_lines`, `input_format_tsv_skip_trailing_empty_lines` and `input_format_custom_skip_trailing_empty_lines` (disabled by default). Closes [#49315](https://github.com/ClickHouse/ClickHouse/issues/49315). [#50635](https://github.com/ClickHouse/ClickHouse/pull/50635) ([Kruglov Pavel](https://github.com/Avogar)).
* Functions "toDateOrDefault|OrNull" and "accurateCast[OrDefault|OrNull]" now correctly parse numeric arguments. [#50709](https://github.com/ClickHouse/ClickHouse/pull/50709) ([Dmitry Kardymon](https://github.com/kardymonds)).
* Support CSV with whitespace or `\t` field delimiters; these delimiters are also supported in Spark. [#50712](https://github.com/ClickHouse/ClickHouse/pull/50712) ([KevinyhZou](https://github.com/KevinyhZou)).
* Settings `number_of_mutations_to_delay` and `number_of_mutations_to_throw` are enabled by default now with values 500 and 1000 respectively. [#50726](https://github.com/ClickHouse/ClickHouse/pull/50726) ([Anton Popov](https://github.com/CurtizJ)).
* The dashboard correctly shows missing values. This closes [#50831](https://github.com/ClickHouse/ClickHouse/issues/50831). [#50832](https://github.com/ClickHouse/ClickHouse/pull/50832) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added the possibility to use date and time arguments in the syslog timestamp format in functions `parseDateTimeBestEffort*` and `parseDateTime64BestEffort*`. [#50925](https://github.com/ClickHouse/ClickHouse/pull/50925) ([Victor Krasnov](https://github.com/sirvickr)).
* Command line parameter "--password" in clickhouse-client can now be specified only once. [#50966](https://github.com/ClickHouse/ClickHouse/pull/50966) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Use `hash_of_all_files` from `system.parts` to check identity of parts during on-cluster backups. [#50997](https://github.com/ClickHouse/ClickHouse/pull/50997) ([Vitaly Baranov](https://github.com/vitlibar)).
* In the system table `zookeeper_connection`, `connected_time` now identifies the time when the connection was established (in the standard format), and `session_uptime_elapsed_seconds` is added, which shows the duration of the established connection session (in seconds). [#51026](https://github.com/ClickHouse/ClickHouse/pull/51026) ([郭小龙](https://github.com/guoxiaolongzte)).
* Improve the progress bar for file/s3/hdfs/url table functions by using chunk size from source data and using incremental total size counting in each thread. Fix the progress bar for *Cluster functions. This closes [#47250](https://github.com/ClickHouse/ClickHouse/issues/47250). [#51088](https://github.com/ClickHouse/ClickHouse/pull/51088) ([Kruglov Pavel](https://github.com/Avogar)).
* Add total_bytes_to_read to the Progress packet in TCP protocol for better Progress bar. [#51158](https://github.com/ClickHouse/ClickHouse/pull/51158) ([Kruglov Pavel](https://github.com/Avogar)).
* Better checking of data parts on disks with filesystem cache. [#51164](https://github.com/ClickHouse/ClickHouse/pull/51164) ([Anton Popov](https://github.com/CurtizJ)).
* Fix occasionally incorrect `current_elements_num` in the fs cache. [#51242](https://github.com/ClickHouse/ClickHouse/pull/51242) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### Build/Testing/Packaging Improvement
* Add embedded keeper-client to standalone keeper binary. [#50964](https://github.com/ClickHouse/ClickHouse/pull/50964) ([pufit](https://github.com/pufit)).
* Actual LZ4 version is used now. [#50621](https://github.com/ClickHouse/ClickHouse/pull/50621) ([Nikita Taranov](https://github.com/nickitat)).
* ClickHouse server will print the list of changed settings on fatal errors. This closes [#51137](https://github.com/ClickHouse/ClickHouse/issues/51137). [#51138](https://github.com/ClickHouse/ClickHouse/pull/51138) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow building ClickHouse with clang-17. [#51300](https://github.com/ClickHouse/ClickHouse/pull/51300) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* [SQLancer](https://github.com/sqlancer/sqlancer) check is considered stable as bugs that were triggered by it are fixed. Now failures of SQLancer check will be reported as failed check status. [#51340](https://github.com/ClickHouse/ClickHouse/pull/51340) ([Ilya Yatsishin](https://github.com/qoega)).
* Split the huge `RUN` in the Dockerfile into smaller conditional ones. Install the necessary tools on demand in the same `RUN` layer, and remove them after that. Upgrade the OS only once at the beginning. Use a modern way to check the signed repository. Downgrade the base repo to ubuntu:20.04 to address the issues on older docker versions. Upgrade golang version to address golang vulnerabilities. [#51504](https://github.com/ClickHouse/ClickHouse/pull/51504) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Report loading status for executable dictionaries correctly [#48775](https://github.com/ClickHouse/ClickHouse/pull/48775) ([Anton Kozlov](https://github.com/tonickkozlov)).
* Proper mutation of skip indices and projections [#50104](https://github.com/ClickHouse/ClickHouse/pull/50104) ([Amos Bird](https://github.com/amosbird)).
* Cleanup moving parts [#50489](https://github.com/ClickHouse/ClickHouse/pull/50489) ([vdimir](https://github.com/vdimir)).
* Fix backward compatibility for IP types hashing in aggregate functions [#50551](https://github.com/ClickHouse/ClickHouse/pull/50551) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix Log family tables returning a wrong row count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
* Fix bug in `uniqExact` parallel merging [#50590](https://github.com/ClickHouse/ClickHouse/pull/50590) ([Nikita Taranov](https://github.com/nickitat)).
* Revert recent grace hash join changes [#50699](https://github.com/ClickHouse/ClickHouse/pull/50699) ([vdimir](https://github.com/vdimir)).
* Query Cache: Try to fix bad cast from `ColumnConst` to `ColumnVector<char8_t>` [#50704](https://github.com/ClickHouse/ClickHouse/pull/50704) ([Robert Schulze](https://github.com/rschu1ze)).
* Avoid storing logs in Keeper containing unknown operation [#50751](https://github.com/ClickHouse/ClickHouse/pull/50751) ([Antonio Andelic](https://github.com/antonio2368)).
* SummingMergeTree support for DateTime64 [#50797](https://github.com/ClickHouse/ClickHouse/pull/50797) ([Jordi Villar](https://github.com/jrdi)).
* Add compatibility setting for non-const timezones [#50834](https://github.com/ClickHouse/ClickHouse/pull/50834) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix hashing of LDAP params in the cache entries [#50865](https://github.com/ClickHouse/ClickHouse/pull/50865) ([Julian Maicher](https://github.com/jmaicher)).
* Fallback to parsing big integer from String instead of exception in Parquet format [#50873](https://github.com/ClickHouse/ClickHouse/pull/50873) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix checking the lock file too often while writing a backup [#50889](https://github.com/ClickHouse/ClickHouse/pull/50889) ([Vitaly Baranov](https://github.com/vitlibar)).
* Do not apply projection if read-in-order was enabled. [#50923](https://github.com/ClickHouse/ClickHouse/pull/50923) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix race in the Azure blob storage iterator [#50936](https://github.com/ClickHouse/ClickHouse/pull/50936) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix erroneous `sort_description` propagation in `CreatingSets` [#50955](https://github.com/ClickHouse/ClickHouse/pull/50955) ([Nikita Taranov](https://github.com/nickitat)).
* Fix Iceberg v2 optional metadata parsing [#50974](https://github.com/ClickHouse/ClickHouse/pull/50974) ([Kseniia Sumarokova](https://github.com/kssenii)).
* MaterializedMySQL: Keep parentheses for empty table overrides [#50977](https://github.com/ClickHouse/ClickHouse/pull/50977) ([Val Doroshchuk](https://github.com/valbok)).
* Fix crash in BackupCoordinationStageSync::setError() [#51012](https://github.com/ClickHouse/ClickHouse/pull/51012) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix subtly broken copy-on-write of ColumnLowCardinality dictionary [#51064](https://github.com/ClickHouse/ClickHouse/pull/51064) ([Michael Kolupaev](https://github.com/al13n321)).
* Generate safe IVs [#51086](https://github.com/ClickHouse/ClickHouse/pull/51086) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix ineffective query cache for SELECTs with subqueries [#51132](https://github.com/ClickHouse/ClickHouse/pull/51132) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix Set index with constant nullable comparison. [#51205](https://github.com/ClickHouse/ClickHouse/pull/51205) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix a crash in s3 and s3Cluster functions [#51209](https://github.com/ClickHouse/ClickHouse/pull/51209) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix a crash with compiled expressions [#51231](https://github.com/ClickHouse/ClickHouse/pull/51231) ([LiuNeng](https://github.com/liuneng1994)).
* Fix use-after-free in StorageURL when switching URLs [#51260](https://github.com/ClickHouse/ClickHouse/pull/51260) ([Michael Kolupaev](https://github.com/al13n321)).
* Updated check for parameterized view [#51272](https://github.com/ClickHouse/ClickHouse/pull/51272) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix multiple writing of same file to backup [#51299](https://github.com/ClickHouse/ClickHouse/pull/51299) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix fuzzer failure in ActionsDAG [#51301](https://github.com/ClickHouse/ClickHouse/pull/51301) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove garbage from function `transform` [#51350](https://github.com/ClickHouse/ClickHouse/pull/51350) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
### <a id="235"></a> ClickHouse release 23.5, 2023-06-08 ### <a id="235"></a> ClickHouse release 23.5, 2023-06-08
#### Upgrade Notes #### Upgrade Notes

View File

@ -2,21 +2,23 @@
#include <base/strong_typedef.h> #include <base/strong_typedef.h>
#include <base/extended_types.h> #include <base/extended_types.h>
#include <Common/formatIPv6.h>
#include <Common/memcmpSmall.h> #include <Common/memcmpSmall.h>
namespace DB namespace DB
{ {
using IPv4 = StrongTypedef<UInt32, struct IPv4Tag>; struct IPv4 : StrongTypedef<UInt32, struct IPv4Tag>
{
using StrongTypedef::StrongTypedef;
using StrongTypedef::operator=;
constexpr explicit IPv4(UInt64 value): StrongTypedef(static_cast<UnderlyingType>(value)) {}
};
struct IPv6 : StrongTypedef<UInt128, struct IPv6Tag> struct IPv6 : StrongTypedef<UInt128, struct IPv6Tag>
{ {
constexpr IPv6() = default; using StrongTypedef::StrongTypedef;
constexpr explicit IPv6(const UInt128 & x) : StrongTypedef(x) {} using StrongTypedef::operator=;
constexpr explicit IPv6(UInt128 && x) : StrongTypedef(std::move(x)) {}
IPv6 & operator=(const UInt128 & rhs) { StrongTypedef::operator=(rhs); return *this; }
IPv6 & operator=(UInt128 && rhs) { StrongTypedef::operator=(std::move(rhs)); return *this; }
bool operator<(const IPv6 & rhs) const bool operator<(const IPv6 & rhs) const
{ {
@ -54,12 +56,22 @@ namespace DB
namespace std namespace std
{ {
/// For historical reasons we hash IPv6 as a FixedString(16)
template <> template <>
struct hash<DB::IPv6> struct hash<DB::IPv6>
{ {
size_t operator()(const DB::IPv6 & x) const size_t operator()(const DB::IPv6 & x) const
{ {
return std::hash<DB::IPv6::UnderlyingType>()(x.toUnderType()); return std::hash<std::string_view>{}(std::string_view(reinterpret_cast<const char*>(&x.toUnderType()), IPV6_BINARY_LENGTH));
}
};
template <>
struct hash<DB::IPv4>
{
size_t operator()(const DB::IPv4 & x) const
{
return std::hash<DB::IPv4::UnderlyingType>()(x.toUnderType());
} }
}; };
} }
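
The hunk above switches `std::hash<DB::IPv6>` to hash the 16 raw address bytes as a string, as a `FixedString(16)` would be hashed. The following is a minimal sketch of that idea, not the ClickHouse implementation: `Ip6Bytes`, `hashAsFixedString`, `hashOfString` and `checkHashesAgree` are hypothetical names introduced only for illustration, and the address is assumed to be stored as a plain 16-byte array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for DB::IPv6: 16 raw address bytes.
struct Ip6Bytes
{
    std::array<unsigned char, 16> bytes{};
};

// Hash the address by viewing its bytes as a 16-character string,
// mirroring the "hash IPv6 as a FixedString(16)" approach in the diff.
inline std::size_t hashAsFixedString(const Ip6Bytes & x)
{
    const auto * data = reinterpret_cast<const char *>(x.bytes.data());
    return std::hash<std::string_view>{}(std::string_view(data, x.bytes.size()));
}

// Hash of the same bytes when they are held in an ordinary string.
inline std::size_t hashOfString(const std::string & s)
{
    return std::hash<std::string_view>{}(std::string_view(s));
}

// Usage check: both representations of the same 16 bytes hash identically.
inline void checkHashesAgree()
{
    Ip6Bytes addr;
    addr.bytes[15] = 1; // the loopback address ::1
    const std::string same_bytes(reinterpret_cast<const char *>(addr.bytes.data()), addr.bytes.size());
    assert(hashAsFixedString(addr) == hashOfString(same_bytes));
}
```

Because the hash depends only on the byte sequence, a value hashed through the address type and the same bytes hashed as a string land in the same bucket, which is presumably what the backward compatibility fix relies on.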

View File

@ -27,6 +27,8 @@ using FromDoubleIntermediateType = long double;
using FromDoubleIntermediateType = boost::multiprecision::cpp_bin_float_double_extended; using FromDoubleIntermediateType = boost::multiprecision::cpp_bin_float_double_extended;
#endif #endif
namespace CityHash_v1_0_2 { struct uint128; }
namespace wide namespace wide
{ {
@ -281,6 +283,17 @@ struct integer<Bits, Signed>::_impl
} }
} }
template <typename CityHashUInt128 = CityHash_v1_0_2::uint128>
constexpr static void wide_integer_from_cityhash_uint128(integer<Bits, Signed> & self, const CityHashUInt128 & value) noexcept
{
static_assert(sizeof(item_count) >= 2);
if constexpr (std::endian::native == std::endian::little)
wide_integer_from_tuple_like(self, std::make_pair(value.low64, value.high64));
else
wide_integer_from_tuple_like(self, std::make_pair(value.high64, value.low64));
}
/** /**
* N.B. t is constructed from double, so max(t) = max(double) ~ 2^310 * N.B. t is constructed from double, so max(t) = max(double) ~ 2^310
* the recursive call happens when t / 2^64 > 2^64, so there won't be more than 5 of them. * the recursive call happens when t / 2^64 > 2^64, so there won't be more than 5 of them.
@ -1036,6 +1049,8 @@ constexpr integer<Bits, Signed>::integer(T rhs) noexcept
_impl::wide_integer_from_wide_integer(*this, rhs); _impl::wide_integer_from_wide_integer(*this, rhs);
else if constexpr (IsTupleLike<T>::value) else if constexpr (IsTupleLike<T>::value)
_impl::wide_integer_from_tuple_like(*this, rhs); _impl::wide_integer_from_tuple_like(*this, rhs);
else if constexpr (std::is_same_v<std::remove_cvref_t<T>, CityHash_v1_0_2::uint128>)
_impl::wide_integer_from_cityhash_uint128(*this, rhs);
else else
_impl::wide_integer_from_builtin(*this, rhs); _impl::wide_integer_from_builtin(*this, rhs);
} }
@ -1051,6 +1066,8 @@ constexpr integer<Bits, Signed>::integer(std::initializer_list<T> il) noexcept
_impl::wide_integer_from_wide_integer(*this, *il.begin()); _impl::wide_integer_from_wide_integer(*this, *il.begin());
else if constexpr (IsTupleLike<T>::value) else if constexpr (IsTupleLike<T>::value)
_impl::wide_integer_from_tuple_like(*this, *il.begin()); _impl::wide_integer_from_tuple_like(*this, *il.begin());
else if constexpr (std::is_same_v<std::remove_cvref_t<T>, CityHash_v1_0_2::uint128>)
_impl::wide_integer_from_cityhash_uint128(*this, *il.begin());
else else
_impl::wide_integer_from_builtin(*this, *il.begin()); _impl::wide_integer_from_builtin(*this, *il.begin());
} }
@ -1088,6 +1105,8 @@ constexpr integer<Bits, Signed> & integer<Bits, Signed>::operator=(T rhs) noexce
{ {
if constexpr (IsTupleLike<T>::value) if constexpr (IsTupleLike<T>::value)
_impl::wide_integer_from_tuple_like(*this, rhs); _impl::wide_integer_from_tuple_like(*this, rhs);
else if constexpr (std::is_same_v<std::remove_cvref_t<T>, CityHash_v1_0_2::uint128>)
_impl::wide_integer_from_cityhash_uint128(*this, rhs);
else else
_impl::wide_integer_from_builtin(*this, rhs); _impl::wide_integer_from_builtin(*this, rhs);
return *this; return *this;
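
The hunks above add a conversion from the CityHash 128-bit value that orders the two halves according to the native byte order. Here is a hedged sketch of that ordering rule only. `Hash128`, `isLittleEndian`, `lowLimbIndex`, `toNativeLimbs` and `checkLimbOrder` are hypothetical names, the sketch assumes the first element of the pair ends up in limb 0, and it probes endianness at run time instead of using `std::endian` (as the diff does) so that it also builds with pre-C++20 compilers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CityHash_v1_0_2::uint128 from the diff.
struct Hash128
{
    std::uint64_t low64 = 0;
    std::uint64_t high64 = 0;
};

// Run-time endianness probe (the diff uses the compile-time std::endian instead).
inline bool isLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Index of the limb that holds the least significant 64 bits.
inline std::size_t lowLimbIndex()
{
    return isLittleEndian() ? 0 : 1;
}

// Lay out the two halves so that limb order follows the native byte order:
// (low, high) on little-endian machines, (high, low) on big-endian ones.
inline std::array<std::uint64_t, 2> toNativeLimbs(const Hash128 & value)
{
    std::array<std::uint64_t, 2> limbs{};
    limbs[lowLimbIndex()] = value.low64;
    limbs[1 - lowLimbIndex()] = value.high64;
    return limbs;
}

// Usage check: the halves can always be read back through lowLimbIndex().
inline void checkLimbOrder()
{
    const Hash128 h{0x1111, 0x2222};
    const auto limbs = toNativeLimbs(h);
    assert(limbs[lowLimbIndex()] == h.low64);
    assert(limbs[1 - lowLimbIndex()] == h.high64);
}
```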

View File

@ -61,11 +61,24 @@ namespace CityHash_v1_0_2
typedef uint8_t uint8; typedef uint8_t uint8;
typedef uint32_t uint32; typedef uint32_t uint32;
typedef uint64_t uint64; typedef uint64_t uint64;
typedef std::pair<uint64, uint64> uint128;
/// Represent an unsigned integer of 128 bits as it's used in CityHash.
/// Originally CityHash used `std::pair<uint64, uint64>` instead of this struct,
/// however the members `first` and `second` could be easily confused so they were renamed to `low64` and `high64`:
/// `first` -> `low64`, `second` -> `high64`.
struct uint128
{
uint64 low64 = 0;
uint64 high64 = 0;
inline uint64 Uint128Low64(const uint128& x) { return x.first; } uint128() = default;
inline uint64 Uint128High64(const uint128& x) { return x.second; } uint128(uint64 low64_, uint64 high64_) : low64(low64_), high64(high64_) {}
friend bool operator ==(const uint128 & x, const uint128 & y) { return (x.low64 == y.low64) && (x.high64 == y.high64); }
friend bool operator !=(const uint128 & x, const uint128 & y) { return !(x == y); }
};
inline uint64 Uint128Low64(const uint128 & x) { return x.low64; }
inline uint64 Uint128High64(const uint128 & x) { return x.high64; }
// Hash function for a byte array. // Hash function for a byte array.
uint64 CityHash64(const char *buf, size_t len); uint64 CityHash64(const char *buf, size_t len);
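
The hunk above replaces `std::pair<uint64, uint64>` with a struct whose members are named `low64` and `high64`. As a rough illustration of how call sites read after such a rename, the sketch below mirrors the struct and adds two conversion helpers. `U128`, `fromLegacyPair`, `toLegacyPair` and `checkRoundTrip` are hypothetical and do not appear in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical mirror of the struct introduced in the diff.
struct U128
{
    std::uint64_t low64 = 0;
    std::uint64_t high64 = 0;

    U128() = default;
    U128(std::uint64_t low64_, std::uint64_t high64_) : low64(low64_), high64(high64_) {}

    friend bool operator==(const U128 & x, const U128 & y) { return x.low64 == y.low64 && x.high64 == y.high64; }
    friend bool operator!=(const U128 & x, const U128 & y) { return !(x == y); }
};

// Hypothetical helper: the old pair layout maps `first` -> `low64`, `second` -> `high64`.
inline U128 fromLegacyPair(const std::pair<std::uint64_t, std::uint64_t> & p)
{
    return U128(p.first, p.second);
}

// Hypothetical helper for code that still expects the pair representation.
inline std::pair<std::uint64_t, std::uint64_t> toLegacyPair(const U128 & x)
{
    return {x.low64, x.high64};
}

// Usage check: a round trip through the pair form preserves both halves.
inline void checkRoundTrip()
{
    const U128 value(7, 9);
    assert(fromLegacyPair(toLegacyPair(value)) == value);
}
```

With named members, a reader no longer has to remember which of `first` and `second` is the low half, which is the confusion the comment in the diff describes.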

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
esac esac
ARG REPOSITORY="https://s3.amazonaws.com/clickhouse-builds/22.4/31c367d3cd3aefd316778601ff6565119fe36682/package_release" ARG REPOSITORY="https://s3.amazonaws.com/clickhouse-builds/22.4/31c367d3cd3aefd316778601ff6565119fe36682/package_release"
ARG VERSION="23.5.3.24" ARG VERSION="23.5.4.25"
ARG PACKAGES="clickhouse-keeper" ARG PACKAGES="clickhouse-keeper"
# user/group precreated explicitly with fixed uid/gid on purpose. # user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -89,7 +89,7 @@ RUN arch=${TARGETARCH:-amd64} \
&& dpkg -i /tmp/nfpm.deb \ && dpkg -i /tmp/nfpm.deb \
&& rm /tmp/nfpm.deb && rm /tmp/nfpm.deb
ARG GO_VERSION=1.19.5 ARG GO_VERSION=1.19.10
# We need go for clickhouse-diagnostics # We need go for clickhouse-diagnostics
RUN arch=${TARGETARCH:-amd64} \ RUN arch=${TARGETARCH:-amd64} \
&& curl -Lo /tmp/go.tgz "https://go.dev/dl/go${GO_VERSION}.linux-${arch}.tar.gz" \ && curl -Lo /tmp/go.tgz "https://go.dev/dl/go${GO_VERSION}.linux-${arch}.tar.gz" \

View File

@ -33,7 +33,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc # lts / testing / prestable / etc
ARG REPO_CHANNEL="stable" ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}" ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.5.3.24" ARG VERSION="23.5.4.25"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static" ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose. # user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -1,4 +1,4 @@
FROM ubuntu:22.04 FROM ubuntu:20.04
# see https://github.com/moby/moby/issues/4032#issuecomment-192327844 # see https://github.com/moby/moby/issues/4032#issuecomment-192327844
ARG DEBIAN_FRONTEND=noninteractive ARG DEBIAN_FRONTEND=noninteractive
@ -11,18 +11,19 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
&& apt-get update \ && apt-get update \
&& apt-get upgrade -yq \ && apt-get upgrade -yq \
&& apt-get install --yes --no-install-recommends \ && apt-get install --yes --no-install-recommends \
apt-transport-https \
ca-certificates \ ca-certificates \
dirmngr \
gnupg2 \
wget \
locales \ locales \
tzdata \ tzdata \
&& apt-get clean wget \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/var/cache/debconf \
/tmp/*
ARG REPO_CHANNEL="stable" ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb https://packages.clickhouse.com/deb ${REPO_CHANNEL} main" ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.5.3.24" ARG VERSION="23.5.4.25"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static" ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image # set non-empty deb_location_url url to create a docker image
@ -43,49 +44,68 @@ ARG single_binary_location_url=""
ARG TARGETARCH ARG TARGETARCH
RUN arch=${TARGETARCH:-amd64} \ # install from a web location with deb packages
RUN arch="${TARGETARCH:-amd64}" \
&& if [ -n "${deb_location_url}" ]; then \ && if [ -n "${deb_location_url}" ]; then \
echo "installing from custom url with deb packages: ${deb_location_url}" \ echo "installing from custom url with deb packages: ${deb_location_url}" \
rm -rf /tmp/clickhouse_debs \ && rm -rf /tmp/clickhouse_debs \
&& mkdir -p /tmp/clickhouse_debs \ && mkdir -p /tmp/clickhouse_debs \
&& for package in ${PACKAGES}; do \ && for package in ${PACKAGES}; do \
{ wget --progress=bar:force:noscroll "${deb_location_url}/${package}_${VERSION}_${arch}.deb" -P /tmp/clickhouse_debs || \ { wget --progress=bar:force:noscroll "${deb_location_url}/${package}_${VERSION}_${arch}.deb" -P /tmp/clickhouse_debs || \
wget --progress=bar:force:noscroll "${deb_location_url}/${package}_${VERSION}_all.deb" -P /tmp/clickhouse_debs ; } \ wget --progress=bar:force:noscroll "${deb_location_url}/${package}_${VERSION}_all.deb" -P /tmp/clickhouse_debs ; } \
|| exit 1 \ || exit 1 \
; done \ ; done \
&& dpkg -i /tmp/clickhouse_debs/*.deb ; \ && dpkg -i /tmp/clickhouse_debs/*.deb \
elif [ -n "${single_binary_location_url}" ]; then \ && rm -rf /tmp/* ; \
fi
# install from a single binary
RUN if [ -n "${single_binary_location_url}" ]; then \
echo "installing from single binary url: ${single_binary_location_url}" \ echo "installing from single binary url: ${single_binary_location_url}" \
&& rm -rf /tmp/clickhouse_binary \ && rm -rf /tmp/clickhouse_binary \
&& mkdir -p /tmp/clickhouse_binary \ && mkdir -p /tmp/clickhouse_binary \
&& wget --progress=bar:force:noscroll "${single_binary_location_url}" -O /tmp/clickhouse_binary/clickhouse \ && wget --progress=bar:force:noscroll "${single_binary_location_url}" -O /tmp/clickhouse_binary/clickhouse \
&& chmod +x /tmp/clickhouse_binary/clickhouse \ && chmod +x /tmp/clickhouse_binary/clickhouse \
&& /tmp/clickhouse_binary/clickhouse install --user "clickhouse" --group "clickhouse" ; \ && /tmp/clickhouse_binary/clickhouse install --user "clickhouse" --group "clickhouse" \
else \ && rm -rf /tmp/* ; \
mkdir -p /etc/apt/sources.list.d \ fi
&& apt-key adv --keyserver keyserver.ubuntu.com --recv 8919F6BD2B48D754 \
&& echo ${REPOSITORY} > /etc/apt/sources.list.d/clickhouse.list \ # A fallback to installation from ClickHouse repository
RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then \
apt-get update \
&& apt-get install --yes --no-install-recommends \
apt-transport-https \
ca-certificates \
dirmngr \
gnupg2 \
&& mkdir -p /etc/apt/sources.list.d \
&& GNUPGHOME=$(mktemp -d) \
&& GNUPGHOME="$GNUPGHOME" gpg --no-default-keyring \
--keyring /usr/share/keyrings/clickhouse-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754 \
&& rm -r "$GNUPGHOME" \
&& chmod +r /usr/share/keyrings/clickhouse-keyring.gpg \
&& echo "${REPOSITORY}" > /etc/apt/sources.list.d/clickhouse.list \
&& echo "installing from repository: ${REPOSITORY}" \ && echo "installing from repository: ${REPOSITORY}" \
&& apt-get update \ && apt-get update \
&& apt-get --yes -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" upgrade \
&& for package in ${PACKAGES}; do \ && for package in ${PACKAGES}; do \
packages="${packages} ${package}=${VERSION}" \ packages="${packages} ${package}=${VERSION}" \
; done \ ; done \
&& apt-get install --allow-unauthenticated --yes --no-install-recommends ${packages} || exit 1 \ && apt-get install --allow-unauthenticated --yes --no-install-recommends ${packages} || exit 1 \
; fi \
&& clickhouse-local -q 'SELECT * FROM system.build_options' \
&& rm -rf \ && rm -rf \
/var/lib/apt/lists/* \ /var/lib/apt/lists/* \
/var/cache/debconf \ /var/cache/debconf \
/tmp/* \ /tmp/* \
&& mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client \ && apt-get autoremove --purge -yq libksba8 \
&& chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client && apt-get autoremove -yq \
; fi
RUN apt-get autoremove --purge -yq libksba8 && \
apt-get autoremove -yq
# post install
# we need to allow "others" access to clickhouse folder, because docker container # we need to allow "others" access to clickhouse folder, because docker container
# can be started with arbitrary uid (openshift usecase) # can be started with arbitrary uid (openshift usecase)
RUN clickhouse-local -q 'SELECT * FROM system.build_options' \
&& mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client \
&& chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client
RUN locale-gen en_US.UTF-8 RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8 ENV LANG en_US.UTF-8

@ -20,7 +20,6 @@ For more information and documentation see https://clickhouse.com/.
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3. - The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A. - The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A.
- Since the ClickHouse 23.3 Ubuntu image started using `ubuntu:22.04` as its base image, it requires Docker version >= `20.10.10`; on older versions, use `docker run --privileged` instead. Alternatively, try the ClickHouse Alpine image.
## How to use this image ## How to use this image

@ -52,6 +52,8 @@ export CLICKHOUSE_TESTS_BASE_CONFIG_DIR=/clickhouse-config
export CLICKHOUSE_ODBC_BRIDGE_BINARY_PATH=/clickhouse-odbc-bridge export CLICKHOUSE_ODBC_BRIDGE_BINARY_PATH=/clickhouse-odbc-bridge
export CLICKHOUSE_LIBRARY_BRIDGE_BINARY_PATH=/clickhouse-library-bridge export CLICKHOUSE_LIBRARY_BRIDGE_BINARY_PATH=/clickhouse-library-bridge
export DOCKER_BASE_TAG=${DOCKER_BASE_TAG:=latest}
export DOCKER_HELPER_TAG=${DOCKER_HELPER_TAG:=latest}
export DOCKER_MYSQL_GOLANG_CLIENT_TAG=${DOCKER_MYSQL_GOLANG_CLIENT_TAG:=latest} export DOCKER_MYSQL_GOLANG_CLIENT_TAG=${DOCKER_MYSQL_GOLANG_CLIENT_TAG:=latest}
export DOCKER_DOTNET_CLIENT_TAG=${DOCKER_DOTNET_CLIENT_TAG:=latest} export DOCKER_DOTNET_CLIENT_TAG=${DOCKER_DOTNET_CLIENT_TAG:=latest}
export DOCKER_MYSQL_JAVA_CLIENT_TAG=${DOCKER_MYSQL_JAVA_CLIENT_TAG:=latest} export DOCKER_MYSQL_JAVA_CLIENT_TAG=${DOCKER_MYSQL_JAVA_CLIENT_TAG:=latest}

@ -0,0 +1,19 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.3.6.7-lts (7e3f0a271b7) FIXME as compared to v23.3.5.9-lts (f5fbc2fd2b3)
#### Improvement
* Backported in [#51240](https://github.com/ClickHouse/ClickHouse/issues/51240): Improve the progress bar for file/s3/hdfs/url table functions by using chunk size from source data and using incremental total size counting in each thread. Fix the progress bar for *Cluster functions. This closes [#47250](https://github.com/ClickHouse/ClickHouse/issues/47250). [#51088](https://github.com/ClickHouse/ClickHouse/pull/51088) ([Kruglov Pavel](https://github.com/Avogar)).
#### Build/Testing/Packaging Improvement
* Backported in [#51529](https://github.com/ClickHouse/ClickHouse/issues/51529): Split huge `RUN` in Dockerfile into smaller conditional. Install the necessary tools on demand in the same `RUN` layer, and remove them after that. Upgrade the OS only once at the beginning. Use a modern way to check the signed repository. Downgrade the base repo to ubuntu:20.04 to address the issues on older docker versions. Upgrade golang version to address golang vulnerabilities. [#51504](https://github.com/ClickHouse/ClickHouse/pull/51504) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix type of LDAP server params hash in cache entry [#50865](https://github.com/ClickHouse/ClickHouse/pull/50865) ([Julian Maicher](https://github.com/jmaicher)).

@ -0,0 +1,31 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.5.4.25-stable (190f962abcf) FIXME as compared to v23.5.3.24-stable (76f54616d3b)
#### Improvement
* Backported in [#51235](https://github.com/ClickHouse/ClickHouse/issues/51235): Improve the progress bar for file/s3/hdfs/url table functions by using chunk size from source data and using incremental total size counting in each thread. Fix the progress bar for *Cluster functions. This closes [#47250](https://github.com/ClickHouse/ClickHouse/issues/47250). [#51088](https://github.com/ClickHouse/ClickHouse/pull/51088) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#51255](https://github.com/ClickHouse/ClickHouse/issues/51255): Disable cache setting `do_not_evict_index_and_mark_files` (Was enabled in `23.5`). [#51222](https://github.com/ClickHouse/ClickHouse/pull/51222) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### Build/Testing/Packaging Improvement
* Backported in [#51531](https://github.com/ClickHouse/ClickHouse/issues/51531): Split huge `RUN` in Dockerfile into smaller conditional. Install the necessary tools on demand in the same `RUN` layer, and remove them after that. Upgrade the OS only once at the beginning. Use a modern way to check the signed repository. Downgrade the base repo to ubuntu:20.04 to address the issues on older docker versions. Upgrade golang version to address golang vulnerabilities. [#51504](https://github.com/ClickHouse/ClickHouse/pull/51504) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#51572](https://github.com/ClickHouse/ClickHouse/issues/51572): This is a follow-up for [#51504](https://github.com/ClickHouse/ClickHouse/issues/51504); the cleanup was lost during refactoring. [#51564](https://github.com/ClickHouse/ClickHouse/pull/51564) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Query Cache: Try to fix bad cast from ColumnConst to ColumnVector<char8_t> [#50704](https://github.com/ClickHouse/ClickHouse/pull/50704) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix type of LDAP server params hash in cache entry [#50865](https://github.com/ClickHouse/ClickHouse/pull/50865) ([Julian Maicher](https://github.com/jmaicher)).
* Fallback to parsing big integer from String instead of exception in Parquet format [#50873](https://github.com/ClickHouse/ClickHouse/pull/50873) ([Kruglov Pavel](https://github.com/Avogar)).
* Do not apply projection if read-in-order was enabled. [#50923](https://github.com/ClickHouse/ClickHouse/pull/50923) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix race azure blob storage iterator [#50936](https://github.com/ClickHouse/ClickHouse/pull/50936) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix ineffective query cache for SELECTs with subqueries [#51132](https://github.com/ClickHouse/ClickHouse/pull/51132) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix fuzzer failure in ActionsDAG [#51301](https://github.com/ClickHouse/ClickHouse/pull/51301) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Fix ParallelReadBuffer seek [#50820](https://github.com/ClickHouse/ClickHouse/pull/50820) ([Michael Kolupaev](https://github.com/al13n321)).

@ -44,11 +44,12 @@ Create a table in ClickHouse which allows to read data from Redis:
``` sql ``` sql
CREATE TABLE redis_table CREATE TABLE redis_table
( (
`k` String, `key` String,
`m` String, `v1` UInt32,
`n` UInt32 `v2` String,
`v3` Float32
) )
ENGINE = Redis('redis1:6379') PRIMARY KEY(k); ENGINE = Redis('redis1:6379') PRIMARY KEY(key);
``` ```
Insert: Insert:
@ -111,9 +112,16 @@ Flush Redis db asynchronously. Also `Truncate` support SYNC mode.
TRUNCATE TABLE redis_table SYNC; TRUNCATE TABLE redis_table SYNC;
``` ```
Join:
Join with other tables.
```
SELECT * FROM redis_table JOIN merge_tree_table ON merge_tree_table.key=redis_table.key;
```
## Limitations {#limitations} ## Limitations {#limitations}
Redis engine also supports scanning queries, such as `where k > xx`, but it has some limitations: Redis engine also supports scanning queries, such as `where k > xx`, but it has some limitations:
1. Scanning query may produce some duplicated keys in a very rare case when it is rehashing. See details in [Redis Scan](https://github.com/redis/redis/blob/e4d183afd33e0b2e6e8d1c79a832f678a04a7886/src/dict.c#L1186-L1269) 1. Scanning query may produce some duplicated keys in a very rare case when it is rehashing. See details in [Redis Scan](https://github.com/redis/redis/blob/e4d183afd33e0b2e6e8d1c79a832f678a04a7886/src/dict.c#L1186-L1269).
2. During the scanning, keys could be created and deleted, so the resulting dataset can not represent a valid point in time. 2. During the scanning, keys could be created and deleted, so the resulting dataset can not represent a valid point in time.

@ -2454,18 +2454,22 @@ In this format, all input data is read to a single value. It is possible to pars
The result is output in binary format without delimiters and escaping. If more than one value is output, the format is ambiguous, and it will be impossible to read the data back. The result is output in binary format without delimiters and escaping. If more than one value is output, the format is ambiguous, and it will be impossible to read the data back.
Below is a comparison of the formats `RawBLOB` and [TabSeparatedRaw](#tabseparatedraw). Below is a comparison of the formats `RawBLOB` and [TabSeparatedRaw](#tabseparatedraw).
`RawBLOB`: `RawBLOB`:
- data is output in binary format, no escaping; - data is output in binary format, no escaping;
- there are no delimiters between values; - there are no delimiters between values;
- no newline at the end of each value. - no newline at the end of each value.
[TabSeparatedRaw] (#tabseparatedraw):
`TabSeparatedRaw`:
- data is output without escaping; - data is output without escaping;
- the rows contain values separated by tabs; - the rows contain values separated by tabs;
- there is a line feed after the last value in every row. - there is a line feed after the last value in every row.
The following is a comparison of the `RawBLOB` and [RowBinary](#rowbinary) formats. The following is a comparison of the `RawBLOB` and [RowBinary](#rowbinary) formats.
`RawBLOB`: `RawBLOB`:
- String fields are output without being prefixed by length. - String fields are output without being prefixed by length.
`RowBinary`: `RowBinary`:
- String fields are represented as length in varint format (unsigned [LEB128] (https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string. - String fields are represented as length in varint format (unsigned [LEB128] (https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
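
The difference between the two string encodings can be sketched in a few lines of Python. This is only an illustration of the rule described above (a varint length prefix for `RowBinary`, raw bytes for `RawBLOB`); the function names are hypothetical and are not part of ClickHouse.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte)."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit stays clear
            return bytes(out)


def rowbinary_string(value: bytes) -> bytes:
    """RowBinary-style string: varint length, then the bytes of the string."""
    return encode_varint(len(value)) + value


def rawblob_string(value: bytes) -> bytes:
    """RawBLOB-style string: the bytes only, with no length prefix."""
    return value


if __name__ == "__main__":
    print(rowbinary_string(b"abc").hex())  # length byte followed by the payload
    print(rawblob_string(b"abc").hex())    # payload only
```

Because `RawBLOB` writes no prefix, concatenating `b"ab"` and `b"c"` yields the same bytes as concatenating `b"a"` and `b"bc"`, which is why output with more than one value cannot be read back.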

@ -97,6 +97,10 @@ Result:
If you apply this combinator, the aggregate function does not return the resulting value (such as the number of unique values for the [uniq](../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an `AggregateFunction(...)` that can be used for further processing or stored in a table to finish aggregating later. If you apply this combinator, the aggregate function does not return the resulting value (such as the number of unique values for the [uniq](../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an `AggregateFunction(...)` that can be used for further processing or stored in a table to finish aggregating later.
:::note
Note that `-MapState` is not invariant for the same data, because the order of data in the intermediate state can change. This does not affect ingestion of the data.
:::
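
The effect described in the note can be modelled with plain Python dictionaries. This is a sketch under stated assumptions, not ClickHouse's actual state format: `serialize_state` and `merge` are hypothetical helpers, and the byte layout is invented for the example. It only shows that two states holding the same key/value pairs can serialize differently when the pairs were accumulated in a different order.

```python
import struct


def serialize_state(state: dict) -> bytes:
    """Write key/value pairs in the map's iteration order (hypothetical layout)."""
    out = bytearray()
    for key, value in state.items():
        raw = key.encode()
        out += struct.pack("<I", len(raw)) + raw + struct.pack("<q", value)
    return bytes(out)


def merge(left: dict, right: dict) -> dict:
    """Merge two per-key sum states; keys keep the order in which they were first seen."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# The same input data, merged in two different orders.
state_1 = merge({"a": 1}, {"b": 2})
state_2 = merge({"b": 2}, {"a": 1})

if __name__ == "__main__":
    print(state_1 == state_2)                                    # same logical content
    print(serialize_state(state_1) == serialize_state(state_2))  # different bytes
```

Comparing serialized states byte by byte is therefore not a reliable equality check; comparing the finalized results is.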
To work with these states, use: To work with these states, use:
- [AggregatingMergeTree](../../engines/table-engines/mergetree-family/aggregatingmergetree.md) table engine. - [AggregatingMergeTree](../../engines/table-engines/mergetree-family/aggregatingmergetree.md) table engine.

@ -8,7 +8,7 @@ sidebar_label: Nullable
## isNull ## isNull
Returns whether the argument is [NULL](../../sql-reference/syntax.md#null-literal). Returns whether the argument is [NULL](../../sql-reference/syntax.md#null).
``` sql ``` sql
isNull(x) isNull(x)

@ -21,6 +21,9 @@ Expressions from `ON` clause and columns from `USING` clause are called “join
## Related Content ## Related Content
- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1](https://clickhouse.com/blog/clickhouse-fully-supports-joins) - Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1](https://clickhouse.com/blog/clickhouse-fully-supports-joins)
- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 2](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2)
- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 3](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3)
- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 4](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4)
## Supported Types of JOIN ## Supported Types of JOIN

@ -66,6 +66,10 @@ WITH anySimpleState(number) AS c SELECT toTypeName(c), c FROM numbers(1);
When this combinator is applied, the aggregate function returns not the final value (for example, the number of unique values in the case of the [uniq](reference/uniq.md#agg_function-uniq) function) but an intermediate aggregation state (for `uniq`, the hash table used to calculate the number of unique values). The state has the type `AggregateFunction(...)` and can be used for further processing or stored in a table to finish aggregating later.
:::note
The intermediate state for `-MapState` is not invariant for the same source data, because the order of the data can change. This does not, however, affect the loading of such data.
:::
The following are intended for working with intermediate states:
- The [AggregatingMergeTree](../../engines/table-engines/mergetree-family/aggregatingmergetree.md) table engine.

@ -25,6 +25,7 @@ IAggregateFunction * createWithNumericOrTimeType(const IDataType & argument_type
WhichDataType which(argument_type); WhichDataType which(argument_type);
if (which.idx == TypeIndex::Date) return new AggregateFunctionTemplate<UInt16, Data>(std::forward<TArgs>(args)...); if (which.idx == TypeIndex::Date) return new AggregateFunctionTemplate<UInt16, Data>(std::forward<TArgs>(args)...);
if (which.idx == TypeIndex::DateTime) return new AggregateFunctionTemplate<UInt32, Data>(std::forward<TArgs>(args)...); if (which.idx == TypeIndex::DateTime) return new AggregateFunctionTemplate<UInt32, Data>(std::forward<TArgs>(args)...);
if (which.idx == TypeIndex::IPv4) return new AggregateFunctionTemplate<IPv4, Data>(std::forward<TArgs>(args)...);
return createWithNumericType<AggregateFunctionTemplate, Data, TArgs...>(argument_type, std::forward<TArgs>(args)...); return createWithNumericType<AggregateFunctionTemplate, Data, TArgs...>(argument_type, std::forward<TArgs>(args)...);
} }

@ -4,6 +4,7 @@
#include <AggregateFunctions/FactoryHelpers.h> #include <AggregateFunctions/FactoryHelpers.h>
#include <DataTypes/DataTypeDate.h> #include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h> #include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeIPv4andIPv6.h>
namespace DB namespace DB
@ -39,12 +40,22 @@ public:
static DataTypePtr createResultType() { return std::make_shared<DataTypeArray>(std::make_shared<DataTypeDateTime>()); } static DataTypePtr createResultType() { return std::make_shared<DataTypeArray>(std::make_shared<DataTypeDateTime>()); }
}; };
template <typename HasLimit>
class AggregateFunctionGroupUniqArrayIPv4 : public AggregateFunctionGroupUniqArray<DataTypeIPv4::FieldType, HasLimit>
{
public:
explicit AggregateFunctionGroupUniqArrayIPv4(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits<UInt64>::max())
: AggregateFunctionGroupUniqArray<DataTypeIPv4::FieldType, HasLimit>(argument_type, parameters_, createResultType(), max_elems_) {}
static DataTypePtr createResultType() { return std::make_shared<DataTypeArray>(std::make_shared<DataTypeIPv4>()); }
};
template <typename HasLimit, typename ... TArgs> template <typename HasLimit, typename ... TArgs>
IAggregateFunction * createWithExtraTypes(const DataTypePtr & argument_type, TArgs && ... args) IAggregateFunction * createWithExtraTypes(const DataTypePtr & argument_type, TArgs && ... args)
{ {
WhichDataType which(argument_type); WhichDataType which(argument_type);
if (which.idx == TypeIndex::Date) return new AggregateFunctionGroupUniqArrayDate<HasLimit>(argument_type, std::forward<TArgs>(args)...); if (which.idx == TypeIndex::Date) return new AggregateFunctionGroupUniqArrayDate<HasLimit>(argument_type, std::forward<TArgs>(args)...);
else if (which.idx == TypeIndex::DateTime) return new AggregateFunctionGroupUniqArrayDateTime<HasLimit>(argument_type, std::forward<TArgs>(args)...); else if (which.idx == TypeIndex::DateTime) return new AggregateFunctionGroupUniqArrayDateTime<HasLimit>(argument_type, std::forward<TArgs>(args)...);
else if (which.idx == TypeIndex::IPv4) return new AggregateFunctionGroupUniqArrayIPv4<HasLimit>(argument_type, std::forward<TArgs>(args)...);
else else
{ {
/// Check that we can use plain version of AggregateFunctionGroupUniqArrayGeneric /// Check that we can use plain version of AggregateFunctionGroupUniqArrayGeneric

@ -100,6 +100,10 @@ public:
return std::make_shared<AggregateFunctionMap<UInt256>>(nested_function, arguments); return std::make_shared<AggregateFunctionMap<UInt256>>(nested_function, arguments);
case TypeIndex::UUID: case TypeIndex::UUID:
return std::make_shared<AggregateFunctionMap<UUID>>(nested_function, arguments); return std::make_shared<AggregateFunctionMap<UUID>>(nested_function, arguments);
case TypeIndex::IPv4:
return std::make_shared<AggregateFunctionMap<IPv4>>(nested_function, arguments);
case TypeIndex::IPv6:
return std::make_shared<AggregateFunctionMap<IPv6>>(nested_function, arguments);
case TypeIndex::FixedString: case TypeIndex::FixedString:
case TypeIndex::String: case TypeIndex::String:
return std::make_shared<AggregateFunctionMap<String>>(nested_function, arguments); return std::make_shared<AggregateFunctionMap<String>>(nested_function, arguments);

@ -19,7 +19,9 @@
#include <IO/ReadHelpers.h> #include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h> #include <IO/WriteHelpers.h>
#include "DataTypes/Serializations/ISerialization.h" #include "DataTypes/Serializations/ISerialization.h"
#include <base/IPv4andIPv6.h>
#include "base/types.h" #include "base/types.h"
#include <Common/formatIPv6.h>
#include <Common/Arena.h> #include <Common/Arena.h>
#include "AggregateFunctions/AggregateFunctionFactory.h" #include "AggregateFunctions/AggregateFunctionFactory.h"
@ -69,6 +71,31 @@ struct AggregateFunctionMapCombinatorData<String>
} }
}; };
/// Specialization for IPv6 - for historical reasons it should be stored as FixedString(16)
template <>
struct AggregateFunctionMapCombinatorData<IPv6>
{
struct IPv6Hash
{
using hash_type = std::hash<IPv6>;
using is_transparent = void;
size_t operator()(const IPv6 & ip) const { return hash_type{}(ip); }
};
using SearchType = IPv6;
std::unordered_map<IPv6, AggregateDataPtr, IPv6Hash, std::equal_to<>> merged_maps;
static void writeKey(const IPv6 & key, WriteBuffer & buf)
{
writeIPv6Binary(key, buf);
}
static void readKey(IPv6 & key, ReadBuffer & buf)
{
readIPv6Binary(key, buf);
}
};
template <typename KeyType> template <typename KeyType>
class AggregateFunctionMap final class AggregateFunctionMap final
: public IAggregateFunctionDataHelper<AggregateFunctionMapCombinatorData<KeyType>, AggregateFunctionMap<KeyType>> : public IAggregateFunctionDataHelper<AggregateFunctionMapCombinatorData<KeyType>, AggregateFunctionMap<KeyType>>
@ -147,6 +174,8 @@ public:
StringRef key_ref; StringRef key_ref;
if (key_type->getTypeId() == TypeIndex::FixedString) if (key_type->getTypeId() == TypeIndex::FixedString)
key_ref = assert_cast<const ColumnFixedString &>(key_column).getDataAt(offset + i); key_ref = assert_cast<const ColumnFixedString &>(key_column).getDataAt(offset + i);
else if (key_type->getTypeId() == TypeIndex::IPv6)
key_ref = assert_cast<const ColumnIPv6 &>(key_column).getDataAt(offset + i);
else else
key_ref = assert_cast<const ColumnString &>(key_column).getDataAt(offset + i); key_ref = assert_cast<const ColumnString &>(key_column).getDataAt(offset + i);

@ -5,6 +5,7 @@
#include <Common/FieldVisitorConvertToNumber.h> #include <Common/FieldVisitorConvertToNumber.h>
#include <DataTypes/DataTypeDate.h> #include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h> #include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeIPv4andIPv6.h>
static inline constexpr UInt64 TOP_K_MAX_SIZE = 0xFFFFFF; static inline constexpr UInt64 TOP_K_MAX_SIZE = 0xFFFFFF;
@ -60,6 +61,22 @@ public:
{} {}
}; };
template <bool is_weighted>
class AggregateFunctionTopKIPv4 : public AggregateFunctionTopK<DataTypeIPv4::FieldType, is_weighted>
{
public:
using AggregateFunctionTopK<DataTypeIPv4::FieldType, is_weighted>::AggregateFunctionTopK;
AggregateFunctionTopKIPv4(UInt64 threshold_, UInt64 load_factor, const DataTypes & argument_types_, const Array & params)
: AggregateFunctionTopK<DataTypeIPv4::FieldType, is_weighted>(
threshold_,
load_factor,
argument_types_,
params,
std::make_shared<DataTypeArray>(std::make_shared<DataTypeIPv4>()))
{}
};
template <bool is_weighted> template <bool is_weighted>
IAggregateFunction * createWithExtraTypes(const DataTypes & argument_types, UInt64 threshold, UInt64 load_factor, const Array & params) IAggregateFunction * createWithExtraTypes(const DataTypes & argument_types, UInt64 threshold, UInt64 load_factor, const Array & params)
@ -72,6 +89,8 @@ IAggregateFunction * createWithExtraTypes(const DataTypes & argument_types, UInt
return new AggregateFunctionTopKDate<is_weighted>(threshold, load_factor, argument_types, params); return new AggregateFunctionTopKDate<is_weighted>(threshold, load_factor, argument_types, params);
if (which.idx == TypeIndex::DateTime) if (which.idx == TypeIndex::DateTime)
return new AggregateFunctionTopKDateTime<is_weighted>(threshold, load_factor, argument_types, params); return new AggregateFunctionTopKDateTime<is_weighted>(threshold, load_factor, argument_types, params);
if (which.idx == TypeIndex::IPv4)
return new AggregateFunctionTopKIPv4<is_weighted>(threshold, load_factor, argument_types, params);
/// Check that we can use plain version of AggregateFunctionTopKGeneric /// Check that we can use plain version of AggregateFunctionTopKGeneric
if (argument_types[0]->isValueUnambiguouslyRepresentedInContiguousMemoryRegion()) if (argument_types[0]->isValueUnambiguouslyRepresentedInContiguousMemoryRegion())

@ -8,6 +8,7 @@
#include <DataTypes/DataTypeDateTime.h> #include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeTuple.h> #include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeUUID.h> #include <DataTypes/DataTypeUUID.h>
#include <DataTypes/DataTypeIPv4andIPv6.h>
#include <Core/Settings.h> #include <Core/Settings.h>
@ -60,6 +61,10 @@ createAggregateFunctionUniq(const std::string & name, const DataTypes & argument
return std::make_shared<AggregateFunctionUniq<String, Data>>(argument_types); return std::make_shared<AggregateFunctionUniq<String, Data>>(argument_types);
else if (which.isUUID()) else if (which.isUUID())
return std::make_shared<AggregateFunctionUniq<DataTypeUUID::FieldType, Data>>(argument_types); return std::make_shared<AggregateFunctionUniq<DataTypeUUID::FieldType, Data>>(argument_types);
else if (which.isIPv4())
return std::make_shared<AggregateFunctionUniq<DataTypeIPv4::FieldType, Data>>(argument_types);
else if (which.isIPv6())
return std::make_shared<AggregateFunctionUniq<DataTypeIPv6::FieldType, Data>>(argument_types);
else if (which.isTuple()) else if (which.isTuple())
{ {
if (use_exact_hash_function) if (use_exact_hash_function)
@ -109,6 +114,10 @@ createAggregateFunctionUniq(const std::string & name, const DataTypes & argument
return std::make_shared<AggregateFunctionUniq<String, Data<String, is_able_to_parallelize_merge>>>(argument_types); return std::make_shared<AggregateFunctionUniq<String, Data<String, is_able_to_parallelize_merge>>>(argument_types);
else if (which.isUUID()) else if (which.isUUID())
return std::make_shared<AggregateFunctionUniq<DataTypeUUID::FieldType, Data<DataTypeUUID::FieldType, is_able_to_parallelize_merge>>>(argument_types); return std::make_shared<AggregateFunctionUniq<DataTypeUUID::FieldType, Data<DataTypeUUID::FieldType, is_able_to_parallelize_merge>>>(argument_types);
else if (which.isIPv4())
return std::make_shared<AggregateFunctionUniq<DataTypeIPv4::FieldType, Data<DataTypeIPv4::FieldType, is_able_to_parallelize_merge>>>(argument_types);
else if (which.isIPv6())
return std::make_shared<AggregateFunctionUniq<DataTypeIPv6::FieldType, Data<DataTypeIPv6::FieldType, is_able_to_parallelize_merge>>>(argument_types);
else if (which.isTuple()) else if (which.isTuple())
{ {
if (use_exact_hash_function) if (use_exact_hash_function)

@ -101,6 +101,18 @@ struct AggregateFunctionUniqHLL12Data<UUID, false>
static String getName() { return "uniqHLL12"; } static String getName() { return "uniqHLL12"; }
}; };
template <>
struct AggregateFunctionUniqHLL12Data<IPv6, false>
{
using Set = HyperLogLogWithSmallSetOptimization<UInt64, 16, 12>;
Set set;
constexpr static bool is_able_to_parallelize_merge = false;
constexpr static bool is_variadic = false;
static String getName() { return "uniqHLL12"; }
};
template <bool is_exact_, bool argument_is_tuple_, bool is_able_to_parallelize_merge_> template <bool is_exact_, bool argument_is_tuple_, bool is_able_to_parallelize_merge_>
struct AggregateFunctionUniqHLL12DataForVariadic struct AggregateFunctionUniqHLL12DataForVariadic
{ {
@ -155,6 +167,25 @@ struct AggregateFunctionUniqExactData<String, is_able_to_parallelize_merge_>
static String getName() { return "uniqExact"; } static String getName() { return "uniqExact"; }
}; };
/// For historical reasons IPv6 is treated as FixedString(16)
template <bool is_able_to_parallelize_merge_>
struct AggregateFunctionUniqExactData<IPv6, is_able_to_parallelize_merge_>
{
using Key = UInt128;
/// When creating, the hash table must be small.
using SingleLevelSet = HashSet<Key, UInt128TrivialHash, HashTableGrower<3>, HashTableAllocatorWithStackMemory<sizeof(Key) * (1 << 3)>>;
using TwoLevelSet = TwoLevelHashSet<Key, UInt128TrivialHash>;
using Set = UniqExactSet<SingleLevelSet, TwoLevelSet>;
Set set;
constexpr static bool is_able_to_parallelize_merge = is_able_to_parallelize_merge_;
constexpr static bool is_variadic = false;
static String getName() { return "uniqExact"; }
};
template <bool is_exact_, bool argument_is_tuple_, bool is_able_to_parallelize_merge_> template <bool is_exact_, bool argument_is_tuple_, bool is_able_to_parallelize_merge_>
struct AggregateFunctionUniqExactDataForVariadic : AggregateFunctionUniqExactData<String, is_able_to_parallelize_merge_> struct AggregateFunctionUniqExactDataForVariadic : AggregateFunctionUniqExactData<String, is_able_to_parallelize_merge_>
{ {
@@ -248,27 +279,22 @@ struct Adder
                 AggregateFunctionUniqUniquesHashSetData> || std::is_same_v<Data, AggregateFunctionUniqHLL12Data<T, Data::is_able_to_parallelize_merge>>)
         {
             const auto & column = *columns[0];
-            if constexpr (!std::is_same_v<T, String>)
+            if constexpr (std::is_same_v<T, String> || std::is_same_v<T, IPv6>)
+            {
+                StringRef value = column.getDataAt(row_num);
+                data.set.insert(CityHash_v1_0_2::CityHash64(value.data, value.size));
+            }
+            else
             {
                 using ValueType = typename decltype(data.set)::value_type;
                 const auto & value = assert_cast<const ColumnVector<T> &>(column).getElement(row_num);
                 data.set.insert(static_cast<ValueType>(AggregateFunctionUniqTraits<T>::hash(value)));
             }
-            else
-            {
-                StringRef value = column.getDataAt(row_num);
-                data.set.insert(CityHash_v1_0_2::CityHash64(value.data, value.size));
-            }
         }
         else if constexpr (std::is_same_v<Data, AggregateFunctionUniqExactData<T, Data::is_able_to_parallelize_merge>>)
         {
             const auto & column = *columns[0];
-            if constexpr (!std::is_same_v<T, String>)
-            {
-                data.set.template insert<const T &, use_single_level_hash_table>(
-                    assert_cast<const ColumnVector<T> &>(column).getData()[row_num]);
-            }
-            else
+            if constexpr (std::is_same_v<T, String> || std::is_same_v<T, IPv6>)
             {
                 StringRef value = column.getDataAt(row_num);
@@ -279,6 +305,11 @@ struct Adder
                 data.set.template insert<const UInt128 &, use_single_level_hash_table>(key);
             }
+            else
+            {
+                data.set.template insert<const T &, use_single_level_hash_table>(
+                    assert_cast<const ColumnVector<T> &>(column).getData()[row_num]);
+            }
         }
 #if USE_DATASKETCHES
         else if constexpr (std::is_same_v<Data, AggregateFunctionUniqThetaData>)


@@ -8,6 +8,7 @@
 #include <DataTypes/DataTypeDate.h>
 #include <DataTypes/DataTypeDate32.h>
 #include <DataTypes/DataTypeDateTime.h>
+#include <DataTypes/DataTypeIPv4andIPv6.h>
 #include <functional>
@@ -60,6 +61,10 @@ namespace
             return std::make_shared<typename WithK<K, HashValueType>::template AggregateFunction<String>>(argument_types, params);
         else if (which.isUUID())
             return std::make_shared<typename WithK<K, HashValueType>::template AggregateFunction<DataTypeUUID::FieldType>>(argument_types, params);
+        else if (which.isIPv4())
+            return std::make_shared<typename WithK<K, HashValueType>::template AggregateFunction<DataTypeIPv4::FieldType>>(argument_types, params);
+        else if (which.isIPv6())
+            return std::make_shared<typename WithK<K, HashValueType>::template AggregateFunction<DataTypeIPv6::FieldType>>(argument_types, params);
         else if (which.isTuple())
         {
             if (use_exact_hash_function)


@@ -119,6 +119,10 @@ struct AggregateFunctionUniqCombinedData<String, K, HashValueType> : public Aggr
 {
 };
+template <UInt8 K, typename HashValueType>
+struct AggregateFunctionUniqCombinedData<IPv6, K, HashValueType> : public AggregateFunctionUniqCombinedDataWithKey<UInt64 /*always*/, K>
+{
+};
 template <typename T, UInt8 K, typename HashValueType>
 class AggregateFunctionUniqCombined final
@@ -141,16 +145,16 @@ public:
     void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override
     {
-        if constexpr (!std::is_same_v<T, String>)
-        {
-            const auto & value = assert_cast<const ColumnVector<T> &>(*columns[0]).getElement(row_num);
-            this->data(place).set.insert(detail::AggregateFunctionUniqCombinedTraits<T, HashValueType>::hash(value));
-        }
-        else
+        if constexpr (std::is_same_v<T, String> || std::is_same_v<T, IPv6>)
         {
             StringRef value = columns[0]->getDataAt(row_num);
             this->data(place).set.insert(CityHash_v1_0_2::CityHash64(value.data, value.size));
         }
+        else
+        {
+            const auto & value = assert_cast<const ColumnVector<T> &>(*columns[0]).getElement(row_num);
+            this->data(place).set.insert(detail::AggregateFunctionUniqCombinedTraits<T, HashValueType>::hash(value));
+        }
     }
     void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, Arena *) const override
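
The `uniq*` hunks above flip the `if constexpr` condition so that `String` and `IPv6` share the "hash the raw bytes" branch and every other type falls through to the numeric branch. A minimal sketch of that compile-time dispatch follows. `Ipv6Bytes`, `hashBytes` and `hashForUniq` are hypothetical names invented for illustration, and FNV-1a stands in for CityHash64; none of this is ClickHouse code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a 16-byte IPv6 value (not the ClickHouse type).
struct Ipv6Bytes
{
    std::array<unsigned char, 16> bytes{};
};

// FNV-1a over raw bytes; used here only as a placeholder for CityHash64.
inline std::uint64_t hashBytes(const unsigned char * data, std::size_t size)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < size; ++i)
    {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Byte-like types are hashed over their raw bytes; integers take a numeric path.
// Only the branch matching T is instantiated, as with `if constexpr` in the diff.
template <typename T>
std::uint64_t hashForUniq(const T & value)
{
    if constexpr (std::is_same_v<T, std::string>)
    {
        return hashBytes(reinterpret_cast<const unsigned char *>(value.data()), value.size());
    }
    else if constexpr (std::is_same_v<T, Ipv6Bytes>)
    {
        return hashBytes(value.bytes.data(), value.bytes.size());
    }
    else
    {
        static_assert(std::is_integral_v<T>, "numeric branch expects an integer type");
        return static_cast<std::uint64_t>(value) * 0x9E3779B97F4A7C15ULL;
    }
}
```

Listing the byte-like types positively, rather than negating `String`, means a newly added numeric type lands in the `else` branch by default, which seems to be the intent of the reordering.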


@@ -1175,16 +1175,12 @@ ProfileInfo Connection::receiveProfileInfo() const
 ParallelReadRequest Connection::receiveParallelReadRequest() const
 {
-    ParallelReadRequest request;
-    request.deserialize(*in);
-    return request;
+    return ParallelReadRequest::deserialize(*in);
 }
 InitialAllRangesAnnouncement Connection::receiveInitialParallelReadAnnounecement() const
 {
-    InitialAllRangesAnnouncement announcement;
-    announcement.deserialize(*in);
-    return announcement;
+    return InitialAllRangesAnnouncement::deserialize(*in);
 }


@@ -16,6 +16,10 @@
 #include <boost/noncopyable.hpp>
+#include <optional>
+#include <vector>
+#include <memory>
+#include <string>
 namespace DB
 {
@@ -34,9 +38,9 @@ struct Packet
     ProfileInfo profile_info;
     std::vector<UUID> part_uuids;
-    InitialAllRangesAnnouncement announcement;
-    ParallelReadRequest request;
-    ParallelReadResponse response;
+    /// The part of parallel replicas protocol
+    std::optional<InitialAllRangesAnnouncement> announcement;
+    std::optional<ParallelReadRequest> request;
     std::string server_timezone;


@@ -49,8 +49,8 @@ static void validateChecksum(char * data, size_t size, const Checksum expected_c
     /// TODO mess up of endianness in error message.
     message << "Checksum doesn't match: corrupted data."
-        " Reference: " + getHexUIntLowercase(expected_checksum.first) + getHexUIntLowercase(expected_checksum.second)
-        + ". Actual: " + getHexUIntLowercase(calculated_checksum.first) + getHexUIntLowercase(calculated_checksum.second)
+        " Reference: " + getHexUIntLowercase(expected_checksum.high64) + getHexUIntLowercase(expected_checksum.low64)
+        + ". Actual: " + getHexUIntLowercase(calculated_checksum.high64) + getHexUIntLowercase(calculated_checksum.low64)
         + ". Size of compressed block: " + toString(size);
     const char * message_hardware_failure = "This is most likely due to hardware failure. "
@@ -95,8 +95,8 @@ static void validateChecksum(char * data, size_t size, const Checksum expected_c
     }
     /// Check if the difference caused by single bit flip in stored checksum.
-    size_t difference = std::popcount(expected_checksum.first ^ calculated_checksum.first)
-        + std::popcount(expected_checksum.second ^ calculated_checksum.second);
+    size_t difference = std::popcount(expected_checksum.low64 ^ calculated_checksum.low64)
+        + std::popcount(expected_checksum.high64 ^ calculated_checksum.high64);
     if (difference == 1)
     {
@@ -194,8 +194,8 @@ size_t CompressedReadBufferBase::readCompressedData(size_t & size_decompressed,
     {
         Checksum checksum;
         ReadBufferFromMemory checksum_in(own_compressed_buffer.data(), sizeof(checksum));
-        readBinaryLittleEndian(checksum.first, checksum_in);
-        readBinaryLittleEndian(checksum.second, checksum_in);
+        readBinaryLittleEndian(checksum.low64, checksum_in);
+        readBinaryLittleEndian(checksum.high64, checksum_in);
         validateChecksum(compressed_buffer, size_compressed_without_checksum, checksum);
     }
@@ -238,8 +238,8 @@ size_t CompressedReadBufferBase::readCompressedDataBlockForAsynchronous(size_t &
     {
         Checksum checksum;
         ReadBufferFromMemory checksum_in(own_compressed_buffer.data(), sizeof(checksum));
-        readBinaryLittleEndian(checksum.first, checksum_in);
-        readBinaryLittleEndian(checksum.second, checksum_in);
+        readBinaryLittleEndian(checksum.low64, checksum_in);
+        readBinaryLittleEndian(checksum.high64, checksum_in);
         validateChecksum(compressed_buffer, size_compressed_without_checksum, checksum);
     }
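
The single-bit-flip check in `validateChecksum` counts the differing bits across both 64-bit halves of the checksum and treats a total of exactly one as a likely hardware fault in the stored checksum. A minimal sketch of that test follows. `Checksum128` and both function names are hypothetical, and `std::bitset::count` is swapped in for `std::popcount` so the sketch does not depend on C++20.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit checksum split into two named halves, mirroring
// the low64/high64 naming that the diff introduces.
struct Checksum128
{
    std::uint64_t low64;
    std::uint64_t high64;
};

// Number of bit positions at which the two checksums differ.
inline std::size_t differingBits(const Checksum128 & a, const Checksum128 & b)
{
    return std::bitset<64>(a.low64 ^ b.low64).count()
        + std::bitset<64>(a.high64 ^ b.high64).count();
}

// True when the mismatch is explained by exactly one flipped bit.
inline bool looksLikeSingleBitFlip(const Checksum128 & expected, const Checksum128 & calculated)
{
    return differingBits(expected, calculated) == 1;
}
```

Because XOR and the bit count are symmetric in the two halves, renaming `first`/`second` to `low64`/`high64` does not change the result of this check; only the hex output order in the error messages depends on which half is which.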


@@ -38,8 +38,8 @@ void CompressedWriteBuffer::nextImpl()
         CityHash_v1_0_2::uint128 checksum = CityHash_v1_0_2::CityHash128(out_compressed_ptr, compressed_size);
-        writeBinaryLittleEndian(checksum.first, out);
-        writeBinaryLittleEndian(checksum.second, out);
+        writeBinaryLittleEndian(checksum.low64, out);
+        writeBinaryLittleEndian(checksum.high64, out);
         out.position() += compressed_size;
     }
@@ -50,8 +50,8 @@ void CompressedWriteBuffer::nextImpl()
         CityHash_v1_0_2::uint128 checksum = CityHash_v1_0_2::CityHash128(compressed_buffer.data(), compressed_size);
-        writeBinaryLittleEndian(checksum.first, out);
-        writeBinaryLittleEndian(checksum.second, out);
+        writeBinaryLittleEndian(checksum.low64, out);
+        writeBinaryLittleEndian(checksum.high64, out);
         out.write(compressed_buffer.data(), compressed_size);
     }


@@ -9,6 +9,7 @@
 #include <Common/logger_useful.h>
 #include "libaccel_config.h"
 #include <Common/MemorySanitizer.h>
+#include <base/scope_guard.h>
 namespace DB
 {
@@ -34,6 +35,7 @@ DeflateQplJobHWPool::DeflateQplJobHWPool()
     // loop all configured workqueue size to get maximum job number.
     accfg_ctx * ctx_ptr = nullptr;
     auto ctx_status = accfg_new(&ctx_ptr);
+    SCOPE_EXIT({ accfg_unref(ctx_ptr); });
     if (ctx_status == 0)
     {
         auto * dev_ptr = accfg_device_get_first(ctx_ptr);
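
The added `SCOPE_EXIT` line registers a cleanup that runs when the enclosing scope ends, on every return path. A minimal sketch of the underlying scope-guard idea follows, assuming C++17. `ScopeGuardSketch`, `FakeContext`, `releaseContext` and `useContext` are hypothetical names for illustration and are not the contents of `base/scope_guard.h`.

```cpp
#include <cassert>
#include <utility>

// Runs the stored callable when the guard goes out of scope.
template <typename F>
class ScopeGuardSketch
{
public:
    explicit ScopeGuardSketch(F fn_) : fn(std::move(fn_)) {}
    ~ScopeGuardSketch() { fn(); }

    ScopeGuardSketch(const ScopeGuardSketch &) = delete;
    ScopeGuardSketch & operator=(const ScopeGuardSketch &) = delete;

private:
    F fn;
};

// Hypothetical resource handle, used only to observe that cleanup happened.
struct FakeContext
{
    int * release_counter;
};

inline void releaseContext(FakeContext * ctx)
{
    if (ctx)
        ++*ctx->release_counter;
}

// The guard releases the context on both the early and the normal return path.
inline bool useContext(FakeContext * ctx, bool fail_early)
{
    ScopeGuardSketch guard([ctx] { releaseContext(ctx); });
    if (fail_early)
        return false;
    return true;
}
```

Registering the cleanup immediately after acquisition, as the diff does, keeps the release next to the acquire and covers early exits that are easy to miss when cleanup is written by hand.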


@@ -27,7 +27,7 @@ namespace DB
 using UUID = StrongTypedef<UInt128, struct UUIDTag>;
-using IPv4 = StrongTypedef<UInt32, struct IPv4Tag>;
+struct IPv4;
 struct IPv6;


@@ -69,7 +69,7 @@ void DataTypeMap::assertKeyType() const
     if (!checkKeyType(key_type))
         throw Exception(ErrorCodes::BAD_ARGUMENTS,
             "Type of Map key must be a type, that can be represented by integer "
-            "or String or FixedString (possibly LowCardinality) or UUID,"
+            "or String or FixedString (possibly LowCardinality) or UUID or IPv6,"
             " but {} given", key_type->getName());
 }
@@ -120,6 +120,7 @@ bool DataTypeMap::checkKeyType(DataTypePtr key_type)
     else if (!key_type->isValueRepresentedByInteger()
         && !isStringOrFixedString(*key_type)
         && !WhichDataType(key_type).isNothing()
+        && !WhichDataType(key_type).isIPv6()
         && !WhichDataType(key_type).isUUID())
     {
         return false;


@@ -63,6 +63,7 @@ namespace ErrorCodes
     extern const int INCORRECT_DATA;
     extern const int TOO_LARGE_STRING_SIZE;
     extern const int TOO_LARGE_ARRAY_SIZE;
+    extern const int SIZE_OF_FIXED_STRING_DOESNT_MATCH;
 }
 /// Helper functions for formatted input.
@@ -138,6 +139,19 @@ inline void readStringBinary(std::string & s, ReadBuffer & buf, size_t max_strin
     buf.readStrict(s.data(), size);
 }
+/// For historical reasons we store IPv6 as a String
+inline void readIPv6Binary(IPv6 & ip, ReadBuffer & buf)
+{
+    size_t size = 0;
+    readVarUInt(size, buf);
+    if (size != IPV6_BINARY_LENGTH)
+        throw Exception(ErrorCodes::SIZE_OF_FIXED_STRING_DOESNT_MATCH,
+            "Size of the string {} doesn't match size of binary IPv6 {}", size, IPV6_BINARY_LENGTH);
+    buf.readStrict(reinterpret_cast<char*>(&ip.toUnderType()), size);
+}
 template <typename T>
 void readVectorBinary(std::vector<T> & v, ReadBuffer & buf)
 {


@@ -10,6 +10,7 @@
 #include <pcg-random/pcg_random.hpp>
+#include "Common/formatIPv6.h"
 #include <Common/DateLUT.h>
 #include <Common/LocalDate.h>
 #include <Common/LocalDateTime.h>
@@ -105,6 +106,13 @@ inline void writeStringBinary(const std::string & s, WriteBuffer & buf)
     buf.write(s.data(), s.size());
 }
+/// For historical reasons we store IPv6 as a String
+inline void writeIPv6Binary(const IPv6 & ip, WriteBuffer & buf)
+{
+    writeVarUInt(IPV6_BINARY_LENGTH, buf);
+    buf.write(reinterpret_cast<const char *>(&ip.toUnderType()), IPV6_BINARY_LENGTH);
+}
 inline void writeStringBinary(StringRef s, WriteBuffer & buf)
 {
     writeVarUInt(s.size, buf);
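
`writeIPv6Binary` and `readIPv6Binary` keep the on-wire layout of a binary string: a variable-length size prefix followed by the payload, with the reader rejecting any size other than 16. A minimal round-trip sketch follows, assuming the size prefix is a LEB128-style varint (7 payload bits per byte, high bit as continuation). All names ending in `Sketch` are hypothetical, and a `std::string` stands in for the read and write buffers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

constexpr std::size_t kIpv6BinaryLength = 16;
using Ipv6Sketch = std::array<unsigned char, kIpv6BinaryLength>;

// Assumed varint layout: 7 bits per byte, least significant group first.
inline void writeVarUIntSketch(std::uint64_t x, std::string & out)
{
    while (x >= 0x80)
    {
        out.push_back(static_cast<char>((x & 0x7F) | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<char>(x));
}

inline std::uint64_t readVarUIntSketch(const std::string & in, std::size_t & pos)
{
    std::uint64_t x = 0;
    for (unsigned shift = 0; shift < 64; shift += 7)
    {
        if (pos >= in.size())
            throw std::runtime_error("unexpected end of input");
        const auto byte = static_cast<unsigned char>(in[pos++]);
        x |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80))
            return x;
    }
    throw std::runtime_error("varint is too long");
}

// Size prefix (always 16) followed by the 16 raw address bytes.
inline void writeIpv6Sketch(const Ipv6Sketch & ip, std::string & out)
{
    writeVarUIntSketch(kIpv6BinaryLength, out);
    out.append(reinterpret_cast<const char *>(ip.data()), ip.size());
}

// Rejects payloads whose declared size is not exactly 16 bytes.
inline Ipv6Sketch readIpv6Sketch(const std::string & in, std::size_t & pos)
{
    const std::uint64_t size = readVarUIntSketch(in, pos);
    if (size != kIpv6BinaryLength)
        throw std::runtime_error("size does not match binary IPv6 length");
    if (in.size() - pos < kIpv6BinaryLength)
        throw std::runtime_error("unexpected end of input");
    Ipv6Sketch ip{};
    std::memcpy(ip.data(), in.data() + pos, kIpv6BinaryLength);
    pos += kIpv6BinaryLength;
    return ip;
}
```

Keeping the size prefix, even though the length is fixed, is what lets data written as a string be read back as IPv6 and vice versa, which matches the "for historical reasons" comment in the diff.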


@@ -1435,6 +1435,9 @@ FutureSetPtr ActionsMatcher::makeSet(const ASTFunction & node, Data & data, bool
         if (table)
         {
+            if (auto set = data.prepared_sets->findStorage(set_key))
+                return set;
             if (StorageSet * storage_set = dynamic_cast<StorageSet *>(table.get()))
                 return data.prepared_sets->addFromStorage(set_key, storage_set->getSet());
         }


@@ -216,8 +216,24 @@ void DatabaseCatalog::shutdownImpl()
     /// We still hold "databases" (instead of std::move) for Buffer tables to flush data correctly.
+    /// Delay shutdown of temporary and system databases. They will be shutdown last.
+    /// Because some databases might use them until their shutdown is called, but calling shutdown
+    /// on temporary database means clearing its set of tables, which will lead to unnecessary errors like "table not found".
+    std::vector<DatabasePtr> databases_with_delayed_shutdown;
     for (auto & database : current_databases)
+    {
+        if (database.first == TEMPORARY_DATABASE || database.first == SYSTEM_DATABASE)
+        {
+            databases_with_delayed_shutdown.push_back(database.second);
+            continue;
+        }
         database.second->shutdown();
+    }
+    for (auto & database : databases_with_delayed_shutdown)
+    {
+        database->shutdown();
+    }
     {
         std::lock_guard lock(tables_marked_dropped_mutex);
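
The `shutdownImpl` change is a two-pass ordering: ordinary databases are shut down in the first pass, and the temporary and system databases are postponed to a second pass because other databases may still use them. A minimal sketch of that ordering follows. `shutdownOrder` is a hypothetical helper that returns names in the order `shutdown()` would be called, and the set of postponed names is passed in rather than taken from ClickHouse constants.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the order in which databases would be shut down:
// everything not in `delayed` first, then the delayed ones.
inline std::vector<std::string> shutdownOrder(
    const std::vector<std::string> & databases,
    const std::set<std::string> & delayed)
{
    std::vector<std::string> order;
    std::vector<std::string> postponed;

    for (const auto & name : databases)
    {
        if (delayed.count(name))
        {
            postponed.push_back(name);  // shut down last
            continue;
        }
        order.push_back(name);          // shut down immediately
    }

    order.insert(order.end(), postponed.begin(), postponed.end());
    return order;
}
```

Within each pass the original iteration order is preserved, so the only behavioral difference is that the postponed databases move to the end.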


@@ -434,11 +434,13 @@ RemoteQueryExecutor::ReadResult RemoteQueryExecutor::processPacket(Packet packet
     switch (packet.type)
     {
         case Protocol::Server::MergeTreeReadTaskRequest:
-            processMergeTreeReadTaskRequest(packet.request);
+            chassert(packet.request.has_value());
+            processMergeTreeReadTaskRequest(packet.request.value());
             return ReadResult(ReadResult::Type::ParallelReplicasToken);
         case Protocol::Server::MergeTreeAllRangesAnnounecement:
-            processMergeTreeInitialReadAnnounecement(packet.announcement);
+            chassert(packet.announcement.has_value());
+            processMergeTreeInitialReadAnnounecement(packet.announcement.value());
             return ReadResult(ReadResult::Type::ParallelReplicasToken);
         case Protocol::Server::ReadTaskRequest:


@@ -40,8 +40,8 @@ DistributedAsyncInsertHeader DistributedAsyncInsertHeader::read(ReadBufferFromFi
         {
             throw Exception(ErrorCodes::CHECKSUM_DOESNT_MATCH,
                 "Checksum of extra info doesn't match: corrupted data. Reference: {}{}. Actual: {}{}.",
-                getHexUIntLowercase(expected_checksum.first), getHexUIntLowercase(expected_checksum.second),
-                getHexUIntLowercase(calculated_checksum.first), getHexUIntLowercase(calculated_checksum.second));
+                getHexUIntLowercase(expected_checksum.high64), getHexUIntLowercase(expected_checksum.low64),
+                getHexUIntLowercase(calculated_checksum.high64), getHexUIntLowercase(calculated_checksum.low64));
         }
         /// Read the parts of the header.


@@ -400,7 +400,7 @@ void DataPartStorageOnDiskBase::backup(
         if (it != checksums.files.end())
         {
             file_size = it->second.file_size;
-            file_hash = {it->second.file_hash.first, it->second.file_hash.second};
+            file_hash = it->second.file_hash;
         }
         BackupEntryPtr backup_entry = std::make_unique<BackupEntryFromImmutableFile>(disk, filepath_on_disk, copy_encrypted, file_size, file_hash);


@@ -154,9 +154,9 @@ bool MergeTreeDataPartChecksums::readV2(ReadBuffer & in)
         assertString("\n\tsize: ", in);
         readText(sum.file_size, in);
         assertString("\n\thash: ", in);
-        readText(sum.file_hash.first, in);
+        readText(sum.file_hash.low64, in);
         assertString(" ", in);
-        readText(sum.file_hash.second, in);
+        readText(sum.file_hash.high64, in);
         assertString("\n\tcompressed: ", in);
         readText(sum.is_compressed, in);
         if (sum.is_compressed)
@@ -164,9 +164,9 @@ bool MergeTreeDataPartChecksums::readV2(ReadBuffer & in)
             assertString("\n\tuncompressed size: ", in);
             readText(sum.uncompressed_size, in);
             assertString("\n\tuncompressed hash: ", in);
-            readText(sum.uncompressed_hash.first, in);
+            readText(sum.uncompressed_hash.low64, in);
             assertString(" ", in);
-            readText(sum.uncompressed_hash.second, in);
+            readText(sum.uncompressed_hash.high64, in);
         }
         assertChar('\n', in);


@@ -6,6 +6,7 @@
 #include <Common/Stopwatch.h>
 #include <Common/formatReadable.h>
 #include <Common/logger_useful.h>
+#include <Storages/MergeTree/RequestResponse.h>
 namespace ProfileEvents
@@ -433,8 +434,12 @@ MergeTreeReadTaskPtr MergeTreeReadPoolParallelReplicas::getTask(size_t thread)
     if (buffered_ranges.empty())
     {
-        auto result = extension.callback(ParallelReadRequest{
-            .replica_num = extension.number_of_current_replica, .min_number_of_marks = min_marks_for_concurrent_read * threads});
+        auto result = extension.callback(ParallelReadRequest(
+            CoordinationMode::Default,
+            extension.number_of_current_replica,
+            min_marks_for_concurrent_read * threads,
+            /// For Default coordination mode we don't need to pass part names.
+            RangesInDataPartsDescription{}));
         if (!result || result->finish)
         {
@@ -529,12 +534,12 @@ MarkRanges MergeTreeInOrderReadPoolParallelReplicas::getNewTask(RangesInDataPart
     if (no_more_tasks)
         return {};
-    auto response = extension.callback(ParallelReadRequest{
-        .mode = mode,
-        .replica_num = extension.number_of_current_replica,
-        .min_number_of_marks = min_marks_for_concurrent_read * request.size(),
-        .description = request,
-    });
+    auto response = extension.callback(ParallelReadRequest(
+        mode,
+        extension.number_of_current_replica,
+        min_marks_for_concurrent_read * request.size(),
+        request
+    ));
     if (!response || response->description.empty() || response->finish)
     {


@@ -193,10 +193,11 @@ public:
             predict_block_size_bytes, column_names, virtual_column_names, prewhere_info,
             actions_settings, reader_settings, per_part_params);
-        extension.all_callback({
-            .description = parts_ranges.getDescriptions(),
-            .replica_num = extension.number_of_current_replica
-        });
+        extension.all_callback(InitialAllRangesAnnouncement(
+            CoordinationMode::Default,
+            parts_ranges.getDescriptions(),
+            extension.number_of_current_replica
+        ));
     }
     ~MergeTreeReadPoolParallelReplicas() override;
@@ -253,10 +254,11 @@ public:
         for (const auto & part : parts_ranges)
             buffered_tasks.push_back({part.data_part->info, MarkRanges{}});
-        extension.all_callback({
-            .description = parts_ranges.getDescriptions(),
-            .replica_num = extension.number_of_current_replica
-        });
+        extension.all_callback(InitialAllRangesAnnouncement(
+            mode,
+            parts_ranges.getDescriptions(),
+            extension.number_of_current_replica
+        ));
     }
     MarkRanges getNewTask(RangesInDataPartDescription description);


@@ -102,7 +102,6 @@ public:
     explicit DefaultCoordinator(size_t replicas_count_)
         : ParallelReplicasReadingCoordinator::ImplInterface(replicas_count_)
-        , announcements(replicas_count_)
         , reading_state(replicas_count_)
     {
     }
@@ -119,7 +118,6 @@ public:
     PartitionToBlockRanges partitions;
     size_t sent_initial_requests{0};
-    std::vector<InitialAllRangesAnnouncement> announcements;
     Parts all_parts_to_read;
     /// Contains only parts which we haven't started to read from


@@ -250,8 +250,8 @@ std::unordered_map<String, IPartMetadataManager::uint128> PartMetadataManagerWit
                     ErrorCodes::CORRUPTED_DATA,
                     "Checksums doesn't match in part {} for {}. Expected: {}. Found {}.",
                     part->name, file_path,
-                    getHexUIntUppercase(disk_checksum.first) + getHexUIntUppercase(disk_checksum.second),
-                    getHexUIntUppercase(cache_checksums[i].first) + getHexUIntUppercase(cache_checksums[i].second));
+                    getHexUIntUppercase(disk_checksum.high64) + getHexUIntUppercase(disk_checksum.low64),
+                    getHexUIntUppercase(cache_checksums[i].high64) + getHexUIntUppercase(cache_checksums[i].low64));
             disk_checksums.push_back(disk_checksum);
             continue;
@@ -287,8 +287,8 @@ std::unordered_map<String, IPartMetadataManager::uint128> PartMetadataManagerWit
                 ErrorCodes::CORRUPTED_DATA,
                 "Checksums doesn't match in projection part {} {}. Expected: {}. Found {}.",
                 part->name, proj_name,
-                getHexUIntUppercase(disk_checksum.first) + getHexUIntUppercase(disk_checksum.second),
-                getHexUIntUppercase(cache_checksums[i].first) + getHexUIntUppercase(cache_checksums[i].second));
+                getHexUIntUppercase(disk_checksum.high64) + getHexUIntUppercase(disk_checksum.low64),
+                getHexUIntUppercase(cache_checksums[i].high64) + getHexUIntUppercase(cache_checksums[i].low64));
         disk_checksums.push_back(disk_checksum);
     }
     return results;


@@ -51,7 +51,7 @@ String ParallelReadRequest::describe() const
     return result;
 }
-void ParallelReadRequest::deserialize(ReadBuffer & in)
+ParallelReadRequest ParallelReadRequest::deserialize(ReadBuffer & in)
 {
     UInt64 version;
     readIntBinary(version, in);
@@ -60,12 +60,24 @@ void ParallelReadRequest::deserialize(ReadBuffer & in)
             "from replicas differ. Got: {}, supported version: {}",
             version, DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION);
+    CoordinationMode mode;
+    size_t replica_num;
+    size_t min_number_of_marks;
+    RangesInDataPartsDescription description;
     uint8_t mode_candidate;
     readIntBinary(mode_candidate, in);
     mode = validateAndGet(mode_candidate);
     readIntBinary(replica_num, in);
     readIntBinary(min_number_of_marks, in);
     description.deserialize(in);
+    return ParallelReadRequest(
+        mode,
+        replica_num,
+        min_number_of_marks,
+        std::move(description)
+    );
 }
 void ParallelReadRequest::merge(ParallelReadRequest & other)
@@ -125,7 +137,7 @@ String InitialAllRangesAnnouncement::describe()
     return result;
 }
-void InitialAllRangesAnnouncement::deserialize(ReadBuffer & in)
+InitialAllRangesAnnouncement InitialAllRangesAnnouncement::deserialize(ReadBuffer & in)
 {
     UInt64 version;
     readIntBinary(version, in);
@@ -134,11 +146,21 @@ void InitialAllRangesAnnouncement::deserialize(ReadBuffer & in)
             "from replicas differ. Got: {}, supported version: {}",
             version, DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION);
+    CoordinationMode mode;
+    RangesInDataPartsDescription description;
+    size_t replica_num;
     uint8_t mode_candidate;
     readIntBinary(mode_candidate, in);
     mode = validateAndGet(mode_candidate);
     description.deserialize(in);
     readIntBinary(replica_num, in);
+    return InitialAllRangesAnnouncement {
+        mode,
+        description,
+        replica_num
+    };
 }
 }


@@ -40,21 +40,40 @@ struct PartBlockRange
     }
 };
+/// ParallelReadRequest is used by remote replicas during parallel read
+/// to signal an initiator that they need more marks to read.
 struct ParallelReadRequest
 {
+    /// No default constructor, you must initialize all fields at once.
+    ParallelReadRequest(
+        CoordinationMode mode_,
+        size_t replica_num_,
+        size_t min_number_of_marks_,
+        RangesInDataPartsDescription description_)
+        : mode(mode_)
+        , replica_num(replica_num_)
+        , min_number_of_marks(min_number_of_marks_)
+        , description(std::move(description_))
+    {}
     CoordinationMode mode;
     size_t replica_num;
     size_t min_number_of_marks;
-    /// Extension for ordered mode
+    /// Extension for Ordered (InOrder or ReverseOrder) mode
+    /// Contains only data part names without mark ranges.
     RangesInDataPartsDescription description;
     void serialize(WriteBuffer & out) const;
     String describe() const;
-    void deserialize(ReadBuffer & in);
+    static ParallelReadRequest deserialize(ReadBuffer & in);
     void merge(ParallelReadRequest & other);
 };
+/// ParallelReadResponse is used by an initiator to tell
+/// remote replicas about what to read during parallel reading.
+/// Additionally contains information whether there are more available
+/// marks to read (whether it is the last packet or not).
 struct ParallelReadResponse
 {
     bool finish{false};
@@ -66,15 +85,30 @@ struct ParallelReadResponse
 };
+/// The set of parts (their names) along with ranges to read which is sent back
+/// to the initiator by remote replicas during parallel reading.
+/// Additionally contains an identifier (replica_num) plus
+/// the reading algorithm chosen (Default, InOrder or ReverseOrder).
 struct InitialAllRangesAnnouncement
 {
+    /// No default constructor, you must initialize all fields at once.
+    InitialAllRangesAnnouncement(
+        CoordinationMode mode_,
+        RangesInDataPartsDescription description_,
+        size_t replica_num_)
+        : mode(mode_)
+        , description(description_)
+        , replica_num(replica_num_)
+    {}
     CoordinationMode mode;
     RangesInDataPartsDescription description;
     size_t replica_num;
     void serialize(WriteBuffer & out) const;
     String describe();
-    void deserialize(ReadBuffer & in);
+    static InitialAllRangesAnnouncement deserialize(ReadBuffer & in);
 };
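
The request and announcement structs above lose their default constructors, so `deserialize` becomes a static factory that returns a fully initialized value, and holders such as `Packet` wrap the field in `std::optional`. A minimal sketch of that combination follows, assuming C++17. `ReadRequestSketch` and `PacketSketch` are hypothetical simplifications, and standard streams with native byte order stand in for the read and write buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>
#include <stdexcept>

struct ReadRequestSketch
{
    // No default constructor: every field must be supplied at once.
    ReadRequestSketch(std::uint8_t mode_, std::uint64_t replica_num_, std::uint64_t min_number_of_marks_)
        : mode(mode_), replica_num(replica_num_), min_number_of_marks(min_number_of_marks_)
    {
    }

    std::uint8_t mode;
    std::uint64_t replica_num;
    std::uint64_t min_number_of_marks;

    void serialize(std::ostream & out) const
    {
        out.write(reinterpret_cast<const char *>(&mode), sizeof(mode));
        out.write(reinterpret_cast<const char *>(&replica_num), sizeof(replica_num));
        out.write(reinterpret_cast<const char *>(&min_number_of_marks), sizeof(min_number_of_marks));
    }

    // Static factory: read into locals first, then construct the complete object.
    static ReadRequestSketch deserialize(std::istream & in)
    {
        std::uint8_t mode = 0;
        std::uint64_t replica_num = 0;
        std::uint64_t min_number_of_marks = 0;
        in.read(reinterpret_cast<char *>(&mode), sizeof(mode));
        in.read(reinterpret_cast<char *>(&replica_num), sizeof(replica_num));
        in.read(reinterpret_cast<char *>(&min_number_of_marks), sizeof(min_number_of_marks));
        if (!in)
            throw std::runtime_error("truncated request");
        return ReadRequestSketch(mode, replica_num, min_number_of_marks);
    }
};

// A holder cannot default-construct the request, so "not received yet" is std::nullopt.
struct PacketSketch
{
    std::optional<ReadRequestSketch> request;
};
```

With this shape a half-initialized request cannot exist, and a consumer has to check `has_value()` before use, which is what the `chassert` calls in `RemoteQueryExecutor::processPacket` do.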


@@ -252,17 +252,17 @@ void StorageSystemParts::processNextStorage(
         if (columns_mask[src_index++])
         {
             auto checksum = helper.hash_of_all_files;
-            columns[res_index++]->insert(getHexUIntLowercase(checksum.first) + getHexUIntLowercase(checksum.second));
+            columns[res_index++]->insert(getHexUIntLowercase(checksum.high64) + getHexUIntLowercase(checksum.low64));
         }
         if (columns_mask[src_index++])
         {
             auto checksum = helper.hash_of_uncompressed_files;
-            columns[res_index++]->insert(getHexUIntLowercase(checksum.first) + getHexUIntLowercase(checksum.second));
+            columns[res_index++]->insert(getHexUIntLowercase(checksum.high64) + getHexUIntLowercase(checksum.low64));
         }
         if (columns_mask[src_index++])
         {
             auto checksum = helper.uncompressed_hash_of_compressed_files;
-            columns[res_index++]->insert(getHexUIntLowercase(checksum.first) + getHexUIntLowercase(checksum.second));
+            columns[res_index++]->insert(getHexUIntLowercase(checksum.high64) + getHexUIntLowercase(checksum.low64));
         }
     }


@@ -221,17 +221,17 @@ void StorageSystemProjectionParts::processNextStorage(
         if (columns_mask[src_index++])
         {
             auto checksum = helper.hash_of_all_files;
-            columns[res_index++]->insert(getHexUIntLowercase(checksum.first) + getHexUIntLowercase(checksum.second));
+            columns[res_index++]->insert(getHexUIntLowercase(checksum.high64) + getHexUIntLowercase(checksum.low64));
         }
         if (columns_mask[src_index++])
         {
             auto checksum = helper.hash_of_uncompressed_files;
-            columns[res_index++]->insert(getHexUIntLowercase(checksum.first) + getHexUIntLowercase(checksum.second));
+            columns[res_index++]->insert(getHexUIntLowercase(checksum.high64) + getHexUIntLowercase(checksum.low64));
         }
         if (columns_mask[src_index++])
         {
             auto checksum = helper.uncompressed_hash_of_compressed_files;
-            columns[res_index++]->insert(getHexUIntLowercase(checksum.first) + getHexUIntLowercase(checksum.second));
+            columns[res_index++]->insert(getHexUIntLowercase(checksum.high64) + getHexUIntLowercase(checksum.low64));
         }
     }


@ -1,5 +1,5 @@
from pathlib import Path from pathlib import Path
from typing import Dict, List from typing import Dict, List, Optional
import os import os
import logging import logging
@ -58,14 +58,19 @@ def upload_results(
test_results: TestResults, test_results: TestResults,
additional_files: List[str], additional_files: List[str],
check_name: str, check_name: str,
additional_urls: Optional[List[str]] = None,
) -> str: ) -> str:
normalized_check_name = check_name.lower() normalized_check_name = check_name.lower()
for r in ((" ", "_"), ("(", "_"), (")", "_"), (",", "_"), ("/", "_")): for r in ((" ", "_"), ("(", "_"), (")", "_"), (",", "_"), ("/", "_")):
normalized_check_name = normalized_check_name.replace(*r) normalized_check_name = normalized_check_name.replace(*r)
# Preserve additional_urls to not modify the original one
original_additional_urls = additional_urls or []
s3_path_prefix = f"{pr_number}/{commit_sha}/{normalized_check_name}" s3_path_prefix = f"{pr_number}/{commit_sha}/{normalized_check_name}"
additional_urls = process_logs( additional_urls = process_logs(
s3_client, additional_files, s3_path_prefix, test_results s3_client, additional_files, s3_path_prefix, test_results
) )
additional_urls.extend(original_additional_urls)
branch_url = f"{GITHUB_SERVER_URL}/{GITHUB_REPOSITORY}/commits/master" branch_url = f"{GITHUB_SERVER_URL}/{GITHUB_REPOSITORY}/commits/master"
branch_name = "master" branch_name = "master"


@ -40,7 +40,7 @@
<operation_timeout_ms>10000</operation_timeout_ms> <operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms> <session_timeout_ms>30000</session_timeout_ms>
<heart_beat_interval_ms>1000</heart_beat_interval_ms> <heart_beat_interval_ms>1000</heart_beat_interval_ms>
<election_timeout_lower_bound_ms>4000</election_timeout_lower_bound_ms> <election_timeout_lower_bound_ms>2000</election_timeout_lower_bound_ms>
<election_timeout_upper_bound_ms>5000</election_timeout_upper_bound_ms> <election_timeout_upper_bound_ms>5000</election_timeout_upper_bound_ms>
<raft_logs_level>information</raft_logs_level> <raft_logs_level>information</raft_logs_level>
<force_sync>false</force_sync> <force_sync>false</force_sync>


@ -231,6 +231,9 @@ class _NetworkManager:
def _ensure_container(self): def _ensure_container(self):
if self._container is None or self._container_expire_time <= time.time(): if self._container is None or self._container_expire_time <= time.time():
image_name = "clickhouse/integration-helper:" + os.getenv(
"DOCKER_HELPER_TAG", "latest"
)
for i in range(5): for i in range(5):
if self._container is not None: if self._container is not None:
try: try:
@ -247,7 +250,7 @@ class _NetworkManager:
time.sleep(i) time.sleep(i)
image = subprocess.check_output( image = subprocess.check_output(
"docker images -q clickhouse/integration-helper 2>/dev/null", shell=True f"docker images -q {image_name} 2>/dev/null", shell=True
) )
if not image.strip(): if not image.strip():
print("No network image helper, will try download") print("No network image helper, will try download")
@ -256,16 +259,16 @@ class _NetworkManager:
for i in range(5): for i in range(5):
try: try:
subprocess.check_call( # STYLE_CHECK_ALLOW_SUBPROCESS_CHECK_CALL subprocess.check_call( # STYLE_CHECK_ALLOW_SUBPROCESS_CHECK_CALL
"docker pull clickhouse/integration-helper", shell=True f"docker pull {image_name}", shell=True
) )
break break
except: except:
time.sleep(i) time.sleep(i)
else: else:
raise Exception("Cannot pull clickhouse/integration-helper image") raise Exception(f"Cannot pull {image_name} image")
self._container = self._docker_client.containers.run( self._container = self._docker_client.containers.run(
"clickhouse/integration-helper", image_name,
auto_remove=True, auto_remove=True,
command=("sleep %s" % self.container_exit_timeout), command=("sleep %s" % self.container_exit_timeout),
# /run/xtables.lock passed inside for correct iptables --wait # /run/xtables.lock passed inside for correct iptables --wait
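The change above derives the helper image name from the `DOCKER_HELPER_TAG` environment variable, falling back to `latest`. A small sketch of that lookup in isolation; the function name `helper_image_name` is an assumption made for this example.

```python
import os


def helper_image_name(default_tag: str = "latest") -> str:
    # Read the tag from the environment; fall back to the default when unset.
    tag = os.getenv("DOCKER_HELPER_TAG", default_tag)
    return "clickhouse/integration-helper:" + tag
```

With the variable unset this yields `clickhouse/integration-helper:latest`, which matches the previous hard-coded image apart from the explicit tag.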


@ -336,6 +336,8 @@ if __name__ == "__main__":
env_tags += "-e {}={} ".format("DOCKER_MYSQL_PHP_CLIENT_TAG", tag) env_tags += "-e {}={} ".format("DOCKER_MYSQL_PHP_CLIENT_TAG", tag)
elif image == "clickhouse/postgresql-java-client": elif image == "clickhouse/postgresql-java-client":
env_tags += "-e {}={} ".format("DOCKER_POSTGRESQL_JAVA_CLIENT_TAG", tag) env_tags += "-e {}={} ".format("DOCKER_POSTGRESQL_JAVA_CLIENT_TAG", tag)
elif image == "clickhouse/integration-helper":
env_tags += "-e {}={} ".format("DOCKER_HELPER_TAG", tag)
elif image == "clickhouse/integration-test": elif image == "clickhouse/integration-test":
env_tags += "-e {}={} ".format("DOCKER_BASE_TAG", tag) env_tags += "-e {}={} ".format("DOCKER_BASE_TAG", tag)
elif image == "clickhouse/kerberized-hadoop": elif image == "clickhouse/kerberized-hadoop":


@ -846,7 +846,7 @@ def test_start_stop_moves(start_cluster, name, engine):
node1.query("SYSTEM START MOVES {}".format(name)) node1.query("SYSTEM START MOVES {}".format(name))
# wait some time until background backoff finishes # wait some time until background backoff finishes
retry = 30 retry = 60
i = 0 i = 0
while not sum(1 for x in used_disks if x == "jbod1") <= 2 and i < retry: while not sum(1 for x in used_disks if x == "jbod1") <= 2 and i < retry:
time.sleep(1) time.sleep(1)


@ -1 +1 @@
20000101_1_1_0 test_00961 b5fce9c4ef1ca42ce4ed027389c208d2 fc3b062b646cd23d4c23d7f5920f89ae da96ff1e527a8a1f908ddf2b1d0af239 20000101_1_1_0 test_00961 e4ed027389c208d2b5fce9c4ef1ca42c 4c23d7f5920f89aefc3b062b646cd23d 908ddf2b1d0af239da96ff1e527a8a1f
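The three reference hashes in this hunk change only in the order of their 64-bit halves, which appears consistent with the switch from `.first`/`.second` to `.high64`/`.low64` in the C++ hunks. A quick sketch that checks this against the values shown above; `swap_halves` is a hypothetical helper written for this illustration.

```python
def swap_halves(hex128: str) -> str:
    # A 128-bit value printed as hex has 32 digits: two 64-bit halves of 16 each.
    if len(hex128) != 32:
        raise ValueError("expected 32 hex digits")
    return hex128[16:] + hex128[:16]


# Old and new reference values copied from the hunk above.
OLD = [
    "b5fce9c4ef1ca42ce4ed027389c208d2",
    "fc3b062b646cd23d4c23d7f5920f89ae",
    "da96ff1e527a8a1f908ddf2b1d0af239",
]
NEW = [
    "e4ed027389c208d2b5fce9c4ef1ca42c",
    "4c23d7f5920f89aefc3b062b646cd23d",
    "908ddf2b1d0af239da96ff1e527a8a1f",
]

# Each new value is the old one with its halves exchanged.
halves_swapped = [swap_halves(h) for h in OLD] == NEW
```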

File diff suppressed because one or more lines are too long


@ -0,0 +1,170 @@
-- Tags: no-parallel, no-fasttest
{# this test checks backward compatibility of aggregate functions States against IPv4, IPv6 types #}
{% set ip4_generator = "select num::UInt32::IPv4 ip from (select arrayJoin(range(999999999, number)) as num from numbers(999999999,50)) order by ip" %}
{% set ip6_generator = "SELECT toIPv6(IPv6NumToString(toFixedString(reinterpretAsFixedString(num)||reinterpretAsFixedString(num), 16))) AS ip FROM (select arrayJoin(range(1010011101, number)) as num from numbers(1010011101,50)) order by ip" %}
{% set ip_generators = {'ip4': ip4_generator, 'ip6': ip6_generator} %}
{% set agg_func_list = [ "min", "max", "first_value", "last_value", "topK", "groupArray", "groupUniqArray", "uniq", "uniqExact", "uniqCombined", "uniqCombined64", "uniqHLL12", "uniqTheta" ] %}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- hash / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
cityHash64(hex( {{ func }}State(ip) )) AS {{ func }}State{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- finalizeAggregation / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
finalizeAggregation( {{ func }}State(ip) ) AS {{ func }}{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- hash / IfState / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
cityHash64(hex( {{ func }}IfState(ip, 1) )) AS {{ func }}IfState{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- finalizeAggregation / IfState / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
finalizeAggregation( {{ func }}IfState(ip, 1) ) AS {{ func }}{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% set agg_func_list = [ "argMin", "argMax" ] %}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- Arg / hash / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
cityHash64(hex( {{ func }}State(ip, ip) )) AS {{ func }}State{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- Arg / finalizeAggregation / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
finalizeAggregation( {{ func }}State(ip, ip) ) AS {{ func }}State{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{# let's test functions with not deterministic result against 1 row, to make it deterministic #}
{% set ip4_generator = "select number::UInt32::IPv4 ip from numbers(999999999,1) order by ip" %}
{% set ip6_generator = "SELECT toIPv6(IPv6NumToString(toFixedString(reinterpretAsFixedString(number)||reinterpretAsFixedString(number), 16))) AS ip FROM numbers(1010011101, 1) order by ip" %}
{% set ip_generators = {'ip4': ip4_generator, 'ip6': ip6_generator} %}
{% set agg_func_list = [ "any", "anyHeavy", "anyLast" ] %}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- hash / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
cityHash64(hex( {{ func }}State(ip) )) AS {{ func }}State{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- finalizeAggregation / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
finalizeAggregation( {{ func }}State(ip) ) AS {{ func }}{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% set agg_func_list = [ "sumMap", "minMap", "maxMap" ] %}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- Map/Map hash / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
cityHash64(hex( {{ func }}State(map(ip, 1::Int64)) )) AS {{ func }}State{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- Map/Map finalizeAggregation / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
finalizeAggregation( {{ func }}State(map(ip, 1::Int64)) ) AS {{ func }}{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- Map/Array hash / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
cityHash64(hex( {{ func }}State([ip], [1::Int64]) )) AS {{ func }}State{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
{% for generator_name, ip_generator in ip_generators.items() %}
select '----- Map/Array finalizeAggregation / State / {{ generator_name }} -----';
select
{% for func in agg_func_list -%}
finalizeAggregation( {{ func }}State([ip], [1::Int64]) ) AS {{ func }}{{ "," if not loop.last }}
{% endfor -%}
from ( {{ ip_generator }} ) format Vertical;
{% endfor -%}
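Each Jinja block in the test above expands a list of aggregate function names into one `select` with a column per function, omitting the comma after the last column. A rough Python sketch of the first block's expansion, as an illustration only: `render_state_hash_query` is a hypothetical name, and the exact whitespace of the real Jinja output is not reproduced.

```python
from typing import List


def render_state_hash_query(funcs: List[str], ip_generator: str) -> str:
    # One column per aggregate function; join() places commas between
    # columns only, mirroring the `"," if not loop.last` expression.
    columns = ",\n".join(
        f"cityHash64(hex( {func}State(ip) )) AS {func}State" for func in funcs
    )
    return f"select\n{columns}\nfrom ( {ip_generator} ) format Vertical;"
```

For example, `render_state_hash_query(["min", "max"], "select 1 ip")` produces a query with the columns `minState` and `maxState`.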


@ -0,0 +1,17 @@
DROP TABLE IF EXISTS test_set;
DROP TABLE IF EXISTS null_in__fuzz_6;
set allow_suspicious_low_cardinality_types = 1;
CREATE TABLE null_in__fuzz_6 (`dt` LowCardinality(UInt16), `idx` Int32, `i` Nullable(Int256), `s` Int32) ENGINE = MergeTree PARTITION BY dt ORDER BY idx;
insert into null_in__fuzz_6 select * from generateRandom() limit 1;
SET transform_null_in = 0;
CREATE TABLE test_set (i Nullable(int)) ENGINE = Set();
INSERT INTO test_set VALUES (1), (NULL);
SELECT count() = 1 FROM null_in__fuzz_6 PREWHERE 71 WHERE i IN (test_set); -- { serverError CANNOT_CONVERT_TYPE }
DROP TABLE test_set;
DROP TABLE null_in__fuzz_6;


@ -469,6 +469,7 @@ MSan
MVCC MVCC
MacBook MacBook
MacOS MacOS
MapState
MarkCacheBytes MarkCacheBytes
MarkCacheFiles MarkCacheFiles
MarksLoaderThreads MarksLoaderThreads


@ -45,7 +45,7 @@ int main(int, char **)
{ {
auto flipped = flipBit(str, pos); auto flipped = flipBit(str, pos);
auto checksum = CityHash_v1_0_2::CityHash128(flipped.data(), flipped.size()); auto checksum = CityHash_v1_0_2::CityHash128(flipped.data(), flipped.size());
std::cout << getHexUIntLowercase(checksum.first) << getHexUIntLowercase(checksum.second) << "\t" << pos / 8 << ", " << pos % 8 << "\n"; std::cout << getHexUIntLowercase(checksum.high64) << getHexUIntLowercase(checksum.low64) << "\t" << pos / 8 << ", " << pos % 8 << "\n";
} }
return 0; return 0;


@ -1,3 +1,4 @@
v23.5.4.25-stable 2023-06-29
v23.5.3.24-stable 2023-06-17 v23.5.3.24-stable 2023-06-17
v23.5.2.7-stable 2023-06-10 v23.5.2.7-stable 2023-06-10
v23.5.1.3174-stable 2023-06-09 v23.5.1.3174-stable 2023-06-09
@ -5,6 +6,7 @@ v23.4.4.16-stable 2023-06-17
v23.4.3.48-stable 2023-06-12 v23.4.3.48-stable 2023-06-12
v23.4.2.11-stable 2023-05-02 v23.4.2.11-stable 2023-05-02
v23.4.1.1943-stable 2023-04-27 v23.4.1.1943-stable 2023-04-27
v23.3.6.7-lts 2023-06-28
v23.3.5.9-lts 2023-06-22 v23.3.5.9-lts 2023-06-22
v23.3.4.17-lts 2023-06-17 v23.3.4.17-lts 2023-06-17
v23.3.3.52-lts 2023-06-12 v23.3.3.52-lts 2023-06-12
