Merge remote-tracking branch 'origin/master' into musysroot

This commit is contained in:
Michael Kolupaev 2024-08-31 19:48:19 +00:00
commit e955b2fc71
391 changed files with 6757 additions and 1915 deletions

View File

@ -130,6 +130,7 @@ jobs:
with:
build_name: package_debug
data: ${{ needs.RunConfig.outputs.data }}
force: true
BuilderBinDarwin:
needs: [RunConfig, BuildDockers]
if: ${{ !failure() && !cancelled() }}

View File

@ -30,7 +30,6 @@
* Support more variants of JOIN strictness (`LEFT/RIGHT SEMI/ANTI/ANY JOIN`) with inequality conditions which involve columns from both left and right table. e.g. `t1.y < t2.y` (see the setting `allow_experimental_join_condition`). [#64281](https://github.com/ClickHouse/ClickHouse/pull/64281) ([lgbo](https://github.com/lgbo-ustc)).
* Intrpret Hive-style partitioning for different engines (`File`, `URL`, `S3`, `AzureBlobStorage`, `HDFS`). Hive-style partitioning organizes data into partitioned sub-directories, making it efficient to query and manage large datasets. Currently, it only creates virtual columns with the appropriate name and data. The follow-up PR will introduce the appropriate data filtering (performance speedup). [#65997](https://github.com/ClickHouse/ClickHouse/pull/65997) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add function `printf` for Spark compatiability (but you can use the existing `format` function). [#66257](https://github.com/ClickHouse/ClickHouse/pull/66257) ([李扬](https://github.com/taiyang-li)).
* Added a new server setting, `disable_insertion_and_mutation`. If it is enabled, the server will deny all insertions and mutations. This includes asynchronous INSERTs. This setting can be used to create read-only replicas. [#66519](https://github.com/ClickHouse/ClickHouse/pull/66519) ([Xu Jia](https://github.com/XuJia0210)).
* Add options `restore_replace_external_engines_to_null` and `restore_replace_external_table_functions_to_null` to replace external engines and table_engines to `Null` engine that can be useful for testing. It should work for RESTORE and explicit table creation. [#66536](https://github.com/ClickHouse/ClickHouse/pull/66536) ([Ilya Yatsishin](https://github.com/qoega)).
* Added support for reading `MULTILINESTRING` geometry in `WKT` format using function `readWKTLineString`. [#67647](https://github.com/ClickHouse/ClickHouse/pull/67647) ([Jacob Reckhard](https://github.com/jacobrec)).
* Add a new table function `fuzzQuery`. This function allows the modification of a given query string with random variations. Example: `SELECT query FROM fuzzQuery('SELECT 1') LIMIT 5;`. [#67655](https://github.com/ClickHouse/ClickHouse/pull/67655) ([pufit](https://github.com/pufit)).

View File

@ -34,17 +34,41 @@ curl https://clickhouse.com/ | sh
Every month we get together with the community (users, contributors, customers, those interested in learning more about ClickHouse) to discuss what is coming in the latest release. If you are interested in sharing what you've built on ClickHouse, let us know.
* [v24.8 Community Call](https://clickhouse.com/company/events/v24-8-community-release-call) - August 20
* [v24.9 Community Call](https://clickhouse.com/company/events/v24-9-community-release-call) - September 26
## Upcoming Events
Keep an eye out for upcoming meetups and events around the world. Somewhere else you want us to be? Please feel free to reach out to tyler `<at>` clickhouse `<dot>` com. You can also peruse [ClickHouse Events](https://clickhouse.com/company/news-events) for a list of all upcoming trainings, meetups, speaking engagements, etc.
The following upcoming meetups are featuring creator of ClickHouse & CTO, Alexey Milovidov:
* [ClickHouse Guangzhou User Group Meetup](https://mp.weixin.qq.com/s/GSvo-7xUoVzCsuUvlLTpCw) - August 25
* [San Francisco Meetup (Cloudflare)](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/302540575) - September 5
* [Raleigh Meetup (Deutsche Bank)](https://www.meetup.com/triangletechtalks/events/302723486/) - September 9
* [New York Meetup (Rokt)](https://www.meetup.com/clickhouse-new-york-user-group/events/302575342) - September 10
* [Chicago Meetup (Jump Capital)](https://lu.ma/43tvmrfw) - September 12
Other upcoming meetups
* [Seattle Meetup (Statsig)](https://www.meetup.com/clickhouse-seattle-user-group/events/302518075/) - August 27
* [Melbourne Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/302732666/) - August 27
* [Sydney Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/302862966/) - September 5
* [Zurich Meetup](https://www.meetup.com/clickhouse-switzerland-meetup-group/events/302267429/) - September 5
* [Toronto Meetup (Shopify)](https://www.meetup.com/clickhouse-toronto-user-group/events/301490855/) - September 10
* [Austin Meetup](https://www.meetup.com/clickhouse-austin-user-group/events/302558689/) - September 17
* [London Meetup](https://www.meetup.com/clickhouse-london-user-group/events/302977267) - September 17
* [Tel Aviv Meetup](https://www.meetup.com/clickhouse-meetup-israel/events/303095121) - September 22
* [Madrid Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096564/) - October 22
* [Barcelona Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096876/) - October 29
* [Oslo Meetup](https://www.meetup.com/open-source-real-time-data-warehouse-real-time-analytics/events/302938622) - October 31
* [Ghent Meetup](https://www.meetup.com/clickhouse-belgium-user-group/events/303049405/) - November 19
* [Dubai Meetup](https://www.meetup.com/clickhouse-dubai-meetup-group/events/303096989/) - November 21
* [Paris Meetup](https://www.meetup.com/clickhouse-france-user-group/events/303096434) - November 26
## Recent Recordings
* **Recent Meetup Videos**: [Meetup Playlist](https://www.youtube.com/playlist?list=PL0Z2YDlm0b3iNDUzpY1S3L_iV4nARda_U) Whenever possible recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Current featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"
* **Recording available**: [**v24.4 Release Call**](https://www.youtube.com/watch?v=dtUqgcfOGmE) All the features of 24.4, one convenient video! Watch it now!
* **Recording available**: [**v24.8 LTS Release Call**](https://www.youtube.com/watch?v=AeLmp2jc51k) All the features of 24.8 LTS, one convenient video! Watch it now!
## Interested in joining ClickHouse and making it your full-time job?

View File

@ -18,7 +18,9 @@
#define Net_HTTPResponse_INCLUDED
#include <map>
#include <vector>
#include "Poco/Net/HTTPCookie.h"
#include "Poco/Net/HTTPMessage.h"
#include "Poco/Net/Net.h"
@ -180,6 +182,8 @@ namespace Net
/// May throw an exception in case of a malformed
/// Set-Cookie header.
void getHeaders(std::map<std::string, std::string> & headers) const;
void write(std::ostream & ostr) const;
/// Writes the HTTP response to the given
/// output stream.

View File

@ -209,6 +209,15 @@ void HTTPResponse::getCookies(std::vector<HTTPCookie>& cookies) const
}
}
void HTTPResponse::getHeaders(std::map<std::string, std::string> & headers) const
{
headers.clear();
for (const auto & it : *this)
{
headers.emplace(it.first, it.second);
}
}
void HTTPResponse::write(std::ostream& ostr) const
{

View File

@ -311,6 +311,14 @@ int SecureSocketImpl::sendBytes(const void* buffer, int length, int flags)
while (mustRetry(rc, remaining_time));
if (rc <= 0)
{
// At this stage we still can have last not yet received SSL message containing SSL error
// so make a read to force SSL to process possible SSL error
if (SSL_get_error(_pSSL, rc) == SSL_ERROR_SYSCALL && SocketImpl::lastError() == POCO_ECONNRESET)
{
char c = 0;
SSL_read(_pSSL, &c, 1);
}
rc = handleError(rc);
if (rc == 0) throw SSLConnectionUnexpectedlyClosedException();
}

View File

@ -8,4 +8,7 @@ set (CMAKE_CXX_COMPILER_TARGET "x86_64-pc-freebsd11")
set (CMAKE_ASM_COMPILER_TARGET "x86_64-pc-freebsd11")
set (CMAKE_SYSROOT "${CMAKE_CURRENT_LIST_DIR}/../../contrib/sysroot/freebsd-x86_64")
# dprintf is used in a patched version of replxx
add_compile_definitions(_WITH_DPRINTF)
set (CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY) # disable linkage check - it doesn't work in CMake

2
contrib/libfiu vendored

@ -1 +1 @@
Subproject commit b85edbde4cf974b1b40d27828a56f0505f4e2ee5
Subproject commit a1290d8cd3d7b4541d6c976e0a54f572ac03f2a3

2
contrib/replxx vendored

@ -1 +1 @@
Subproject commit 5d04501f93a4fb7f0bb8b73b8f614bc986f9e25b
Subproject commit 711c18e7f4d951255aa8b0851e5a55d5a5fb0ddb

2
contrib/usearch vendored

@ -1 +1 @@
Subproject commit e21a5778a0d4469ddaf38c94b7be0196bb701ee4
Subproject commit 7a8967cb442b08ca20c3dd781414378e65957d37

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.8.1.2684"
ARG VERSION="24.8.2.3"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.8.1.2684"
ARG VERSION="24.8.2.3"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -28,7 +28,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.8.1.2684"
ARG VERSION="24.8.2.3"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
#docker-official-library:off

View File

@ -112,3 +112,5 @@ wadllib==1.3.6
websocket-client==0.59.0
wheel==0.37.1
zipp==1.0.0
deltalake==0.16.0

View File

@ -13,7 +13,8 @@ entry="/usr/share/clickhouse-test/performance/scripts/entrypoint.sh"
# https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt
# Double-escaped backslashes are a tribute to the engineering wonder of docker --
# it gives '/bin/sh: 1: [bash,: not found' otherwise.
numactl --hardware
echo > compare.log
numactl --hardware | tee -a compare.log
node=$(( RANDOM % $(numactl --hardware | sed -n 's/^.*available:\(.*\)nodes.*$/\1/p') ));
echo Will bind to NUMA node $node;
echo Will bind to NUMA node $node | tee -a compare.log
numactl --cpunodebind=$node --membind=$node $entry

View File

@ -40,6 +40,3 @@ RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
ARG sqllogic_test_repo="https://github.com/gregrahn/sqllogictest.git"
RUN git clone --recursive ${sqllogic_test_repo}
COPY run.sh /
CMD ["/bin/bash", "/run.sh"]

View File

@ -0,0 +1,71 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.5.5.41-stable (441d4a6ebe3) FIXME as compared to v24.5.4.49-stable (63b760955a0)
#### Improvement
* Backported in [#66768](https://github.com/ClickHouse/ClickHouse/issues/66768): Make allow_experimental_analyzer be controlled by the initiator for distributed queries. This ensures compatibility and correctness during operations in mixed version clusters. [#65777](https://github.com/ClickHouse/ClickHouse/pull/65777) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#65350](https://github.com/ClickHouse/ClickHouse/issues/65350): Fix possible abort on uncaught exception in ~WriteBufferFromFileDescriptor in StatusFile. [#64206](https://github.com/ClickHouse/ClickHouse/pull/64206) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#65621](https://github.com/ClickHouse/ClickHouse/issues/65621): Fix `Cannot find column` in distributed query with `ARRAY JOIN` by `Nested` column. Fixes [#64755](https://github.com/ClickHouse/ClickHouse/issues/64755). [#64801](https://github.com/ClickHouse/ClickHouse/pull/64801) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#67902](https://github.com/ClickHouse/ClickHouse/issues/67902): Fixing the `Not-ready Set` error after the `PREWHERE` optimization for StorageMerge. [#65057](https://github.com/ClickHouse/ClickHouse/pull/65057) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66884](https://github.com/ClickHouse/ClickHouse/issues/66884): Fix unexpeced size of low cardinality column in function calls. [#65298](https://github.com/ClickHouse/ClickHouse/pull/65298) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#65933](https://github.com/ClickHouse/ClickHouse/issues/65933): For queries that read from `PostgreSQL`, cancel the internal `PostgreSQL` query if the ClickHouse query is finished. Otherwise, `ClickHouse` query cannot be canceled until the internal `PostgreSQL` query is finished. [#65771](https://github.com/ClickHouse/ClickHouse/pull/65771) ([Maksim Kita](https://github.com/kitaisreal)).
* Backported in [#66301](https://github.com/ClickHouse/ClickHouse/issues/66301): Better handling of join conditions involving `IS NULL` checks (for example `ON (a = b AND (a IS NOT NULL) AND (b IS NOT NULL) ) OR ( (a IS NULL) AND (b IS NULL) )` is rewritten to `ON a <=> b`), fix incorrect optimization when condition other then `IS NULL` are present. [#65835](https://github.com/ClickHouse/ClickHouse/pull/65835) ([vdimir](https://github.com/vdimir)).
* Backported in [#66328](https://github.com/ClickHouse/ClickHouse/issues/66328): Add missing settings `input_format_csv_skip_first_lines/input_format_tsv_skip_first_lines/input_format_csv_try_infer_numbers_from_strings/input_format_csv_try_infer_strings_from_quoted_tuples` in schema inference cache because they can change the resulting schema. It prevents from incorrect result of schema inference with these settings changed. [#65980](https://github.com/ClickHouse/ClickHouse/pull/65980) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68252](https://github.com/ClickHouse/ClickHouse/issues/68252): Fixed `Not-ready Set` in some system tables when filtering using subqueries. [#66018](https://github.com/ClickHouse/ClickHouse/pull/66018) ([Michael Kolupaev](https://github.com/al13n321)).
* Backported in [#66155](https://github.com/ClickHouse/ClickHouse/issues/66155): Fixed buffer overflow bug in `unbin`/`unhex` implementation. [#66106](https://github.com/ClickHouse/ClickHouse/pull/66106) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#66454](https://github.com/ClickHouse/ClickHouse/issues/66454): Fixed a bug in ZooKeeper client: a session could get stuck in unusable state after receiving a hardware error from ZooKeeper. For example, this might happen due to "soft memory limit" in ClickHouse Keeper. [#66140](https://github.com/ClickHouse/ClickHouse/pull/66140) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#66226](https://github.com/ClickHouse/ClickHouse/issues/66226): Fix issue in SumIfToCountIfVisitor and signed integers. [#66146](https://github.com/ClickHouse/ClickHouse/pull/66146) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#66680](https://github.com/ClickHouse/ClickHouse/issues/66680): Fix handling limit for `system.numbers_mt` when no index can be used. [#66231](https://github.com/ClickHouse/ClickHouse/pull/66231) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#66604](https://github.com/ClickHouse/ClickHouse/issues/66604): Fixed how the ClickHouse server detects the maximum number of usable CPU cores as specified by cgroups v2 if the server runs in a container such as Docker. In more detail, containers often run their process in the root cgroup which has an empty name. In that case, ClickHouse ignored the CPU limits set by cgroups v2. [#66237](https://github.com/ClickHouse/ClickHouse/pull/66237) ([filimonov](https://github.com/filimonov)).
* Backported in [#66360](https://github.com/ClickHouse/ClickHouse/issues/66360): Fix the `Not-ready set` error when a subquery with `IN` is used in the constraint. [#66261](https://github.com/ClickHouse/ClickHouse/pull/66261) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#68064](https://github.com/ClickHouse/ClickHouse/issues/68064): Fix boolean literals in query sent to external database (for engines like `PostgreSQL`). [#66282](https://github.com/ClickHouse/ClickHouse/pull/66282) ([vdimir](https://github.com/vdimir)).
* Backported in [#68158](https://github.com/ClickHouse/ClickHouse/issues/68158): Fix cluster() for inter-server secret (preserve initial user as before). [#66364](https://github.com/ClickHouse/ClickHouse/pull/66364) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#66972](https://github.com/ClickHouse/ClickHouse/issues/66972): Fix `Column identifier is already registered` error with `group_by_use_nulls=true` and new analyzer. [#66400](https://github.com/ClickHouse/ClickHouse/pull/66400) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66691](https://github.com/ClickHouse/ClickHouse/issues/66691): Fix the VALID UNTIL clause in the user definition resetting after a restart. Closes [#66405](https://github.com/ClickHouse/ClickHouse/issues/66405). [#66409](https://github.com/ClickHouse/ClickHouse/pull/66409) ([Nikolay Degterinsky](https://github.com/evillique)).
* Backported in [#66969](https://github.com/ClickHouse/ClickHouse/issues/66969): Fix `Cannot find column` error for queries with constant expression in `GROUP BY` key and new analyzer enabled. [#66433](https://github.com/ClickHouse/ClickHouse/pull/66433) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66720](https://github.com/ClickHouse/ClickHouse/issues/66720): Correctly track memory for `Allocator::realloc`. [#66548](https://github.com/ClickHouse/ClickHouse/pull/66548) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#66951](https://github.com/ClickHouse/ClickHouse/issues/66951): Fix an invalid result for queries with `WINDOW`. This could happen when `PARTITION` columns have sparse serialization and window functions are executed in parallel. [#66579](https://github.com/ClickHouse/ClickHouse/pull/66579) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66757](https://github.com/ClickHouse/ClickHouse/issues/66757): Fix `Unknown identifier` and `Column is not under aggregate function` errors for queries with the expression `(column IS NULL).` The bug was triggered by [#65088](https://github.com/ClickHouse/ClickHouse/issues/65088), with the disabled analyzer only. [#66654](https://github.com/ClickHouse/ClickHouse/pull/66654) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66948](https://github.com/ClickHouse/ClickHouse/issues/66948): Fix `Method getResultType is not supported for QUERY query node` error when scalar subquery was used as the first argument of IN (with new analyzer). [#66655](https://github.com/ClickHouse/ClickHouse/pull/66655) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#68115](https://github.com/ClickHouse/ClickHouse/issues/68115): Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. [#66659](https://github.com/ClickHouse/ClickHouse/pull/66659) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67633](https://github.com/ClickHouse/ClickHouse/issues/67633): Fix for occasional deadlock in Context::getDDLWorker. [#66843](https://github.com/ClickHouse/ClickHouse/pull/66843) ([Alexander Gololobov](https://github.com/davenger)).
* Backported in [#67481](https://github.com/ClickHouse/ClickHouse/issues/67481): In rare cases ClickHouse could consider parts as broken because of some unexpected projections on disk. Now it's fixed. [#66898](https://github.com/ClickHouse/ClickHouse/pull/66898) ([alesapin](https://github.com/alesapin)).
* Backported in [#67814](https://github.com/ClickHouse/ClickHouse/issues/67814): Only relevant to the experimental Variant data type. Fix crash with Variant + AggregateFunction type. [#67122](https://github.com/ClickHouse/ClickHouse/pull/67122) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67197](https://github.com/ClickHouse/ClickHouse/issues/67197): TRUNCATE DATABASE used to stop replication as if it was a DROP DATABASE query, it's fixed. [#67129](https://github.com/ClickHouse/ClickHouse/pull/67129) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#67379](https://github.com/ClickHouse/ClickHouse/issues/67379): Fix error `Cannot convert column because it is non constant in source stream but must be constant in result.` for a query that reads from the `Merge` table over the `Distriburted` table with one shard. [#67146](https://github.com/ClickHouse/ClickHouse/pull/67146) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#67501](https://github.com/ClickHouse/ClickHouse/issues/67501): Fix crash in DistributedAsyncInsert when connection is empty. [#67219](https://github.com/ClickHouse/ClickHouse/pull/67219) ([Pablo Marcos](https://github.com/pamarcos)).
* Backported in [#67886](https://github.com/ClickHouse/ClickHouse/issues/67886): Correctly parse file name/URI containing `::` if it's not an archive. [#67433](https://github.com/ClickHouse/ClickHouse/pull/67433) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67576](https://github.com/ClickHouse/ClickHouse/issues/67576): Fix execution of nested short-circuit functions. [#67520](https://github.com/ClickHouse/ClickHouse/pull/67520) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67850](https://github.com/ClickHouse/ClickHouse/issues/67850): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#68272](https://github.com/ClickHouse/ClickHouse/issues/68272): Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. [#67554](https://github.com/ClickHouse/ClickHouse/pull/67554) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#67807](https://github.com/ClickHouse/ClickHouse/issues/67807): Fix reloading SQL UDFs with UNION. Previously, restarting the server could make UDF invalid. [#67665](https://github.com/ClickHouse/ClickHouse/pull/67665) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67836](https://github.com/ClickHouse/ClickHouse/issues/67836): Fix potential stack overflow in `JSONMergePatch` function. Renamed this function from `jsonMergePatch` to `JSONMergePatch` because the previous name was wrong. The previous name is still kept for compatibility. Improved diagnostic of errors in the function. This closes [#67304](https://github.com/ClickHouse/ClickHouse/issues/67304). [#67756](https://github.com/ClickHouse/ClickHouse/pull/67756) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#67991](https://github.com/ClickHouse/ClickHouse/issues/67991): Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. [#67911](https://github.com/ClickHouse/ClickHouse/pull/67911) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68207](https://github.com/ClickHouse/ClickHouse/issues/68207): Fix wrong `count()` result when there is non-deterministic function in predicate. [#67922](https://github.com/ClickHouse/ClickHouse/pull/67922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68091](https://github.com/ClickHouse/ClickHouse/issues/68091): Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. [#67963](https://github.com/ClickHouse/ClickHouse/pull/67963) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#68122](https://github.com/ClickHouse/ClickHouse/issues/68122): Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. [#68052](https://github.com/ClickHouse/ClickHouse/pull/68052) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68171](https://github.com/ClickHouse/ClickHouse/issues/68171): Removes an incorrect optimization to remove sorting in subqueries that use `OFFSET`. Fixes [#67906](https://github.com/ClickHouse/ClickHouse/issues/67906). [#68099](https://github.com/ClickHouse/ClickHouse/pull/68099) ([Graham Campbell](https://github.com/GrahamCampbell)).
* Backported in [#68337](https://github.com/ClickHouse/ClickHouse/issues/68337): Try fix postgres crash when query is cancelled. [#68288](https://github.com/ClickHouse/ClickHouse/pull/68288) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#68667](https://github.com/ClickHouse/ClickHouse/issues/68667): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#66387](https://github.com/ClickHouse/ClickHouse/issues/66387): Disable broken cases from 02911_join_on_nullsafe_optimization. [#66310](https://github.com/ClickHouse/ClickHouse/pull/66310) ([vdimir](https://github.com/vdimir)).
* Backported in [#66426](https://github.com/ClickHouse/ClickHouse/issues/66426): Ignore subquery for IN in DDLLoadingDependencyVisitor. [#66395](https://github.com/ClickHouse/ClickHouse/pull/66395) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66544](https://github.com/ClickHouse/ClickHouse/issues/66544): Add additional log masking in CI. [#66523](https://github.com/ClickHouse/ClickHouse/pull/66523) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#66859](https://github.com/ClickHouse/ClickHouse/issues/66859): Fix data race in S3::ClientCache. [#66644](https://github.com/ClickHouse/ClickHouse/pull/66644) ([Konstantin Morozov](https://github.com/k-morozov)).
* Backported in [#66875](https://github.com/ClickHouse/ClickHouse/issues/66875): Support one more case in JOIN ON ... IS NULL. [#66725](https://github.com/ClickHouse/ClickHouse/pull/66725) ([vdimir](https://github.com/vdimir)).
* Backported in [#67059](https://github.com/ClickHouse/ClickHouse/issues/67059): Increase asio pool size in case the server is tiny. [#66761](https://github.com/ClickHouse/ClickHouse/pull/66761) ([alesapin](https://github.com/alesapin)).
* Backported in [#66945](https://github.com/ClickHouse/ClickHouse/issues/66945): Small fix in realloc memory tracking. [#66820](https://github.com/ClickHouse/ClickHouse/pull/66820) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67252](https://github.com/ClickHouse/ClickHouse/issues/67252): Followup [#66725](https://github.com/ClickHouse/ClickHouse/issues/66725). [#66869](https://github.com/ClickHouse/ClickHouse/pull/66869) ([vdimir](https://github.com/vdimir)).
* Backported in [#67412](https://github.com/ClickHouse/ClickHouse/issues/67412): CI: Fix build results for release branches. [#67402](https://github.com/ClickHouse/ClickHouse/pull/67402) ([Max K.](https://github.com/maxknv)).
* Update version after release. [#67862](https://github.com/ClickHouse/ClickHouse/pull/67862) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Backported in [#68077](https://github.com/ClickHouse/ClickHouse/issues/68077): Add an explicit error for `ALTER MODIFY SQL SECURITY` on non-view tables. [#67953](https://github.com/ClickHouse/ClickHouse/pull/67953) ([pufit](https://github.com/pufit)).

View File

@ -0,0 +1,33 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.5.6.45-stable (bdca8604c29) FIXME as compared to v24.5.5.78-stable (0138248cb62)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#67902](https://github.com/ClickHouse/ClickHouse/issues/67902): Fixing the `Not-ready Set` error after the `PREWHERE` optimization for StorageMerge. [#65057](https://github.com/ClickHouse/ClickHouse/pull/65057) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#68252](https://github.com/ClickHouse/ClickHouse/issues/68252): Fixed `Not-ready Set` in some system tables when filtering using subqueries. [#66018](https://github.com/ClickHouse/ClickHouse/pull/66018) ([Michael Kolupaev](https://github.com/al13n321)).
* Backported in [#68064](https://github.com/ClickHouse/ClickHouse/issues/68064): Fix boolean literals in query sent to external database (for engines like `PostgreSQL`). [#66282](https://github.com/ClickHouse/ClickHouse/pull/66282) ([vdimir](https://github.com/vdimir)).
* Backported in [#68158](https://github.com/ClickHouse/ClickHouse/issues/68158): Fix cluster() for inter-server secret (preserve initial user as before). [#66364](https://github.com/ClickHouse/ClickHouse/pull/66364) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#68115](https://github.com/ClickHouse/ClickHouse/issues/68115): Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. [#66659](https://github.com/ClickHouse/ClickHouse/pull/66659) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67886](https://github.com/ClickHouse/ClickHouse/issues/67886): Correctly parse file name/URI containing `::` if it's not an archive. [#67433](https://github.com/ClickHouse/ClickHouse/pull/67433) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#68272](https://github.com/ClickHouse/ClickHouse/issues/68272): Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. [#67554](https://github.com/ClickHouse/ClickHouse/pull/67554) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#67807](https://github.com/ClickHouse/ClickHouse/issues/67807): Fix reloading SQL UDFs with UNION. Previously, restarting the server could make UDF invalid. [#67665](https://github.com/ClickHouse/ClickHouse/pull/67665) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67836](https://github.com/ClickHouse/ClickHouse/issues/67836): Fix potential stack overflow in `JSONMergePatch` function. Renamed this function from `jsonMergePatch` to `JSONMergePatch` because the previous name was wrong. The previous name is still kept for compatibility. Improved diagnostic of errors in the function. This closes [#67304](https://github.com/ClickHouse/ClickHouse/issues/67304). [#67756](https://github.com/ClickHouse/ClickHouse/pull/67756) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#67991](https://github.com/ClickHouse/ClickHouse/issues/67991): Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. [#67911](https://github.com/ClickHouse/ClickHouse/pull/67911) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68207](https://github.com/ClickHouse/ClickHouse/issues/68207): Fix wrong `count()` result when there is non-deterministic function in predicate. [#67922](https://github.com/ClickHouse/ClickHouse/pull/67922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68091](https://github.com/ClickHouse/ClickHouse/issues/68091): Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. [#67963](https://github.com/ClickHouse/ClickHouse/pull/67963) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#68122](https://github.com/ClickHouse/ClickHouse/issues/68122): Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. [#68052](https://github.com/ClickHouse/ClickHouse/pull/68052) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68171](https://github.com/ClickHouse/ClickHouse/issues/68171): Removes an incorrect optimization to remove sorting in subqueries that use `OFFSET`. Fixes [#67906](https://github.com/ClickHouse/ClickHouse/issues/67906). [#68099](https://github.com/ClickHouse/ClickHouse/pull/68099) ([Graham Campbell](https://github.com/GrahamCampbell)).
* Backported in [#68337](https://github.com/ClickHouse/ClickHouse/issues/68337): Try fix postgres crash when query is cancelled. [#68288](https://github.com/ClickHouse/ClickHouse/pull/68288) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#68667](https://github.com/ClickHouse/ClickHouse/issues/68667): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Update version after release. [#67862](https://github.com/ClickHouse/ClickHouse/pull/67862) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Backported in [#68077](https://github.com/ClickHouse/ClickHouse/issues/68077): Add an explicit error for `ALTER MODIFY SQL SECURITY` on non-view tables. [#67953](https://github.com/ClickHouse/ClickHouse/pull/67953) ([pufit](https://github.com/pufit)).
* Backported in [#68756](https://github.com/ClickHouse/ClickHouse/issues/68756): To make patch release possible from every commit on release branch, package_debug build is required and must not be skipped. [#68750](https://github.com/ClickHouse/ClickHouse/pull/68750) ([Max K.](https://github.com/maxknv)).

View File

@ -0,0 +1,83 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.6.3.38-stable (4e33c831589) FIXME as compared to v24.6.2.17-stable (5710a8b5c0c)
#### Improvement
* Backported in [#66770](https://github.com/ClickHouse/ClickHouse/issues/66770): Make allow_experimental_analyzer be controlled by the initiator for distributed queries. This ensures compatibility and correctness during operations in mixed version clusters. [#65777](https://github.com/ClickHouse/ClickHouse/pull/65777) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#66885](https://github.com/ClickHouse/ClickHouse/issues/66885): Fix unexpeced size of low cardinality column in function calls. [#65298](https://github.com/ClickHouse/ClickHouse/pull/65298) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#66303](https://github.com/ClickHouse/ClickHouse/issues/66303): Better handling of join conditions involving `IS NULL` checks (for example `ON (a = b AND (a IS NOT NULL) AND (b IS NOT NULL) ) OR ( (a IS NULL) AND (b IS NULL) )` is rewritten to `ON a <=> b`), fix incorrect optimization when condition other then `IS NULL` are present. [#65835](https://github.com/ClickHouse/ClickHouse/pull/65835) ([vdimir](https://github.com/vdimir)).
* Backported in [#66330](https://github.com/ClickHouse/ClickHouse/issues/66330): Add missing settings `input_format_csv_skip_first_lines/input_format_tsv_skip_first_lines/input_format_csv_try_infer_numbers_from_strings/input_format_csv_try_infer_strings_from_quoted_tuples` in schema inference cache because they can change the resulting schema. It prevents from incorrect result of schema inference with these settings changed. [#65980](https://github.com/ClickHouse/ClickHouse/pull/65980) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#66157](https://github.com/ClickHouse/ClickHouse/issues/66157): Fixed buffer overflow bug in `unbin`/`unhex` implementation. [#66106](https://github.com/ClickHouse/ClickHouse/pull/66106) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#66210](https://github.com/ClickHouse/ClickHouse/issues/66210): Disable the `merge-filters` optimization introduced in [#64760](https://github.com/ClickHouse/ClickHouse/issues/64760). It may cause an exception if optimization merges two filter expressions and does not apply a short-circuit evaluation. [#66126](https://github.com/ClickHouse/ClickHouse/pull/66126) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66456](https://github.com/ClickHouse/ClickHouse/issues/66456): Fixed a bug in ZooKeeper client: a session could get stuck in unusable state after receiving a hardware error from ZooKeeper. For example, this might happen due to "soft memory limit" in ClickHouse Keeper. [#66140](https://github.com/ClickHouse/ClickHouse/pull/66140) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#66228](https://github.com/ClickHouse/ClickHouse/issues/66228): Fix issue in SumIfToCountIfVisitor and signed integers. [#66146](https://github.com/ClickHouse/ClickHouse/pull/66146) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#66183](https://github.com/ClickHouse/ClickHouse/issues/66183): Fix rare case with missing data in the result of distributed query, close [#61432](https://github.com/ClickHouse/ClickHouse/issues/61432). [#66174](https://github.com/ClickHouse/ClickHouse/pull/66174) ([vdimir](https://github.com/vdimir)).
* Backported in [#66271](https://github.com/ClickHouse/ClickHouse/issues/66271): Don't throw `TIMEOUT_EXCEEDED` for `none_only_active` mode of `distributed_ddl_output_mode`. [#66218](https://github.com/ClickHouse/ClickHouse/pull/66218) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#66682](https://github.com/ClickHouse/ClickHouse/issues/66682): Fix handling limit for `system.numbers_mt` when no index can be used. [#66231](https://github.com/ClickHouse/ClickHouse/pull/66231) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#66587](https://github.com/ClickHouse/ClickHouse/issues/66587): Fixed how the ClickHouse server detects the maximum number of usable CPU cores as specified by cgroups v2 if the server runs in a container such as Docker. In more detail, containers often run their process in the root cgroup which has an empty name. In that case, ClickHouse ignored the CPU limits set by cgroups v2. [#66237](https://github.com/ClickHouse/ClickHouse/pull/66237) ([filimonov](https://github.com/filimonov)).
* Backported in [#66362](https://github.com/ClickHouse/ClickHouse/issues/66362): Fix the `Not-ready set` error when a subquery with `IN` is used in the constraint. [#66261](https://github.com/ClickHouse/ClickHouse/pull/66261) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#68066](https://github.com/ClickHouse/ClickHouse/issues/68066): Fix boolean literals in query sent to external database (for engines like `PostgreSQL`). [#66282](https://github.com/ClickHouse/ClickHouse/pull/66282) ([vdimir](https://github.com/vdimir)).
* Backported in [#68566](https://github.com/ClickHouse/ClickHouse/issues/68566): Fix indexHint function case found by fuzzer. [#66286](https://github.com/ClickHouse/ClickHouse/pull/66286) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68159](https://github.com/ClickHouse/ClickHouse/issues/68159): Fix cluster() for inter-server secret (preserve initial user as before). [#66364](https://github.com/ClickHouse/ClickHouse/pull/66364) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#66613](https://github.com/ClickHouse/ClickHouse/issues/66613): Fix `Column identifier is already registered` error with `group_by_use_nulls=true` and new analyzer. [#66400](https://github.com/ClickHouse/ClickHouse/pull/66400) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66693](https://github.com/ClickHouse/ClickHouse/issues/66693): Fix the VALID UNTIL clause in the user definition resetting after a restart. Closes [#66405](https://github.com/ClickHouse/ClickHouse/issues/66405). [#66409](https://github.com/ClickHouse/ClickHouse/pull/66409) ([Nikolay Degterinsky](https://github.com/evillique)).
* Backported in [#66577](https://github.com/ClickHouse/ClickHouse/issues/66577): Fix `Cannot find column` error for queries with constant expression in `GROUP BY` key and new analyzer enabled. [#66433](https://github.com/ClickHouse/ClickHouse/pull/66433) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66721](https://github.com/ClickHouse/ClickHouse/issues/66721): Correctly track memory for `Allocator::realloc`. [#66548](https://github.com/ClickHouse/ClickHouse/pull/66548) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#66670](https://github.com/ClickHouse/ClickHouse/issues/66670): Fix reading of uninitialized memory when hashing empty tuples. This closes [#66559](https://github.com/ClickHouse/ClickHouse/issues/66559). [#66562](https://github.com/ClickHouse/ClickHouse/pull/66562) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#66952](https://github.com/ClickHouse/ClickHouse/issues/66952): Fix an invalid result for queries with `WINDOW`. This could happen when `PARTITION` columns have sparse serialization and window functions are executed in parallel. [#66579](https://github.com/ClickHouse/ClickHouse/pull/66579) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66956](https://github.com/ClickHouse/ClickHouse/issues/66956): Fix removing named collections in local storage. [#66599](https://github.com/ClickHouse/ClickHouse/pull/66599) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#66716](https://github.com/ClickHouse/ClickHouse/issues/66716): Fix removing named collections in local storage. [#66599](https://github.com/ClickHouse/ClickHouse/pull/66599) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#66759](https://github.com/ClickHouse/ClickHouse/issues/66759): Fix `Unknown identifier` and `Column is not under aggregate function` errors for queries with the expression `(column IS NULL).` The bug was triggered by [#65088](https://github.com/ClickHouse/ClickHouse/issues/65088), with the disabled analyzer only. [#66654](https://github.com/ClickHouse/ClickHouse/pull/66654) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66751](https://github.com/ClickHouse/ClickHouse/issues/66751): Fix `Method getResultType is not supported for QUERY query node` error when scalar subquery was used as the first argument of IN (with new analyzer). [#66655](https://github.com/ClickHouse/ClickHouse/pull/66655) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#68116](https://github.com/ClickHouse/ClickHouse/issues/68116): Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. [#66659](https://github.com/ClickHouse/ClickHouse/pull/66659) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67635](https://github.com/ClickHouse/ClickHouse/issues/67635): Fix for occasional deadlock in Context::getDDLWorker. [#66843](https://github.com/ClickHouse/ClickHouse/pull/66843) ([Alexander Gololobov](https://github.com/davenger)).
* Backported in [#67482](https://github.com/ClickHouse/ClickHouse/issues/67482): In rare cases ClickHouse could consider parts as broken because of some unexpected projections on disk. Now it's fixed. [#66898](https://github.com/ClickHouse/ClickHouse/pull/66898) ([alesapin](https://github.com/alesapin)).
* Backported in [#67816](https://github.com/ClickHouse/ClickHouse/issues/67816): Only relevant to the experimental Variant data type. Fix crash with Variant + AggregateFunction type. [#67122](https://github.com/ClickHouse/ClickHouse/pull/67122) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67199](https://github.com/ClickHouse/ClickHouse/issues/67199): TRUNCATE DATABASE used to stop replication as if it was a DROP DATABASE query, it's fixed. [#67129](https://github.com/ClickHouse/ClickHouse/pull/67129) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Backported in [#67381](https://github.com/ClickHouse/ClickHouse/issues/67381): Fix error `Cannot convert column because it is non constant in source stream but must be constant in result.` for a query that reads from the `Merge` table over the `Distriburted` table with one shard. [#67146](https://github.com/ClickHouse/ClickHouse/pull/67146) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#67244](https://github.com/ClickHouse/ClickHouse/issues/67244): This closes [#67156](https://github.com/ClickHouse/ClickHouse/issues/67156). This closes [#66447](https://github.com/ClickHouse/ClickHouse/issues/66447). The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/62907. [#67178](https://github.com/ClickHouse/ClickHouse/pull/67178) ([Maksim Kita](https://github.com/kitaisreal)).
* Backported in [#67503](https://github.com/ClickHouse/ClickHouse/issues/67503): Fix crash in DistributedAsyncInsert when connection is empty. [#67219](https://github.com/ClickHouse/ClickHouse/pull/67219) ([Pablo Marcos](https://github.com/pamarcos)).
* Backported in [#67887](https://github.com/ClickHouse/ClickHouse/issues/67887): Correctly parse file name/URI containing `::` if it's not an archive. [#67433](https://github.com/ClickHouse/ClickHouse/pull/67433) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67578](https://github.com/ClickHouse/ClickHouse/issues/67578): Fix execution of nested short-circuit functions. [#67520](https://github.com/ClickHouse/ClickHouse/pull/67520) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68611](https://github.com/ClickHouse/ClickHouse/issues/68611): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#67852](https://github.com/ClickHouse/ClickHouse/issues/67852): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#68275](https://github.com/ClickHouse/ClickHouse/issues/68275): Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. [#67554](https://github.com/ClickHouse/ClickHouse/pull/67554) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#67808](https://github.com/ClickHouse/ClickHouse/issues/67808): Fix reloading SQL UDFs with UNION. Previously, restarting the server could make UDF invalid. [#67665](https://github.com/ClickHouse/ClickHouse/pull/67665) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67838](https://github.com/ClickHouse/ClickHouse/issues/67838): Fix potential stack overflow in `JSONMergePatch` function. Renamed this function from `jsonMergePatch` to `JSONMergePatch` because the previous name was wrong. The previous name is still kept for compatibility. Improved diagnostic of errors in the function. This closes [#67304](https://github.com/ClickHouse/ClickHouse/issues/67304). [#67756](https://github.com/ClickHouse/ClickHouse/pull/67756) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#67993](https://github.com/ClickHouse/ClickHouse/issues/67993): Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. [#67911](https://github.com/ClickHouse/ClickHouse/pull/67911) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68208](https://github.com/ClickHouse/ClickHouse/issues/68208): Fix wrong `count()` result when there is non-deterministic function in predicate. [#67922](https://github.com/ClickHouse/ClickHouse/pull/67922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68093](https://github.com/ClickHouse/ClickHouse/issues/68093): Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. [#67963](https://github.com/ClickHouse/ClickHouse/pull/67963) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#68124](https://github.com/ClickHouse/ClickHouse/issues/68124): Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. [#68052](https://github.com/ClickHouse/ClickHouse/pull/68052) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68221](https://github.com/ClickHouse/ClickHouse/issues/68221): Fixed a NULL pointer dereference, triggered by a specially crafted query, that crashed the server via hopEnd, hopStart, tumbleEnd, and tumbleStart. [#68098](https://github.com/ClickHouse/ClickHouse/pull/68098) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Backported in [#68173](https://github.com/ClickHouse/ClickHouse/issues/68173): Removes an incorrect optimization to remove sorting in subqueries that use `OFFSET`. Fixes [#67906](https://github.com/ClickHouse/ClickHouse/issues/67906). [#68099](https://github.com/ClickHouse/ClickHouse/pull/68099) ([Graham Campbell](https://github.com/GrahamCampbell)).
* Backported in [#68339](https://github.com/ClickHouse/ClickHouse/issues/68339): Try fix postgres crash when query is cancelled. [#68288](https://github.com/ClickHouse/ClickHouse/pull/68288) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#68396](https://github.com/ClickHouse/ClickHouse/issues/68396): Fix missing sync replica mode in query `SYSTEM SYNC REPLICA`. [#68326](https://github.com/ClickHouse/ClickHouse/pull/68326) ([Duc Canh Le](https://github.com/canhld94)).
* Backported in [#68668](https://github.com/ClickHouse/ClickHouse/issues/68668): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Backport [#66599](https://github.com/ClickHouse/ClickHouse/issues/66599) to 24.6: Fix dropping named collection in local storage"'. [#66922](https://github.com/ClickHouse/ClickHouse/pull/66922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#66332](https://github.com/ClickHouse/ClickHouse/issues/66332): Do not raise a NOT_IMPLEMENTED error when getting s3 metrics with a multiple disk configuration. [#65403](https://github.com/ClickHouse/ClickHouse/pull/65403) ([Elena Torró](https://github.com/elenatorro)).
* Backported in [#66142](https://github.com/ClickHouse/ClickHouse/issues/66142): Fix flaky test_storage_s3_queue tests. [#66009](https://github.com/ClickHouse/ClickHouse/pull/66009) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#66389](https://github.com/ClickHouse/ClickHouse/issues/66389): Disable broken cases from 02911_join_on_nullsafe_optimization. [#66310](https://github.com/ClickHouse/ClickHouse/pull/66310) ([vdimir](https://github.com/vdimir)).
* Backported in [#66428](https://github.com/ClickHouse/ClickHouse/issues/66428): Ignore subquery for IN in DDLLoadingDependencyVisitor. [#66395](https://github.com/ClickHouse/ClickHouse/pull/66395) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Backported in [#66546](https://github.com/ClickHouse/ClickHouse/issues/66546): Add additional log masking in CI. [#66523](https://github.com/ClickHouse/ClickHouse/pull/66523) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#66861](https://github.com/ClickHouse/ClickHouse/issues/66861): Fix data race in S3::ClientCache. [#66644](https://github.com/ClickHouse/ClickHouse/pull/66644) ([Konstantin Morozov](https://github.com/k-morozov)).
* Backported in [#66877](https://github.com/ClickHouse/ClickHouse/issues/66877): Support one more case in JOIN ON ... IS NULL. [#66725](https://github.com/ClickHouse/ClickHouse/pull/66725) ([vdimir](https://github.com/vdimir)).
* Backported in [#67061](https://github.com/ClickHouse/ClickHouse/issues/67061): Increase asio pool size in case the server is tiny. [#66761](https://github.com/ClickHouse/ClickHouse/pull/66761) ([alesapin](https://github.com/alesapin)).
* Backported in [#66940](https://github.com/ClickHouse/ClickHouse/issues/66940): Small fix in realloc memory tracking. [#66820](https://github.com/ClickHouse/ClickHouse/pull/66820) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67254](https://github.com/ClickHouse/ClickHouse/issues/67254): Followup [#66725](https://github.com/ClickHouse/ClickHouse/issues/66725). [#66869](https://github.com/ClickHouse/ClickHouse/pull/66869) ([vdimir](https://github.com/vdimir)).
* Backported in [#67414](https://github.com/ClickHouse/ClickHouse/issues/67414): CI: Fix build results for release branches. [#67402](https://github.com/ClickHouse/ClickHouse/pull/67402) ([Max K.](https://github.com/maxknv)).
* Update version after release. [#67909](https://github.com/ClickHouse/ClickHouse/pull/67909) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Backported in [#68079](https://github.com/ClickHouse/ClickHouse/issues/68079): Add an explicit error for `ALTER MODIFY SQL SECURITY` on non-view tables. [#67953](https://github.com/ClickHouse/ClickHouse/pull/67953) ([pufit](https://github.com/pufit)).

View File

@ -0,0 +1,33 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.6.4.42-stable (c534bb4b4dd) FIXME as compared to v24.6.3.95-stable (8325c920d11)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#68066](https://github.com/ClickHouse/ClickHouse/issues/68066): Fix boolean literals in query sent to external database (for engines like `PostgreSQL`). [#66282](https://github.com/ClickHouse/ClickHouse/pull/66282) ([vdimir](https://github.com/vdimir)).
* Backported in [#68566](https://github.com/ClickHouse/ClickHouse/issues/68566): Fix indexHint function case found by fuzzer. [#66286](https://github.com/ClickHouse/ClickHouse/pull/66286) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68159](https://github.com/ClickHouse/ClickHouse/issues/68159): Fix cluster() for inter-server secret (preserve initial user as before). [#66364](https://github.com/ClickHouse/ClickHouse/pull/66364) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#68116](https://github.com/ClickHouse/ClickHouse/issues/68116): Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. [#66659](https://github.com/ClickHouse/ClickHouse/pull/66659) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67887](https://github.com/ClickHouse/ClickHouse/issues/67887): Correctly parse file name/URI containing `::` if it's not an archive. [#67433](https://github.com/ClickHouse/ClickHouse/pull/67433) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#68611](https://github.com/ClickHouse/ClickHouse/issues/68611): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#68275](https://github.com/ClickHouse/ClickHouse/issues/68275): Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. [#67554](https://github.com/ClickHouse/ClickHouse/pull/67554) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#67993](https://github.com/ClickHouse/ClickHouse/issues/67993): Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. [#67911](https://github.com/ClickHouse/ClickHouse/pull/67911) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68208](https://github.com/ClickHouse/ClickHouse/issues/68208): Fix wrong `count()` result when there is non-deterministic function in predicate. [#67922](https://github.com/ClickHouse/ClickHouse/pull/67922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68093](https://github.com/ClickHouse/ClickHouse/issues/68093): Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. [#67963](https://github.com/ClickHouse/ClickHouse/pull/67963) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#68124](https://github.com/ClickHouse/ClickHouse/issues/68124): Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. [#68052](https://github.com/ClickHouse/ClickHouse/pull/68052) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68221](https://github.com/ClickHouse/ClickHouse/issues/68221): Fixed a NULL pointer dereference, triggered by a specially crafted query, that crashed the server via hopEnd, hopStart, tumbleEnd, and tumbleStart. [#68098](https://github.com/ClickHouse/ClickHouse/pull/68098) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Backported in [#68173](https://github.com/ClickHouse/ClickHouse/issues/68173): Removes an incorrect optimization to remove sorting in subqueries that use `OFFSET`. Fixes [#67906](https://github.com/ClickHouse/ClickHouse/issues/67906). [#68099](https://github.com/ClickHouse/ClickHouse/pull/68099) ([Graham Campbell](https://github.com/GrahamCampbell)).
* Backported in [#68339](https://github.com/ClickHouse/ClickHouse/issues/68339): Try fix postgres crash when query is cancelled. [#68288](https://github.com/ClickHouse/ClickHouse/pull/68288) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#68396](https://github.com/ClickHouse/ClickHouse/issues/68396): Fix missing sync replica mode in query `SYSTEM SYNC REPLICA`. [#68326](https://github.com/ClickHouse/ClickHouse/pull/68326) ([Duc Canh Le](https://github.com/canhld94)).
* Backported in [#68668](https://github.com/ClickHouse/ClickHouse/issues/68668): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Update version after release. [#67909](https://github.com/ClickHouse/ClickHouse/pull/67909) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Backported in [#68079](https://github.com/ClickHouse/ClickHouse/issues/68079): Add an explicit error for `ALTER MODIFY SQL SECURITY` on non-view tables. [#67953](https://github.com/ClickHouse/ClickHouse/pull/67953) ([pufit](https://github.com/pufit)).
* Backported in [#68758](https://github.com/ClickHouse/ClickHouse/issues/68758): To make patch release possible from every commit on release branch, package_debug build is required and must not be skipped. [#68750](https://github.com/ClickHouse/ClickHouse/pull/68750) ([Max K.](https://github.com/maxknv)).

View File

@ -0,0 +1,55 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.7.3.47-stable (2e50fe27a14) FIXME as compared to v24.7.2.13-stable (6e41f601b2f)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#68232](https://github.com/ClickHouse/ClickHouse/issues/68232): Fixed `Not-ready Set` in some system tables when filtering using subqueries. [#66018](https://github.com/ClickHouse/ClickHouse/pull/66018) ([Michael Kolupaev](https://github.com/al13n321)).
* Backported in [#67969](https://github.com/ClickHouse/ClickHouse/issues/67969): Fixed reading of subcolumns after `ALTER ADD COLUMN` query. [#66243](https://github.com/ClickHouse/ClickHouse/pull/66243) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68068](https://github.com/ClickHouse/ClickHouse/issues/68068): Fix boolean literals in query sent to external database (for engines like `PostgreSQL`). [#66282](https://github.com/ClickHouse/ClickHouse/pull/66282) ([vdimir](https://github.com/vdimir)).
* Backported in [#67637](https://github.com/ClickHouse/ClickHouse/issues/67637): Fix for occasional deadlock in Context::getDDLWorker. [#66843](https://github.com/ClickHouse/ClickHouse/pull/66843) ([Alexander Gololobov](https://github.com/davenger)).
* Backported in [#67820](https://github.com/ClickHouse/ClickHouse/issues/67820): Fix possible deadlock on query cancel with parallel replicas. [#66905](https://github.com/ClickHouse/ClickHouse/pull/66905) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#67818](https://github.com/ClickHouse/ClickHouse/issues/67818): Only relevant to the experimental Variant data type. Fix crash with Variant + AggregateFunction type. [#67122](https://github.com/ClickHouse/ClickHouse/pull/67122) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67766](https://github.com/ClickHouse/ClickHouse/issues/67766): Fix crash of `uniq` and `uniqTheta ` with `tuple()` argument. Closes [#67303](https://github.com/ClickHouse/ClickHouse/issues/67303). [#67306](https://github.com/ClickHouse/ClickHouse/pull/67306) ([flynn](https://github.com/ucasfl)).
* Backported in [#67881](https://github.com/ClickHouse/ClickHouse/issues/67881): Correctly parse file name/URI containing `::` if it's not an archive. [#67433](https://github.com/ClickHouse/ClickHouse/pull/67433) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#68613](https://github.com/ClickHouse/ClickHouse/issues/68613): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#67854](https://github.com/ClickHouse/ClickHouse/issues/67854): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#68278](https://github.com/ClickHouse/ClickHouse/issues/68278): Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. [#67554](https://github.com/ClickHouse/ClickHouse/pull/67554) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68040](https://github.com/ClickHouse/ClickHouse/issues/68040): Fix creation of view with recursive CTE. [#67587](https://github.com/ClickHouse/ClickHouse/pull/67587) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Backported in [#68038](https://github.com/ClickHouse/ClickHouse/issues/68038): Fix crash on `percent_rank`. `percent_rank`'s default frame type is changed to `range unbounded preceding and unbounded following`. `IWindowFunction`'s default window frame is considered and now window functions without window frame definition in sql can be put into different `WindowTransfomer`s properly. [#67661](https://github.com/ClickHouse/ClickHouse/pull/67661) ([lgbo](https://github.com/lgbo-ustc)).
* Backported in [#67713](https://github.com/ClickHouse/ClickHouse/issues/67713): Fix reloading SQL UDFs with UNION. Previously, restarting the server could make UDF invalid. [#67665](https://github.com/ClickHouse/ClickHouse/pull/67665) ([Antonio Andelic](https://github.com/antonio2368)).
* Backported in [#67840](https://github.com/ClickHouse/ClickHouse/issues/67840): Fix potential stack overflow in `JSONMergePatch` function. Renamed this function from `jsonMergePatch` to `JSONMergePatch` because the previous name was wrong. The previous name is still kept for compatibility. Improved diagnostic of errors in the function. This closes [#67304](https://github.com/ClickHouse/ClickHouse/issues/67304). [#67756](https://github.com/ClickHouse/ClickHouse/pull/67756) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#67995](https://github.com/ClickHouse/ClickHouse/issues/67995): Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. [#67911](https://github.com/ClickHouse/ClickHouse/pull/67911) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68224](https://github.com/ClickHouse/ClickHouse/issues/68224): Fix wrong `count()` result when there is non-deterministic function in predicate. [#67922](https://github.com/ClickHouse/ClickHouse/pull/67922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68095](https://github.com/ClickHouse/ClickHouse/issues/68095): Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. [#67963](https://github.com/ClickHouse/ClickHouse/pull/67963) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#68126](https://github.com/ClickHouse/ClickHouse/issues/68126): Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. [#68052](https://github.com/ClickHouse/ClickHouse/pull/68052) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68223](https://github.com/ClickHouse/ClickHouse/issues/68223): Fixed a NULL pointer dereference, triggered by a specially crafted query, that crashed the server via hopEnd, hopStart, tumbleEnd, and tumbleStart. [#68098](https://github.com/ClickHouse/ClickHouse/pull/68098) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Backported in [#68175](https://github.com/ClickHouse/ClickHouse/issues/68175): Removes an incorrect optimization to remove sorting in subqueries that use `OFFSET`. Fixes [#67906](https://github.com/ClickHouse/ClickHouse/issues/67906). [#68099](https://github.com/ClickHouse/ClickHouse/pull/68099) ([Graham Campbell](https://github.com/GrahamCampbell)).
* Backported in [#68341](https://github.com/ClickHouse/ClickHouse/issues/68341): Try fix postgres crash when query is cancelled. [#68288](https://github.com/ClickHouse/ClickHouse/pull/68288) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#68398](https://github.com/ClickHouse/ClickHouse/issues/68398): Fix missing sync replica mode in query `SYSTEM SYNC REPLICA`. [#68326](https://github.com/ClickHouse/ClickHouse/pull/68326) ([Duc Canh Le](https://github.com/canhld94)).
* Backported in [#68669](https://github.com/ClickHouse/ClickHouse/issues/68669): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#67518](https://github.com/ClickHouse/ClickHouse/issues/67518): Split slow test 03036_dynamic_read_subcolumns. [#66954](https://github.com/ClickHouse/ClickHouse/pull/66954) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#67516](https://github.com/ClickHouse/ClickHouse/issues/67516): Split 01508_partition_pruning_long. [#66983](https://github.com/ClickHouse/ClickHouse/pull/66983) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#67529](https://github.com/ClickHouse/ClickHouse/issues/67529): Reduce max time of 00763_long_lock_buffer_alter_destination_table. [#67185](https://github.com/ClickHouse/ClickHouse/pull/67185) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#67803](https://github.com/ClickHouse/ClickHouse/issues/67803): Disable some Dynamic tests under sanitizers, rewrite 03202_dynamic_null_map_subcolumn to sql. [#67359](https://github.com/ClickHouse/ClickHouse/pull/67359) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#67643](https://github.com/ClickHouse/ClickHouse/issues/67643): [Green CI] Fix potentially flaky test_mask_sensitive_info integration test. [#67506](https://github.com/ClickHouse/ClickHouse/pull/67506) ([Alexey Katsman](https://github.com/alexkats)).
* Backported in [#67609](https://github.com/ClickHouse/ClickHouse/issues/67609): Fix test_zookeeper_config_load_balancing after adding the xdist worker name to the instance. [#67590](https://github.com/ClickHouse/ClickHouse/pull/67590) ([Pablo Marcos](https://github.com/pamarcos)).
* Backported in [#67871](https://github.com/ClickHouse/ClickHouse/issues/67871): Fix 02434_cancel_insert_when_client_dies. [#67600](https://github.com/ClickHouse/ClickHouse/pull/67600) ([vdimir](https://github.com/vdimir)).
* Backported in [#67704](https://github.com/ClickHouse/ClickHouse/issues/67704): Fix 02910_bad_logs_level_in_local in fast tests. [#67603](https://github.com/ClickHouse/ClickHouse/pull/67603) ([Raúl Marín](https://github.com/Algunenano)).
* Backported in [#67689](https://github.com/ClickHouse/ClickHouse/issues/67689): Fix 01605_adaptive_granularity_block_borders. [#67605](https://github.com/ClickHouse/ClickHouse/pull/67605) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#67827](https://github.com/ClickHouse/ClickHouse/issues/67827): Try fix 03143_asof_join_ddb_long. [#67620](https://github.com/ClickHouse/ClickHouse/pull/67620) ([Nikita Taranov](https://github.com/nickitat)).
* Backported in [#67892](https://github.com/ClickHouse/ClickHouse/issues/67892): Revert "Merge pull request [#66510](https://github.com/ClickHouse/ClickHouse/issues/66510) from canhld94/fix_trivial_count_non_deterministic_func". [#67800](https://github.com/ClickHouse/ClickHouse/pull/67800) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68081](https://github.com/ClickHouse/ClickHouse/issues/68081): Add an explicit error for `ALTER MODIFY SQL SECURITY` on non-view tables. [#67953](https://github.com/ClickHouse/ClickHouse/pull/67953) ([pufit](https://github.com/pufit)).
* Update version after release. [#68044](https://github.com/ClickHouse/ClickHouse/pull/68044) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Backported in [#68269](https://github.com/ClickHouse/ClickHouse/issues/68269): [Green CI] Fix test 01903_correct_block_size_prediction_with_default. [#68203](https://github.com/ClickHouse/ClickHouse/pull/68203) ([Pablo Marcos](https://github.com/pamarcos)).
* Backported in [#68432](https://github.com/ClickHouse/ClickHouse/issues/68432): tests: make 01600_parts_states_metrics_long better. [#68265](https://github.com/ClickHouse/ClickHouse/pull/68265) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#68538](https://github.com/ClickHouse/ClickHouse/issues/68538): CI: Native build for package_aarch64. [#68457](https://github.com/ClickHouse/ClickHouse/pull/68457) ([Max K.](https://github.com/maxknv)).
* Backported in [#68555](https://github.com/ClickHouse/ClickHouse/issues/68555): CI: Minor release workflow fix. [#68536](https://github.com/ClickHouse/ClickHouse/pull/68536) ([Max K.](https://github.com/maxknv)).

View File

@ -0,0 +1,36 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.7.4.51-stable (70fe2f6fa52) FIXME as compared to v24.7.3.42-stable (63730bc4293)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#68232](https://github.com/ClickHouse/ClickHouse/issues/68232): Fixed `Not-ready Set` in some system tables when filtering using subqueries. [#66018](https://github.com/ClickHouse/ClickHouse/pull/66018) ([Michael Kolupaev](https://github.com/al13n321)).
* Backported in [#68068](https://github.com/ClickHouse/ClickHouse/issues/68068): Fix boolean literals in query sent to external database (for engines like `PostgreSQL`). [#66282](https://github.com/ClickHouse/ClickHouse/pull/66282) ([vdimir](https://github.com/vdimir)).
* Backported in [#68613](https://github.com/ClickHouse/ClickHouse/issues/68613): Fixes [#66026](https://github.com/ClickHouse/ClickHouse/issues/66026). Avoid unresolved table function arguments traversal in `ReplaceTableNodeToDummyVisitor`. [#67522](https://github.com/ClickHouse/ClickHouse/pull/67522) ([Dmitry Novik](https://github.com/novikd)).
* Backported in [#68278](https://github.com/ClickHouse/ClickHouse/issues/68278): Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. [#67554](https://github.com/ClickHouse/ClickHouse/pull/67554) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68040](https://github.com/ClickHouse/ClickHouse/issues/68040): Fix creation of view with recursive CTE. [#67587](https://github.com/ClickHouse/ClickHouse/pull/67587) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Backported in [#68038](https://github.com/ClickHouse/ClickHouse/issues/68038): Fix crash on `percent_rank`. `percent_rank`'s default frame type is changed to `range unbounded preceding and unbounded following`. `IWindowFunction`'s default window frame is considered and now window functions without window frame definition in sql can be put into different `WindowTransfomer`s properly. [#67661](https://github.com/ClickHouse/ClickHouse/pull/67661) ([lgbo](https://github.com/lgbo-ustc)).
* Backported in [#68224](https://github.com/ClickHouse/ClickHouse/issues/68224): Fix wrong `count()` result when there is non-deterministic function in predicate. [#67922](https://github.com/ClickHouse/ClickHouse/pull/67922) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#68095](https://github.com/ClickHouse/ClickHouse/issues/68095): Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. [#67963](https://github.com/ClickHouse/ClickHouse/pull/67963) ([Robert Schulze](https://github.com/rschu1ze)).
* Backported in [#68126](https://github.com/ClickHouse/ClickHouse/issues/68126): Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. [#68052](https://github.com/ClickHouse/ClickHouse/pull/68052) ([Anton Popov](https://github.com/CurtizJ)).
* Backported in [#68223](https://github.com/ClickHouse/ClickHouse/issues/68223): Fixed a NULL pointer dereference, triggered by a specially crafted query, that crashed the server via hopEnd, hopStart, tumbleEnd, and tumbleStart. [#68098](https://github.com/ClickHouse/ClickHouse/pull/68098) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Backported in [#68175](https://github.com/ClickHouse/ClickHouse/issues/68175): Removes an incorrect optimization to remove sorting in subqueries that use `OFFSET`. Fixes [#67906](https://github.com/ClickHouse/ClickHouse/issues/67906). [#68099](https://github.com/ClickHouse/ClickHouse/pull/68099) ([Graham Campbell](https://github.com/GrahamCampbell)).
* Backported in [#68341](https://github.com/ClickHouse/ClickHouse/issues/68341): Try fix postgres crash when query is cancelled. [#68288](https://github.com/ClickHouse/ClickHouse/pull/68288) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Backported in [#68398](https://github.com/ClickHouse/ClickHouse/issues/68398): Fix missing sync replica mode in query `SYSTEM SYNC REPLICA`. [#68326](https://github.com/ClickHouse/ClickHouse/pull/68326) ([Duc Canh Le](https://github.com/canhld94)).
* Backported in [#68669](https://github.com/ClickHouse/ClickHouse/issues/68669): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Backported in [#67803](https://github.com/ClickHouse/ClickHouse/issues/67803): Disable some Dynamic tests under sanitizers, rewrite 03202_dynamic_null_map_subcolumn to sql. [#67359](https://github.com/ClickHouse/ClickHouse/pull/67359) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68081](https://github.com/ClickHouse/ClickHouse/issues/68081): Add an explicit error for `ALTER MODIFY SQL SECURITY` on non-view tables. [#67953](https://github.com/ClickHouse/ClickHouse/pull/67953) ([pufit](https://github.com/pufit)).
* Update version after release. [#68044](https://github.com/ClickHouse/ClickHouse/pull/68044) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Backported in [#68269](https://github.com/ClickHouse/ClickHouse/issues/68269): [Green CI] Fix test 01903_correct_block_size_prediction_with_default. [#68203](https://github.com/ClickHouse/ClickHouse/pull/68203) ([Pablo Marcos](https://github.com/pamarcos)).
* Backported in [#68432](https://github.com/ClickHouse/ClickHouse/issues/68432): tests: make 01600_parts_states_metrics_long better. [#68265](https://github.com/ClickHouse/ClickHouse/pull/68265) ([Azat Khuzhin](https://github.com/azat)).
* Backported in [#68538](https://github.com/ClickHouse/ClickHouse/issues/68538): CI: Native build for package_aarch64. [#68457](https://github.com/ClickHouse/ClickHouse/pull/68457) ([Max K.](https://github.com/maxknv)).
* Backported in [#68555](https://github.com/ClickHouse/ClickHouse/issues/68555): CI: Minor release workflow fix. [#68536](https://github.com/ClickHouse/ClickHouse/pull/68536) ([Max K.](https://github.com/maxknv)).
* Backported in [#68760](https://github.com/ClickHouse/ClickHouse/issues/68760): To make patch release possible from every commit on release branch, package_debug build is required and must not be skipped. [#68750](https://github.com/ClickHouse/ClickHouse/pull/68750) ([Max K.](https://github.com/maxknv)).

View File

@ -0,0 +1,12 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.8.2.3-lts (b54f79ed323) FIXME as compared to v24.8.1.2684-lts (161c62fd295)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Backported in [#68670](https://github.com/ClickHouse/ClickHouse/issues/68670): Fix `LOGICAL_ERROR`s when functions `sipHash64Keyed`, `sipHash128Keyed`, or `sipHash128ReferenceKeyed` are applied to empty arrays or tuples. [#68630](https://github.com/ClickHouse/ClickHouse/pull/68630) ([Robert Schulze](https://github.com/rschu1ze)).

View File

@ -6,28 +6,34 @@ sidebar_label: Iceberg
# Iceberg Table Engine
This engine provides a read-only integration with existing Apache [Iceberg](https://iceberg.apache.org/) tables in Amazon S3.
This engine provides a read-only integration with existing Apache [Iceberg](https://iceberg.apache.org/) tables in Amazon S3, Azure and locally stored tables.
## Create Table
Note that the Iceberg table must already exist in S3, this command does not take DDL parameters to create a new table.
Note that the Iceberg table must already exist in the storage, this command does not take DDL parameters to create a new table.
``` sql
CREATE TABLE iceberg_table
ENGINE = Iceberg(url, [aws_access_key_id, aws_secret_access_key,])
CREATE TABLE iceberg_table_s3
ENGINE = IcebergS3(url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])
CREATE TABLE iceberg_table_azure
ENGINE = IcebergAzure(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])
CREATE TABLE iceberg_table_local
ENGINE = IcebergLocal(path_to_table, [,format] [,compression_method])
```
**Engine parameters**
**Engine arguments**
- `url` — url with the path to an existing Iceberg table.
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. Parameter is optional. If credentials are not specified, they are used from the configuration file.
Description of the arguments coincides with description of arguments in engines `S3`, `AzureBlobStorage` and `File` correspondingly.
`format` stands for the format of data files in the Iceberg table.
Engine parameters can be specified using [Named Collections](../../../operations/named-collections.md)
**Example**
```sql
CREATE TABLE iceberg_table ENGINE=Iceberg('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
CREATE TABLE iceberg_table ENGINE=IcebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
```
Using named collections:
@ -45,9 +51,15 @@ Using named collections:
```
```sql
CREATE TABLE iceberg_table ENGINE=Iceberg(iceberg_conf, filename = 'test_table')
CREATE TABLE iceberg_table ENGINE=IcebergS3(iceberg_conf, filename = 'test_table')
```
**Aliases**
Table engine `Iceberg` is an alias to `IcebergS3` now.
## See also
- [iceberg table function](/docs/en/sql-reference/table-functions/iceberg.md)

View File

@ -54,7 +54,7 @@ Parameters:
- `distance_function`: either `L2Distance` (the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) - the length of a
line between two points in Euclidean space), or `cosineDistance` (the [cosine
distance](https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance)- the angle between two non-zero vectors).
- `quantization`: either `f32`, `f16`, or `i8` for storing the vector with reduced precision (optional, default: `f32`)
- `quantization`: either `f64`, `f32`, `f16`, `bf16`, or `i8` for storing the vector with reduced precision (optional, default: `bf16`)
- `m`: the number of neighbors per graph node (optional, default: 16)
- `ef_construction`: (optional, default: 128)
- `ef_search`: (optional, default: 64)

View File

@ -80,7 +80,7 @@ For partitioning by month, use the `toYYYYMM(date_column)` expression, where `da
`PRIMARY KEY` — The primary key if it [differs from the sorting key](#choosing-a-primary-key-that-differs-from-the-sorting-key). Optional.
Specifying a sorting key (using `ORDER BY` clause) implicitly specifies a primary key.
It is usually not necessary to specify the primary key in addition to the primary key.
It is usually not necessary to specify the primary key in addition to the sorting key.
#### SAMPLE BY

View File

@ -109,6 +109,7 @@ For partitioning by month, use the `toYYYYMM(date_column)` expression, where `da
- `_file` — Resource name of the `URL`. Type: `LowCardinalty(String)`.
- `_size` — Size of the resource in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_headers` - HTTP response headers. Type: `Map(LowCardinality(String), LowCardinality(String))`.
## Storage Settings {#storage-settings}

View File

@ -1389,7 +1389,7 @@ DESC format(JSONEachRow, '{"id" : 1, "age" : 25, "name" : "Josh", "status" : nul
#### schema_inference_make_columns_nullable
Controls making inferred types `Nullable` in schema inference for formats without information about nullability.
If the setting is enabled, all inferred type will be `Nullable`, if disabled, the inferred type will be `Nullable` only if `input_format_null_as_default` is disabled and the column contains `NULL` in a sample that is parsed during schema inference.
If the setting is enabled, all inferred type will be `Nullable`, if disabled, the inferred type will never be `Nullable`, if set to `auto`, the inferred type will be `Nullable` only if the column contains `NULL` in a sample that is parsed during schema inference or file metadata contains information about column nullability.
Enabled by default.
@ -1412,15 +1412,13 @@ DESC format(JSONEachRow, $$
└─────────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
```sql
SET schema_inference_make_columns_nullable = 0;
SET input_format_null_as_default = 0;
SET schema_inference_make_columns_nullable = 'auto';
DESC format(JSONEachRow, $$
{"id" : 1, "age" : 25, "name" : "Josh", "status" : null, "hobbies" : ["football", "cooking"]}
{"id" : 2, "age" : 19, "name" : "Alan", "status" : "married", "hobbies" : ["tennis", "art"]}
$$)
```
```response
┌─name────┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id │ Int64 │ │ │ │ │ │
│ age │ Int64 │ │ │ │ │ │
@ -1432,7 +1430,6 @@ DESC format(JSONEachRow, $$
```sql
SET schema_inference_make_columns_nullable = 0;
SET input_format_null_as_default = 1;
DESC format(JSONEachRow, $$
{"id" : 1, "age" : 25, "name" : "Josh", "status" : null, "hobbies" : ["football", "cooking"]}
{"id" : 2, "age" : 19, "name" : "Alan", "status" : "married", "hobbies" : ["tennis", "art"]}

View File

@ -155,6 +155,8 @@ SELECT 1 SETTINGS use_query_cache = true, query_cache_tag = 'tag 1';
SELECT 1 SETTINGS use_query_cache = true, query_cache_tag = 'tag 2';
```
To remove only entries with tag `tag` from the query cache, you can use statement `SYSTEM DROP QUERY CACHE TAG 'tag'`.
ClickHouse reads table data in blocks of [max_block_size](settings/settings.md#setting-max_block_size) rows. Due to filtering, aggregation,
etc., result blocks are typically much smaller than 'max_block_size' but there are also cases where they are much bigger. Setting
[query_cache_squash_partial_results](settings/settings.md#query-cache-squash-partial-results) (enabled by default) controls if result blocks

View File

@ -171,8 +171,8 @@ If the `schema_inference_hints` is not formated properly, or if there is a typo
## schema_inference_make_columns_nullable {#schema_inference_make_columns_nullable}
Controls making inferred types `Nullable` in schema inference for formats without information about nullability.
If the setting is enabled, the inferred type will be `Nullable` only if column contains `NULL` in a sample that is parsed during schema inference.
Controls making inferred types `Nullable` in schema inference.
If the setting is enabled, all inferred type will be `Nullable`, if disabled, the inferred type will never be `Nullable`, if set to `auto`, the inferred type will be `Nullable` only if the column contains `NULL` in a sample that is parsed during schema inference or file metadata contains information about column nullability.
Default value: `true`.

View File

@ -2855,7 +2855,7 @@ The minimum chunk size in bytes, which each thread will parse in parallel.
## merge_selecting_sleep_ms {#merge_selecting_sleep_ms}
Sleep time for merge selecting when no part is selected. A lower setting triggers selecting tasks in `background_schedule_pool` frequently, which results in a large number of requests to ClickHouse Keeper in large-scale clusters.
Minimum time to wait before trying to select parts to merge again after no parts were selected. A lower setting triggers selecting tasks in `background_schedule_pool` frequently, which results in a large number of requests to ClickHouse Keeper in large-scale clusters.
Possible values:
@ -2863,6 +2863,16 @@ Possible values:
Default value: `5000`.
## max_merge_selecting_sleep_ms
Maximum time to wait before trying to select parts to merge again after no parts were selected. A lower setting triggers selecting tasks in `background_schedule_pool` frequently, which results in a large number of requests to ClickHouse Keeper in large-scale clusters.
Possible values:
- Any positive integer.
Default value: `60000`.
## parallel_distributed_insert_select {#parallel_distributed_insert_select}
Enables parallel distributed `INSERT ... SELECT` query.
@ -5623,7 +5633,6 @@ Default value: `1GiB`.
## use_json_alias_for_old_object_type
When enabled, `JSON` data type alias will be used to create an old [Object('json')](../../sql-reference/data-types/json.md) type instead of the new [JSON](../../sql-reference/data-types/newjson.md) type.
This setting requires server restart to take effect when changed.
Default value: `false`.

View File

@ -0,0 +1,41 @@
---
slug: /en/operations/system-tables/projections
---
# projections
Contains information about existing projections in all the tables.
Columns:
- `database` ([String](../../sql-reference/data-types/string.md)) — Database name.
- `table` ([String](../../sql-reference/data-types/string.md)) — Table name.
- `name` ([String](../../sql-reference/data-types/string.md)) — Projection name.
- `type` ([Enum](../../sql-reference/data-types/enum.md)) — Projection type ('Normal' = 0, 'Aggregate' = 1).
- `sorting_key` ([Array(String)](../../sql-reference/data-types/array.md)) — Projection sorting key.
- `query` ([String](../../sql-reference/data-types/string.md)) — Projection query.
**Example**
```sql
SELECT * FROM system.projections LIMIT 2 FORMAT Vertical;
```
```text
Row 1:
──────
database: default
table: landing
name: improved_sorting_key
type: Normal
sorting_key: ['user_id','date']
query: SELECT * ORDER BY user_id, date
Row 2:
──────
database: default
table: landing
name: agg_no_key
type: Aggregate
sorting_key: []
query: SELECT count()
```

View File

@ -70,7 +70,7 @@ SELECT '{"a" : {"b" : 42},"c" : [1, 2, 3], "d" : "Hello, World!"}'::JSON as json
└────────────────────────────────────────────────┘
```
CAST from named `Tuple`, `Map` and `Object('json')` to `JSON` type will be supported later.
CAST from `JSON`, named `Tuple`, `Map` and `Object('json')` to `JSON` type will be supported later.
## Reading JSON paths as subcolumns

View File

@ -53,29 +53,28 @@ SELECT now() as current_date_time, current_date_time + INTERVAL 4 DAY
└─────────────────────┴───────────────────────────────┘
```
Intervals with different types cant be combined. You cant use intervals like `4 DAY 1 HOUR`. Specify intervals in units that are smaller or equal to the smallest unit of the interval, for example, the interval `1 day and an hour` interval can be expressed as `25 HOUR` or `90000 SECOND`.
You cant perform arithmetical operations with `Interval`-type values, but you can add intervals of different types consequently to values in `Date` or `DateTime` data types. For example:
Also it is possible to use multiple intervals simultaneously:
``` sql
SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR
SELECT now() AS current_date_time, current_date_time + (INTERVAL 4 DAY + INTERVAL 3 HOUR)
```
``` text
┌───current_date_time─┬─plus(plus(now(), toIntervalDay(4)), toIntervalHour(3))─┐
│ 2019-10-23 11:16:28 │ 2019-10-27 14:16:28
└─────────────────────┴────────────────────────────────────────────────────────┘
┌───current_date_time─┬─plus(current_date_time, plus(toIntervalDay(4), toIntervalHour(3)))─┐
│ 2024-08-08 18:31:39 │ 2024-08-12 21:31:39
└─────────────────────┴────────────────────────────────────────────────────────────────────
```
The following query causes an exception:
And to compare values with different intervals:
``` sql
select now() AS current_date_time, current_date_time + (INTERVAL 4 DAY + INTERVAL 3 HOUR)
SELECT toIntervalMicrosecond(3600000000) = toIntervalHour(1);
```
``` text
Received exception from server (version 19.14.1):
Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: Wrong argument types for function plus: if one argument is Interval, then another must be Date or DateTime..
┌─less(toIntervalMicrosecond(179999999), toIntervalMinute(3))─┐
│ 1 │
└─────────────────────────────────────────────────────────────┘
```
## See Also

View File

@ -49,7 +49,7 @@ Result:
## multiIf
Allows to write the [CASE](../../sql-reference/operators/index.md#operator_case) operator more compactly in the query.
Allows to write the [CASE](../../sql-reference/operators/index.md#conditional-expression) operator more compactly in the query.
**Syntax**

View File

@ -4287,7 +4287,7 @@ Result:
## fromModifiedJulianDay
Converts a [Modified Julian Day](https://en.wikipedia.org/wiki/Julian_day#Variants) number to a [Proleptic Gregorian calendar](https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar) date in text form `YYYY-MM-DD`. This function supports day number from `-678941` to `2973119` (which represent 0000-01-01 and 9999-12-31 respectively). It raises an exception if the day number is outside of the supported range.
Converts a [Modified Julian Day](https://en.wikipedia.org/wiki/Julian_day#Variants) number to a [Proleptic Gregorian calendar](https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar) date in text form `YYYY-MM-DD`. This function supports day number from `-678941` to `2973483` (which represent 0000-01-01 and 9999-12-31 respectively). It raises an exception if the day number is outside of the supported range.
**Syntax**

View File

@ -6,7 +6,7 @@ title: "Functions for Working with Geohash"
## Geohash
[Geohash](https://en.wikipedia.org/wiki/Geohash) is the geocode system, which subdivides Earths surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer is the geohash string, the more precise is the geographic location.
[Geohash](https://en.wikipedia.org/wiki/Geohash) is the geocode system, which subdivides Earths surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer the geohash string is, the more precise the geographic location will be.
If you need to manually convert geographic coordinates to geohash strings, you can use [geohash.org](http://geohash.org/).
@ -14,26 +14,37 @@ If you need to manually convert geographic coordinates to geohash strings, you c
Encodes latitude and longitude as a [geohash](#geohash)-string.
**Syntax**
``` sql
geohashEncode(longitude, latitude, [precision])
```
**Input values**
- longitude - longitude part of the coordinate you want to encode. Floating in range`[-180°, 180°]`
- latitude - latitude part of the coordinate you want to encode. Floating in range `[-90°, 90°]`
- precision - Optional, length of the resulting encoded string, defaults to `12`. Integer in range `[1, 12]`. Any value less than `1` or greater than `12` is silently converted to `12`.
- `longitude` — Longitude part of the coordinate you want to encode. Floating in range`[-180°, 180°]`. [Float](../../data-types/float.md).
- `latitude` — Latitude part of the coordinate you want to encode. Floating in range `[-90°, 90°]`. [Float](../../data-types/float.md).
- `precision` (optional) — Length of the resulting encoded string. Defaults to `12`. Integer in the range `[1, 12]`. [Int8](../../data-types/int-uint.md).
:::note
- All coordinate parameters must be of the same type: either `Float32` or `Float64`.
- For the `precision` parameter, any value less than `1` or greater than `12` is silently converted to `12`.
:::
**Returned values**
- alphanumeric `String` of encoded coordinate (modified version of the base32-encoding alphabet is used).
- Alphanumeric string of the encoded coordinate (modified version of the base32-encoding alphabet is used). [String](../../data-types/string.md).
**Example**
Query:
``` sql
SELECT geohashEncode(-5.60302734375, 42.593994140625, 0) AS res;
```
Result:
``` text
┌─res──────────┐
│ ezs42d000000 │
@ -44,13 +55,19 @@ SELECT geohashEncode(-5.60302734375, 42.593994140625, 0) AS res;
Decodes any [geohash](#geohash)-encoded string into longitude and latitude.
**Syntax**
```sql
geohashDecode(hash_str)
```
**Input values**
- encoded string - geohash-encoded string.
- `hash_str` — Geohash-encoded string.
**Returned values**
- (longitude, latitude) - 2-tuple of `Float64` values of longitude and latitude.
- Tuple `(longitude, latitude)` of `Float64` values of longitude and latitude. [Tuple](../../data-types/tuple.md)([Float64](../../data-types/float.md))
**Example**

View File

@ -688,6 +688,40 @@ SELECT kostikConsistentHash(16045690984833335023, 2);
└───────────────────────────────────────────────┘
```
## ripeMD160
Produces [RIPEMD-160](https://en.wikipedia.org/wiki/RIPEMD) hash value.
**Syntax**
```sql
ripeMD160(input)
```
**Parameters**
- `input`: Input string. [String](../data-types/string.md)
**Returned value**
- A [UInt256](../data-types/int-uint.md) hash value where the 160-bit RIPEMD-160 hash is stored in the first 20 bytes. The remaining 12 bytes are zero-padded.
**Example**
Use the [hex](../functions/encoding-functions.md/#hex) function to represent the result as a hex-encoded string.
Query:
```sql
SELECT hex(ripeMD160('The quick brown fox jumps over the lazy dog'));
```
```response
┌─hex(ripeMD160('The quick brown fox jumps over the lazy dog'))─┐
│ 37F332F68DB77BD9D7EDD4969571AD671CF9DD3B │
└───────────────────────────────────────────────────────────────┘
```
## murmurHash2_32, murmurHash2_64
Produces a [MurmurHash2](https://github.com/aappleby/smhasher) hash value.

View File

@ -8,6 +8,78 @@ sidebar_label: Replacing in Strings
[General strings functions](string-functions.md) and [functions for searching in strings](string-search-functions.md) are described separately.
## overlay
Replace part of the string `input` with another string `replace`, starting at the 1-based index `offset`.
**Syntax**
```sql
overlay(s, replace, offset[, length])
```
**Parameters**
- `input`: A string type [String](../data-types/string.md).
- `replace`: A string type [String](../data-types/string.md).
- `offset`: An integer type [Int](../data-types/int-uint.md). If `offset` is negative, it is counted from the end of the `input` string.
- `length`: Optional. An integer type [Int](../data-types/int-uint.md). `length` specifies the length of the snippet within input to be replaced. If `length` is not specified, the number of bytes removed from `input` equals the length of `replace`; otherwise `length` bytes are removed.
**Returned value**
- A [String](../data-types/string.md) data type value.
**Example**
```sql
SELECT overlay('ClickHouse SQL', 'CORE', 12) AS res;
```
Result:
```text
┌─res─────────────┐
│ ClickHouse CORE │
└─────────────────┘
```
## overlayUTF8
Replace part of the string `input` with another string `replace`, starting at the 1-based index `offset`.
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
**Syntax**
```sql
overlayUTF8(s, replace, offset[, length])
```
**Parameters**
- `s`: A string type [String](../data-types/string.md).
- `replace`: A string type [String](../data-types/string.md).
- `offset`: An integer type [Int](../data-types/int-uint.md). If `offset` is negative, it is counted from the end of the `input` string.
- `length`: Optional. An integer type [Int](../data-types/int-uint.md). `length` specifies the length of the snippet within input to be replaced. If `length` is not specified, the number of characters removed from `input` equals the length of `replace`; otherwise `length` characters are removed.
**Returned value**
- A [String](../data-types/string.md) data type value.
**Example**
```sql
SELECT overlayUTF8('ClickHouse是一款OLAP数据库', '开源', 12, 2) AS res;
```
Result:
```text
┌─res────────────────────────┐
│ ClickHouse是开源OLAP数据库 │
└────────────────────────────┘
```
## replaceOne
Replaces the first occurrence of the substring `pattern` in `haystack` by the `replacement` string.

View File

@ -49,6 +49,55 @@ SETTINGS cast_keep_nullable = 1
└──────────────────┴─────────────────────┴──────────────────┘
```
## toBool
Converts an input value to a value of type [`Bool`](../data-types/boolean.md). Throws an exception in case of an error.
**Syntax**
```sql
toBool(expr)
```
**Arguments**
- `expr` — Expression returning a number or a string. [Expression](../syntax.md/#syntax-expressions).
Supported arguments:
- Values of type (U)Int8/16/32/64/128/256.
- Values of type Float32/64.
- Strings `true` or `false` (case-insensitive).
**Returned value**
- Returns `true` or `false` based on evaluation of the argument. [Bool](../data-types/boolean.md).
**Example**
Query:
```sql
SELECT
toBool(toUInt8(1)),
toBool(toInt8(-1)),
toBool(toFloat32(1.01)),
toBool('true'),
toBool('false'),
toBool('FALSE')
FORMAT Vertical
```
Result:
```response
toBool(toUInt8(1)): true
toBool(toInt8(-1)): true
toBool(toFloat32(1.01)): true
toBool('true'): true
toBool('false'): false
toBool('FALSE'): false
```
## toInt8
Converts an input value to a value of type [`Int8`](../data-types/int-uint.md). Throws an exception in case of an error.

View File

@ -8,7 +8,7 @@ title: "CREATE ROW POLICY"
Creates a [row policy](../../../guides/sre/user-management/index.md#row-policy-management), i.e. a filter used to determine which rows a user can read from a table.
:::tip
Row policies makes sense only for users with readonly access. If user can modify table or copy partitions between tables, it defeats the restrictions of row policies.
Row policies make sense only for users with readonly access. If a user can modify a table or copy partitions between tables, it defeats the restrictions of row policies.
:::
Syntax:
@ -24,40 +24,40 @@ CREATE [ROW] POLICY [IF NOT EXISTS | OR REPLACE] policy_name1 [ON CLUSTER cluste
## USING Clause
Allows to specify a condition to filter rows. An user will see a row if the condition is calculated to non-zero for the row.
Allows specifying a condition to filter rows. A user will see a row if the condition is calculated to non-zero for the row.
## TO Clause
In the section `TO` you can provide a list of users and roles this policy should work for. For example, `CREATE ROW POLICY ... TO accountant, john@localhost`.
In the `TO` section you can provide a list of users and roles this policy should work for. For example, `CREATE ROW POLICY ... TO accountant, john@localhost`.
Keyword `ALL` means all the ClickHouse users including current user. Keyword `ALL EXCEPT` allow to exclude some users from the all users list, for example, `CREATE ROW POLICY ... TO ALL EXCEPT accountant, john@localhost`
Keyword `ALL` means all the ClickHouse users, including current user. Keyword `ALL EXCEPT` allows excluding some users from the all users list, for example, `CREATE ROW POLICY ... TO ALL EXCEPT accountant, john@localhost`
:::note
If there are no row policies defined for a table then any user can `SELECT` all the row from the table. Defining one or more row policies for the table makes the access to the table depending on the row policies no matter if those row policies are defined for the current user or not. For example, the following policy
If there are no row policies defined for a table, then any user can `SELECT` all the rows from the table. Defining one or more row policies for the table makes access to the table dependent on the row policies, no matter if those row policies are defined for the current user or not. For example, the following policy:
`CREATE ROW POLICY pol1 ON mydb.table1 USING b=1 TO mira, peter`
forbids the users `mira` and `peter` to see the rows with `b != 1`, and any non-mentioned user (e.g., the user `paul`) will see no rows from `mydb.table1` at all.
forbids the users `mira` and `peter` from seeing the rows with `b != 1`, and any non-mentioned user (e.g., the user `paul`) will see no rows from `mydb.table1` at all.
If that's not desirable it can't be fixed by adding one more row policy, like the following:
If that's not desirable, it can be fixed by adding one more row policy, like the following:
`CREATE ROW POLICY pol2 ON mydb.table1 USING 1 TO ALL EXCEPT mira, peter`
:::
## AS Clause
It's allowed to have more than one policy enabled on the same table for the same user at the one time. So we need a way to combine the conditions from multiple policies.
It's allowed to have more than one policy enabled on the same table for the same user at one time. So we need a way to combine the conditions from multiple policies.
By default policies are combined using the boolean `OR` operator. For example, the following policies
By default, policies are combined using the boolean `OR` operator. For example, the following policies:
``` sql
CREATE ROW POLICY pol1 ON mydb.table1 USING b=1 TO mira, peter
CREATE ROW POLICY pol2 ON mydb.table1 USING c=2 TO peter, antonio
```
enables the user `peter` to see rows with either `b=1` or `c=2`.
enable the user `peter` to see rows with either `b=1` or `c=2`.
The `AS` clause specifies how policies should be combined with other policies. Policies can be either permissive or restrictive. By default policies are permissive, which means they are combined using the boolean `OR` operator.
The `AS` clause specifies how policies should be combined with other policies. Policies can be either permissive or restrictive. By default, policies are permissive, which means they are combined using the boolean `OR` operator.
A policy can be defined as restrictive as an alternative. Restrictive policies are combined using the boolean `AND` operator.
@ -68,25 +68,25 @@ row_is_visible = (one or more of the permissive policies' conditions are non-zer
(all of the restrictive policies's conditions are non-zero)
```
For example, the following policies
For example, the following policies:
``` sql
CREATE ROW POLICY pol1 ON mydb.table1 USING b=1 TO mira, peter
CREATE ROW POLICY pol2 ON mydb.table1 USING c=2 AS RESTRICTIVE TO peter, antonio
```
enables the user `peter` to see rows only if both `b=1` AND `c=2`.
enable the user `peter` to see rows only if both `b=1` AND `c=2`.
Database policies are combined with table policies.
For example, the following policies
For example, the following policies:
``` sql
CREATE ROW POLICY pol1 ON mydb.* USING b=1 TO mira, peter
CREATE ROW POLICY pol2 ON mydb.table1 USING c=2 AS RESTRICTIVE TO peter, antonio
```
enables the user `peter` to see table1 rows only if both `b=1` AND `c=2`, although
enable the user `peter` to see table1 rows only if both `b=1` AND `c=2`, although
any other table in mydb would have only `b=1` policy applied for the user.

View File

@ -10,7 +10,7 @@ title: The Lightweight DELETE Statement
The lightweight `DELETE` statement removes rows from the table `[db.]table` that match the expression `expr`. It is only available for the *MergeTree table engine family.
``` sql
DELETE FROM [db.]table [ON CLUSTER cluster] WHERE expr;
DELETE FROM [db.]table [ON CLUSTER cluster] [IN PARTITION partition_expr] WHERE expr;
```
It is called "lightweight `DELETE`" to contrast it to the [ALTER table DELETE](/en/sql-reference/statements/alter/delete) command, which is a heavyweight process.

View File

@ -136,7 +136,13 @@ The compiled expression cache is enabled/disabled with the query/user/profile-le
## DROP QUERY CACHE
```sql
SYSTEM DROP QUERY CACHE;
SYSTEM DROP QUERY CACHE TAG '<tag>'
````
Clears the [query cache](../../operations/query-cache.md).
If a tag is specified, only query cache entries with the specified tag are deleted.
## DROP FORMAT SCHEMA CACHE {#system-drop-schema-format}

View File

@ -6,35 +6,37 @@ sidebar_label: iceberg
# iceberg Table Function
Provides a read-only table-like interface to Apache [Iceberg](https://iceberg.apache.org/) tables in Amazon S3.
Provides a read-only table-like interface to Apache [Iceberg](https://iceberg.apache.org/) tables in Amazon S3, Azure or locally stored.
## Syntax
``` sql
iceberg(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure])
icebergS3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
icebergS3(named_collection[, option=value [,..]])
icebergAzure(connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])
icebergAzure(named_collection[, option=value [,..]])
icebergLocal(path_to_table, [,format] [,compression_method])
icebergLocal(named_collection[, option=value [,..]])
```
## Arguments
- `url` — Bucket url with the path to an existing Iceberg table in S3.
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. These parameters are optional. If credentials are not specified, they are used from the ClickHouse configuration. For more information see [Using S3 for Data Storage](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-s3).
- `format` — The [format](/docs/en/interfaces/formats.md/#formats) of the file. By default `Parquet` is used.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
Engine parameters can be specified using [Named Collections](/docs/en/operations/named-collections.md).
Description of the arguments coincides with description of arguments in table functions `s3`, `azureBlobStorage` and `file` correspondingly.
`format` stands for the format of data files in the Iceberg table.
**Returned value**
A table with the specified structure for reading data in the specified Iceberg table in S3.
A table with the specified structure for reading data in the specified Iceberg table.
**Example**
```sql
SELECT * FROM iceberg('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
SELECT * FROM icebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
```
:::important
ClickHouse currently supports reading v1 (v2 support is coming soon!) of the Iceberg format via the `iceberg` table function and `Iceberg` table engine.
ClickHouse currently supports reading v1 and v2 of the Iceberg format via the `icebergS3`, `icebergAzure` and `icebergLocal` table functions and `IcebergS3`, `icebergAzure` ans `icebergLocal` table engines.
:::
## Defining a named collection
@ -56,10 +58,14 @@ Here is an example of configuring a named collection for storing the URL and cre
```
```sql
SELECT * FROM iceberg(iceberg_conf, filename = 'test_table')
DESCRIBE iceberg(iceberg_conf, filename = 'test_table')
SELECT * FROM icebergS3(iceberg_conf, filename = 'test_table')
DESCRIBE icebergS3(iceberg_conf, filename = 'test_table')
```
**Aliases**
Table function `iceberg` is an alias to `icebergS3` now.
**See Also**
- [Iceberg engine](/docs/en/engines/table-engines/integrations/iceberg.md)

View File

@ -54,6 +54,7 @@ Character `|` inside patterns is used to specify failover addresses. They are it
- `_file` — Resource name of the `URL`. Type: `LowCardinalty(String)`.
- `_size` — Size of the resource in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_headers` - HTTP response headers. Type: `Map(LowCardinality(String), LowCardinality(String))`.
## Hive-style partitioning {#hive-style-partitioning}

View File

@ -22,18 +22,26 @@ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not su
### Из deb-пакетов {#install-from-deb-packages}
Яндекс рекомендует использовать официальные скомпилированные `deb`-пакеты для Debian или Ubuntu. Для установки пакетов выполните:
Рекомендуется использовать официальные скомпилированные `deb`-пакеты для Debian или Ubuntu. Для установки пакетов выполните:
``` bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
```
#### Установка ClickHouse server и client
```bash
sudo apt-get install -y clickhouse-server clickhouse-client
```
#### Запуск ClickHouse server
```bash
sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
```
@ -55,7 +63,7 @@ clickhouse-client # or "clickhouse-client --password" if you've set up a passwor
:::
### Из rpm-пакетов {#from-rpm-packages}
Команда ClickHouse в Яндексе рекомендует использовать официальные предкомпилированные `rpm`-пакеты для CentOS, RedHat и всех остальных дистрибутивов Linux, основанных на rpm.
Команда ClickHouse рекомендует использовать официальные предкомпилированные `rpm`-пакеты для CentOS, RedHat и всех остальных дистрибутивов Linux, основанных на rpm.
#### Установка официального репозитория
@ -102,7 +110,7 @@ sudo yum install clickhouse-server clickhouse-client
### Из tgz-архивов {#from-tgz-archives}
Команда ClickHouse в Яндексе рекомендует использовать предкомпилированные бинарники из `tgz`-архивов для всех дистрибутивов, где невозможна установка `deb`- и `rpm`- пакетов.
Команда ClickHouse рекомендует использовать предкомпилированные бинарники из `tgz`-архивов для всех дистрибутивов, где невозможна установка `deb`- и `rpm`- пакетов.
Интересующую версию архивов можно скачать вручную с помощью `curl` или `wget` из репозитория https://packages.clickhouse.com/tgz/.
После этого архивы нужно распаковать и воспользоваться скриптами установки. Пример установки самой свежей версии:

View File

@ -54,29 +54,28 @@ SELECT now() as current_date_time, current_date_time + INTERVAL 4 DAY
└─────────────────────┴───────────────────────────────┘
```
Нельзя объединять интервалы различных типов. Нельзя использовать интервалы вида `4 DAY 1 HOUR`. Вместо этого выражайте интервал в единицах меньших или равных минимальной единице интервала, например, интервал «1 день и 1 час» можно выразить как `25 HOUR` или `90000 SECOND`.
Арифметические операции со значениями типов `Interval` не доступны, однако можно последовательно добавлять различные интервалы к значениям типов `Date` и `DateTime`. Например:
Также можно использовать различные типы интервалов одновременно:
``` sql
SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR
SELECT now() AS current_date_time, current_date_time + (INTERVAL 4 DAY + INTERVAL 3 HOUR)
```
``` text
┌───current_date_time─┬─plus(plus(now(), toIntervalDay(4)), toIntervalHour(3))─┐
│ 2019-10-23 11:16:28 │ 2019-10-27 14:16:28
└─────────────────────┴────────────────────────────────────────────────────────┘
┌───current_date_time─┬─plus(current_date_time, plus(toIntervalDay(4), toIntervalHour(3)))─┐
│ 2024-08-08 18:31:39 │ 2024-08-12 21:31:39
└─────────────────────┴────────────────────────────────────────────────────────────────────
```
Следующий запрос приведёт к генерированию исключения:
И сравнивать значения из разными интервалами:
``` sql
select now() AS current_date_time, current_date_time + (INTERVAL 4 DAY + INTERVAL 3 HOUR)
SELECT toIntervalMicrosecond(3600000000) = toIntervalHour(1);
```
``` text
Received exception from server (version 19.14.1):
Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: Wrong argument types for function plus: if one argument is Interval, then another must be Date or DateTime..
┌─less(toIntervalMicrosecond(179999999), toIntervalMinute(3))─┐
│ 1 │
└─────────────────────────────────────────────────────────────┘
```
## Смотрите также {#smotrite-takzhe}

View File

@ -124,6 +124,40 @@ SELECT hex(sipHash128('foo', '\x01', 3));
└──────────────────────────────────┘
```
## ripeMD160
Генерирует [RIPEMD-160](https://en.wikipedia.org/wiki/RIPEMD) хеш строки.
**Синтаксис**
```sql
ripeMD160(input)
```
**Аргументы**
- `input`: Строка [String](../data-types/string.md)
**Возвращаемое значение**
- [UInt256](../data-types/int-uint.md), где 160-битный хеш RIPEMD-160 хранится в первых 20 байтах. Оставшиеся 12 байт заполняются нулями.
**Пример**
Используйте функцию [hex](../functions/encoding-functions.md#hex) для представления результата в виде строки с шестнадцатеричной кодировкой
Запрос:
```sql
SELECT hex(ripeMD160('The quick brown fox jumps over the lazy dog'));
```
Результат:
```response
┌─hex(ripeMD160('The quick brown fox jumps over the lazy dog'))─┐
│ 37F332F68DB77BD9D7EDD4969571AD671CF9DD3B │
└───────────────────────────────────────────────────────────────┘
```
## cityHash64 {#cityhash64}
Генерирует 64-х битное значение [CityHash](https://github.com/google/cityhash).

View File

@ -280,7 +280,7 @@ SYSTEM START REPLICATION QUEUES [ON CLUSTER cluster_name] [[db.]replicated_merge
Ждет когда таблица семейства `ReplicatedMergeTree` будет синхронизирована с другими репликами в кластере, но не более `receive_timeout` секунд:
``` sql
SYSTEM SYNC REPLICA [db.]replicated_merge_tree_family_table_name [STRICT | LIGHTWEIGHT [FROM 'srcReplica1'[, 'srcReplica2'[, ...]]] | PULL]
SYSTEM SYNC REPLICA [ON CLUSTER cluster_name] [db.]replicated_merge_tree_family_table_name [STRICT | LIGHTWEIGHT [FROM 'srcReplica1'[, 'srcReplica2'[, ...]]] | PULL]
```
После выполнения этого запроса таблица `[db.]replicated_merge_tree_family_table_name` загружает команды из общего реплицированного лога в свою собственную очередь репликации. Затем запрос ждет, пока реплика не обработает все загруженные команды. Поддерживаются следующие модификаторы:

View File

@ -1157,7 +1157,7 @@ SELECT toModifiedJulianDayOrNull('2020-01-01');
## fromModifiedJulianDay {#frommodifiedjulianday}
将 [Modified Julian Day](https://en.wikipedia.org/wiki/Julian_day#Variants) 数字转换为 `YYYY-MM-DD` 文本格式的 [Proleptic Gregorian calendar](https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar) 日期。该函数支持从 `-678941``2973119` 的天数(分别代表 0000-01-01 和 9999-12-31。如果天数超出支持范围则会引发异常。
将 [Modified Julian Day](https://en.wikipedia.org/wiki/Julian_day#Variants) 数字转换为 `YYYY-MM-DD` 文本格式的 [Proleptic Gregorian calendar](https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar) 日期。该函数支持从 `-678941``2973483` 的天数(分别代表 0000-01-01 和 9999-12-31。如果天数超出支持范围则会引发异常。
**语法**

View File

@ -978,6 +978,7 @@ try
/** Explicitly destroy Context. It is more convenient than in destructor of Server, because logger is still available.
* At this moment, no one could own shared part of Context.
*/
global_context->resetSharedContext();
global_context.reset();
shared_context.reset();
LOG_DEBUG(log, "Destroyed global context.");

View File

@ -120,7 +120,7 @@ void RoleCache::collectEnabledRoles(EnabledRoles & enabled_roles, SubscriptionsO
SubscriptionsOnRoles new_subscriptions_on_roles;
new_subscriptions_on_roles.reserve(subscriptions_on_roles.size());
auto get_role_function = [this, &subscriptions_on_roles](const UUID & id) TSA_NO_THREAD_SAFETY_ANALYSIS { return getRole(id, subscriptions_on_roles); };
auto get_role_function = [this, &new_subscriptions_on_roles](const UUID & id) TSA_NO_THREAD_SAFETY_ANALYSIS { return getRole(id, new_subscriptions_on_roles); };
for (const auto & current_role : enabled_roles.params.current_roles)
collectRoles(*new_info, skip_ids, get_role_function, current_role, true, false);

View File

@ -209,7 +209,7 @@ std::map<std::pair<TypeIndex, String>, NodeToSubcolumnTransformer> node_transfor
},
};
std::tuple<FunctionNode *, ColumnNode *, TableNode *> getTypedNodesForOptimization(const QueryTreeNodePtr & node)
std::tuple<FunctionNode *, ColumnNode *, TableNode *> getTypedNodesForOptimization(const QueryTreeNodePtr & node, const ContextPtr & context)
{
auto * function_node = node->as<FunctionNode>();
if (!function_node)
@ -232,6 +232,12 @@ std::tuple<FunctionNode *, ColumnNode *, TableNode *> getTypedNodesForOptimizati
const auto & storage_snapshot = table_node->getStorageSnapshot();
auto column = first_argument_column_node->getColumn();
/// If view source is set we cannot optimize because it doesn't support moving functions to subcolumns.
/// The storage is replaced to the view source but it happens only after building a query tree and applying passes.
auto view_source = context->getViewSource();
if (view_source && view_source->getStorageID().getFullNameNotQuoted() == storage->getStorageID().getFullNameNotQuoted())
return {};
if (!storage->supportsOptimizationToSubcolumns() || storage->isVirtualColumn(column.name, storage_snapshot->metadata))
return {};
@ -266,7 +272,7 @@ public:
return;
}
auto [function_node, first_argument_node, table_node] = getTypedNodesForOptimization(node);
auto [function_node, first_argument_node, table_node] = getTypedNodesForOptimization(node, getContext());
if (function_node && first_argument_node && table_node)
{
enterImpl(*function_node, *first_argument_node, *table_node);
@ -416,7 +422,7 @@ public:
if (!getSettings().optimize_functions_to_subcolumns)
return;
auto [function_node, first_argument_column_node, table_node] = getTypedNodesForOptimization(node);
auto [function_node, first_argument_column_node, table_node] = getTypedNodesForOptimization(node, getContext());
if (!function_node || !first_argument_column_node || !table_node)
return;

View File

@ -692,7 +692,7 @@ QueryTreeNodePtr IdentifierResolver::tryResolveIdentifierFromStorage(
result_column_node = it->second;
}
/// Check if it's a dynamic subcolumn
else
else if (table_expression_data.supports_subcolumns)
{
auto [column_name, dynamic_subcolumn_name] = Nested::splitName(identifier_full_name);
auto jt = table_expression_data.column_name_to_column_node.find(column_name);

View File

@ -4379,7 +4379,10 @@ void QueryAnalyzer::initializeTableExpressionData(const QueryTreeNodePtr & table
auto get_column_options = GetColumnsOptions(GetColumnsOptions::All).withExtendedObjects().withVirtuals();
if (storage_snapshot->storage.supportsSubcolumns())
{
get_column_options.withSubcolumns();
table_expression_data.supports_subcolumns = true;
}
auto column_names_and_types = storage_snapshot->getColumns(get_column_options);
table_expression_data.column_names_and_types = NamesAndTypes(column_names_and_types.begin(), column_names_and_types.end());

View File

@ -36,6 +36,7 @@ struct AnalysisTableExpressionData
std::string database_name;
std::string table_name;
bool should_qualify_columns = true;
bool supports_subcolumns = false;
NamesAndTypes column_names_and_types;
ColumnNameToColumnNodeMap column_name_to_column_node;
std::unordered_set<std::string> subcolumn_names; /// Subset columns that are subcolumns of other columns

View File

@ -100,6 +100,7 @@ protected:
auto buf = BuilderRWBufferFromHTTP(getPingURI())
.withConnectionGroup(HTTPConnectionGroupType::STORAGE)
.withTimeouts(getHTTPTimeouts())
.withSettings(getContext()->getReadSettings())
.create(credentials);
return checkString(PING_OK_ANSWER, *buf);
@ -206,6 +207,7 @@ protected:
.withConnectionGroup(HTTPConnectionGroupType::STORAGE)
.withMethod(Poco::Net::HTTPRequest::HTTP_POST)
.withTimeouts(getHTTPTimeouts())
.withSettings(getContext()->getReadSettings())
.create(credentials);
bool res = false;
@ -232,6 +234,7 @@ protected:
.withConnectionGroup(HTTPConnectionGroupType::STORAGE)
.withMethod(Poco::Net::HTTPRequest::HTTP_POST)
.withTimeouts(getHTTPTimeouts())
.withSettings(getContext()->getReadSettings())
.create(credentials);
std::string character;

View File

@ -111,6 +111,7 @@ add_headers_and_sources(dbms Storages/ObjectStorage)
add_headers_and_sources(dbms Storages/ObjectStorage/Azure)
add_headers_and_sources(dbms Storages/ObjectStorage/S3)
add_headers_and_sources(dbms Storages/ObjectStorage/HDFS)
add_headers_and_sources(dbms Storages/ObjectStorage/Local)
add_headers_and_sources(dbms Storages/ObjectStorage/DataLakes)
add_headers_and_sources(dbms Common/NamedCollections)

View File

@ -145,6 +145,9 @@ void Connection::connect(const ConnectionTimeouts & timeouts)
/// work we need to pass host name separately. It will be send into TLS Hello packet to let
/// the server know which host we want to talk with (single IP can process requests for multiple hosts using SNI).
static_cast<Poco::Net::SecureStreamSocket*>(socket.get())->setPeerHostName(host);
/// we want to postpone SSL handshake until first read or write operation
/// so any errors during negotiation would be properly processed
static_cast<Poco::Net::SecureStreamSocket*>(socket.get())->setLazyHandshake(true);
#else
throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "tcp_secure protocol is disabled because poco library was built without NetSSL support.");
#endif

View File

@ -299,13 +299,14 @@ ReplxxLineReader::ReplxxLineReader(
Patterns delimiters_,
const char word_break_characters_[],
replxx::Replxx::highlighter_callback_t highlighter_,
[[ maybe_unused ]] std::istream & input_stream_,
[[ maybe_unused ]] std::ostream & output_stream_,
[[ maybe_unused ]] int in_fd_,
[[ maybe_unused ]] int out_fd_,
[[ maybe_unused ]] int err_fd_
std::istream & input_stream_,
std::ostream & output_stream_,
int in_fd_,
int out_fd_,
int err_fd_
)
: LineReader(history_file_path_, multiline_, std::move(extenders_), std::move(delimiters_), input_stream_, output_stream_, in_fd_)
, rx(input_stream_, output_stream_, in_fd_, out_fd_, err_fd_)
, highlighter(std::move(highlighter_))
, word_break_characters(word_break_characters_)
, editor(getEditor())
@ -516,7 +517,7 @@ void ReplxxLineReader::addToHistory(const String & line)
rx.history_add(line);
// flush changes to the disk
if (!rx.history_save(history_file_path))
if (history_file_fd >= 0 && !rx.history_save(history_file_path))
rx.print("Saving history failed: %s\n", errnoToString().c_str());
if (history_file_fd >= 0 && locked && 0 != flock(history_file_fd, LOCK_UN))

View File

@ -300,7 +300,7 @@ void ColumnDynamic::get(size_t n, Field & res) const
auto value_data = shared_variant.getDataAt(variant_col.offsetAt(n));
ReadBufferFromMemory buf(value_data.data, value_data.size);
auto type = decodeDataType(buf);
getVariantSerialization(type)->deserializeBinary(res, buf, getFormatSettings());
type->getDefaultSerialization()->deserializeBinary(res, buf, getFormatSettings());
}
@ -736,8 +736,7 @@ StringRef ColumnDynamic::serializeValueIntoArena(size_t n, Arena & arena, const
{
const auto & variant_type = assert_cast<const DataTypeVariant &>(*variant_info.variant_type).getVariant(discr);
encodeDataType(variant_type, buf);
getVariantSerialization(variant_type, variant_info.variant_names[discr])
->serializeBinary(variant_col.getVariantByGlobalDiscriminator(discr), variant_col.offsetAt(n), buf, getFormatSettings());
variant_type->getDefaultSerialization()->serializeBinary(variant_col.getVariantByGlobalDiscriminator(discr), variant_col.offsetAt(n), buf, getFormatSettings());
type_and_value = buf.str();
}
@ -870,7 +869,7 @@ int ColumnDynamic::doCompareAt(size_t n, size_t m, const IColumn & rhs, int nan_
/// We have both values serialized in binary format, so we need to
/// create temporary column, insert both values into it and compare.
auto tmp_column = left_data_type->createColumn();
const auto & serialization = getVariantSerialization(left_data_type, left_data_type_name);
const auto & serialization = left_data_type->getDefaultSerialization();
serialization->deserializeBinary(*tmp_column, buf_left, getFormatSettings());
serialization->deserializeBinary(*tmp_column, buf_right, getFormatSettings());
return tmp_column->compareAt(0, 1, *tmp_column, nan_direction_hint);
@ -892,7 +891,7 @@ int ColumnDynamic::doCompareAt(size_t n, size_t m, const IColumn & rhs, int nan_
/// We have left value serialized in binary format, we need to
/// create temporary column, insert the value into it and compare.
auto tmp_column = left_data_type->createColumn();
getVariantSerialization(left_data_type, left_data_type_name)->deserializeBinary(*tmp_column, buf_left, getFormatSettings());
left_data_type->getDefaultSerialization()->deserializeBinary(*tmp_column, buf_left, getFormatSettings());
return tmp_column->compareAt(0, right_variant.offsetAt(m), right_variant.getVariantByGlobalDiscriminator(right_discr), nan_direction_hint);
}
/// Check if only right value is in shared data.
@ -912,7 +911,7 @@ int ColumnDynamic::doCompareAt(size_t n, size_t m, const IColumn & rhs, int nan_
/// We have right value serialized in binary format, we need to
/// create temporary column, insert the value into it and compare.
auto tmp_column = right_data_type->createColumn();
getVariantSerialization(right_data_type, right_data_type_name)->deserializeBinary(*tmp_column, buf_right, getFormatSettings());
right_data_type->getDefaultSerialization()->deserializeBinary(*tmp_column, buf_right, getFormatSettings());
return left_variant.getVariantByGlobalDiscriminator(left_discr).compareAt(left_variant.offsetAt(n), 0, *tmp_column, nan_direction_hint);
}
/// Otherwise both values are regular variants.
@ -1181,13 +1180,14 @@ void ColumnDynamic::takeDynamicStructureFromSourceColumns(const Columns & source
/// Check if the number of all dynamic types exceeds the limit.
if (!canAddNewVariants(0, all_variants.size()))
{
/// Create list of variants with their sizes and sort it.
std::vector<std::pair<size_t, DataTypePtr>> variants_with_sizes;
/// Create a list of variants with their sizes and names and then sort it.
std::vector<std::tuple<size_t, String, DataTypePtr>> variants_with_sizes;
variants_with_sizes.reserve(all_variants.size());
for (const auto & variant : all_variants)
{
if (variant->getName() != getSharedVariantTypeName())
variants_with_sizes.emplace_back(total_sizes[variant->getName()], variant);
auto variant_name = variant->getName();
if (variant_name != getSharedVariantTypeName())
variants_with_sizes.emplace_back(total_sizes[variant_name], variant_name, variant);
}
std::sort(variants_with_sizes.begin(), variants_with_sizes.end(), std::greater());
@ -1196,14 +1196,14 @@ void ColumnDynamic::takeDynamicStructureFromSourceColumns(const Columns & source
result_variants.reserve(max_dynamic_types + 1); /// +1 for shared variant.
/// Add shared variant.
result_variants.push_back(getSharedVariantDataType());
for (const auto & [size, variant] : variants_with_sizes)
for (const auto & [size, variant_name, variant_type] : variants_with_sizes)
{
/// Add variant to the resulting variants list until we reach max_dynamic_types.
if (canAddNewVariant(result_variants.size()))
result_variants.push_back(variant);
result_variants.push_back(variant_type);
/// Add all remaining variants into shared_variants_statistics until we reach its max size.
else if (new_statistics.shared_variants_statistics.size() < Statistics::MAX_SHARED_VARIANT_STATISTICS_SIZE)
new_statistics.shared_variants_statistics[variant->getName()] = size;
new_statistics.shared_variants_statistics[variant_name] = size;
else
break;
}

View File

@ -414,7 +414,7 @@ public:
/// Insert value into shared variant. Also updates Variant discriminators and offsets.
void insertValueIntoSharedVariant(const IColumn & src, const DataTypePtr & type, const String & type_name, size_t n);
const SerializationPtr & getVariantSerialization(const DataTypePtr & variant_type, const String & variant_name) const
const SerializationPtr & getVariantSerialization(const DataTypePtr & variant_type, const String & variant_name)
{
/// Get serialization for provided data type.
/// To avoid calling type->getDefaultSerialization() every time we use simple cache with max size.
@ -428,7 +428,7 @@ public:
return serialization_cache.emplace(variant_name, variant_type->getDefaultSerialization()).first->second;
}
const SerializationPtr & getVariantSerialization(const DataTypePtr & variant_type) const { return getVariantSerialization(variant_type, variant_type->getName()); }
const SerializationPtr & getVariantSerialization(const DataTypePtr & variant_type) { return getVariantSerialization(variant_type, variant_type->getName()); }
private:
void createVariantInfo(const DataTypePtr & variant_type);
@ -473,7 +473,7 @@ private:
/// We can use serializations of different data types to serialize values into shared variant.
/// To avoid creating the same serialization multiple times, use simple cache.
static const size_t SERIALIZATION_CACHE_MAX_SIZE = 256;
mutable std::unordered_map<String, SerializationPtr> serialization_cache;
std::unordered_map<String, SerializationPtr> serialization_cache;
};
void extendVariantColumn(

View File

@ -127,7 +127,7 @@ std::string ColumnObject::getName() const
{
WriteBufferFromOwnString ss;
ss << "Object(";
ss << "max_dynamic_paths=" << max_dynamic_paths;
ss << "max_dynamic_paths=" << global_max_dynamic_paths;
ss << ", max_dynamic_types=" << max_dynamic_types;
std::vector<String> sorted_typed_paths;
sorted_typed_paths.reserve(typed_paths.size());
@ -1045,9 +1045,9 @@ void ColumnObject::forEachSubcolumnRecursively(DB::IColumn::RecursiveMutableColu
bool ColumnObject::structureEquals(const IColumn & rhs) const
{
/// 2 Object columns have equal structure if they have the same typed paths and max_dynamic_paths/max_dynamic_types.
/// 2 Object columns have equal structure if they have the same typed paths and global_max_dynamic_paths/max_dynamic_types.
const auto * rhs_object = typeid_cast<const ColumnObject *>(&rhs);
if (!rhs_object || typed_paths.size() != rhs_object->typed_paths.size() || max_dynamic_paths != rhs_object->max_dynamic_paths || max_dynamic_types != rhs_object->max_dynamic_types)
if (!rhs_object || typed_paths.size() != rhs_object->typed_paths.size() || global_max_dynamic_paths != rhs_object->global_max_dynamic_paths || max_dynamic_types != rhs_object->max_dynamic_types)
return false;
for (const auto & [path, column] : typed_paths)

View File

@ -953,7 +953,7 @@ ColumnPtr ColumnVariant::index(const IColumn & indexes, size_t limit) const
{
/// If we have only NULLs, index will take no effect, just return resized column.
if (hasOnlyNulls())
return cloneResized(limit);
return cloneResized(limit == 0 ? indexes.size(): limit);
/// Optimization when we have only one non empty variant and no NULLs.
/// In this case local_discriminators column is filled with identical values and offsets column
@ -1009,8 +1009,16 @@ ColumnPtr ColumnVariant::indexImpl(const PaddedPODArray<Type> & indexes, size_t
new_variants.reserve(num_variants);
for (size_t i = 0; i != num_variants; ++i)
{
size_t nested_limit = nested_perms[i].size() == variants[i]->size() ? 0 : nested_perms[i].size();
new_variants.emplace_back(variants[i]->permute(nested_perms[i], nested_limit));
/// Check if no values from this variant were selected.
if (nested_perms[i].empty())
{
new_variants.emplace_back(variants[i]->cloneEmpty());
}
else
{
size_t nested_limit = nested_perms[i].size() == variants[i]->size() ? 0 : nested_perms[i].size();
new_variants.emplace_back(variants[i]->permute(nested_perms[i], nested_limit));
}
}
/// We cannot use new_offsets column as an offset column, because it became invalid after variants permutation.

View File

@ -197,6 +197,12 @@ public:
cache_policy->remove(key);
}
void remove(std::function<bool(const Key&, const MappedPtr &)> predicate)
{
std::lock_guard lock(mutex);
cache_policy->remove(predicate);
}
size_t sizeInBytes() const
{
std::lock_guard lock(mutex);

View File

@ -55,6 +55,7 @@ public:
virtual void set(const Key & key, const MappedPtr & mapped) = 0;
virtual void remove(const Key & key) = 0;
virtual void remove(std::function<bool(const Key & key, const MappedPtr & mapped)> predicate) = 0;
virtual void clear() = 0;
virtual std::vector<KeyMapped> dump() const = 0;

View File

@ -79,6 +79,22 @@ public:
cells.erase(it);
}
void remove(std::function<bool(const Key &, const MappedPtr &)> predicate) override
{
for (auto it = cells.begin(); it != cells.end();)
{
if (predicate(it->first, it->second.value))
{
Cell & cell = it->second;
current_size_in_bytes -= cell.size;
queue.erase(cell.queue_iterator);
it = cells.erase(it);
}
else
++it;
}
}
MappedPtr get(const Key & key) override
{
auto it = cells.find(key);

View File

@ -95,6 +95,27 @@ public:
cells.erase(it);
}
void remove(std::function<bool(const Key &, const MappedPtr &)> predicate) override
{
for (auto it = cells.begin(); it != cells.end();)
{
if (predicate(it->first, it->second.value))
{
auto & cell = it->second;
current_size_in_bytes -= cell.size;
if (cell.is_protected)
current_protected_size -= cell.size;
auto & queue = cell.is_protected ? protected_queue : probationary_queue;
queue.erase(cell.queue_iterator);
it = cells.erase(it);
}
else
++it;
}
}
MappedPtr get(const Key & key) override
{
auto it = cells.find(key);

View File

@ -13,6 +13,7 @@
#include <IO/ReadHelpers.h>
#include <Interpreters/Context.h>
#include <Core/Settings.h>
#include <Poco/Environment.h>
#pragma clang diagnostic ignored "-Wreserved-identifier"
@ -371,8 +372,8 @@ try
/// in case of double fault.
LOG_FATAL(log, "########## Short fault info ############");
LOG_FATAL(log, "(version {}{}, build id: {}, git hash: {}) (from thread {}) Received signal {}",
VERSION_STRING, VERSION_OFFICIAL, daemon ? daemon->build_id : "", GIT_HASH,
LOG_FATAL(log, "(version {}{}, build id: {}, git hash: {}, architecture: {}) (from thread {}) Received signal {}",
VERSION_STRING, VERSION_OFFICIAL, daemon ? daemon->build_id : "", GIT_HASH, Poco::Environment::osArchitecture(),
thread_num, sig);
std::string signal_description = "Unknown signal";

View File

@ -273,6 +273,25 @@ void SystemLogBase<LogElement>::startup()
saving_thread = std::make_unique<ThreadFromGlobalPool>([this] { savingThreadFunction(); });
}
template <typename LogElement>
void SystemLogBase<LogElement>::stopFlushThread()
{
{
std::lock_guard lock(thread_mutex);
if (!saving_thread || !saving_thread->joinable())
return;
if (is_shutdown)
return;
is_shutdown = true;
queue->shutdown();
}
saving_thread->join();
}
template <typename LogElement>
void SystemLogBase<LogElement>::add(LogElement element)
{

View File

@ -216,6 +216,8 @@ public:
static consteval bool shouldTurnOffLogger() { return false; }
protected:
void stopFlushThread() final;
std::shared_ptr<SystemLogQueue<LogElement>> queue;
};
}

View File

@ -145,6 +145,23 @@ public:
size_in_bytes -= sz;
}
void remove(std::function<bool(const Key &, const MappedPtr &)> predicate) override
{
for (auto it = cache.begin(); it != cache.end();)
{
if (predicate(it->first, it->second))
{
size_t sz = weight_function(*it->second);
if (it->first.user_id.has_value())
Base::user_quotas->decreaseActual(*it->first.user_id, sz);
it = cache.erase(it);
size_in_bytes -= sz;
}
else
++it;
}
}
MappedPtr get(const Key & key) override
{
auto it = cache.find(key);

View File

@ -1120,7 +1120,7 @@ class IColumn;
M(String, column_names_for_schema_inference, "", "The list of column names to use in schema inference for formats without column names. The format: 'column1,column2,column3,...'", 0) \
M(String, schema_inference_hints, "", "The list of column names and types to use in schema inference for formats without column names. The format: 'column_name1 column_type1, column_name2 column_type2, ...'", 0) \
M(SchemaInferenceMode, schema_inference_mode, "default", "Mode of schema inference. 'default' - assume that all files have the same schema and schema can be inferred from any file, 'union' - files can have different schemas and the resulting schema should be the a union of schemas of all files", 0) \
M(Bool, schema_inference_make_columns_nullable, true, "If set to true, all inferred types will be Nullable in schema inference for formats without information about nullability.", 0) \
M(UInt64Auto, schema_inference_make_columns_nullable, 1, "If set to true, all inferred types will be Nullable in schema inference. When set to false, no columns will be converted to Nullable. When set to 'auto', ClickHouse will use information about nullability from the data.", 0) \
M(Bool, input_format_json_read_bools_as_numbers, true, "Allow to parse bools as numbers in JSON input formats", 0) \
M(Bool, input_format_json_read_bools_as_strings, true, "Allow to parse bools as strings in JSON input formats", 0) \
M(Bool, input_format_json_try_infer_numbers_from_strings, false, "Try to infer numbers from string fields while schema inference", 0) \

View File

@ -72,11 +72,13 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{"24.9",
{
{"input_format_try_infer_variants", false, false, "Try to infer Variant type in text formats when there is more than one possible type for column/array elements"},
{"join_output_by_rowlist_perkey_rows_threshold", 0, 5, "The lower limit of per-key average rows in the right table to determine whether to output by row list in hash join."},
{"create_if_not_exists", false, false, "New setting."},
{"allow_materialized_view_with_bad_select", true, true, "Support (but not enable yet) stricter validation in CREATE MATERIALIZED VIEW"},
}
},
{"24.8",
{
{"create_if_not_exists", false, false, "New setting."},
{"rows_before_aggregation", true, true, "Provide exact value for rows_before_aggregation statistic, represents the number of rows read before aggregation"},
{"restore_replace_external_table_functions_to_null", false, false, "New setting."},
{"restore_replace_external_engines_to_null", false, false, "New setting."},
@ -85,7 +87,6 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{"use_hive_partitioning", false, false, "Allows to use hive partitioning for File, URL, S3, AzureBlobStorage and HDFS engines."},
{"allow_experimental_kafka_offsets_storage_in_keeper", false, false, "Allow the usage of experimental Kafka storage engine that stores the committed offsets in ClickHouse Keeper"},
{"allow_archive_path_syntax", true, true, "Added new setting to allow disabling archive path syntax."},
{"allow_materialized_view_with_bad_select", true, true, "Support (but not enable yet) stricter validation in CREATE MATERIALIZED VIEW"},
{"query_cache_tag", "", "", "New setting for labeling query cache settings."},
{"allow_experimental_time_series_table", false, false, "Added new setting to allow the TimeSeries table engine"},
{"enable_analyzer", 1, 1, "Added an alias to a setting `allow_experimental_analyzer`."},
@ -93,7 +94,6 @@ static std::initializer_list<std::pair<ClickHouseVersion, SettingsChangesHistory
{"allow_experimental_json_type", false, false, "Add new experimental JSON type"},
{"use_json_alias_for_old_object_type", true, false, "Use JSON type alias to create new JSON type"},
{"type_json_skip_duplicated_paths", false, false, "Allow to skip duplicated paths during JSON parsing"},
{"join_output_by_rowlist_perkey_rows_threshold", 0, 5, "The lower limit of per-key average rows in the right table to determine whether to output by row list in hash join."},
{"allow_experimental_vector_similarity_index", false, false, "Added new setting to allow experimental vector similarity indexes"},
{"input_format_try_infer_datetimes_only_datetime64", true, false, "Allow to infer DateTime instead of DateTime64 in data formats"}
}

View File

@ -3,6 +3,7 @@
#include <utility>
#include <Core/Types.h>
#include <DataTypes/DataTypeInterval.h>
namespace DB
@ -212,6 +213,8 @@ static bool callOnIndexAndDataType(TypeIndex number, F && f, ExtraArgs && ... ar
case TypeIndex::IPv4: return f(TypePair<DataTypeIPv4, T>(), std::forward<ExtraArgs>(args)...);
case TypeIndex::IPv6: return f(TypePair<DataTypeIPv6, T>(), std::forward<ExtraArgs>(args)...);
case TypeIndex::Interval: return f(TypePair<DataTypeInterval, T>(), std::forward<ExtraArgs>(args)...);
default:
break;
}

View File

@ -22,7 +22,6 @@
#include <cstring>
#include <unistd.h>
#include <algorithm>
#include <typeinfo>
#include <iostream>
#include <memory>

View File

@ -185,7 +185,7 @@ std::unique_ptr<IDataType::SubstreamData> DataTypeDynamic::getDynamicSubcolumnDa
auto type = decodeDataType(buf);
if (type->getName() == subcolumn_type_name)
{
dynamic_column.getVariantSerialization(subcolumn_type, subcolumn_type_name)->deserializeBinary(*subcolumn, buf, format_settings);
subcolumn_type->getDefaultSerialization()->deserializeBinary(*subcolumn, buf, format_settings);
null_map.push_back(0);
}
else

View File

@ -1,10 +1,12 @@
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/DataTypeObject.h>
#include <DataTypes/DataTypeObjectDeprecated.h>
#include <DataTypes/Serializations/SerializationJSON.h>
#include <DataTypes/Serializations/SerializationObjectTypedPath.h>
#include <DataTypes/Serializations/SerializationObjectDynamicPath.h>
#include <DataTypes/Serializations/SerializationSubObject.h>
#include <Columns/ColumnObject.h>
#include <Common/CurrentThread.h>
#include <Parsers/IAST.h>
#include <Parsers/ASTLiteral.h>
@ -513,13 +515,24 @@ static DataTypePtr createObject(const ASTPtr & arguments, const DataTypeObject::
static DataTypePtr createJSON(const ASTPtr & arguments)
{
auto context = CurrentThread::getQueryContext();
if (!context)
context = Context::getGlobalContextInstance();
if (context->getSettingsRef().use_json_alias_for_old_object_type)
{
if (arguments && !arguments->children.empty())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Experimental Object type doesn't support any arguments. If you want to use new JSON type, set setting allow_experimental_json_type = 1");
return std::make_shared<DataTypeObjectDeprecated>("JSON", false);
}
return createObject(arguments, DataTypeObject::SchemaFormat::JSON);
}
void registerDataTypeJSON(DataTypeFactory & factory)
{
if (!Context::getGlobalContextInstance()->getSettingsRef().use_json_alias_for_old_object_type)
factory.registerDataType("JSON", createJSON, DataTypeFactory::Case::Insensitive);
factory.registerDataType("JSON", createJSON, DataTypeFactory::Case::Insensitive);
}
}

View File

@ -78,10 +78,6 @@ static DataTypePtr create(const ASTPtr & arguments)
void registerDataTypeObjectDeprecated(DataTypeFactory & factory)
{
factory.registerDataType("Object", create);
if (Context::getGlobalContextInstance()->getSettingsRef().use_json_alias_for_old_object_type)
factory.registerSimpleDataType("JSON",
[] { return std::make_shared<DataTypeObjectDeprecated>("JSON", false); },
DataTypeFactory::Case::Insensitive);
}
}

View File

@ -149,6 +149,8 @@ std::unique_ptr<IDataType::SubstreamData> IDataType::getSubcolumnData(
ISerialization::EnumerateStreamsSettings settings;
settings.position_independent_encoding = false;
/// Don't enumerate dynamic subcolumns, they are handled separately.
settings.enumerate_dynamic_streams = false;
data.serialization->enumerateStreams(settings, callback_with_data, data);
if (!res && data.type->hasDynamicSubcolumnsData())

View File

@ -241,6 +241,10 @@ public:
{
SubstreamPath path;
bool position_independent_encoding = true;
/// If set to false, don't enumerate dynamic subcolumns
/// (such as dynamic types in Dynamic column or dynamic paths in JSON column).
/// It may be needed when dynamic subcolumns are processed separately.
bool enumerate_dynamic_streams = true;
};
virtual void enumerateStreams(

View File

@ -64,7 +64,7 @@ void SerializationDynamic::enumerateStreams(
const auto * deserialize_state = data.deserialize_state ? checkAndGetState<DeserializeBinaryBulkStateDynamic>(data.deserialize_state) : nullptr;
/// If column is nullptr and we don't have deserialize state yet, nothing to enumerate as we don't have any variants.
if (!column_dynamic && !deserialize_state)
if (!settings.enumerate_dynamic_streams || (!column_dynamic && !deserialize_state))
return;
const auto & variant_type = column_dynamic ? column_dynamic->getVariantInfo().variant_type : checkAndGetState<DeserializeBinaryBulkStateDynamicStructure>(deserialize_state->structure_state)->variant_type;
@ -489,9 +489,8 @@ void SerializationDynamic::serializeBinary(const IColumn & column, size_t row_nu
}
const auto & variant_type = assert_cast<const DataTypeVariant &>(*variant_info.variant_type).getVariant(global_discr);
const auto & variant_type_name = variant_info.variant_names[global_discr];
encodeDataType(variant_type, ostr);
dynamic_column.getVariantSerialization(variant_type, variant_type_name)->serializeBinary(variant_column.getVariantByGlobalDiscriminator(global_discr), variant_column.offsetAt(row_num), ostr, settings);
variant_type->getDefaultSerialization()->serializeBinary(variant_column.getVariantByGlobalDiscriminator(global_discr), variant_column.offsetAt(row_num), ostr, settings);
}
template <typename ReturnType = void, typename DeserializeFunc>
@ -629,7 +628,7 @@ static void serializeTextImpl(
ReadBufferFromMemory buf(value.data, value.size);
auto variant_type = decodeDataType(buf);
auto tmp_variant_column = variant_type->createColumn();
auto variant_serialization = dynamic_column.getVariantSerialization(variant_type);
auto variant_serialization = variant_type->getDefaultSerialization();
variant_serialization->deserializeBinary(*tmp_variant_column, buf, settings);
nested_serialize(*variant_serialization, *tmp_variant_column, 0, ostr);
}

View File

@ -130,7 +130,7 @@ void SerializationObject::enumerateStreams(EnumerateStreamsSettings & settings,
}
/// If column or deserialization state was provided, iterate over dynamic paths,
if (column_object || structure_state)
if (settings.enumerate_dynamic_streams && (column_object || structure_state))
{
/// Enumerate dynamic paths in sorted order for consistency.
const auto * dynamic_paths = column_object ? &column_object->getDynamicPaths() : nullptr;

View File

@ -228,6 +228,39 @@ void convertUInt64toInt64IfPossible(const DataTypes & types, TypeIndexSet & type
}
}
DataTypePtr findSmallestIntervalSuperType(const DataTypes &types, TypeIndexSet &types_set)
{
auto min_interval = IntervalKind::Kind::Year;
DataTypePtr smallest_type;
bool is_higher_interval = false; // For Years, Quarters and Months
for (const auto &type : types)
{
if (const auto * interval_type = typeid_cast<const DataTypeInterval *>(type.get()))
{
auto current_interval = interval_type->getKind().kind;
if (current_interval > IntervalKind::Kind::Week)
is_higher_interval = true;
if (current_interval < min_interval)
{
min_interval = current_interval;
smallest_type = type;
}
}
}
if (is_higher_interval && min_interval <= IntervalKind::Kind::Week)
throw Exception(ErrorCodes::NO_COMMON_TYPE, "Cannot compare intervals {} and {} because the number of days in a month is not fixed", types[0]->getName(), types[1]->getName());
if (smallest_type)
{
types_set.clear();
types_set.insert(smallest_type->getTypeId());
}
return smallest_type;
}
}
template <LeastSupertypeOnError on_error>
@ -652,6 +685,13 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
return numeric_type;
}
/// For interval data types.
{
auto res = findSmallestIntervalSuperType(types, type_ids);
if (res)
return res;
}
/// All other data types (UUID, AggregateFunction, Enum...) are compatible only if they are the same (checked in trivial cases).
return throwOrReturn<on_error>(types, "", ErrorCodes::NO_COMMON_TYPE);
}

View File

@ -1,5 +1,7 @@
#pragma once
#include <DataTypes/IDataType.h>
#include <DataTypes/DataTypeInterval.h>
#include <Common/IntervalKind.h>
namespace DB
{
@ -48,4 +50,7 @@ DataTypePtr getLeastSupertypeOrString(const TypeIndexSet & types);
DataTypePtr tryGetLeastSupertype(const TypeIndexSet & types);
/// A vector that shows the conversion rates to the next Interval type starting from NanoSecond
static std::vector<int> interval_conversions = {1, 1000, 1000, 1000, 60, 60, 24, 7, 4, 3, 4};
}

View File

@ -35,9 +35,10 @@ class RegionsNames
M(et, ru, 11) \
M(pt, en, 12) \
M(he, en, 13) \
M(vi, en, 14)
M(vi, en, 14) \
M(es, en, 15)
static constexpr size_t total_languages = 15;
static constexpr size_t total_languages = 16;
public:
enum class Language : size_t

View File

@ -43,39 +43,21 @@ bool LocalObjectStorage::exists(const StoredObject & object) const
std::unique_ptr<ReadBufferFromFileBase> LocalObjectStorage::readObjects( /// NOLINT
const StoredObjects & objects,
const ReadSettings & read_settings,
std::optional<size_t> read_hint,
std::optional<size_t> file_size) const
std::optional<size_t>,
std::optional<size_t>) const
{
auto modified_settings = patchSettings(read_settings);
auto global_context = Context::getGlobalContextInstance();
auto read_buffer_creator =
[=] (bool /* restricted_seek */, const StoredObject & object)
-> std::unique_ptr<ReadBufferFromFileBase>
{
return createReadBufferFromFileBase(object.remote_path, modified_settings, read_hint, file_size);
};
auto read_buffer_creator = [=](bool /* restricted_seek */, const StoredObject & object) -> std::unique_ptr<ReadBufferFromFileBase>
{ return std::make_unique<ReadBufferFromFile>(object.remote_path); };
switch (read_settings.remote_fs_method)
{
case RemoteFSReadMethod::read:
{
return std::make_unique<ReadBufferFromRemoteFSGather>(
std::move(read_buffer_creator), objects, "file:", modified_settings,
global_context->getFilesystemCacheLog(), /* use_external_buffer */false);
}
case RemoteFSReadMethod::threadpool:
{
auto impl = std::make_unique<ReadBufferFromRemoteFSGather>(
std::move(read_buffer_creator), objects, "file:", modified_settings,
global_context->getFilesystemCacheLog(), /* use_external_buffer */true);
auto & reader = global_context->getThreadPoolReader(FilesystemReaderType::ASYNCHRONOUS_REMOTE_FS_READER);
return std::make_unique<AsynchronousBoundedReadBuffer>(
std::move(impl), reader, read_settings,
global_context->getAsyncReadCounters(),
global_context->getFilesystemReadPrefetchesLog());
}
}
return std::make_unique<ReadBufferFromRemoteFSGather>(
std::move(read_buffer_creator),
objects,
"file:",
modified_settings,
global_context->getFilesystemCacheLog(),
/* use_external_buffer */ false);
}
ReadSettings LocalObjectStorage::patchSettings(const ReadSettings & read_settings) const

View File

@ -257,7 +257,7 @@ FormatSettings getFormatSettings(const ContextPtr & context, const Settings & se
format_settings.max_bytes_to_read_for_schema_inference = settings.input_format_max_bytes_to_read_for_schema_inference;
format_settings.column_names_for_schema_inference = settings.column_names_for_schema_inference;
format_settings.schema_inference_hints = settings.schema_inference_hints;
format_settings.schema_inference_make_columns_nullable = settings.schema_inference_make_columns_nullable;
format_settings.schema_inference_make_columns_nullable = settings.schema_inference_make_columns_nullable.valueOr(2);
format_settings.mysql_dump.table_name = settings.input_format_mysql_dump_table_name;
format_settings.mysql_dump.map_column_names = settings.input_format_mysql_dump_map_column_names;
format_settings.sql_insert.max_batch_size = settings.output_format_sql_insert_max_batch_size;

View File

@ -77,7 +77,7 @@ struct FormatSettings
Raw
};
bool schema_inference_make_columns_nullable = true;
UInt64 schema_inference_make_columns_nullable = 1;
DateTimeOutputFormat date_time_output_format = DateTimeOutputFormat::Simple;

View File

@ -1179,6 +1179,12 @@ public:
const FormatSettings & format_settings,
String & error) const override
{
if (element.isNull() && format_settings.null_as_default)
{
column.insertDefault();
return true;
}
auto & tuple = assert_cast<ColumnTuple &>(column);
size_t old_size = column.size();
bool were_valid_elements = false;
@ -1298,6 +1304,12 @@ public:
const FormatSettings & format_settings,
String & error) const override
{
if (element.isNull() && format_settings.null_as_default)
{
column.insertDefault();
return true;
}
if (!element.isObject())
{
error = fmt::format("cannot read Map value from JSON element: {}", jsonElementToString<JSONParser>(element, format_settings));
@ -1362,6 +1374,14 @@ public:
String & error) const override
{
auto & column_variant = assert_cast<ColumnVariant &>(column);
/// Check if element is NULL.
if (element.isNull())
{
column_variant.insertDefault();
return true;
}
for (size_t i : order)
{
auto & variant = column_variant.getVariantByGlobalDiscriminator(i);

View File

@ -1344,7 +1344,11 @@ namespace
if (checkCharCaseInsensitive('n', buf))
{
if (checkStringCaseInsensitive("ull", buf))
return std::make_shared<DataTypeNullable>(std::make_shared<DataTypeNothing>());
{
if (settings.schema_inference_make_columns_nullable == 0)
return std::make_shared<DataTypeNothing>();
return makeNullable(std::make_shared<DataTypeNothing>());
}
else if (checkStringCaseInsensitive("an", buf))
return std::make_shared<DataTypeFloat64>();
}

View File

@ -123,7 +123,7 @@ public:
class Executor
{
public:
static ColumnPtr run(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, uint32_t parse_depth, uint32_t parse_backtracks, const ContextPtr & context)
static ColumnPtr run(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, uint32_t parse_depth, uint32_t parse_backtracks, bool function_json_value_return_type_allow_complex)
{
MutableColumnPtr to{result_type->createColumn()};
to->reserve(input_rows_count);
@ -191,7 +191,7 @@ public:
{
/// Instead of creating a new generator for each row, we can reuse the same one.
generator_json_path.reinitialize();
added_to_column = impl.insertResultToColumn(*to, document, generator_json_path, context);
added_to_column = impl.insertResultToColumn(*to, document, generator_json_path, function_json_value_return_type_allow_complex);
}
if (!added_to_column)
{
@ -204,11 +204,18 @@ public:
};
template <typename Name, template <typename, typename> typename Impl>
class FunctionSQLJSON : public IFunction, WithConstContext
class FunctionSQLJSON : public IFunction
{
public:
static FunctionPtr create(ContextPtr context_) { return std::make_shared<FunctionSQLJSON>(context_); }
explicit FunctionSQLJSON(ContextPtr context_) : WithConstContext(context_) { }
explicit FunctionSQLJSON(ContextPtr context_)
: max_parser_depth(context_->getSettingsRef().max_parser_depth),
max_parser_backtracks(context_->getSettingsRef().max_parser_backtracks),
allow_simdjson(context_->getSettingsRef().allow_simdjson),
function_json_value_return_type_allow_complex(context_->getSettingsRef().function_json_value_return_type_allow_complex),
function_json_value_return_type_allow_nullable(context_->getSettingsRef().function_json_value_return_type_allow_nullable)
{
}
static constexpr auto name = Name::name;
String getName() const override { return Name::name; }
@ -221,7 +228,7 @@ public:
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
return Impl<DummyJSONParser, DefaultJSONStringSerializer<DummyJSONParser::Element>>::getReturnType(
Name::name, arguments, getContext());
Name::name, arguments, function_json_value_return_type_allow_nullable);
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
@ -231,19 +238,25 @@ public:
/// 2. Create ASTPtr
/// 3. Parser(Tokens, ASTPtr) -> complete AST
/// 4. Execute functions: call getNextItem on generator and handle each item
unsigned parse_depth = static_cast<unsigned>(getContext()->getSettingsRef().max_parser_depth);
unsigned parse_backtracks = static_cast<unsigned>(getContext()->getSettingsRef().max_parser_backtracks);
unsigned parse_depth = static_cast<unsigned>(max_parser_depth);
unsigned parse_backtracks = static_cast<unsigned>(max_parser_backtracks);
#if USE_SIMDJSON
if (getContext()->getSettingsRef().allow_simdjson)
if (allow_simdjson)
return FunctionSQLJSONHelpers::Executor<
Name,
Impl<SimdJSONParser, JSONStringSerializer<SimdJSONParser::Element, SimdJSONElementFormatter>>,
SimdJSONParser>::run(arguments, result_type, input_rows_count, parse_depth, parse_backtracks, getContext());
SimdJSONParser>::run(arguments, result_type, input_rows_count, parse_depth, parse_backtracks, function_json_value_return_type_allow_complex);
#endif
return FunctionSQLJSONHelpers::
Executor<Name, Impl<DummyJSONParser, DefaultJSONStringSerializer<DummyJSONParser::Element>>, DummyJSONParser>::run(
arguments, result_type, input_rows_count, parse_depth, parse_backtracks, getContext());
arguments, result_type, input_rows_count, parse_depth, parse_backtracks, function_json_value_return_type_allow_complex);
}
private:
const size_t max_parser_depth;
const size_t max_parser_backtracks;
const bool allow_simdjson;
const bool function_json_value_return_type_allow_complex;
const bool function_json_value_return_type_allow_nullable;
};
struct NameJSONExists
@ -267,11 +280,11 @@ class JSONExistsImpl
public:
using Element = typename JSONParser::Element;
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, const ContextPtr &) { return std::make_shared<DataTypeUInt8>(); }
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, bool) { return std::make_shared<DataTypeUInt8>(); }
static size_t getNumberOfIndexArguments(const ColumnsWithTypeAndName & arguments) { return arguments.size() - 1; }
static bool insertResultToColumn(IColumn & dest, const Element & root, GeneratorJSONPath<JSONParser> & generator_json_path, const ContextPtr &)
static bool insertResultToColumn(IColumn & dest, const Element & root, GeneratorJSONPath<JSONParser> & generator_json_path, bool)
{
Element current_element = root;
VisitorStatus status;
@ -305,9 +318,9 @@ class JSONValueImpl
public:
using Element = typename JSONParser::Element;
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, const ContextPtr & context)
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, bool function_json_value_return_type_allow_nullable)
{
if (context->getSettingsRef().function_json_value_return_type_allow_nullable)
if (function_json_value_return_type_allow_nullable)
{
DataTypePtr string_type = std::make_shared<DataTypeString>();
return std::make_shared<DataTypeNullable>(string_type);
@ -320,7 +333,7 @@ public:
static size_t getNumberOfIndexArguments(const ColumnsWithTypeAndName & arguments) { return arguments.size() - 1; }
static bool insertResultToColumn(IColumn & dest, const Element & root, GeneratorJSONPath<JSONParser> & generator_json_path, const ContextPtr & context)
static bool insertResultToColumn(IColumn & dest, const Element & root, GeneratorJSONPath<JSONParser> & generator_json_path, bool function_json_value_return_type_allow_complex)
{
Element current_element = root;
VisitorStatus status;
@ -329,7 +342,7 @@ public:
{
if (status == VisitorStatus::Ok)
{
if (context->getSettingsRef().function_json_value_return_type_allow_complex)
if (function_json_value_return_type_allow_complex)
{
break;
}
@ -383,11 +396,11 @@ class JSONQueryImpl
public:
using Element = typename JSONParser::Element;
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, const ContextPtr &) { return std::make_shared<DataTypeString>(); }
static DataTypePtr getReturnType(const char *, const ColumnsWithTypeAndName &, bool) { return std::make_shared<DataTypeString>(); }
static size_t getNumberOfIndexArguments(const ColumnsWithTypeAndName & arguments) { return arguments.size() - 1; }
static bool insertResultToColumn(IColumn & dest, const Element & root, GeneratorJSONPath<JSONParser> & generator_json_path, const ContextPtr &)
static bool insertResultToColumn(IColumn & dest, const Element & root, GeneratorJSONPath<JSONParser> & generator_json_path, bool)
{
ColumnString & col_str = assert_cast<ColumnString &>(dest);

View File

@ -48,6 +48,7 @@
#include <DataTypes/DataTypesBinaryEncoding.h>
#include <DataTypes/ObjectUtils.h>
#include <DataTypes/Serializations/SerializationDecimal.h>
#include <DataTypes/getLeastSupertype.h>
#include <Formats/FormatSettings.h>
#include <Formats/FormatFactory.h>
#include <Functions/CastOverloadResolver.h>
@ -1576,6 +1577,35 @@ struct ConvertImpl
arguments, result_type, input_rows_count, additions);
}
}
else if constexpr (std::is_same_v<FromDataType, DataTypeInterval> && std::is_same_v<ToDataType, DataTypeInterval>)
{
IntervalKind to = typeid_cast<const DataTypeInterval *>(result_type.get())->getKind();
IntervalKind from = typeid_cast<const DataTypeInterval *>(arguments[0].type.get())->getKind();
if (from == to || arguments[0].column->empty())
return arguments[0].column;
Int64 conversion_factor = 1;
Int64 result_value;
int from_position = static_cast<int>(from.kind);
int to_position = static_cast<int>(to.kind); /// Positions of each interval according to granularity map
if (from_position < to_position)
{
for (int i = from_position; i < to_position; ++i)
conversion_factor *= interval_conversions[i];
result_value = arguments[0].column->getInt(0) / conversion_factor;
}
else
{
for (int i = from_position; i > to_position; --i)
conversion_factor *= interval_conversions[i];
result_value = arguments[0].column->getInt(0) * conversion_factor;
}
return ColumnConst::create(ColumnInt64::create(1, result_value), input_rows_count);
}
else
{
using FromFieldType = typename FromDataType::FieldType;
@ -2184,7 +2214,7 @@ private:
const DataTypePtr from_type = removeNullable(arguments[0].type);
ColumnPtr result_column;
[[maybe_unused]] FormatSettings::DateTimeOverflowBehavior date_time_overflow_behavior = default_date_time_overflow_behavior;
FormatSettings::DateTimeOverflowBehavior date_time_overflow_behavior = default_date_time_overflow_behavior;
if (context)
date_time_overflow_behavior = context->getSettingsRef().date_time_overflow_behavior.value;
@ -2280,7 +2310,7 @@ private:
}
}
else
result_column = ConvertImpl<LeftDataType, RightDataType, Name>::execute(arguments, result_type, input_rows_count, from_string_tag);
result_column = ConvertImpl<LeftDataType, RightDataType, Name>::execute(arguments, result_type, input_rows_count, from_string_tag);
return true;
};
@ -2337,6 +2367,10 @@ private:
else
done = callOnIndexAndDataType<ToDataType>(from_type->getTypeId(), call, BehaviourOnErrorFromString::ConvertDefaultBehaviorTag);
}
if constexpr (std::is_same_v<ToDataType, DataTypeInterval>)
if (WhichDataType(from_type).isInterval())
done = callOnIndexAndDataType<ToDataType>(from_type->getTypeId(), call, BehaviourOnErrorFromString::ConvertDefaultBehaviorTag);
}
if (!done)

View File

@ -19,7 +19,9 @@
#include <Common/HashTable/Hash.h>
#if USE_SSL
# include <openssl/evp.h>
# include <openssl/md5.h>
# include <openssl/ripemd.h>
#endif
#include <bit>
@ -93,9 +95,9 @@ namespace impl
if (is_const)
i = 0;
assert(key0->size() == key1->size());
if (offsets != nullptr)
if (offsets != nullptr && i > 0)
{
const auto * const begin = offsets->begin();
const auto * const begin = std::upper_bound(offsets->begin(), offsets->end(), i - 1);
const auto * upper = std::upper_bound(begin, offsets->end(), i);
if (upper != offsets->end())
i = upper - begin;
@ -196,6 +198,34 @@ T combineHashesFunc(T t1, T t2)
return HashFunction::apply(reinterpret_cast<const char *>(hashes), sizeof(hashes));
}
#if USE_SSL
struct RipeMD160Impl
{
static constexpr auto name = "ripeMD160";
using ReturnType = UInt256;
static UInt256 apply(const char * begin, size_t size)
{
UInt8 digest[RIPEMD160_DIGEST_LENGTH];
RIPEMD160(reinterpret_cast<const unsigned char *>(begin), size, reinterpret_cast<unsigned char *>(digest));
std::reverse(digest, digest + RIPEMD160_DIGEST_LENGTH);
UInt256 res = 0;
std::memcpy(&res, digest, RIPEMD160_DIGEST_LENGTH);
return res;
}
static UInt256 combineHashes(UInt256 h1, UInt256 h2)
{
return combineHashesFunc<UInt256, RipeMD160Impl>(h1, h2);
}
static constexpr bool use_int_hash_for_pods = false;
};
#endif
struct SipHash64Impl
{
@ -1624,6 +1654,7 @@ using FunctionIntHash32 = FunctionIntHash<IntHash32Impl, NameIntHash32>;
using FunctionIntHash64 = FunctionIntHash<IntHash64Impl, NameIntHash64>;
#if USE_SSL
using FunctionHalfMD5 = FunctionAnyHash<HalfMD5Impl>;
using FunctionRipeMD160Hash = FunctionAnyHash<RipeMD160Impl>;
#endif
using FunctionSipHash128 = FunctionAnyHash<SipHash128Impl>;
using FunctionSipHash128Keyed = FunctionAnyHash<SipHash128KeyedImpl, true, SipHash128KeyedImpl::Key, SipHash128KeyedImpl::KeyColumns>;
@ -1652,6 +1683,7 @@ using FunctionXxHash64 = FunctionAnyHash<ImplXxHash64>;
using FunctionXXH3 = FunctionAnyHash<ImplXXH3>;
using FunctionWyHash64 = FunctionAnyHash<ImplWyHash64>;
}
#pragma clang diagnostic pop

View File

@ -0,0 +1,23 @@
#include "FunctionsHashing.h"
#include <Functions/FunctionFactory.h>
/// FunctionsHashing instantiations are separated into files FunctionsHashing*.cpp
/// to better parallelize the build procedure and avoid MSan build failure
/// due to excessive resource consumption.
namespace DB
{
#if USE_SSL
REGISTER_FUNCTION(HashingRipe)
{
factory.registerFunction<FunctionRipeMD160Hash>(FunctionDocumentation{
.description = "RIPEMD-160 hash function, primarily used in Bitcoin address generation.",
.examples{{"", "SELECT hex(ripeMD160('The quick brown fox jumps over the lazy dog'));", R"(
hex(ripeMD160('The quick brown fox jumps over the lazy dog'))
37F332F68DB77BD9D7EDD4969571AD671CF9DD3B
)"}},
.categories{"Hash"}});
}
#endif
}

View File

@ -284,12 +284,12 @@ void OrdinalDate::init(int64_t modified_julian_day)
bool OrdinalDate::tryInit(int64_t modified_julian_day)
{
/// This function supports day number from -678941 to 2973119 (which represent 0000-01-01 and 9999-12-31 respectively).
/// This function supports day number from -678941 to 2973483 (which represent 0000-01-01 and 9999-12-31 respectively).
if (modified_julian_day < -678941)
return false;
if (modified_julian_day > 2973119)
if (modified_julian_day > 2973483)
return false;
const auto a = modified_julian_day + 678575;

View File

@ -4,18 +4,21 @@
#if USE_ICU
#include <Columns/ColumnString.h>
#include <Functions/LowerUpperImpl.h>
#include <base/find_symbols.h>
#include <unicode/unistr.h>
#include <Common/StringUtils.h>
# include <Columns/ColumnString.h>
# include <Functions/LowerUpperImpl.h>
# include <unicode/ucasemap.h>
# include <unicode/unistr.h>
# include <unicode/urename.h>
# include <unicode/utypes.h>
# include <Common/StringUtils.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
extern const int BAD_ARGUMENTS;
extern const int LOGICAL_ERROR;
}
template <char not_case_lower_bound, char not_case_upper_bound, bool upper>
@ -28,7 +31,7 @@ struct LowerUpperUTF8Impl
ColumnString::Offsets & res_offsets,
size_t input_rows_count)
{
if (data.empty())
if (input_rows_count == 0)
return;
bool all_ascii = isAllASCII(data.data(), data.size());
@ -39,37 +42,56 @@ struct LowerUpperUTF8Impl
}
res_data.resize(data.size());
res_offsets.resize_exact(offsets.size());
res_offsets.resize_exact(input_rows_count);
UErrorCode error_code = U_ZERO_ERROR;
UCaseMap * case_map = ucasemap_open("", U_FOLD_CASE_DEFAULT, &error_code);
if (U_FAILURE(error_code))
throw DB::Exception(ErrorCodes::LOGICAL_ERROR, "Error calling ucasemap_open: {}", u_errorName(error_code));
String output;
size_t curr_offset = 0;
for (size_t i = 0; i < offsets.size(); ++i)
for (size_t row_i = 0; row_i < input_rows_count; ++row_i)
{
const auto * data_start = reinterpret_cast<const char *>(&data[offsets[i - 1]]);
size_t size = offsets[i] - offsets[i - 1];
const auto * src = reinterpret_cast<const char *>(&data[offsets[row_i - 1]]);
size_t src_size = offsets[row_i] - offsets[row_i - 1] - 1;
icu::UnicodeString input(data_start, static_cast<int32_t>(size), "UTF-8");
int32_t dst_size;
if constexpr (upper)
input.toUpper();
dst_size = ucasemap_utf8ToUpper(
case_map, reinterpret_cast<char *>(&res_data[curr_offset]), res_data.size() - curr_offset, src, src_size, &error_code);
else
input.toLower();
dst_size = ucasemap_utf8ToLower(
case_map, reinterpret_cast<char *>(&res_data[curr_offset]), res_data.size() - curr_offset, src, src_size, &error_code);
output.clear();
input.toUTF8String(output);
if (error_code == U_BUFFER_OVERFLOW_ERROR || error_code == U_STRING_NOT_TERMINATED_WARNING)
{
size_t new_size = curr_offset + dst_size + 1;
res_data.resize(new_size);
/// For valid UTF-8 input strings, ICU sometimes produces output with extra '\0's at the end. Only the data before the first
/// '\0' is valid. It the input is not valid UTF-8, then the behavior of lower/upperUTF8 is undefined by definition. In this
/// case, the behavior is also reasonable.
const char * res_end = find_last_not_symbols_or_null<'\0'>(output.data(), output.data() + output.size());
size_t valid_size = res_end ? res_end - output.data() + 1 : 0;
error_code = U_ZERO_ERROR;
if constexpr (upper)
dst_size = ucasemap_utf8ToUpper(
case_map, reinterpret_cast<char *>(&res_data[curr_offset]), res_data.size() - curr_offset, src, src_size, &error_code);
else
dst_size = ucasemap_utf8ToLower(
case_map, reinterpret_cast<char *>(&res_data[curr_offset]), res_data.size() - curr_offset, src, src_size, &error_code);
}
res_data.resize(curr_offset + valid_size + 1);
memcpy(&res_data[curr_offset], output.data(), valid_size);
res_data[curr_offset + valid_size] = 0;
if (error_code != U_ZERO_ERROR)
throw DB::Exception(
ErrorCodes::LOGICAL_ERROR,
"Error calling {}: {} input: {} input_size: {}",
upper ? "ucasemap_utf8ToUpper" : "ucasemap_utf8ToLower",
u_errorName(error_code),
std::string_view(src, src_size),
src_size);
curr_offset += valid_size + 1;
res_offsets[i] = curr_offset;
res_data[curr_offset + dst_size] = 0;
curr_offset += dst_size + 1;
res_offsets[row_i] = curr_offset;
}
res_data.resize(curr_offset);
}
static void vectorFixed(const ColumnString::Chars &, size_t, ColumnString::Chars &, size_t)

View File

@ -1598,6 +1598,9 @@ ColumnPtr FunctionArrayElement::executeTuple(const ColumnsWithTypeAndName & argu
const auto & tuple_columns = col_nested->getColumns();
size_t tuple_size = tuple_columns.size();
if (tuple_size == 0)
return ColumnTuple::create(input_rows_count);
const DataTypes & tuple_types = typeid_cast<const DataTypeTuple &>(
*typeid_cast<const DataTypeArray &>(*arguments[0].type).getNestedType()).getElements();

718
src/Functions/overlay.cpp Normal file
View File

@ -0,0 +1,718 @@
#include <Columns/ColumnConst.h>
#include <Columns/ColumnString.h>
#include <DataTypes/DataTypeString.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/GatherUtils/Sources.h>
#include <Functions/IFunction.h>
#include <Common/StringUtils.h>
#include <Common/UTF8Helpers.h>
namespace DB
{
namespace
{
/// If 'is_utf8' - measure offset and length in code points instead of bytes.
/// Syntax:
/// - overlay(input, replace, offset[, length])
/// - overlayUTF8(input, replace, offset[, length]) - measure offset and length in code points instead of bytes
template <bool is_utf8>
class FunctionOverlay : public IFunction
{
public:
static constexpr auto name = is_utf8 ? "overlayUTF8" : "overlay";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionOverlay>(); }
String getName() const override { return name; }
bool isVariadic() const override { return true; }
size_t getNumberOfArguments() const override { return 0; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return true; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
FunctionArgumentDescriptors mandatory_args{
{"input", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isString), nullptr, "String"},
{"replace", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isString), nullptr, "String"},
{"offset", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isNativeInteger), nullptr, "(U)Int8/16/32/64"},
};
FunctionArgumentDescriptors optional_args{
{"length", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isNativeInteger), nullptr, "(U)Int8/16/32/64"},
};
validateFunctionArguments(*this, arguments, mandatory_args, optional_args);
return std::make_shared<DataTypeString>();
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
if (input_rows_count == 0)
return ColumnString::create();
bool has_four_args = (arguments.size() == 4);
ColumnPtr col_input = arguments[0].column;
const auto * col_input_const = checkAndGetColumn<ColumnConst>(col_input.get());
const auto * col_input_string = checkAndGetColumn<ColumnString>(col_input.get());
bool input_is_const = (col_input_const != nullptr);
ColumnPtr col_replace = arguments[1].column;
const auto * col_replace_const = checkAndGetColumn<ColumnConst>(col_replace.get());
const auto * col_replace_string = checkAndGetColumn<ColumnString>(col_replace.get());
bool replace_is_const = (col_replace_const != nullptr);
ColumnPtr col_offset = arguments[2].column;
const ColumnConst * col_offset_const = checkAndGetColumn<ColumnConst>(col_offset.get());
bool offset_is_const = false;
Int64 offset = -1;
if (col_offset_const)
{
offset = col_offset_const->getInt(0);
offset_is_const = true;
}
ColumnPtr col_length = has_four_args ? arguments[3].column : nullptr;
const ColumnConst * col_length_const = has_four_args ? checkAndGetColumn<ColumnConst>(col_length.get()) : nullptr;
bool length_is_const = false;
Int64 length = -1;
if (col_length_const)
{
length = col_length_const->getInt(0);
length_is_const = true;
}
auto res_col = ColumnString::create();
auto & res_data = res_col->getChars();
auto & res_offsets = res_col->getOffsets();
res_offsets.resize_exact(input_rows_count);
if (col_input_const)
{
StringRef input = col_input_const->getDataAt(0);
res_data.reserve((input.size + 1) * input_rows_count);
}
else
{
res_data.reserve(col_input_string->getChars().size());
}
#define OVERLAY_EXECUTE_CASE(HAS_FOUR_ARGS, OFFSET_IS_CONST, LENGTH_IS_CONST) \
if (input_is_const && replace_is_const) \
constantConstant<HAS_FOUR_ARGS, OFFSET_IS_CONST, LENGTH_IS_CONST>( \
input_rows_count, \
col_input_const->getDataAt(0), \
col_replace_const->getDataAt(0), \
col_offset, \
col_length, \
offset, \
length, \
res_data, \
res_offsets); \
else if (input_is_const && !replace_is_const) \
constantVector<HAS_FOUR_ARGS, OFFSET_IS_CONST, LENGTH_IS_CONST>( \
input_rows_count, \
col_input_const->getDataAt(0), \
col_replace_string->getChars(), \
col_replace_string->getOffsets(), \
col_offset, \
col_length, \
offset, \
length, \
res_data, \
res_offsets); \
else if (!input_is_const && replace_is_const) \
vectorConstant<HAS_FOUR_ARGS, OFFSET_IS_CONST, LENGTH_IS_CONST>( \
input_rows_count, \
col_input_string->getChars(), \
col_input_string->getOffsets(), \
col_replace_const->getDataAt(0), \
col_offset, \
col_length, \
offset, \
length, \
res_data, \
res_offsets); \
else \
vectorVector<HAS_FOUR_ARGS, OFFSET_IS_CONST, LENGTH_IS_CONST>( \
input_rows_count, \
col_input_string->getChars(), \
col_input_string->getOffsets(), \
col_replace_string->getChars(), \
col_replace_string->getOffsets(), \
col_offset, \
col_length, \
offset, \
length, \
res_data, \
res_offsets);
if (!has_four_args)
{
if (offset_is_const)
{
OVERLAY_EXECUTE_CASE(false, true, false)
}
else
{
OVERLAY_EXECUTE_CASE(false, false, false)
}
}
else
{
if (offset_is_const && length_is_const)
{
OVERLAY_EXECUTE_CASE(true, true, true)
}
else if (offset_is_const && !length_is_const)
{
OVERLAY_EXECUTE_CASE(true, true, false)
}
else if (!offset_is_const && length_is_const)
{
OVERLAY_EXECUTE_CASE(true, false, true)
}
else
{
OVERLAY_EXECUTE_CASE(true, false, false)
}
}
#undef OVERLAY_EXECUTE_CASE
return res_col;
}
private:
/// input offset is 1-based, maybe negative
/// output result is 0-based valid offset, within [0, input_size]
static size_t getValidOffset(Int64 offset, size_t input_size)
{
if (offset > 0)
{
if (static_cast<size_t>(offset) > input_size + 1)
return input_size;
else
return offset - 1;
}
else
{
if (input_size < -static_cast<size_t>(offset))
return 0;
else
return input_size + offset;
}
}
/// get character count of a slice [data, data+bytes)
static size_t getSliceSize(const UInt8 * data, size_t bytes)
{
if constexpr (is_utf8)
return UTF8::countCodePoints(data, bytes);
else
return bytes;
}
template <bool has_four_args, bool offset_is_const, bool length_is_const>
void constantConstant(
size_t rows,
const StringRef & input,
const StringRef & replace,
const ColumnPtr & column_offset,
const ColumnPtr & column_length,
Int64 const_offset,
Int64 const_length,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets) const
{
if (has_four_args && length_is_const && const_length < 0)
{
constantConstant<true, offset_is_const, false>(
rows, input, replace, column_offset, column_length, const_offset, -1, res_data, res_offsets);
return;
}
size_t input_size = getSliceSize(reinterpret_cast<const UInt8 *>(input.data), input.size);
size_t valid_offset = 0; // start from 0, not negative
if constexpr (offset_is_const)
valid_offset = getValidOffset(const_offset, input_size);
size_t replace_size = getSliceSize(reinterpret_cast<const UInt8 *>(replace.data), replace.size);
size_t valid_length = 0; // not negative
if constexpr (has_four_args && length_is_const)
{
assert(const_length >= 0);
valid_length = const_length;
}
else if constexpr (!has_four_args)
{
valid_length = replace_size;
}
Int64 offset = 0; // start from 1, maybe negative
Int64 length = 0; // maybe negative
const UInt8 * input_begin = reinterpret_cast<const UInt8 *>(input.data);
const UInt8 * input_end = reinterpret_cast<const UInt8 *>(input.data + input.size);
size_t res_offset = 0;
for (size_t i = 0; i < rows; ++i)
{
if constexpr (!offset_is_const)
{
offset = column_offset->getInt(i);
valid_offset = getValidOffset(offset, input_size);
}
if constexpr (has_four_args && !length_is_const)
{
length = column_length->getInt(i);
valid_length = length >= 0 ? length : replace_size;
}
size_t prefix_size = valid_offset;
size_t suffix_size = (prefix_size + valid_length > input_size) ? 0 : (input_size - prefix_size - valid_length);
if constexpr (!is_utf8)
{
size_t new_res_size = res_data.size() + prefix_size + replace_size + suffix_size + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input.data, prefix_size);
res_offset += prefix_size;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], replace.data, replace_size);
res_offset += replace_size;
/// copy suffix after replaced region. It is not necessary to copy if suffix_size is zero.
if (suffix_size)
{
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input.data + prefix_size + valid_length, suffix_size);
res_offset += suffix_size;
}
}
else
{
const auto * prefix_end = GatherUtils::UTF8StringSource::skipCodePointsForward(input_begin, prefix_size, input_end);
size_t prefix_bytes = prefix_end > input_end ? input.size : prefix_end - input_begin;
const auto * suffix_begin = GatherUtils::UTF8StringSource::skipCodePointsBackward(input_end, suffix_size, input_begin);
size_t suffix_bytes = input_end - suffix_begin;
size_t new_res_size = res_data.size() + prefix_bytes + replace.size + suffix_bytes + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input_begin, prefix_bytes);
res_offset += prefix_bytes;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], replace.data, replace.size);
res_offset += replace.size;
/// copy suffix after replaced region. It is not necessary to copy if suffix_bytes is zero.
if (suffix_bytes)
{
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], suffix_begin, suffix_bytes);
res_offset += suffix_bytes;
}
}
/// add zero terminator
res_data[res_offset] = 0;
++res_offset;
res_offsets[i] = res_offset;
}
}
template <bool has_four_args, bool offset_is_const, bool length_is_const>
void vectorConstant(
size_t rows,
const ColumnString::Chars & input_data,
const ColumnString::Offsets & input_offsets,
const StringRef & replace,
const ColumnPtr & column_offset,
const ColumnPtr & column_length,
Int64 const_offset,
Int64 const_length,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets) const
{
if (has_four_args && length_is_const && const_length < 0)
{
vectorConstant<true, offset_is_const, false>(
rows, input_data, input_offsets, replace, column_offset, column_length, const_offset, -1, res_data, res_offsets);
return;
}
size_t replace_size = getSliceSize(reinterpret_cast<const UInt8 *>(replace.data), replace.size);
Int64 length = 0; // maybe negative
size_t valid_length = 0; // not negative
if constexpr (has_four_args && length_is_const)
{
assert(const_length >= 0);
valid_length = const_length;
}
else if constexpr (!has_four_args)
{
valid_length = replace_size;
}
Int64 offset = 0; // start from 1, maybe negative
size_t valid_offset = 0; // start from 0, not negative
size_t res_offset = 0;
for (size_t i = 0; i < rows; ++i)
{
size_t input_offset = input_offsets[i - 1];
size_t input_bytes = input_offsets[i] - input_offsets[i - 1] - 1;
size_t input_size = getSliceSize(&input_data[input_offset], input_bytes);
if constexpr (offset_is_const)
{
valid_offset = getValidOffset(const_offset, input_size);
}
else
{
offset = column_offset->getInt(i);
valid_offset = getValidOffset(offset, input_size);
}
if constexpr (has_four_args && !length_is_const)
{
length = column_length->getInt(i);
valid_length = length >= 0 ? length : replace_size;
}
size_t prefix_size = valid_offset;
size_t suffix_size = (prefix_size + valid_length > input_size) ? 0 : (input_size - prefix_size - valid_length);
if constexpr (!is_utf8)
{
size_t new_res_size = res_data.size() + prefix_size + replace_size + suffix_size + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &input_data[input_offset], prefix_size);
res_offset += prefix_size;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], replace.data, replace_size);
res_offset += replace_size;
/// copy suffix after replaced region. It is not necessary to copy if suffix_size is zero.
if (suffix_size)
{
memcpySmallAllowReadWriteOverflow15(
&res_data[res_offset], &input_data[input_offset + prefix_size + valid_length], suffix_size);
res_offset += suffix_size;
}
}
else
{
const auto * input_begin = &input_data[input_offset];
const auto * input_end = &input_data[input_offset + input_bytes];
const auto * prefix_end = GatherUtils::UTF8StringSource::skipCodePointsForward(input_begin, prefix_size, input_end);
size_t prefix_bytes = prefix_end > input_end ? input_bytes : prefix_end - input_begin;
const auto * suffix_begin = GatherUtils::UTF8StringSource::skipCodePointsBackward(input_end, suffix_size, input_begin);
size_t suffix_bytes = input_end - suffix_begin;
size_t new_res_size = res_data.size() + prefix_bytes + replace.size + suffix_bytes + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &input_data[input_offset], prefix_bytes);
res_offset += prefix_bytes;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], replace.data, replace.size);
res_offset += replace.size;
/// copy suffix after replaced region. It is not necessary to copy if suffix_bytes is zero.
if (suffix_bytes)
{
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], suffix_begin, suffix_bytes);
res_offset += suffix_bytes;
}
}
/// add zero terminator
res_data[res_offset] = 0;
++res_offset;
res_offsets[i] = res_offset;
}
}
template <bool has_four_args, bool offset_is_const, bool length_is_const>
void constantVector(
size_t rows,
const StringRef & input,
const ColumnString::Chars & replace_data,
const ColumnString::Offsets & replace_offsets,
const ColumnPtr & column_offset,
const ColumnPtr & column_length,
Int64 const_offset,
Int64 const_length,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets) const
{
if (has_four_args && length_is_const && const_length < 0)
{
constantVector<true, offset_is_const, false>(
rows, input, replace_data, replace_offsets, column_offset, column_length, const_offset, -1, res_data, res_offsets);
return;
}
size_t input_size = getSliceSize(reinterpret_cast<const UInt8 *>(input.data), input.size);
size_t valid_offset = 0; // start from 0, not negative
if constexpr (offset_is_const)
valid_offset = getValidOffset(const_offset, input_size);
Int64 length = 0; // maybe negative
size_t valid_length = 0; // not negative
if constexpr (has_four_args && length_is_const)
{
assert(const_length >= 0);
valid_length = const_length;
}
const auto * input_begin = reinterpret_cast<const UInt8 *>(input.data);
const auto * input_end = reinterpret_cast<const UInt8 *>(input.data + input.size);
Int64 offset = 0; // start from 1, maybe negative
size_t res_offset = 0;
for (size_t i = 0; i < rows; ++i)
{
size_t replace_offset = replace_offsets[i - 1];
size_t replace_bytes = replace_offsets[i] - replace_offsets[i - 1] - 1;
size_t replace_size = getSliceSize(&replace_data[replace_offset], replace_bytes);
if constexpr (!offset_is_const)
{
offset = column_offset->getInt(i);
valid_offset = getValidOffset(offset, input_size);
}
if constexpr (!has_four_args)
{
valid_length = replace_size;
}
else if constexpr (!length_is_const)
{
length = column_length->getInt(i);
valid_length = length >= 0 ? length : replace_size;
}
size_t prefix_size = valid_offset;
size_t suffix_size = (prefix_size + valid_length > input_size) ? 0 : (input_size - prefix_size - valid_length);
if constexpr (!is_utf8)
{
size_t new_res_size = res_data.size() + prefix_size + replace_size + suffix_size + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input.data, prefix_size);
res_offset += prefix_size;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &replace_data[replace_offset], replace_size);
res_offset += replace_size;
/// copy suffix after replaced region. It is not necessary to copy if suffix_size is zero.
if (suffix_size)
{
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input.data + prefix_size + valid_length, suffix_size);
res_offset += suffix_size;
}
}
else
{
const auto * prefix_end = GatherUtils::UTF8StringSource::skipCodePointsForward(input_begin, prefix_size, input_end);
size_t prefix_bytes = prefix_end > input_end ? input.size : prefix_end - input_begin;
const auto * suffix_begin = GatherUtils::UTF8StringSource::skipCodePointsBackward(input_end, suffix_size, input_begin);
size_t suffix_bytes = input_end - suffix_begin;
size_t new_res_size = res_data.size() + prefix_bytes + replace_bytes + suffix_bytes + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input_begin, prefix_bytes);
res_offset += prefix_bytes;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &replace_data[replace_offset], replace_bytes);
res_offset += replace_bytes;
/// copy suffix after replaced region. It is not necessary to copy if suffix_bytes is zero
if (suffix_bytes)
{
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], suffix_begin, suffix_bytes);
res_offset += suffix_bytes;
}
}
/// add zero terminator
res_data[res_offset] = 0;
++res_offset;
res_offsets[i] = res_offset;
}
}
template <bool has_four_args, bool offset_is_const, bool length_is_const>
void vectorVector(
size_t rows,
const ColumnString::Chars & input_data,
const ColumnString::Offsets & input_offsets,
const ColumnString::Chars & replace_data,
const ColumnString::Offsets & replace_offsets,
const ColumnPtr & column_offset,
const ColumnPtr & column_length,
Int64 const_offset,
Int64 const_length,
ColumnString::Chars & res_data,
ColumnString::Offsets & res_offsets) const
{
if (has_four_args && length_is_const && const_length < 0)
{
vectorVector<true, offset_is_const, false>(
rows,
input_data,
input_offsets,
replace_data,
replace_offsets,
column_offset,
column_length,
const_offset,
-1,
res_data,
res_offsets);
return;
}
Int64 length = 0; // maybe negative
size_t valid_length = 0; // not negative
if constexpr (has_four_args && length_is_const)
{
assert(const_length >= 0);
valid_length = const_length;
}
Int64 offset = 0; // start from 1, maybe negative
size_t valid_offset = 0; // start from 0, not negative
size_t res_offset = 0;
for (size_t i = 0; i < rows; ++i)
{
size_t input_offset = input_offsets[i - 1];
size_t input_bytes = input_offsets[i] - input_offsets[i - 1] - 1;
size_t input_size = getSliceSize(&input_data[input_offset], input_bytes);
size_t replace_offset = replace_offsets[i - 1];
size_t replace_bytes = replace_offsets[i] - replace_offsets[i - 1] - 1;
size_t replace_size = getSliceSize(&replace_data[replace_offset], replace_bytes);
if constexpr (offset_is_const)
{
valid_offset = getValidOffset(const_offset, input_size);
}
else
{
offset = column_offset->getInt(i);
valid_offset = getValidOffset(offset, input_size);
}
if constexpr (!has_four_args)
{
valid_length = replace_size;
}
else if constexpr (!length_is_const)
{
length = column_length->getInt(i);
valid_length = length >= 0 ? length : replace_size;
}
size_t prefix_size = valid_offset;
size_t suffix_size = (prefix_size + valid_length > input_size) ? 0 : (input_size - prefix_size - valid_length);
if constexpr (!is_utf8)
{
size_t new_res_size = res_data.size() + prefix_size + replace_size + suffix_size + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &input_data[input_offset], prefix_size);
res_offset += prefix_size;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &replace_data[replace_offset], replace_size);
res_offset += replace_size;
/// copy suffix after replaced region. It is not necessary to copy if suffix_size is zero.
if (suffix_size)
{
memcpySmallAllowReadWriteOverflow15(
&res_data[res_offset], &input_data[input_offset + prefix_size + valid_length], suffix_size);
res_offset += suffix_size;
}
}
else
{
const auto * input_begin = &input_data[input_offset];
const auto * input_end = &input_data[input_offset + input_bytes];
const auto * prefix_end = GatherUtils::UTF8StringSource::skipCodePointsForward(input_begin, prefix_size, input_end);
size_t prefix_bytes = prefix_end > input_end ? input_bytes : prefix_end - input_begin;
const auto * suffix_begin = GatherUtils::UTF8StringSource::skipCodePointsBackward(input_end, suffix_size, input_begin);
size_t suffix_bytes = input_end - suffix_begin;
size_t new_res_size = res_data.size() + prefix_bytes + replace_bytes + suffix_bytes + 1; /// +1 for zero terminator
res_data.resize(new_res_size);
/// copy prefix before replaced region
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], input_begin, prefix_bytes);
res_offset += prefix_bytes;
/// copy replace
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], &replace_data[replace_offset], replace_bytes);
res_offset += replace_bytes;
/// copy suffix after replaced region. It is not necessary to copy if suffix_bytes is zero.
if (suffix_bytes)
{
memcpySmallAllowReadWriteOverflow15(&res_data[res_offset], suffix_begin, suffix_bytes);
res_offset += suffix_bytes;
}
}
/// add zero terminator
res_data[res_offset] = 0;
++res_offset;
res_offsets[i] = res_offset;
}
}
};
}
REGISTER_FUNCTION(Overlay)
{
factory.registerFunction<FunctionOverlay<false>>(
{.description = R"(
Replace a part of a string `input` with another string `replace`, starting at 1-based index `offset`. By default, the number of bytes removed from `input` equals the length of `replace`. If `length` (the optional fourth argument) is specified, a different number of bytes is removed.
)",
.categories{"String"}},
FunctionFactory::Case::Insensitive);
factory.registerFunction<FunctionOverlay<true>>(
{.description = R"(
Replace a part of a string `input` with another string `replace`, starting at 1-based index `offset`. By default, the number of characters removed from `input` equals the length of `replace`. If `length` (the optional fourth argument) is specified, a different number of characters is removed.
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
)",
.categories{"String"}},
FunctionFactory::Case::Sensitive);
}
}

View File

@ -443,6 +443,7 @@ std::unique_ptr<ReadBuffer> ReadWriteBufferFromHTTP::initialize()
}
response.getCookies(cookies);
response.getHeaders(response_headers);
content_encoding = response.get("Content-Encoding", "");
// Remember file size. It'll be used to report eof in next nextImpl() call.
@ -680,6 +681,19 @@ std::string ReadWriteBufferFromHTTP::getResponseCookie(const std::string & name,
return def;
}
Map ReadWriteBufferFromHTTP::getResponseHeaders() const
{
Map map;
for (const auto & header : response_headers)
{
Tuple elem;
elem.emplace_back(header.first);
elem.emplace_back(header.second);
map.emplace_back(elem);
}
return map;
}
void ReadWriteBufferFromHTTP::setNextCallback(NextCallback next_callback_)
{
next_callback = next_callback_;

Some files were not shown because too many files have changed in this diff Show More