In case of Buffer table has columns of AggregateFunction type,
aggregate states for such columns will be allocated from the query
context but those states can be destroyed from the server context (in
case of background flush), and thus memory will be leaked from the query
since aggregate states can be shared, and eventually this will lead to
MEMORY_LIMIT_EXCEEDED error.
To avoid this, prohibit sharing the aggregate states.
But note, that this problem only about memory accounting, not memory
usage itself.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
For async s3 writes final part flushing was defered until all the INSERT
block was processed, however in case of too many partitions/columns you
may exceed max_memory_usage limit (since each stream has overhead).
Introduce max_insert_delayed_streams_for_parallel_writes (with default
to 1000 for S3, 0 otherwise), to avoid this.
This should "Memory limit exceeded" errors in performance tests.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Add a warning if parallel_distributed_insert_select was ignored
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Respect max_distributed_depth for parallel_distributed_insert_select
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Print warning for non applied parallel_distributed_insert_select only for initial query
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Remove Cluster::getHashOfAddresses()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses
Before it uses empty cluster name (getClusterName()) which is not
correct, compare all addresses instead.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Fix max_distributed_depth check
max_distributed_depth=1 must mean not more then one distributed query,
not two, since max_distributed_depth=0 means no limit, and
distribute_depth is 0 for the first query.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Add a test for parallel_distributed_insert_select with cluster()/remote()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Return <remote> instead of empty cluster name in Distributed engine
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Make user with sharding_key and w/o in remote()/cluster() identical
Before with sharding_key the user was "default", while w/o it it was
empty.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In #33291 final part commit had been defered, and now it can take
significantly more time, that may lead to "Part directory doesn't exist"
error during INSERT:
2022.02.21 18:18:06.979881 [ 11329 ] {insert} <Debug> executeQuery: (from 127.1:24572, user: default) INSERT INTO db.table (...) VALUES
2022.02.21 20:58:03.933593 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18044_18044_0 to 20220214_270654_270654_0.
2022.02.21 21:16:50.961917 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18197_18197_0 to 20220214_270689_270689_0.
...
2022.02.22 21:16:57.632221 [ 64878 ] {} <Warning> db.table: Removing temporary directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/
...
2022.02.23 12:23:56.277480 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18232_18232_0 to 20220214_273459_273459_0.
2022.02.23 12:23:56.299218 [ 11329 ] {insert} <Error> executeQuery: Code: 107. DB::Exception: Part directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/ doesn't exist. Most likely it is a logical error. (FILE_DOESNT_EXIST) (version 22.2.1.1) (from 127.1:24572) (in query: INSERT INTO db.table (...) VALUES), Stack trace (when copying this message, always include the lines below):
Follow-up for: #28760
Refs: #33291
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In external dictionary providers, the allowed keys for configuration seemed to have a typo
of "update_lag" as "update_tag", preventing the use of "update_lag". This change fixes that.
system.mutations includes only the message, but not stacktrace, and it
is not always obvious to understand the culprit w/o stacktrace.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It was set only bu utils/release/release_lib.sh, and seems that this
script is not used anymore, at least that part of it.
Also note, that GIT_DATE is the same, and it is date time, not only
date.
Plus VERSION_DATE is not installed for releases anyway.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
CMAKE_LIBRARY_ARCHITECTURE and it is useless, since it is reported only
if the compiler reports subdir arch triplet [1]
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1531678
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>