Commit Graph

3103 Commits

Author SHA1 Message Date
alesapin
1bd5736e34 Fix build one more time 2022-03-19 20:00:08 +01:00
alesapin
9e24677a30 Fix build 2022-03-19 18:09:01 +01:00
alesapin
97e84e6dc2 fix build 2022-03-19 17:47:35 +01:00
alesapin
f2c5e2d3a0 Don't spam logs in zero copy replication 2022-03-19 17:31:33 +01:00
Nikolai Kochetov
ee9c2ec735
Merge pull request #34780 from azat/mt-delayed-part-flush
Do not delay final part writing by default (fixes possible Memory limit exceeded during INSERT)
2022-03-17 12:30:51 +01:00
alesapin
bb251938dc
Merge pull request #35344 from ClickHouse/changelog-22.3
Changelog 22.3
2022-03-17 11:25:36 +01:00
Alexey Milovidov
68ef49ea51 Fix something stupid 2022-03-17 05:57:13 +01:00
Anton Popov
de2cc23e15 fix race 2022-03-16 20:16:59 +00:00
Anton Popov
0ba78c3c3a Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-16 15:28:09 +00:00
alesapin
fbb1ebd9b8
Merge pull request #35274 from CurtizJ/fix-check-table-sparse-columns
Fix check table in case when there exist sparse columns
2022-03-14 21:56:04 +01:00
Maksim Kita
2fdcf53a76 Fix clang-tidy warnings in Server, Storages folders 2022-03-14 18:17:35 +00:00
Anton Popov
063917786e minor fixes 2022-03-14 17:29:18 +00:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Anton Popov
428bbd6377 fix check table in case when there exist sparse columns 2022-03-14 15:22:23 +00:00
Kseniia Sumarokova
818459b9f0
Merge pull request #33717 from kssenii/local-cache-for-remote-fs
Local cache for remote filesystem
2022-03-11 07:23:10 +01:00
alesapin
c0d8ccc91b
Merge pull request #35178 from Varinara/master
Added disk_name to system.part_log
2022-03-10 22:22:37 +01:00
Varinara
f5523f7ff0 added disk_name to system.part_log 2022-03-10 18:44:19 +03:00
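
For illustration, a minimal SQL sketch of how the column added by the commit above could be inspected; the other selected columns are assumed from the existing system.part_log layout and are not part of this change:

    -- List recent part events together with the disk they were written to
    -- (disk_name is the column added by this commit).
    SELECT event_time, event_type, part_name, disk_name
    FROM system.part_log
    ORDER BY event_time DESC
    LIMIT 10;
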
kssenii
787a0805a5 Merge master 2022-03-10 11:42:19 +01:00
zhangyifan27
e6fa9f699a fix typo 2022-03-10 18:29:42 +08:00
Vladimir C
ce266b5a3e
Merge pull request #35146 from amosbird/fixpartitionprunerin 2022-03-09 13:23:45 +01:00
Amos Bird
a19224bc9b
Fix partition pruner: non-monotonic function IN 2022-03-09 15:48:42 +08:00
Azat Khuzhin
3a5a39a9df Do not delay final part writing by default
For async S3 writes, final part flushing was deferred until the whole INSERT
block was processed; however, with too many partitions/columns you may exceed
the max_memory_usage limit (since each stream has overhead).

Introduce max_insert_delayed_streams_for_parallel_writes (defaulting to 1000
for S3, 0 otherwise) to avoid this.

This should fix "Memory limit exceeded" errors in performance tests.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 22:17:36 +03:00
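
A minimal sketch of how the setting introduced above could be applied per session; the table names are hypothetical and the value 100 is illustrative, not a recommendation:

    -- Keep fewer delayed part writers in memory for a very wide INSERT.
    SET max_insert_delayed_streams_for_parallel_writes = 100;
    INSERT INTO db.wide_table SELECT * FROM db.source_table;
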
kssenii
5260822964 Merge master 2022-03-08 18:21:28 +01:00
kssenii
e231c3a3e0 Fix split build 2022-03-08 18:05:55 +01:00
Azat Khuzhin
caffc144b5 Fix possible "Part directory doesn't exist" during INSERT
In #33291 the final part commit was deferred, and now it can take
significantly more time, which may lead to a "Part directory doesn't exist"
error during INSERT:

    2022.02.21 18:18:06.979881 [ 11329 ] {insert} <Debug> executeQuery: (from 127.1:24572, user: default) INSERT INTO db.table (...) VALUES
    2022.02.21 20:58:03.933593 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18044_18044_0 to 20220214_270654_270654_0.
    2022.02.21 21:16:50.961917 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18197_18197_0 to 20220214_270689_270689_0.
    ...
    2022.02.22 21:16:57.632221 [ 64878 ] {} <Warning> db.table: Removing temporary directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/
    ...
    2022.02.23 12:23:56.277480 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18232_18232_0 to 20220214_273459_273459_0.
    2022.02.23 12:23:56.299218 [ 11329 ] {insert} <Error> executeQuery: Code: 107. DB::Exception: Part directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/ doesn't exist. Most likely it is a logical error. (FILE_DOESNT_EXIST) (version 22.2.1.1) (from 127.1:24572) (in query: INSERT INTO db.table (...) VALUES), Stack trace (when copying this message, always include the lines below):

Follow-up for: #28760
Refs: #33291

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 07:44:11 +03:00
Anton Popov
0bc57da238 Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-07 14:46:08 +00:00
Azat Khuzhin
bc224dee36 Do not hide exceptions during mutations
system.mutations includes only the message, not the stack trace, and it is
not always obvious what the culprit is without the stack trace.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-06 13:39:49 +03:00
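
A minimal sketch of checking mutation failures, assuming the current system.mutations columns (latest_failed_part, latest_fail_reason); as noted in the commit above, this table still records only the error message:

    -- Show unfinished mutations and their last recorded failure message.
    SELECT database, table, mutation_id, latest_failed_part, latest_fail_reason
    FROM system.mutations
    WHERE NOT is_done;
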
Maksim Kita
7ae1f0fa3b
Merge pull request #34911 from larspars/master
Allow LowCardinality strings for ngrambf_v1/tokenbf_v1 indexes. Fixes #21865
2022-03-04 19:17:48 +01:00
Anton Popov
df3b07fe7c Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-03 22:25:28 +00:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
mreddy017
f893002b69 Fix vulnerable code related to std::move and noexcept
This commit fixes the vulnerable code related to std::move and noexcept identified by the clang-tidy tool.
2022-03-02 18:15:27 +00:00
Maksim Kita
53116faeeb
Update MergeTreeIndexFullText.cpp 2022-03-02 11:08:35 +01:00
Filatenkov Artur
f48f35cad0
Merge pull request #34975 from Vector-Similarity-Search-for-ClickHouse/fix-typo
Fix typo
2022-03-02 09:59:06 +03:00
Anton Popov
d7cd9aa69b fix reading of missed subcolumns 2022-03-02 03:31:40 +03:00
NikitaEvs
06f47673f4 Fix typo 2022-03-01 21:42:27 +00:00
Anton Popov
04a3a10148 minor fixes 2022-03-01 20:20:53 +03:00
Anton Popov
2758db5341 add more comments 2022-03-01 19:32:55 +03:00
Lars Eidnes
2629614dfe Allow LowCardinality strings for ngrambf_v1/tokenbf_v1 indexes. Fixes #21865 2022-02-25 15:36:36 +01:00
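
A minimal sketch of the combination that this change allows, using a hypothetical table; the tokenbf_v1 parameters (filter size, hash count, seed) and granularity are illustrative:

    -- A skipping index of type tokenbf_v1 on a LowCardinality(String) column.
    CREATE TABLE tokens_example
    (
        id UInt64,
        message LowCardinality(String),
        INDEX message_tokens message TYPE tokenbf_v1(512, 3, 0) GRANULARITY 4
    )
    ENGINE = MergeTree
    ORDER BY id;
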
Anton Popov
fcdebea925 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-25 13:41:30 +03:00
Dmitry Novik
2fd4baaa64
Merge pull request #34387 from nvartolomei/nv/move-part-settings-cleanup
Remove useless setting experimental_query_deduplication_send_all_part_uuids
2022-02-22 06:11:00 -08:00
Kseniia Sumarokova
eeea322556
Merge pull request #34629 from amosbird/remotefsimprove
Some refactoring and improvement over async and remote buffer related stuff
2022-02-22 11:36:40 +01:00
mergify[bot]
314ab73b11
Merge branch 'master' into nv/move-part-settings-cleanup 2022-02-21 10:18:44 +00:00
Dmitry Novik
4428e7aa1b
Merge branch 'master' into nv/move-part-count 2022-02-21 02:14:23 -08:00
Azat Khuzhin
fef5f146e7 Fix ENOENT with fsync_part_directory and Vertical merge
fsync of the temporary part directory is superfluous anyway, and besides,
that directory does not exist at that time, which leads to an ENOENT error:

    2022.02.18 17:02:51.634565 [ 35639 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 107. DB::ErrnoException: Cannot open file /var/lib/clickhouse/data/system/text_log/tmp_merge_202202_1864_3192_14/, errno: 2, strerror: No such file or directory. (FILE_DOESNT_EXIST), Stack trace (when copying this message, always include the lines below):

    0. DB::Exception::Exception() @ 0xb26ecfa in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    1. DB::throwFromErrnoWithPath() @ 0xb2700ea in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    2. DB::LocalDirectorySyncGuard::LocalDirectorySyncGuard() @ 0x14905531 in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    3. DB::DiskLocal::getDirectorySyncGuard() const @ 0x148af3e3 in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    4. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::prepare() @ 0x157bef13 in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug

Note that IMergeTreeDataPart::renameTo() will fsync the directory anyway.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-19 07:50:59 +03:00
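
For context, a minimal sketch of a table that enables the setting involved in this fix; the table definition is hypothetical:

    -- fsync_part_directory is a MergeTree-level setting; with this fix it no
    -- longer causes ENOENT during Vertical merges.
    CREATE TABLE fsync_example
    (
        id UInt64,
        payload String
    )
    ENGINE = MergeTree
    ORDER BY id
    SETTINGS fsync_part_directory = 1;
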
Azat Khuzhin
65e9b4879d Fix possible memory_tracker use-after-free for merges/mutations
There are two possible cases for executing merges/mutations:
1) from background thread
2) from OPTIMIZE TABLE query

1) is pretty simple; its memory tracking structure is as follows:

    current_thread::memory_tracker = level=Thread / description="(for thread)" ==
      background_thread_memory_tracker = level=Thread / description="(for thread)"
    current_thread::memory_tracker.parent = level=Global / description="(total)"

  As you can see, it is pretty simple, and MemoryTrackerThreadSwitcher
  does not do anything icky for this case.

2) is complex; its memory tracking structure is as follows:

    current_thread::memory_tracker = level=Thread / description="(for thread)"
    current_thread::memory_tracker.parent = level=Process / description="(for query)" ==
      background_thread_memory_tracker = level=Process / description="(for query)"

  Before this patch, dirty hacks were used to track memory (and related
  things, like sampling, profiling and so on) for the OPTIMIZE TABLE query,
  since the current_thread memory_tracker was of Thread scope, which does not
  have any limits.

  So if its parent is changed to the Merge/Mutate memory tracker (which also
  lacks some of the settings), memory will not be tracked correctly.

  To address this, the Merge/Mutate tracker was set as the parent not of the
  current_thread memory_tracker but of its parent, since its scope is Process
  with all settings.

  But that parent memory_tracker is the memory_tracker of the thread_group,
  so if there is a nested ThreadPool inside a merge/mutate (which is the case
  for S3 async writes, added in #33291), you may get a use-after-free of the
  memory_tracker.

  Consider the following example:

    MemoryTrackerThreadSwitcher()
      thread_group.memory_tracker.parent = merge_list_entry->memory_tracker
      (see also background_thread_memory_tracker above)

    CurrentThread::attachTo()
      current_thread.memory_tracker.parent = thread_group.memory_tracker

    CurrentThread::detachQuery()
      current_thread.memory_tracker.parent = thread_group.memory_tracker.parent
      # and this is equal to merge_list_entry->memory_tracker

    ~MemoryTrackerThreadSwitcher()
      thread_group.memory_tracker = thread_group.memory_tracker.parent

  So after the sequence above we will get an incorrect memory_tracker (from
  the merge_list_entry) when the next job in that ThreadPool has no
  thread_group, since in that case it will not try to update
  current_thread.memory_tracker.parent, and a use-after-free will happen.

So to address issue (2), settings from the parent memory_tracker should be
copied to the merge_list_entry->memory_tracker, to avoid playing with the
parent memory tracker.

Note that the settings from the query (OPTIMIZE TABLE) are not available at
that time, so they cannot be used (instead of the parent's memory tracker
settings).

v2: remove memory_tracker.setOrRaiseHardLimit() from settings

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-18 16:23:54 +03:00
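
For illustration, a minimal sketch of case (2) from the commit above, a merge triggered from a query context; the table name and limit are hypothetical:

    -- The merge started by OPTIMIZE runs under the query's Process-level
    -- memory tracker; with this fix its settings are copied to the merge's
    -- tracker instead of re-parenting the thread_group tracker.
    SET max_memory_usage = 10000000000;
    OPTIMIZE TABLE db.table FINAL;
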
Amos Bird
f459e8fc95
Less getMark calls 2022-02-18 19:55:19 +08:00
Amos Bird
d3bd8b5f93
Cosmetic fix 2022-02-17 14:31:22 +08:00
Amos Bird
ba19c7cf44
Slightly better interface of compressed buffer 2022-02-17 14:31:22 +08:00
Azat Khuzhin
774744a86d Fix allow_experimental_projection_optimization with enable_global_with_statement
allow_experimental_projection_optimization requires one more
InterpreterSelectQuery, which with enable_global_with_statement will apply
ApplyWithAliasVisitor if the query is not a subquery.

But this should not be done for queries from
MergeTreeData::getQueryProcessingStage()/getQueryProcessingStageWithAggregateProjections()
since this will duplicate WITH statements over and over.

This also fixes the scalar.xml perf tests, which currently fail with the
following error:

    scalar.query0.prewarm0: DB::Exception: Stack size too large.

And since it puts a very long query into the log, this leads to the following
perf test error:

    _csv.Error: field larger than field limit (131072)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-16 19:14:47 +03:00
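
A minimal sketch of the affected combination of settings, with a hypothetical query; per the commit above, before this fix the global WITH statement could be duplicated over and over by the extra InterpreterSelectQuery:

    -- Global WITH alias plus projection optimization (query shape is
    -- illustrative only).
    SET enable_global_with_statement = 1;
    SET allow_experimental_projection_optimization = 1;
    WITH 2 AS multiplier
    SELECT sum(value * multiplier) FROM db.table;
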
Anton Popov
a661eaf39f better performance of getting storage snapshot 2022-02-16 02:17:22 +03:00