Commit Graph

96 Commits

Author SHA1 Message Date
Nikolai Kochetov
5bc9b32025 Merge branch 'master' into refactor-something-in-part-volumes 2022-06-08 11:10:06 +00:00
Nikolai Kochetov
678d978acf Merge branch 'master' into refactor-something-in-part-volumes 2022-06-07 15:23:00 +00:00
Anton Popov
df6882d2b9
Revert "Fix errors of CheckTriviallyCopyableMove type" 2022-06-07 13:53:10 +02:00
Anton Popov
ef6f5a6500
Merge pull request #37570 from azat/column-ttl-expired-fix
Do not write expired columns by TTL after subsequent merges
2022-06-07 13:05:03 +02:00
Anton Popov
d40b23272e
Merge pull request #37755 from CurtizJ/fix-mutations-again
Return back #37266
2022-06-06 20:22:03 +02:00
Robert Schulze
2d87af2a15
Merge pull request #37647 from DevTeamBK/Fix-all-CheckTriviallyCopyableMove-Errors
Fix errors of CheckTriviallyCopyableMove type
2022-06-05 19:58:47 +02:00
Anton Popov
5cf2c67410
Merge branch 'master' into fix-mutations-again 2022-06-03 14:17:05 +02:00
Nikolai Kochetov
89c5855d20 Merge branch 'master' into refactor-something-in-part-volumes 2022-06-02 12:19:07 +02:00
Anton Popov
cae2478b3f Revert "Merge pull request #37355 from ClickHouse/revert-37266-fix-mutations-with-object"
This reverts commit a53cfa9fca, reversing
changes made to 9acb42fcdb.
2022-06-01 10:57:20 +00:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Azat Khuzhin
73a99d4eee Improve error message for skipped/expired columns
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
f0cac5417a More optional skipping of fully expired columns (by TTL)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
46a94f395d Move the check into IMergedBlockOutputStream::removeEmptyColumnsFromPart()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
4288d09a85 Do not write expired columns by TTL after merge w/o TTL
Usually second merge do not perform TTL, since everything is up to date,
however in this case TTLTransform is not used, and hence expired_columns
will not be filled for new part, and so those columns will be written
with default values.

Avoid this, by manually filling expired_columns.

Here is a simpler reproducer:

Simple reproducer:

    ```sql
    create table ttl_02262 (date Date, key Int, value String TTL date + interval 1 month) engine=MergeTree order by key settings min_bytes_for_wide_part=0, min_rows_for_wide_part=0;
    insert into ttl_02262 values ('2010-01-01', 2010, 'foo');
    ```

    ```sh
    # ls -l .server/data/default/ttl_02262/all_*
    .server/data/default/ttl_02262/all_1_1_0:
    total 48
    -rw-r----- 1 root root 335 May 26 14:19 checksums.txt
    -rw-r----- 1 root root  76 May 26 14:19 columns.txt
    -rw-r----- 1 root root   1 May 26 14:19 count.txt
    -rw-r----- 1 root root  28 May 26 14:19 date.bin
    -rw-r----- 1 root root  48 May 26 14:19 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:19 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:19 key.bin
    -rw-r----- 1 root root  48 May 26 14:19 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:19 primary.idx
    -rw-r----- 1 root root  99 May 26 14:19 ttl.txt
    -rw-r----- 1 root root  30 May 26 14:19 value.bin
    -rw-r----- 1 root root  48 May 26 14:19 value.mrk2
    ```

    ```sql
    optimize table ttl_02262 final;
    ```

    ```sh
    .server/data/default/ttl_02262/all_1_1_1:
    total 40
    -rw-r----- 1 root root 279 May 26 14:19 checksums.txt
    -rw-r----- 1 root root  61 May 26 14:19 columns.txt
    -rw-r----- 1 root root   1 May 26 14:19 count.txt
    -rw-r----- 1 root root  28 May 26 14:19 date.bin
    -rw-r----- 1 root root  48 May 26 14:19 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:19 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:19 key.bin
    -rw-r----- 1 root root  48 May 26 14:19 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:19 primary.idx
    -rw-r----- 1 root root  81 May 26 14:19 ttl.txt
    ```

    ```sql
    optimize table ttl_02262 final;
    ```

    ```sh
    .server/data/default/ttl_02262/all_1_1_2:
    total 48
    -rw-r----- 1 root root 349 May 26 14:20 checksums.txt
    -rw-r----- 1 root root  76 May 26 14:20 columns.txt
    -rw-r----- 1 root root   1 May 26 14:20 count.txt
    -rw-r----- 1 root root  28 May 26 14:20 date.bin
    -rw-r----- 1 root root  48 May 26 14:20 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:20 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:20 key.bin
    -rw-r----- 1 root root  48 May 26 14:20 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:20 primary.idx
    -rw-r----- 1 root root  81 May 26 14:20 ttl.txt
    -rw-r----- 1 root root  27 May 26 14:20 value.bin
    -rw-r----- 1 root root  48 May 26 14:20 value.mrk2
    ```

And now we have `value.*` for all_1_1_2, this should not happen.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Nikolai Kochetov
6d4a26afac Update ReadProgressCallback. 2022-05-25 19:45:48 +00:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Alexander Tokmakov
f787dc7097
Revert "Fix mutations in tables with columns of type Object" 2022-05-19 13:24:48 +03:00
Anton Popov
b6c5ab4fcf fix mutations in tables with columns of type Object 2022-05-16 18:26:53 +00:00
Maksim Kita
437d70d4da Fixed tests 2022-05-11 21:59:51 +02:00
Maksim Kita
4e7d10297b Fixed style 2022-05-11 21:59:51 +02:00
Maksim Kita
ce92f6aab1 Fixed tests 2022-05-11 21:59:51 +02:00
Maksim Kita
8ceb63ee6c Added JIT compilation of SortDescription 2022-05-11 21:59:51 +02:00
Nikolai Kochetov
35095191eb Merge branch 'master' into refactor-something-in-part-volumes 2022-05-03 17:51:47 +02:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Nikolai Kochetov
e44af67fee Merge branch 'master' into refactor-something-in-part-volumes 2022-04-26 21:08:00 +02:00
Nikolai Kochetov
9133e398b8 Part 7 2022-04-21 19:19:13 +00:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains() which is more speaking and in case of
multimaps/multisets also faster.
2022-04-18 23:53:11 +02:00
Alexander Tokmakov
49c35f3261 Merge branch 'master' into mvcc_prototype 2022-04-08 13:34:40 +02:00
alesapin
8ec802bc62
Merge pull request #35475 from kssenii/remote-fs-cache-improvements
Allow to write remote fs cache on all write operations. Add `system.remote_filesystem_cache` table. Add `drop remote filesystem cache (<path>)` query. Add `system.remote_data_paths` table.
2022-04-08 12:06:26 +02:00
Alexander Tokmakov
da00beaf7f Merge branch 'master' into mvcc_prototype 2022-04-05 11:14:42 +02:00
Nikita Taranov
bd89fcafdb
Make SortDescription::column_name always non-empty (#35805) 2022-04-04 14:17:15 +02:00
kssenii
d4161b5925 Add optin read_from_cache_if_exists_otherwise_bypass_cache (for merges) 2022-03-23 20:24:00 +01:00
kssenii
d2a3cfe5dc Cache on all write operations 2022-03-23 19:14:33 +01:00
Alexander Tokmakov
d04dc03fa4 Merge branch 'master' into mvcc_prototype 2022-03-17 15:24:32 +01:00
Alexander Tokmakov
c2ac8d4a5c review fixes 2022-03-16 21:05:34 +01:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Alexander Tokmakov
061fa6a6f2 Merge branch 'master' into mvcc_prototype 2022-03-10 13:13:04 +01:00
Azat Khuzhin
caffc144b5 Fix possible "Part directory doesn't exist" during INSERT
In #33291 final part commit had been defered, and now it can take
significantly more time, that may lead to "Part directory doesn't exist"
error during INSERT:

    2022.02.21 18:18:06.979881 [ 11329 ] {insert} <Debug> executeQuery: (from 127.1:24572, user: default) INSERT INTO db.table (...) VALUES
    2022.02.21 20:58:03.933593 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18044_18044_0 to 20220214_270654_270654_0.
    2022.02.21 21:16:50.961917 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18197_18197_0 to 20220214_270689_270689_0.
    ...
    2022.02.22 21:16:57.632221 [ 64878 ] {} <Warning> db.table: Removing temporary directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/
    ...
    2022.02.23 12:23:56.277480 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18232_18232_0 to 20220214_273459_273459_0.
    2022.02.23 12:23:56.299218 [ 11329 ] {insert} <Error> executeQuery: Code: 107. DB::Exception: Part directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/ doesn't exist. Most likely it is a logical error. (FILE_DOESNT_EXIST) (version 22.2.1.1) (from 127.1:24572) (in query: INSERT INTO db.table (...) VALUES), Stack trace (when copying this message, always include the lines below):

Follow-up for: #28760
Refs: #33291

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 07:44:11 +03:00
Anton Popov
fcdebea925 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-25 13:41:30 +03:00
Alexander Tokmakov
aa6b9a2abc Merge branch 'master' into mvcc_prototype 2022-02-23 23:22:03 +03:00
Azat Khuzhin
fef5f146e7 Fix ENOENT with fsync_part_directory and Vertical merge
fsync of the temporary part directory is superfluous anyway, and besides
that directory is not exists at that time, that will lead to ENOENT
error:

    2022.02.18 17:02:51.634565 [ 35639 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 107. DB::ErrnoException: Cannot open file /var/lib/clickhouse/data/system/text_log/tmp_merge_202202_1864_3192_14/, errno: 2, strerror: No such file or directory. (FILE_DOESNT_EXIST), Stack trace (when copying this message, always include the lines below):

    0. DB::Exception::Exception() @ 0xb26ecfa in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    1. DB::throwFromErrnoWithPath() @ 0xb2700ea in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    2. DB::LocalDirectorySyncGuard::LocalDirectorySyncGuard() @ 0x14905531 in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    3. DB::DiskLocal::getDirectorySyncGuard() const @ 0x148af3e3 in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug
    4. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::prepare() @ 0x157bef13 in /usr/lib/debug/.build-id/01/8c328bd4858d67.debug

Note, that IMergeTreeDataPart::renameTo() anyway will have fsync for the
directory.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-19 07:50:59 +03:00
Azat Khuzhin
65e9b4879d Fix possible memory_tracker use-after-free for merges/mutations
There are two possible cases for execution merges/mutations:
1) from background thread
2) from OPTIMIZE TABLE query

1) is pretty simple, it's memory tracking structure is as follow:

    current_thread::memory_tracker = level=Thread / description="(for thread)" ==
      background_thread_memory_tracker = level=Thread / description="(for thread)"
    current_thread::memory_tracker.parent = level=Global / description="(total)"

  So as you can see it is pretty simple and MemoryTrackerThreadSwitcher
  does not do anything icky for this case.

2) is complex, it's memory tracking structure is as follow:

    current_thread::memory_tracker = level=Thread / description="(for thread)"
    current_thread::memory_tracker.parent = level=Process / description="(for query)" ==
      background_thread_memory_tracker = level=Process / description="(for query)"

  Before this patch to track memory (and related things, like sampling,
  profiling and so on) for OPTIMIZE TABLE query dirty hacks was done to
  do this, since current_thread memory_tracker was of Thread scope, that
  does not have any limits.

  And so if will change parent for it to Merge/Mutate memory tracker
  (which also does not have some of settings) it will not be correctly
  tracked.

  To address this Merge/Mutate was set as parent not to the
  current_thread memory_tracker but to it's parent, since it's scope is
  Process with all settings.

  But that parent's memory_tracker is the memory_tracker of the
  thread_group, and so if you will have nested ThreadPool inside
  merge/mutate (this is the case for s3 async writes, which has been
  added in #33291) you may get use-after-free of memory_tracker.

  Consider the following example:

    MemoryTrackerThreadSwitcher()
      thread_group.memory_tracker.parent = merge_list_entry->memory_tracker
      (see also background_thread_memory_tracker above)

    CurrentThread::attachTo()
      current_thread.memory_tracker.parent = thread_group.memory_tracker

    CurrentThread::detachQuery()
      current_thread.memory_tracker.parent = thread_group.memory_tracker.parent
      # and this is equal to merge_list_entry->memory_tracker

    ~MemoryTrackerThreadSwitcher()
      thread_group.memory_tracker = thread_group.memory_tracker.parent

  So after the following we will get incorrect memory_tracker (from the
  mege_list_entry) when the next job in that ThreadPool will not have
  thread_group, since in this case it will not try to update the
  current_thread.memory_tracker.parent and use-after-free will happens.

So to address the (2) issue, settings from the parent memory_tracker
should be copied to the merge_list_entry->memory_tracker, to avoid
playing with parent memory tracker.

Note, that settings from the query (OPTIMIZE TABLE) is not available at
that time, so it cannot be used (instead of parent's memory tracker
settings).

v2: remove memory_tracker.setOrRaiseHardLimit() from settings

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-18 16:23:54 +03:00
Alexander Tokmakov
ae5aa8c12d write part version before other files 2022-02-15 02:24:51 +03:00
Anton Popov
dcd7312d75 cache common type on objects in MergeTree 2022-02-09 23:47:53 +03:00
Anton Popov
18940b8637 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-09 23:38:38 +03:00
Nikolai Kochetov
2a6eb593be
Revert "Revert "Add pool to WriteBufferFromS3"" 2022-02-01 13:36:51 +03:00
alexey-milovidov
095d9bfa43
Revert "Add pool to WriteBufferFromS3" 2022-02-01 05:49:40 +03:00
Anton Popov
78b9f15abb Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-30 03:24:37 +03:00
Nikolai Kochetov
9b2998c639 Review fixes. 2022-01-26 18:08:01 +00:00