Previously all WAL files had been stored, though with the time of use
this can take too much space on disk, and also the startup time will be
increased.
But it is pretty easy to prune old WAL files (the one parts from which
had been completely written to disk already).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Since this commit can definitelly throw (i.e. due to ENOSPC).
Note, that it should be safe, since rollback() will call dropPart() for
those parts.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- TSA is a static analyzer build by Google which finds race conditions
and deadlocks at compile time.
- It works by associating a shared member variable with a
synchronization primitive that protects it. The compiler can then
check at each access if proper locking happened before. A good
introduction are [0] and [1].
- TSA requires some help by the programmer via annotations. Luckily,
LLVM's libcxx already has annotations for std::mutex, std::lock_guard,
std::shared_mutex and std::scoped_lock. This commit enables them
(--> contrib/libcxx-cmake/CMakeLists.txt).
- Further, this commit adds convenience macros for the low-level
annotations for use in ClickHouse (--> base/defines.h). For
demonstration, they are leveraged in a few places.
- As we compile with "-Wall -Wextra -Weverything", the required compiler
flag "-Wthread-safety-analysis" was already enabled. Negative checks
are an experimental feature of TSA and disabled
(--> cmake/warnings.cmake). Compile times did not increase noticeably.
- TSA is used in a few places with simple locking. I tried TSA also
where locking is more complex. The problem was usually that it is
unclear which data is protected by which lock :-(. But there was
definitely some weird code where locking looked broken. So there is
some potential to find bugs.
*** Limitations of TSA besides the ones listed in [1]:
- The programmer needs to know which lock protects which piece of shared
data. This is not always easy for large classes.
- Two synchronization primitives used in ClickHouse are not annotated in
libcxx:
(1) std::unique_lock: A releaseable lock handle often together with
std::condition_variable, e.g. in solve producer-consumer problems.
(2) std::recursive_mutex: A re-entrant mutex variant. Its usage can be
considered a design flaw + typically it is slower than a standard
mutex. In this commit, one std::recursive_mutex was converted to
std::mutex and annotated with TSA.
- For free-standing functions (e.g. helper functions) which are passed
shared data members, it can be tricky to specify the associated lock.
This is because the annotations use the normal C++ rules for symbol
resolution.
[0] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[1] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42958.pdf
For async s3 writes final part flushing was defered until all the INSERT
block was processed, however in case of too many partitions/columns you
may exceed max_memory_usage limit (since each stream has overhead).
Introduce max_insert_delayed_streams_for_parallel_writes (with default
to 1000 for S3, 0 otherwise), to avoid this.
This should "Memory limit exceeded" errors in performance tests.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In #33291 final part commit had been defered, and now it can take
significantly more time, that may lead to "Part directory doesn't exist"
error during INSERT:
2022.02.21 18:18:06.979881 [ 11329 ] {insert} <Debug> executeQuery: (from 127.1:24572, user: default) INSERT INTO db.table (...) VALUES
2022.02.21 20:58:03.933593 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18044_18044_0 to 20220214_270654_270654_0.
2022.02.21 21:16:50.961917 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18197_18197_0 to 20220214_270689_270689_0.
...
2022.02.22 21:16:57.632221 [ 64878 ] {} <Warning> db.table: Removing temporary directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/
...
2022.02.23 12:23:56.277480 [ 11329 ] {insert} <Trace> db.table: Renaming temporary part tmp_insert_20220214_18232_18232_0 to 20220214_273459_273459_0.
2022.02.23 12:23:56.299218 [ 11329 ] {insert} <Error> executeQuery: Code: 107. DB::Exception: Part directory /clickhouse/data/db/table/tmp_insert_20220214_18232_18232_0/ doesn't exist. Most likely it is a logical error. (FILE_DOESNT_EXIST) (version 22.2.1.1) (from 127.1:24572) (in query: INSERT INTO db.table (...) VALUES), Stack trace (when copying this message, always include the lines below):
Follow-up for: #28760
Refs: #33291
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Before it was possible to return stale counter from
StorageMergeTree::totalRows() (that is used for
optimize_trivial_count_query) since the accounting is done in two steps:
- subtract old number of rows
<-- here the number can be zero, even though there are rows -->
- add new number of rows
This was found by CI [1] in 01615_random_one_shard_insertion test:
Here you can see that INSERT went to both tables:
<details>
2022.01.16 09:07:34.288252 [ 154369 ] {a1905be0-93da-460c-8c6f-9b5adace72a0} <Debug> DistributedBlockOutputStream: It took 0.035197041 sec. to insert 100
blocks, 2841.1479249065287 rows per second. Insertion status:
Wrote 54 blocks and 54 rows on shard 0 replica 0, localhost:9000 (average 0 ms per block, the slowest block 1 ms)
Wrote 46 blocks and 46 rows on shard 1 replica 0, localhost:9000 (average 0 ms per block, the slowest block 1 ms)
</details>
But the test fails, since select from shard1.tbl returns 0, and the
problem was concurrent merge:
<details>
2022.01.16 09:07:34.289470 [ 146495 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Debug> executeQuery: (from [::1]:42082) (comment: 01615_random_one_shard_insertion.sql) select count() != 0 from shard_0.tbl;
2022.01.16 09:07:34.289564 [ 375 ] {c7a885fa-4ef4-4dcf-a4de-1650d44fa0ab::all_1_54_9} <Debug> MergeTask::MergeProjectionsStage: Merge sorted 54 rows, containing 1 columns (1 merged, 0 gathered) in 0.00171193 sec., 31543.345814373253 rows/sec., 246.43 KiB>
2022.01.16 09:07:34.289810 [ 375 ] {c7a885fa-4ef4-4dcf-a4de-1650d44fa0ab::all_1_54_9} <Trace> shard_0.tbl (c7a885fa-4ef4-4dcf-a4de-1650d44fa0ab): Renaming temporary part tmp_merge_all_1_54_9 to all_1_54_9.
2022.01.16 09:07:34.289858 [ 146495 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Trace> ContextAccess (default): Access granted: SELECT(number) ON shard_0.tbl
2022.01.16 09:07:34.289897 [ 375 ] {c7a885fa-4ef4-4dcf-a4de-1650d44fa0ab::all_1_54_9} <Trace> shard_0.tbl (c7a885fa-4ef4-4dcf-a4de-1650d44fa0ab) (MergerMutator): Merged 6 parts: from all_1_49_8 to all_54_54_0
2022.01.16 09:07:34.289920 [ 146495 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Trace> InterpreterSelectQuery: WithMergeableState -> Complete
2022.01.16 09:07:34.289987 [ 375 ] {} <Debug> MemoryTracker: Peak memory usage Mutate/Merge: 3.12 MiB.
2022.01.16 09:07:34.290305 [ 154344 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Trace> MergingAggregatedTransform: Reading blocks of partially aggregated data.
2022.01.16 09:07:34.290332 [ 154344 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Debug> MergingAggregatedTransform: Read 1 blocks of partially aggregated data, total 1 rows.
2022.01.16 09:07:34.290343 [ 154344 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Trace> Aggregator: Merging partially aggregated single-level data.
2022.01.16 09:07:34.290358 [ 154344 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Trace> Aggregator: Merged partially aggregated single-level data.
2022.01.16 09:07:34.290366 [ 154344 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Trace> Aggregator: Converting aggregated data to blocks
2022.01.16 09:07:34.290391 [ 154344 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Debug> Aggregator: Converted aggregated data to blocks. 1 rows, 8.00 B in 1.0939e-05 sec. (91416.034 rows/sec., 714.19 KiB/sec.)
2022.01.16 09:07:34.290709 [ 146495 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Information> executeQuery: Read 1 rows, 4.01 KiB in 0.001187722 sec., 841 rows/sec., 3.30 MiB/sec.
2022.01.16 09:07:34.290774 [ 146495 ] {cd9d4cf2-7131-4179-b0b2-3aeec4045755} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
</details>
[1]: https://s3.amazonaws.com/clickhouse-test-reports/33675/7848ea7d609e4c720e8e4494eb6207c0751f5aea/stateless_tests__ubsan__actions_.html
This also fixes a race between DROP TABLE check and INSERT/merges.
v0: use Active parts instead.
v2: fix total counters accounting instead.
v2: rebase against MergeTask
v3: rebase due to conflicts in src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
v4:
- rebase due to conflicts in src/Storages/MergeTree/MergeTask.cpp
- drop common/scope_guard_safe.h (not used)