Lambda erase_from_active() captures the item (TaskRuntimeDataPtr), most
of the code path is OK, since it explicitly reset the item->task.
However one is not, when it moves the item to pending list, which will
be cleaned up when the table will be DROP/DETACH'ed, from
MergeTreeBackgroundExecutor::removeTasksCorrespondingToStorage(),
and in this case if IStorage will be already removed, then it will lead
to use-after-free on destroying the lambda, since it captures the item
by value.
And I belive that CI founds this issue here [1]:
<details>
<summary>stack trace</summary>
4 0x268d1354 in DB::ReplicatedMergeTreeQueue::CurrentlyExecuting::~CurrentlyExecuting() build_docker/../src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:1510:33
5 0x268ead1a in std::__1::default_delete<DB::ReplicatedMergeTreeQueue::CurrentlyExecuting>::operator()(DB::ReplicatedMergeTreeQueue::CurrentlyExecuting*) const build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:54:5
6 0x268ead1a in std::__1::unique_ptr<DB::ReplicatedMergeTreeQueue::CurrentlyExecuting, std::__1::default_delete<DB::ReplicatedMergeTreeQueue::CurrentlyExecuting> >::reset(DB::ReplicatedMergeTreeQueue::CurrentlyExecuting*) build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:315:7
7 0x268ead1a in std::__1::unique_ptr<DB::ReplicatedMergeTreeQueue::CurrentlyExecuting, std::__1::default_delete<DB::ReplicatedMergeTreeQueue::CurrentlyExecuting> >::~unique_ptr() build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:269:19
8 0x268ead1a in DB::ReplicatedMergeTreeQueue::SelectedEntry::~SelectedEntry() build_docker/../src/Storages/MergeTree/ReplicatedMergeTreeQueue.h:351:12
9 0x268ead1a in void std::__1::__destroy_at<DB::ReplicatedMergeTreeQueue::SelectedEntry, 0>(DB::ReplicatedMergeTreeQueue::SelectedEntry*) build_docker/../contrib/libcxx/include/__memory/construct_at.h:56:13
...
16 0x265e9abb in DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>)::'lambda'()::~() build_docker/../src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp:127:30
17 0x265e9abb in DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(std::__1::shared_ptr<DB::TaskRuntimeData>) build_docker/../src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp:235:1
18 0x265ea730 in DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::threadFunction() build_docker/../src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp:261:13
</details>
[1]: https://s3.amazonaws.com/clickhouse-test-reports/0/533c7888038453c047df816f3f65292cca05a54f/stateless_tests__ubsan__actions_.html
And I also I think that the following issue will also be fixed [2]:
<details>
<summary>stack trace</summary>
[ 680 ] {} <Fatal> : Logical error: 'Memory tracker: allocations not allowed.'.
...
[ 23430 ] {} <Fatal> BaseDaemon: 23. /build/build_docker/../src/Common/formatReadable.cpp:46: formatReadableSizeWithBinarySuffix(double, int) @ 0x1713bb98 in /usr/bin/clickhouse
...
[ 23430 ] {} <Fatal> BaseDaemon: 31. ../src/Common/MemoryTracker.cpp:112: MemoryTracker::logPeakMemoryUsage() @ 0x170e0ab9 in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 32. /build/build_docker/../src/Common/MemoryTracker.cpp:98: MemoryTracker::~MemoryTracker() @ 0x170e063a in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 33. /build/build_docker/../src/Storages/MergeTree/MergeList.cpp:144: DB::MergeListElement::~MergeListElement() @ 0x279fb290 in /usr/bin/clickhouse
...
[ 23430 ] {} <Fatal> BaseDaemon: 38. /build/build_docker/../src/Storages/MergeTree/BackgroundProcessList.h:41: DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>::~BackgroundProcessListEntry() @ 0x276ce6c7 in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 39. /build/build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:54: std::__1::default_delete<DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo> >::operator()(DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>*) const @ 0x276ce60b in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 40. /build/build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:316: std::__1::unique_ptr<DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>, std::__1::default_delete<DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo> > >::reset(DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>*) @ 0x276ce57c in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 41. /build/build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:269: std::__1::unique_ptr<DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>, std::__1::default_delete<DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo> > >::~unique_ptr() @ 0x276ce399 in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 42. /build/build_docker/../src/Storages/MergeTree/MutatePlainMergeTreeTask.h:22: DB::MutatePlainMergeTreeTask::~MutatePlainMergeTreeTask() @ 0x27defceb in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 43. /build/build_docker/../contrib/libcxx/include/__memory/construct_at.h:57: void std::__1::__destroy_at<DB::MutatePlainMergeTreeTask, 0>(DB::MutatePlainMergeTreeTask*) @ 0x27dd69c1 in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: 44. /build/build_docker/../contrib/libcxx/include/__memory/construct_at.h:82: void std::__1::destroy_at<DB::MutatePlainMergeTreeTask, 0>(DB::MutatePlainMergeTreeTask*) @ 0x27dd6955 in /usr/bin/clickhouse
[ 23430 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read. (calculated checksum: 91F5937571C11255DFE73230B52CE9C0)
[ 602 ] {} <Fatal> Application: Child process was terminated by signal 6.
</details>
[2]: https://s3.amazonaws.com/clickhouse-test-reports/39222/a068c397dfd7943359a8b554566c3c70b78baf8d/stateless_tests__debug__actions__%5B1/3%5D.html
Refs: https://github.com/ClickHouse/ClickHouse/pull/29614#discussion_r720455032 (cc @nikitamikhaylov)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
getauxval() from glibc-compatibility did not work always correctly:
- It does not work after setenv(), and this breaks vsyscalls,
like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone
is defined, which is true for CI [2]).
Also note, that fixing setenv() will not fix LSan,
since the culprit is getauxval()
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404
[2]: ClickHouse#32928 (comment)
- Another think that is definitely broken is LSan (Leak Sanitizer), it
relies on worked getauxval() but it does not work if __environ is not
initialized yet (there is even a commit about this).
And because of, at least, one leak had been introduced [3]:
[3]: ClickHouse#33840
Fix this by using /proc/self/auxv with fallback to environ solution to
make it compatible with environment that does not allow reading from
auxv (or no procfs).
v2: add fallback to environ solution
v3: fix return value for __auxv_init_procfs()
Refs: #33957
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>