The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
previously allowed.
Hence, this change
- removes shared_ptr_helper and as a result all inherited create() methods,
- instead, Storage objects are now created using make_shared<>() by the
caller (for that to work, many constructors had to be made public), and
- all Storage classes were marked as noncopyable using boost::noncopyable.
In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
There are two possible cases for execution merges/mutations:
1) from background thread
2) from OPTIMIZE TABLE query
1) is pretty simple, it's memory tracking structure is as follow:
current_thread::memory_tracker = level=Thread / description="(for thread)" ==
background_thread_memory_tracker = level=Thread / description="(for thread)"
current_thread::memory_tracker.parent = level=Global / description="(total)"
So as you can see it is pretty simple and MemoryTrackerThreadSwitcher
does not do anything icky for this case.
2) is complex, it's memory tracking structure is as follow:
current_thread::memory_tracker = level=Thread / description="(for thread)"
current_thread::memory_tracker.parent = level=Process / description="(for query)" ==
background_thread_memory_tracker = level=Process / description="(for query)"
Before this patch to track memory (and related things, like sampling,
profiling and so on) for OPTIMIZE TABLE query dirty hacks was done to
do this, since current_thread memory_tracker was of Thread scope, that
does not have any limits.
And so if will change parent for it to Merge/Mutate memory tracker
(which also does not have some of settings) it will not be correctly
tracked.
To address this Merge/Mutate was set as parent not to the
current_thread memory_tracker but to it's parent, since it's scope is
Process with all settings.
But that parent's memory_tracker is the memory_tracker of the
thread_group, and so if you will have nested ThreadPool inside
merge/mutate (this is the case for s3 async writes, which has been
added in #33291) you may get use-after-free of memory_tracker.
Consider the following example:
MemoryTrackerThreadSwitcher()
thread_group.memory_tracker.parent = merge_list_entry->memory_tracker
(see also background_thread_memory_tracker above)
CurrentThread::attachTo()
current_thread.memory_tracker.parent = thread_group.memory_tracker
CurrentThread::detachQuery()
current_thread.memory_tracker.parent = thread_group.memory_tracker.parent
# and this is equal to merge_list_entry->memory_tracker
~MemoryTrackerThreadSwitcher()
thread_group.memory_tracker = thread_group.memory_tracker.parent
So after the following we will get incorrect memory_tracker (from the
mege_list_entry) when the next job in that ThreadPool will not have
thread_group, since in this case it will not try to update the
current_thread.memory_tracker.parent and use-after-free will happens.
So to address the (2) issue, settings from the parent memory_tracker
should be copied to the merge_list_entry->memory_tracker, to avoid
playing with parent memory tracker.
Note, that settings from the query (OPTIMIZE TABLE) is not available at
that time, so it cannot be used (instead of parent's memory tracker
settings).
v2: remove memory_tracker.setOrRaiseHardLimit() from settings
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>