1. Moved Volume to separate file
2. Created IVolume interface and implemented current behaviour in implementation of new interface — VolumeJBOD
3. Replaced all old volume usages with new VolumeJBOD. Where it is unnecessary to have JBOD — left just IVolume.
4. Removed old Volume completely
5. Moved StoragePolicy to separated files
6. Moved DiskSelector to separated files
7. Removed DiskSpaceMonitor file
Before this patch it printed 3 times:
- from StorageDistributed::getProcessingStageImpl()
- from StorageDistributed::read()
- from StorageDistributed::getProcessingStageImpl() (from StorageDistributed::read() -> getSampleBlock())
(But this should be optimized)
I know at least one way to fool that optimization, by using as sharding
key something like `if(col1>0, col1, col2)` (although this is not common
sharding key I would say, but can be useful if this will work
correctly), so let's disable it by default.
StorageDistributed::shutdown() does not acquire the lock, that controls
access to the cluster_nodes_data, thus it does not synced with the
requireDirectoryMonitor(), hence some monitors can be untracked that
will trigger UAF (use-after-free) after DROP TABLE dist:
This is for the SIGSEGV from the DirectoryMonitor (with already destroyed storage):
0 0x0000000008e9f760 in std::__1::__cxx_atomic_load<int> (__order=std::__1::memory_order::seq_cst, __a=0x0)
1 std::__1::__atomic_base<int, false>::load (__m=std::__1::memory_order::seq_cst, this=0x0) <-- this is nullptr
2 std::__1::__atomic_base<int, false>::operator int (this=0x0)
3 DB::ActionBlocker::isCancelled (this=0x7f85e31c9bb8) at ../src/Common/ActionBlocker.h:18
4 DB::StorageDistributedDirectoryMonitor::run (this=0x7f85f93b2a00) at ../src/Storages/Distributed/DirectoryMonitor.cpp:140
After #8756 the problem with 1 thread for each (distributed table, disk)
for distributed sends became even worse (since there can be multiple
disks), so use predefined thread pool for this tasks, that can be
controlled with background_distributed_schedule_pool_size knob.
* "lock_acquire_timeout" controls for how long a query will continue to
acquire each lock on its argument tables
* "lock_acquire_timeout_for_background_operations" is a per-table
setting for storages of *MergeTree family