This also fixes hardlink code (when one file should be sent to multiple
servers, i.e. internal_replication == false) of writeToShard() with
distributed_storage_policy (i.e. when StorageDistributed::getPath() will
path to different filesystems).
Plus also cleanup DistributedBlockOutputStream::writeToShard() a little.
After #8756 the problem with 1 thread for each (distributed table, disk)
for distributed sends became even worse (since there can be multiple
disks), so use predefined thread pool for this tasks, that can be
controlled with background_distributed_schedule_pool_size knob.