After #8756 the problem with 1 thread for each (distributed table, disk)
for distributed sends became even worse (since there can be multiple
disks), so use predefined thread pool for this tasks, that can be
controlled with background_distributed_schedule_pool_size knob.
Add missing conversion (via ConvertingBlockInputStream) for INSERT into
remote nodes (for sync insert, async insert and async batch insert),
like for local nodes (in DistributedBlockOutputStream::writeBlockConverted).
This is required when the structure of the Distributed table differs
from the structure of the local table.
And also add a warning message, to highlight this in logs (since this
works slower).
Fixes: #19888