INSERT into Distributed with insert_distributed_sync=1 stores the
distributed batches on the disk for sending in background.
But types may be a little bit different for the Distributed and it's
underlying table, so the initiator need to know whether conversion is
required or not.
Before this patch those on disk distributed batches contains header,
which includes dumpStructure() for the block in that batch, however it
checks not only names and types and plus dumpStructure() is a debug
method.
So instead of storing string representation for the block header we
should store empty block in the file header (note, that we cannot store
the empty block not in header, since this will require reading all
blocks from file, due to some trickery of the readers interface).
Note, that this patch also contains tiny refactoring:
- s/header/distributed_header/
v1: dumpNamesAndTypes()
v2: dump empty block into the batch itself
v3: move empty block into the header
Found with fuzzer [1]:
<details>
2021.04.04 10:25:46.762596 [ 135 ] {4b6b5de3-d242-4267-910a-76b235467356} <Fatal> : Logical error: 'Logical error: no information about file column%2Ename.size0 in StorageLog'.
...
2021.04.04 10:25:46.763563 [ 44 ] {} <Trace> BaseDaemon: Received signal 6
2021.04.04 10:25:46.765884 [ 165 ] {} <Fatal> BaseDaemon: 5. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.04.04 10:25:46.901540 [ 165 ] {} <Fatal> BaseDaemon: 6. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:51: DB::handle_error_code() @ 0x11cc4f16 in /workspace/clickhouse
2021.04.04 10:25:47.030755 [ 165 ] {} <Fatal> BaseDaemon: 7. ./obj-x86_64-linux-gnu/../src/Common/Exception.cpp:58: DB::Exception::Exception() @ 0x11cc5025 in /workspace/clickhouse
2021.04.04 10:25:47.617201 [ 165 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:183: DB::LogSource::readData()
</details>
[1]: https://clickhouse-test-reports.s3.yandex.net/22583/69296876005c0fa171c755f8b224e4d58192c402/fuzzer_debug/report.html#fail1
The problem here is that there is no column%2Ename.size0 there is
column.size0 instead, and it seems that reading sizeX file is not enough
in other places, so just filter them out in
ExpressionActions::getSmallestColumn() (fallback method for storages w/o
getColumnSizes() implemented)