Stacktrace:
...
2 DB::WriteBuffer::nextIfAtEnd (this=0x0) at ../dbms/src/IO/WriteBuffer.h:66
3 DB::writeVarUInt (ostr=..., x=3) at ../dbms/src/IO/VarInt.h:191
4 DB::Connection::sendCancel (this=0x7f0b7f5e5610) at ../dbms/src/Client/Connection.cpp:444
5 0x0000561738d0b565 in DB::MultiplexedConnections::sendCancel (this=0x7f0c9b0173e0) at ../dbms/src/Client/MultiplexedConnections.cpp:174
6 0x00005617387d22df in DB::RemoteBlockInputStream::tryCancel (this=0x7f0cc45f9810, reason=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:352
7 0x00005617387d2a37 in DB::RemoteBlockInputStream::cancel (this=<optimized out>, kill=false) at ../dbms/src/DataStreams/RemoteBlockInputStream.cpp:121
8 0x000056173891f10c in DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::cancel (this=0x7f0c370ec168, kill=kill@entry=false)
9 0x000056173892cfce in DB::UnionBlockInputStream::cancel (kill=false, this=<optimized out>) at ../dbms/src/DataStreams/UnionBlockInputStream.h:99
10 DB::UnionBlockInputStream::Handler::onException (this=0x7f0c370ec160, exception=...) at ../dbms/src/DataStreams/UnionBlockInputStream.h:240
11 DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::thread (this=0x7f0c370ec168, thread_group=..., thread_num=<optimized out>)
at ../dbms/src/DataStreams/ParallelInputsProcessor.h:217
In the onException frame, the exception message is:
(gdb) p ((DB::Exception *)exception._M_exception_object)._msg._M_dataplus._M_p
$6 = (...) 0x7f0c9b1e7a40 "Unknown packet 9 from server HOSTNAME:PORT"
On "Unknown packet", disconnect() has already been called (see the default
case of the switch statement in Connection::receivePacket()), so the later
sendCancel() writes to a null buffer (this=0x0 in frame 2).
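
The failure mode above can be reproduced in miniature. The following is only a sketch, not the actual ClickHouse code or the actual fix: ToyConnection and its members are hypothetical, and a std::string stands in for the write buffer. It shows how a disconnect() that drops the buffer makes a later sendCancel() unsafe unless the buffer is checked first.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy model of a connection; names are illustrative only.
struct ToyConnection
{
    // Stand-in for the output buffer that disconnect() destroys.
    std::unique_ptr<std::string> out = std::make_unique<std::string>();

    // After this call the buffer pointer is null, as in frame 2 (this=0x0).
    void disconnect() { out.reset(); }

    // Returns false instead of dereferencing a null buffer.
    bool sendCancel()
    {
        if (!out)
            return false;       // already disconnected, nothing to cancel
        out->push_back('\x03'); // x=3 is the value written in frame 3
        return true;
    }
};
```

With the check in place, calling sendCancel() after disconnect() becomes a no-op rather than a null dereference.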
* Added a limit on how many errors a replica can accumulate
* Decreased the default error halving time to 60 seconds
* Made both configurable via settings
* Show the error count and estimated recovery time for each replica in system.clusters
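
To illustrate how a cap and a halving time interact, here is a minimal sketch. It is not the actual implementation: the struct, its member names and the integer-halving rule are assumptions made for illustration. The error count is clamped to a maximum, halved once per elapsed halving period, and the recovery estimate is the time until the count reaches zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of per-replica error accounting; names are illustrative.
struct ReplicaErrors
{
    uint64_t error_count = 0;
    uint64_t max_error_count;    // cap on accumulated errors
    uint64_t halving_period_sec; // time after which the count is halved

    ReplicaErrors(uint64_t cap, uint64_t halving_period)
        : max_error_count(cap), halving_period_sec(halving_period)
    {
        assert(halving_period > 0);
    }

    // Record one error, never exceeding the cap.
    void onError() { error_count = std::min(error_count + 1, max_error_count); }

    // Halve the count once per full halving period that has elapsed.
    void decay(uint64_t elapsed_sec)
    {
        const uint64_t halvings = elapsed_sec / halving_period_sec;
        error_count = halvings >= 64 ? 0 : error_count >> halvings;
    }

    // Seconds until the count decays to zero under integer halving.
    uint64_t estimatedRecoverySec() const
    {
        uint64_t halvings = 0;
        for (uint64_t c = error_count; c > 0; c >>= 1)
            ++halvings;
        return halvings * halving_period_sec;
    }
};
```

Under these assumptions the cap bounds the recovery time: with a cap of 1000 and a 60-second halving period, a replica needs at most 10 halvings (600 seconds) to recover, however many errors it produced.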
For cross-replication topology setups, load_balancing=in_order works best,
as nodes handle an equal amount of load and usually hit only 1/n of the
data (n = number of replicas), which makes page cache usage more
efficient.
The problem arises when one node of the shard goes down: the next replica
in the config then handles twice the usual load, while the remaining ones
handle the usual traffic.
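
The load imbalance can be checked with a small simulation. This is a sketch under an assumed layout, not taken from the source: n nodes form a ring, shard i lists its replicas as nodes i, i+1, ... in config order, and in_order picks the first live replica. The function name inOrderLoad is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns how many shards each node serves when every shard picks the first
// live replica in config order (assumed ring layout, see the text above).
std::vector<int> inOrderLoad(const std::vector<bool> & alive, std::size_t replicas_per_shard)
{
    const std::size_t n = alive.size();
    assert(replicas_per_shard >= 1 && replicas_per_shard <= n);

    std::vector<int> load(n, 0);
    for (std::size_t shard = 0; shard < n; ++shard)
    {
        for (std::size_t k = 0; k < replicas_per_shard; ++k)
        {
            const std::size_t node = (shard + k) % n;
            if (alive[node])
            {
                ++load[node]; // first live replica takes the whole shard
                break;
            }
        }
    }
    return load;
}
```

With four nodes, two replicas per shard and node 1 down, node 2 serves two shards while nodes 0 and 3 still serve one each, which matches the doubling described above.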
Closes #4820.