It is possible for network PartitionManager to work incorrectly, because
of how docker setting up forward to DOCKER-USER chain, it first removes
forward and then adds it back (see [1] and [2]), however this introduce
race for a short period of time, and this is enough for TCP to
retransmit packets, and breaks network PartitionManager.
[1]: b1e30e8328/libnetwork/iptables/iptables.go (L638)
[2]: b1e30e8328/libnetwork/firewall_linux.go (L42)
Here are some details from logs for [3]:
2022-04-27 03:01:00 [ 621 ] DEBUG : Executing query SELECT node FROM distributed_table ORDER BY node on node2 (cluster.py:2879, query_and_get_error)
[3]: https://s3.amazonaws.com/clickhouse-test-reports/36295/314d553ab14d30df7508814513506ec09c7c7061/integration_tests__asan__actions__[2/3]/integration_run_parallel1_0.log
This query fails, from the server logs:
2022.04.27 03:01:00.213101 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> executeQuery: (from 172.16.5.1:59008) SELECT node FROM distributed_table ORDER BY node
...
2022.04.27 03:01:03.578439 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> Connection (node1:9000): Sent data for 2 scalars, total 2 rows in 0.000284672 sec., 6993 rows/sec., 68.00 B (232.15 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (505.16 KiB/sec.)
2022.04.27 03:01:03.590637 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MergingSortedTransform: Merge sorted 3 blocks, 2 rows in 3.371592744 sec., 0.5931914533744174 rows/sec., 94.61 B/sec
2022.04.27 03:01:03.601256 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Information> executeQuery: Read 2 rows, 28.00 B in 3.387950542 sec., 0 rows/sec., 8.26 B/sec.
2022.04.27 03:01:03.601894 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MemoryTracker: Peak memory usage (for query): 334.38 KiB.
And from docker daemon log:
time="2022-04-27T03:00:59.916693113Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:00.030654116Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
...
time="2022-04-27T03:01:03.515813984Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
time="2022-04-27T03:01:03.531106486Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
time="2022-04-27T03:01:03.535442346Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.555856911Z" level=debug msg="/usr/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
time="2022-04-27T03:01:03.564905764Z" level=debug msg="/usr/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
...
time="2022-04-27T03:01:03.706374466Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
time="2022-04-27T03:01:03.968077970Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
I've tried multiple ways of fixing this:
- Creating separate chain for rules from PartitionManager (DOCKER-USER-CLICKHOUSE)
But it is created only once, and docker places new rules on top of the
FORWARD chain, so it will not work, since it will not receive any
packets
- Use DOCKER-USER, but replace iptables with a wrapper ([script]), that
will ignore recreating of a rule for forward to DOCKER-USER, but this
will not work too, since new docker rules will be created on top of
FORWARD chain, and so DOCKER-USER will packets.
[script]:
if [[ "$*" =~ "-D FORWARD -j DOCKER-USER" ]]; then
exit 0
fi
if [[ "$*" =~ "-I FORWARD -j DOCKER-USER" ]]; then
if iptables.real iptables -C FORWARD -j DOCKER-USER; then
exit 0
fi
fi
- And the only way to avoid flakiness for this case, is to forbid
parallel execution for tests with PartitionManager.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>