ClickHouse/tests/integration/parallel_skip.json
Azat Khuzhin f886c4237e tests/integration: fix possible race for iptables user rules inside containers
It is possible for network PartitionManager to work incorrectly, because
of how docker setting up forward to DOCKER-USER chain, it first removes
forward and then adds it back (see [1] and [2]), however this introduce
race for a short period of time, and this is enough for TCP to
retransmit packets, and breaks network PartitionManager.

  [1]: b1e30e8328/libnetwork/iptables/iptables.go (L638)
  [2]: b1e30e8328/libnetwork/firewall_linux.go (L42)

Here are some details from logs for [3]:

    2022-04-27 03:01:00 [ 621 ] DEBUG : Executing query SELECT node FROM distributed_table ORDER BY node on node2 (cluster.py:2879, query_and_get_error)

  [3]: https://s3.amazonaws.com/clickhouse-test-reports/36295/314d553ab14d30df7508814513506ec09c7c7061/integration_tests__asan__actions__[2/3]/integration_run_parallel1_0.log

This query fails, from the server logs:

    2022.04.27 03:01:00.213101 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> executeQuery: (from 172.16.5.1:59008) SELECT node FROM distributed_table ORDER BY node
    ...
    2022.04.27 03:01:03.578439 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> Connection (node1:9000): Sent data for 2 scalars, total 2 rows in 0.000284672 sec., 6993 rows/sec., 68.00 B (232.15 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (505.16 KiB/sec.)
    2022.04.27 03:01:03.590637 [ 223 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MergingSortedTransform: Merge sorted 3 blocks, 2 rows in 3.371592744 sec., 0.5931914533744174 rows/sec., 94.61 B/sec
    2022.04.27 03:01:03.601256 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Information> executeQuery: Read 2 rows, 28.00 B in 3.387950542 sec., 0 rows/sec., 8.26 B/sec.
    2022.04.27 03:01:03.601894 [ 10 ] {19b1719f-8c39-4e3e-b782-aa4c933650f2} <Debug> MemoryTracker: Peak memory usage (for query): 334.38 KiB.

And from docker daemon log:

    time="2022-04-27T03:00:59.916693113Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
    time="2022-04-27T03:01:00.030654116Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-I\",\"DOCKER-USER\",\"1\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
    ...
    time="2022-04-27T03:01:03.515813984Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
    time="2022-04-27T03:01:03.531106486Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
    time="2022-04-27T03:01:03.535442346Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
    time="2022-04-27T03:01:03.555856911Z" level=debug msg="/usr/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
    time="2022-04-27T03:01:03.564905764Z" level=debug msg="/usr/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
    ...
    time="2022-04-27T03:01:03.706374466Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.3\",\"-d\",\"172.16.5.2\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"
    time="2022-04-27T03:01:03.968077970Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"iptables\",\"--wait\",\"-D\",\"DOCKER-USER\",\"-p\",\"tcp\",\"-s\",\"172.16.5.2\",\"-d\",\"172.16.5.3\",\"-j\",\"DROP\"],\"Container\":\"b75f3b68cda51386bfbb9cceb67e92c4d217a5a1660bde2470b583cb1f4c7fc4\",\"Privileged\":true,\"Tty\":false,\"User\":\"\"}"

I've tried multiple ways of fixing this:

- Creating separate chain for rules from PartitionManager (DOCKER-USER-CLICKHOUSE)
  But it is created only once, and docker places new rules on top of the
  FORWARD chain, so it will not work, since it will not receive any
  packets

- Use DOCKER-USER, but replace iptables with a wrapper ([script]), that
  will ignore recreating of a rule for forward to DOCKER-USER, but this
  will not work too, since new docker rules will be created on top of
  FORWARD chain, and so DOCKER-USER will packets.

  [script]:

    if [[ "$*" =~ "-D FORWARD -j DOCKER-USER" ]]; then
        exit 0
    fi
    if [[ "$*" =~ "-I FORWARD -j DOCKER-USER" ]]; then
        if iptables.real iptables -C FORWARD -j DOCKER-USER; then
            exit 0
        fi
    fi

- And the only way to avoid flakiness for this case, is to forbid
  parallel execution for tests with PartitionManager.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-13 10:57:24 +03:00

51 lines
3.2 KiB
JSON

[
"test_dns_cache/test.py::test_dns_cache_update",
"test_dns_cache/test.py::test_ip_change_drop_dns_cache",
"test_dns_cache/test.py::test_ip_change_update_dns_cache",
"test_dns_cache/test.py::test_user_access_ip_change[node0]",
"test_dns_cache/test.py::test_user_access_ip_change[node1]",
"test_dns_cache/test.py::test_host_is_drop_from_cache_after_consecutive_failures",
"test_atomic_drop_table/test.py::test_atomic_delete_with_stopped_zookeeper",
"test_attach_without_fetching/test.py::test_attach_without_fetching",
"test_cleanup_dir_after_bad_zk_conn/test.py::test_cleanup_dir_after_bad_zk_conn",
"test_cleanup_dir_after_bad_zk_conn/test.py::test_attach_without_zk",
"test_consistent_parts_after_clone_replica/test.py::test_inconsistent_parts_if_drop_while_replica_not_active",
"test_cross_replication/test.py::test",
"test_ddl_worker_non_leader/test.py::test_non_leader_replica",
"test_delayed_replica_failover/test.py::test",
"test_dictionary_allow_read_expired_keys/test_default_reading.py::test_default_reading",
"test_dictionary_allow_read_expired_keys/test_dict_get.py::test_simple_dict_get",
"test_dictionary_allow_read_expired_keys/test_dict_get_or_default.py::test_simple_dict_get_or_default",
"test_disabled_mysql_server/test.py::test_disabled_mysql_server",
"test_distributed_respect_user_timeouts/test.py::test_reconnect",
"test_https_replication/test.py::test_replication_after_partition",
"test_insert_into_distributed/test.py::test_reconnect",
"test_insert_into_distributed/test.py::test_inserts_batching",
"test_insert_into_distributed_through_materialized_view/test.py::test_reconnect",
"test_insert_into_distributed_through_materialized_view/test.py::test_inserts_batching",
"test_keeper_multinode_blocade_leader/test.py::test_blocade_leader",
"test_keeper_multinode_blocade_leader/test.py::test_blocade_leader_twice",
"test_keeper_multinode_simple/test.py::test_session_expiration",
"test_keeper_two_nodes_cluster/test.py::test_read_write_two_nodes_with_blocade",
"test_limited_replicated_fetches/test.py::test_limited_fetches",
"test_materialized_mysql_database/test.py::test_network_partition_5_7",
"test_materialized_mysql_database/test.py::test_network_partition_8_0",
"test_mysql_database_engine/test.py::test_restart_server",
"test_parts_delete_zookeeper/test.py::test_merge_doesnt_work_without_zookeeper",
"test_quorum_inserts_parallel/test.py::test_parallel_quorum_actually_quorum",
"test_random_inserts/test.py::test_random_inserts",
"test_redirect_url_storage/test.py::test_url_reconnect",
"test_replace_partition/test.py::test_drop_failover",
"test_replace_partition/test.py::test_replace_after_replace_failover",
"test_replicated_database/test.py::test_recover_staled_replica",
"test_replicated_database/test.py::test_startup_without_zk",
"test_replicated_database/test.py::test_sync_replica",
"test_replicated_fetches_timeouts/test.py::test_no_stall",
"test_storage_kafka/test.py::test_kafka_no_holes_when_write_suffix_failed",
"test_storage_s3/test.py::test_url_reconnect_in_the_middle",
"test_system_metrics/test.py::test_readonly_metrics",
"test_system_replicated_fetches/test.py::test_system_replicated_fetches",
"test_zookeeper_config_load_balancing/test.py::test_round_robin"
]