clickhouse-test: fix shared list object (by fixing manager lifetime)

Right now it is possible to get the following error:

    Having 20 errors! 0 tests passed. 0 tests skipped. 57.37 s elapsed (MainProcess).
    Won't run stateful tests because test data wasn't loaded.
    Traceback (most recent call last):
      File "/usr/lib/python3.9/multiprocessing/managers.py", line 802, in _callmethod
        conn = self._tls.connection
    AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 1462, in <module>
        main(args)
      File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 1261, in main
        if len(restarted_tests) > 0:
      File "<string>", line 2, in __len__
      File "/usr/lib/python3.9/multiprocessing/managers.py", line 806, in _callmethod
        self._connect()
      File "/usr/lib/python3.9/multiprocessing/managers.py", line 793, in _connect
        conn = self._Client(self._token.address, authkey=self._authkey)
      File "/usr/lib/python3.9/multiprocessing/connection.py", line 507, in Client
        c = SocketClient(address)
      File "/usr/lib/python3.9/multiprocessing/connection.py", line 635, in SocketClient
        s.connect(address)
    ConnectionRefusedError: [Errno 111] Connection refused

The reason is that the manager's process was terminated by SIGTERM:

    ipdb> p restarted_tests._manager._process
    <ForkProcess name='SyncManager-1' pid=25125 parent=24939 stopped exitcode=-SIGTERM>
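
A minimal sketch of the failure mode and of the workaround, not the actual
clickhouse-test code. The function name demo(), the sample test names and the
use of manager.shutdown() as a stand-in for the SIGTERM are assumptions made
for illustration. The "fork" start method is requested explicitly to match the
ForkProcess above, so the sketch is POSIX-only.

```python
import multiprocessing


def demo():
    # "fork" matches the ForkProcess seen in the ipdb output (POSIX only).
    ctx = multiprocessing.get_context("fork")
    manager = ctx.Manager()

    # Hypothetical stand-in for the shared restarted_tests list.
    restarted_tests = manager.list(["test_a", "test_b"])

    # Materialize the proxy into a plain list while the manager is alive.
    snapshot = [*restarted_tests]

    # Assumption: shutdown() stands in for the manager receiving SIGTERM.
    manager.shutdown()

    try:
        # Every proxy call needs a live connection to the manager process.
        len(restarted_tests)
        proxy_failed = False
    except (OSError, EOFError):
        # ConnectionRefusedError and BrokenPipeError are OSError subclasses.
        proxy_failed = True

    return snapshot, proxy_failed
```

After the manager is gone, the snapshot is still an ordinary list, while any
call through the proxy raises. The exact exception type may differ from the
traceback above depending on whether the calling thread already had a
connection to the manager.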

Refs: #29259 (cc: @vdimir)
Follow-up for: #29197 (cc: @tavplubix)
Azat Khuzhin 2021-09-27 21:10:59 +03:00
parent e9749b0027
commit 985e8ee061

@@ -65,11 +65,20 @@ def signal_handler(sig, frame):
def stop_tests():
    global stop_tests_triggered_lock
    global stop_tests_triggered
    global restarted_tests
    with stop_tests_triggered_lock:
        if not stop_tests_triggered.is_set():
            stop_tests_triggered.set()
            # materialize multiprocessing.Manager().list() object before
            # sending SIGTERM since this object is a proxy that requires
            # communicating with the manager process, but after SIGTERM is
            # sent, this process will die, and you will get
            # ConnectionRefusedError for any access to the "restarted_tests"
            # variable.
            restarted_tests = [*restarted_tests]
            # send signal to all processes in group to avoid hung check triggering
            # (to avoid terminating clickhouse-test itself, the signal should be ignored)
            signal.signal(signal.SIGTERM, signal.SIG_IGN)