Should fix the following [1]:
2024-10-07 21:13:38 02784_connection_string: [ OK ] 9.69 sec.
2024-10-07 21:13:38 Process Process-5:
2024-10-07 21:13:38 Traceback (most recent call last):
2024-10-07 21:13:38 File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
2024-10-07 21:13:38 self.run()
2024-10-07 21:13:38 File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
2024-10-07 21:13:38 self._target(*self._args, **self._kwargs)
2024-10-07 21:13:38 File "/usr/bin/clickhouse-test", line 2609, in run_tests_process
2024-10-07 21:13:38 return run_tests_array(*args, **kwargs)
2024-10-07 21:13:38 File "/usr/bin/clickhouse-test", line 2327, in run_tests_array
2024-10-07 21:13:38 stop_tests()
2024-10-07 21:13:38 File "/usr/bin/clickhouse-test", line 445, in stop_tests
2024-10-07 21:13:38 cleanup_child_processes(os.getpid())
2024-10-07 21:13:38 File "/usr/bin/clickhouse-test", line 433, in cleanup_child_processes
2024-10-07 21:13:38 child_pgid = os.getpgid(child)
2024-10-07 21:13:38 ProcessLookupError: [Errno 3] No such process
[1]: https://s3.amazonaws.com/clickhouse-test-reports/70448/cd826389e90065466ddfef140fc344b30e8c6de0/stateless_tests__aarch64_.html
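
The traceback above looks like a race: the child exits after it has been
listed but before os.getpgid() is called on it. A minimal sketch of one
way to tolerate that, not necessarily what this patch does (safe_getpgid,
kill_child_groups and exited_child_pid are hypothetical names, not
functions from clickhouse-test):

```python
import os
import signal
import subprocess
import sys


def safe_getpgid(pid):
    """Return the process group of pid, or None if the process is already gone."""
    try:
        return os.getpgid(pid)
    except ProcessLookupError:
        # The child exited between being listed and being inspected.
        return None


def kill_child_groups(child_pids):
    """SIGKILL the process groups of the given children, never our own group.

    Returns the process group ids that were actually signalled.
    """
    own_pgid = os.getpgrp()
    killed = []
    for pid in child_pids:
        pgid = safe_getpgid(pid)
        if pgid is None or pgid == own_pgid or pgid in killed:
            continue
        try:
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            # The whole group vanished before the signal was delivered.
            continue
        killed.append(pgid)
    return killed


def exited_child_pid():
    """Helper for the example: start a child, wait for it, return its stale pid."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    return child.pid
```

With a pid obtained from exited_child_pid(), safe_getpgid() returns None
instead of raising, and kill_child_groups() just skips it.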
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Though now there are oddities with multiprocessing_manager.list():
Having 1 errors! 0 tests passed. 0 tests skipped. 2.20 s elapsed (MainProcess).
Won't run stateful tests because test data wasn't loaded.
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/managers.py", line 813, in _callmethod
conn = self._tls.connection
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 3707, in <module>
main(args)
File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 3126, in main
if len(restarted_tests) > 0:
^^^^^^^^^^^^^^^^^^^^
File "<string>", line 2, in __len__
File "/usr/lib/python3.12/multiprocessing/managers.py", line 817, in _callmethod
self._connect()
File "/usr/lib/python3.12/multiprocessing/managers.py", line 804, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/connection.py", line 519, in Client
c = SocketClient(address)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/connection.py", line 647, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
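
One plausible reading of the traceback (an assumption, not verified here)
is that the list proxy is used after the manager process is already gone.
A minimal sketch of a way around that is to copy the proxy into a plain
list while the manager is still alive. collect_restarted_tests and
_worker are hypothetical names, and the "fork" start method is assumed
(Unix only):

```python
import multiprocessing


def _worker(shared, value):
    # Runs in a child process and appends through the manager proxy.
    shared.append(value)


def collect_restarted_tests(values):
    """Fill a manager-backed list from child processes and return a plain copy."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like host
    with ctx.Manager() as manager:
        restarted = manager.list()
        for value in values:
            proc = ctx.Process(target=_worker, args=(restarted, value))
            proc.start()
            proc.join()
        # Copy while the manager server is still running.
        snapshot = list(restarted)
    # The proxy can no longer be used here, but the plain list can.
    return snapshot
```

len() on the returned list never talks to the manager, so it cannot fail
with ConnectionRefusedError.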
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Here [1] the hung check query failed:
2024.10.07 21:13:29.044675 [ 16750 ] {484a1200-d576-4c03-a82b-2d389b8e773f} <Debug> executeQuery: (from [::1]:43374) SELECT 1 /*hung check*/
(stage: Complete)
2024.10.07 21:13:29.047252 [ 16750 ] {484a1200-d576-4c03-a82b-2d389b8e773f} <Error> executeQuery: Code: 210. DB::Exception: I/O error: Broken pipe, while writing to socket ([::1]:8123 -> [::1]:43374): While executing TabSeparatedRowOutputFormat. (NETWORK_ERROR) (version 24.10.1.1368) (from [::1]:43374) (in query: SELECT 1 /*hung check*/
[1]: https://s3.amazonaws.com/clickhouse-test-reports/70448/cd826389e90065466ddfef140fc344b30e8c6de0/stateless_tests__aarch64_.html
I don't see any possible reason for this other than the client closing
the connection. I bet that the query had been sent a long time ago, but
due to a VM stall (#70473) it was not accepted in time.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
After https://github.com/ClickHouse/ClickHouse/pull/67737 the output
will be broken, because on timeout it prints to stdout.
Let's just capture stdout and append it to stderr.
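
A minimal sketch of the capture-and-append idea, assuming the command is
run via subprocess (run_and_fold_stdout is a hypothetical helper, not the
actual code in clickhouse-test):

```python
import subprocess
import sys


def run_and_fold_stdout(cmd, timeout=None):
    """Run cmd, capture both streams, return (returncode, stderr plus stdout)."""
    proc = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=False
    )
    stderr = proc.stderr
    if proc.stdout:
        # Anything printed to stdout ends up after the real stderr text,
        # so the caller's own stdout stays clean.
        stderr += proc.stdout
    return proc.returncode, stderr
```

The caller then reports only the combined stderr text and never forwards
the command's stdout.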
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously it was possible to get the original pgid from the spawned
threads, which could lead to killing the caller script, and in the case
of CI that could be the init process [1].
[1]: https://s3.amazonaws.com/clickhouse-test-reports/67737/e68c9c8d16f37f6c25739076c9b071ed97952269/stress_test__asan_/stress_test_run_21.txt
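
One way to avoid this class of problem, sketched under the assumption
that children are started via subprocess (spawn_in_own_group and
kill_group_of are hypothetical names, and this is not necessarily how the
patch does it): put each child into its own process group and refuse to
kill the caller's group.

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group(cmd):
    """Start cmd as the leader of a new session, hence of a new process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group_of(proc):
    """Kill proc's process group, refusing to touch the caller's own group."""
    pgid = os.getpgid(proc.pid)
    if pgid == os.getpgrp():
        # Killing this group would take down the caller (or init in CI).
        raise RuntimeError("refusing to kill our own process group")
    os.killpg(pgid, signal.SIGKILL)
    return proc.wait()
```

A child started this way has pgid equal to its own pid, so killing its
group cannot reach the shell or the script that launched the tests.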
Repro:
$ echo "SELECT '1" > tests/queries/0_stateless/00001_select_1.sql # break the test
$ cat /tmp/test.sh
./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
Before this change:
$ /tmp/test.sh
Using queries from '/src/ch/worktrees/clickhouse-upstream/tests/queries' directory
Connecting to ClickHouse server... OK
Connected to server 24.8.1.1 @ bef896ce143ea4e0464c9829de6277ba06cc1a53 mt/rename-without-lock-v2
Running 3 stateless tests (MainProcess).
00001_select_1: [ FAIL ]
Reason: return code: 62
Code: 62. DB::Exception: Syntax error: failed at position 8 (''1;
'): '1;
. Single quoted string is not closed: ''1;
'. (SYNTAX_ERROR)
, result:
stdout:
Database: test_hz2zwymr
Child processes of 13041:
13042 python3 ./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
Killing process group 13040
Processes in process group 13040:
13040 -bash
13042 python3 ./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
[2]+ Stopped /tmp/test.sh
[1]$ Process group 13040 should be killed
Max failures chain
[2]+ Killed /tmp/test.sh
After:
$ /tmp/test.sh
Using queries from '/src/ch/worktrees/clickhouse-upstream/tests/queries' directory
Connecting to ClickHouse server... OK
Connected to server 24.8.1.1 @ bef896ce143ea4e0464c9829de6277ba06cc1a53 mt/rename-without-lock-v2
Running 3 stateless tests (MainProcess).
00001_select_1: [ FAIL ]
Reason: return code: 62
Code: 62. DB::Exception: Syntax error: failed at position 8 (''1;
'): '1;
. Single quoted string is not closed: ''1;
'. (SYNTAX_ERROR)
, result:
stdout:
Database: test_urz6rk5z
Child processes of 9782:
9785 python3 ./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
Max failures chain
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It was simply wrong before, but now, with stack trace capturing that can
take some time, it is a must.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>