Previously the spawned threads could end up with the original pgid,
which could lead to killing the caller script; in CI that could even be
the init process [1].
[1]: https://s3.amazonaws.com/clickhouse-test-reports/67737/e68c9c8d16f37f6c25739076c9b071ed97952269/stress_test__asan_/stress_test_run_21.txt
Repro:
$ echo "SELECT '1" > tests/queries/0_stateless/00001_select_1.sql # break the test
$ cat /tmp/test.sh
./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
Before this change:
$ /tmp/test.sh
Using queries from '/src/ch/worktrees/clickhouse-upstream/tests/queries' directory
Connecting to ClickHouse server... OK
Connected to server 24.8.1.1 @ bef896ce143ea4e0464c9829de6277ba06cc1a53 mt/rename-without-lock-v2
Running 3 stateless tests (MainProcess).
00001_select_1: [ FAIL ]
Reason: return code: 62
Code: 62. DB::Exception: Syntax error: failed at position 8 (''1;
'): '1;
. Single quoted string is not closed: ''1;
'. (SYNTAX_ERROR)
, result:
stdout:
Database: test_hz2zwymr
Child processes of 13041:
13042 python3 ./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
Killing process group 13040
Processes in process group 13040:
13040 -bash
13042 python3 ./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
[2]+ Stopped /tmp/test.sh
[1]$ Process group 13040 should be killed
Max failures chain
[2]+ Killed /tmp/test.sh
After:
$ /tmp/test.sh
Using queries from '/src/ch/worktrees/clickhouse-upstream/tests/queries' directory
Connecting to ClickHouse server... OK
Connected to server 24.8.1.1 @ bef896ce143ea4e0464c9829de6277ba06cc1a53 mt/rename-without-lock-v2
Running 3 stateless tests (MainProcess).
00001_select_1: [ FAIL ]
Reason: return code: 62
Code: 62. DB::Exception: Syntax error: failed at position 8 (''1;
'): '1;
. Single quoted string is not closed: ''1;
'. (SYNTAX_ERROR)
, result:
stdout:
Database: test_urz6rk5z
Child processes of 9782:
9785 python3 ./tests/clickhouse-test 0001_select --test-runs 3 --max-failures-chain 1 --no-random-settings --no-random-merge-tree-settings
Max failures chain
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It was simply wrong before, but now that capturing a stacktrace can
take some time, it is a must.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously, cleanup of processes on e.g. SIGINT simply did not work,
because the launcher kills only the processes in its own process group,
while tests are launched with start_new_session=True for Popen(), which
puts each test into its own process group.
The new session is needed to kill the test's process group in case of a
test timeout.
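To illustrate why killing by the launcher's pgid misses such tests, here
is a minimal sketch (not the clickhouse-test code). The helper name
spawn_in_new_session and the sleeping child are hypothetical; only
Popen(start_new_session=True) comes from the text above. It shows that the
child ends up leading a process group that differs from the parent's.

```python
import os
import subprocess
import sys
from typing import Tuple


def spawn_in_new_session() -> Tuple[int, int, int]:
    """Start a child in a new session; return (parent pgid, child pid, child pgid)."""
    # Hypothetical stand-in for a test process: it just sleeps.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,  # the child calls setsid() before exec
    )
    try:
        # After setsid() the child leads its own session and process group,
        # so its pgid equals its pid and differs from the parent's pgid.
        return os.getpgid(0), proc.pid, os.getpgid(proc.pid)
    finally:
        # Clean up the illustrative child.
        proc.kill()
        proc.wait()
```

A killpg() on the parent's pgid would therefore never reach this child.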
So instead, look at the parent pid and kill the process groups of its
children.
Also add some logging to make it explicit which processes will be
killed.
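The approach described above can be sketched roughly as follows. This is
an illustration under assumptions, not the actual patch: the function
names child_pids and kill_child_process_groups are hypothetical, children
are found by scanning /proc (Linux only), and skipping the caller's own
process group is an extra safety choice made for this sketch.

```python
import os
import signal
from typing import List, Set


def child_pids(parent_pid: int) -> List[int]:
    """Return pids whose parent is parent_pid, by scanning /proc (Linux only)."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", encoding="utf-8", errors="replace") as f:
                stat = f.read()
            # Format: "pid (comm) state ppid ..."; comm may contain spaces
            # and parentheses, so split on the last ")".
            fields = stat.rsplit(")", 1)[1].split()
            ppid = int(fields[1])
        except (OSError, IndexError, ValueError):
            continue  # the process exited while we were reading it
        if ppid == parent_pid:
            children.append(int(entry))
    return children


def kill_child_process_groups(parent_pid: int, sig: int = signal.SIGKILL) -> Set[int]:
    """Kill the process group of every child of parent_pid; return killed pgids."""
    own_pgid = os.getpgid(0)
    killed: Set[int] = set()
    for pid in child_pids(parent_pid):
        try:
            pgid = os.getpgid(pid)
        except ProcessLookupError:
            continue
        # Assumption of this sketch: never signal our own group, so that
        # the caller script cannot be killed by accident.
        if pgid == own_pgid or pgid in killed:
            continue
        print(f"Killing process group {pgid} (child {pid} of {parent_pid})")
        try:
            os.killpg(pgid, sig)
        except ProcessLookupError:
            continue
        killed.add(pgid)
    return killed
```

A signal handler for SIGINT could call kill_child_process_groups(os.getpid())
before exiting, which reaches children started with start_new_session=True.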
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Right now there are a couple of gdb bugs that make CI unstable:
- https://sourceware.org/bugzilla/show_bug.cgi?id=29185
- https://bugzilla.redhat.com/show_bug.cgi?id=1492496
But Ubuntu 22.04 does not have gdb 14+ anywhere (the
~ubuntu-toolchain-r/test PPA contains only gdb 13), so there is no other
option except compiling it from sources.
There are also other reasons to update it, such as optimizations: it
looks like older gdb versions do not use the index fully (5.7s vs 56.7s):
# 15.1
$ time command gdb -batch -ex 'disas main' clickhouse
...
real 0m5.692s
user 0m29.948s
sys 0m1.190s
# 12.1 (from ubuntu 22.04)
real 0m56.709s
user 0m59.307s
sys 0m0.585s
Also note that we cannot compile gdb in the fasttest image (which
contains the compiler), since some images do not include the full
toolchain. For instance, gdb is added in the following images:
- test-util -> test-base -> lots of other images (no toolchain)
- performance-comparison (no toolchain)
- integration-test (no toolchain)
- integration-tests-runner (no toolchain)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>