From 630eddbbbcf75aedaf89d3acbeb67ff8fc99d1e8 Mon Sep 17 00:00:00 2001
From: Azat Khuzhin
Date: Tue, 19 Sep 2023 19:43:10 +0200
Subject: [PATCH] Disable forwarding signals by watchdog in systemd service

With the default KillMode=control-group, systemd sends signals to all
processes in the cgroup, so the server receives the termination signal
twice (once from systemd and once forwarded by the watchdog) and is
terminated forcefully:

    2023.09.19 12:47:06.369090 [ 763 ] {} Application: Received termination signal (Terminated)
    2023.09.19 12:47:06.369141 [ 762 ] {} Application: Received termination signal.
    2023.09.19 12:47:06.369215 [ 763 ] {} Application: Received termination signal (Terminated)
    2023.09.19 12:47:06.369225 [ 763 ] {} Application: This is the second termination signal. Immediately terminate.
    2023.09.19 12:47:06.400959 [ 761 ] {} Application: Child process exited normally with code 143.

One might naively think that changing KillMode to process or mixed
would help, but it does not: in that case systemd cannot wait for
$MainPID, because it is not a child of systemd (main_pid_alien=true in
systemd's sources), and this leads to a double signal again:

    2023.09.19 16:24:19.694473 [ 3118 ] {} Application: Received termination signal (Terminated)
    2023.09.19 16:24:19.694894 [ 3118 ] {} Application: Received termination signal (Terminated)
    2023.09.19 16:24:19.695060 [ 3118 ] {} Application: This is the second termination signal. Immediately terminate.

This happens because systemd sends the signal first on normal
termination and then again when it cleans up left-over processes:

    clickhouse-server.service: Found left-over process 3117 (clickhouse-serv) in control group while starting unit. Ignoring.

And yes, it does this even though it prints "Ignoring" here (I guess
that refers to the fact that the signal can be ignored if it is not
handled).

Here is proof of the double signal sent by systemd:

    # pgrep clickhouse-serv | xargs strace -e /kill -fp
    strace: Process 3117 attached with 469 threads
    [pid 3582] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
    [pid 3580] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
    [pid 3582] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
    [pid 3580] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
    ^^^
    [pid 3118] tgkill(3117, 3118, SIGTERM) = 0 # and this is a force termination

So yes, there is no other way except disabling signal forwarding.

*Well, there is another way, disabling the watchdog completely, but I
guess it will be unwelcome (even though systemd can be configured in
multiple ways right now, and there is even systemd-oomd instead of
ClickHouse's watchdog).*

Signed-off-by: Azat Khuzhin
---
 packages/clickhouse-server.service | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/packages/clickhouse-server.service b/packages/clickhouse-server.service
index c2ef7c2746d..9a7d07e5cee 100644
--- a/packages/clickhouse-server.service
+++ b/packages/clickhouse-server.service
@@ -21,6 +21,9 @@ RestartSec=30
 # - shutdown_wait_unfinished_queries
 # - shutdown_wait_unfinished
 TimeoutStopSec=infinity
+# Disable forwarding signals by watchdog, since with default systemd's
+# kill-mode control-group, systemd will send signal to all process in cgroup.
+Environment=CLICKHOUSE_WATCHDOG_NO_FORWARD=1
 # Since ClickHouse is systemd aware default 1m30sec may not be enough
 TimeoutStartSec=0
 # %p is resolved to the systemd unit name