Use a subshell to:
- avoid saving and restoring TIMEFORMAT
- avoid grepping out trace output, by using set +x in the subshell
Also use an array for options to avoid extra backslashes.
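
A minimal sketch of the pattern described above, not the actual code from
compare.sh: the variable names, the timed command (sort) and its options
are hypothetical and chosen only for illustration.

```bash
#!/usr/bin/env bash
# Illustrative only: variable names and the timed command are hypothetical.

TIMEFORMAT='outer value'   # stands in for whatever the caller had set
set -x                     # the caller runs with tracing enabled

# Options kept in an array: one per line, commentable, no trailing backslashes.
sort_opts=(
    -n   # compare numerically
    -r   # reverse the order
)

# The command substitution runs in a subshell, so `set +x` and the
# TIMEFORMAT assignment are gone once it exits.
elapsed=$(
    set +x
    TIMEFORMAT='%R'
    # `time` reports on the shell's stderr; redirect the group to capture it.
    { time sort "${sort_opts[@]}" <<<$'3\n1\n2' >/dev/null; } 2>&1
)

# Record the caller's state after the subshell has finished.
[[ $- == *x* ]] && xtrace_after=on || xtrace_after=off
set +x
echo "elapsed=${elapsed}s TIMEFORMAT=${TIMEFORMAT} xtrace_after=${xtrace_after}"
```

Because tracing is switched off only inside the subshell, the captured
timing contains no "+ ..." lines, and the caller's TIMEFORMAT and xtrace
setting are left as they were.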

Do not stop processing pathologically slow queries,
since this may hide errors in other queries,
e.g. when the test fails on only one of the servers (upstream or the PR one)
and someone interprets this as OK with a comment like:
"the failure was on the upstream server, so it is OK, the PR should fix it"
Anyway, there are almost no such queries right now, and this should be
reviewed before merging anything.

killall requires an exact match, i.e. "clickhouse-server", not
"clickhouse":
2021-12-03 05:24:56 + env kill -- -21700
2021-12-03 05:24:56 kill: (-21700): No such process
2021-12-03 05:24:56 + killall clickhouse
2021-12-03 05:24:56 clickhouse: no process found
2021-12-03 05:24:56 + echo Servers stopped.
2021-12-03 05:24:56 Servers stopped.
2021-12-03 05:24:56 + analyze_queries
$ tail -n1 *-server-log.log
==> left-server-log.log <==
2021.12.03 05:26:59.530647 [ 450 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 1668052
==> right-server-log.log <==
2021.12.03 05:27:20.873136 [ 466 ] {} <Trace> SystemLog (system.metric_log): Flushed system log up to offset 9605
==> setup-server-log.log <==
2021.12.03 02:47:14.844395 [ 96 ] {} <Information> Application: Child process exited normally with code 0.
As you can see, killall fails instantly with "no process found", which
cannot be true since the server was running; moreover, the server logs
contain messages written after analyze_queries() from compare.sh had
already started.

This should fix problems like the one in [1].
[1]: https://clickhouse-test-reports.s3.yandex.net/32080/344298f4037f88b114b8e798bb30036b24be8f16/performance_comparison/report.html#fail1