jemalloc can show the following warning:
Number of CPUs detected is not deterministic. Per-CPU arena disabled
It is shown if the following sources do not all return the same number
of CPUs:
- _SC_NPROCESSORS_ONLN
- _SC_NPROCESSORS_CONF
- sched_getaffinity()
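A minimal Python sketch of the comparison described above, assuming a Linux host (`os.sched_getaffinity` is not available on every platform). It only mirrors the three sources listed; it is not jemalloc's actual code, and the helper names are hypothetical.

```python
import os


def cpu_counts():
    """Return the CPU counts from the three sources listed above (Linux only)."""
    online = os.sysconf("SC_NPROCESSORS_ONLN")      # _SC_NPROCESSORS_ONLN
    configured = os.sysconf("SC_NPROCESSORS_CONF")  # _SC_NPROCESSORS_CONF
    # Number of CPUs in this process's affinity mask, as sched_getaffinity() reports.
    affinity = len(os.sched_getaffinity(0))
    return online, configured, affinity


def is_deterministic(counts):
    """True when all sources agree, i.e. the condition for the warning is not met."""
    return len(set(counts)) == 1


if __name__ == "__main__":
    counts = cpu_counts()
    print("online=%d configured=%d affinity=%d" % counts)
    if not is_deterministic(counts):
        print("CPU counts differ: this is the case the jemalloc warning refers to")
```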
In my case Linux does return different numbers, because there are more
possible CPUs than online ones, as dmesg shows:
smpboot: Allowing 128 CPUs, 64 hotplug CPUs
And from sysfs:
# grep . /sys/devices/system/cpu/{possible,online,offline}
/sys/devices/system/cpu/possible:0-127
/sys/devices/system/cpu/online:0-63
/sys/devices/system/cpu/offline:64-127
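The sysfs files above use a comma-separated list of CPU ids and ranges. A small Python sketch (hypothetical helper names) that turns them into sets, so that the CPUs that are possible but not online are simply a set difference:

```python
def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0-63" or "0-3,8,10-11" into a set of ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file, e.g. "offline" when no CPU is offline
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpu_list(name):
    """Read /sys/devices/system/cpu/<name>, e.g. "possible" or "online"."""
    with open("/sys/devices/system/cpu/" + name) as f:
        return parse_cpu_list(f.read())


if __name__ == "__main__":
    possible = read_cpu_list("possible")
    online = read_cpu_list("online")
    print("possible=%d online=%d not online=%d"
          % (len(possible), len(online), len(possible - online)))
```

With the sysfs contents shown above, this would report 128 possible CPUs, 64 online and 64 not online.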
From ACPI:
# acpidump -o acpi
# acpixtract -a acpi
# iasl -d *.dat
# grep -e 'Processor Enabled' apic.dsl | sort | uniq -c
64 Processor Enabled : 0
64 Processor Enabled : 1
So I guess this is the same as what happened in this perf run [1].
[1]: https://s3.amazonaws.com/clickhouse-test-reports/51360/5d43a64112711b339b82b1c0e8df7882546a1a3c/performance_comparison_[4_4]/report.html
P.S. Personally, I just use possible_cpus=64 on the kernel cmdline to fix
this for my setup.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Avoid OOM in perf tests
At some point perf tests started to fail for one setup on CI [1]:
/home/ubuntu/actions-runner/_work/_temp/f8fce7b1-8bc4-49c8-a203-c96867f4420a.sh: line 5: 1882659 Killed python3 performance_comparison_check.py "$CHECK_NAME"
Error: Process completed with exit code 137.
[1]: https://github.com/ClickHouse/ClickHouse/actions/runs/4230936986/jobs/7349818625
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Switch perf tests to Ubuntu 22.04 to use GNU parallel with --memsuspend
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
---------
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
We run the left server twice. If the server is not properly stopped after the first run, the second run fails with an `Address already in use: [::]:9001` exception.
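One way to guard the second run is to wait until nothing listens on the port any more before starting the server again. A minimal Python sketch, assuming a hypothetical `wait_for_port_free` helper; it is not what the perf scripts actually do:

```python
import socket
import time


def port_is_free(port, host="::"):
    """Return True if we can bind (host, port), i.e. no server is listening on it."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR ignores sockets lingering in TIME_WAIT, but on Linux
        # bind() still fails while another socket is listening on the port.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def wait_for_port_free(port, timeout=60.0, interval=0.5, host="::"):
    """Poll until the port is free or the timeout expires; return whether it is free."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling `wait_for_port_free(9001)` between the two runs and aborting when it returns False would turn the confusing `Address already in use` error into an explicit "server from the first run is still running" failure.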
Use a subshell to:
- avoid changing and restoring TIMEFORMAT
- avoid grepping out trace output, by using set +x inside the subshell
- use an array for options, to avoid extra backslashes
killall requires an exact name match, i.e. "clickhouse-server", not
"clickhouse":
2021-12-03 05:24:56 + env kill -- -21700
2021-12-03 05:24:56 kill: (-21700): No such process
2021-12-03 05:24:56 + killall clickhouse
2021-12-03 05:24:56 clickhouse: no process found
2021-12-03 05:24:56 + echo Servers stopped.
2021-12-03 05:24:56 Servers stopped.
2021-12-03 05:24:56 + analyze_queries
$ tail -n1 *-server-log.log
==> left-server-log.log <==
2021.12.03 05:26:59.530647 [ 450 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 1668052
==> right-server-log.log <==
2021.12.03 05:27:20.873136 [ 466 ] {} <Trace> SystemLog (system.metric_log): Flushed system log up to offset 9605
==> setup-server-log.log <==
2021.12.03 02:47:14.844395 [ 96 ] {} <Information> Application: Child process exited normally with code 0.
As you can see, killall instantly fails with "no process found", which
cannot be true: the server was still running, and according to the logs
there were messages after analyze_queries() from compare.sh had started.
This should fix problems like the one in [1].
[1]: https://clickhouse-test-reports.s3.yandex.net/32080/344298f4037f88b114b8e798bb30036b24be8f16/performance_comparison/report.html#fail1
Before this patch:
- upstream/master and PRs *with* perf tests or perf scripts changes:
--runs=13 --max-queries=0
- PRs *without* perf changes:
--runs=7 --max-queries=20
- PRs with only perf tests changes:
--runs=13 --max-queries=0 <list of perf tests>
After:
- upstream/master and PRs *with* perf tests changes:
--runs=13 --max-queries=0
- PRs *without* perf changes:
--runs=7 --max-queries=10
- PRs with only perf tests changes:
--runs=13 --max-queries=0 <list of perf tests>
To underline: we no longer look at perf script changes at all, and we
decrease the number of random queries to run from 20 to 10.
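The new selection rules can be summarized as a small Python function. This is only a sketch: it assumes perf tests live in `tests/performance/*.xml` and that a test name is the file name without `.xml`; the function name and its inputs are hypothetical, not the actual CI code.

```python
import os

PERF_TESTS_DIR = "tests/performance/"  # assumption: location of perf test XML files


def choose_perf_options(is_master, changed_files):
    """Return compare options according to the "After" rules above."""
    perf_tests = [f for f in changed_files
                  if f.startswith(PERF_TESTS_DIR) and f.endswith(".xml")]
    full_run = ["--runs=13", "--max-queries=0"]
    if is_master:
        return full_run
    if perf_tests and len(perf_tests) == len(changed_files):
        # Only perf tests changed: run just those tests.
        names = [os.path.splitext(os.path.basename(f))[0] for f in perf_tests]
        return full_run + names
    if perf_tests:
        # Perf tests changed together with other code.
        return full_run
    # Everything else, including PRs that only touch the perf scripts.
    return ["--runs=7", "--max-queries=10"]
```

Under these assumptions, a PR that touches only the perf scripts now falls into the last branch, since script changes are no longer taken into account.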