The stress test currently runs the previous version of the server without
correct debug symbols, since nothing restores the clickhouse.debug file.
This can lead to the following issues, as in [1]:
- incorrect stack traces
- gdb crashes
- clickhouse crashes, probably due to the non-robust internal DWARF parser
[1]: https://s3.amazonaws.com/clickhouse-test-reports/41730/8cc53a48ae99a765085f44a75fa49314d1f1cc7d/stress_test__ubsan_.html
For now I decided not to rework the script to make it less error-prone,
but simply to fix the problem.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
gcore is a gdb command: it uses gdb internally to dump the core.
However, with proper configuration of limits (core_dump.size_limit) it
should not be required, although some issues are possible:
- non-standard kernel.core_pattern
- sanitizers
So yes, gcore is more "universal" (you don't need to configure
`kernel.core_pattern`), but it is ad hoc, and it has a drawback:
**it does not work when gdb fails**. For example, gdb may fail with
`Dwarf Error: DW_FORM_strx1 found in non-DWO CU` in case of DWARF-5 [1].
[1]: https://github.com/ClickHouse/ClickHouse/pull/40772#issuecomment-1236331323
Let's try to switch to a more native way.
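
Since the native way depends on kernel.core_pattern, it can help to check
what the pattern looks like before relying on it. A minimal sketch, assuming
a hypothetical helper `describe_core_pattern` (not part of the stress script)
that is given the contents of /proc/sys/kernel/core_pattern:

```python
def describe_core_pattern(pattern: str) -> str:
    """Classify a kernel.core_pattern value (hypothetical helper).

    Returns one of:
      "piped"    - the core is piped to a userspace program (leading '|')
      "absolute" - the core is written to an absolute path
      "relative" - the core is written relative to the process's cwd
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return "piped"
    if pattern.startswith("/"):
        return "absolute"
    return "relative"


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> str:
    # Reads the live setting; only meaningful on Linux.
    with open(path) as f:
        return f.read()
```

A "piped" result is the non-standard case mentioned above: the core goes to
a handler program rather than to a file the test can pick up directly.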
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Right now, if you look at the OOM errors:
- OOM killer (or signal 9) in clickhouse-server.log
- Backward compatibility check: OOM messages in clickhouse-server.log
Most of them are not real: the clickhouse server simply got KILLed by
clickhouse stop. PR #40678 may improve the situation, but to be definitely
sure that there was an OOM, let's look at dmesg.
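
The dmesg check can be sketched as follows. This is only an illustration:
the function names are hypothetical, and the matched substrings are the
usual kernel OOM killer messages, whose exact wording varies between kernel
versions.

```python
import re
import subprocess

# Typical kernel OOM killer messages (assumed wording, kernel-dependent):
#   "<comm> invoked oom-killer: gfp_mask=..."
#   "oom-kill:constraint=..."
#   "Out of memory: Killed process <pid> (<comm>) ..."
OOM_RE = re.compile(r"invoked oom-killer|oom-kill:|Out of memory: Kill")


def find_oom_lines(dmesg_text: str) -> list[str]:
    """Return the dmesg lines that look like OOM killer activity."""
    return [line for line in dmesg_text.splitlines() if OOM_RE.search(line)]


def read_dmesg() -> str:
    # -T prints human-readable timestamps; may require root privileges.
    result = subprocess.run(
        ["dmesg", "-T"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

If `find_oom_lines(read_dmesg())` is empty, a signal 9 in
clickhouse-server.log most likely came from clickhouse stop rather than from
the OOM killer.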
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
max_server_memory_usage is already set to 75%, so OOM should not happen;
the reason is that RSS does not match the memory tracker
statistics:
2022.08.05 12:36:57.869896 [ 82524 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 64.69 GiB, peak 65.26 GiB, will set to 62.80 GiB (RSS), difference: -1.89 GiB
...
2022.08.05 12:37:00.213440 [ 82334 ] {} <Error> void DB::MergeTreeBackgroundExecutor<DB::MergeMutateRuntimeQueue>::routine(DB::TaskRuntimeDataPtr) [Queue = DB::MergeMutateRuntimeQueue]: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 64.68 GiB (attempt to allocate chunk of 1298794 bytes), maximum: 51.44 GiB. OvercommitTracker decision: Memory overcommit isn't used. Waiting time or orvercommit denominator are set to zero.. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
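
To quantify the drift between the tracker and RSS over a whole log, the
AsynchronousMetrics line above can be parsed. A rough sketch, assuming the
values are always printed in GiB as in the quoted line (the helper name is
hypothetical):

```python
import re

MEMORY_TRACKING_RE = re.compile(
    r"MemoryTracking: was (?P<was>[\d.]+) GiB, "
    r"peak (?P<peak>[\d.]+) GiB, "
    r"will set to (?P<rss>[\d.]+) GiB \(RSS\), "
    r"difference: (?P<diff>-?[\d.]+) GiB"
)


def parse_memory_tracking(line: str):
    """Extract tracked/peak/RSS/difference (GiB) from a log line.

    Returns a dict of floats, or None if the line does not match.
    """
    match = MEMORY_TRACKING_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}
```

For the line quoted above this yields was=64.69, rss=62.80 and diff=-1.89,
that is, the tracker was 1.89 GiB above RSS before the correction.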
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>