ClickHouse uses JEMALLOC_PROF_LIBGCC - _Unwind_Backtrace() for jemalloc,
which calls _Unwind_Backtrace() during bootstrap of jemalloc, and if,
dlsym() has allocations it will lead to deadlock:
0 __lll_lock_wait (futex=futex@entry=0x16516410 <init_lock+64>, private=0) at lowlevellock.c:52
1 0x00007ffff7f9d0a3 in __GI___pthread_mutex_lock (mutex=0x16516410 <init_lock+64>) at ../nptl/pthread_mutex_lock.c:80
2 0x000000000ba6ec23 in pthread_mutex_lock (arg=arg@entry=0x16516410 <init_lock+64>) at ./build/RelWithDebInfo/./src/Common/ThreadFuzzer.cpp:445
3 0x0000000014215fd4 in malloc_mutex_lock_final (mutex=0x165163d0 <init_lock>) at ./contrib/jemalloc/include/jemalloc/internal/mutex.h:151
4 malloc_mutex_lock_slow (mutex=0x165163d0 <init_lock>) at ./build/RelWithDebInfo/./contrib/jemalloc/src/mutex.c:90
5 0x00000000141defc6 in malloc_mutex_lock (tsdn=0x0, mutex=<optimized out>) at ./contrib/jemalloc/include/jemalloc/internal/mutex.h:217
6 malloc_init_hard () at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:2118
7 0x00000000141d05db in malloc_init () at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:298
8 imalloc_init_check (sopts=<optimized out>, dopts=<optimized out>) at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:2658
9 imalloc (sopts=<optimized out>, dopts=<optimized out>) at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:2689
10 calloc (num=num@entry=1, size=size@entry=32) at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:2852
11 0x00007ffff7c42c05 in _dlerror_run (operate=operate@entry=0x7ffff7c42490 <dlsym_doit>, args=args@entry=0x7fffffffda40) at dlerror.c:148
12 0x00007ffff7c42525 in __dlsym (handle=<optimized out>, name=0xe7f568 "dl_iterate_phdr") at dlsym.c:70
13 0x000000001408e1a0 in (anonymous namespace)::getOriginalDLIteratePHDR () at ./build/RelWithDebInfo/./base/base/phdr_cache.cpp:44
14 dl_iterate_phdr (callback=0x16287a60 <libunwind::findUnwindSectionsByPhdr(dl_phdr_info*, unsigned long, void*)>, data=0x7fffffffdb58) at ./build/RelWithDebInfo/./base/base/phdr_cache.cpp:64
15 0x0000000016288468 in libunwind::LocalAddressSpace::findUnwindSections (this=<optimized out>, targetAddr=371765377, info=...) at ./contrib/libunwind/src/AddressSpace.hpp:605
16 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (this=this@entry=0x7fffffffde70, isReturnAddress=false) at ./contrib/libunwind/src/UnwindCursor.hpp:2553
17 0x0000000016287617 in __unw_init_local (cursor=cursor@entry=0x7fffffffde70, context=context@entry=0x7fffffffddc8) at ./build/RelWithDebInfo/./contrib/libunwind/src/libunwind.cpp:91
18 0x000000001628b094 in _Unwind_Backtrace (callback=0x14226f40 <prof_unwind_init_callback>, ref=0x0) at ./build/RelWithDebInfo/./contrib/libunwind/src/UnwindLevel1-gcc-ext.c:106
19 0x000000001421a7e0 in prof_boot2 (tsd=tsd@entry=0x7ffff7c357c0, base=<optimized out>) at ./build/RelWithDebInfo/./contrib/jemalloc/src/prof.c:693
20 0x00000000141df3e7 in malloc_init_hard () at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:2151
21 0x00000000141d96f1 in malloc_init () at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:298
22 nallocx (size=32, flags=0) at ./build/RelWithDebInfo/./contrib/jemalloc/src/jemalloc.c:4011
23 0x000000000b9fdd9b in _ZN6Memory23getActualAllocationSizeITpTkNSt3__17same_asISt11align_val_tEEJEQsr2DBE16OptionalArgumentIDpT_EEEmmS5_ (size=32) at ./src/Common/memory.h:197
24 _ZN6Memory11trackMemoryITpTkNSt3__17same_asISt11align_val_tEEJEQsr2DBE16OptionalArgumentIDpT_EEEmmR15AllocationTraceS5_ (size=32, trace=...) at ./src/Common/memory.h:208
25 operator new (size=size@entry=32) at ./build/RelWithDebInfo/./src/Common/new_delete.cpp:53
26 0x000000001540e0a6 in google::protobuf::internal::ShutdownData::get () at ./build/RelWithDebInfo/./contrib/google-protobuf/src/google/protobuf/message_lite.cc:679
27 google::protobuf::internal::OnShutdownRun (f=0x15407420 <google::protobuf::internal::DestroyString(void const*)>, arg=0x167a6368 <google::protobuf::internal::fixed_address_empty_string>) at ./build/RelWithDebInfo/./contrib/google-protobuf/src/google/protobuf/message_lite.cc:697
28 0x0000000015407560 in global constructors keyed to 000101 () at ./contrib/google-protobuf/src/google/protobuf/generated_message_util.h:180
29 0x000000001629101d in __libc_csu_init ()
30 0x00007ffff7dba010 in __libc_start_main (main=0x66e3b00 <main(int, char**)>, argc=4, argv=0x7fffffffe258, init=0x16290fd0 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe248) at ../csu/libc-start.c:264
31 0x00000000066e302e in _start ()
And this is indeed what happens for glibc prio 2.34, since only in [2]
the allocation had been removed
[2]: https://sourceware.org/git/?p=glibc.git;a=commit;h=fada9018199c21c469ff0e731ef75c6020074ac9
But, jemalloc provides also JEMALLOC_PROF_LIBUNWIND, yes it is not llvm
libunwind, but gnu one, but ClickHouse version of llvm libunwind
provides unw_backtrace(), so we can use it, and it will not be called
during jemalloc bootstrap, so deadlock should not be possible.
Funny thing that for this deadlock the workaround is to enable
background_thread, because a thread created for it will initialize TLS for
__dlerror().
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
libunwind is reentrant and signal safe, and works faster then then
gcc_eh (plus it has some custom patches for problems that have been
found during it's usage in ClickHouse).
gcc_eh may be missing in the system (if gcc was not installed), and
even if it exists clickhouse uses -nodefaultlibs, so some care should be
made to make it work.
Also this library is tiny and there shouln't be any problem to require
it always (there is already tendency to require some contrib libraries,
i.e. poco).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This patch also updates the jemalloc version.
Note, that I've enabled page_id for jemalloc
PR_SET_VMA/PR_SET_VMA_ANON_NAME, that requires linux 5.17+ (but ignores
EINVAL anyway).
v2: add -isystem to fix reserved name for JEMALLOC_OVERRIDE___LIBC_PVALLOC
Refs: https://github.com/jemalloc/jemalloc/pull/2304
Refs: https://github.com/ClickHouse/ClickHouse/issues/31531
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
jemalloc provides support for two different libunwind flavors: the original HP libunwind and the one coming with gcc / g++ / libstdc++.
The latter is identified by `JEMALLOC_PROF_LIBGCC` and provides `_Unwind_Backtrace` method instead of `unw_backtrace`.
At the time ClickHouse uses LLVM libunwind which follows libgcc's way of backtracing.
ClickHouse has to provide `unw_backtrace` method by the means of [commit 8e2b31e](8e2b31e766).
While this PR does not allow complete remove of the patch (as ClickHouse itself uses unw_backtrace directly), it definitely sorts the things out.
If you had jemalloc build in contrib/jemalloc, and not current version
but some outdated, then compiler will pick incorrect file and may fail.
Also SYSTEM should be removed since regular -I is in the different list,
and will be checked before -isystem.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Note, that there is no need to disable JEMALLOC_PURGE_MADVISE_FREE,
since jemalloc does check in runtime, and ClickHouse already
successfully works w/o this change.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- remove GNU_SOURCE it is done in common CMakeLists.txt
- remove JEMALLOC_OVERRIDE___POSIX_MEMALIGN (there is no need to since
__posix_memalign() does not exists in linux)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>