Recently I noticed that ClickHouse compiled with ASan does not work with
glibc 2.36+. At first I thought the problem only appeared when compiling
against an older glibc and running with a newer one, but that was not
correct: ASan simply does not work with glibc 2.36+.
Here is a simple reproducer [1]:
$ cat > test-asan.cpp <<EOL
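#include <pthread.h>
// __pthread_mutex_lock is not declared in <pthread.h>, so declare it by hand
extern "C" int __pthread_mutex_lock(pthread_mutex_t *);
int main()
{
    // something is broken in the ASan interceptor for __pthread_mutex_lock,
    // but only since glibc 2.36; for pthread_mutex_lock everything is OK
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    return __pthread_mutex_lock(&mutex);
}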
EOL
$ clang -g3 -o test-asan test-asan.cpp -fsanitize=address
$ ./test-asan
AddressSanitizer:DEADLYSIGNAL
=================================================================
==15659==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7fffffffccb0 sp 0x7fffffffcb98 T0)
==15659==Hint: pc points to the zero page.
==15659==The signal is caused by a READ memory access.
==15659==Hint: address points to the zero page.
#0 0x0 (<unknown module>)
#1 0x7ffff7cda28f (/usr/lib/libc.so.6+0x2328f) (BuildId: 1e94beb079e278ac4f2c8bce1f53091548ea1584)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>)
==15659==ABORTING
[1]: https://gist.github.com/azat/af073e57a248e04488b21068643f079e
I started looking through the glibc code. There were changes that moved
the pthread functions out of libpthread.so.0 into libc.so.6 (somewhere
between 2.31 and 2.35), but the problem shows up only with 2.36; 2.35
works fine.
After this I looked into the changes between 2.35 and 2.36 and found
this patch [2], "dlsym: Make RTLD_NEXT prefer default version
definition [BZ #14932]", which fixes this bug [3].
[2]: https://sourceware.org/git/?p=glibc.git;a=commit;h=efa7936e4c91b1c260d03614bb26858fbb8a0204
[3]: https://sourceware.org/bugzilla/show_bug.cgi?id=14932
The problem with using the DL_LOOKUP_RETURN_NEWEST flag for RTLD_NEXT is
that it does not resolve hidden symbols (and __pthread_mutex_lock is
indeed hidden).
Here is a sample that will show the difference [4]:
$ cat > test-dlsym.c <<EOL
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
int main()
{
    void *p = dlsym(RTLD_NEXT, "__pthread_mutex_lock");
    printf("__pthread_mutex_lock: %p (via RTLD_NEXT)\n", p);
    return 0;
}
EOL
# glibc 2.35: __pthread_mutex_lock: 0x7ffff7e27f70 (via RTLD_NEXT)
# glibc 2.36: __pthread_mutex_lock: (nil) (via RTLD_NEXT)
[4]: https://gist.github.com/azat/3b5f2ae6011bef2ae86392cea7789eb7
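But ThreadFuzzer uses internal symbols to wrap
pthread_mutex_lock/pthread_mutex_unlock, and these are intercepted by
ASan. The interceptor looks up the real function with dlsym(RTLD_NEXT),
gets NULL on glibc 2.36, and calls it, which leads to the NULL
dereference.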
The fix looked obvious: just use dlsym(RTLD_NEXT). However, on older
glibc versions this leads to endless recursion (see the commits in the
code), though only with jemalloc [5]. Even though sanitizer builds do
not use jemalloc, the ThreadFuzzer code is generic and I don't want to
guard it with more preprocessor macros.
[5]: https://gist.github.com/azat/588d9c72c1e70fc13ebe113197883aa2
So we have to use RTLD_NEXT only for ASan.
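A minimal sketch of that idea, not the actual ThreadFuzzer code: the helper
names (resolve_next, lock_via_next, real_mutex_lock, BUILT_WITH_ASAN) are
hypothetical, and the non-ASan branch calls the public pthread_mutex_lock
only as a stand-in for whatever the generic path does. The point is to look
up the real function with dlsym(RTLD_NEXT) in ASan builds and to check the
result for NULL instead of jumping to address 0.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* GCC defines __SANITIZE_ADDRESS__ under -fsanitize=address,
 * clang exposes the same information through __has_feature. */
#if defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif
#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

typedef int (*mutex_fn)(pthread_mutex_t *);

/* Hypothetical helper: next definition of `name` after the calling object,
 * or NULL if the dynamic loader cannot resolve it. */
mutex_fn resolve_next(const char *name)
{
    return (mutex_fn)dlsym(RTLD_NEXT, name);
}

/* Hypothetical helper: lock through the resolved pointer and report an
 * error (-1) instead of calling a NULL pointer. */
int lock_via_next(pthread_mutex_t *mutex)
{
    assert(mutex != NULL);
    mutex_fn real_lock = resolve_next("pthread_mutex_lock");
    if (real_lock == NULL)
        return -1;
    return real_lock(mutex);
}

/* Only ASan builds take the RTLD_NEXT path. */
int real_mutex_lock(pthread_mutex_t *mutex)
{
#if BUILT_WITH_ASAN
    return lock_via_next(mutex);
#else
    return pthread_mutex_lock(mutex); /* stand-in for the generic path */
#endif
}
```

On older glibc this may need -ldl and -lpthread at link time.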
There is one more interesting issue: if you compile with a clang that
was itself built against a newer libc (i.e. 2.36), you will get the
following error:
$ podman run --privileged -v $PWD/.cmake-asan/programs:/root/bin -e PATH=/bin:/root/bin --rm -it ubuntu-dev-v3 clickhouse
==1==ERROR: AddressSanitizer failed to allocate 0x0 (0) bytes of SetAlternateSignalStack (error code: 22)
...
==1==End of process memory map.
AddressSanitizer: CHECK failed: sanitizer_common.cpp:53 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) (tid=1)
<empty stack>
The problem is that since glibc 2.34, `SIGSTKSZ` is no longer a constant
but a call to `sysconf(_SC_SIGSTKSZ)`. Older glibc does not know this
name, so `-1` is returned and used as `SIGSTKSZ` instead.
The workaround is to disable the alternate signal stack:
$ podman run --privileged -v $PWD/.cmake-asan/programs:/root/bin -e PATH=/bin:/root/bin -e ASAN_OPTIONS=use_sigaltstack=0 --rm -it ubuntu-dev-v3 clickhouse client --version
ClickHouse client version 22.13.1.1.
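
To illustrate the failure mode, here is a small sketch of a size query that
tolerates a runtime libc that does not know the name. It assumes the query
is sysconf(_SC_SIGSTKSZ); the helper names (query_sigstksz, alt_stack_size)
and the 64 KiB fallback are hypothetical and are not what the sanitizer
runtime does.

```c
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: ask the running libc for the signal stack size.
 * sysconf() returns -1 for a name the runtime libc does not support. */
long query_sigstksz(void)
{
#ifdef _SC_SIGSTKSZ
    return sysconf(_SC_SIGSTKSZ);
#else
    return (long)SIGSTKSZ; /* headers without _SC_SIGSTKSZ: plain constant */
#endif
}

/* Hypothetical helper: never hand a non-positive size to the allocator. */
size_t alt_stack_size(long reported)
{
    const size_t fallback = 64 * 1024; /* assumed fallback, not from ASan */
    if (reported <= 0)
        return fallback;
    return (size_t)reported;
}
```

Calling alt_stack_size(query_sigstksz()) gives a usable size even when the
binary was built against newer headers than the glibc it runs on.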
Fixes: #43426
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Official docs:
Some headers from C library were deprecated in C++ and are no longer
welcome in C++ codebases. Some have no effect in C++. For more details
refer to the C++ 14 Standard [depr.c.headers] section. This check
replaces C standard library headers with their C++ alternatives and
removes redundant ones.
Right now it fails with:
ld.lld: error: undefined symbol: __pthread_mutex_lock
>>> referenced by ThreadFuzzer.cpp:300 (./src/Common/ThreadFuzzer.cpp:300)
>>> src/CMakeFiles/clickhouse_common_io.dir/Common/ThreadFuzzer.cpp.o:(pthread_mutex_lock)
>>> did you mean: __pthread_mutex_lock@GLIBC_2.2.5
>>> defined in: /usr/lib/libc.so.6
Here is the list of matched symbols for 2.35:
$ nm -D /lib/libc.so.6 | fgrep pthread_mutex_lock
00000000000908a0 T __pthread_mutex_lock@GLIBC_2.2.5
00000000000908a0 T pthread_mutex_lock@@GLIBC_2.2.5
$ nm -D /lib/libpthread.so.0 | fgrep -c pthread_mutex_lock
0
And this is for 2.33:
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | fgrep pthread_mutex_lock
0000000000083eb0 T pthread_mutex_lock@@GLIBC_2.2.5
$ nm -D /lib/x86_64-linux-gnu/libpthread.so.0 | fgrep pthread_mutex_lock
000000000000af00 T __pthread_mutex_lock@@GLIBC_2.2.5
000000000000af00 W pthread_mutex_lock@@GLIBC_2.2.5
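This is likely because, starting from 27a448223cb2d3bab191c61303db48cee66f871c
("nptl: Move core mutex functions into libc") [1], __pthread_mutex_lock
is exported only as a compat symbol without a default version (note the
single "@" in the 2.35 output above), so the linker can no longer resolve it.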
[1]: https://sourceware.org/git/?p=glibc.git;a=commit;h=27a448223cb2d3bab191c61303db48cee66f871c
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>