getauxval() from glibc-compatibility did not work always correctly:
- It does not work after setenv(), and this breaks vsyscalls,
like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone
is defined, which is true for CI [2]).
Also note, that fixing setenv() will not fix LSan,
since the culprit is getauxval()
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404
[2]: ClickHouse#32928 (comment)
- Another think that is definitely broken is LSan (Leak Sanitizer), it
relies on worked getauxval() but it does not work if __environ is not
initialized yet (there is even a commit about this).
And because of, at least, one leak had been introduced [3]:
[3]: ClickHouse#33840
Fix this by using /proc/self/auxv with fallback to environ solution to
make it compatible with environment that does not allow reading from
auxv (or no procfs).
v2: add fallback to environ solution
v3: fix return value for __auxv_init_procfs()
(cherry picked from commit f187c3499a)
v4: more verbose message on errors, CI founds [1]:
AUXV already has value (529267711)
[1]: https://s3.amazonaws.com/clickhouse-test-reports/39103/2325f7e8442d1672ce5fb43b11039b6a8937e298/stress_test__memory__actions_.html
v5: break at AT_NULL
v6: ignore AT_IGNORE
v7: suppress TSan and remove superior check to avoid abort() in case of race
v8: proper suppressions (not inner function but itself)
Refs: #33957
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
getauxval() from glibc-compatibility did not work always correctly:
- It does not work after setenv(), and this breaks vsyscalls,
like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone
is defined, which is true for CI [2]).
Also note, that fixing setenv() will not fix LSan,
since the culprit is getauxval()
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404
[2]: ClickHouse#32928 (comment)
- Another think that is definitely broken is LSan (Leak Sanitizer), it
relies on worked getauxval() but it does not work if __environ is not
initialized yet (there is even a commit about this).
And because of, at least, one leak had been introduced [3]:
[3]: ClickHouse#33840
Fix this by using /proc/self/auxv with fallback to environ solution to
make it compatible with environment that does not allow reading from
auxv (or no procfs).
v2: add fallback to environ solution
v3: fix return value for __auxv_init_procfs()
Refs: #33957
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
getauxval() from glibc-compatibility did not work always correctly:
- it does not work after setenv(), and this breaks vsyscalls,
like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone
is defined, which is true for CI [2]).
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404
[2]: https://github.com/ClickHouse/ClickHouse/pull/32928#issuecomment-1015762717
- another think that is definitely broken is LSan (Leak Sanitizer), it
relies on worked getauxval() but it does not work if __environ is not
initialized yet (there is even a commit about this).
And because of, at least, one leak had been introduced [3]:
[3]: https://github.com/ClickHouse/ClickHouse/pull/33840
Fix this by using /proc/self/auxv.
And let's see how many issues will LSan find...
I've verified this patch manually by printing AT_BASE and compared it
with output of LD_SHOW_AUXV.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In glibc 2.32 new version of some symbols had been added [1]:
$ nm -D clickhouse | fgrep -e @GLIBC_2.32
U pthread_getattr_np@GLIBC_2.32
U pthread_sigmask@GLIBC_2.32
[1]: https://www.spinics.net/lists/fedora-devel/msg273044.html
Right now ubuntu 20.04 is used as official image for building
ClickHouse, however once it will be switched someone may not be happy
with that fact that he/she cannot use official binaries anymore because
they have glibc < 2.32.
To avoid this dependency, let's force previous version of those
symbols from glibc.
Note, that I've tested this by compiling with glibc 2.32 and verifying
that output ELF does not have @GLIBC_2.32 symbols and also running that
binary inside ubuntu:20.04 image (that has glibc 2.31).
v1: -Wl,--wrap
v2: -Wl,--defsym
v3: -include
v4: fix versioning for aarch64
This is the function that should take into account TLS block, and 1MB is
too high, since it will be used for sigaltstack() on SIGSEGV
v0: copy-paste glibc __pthread_get_minstack()
v2: return static 16K instead of 1MB