Note that this is just a syntactic change that should not make any
difference (well, the only difference is that it now supports gold and
other linkers, since the option is handled by the plugin itself instead
of the linker).
Refs: https://reviews.llvm.org/D133092
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
We currently only log a compiler-generated "build id" at startup, which
is different for each build. That makes it useless for determining the
exact source code state in tests (e.g. BC test) and from user log files
(e.g. if someone compiled an intermediate version of ClickHouse).
Current log message:
Starting ClickHouse 22.10.1.1 with revision 54467, build id: 6F35820328F89C9F36E91C447FF9E61CAF0EF019, PID 42633
New log message:
Starting ClickHouse 22.10.1.1 (revision 54467, git hash: b6b1f7f763f94ffa12133679a6f80342dd1c3afe, build id: 47B12BE61151926FBBD230DE42F3B7A6652AC482), PID 981813
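For illustration only, the new format can be sketched as a small C++
helper. formatStartupMessage and its parameters are hypothetical names
chosen for this sketch, they are not taken from the ClickHouse sources:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption for illustration): assembles the
// startup line in the new format, with revision, git hash and build id
// grouped in parentheses.
std::string formatStartupMessage(
    const std::string & version,
    unsigned revision,
    const std::string & git_hash,
    const std::string & build_id,
    long pid)
{
    return "Starting ClickHouse " + version
        + " (revision " + std::to_string(revision)
        + ", git hash: " + git_hash
        + ", build id: " + build_id
        + "), PID " + std::to_string(pid);
}
```

The git hash identifies the source state, while the build id still
distinguishes two builds of the same sources.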
Although this increases the debug symbol size from 510MB to 1.8GB, it is
not a problem for packages, since they are compressed anyway.
I checked the deb package; its size did increase slightly, though:
834M -> 962M.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
During the first run of cmake, the toolchain file is loaded twice:
- /usr/share/cmake-3.23/Modules/CMakeDetermineSystem.cmake
- /bld/CMakeFiles/3.23.2/CMakeSystem.cmake
But once you have a non-empty cmake cache, it is loaded only once:
- /bld/CMakeFiles/3.23.2/CMakeSystem.cmake
This is harmless, except that loading the toolchain twice adds
--gcc-toolchain multiple times, which prevents ccache from reusing the
cache.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- TSA is a static analyzer built by Google which finds race conditions
and deadlocks at compile time.
- It works by associating a shared member variable with a
synchronization primitive that protects it. The compiler can then
check at each access if proper locking happened before. Good
introductions are [0] and [1].
- TSA requires some help from the programmer via annotations. Luckily,
LLVM's libcxx already has annotations for std::mutex, std::lock_guard,
std::shared_mutex and std::scoped_lock. This commit enables them
(--> contrib/libcxx-cmake/CMakeLists.txt).
- Further, this commit adds convenience macros for the low-level
annotations for use in ClickHouse (--> base/defines.h). For
demonstration, they are leveraged in a few places.
- As we compile with "-Wall -Wextra -Weverything", the required compiler
flag "-Wthread-safety-analysis" was already enabled. Negative checks
are an experimental feature of TSA and disabled
(--> cmake/warnings.cmake). Compile times did not increase noticeably.
- TSA is used in a few places with simple locking. I tried TSA also
where locking is more complex. The problem was usually that it is
unclear which data is protected by which lock :-(. But there was
definitely some weird code where locking looked broken. So there is
some potential to find bugs.
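A minimal sketch of what such an annotation looks like, assuming macro
names TSA_GUARDED_BY and TSA_REQUIRES and a hypothetical class Counter
(the actual macros live in base/defines.h and may be named or defined
differently). The underlying attributes guarded_by and
requires_capability are the ones documented in [0]:

```cpp
#include <cassert>
#include <mutex>

// Assumed macro names for this sketch. On non-Clang compilers they
// expand to nothing, so the code still builds but is not analyzed.
#if defined(__clang__)
#    define TSA_GUARDED_BY(...) __attribute__((guarded_by(__VA_ARGS__)))
#    define TSA_REQUIRES(...) __attribute__((requires_capability(__VA_ARGS__)))
#else
#    define TSA_GUARDED_BY(...)
#    define TSA_REQUIRES(...)
#endif

// Hypothetical class: 'value' is shared state protected by 'mutex'.
class Counter
{
private:
    mutable std::mutex mutex;
    int value TSA_GUARDED_BY(mutex) = 0;

    // Callers must already hold 'mutex'.
    void incrementUnlocked() TSA_REQUIRES(mutex) { ++value; }

public:
    void increment()
    {
        std::lock_guard<std::mutex> lock(mutex);
        incrementUnlocked();
    }

    int get() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return value;
    }
};
```

The checks only take effect with Clang, -Wthread-safety and a
std::mutex that carries the annotations (as enabled for libcxx above).
In that setup, reading 'value' or calling incrementUnlocked() without
holding 'mutex' should be reported at compile time.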
*** Limitations of TSA besides the ones listed in [1]:
- The programmer needs to know which lock protects which piece of shared
data. This is not always easy for large classes.
- Two synchronization primitives used in ClickHouse are not annotated in
libcxx:
(1) std::unique_lock: A releasable lock handle, often used together with
std::condition_variable, e.g. to solve producer-consumer problems.
(2) std::recursive_mutex: A re-entrant mutex variant. Its usage can be
considered a design flaw + typically it is slower than a standard
mutex. In this commit, one std::recursive_mutex was converted to
std::mutex and annotated with TSA.
- For free-standing functions (e.g. helper functions) which are passed
shared data members, it can be tricky to specify the associated lock.
This is because the annotations use the normal C++ rules for symbol
resolution.
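To illustrate limitation (1), here is a sketch of the typical
producer-consumer pattern built on std::unique_lock and
std::condition_variable. BlockingQueue is a hypothetical class written
for this example, not code from ClickHouse:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical class for illustration only.
class BlockingQueue
{
public:
    void push(int value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            items.push(value);
        }
        // Notify after releasing the lock so the consumer can proceed.
        not_empty.notify_one();
    }

    int pop()
    {
        // std::condition_variable::wait() needs a std::unique_lock: it
        // releases the lock while sleeping and re-acquires it before
        // returning.
        std::unique_lock<std::mutex> lock(mutex);
        not_empty.wait(lock, [this] { return !items.empty(); });
        int value = items.front();
        items.pop();
        return value;
    }

private:
    std::mutex mutex;
    std::condition_variable not_empty;
    std::queue<int> items;
};
```

Because std::unique_lock is not annotated in libcxx, marking 'items' as
guarded by 'mutex' in a class like this would presumably make TSA
complain about the accesses in pop(), although they are correct.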
[0] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[1] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42958.pdf
- the CI builds were recently upgraded to Clang 14
- this commit bumps versions of other LLVM tools needed for the build
- this is important for people who have multiple LLVM versions installed
via their package manager
First part: updated most of the UTF8, hashing, memory and codecs code,
except utf8lower and upper, which may follow a little later.
This involved a huge amount of research into dealing with movemask.
Exact details and blog post TBD.
- It was noticed that in (*), the crashstack says "There is no
information about the reference checksum."
- The binaries are pulled via docker hub and upon inspection they indeed
lack the hash embedded as ELF section ".note.ClickHouse.hash" in the
clickhouse binary. This is weird because docker hub images are
"official" builds which should trigger the hash embedding.
- It turns out that the docker hub binaries are also stripped, and the
stripping was too aggressive. We no longer remove the sections
".comment" and ".note", which are only 140 bytes in size anyway, i.e.
binary size still goes down (on my system) from 2.1 GB to 0.47 + 0.40 GB
binary + dbg info.
(*) https://playground.lodthe.me/ba75d494-95d1-4ff6-a0ad-60c138636c9b
- before: usr/lib/debug/usr/bin/clickhouse.debug/clickhouse.debug
- after : usr/lib/debug/usr/bin/clickhouse.debug
Note, clickhouse_make_empty_debug_info_for_nfpm() is fine.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Globbing generally fails to pick up files which were added/deleted
after CMake's configure. This is a nuisance but can be alleviated using
CONFIGURE_DEPENDS (available since CMake 3.12) which adds a check for
new/deleted files before each compile and - if necessary - restarts the
configuration. On my system, the check takes < 0.1 sec.
(Side note: CONFIGURE_DEPENDS is not guaranteed to work across all
generators, but at least it works for Ninja which everyone @CH seems to
use.)
The check activated ccache unconditionally for all non-Clang compilers
(= GCC) while allowing ancient ccache versions for these. Perhaps there
was a reason for that in the past but it's simpler to only require a
minimum ccache version.
To simplify further, also require at least ccache 3.3 (released in 2016)
instead of 3.2.1 (released in 2014).
The compiler launcher (ccache, distcc) can be set externally via
-DCMAKE_CXX_COMPILER_LAUNCHER=<tool>. We previously silently ignored
this setting and continued without any launcher (e.g. ccache). Changed
this to now respect the externally specified launcher.
On Clang, WEVERYTHING enables literally every warning. People on the
internet are divided on whether this is a good thing, but ClickHouse has
compiled with -Weverything + some exceptions for noisy warnings for at
least a year.
I tried to build with WEVERYTHING = OFF and the build was badly broken.
It seems nobody actually turns WEVERYTHING off. And why would one, if
the CI builds (configured with WEVERYTHING = ON) could then generate
errors that do not show up in local development?
To simplify the build scripts and to remove the need to maintain two
sets of compiler warnings, I made WEVERYTHING the default and threw
WEVERYTHING = OFF out.
On Darwin, the build script tries to
1. use llvm-objcopy/llvm-strip from $PATH,
2. if not found by 1., use standard objcopy/strip from $PATH
The brew install instructions recommend setting $PATH to brew's binary
dir, so 2. will find something (assuming binutils is installed from
brew). If $PATH additionally points to brew's LLVM binary dir (which is
different from brew's binary dir), 1. will find the llvm versions of the
tools.
This commit removes additional logic which repeats the above steps in a
more implicit way by calling brew internally and figuring out the paths
once more if 1. and 2. cannot find them in the $PATH. This removes
duplication and simplifies the script. Maybe it even helps with
reproducibility.
- previous XCode 10.2 / Clang 7.0 was horribly outdated
- XCode 12.0 corresponds to the minimally required vanilla Clang version 12.0
- remove passing of "-fchar8_t" flag (with Clang >= 9, it is part of -std=c++20)
- Properly handle the case that we are on an unsupported and unlisted
arch, e.g. mips. Before, we would simply continue
configuration/compilation with no architecture set.
- CMake variable "ARCH_ARM" could in theory be replaced by
"ARCH_AARCH64". This would need refactoring in dependent CMakeLists,
therefore not doing it now.
Really just for convenience so the developer can choose appropriate
values for CMake options PARALLEL_COMPILE_JOBS and PARALLEL_LINK_JOBS.
Might also help to find out why builds are slow.