Implementation:
- Added a new buffer ForkWriteBuffer takes a vector of WriteBuffer and writes data to all of them. It uses the buffer of the first element as its buffer and copies data from first buffer to all the other buffers
Testing:
- Updated tests/queries/0_stateless/02346_into_outfile_and_stdout.sh
Documentation:
- Updated the english documentation for SELECT.. INTO OUTFILE with AND STDOUT.
A simple HelloWorld program with zero includes except iostream triggers
a build of ca. 2000 source files. The reason is that ClickHouse's
top-level CMakeLists.txt overrides "add_executable()" to link all
binaries against "clickhouse_new_delete". This links against
"clickhouse_common_io", which in turn has lots of 3rd party library
dependencies ... Without linking "clickhouse_new_delete", the number of
compiled files for "HelloWorld" goes down to ca. 70.
As an example, the self-extracting-executable needs none of its current
dependencies but other programs may also benefit.
In order to restore access to the original "add_executable()", the
overriding version is now prefixed. There is precedence for a
"clickhouse_" prefix (as opposed to "ch_"), for example
"clickhouse_split_debug_symbols". In general prefixing makes sense also
because overriding CMake commands relies on undocumented behavior and is
considered not-so-great practice (*).
(*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/
Implementation:
- Added a bool to ASTQueryWithOutput & patched the usage in ClientBase.
- Added a new buffer TeeWriteBuffer which extends from WriteBufferFromFile (used to write data to the file) and has WriteBufferFromFileDescriptor (used to write data to stdout). The WriteBufferFromFileDescriptor uses the same buffer as TeeWriteBuffer.
- Added a new bool select_into_outfile_and_stdout in ClientBase to enable/disable progress rendering.
Testing:
- Added a test tests/queries/0_stateless/02346_into_outfile_and_stdout.sh
Documentation:
- Updated the english documentation for the new option in SELECT.
- TSA is a static analyzer build by Google which finds race conditions
and deadlocks at compile time.
- It works by associating a shared member variable with a
synchronization primitive that protects it. The compiler can then
check at each access if proper locking happened before. A good
introduction are [0] and [1].
- TSA requires some help by the programmer via annotations. Luckily,
LLVM's libcxx already has annotations for std::mutex, std::lock_guard,
std::shared_mutex and std::scoped_lock. This commit enables them
(--> contrib/libcxx-cmake/CMakeLists.txt).
- Further, this commit adds convenience macros for the low-level
annotations for use in ClickHouse (--> base/defines.h). For
demonstration, they are leveraged in a few places.
- As we compile with "-Wall -Wextra -Weverything", the required compiler
flag "-Wthread-safety-analysis" was already enabled. Negative checks
are an experimental feature of TSA and disabled
(--> cmake/warnings.cmake). Compile times did not increase noticeably.
- TSA is used in a few places with simple locking. I tried TSA also
where locking is more complex. The problem was usually that it is
unclear which data is protected by which lock :-(. But there was
definitely some weird code where locking looked broken. So there is
some potential to find bugs.
*** Limitations of TSA besides the ones listed in [1]:
- The programmer needs to know which lock protects which piece of shared
data. This is not always easy for large classes.
- Two synchronization primitives used in ClickHouse are not annotated in
libcxx:
(1) std::unique_lock: A releaseable lock handle often together with
std::condition_variable, e.g. in solve producer-consumer problems.
(2) std::recursive_mutex: A re-entrant mutex variant. Its usage can be
considered a design flaw + typically it is slower than a standard
mutex. In this commit, one std::recursive_mutex was converted to
std::mutex and annotated with TSA.
- For free-standing functions (e.g. helper functions) which are passed
shared data members, it can be tricky to specify the associated lock.
This is because the annotations use the normal C++ rules for symbol
resolution.
[0] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[1] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42958.pdf