A simple HelloWorld program with zero includes except iostream triggers
a build of ca. 2000 source files. The reason is that ClickHouse's
top-level CMakeLists.txt overrides "add_executable()" to link all
binaries against "clickhouse_new_delete". This links against
"clickhouse_common_io", which in turn has lots of 3rd party library
dependencies ... Without linking "clickhouse_new_delete", the number of
compiled files for "HelloWorld" goes down to ca. 70.
For example, the self-extracting-executable needs none of its current
dependencies, and other programs may benefit as well.
In order to restore access to the original "add_executable()", the
overriding version is now prefixed. There is precedent for a
"clickhouse_" prefix (as opposed to "ch_"), for example
"clickhouse_split_debug_symbols". In general prefixing makes sense also
because overriding CMake commands relies on undocumented behavior and is
considered not-so-great practice (*).
(*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/
This commit migrates ClickHouse to Vectorscan. The first 10 min of
[0] explain the reasons for it.
(*) Addresses (but does not resolve) #38046
(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are
preserved for compatibility. Likewise, error codes (e.g.
"ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g.
"HyperscanDeleter") are preserved as Vectorscan aims to be a drop-in
replacement.
[0] https://www.youtube.com/watch?v=KlZWmmflW6M
- list-licenses.sh assumed GNU versions of find and grep (e.g. the
  "-printf" flag of find) but Mac ships with non-GNU versions
- this led to the error
    [8677/8942] Generating StorageSystemLicenses.generated.cpp
    find: -printf: unknown primary or operator
    find: -printf: unknown primary or operator
    find: -printf: unknown primary or operator
    find: -printf: unknown primary or operator
  during the build and, as a result, an empty system.licenses table
- As a fix, force the GNU versions of find and grep on Mac
E.g.
    utils/self-extracting-executable/compressor.cpp:257:31: format specifies type 'ptrdiff_t' (aka 'long') but the argument has type 'off_t' (aka 'long long') [-Werror,-Wformat]
        printf("Size: %td\n", info_in.st_size);
                      ~~~     ^~~~~~~~~~~~~~~
                      %lld
Not sure though if it's a hard requirement to use only C.
Avoided usage of fmt::format() to keep link dependencies to a minimum.
Also not using C++20 std::format() as it's only available in Clang >=14.
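
A minimal sketch of two ways to print an off_t without tripping -Wformat,
assuming a POSIX system. The helper names formatSizeWithStream and
formatSizeWithPrintf are hypothetical and are not taken from compressor.cpp.
The first uses standard C++ streams and so adds no link dependencies. The
second stays with C-style formatting by casting to intmax_t and using "%jd".

```cpp
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <sys/types.h> // off_t (POSIX)

// Hypothetical helper: operator<< selects the overload matching the
// platform's off_t ('long' on Linux x86-64, 'long long' on Mac), so no
// format specifier has to be chosen by hand.
std::string formatSizeWithStream(off_t size)
{
    std::ostringstream out;
    out << "Size: " << size << '\n';
    return out.str();
}

// Hypothetical C-style alternative: intmax_t is wide enough for any
// off_t, and "%jd" is the matching printf conversion.
std::string formatSizeWithPrintf(off_t size)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "Size: %jd\n", static_cast<intmax_t>(size));
    return std::string(buf);
}
```

Both helpers produce the same text. Which one fits depends on whether the
tool really has to stay C-only.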