ClickHouse/docs/en/development/cmake-in-clickhouse.md
Robert Schulze bb358617e1
Better naming for stuff related to splitted debug symbols
The previous name was slightly misleading, e.g. it is not about
"intalling stripped binaries" but about splitting debug symbols from the
binary.
2022-06-30 23:41:27 +02:00

---
sidebar_position: 69
sidebar_label: CMake in ClickHouse
description: How to make ClickHouse compile and link faster
---

CMake in ClickHouse

How to make ClickHouse compile and link faster. Minimal ClickHouse build example:

cmake .. \
    -DCMAKE_C_COMPILER=$(which clang-13) \
    -DCMAKE_CXX_COMPILER=$(which clang++-13) \
    -DCMAKE_BUILD_TYPE=Debug \
    -DENABLE_UTILS=OFF \
    -DENABLE_TESTS=OFF

CMake file types

  1. ClickHouse source CMake files (located in the root directory and in /src).
  2. Arch-dependent CMake files (located in /cmake/os_name).
  3. Library finders (search for contrib libraries, located in /contrib/*/CMakeLists.txt).
  4. Contrib build CMake files (used instead of the libraries' own CMake files, located in /cmake/modules).

List of CMake flags

  • The flag name is a link to its position in the code.
  • If an option's default value is itself an option, it's also a link to its position in this list.

ClickHouse modes

| Name | Default value | Description | Comment |
|------|---------------|-------------|---------|
| ENABLE_CLICKHOUSE_ALL | ON | Enable all ClickHouse modes by default | The clickhouse binary is a multi-purpose tool that contains multiple execution modes (client, server, etc.), each of which may be built and linked as a separate library. If you do not know which modes you need, turn this option OFF and enable SERVER and CLIENT only. |
| ENABLE_CLICKHOUSE_BENCHMARK | ENABLE_CLICKHOUSE_ALL | Queries benchmarking mode | https://clickhouse.com/docs/en/operations/utilities/clickhouse-benchmark/ |
| ENABLE_CLICKHOUSE_CLIENT | ENABLE_CLICKHOUSE_ALL | Client mode (interactive tui/shell that connects to the server) | |
| ENABLE_CLICKHOUSE_COMPRESSOR | ENABLE_CLICKHOUSE_ALL | Data compressor and decompressor | https://clickhouse.com/docs/en/operations/utilities/clickhouse-compressor/ |
| ENABLE_CLICKHOUSE_COPIER | ENABLE_CLICKHOUSE_ALL | Inter-cluster data copying mode | https://clickhouse.com/docs/en/operations/utilities/clickhouse-copier/ |
| ENABLE_CLICKHOUSE_EXTRACT_FROM_CONFIG | ENABLE_CLICKHOUSE_ALL | Configs processor (extract values etc.) | |
| ENABLE_CLICKHOUSE_FORMAT | ENABLE_CLICKHOUSE_ALL | Queries pretty-printer and formatter with syntax highlighting | |
| ENABLE_CLICKHOUSE_GIT_IMPORT | ENABLE_CLICKHOUSE_ALL | A tool to analyze Git repositories | https://presentations.clickhouse.com/matemarketing_2020/ |
| ENABLE_CLICKHOUSE_INSTALL | OFF | Install ClickHouse without .deb/.rpm/.tgz packages (having the binary only) | |
| ENABLE_CLICKHOUSE_KEEPER | ENABLE_CLICKHOUSE_ALL | ClickHouse alternative to ZooKeeper | |
| ENABLE_CLICKHOUSE_KEEPER_CONVERTER | ENABLE_CLICKHOUSE_ALL | Utility that converts ZooKeeper logs and snapshots into a clickhouse-keeper snapshot | |
| ENABLE_CLICKHOUSE_LIBRARY_BRIDGE | ENABLE_CLICKHOUSE_ALL | HTTP server working like a proxy to the Library dictionary source | |
| ENABLE_CLICKHOUSE_LOCAL | ENABLE_CLICKHOUSE_ALL | Local files fast processing mode | https://clickhouse.com/docs/en/operations/utilities/clickhouse-local/ |
| ENABLE_CLICKHOUSE_OBFUSCATOR | ENABLE_CLICKHOUSE_ALL | Table data obfuscator (converts real data to benchmark-ready data) | https://clickhouse.com/docs/en/operations/utilities/clickhouse-obfuscator/ |
| ENABLE_CLICKHOUSE_ODBC_BRIDGE | ENABLE_CLICKHOUSE_ALL | HTTP server working like a proxy to the ODBC driver | |
| ENABLE_CLICKHOUSE_SERVER | ENABLE_CLICKHOUSE_ALL | Server mode (main mode) | |
| ENABLE_CLICKHOUSE_STATIC_FILES_DISK_UPLOADER | ENABLE_CLICKHOUSE_ALL | A tool to export table data files to be later put on a static files web server | |
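
The way most mode options take their default from ENABLE_CLICKHOUSE_ALL can be sketched in CMake. This is a minimal illustration under stated assumptions, not the actual ClickHouse build code: the project name modes_sketch is hypothetical, and only three of the mode options from the table are declared.

```cmake
cmake_minimum_required(VERSION 3.15)
project(modes_sketch NONE)  # hypothetical project name; NONE skips compiler detection

# Master switch: the mode options below inherit their default from it.
option(ENABLE_CLICKHOUSE_ALL "Enable all ClickHouse modes by default" ON)

# The default expands to the current value of ENABLE_CLICKHOUSE_ALL, so turning
# the master switch OFF disables every mode that is not set explicitly.
option(ENABLE_CLICKHOUSE_SERVER "Server mode (main mode)" ${ENABLE_CLICKHOUSE_ALL})
option(ENABLE_CLICKHOUSE_CLIENT "Client mode (interactive tui/shell that connects to the server)" ${ENABLE_CLICKHOUSE_ALL})
option(ENABLE_CLICKHOUSE_BENCHMARK "Queries benchmarking mode" ${ENABLE_CLICKHOUSE_ALL})

message(STATUS "server=${ENABLE_CLICKHOUSE_SERVER} client=${ENABLE_CLICKHOUSE_CLIENT} benchmark=${ENABLE_CLICKHOUSE_BENCHMARK}")
```

In a fresh build directory, configuring this sketch with `-DENABLE_CLICKHOUSE_ALL=OFF -DENABLE_CLICKHOUSE_SERVER=ON -DENABLE_CLICKHOUSE_CLIENT=ON` leaves only the server and client modes enabled. Values already stored in an existing CMake cache are not recomputed, so changing the master switch later does not alter modes that were cached earlier.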

External libraries

Note that ClickHouse uses forks of these libraries, see https://github.com/ClickHouse-Extras.

| Name | Default value | Description | Comment |
|------|---------------|-------------|---------|
| ENABLE_AVX | 0 | Use AVX instructions on x86_64 | |
| ENABLE_AVX2 | 0 | Use AVX2 instructions on x86_64 | |
| ENABLE_AVX2_FOR_SPEC_OP | 0 | Use avx2 instructions for specific operations on x86_64 | |
| ENABLE_AVX512 | 0 | Use AVX512 instructions on x86_64 | |
| ENABLE_AVX512_FOR_SPEC_OP | 0 | Use avx512 instructions for specific operations on x86_64 | |
| ENABLE_BMI | 0 | Use BMI instructions on x86_64 | |
| ENABLE_CCACHE | ENABLE_CCACHE_BY_DEFAULT | Speedup re-compilations using ccache (external tool) | https://ccache.dev/ |
| ENABLE_CLANG_TIDY | OFF | Use clang-tidy static analyzer | https://clang.llvm.org/extra/clang-tidy/ |
| ENABLE_PCLMULQDQ | 1 | Use pclmulqdq instructions on x86_64 | |
| ENABLE_POPCNT | 1 | Use popcnt instructions on x86_64 | |
| ENABLE_SSE41 | 1 | Use SSE4.1 instructions on x86_64 | |
| ENABLE_SSE42 | 1 | Use SSE4.2 instructions on x86_64 | |
| ENABLE_SSSE3 | 1 | Use SSSE3 instructions on x86_64 | |

Other flags

| Name | Default value | Description | Comment |
|------|---------------|-------------|---------|
| ADD_GDB_INDEX_FOR_GOLD | OFF | Add .gdb-index to resulting binaries for gold linker. | Ignored if lld is used |
| ARCH_NATIVE | 0 | Add the -march=native compiler flag. This makes your binaries non-portable, but more performant code may be generated. This option overrides the ENABLE_* options for specific instruction sets. Strongly discouraged. | |
| BUILD_STANDALONE_KEEPER | OFF | Build keeper as small standalone binary | |
| CLICKHOUSE_SPLIT_BINARY | OFF | Make several binaries (clickhouse-server, clickhouse-client etc.) instead of one bundled | |
| COMPILER_PIPE | ON | -pipe compiler option | Less /tmp usage, more RAM usage. |
| ENABLE_BUILD_PATH_MAPPING | ON | Remap source file paths in debug info, predefined preprocessor macros and __builtin_FILE() to generate reproducible builds. See https://reproducible-builds.org/docs/build-path | Reproducible builds. If turned ON, remaps source file paths in debug info, predefined preprocessor macros and __builtin_FILE(). |
| ENABLE_CHECK_HEAVY_BUILDS | OFF | Don't allow C++ translation units to compile too long or to take too much memory while compiling. | Take care to add prlimit in the command line before ccache, or else ccache thinks that prlimit is the compiler and clang++ is its input file, and refuses to work with multiple inputs, e.g. in the ccache log: [2021-03-31T18:06:32.655327 36900] Command line: /usr/bin/ccache prlimit --as=10000000000 --data=5000000000 --cpu=600 /usr/bin/clang++-11 - ...... std=gnu++2a -MD -MT src/CMakeFiles/dbms.dir/Storages/MergeTree/IMergeTreeDataPart.cpp.o -MF src/CMakeFiles/dbms.dir/Storages/MergeTree/IMergeTreeDataPart.cpp.o.d -o src/CMakeFiles/dbms.dir/Storages/MergeTree/IMergeTreeDataPart.cpp.o -c ../src/Storages/MergeTree/IMergeTreeDataPart.cpp [2021-03-31T18:06:32.656704 36900] Multiple input files: /usr/bin/clang++-11 and ../src/Storages/MergeTree/IMergeTreeDataPart.cpp Another way would be to use the --ccache-skip option before clang++-11 to make ccache ignore it. |
| ENABLE_COLORED_BUILD | ON | Enable colored diagnostics in build log. | |
| ENABLE_EXAMPLES | OFF | Build all example programs in 'examples' subdirectories | |
| ENABLE_FUZZING | OFF | Fuzzy testing using libfuzzer | |
| ENABLE_LIBRARIES | ON | Enable all external libraries by default | Turns on all external libs like s3, kafka, ODBC, ... |
| ENABLE_MULTITARGET_CODE | ON | Enable platform-dependent code | ClickHouse developers may use platform-dependent code under some macro (e.g. ifdef ENABLE_MULTITARGET). If turned ON, this option defines such a macro. See src/Functions/TargetSpecific.h |
| ENABLE_TESTS | ON | Provide unit_test_dbms target with Google.Test unit tests | If turned ON, assumes the user has either the system GTest library or the bundled one. |
| ENABLE_THINLTO | ON | Clang-specific link time optimization | https://clang.llvm.org/docs/ThinLTO.html Applies to clang only. Disabled when building with tests or sanitizers. |
| FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION | ON | Stop/Fail CMake configuration if some ENABLE_XXX option is defined (either ON or OFF) but is not possible to satisfy | If turned off: e.g. when ENABLE_FOO is ON, but the FOO tool was not found, CMake will continue. |
| GLIBC_COMPATIBILITY | ON | Enable compatibility with older glibc libraries. | Only for Linux, x86_64 or aarch64. |
| SPLIT_DEBUG_SYMBOLS | OFF | Build stripped binaries with debug info in separate directory | |
| LINKER_NAME | OFF | Linker name or full path | Example values: lld-10, gold. |
| PARALLEL_COMPILE_JOBS | "" | Maximum number of concurrent compilation jobs | 1 if not set |
| PARALLEL_LINK_JOBS | "" | Maximum number of concurrent link jobs | 1 if not set |
| SANITIZE | "" | Enable one of the code sanitizers | Possible values: address (ASan), memory (MSan), thread (TSan), undefined (UBSan), "" (no sanitizing) |
| SPLIT_SHARED_LIBRARIES | OFF | Keep all internal libraries as separate .so files | DEVELOPER ONLY. Faster linking if turned on. |
| STRIP_DEBUG_SYMBOLS_FUNCTIONS | STRIP_DSF_DEFAULT | Do not generate debugger info for ClickHouse functions | Provides faster linking and lower binary size. Tradeoff is the inability to debug some source files with e.g. gdb (empty stack frames and no local variables). |
| USE_DEBUG_HELPERS | USE_DEBUG_HELPERS | Enable debug helpers | |
| USE_STATIC_LIBRARIES | ON | Disable to use shared libraries | |
| USE_UNWIND | ENABLE_LIBRARIES | Enable libunwind (better stacktraces) | |
| WERROR | OFF | Enable -Werror compiler option | Using system libs can cause a lot of warnings in includes (on macro expansion). |
| WITH_COVERAGE | OFF | Profile the resulting binary/binaries | Compiler-specific coverage flags e.g. -fcoverage-mapping for clang |
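
The behaviour described for FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION could be implemented roughly as follows. This is a hedged sketch rather than the ClickHouse implementation: ENABLE_FOO and the foo tool are the placeholders from the table's own example, and the project name and the UNSUPPORTED_MESSAGE_LEVEL variable are hypothetical names introduced for illustration.

```cmake
cmake_minimum_required(VERSION 3.15)
project(unsupported_option_sketch NONE)  # hypothetical project name

option(FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION
    "Stop/Fail CMake configuration if some ENABLE_XXX option is defined (either ON or OFF) but is not possible to satisfy"
    ON)

# ENABLE_FOO and the `foo` tool are placeholders, not real ClickHouse options.
option(ENABLE_FOO "Use the FOO tool (placeholder)" ON)

# Choose the message severity once: FATAL_ERROR aborts configuration,
# WARNING reports the problem and lets configuration continue.
if (FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION)
    set(UNSUPPORTED_MESSAGE_LEVEL FATAL_ERROR)  # hypothetical variable name
else()
    set(UNSUPPORTED_MESSAGE_LEVEL WARNING)
endif()

if (ENABLE_FOO)
    # find_program() sets FOO_PATH to FOO_PATH-NOTFOUND when nothing is found,
    # which if() evaluates as false.
    find_program(FOO_PATH NAMES foo)
    if (NOT FOO_PATH)
        message(${UNSUPPORTED_MESSAGE_LEVEL} "ENABLE_FOO is ON, but the foo tool was not found")
    endif()
endif()
```

Selecting the severity in one place keeps each individual check to a single message() call, so every unsatisfiable option reacts to the flag in the same way.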

Developer's guide for adding new CMake options

Don't be obvious. Be informative.

Bad:

option (ENABLE_TESTS "Enables testing" OFF)

This description is useless: it neither gives the reader any additional information nor explains the option's purpose.

Better:

option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests" OFF)

If the option's purpose can't be guessed from its name, or the name may be misleading, or the option has some preconditions, leave a comment above the option() line and explain what it does. The best way is to link to the docs page (if it exists). The comment is parsed into a separate column (see below).

Even better:

# implies ${TESTS_ARE_ENABLED}
# see tests/CMakeLists.txt for implementation detail.
option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests" OFF)

If the option's state could produce an unwanted (or unusual) result, explicitly warn the user.

Suppose you have an option that may strip debug symbols from the ClickHouse part. This can speed up the linking process, but produces a binary that cannot be debugged. In that case, prefer explicitly raising a warning that tells the developer they may be doing something wrong. Such options should also be disabled by default where applicable.

Bad:

option(STRIP_DEBUG_SYMBOLS_FUNCTIONS
    "Do not generate debugger info for ClickHouse functions."
    ${STRIP_DSF_DEFAULT})

if (STRIP_DEBUG_SYMBOLS_FUNCTIONS)
    target_compile_options(clickhouse_functions PRIVATE "-g0")
endif()

Better:

# Provides faster linking and lower binary size.
# Tradeoff is the inability to debug some source files with e.g. gdb
# (empty stack frames and no local variables).
option(STRIP_DEBUG_SYMBOLS_FUNCTIONS
    "Do not generate debugger info for ClickHouse functions."
    ${STRIP_DSF_DEFAULT})

if (STRIP_DEBUG_SYMBOLS_FUNCTIONS)
    message(WARNING "Not generating debugger info for ClickHouse functions")
    target_compile_options(clickhouse_functions PRIVATE "-g0")
endif()

In the option's description, explain WHAT the option does rather than WHY it does something.

The WHY explanation should be placed in the comment. You may find that the option's name is self-descriptive.

Bad:

option(ENABLE_THINLTO "Enable Thin LTO. Only applicable for clang. It's also suppressed when building with tests or sanitizers." ON)

Better:

# Only applicable for clang.
# Turned off when building with tests or sanitizers.
option(ENABLE_THINLTO "Clang-specific link time optimization" ON)

Don't assume other developers know as much as you do.

ClickHouse uses many tools that an ordinary developer may not know. If you are in doubt, link to the tool's docs. It won't take much of your time.

Bad:

option(ENABLE_THINLTO "Enable Thin LTO. Only applicable for clang. It's also suppressed when building with tests or sanitizers." ON)

Better (combined with the above hint):

# https://clang.llvm.org/docs/ThinLTO.html
# Only applicable for clang.
# Turned off when building with tests or sanitizers.
option(ENABLE_THINLTO "Clang-specific link time optimization" ON)

Another example, bad:

option (USE_INCLUDE_WHAT_YOU_USE "Use 'include-what-you-use' tool" OFF)

Better:

# https://github.com/include-what-you-use/include-what-you-use
option (USE_INCLUDE_WHAT_YOU_USE "Reduce unneeded #includes (external tool)" OFF)

Prefer consistent default values.

CMake accepts a plethora of values representing boolean true/false, e.g. 1, ON, YES, and so on.

Prefer the ON/OFF values, if possible.
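
In the same bad/better style as the guidelines above, here is a small sketch of this rule. ENABLE_EXAMPLES and its description are taken from the flag list; the project name is hypothetical. Both spellings configure identically, because CMake treats 0 and OFF as false, so the preference is about consistency only.

```cmake
cmake_minimum_required(VERSION 3.15)
project(default_values_sketch NONE)  # hypothetical project name

# Bad: numeric default, inconsistent with the ON/OFF spelling used elsewhere.
# option(ENABLE_EXAMPLES "Build all example programs in 'examples' subdirectories" 0)

# Better: same meaning, spelled with the preferred ON/OFF values.
option(ENABLE_EXAMPLES "Build all example programs in 'examples' subdirectories" OFF)

# if() accepts either spelling, so code that reads the option is unaffected.
if (ENABLE_EXAMPLES)
    message(STATUS "Examples will be built")
else()
    message(STATUS "Examples will be skipped")
endif()
```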