Fix build

This commit is contained in:
Nikolai Kochetov 2020-09-25 16:03:12 +03:00
commit dea90009e3
607 changed files with 9467 additions and 4460 deletions

2
.gitmodules vendored
View File

@ -37,7 +37,7 @@
url = https://github.com/ClickHouse-Extras/mariadb-connector-c.git
[submodule "contrib/jemalloc"]
path = contrib/jemalloc
url = https://github.com/jemalloc/jemalloc.git
url = https://github.com/ClickHouse-Extras/jemalloc.git
[submodule "contrib/unixodbc"]
path = contrib/unixodbc
url = https://github.com/ClickHouse-Extras/UnixODBC.git

View File

@ -1,3 +1,77 @@
## ClickHouse release 20.9
### ClickHouse release v20.9.2.20-stable, 2020-09-22
#### New Feature
* Added column transformers `EXCEPT`, `REPLACE`, `APPLY`, which can be applied to the list of selected columns (after `*` or `COLUMNS(...)`). For example, you can write `SELECT * EXCEPT(URL) REPLACE(number + 1 AS number)`. Another example: `select * apply(length) apply(max) from wide_string_table` to find out the maxium length of all string columns. [#14233](https://github.com/ClickHouse/ClickHouse/pull/14233) ([Amos Bird](https://github.com/amosbird)).
* Added an aggregate function `rankCorr` which computes a rank correlation coefficient. [#11769](https://github.com/ClickHouse/ClickHouse/pull/11769) ([antikvist](https://github.com/antikvist)) [#14411](https://github.com/ClickHouse/ClickHouse/pull/14411) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Added table function `view` which turns a subquery into a table object. This helps passing queries around. For instance, it can be used in remote/cluster table functions. [#12567](https://github.com/ClickHouse/ClickHouse/pull/12567) ([Amos Bird](https://github.com/amosbird)).
#### Bug Fix
* Fix bug when `ALTER UPDATE` mutation with Nullable column in assignment expression and constant value (like `UPDATE x = 42`) leads to incorrect value in column or segfault. Fixes [#13634](https://github.com/ClickHouse/ClickHouse/issues/13634), [#14045](https://github.com/ClickHouse/ClickHouse/issues/14045). [#14646](https://github.com/ClickHouse/ClickHouse/pull/14646) ([alesapin](https://github.com/alesapin)).
* Fix wrong Decimal multiplication result caused wrong decimal scale of result column. [#14603](https://github.com/ClickHouse/ClickHouse/pull/14603) ([Artem Zuikov](https://github.com/4ertus2)).
* Fixed the incorrect sorting order of `Nullable` column. This fixes [#14344](https://github.com/ClickHouse/ClickHouse/issues/14344). [#14495](https://github.com/ClickHouse/ClickHouse/pull/14495) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fixed inconsistent comparison with primary key of type `FixedString` on index analysis if they're compered with a string of less size. This fixes https://github.com/ClickHouse/ClickHouse/issues/14908. [#15033](https://github.com/ClickHouse/ClickHouse/pull/15033) ([Amos Bird](https://github.com/amosbird)).
* Fix bug which leads to wrong merges assignment if table has partitions with a single part. [#14444](https://github.com/ClickHouse/ClickHouse/pull/14444) ([alesapin](https://github.com/alesapin)).
* If function `bar` was called with specifically crafted arguments, buffer overflow was possible. This closes [#13926](https://github.com/ClickHouse/ClickHouse/issues/13926). [#15028](https://github.com/ClickHouse/ClickHouse/pull/15028) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Publish CPU frequencies per logical core in `system.asynchronous_metrics`. This fixes https://github.com/ClickHouse/ClickHouse/issues/14923. [#14924](https://github.com/ClickHouse/ClickHouse/pull/14924) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Fixed `.metadata.tmp File exists` error when using `MaterializeMySQL` database engine. [#14898](https://github.com/ClickHouse/ClickHouse/pull/14898) ([Winter Zhang](https://github.com/zhang2014)).
* Fix the issue when some invocations of `extractAllGroups` function may trigger "Memory limit exceeded" error. This fixes [#13383](https://github.com/ClickHouse/ClickHouse/issues/13383). [#14889](https://github.com/ClickHouse/ClickHouse/pull/14889) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix SIGSEGV for an attempt to INSERT into StorageFile(fd). [#14887](https://github.com/ClickHouse/ClickHouse/pull/14887) ([Azat Khuzhin](https://github.com/azat)).
* Fix rare error in `SELECT` queries when the queried column has `DEFAULT` expression which depends on the other column which also has `DEFAULT` and not present in select query and not exists on disk. Partially fixes [#14531](https://github.com/ClickHouse/ClickHouse/issues/14531). [#14845](https://github.com/ClickHouse/ClickHouse/pull/14845) ([alesapin](https://github.com/alesapin)).
* Fix wrong monotonicity detection for shrunk `Int -> Int` cast of signed types. It might lead to incorrect query result. This bug is unveiled in [#14513](https://github.com/ClickHouse/ClickHouse/issues/14513). [#14783](https://github.com/ClickHouse/ClickHouse/pull/14783) ([Amos Bird](https://github.com/amosbird)).
* Fixed missed default database name in metadata of materialized view when executing `ALTER ... MODIFY QUERY`. [#14664](https://github.com/ClickHouse/ClickHouse/pull/14664) ([tavplubix](https://github.com/tavplubix)).
* Fix possibly incorrect result of function `has` when LowCardinality and Nullable types are involved. [#14591](https://github.com/ClickHouse/ClickHouse/pull/14591) ([Mike](https://github.com/myrrc)).
* Cleanup data directory after Zookeeper exceptions during CREATE query for tables with ReplicatedMergeTree Engine. [#14563](https://github.com/ClickHouse/ClickHouse/pull/14563) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix rare segfaults in functions with combinator `-Resample`, which could appear in result of overflow with very large parameters. [#14562](https://github.com/ClickHouse/ClickHouse/pull/14562) ([Anton Popov](https://github.com/CurtizJ)).
* Check for array size overflow in `topK` aggregate function. Without this check the user may send a query with carefully crafted parameters that will lead to server crash. This closes [#14452](https://github.com/ClickHouse/ClickHouse/issues/14452). [#14467](https://github.com/ClickHouse/ClickHouse/pull/14467) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Proxy restart/start/stop/reload of SysVinit to systemd (if it is used). [#14460](https://github.com/ClickHouse/ClickHouse/pull/14460) ([Azat Khuzhin](https://github.com/azat)).
* Stop query execution if exception happened in `PipelineExecutor` itself. This could prevent rare possible query hung. [#14334](https://github.com/ClickHouse/ClickHouse/pull/14334) [#14402](https://github.com/ClickHouse/ClickHouse/pull/14402) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix crash during `ALTER` query for table which was created `AS table_function`. Fixes [#14212](https://github.com/ClickHouse/ClickHouse/issues/14212). [#14326](https://github.com/ClickHouse/ClickHouse/pull/14326) ([alesapin](https://github.com/alesapin)).
* Fix exception during ALTER LIVE VIEW query with REFRESH command. LIVE VIEW is an experimental feature. [#14320](https://github.com/ClickHouse/ClickHouse/pull/14320) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix QueryPlan lifetime (for EXPLAIN PIPELINE graph=1) for queries with nested interpreter. [#14315](https://github.com/ClickHouse/ClickHouse/pull/14315) ([Azat Khuzhin](https://github.com/azat)).
* Better check for tuple size in SSD cache complex key external dictionaries. This fixes [#13981](https://github.com/ClickHouse/ClickHouse/issues/13981). [#14313](https://github.com/ClickHouse/ClickHouse/pull/14313) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Disallows `CODEC` on `ALIAS` column type. Fixes [#13911](https://github.com/ClickHouse/ClickHouse/issues/13911). [#14263](https://github.com/ClickHouse/ClickHouse/pull/14263) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix GRANT ALL statement when executed on a non-global level. [#13987](https://github.com/ClickHouse/ClickHouse/pull/13987) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix arrayJoin() capturing in lambda (exception with logical error message was thrown). [#13792](https://github.com/ClickHouse/ClickHouse/pull/13792) ([Azat Khuzhin](https://github.com/azat)).
#### Experimental Feature
* Added `db-generator` tool for random database generation by given SELECT queries. It may faciliate reproducing issues when there is only incomplete bug report from the user. [#14442](https://github.com/ClickHouse/ClickHouse/pull/14442) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)) [#10973](https://github.com/ClickHouse/ClickHouse/issues/10973) ([ZeDRoman](https://github.com/ZeDRoman)).
#### Improvement
* Allow using multi-volume storage configuration in storage Distributed. [#14839](https://github.com/ClickHouse/ClickHouse/pull/14839) ([Pavel Kovalenko](https://github.com/Jokser)).
* Disallow empty time_zone argument in `toStartOf*` type of functions. [#14509](https://github.com/ClickHouse/ClickHouse/pull/14509) ([Bharat Nallan](https://github.com/bharatnc)).
* MySQL handler returns `OK` for queries like `SET @@var = value`. Such statement is ignored. It is needed because some MySQL drivers send `SET @@` query for setup after handshake https://github.com/ClickHouse/ClickHouse/issues/9336#issuecomment-686222422 . [#14469](https://github.com/ClickHouse/ClickHouse/pull/14469) ([BohuTANG](https://github.com/BohuTANG)).
* Now TTLs will be applied during merge if they were not previously materialized. [#14438](https://github.com/ClickHouse/ClickHouse/pull/14438) ([alesapin](https://github.com/alesapin)).
* Now `clickhouse-obfuscator` supports UUID type as proposed in [#13163](https://github.com/ClickHouse/ClickHouse/issues/13163). [#14409](https://github.com/ClickHouse/ClickHouse/pull/14409) ([dimarub2000](https://github.com/dimarub2000)).
* Added new setting `system_events_show_zero_values` as proposed in [#11384](https://github.com/ClickHouse/ClickHouse/issues/11384). [#14404](https://github.com/ClickHouse/ClickHouse/pull/14404) ([dimarub2000](https://github.com/dimarub2000)).
* Implicitly convert primary key to not null in `MaterializeMySQL` (Same as `MySQL`). Fixes [#14114](https://github.com/ClickHouse/ClickHouse/issues/14114). [#14397](https://github.com/ClickHouse/ClickHouse/pull/14397) ([Winter Zhang](https://github.com/zhang2014)).
* Replace wide integers (256 bit) from boost multiprecision with implementation from https://github.com/cerevra/int. 256bit integers are experimental. [#14229](https://github.com/ClickHouse/ClickHouse/pull/14229) ([Artem Zuikov](https://github.com/4ertus2)).
* Add default compression codec for parts in `system.part_log` with the name `default_compression_codec`. [#14116](https://github.com/ClickHouse/ClickHouse/pull/14116) ([alesapin](https://github.com/alesapin)).
* Add precision argument for `DateTime` type. It allows to use `DateTime` name instead of `DateTime64`. [#13761](https://github.com/ClickHouse/ClickHouse/pull/13761) ([Winter Zhang](https://github.com/zhang2014)).
* Added requirepass authorization for `Redis` external dictionary. [#13688](https://github.com/ClickHouse/ClickHouse/pull/13688) ([Ivan Torgashov](https://github.com/it1804)).
* Improvements in `RabbitMQ` engine: added connection and channels failure handling, proper commits, insert failures handling, better exchanges, queue durability and queue resume opportunity, new queue settings. Fixed tests. [#12761](https://github.com/ClickHouse/ClickHouse/pull/12761) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support custom codecs in compact parts. [#12183](https://github.com/ClickHouse/ClickHouse/pull/12183) ([Anton Popov](https://github.com/CurtizJ)).
#### Performance Improvement
* Optimize queries with LIMIT/LIMIT BY/ORDER BY for distributed with GROUP BY sharding_key (under optimize_skip_unused_shards and optimize_distributed_group_by_sharding_key). [#10373](https://github.com/ClickHouse/ClickHouse/pull/10373) ([Azat Khuzhin](https://github.com/azat)).
* Creating sets for multiple `JOIN` and `IN` in parallel. It may slightly improve performance for queries with several different `IN subquery` expressions. [#14412](https://github.com/ClickHouse/ClickHouse/pull/14412) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Improve Kafka engine performance by providing independent thread for each consumer. Separate thread pool for streaming engines (like Kafka). [#13939](https://github.com/ClickHouse/ClickHouse/pull/13939) ([fastio](https://github.com/fastio)).
#### Build/Testing/Packaging Improvement
* Lower binary size in debug build by removing debug info from `Functions`. This is needed only for one internal project in Yandex who is using very old linker. [#14549](https://github.com/ClickHouse/ClickHouse/pull/14549) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Prepare for build with clang 11. [#14455](https://github.com/ClickHouse/ClickHouse/pull/14455) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix the logic in backport script. In previous versions it was triggered for any labels of 100% red color. It was strange. [#14433](https://github.com/ClickHouse/ClickHouse/pull/14433) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Integration tests use default base config. All config changes are explicit with main_configs, user_configs and dictionaries parameters for instance. [#13647](https://github.com/ClickHouse/ClickHouse/pull/13647) ([Ilya Yatsishin](https://github.com/qoega)).
## ClickHouse release 20.8
### ClickHouse release v20.8.2.3-stable, 2020-09-08

View File

@ -28,10 +28,11 @@ endforeach()
project(ClickHouse)
# If turned off: e.g. when ENABLE_FOO is ON, but FOO tool was not found, the CMake will continue.
option(FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION
"Stop/Fail CMake configuration if some ENABLE_XXX option is defined (either ON or OFF) but is not possible to satisfy"
ON
)
"Stop/Fail CMake configuration if some ENABLE_XXX option is defined (either ON or OFF)
but is not possible to satisfy" ON)
if(FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION)
set(RECONFIGURE_MESSAGE_LEVEL FATAL_ERROR)
else()
@ -58,7 +59,11 @@ set(CMAKE_DEBUG_POSTFIX "d" CACHE STRING "Generate debug library name with a pos
# For more info see https://cmake.org/cmake/help/latest/prop_gbl/USE_FOLDERS.html
set_property(GLOBAL PROPERTY USE_FOLDERS ON)
option(ENABLE_IPO "Enable full link time optimization (it's usually impractical; see also ENABLE_THINLTO)" OFF) # need cmake 3.9+
# cmake 3.9+ needed.
# Usually impractical.
# See also ${ENABLE_THINLTO}
option(ENABLE_IPO "Full link time optimization")
if(ENABLE_IPO)
cmake_policy(SET CMP0069 NEW)
include(CheckIPOSupported)
@ -80,6 +85,11 @@ endif ()
include (cmake/find/ccache.cmake)
option(ENABLE_CHECK_HEAVY_BUILDS "Don't allow C++ translation units to compile too long or to take too much memory while compiling" OFF)
if (ENABLE_CHECK_HEAVY_BUILDS)
set (CMAKE_CXX_COMPILER_LAUNCHER prlimit --rss=10000000 --cpu=600)
endif ()
if (NOT CMAKE_BUILD_TYPE OR CMAKE_BUILD_TYPE STREQUAL "None")
set (CMAKE_BUILD_TYPE "RelWithDebInfo")
message (STATUS "CMAKE_BUILD_TYPE is not set, set to default = ${CMAKE_BUILD_TYPE}")
@ -88,11 +98,16 @@ message (STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
string (TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
option (USE_STATIC_LIBRARIES "Set to FALSE to use shared libraries" ON)
option (MAKE_STATIC_LIBRARIES "Set to FALSE to make shared libraries" ${USE_STATIC_LIBRARIES})
option(USE_STATIC_LIBRARIES "Disable to use shared libraries" ON)
option(MAKE_STATIC_LIBRARIES "Disable to make shared libraries" ${USE_STATIC_LIBRARIES})
if (NOT MAKE_STATIC_LIBRARIES)
option (SPLIT_SHARED_LIBRARIES "DEV ONLY. Keep all internal libs as separate .so for faster linking" OFF)
option (CLICKHOUSE_SPLIT_BINARY "Make several binaries instead one bundled (clickhouse-server, clickhouse-client, ... )" OFF)
# DEVELOPER ONLY.
# Faster linking if turned on.
option(SPLIT_SHARED_LIBRARIES "Keep all internal libraries as separate .so files")
option(CLICKHOUSE_SPLIT_BINARY
"Make several binaries (clickhouse-server, clickhouse-client etc.) instead of one bundled")
endif ()
if (MAKE_STATIC_LIBRARIES AND SPLIT_SHARED_LIBRARIES)
@ -107,7 +122,8 @@ if (USE_STATIC_LIBRARIES)
list(REVERSE CMAKE_FIND_LIBRARY_SUFFIXES)
endif ()
option (ENABLE_FUZZING "Enables fuzzing instrumentation" OFF)
# Implies ${WITH_COVERAGE}
option (ENABLE_FUZZING "Fuzzy testing using libfuzzer" OFF)
if (ENABLE_FUZZING)
message (STATUS "Fuzzing instrumentation enabled")
@ -139,10 +155,13 @@ if (COMPILER_CLANG)
endif ()
endif ()
option (ENABLE_TESTS "Enables tests" ON)
# If turned `ON`, assumes the user has either the system GTest library or the bundled one.
option(ENABLE_TESTS "Provide unit_test_dbms target with Google.Test unit tests" ON)
if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES AND NOT SPLIT_SHARED_LIBRARIES AND CMAKE_VERSION VERSION_GREATER "3.9.0")
option (GLIBC_COMPATIBILITY "Set to TRUE to enable compatibility with older glibc libraries. Only for x86_64, Linux. Implies ENABLE_FASTMEMCPY." ON)
# Only for Linux, x86_64.
# Implies ${ENABLE_FASTMEMCPY}
option(GLIBC_COMPATIBILITY "Enable compatibility with older glibc libraries." ON)
elseif(GLIBC_COMPATIBILITY)
message (${RECONFIGURE_MESSAGE_LEVEL} "Glibc compatibility cannot be enabled in current configuration")
endif ()
@ -175,7 +194,9 @@ else ()
set(NO_WHOLE_ARCHIVE --no-whole-archive)
endif ()
option (ADD_GDB_INDEX_FOR_GOLD "Set to add .gdb-index to resulting binaries for gold linker. NOOP if lld is used." 0)
# Ignored if `lld` is used
option(ADD_GDB_INDEX_FOR_GOLD "Add .gdb-index to resulting binaries for gold linker.")
if (NOT CMAKE_BUILD_TYPE_UC STREQUAL "RELEASE")
if (LINKER_NAME STREQUAL "lld")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gdb-index")
@ -196,9 +217,13 @@ if (NOT CMAKE_BUILD_TYPE_UC STREQUAL "RELEASE")
endif()
cmake_host_system_information(RESULT AVAILABLE_PHYSICAL_MEMORY QUERY AVAILABLE_PHYSICAL_MEMORY) # Not available under freebsd
if(NOT AVAILABLE_PHYSICAL_MEMORY OR AVAILABLE_PHYSICAL_MEMORY GREATER 8000)
option(COMPILER_PIPE "-pipe compiler option [less /tmp usage, more ram usage]" ON)
# Less `/tmp` usage, more RAM usage.
option(COMPILER_PIPE "-pipe compiler option" ON)
endif()
if(COMPILER_PIPE)
set(COMPILER_FLAGS "${COMPILER_FLAGS} -pipe")
else()
@ -209,7 +234,8 @@ if(NOT DISABLE_CPU_OPTIMIZE)
include(cmake/cpu_features.cmake)
endif()
option(ARCH_NATIVE "Enable -march=native compiler flag" 0)
option(ARCH_NATIVE "Add -march=native compiler flag")
if (ARCH_NATIVE)
set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")
endif ()
@ -220,6 +246,7 @@ if (UNBUNDLED AND (COMPILER_GCC OR COMPILER_CLANG))
else()
set (_CXX_STANDARD "-std=c++2a")
endif()
# cmake < 3.12 doesn't support 20. We'll set CMAKE_CXX_FLAGS for now
# set (CMAKE_CXX_STANDARD 20)
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${_CXX_STANDARD}")
@ -232,7 +259,8 @@ if (COMPILER_GCC OR COMPILER_CLANG)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsized-deallocation")
endif ()
option(WITH_COVERAGE "Build with coverage." 0)
# Compiler-specific coverage flags e.g. -fcoverage-mapping for gcc
option(WITH_COVERAGE "Profile the resulting binary/binaries" OFF)
if (WITH_COVERAGE AND COMPILER_CLANG)
set(COMPILER_FLAGS "${COMPILER_FLAGS} -fprofile-instr-generate -fcoverage-mapping")
@ -266,10 +294,13 @@ if (COMPILER_CLANG)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fdiagnostics-absolute-paths")
if (NOT ENABLE_TESTS AND NOT SANITIZE)
option(ENABLE_THINLTO "Enable Thin LTO. Only applicable for clang. It's also suppressed when building with tests or sanitizers." ON)
# https://clang.llvm.org/docs/ThinLTO.html
# Applies to clang only.
# Disabled when building with tests or sanitizers.
option(ENABLE_THINLTO "Clang-specific link time optimization" ON)
endif()
# We cannot afford to use LTO when compiling unitests, and it's not enough
# We cannot afford to use LTO when compiling unit tests, and it's not enough
# to only supply -fno-lto at the final linking stage. So we disable it
# completely.
if (ENABLE_THINLTO AND NOT ENABLE_TESTS AND NOT SANITIZE)
@ -282,8 +313,8 @@ if (COMPILER_CLANG)
endif ()
# Always prefer llvm tools when using clang. For instance, we cannot use GNU ar when llvm LTO is enabled
find_program (LLVM_AR_PATH NAMES "llvm-ar" "llvm-ar-10" "llvm-ar-9" "llvm-ar-8")
if (LLVM_AR_PATH)
message(STATUS "Using llvm-ar: ${LLVM_AR_PATH}.")
set (CMAKE_AR ${LLVM_AR_PATH})
@ -292,30 +323,38 @@ if (COMPILER_CLANG)
endif ()
find_program (LLVM_RANLIB_PATH NAMES "llvm-ranlib" "llvm-ranlib-10" "llvm-ranlib-9" "llvm-ranlib-8")
if (LLVM_RANLIB_PATH)
message(STATUS "Using llvm-ranlib: ${LLVM_RANLIB_PATH}.")
set (CMAKE_RANLIB ${LLVM_RANLIB_PATH})
else ()
message(WARNING "Cannot find llvm-ranlib. System ranlib will be used instead. It does not work with ThinLTO.")
endif ()
elseif (ENABLE_THINLTO)
message (${RECONFIGURE_MESSAGE_LEVEL} "ThinLTO is only available with CLang")
endif ()
option (ENABLE_LIBRARIES "Enable all libraries (Global default switch)" ON)
# Turns on all external libs like s3, kafka, ODBC, ...
option(ENABLE_LIBRARIES "Enable all external libraries by default" ON)
# We recommend avoiding this mode for production builds because we can't guarantee all needed libraries exist in your
# system.
# This mode exists for enthusiastic developers who are searching for trouble.
# Useful for maintainers of OS packages.
option (UNBUNDLED "Use system libraries instead of ones in contrib/" OFF)
option (UNBUNDLED "Try find all libraries in system. We recommend to avoid this mode for production builds, because we cannot guarantee exact versions and variants of libraries your system has installed. This mode exists for enthusiastic developers who search for trouble. Also it is useful for maintainers of OS packages." OFF)
if (UNBUNDLED)
set(NOT_UNBUNDLED 0)
set(NOT_UNBUNDLED OFF)
else ()
set(NOT_UNBUNDLED 1)
set(NOT_UNBUNDLED ON)
endif ()
if (UNBUNDLED OR NOT (OS_LINUX OR OS_DARWIN))
# Using system libs can cause a lot of warnings in includes (on macro expansion).
option (WERROR "Enable -Werror compiler option" OFF)
option(WERROR "Enable -Werror compiler option" OFF)
else ()
option (WERROR "Enable -Werror compiler option" ON)
option(WERROR "Enable -Werror compiler option" ON)
endif ()
if (WERROR)
@ -357,8 +396,9 @@ else ()
set (CMAKE_POSITION_INDEPENDENT_CODE ON)
endif ()
# Using "include-what-you-use" tool.
option (USE_INCLUDE_WHAT_YOU_USE "Use 'include-what-you-use' tool" OFF)
# https://github.com/include-what-you-use/include-what-you-use
option (USE_INCLUDE_WHAT_YOU_USE "Automatically reduce unneeded includes in source code (external tool)" OFF)
if (USE_INCLUDE_WHAT_YOU_USE)
find_program(IWYU_PATH NAMES include-what-you-use iwyu)
if (NOT IWYU_PATH)
@ -370,8 +410,11 @@ if (USE_INCLUDE_WHAT_YOU_USE)
endif ()
if (ENABLE_TESTS)
message (STATUS "Tests are enabled")
message (STATUS "Unit tests are enabled")
else()
message(STATUS "Unit tests are disabled")
endif ()
enable_testing() # Enable for tests without binary
# when installing to /usr - place configs to /etc but for /usr/local place to /usr/local/etc
@ -381,7 +424,13 @@ else ()
set (CLICKHOUSE_ETC_DIR "${CMAKE_INSTALL_PREFIX}/etc")
endif ()
message (STATUS "Building for: ${CMAKE_SYSTEM} ${CMAKE_SYSTEM_PROCESSOR} ${CMAKE_LIBRARY_ARCHITECTURE} ; USE_STATIC_LIBRARIES=${USE_STATIC_LIBRARIES} MAKE_STATIC_LIBRARIES=${MAKE_STATIC_LIBRARIES} SPLIT_SHARED=${SPLIT_SHARED_LIBRARIES} UNBUNDLED=${UNBUNDLED} CCACHE=${CCACHE_FOUND} ${CCACHE_VERSION}")
message (STATUS
"Building for: ${CMAKE_SYSTEM} ${CMAKE_SYSTEM_PROCESSOR} ${CMAKE_LIBRARY_ARCHITECTURE} ;
USE_STATIC_LIBRARIES=${USE_STATIC_LIBRARIES}
MAKE_STATIC_LIBRARIES=${MAKE_STATIC_LIBRARIES}
SPLIT_SHARED=${SPLIT_SHARED_LIBRARIES}
UNBUNDLED=${UNBUNDLED}
CCACHE=${CCACHE_FOUND} ${CCACHE_VERSION}")
include (GNUInstallDirs)
include (cmake/contrib_finder.cmake)
@ -404,7 +453,6 @@ include (cmake/find/amqpcpp.cmake)
include (cmake/find/capnp.cmake)
include (cmake/find/llvm.cmake)
include (cmake/find/termcap.cmake) # for external static llvm
include (cmake/find/opencl.cmake)
include (cmake/find/h3.cmake)
include (cmake/find/libxml2.cmake)
include (cmake/find/brotli.cmake)
@ -450,13 +498,6 @@ include (cmake/find/mysqlclient.cmake)
# When testing for memory leaks with Valgrind, don't link tcmalloc or jemalloc.
if (USE_OPENCL)
if (OS_DARWIN)
set(OPENCL_LINKER_FLAGS "-framework OpenCL")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OPENCL_LINKER_FLAGS}")
endif ()
endif ()
include (cmake/print_flags.cmake)
if (TARGET global-group)

View File

@ -17,5 +17,5 @@ ClickHouse is an open-source column-oriented database management system that all
## Upcoming Events
* [eBay migrating from Druid](https://us02web.zoom.us/webinar/register/tZMkfu6rpjItHtaQ1DXcgPWcSOnmM73HLGKL) on September 23, 2020.
* [ClickHouse for Edge Analytics](https://ones2020.sched.com/event/bWPs) on September 29, 2020.
* [ClickHouse online meetup (in Russian)](https://clck.ru/R2zB9) on October 1, 2020.

View File

@ -18,6 +18,7 @@ set (SRCS
terminalColors.cpp
errnoToString.cpp
getResource.cpp
StringRef.cpp
)
if (ENABLE_REPLXX)

13
base/common/StringRef.cpp Normal file
View File

@ -0,0 +1,13 @@
#include <ostream>
#include "StringRef.h"
std::ostream & operator<<(std::ostream & os, const StringRef & str)
{
if (str.data)
os.write(str.data, str.size);
return os;
}

View File

@ -4,7 +4,7 @@
#include <string>
#include <vector>
#include <functional>
#include <ostream>
#include <iosfwd>
#include <common/types.h>
#include <common/unaligned.h>
@ -322,10 +322,4 @@ inline bool operator==(StringRef lhs, const char * rhs)
return true;
}
inline std::ostream & operator<<(std::ostream & os, const StringRef & str)
{
if (str.data)
os.write(str.data, str.size);
return os;
}
std::ostream & operator<<(std::ostream & os, const StringRef & str);

View File

@ -31,8 +31,8 @@ namespace common
template <>
inline bool addOverflow(__int128 x, __int128 y, __int128 & res)
{
static constexpr __int128 min_int128 = __int128(0x8000000000000000ll) << 64;
static constexpr __int128 max_int128 = (__int128(0x7fffffffffffffffll) << 64) + 0xffffffffffffffffll;
static constexpr __int128 min_int128 = minInt128();
static constexpr __int128 max_int128 = maxInt128();
res = x + y;
return (y > 0 && x > max_int128 - y) || (y < 0 && x < min_int128 - y);
}
@ -79,8 +79,8 @@ namespace common
template <>
inline bool subOverflow(__int128 x, __int128 y, __int128 & res)
{
static constexpr __int128 min_int128 = __int128(0x8000000000000000ll) << 64;
static constexpr __int128 max_int128 = (__int128(0x7fffffffffffffffll) << 64) + 0xffffffffffffffffll;
static constexpr __int128 min_int128 = minInt128();
static constexpr __int128 max_int128 = maxInt128();
res = x - y;
return (y < 0 && x > max_int128 + y) || (y > 0 && x < min_int128 + y);
}

View File

@ -3,12 +3,11 @@
#if WITH_COVERAGE
# include <mutex>
# include <unistd.h>
# if defined(__clang__)
extern "C" void __llvm_profile_dump();
extern "C" void __llvm_profile_dump(); // NOLINT
# elif defined(__GNUC__) || defined(__GNUG__)
extern "C" void __gcov_exit();
# endif
@ -23,7 +22,7 @@ void dumpCoverageReportIfPossible()
std::lock_guard lock(mutex);
# if defined(__clang__)
__llvm_profile_dump();
__llvm_profile_dump(); // NOLINT
# elif defined(__GNUC__) || defined(__GNUG__)
__gcov_exit();
# endif

View File

@ -13,6 +13,9 @@ using wUInt256 = wide::integer<256, unsigned>;
static_assert(sizeof(wInt256) == 32);
static_assert(sizeof(wUInt256) == 32);
static constexpr __int128 minInt128() { return static_cast<unsigned __int128>(1) << 127; }
static constexpr __int128 maxInt128() { return (static_cast<unsigned __int128>(1) << 127) - 1; }
/// The standard library type traits, such as std::is_arithmetic, with one exception
/// (std::common_type), are "set in stone". Attempting to specialize them causes undefined behavior.
/// So instead of using the std type_traits, we use our own version which allows extension.

View File

@ -372,7 +372,7 @@ static inline char * writeLeadingMinus(char * pos)
static inline char * writeSIntText(int128_t x, char * pos)
{
static const int128_t min_int128 = int128_t(0x8000000000000000ll) << 64;
static constexpr int128_t min_int128 = uint128_t(1) << 127;
if (unlikely(x == min_int128))
{

View File

@ -14,7 +14,7 @@
# pragma clang diagnostic ignored "-Wunused-macros"
#endif
#define __msan_unpoison(X, Y)
#define __msan_unpoison(X, Y) // NOLINT
#if defined(__has_feature)
# if __has_feature(memory_sanitizer)
# undef __msan_unpoison
@ -84,7 +84,7 @@ extern "C"
#ifdef ADDRESS_SANITIZER
void __lsan_ignore_object(const void *);
#else
void __lsan_ignore_object(const void *) {}
void __lsan_ignore_object(const void *) {} // NOLINT
#endif
}

View File

@ -53,6 +53,7 @@ SRCS(
setTerminalEcho.cpp
shift10.cpp
sleep.cpp
StringRef.cpp
terminalColors.cpp
)

View File

@ -781,7 +781,7 @@ void BaseDaemon::initializeTerminationAndSignalProcessing()
void BaseDaemon::logRevision() const
{
Poco::Logger::root().information("Starting " + std::string{VERSION_FULL}
+ " with revision " + std::to_string(ClickHouseRevision::get())
+ " with revision " + std::to_string(ClickHouseRevision::getVersionRevision())
+ ", " + build_id_info
+ ", PID " + std::to_string(getpid()));
}

127
benchmark/hardware.sh Executable file
View File

@ -0,0 +1,127 @@
#!/bin/bash -e
if [[ -n $1 ]]; then
SCALE=$1
else
SCALE=100
fi
TABLE="hits_${SCALE}m_obfuscated"
DATASET="${TABLE}_v1.tar.xz"
QUERIES_FILE="queries.sql"
TRIES=3
AMD64_BIN_URL="https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_build_check/gcc-10_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse"
AARCH64_BIN_URL="https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_special_build_check/clang-10-aarch64_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse"
# Note: on older Ubuntu versions, 'axel' does not support IPv6. If you are using IPv6-only servers on very old Ubuntu, just don't install 'axel'.
FASTER_DOWNLOAD=wget
if command -v axel >/dev/null; then
FASTER_DOWNLOAD=axel
else
echo "It's recommended to install 'axel' for faster downloads."
fi
if command -v pixz >/dev/null; then
TAR_PARAMS='-Ipixz'
else
echo "It's recommended to install 'pixz' for faster decompression of the dataset."
fi
mkdir -p clickhouse-benchmark-$SCALE
pushd clickhouse-benchmark-$SCALE
if [[ ! -f clickhouse ]]; then
CPU=$(uname -m)
if [[ ($CPU == x86_64) || ($CPU == amd64) ]]; then
$FASTER_DOWNLOAD "$AMD64_BIN_URL"
elif [[ $CPU == aarch64 ]]; then
$FASTER_DOWNLOAD "$AARCH64_BIN_URL"
else
echo "Unsupported CPU type: $CPU"
exit 1
fi
fi
chmod a+x clickhouse
if [[ ! -f $QUERIES_FILE ]]; then
wget "https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/$QUERIES_FILE"
fi
if [[ ! -d data ]]; then
if [[ ! -f $DATASET ]]; then
$FASTER_DOWNLOAD "https://clickhouse-datasets.s3.yandex.net/hits/partitions/$DATASET"
fi
tar $TAR_PARAMS --strip-components=1 --directory=. -x -v -f $DATASET
fi
uptime
echo "Starting clickhouse-server"
./clickhouse server > server.log 2>&1 &
PID=$!
function finish {
kill $PID
wait
}
trap finish EXIT
echo "Waiting for clickhouse-server to start"
for i in {1..30}; do
sleep 1
./clickhouse client --query "SELECT 'The dataset size is: ', count() FROM $TABLE" 2>/dev/null && break || echo '.'
if [[ $i == 30 ]]; then exit 1; fi
done
echo
echo "Will perform benchmark. Results:"
echo
cat "$QUERIES_FILE" | sed "s/{table}/${TABLE}/g" | while read query; do
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
echo -n "["
for i in $(seq 1 $TRIES); do
RES=$(./clickhouse client --max_memory_usage 100000000000 --time --format=Null --query="$query" 2>&1 ||:)
[[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null"
[[ "$i" != $TRIES ]] && echo -n ", "
done
echo "],"
done
echo
echo "Benchmark complete. System info:"
echo
echo '----Version, build id-----------'
./clickhouse local --query "SELECT format('Version: {}, build id: {}', version(), buildId())"
./clickhouse local --query "SELECT format('The number of threads is: {}', value) FROM system.settings WHERE name = 'max_threads'" --output-format TSVRaw
./clickhouse local --query "SELECT format('Current time: {}', toString(now(), 'UTC'))"
echo '----CPU-------------------------'
cat /proc/cpuinfo | grep -i -F 'model name' | uniq
lscpu
echo '----Block Devices---------------'
lsblk
echo '----Disk Free and Total--------'
df -h .
echo '----Memory Free and Total-------'
free -h
echo '----Physical Memory Amount------'
cat /proc/meminfo | grep MemTotal
echo '----RAID Info-------------------'
cat /proc/mdstat
#echo '----PCI-------------------------'
#lspci
#echo '----All Hardware Info-----------'
#lshw
echo '--------------------------------'
echo

View File

@ -1,20 +1,28 @@
# This file configures static analysis tools that can be integrated to the build process
# https://clang.llvm.org/extra/clang-tidy/
option (ENABLE_CLANG_TIDY "Use clang-tidy static analyzer" OFF)
option (ENABLE_CLANG_TIDY "Use 'clang-tidy' static analyzer if present" OFF)
if (ENABLE_CLANG_TIDY)
if (${CMAKE_VERSION} VERSION_LESS "3.6.0")
message(FATAL_ERROR "clang-tidy requires CMake version at least 3.6.")
endif()
find_program (CLANG_TIDY_PATH NAMES "clang-tidy" "clang-tidy-10" "clang-tidy-9" "clang-tidy-8")
if (CLANG_TIDY_PATH)
message(STATUS "Using clang-tidy: ${CLANG_TIDY_PATH}. The checks will be run during build process. See the .clang-tidy file at the root directory to configure the checks.")
set (USE_CLANG_TIDY 1)
message(STATUS
"Using clang-tidy: ${CLANG_TIDY_PATH}.
The checks will be run during build process.
See the .clang-tidy file at the root directory to configure the checks.")
set (USE_CLANG_TIDY ON)
# The variable CMAKE_CXX_CLANG_TIDY will be set inside src and base directories with non third-party code.
# set (CMAKE_CXX_CLANG_TIDY "${CLANG_TIDY_PATH}")
elseif (FAIL_ON_UNSUPPORTED_OPTIONS_COMBINATION)
message(FATAL_ERROR "clang-tidy is not found")
else ()
message(STATUS "clang-tidy is not found. This is normal - the tool is only used for static code analysis and isn't essential for the build.")
message(STATUS
"clang-tidy is not found.
This is normal - the tool is only used for code static analysis and isn't essential for the build.")
endif ()
endif ()

View File

@ -18,7 +18,8 @@ if (NOT CCACHE_FOUND AND NOT DEFINED ENABLE_CCACHE AND NOT COMPILER_MATCHES_CCAC
"Setting it up will significantly reduce compilation time for 2nd and consequent builds")
endif()
option(ENABLE_CCACHE "Speedup re-compilations using ccache" ${ENABLE_CCACHE_BY_DEFAULT})
# https://ccache.dev/
option(ENABLE_CCACHE "Speedup re-compilations using ccache (external tool)" ${ENABLE_CCACHE_BY_DEFAULT})
if (NOT ENABLE_CCACHE)
return()

View File

@ -4,13 +4,16 @@ if (NOT USE_LIBCXX)
if (USE_INTERNAL_LIBCXX_LIBRARY)
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use internal libcxx with USE_LIBCXX=OFF")
endif()
target_link_libraries(global-libs INTERFACE -l:libstdc++.a -l:libstdc++fs.a) # Always link these libraries as static
target_link_libraries(global-libs INTERFACE ${EXCEPTION_HANDLING_LIBRARY})
return()
endif()
set(USE_INTERNAL_LIBCXX_LIBRARY_DEFAULT ${NOT_UNBUNDLED})
option (USE_INTERNAL_LIBCXX_LIBRARY "Set to FALSE to use system libcxx and libcxxabi libraries instead of bundled" ${USE_INTERNAL_LIBCXX_LIBRARY_DEFAULT})
option (USE_INTERNAL_LIBCXX_LIBRARY "Disable to use system libcxx and libcxxabi libraries instead of bundled"
${USE_INTERNAL_LIBCXX_LIBRARY_DEFAULT})
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcxx/CMakeLists.txt")
if (USE_INTERNAL_LIBCXX_LIBRARY)

View File

@ -1,11 +1,4 @@
option (ENABLE_GTEST_LIBRARY "Enable gtest library" ${ENABLE_LIBRARIES})
if (NOT ENABLE_GTEST_LIBRARY)
if(USE_INTERNAL_GTEST_LIBRARY)
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use internal Google Test when ENABLE_GTEST_LIBRARY=OFF")
endif()
return()
endif()
# included only if ENABLE_TESTS=1
option (USE_INTERNAL_GTEST_LIBRARY "Set to FALSE to use system Google Test instead of bundled" ${NOT_UNBUNDLED})
@ -15,6 +8,7 @@ if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest/CMakeList
message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal gtest")
set (USE_INTERNAL_GTEST_LIBRARY 0)
endif ()
set (MISSING_INTERNAL_GTEST_LIBRARY 1)
endif ()

View File

@ -1,25 +0,0 @@
# TODO: enable by default
if(0)
option(ENABLE_OPENCL "Enable OpenCL support" ${ENABLE_LIBRARIES})
endif()
if(NOT ENABLE_OPENCL)
return()
endif()
# Intel OpenCl driver: sudo apt install intel-opencl-icd
# @sa https://github.com/intel/compute-runtime/releases
# OpenCL applications should link with ICD loader
# sudo apt install opencl-headers ocl-icd-libopencl1
# sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so
# TODO: add https://github.com/OCL-dev/ocl-icd as submodule instead
find_package(OpenCL)
if(OpenCL_FOUND)
set(USE_OPENCL 1)
else()
message (${RECONFIGURE_MESSAGE_LEVEL} "Can't enable OpenCL support")
endif()
message(STATUS "Using opencl=${USE_OPENCL}: ${OpenCL_INCLUDE_DIRS} : ${OpenCL_LIBRARIES}")

View File

@ -1,4 +1,5 @@
set (SENTRY_LIBRARY "sentry")
set (SENTRY_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/sentry-native/include")
if (NOT EXISTS "${SENTRY_INCLUDE_DIR}/sentry.h")
message (WARNING "submodule contrib/sentry-native is missing. to fix try run: \n git submodule update --init --recursive")

View File

@ -1,4 +1,4 @@
option(USE_SNAPPY "Enable support of snappy library" ${ENABLE_LIBRARIES})
option(USE_SNAPPY "Enable snappy library" ${ENABLE_LIBRARIES})
if(NOT USE_SNAPPY)
if (USE_INTERNAL_SNAPPY_LIBRARY)

View File

@ -1,11 +1,12 @@
option (FUZZER "Enable fuzzer: libfuzzer")
# see ./CMakeLists.txt for variable declaration
if (FUZZER)
if (FUZZER STREQUAL "libfuzzer")
# NOTE: Eldar Zaitov decided to name it "libfuzzer" instead of "fuzzer" to keep in mind another possible fuzzer backends.
# NOTE: no-link means that all the targets are built with instrumentation for fuzzer, but only some of them (tests) have entry point for fuzzer and it's not checked.
# NOTE: no-link means that all the targets are built with instrumentation for fuzzer, but only some of them
# (tests) have entry point for fuzzer and it's not checked.
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} -fsanitize=fuzzer-no-link")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} -fsanitize=fuzzer-no-link")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=fuzzer-no-link")
endif()
@ -14,7 +15,6 @@ if (FUZZER)
if (NOT LIB_FUZZING_ENGINE)
set (LIB_FUZZING_ENGINE "-fsanitize=fuzzer")
endif ()
else ()
message (FATAL_ERROR "Unknown fuzzer type: ${FUZZER}")
endif ()

View File

@ -6,26 +6,35 @@
cmake_host_system_information(RESULT AVAILABLE_PHYSICAL_MEMORY QUERY AVAILABLE_PHYSICAL_MEMORY) # Not available under freebsd
cmake_host_system_information(RESULT NUMBER_OF_LOGICAL_CORES QUERY NUMBER_OF_LOGICAL_CORES)
option(PARALLEL_COMPILE_JOBS "Define the maximum number of concurrent compilation jobs" "")
# 1 if not set
option(PARALLEL_COMPILE_JOBS "Maximum number of concurrent compilation jobs" "")
# 1 if not set
option(PARALLEL_LINK_JOBS "Maximum number of concurrent link jobs" "")
if (NOT PARALLEL_COMPILE_JOBS AND AVAILABLE_PHYSICAL_MEMORY AND MAX_COMPILER_MEMORY)
math(EXPR PARALLEL_COMPILE_JOBS ${AVAILABLE_PHYSICAL_MEMORY}/${MAX_COMPILER_MEMORY})
if (NOT PARALLEL_COMPILE_JOBS)
set (PARALLEL_COMPILE_JOBS 1)
endif ()
endif ()
if (PARALLEL_COMPILE_JOBS AND (NOT NUMBER_OF_LOGICAL_CORES OR PARALLEL_COMPILE_JOBS LESS NUMBER_OF_LOGICAL_CORES))
set(CMAKE_JOB_POOL_COMPILE compile_job_pool${CMAKE_CURRENT_SOURCE_DIR})
string (REGEX REPLACE "[^a-zA-Z0-9]+" "_" CMAKE_JOB_POOL_COMPILE ${CMAKE_JOB_POOL_COMPILE})
set_property(GLOBAL APPEND PROPERTY JOB_POOLS ${CMAKE_JOB_POOL_COMPILE}=${PARALLEL_COMPILE_JOBS})
endif ()
option(PARALLEL_LINK_JOBS "Define the maximum number of concurrent link jobs" "")
if (NOT PARALLEL_LINK_JOBS AND AVAILABLE_PHYSICAL_MEMORY AND MAX_LINKER_MEMORY)
math(EXPR PARALLEL_LINK_JOBS ${AVAILABLE_PHYSICAL_MEMORY}/${MAX_LINKER_MEMORY})
if (NOT PARALLEL_LINK_JOBS)
set (PARALLEL_LINK_JOBS 1)
endif ()
endif ()
if (PARALLEL_LINK_JOBS AND (NOT NUMBER_OF_LOGICAL_CORES OR PARALLEL_COMPILE_JOBS LESS NUMBER_OF_LOGICAL_CORES))
set(CMAKE_JOB_POOL_LINK link_job_pool${CMAKE_CURRENT_SOURCE_DIR})
string (REGEX REPLACE "[^a-zA-Z0-9]+" "_" CMAKE_JOB_POOL_LINK ${CMAKE_JOB_POOL_LINK})
@ -33,5 +42,7 @@ if (PARALLEL_LINK_JOBS AND (NOT NUMBER_OF_LOGICAL_CORES OR PARALLEL_COMPILE_JOBS
endif ()
if (PARALLEL_COMPILE_JOBS OR PARALLEL_LINK_JOBS)
message(STATUS "${CMAKE_CURRENT_SOURCE_DIR}: Have ${AVAILABLE_PHYSICAL_MEMORY} megabytes of memory. Limiting concurrent linkers jobs to ${PARALLEL_LINK_JOBS} and compiler jobs to ${PARALLEL_COMPILE_JOBS}")
message(STATUS
"${CMAKE_CURRENT_SOURCE_DIR}: Have ${AVAILABLE_PHYSICAL_MEMORY} megabytes of memory.
Limiting concurrent linkers jobs to ${PARALLEL_LINK_JOBS} and compiler jobs to ${PARALLEL_COMPILE_JOBS}")
endif ()

View File

@ -1,4 +1,5 @@
option (SANITIZE "Enable sanitizer: address, memory, thread, undefined" "")
# Possible values: `address` (ASan), `memory` (MSan), `thread` (TSan), `undefined` (UBSan), and "" (no sanitizing)
option (SANITIZE "Enable one of the code sanitizers" "")
set (SAN_FLAGS "${SAN_FLAGS} -g -fno-omit-frame-pointer -DSANITIZER")

View File

@ -40,7 +40,9 @@ endif ()
STRING(REGEX MATCHALL "[0-9]+" COMPILER_VERSION_LIST ${CMAKE_CXX_COMPILER_VERSION})
LIST(GET COMPILER_VERSION_LIST 0 COMPILER_VERSION_MAJOR)
# Example values: `lld-10`, `gold`.
option (LINKER_NAME "Linker name or full path")
if (COMPILER_GCC AND NOT LINKER_NAME)
find_program (LLD_PATH NAMES "ld.lld")
find_program (GOLD_PATH NAMES "ld.gold")

View File

@ -17,8 +17,9 @@ if (USE_DEBUG_HELPERS)
endif ()
# Add some warnings that are not available even with -Wall -Wextra -Wpedantic.
option (WEVERYTHING "Enables -Weverything option with some exceptions. This is intended for exploration of new compiler warnings that may be found to be useful. Only makes sense for clang." ON)
# Intended for exploration of new compiler warnings that may be found useful.
# Applies to clang only
option (WEVERYTHING "Enable -Weverything option with some exceptions." ON)
# Control maximum size of stack frames. It can be important if the code is run in fibers with small stack size.
# Only in release build because debug has too large stack frames.

2
contrib/jemalloc vendored

@ -1 +1 @@
Subproject commit ea6b3e973b477b8061e0076bb257dbd7f3faa756
Subproject commit 026764f19995c53583ab25a3b9c06a2fd74e4689

2
contrib/protobuf vendored

@ -1 +1 @@
Subproject commit d6a10dd3db55d8f7f9e464db9151874cde1f79ec
Subproject commit 445d1ae73a450b1e94622e7040989aa2048402e3

View File

@ -11,3 +11,7 @@ else ()
endif ()
add_subdirectory("${protobuf_SOURCE_DIR}/cmake" "${protobuf_BINARY_DIR}")
# We don't want to stop compilation on warnings in protobuf's headers.
# The following line overrides the value assigned by the command target_include_directories() in libprotobuf.cmake
set_property(TARGET libprotobuf PROPERTY INTERFACE_SYSTEM_INCLUDE_DIRECTORIES ${protobuf_SOURCE_DIR}/src)

1
debian/control vendored
View File

@ -11,7 +11,6 @@ Build-Depends: debhelper (>= 9),
libicu-dev,
libreadline-dev,
gperf,
python,
tzdata
Standards-Version: 3.9.8

4
debian/rules vendored
View File

@ -36,8 +36,8 @@ endif
CMAKE_FLAGS += -DENABLE_UTILS=0
DEB_CC ?= $(shell which gcc-9 gcc-8 gcc | head -n1)
DEB_CXX ?= $(shell which g++-9 g++-8 g++ | head -n1)
DEB_CC ?= $(shell which gcc-10 gcc-9 gcc | head -n1)
DEB_CXX ?= $(shell which g++-10 g++-9 g++ | head -n1)
ifdef DEB_CXX
DEB_BUILD_GNU_TYPE := $(shell dpkg-architecture -qDEB_BUILD_GNU_TYPE)

View File

@ -17,10 +17,10 @@ ccache --show-stats ||:
ccache --zero-stats ||:
ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so ||:
rm -f CMakeCache.txt
cmake --debug-trycompile --verbose=1 -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DSANITIZE=$SANITIZER $CMAKE_FLAGS ..
cmake --debug-trycompile --verbose=1 -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DSANITIZE=$SANITIZER -DENABLE_CHECK_HEAVY_BUILDS=1 $CMAKE_FLAGS ..
ninja $NINJA_FLAGS clickhouse-bundle
mv ./programs/clickhouse* /output
mv ./src/unit_tests_dbms /output
mv ./src/unit_tests_dbms /output ||: # may not exist for some binary builds
find . -name '*.so' -print -exec mv '{}' /output \;
find . -name '*.so.*' -print -exec mv '{}' /output \;

View File

@ -105,6 +105,7 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
# Create combined output archive for split build and for performance tests.
if package_type == "performance":
result.append("COMBINED_OUTPUT=performance")
cmake_flags.append("-DENABLE_TESTS=0")
elif split_binary:
result.append("COMBINED_OUTPUT=shared_build")

View File

@ -89,7 +89,8 @@ EOT
fi
if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
$gosu /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG &
# Listen only on localhost until the initialization is done
$gosu /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG -- --listen_host=127.0.0.1 &
pid="$!"
# check if clickhouse is ready to accept connections

View File

@ -1,7 +1,7 @@
# docker build -t yandex/clickhouse-test-base .
FROM ubuntu:19.10
ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=10
ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=11
RUN apt-get update \
&& apt-get install ca-certificates lsb-release wget gnupg apt-transport-https \
@ -43,7 +43,6 @@ RUN apt-get update \
llvm-${LLVM_VERSION} \
moreutils \
perl \
perl \
pigz \
pkg-config \
tzdata \

View File

@ -83,7 +83,7 @@ SUBMODULES_TO_UPDATE=(contrib/boost contrib/zlib-ng contrib/libxml2 contrib/poco
git submodule update --init --recursive "${SUBMODULES_TO_UPDATE[@]}" | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/submodule_log.txt
export CMAKE_LIBS_CONFIG="-DENABLE_LIBRARIES=0 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_EMBEDDED_COMPILER=0 -DENABLE_THINLTO=0 -DUSE_UNWIND=1"
CMAKE_LIBS_CONFIG=(-DENABLE_LIBRARIES=0 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_EMBEDDED_COMPILER=0 -DENABLE_THINLTO=0 -DUSE_UNWIND=1)
export CCACHE_DIR=/ccache
export CCACHE_BASEDIR=/ClickHouse
@ -96,8 +96,8 @@ ccache --zero-stats ||:
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_CXX_COMPILER=clang++-10 -DCMAKE_C_COMPILER=clang-10 "$CMAKE_LIBS_CONFIG" "${FASTTEST_CMAKE_FLAGS[@]}" | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/cmake_log.txt
ninja clickhouse-bundle | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/build_log.txt
cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_CXX_COMPILER=clang++-10 -DCMAKE_C_COMPILER=clang-10 "${CMAKE_LIBS_CONFIG[@]}" "${FASTTEST_CMAKE_FLAGS[@]}" | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/cmake_log.txt
time ninja clickhouse-bundle | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/build_log.txt
ninja install | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/install_log.txt
@ -111,35 +111,10 @@ ln -s /test_output /var/log/clickhouse-server
cp "$CLICKHOUSE_DIR/programs/server/config.xml" /etc/clickhouse-server/
cp "$CLICKHOUSE_DIR/programs/server/users.xml" /etc/clickhouse-server/
mkdir -p /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/custom_settings_prefixes.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/
#ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/clusters.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/graphite.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml
# Keep original query_masking_rules.xml
ln -s --backup=simple --suffix=_original.xml /usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/
# install tests config
$CLICKHOUSE_DIR/tests/config/install.sh
# doesn't support SSL
rm -f /etc/clickhouse-server/config.d/secure_ports.xml
# Kill the server in case we are running locally and not in docker
kill_clickhouse
@ -216,7 +191,7 @@ TESTS_TO_SKIP=(
01460_DistributedFilesToInsert
)
clickhouse-test -j 4 --no-long --testname --shard --zookeeper --skip "${TESTS_TO_SKIP[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/test_log.txt
time clickhouse-test -j 8 --no-long --testname --shard --zookeeper --skip "${TESTS_TO_SKIP[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee /test_output/test_log.txt
# substr is to remove semicolon after test name
@ -234,7 +209,7 @@ then
kill_clickhouse
# Clean the data so that there is no interference from the previous test run.
rm -rvf /var/lib/clickhouse ||:
rm -rf /var/lib/clickhouse ||:
mkdir /var/lib/clickhouse
clickhouse-server --config /etc/clickhouse-server/config.xml --daemon

View File

@ -48,7 +48,7 @@ function configure
cp -av "$repo_dir"/programs/server/config* db
cp -av "$repo_dir"/programs/server/user* db
# TODO figure out which ones are needed
cp -av "$repo_dir"/tests/config/listen.xml db/config.d
cp -av "$repo_dir"/tests/config/config.d/listen.xml db/config.d
cp -av "$script_dir"/query-fuzzer-tweaks-users.xml db/users.d
}

View File

@ -1,5 +1,5 @@
# docker build -t yandex/clickhouse-integration-test .
FROM ubuntu:19.10
FROM yandex/clickhouse-test-base
RUN apt-get update \
&& env DEBIAN_FRONTEND=noninteractive apt-get -y install \
@ -8,7 +8,6 @@ RUN apt-get update \
libreadline-dev \
libicu-dev \
bsdutils \
llvm-9 \
gdb \
unixodbc \
odbcinst \
@ -29,9 +28,3 @@ RUN curl 'https://cdn.mysql.com//Downloads/Connector-ODBC/8.0/mysql-connector-od
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Sanitizer options
RUN echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment; \
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment; \
echo "MSAN_OPTIONS='abort_on_error=1'" >> /etc/environment; \
ln -s /usr/lib/llvm-9/bin/llvm-symbolizer /usr/bin/llvm-symbolizer;

View File

@ -29,7 +29,7 @@ RUN apt-get update \
tzdata \
vim \
wget \
&& pip3 --no-cache-dir install clickhouse_driver \
&& pip3 --no-cache-dir install clickhouse_driver scipy \
&& apt-get purge --yes python3-dev g++ \
&& apt-get autoremove --yes \
&& apt-get clean \

View File

@ -16,7 +16,7 @@ We also consider the test to be unstable, if the observed difference is less tha
performance differences above 5% more often than in 5% runs, so the test is likely
to have false positives.
### How to read the report
### How to Read the Report
The check status summarizes the report in a short text message like `1 faster, 10 unstable`:
* `1 faster` -- how many queries became faster,
@ -27,28 +27,50 @@ The check status summarizes the report in a short text message like `1 faster, 1
The report page itself constists of a several tables. Some of them always signify errors, e.g. "Run errors" -- the very presence of this table indicates that there were errors during the test, that are not normal and must be fixed. Some tables are mostly informational, e.g. "Test times" -- they reflect normal test results. But if a cell in such table is marked in red, this also means an error, e.g., a test is taking too long to run.
#### Tested commits
#### Tested Commits
Informational, no action required. Log messages for the commits that are tested. Note that for the right commit, we show nominal tested commit `pull/*/head` and real tested commit `pull/*/merge`, which is generated by GitHub by merging latest master to the `pull/*/head` and which we actually build and test in CI.
#### Run errors
Action required for every item -- these are errors that must be fixed. The errors that ocurred when running some test queries. For more information about the error, download test output archive and see `test-name-err.log`. To reproduce, see 'How to run' below.
#### Error Summary
Action required for every item.
#### Slow on client
Action required for every item -- these are errors that must be fixed. This table shows queries that take significantly longer to process on the client than on the server. A possible reason might be sending too much data to the client, e.g., a forgotten `format Null`.
This table summarizes all errors that ocurred during the test. Click the links to go to the description of a particular error.
#### Short queries not marked as short
Action required for every item -- these are errors that must be fixed. This table shows queries that are "short" but not explicitly marked as such. "Short" queries are too fast to meaningfully compare performance, because the changes are drowned by the noise. We consider all queries that run faster than 0.02 s to be "short", and only check the performance if they became slower than this threshold. Probably this mode is not what you want, so you have to increase the query run time to be between 1 and 0.1 s, so that the performance can be compared. You do want this "short" mode for queries that complete "immediately", such as some varieties of `select count(*)`. You have to mark them as "short" explicitly by writing `<query short="1">...`. The value of "short" attribute is evaluated as a python expression, and substitutions are performed, so you can write something like `<query short="{column1} = {column2}">select count(*) from table where {column1} > {column2}</query>`, to mark only a particular combination of variables as short.
#### Run Errors
Action required for every item -- these are errors that must be fixed.
#### Partial queries
Action required for the cells marked in red. Shows the queries we are unable to run on an old server -- probably because they contain a new function. You should see this table when you add a new function and a performance test for it. Check that the run time and variance are acceptable (run time between 0.1 and 1 seconds, variance below 10%). If not, they will be highlighted in red.
The errors that ocurred when running some test queries. For more information about the error, download test output archive and see `test-name-err.log`. To reproduce, see 'How to run' below.
#### Changes in performance
Action required for the cells marked in red, and some cheering is appropriate for the cells marked in green. These are the queries for which we observe a statistically significant change in performance. Note that there will always be some false positives -- we try to filter by p < 0.001, and have 2000 queries, so two false positives per run are expected. In practice we have more -- e.g. code layout changed because of some unknowable jitter in compiler internals, so the change we observe is real, but it is a 'false positive' in the sense that it is not directly caused by your changes. If, based on your knowledge of ClickHouse internals, you can decide that the observed test changes are not relevant to the changes made in the tested PR, you can ignore them.
#### Slow on Client
Action required for every item -- these are errors that must be fixed.
This table shows queries that take significantly longer to process on the client than on the server. A possible reason might be sending too much data to the client, e.g., a forgotten `format Null`.
#### Unexpected Query Duration
Action required for every item -- these are errors that must be fixed.
Queries that have "short" duration (on the order of 0.1 s) can't be reliably tested in a normal way, where we perform a small (about ten) measurements for each server, because the signal-to-noise ratio is much smaller. There is a special mode for such queries that instead runs them for a fixed amount of time, normally with much higher number of measurements (up to thousands). This mode must be explicitly enabled by the test author to avoid accidental errors. It must be used only for queries that are meant to complete "immediately", such as `select count(*)`. If your query is not supposed to be "immediate", try to make it run longer, by e.g. processing more data.
This table shows queries for which the "short" marking is not consistent with the actual query run time -- i.e., a query runs for a long time but is marked as short, or it runs very fast but is not marked as short.
If your query is really supposed to complete "immediately" and can't be made to run longer, you have to mark it as "short". To do so, write `<query short="1">...` in the test file. The value of "short" attribute is evaluated as a python expression, and substitutions are performed, so you can write something like `<query short="{column1} = {column2}">select count(*) from table where {column1} > {column2}</query>`, to mark only a particular combination of variables as short.
#### Partial Queries
Action required for the cells marked in red.
Shows the queries we are unable to run on an old server -- probably because they contain a new function. You should see this table when you add a new function and a performance test for it. Check that the run time and variance are acceptable (run time between 0.1 and 1 seconds, variance below 10%). If not, they will be highlighted in red.
#### Changes in Performance
Action required for the cells marked in red, and some cheering is appropriate for the cells marked in green.
These are the queries for which we observe a statistically significant change in performance. Note that there will always be some false positives -- we try to filter by p < 0.001, and have 2000 queries, so two false positives per run are expected. In practice we have more -- e.g. code layout changed because of some unknowable jitter in compiler internals, so the change we observe is real, but it is a 'false positive' in the sense that it is not directly caused by your changes. If, based on your knowledge of ClickHouse internals, you can decide that the observed test changes are not relevant to the changes made in the tested PR, you can ignore them.
You can find flame graphs for queries with performance changes in the test output archive, in files named as 'my_test_0_Cpu_SELECT 1 FROM....FORMAT Null.left.svg'. First goes the test name, then the query number in the test, then the trace type (same as in `system.trace_log`), and then the server version (left is old and right is new).
#### Unstable queries
Action required for the cells marked in red. These are queries for which we did not observe a statistically significant change in performance, but for which the variance in query performance is very high. This means that we are likely to observe big changes in performance even in the absence of real changes, e.g. when comparing the server to itself. Such queries are going to have bad sensitivity as performance tests -- if a query has, say, 50% expected variability, this means we are going to see changes in performance up to 50%, even when there were no real changes in the code. And because of this, we won't be able to detect changes less than 50% with such a query, which is pretty bad. The reasons for the high variability must be investigated and fixed; ideally, the variability should be brought under 5-10%.
#### Unstable Queries
Action required for the cells marked in red.
These are the queries for which we did not observe a statistically significant change in performance, but for which the variance in query performance is very high. This means that we are likely to observe big changes in performance even in the absence of real changes, e.g. when comparing the server to itself. Such queries are going to have bad sensitivity as performance tests -- if a query has, say, 50% expected variability, this means we are going to see changes in performance up to 50%, even when there were no real changes in the code. And because of this, we won't be able to detect changes less than 50% with such a query, which is pretty bad. The reasons for the high variability must be investigated and fixed; ideally, the variability should be brought under 5-10%.
The most frequent reason for instability is that the query is just too short -- e.g. below 0.1 seconds. Bringing query time to 0.2 seconds or above usually helps.
Other reasons may include:
@ -57,24 +79,33 @@ Other reasons may include:
Investigating the instablility is the hardest problem in performance testing, and we still have not been able to understand the reasons behind the instability of some queries. There are some data that can help you in the performance test output archive. Look for files named 'my_unstable_test_0_SELECT 1...FORMAT Null.{left,right}.metrics.rep'. They contain metrics from `system.query_log.ProfileEvents` and functions from stack traces from `system.trace_log`, that vary significantly between query runs. The second column is array of \[min, med, max] values for the metric. Say, if you see `PerfCacheMisses` there, it may mean that the code being tested has not-so-cache-local memory access pattern that is sensitive to memory layout.
#### Skipped tests
Informational, no action required. Shows the tests that were skipped, and the reason for it. Normally it is because the data set required for the test was not loaded, or the test is marked as 'long' -- both cases mean that the test is too big to be ran per-commit.
#### Skipped Tests
Informational, no action required.
#### Test performance changes
Informational, no action required. This table summarizes the changes in performance of queries in each test -- how many queries have changed, how many are unstable, and what is the magnitude of the changes.
Shows the tests that were skipped, and the reason for it. Normally it is because the data set required for the test was not loaded, or the test is marked as 'long' -- both cases mean that the test is too big to be ran per-commit.
#### Test times
Action required for the cells marked in red. This table shows the run times for all the tests. You may have to fix two kinds of errors in this table:
#### Test Performance Changes
Informational, no action required.
This table summarizes the changes in performance of queries in each test -- how many queries have changed, how many are unstable, and what is the magnitude of the changes.
#### Test Times
Action required for the cells marked in red.
This table shows the run times for all the tests. You may have to fix two kinds of errors in this table:
1) Average query run time is too long -- probalby means that the preparatory steps such as creating the table and filling them with data are taking too long. Try to make them faster.
2) Longest query run time is too long -- some particular queries are taking too long, try to make them faster. The ideal query run time is between 0.1 and 1 s.
#### Concurrent benchmarks
No action required. This table shows the results of a concurrent behcmark where queries from `website` are ran in parallel using `clickhouse-benchmark`, and requests per second values are compared for old and new servers. It shows variability up to 20% for no apparent reason, so it's probably safe to disregard it. We have it for special cases like investigating concurrency effects in memory allocators, where it may be important.
#### Metric Changes
No action required.
#### Metric changes
No action required. These are changes in median values of metrics from `system.asynchronous_metrics_log`. Again, they are prone to unexplained variation and you can safely ignore this table unless it's interesting to you for some particular reason (e.g. you want to compare memory usage). There are also graphs of these metrics in the performance test output archive, in the `metrics` folder.
These are changes in median values of metrics from `system.asynchronous_metrics_log`. These metrics are prone to unexplained variation and you can safely ignore this table unless it's interesting to you for some particular reason (e.g. you want to compare memory usage). There are also graphs of these metrics in the performance test output archive, in the `metrics` folder.
### How to run
#### Errors while Building the Report
Ask a maintainer for help. These errors normally indicate a problem with testing infrastructure.
### How to Run
Run the entire docker container, specifying PR number (0 for master)
and SHA of the commit to test. The reference revision is determined as a nearest
ancestor testing release tag. It is possible to specify the reference revision and

View File

@ -114,8 +114,6 @@ function run_tests
# Just check that the script runs at all
"$script_dir/perf.py" --help > /dev/null
changed_test_files=""
# Find the directory with test files.
if [ -v CHPC_TEST_PATH ]
then
@ -130,14 +128,6 @@ function run_tests
else
# For PRs, use newer test files so we can test these changes.
test_prefix=right/performance
# If only the perf tests were changed in the PR, we will run only these
# tests. The list of changed tests in changed-test.txt is prepared in
# entrypoint.sh from git diffs, because it has the cloned repo. Used
# to use rsync for that but it was really ugly and not always correct
# (e.g. when the reference SHA is really old and has some other
# differences to the tested SHA, besides the one introduced by the PR).
changed_test_files=$(sed "s/tests\/performance/${test_prefix//\//\\/}/" changed-tests.txt)
fi
# Determine which tests to run.
@ -146,25 +136,32 @@ function run_tests
# Run only explicitly specified tests, if any.
# shellcheck disable=SC2010
test_files=$(ls "$test_prefix" | grep "$CHPC_TEST_GREP" | xargs -I{} -n1 readlink -f "$test_prefix/{}")
elif [ "$changed_test_files" != "" ]
elif [ "$PR_TO_TEST" -ne 0 ] \
&& [ "$(wc -l < changed-test-definitions.txt)" -gt 0 ] \
&& [ "$(wc -l < changed-test-scripts.txt)" -eq 0 ] \
&& [ "$(wc -l < other-changed-files.txt)" -eq 0 ]
then
# Use test files that changed in the PR.
test_files="$changed_test_files"
# If only the perf tests were changed in the PR, we will run only these
# tests. The lists of changed files are prepared in entrypoint.sh because
# it has the repository.
test_files=$(sed "s/tests\/performance/${test_prefix//\//\\/}/" changed-test-definitions.txt)
else
# The default -- run all tests found in the test dir.
test_files=$(ls "$test_prefix"/*.xml)
fi
# For PRs, test only a subset of queries, and run them less times.
# If the corresponding environment variables are already set, keep
# those values.
if [ "$PR_TO_TEST" == "0" ]
# For PRs w/o changes in test definitons and scripts, test only a subset of
# queries, and run them less times. If the corresponding environment variables
# are already set, keep those values.
if [ "$PR_TO_TEST" -ne 0 ] \
&& [ "$(wc -l < changed-test-definitions.txt)" -eq 0 ] \
&& [ "$(wc -l < changed-test-scripts.txt)" -eq 0 ]
then
CHPC_RUNS=${CHPC_RUNS:-13}
CHPC_MAX_QUERIES=${CHPC_MAX_QUERIES:-0}
else
CHPC_RUNS=${CHPC_RUNS:-7}
CHPC_MAX_QUERIES=${CHPC_MAX_QUERIES:-20}
else
CHPC_RUNS=${CHPC_RUNS:-13}
CHPC_MAX_QUERIES=${CHPC_MAX_QUERIES:-0}
fi
export CHPC_RUNS
export CHPC_MAX_QUERIES
@ -213,33 +210,9 @@ function run_tests
wait
}
# Run some queries concurrently and report the resulting TPS. This additional
# (relatively) short test helps detect concurrency-related effects, because the
# main performance comparison testing is done query-by-query.
function run_benchmark
{
rm -rf benchmark ||:
mkdir benchmark ||:
# The list is built by run_tests.
while IFS= read -r file
do
name=$(basename "$file" ".xml")
"$script_dir/perf.py" --print-queries "$file" > "benchmark/$name-queries.txt"
"$script_dir/perf.py" --print-settings "$file" > "benchmark/$name-settings.txt"
readarray -t settings < "benchmark/$name-settings.txt"
command=(clickhouse-benchmark --concurrency 6 --cumulative --iterations 1000 --randomize 1 --delay 0 --continue_on_errors "${settings[@]}")
"${command[@]}" --port 9001 --json "benchmark/$name-left.json" < "benchmark/$name-queries.txt"
"${command[@]}" --port 9002 --json "benchmark/$name-right.json" < "benchmark/$name-queries.txt"
done < benchmarks-to-run.txt
}
function get_profiles_watchdog
{
sleep 6000
sleep 600
echo "The trace collection did not finish in time." >> profile-errors.log
@ -506,8 +479,6 @@ build_log_column_definitions
cat analyze/errors.log >> report/errors.log ||:
cat profile-errors.log >> report/errors.log ||:
short_query_threshold="0.02"
clickhouse-local --query "
create view query_display_names as select * from
file('analyze/query-display-names.tsv', TSV,
@ -540,18 +511,11 @@ create view query_metric_stats as
-- Main statistics for queries -- query time as reported in query log.
create table queries engine File(TSVWithNamesAndTypes, 'report/queries.tsv')
as select
-- Comparison mode doesn't make sense for queries that complete
-- immediately (on the same order of time as noise). If query duration is
-- less that some threshold, we just skip it. If there is a significant
-- regression in such query, the time will exceed the threshold, and we
-- well process it normally and detect the regression.
right < $short_query_threshold as short,
not short and abs(diff) > report_threshold and abs(diff) > stat_threshold as changed_fail,
not short and abs(diff) > report_threshold - 0.05 and abs(diff) > stat_threshold as changed_show,
abs(diff) > report_threshold and abs(diff) > stat_threshold as changed_fail,
abs(diff) > report_threshold - 0.05 and abs(diff) > stat_threshold as changed_show,
not short and not changed_fail and stat_threshold > report_threshold + 0.10 as unstable_fail,
not short and not changed_show and stat_threshold > report_threshold - 0.05 as unstable_show,
not changed_fail and stat_threshold > report_threshold + 0.10 as unstable_fail,
not changed_show and stat_threshold > report_threshold - 0.05 as unstable_show,
left, right, diff, stat_threshold,
if(report_threshold > 0, report_threshold, 0.10) as report_threshold,
@ -656,24 +620,59 @@ create table wall_clock_time_per_test engine Memory as select *
create table test_time engine Memory as
select test, sum(client) total_client_time,
maxIf(client, not short) query_max,
minIf(client, not short) query_min,
count(*) queries, sum(short) short_queries
max(client) query_max,
min(client) query_min,
count(*) queries
from total_client_time_per_query full join queries using (test, query_index)
group by test;
create view query_runs as select * from file('analyze/query-runs.tsv', TSV,
'test text, query_index int, query_id text, version UInt8, time float');
--
-- Guess the number of query runs used for this test. The number is required to
-- calculate and check the average query run time in the report.
-- We have to be careful, because we will encounter:
-- 1) partial queries which run only on one server
-- 2) short queries which run for a much higher number of times
-- 3) some errors that make query run for a different number of times on a
-- particular server.
--
create view test_runs as
select test,
-- Default to 7 runs if there are only 'short' queries in the test, and
-- we can't determine the number of runs.
if((ceil(medianOrDefaultIf(t.runs, not short), 0) as r) != 0, r, 7) runs
from (
select
-- The query id is the same for both servers, so no need to divide here.
uniqExact(query_id) runs,
(test, query_index) in
(select * from file('analyze/marked-short-queries.tsv', TSV,
'test text, query_index int'))
as short,
test, query_index
from query_runs
group by test, query_index
) t
group by test
;
create table test_times_report engine File(TSV, 'report/test-times.tsv') as
select wall_clock_time_per_test.test, real,
toDecimal64(total_client_time, 3),
queries,
short_queries,
toDecimal64(query_max, 3),
toDecimal64(real / queries, 3) avg_real_per_query,
toDecimal64(query_min, 3)
toDecimal64(query_min, 3),
runs
from test_time
-- wall clock times are also measured for skipped tests, so don't
-- do full join
left join wall_clock_time_per_test using test
-- wall clock times are also measured for skipped tests, so don't
-- do full join
left join wall_clock_time_per_test
on wall_clock_time_per_test.test = test_time.test
full join test_runs
on test_runs.test = test_time.test
order by avg_real_per_query desc;
-- report for all queries page, only main metric
@ -701,32 +700,48 @@ create table queries_for_flamegraph engine File(TSVWithNamesAndTypes,
select test, query_index from queries where unstable_show or changed_show
;
-- List of queries that have 'short' duration, but are not marked as 'short' by
-- the test author (we report them).
create table unmarked_short_queries_report
engine File(TSV, 'report/unmarked-short-queries.tsv')
as select time, test, query_index, query_display_name
create view shortness
as select
(test, query_index) in
(select * from file('analyze/marked-short-queries.tsv', TSV,
'test text, query_index int'))
as marked_short,
time, test, query_index, query_display_name
from (
select right time, test, query_index from queries where short
select right time, test, query_index from queries
union all
select time_median, test, query_index from partial_query_times
where time_median < $short_query_threshold
) times
left join query_display_names
on times.test = query_display_names.test
and times.query_index = query_display_names.query_index
where (test, query_index) not in
(select * from file('analyze/marked-short-queries.tsv', TSV,
'test text, query_index int'))
order by test, query_index
;
-- Report of queries that have inconsistent 'short' markings:
-- 1) have short duration, but are not marked as 'short'
-- 2) the reverse -- marked 'short' but take too long.
-- The threshold for 2) is significantly larger than the threshold for 1), to
-- avoid jitter.
create table inconsistent_short_marking_report
engine File(TSV, 'report/unexpected-query-duration.tsv')
as select
multiIf(marked_short and time > 0.1, '\"short\" queries must run faster than 0.02 s',
not marked_short and time < 0.02, '\"normal\" queries must run longer than 0.1 s',
'') problem,
marked_short, time,
test, query_index, query_display_name
from shortness
where problem != ''
;
--------------------------------------------------------------------------------
-- various compatibility data formats follow, not related to the main report
-- keep the table in old format so that we can analyze new and old data together
create table queries_old_format engine File(TSVWithNamesAndTypes, 'queries.rep')
as select short, changed_fail, unstable_fail, left, right, diff,
as select 0 short, changed_fail, unstable_fail, left, right, diff,
stat_threshold, test, query_display_name query
from queries
;
@ -1024,9 +1039,6 @@ case "$stage" in
# Ignore the errors to collect the log and build at least some report, anyway
time run_tests ||:
;&
"run_benchmark")
time run_benchmark 2> >(tee -a run-errors.tsv 1>&2) ||:
;&
"get_profiles")
# Check for huge pages.
cat /sys/kernel/mm/transparent_hugepage/enabled > thp-enabled.txt ||:
@ -1053,7 +1065,7 @@ case "$stage" in
# to collect the logs. Prefer not to restart, because addresses might change
# and we won't be able to process trace_log data. Start in a subshell, so that
# it doesn't interfere with the watchdog through `wait`.
( get_profiles || restart && get_profiles ) ||:
( get_profiles || { restart && get_profiles ; } ) ||:
# Kill the whole process group, because somehow when the subshell is killed,
# the sleep inside remains alive and orphaned.

View File

@ -1,8 +1,6 @@
<yandex>
<profiles>
<default>
<query_profiler_real_time_period_ns>10000000</query_profiler_real_time_period_ns>
<query_profiler_cpu_time_period_ns>0</query_profiler_cpu_time_period_ns>
<allow_introspection_functions>1</allow_introspection_functions>
<log_queries>1</log_queries>
<metrics_perf_events_enabled>1</metrics_perf_events_enabled>

View File

@ -97,13 +97,10 @@ then
# tests for use by compare.sh. Compare to merge base, because master might be
# far in the future and have unrelated test changes.
base=$(git -C right/ch merge-base pr origin/master)
git -C right/ch diff --name-only "$base" pr | tee changed-tests.txt
if grep -vq '^tests/performance' changed-tests.txt
then
# Have some other changes besides the tests, so truncate the test list,
# meaning, run all tests.
: > changed-tests.txt
fi
git -C right/ch diff --name-only "$base" pr -- . | tee all-changed-files.txt
git -C right/ch diff --name-only "$base" pr -- tests/performance | tee changed-test-definitions.txt
git -C right/ch diff --name-only "$base" pr -- docker/test/performance-comparison | tee changed-test-scripts.txt
git -C right/ch diff --name-only "$base" pr -- :!tests/performance :!docker/test/performance-comparison | tee other-changed-files.txt
fi
# Set python output encoding so that we can print queries with Russian letters.

View File

@ -1,17 +1,22 @@
#!/usr/bin/python3
import os
import sys
import itertools
import clickhouse_driver
import xml.etree.ElementTree as et
import argparse
import clickhouse_driver
import itertools
import functools
import math
import os
import pprint
import random
import re
import statistics
import string
import sys
import time
import traceback
import xml.etree.ElementTree as et
from threading import Thread
from scipy import stats
def tsv_escape(s):
return s.replace('\\', '\\\\').replace('\t', '\\t').replace('\n', '\\n').replace('\r','')
@ -19,10 +24,11 @@ def tsv_escape(s):
parser = argparse.ArgumentParser(description='Run performance test.')
# Explicitly decode files as UTF-8 because sometimes we have Russian characters in queries, and LANG=C is set.
parser.add_argument('file', metavar='FILE', type=argparse.FileType('r', encoding='utf-8'), nargs=1, help='test description file')
parser.add_argument('--host', nargs='*', default=['localhost'], help="Server hostname(s). Corresponds to '--port' options.")
parser.add_argument('--port', nargs='*', default=[9000], help="Server port(s). Corresponds to '--host' options.")
parser.add_argument('--host', nargs='*', default=['localhost'], help="Space-separated list of server hostname(s). Corresponds to '--port' options.")
parser.add_argument('--port', nargs='*', default=[9000], help="Space-separated list of server port(s). Corresponds to '--host' options.")
parser.add_argument('--runs', type=int, default=1, help='Number of query runs per server.')
parser.add_argument('--max-queries', type=int, default=None, help='Test no more than this number of queries, chosen at random.')
parser.add_argument('--queries-to-run', nargs='*', type=int, default=None, help='Space-separated list of indexes of queries to test.')
parser.add_argument('--long', action='store_true', help='Do not skip the tests tagged as long.')
parser.add_argument('--print-queries', action='store_true', help='Print test queries and exit.')
parser.add_argument('--print-settings', action='store_true', help='Print test settings and exit.')
@ -64,18 +70,13 @@ def substitute_parameters(query_templates, other_templates = []):
# Build a list of test queries, substituting parameters to query templates,
# and reporting the queries marked as short.
test_queries = []
is_short = []
for e in root.findall('query'):
new_queries = []
if 'short' in e.attrib:
new_queries, [is_short] = substitute_parameters([e.text], [[e.attrib['short']]])
for i, s in enumerate(is_short):
# Don't print this if we only need to print the queries.
if eval(s) and not args.print_queries:
print(f'short\t{i + len(test_queries)}')
else:
new_queries = substitute_parameters([e.text])
new_queries, [new_is_short] = substitute_parameters([e.text], [[e.attrib.get('short', '0')]])
test_queries += new_queries
is_short += [eval(s) for s in new_is_short]
assert(len(test_queries) == len(is_short))
# If we're only asked to print the queries, do that and exit
@ -84,6 +85,11 @@ if args.print_queries:
print(q)
exit(0)
# Print short queries
for i, s in enumerate(is_short):
if s:
print(f'short\t{i}')
# If we're only asked to print the settings, do that and exit. These are settings
# for clickhouse-benchmark, so we print them as command line arguments, e.g.
# '--max_memory_usage=10000000'.
@ -100,25 +106,13 @@ if not args.long:
print('skipped\tTest is tagged as long.')
sys.exit(0)
# Check main metric to detect infinite tests. We shouldn't have such tests anymore,
# but we did in the past, and it is convenient to be able to process old tests.
main_metric_element = root.find('main_metric/*')
if main_metric_element is not None and main_metric_element.tag != 'min_time':
raise Exception('Only the min_time main metric is supported. This test uses \'{}\''.format(main_metric_element.tag))
# Another way to detect infinite tests. They should have an appropriate main_metric
# but sometimes they don't.
infinite_sign = root.find('.//average_speed_not_changing_for_ms')
if infinite_sign is not None:
raise Exception('Looks like the test is infinite (sign 1)')
# Print report threshold for the test if it is set.
if 'max_ignored_relative_change' in root.attrib:
print(f'report-threshold\t{root.attrib["max_ignored_relative_change"]}')
# Open connections
servers = [{'host': host, 'port': port} for (host, port) in zip(args.host, args.port)]
connections = [clickhouse_driver.Client(**server) for server in servers]
all_connections = [clickhouse_driver.Client(**server) for server in servers]
for s in servers:
print('server\t{}\t{}'.format(s['host'], s['port']))
@ -128,7 +122,7 @@ for s in servers:
# connection loses the changes in settings.
drop_query_templates = [q.text for q in root.findall('drop_query')]
drop_queries = substitute_parameters(drop_query_templates)
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for q in drop_queries:
try:
c.execute(q)
@ -144,7 +138,7 @@ for conn_index, c in enumerate(connections):
# configurable). So the end result is uncertain, but hopefully we'll be able to
# run at least some queries.
settings = root.findall('settings/*')
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for s in settings:
try:
q = f"set {s.tag} = '{s.text}'"
@ -156,7 +150,7 @@ for conn_index, c in enumerate(connections):
# Check tables that should exist. If they don't exist, just skip this test.
tables = [e.text for e in root.findall('preconditions/table_exists')]
for t in tables:
for c in connections:
for c in all_connections:
try:
res = c.execute("select 1 from {} limit 1".format(t))
except:
@ -165,8 +159,11 @@ for t in tables:
print(f'skipped\t{tsv_escape(skipped_message)}')
sys.exit(0)
# Run create queries
create_query_templates = [q.text for q in root.findall('create_query')]
# Run create and fill queries. We will run them simultaneously for both servers,
# to save time.
# The weird search is to keep the relative order of elements, which matters, and
# etree doesn't support the appropriate xpath query.
create_query_templates = [q.text for q in root.findall('./*') if q.tag in ('create_query', 'fill_query')]
create_queries = substitute_parameters(create_query_templates)
# Disallow temporary tables, because the clickhouse_driver reconnects on errors,
@ -178,23 +175,34 @@ for q in create_queries:
file = sys.stderr)
sys.exit(1)
for conn_index, c in enumerate(connections):
for q in create_queries:
c.execute(q)
print(f'create\t{conn_index}\t{c.last_query.elapsed}\t{tsv_escape(q)}')
def do_create(connection, index, queries):
for q in queries:
connection.execute(q)
print(f'create\t{index}\t{connection.last_query.elapsed}\t{tsv_escape(q)}')
# Run fill queries
fill_query_templates = [q.text for q in root.findall('fill_query')]
fill_queries = substitute_parameters(fill_query_templates)
for conn_index, c in enumerate(connections):
for q in fill_queries:
c.execute(q)
print(f'fill\t{conn_index}\t{c.last_query.elapsed}\t{tsv_escape(q)}')
threads = [Thread(target = do_create, args = (connection, index, create_queries))
for index, connection in enumerate(all_connections)]
# Run the queries in randomized order, but preserve their indexes as specified
# in the test XML. To avoid using too much time, limit the number of queries
# we run per test.
queries_to_run = random.sample(range(0, len(test_queries)), min(len(test_queries), args.max_queries or len(test_queries)))
for t in threads:
t.start()
for t in threads:
t.join()
queries_to_run = range(0, len(test_queries))
if args.max_queries:
# If specified, test a limited number of queries chosen at random.
queries_to_run = random.sample(range(0, len(test_queries)), min(len(test_queries), args.max_queries))
if args.queries_to_run:
# Run the specified queries, with some sanity check.
for i in args.queries_to_run:
if i < 0 or i >= len(test_queries):
print(f'There is no query no. "{i}" in this test, only [{0}-{len(test_queries) - 1}] are present')
exit(1)
queries_to_run = args.queries_to_run
# Run test queries.
for query_index in queries_to_run:
@ -216,11 +224,12 @@ for query_index in queries_to_run:
# new one. We want to run them on the new server only, so that the PR author
# can ensure that the test works properly. Remember the errors we had on
# each server.
query_error_on_connection = [None] * len(connections);
for conn_index, c in enumerate(connections):
query_error_on_connection = [None] * len(all_connections);
for conn_index, c in enumerate(all_connections):
try:
prewarm_id = f'{query_prefix}.prewarm0'
res = c.execute(q, query_id = prewarm_id)
# Will also detect too long queries during warmup stage
res = c.execute(q, query_id = prewarm_id, settings = {'max_execution_time': 10})
print(f'prewarm\t{query_index}\t{prewarm_id}\t{conn_index}\t{c.last_query.elapsed}')
except KeyboardInterrupt:
raise
@ -230,7 +239,6 @@ for query_index in queries_to_run:
query_error_on_connection[conn_index] = traceback.format_exc();
continue
# Report all errors that ocurred during prewarm and decide what to do next.
# If prewarm fails for the query on all servers -- skip the query and
# continue testing the next query.
@ -244,21 +252,29 @@ for query_index in queries_to_run:
if len(no_errors) == 0:
continue
elif len(no_errors) < len(connections):
elif len(no_errors) < len(all_connections):
print(f'partial\t{query_index}\t{no_errors}')
this_query_connections = [all_connections[index] for index in no_errors]
# Now, perform measured runs.
# Track the time spent by the client to process this query, so that we can
# notice the queries that take long to process on the client side, e.g. by
# sending excessive data.
start_seconds = time.perf_counter()
server_seconds = 0
for run in range(0, args.runs):
run_id = f'{query_prefix}.run{run}'
for conn_index, c in enumerate(connections):
if query_error_on_connection[conn_index]:
continue
profile_seconds = 0
run = 0
# Arrays of run times for each connection.
all_server_times = []
for conn_index, c in enumerate(this_query_connections):
all_server_times.append([])
while True:
run_id = f'{query_prefix}.run{run}'
for conn_index, c in enumerate(this_query_connections):
try:
res = c.execute(q, query_id = run_id)
except Exception as e:
@ -267,22 +283,79 @@ for query_index in queries_to_run:
e.message = run_id + ': ' + e.message
raise
print(f'query\t{query_index}\t{run_id}\t{conn_index}\t{c.last_query.elapsed}')
server_seconds += c.last_query.elapsed
elapsed = c.last_query.elapsed
all_server_times[conn_index].append(elapsed)
if c.last_query.elapsed > 10:
server_seconds += elapsed
print(f'query\t{query_index}\t{run_id}\t{conn_index}\t{elapsed}')
if elapsed > 10:
# Stop processing pathologically slow queries, to avoid timing out
# the entire test task. This shouldn't really happen, so we don't
# need much handling for this case and can just exit.
print(f'The query no. {query_index} is taking too long to run ({c.last_query.elapsed} s)', file=sys.stderr)
print(f'The query no. {query_index} is taking too long to run ({elapsed} s)', file=sys.stderr)
exit(2)
# Be careful with the counter, after this line it's the next iteration
# already.
run += 1
# Try to run any query for at least the specified number of times,
# before considering other stop conditions.
if run < args.runs:
continue
# For very short queries we have a special mode where we run them for at
# least some time. The recommended lower bound of run time for "normal"
# queries is about 0.1 s, and we run them about 10 times, giving the
# time per query per server of about one second. Use this value as a
# reference for "short" queries.
if is_short[query_index]:
if server_seconds >= 2 * len(this_query_connections):
break
# Also limit the number of runs, so that we don't go crazy processing
# the results -- 'eqmed.sql' is really suboptimal.
if run >= 500:
break
else:
if run >= args.runs:
break
client_seconds = time.perf_counter() - start_seconds
print(f'client-time\t{query_index}\t{client_seconds}\t{server_seconds}')
#print(all_server_times)
#print(stats.ttest_ind(all_server_times[0], all_server_times[1], equal_var = False).pvalue)
# Run additional profiling queries to collect profile data, but only if test times appeared to be different.
# We have to do it after normal runs because otherwise it will affect test statistics too much
if len(all_server_times) == 2 and stats.ttest_ind(all_server_times[0], all_server_times[1], equal_var = False).pvalue < 0.1:
run = 0
while True:
run_id = f'{query_prefix}.profile{run}'
for conn_index, c in enumerate(this_query_connections):
try:
res = c.execute(q, query_id = run_id, settings = {'query_profiler_real_time_period_ns': 10000000})
print(f'profile\t{query_index}\t{run_id}\t{conn_index}\t{c.last_query.elapsed}')
except Exception as e:
# Add query id to the exception to make debugging easier.
e.args = (run_id, *e.args)
e.message = run_id + ': ' + e.message
raise
elapsed = c.last_query.elapsed
profile_seconds += elapsed
run += 1
# Don't spend too much time for profile runs
if run > args.runs or profile_seconds > 10:
break
# And don't bother with short queries
# Run drop queries
drop_queries = substitute_parameters(drop_query_templates)
for conn_index, c in enumerate(connections):
for conn_index, c in enumerate(all_connections):
for q in drop_queries:
c.execute(q)
print(f'drop\t{conn_index}\t{c.last_query.elapsed}\t{tsv_escape(q)}')

View File

@ -98,6 +98,9 @@ th {{
tr:nth-child(odd) td {{filter: brightness(90%);}}
.unexpected-query-duration tr :nth-child(2),
.unexpected-query-duration tr :nth-child(3),
.unexpected-query-duration tr :nth-child(5),
.all-query-times tr :nth-child(1),
.all-query-times tr :nth-child(2),
.all-query-times tr :nth-child(3),
@ -126,7 +129,6 @@ tr:nth-child(odd) td {{filter: brightness(90%);}}
.test-times tr :nth-child(5),
.test-times tr :nth-child(6),
.test-times tr :nth-child(7),
.test-times tr :nth-child(8),
.concurrent-benchmarks tr :nth-child(2),
.concurrent-benchmarks tr :nth-child(3),
.concurrent-benchmarks tr :nth-child(4),
@ -185,8 +187,10 @@ def td(value, cell_attributes = ''):
cell_attributes = cell_attributes,
value = value)
def th(x):
return '<th>' + str(x) + '</th>'
def th(value, cell_attributes = ''):
return '<th {cell_attributes}>{value}</th>'.format(
cell_attributes = cell_attributes,
value = value)
def tableRow(cell_values, cell_attributes = [], anchor=None):
return tr(
@ -197,17 +201,24 @@ def tableRow(cell_values, cell_attributes = [], anchor=None):
if a is not None and v is not None]),
anchor)
def tableHeader(r):
return tr(''.join([th(f) for f in r]))
def tableHeader(cell_values, cell_attributes = []):
return tr(
''.join([th(v, a)
for v, a in itertools.zip_longest(
cell_values, cell_attributes,
fillvalue = '')
if a is not None and v is not None]))
def tableStart(title):
cls = '-'.join(title.lower().split(' ')[:3]);
global table_anchor
table_anchor = cls
anchor = currentTableAnchor()
help_anchor = '-'.join(title.lower().split(' '));
return f"""
<h2 id="{anchor}">
<a class="cancela" href="#{anchor}">{title}</a>
<a class="cancela" href="https://github.com/ClickHouse/ClickHouse/tree/master/docker/test/performance-comparison#{help_anchor}"><sup style="color: #888">?</sup></a>
</h2>
<table class="{cls}">
"""
@ -250,7 +261,7 @@ def addSimpleTable(caption, columns, rows, pos=None):
def add_tested_commits():
global report_errors
try:
addSimpleTable('Tested commits', ['Old', 'New'],
addSimpleTable('Tested Commits', ['Old', 'New'],
[['<pre>{}</pre>'.format(x) for x in
[open('left-commit.txt').read(),
open('right-commit.txt').read()]]])
@ -276,7 +287,7 @@ def add_report_errors():
if not report_errors:
return
text = tableStart('Errors while building the report')
text = tableStart('Errors while Building the Report')
text += tableHeader(['Error'])
for x in report_errors:
text += tableRow([x])
@ -290,7 +301,7 @@ def add_errors_explained():
return
text = '<a name="fail1"/>'
text += tableStart('Error summary')
text += tableStart('Error Summary')
text += tableHeader(['Description'])
for row in errors_explained:
text += tableRow(row)
@ -308,26 +319,26 @@ if args.report == 'main':
run_error_rows = tsvRows('run-errors.tsv')
error_tests += len(run_error_rows)
addSimpleTable('Run errors', ['Test', 'Error'], run_error_rows)
addSimpleTable('Run Errors', ['Test', 'Error'], run_error_rows)
if run_error_rows:
errors_explained.append([f'<a href="#{currentTableAnchor()}">There were some errors while running the tests</a>']);
slow_on_client_rows = tsvRows('report/slow-on-client.tsv')
error_tests += len(slow_on_client_rows)
addSimpleTable('Slow on client',
addSimpleTable('Slow on Client',
['Client time,&nbsp;s', 'Server time,&nbsp;s', 'Ratio', 'Test', 'Query'],
slow_on_client_rows)
if slow_on_client_rows:
errors_explained.append([f'<a href="#{currentTableAnchor()}">Some queries are taking noticeable time client-side (missing `FORMAT Null`?)</a>']);
unmarked_short_rows = tsvRows('report/unmarked-short-queries.tsv')
unmarked_short_rows = tsvRows('report/unexpected-query-duration.tsv')
error_tests += len(unmarked_short_rows)
addSimpleTable('Short queries not marked as short',
['New client time, s', 'Test', '#', 'Query'],
addSimpleTable('Unexpected Query Duration',
['Problem', 'Marked as "short"?', 'Run time, s', 'Test', '#', 'Query'],
unmarked_short_rows)
if unmarked_short_rows:
errors_explained.append([f'<a href="#{currentTableAnchor()}">Some queries have short duration but are not explicitly marked as "short"</a>']);
errors_explained.append([f'<a href="#{currentTableAnchor()}">Some queries have unexpected duration</a>']);
def add_partial():
rows = tsvRows('report/partial-queries-report.tsv')
@ -335,7 +346,7 @@ if args.report == 'main':
return
global unstable_partial_queries, slow_average_tests, tables
text = tableStart('Partial queries')
text = tableStart('Partial Queries')
columns = ['Median time, s', 'Relative time variance', 'Test', '#', 'Query']
text += tableHeader(columns)
attrs = ['' for c in columns]
@ -366,23 +377,23 @@ if args.report == 'main':
global faster_queries, slower_queries, tables
text = tableStart('Changes in performance')
text = tableStart('Changes in Performance')
columns = [
'Old,&nbsp;s', # 0
'New,&nbsp;s', # 1
'Ratio of speedup&nbsp;(-) or slowdown&nbsp;(+)', # 2
'Relative difference (new&nbsp;&minus;&nbsp;old) / old', # 3
'p&nbsp;<&nbsp;0.01 threshold', # 4
# Failed # 5
'', # Failed # 5
'Test', # 6
'#', # 7
'Query', # 8
]
text += tableHeader(columns)
attrs = ['' for c in columns]
attrs[5] = None
text += tableHeader(columns, attrs)
for row in rows:
anchor = f'{currentTableAnchor()}.{row[6]}.{row[7]}'
if int(row[5]):
@ -417,17 +428,17 @@ if args.report == 'main':
'New,&nbsp;s', #1
'Relative difference (new&nbsp;-&nbsp;old)/old', #2
'p&nbsp;&lt;&nbsp;0.01 threshold', #3
# Failed #4
'', # Failed #4
'Test', #5
'#', #6
'Query' #7
]
text = tableStart('Unstable queries')
text += tableHeader(columns)
attrs = ['' for c in columns]
attrs[4] = None
text = tableStart('Unstable Queries')
text += tableHeader(columns, attrs)
for r in unstable_rows:
anchor = f'{currentTableAnchor()}.{r[5]}.{r[6]}'
if int(r[4]):
@ -444,9 +455,9 @@ if args.report == 'main':
add_unstable_queries()
skipped_tests_rows = tsvRows('analyze/skipped-tests.tsv')
addSimpleTable('Skipped tests', ['Test', 'Reason'], skipped_tests_rows)
addSimpleTable('Skipped Tests', ['Test', 'Reason'], skipped_tests_rows)
addSimpleTable('Test performance changes',
addSimpleTable('Test Performance Changes',
['Test', 'Ratio of speedup&nbsp;(-) or slowdown&nbsp;(+)', 'Queries', 'Total not OK', 'Changed perf', 'Unstable'],
tsvRows('report/test-perf-changes.tsv'))
@ -457,39 +468,39 @@ if args.report == 'main':
return
columns = [
'Test', #0
'Test', #0
'Wall clock time,&nbsp;s', #1
'Total client time,&nbsp;s', #2
'Total queries', #3
'Ignored short queries', #4
'Longest query<br>(sum for all runs),&nbsp;s', #5
'Avg wall clock time<br>(sum for all runs),&nbsp;s', #6
'Shortest query<br>(sum for all runs),&nbsp;s', #7
'Total queries', #3
'Longest query<br>(sum for all runs),&nbsp;s', #4
'Avg wall clock time<br>(sum for all runs),&nbsp;s', #5
'Shortest query<br>(sum for all runs),&nbsp;s', #6
'', # Runs #7
]
text = tableStart('Test times')
text += tableHeader(columns)
nominal_runs = 7 # FIXME pass this as an argument
total_runs = (nominal_runs + 1) * 2 # one prewarm run, two servers
allowed_average_run_time = allowed_single_run_time + 60 / total_runs; # some allowance for fill/create queries
attrs = ['' for c in columns]
attrs[7] = None
text = tableStart('Test Times')
text += tableHeader(columns, attrs)
allowed_average_run_time = 1.6 # 30 seconds per test at 7 runs
for r in rows:
anchor = f'{currentTableAnchor()}.{r[0]}'
if float(r[6]) > allowed_average_run_time * total_runs:
total_runs = (int(r[7]) + 1) * 2 # one prewarm run, two servers
if float(r[5]) > allowed_average_run_time * total_runs:
# FIXME should be 15s max -- investigate parallel_insert
slow_average_tests += 1
attrs[6] = f'style="background: {color_bad}"'
attrs[5] = f'style="background: {color_bad}"'
errors_explained.append([f'<a href="#{anchor}">The test \'{r[0]}\' is too slow to run as a whole. Investigate whether the create and fill queries can be sped up'])
else:
attrs[6] = ''
attrs[5] = ''
if float(r[5]) > allowed_single_run_time * total_runs:
if float(r[4]) > allowed_single_run_time * total_runs:
slow_average_tests += 1
attrs[5] = f'style="background: {color_bad}"'
attrs[4] = f'style="background: {color_bad}"'
errors_explained.append([f'<a href="./all-queries.html#all-query-times.{r[0]}.0">Some query of the test \'{r[0]}\' is too slow to run. See the all queries report'])
else:
attrs[5] = ''
attrs[4] = ''
text += tableRow(r, attrs, anchor)
@ -498,74 +509,7 @@ if args.report == 'main':
add_test_times()
def add_benchmark_results():
if not os.path.isfile('benchmark/website-left.json'):
return
json_reports = [json.load(open(f'benchmark/website-{x}.json')) for x in ['left', 'right']]
stats = [next(iter(x.values()))["statistics"] for x in json_reports]
qps = [x["QPS"] for x in stats]
queries = [x["num_queries"] for x in stats]
errors = [x["num_errors"] for x in stats]
relative_diff = (qps[1] - qps[0]) / max(0.01, qps[0]);
times_diff = max(qps) / max(0.01, min(qps))
all_rows = []
header = ['Benchmark', 'Metric', 'Old', 'New', 'Relative difference', 'Times difference'];
attrs = ['' for x in header]
row = ['website', 'queries', f'{queries[0]:d}', f'{queries[1]:d}', '--', '--']
attrs[0] = 'rowspan=2'
all_rows.append([row, attrs])
attrs = ['' for x in header]
row = [None, 'queries/s', f'{qps[0]:.3f}', f'{qps[1]:.3f}', f'{relative_diff:.3f}', f'x{times_diff:.3f}']
if abs(relative_diff) > 0.1:
# More queries per second is better.
if relative_diff > 0.:
attrs[4] = f'style="background: {color_good}"'
else:
attrs[4] = f'style="background: {color_bad}"'
else:
attrs[4] = ''
all_rows.append([row, attrs]);
if max(errors):
all_rows[0][1][0] = "rowspan=3"
row = [''] * (len(header))
attrs = ['' for x in header]
attrs[0] = None
row[1] = 'errors'
row[2] = f'{errors[0]:d}'
row[3] = f'{errors[1]:d}'
row[4] = '--'
row[5] = '--'
if errors[0]:
attrs[2] += f' style="background: {color_bad}" '
if errors[1]:
attrs[3] += f' style="background: {color_bad}" '
all_rows.append([row, attrs])
text = tableStart('Concurrent benchmarks')
text += tableHeader(header)
for row, attrs in all_rows:
text += tableRow(row, attrs)
text += tableEnd()
global tables
tables.append(text)
try:
add_benchmark_results()
except:
report_errors.append(
traceback.format_exception_only(
*sys.exc_info()[:2])[-1])
pass
addSimpleTable('Metric changes',
addSimpleTable('Metric Changes',
['Metric', 'Old median value', 'New median value',
'Relative difference', 'Times difference'],
tsvRows('metrics/changes.tsv'))
@ -644,8 +588,8 @@ elif args.report == 'all-queries':
return
columns = [
# Changed #0
# Unstable #1
'', # Changed #0
'', # Unstable #1
'Old,&nbsp;s', #2
'New,&nbsp;s', #3
'Ratio of speedup&nbsp;(-) or slowdown&nbsp;(+)', #4
@ -655,13 +599,13 @@ elif args.report == 'all-queries':
'#', #8
'Query', #9
]
text = tableStart('All query times')
text += tableHeader(columns)
attrs = ['' for c in columns]
attrs[0] = None
attrs[1] = None
text = tableStart('All Query Times')
text += tableHeader(columns, attrs)
for r in rows:
anchor = f'{currentTableAnchor()}.{r[7]}.{r[8]}'
if int(r[1]):

View File

@ -8,26 +8,8 @@ dpkg -i package_folder/clickhouse-server_*.deb
dpkg -i package_folder/clickhouse-client_*.deb
dpkg -i package_folder/clickhouse-test_*.deb
mkdir -p /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/
fi
# install test configs
/usr/share/clickhouse-test/config/install.sh
function start()
{

View File

@ -48,28 +48,8 @@ mkdir -p /var/lib/clickhouse
mkdir -p /var/log/clickhouse-server
chmod 777 -R /var/log/clickhouse-server/
# Temorary way to keep CI green while moving dictionaries to separate directory
mkdir -p /etc/clickhouse-server/dict_examples
chmod 777 -R /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/;
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
# Retain any pre-existing config and allow ClickHouse to load those if required
ln -s --backup=simple --suffix=_original.xml \
/usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/
# install test configs
/usr/share/clickhouse-test/config/install.sh
function start()
{

View File

@ -21,9 +21,7 @@ RUN apt-get update -y \
telnet \
tree \
unixodbc \
wget \
zookeeper \
zookeeperd
wget
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \

View File

@ -8,55 +8,9 @@ dpkg -i package_folder/clickhouse-server_*.deb
dpkg -i package_folder/clickhouse-client_*.deb
dpkg -i package_folder/clickhouse-test_*.deb
mkdir -p /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/custom_settings_prefixes.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/clusters.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/graphite.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/
# install test configs
/usr/share/clickhouse-test/config/install.sh
# Retain any pre-existing config and allow ClickHouse to load it if required
ln -s --backup=simple --suffix=_original.xml \
/usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/
if [[ -n "$USE_POLYMORPHIC_PARTS" ]] && [[ "$USE_POLYMORPHIC_PARTS" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/polymorphic_parts.xml /etc/clickhouse-server/config.d/
fi
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/
fi
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml
echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment
echo "TSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment
echo "ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "LLVM_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
service zookeeper start
sleep 5
service clickhouse-server start && sleep 5
if cat /usr/bin/clickhouse-test | grep -q -- "--use-skip-list"; then

View File

@ -66,9 +66,7 @@ RUN apt-get --allow-unauthenticated update -y \
unixodbc \
unixodbc-dev \
wget \
zlib1g-dev \
zookeeper \
zookeeperd
zlib1g-dev
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \

View File

@ -8,55 +8,9 @@ dpkg -i package_folder/clickhouse-server_*.deb
dpkg -i package_folder/clickhouse-client_*.deb
dpkg -i package_folder/clickhouse-test_*.deb
mkdir -p /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/custom_settings_prefixes.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/clusters.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/graphite.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/
# install test configs
/usr/share/clickhouse-test/config/install.sh
# Retain any pre-existing config and allow ClickHouse to load it if required
ln -s --backup=simple --suffix=_original.xml \
/usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/
if [[ -n "$USE_POLYMORPHIC_PARTS" ]] && [[ "$USE_POLYMORPHIC_PARTS" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/polymorphic_parts.xml /etc/clickhouse-server/config.d/
fi
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/
fi
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml
echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment
echo "TSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment
echo "ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "LLVM_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
service zookeeper start
sleep 5
service clickhouse-server start && sleep 5
if cat /usr/bin/clickhouse-test | grep -q -- "--use-skip-list"; then

View File

@ -11,8 +11,6 @@ RUN apt-get update -y \
tzdata \
fakeroot \
debhelper \
zookeeper \
zookeeperd \
expect \
python \
python-lxml \

View File

@ -39,41 +39,8 @@ mkdir -p /var/log/clickhouse-server
chmod 777 -R /var/lib/clickhouse
chmod 777 -R /var/log/clickhouse-server/
# Temorary way to keep CI green while moving dictionaries to separate directory
mkdir -p /etc/clickhouse-server/dict_examples
chmod 777 -R /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/;
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/clusters.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/graphite.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml
# Retain any pre-existing config and allow ClickHouse to load it if required
ln -s --backup=simple --suffix=_original.xml \
/usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/
service zookeeper start
sleep 5
# install test configs
/usr/share/clickhouse-test/config/install.sh
start_clickhouse

View File

@ -39,12 +39,9 @@ function start()
done
}
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
# install test configs
/usr/share/clickhouse-test/config/install.sh
echo "TSAN_OPTIONS='halt_on_error=1 history_size=7 ignore_noninstrumented_modules=1 verbosity=1'" >> /etc/environment
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment
echo "ASAN_OPTIONS='malloc_context_size=10 verbosity=1 allocator_release_to_os_interval_ms=10000'" >> /etc/environment
start

View File

@ -29,7 +29,7 @@ def get_options(i):
if 0 < i:
options += " --order=random"
if i % 2 == 1:
options += " --atomic-db-engine"
options += " --db-engine=Ordinary"
return options

View File

@ -35,7 +35,7 @@ RUN apt-get update \
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN pip3 install urllib3 testflows==1.6.42 docker-compose docker dicttoxml kazoo tzlocal
RUN pip3 install urllib3 testflows==1.6.48 docker-compose docker dicttoxml kazoo tzlocal
ENV DOCKER_CHANNEL stable
ENV DOCKER_VERSION 17.09.1-ce

View File

@ -5,12 +5,5 @@ ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get install gdb
CMD ln -s /usr/lib/llvm-8/bin/llvm-symbolizer /usr/bin/llvm-symbolizer; \
echo "TSAN_OPTIONS='halt_on_error=1 history_size=7'" >> /etc/environment; \
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment; \
echo "ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-6.0/bin/llvm-symbolizer" >> /etc/environment; \
echo "UBSAN_SYMBOLIZER_PATH=/usr/lib/llvm-6.0/bin/llvm-symbolizer" >> /etc/environment; \
echo "TSAN_SYMBOLIZER_PATH=/usr/lib/llvm-8/bin/llvm-symbolizer" >> /etc/environment; \
echo "LLVM_SYMBOLIZER_PATH=/usr/lib/llvm-6.0/bin/llvm-symbolizer" >> /etc/environment; \
service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test ''; \
CMD service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test ''; \
gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms | tee test_output/test_result.txt

View File

@ -0,0 +1,121 @@
## Developer's guide for adding new CMake options
### Don't be obvious. Be informative.
Bad:
```cmake
option (ENABLE_TESTS "Enables testing" OFF)
```
This description is quite useless as is neither gives the viewer any additional information nor explains the option purpose.
Better:
```cmake
option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests" OFF)
```
If the option's purpose can't be guessed by its name, or the purpose guess may be misleading, or option has some
pre-conditions, leave a comment above the `option()` line and explain what it does.
The best way would be linking the docs page (if it exists).
The comment is parsed into a separate column (see below).
Even better:
```cmake
# implies ${TESTS_ARE_ENABLED}
# see tests/CMakeLists.txt for implementation detail.
option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests" OFF)
```
### If the option's state could produce unwanted (or unusual) result, explicitly warn the user.
Suppose you have an option that may strip debug symbols from the ClickHouse's part.
This can speed up the linking process, but produces a binary that cannot be debugged.
In that case, prefer explicitly raising a warning telling the developer that he may be doing something wrong.
Also, such options should be disabled if applies.
Bad:
```cmake
option(STRIP_DEBUG_SYMBOLS_FUNCTIONS
"Do not generate debugger info for ClickHouse functions.
${STRIP_DSF_DEFAULT})
if (STRIP_DEBUG_SYMBOLS_FUNCTIONS)
target_compile_options(clickhouse_functions PRIVATE "-g0")
endif()
```
Better:
```cmake
# Provides faster linking and lower binary size.
# Tradeoff is the inability to debug some source files with e.g. gdb
# (empty stack frames and no local variables)."
option(STRIP_DEBUG_SYMBOLS_FUNCTIONS
"Do not generate debugger info for ClickHouse functions."
${STRIP_DSF_DEFAULT})
if (STRIP_DEBUG_SYMBOLS_FUNCTIONS)
message(WARNING "Not generating debugger info for ClickHouse functions")
target_compile_options(clickhouse_functions PRIVATE "-g0")
endif()
```
### In the option's description, explain WHAT the option does rather than WHY it does something.
The WHY explanation should be placed in the comment.
You may find that the option's name is self-descriptive.
Bad:
```cmake
option(ENABLE_THINLTO "Enable Thin LTO. Only applicable for clang. It's also suppressed when building with tests or sanitizers." ON)
```
Better:
```cmake
# Only applicable for clang.
# Turned off when building with tests or sanitizers.
option(ENABLE_THINLTO "Clang-specific link time optimisation" ON).
```
### Don't assume other developers know as much as you do.
In ClickHouse, there are many tools used that an ordinary developer may not know. If you are in doubt, give a link to
the tool's docs. It won't take much of your time.
Bad:
```cmake
option(ENABLE_THINLTO "Enable Thin LTO. Only applicable for clang. It's also suppressed when building with tests or sanitizers." ON)
```
Better (combined with the above hint):
```cmake
# https://clang.llvm.org/docs/ThinLTO.html
# Only applicable for clang.
# Turned off when building with tests or sanitizers.
option(ENABLE_THINLTO "Clang-specific link time optimisation" ON).
```
Other example, bad:
```cmake
option (USE_INCLUDE_WHAT_YOU_USE "Use 'include-what-you-use' tool" OFF)
```
Better:
```cmake
# https://github.com/include-what-you-use/include-what-you-use
option (USE_INCLUDE_WHAT_YOU_USE "Reduce unneeded #include s (external tool)" OFF)
```
### Prefer consistent default values.
CMake allows you to pass a plethora of values representing boolean `true/false`, e.g. `1, ON, YES, ...`.
Prefer the `ON/OFF` values, if possible.

View File

@ -0,0 +1,34 @@
# CMake in ClickHouse
## TL; DR How to make ClickHouse compile and link faster?
Developer only! This command will likely fulfill most of your needs. Run before calling `ninja`.
```cmake
cmake .. \
-DCMAKE_C_COMPILER=/bin/clang-10 \
-DCMAKE_CXX_COMPILER=/bin/clang++-10 \
-DCMAKE_BUILD_TYPE=Debug \
-DENABLE_CLICKHOUSE_ALL=OFF \
-DENABLE_CLICKHOUSE_SERVER=ON \
-DENABLE_CLICKHOUSE_CLIENT=ON \
-DUSE_STATIC_LIBRARIES=OFF \
-DCLICKHOUSE_SPLIT_BINARY=ON \
-DSPLIT_SHARED_LIBRARIES=ON \
-DENABLE_LIBRARIES=OFF \
-DENABLE_UTILS=OFF \
-DENABLE_TESTS=OFF
```
## CMake files types
1. ClickHouse's source CMake files (located in the root directory and in `/src`).
2. Arch-dependent CMake files (located in `/cmake/*os_name*`).
3. Libraries finders (search for contrib libraries, located in `/cmake/find`).
3. Contrib build CMake files (used instead of libraries' own CMake files, located in `/cmake/modules`)
## List of CMake flags
* This list is auto-generated by [this Python script](https://github.com/clickhouse/clickhouse/blob/master/docs/tools/cmake_in_clickhouse_generator.py).
* The flag name is a link to its position in the code.
* If an option's default value is itself an option, it's also a link to its position in this list.

View File

@ -38,6 +38,7 @@ toc_title: Adopters
| <a href="https://db.com" class="favicon">Deutsche Bank</a> | Finance | BI Analytics | — | — | [Slides in English, October 2019](https://bigdatadays.ru/wp-content/uploads/2019/10/D2-H3-3_Yakunin-Goihburg.pdf) |
| <a href="https://www.diva-e.com" class="favicon">Diva-e</a> | Digital consulting | Main Product | — | — | [Slides in English, September 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup29/ClickHouse-MeetUp-Unusual-Applications-sd-2019-09-17.pdf) |
| <a href="https://www.ecwid.com/" class="favicon">Ecwid</a> | E-commerce SaaS | Metrics, Logging | — | — | [Slides in Russian, April 2019](https://nastachku.ru/var/files/1/presentation/backend/2_Backend_6.pdf) |
| <a href="https://www.ebay.com/" class="favicon">eBay</a> | E-commerce | Logs, Metrics and Events | — | — | [Official website, Sep 2020](https://tech.ebayinc.com/engineering/ou-online-analytical-processing/) |
| <a href="https://www.exness.com" class="favicon">Exness</a> | Trading | Metrics, Logging | — | — | [Talk in Russian, May 2019](https://youtu.be/_rpU-TvSfZ8?t=3215) |
| <a href="https://fastnetmon.com/" class="favicon">FastNetMon</a> | DDoS Protection | Main Product | | — | [Official website](https://fastnetmon.com/docs-fnm-advanced/fastnetmon-advanced-traffic-persistency/) |
| <a href="https://www.flipkart.com/" class="favicon">Flipkart</a> | e-Commerce | — | — | — | [Talk in English, July 2020](https://youtu.be/GMiXCMFDMow?t=239) |

View File

@ -13,49 +13,41 @@ With this instruction you can run basic ClickHouse performance test on any serve
4. ssh to the server and download it with wget:
```bash
# For amd64:
wget https://clickhouse-builds.s3.yandex.net/0/00ba767f5d2a929394ea3be193b1f79074a1c4bc/1578163263_binary/clickhouse
wget https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_build_check/gcc-10_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse
# For aarch64:
wget https://clickhouse-builds.s3.yandex.net/0/00ba767f5d2a929394ea3be193b1f79074a1c4bc/1578161264_binary/clickhouse
wget https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_special_build_check/clang-10-aarch64_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse
# Then do:
chmod a+x clickhouse
```
5. Download configs:
```bash
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/programs/server/config.xml
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/programs/server/users.xml
mkdir config.d
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/programs/server/config.d/path.xml -O config.d/path.xml
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/programs/server/config.d/log_to_console.xml -O config.d/log_to_console.xml
```
6. Download benchmark files:
5. Download benchmark files:
```bash
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/benchmark-new.sh
chmod a+x benchmark-new.sh
wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/queries.sql
```
7. Download test data according to the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) instruction (“hits” table containing 100 million rows).
6. Download test data according to the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) instruction (“hits” table containing 100 million rows).
```bash
wget https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_100m_obfuscated_v1.tar.xz
tar xvf hits_100m_obfuscated_v1.tar.xz -C .
mv hits_100m_obfuscated_v1/* .
```
8. Run the server:
7. Run the server:
```bash
./clickhouse server
```
9. Check the data: ssh to the server in another terminal
8. Check the data: ssh to the server in another terminal
```bash
./clickhouse client --query "SELECT count() FROM hits_100m_obfuscated"
100000000
```
10. Edit the benchmark-new.sh, change `clickhouse-client` to `./clickhouse client` and add `--max_memory_usage 100000000000` parameter.
9. Edit the benchmark-new.sh, change `clickhouse-client` to `./clickhouse client` and add `--max_memory_usage 100000000000` parameter.
```bash
mcedit benchmark-new.sh
```
11. Run the benchmark:
10. Run the benchmark:
```bash
./benchmark-new.sh hits_100m_obfuscated
```
12. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com
11. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com
All the results are published here: https://clickhouse.tech/benchmark/hardware/

View File

@ -521,6 +521,22 @@ For more information, see the MergeTreeSettings.h header file.
</merge_tree>
```
## replicated\_merge\_tree {#server_configuration_parameters-replicated_merge_tree}
Fine tuning for tables in the [ReplicatedMergeTree](../../engines/table-engines/mergetree-family/mergetree.md).
This setting has higher priority.
For more information, see the MergeTreeSettings.h header file.
**Example**
``` xml
<replicated_merge_tree>
<max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</replicated_merge_tree>
```
## openSSL {#server_configuration_parameters-openssl}
SSL client/server configuration.

View File

@ -1817,7 +1817,7 @@ Default value: 8192.
Turns on or turns off using of single dictionary for the data part.
By default, ClickHouse server monitors the size of dictionaries and if a dictionary overflows then the server starts to write the next one. To prohibit creating several dictionaries set `low_cardinality_use_single_dictionary_for_part = 1`.
By default, the ClickHouse server monitors the size of dictionaries and if a dictionary overflows then the server starts to write the next one. To prohibit creating several dictionaries set `low_cardinality_use_single_dictionary_for_part = 1`.
Possible values:
@ -1976,4 +1976,54 @@ Possible values:
Default value: `120` seconds.
## output_format_pretty_max_value_width {#output_format_pretty_max_value_width}
Limits the width of value displayed in [Pretty](../../interfaces/formats.md#pretty) formats. If the value width exceeds the limit, the value is cut.
Possible values:
- Positive integer.
- 0 — The value is cut completely.
Default value: `10000` symbols.
**Examples**
Query:
```sql
SET output_format_pretty_max_value_width = 10;
SELECT range(number) FROM system.numbers LIMIT 10 FORMAT PrettyCompactNoEscapes;
```
Result:
```text
┌─range(number)─┐
│ [] │
│ [0] │
│ [0,1] │
│ [0,1,2] │
│ [0,1,2,3] │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
└───────────────┘
```
Query with zero width:
```sql
SET output_format_pretty_max_value_width = 0;
SELECT range(number) FROM system.numbers LIMIT 5 FORMAT PrettyCompactNoEscapes;
```
Result:
```text
┌─range(number)─┐
│ ⋯ │
│ ⋯ │
│ ⋯ │
│ ⋯ │
│ ⋯ │
└───────────────┘
```
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->

View File

@ -0,0 +1,42 @@
# ClickHouse obfuscator
Simple tool for table data obfuscation.
It reads input table and produces output table, that retain some properties of input, but contains different data.
It allows to publish almost real production data for usage in benchmarks.
It is designed to retain the following properties of data:
- cardinalities of values (number of distinct values) for every column and for every tuple of columns;
- conditional cardinalities: number of distinct values of one column under condition on value of another column;
- probability distributions of absolute value of integers; sign of signed integers; exponent and sign for floats;
- probability distributions of length of strings;
- probability of zero values of numbers; empty strings and arrays, NULLs;
- data compression ratio when compressed with LZ77 and entropy family of codecs;
- continuity (magnitude of difference) of time values across table; continuity of floating point values.
- date component of DateTime values;
- UTF-8 validity of string values;
- string values continue to look somewhat natural.
Most of the properties above are viable for performance testing:
reading data, filtering, aggregation and sorting will work at almost the same speed
as on original data due to saved cardinalities, magnitudes, compression ratios, etc.
It works in deterministic fashion: you define a seed value and transform is totally determined by input data and by seed.
Some transforms are one to one and could be reversed, so you need to have large enough seed and keep it in secret.
It use some cryptographic primitives to transform data, but from the cryptographic point of view,
It doesn't do anything properly and you should never consider the result as secure, unless you have other reasons for it.
It may retain some data you don't want to publish.
It always leave numbers 0, 1, -1 as is. Also it leaves dates, lengths of arrays and null flags exactly as in source data.
For example, you have a column IsMobile in your table with values 0 and 1. In transformed data, it will have the same value.
So, the user will be able to count exact ratio of mobile traffic.
Another example, suppose you have some private data in your table, like user email and you don't want to publish any single email address.
If your table is large enough and contain multiple different emails and there is no email that have very high frequency than all others,
It will perfectly anonymize all data. But if you have small amount of different values in a column, it can possibly reproduce some of them.
And you should take care and look at exact algorithm, how this tool works, and probably fine tune some of it command line parameters.
This tool works fine only with reasonable amount of data (at least 1000s of rows).

View File

@ -6,10 +6,13 @@ toc_priority: 143
Syntax: `maxMap(key, value)` or `maxMap(Tuple(key, value))`
Calculates the maximum from `value` array according to the keys specified in the key array.
Passing tuple of keys and values arrays is synonymical to passing two arrays of keys and values.
The number of elements in key and value must be the same for each row that is totaled.
Returns a tuple of two arrays: keys in sorted order, and values calculated for the corresponding keys.
Calculates the maximum from `value` array according to the keys specified in the `key` array.
Passing a tuple of keys and value arrays is identical to passing two arrays of keys and values.
The number of elements in `key` and `value` must be the same for each row that is totaled.
Returns a tuple of two arrays: keys and values calculated for the corresponding keys.
Example:

View File

@ -8,7 +8,7 @@ Syntax: `minMap(key, value)` or `minMap(Tuple(key, value))`
Calculates the minimum from `value` array according to the keys specified in the `key` array.
Passing tuple of keys and values arrays is a synonym to passing two arrays of keys and values.
Passing a tuple of keys and value arrays is identical to passing two arrays of keys and values.
The number of elements in `key` and `value` must be the same for each row that is totaled.

View File

@ -21,7 +21,7 @@ LowCardinality(data_type)
`LowCardinality` is a superstructure that changes a data storage method and rules of data processing. ClickHouse applies [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) to `LowCardinality`-columns. Operating with dictionary encoded data significantly increases performance of [SELECT](../../sql-reference/statements/select/index.md) queries for many applications.
The efficiency of using `LowCarditality` data type depends on data diversity. If a dictionary contains less than 10,000 distinct values, then ClickHouse mostly shows higher efficiency of data reading and storing. If a dictionary contains more than 100,000 distinct values, then ClickHouse can perform worse in comparison with using ordinary data types.
The efficiency of using `LowCardinality` data type depends on data diversity. If a dictionary contains less than 10,000 distinct values, then ClickHouse mostly shows higher efficiency of data reading and storing. If a dictionary contains more than 100,000 distinct values, then ClickHouse can perform worse in comparison with using ordinary data types.
Consider using `LowCardinality` instead of [Enum](../../sql-reference/data-types/enum.md) when working with strings. `LowCardinality` provides more flexibility in use and often reveals the same or higher efficiency.

View File

@ -9,8 +9,7 @@ toc_title: Working with maps
Collect all the keys and sum corresponding values.
Arguments are tuples of two arrays, where items in the first array represent keys, and the second array
contains values for the each key.
Arguments are tuples of two arrays, where items in the first array represent keys, and the second array contains values for the each key.
All key arrays should have same type, and all value arrays should contain items which are promotable to the one type (Int64, UInt64 or Float64).
The common promoted type is used as a type for the result array.
@ -30,8 +29,7 @@ SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTy
Collect all the keys and subtract corresponding values.
Arguments are tuples of two arrays, where items in the first array represent keys, and the second array
contains values for the each key.
Arguments are tuples of two arrays, where items in the first array represent keys, and the second array contains values for the each key.
All key arrays should have same type, and all value arrays should contain items which are promotable to the one type (Int64, UInt64 or Float64).
The common promoted type is used as a type for the result array.
@ -45,25 +43,24 @@ SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt3
┌─res────────────┬─type──────────────────────────────┐
│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │
└────────────────┴───────────────────────────────────┘
````
```
## mapPopulateSeries {#function-mappopulateseries}
Syntax: `mapPopulateSeries((keys : Array(<IntegerType>), values : Array(<IntegerType>)[, max : <IntegerType>])`
Generates a map, where keys are a series of numbers, from minimum to maximum keys (or `max` argument if it specified) taken from `keys` array with step size of one,
and corresponding values taken from `values` array. If the value is not specified for the key, then it uses default value in the resulting map.
Generates a map, where keys are a series of numbers, from minimum to maximum keys (or `max` argument if it specified) taken from `keys` array with step size of one, and corresponding values taken from `values` array. If the value is not specified for the key, then it uses default value in the resulting map.
For repeated keys only the first value (in order of appearing) gets associated with the key.
The number of elements in `keys` and `values` must be the same for each row.
Returns a tuple of two arrays: keys in sorted order, and values the corresponding keys.
``` sql
```sql
select mapPopulateSeries([1,2,4], [11,22,44], 5) as res, toTypeName(res) as type;
```
``` text
```text
┌─res──────────────────────────┬─type──────────────────────────────┐
│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │
└──────────────────────────────┴───────────────────────────────────┘

View File

@ -516,14 +516,14 @@ Result:
**See Also**
- \[ISO 8601 announcement by @xkcd\](https://xkcd.com/1179/)
- [ISO 8601 announcement by @xkcd](https://xkcd.com/1179/)
- [RFC 1123](https://tools.ietf.org/html/rfc1123)
- [toDate](#todate)
- [toDateTime](#todatetime)
## parseDateTimeBestEffortUS {#parsedatetimebesteffortUS}
This function is similar to [parseDateTimeBestEffort](#parsedatetimebesteffort), the only difference is that this function prefers US style (`MM/DD/YYYY` etc) in case of ambiguouty.
This function is similar to [parseDateTimeBestEffort](#parsedatetimebesteffort), the only difference is that this function prefers US date format (`MM/DD/YYYY` etc.) in case of ambiguity.
**Syntax**
@ -541,7 +541,7 @@ parseDateTimeBestEffortUS(time_string [, time_zone]);
- A string containing 9..10 digit [unix timestamp](https://en.wikipedia.org/wiki/Unix_time).
- A string with a date and a time component: `YYYYMMDDhhmmss`, `MM/DD/YYYY hh:mm:ss`, `MM-DD-YY hh:mm`, `YYYY-MM-DD hh:mm:ss`, etc.
- A string with a date, but no time component: `YYYY`, `YYYYMM`, `YYYY*MM`, `MM/DD/YYYY`, `MM-DD-YY` etc.
- A string with a day and time: `DD`, `DD hh`, `DD hh:mm`. In this case `YYYY-MM` are substituted as `2000-01`.
- A string with a day and time: `DD`, `DD hh`, `DD hh:mm`. In this case, `YYYY-MM` are substituted as `2000-01`.
- A string that includes the date and time along with time zone offset information: `YYYY-MM-DD hh:mm:ss ±h:mm`, etc. For example, `2020-12-12 17:36:00 -5:00`.
**Returned value**

View File

@ -6,4 +6,14 @@ toc_title: "\u041A\u043E\u043C\u043C\u0435\u0440\u0447\u0435\u0441\u043A\u0438\u
\ \u0443\u0441\u043B\u0443\u0433\u0438"
---
# Коммерческие услуги {#clickhouse-commercial-services}
Данный раздел содержит описание коммерческих услуг, предоставляемых для ClickHouse. Поставщики этих услуг — независимые компании, которые могут не быть аффилированы с Яндексом.
Категории услуг:
- Облачные услуги [Cloud](../commercial/cloud.md)
- Поддержка [Support](../commercial/support.md)
!!! note "Для поставщиков услуг"
Если вы — представитель компании-поставщика услуг, вы можете отправить запрос на добавление вашей компании и ваших услуг в соответствующий раздел данной документации (или на добавление нового раздела, если ваши услуги не соответствуют ни одной из существующих категорий). Чтобы отправить запрос (pull-request) на добавление описания в документацию, нажмите на значок "карандаша" в правом верхнем углу страницы. Если ваши услуги доступны в только отдельных регионах, не забудьте указать это на соответствующих локализованных страницах (и обязательно отметьте это при отправке заявки).

View File

@ -43,9 +43,6 @@ ORDER BY expr
Описание параметров смотрите в [описании запроса CREATE](../../../engines/table-engines/mergetree-family/mergetree.md).
!!! note "Примечание"
`INDEX` — экспериментальная возможность, смотрите [Индексы пропуска данных](#table_engine-mergetree-data_skipping-indexes).
### Секции запроса {#mergetree-query-clauses}
- `ENGINE` — имя и параметры движка. `ENGINE = MergeTree()`. `MergeTree` не имеет параметров.
@ -269,7 +266,7 @@ ClickHouse не может использовать индекс, если зн
ClickHouse использует эту логику не только для последовательностей дней месяца, но и для любого частично-монотонного первичного ключа.
### Индексы пропуска данных (экспериментальная функциональность) {#table_engine-mergetree-data_skipping-indexes}
### Индексы пропуска данных {#table_engine-mergetree-data_skipping-indexes}
Объявление индексов при определении столбцов в запросе `CREATE`.
@ -566,7 +563,7 @@ ALTER TABLE example_table
- `volume_name_N` — название тома. Названия томов должны быть уникальны.
- `disk` — диск, находящийся внутри тома.
- `max_data_part_size_bytes` — максимальный размер куска данных, который может находится на любом из дисков этого тома.
- `move_factor` — доля свободного места, при превышении которого данные начинают перемещаться на следующий том, если он есть (по умолчанию 0.1).
- `move_factor` — доля доступного свободного места на томе, если места становится меньше, то данные начнут перемещение на следующий том, если он есть (по умолчанию 0.1).
Примеры конфигураций:

View File

@ -1050,13 +1050,13 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_
Для обмена данными с экосистемой Hadoop можно использовать движки таблиц [HDFS](../engines/table-engines/integrations/hdfs.md).
## Arrow {data-format-arrow}
## Arrow {#data-format-arrow}
[Apache Arrow](https://arrow.apache.org/) поставляется с двумя встроенными поколоночнами форматами хранения. ClickHouse поддерживает операции чтения и записи для этих форматов.
`Arrow` — это Apache Arrow's "file mode" формат. Он предназначен для произвольного доступа в памяти.
## ArrowStream {data-format-arrow-stream}
## ArrowStream {#data-format-arrow-stream}
`ArrowStream` — это Apache Arrow's "stream mode" формат. Он предназначен для обработки потоков в памяти.

View File

@ -484,7 +484,7 @@ INSERT INTO test VALUES (lower('Hello')), (lower('world')), (lower('INSERT')), (
См. также:
- [JOIN strictness](../../sql-reference/statements/select/join.md#select-join-strictness)
- [JOIN strictness](../../sql-reference/statements/select/join.md#join-settings)
## max\_block\_size {#setting-max_block_size}
@ -1616,6 +1616,63 @@ SELECT idx, i FROM null_in WHERE i IN (1, NULL) SETTINGS transform_null_in = 1;
- [Обработка значения NULL в операторе IN](../../sql-reference/operators/in.md#in-null-processing)
## low\_cardinality\_max\_dictionary\_size {#low_cardinality_max_dictionary_size}
Задает максимальный размер общего глобального словаря (в строках) для типа данных `LowCardinality`, который может быть записан в файловую систему хранилища. Настройка предотвращает проблемы с оперативной памятью в случае неограниченного увеличения словаря. Все данные, которые не могут быть закодированы из-за ограничения максимального размера словаря, ClickHouse записывает обычным способом.
Допустимые значения:
- Положительное целое число.
Значение по умолчанию: 8192.
## low\_cardinality\_use\_single\_dictionary\_for\_part {#low_cardinality_use_single_dictionary_for_part}
Включает или выключает использование единого словаря для куска (парта).
По умолчанию сервер ClickHouse следит за размером словарей, и если словарь переполняется, сервер создает следующий. Чтобы запретить создание нескольких словарей, задайте настройку `low_cardinality_use_single_dictionary_for_part = 1`.
Допустимые значения:
- 1 — Создание нескольких словарей для частей данных запрещено.
- 0 — Создание нескольких словарей для частей данных не запрещено.
Значение по умолчанию: 0.
## low\_cardinality\_allow\_in\_native\_format {#low_cardinality_allow_in_native_format}
Разрешает или запрещает использование типа данных `LowCardinality` с форматом данных [Native](../../interfaces/formats.md#native).
Если использование типа `LowCardinality` ограничено, сервер CLickHouse преобразует столбцы `LowCardinality` в обычные столбцы для запросов `SELECT`, а обычные столбцы - в столбцы `LowCardinality` для запросов `INSERT`.
В основном настройка используется для сторонних клиентов, не поддерживающих тип данных `LowCardinality`.
Допустимые значения:
- 1 — Использование `LowCardinality` не ограничено.
- 0 — Использование `LowCardinality` ограничено.
Значение по умолчанию: 1.
## allow\_suspicious\_low\_cardinality\_types {#allow_suspicious_low_cardinality_types}
Разрешает или запрещает использование типа данных `LowCardinality` с типами данных с фиксированным размером 8 байт или меньше: числовые типы данных и `FixedString (8_bytes_or_less)`.
Для небольших фиксированных значений использование `LowCardinality` обычно неэффективно, поскольку ClickHouse хранит числовой индекс для каждой строки. В результате:
- Используется больше дискового пространства.
- Потребление ОЗУ увеличивается, в зависимости от размера словаря.
- Некоторые функции работают медленнее из-за дополнительных операций кодирования.
Время слияния в таблицах на движке [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) также может увеличиться по описанным выше причинам.
Допустимые значения:
- 1 — Использование `LowCardinality` не ограничено.
- 0 — Использование `LowCardinality` ограничено.
Значение по умолчанию: 0.
## background_buffer_flush_schedule_pool_size {#background_buffer_flush_schedule_pool_size}
Задает количество потоков для выполнения фонового сброса данных в таблицах с движком [Buffer](../../engines/table-engines/special/buffer.md). Настройка применяется при запуске сервера ClickHouse и не может быть изменена в пользовательском сеансе.
@ -1756,6 +1813,60 @@ SELECT idx, i FROM null_in WHERE i IN (1, NULL) SETTINGS transform_null_in = 1;
- [Секции и настройки запроса CREATE TABLE](../../engines/table-engines/mergetree-family/mergetree.md#mergetree-query-clauses) (настройка `merge_with_ttl_timeout`)
- [Table TTL](../../engines/table-engines/mergetree-family/mergetree.md#mergetree-table-ttl)
## output_format_pretty_max_value_width {#output_format_pretty_max_value_width}
Ограничивает длину значения, выводимого в формате [Pretty](../../interfaces/formats.md#pretty). Если значение длиннее указанного количества символов, оно обрезается.
Возможные значения:
- Положительное целое число.
- 0 — значение обрезается полностью.
Значение по умолчанию: `10000` символов.
**Примеры**
Запрос:
```sql
SET output_format_pretty_max_value_width = 10;
SELECT range(number) FROM system.numbers LIMIT 10 FORMAT PrettyCompactNoEscapes;
```
Результат:
```text
┌─range(number)─┐
│ [] │
│ [0] │
│ [0,1] │
│ [0,1,2] │
│ [0,1,2,3] │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
│ [0,1,2,3,4⋯ │
└───────────────┘
```
Запрос, где длина выводимого значения ограничена 0 символов:
```sql
SET output_format_pretty_max_value_width = 0;
SELECT range(number) FROM system.numbers LIMIT 5 FORMAT PrettyCompactNoEscapes;
```
Результат:
```text
┌─range(number)─┐
│ ⋯ │
│ ⋯ │
│ ⋯ │
│ ⋯ │
│ ⋯ │
└───────────────┘
```
## lock_acquire_timeout {#lock_acquire_timeout}
Устанавливает, сколько секунд сервер ожидает возможности выполнить блокировку таблицы.

View File

@ -9,7 +9,7 @@
- `volume_priority` ([UInt64](../../sql-reference/data-types/int-uint.md)) — порядковый номер тома согласно конфигурации.
- `disks` ([Array(String)](../../sql-reference/data-types/array.md)) — имена дисков, содержащихся в политике хранения.
- `max_data_part_size` ([UInt64](../../sql-reference/data-types/int-uint.md)) — максимальный размер куска данных, который может храниться на дисках тома (0 — без ограничений).
- `move_factor` ([Float64](../../sql-reference/data-types/float.md))\` — доля свободного места, при превышении которой данные начинают перемещаться на следующий том.
- `move_factor` — доля доступного свободного места на томе, если места становится меньше, то данные начнут перемещение на следующий том, если он есть (по умолчанию 0.1).
Если политика хранения содержит несколько томов, то каждому тому соответствует отдельная запись в таблице.

View File

@ -24,13 +24,16 @@
- [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes)
- [Distributed](../../engines/table-engines/special/distributed.md#distributed)
- `total_rows` (Nullable(UInt64)) - Общее количество строк, если есть возможность быстро определить точное количество строк в таблице, в противном случае `Null` (включая базовую таблицу `Buffer`).
- `total_rows` (Nullable(UInt64)) - общее количество строк, если есть возможность быстро определить точное количество строк в таблице, в противном случае `Null` (включая базовую таблицу `Buffer`).
- `total_bytes` (Nullable(UInt64)) - Общее количество байт, если можно быстро определить точное количество байт для таблицы на накопителе, в противном случае `Null` (**не включает** в себя никакого базового хранилища).
- `total_bytes` (Nullable(UInt64)) - общее количество байт, если можно быстро определить точное количество байт для таблицы на накопителе, в противном случае `Null` (**не включает** в себя никакого базового хранилища).
- Если таблица хранит данные на диске, возвращает используемое пространство на диске (т. е. сжатое).
- Если таблица хранит данные в памяти, возвращает приблизительное количество используемых байт в памяти.
- `lifetime_rows` (Nullable(UInt64)) - общее количество строк, добавленных оператором `INSERT` с момента запуска сервера (только для таблиц `Buffer`).
- `lifetime_bytes` (Nullable(UInt64)) - общее количество байт, добавленных оператором `INSERT` с момента запуска сервера (только для таблиц `Buffer`).
Таблица `system.tables` используется при выполнении запроса `SHOW TABLES`.

View File

@ -4,7 +4,7 @@ toc_priority: 128
# groupBitmap {#groupbitmap}
Bitmap или агрегатные вычисления для столбца с типом данных `UInt*`, возвращают кардинальность в виде значения типа UInt64, если добавить суффикс -State, то возвращают [объект bitmap](../../../sql-reference/functions/bitmap-functions.md).
Bitmap или агрегатные вычисления для столбца с типом данных `UInt*`, возвращают кардинальность в виде значения типа UInt64, если добавить суффикс `-State`, то возвращают [объект bitmap](../../../sql-reference/functions/bitmap-functions.md#bitmap-functions).
``` sql
groupBitmap(expr)

View File

@ -0,0 +1,28 @@
---
toc_priority: 143
---
# maxMap {#agg_functions-maxmap}
Синтаксис: `maxMap(key, value)` or `maxMap(Tuple(key, value))`
Вычисляет максимальные значения массива `value`, соответствующие ключам, указанным в массиве `key`.
Передача кортежа ключей и массивов значений идентична передаче двух массивов ключей и значений.
Количество элементов в параметрах `key` и `value` должно быть одинаковым для каждой суммируемой строки.
Возвращает кортеж из двух массивов: ключи и значения, рассчитанные для соответствующих ключей.
Пример:
``` sql
SELECT maxMap(a, b)
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
```
``` text
┌─maxMap(a, b)──────┐
│ ([1,2,3],[2,2,1]) │
└───────────────────┘
```

View File

@ -0,0 +1,28 @@
---
toc_priority: 142
---
# minMap {#agg_functions-minmap}
Синтаксис: `minMap(key, value)` or `minMap(Tuple(key, value))`
Вычисляет минимальное значение массива `value` в соответствии с ключами, указанными в массиве `key`.
Передача кортежа ключей и массивов значений идентична передаче двух массивов ключей и значений.
Количество элементов в параметрах `key` и `value` должно быть одинаковым для каждой суммируемой строки.
Возвращает кортеж из двух массивов: ключи в отсортированном порядке и значения, рассчитанные для соответствующих ключей.
Пример:
``` sql
SELECT minMap(a, b)
FROM values('a Array(Int32), b Array(Int64)', ([1, 2], [2, 2]), ([2, 3], [1, 1]))
```
``` text
┌─minMap(a, b)──────┐
│ ([1,2,3],[2,1,1]) │
└───────────────────┘
```

View File

@ -1,3 +1,8 @@
---
toc_priority: 53
toc_title: AggregateFunction
---
# AggregateFunction {#data-type-aggregatefunction}
Агрегатные функции могут обладать определяемым реализацией промежуточным состоянием, которое может быть сериализовано в тип данных, соответствующий AggregateFunction(…), и быть записано в таблицу обычно посредством [материализованного представления] (../../sql-reference/statements/create.md#create-view). Чтобы получить промежуточное состояние, обычно используются агрегатные функции с суффиксом `-State`. Чтобы в дальнейшем получить агрегированные данные необходимо использовать те же агрегатные функции с суффиксом `-Merge`.

View File

@ -1,3 +1,8 @@
---
toc_priority: 52
toc_title: Array(T)
---
# Array(T) {#data-type-array}
Массив из элементов типа `T`.

View File

@ -0,0 +1,59 @@
---
toc_priority: 51
toc_title: LowCardinality
---
# LowCardinality {#lowcardinality-data-type}
Изменяет внутреннее представление других типов данных, превращая их в тип со словарным кодированием.
## Синтаксис {#lowcardinality-syntax}
```sql
LowCardinality(data_type)
```
**Параметры**
- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md) и числа за исключением типа [Decimal](decimal.md). `LowCardinality` неэффективен для некоторых типов данных, см. описание настройки [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types).
## Описание {#lowcardinality-dscr}
`LowCardinality` — это надстройка, изменяющая способ хранения и правила обработки данных. ClickHouse применяет [словарное кодирование](https://en.wikipedia.org/wiki/Dictionary_coder) в столбцы типа `LowCardinality`. Работа с данными, представленными в словарном виде, может значительно увеличивать производительность запросов [SELECT](../statements/select/index.md) для многих приложений.
Эффективность использования типа данных `LowCarditality` зависит от разнообразия данных. Если словарь содержит менее 10 000 различных значений, ClickHouse в основном показывает более высокую эффективность чтения и хранения данных. Если же словарь содержит более 100 000 различных значений, ClickHouse может работать хуже, чем при использовании обычных типов данных.
При работе со строками, использование `LowCardinality` вместо [Enum](enum.md). `LowCardinality` обеспечивает большую гибкость в использовании и часто показывает такую же или более высокую эффективность.
## Пример
Создать таблицу со столбцами типа `LowCardinality`:
```sql
CREATE TABLE lc_t
(
`id` UInt16,
`strings` LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY id
```
## Связанные настройки и функции
Настройки:
- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size)
- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part)
- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format)
- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types)
Функции:
- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality)
## Смотрите также
- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
- [Reducing Clickhouse Storage Cost with the Low Cardinality Type Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).

View File

@ -1,3 +1,8 @@
---
toc_priority: 55
toc_title: Nullable
---
# Nullable(TypeName) {#data_type-nullable}
Позволяет работать как со значением типа `TypeName` так и с отсутствием этого значения ([NULL](../../sql-reference/data-types/nullable.md)) в одной и той же переменной, в том числе хранить `NULL` в таблицах вместе со значения типа `TypeName`. Например, в столбце типа `Nullable(Int8)` можно хранить значения типа `Int8`, а в тех строках, где значения нет, будет храниться `NULL`.

View File

@ -1,3 +1,8 @@
---
toc_priority: 54
toc_title: Tuple(T1, T2, ...)
---
# Tuple(T1, T2, …) {#tuplet1-t2}
Кортеж из элементов любого [типа](index.md#data_types). Элементы кортежа могут быть одного или разных типов.

View File

@ -1,4 +1,4 @@
# Функции для битмапов {#funktsii-dlia-bitmapov}
# Функции для битмапов {#bitmap-functions}
## bitmapBuild {#bitmap_functions-bitmapbuild}
@ -61,8 +61,8 @@ bitmapSubsetLimit(bitmap, range_start, cardinality_limit)
**Параметры**
- `bitmap` Битмап. [Bitmap object](#bitmap_functions-bitmapbuild).
- `range_start` Начальная точка подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md).
- `cardinality_limit` Верхний предел подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md).
- `range_start` Начальная точка подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md#bitmap-functions).
- `cardinality_limit` Верхний предел подмножества. [UInt32](../../sql-reference/functions/bitmap-functions.md#bitmap-functions).
**Возвращаемое значение**
@ -97,7 +97,7 @@ bitmapContains(haystack, needle)
**Параметры**
- `haystack` [объект Bitmap](#bitmap_functions-bitmapbuild), в котором функция ищет значение.
- `needle` значение, которое функция ищет. Тип — [UInt32](../../sql-reference/functions/bitmap-functions.md).
- `needle` значение, которое функция ищет. Тип — [UInt32](../../sql-reference/functions/bitmap-functions.md#bitmap-functions).
**Возвращаемые значения**

View File

@ -100,5 +100,6 @@ FROM numbers(3)
│ a*cjab+ │
│ aeca2A │
└───────────────────────────────────────┘
```
[Оригинальная статья](https://clickhouse.tech/docs/ru/query_language/functions/random_functions/) <!--hide-->

View File

@ -508,11 +508,85 @@ SELECT parseDateTimeBestEffort('10 20:19')
**См. также**
- \[Информация о формате ISO 8601 от @xkcd\](https://xkcd.com/1179/)
- [Информация о формате ISO 8601 от @xkcd](https://xkcd.com/1179/)
- [RFC 1123](https://tools.ietf.org/html/rfc1123)
- [toDate](#todate)
- [toDateTime](#todatetime)
## parseDateTimeBestEffortUS {#parsedatetimebesteffortUS}
Эта функция похожа на [parseDateTimeBestEffort](#parsedatetimebesteffort), но разница состоит в том, что в она предполагает американский формат даты (`MM/DD/YYYY` etc.) в случае неоднозначности.
**Синтаксис**
``` sql
parseDateTimeBestEffortUS(time_string [, time_zone]);
```
**Параметры**
- `time_string` — строка, содержащая дату и время для преобразования. [String](../../sql-reference/data-types/string.md).
- `time_zone` — часовой пояс. Функция анализирует `time_string` в соответствии с часовым поясом. [String](../../sql-reference/data-types/string.md).
**Поддерживаемые нестандартные форматы**
- Строка, содержащая 9-10 цифр [unix timestamp](https://en.wikipedia.org/wiki/Unix_time).
- Строка, содержащая дату и время: `YYYYMMDDhhmmss`, `MM/DD/YYYY hh:mm:ss`, `MM-DD-YY hh:mm`, `YYYY-MM-DD hh:mm:ss`, etc.
- Строка с датой, но без времени: `YYYY`, `YYYYMM`, `YYYY*MM`, `MM/DD/YYYY`, `MM-DD-YY` etc.
- Строка, содержащая день и время: `DD`, `DD hh`, `DD hh:mm`. В этом случае `YYYY-MM` заменяется на `2000-01`.
- Строка, содержащая дату и время, а также информацию о часовом поясе: `YYYY-MM-DD hh:mm:ss ±h:mm` и т.д. Например, `2020-12-12 17:36:00 -5:00`.
**Возвращаемое значение**
- `time_string` преобразован в тип данных `DateTime`.
**Примеры**
Запрос:
``` sql
SELECT parseDateTimeBestEffortUS('09/12/2020 12:12:57')
AS parseDateTimeBestEffortUS;
```
Ответ:
``` text
┌─parseDateTimeBestEffortUS─┐
│ 2020-09-12 12:12:57 │
└─────────────────────────——┘
```
Запрос:
``` sql
SELECT parseDateTimeBestEffortUS('09-12-2020 12:12:57')
AS parseDateTimeBestEffortUS;
```
Ответ:
``` text
┌─parseDateTimeBestEffortUS─┐
│ 2020-09-12 12:12:57 │
└─────────────────────────——┘
```
Запрос:
``` sql
SELECT parseDateTimeBestEffortUS('09.12.2020 12:12:57')
AS parseDateTimeBestEffortUS;
```
Ответ:
``` text
┌─parseDateTimeBestEffortUS─┐
│ 2020-09-12 12:12:57 │
└─────────────────────────——┘
```
## toUnixTimestamp64Milli
## toUnixTimestamp64Micro
## toUnixTimestamp64Nano
@ -604,4 +678,43 @@ SELECT fromUnixTimestamp64Milli(i64, 'UTC')
└──────────────────────────────────────┘
```
## toLowCardinality {#tolowcardinality}
Преобразует входные данные в версию [LowCardianlity](../data-types/lowcardinality.md) того же типа данных.
Чтобы преобразовать данные из типа `LowCardinality`, используйте функцию [CAST](#type_conversion_function-cast). Например, `CAST(x as String)`.
**Синтаксис**
```sql
toLowCardinality(expr)
```
**Параметры**
- `expr` — [Выражение](../syntax.md#syntax-expressions), которое в результате преобразуется в один из [поддерживаемых типов данных](../data-types/index.md#data_types).
**Возвращаемое значение**
- Результат преобразования `expr`.
Тип: `LowCardinality(expr_result_type)`
**Example**
Запрос:
```sql
SELECT toLowCardinality('1')
```
Результат:
```text
┌─toLowCardinality('1')─┐
│ 1 │
└───────────────────────┘
```
[Оригинальная статья](https://clickhouse.tech/docs/ru/query_language/functions/type_conversion_functions/) <!--hide-->

View File

@ -3,4 +3,28 @@ toc_folder_title: "\u0412\u044B\u0440\u0430\u0436\u0435\u043D\u0438\u044F"
toc_priority: 31
---
# SQL выражения в ClickHouse {#clickhouse-sql-statements}
Выражения описывают различные действия, которые можно выполнить с помощью SQL запросов. Каждый вид выражения имеет свой синтаксис и особенности использования, которые описаны в соответствующих разделах документации:
- [SELECT](../../sql-reference/statements/select/index.md)
- [INSERT INTO](../../sql-reference/statements/insert-into.md)
- [CREATE](../../sql-reference/statements/create/index.md)
- [ALTER](../../sql-reference/statements/alter/index.md)
- [SYSTEM](../../sql-reference/statements/system.md)
- [SHOW](../../sql-reference/statements/show.md)
- [GRANT](../../sql-reference/statements/grant.md)
- [REVOKE](../../sql-reference/statements/revoke.md)
- [ATTACH](../../sql-reference/statements/attach.md)
- [CHECK TABLE](../../sql-reference/statements/check-table.md)
- [DESCRIBE TABLE](../../sql-reference/statements/describe-table.md)
- [DETACH](../../sql-reference/statements/detach.md)
- [DROP](../../sql-reference/statements/drop.md)
- [EXISTS](../../sql-reference/statements/exists.md)
- [KILL](../../sql-reference/statements/kill.md)
- [OPTIMIZE](../../sql-reference/statements/optimize.md)
- [RENAME](../../sql-reference/statements/rename.md)
- [SET](../../sql-reference/statements/set.md)
- [SET ROLE](../../sql-reference/statements/set-role.md)
- [TRUNCATE](../../sql-reference/statements/truncate.md)
- [USE](../../sql-reference/statements/use.md)

View File

@ -97,7 +97,9 @@ Upd. Есть pull request. Upd. Сделано.
Частный случай такой задачи уже есть в https://clickhouse.tech/docs/ru/operations/table_engines/graphitemergetree/ Но это было сделано для конкретной задачи. А надо обобщить.
### 1.10. Пережатие старых данных в фоне {#perezhatie-starykh-dannykh-v-fone}
### 1.10. + Пережатие старых данных в фоне {#perezhatie-starykh-dannykh-v-fone}
В master, сделал Александр Сапин, https://github.com/ClickHouse/ClickHouse/pull/14494
Будет делать Кирилл Барухов, ВШЭ, экспериментальная реализация к весне 2020. Нужно для Яндекс.Метрики.
@ -138,27 +140,32 @@ Upd: PR [#10463](https://github.com/ClickHouse/ClickHouse/pull/10463)
### 1.14. Не писать столбцы, полностью состоящие из нулей {#ne-pisat-stolbtsy-polnostiu-sostoiashchie-iz-nulei}
Антон Попов. Q3.
Антон Попов. Q4.
В очереди. Простая задача, является небольшим пререквизитом для потенциальной поддержки полуструктурированных данных.
Upd. В очереди после чтения срезов столбцов.
### 1.15. Возможность иметь разный первичный ключ в разных кусках {#vozmozhnost-imet-raznyi-pervichnyi-kliuch-v-raznykh-kuskakh}
Сложная задача, только после 1.3.
Upd. В обсуждении.
Upd. Взял в работу Amos Bird. Описана концепция. Совпадает с 1.16.
### 1.16. Несколько физических представлений для одного куска данных {#neskolko-fizicheskikh-predstavlenii-dlia-odnogo-kuska-dannykh}
Сложная задача, только после 1.3 и 1.6. Позволяет компенсировать 21.20.
Upd. В обсуждении.
Upd. Взял в работу Amos Bird. Описана концепция, работа на начальной стадии.
### 1.17. Несколько сортировок для одной таблицы {#neskolko-sortirovok-dlia-odnoi-tablitsy}
Сложная задача, только после 1.3 и 1.6.
Upd. В обсуждении.
Upd. Взял в работу Amos Bird. Описана концепция. Совпадает с 1.16.
### 1.18. Отдельное хранение файлов кусков {#otdelnoe-khranenie-failov-kuskov}
### 1.18. - Отдельное хранение файлов кусков {#otdelnoe-khranenie-failov-kuskov}
Требует 1.3 и 1.6. Полная замена hard links на sym links, что будет лучше для 1.12.
Отменено.
## 2. Крупные рефакторинги {#krupnye-refaktoringi}
@ -194,13 +201,14 @@ Upd. Старый код по большей части удалён.
### 2.5. Версионирование состояний агрегатных функций {#versionirovanie-sostoianii-agregatnykh-funktsii}
В очереди.
В очереди. Описана схема реализации. Алексей Миловидов.
### 2.6. Правая часть IN как тип данных. Выполнение IN в виде скалярного подзапроса {#pravaia-chast-in-kak-tip-dannykh-vypolnenie-in-v-vide-skaliarnogo-podzaprosa}
Требует 2.1.
Отменено.
### 2.7. Нормализация Context {#normalizatsiia-context}
### 2.7. + Нормализация Context {#normalizatsiia-context}
В очереди. Нужно для YQL.
@ -209,12 +217,14 @@ Upd. Старый код по большей части удалён.
Upd. Каталог БД вынесен из Context.
Upd. SharedContext вынесен из Context.
Upd. Проблема нейтрализована и перестала быть актуальной.
Upd. Вообще всё стало Ок.
### 2.8. Декларативный парсер запросов {#deklarativnyi-parser-zaprosov}
Средний приоритет. Нужно для YQL.
Upd. В очереди. Иван Лежанкин.
Upd. Задача в финальной стадии. Пока рассматривается только как альтернативный парсер, описание которого подойдёт для сторонних приложений.
### 2.9. + Логгировние в format-стиле {#loggirovnie-v-format-stile}
@ -225,10 +235,12 @@ Upd. В очереди. Иван Лежанкин.
### 2.10. Запрашивать у таблиц не столбцы, а срезы {#zaprashivat-u-tablits-ne-stolbtsy-a-srezy}
В очереди.
В работе, Антон Попов, Q4.
### 2.11. Разбирательство и нормализация функциональности для bitmap {#razbiratelstvo-i-normalizatsiia-funktsionalnosti-dlia-bitmap}
### 2.11. - Разбирательство и нормализация функциональности для bitmap {#razbiratelstvo-i-normalizatsiia-funktsionalnosti-dlia-bitmap}
В очереди.
Не актуально.
### 2.12. Декларативные сигнатуры функций {#deklarativnye-signatury-funktsii}
@ -265,7 +277,7 @@ Upd. Поползновения наблюдаются.
Требует 3.1.
### + 3.3. Исправить катастрофически отвратительно неприемлемый поиск по документации {#ispravit-katastroficheski-otvratitelno-nepriemlemyi-poisk-po-dokumentatsii}
### 3.3. + Исправить катастрофически отвратительно неприемлемый поиск по документации {#ispravit-katastroficheski-otvratitelno-nepriemlemyi-poisk-po-dokumentatsii}
[Иван Блинков](https://github.com/blinkov/) - очень хороший человек. Сам сайт документации основан на технологиях, не удовлетворяющих требованиям задачи, и эти технологии трудно исправить. Задачу будет делать первый встретившийся нам frontend разработчик, которого мы сможем заставить это сделать.
@ -311,7 +323,6 @@ Upd. Сейчас обсуждается, как сделать другую з
### 4.8. Разделить background pool для fetch и merge {#razdelit-background-pool-dlia-fetch-i-merge}
В очереди. Исправить проблему, что восстанавливающаяся реплика перестаёт мержить. Частично компенсируется 4.3.
Александр Казаков.
## 5. Операции {#operatsii}
@ -450,6 +461,7 @@ UBSan включен в функциональных тестах, но не в
### 7.12. Показывать тестовое покрытие нового кода в PR {#pokazyvat-testovoe-pokrytie-novogo-koda-v-pr}
Пока есть просто показ тестового покрытия всего кода.
Отложено.
### 7.13. + Включение аналога -Weverything в gcc {#vkliuchenie-analoga-weverything-v-gcc}
@ -598,7 +610,7 @@ Upd. Сергей Штыков сделал функцию `randomPrintableASCII
Upd. Илья Яцишин сделал табличную функцию `generateRandom`.
Upd. Эльдар Заитов добавляет OSS Fuzz.
Upd. Сделаны randomString, randomFixedString.
Upd. Сделаны fuzzBits, fuzzBytes.
Upd. Сделаны fuzzBits.
### 7.24. Fuzzing лексера и парсера запросов; кодеков и форматов {#fuzzing-leksera-i-parsera-zaprosov-kodekov-i-formatov}
@ -649,10 +661,11 @@ Upd. В Аркадии частично работает небольшая ча
В очереди. Нужно для Яндекс.Метрики.
### 7.32. Обфускация продакшен запросов {#obfuskatsiia-prodakshen-zaprosov}
### 7.32. + Обфускация продакшен запросов {#obfuskatsiia-prodakshen-zaprosov}
Роман Ильговский. Нужно для Яндекс.Метрики.
Есть pull request, почти готово: https://github.com/ClickHouse/ClickHouse/pull/10973
Есть pull request: https://github.com/ClickHouse/ClickHouse/pull/10973
Готово.
Имея SQL запрос, требуется вывести структуру таблиц, на которых этот запрос будет выполнен, и заполнить эти таблицы случайными данными, такими, что результат этого запроса зависит от выбора подмножества данных.
@ -660,6 +673,8 @@ Upd. В Аркадии частично работает небольшая ча
Обфускация запросов: имея секретные запросы и структуру таблиц, заменить имена полей и константы, чтобы запросы можно было использовать в качестве публично доступных тестов.
Upd. Последняя часть пока не сделана и будет сделана отдельно.
### 7.33. Выкладывать патч релизы в репозиторий автоматически {#vykladyvat-patch-relizy-v-repozitorii-avtomaticheski}
В очереди. Иван Лежанкин.
@ -701,10 +716,11 @@ Upd. Частично решён вопрос с visibility - есть како
Altinity. Никто не делает эту задачу.
### 8.2. Поддержка Mongo Atlas URI {#podderzhka-mongo-atlas-uri}
### 8.2. - Поддержка Mongo Atlas URI {#podderzhka-mongo-atlas-uri}
[Александр Кузьменков](https://github.com/akuzm).
Upd. Задача взята в работу.
Все pull requests успешно закрыты.
### 8.3. + Доработки globs (правильная поддержка диапазонов, уменьшение числа одновременных stream-ов) {#dorabotki-globs-pravilnaia-podderzhka-diapazonov-umenshenie-chisla-odnovremennykh-stream-ov}
@ -721,6 +737,7 @@ Upd. Задача взята в работу.
### 8.6. Kerberos аутентификация для HDFS и Kafka {#kerberos-autentifikatsiia-dlia-hdfs-i-kafka}
Андрей Коняев, ArenaData. Он куда-то пропал.
Upd. В процессе работа для Kafka.
### 8.7. + Исправление мелочи HDFS на очень старых ядрах Linux {#ispravlenie-melochi-hdfs-na-ochen-starykh-iadrakh-linux}
@ -759,6 +776,8 @@ Upd. В стадии код-ревью.
### 8.15. Запись данных в CapNProto {#zapis-dannykh-v-capnproto}
Отложено.
### 8.16. + Поддержка формата Avro {#podderzhka-formata-avro}
Andrew Onyshchuk. Есть pull request. Q1. Сделано.
@ -814,12 +833,13 @@ Upd. Готово.
Низкий приоритет. Отменено.
### 8.21. Поддержка произвольного количества языков для имён регионов {#podderzhka-proizvolnogo-kolichestva-iazykov-dlia-imion-regionov}
### 8.21. - Поддержка произвольного количества языков для имён регионов {#podderzhka-proizvolnogo-kolichestva-iazykov-dlia-imion-regionov}
Нужно для БК. Декабрь 2019.
В декабре для БК сделан минимальный вариант этой задачи.
Максимальный вариант, вроде, никому не нужен.
Upd. Всё ещё кажется, что задача не нужна.
Отменено.
### 8.22. + Поддержка синтаксиса для переменных в стиле MySQL {#podderzhka-sintaksisa-dlia-peremennykh-v-stile-mysql}
@ -831,6 +851,7 @@ Upd. Сделано теми людьми, кому не запрещено ра
### 8.23. Подписка для импорта обновляемых и ротируемых логов в ФС {#podpiska-dlia-importa-obnovliaemykh-i-rotiruemykh-logov-v-fs}
Желательно 2.15.
Отложено.
## 9. Безопасность {#bezopasnost}
@ -870,9 +891,10 @@ Upd. Одну причину устранили, но ещё что-то неи
Upd. Нас заставляют переписать эту библиотеку с одного API на другое, так как старое внезапно устарело. Кажется, что переписывание случайно исправит все проблемы.
Upd. Ура, нашли причину и исправили.
### 10.3. Возможность чтения данных из статических таблиц в YT словарях {#vozmozhnost-chteniia-dannykh-iz-staticheskikh-tablits-v-yt-slovariakh}
### 10.3. - Возможность чтения данных из статических таблиц в YT словарях {#vozmozhnost-chteniia-dannykh-iz-staticheskikh-tablits-v-yt-slovariakh}
Нужно для БК и Метрики.
Отменено.
### 10.4. - Словарь из YDB (KikiMR) {#slovar-iz-ydb-kikimr}
@ -884,9 +906,11 @@ Upd. Ура, нашли причину и исправили.
Для MySQL сделал Clément Rodriguez.
### 10.6. Словари из Cassandra и Couchbase {#slovari-iz-cassandra-i-couchbase}
### 10.6. + Словари из Cassandra и Couchbase {#slovari-iz-cassandra-i-couchbase}
Готова Cassandra.
Couchbase отменён, так как не было спроса.
Aerospike под вопросом.
### 10.7. Поддержка Nullable в словарях {#podderzhka-nullable-v-slovariakh}
@ -929,10 +953,14 @@ Upd. Задача в финальной стадии готовности.
### 10.17. Локальный дамп состояния словаря для быстрого старта сервера {#lokalnyi-damp-sostoianiia-slovaria-dlia-bystrogo-starta-servera}
Отложено.
### 10.18. Таблица Join или словарь на удалённом сервере как key-value БД для cache словаря {#tablitsa-join-ili-slovar-na-udalionnom-servere-kak-key-value-bd-dlia-cache-slovaria}
### 10.19. Возможность зарегистрировать некоторые функции, использующие словари, под пользовательскими именами {#vozmozhnost-zaregistrirovat-nekotorye-funktsii-ispolzuiushchie-slovari-pod-polzovatelskimi-imenami}
Отложено.
## 11. Интерфейсы {#interfeisy}
@ -943,6 +971,7 @@ Upd. Задача в финальной стадии готовности.
Нужно разобраться, как упаковывать Java в статический бинарник, возможно AppImage. Или предоставить максимально простую инструкцию по установке jdbc-bridge. Может быть будет заинтересован Александр Крашенинников, Badoo, так как он разработал jdbc-bridge.
Upd. Александр Крашенинников перешёл в другую компанию и больше не занимается этим.
Upd. Задачу взял Zhichun Wu.
### 11.3. + Интеграционные тесты ODBC драйвера путём подключения ClickHouse к самому себе через ODBC {#integratsionnye-testy-odbc-draivera-putiom-podkliucheniia-clickhouse-k-samomu-sebe-cherez-odbc}
@ -960,6 +989,8 @@ Altinity целиком взяли на себя поддержку clickhouse-c
### 11.7. Интерактивный режим работы программы clickhouse-local {#interaktivnyi-rezhim-raboty-programmy-clickhouse-local}
Отложено.
### 11.8. + Поддержка протокола PostgreSQL {#podderzhka-protokola-postgresql}
Элбакян Мовсес Андраникович, ВШЭ.
@ -998,14 +1029,17 @@ Q1. Сделано управление правами полностью, но
Аутентификация через LDAP - Денис Глазачев.
[Виталий Баранов](https://github.com/vitlibar) и Денис Глазачев, Altinity. Требует 12.1.
Q3.
Upd. Pull request на финальной стадии.
### 12.4. Подключение IDM системы Яндекса как справочника пользователей и прав доступа {#podkliuchenie-idm-sistemy-iandeksa-kak-spravochnika-polzovatelei-i-prav-dostupa}
Пока низкий приоритет. Нужно для Метрики. Требует 12.3.
Отложено.
### 12.5. Pluggable аутентификация с помощью Kerberos (возможно, подключение GSASL) {#pluggable-autentifikatsiia-s-pomoshchiu-kerberos-vozmozhno-podkliuchenie-gsasl}
[Виталий Баранов](https://github.com/vitlibar) и Денис Глазачев, Altinity. Требует 12.1.
Upd. Есть pull request.
### 12.6. + Информация о пользователях и квотах в системной таблице {#informatsiia-o-polzovateliakh-i-kvotakh-v-sistemnoi-tablitse}
@ -1033,6 +1067,7 @@ Q3.
Upd. Не уследили, и задачу стали обсуждать менеджеры.
Upd. Задачу смотрит Александр Казаков.
Upd. Задача взята в работу.
Upd. Задача как будто взята в работу.
## 14. Диалект SQL {#dialekt-sql}
@ -1041,7 +1076,9 @@ Upd. Задача взята в работу.
Нужно для DataLens. А также для внедрения в BI инструмент Looker.
### 14.2. Поддержка WITH для подзапросов {#podderzhka-with-dlia-podzaprosov}
### 14.2. + Поддержка WITH для подзапросов {#podderzhka-with-dlia-podzaprosov}
Сделал Amos Bird.
### 14.3. Поддержка подстановок для множеств в правой части IN {#podderzhka-podstanovok-dlia-mnozhestv-v-pravoi-chasti-in}
@ -1057,11 +1094,13 @@ zhang2014
### 14.6. Глобальный scope для WITH {#globalnyi-scope-dlia-with}
В обсуждении. Amos Bird.
### 14.7. Nullable для WITH ROLLUP, WITH CUBE, WITH TOTALS {#nullable-dlia-with-rollup-with-cube-with-totals}
Простая задача.
### 14.8. Модификаторы DISTINCT, ORDER BY для агрегатных функций {#modifikatory-distinct-order-by-dlia-agregatnykh-funktsii}
### 14.8. + Модификаторы DISTINCT, ORDER BY для агрегатных функций {#modifikatory-distinct-order-by-dlia-agregatnykh-funktsii}
В ClickHouse поддерживается вычисление COUNT(DISTINCT x). Предлагается добавить возможность использования модификатора DISTINCT для всех агрегатных функций. Например, AVG(DISTINCT x) - вычислить среднее значение для всех различных значений x. Под вопросом вариант, в котором фильтрация уникальных значений выполняется по одному выражению, а агрегация по другому.
@ -1069,6 +1108,7 @@ zhang2014
Upd. Есть pull request-ы.
Upd. DISTINCT готов.
Upd. ORDER BY отменён и будет заново сделан уже с LIMIT.
### 14.9. + Поддержка запроса EXPLAIN {#podderzhka-zaprosa-explain}
@ -1079,8 +1119,12 @@ Upd. Есть pull request. Готово.
### 14.11. Функции для grouping sets {#funktsii-dlia-grouping-sets}
Отложено.
### 14.12. Функции обработки временных рядов {#funktsii-obrabotki-vremennykh-riadov}
Отложено.
Сложная задача, так как вводит новый класс функций и требует его обработку в оптимизаторе запросов.
В time-series СУБД нужны функции, которые зависят от последовательности значений. Или даже от последовательности значений и их меток времени. Примеры: moving average, exponential smoothing, derivative, Holt-Winters forecast. Вычисление таких функций поддерживается в ClickHouse лишь частично. Так, ClickHouse поддерживает тип данных «массив» и позволяет реализовать эти функции как функции, принимающие массивы. Но гораздо удобнее для пользователя было бы иметь возможность применить такие функции к таблице (промежуточному результату запроса после сортировки).
@ -1089,6 +1133,8 @@ Upd. Есть pull request. Готово.
### 14.13. Применимость функций высшего порядка для кортежей и Nested {#primenimost-funktsii-vysshego-poriadka-dlia-kortezhei-i-nested}
После задачи "чтение срезов столбцов".
### 14.14. Неявные преобразования типов констант {#neiavnye-preobrazovaniia-tipov-konstant}
Сделано для операторов сравнения с константами (подавляющее большинство use cases).
@ -1180,12 +1226,14 @@ Upd. Секретного изменения в работе не будет, з
### 16.5. Функции для XML и HTML escape {#funktsii-dlia-xml-i-html-escape}
### 16.6. Функции нормализации и хэширования SQL запросов {#funktsii-normalizatsii-i-kheshirovaniia-sql-zaprosov}
### 16.6. + Функции нормализации и хэширования SQL запросов {#funktsii-normalizatsii-i-kheshirovaniia-sql-zaprosov}
Алексей Миловидов. Сделано.
## 17. Работа с географическими данными {#rabota-s-geograficheskimi-dannymi}
### 17.1. Гео-словари для определения региона по координатам {#geo-slovari-dlia-opredeleniia-regiona-po-koordinatam}
### 17.1. + Гео-словари для определения региона по координатам {#geo-slovari-dlia-opredeleniia-regiona-po-koordinatam}
[Андрей Чулков](https://github.com/achulkov2), Антон Кваша, Артур Петуховский, ВШЭ.
Будет основано на коде от Арслана Урташева.
@ -1198,6 +1246,7 @@ Upd. Андрей сделал прототип интерфейса и реал
Upd. Андрей сделал прототип более оптимальной структуры данных.
Upd. Есть обнадёживающие результаты.
Upd. В ревью.
Upd. В релизе.
### 17.2. GIS типы данных и операции {#gis-tipy-dannykh-i-operatsii}
@ -1227,6 +1276,7 @@ Upd. Есть pull request.
Александр Кожихов, Максим Кузнецов. Обнаружена фундаментальная проблема в реализации, доделывает предположительно [Николай Кочетов](https://github.com/KochetovNicolai). Он может делегировать задачу кому угодно.
Исправление фундаментальной проблемы - есть PR.
Фундаментальная проблема решена.
### 18.2. Агрегатные функции для статистических тестов {#agregatnye-funktsii-dlia-statisticheskikh-testov}
@ -1235,16 +1285,20 @@ Upd. Есть pull request.
Предлагается реализовать в ClickHouse статистические тесты (Analysis of Variance, тесты нормальности распределения и т. п.) в виде агрегатных функций. Пример: `welchTTest(value, sample_idx)`.
Сделали прототип двух тестов, есть pull request. Также есть pull request для корелляции рангов.
Upd. Помержили корелляцию рангов, но ещё не помержили сравнение t-test, u-test.
### 18.3. Инфраструктура для тренировки моделей в ClickHouse {#infrastruktura-dlia-trenirovki-modelei-v-clickhouse}
В очереди.
Отложено.
## 19. Улучшение работы кластера {#uluchshenie-raboty-klastera}
### 19.1. Параллельные кворумные вставки без линеаризуемости {#parallelnye-kvorumnye-vstavki-bez-linearizuemosti}
Upd. В работе, ожидается в начале октября.
Репликация данных в ClickHouse по-умолчанию является асинхронной без выделенного мастера. Это значит, что клиент, осуществляющий вставку данных, получает успешный ответ после того, как данные попали на один сервер; репликация данных по остальным серверам осуществляется в другой момент времени. Это ненадёжно, потому что допускает потерю только что вставленных данных при потере лишь одного сервера.
Для решения этой проблемы, в ClickHouse есть возможность включить «кворумную» вставку. Это значит, что клиент, осуществляющий вставку данных, получает успешный ответ после того, как данные попали на несколько (кворум) серверов. Обеспечивается линеаризуемость: клиент, получает успешный ответ после того, как данные попали на несколько реплик, *которые содержат все предыдущие данные, вставленные с кворумом* (такие реплики можно называть «синхронными»), и при запросе SELECT можно выставить настройку, разрешающую только чтение с синхронных реплик.
@ -1265,6 +1319,7 @@ Upd. Есть pull request.
Upd. Алексей сделал какой-то вариант, но борется с тем, что ничего не работает.
Upd. Есть pull request на начальной стадии.
Upd. Взято в работу, но непонятна перспектива, так как не ясно, подлежат ли исправлению некоторые нюансы.
### 19.3. - Подключение YT Cypress или YDB как альтернативы ZooKeeper {#podkliuchenie-yt-cypress-ili-ydb-kak-alternativy-zookeeper}
@ -1349,9 +1404,9 @@ Upd. Для DISTINCT есть pull request.
[Vxider](https://github.com/Vxider), ICT
Есть pull request.
### 21.6. Уменьшение числа потоков для SELECT в случае тривиального INSERT SELECT {#umenshenie-chisla-potokov-dlia-select-v-sluchae-trivialnogo-insert-select}
### 21.6. + Уменьшение числа потоков для SELECT в случае тривиального INSERT SELECT {#umenshenie-chisla-potokov-dlia-select-v-sluchae-trivialnogo-insert-select}
ucasFL, в разработке.
ucasFL, в разработке. Готово.
### 21.7. Кэш результатов запросов {#kesh-rezultatov-zaprosov}
@ -1371,11 +1426,14 @@ Upd. В обсуждении.
Upd. Есть нерабочий прототип, скорее всего будет отложено.
Upd. Отложено до осени.
Upd. Отложено до.
### 21.8.1. Отдельный аллокатор для кэшей с ASLR {#otdelnyi-allokator-dlia-keshei-s-aslr}
В прошлом году задачу пытался сделать Данила Кутенин с помощью lfalloc из Аркадии и mimalloc из Microsoft, но оба решения не были квалифицированы для использования в продакшене. Успешная реализация задачи 21.8 отменит необходимость в этой задаче, поэтому холд.
Upd. Ещё попробовали новый tcmalloc, результаты неудовлетворительные. Пока отменено.
### 21.9. Исправить push-down выражений с помощью Processors {#ispravit-push-down-vyrazhenii-s-pomoshchiu-processors}
[Николай Кочетов](https://github.com/KochetovNicolai). Требует 2.1.
@ -1384,7 +1442,7 @@ Upd. Отложено до осени.
Amos Bird.
### 21.11. Peephole оптимизации запросов {#peephole-optimizatsii-zaprosov}
### 21.11. + Peephole оптимизации запросов {#peephole-optimizatsii-zaprosov}
Руслан Камалов, Михаил Малафеев, Виктор Гришанин, ВШЭ
@ -1399,8 +1457,9 @@ Amos Bird.
Сделано ещё несколько оптимизаций.
Upd. Все вышеперечисленные оптимизации доступны в pull requests.
Upd. Из них почти все помержены, осталась одна.
Upd. Помержили всё.
### 21.12. Алгебраические оптимизации запросов {#algebraicheskie-optimizatsii-zaprosov}
### 21.12. + Алгебраические оптимизации запросов {#algebraicheskie-optimizatsii-zaprosov}
Руслан Камалов, Михаил Малафеев, Виктор Гришанин, ВШЭ
@ -1415,6 +1474,7 @@ Upd. Из них почти все помержены, осталась одна
Несколько оптимизаций есть в PR.
Upd. Все оптимизации кроме "Обращение инъективных функций в сравнениях на равенство" есть в PR.
Upd. Из них больше половины помержены, осталось ещё две.
Upd. Помержили всё.
### 21.13. Fusion агрегатных функций {#fusion-agregatnykh-funktsii}
@ -1427,6 +1487,7 @@ Constraints позволяют задать выражение, истиннос
Если выражение содержит равенство, то встретив в запросе одну из частей равенства, её можно заменить на другую часть равенства, если это сделает проще чтение данных или вычисление выражения. Например, задан constraint: `URLDomain = domain(URL)`. Значит, выражение `domain(URL)` можно заменить на `URLDomain`.
Upd. Возможно будет отложено на следующий год.
Отложено на следующий год.
### 21.15. Многоступенчатое чтение данных вместо PREWHERE {#mnogostupenchatoe-chtenie-dannykh-vmesto-prewhere}
@ -1442,10 +1503,11 @@ Upd. Возможно будет отложено на следующий год
### 21.18. Внутренняя параллелизация мержа больших состояний агрегатных функций {#vnutrenniaia-parallelizatsiia-merzha-bolshikh-sostoianii-agregatnykh-funktsii}
### 21.19. Оптимизация сортировки {#optimizatsiia-sortirovki}
### 21.19. + Оптимизация сортировки {#optimizatsiia-sortirovki}
Василий Морозов, Арслан Гумеров, Альберт Кидрачев, ВШЭ.
В прошлом году задачу начинал делать другой человек, но не добился достаточного прогресса.
Upd. Сделаны самые существенные из предложенных вариантов.
\+ 1. Оптимизация top sort.
@ -1481,11 +1543,13 @@ Upd. Вместо этого будем делать задачу 1.16.
### 21.22. Userspace page cache {#userspace-page-cache}
Требует 21.8.
Отложено.
### 21.23. Ускорение работы с вторичными индексами {#uskorenie-raboty-s-vtorichnymi-indeksami}
### 21.23. + Ускорение работы с вторичными индексами {#uskorenie-raboty-s-vtorichnymi-indeksami}
zhang2014.
Есть pull request.
Готово.
## 22. Долги и недоделанные возможности {#dolgi-i-nedodelannye-vozmozhnosti}
@ -1679,15 +1743,18 @@ Q1. [Николай Кочетов](https://github.com/KochetovNicolai).
### 24.2. Экспериментальные алгоритмы сжатия {#eksperimentalnye-algoritmy-szhatiia}
Отложено.
ClickHouse поддерживает LZ4 и ZSTD для сжатия данных. Эти алгоритмы являются парето-оптимальными по соотношению скорости и коэффициентам сжатия среди достаточно известных. Тем не менее, существуют менее известные алгоритмы сжатия, которые могут превзойти их по какому-либо критерию. Из потенциально более быстрых по сравнимом коэффициенте сжатия: Lizard, LZSSE, density. Из более сильных: bsc и csc. Необходимо изучить эти алгоритмы, добавить их поддержку в ClickHouse и исследовать их работу на тестовых датасетах.
### 24.3. Экспериментальные кодеки {#eksperimentalnye-kodeki}
### 24.3. - Экспериментальные кодеки {#eksperimentalnye-kodeki}
Существуют специализированные алгоритмы кодирования числовых последовательностей: Group VarInt, MaskedVByte, PFOR. Необходимо изучить наиболее эффективные реализации этих алгоритмов. Примеры вы сможете найти на https://github.com/lemire и https://github.com/powturbo/ а также https://github.com/schizofreny/middle-out
Внедрить их в ClickHouse в виде кодеков и изучить их работу на тестовых датасетах.
Upd. Есть два pull requests в начальной стадии, отложено.
Upd. Отменено.
### 24.4. Шифрование в ClickHouse на уровне VFS {#shifrovanie-v-clickhouse-na-urovne-vfs}
@ -1697,6 +1764,7 @@ Upd. Есть два pull requests в начальной стадии, отло
Обсуждаются детали реализации. Q3/Q4.
Виталий Баранов.
Отложено, после бэкапов.
### 24.5. Поддержка функций шифрования для отдельных значений {#podderzhka-funktsii-shifrovaniia-dlia-otdelnykh-znachenii}
@ -1706,6 +1774,7 @@ Upd. Есть два pull requests в начальной стадии, отло
Для этого требуется реализовать функции шифрования и расшифрования, доступные из SQL. Для шифрования реализовать возможность добавления нужного количества случайных бит для исключения одинаковых зашифрованных значений на одинаковых данных. Это позволит реализовать возможность «забывания» данных без удаления строк таблицы: можно шифровать данные разных клиентов разными ключами, и для того, чтобы забыть данные одного клиента, потребуется всего лишь удалить ключ.
Делает Василий Немков, Altinity
Есть pull request в процессе ревью, исправляем проблемы производительности.
### 24.6. Userspace RAID {#userspace-raid}
@ -1722,6 +1791,7 @@ RAID позволяет одновременно увеличить надёжн
Для преодоления этих ограничений, предлагается реализовать в ClickHouse встроенный алгоритм расположения данных на дисках.
Есть pull request на начальной стадии.
Отложено.
### 24.7. Вероятностные структуры данных для фильтрации по подзапросам {#veroiatnostnye-struktury-dannykh-dlia-filtratsii-po-podzaprosam}
@ -1762,6 +1832,7 @@ Upd. Есть pull request. В стадии ревью. Готово.
Рустам Гусейн-заде, ВШЭ.
Есть pull request на промежуточной стадии.
Отложено.
### 24.11. User Defined Functions {#user-defined-functions}
@ -1785,7 +1856,7 @@ ClickHouse предоставляет достаточно богатый наб
Upd. В работе два варианта реализации UDF.
### 24.12. GPU offloading {#gpu-offloading}
### 24.12. - GPU offloading {#gpu-offloading}
Риск состоит в том, что даже известные GPU базы, такие как OmniSci, работают медленнее, чем ClickHouse.
Преимущество возможно только на полной сортировке и JOIN.
@ -1794,10 +1865,11 @@ Upd. В работе два варианта реализации UDF.
В компании nVidia сделали прототип offloading вычисления GROUP BY с некоторыми из агрегатных функций в ClickHouse и обещат предоставить исходники в публичный доступ для дальнейшего развития. Предлагается изучить этот прототип и расширить его применимость для более широкого сценария использования. В качестве альтернативы, предлагается изучить исходные коды системы `OmniSci` или `Alenka` или библиотеку `CUB` https://nvlabs.github.io/cub/ и применить некоторые из алгоритмов в ClickHouse.
Upd. В компании nVidia выложили прототип, теперь нужна интеграция в систему сборки.
Upd. Интеграция в систему сборки - Иван Лежанкин.
Upd. Интеграция в систему сборки - Иван Лежанкин (не сделано).
Upd. Есть прототип bitonic sort.
Upd. Прототип bitonic sort помержен, но целесообразность под вопросом (он работает медленнее).
Наверное надо будет подержать и удалить.
Удалили.
### 24.13. Stream запросы {#stream-zaprosy}
@ -1819,6 +1891,8 @@ Upd. Есть два прототипа от внешних контрибьют
В прошлом году исследование по этой задаче сделал Егор Соловьёв, ВШЭ и Яндекс.Такси. Его исследование показало, что алгоритм нельзя существенно улучшить путём изменения параметров. Но исследование лажовое, так как рассмотрен только уже использующийся алгоритм. То есть, задача остаётся открытой.
Отложено.
### 24.17. Экспериментальные способы ускорения параллельного GROUP BY {#eksperimentalnye-sposoby-uskoreniia-parallelnogo-group-by}
Максим Серебряков
@ -1831,9 +1905,12 @@ Upd. Есть pull request - в большинстве случаев однов
### 24.19. Промежуточное состояние GROUP BY как структура данных для key-value доступа {#promezhutochnoe-sostoianie-group-by-kak-struktura-dannykh-dlia-key-value-dostupa}
Отложено.
### 24.20. Short-circuit вычисления некоторых выражений {#short-circuit-vychisleniia-nekotorykh-vyrazhenii}
Два года назад задачу попробовала сделать Анастасия Царькова, ВШЭ и Яндекс, но реализация получилась слишком неудобной и её удалили.
В обсуждении.
### 24.21. Реализация в ClickHouse протокола распределённого консенсуса {#realizatsiia-v-clickhouse-protokola-raspredelionnogo-konsensusa}
@ -1851,9 +1928,10 @@ ClickHouse также может использоваться для быстр
Другая экспериментальная задача - реализация эвристик для обработки данных в неизвестном построчном текстовом формате. Детектирование CSV, TSV, JSON, детектирование разделителей и форматов значений.
### 24.23. Минимальная поддержка транзакций для множества вставок/чтений {#minimalnaia-podderzhka-tranzaktsii-dlia-mnozhestva-vstavokchtenii}
### 24.23. - Минимальная поддержка транзакций для множества вставок/чтений {#minimalnaia-podderzhka-tranzaktsii-dlia-mnozhestva-vstavokchtenii}
Максим Кузнецов, ВШЭ.
Отменено.
Таблицы типа MergeTree состоят из набора независимых неизменяемых «кусков» данных. При вставках данных (INSERT), формируются новые куски. При модификациях данных (слияние кусков), формируются новые куски, а старые - становятся неактивными и перестают использоваться следующими запросами. Чтение данных (SELECT) производится из снэпшота множества кусков на некоторый момент времени. Таким образом, чтения и вставки не блокируют друг друга.
@ -1863,11 +1941,12 @@ ClickHouse также может использоваться для быстр
Для решения этих проблем, предлагается ввести глобальные метки времени для кусков данных (сейчас уже есть инкрементальные номера кусков, но они выделяются в рамках одной таблицы). Первым шагом сделаем эти метки времени в рамках сервера. Вторым шагом сделаем метки времени в рамках всех серверов, но неточные на основе локальных часов. Третьим шагом сделаем метки времени, выдаваемые сервисом координации.
### 24.24. Реализация алгоритмов differential privacy {#realizatsiia-algoritmov-differential-privacy}
### 24.24. - Реализация алгоритмов differential privacy {#realizatsiia-algoritmov-differential-privacy}
[\#6874](https://github.com/ClickHouse/ClickHouse/issues/6874)
Артём Вишняков, ВШЭ. Есть pull request.
Отменено, так как решение имеет низкую практичность.
### 24.25. Интеграция в ClickHouse функциональности обработки HTTP User Agent {#integratsiia-v-clickhouse-funktsionalnosti-obrabotki-http-user-agent}
@ -1882,6 +1961,7 @@ Upd. Есть pull request. Нужно ещё чистить код библио
Александр Кожихов, ВШЭ и Яндекс.YT.
Upd. Есть pull request с прототипом.
Upd. Александ Кузьменков взял задачу в работу.
### 24.27. Реализация алгоритмов min-hash, sim-hash для нечёткого поиска полудубликатов {#realizatsiia-algoritmov-min-hash-sim-hash-dlia-nechiotkogo-poiska-poludublikatov}
@ -1892,10 +1972,12 @@ ucasFL, ICT.
Алгоритмы min-hash и sim-hash позволяют вычислить для текста несколько хэш-значений таких, что при небольшом изменении текста, по крайней мере один из хэшей не меняется. Вычисления можно реализовать на n-грамах и словарных шинглах. Предлагается добавить поддержку этих алгоритмов в виде функций в ClickHouse и изучить их применимость для задачи нечёткого поиска полудубликатов.
Есть pull request, есть что доделывать.
Upd. Николай Кочетов взял задачу в работу.
### 24.28. Другой sketch для квантилей {#drugoi-sketch-dlia-kvantilei}
Похоже на quantileTiming, но с логарифмическими корзинами. См. DDSketch.
Отложено.
### 24.29. Поддержка Arrow Flight {#podderzhka-arrow-flight}
@ -1911,6 +1993,7 @@ Amos Bird, но его решение слишком громоздкое и п
### 24.31. Кореллированные подзапросы {#korellirovannye-podzaprosy}
Перепиcывание в JOIN. Не раньше 21.11, 21.12, 21.9. Низкий приоритет.
Отложено.
### 24.32. Поддержка GRPC {#podderzhka-grpc}
@ -1925,6 +2008,7 @@ Amos Bird, но его решение слишком громоздкое и п
Рассматривается вариант - поддержка GRPC в ClickHouse. Здесь есть неочевидные моменты, такие как - эффективная передача массивов данных в column-oriented формате - насколько удобно будет обернуть это в GRPC.
Задача в работе, есть pull request. [#10136](https://github.com/ClickHouse/ClickHouse/pull/10136)
Upd. Задачу взял в работу Виталий Баранов.
## 25. DevRel {#devrel}
@ -1970,17 +2054,18 @@ Amos Bird, но его решение слишком громоздкое и п
Екатерина - организация. Upd. Проведено два онлайн митапа на русском и два на английском.
### 25.11. Митапы зарубежные: восток США (Нью Йорк, возможно Raleigh), возможно северо-запад (Сиэтл), Китай (Пекин снова, возможно митап для разработчиков или хакатон), Лондон {#mitapy-zarubezhnye-vostok-ssha-niu-iork-vozmozhno-raleigh-vozmozhno-severo-zapad-sietl-kitai-pekin-snova-vozmozhno-mitap-dlia-razrabotchikov-ili-khakaton-london}
### 25.11. + Митапы зарубежные: восток США (Нью Йорк, возможно Raleigh), возможно северо-запад (Сиэтл), Китай (Пекин снова, возможно митап для разработчиков или хакатон), Лондон {#mitapy-zarubezhnye-vostok-ssha-niu-iork-vozmozhno-raleigh-vozmozhno-severo-zapad-sietl-kitai-pekin-snova-vozmozhno-mitap-dlia-razrabotchikov-ili-khakaton-london}
[Иван Блинков](https://github.com/blinkov/) - организация. Две штуки в США запланированы. Upd. Два митапа в США и один в Европе проведены.
[Иван Блинков](https://github.com/blinkov/) - организация. Две штуки в США запланированы. Upd. Два митапа в США и один в Европе проведены. Upd. Все остальные перенесены в онлайн.
### 25.12. Статья «научная» - про устройство хранения данных и индексов или whitepaper по архитектуре. Есть вариант подать на VLDB {#statia-nauchnaia-pro-ustroistvo-khraneniia-dannykh-i-indeksov-ili-whitepaper-po-arkhitekture-est-variant-podat-na-vldb}
Низкий приоритет. Алексей Миловидов.
### 25.13. Участие во всех мероприятиях Яндекса, которые связаны с разработкой бэкенда, C++ разработкой или с базами данных, возможно участие в DevRel мероприятиях {#uchastie-vo-vsekh-meropriiatiiakh-iandeksa-kotorye-sviazany-s-razrabotkoi-bekenda-c-razrabotkoi-ili-s-bazami-dannykh-vozmozhno-uchastie-v-devrel-meropriiatiiakh}
### 25.13. + Участие во всех мероприятиях Яндекса, которые связаны с разработкой бэкенда, C++ разработкой или с базами данных, возможно участие в DevRel мероприятиях {#uchastie-vo-vsekh-meropriiatiiakh-iandeksa-kotorye-sviazany-s-razrabotkoi-bekenda-c-razrabotkoi-ili-s-bazami-dannykh-vozmozhno-uchastie-v-devrel-meropriiatiiakh}
Алексей Миловидов и все подготовленные докладчики
Алексей Миловидов и все подготовленные докладчики.
Upd. Участвуем.
### 25.14. Конференции в России: все HighLoad, возможно CodeFest, DUMP или UWDC, возможно C++ Russia {#konferentsii-v-rossii-vse-highload-vozmozhno-codefest-dump-ili-uwdc-vozmozhno-c-russia}
@ -1988,6 +2073,7 @@ Amos Bird, но его решение слишком громоздкое и п
Upd. Есть Saint HighLoad online.
Upd. Есть C++ Russia.
CodeFest, DUMP, UWDC отменились.
Upd. Добавились Highload Fwdays, Матемаркетинг.
### 25.15. Конференции зарубежные: Percona, DataOps, попытка попасть на более крупные {#konferentsii-zarubezhnye-percona-dataops-popytka-popast-na-bolee-krupnye}
@ -2009,16 +2095,18 @@ DataOps отменилась.
Требуется проработать вопрос безопасности и изоляции инстансов (поднятие в контейнерах с ограничениями по сети), подключение тестовых датасетов с помощью copy-on-write файловой системы; органичения ресурсов.
Есть минимальный прототип. Сделал Илья Яцишин. Этот прототип не позволяет делиться ссылками на результаты запросов.
Upd. На финальной стадии инструмент для экспериментирования с разными версиями ClickHouse.
### 25.17. Взаимодействие с ВУЗами: ВШЭ, УрФУ, ICT Beijing {#vzaimodeistvie-s-vuzami-vshe-urfu-ict-beijing}
Алексей Миловидов и вся группа разработки.
Благодаря Robert Hodges добавлен CMU.
Upd. Взаимодействие с ВШЭ 2019/2020 успешно выполнено.
Upd. Идёт подготовка к 2020/2021.
### 25.18. - Лекция в ШАД {#lektsiia-v-shad}
Алексей Миловидов
Алексей Миловидов.
### 25.19. - Участие в курсе разработки на C++ в ШАД {#uchastie-v-kurse-razrabotki-na-c-v-shad}
@ -2029,6 +2117,8 @@ Upd. Взаимодействие с ВШЭ 2019/2020 успешно выпол
Существуют мало известные специализированные СУБД, способные конкурировать с ClickHouse по скорости обработки некоторых классов запросов. Пример: `TDEngine` и `DolphinDB`, `VictoriaMetrics`, а также `Apache Doris` и `LocustDB`. Предлагается изучить и классифицировать архитектурные особенности этих систем - их особенности и преимущества. Установить эти системы, загрузить тестовые данные, изучить производительность. Проанализировать, за счёт чего достигаются преимущества.
Upd. Есть поползновения с TDEngine.
Upd. Добавили OmniSci, обновили MonetDB.
Также посмотрели QuestDB и VectorSQL (они не работают).
### 25.21. Повторное награждение контрибьюторов в Китае {#povtornoe-nagrazhdenie-kontribiutorov-v-kitae}
@ -2038,6 +2128,7 @@ Upd. Ждём снятия ограничений и восстановлени
[Иван Блинков](https://github.com/blinkov/) - организация. Провёл мероприятие для турецкой компании.
Upd. On-site заменяется на Online.
Upd. Проведены консультации для нескольких секретных компаний.
### 25.23. Новый мерч для ClickHouse {#novyi-merch-dlia-clickhouse}

View File

@ -28,6 +28,7 @@ import test
import util
import website
from cmake_in_clickhouse_generator import generate_cmake_flags_files
class ClickHouseMarkdown(markdown.extensions.Extension):
class ClickHousePreprocessor(markdown.util.Processor):
@ -184,6 +185,8 @@ def build(args):
test.test_templates(args.website_dir)
if not args.skip_docs:
generate_cmake_flags_files(os.path.join(os.path.dirname(__file__), '..', '..'))
build_docs(args)
from github import build_releases
build_releases(args, build_docs)
@ -200,6 +203,7 @@ def build(args):
if __name__ == '__main__':
os.chdir(os.path.join(os.path.dirname(__file__), '..'))
website_dir = os.path.join('..', 'website')
arg_parser = argparse.ArgumentParser()
arg_parser.add_argument('--lang', default='en,es,fr,ru,zh,ja,tr,fa')
arg_parser.add_argument('--blog-lang', default='en,ru')

View File

@ -0,0 +1,152 @@
import re
import os
from typing import TextIO, List, Tuple, Optional, Dict
# name, default value, description
Entity = Tuple[str, str, str]
# https://regex101.com/r/R6iogw/12
cmake_option_regex: str = r"^\s*option\s*\(([A-Z_0-9${}]+)\s*(?:\"((?:.|\n)*?)\")?\s*(.*)?\).*$"
ch_master_url: str = "https://github.com/clickhouse/clickhouse/blob/master/"
name_str: str = "<a name=\"{anchor}\"></a>[`{name}`](" + ch_master_url + "{path}#L{line})"
default_anchor_str: str = "[`{name}`](#{anchor})"
comment_var_regex: str = r"\${(.+)}"
comment_var_replace: str = "`\\1`"
table_header: str = """
| Name | Default value | Description | Comment |
|------|---------------|-------------|---------|
"""
# Needed to detect conditional variables (those which are defined twice)
# name -> (path, values)
entities: Dict[str, Tuple[str, str]] = {}
def make_anchor(t: str) -> str:
return "".join(["-" if i == "_" else i.lower() for i in t if i.isalpha() or i == "_"])
def process_comment(comment: str) -> str:
return re.sub(comment_var_regex, comment_var_replace, comment, flags=re.MULTILINE)
def build_entity(path: str, entity: Entity, line_comment: Tuple[int, str]) -> None:
(line, comment) = line_comment
(name, description, default) = entity
if name in entities:
return
# cannot escape the { in macro option description -> invalid AMP html
# Skipping "USE_INTERNAL_${LIB_NAME_UC}_LIBRARY"
if "LIB_NAME_UC" in name:
return
if len(default) == 0:
formatted_default: str = "`OFF`"
elif default[0] == "$":
formatted_default: str = "`{}`".format(default[2:-1])
else:
formatted_default: str = "`" + default + "`"
formatted_name: str = name_str.format(
anchor=make_anchor(name),
name=name,
path=path,
line=line if line > 0 else 1)
formatted_description: str = "".join(description.split("\n"))
formatted_comment: str = process_comment(comment)
formatted_entity: str = "| {} | {} | {} | {} |".format(
formatted_name, formatted_default, formatted_description, formatted_comment)
entities[name] = path, formatted_entity
def process_file(root_path: str, input_name: str) -> None:
with open(os.path.join(root_path, input_name), 'r') as cmake_file:
contents: str = cmake_file.read()
def get_line_and_comment(target: str) -> Tuple[int, str]:
contents_list: List[str] = contents.split("\n")
comment: str = ""
for n, line in enumerate(contents_list):
if line.find(target) == -1:
continue
for maybe_comment_line in contents_list[n - 1::-1]:
if not re.match("\s*#\s*", maybe_comment_line):
break
comment = re.sub("\s*#\s*", "", maybe_comment_line) + " " + comment
return n, comment
matches: Optional[List[Entity]] = re.findall(cmake_option_regex, contents, re.MULTILINE)
if matches:
for entity in matches:
build_entity(os.path.join(root_path[6:], input_name), entity, get_line_and_comment(entity[0]))
def process_folder(root_path:str, name: str) -> None:
for root, _, files in os.walk(os.path.join(root_path, name)):
for f in files:
if f == "CMakeLists.txt" or ".cmake" in f:
process_file(root, f)
def generate_cmake_flags_files(root_path: str) -> None:
output_file_name: str = os.path.join(root_path, "docs/en/development/cmake-in-clickhouse.md")
header_file_name: str = os.path.join(root_path, "docs/_includes/cmake_in_clickhouse_header.md")
footer_file_name: str = os.path.join(root_path, "docs/_includes/cmake_in_clickhouse_footer.md")
process_file(root_path, "CMakeLists.txt")
process_file(root_path, "programs/CMakeLists.txt")
process_folder(root_path, "base")
process_folder(root_path, "cmake")
process_folder(root_path, "src")
with open(output_file_name, "w") as f:
with open(header_file_name, "r") as header:
f.write(header.read())
sorted_keys: List[str] = sorted(entities.keys())
ignored_keys: List[str] = []
f.write("### ClickHouse modes\n" + table_header)
for k in sorted_keys:
if k.startswith("ENABLE_CLICKHOUSE_"):
f.write(entities[k][1] + "\n")
ignored_keys.append(k)
f.write("\n### External libraries\nNote that ClickHouse uses forks of these libraries, see https://github.com/ClickHouse-Extras.\n" +
table_header)
for k in sorted_keys:
if k.startswith("ENABLE_") and ".cmake" in entities[k][0]:
f.write(entities[k][1] + "\n")
ignored_keys.append(k)
f.write("\n### External libraries system/bundled mode\n" + table_header)
for k in sorted_keys:
if k.startswith("USE_INTERNAL_"):
f.write(entities[k][1] + "\n")
ignored_keys.append(k)
f.write("\n### Other flags\n" + table_header)
for k in sorted(set(sorted_keys).difference(set(ignored_keys))):
f.write(entities[k][1] + "\n")
with open(footer_file_name, "r") as footer:
f.write(footer.read())
if __name__ == '__main__':
generate_cmake_flags_files("../../")

View File

@ -18,11 +18,11 @@ Markdown==3.2.1
MarkupSafe==1.1.1
mkdocs==1.1.2
mkdocs-htmlproofer-plugin==0.0.3
mkdocs-macros-plugin==0.4.9
mkdocs-macros-plugin==0.4.13
nltk==3.5
nose==1.3.7
protobuf==3.13.0
numpy==1.19.1
numpy==1.19.2
Pygments==2.5.2
pymdown-extensions==8.0
python-slugify==4.0.1

View File

@ -92,7 +92,7 @@ def test_single_page(input_path, lang):
logging.warning('Found %d duplicate anchor points' % duplicate_anchor_points)
if links_to_nowhere:
if lang == 'en': # TODO: check all languages again
if lang == 'en' or lang == 'ru': # TODO: check all languages again
logging.error(f'Found {links_to_nowhere} links to nowhere in {lang}')
sys.exit(1)
else:

View File

@ -1,12 +1,15 @@
# AggregatingMergeTree {#aggregatingmergetree}
该引擎继承自 [MergeTree](mergetree.md),并改变了数据片段的合并逻辑。 ClickHouse 会将相同主键的所有行(在一个数据片段内)替换为单个存储一系列聚合函数状态的行
该引擎继承自 [MergeTree](mergetree.md),并改变了数据片段的合并逻辑。 ClickHouse 会将一个数据片段内所有具有相同主键(准确的说是 [排序键](../../../engines/table-engines/mergetree-family/mergetree.md))的行替换成一行,这一行会存储一系列聚合函数的状态
可以使用 `AggregatingMergeTree` 表来做增量数据统计聚合,包括物化视图的数据聚合。
可以使用 `AggregatingMergeTree` 表来做增量数据的聚合统计,包括物化视图的数据聚合。
引擎需使用 [AggregateFunction](../../../engines/table-engines/mergetree-family/aggregatingmergetree.md) 类型来处理所有列。
引擎使用以下类型来处理所有列:
如果要按一组规则来合并减少行数,则使用 `AggregatingMergeTree` 是合适的。
- [AggregateFunction](../../../sql-reference/data-types/aggregatefunction.md)
- [SimpleAggregateFunction](../../../sql-reference/data-types/simpleaggregatefunction.md)
`AggregatingMergeTree` 适用于能够按照一定的规则缩减行数的情况。
## 建表 {#jian-biao}
@ -20,10 +23,11 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
```
语句参数的说明,请参阅 [语句描述](../../../engines/table-engines/mergetree-family/aggregatingmergetree.md)。
语句参数的说明,请参阅 [建表语句描述](../../../sql-reference/statements/create.md#create-table-query)。
**子句**
@ -33,7 +37,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
<summary>已弃用的建表方法</summary>
!!! 注意 "注意"
!!! attention "注意"
不要在新项目中使用该方法,可能的话,请将旧项目切换到上述方法。
``` sql
@ -45,15 +49,15 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
) ENGINE [=] AggregatingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity)
```
上面的所有参数跟 `MergeTree` 中的一样。
上面的所有参数的含义`MergeTree` 中的一样。
</details>
## SELECT 和 INSERT {#select-he-insert}
插入数据,需使用带有聚合 -State- 函数的 [INSERT SELECT](../../../engines/table-engines/mergetree-family/aggregatingmergetree.md) 语句。
要插入数据,需使用带有 -State- 聚合函数的 [INSERT SELECT](../../../sql-reference/statements/insert-into.md) 语句。
`AggregatingMergeTree` 表中查询数据时,需使用 `GROUP BY` 子句并且要使用与插入时相同的聚合函数,但后缀要改为 `-Merge`
`SELECT` 查询的结果中,对于 ClickHouse 的所有输出格式 `AggregateFunction` 类型的值都实现了特定的二进制表示法。如果直接用 `SELECT` 导出这些数据,例如如用 `TabSeparated` 格式,那么这些导出数据也能直接用 `INSERT` 语句加载导入
对于 `SELECT` 查询的结果, `AggregateFunction` 类型的值对 ClickHouse 的所有输出格式都实现了特定的二进制表示法。在进行数据转储时,例如使用 `TabSeparated` 格式进行 `SELECT` 查询,那么这些转储数据也能直接用 `INSERT` 语句导回
## 聚合物化视图的示例 {#ju-he-wu-hua-shi-tu-de-shi-li}

View File

@ -2,9 +2,9 @@
[MergeTree](mergetree.md) 系列的表(包括 [可复制表](replication.md) )可以使用分区。基于 MergeTree 表的 [物化视图](../special/materializedview.md#materializedview) 也支持分区。
一个分区是指按指定规则逻辑组合一起的表的记录集。可以按任意标准进行分区如按月按日或按事件类型。为了减少需要操作的数据每个分区都是分开存储的。访问数据时ClickHouse 尽量使用这些分区的最小子集。
分区是在一个表中通过指定的规则划分而成的逻辑数据集。可以按任意标准进行分区如按月按日或按事件类型。为了减少需要操作的数据每个分区都是分开存储的。访问数据时ClickHouse 尽量使用这些分区的最小子集。
分区是在 [建表](mergetree.md#table_engine-mergetree-creating-a-table) `PARTITION BY expr` 子句中指定。分区键可以是关于列的任何表达式。例如,指定按月分区,表达式为 `toYYYYMM(date_column)`
分区是在 [建表](mergetree.md#table_engine-mergetree-creating-a-table) 时通过 `PARTITION BY expr` 子句指定的。分区键可以是表中列的任意表达式。例如,指定按月分区,表达式为 `toYYYYMM(date_column)`
``` sql
CREATE TABLE visits
@ -30,10 +30,10 @@ ORDER BY (CounterID, StartDate, intHash32(UserID));
新数据插入到表中时,这些数据会存储为按主键排序的新片段(块)。插入后 10-15 分钟,同一分区的各个片段会合并为一整个片段。
!!! attention "注意"
那些有相同分区表达式值的数据片段才会合并。这意味着 **你不应该用太精细的分区方案**(超过一千个分区)。否则,会因为文件系统中的文件数量和需要找开的文件描述符过多,导致 `SELECT` 查询效率不佳。
!!! info "注意"
那些有相同分区表达式值的数据片段才会合并。这意味着 **你不应该用太精细的分区方案**(超过一千个分区)。否则,会因为文件系统中的文件数量过多和需要打开的文件描述符过多,导致 `SELECT` 查询效率不佳。
可以通过 [系统。零件](../../../engines/table-engines/mergetree-family/custom-partitioning-key.md#system_tables-parts) 表查看表片段和分区信息。例如,假设我们有一个 `visits` 表,按月分区。对 `system.parts` 表执行 `SELECT`
可以通过 [system.parts](../../../engines/table-engines/mergetree-family/custom-partitioning-key.md#system_tables-parts) 表查看表片段和分区信息。例如,假设我们有一个 `visits` 表,按月分区。对 `system.parts` 表执行 `SELECT`
``` sql
SELECT
@ -44,55 +44,59 @@ FROM system.parts
WHERE table = 'visits'
```
┌─partition─┬─name───────────┬─active─┐
│ 201901 │ 201901_1_3_1 │ 0 │
│ 201901 │ 201901_1_9_2 │ 1 │
│ 201901 │ 201901_8_8_0 │ 0 │
│ 201901 │ 201901_9_9_0 │ 0 │
│ 201902 │ 201902_4_6_1 │ 1 │
│ 201902 │ 201902_10_10_0 │ 1 │
│ 201902 │ 201902_11_11_0 │ 1 │
└───────────┴────────────────┴────────┘
``` text
┌─partition─┬─name───────────┬─active─┐
│ 201901 │ 201901_1_3_1 │ 0 │
│ 201901 │ 201901_1_9_2 │ 1 │
│ 201901 │ 201901_8_8_0 │ 0 │
│ 201901 │ 201901_9_9_0 │ 0 │
│ 201902 │ 201902_4_6_1 │ 1 │
│ 201902 │ 201902_10_10_0 │ 1 │
│ 201902 │ 201902_11_11_0 │ 1 │
└───────────┴────────────────┴────────┘
```
`partition` 列存储分区的名称。此示例中有两个分区:`201901` 和 `201902`。在 [ALTER … PARTITION](#alter_manipulations-with-partitions) 语句中你可以使用该列值来指定分区名称。
`name` 列为分区中数据片段的名称。在 [ALTER ATTACH PART](#alter_attach-partition) 语句中你可以使用此列值中来指定片段名称。
这里我们拆解下第一部分的名称:`201901_1_3_1`
这里我们拆解下第一个数据片段的名称:`201901_1_3_1`
- `201901` 是分区名称。
- `1` 是数据块的最小编号。
- `3` 是数据块的最大编号。
- `1` 是块级别(即在由块组成的合并树中,该块在树中的深度)。
!!! attention "注意"
!!! info "注意"
旧类型表的片段名称为:`20190117_20190123_2_2_0`(最小日期 - 最大日期 - 最小块编号 - 最大块编号 - 块级别)。
`active` 列为片段状态。`1` 激活状态;`0` 非激活状态。非激活片段是那些在合并到较大片段之后剩余的源数据片段。损坏的数据片段也表示为非活动状态。
`active` 列为片段状态。`1` 代表激活状态;`0` 代表非激活状态。非激活片段是那些在合并到较大片段之后剩余的源数据片段。损坏的数据片段也表示为非活动状态。
正如在示例中所看到的,同一分区中有几个独立的片段(例如,`201901_1_3_1`和`201901_1_9_2`。这意味着这些片段尚未合并。ClickHouse 大约在插入后15分钟定期报告合并操作合并插入的数据片段。此外你也可以使用 [OPTIMIZE](../../../engines/table-engines/mergetree-family/custom-partitioning-key.md#misc_operations-optimize) 语句直接执行合并。例
正如在示例中所看到的,同一分区中有几个独立的片段(例如,`201901_1_3_1`和`201901_1_9_2`。这意味着这些片段尚未合并。ClickHouse 会定期的对插入的数据片段进行合并大约是在插入后15分钟左右。此外你也可以使用 [OPTIMIZE](../../../sql-reference/statements/misc.md#misc_operations-optimize) 语句发起一个计划外的合并。例如
``` sql
OPTIMIZE TABLE visits PARTITION 201902;
```
┌─partition─┬─name───────────┬─active─┐
│ 201901 │ 201901_1_3_1 │ 0 │
│ 201901 │ 201901_1_9_2 │ 1 │
│ 201901 │ 201901_8_8_0 │ 0 │
│ 201901 │ 201901_9_9_0 │ 0 │
│ 201902 │ 201902_4_6_1 │ 0 │
│ 201902 │ 201902_4_11_2 │ 1 │
│ 201902 │ 201902_10_10_0 │ 0 │
│ 201902 │ 201902_11_11_0 │ 0 │
└───────────┴────────────────┴────────┘
```
┌─partition─┬─name───────────┬─active─┐
│ 201901 │ 201901_1_3_1 │ 0 │
│ 201901 │ 201901_1_9_2 │ 1 │
│ 201901 │ 201901_8_8_0 │ 0 │
│ 201901 │ 201901_9_9_0 │ 0 │
│ 201902 │ 201902_4_6_1 │ 0 │
│ 201902 │ 201902_4_11_2 │ 1 │
│ 201902 │ 201902_10_10_0 │ 0 │
│ 201902 │ 201902_11_11_0 │ 0 │
└───────────┴────────────────┴────────┘
```
非激活片段会在合并后的10分钟左右删除。
非激活片段会在合并后的10分钟左右删除。
查看片段和分区信息的另一种方法是进入表的目录:`/var/lib/clickhouse/data/<database>/<table>/`。例如:
``` bash
dev:/var/lib/clickhouse/data/default/visits$ ls -l
/var/lib/clickhouse/data/default/visits$ ls -l
total 40
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 1 16:48 201901_1_3_1
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 16:17 201901_1_9_2
@ -105,12 +109,12 @@ drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 5 12:09 201902_4_6_1
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 1 16:48 detached
```
文件夹 201901\_1\_1\_0201901\_1\_7\_1 等是片段的目录。每个片段都与一个对应的分区相关,并且只包含这个月的数据(本例中的表按月分区)。
201901\_1\_1\_0201901\_1\_7\_1文件夹数据片段的目录。每个片段都与一个对应的分区相关,并且只包含这个月的数据(本例中的表按月分区)。
`detached` 目录存放着使用 [DETACH](../../../sql-reference/statements/alter.md#alter_detach-partition) 语句从表中分离的片段。损坏的片段也会移到该目录,而不是删除。服务器不使用`detached`目录中的片段。可以随时添加,删除或修改此目录中的数据 在运行 [ATTACH](../../../engines/table-engines/mergetree-family/custom-partitioning-key.md#alter_attach-partition) 语句前,服务器不会感知到。
`detached` 目录存放着使用 [DETACH](../../../sql-reference/statements/alter.md#alter_detach-partition) 语句从表中卸载的片段。损坏的片段不会被删除而是也会移到该目录下。服务器不会去使用`detached`目录中的数据片段。因此你可以随时添加,删除或修改此目录中的数据 在运行 [ATTACH](../../../sql-reference/statements/alter.md#alter_attach-partition) 语句前,服务器不会感知到。
注意,在操作服务器时,你不能手动更改文件系统上的片段集或其数据,因为服务器不会感知到这些修改。对于非复制表,可以在服务器停止时执行这些操作,但不建议这样做。对于复制表,在任何情况下都不要更改片段文件。
ClickHouse 支持对分区执行这些操作:删除分区,从一个表复制到另一个表,或创建备份。了解分区的所有操作,请参阅 [分区和片段的操作](../../../engines/table-engines/mergetree-family/custom-partitioning-key.md#alter_manipulations-with-partitions) 一节。
ClickHouse 支持对分区执行这些操作:删除分区,将分区从一个表复制到另一个表,或创建备份。了解分区的所有操作,请参阅 [分区和片段的操作](../../../sql-reference/statements/alter.md#alter_manipulations-with-partitions) 一节。
[来源文章](https://clickhouse.tech/docs/en/operations/table_engines/custom_partitioning_key/) <!--hide-->

View File

@ -1,8 +1,8 @@
# 替换合并树 {#replacingmergetree}
# ReplacingMergeTree {#replacingmergetree}
该引擎和[MergeTree](mergetree.md)的不同之处在于它会删除具有相同主键的重复项。
该引擎和 [MergeTree](mergetree.md) 的不同之处在于它会删除排序键值相同的重复项。
数据的去重只会在合并的过程中出现。合并会在未知的时间在后台进行,因此你无法预先作出计划。有一些数据可能仍未被处理。尽管你可以调用 `OPTIMIZE` 语句发起计划外的合并,但请不要指望使用它,因为 `OPTIMIZE` 语句会引发对大量数据的读写。
数据的去重只会在数据合并期间进行。合并会在后台一个不确定的时间进行,因此你无法预先作出计划。有一些数据可能仍未被处理。尽管你可以调用 `OPTIMIZE` 语句发起计划外的合并,但请不要依靠它,因为 `OPTIMIZE` 语句会引发对数据的大量读写。
因此,`ReplacingMergeTree` 适用于在后台清除重复的数据以节省空间,但是它不保证没有重复的数据出现。
@ -21,19 +21,20 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
请求参数的描述,参考[请求参数](../../../engines/table-engines/mergetree-family/replacingmergetree.md)。
有关建表参数的描述,可参考 [创建表](../../../sql-reference/statements/create.md#create-table-query)。
**参数**
**ReplacingMergeTree 的参数**
- `ver` — 版本列。类型为 `UInt*`, `Date``DateTime`。可选参数。
合并的时候,`ReplacingMergeTree` 从所有具有相同主键的行中选择一行留下:
- 如果 `ver` 列未指定,选择最后一条。
- 如果 `ver` 列已指定,选择 `ver` 值最大的版本。
在数据合并的时候,`ReplacingMergeTree` 从所有具有相同排序键的行中选择一行留下:
- 如果 `ver` 列未指定,保留最后一条。
- 如果 `ver` 列已指定,保留 `ver` 值最大的版本。
**子句**
创建 `ReplacingMergeTree` 表时,需要与创建 `MergeTree` 表时相同的[子句](mergetree.md)。
创建 `ReplacingMergeTree` 表时,需要使用与创建 `MergeTree` 表时相同的 [子句](mergetree.md)。
<details markdown="1">

View File

@ -13,7 +13,7 @@ Yandex.Metrica基于用户定义的字段对实时访问、连接会话
ClickHouse还被使用在
- 存储来自Yandex.Metrica话重放数据。
- 存储来自Yandex.Metrica的会话重放数据。
- 处理中间数据
- 与Analytics一起构建全球报表。
- 为调试Yandex.Metrica引擎运行查询

View File

@ -3,7 +3,7 @@ machine_translated: true
machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3
---
# 在运营商 {#select-in-operators}
# IN 操作符 {#select-in-operators}
`IN`, `NOT IN`, `GLOBAL IN`,和 `GLOBAL NOT IN` 运算符是单独复盖的,因为它们的功能相当丰富。

Some files were not shown because too many files have changed in this diff Show More