diff --git a/.gitmodules b/.gitmodules
index 8342064a055..8c6d0f61b53 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -137,3 +137,6 @@
 [submodule "contrib/replxx"]
     path = contrib/replxx
     url = https://github.com/AmokHuginnsson/replxx.git
+[submodule "contrib/ryu"]
+    path = contrib/ryu
+    url = https://github.com/ClickHouse-Extras/ryu.git
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 305021728a9..a6757c38898 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,38 @@
+## ClickHouse release v19.17.6.36, 2019-12-27
+
+### Bug Fix
+* Fixed a potential buffer overflow in decompress. A malicious user could pass fabricated compressed data that caused a read past the end of the buffer. This issue was found by Eldar Zaitov from the Yandex information security team. [#8404](https://github.com/ClickHouse/ClickHouse/pull/8404) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed a possible server crash (`std::terminate`) when the server cannot send or write data in JSON or XML format with values of the String data type (which require UTF-8 validation), when compressing result data with the Brotli algorithm, and in some other rare cases. [#8384](https://github.com/ClickHouse/ClickHouse/pull/8384) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed dictionaries with a source from a ClickHouse `VIEW`: reading such dictionaries no longer causes the error `There is no query`. [#8351](https://github.com/ClickHouse/ClickHouse/pull/8351) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+* Fixed checking whether a client host is allowed by the `host_regexp` specified in users.xml. [#8241](https://github.com/ClickHouse/ClickHouse/pull/8241), [#8342](https://github.com/ClickHouse/ClickHouse/pull/8342) ([Vitaly Baranov](https://github.com/vitlibar))
+* `RENAME TABLE` for a distributed table now renames the folder containing inserted data before sending it to shards. This fixes an issue with successive renames `tableA->tableB`, `tableC->tableA`. [#8306](https://github.com/ClickHouse/ClickHouse/pull/8306) ([tavplubix](https://github.com/tavplubix))
+* `range_hashed` external dictionaries created by DDL queries now allow ranges of arbitrary numeric types. [#8275](https://github.com/ClickHouse/ClickHouse/pull/8275) ([alesapin](https://github.com/alesapin))
+* Fixed the `INSERT INTO table SELECT ... FROM mysql(...)` table function. [#8234](https://github.com/ClickHouse/ClickHouse/pull/8234) ([tavplubix](https://github.com/tavplubix))
+* Fixed a segfault in `INSERT INTO TABLE FUNCTION file()` when inserting into a file that doesn't exist. Now the file is created first and the insert is then processed. [#8177](https://github.com/ClickHouse/ClickHouse/pull/8177) ([Olga Khvostikova](https://github.com/stavrolia))
+* Fixed a `bitmapAnd` error when intersecting an aggregated bitmap and a scalar bitmap. [#8082](https://github.com/ClickHouse/ClickHouse/pull/8082) ([Yue Huang](https://github.com/moon03432))
+* Fixed a segfault when an `EXISTS` query was used without a `TABLE` or `DICTIONARY` qualifier, e.g. `EXISTS t`. [#8213](https://github.com/ClickHouse/ClickHouse/pull/8213) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed the return type of the functions `rand` and `randConstant` in the case of a nullable argument. The functions now always return `UInt32` and never `Nullable(UInt32)`. [#8204](https://github.com/ClickHouse/ClickHouse/pull/8204) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+* Fixed `DROP DICTIONARY IF EXISTS db.dict`: it no longer throws an exception if `db` doesn't exist. [#8185](https://github.com/ClickHouse/ClickHouse/pull/8185) ([Vitaly Baranov](https://github.com/vitlibar))
+* If a table wasn't completely dropped because of a server crash, the server will try to restore and load it. [#8176](https://github.com/ClickHouse/ClickHouse/pull/8176) ([tavplubix](https://github.com/tavplubix))
+* Fixed a trivial count query for a distributed table if there are more than two shards of a local table. [#8164](https://github.com/ClickHouse/ClickHouse/pull/8164) ([小路](https://github.com/nicelulu))
+* Fixed a bug that led to a data race in `DB::BlockStreamProfileInfo::calculateRowsBeforeLimit()`. [#8143](https://github.com/ClickHouse/ClickHouse/pull/8143) ([Alexander Kazakov](https://github.com/Akazz))
+* Fixed `ALTER table MOVE part` executed immediately after merging the specified part, which could cause moving of the part that the specified part was merged into. Now it correctly moves the specified part. [#8104](https://github.com/ClickHouse/ClickHouse/pull/8104) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Expressions for dictionaries can now be specified as strings. This is useful for calculating attributes while extracting data from non-ClickHouse sources, because it allows using non-ClickHouse syntax for those expressions. [#8098](https://github.com/ClickHouse/ClickHouse/pull/8098) ([alesapin](https://github.com/alesapin))
+* Fixed a very rare race in `clickhouse-copier` caused by an overflow in ZXid. [#8088](https://github.com/ClickHouse/ClickHouse/pull/8088) ([Ding Xiang Fei](https://github.com/dingxiangfei2009))
+* Fixed a bug where, after a query failed (due to "Too many simultaneous queries", for example), the external tables info would not be read, and the next request would interpret this info as the beginning of the next query, causing an error like `Unknown packet from client`. [#8084](https://github.com/ClickHouse/ClickHouse/pull/8084) ([Azat Khuzhin](https://github.com/azat))
+* Avoid a null dereference after "Unknown packet X from server". [#8071](https://github.com/ClickHouse/ClickHouse/pull/8071) ([Azat Khuzhin](https://github.com/azat))
+* Restored support for all ICU locales, added the ability to apply collations to constant expressions and added the language name to the system.collations table. [#8051](https://github.com/ClickHouse/ClickHouse/pull/8051) ([alesapin](https://github.com/alesapin))
+* The number of streams for reading from `StorageFile` and `StorageHDFS` is now limited, to avoid exceeding the memory limit. [#7981](https://github.com/ClickHouse/ClickHouse/pull/7981) ([alesapin](https://github.com/alesapin))
+* Fixed the `CHECK TABLE` query for `*MergeTree` tables without a key. [#7979](https://github.com/ClickHouse/ClickHouse/pull/7979) ([alesapin](https://github.com/alesapin))
+* Removed the mutation number from a part name in case there were no mutations. This removal improved compatibility with older versions. [#8250](https://github.com/ClickHouse/ClickHouse/pull/8250) ([alesapin](https://github.com/alesapin))
+* Fixed a bug where mutations were skipped for some attached parts because their data_version was larger than the table mutation version. [#7812](https://github.com/ClickHouse/ClickHouse/pull/7812) ([Zhichang Yu](https://github.com/yuzhichang))
+* Allow starting the server with redundant copies of parts after moving them to another device. [#7810](https://github.com/ClickHouse/ClickHouse/pull/7810) ([Vladimir Chebotarev](https://github.com/excitoon))
+* Fixed the error "Sizes of columns doesn't match" that might appear when using aggregate function columns. [#7790](https://github.com/ClickHouse/ClickHouse/pull/7790) ([Boris Granveaud](https://github.com/bgranvea))
+* An exception is now thrown when `WITH TIES` is used together with `LIMIT BY`. It's also now possible to use `TOP` with `LIMIT BY`. [#7637](https://github.com/ClickHouse/ClickHouse/pull/7637) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
+* Fixed dictionary reload when the dictionary has an `invalidate_query` that stopped updates after an exception on previous update attempts. [#8029](https://github.com/ClickHouse/ClickHouse/pull/8029) ([alesapin](https://github.com/alesapin))
+
+
 ## ClickHouse release v19.17.4.11, 2019-11-22
 
 ### Backward Incompatible Change
diff --git a/CMakeLists.txt b/CMakeLists.txt
index fd83e6f39a1..623b6ac9966 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -176,7 +176,9 @@ if (ARCH_NATIVE)
     set (COMPILER_FLAGS "${COMPILER_FLAGS} -march=native")
 endif ()
 
-set (CMAKE_CXX_STANDARD 17)
+# cmake < 3.12 doesn't support 20. We'll set CMAKE_CXX_FLAGS for now
+# set (CMAKE_CXX_STANDARD 20)
+set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++2a")
 set (CMAKE_CXX_EXTENSIONS 0) # https://cmake.org/cmake/help/latest/prop_tgt/CXX_EXTENSIONS.html#prop_tgt:CXX_EXTENSIONS
 set (CMAKE_CXX_STANDARD_REQUIRED ON)
@@ -208,7 +210,7 @@ set (CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -O0 -g3 -ggdb3
 if (COMPILER_CLANG)
     # Exception unwinding doesn't work in clang release build without this option
-    # TODO investigate if contrib/libcxxabi is out of date
+    # TODO investigate that
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-omit-frame-pointer")
     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fno-omit-frame-pointer")
 endif ()
@@ -248,8 +250,16 @@ endif ()
 string (TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
 set (CMAKE_POSTFIX_VARIABLE "CMAKE_${CMAKE_BUILD_TYPE_UC}_POSTFIX")
 
-if (NOT MAKE_STATIC_LIBRARIES)
-    set(CMAKE_POSITION_INDEPENDENT_CODE ON)
+if (MAKE_STATIC_LIBRARIES)
+    set (CMAKE_POSITION_INDEPENDENT_CODE OFF)
+    if (OS_LINUX)
+        # Slightly more efficient code can be generated
+        set (CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -fno-pie")
+        set (CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} -fno-pie")
+        set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-no-pie")
+    endif ()
+else ()
+    set (CMAKE_POSITION_INDEPENDENT_CODE ON)
 endif ()
 
 # Using "include-what-you-use" tool.
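For context on the `-std=c++2a` workaround above: `set (CMAKE_CXX_STANDARD 20)` is only understood by CMake 3.12 and newer, so passing the compiler flag directly is the portable choice until the minimum CMake version is raised. A minimal sketch of what a version-gated variant could look like (illustrative only, not part of this patch):

    if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.12")
        set (CMAKE_CXX_STANDARD 20)
    else ()
        set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++2a")
    endif ()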
diff --git a/cmake/darwin/default_libs.cmake b/cmake/darwin/default_libs.cmake index 679fef8808a..6010ea0f5de 100644 --- a/cmake/darwin/default_libs.cmake +++ b/cmake/darwin/default_libs.cmake @@ -15,6 +15,7 @@ set(CMAKE_C_STANDARD_LIBRARIES ${DEFAULT_LIBS}) set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mmacosx-version-min=10.14") set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mmacosx-version-min=10.14") +set (CMAKE_ASM_FLAGS "${CMAKE_ASM_FLAGS} -mmacosx-version-min=10.14") set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -mmacosx-version-min=10.14") set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -mmacosx-version-min=10.14") diff --git a/cmake/darwin/toolchain-x86_64.cmake b/cmake/darwin/toolchain-x86_64.cmake index 9128311e3bb..0be81dfa753 100644 --- a/cmake/darwin/toolchain-x86_64.cmake +++ b/cmake/darwin/toolchain-x86_64.cmake @@ -2,6 +2,7 @@ set (CMAKE_SYSTEM_NAME "Darwin") set (CMAKE_SYSTEM_PROCESSOR "x86_64") set (CMAKE_C_COMPILER_TARGET "x86_64-apple-darwin") set (CMAKE_CXX_COMPILER_TARGET "x86_64-apple-darwin") +set (CMAKE_ASM_COMPILER_TARGET "x86_64-apple-darwin") set (CMAKE_OSX_SYSROOT "${CMAKE_CURRENT_LIST_DIR}/../toolchain/darwin-x86_64") set (CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY) # disable linkage check - it doesn't work in CMake diff --git a/cmake/find/icu.cmake b/cmake/find/icu.cmake index 8ebe2f9befd..7beb25626b9 100644 --- a/cmake/find/icu.cmake +++ b/cmake/find/icu.cmake @@ -1,4 +1,8 @@ -option(ENABLE_ICU "Enable ICU" ${ENABLE_LIBRARIES}) +if (OS_LINUX) + option(ENABLE_ICU "Enable ICU" ${ENABLE_LIBRARIES}) +else () + option(ENABLE_ICU "Enable ICU" 0) +endif () if (ENABLE_ICU) diff --git a/cmake/target.cmake b/cmake/target.cmake index f1b18786d1d..1f40e28e76b 100644 --- a/cmake/target.cmake +++ b/cmake/target.cmake @@ -13,7 +13,6 @@ if (CMAKE_CROSSCOMPILING) if (OS_DARWIN) # FIXME: broken dependencies set (USE_SNAPPY OFF CACHE INTERNAL "") - set (ENABLE_SSL OFF CACHE INTERNAL "") set (ENABLE_PROTOBUF OFF CACHE INTERNAL "") set (ENABLE_PARQUET OFF CACHE INTERNAL "") set (ENABLE_ICU OFF CACHE INTERNAL "") diff --git a/contrib/CMakeLists.txt b/contrib/CMakeLists.txt index 7d25e0d60cb..fe3e4e83f03 100644 --- a/contrib/CMakeLists.txt +++ b/contrib/CMakeLists.txt @@ -2,10 +2,10 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w") - set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w -std=c++1z") + set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w") elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -w") - set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w -std=c++1z") + set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w") endif () set_property(DIRECTORY PROPERTY EXCLUDE_FROM_ALL 1) @@ -32,6 +32,8 @@ if (USE_INTERNAL_DOUBLE_CONVERSION_LIBRARY) add_subdirectory (double-conversion-cmake) endif () +add_subdirectory (ryu-cmake) + if (USE_INTERNAL_CITYHASH_LIBRARY) add_subdirectory (cityhash102) endif () @@ -250,6 +252,7 @@ if (USE_EMBEDDED_COMPILER AND USE_INTERNAL_LLVM_LIBRARY) endif () set (LLVM_ENABLE_EH 1 CACHE INTERNAL "") set (LLVM_ENABLE_RTTI 1 CACHE INTERNAL "") + set (LLVM_ENABLE_PIC 0 CACHE INTERNAL "") set (LLVM_TARGETS_TO_BUILD "X86;AArch64" CACHE STRING "") add_subdirectory (llvm/llvm) endif () diff --git a/contrib/arrow-cmake/CMakeLists.txt b/contrib/arrow-cmake/CMakeLists.txt index 335106cc7ca..1f09bba8d31 100644 --- a/contrib/arrow-cmake/CMakeLists.txt +++ b/contrib/arrow-cmake/CMakeLists.txt @@ -1,5 +1,7 @@ include(ExternalProject) +set (CMAKE_CXX_STANDARD 17) + # === thrift set(LIBRARY_DIR 
${ClickHouse_SOURCE_DIR}/contrib/thrift/lib/cpp) diff --git a/contrib/aws-s3-cmake/CMakeLists.txt b/contrib/aws-s3-cmake/CMakeLists.txt index 667ca43c501..e86ac0cb5a6 100644 --- a/contrib/aws-s3-cmake/CMakeLists.txt +++ b/contrib/aws-s3-cmake/CMakeLists.txt @@ -6,88 +6,87 @@ SET(AWS_EVENT_STREAM_LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/aws-c-event-st OPTION(USE_AWS_MEMORY_MANAGEMENT "Aws memory management" OFF) configure_file("${AWS_CORE_LIBRARY_DIR}/include/aws/core/SDKConfig.h.in" - "${CMAKE_CURRENT_BINARY_DIR}/include/aws/core/SDKConfig.h" @ONLY) + "${CMAKE_CURRENT_BINARY_DIR}/include/aws/core/SDKConfig.h" @ONLY) configure_file("${AWS_COMMON_LIBRARY_DIR}/include/aws/common/config.h.in" - "${CMAKE_CURRENT_BINARY_DIR}/include/aws/common/config.h" @ONLY) + "${CMAKE_CURRENT_BINARY_DIR}/include/aws/common/config.h" @ONLY) file(GLOB AWS_CORE_SOURCES - "${AWS_CORE_LIBRARY_DIR}/source/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/auth/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/client/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/http/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/http/standard/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/http/curl/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/config/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/external/cjson/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/external/tinyxml2/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/internal/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/monitoring/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/net/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/linux-shared/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/platform/linux-shared/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/base64/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/event/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/crypto/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/crypto/openssl/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/crypto/factory/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/json/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/logging/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/memory/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/memory/stl/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/stream/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/threading/*.cpp" - "${AWS_CORE_LIBRARY_DIR}/source/utils/xml/*.cpp" - ) + "${AWS_CORE_LIBRARY_DIR}/source/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/auth/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/client/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/http/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/http/standard/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/http/curl/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/config/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/external/cjson/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/external/tinyxml2/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/internal/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/monitoring/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/net/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/linux-shared/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/platform/linux-shared/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/base64/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/event/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/crypto/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/crypto/openssl/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/crypto/factory/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/json/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/logging/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/memory/*.cpp" + 
"${AWS_CORE_LIBRARY_DIR}/source/utils/memory/stl/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/stream/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/threading/*.cpp" + "${AWS_CORE_LIBRARY_DIR}/source/utils/xml/*.cpp" +) file(GLOB AWS_S3_SOURCES - "${AWS_S3_LIBRARY_DIR}/source/*.cpp" - ) + "${AWS_S3_LIBRARY_DIR}/source/*.cpp" +) file(GLOB AWS_S3_MODEL_SOURCES - "${AWS_S3_LIBRARY_DIR}/source/model/*.cpp" - ) + "${AWS_S3_LIBRARY_DIR}/source/model/*.cpp" +) file(GLOB AWS_EVENT_STREAM_SOURCES - "${AWS_EVENT_STREAM_LIBRARY_DIR}/source/*.c" - ) + "${AWS_EVENT_STREAM_LIBRARY_DIR}/source/*.c" +) file(GLOB AWS_COMMON_SOURCES - "${AWS_COMMON_LIBRARY_DIR}/source/*.c" - "${AWS_COMMON_LIBRARY_DIR}/source/posix/*.c" - ) + "${AWS_COMMON_LIBRARY_DIR}/source/*.c" + "${AWS_COMMON_LIBRARY_DIR}/source/posix/*.c" +) file(GLOB AWS_CHECKSUMS_SOURCES - "${AWS_CHECKSUMS_LIBRARY_DIR}/source/*.c" - "${AWS_CHECKSUMS_LIBRARY_DIR}/source/intel/*.c" - "${AWS_CHECKSUMS_LIBRARY_DIR}/source/arm/*.c" - ) + "${AWS_CHECKSUMS_LIBRARY_DIR}/source/*.c" + "${AWS_CHECKSUMS_LIBRARY_DIR}/source/intel/*.c" + "${AWS_CHECKSUMS_LIBRARY_DIR}/source/arm/*.c" +) file(GLOB S3_UNIFIED_SRC - ${AWS_EVENT_STREAM_SOURCES} - ${AWS_COMMON_SOURCES} - ${AWS_S3_SOURCES} - ${AWS_S3_MODEL_SOURCES} - ${AWS_CORE_SOURCES} - ) + ${AWS_EVENT_STREAM_SOURCES} + ${AWS_COMMON_SOURCES} + ${AWS_S3_SOURCES} + ${AWS_S3_MODEL_SOURCES} + ${AWS_CORE_SOURCES} +) set(S3_INCLUDES - "${CMAKE_CURRENT_SOURCE_DIR}/include/" - "${AWS_COMMON_LIBRARY_DIR}/include/" - "${AWS_EVENT_STREAM_LIBRARY_DIR}/include/" - "${AWS_S3_LIBRARY_DIR}/include/" - "${AWS_CORE_LIBRARY_DIR}/include/" - "${CMAKE_CURRENT_BINARY_DIR}/include/" - ) + "${CMAKE_CURRENT_SOURCE_DIR}/include/" + "${AWS_COMMON_LIBRARY_DIR}/include/" + "${AWS_EVENT_STREAM_LIBRARY_DIR}/include/" + "${AWS_S3_LIBRARY_DIR}/include/" + "${AWS_CORE_LIBRARY_DIR}/include/" + "${CMAKE_CURRENT_BINARY_DIR}/include/" +) add_library(aws_s3_checksums ${AWS_CHECKSUMS_SOURCES}) target_include_directories(aws_s3_checksums PUBLIC "${AWS_CHECKSUMS_LIBRARY_DIR}/include/") if(CMAKE_BUILD_TYPE STREQUAL "" OR CMAKE_BUILD_TYPE STREQUAL "Debug") target_compile_definitions(aws_s3_checksums PRIVATE "-DDEBUG_BUILD") endif() -set_target_properties(aws_s3_checksums PROPERTIES COMPILE_OPTIONS -fPIC) set_target_properties(aws_s3_checksums PROPERTIES LINKER_LANGUAGE C) set_property(TARGET aws_s3_checksums PROPERTY C_STANDARD 99) diff --git a/contrib/capnproto-cmake/CMakeLists.txt b/contrib/capnproto-cmake/CMakeLists.txt index c54b4e8eae5..8bdac0beec0 100644 --- a/contrib/capnproto-cmake/CMakeLists.txt +++ b/contrib/capnproto-cmake/CMakeLists.txt @@ -1,5 +1,7 @@ set (CAPNPROTO_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/capnproto/c++/src) +set (CMAKE_CXX_STANDARD 17) + set (KJ_SRCS ${CAPNPROTO_SOURCE_DIR}/kj/array.c++ ${CAPNPROTO_SOURCE_DIR}/kj/common.c++ diff --git a/contrib/curl-cmake/CMakeLists.txt b/contrib/curl-cmake/CMakeLists.txt index 17aeef6e165..53a255c7d8b 100644 --- a/contrib/curl-cmake/CMakeLists.txt +++ b/contrib/curl-cmake/CMakeLists.txt @@ -65,11 +65,6 @@ if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_CLANG) endif() endif() -# For debug libs and exes, add "-d" postfix -if(NOT DEFINED CMAKE_DEBUG_POSTFIX) - set(CMAKE_DEBUG_POSTFIX "-d") -endif() - # initialize CURL_LIBS set(CURL_LIBS "") @@ -115,8 +110,6 @@ if(ENABLE_IPV6 AND NOT WIN32) endif() endif() -curl_nroff_check() - # We need ansi c-flags, especially on HP set(CMAKE_C_FLAGS "${CMAKE_ANSI_CFLAGS} ${CMAKE_C_FLAGS}") set(CMAKE_REQUIRED_FLAGS ${CMAKE_ANSI_CFLAGS}) @@ -132,21 +125,21 @@ 
include(CheckCSourceCompiles) if(ENABLE_THREADED_RESOLVER) find_package(Threads REQUIRED) - if(WIN32) - set(USE_THREADS_WIN32 ON) - else() - set(USE_THREADS_POSIX ${CMAKE_USE_PTHREADS_INIT}) - set(HAVE_PTHREAD_H ${CMAKE_USE_PTHREADS_INIT}) - endif() + set(USE_THREADS_POSIX ${CMAKE_USE_PTHREADS_INIT}) + set(HAVE_PTHREAD_H ${CMAKE_USE_PTHREADS_INIT}) set(CURL_LIBS ${CURL_LIBS} ${CMAKE_THREAD_LIBS_INIT}) endif() # Check for all needed libraries -check_library_exists_concat("${CMAKE_DL_LIBS}" dlopen HAVE_LIBDL) -check_library_exists_concat("socket" connect HAVE_LIBSOCKET) -check_library_exists("c" gethostbyname "" NOT_NEED_LIBNSL) -check_function_exists(gethostname HAVE_GETHOSTNAME) +# We don't want any plugin loading at runtime. It is harmful. +#check_library_exists_concat("${CMAKE_DL_LIBS}" dlopen HAVE_LIBDL) + +# This is unneeded. +#check_library_exists_concat("socket" connect HAVE_LIBSOCKET) + +set (NOT_NEED_LIBNSL 1) +set (gethostname HAVE_GETHOSTNAME 1) # From cmake/find/ssl.cmake if (OPENSSL_FOUND) @@ -167,10 +160,12 @@ if (OPENSSL_FOUND) endif() # Check for idn -check_library_exists_concat("idn2" idn2_lookup_ul HAVE_LIBIDN2) +# No, we don't need that. +# check_library_exists_concat("idn2" idn2_lookup_ul HAVE_LIBIDN2) # Check for symbol dlopen (same as HAVE_LIBDL) -check_library_exists("${CURL_LIBS}" dlopen "" HAVE_DLOPEN) +# We don't want any plugin loading at runtime. It is harmful. +# check_library_exists("${CURL_LIBS}" dlopen "" HAVE_DLOPEN) # From /cmake/find/zlib.cmake if (ZLIB_FOUND) @@ -181,7 +176,7 @@ if (ZLIB_FOUND) list(APPEND CURL_LIBS ${ZLIB_LIBRARIES}) endif() -option(ENABLE_UNIX_SOCKETS "Define if you want Unix domain sockets support" ON) +option(ENABLE_UNIX_SOCKETS "Define if you want Unix domain sockets support" OFF) if(ENABLE_UNIX_SOCKETS) include(CheckStructHasMember) check_struct_has_member("struct sockaddr_un" sun_path "sys/un.h" USE_UNIX_SOCKETS) @@ -217,14 +212,14 @@ check_include_file_concat("sys/utime.h" HAVE_SYS_UTIME_H) check_include_file_concat("sys/xattr.h" HAVE_SYS_XATTR_H) check_include_file_concat("alloca.h" HAVE_ALLOCA_H) check_include_file_concat("arpa/inet.h" HAVE_ARPA_INET_H) -check_include_file_concat("arpa/tftp.h" HAVE_ARPA_TFTP_H) +#check_include_file_concat("arpa/tftp.h" HAVE_ARPA_TFTP_H) check_include_file_concat("assert.h" HAVE_ASSERT_H) check_include_file_concat("crypto.h" HAVE_CRYPTO_H) check_include_file_concat("des.h" HAVE_DES_H) check_include_file_concat("err.h" HAVE_ERR_H) check_include_file_concat("errno.h" HAVE_ERRNO_H) check_include_file_concat("fcntl.h" HAVE_FCNTL_H) -check_include_file_concat("idn2.h" HAVE_IDN2_H) +#check_include_file_concat("idn2.h" HAVE_IDN2_H) check_include_file_concat("ifaddrs.h" HAVE_IFADDRS_H) check_include_file_concat("io.h" HAVE_IO_H) check_include_file_concat("krb.h" HAVE_KRB_H) @@ -259,7 +254,7 @@ check_include_file_concat("x509.h" HAVE_X509_H) check_include_file_concat("process.h" HAVE_PROCESS_H) check_include_file_concat("stddef.h" HAVE_STDDEF_H) -check_include_file_concat("dlfcn.h" HAVE_DLFCN_H) +#check_include_file_concat("dlfcn.h" HAVE_DLFCN_H) check_include_file_concat("malloc.h" HAVE_MALLOC_H) check_include_file_concat("memory.h" HAVE_MEMORY_H) check_include_file_concat("netinet/if_ether.h" HAVE_NETINET_IF_ETHER_H) @@ -276,30 +271,11 @@ check_type_size("int" SIZEOF_INT) check_type_size("__int64" SIZEOF___INT64) check_type_size("long double" SIZEOF_LONG_DOUBLE) check_type_size("time_t" SIZEOF_TIME_T) -if(NOT HAVE_SIZEOF_SSIZE_T) - if(SIZEOF_LONG EQUAL SIZEOF_SIZE_T) - set(ssize_t long) - endif() - 
if(NOT ssize_t AND SIZEOF___INT64 EQUAL SIZEOF_SIZE_T) - set(ssize_t __int64) - endif() -endif() -# off_t is sized later, after the HAVE_FILE_OFFSET_BITS test -if(HAVE_SIZEOF_LONG_LONG) - set(HAVE_LONGLONG 1) - set(HAVE_LL 1) -endif() +set(HAVE_LONGLONG 1) +set(HAVE_LL 1) -find_file(RANDOM_FILE urandom /dev) -mark_as_advanced(RANDOM_FILE) - -# Check for some functions that are used -if(HAVE_LIBWS2_32) - set(CMAKE_REQUIRED_LIBRARIES ws2_32) -elseif(HAVE_LIBSOCKET) - set(CMAKE_REQUIRED_LIBRARIES socket) -endif() +set(RANDOM_FILE /dev/urandom) check_symbol_exists(basename "${CURL_INCLUDES}" HAVE_BASENAME) check_symbol_exists(socket "${CURL_INCLUDES}" HAVE_SOCKET) @@ -311,18 +287,15 @@ check_symbol_exists(strtok_r "${CURL_INCLUDES}" HAVE_STRTOK_R) check_symbol_exists(strftime "${CURL_INCLUDES}" HAVE_STRFTIME) check_symbol_exists(uname "${CURL_INCLUDES}" HAVE_UNAME) check_symbol_exists(strcasecmp "${CURL_INCLUDES}" HAVE_STRCASECMP) -check_symbol_exists(stricmp "${CURL_INCLUDES}" HAVE_STRICMP) -check_symbol_exists(strcmpi "${CURL_INCLUDES}" HAVE_STRCMPI) -check_symbol_exists(strncmpi "${CURL_INCLUDES}" HAVE_STRNCMPI) +#check_symbol_exists(stricmp "${CURL_INCLUDES}" HAVE_STRICMP) +#check_symbol_exists(strcmpi "${CURL_INCLUDES}" HAVE_STRCMPI) +#check_symbol_exists(strncmpi "${CURL_INCLUDES}" HAVE_STRNCMPI) check_symbol_exists(alarm "${CURL_INCLUDES}" HAVE_ALARM) -if(NOT HAVE_STRNCMPI) - set(HAVE_STRCMPI) -endif() -check_symbol_exists(gethostbyaddr "${CURL_INCLUDES}" HAVE_GETHOSTBYADDR) +#check_symbol_exists(gethostbyaddr "${CURL_INCLUDES}" HAVE_GETHOSTBYADDR) check_symbol_exists(gethostbyaddr_r "${CURL_INCLUDES}" HAVE_GETHOSTBYADDR_R) check_symbol_exists(gettimeofday "${CURL_INCLUDES}" HAVE_GETTIMEOFDAY) check_symbol_exists(inet_addr "${CURL_INCLUDES}" HAVE_INET_ADDR) -check_symbol_exists(inet_ntoa "${CURL_INCLUDES}" HAVE_INET_NTOA) +#check_symbol_exists(inet_ntoa "${CURL_INCLUDES}" HAVE_INET_NTOA) check_symbol_exists(inet_ntoa_r "${CURL_INCLUDES}" HAVE_INET_NTOA_R) check_symbol_exists(tcsetattr "${CURL_INCLUDES}" HAVE_TCSETATTR) check_symbol_exists(tcgetattr "${CURL_INCLUDES}" HAVE_TCGETATTR) @@ -331,8 +304,8 @@ check_symbol_exists(closesocket "${CURL_INCLUDES}" HAVE_CLOSESOCKET) check_symbol_exists(setvbuf "${CURL_INCLUDES}" HAVE_SETVBUF) check_symbol_exists(sigsetjmp "${CURL_INCLUDES}" HAVE_SIGSETJMP) check_symbol_exists(getpass_r "${CURL_INCLUDES}" HAVE_GETPASS_R) -check_symbol_exists(strlcat "${CURL_INCLUDES}" HAVE_STRLCAT) -check_symbol_exists(getpwuid "${CURL_INCLUDES}" HAVE_GETPWUID) +#check_symbol_exists(strlcat "${CURL_INCLUDES}" HAVE_STRLCAT) +#check_symbol_exists(getpwuid "${CURL_INCLUDES}" HAVE_GETPWUID) check_symbol_exists(getpwuid_r "${CURL_INCLUDES}" HAVE_GETPWUID_R) check_symbol_exists(geteuid "${CURL_INCLUDES}" HAVE_GETEUID) check_symbol_exists(usleep "${CURL_INCLUDES}" HAVE_USLEEP) @@ -340,17 +313,15 @@ check_symbol_exists(utime "${CURL_INCLUDES}" HAVE_UTIME) check_symbol_exists(gmtime_r "${CURL_INCLUDES}" HAVE_GMTIME_R) check_symbol_exists(localtime_r "${CURL_INCLUDES}" HAVE_LOCALTIME_R) -check_symbol_exists(gethostbyname "${CURL_INCLUDES}" HAVE_GETHOSTBYNAME) +#check_symbol_exists(gethostbyname "${CURL_INCLUDES}" HAVE_GETHOSTBYNAME) check_symbol_exists(gethostbyname_r "${CURL_INCLUDES}" HAVE_GETHOSTBYNAME_R) check_symbol_exists(signal "${CURL_INCLUDES}" HAVE_SIGNAL_FUNC) check_symbol_exists(SIGALRM "${CURL_INCLUDES}" HAVE_SIGNAL_MACRO) -if(HAVE_SIGNAL_FUNC AND HAVE_SIGNAL_MACRO) - set(HAVE_SIGNAL 1) -endif() +set(HAVE_SIGNAL 1) check_symbol_exists(uname "${CURL_INCLUDES}" 
HAVE_UNAME) check_symbol_exists(strtoll "${CURL_INCLUDES}" HAVE_STRTOLL) -check_symbol_exists(_strtoi64 "${CURL_INCLUDES}" HAVE__STRTOI64) +#check_symbol_exists(_strtoi64 "${CURL_INCLUDES}" HAVE__STRTOI64) check_symbol_exists(strerror_r "${CURL_INCLUDES}" HAVE_STRERROR_R) check_symbol_exists(siginterrupt "${CURL_INCLUDES}" HAVE_SIGINTERRUPT) check_symbol_exists(perror "${CURL_INCLUDES}" HAVE_PERROR) diff --git a/contrib/icu-cmake/CMakeLists.txt b/contrib/icu-cmake/CMakeLists.txt index 64e82366076..b4903c141fb 100644 --- a/contrib/icu-cmake/CMakeLists.txt +++ b/contrib/icu-cmake/CMakeLists.txt @@ -1,6 +1,8 @@ set(ICU_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/icu/icu4c/source) set(ICUDATA_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/icudata/) +set (CMAKE_CXX_STANDARD 17) + # These lists of sources were generated from build log of the original ICU build system (configure + make). set(ICUUC_SOURCES diff --git a/contrib/libc-headers b/contrib/libc-headers index cd82fd9d8ee..9676d2645a7 160000 --- a/contrib/libc-headers +++ b/contrib/libc-headers @@ -1 +1 @@ -Subproject commit cd82fd9d8eefe50a47a0adf7c617c3ea7d558d11 +Subproject commit 9676d2645a713e679dc981ffd84dee99fcd68b8e diff --git a/contrib/libcxx b/contrib/libcxx index f7c63235238..a8c45330087 160000 --- a/contrib/libcxx +++ b/contrib/libcxx @@ -1 +1 @@ -Subproject commit f7c63235238a71b7e0563fab8c7c5ec1b54831f6 +Subproject commit a8c453300879d0bf255f9d5959d42e2c8aac1bfb diff --git a/contrib/libcxx-cmake/CMakeLists.txt b/contrib/libcxx-cmake/CMakeLists.txt index ee5fe625079..3d7447b7bf0 100644 --- a/contrib/libcxx-cmake/CMakeLists.txt +++ b/contrib/libcxx-cmake/CMakeLists.txt @@ -47,6 +47,11 @@ add_library(cxx ${SRCS}) target_include_directories(cxx SYSTEM BEFORE PUBLIC $) target_compile_definitions(cxx PRIVATE -D_LIBCPP_BUILDING_LIBRARY -DLIBCXX_BUILDING_LIBCXXABI) +# Enable capturing stack traces for all exceptions. +if (USE_UNWIND) + target_compile_definitions(cxx PUBLIC -DSTD_EXCEPTION_HAS_STACK_TRACE=1) +endif () + target_compile_options(cxx PUBLIC $<$:-nostdinc++>) check_cxx_compiler_flag(-Wreserved-id-macro HAVE_WARNING_RESERVED_ID_MACRO) diff --git a/contrib/libcxxabi-cmake/CMakeLists.txt b/contrib/libcxxabi-cmake/CMakeLists.txt index 68bb5606689..daeb209603b 100644 --- a/contrib/libcxxabi-cmake/CMakeLists.txt +++ b/contrib/libcxxabi-cmake/CMakeLists.txt @@ -32,6 +32,11 @@ target_compile_definitions(cxxabi PRIVATE -D_LIBCPP_BUILDING_LIBRARY) target_compile_options(cxxabi PRIVATE -nostdinc++ -fno-sanitize=undefined -Wno-macro-redefined) # If we don't disable UBSan, infinite recursion happens in dynamic_cast. target_link_libraries(cxxabi PUBLIC ${EXCEPTION_HANDLING_LIBRARY}) +# Enable capturing stack traces for all exceptions. 
+if (USE_UNWIND) + target_compile_definitions(cxxabi PUBLIC -DSTD_EXCEPTION_HAS_STACK_TRACE=1) +endif () + install( TARGETS cxxabi EXPORT global diff --git a/contrib/libhdfs3-cmake/CMake/Platform.cmake b/contrib/libhdfs3-cmake/CMake/Platform.cmake index d9bc760ee3f..fec1d974519 100644 --- a/contrib/libhdfs3-cmake/CMake/Platform.cmake +++ b/contrib/libhdfs3-cmake/CMake/Platform.cmake @@ -7,10 +7,14 @@ ELSE(CMAKE_SYSTEM_NAME STREQUAL "Linux") ENDIF(CMAKE_SYSTEM_NAME STREQUAL "Linux") IF(CMAKE_COMPILER_IS_GNUCXX) - EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} -dumpversion OUTPUT_VARIABLE GCC_COMPILER_VERSION) + EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} -dumpfullversion OUTPUT_VARIABLE GCC_COMPILER_VERSION) IF (NOT GCC_COMPILER_VERSION) - MESSAGE(FATAL_ERROR "Cannot get gcc version") + EXECUTE_PROCESS(COMMAND ${CMAKE_CXX_COMPILER} -dumpversion OUTPUT_VARIABLE GCC_COMPILER_VERSION) + + IF (NOT GCC_COMPILER_VERSION) + MESSAGE(FATAL_ERROR "Cannot get gcc version") + ENDIF (NOT GCC_COMPILER_VERSION) ENDIF (NOT GCC_COMPILER_VERSION) STRING(REGEX MATCHALL "[0-9]+" GCC_COMPILER_VERSION ${GCC_COMPILER_VERSION}) diff --git a/contrib/openssl-cmake/CMakeLists.txt b/contrib/openssl-cmake/CMakeLists.txt index c2e74dc0023..8dc0c6ae6c5 100644 --- a/contrib/openssl-cmake/CMakeLists.txt +++ b/contrib/openssl-cmake/CMakeLists.txt @@ -1,16 +1,6 @@ set(OPENSSL_SOURCE_DIR ${ClickHouse_SOURCE_DIR}/contrib/openssl) set(OPENSSL_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/openssl) -#file(READ ${CMAKE_CURRENT_SOURCE_DIR}/${OPENSSL_SOURCE_DIR}/ssl/VERSION SSL_VERSION) -#string(STRIP ${SSL_VERSION} SSL_VERSION) -#string(REPLACE ":" "." SSL_VERSION ${SSL_VERSION}) -#string(REGEX REPLACE "\\..*" "" SSL_MAJOR_VERSION ${SSL_VERSION}) - -#file(READ ${CMAKE_CURRENT_SOURCE_DIR}/${OPENSSL_SOURCE_DIR}/crypto/VERSION CRYPTO_VERSION) -#string(STRIP ${CRYPTO_VERSION} CRYPTO_VERSION) -#string(REPLACE ":" "." CRYPTO_VERSION ${CRYPTO_VERSION}) -#string(REGEX REPLACE "\\..*" "" CRYPTO_MAJOR_VERSION ${CRYPTO_VERSION}) - set(OPENSSLDIR "/etc/ssl" CACHE PATH "Set the default openssl directory") set(OPENSSL_ENGINESDIR "/usr/lib/engines-3" CACHE PATH "Set the default openssl directory for engines") set(OPENSSL_MODULESDIR "/usr/local/lib/ossl-modules" CACHE PATH "Set the default openssl directory for modules") @@ -27,19 +17,27 @@ elseif(ARCH_AARCH64) endif() enable_language(ASM) + if (COMPILER_CLANG) add_definitions(-Wno-unused-command-line-argument) endif () if (ARCH_AMD64) + if (OS_DARWIN) + set (OPENSSL_SYSTEM "macosx") + endif () + macro(perl_generate_asm FILE_IN FILE_OUT) + get_filename_component(DIRNAME ${FILE_OUT} DIRECTORY) + file(MAKE_DIRECTORY ${DIRNAME}) add_custom_command(OUTPUT ${FILE_OUT} - COMMAND /usr/bin/env perl ${FILE_IN} ${FILE_OUT} + COMMAND /usr/bin/env perl ${FILE_IN} ${OPENSSL_SYSTEM} ${FILE_OUT} # ASM code has broken unwind tables (CFI), strip them. # Otherwise asynchronous unwind (that we use for query profiler) # will lead to segfault while trying to interpret wrong "CFA expression". 
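                       # (The sed command below removes every line beginning with ".cfi_" from the generated assembly file.)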
COMMAND sed -i -e '/^\.cfi_/d' ${FILE_OUT}) endmacro() + perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/aes/asm/aes-x86_64.pl ${OPENSSL_BINARY_DIR}/crypto/aes/aes-x86_64.s) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/aes/asm/aesni-mb-x86_64.pl ${OPENSSL_BINARY_DIR}/crypto/aes/aesni-mb-x86_64.s) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/aes/asm/aesni-sha1-x86_64.pl ${OPENSSL_BINARY_DIR}/crypto/aes/aesni-sha1-x86_64.s) @@ -70,12 +68,17 @@ if (ARCH_AMD64) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/sha/asm/sha512-x86_64.pl ${OPENSSL_BINARY_DIR}/crypto/sha/sha256-x86_64.s) # This is not a mistake perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/sha/asm/sha512-x86_64.pl ${OPENSSL_BINARY_DIR}/crypto/sha/sha512-x86_64.s) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/whrlpool/asm/wp-x86_64.pl ${OPENSSL_BINARY_DIR}/crypto/whrlpool/wp-x86_64.s) + elseif (ARCH_AARCH64) + macro(perl_generate_asm FILE_IN FILE_OUT) + get_filename_component(DIRNAME ${FILE_OUT} DIRECTORY) + file(MAKE_DIRECTORY ${DIRNAME}) add_custom_command(OUTPUT ${FILE_OUT} COMMAND /usr/bin/env perl ${FILE_IN} "linux64" ${FILE_OUT}) # Hope that the ASM code for AArch64 doesn't have broken CFI. Otherwise, add the same sed as for x86_64. endmacro() + perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/aes/asm/aesv8-armx.pl ${OPENSSL_BINARY_DIR}/crypto/aes/aesv8-armx.S) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/aes/asm/vpaes-armv8.pl ${OPENSSL_BINARY_DIR}/crypto/aes/vpaes-armv8.S) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/bn/asm/armv8-mont.pl ${OPENSSL_BINARY_DIR}/crypto/bn/armv8-mont.S) @@ -88,6 +91,7 @@ elseif (ARCH_AARCH64) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/sha/asm/sha1-armv8.pl ${OPENSSL_BINARY_DIR}/crypto/sha/sha1-armv8.S) perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/sha/asm/sha512-armv8.pl ${OPENSSL_BINARY_DIR}/crypto/sha/sha256-armv8.S) # This is not a mistake perl_generate_asm(${OPENSSL_SOURCE_DIR}/crypto/sha/asm/sha512-armv8.pl ${OPENSSL_BINARY_DIR}/crypto/sha/sha512-armv8.S) + endif () set(CRYPTO_SRCS diff --git a/contrib/ryu b/contrib/ryu new file mode 160000 index 00000000000..5b4a853534b --- /dev/null +++ b/contrib/ryu @@ -0,0 +1 @@ +Subproject commit 5b4a853534b47438b4d97935370f6b2397137c2b diff --git a/contrib/ryu-cmake/CMakeLists.txt b/contrib/ryu-cmake/CMakeLists.txt new file mode 100644 index 00000000000..bf46fdc61a7 --- /dev/null +++ b/contrib/ryu-cmake/CMakeLists.txt @@ -0,0 +1,10 @@ +SET(LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/ryu) + +add_library(ryu +${LIBRARY_DIR}/ryu/d2fixed.c +${LIBRARY_DIR}/ryu/d2s.c +${LIBRARY_DIR}/ryu/f2s.c +${LIBRARY_DIR}/ryu/generic_128.c +) + +target_include_directories(ryu SYSTEM BEFORE PUBLIC "${LIBRARY_DIR}") diff --git a/contrib/zlib-ng b/contrib/zlib-ng index 5673222fbd3..bba56a73be2 160000 --- a/contrib/zlib-ng +++ b/contrib/zlib-ng @@ -1 +1 @@ -Subproject commit 5673222fbd37ea89afb2ea73096f9bf5ec68ea31 +Subproject commit bba56a73be249514acfbc7d49aa2a68994dad8ab diff --git a/dbms/CMakeLists.txt b/dbms/CMakeLists.txt index 466b3daf94f..e0c8b7da37a 100644 --- a/dbms/CMakeLists.txt +++ b/dbms/CMakeLists.txt @@ -330,6 +330,7 @@ target_link_libraries (clickhouse_common_io ${LINK_LIBRARIES_ONLY_ON_X86_64} PUBLIC ${DOUBLE_CONVERSION_LIBRARIES} + ryu PUBLIC ${Poco_Net_LIBRARY} ${Poco_Util_LIBRARY} diff --git a/dbms/cmake/version.cmake b/dbms/cmake/version.cmake index 220af3d87dc..6d6700bc649 100644 --- a/dbms/cmake/version.cmake +++ b/dbms/cmake/version.cmake @@ -1,11 +1,11 @@ # This strings autochanged from release_lib.sh: -set(VERSION_REVISION 54430) 
-set(VERSION_MAJOR 19) -set(VERSION_MINOR 19) +set(VERSION_REVISION 54431) +set(VERSION_MAJOR 20) +set(VERSION_MINOR 1) set(VERSION_PATCH 1) -set(VERSION_GITHASH 8bd9709d1dec3366e35d2efeab213435857f67a9) -set(VERSION_DESCRIBE v19.19.1.1-prestable) -set(VERSION_STRING 19.19.1.1) +set(VERSION_GITHASH 51d4c8a53be94504e3607b2232e12e5ef7a8ec28) +set(VERSION_DESCRIBE v20.1.1.1-prestable) +set(VERSION_STRING 20.1.1.1) # end of autochange set(VERSION_EXTRA "" CACHE STRING "") diff --git a/dbms/programs/client/Client.cpp b/dbms/programs/client/Client.cpp index 9e5ce4211ec..a38906c9620 100644 --- a/dbms/programs/client/Client.cpp +++ b/dbms/programs/client/Client.cpp @@ -2,7 +2,6 @@ #include "ConnectionParameters.h" #include "Suggest.h" -#include #include #include #include @@ -263,7 +262,7 @@ private: && std::string::npos == embedded_stack_trace_pos) { std::cerr << "Stack trace:" << std::endl - << e.getStackTrace().toString(); + << e.getStackTraceString(); } /// If exception code isn't zero, we should return non-zero return code anyway. @@ -290,6 +289,78 @@ private: || (now.month() == 1 && now.day() <= 5); } + bool isChineseNewYearMode(const String & local_tz) + { + /// Days of Dec. 20 in Chinese calendar starting from year 2019 to year 2105 + static constexpr UInt16 chineseNewYearIndicators[] + = {18275, 18659, 19014, 19368, 19752, 20107, 20491, 20845, 21199, 21583, 21937, 22292, 22676, 23030, 23414, 23768, 24122, 24506, + 24860, 25215, 25599, 25954, 26308, 26692, 27046, 27430, 27784, 28138, 28522, 28877, 29232, 29616, 29970, 30354, 30708, 31062, + 31446, 31800, 32155, 32539, 32894, 33248, 33632, 33986, 34369, 34724, 35078, 35462, 35817, 36171, 36555, 36909, 37293, 37647, + 38002, 38386, 38740, 39095, 39479, 39833, 40187, 40571, 40925, 41309, 41664, 42018, 42402, 42757, 43111, 43495, 43849, 44233, + 44587, 44942, 45326, 45680, 46035, 46418, 46772, 47126, 47510, 47865, 48249, 48604, 48958, 49342}; + static constexpr size_t N = sizeof(chineseNewYearIndicators) / sizeof(chineseNewYearIndicators[0]); + + /// All time zone names are acquired from https://www.iana.org/time-zones + static constexpr const char * chineseNewYearTimeZoneIndicators[] = { + /// Time zones celebrating Chinese new year. + "Asia/Shanghai", + "Asia/Chongqing", + "Asia/Harbin", + "Asia/Urumqi", + "Asia/Hong_Kong", + "Asia/Chungking", + "Asia/Macao", + "Asia/Macau", + "Asia/Taipei", + "Asia/Singapore", + + /// Time zones celebrating Chinese new year but with different festival names. Let's not print the message for now. + // "Asia/Brunei", + // "Asia/Ho_Chi_Minh", + // "Asia/Hovd", + // "Asia/Jakarta", + // "Asia/Jayapura", + // "Asia/Kashgar", + // "Asia/Kuala_Lumpur", + // "Asia/Kuching", + // "Asia/Makassar", + // "Asia/Pontianak", + // "Asia/Pyongyang", + // "Asia/Saigon", + // "Asia/Seoul", + // "Asia/Ujung_Pandang", + // "Asia/Ulaanbaatar", + // "Asia/Ulan_Bator", + }; + static constexpr size_t M = sizeof(chineseNewYearTimeZoneIndicators) / sizeof(chineseNewYearTimeZoneIndicators[0]); + + time_t current_time = time(nullptr); + + if (chineseNewYearTimeZoneIndicators + M + == std::find_if(chineseNewYearTimeZoneIndicators, chineseNewYearTimeZoneIndicators + M, [&local_tz](const char * tz) + { + return tz == local_tz; + })) + return false; + + /// It's bad to be intrusive. 
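+        /// (The greeting is shown in roughly one of three sessions: only when the current UNIX time is divisible by three.)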
+ if (current_time % 3 != 0) + return false; + + auto days = DateLUT::instance().toDayNum(current_time).toUnderType(); + for (auto i = 0ul; i < N; ++i) + { + auto d = chineseNewYearIndicators[i]; + + /// Let's celebrate until Lantern Festival + if (d <= days && d + 25u >= days) + return true; + else if (d > days) + return false; + } + return false; + } + int mainImpl() { UseSSL use_ssl; @@ -337,7 +408,7 @@ private: connect(); /// Initialize DateLUT here to avoid counting time spent here as query execution time. - DateLUT::instance(); + const auto local_tz = DateLUT::instance().getTimeZone(); if (!context.getSettingsRef().use_client_time_zone) { const auto & time_zone = connection->getServerTimezone(connection_parameters.timeouts); @@ -448,8 +519,7 @@ private: << "Code: " << e.code() << ". " << e.displayText() << std::endl; if (config().getBool("stacktrace", false)) - std::cerr << "Stack trace:" << std::endl - << e.getStackTrace().toString() << std::endl; + std::cerr << "Stack trace:" << std::endl << e.getStackTraceString() << std::endl; std::cerr << std::endl; @@ -463,7 +533,12 @@ private: } while (true); - std::cout << (isNewYearMode() ? "Happy new year." : "Bye.") << std::endl; + if (isNewYearMode()) + std::cout << "Happy new year." << std::endl; + else if (isChineseNewYearMode(local_tz)) + std::cout << "Happy Chinese new year. 春节快乐!" << std::endl; + else + std::cout << "Bye." << std::endl; return 0; } else @@ -553,27 +628,11 @@ private: } - /// Check if multi-line query is inserted from the paste buffer. - /// Allows delaying the start of query execution until the entirety of query is inserted. - static bool hasDataInSTDIN() - { - timeval timeout = { 0, 0 }; - fd_set fds; - FD_ZERO(&fds); - FD_SET(STDIN_FILENO, &fds); - return select(1, &fds, nullptr, nullptr, &timeout) == 1; - } - inline const String prompt() const { return boost::replace_all_copy(prompt_by_server_display_name, "{database}", config().getString("database", "default")); } - void loop() - { - - } - void nonInteractive() { diff --git a/dbms/programs/local/LocalServer.cpp b/dbms/programs/local/LocalServer.cpp index f84d9d4b6ac..ca45217cb97 100644 --- a/dbms/programs/local/LocalServer.cpp +++ b/dbms/programs/local/LocalServer.cpp @@ -76,7 +76,7 @@ void LocalServer::initialize(Poco::Util::Application & self) if (config().has("logger") || config().has("logger.level") || config().has("logger.log")) { // sensitive data rules are not used here - buildLoggers(config(), logger()); + buildLoggers(config(), logger(), self.commandName()); } else { diff --git a/dbms/programs/odbc-bridge/MainHandler.cpp b/dbms/programs/odbc-bridge/MainHandler.cpp index 73480bf884f..074aaedd7ce 100644 --- a/dbms/programs/odbc-bridge/MainHandler.cpp +++ b/dbms/programs/odbc-bridge/MainHandler.cpp @@ -115,7 +115,7 @@ void ODBCHandler::handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Ne catch (const Exception & ex) { process_error("Invalid 'columns' parameter in request body '" + ex.message() + "'"); - LOG_WARNING(log, ex.getStackTrace().toString()); + LOG_WARNING(log, ex.getStackTraceString()); return; } diff --git a/dbms/programs/odbc-bridge/ODBCBridge.cpp b/dbms/programs/odbc-bridge/ODBCBridge.cpp index 453ee499784..a99e9fcf2c6 100644 --- a/dbms/programs/odbc-bridge/ODBCBridge.cpp +++ b/dbms/programs/odbc-bridge/ODBCBridge.cpp @@ -124,7 +124,7 @@ void ODBCBridge::initialize(Application & self) config().setString("logger", "ODBCBridge"); - buildLoggers(config(), logger()); + buildLoggers(config(), logger(), self.commandName()); log = &logger(); 
hostname = config().getString("listen-host", "localhost"); diff --git a/dbms/programs/performance-test/PerformanceTest.cpp b/dbms/programs/performance-test/PerformanceTest.cpp index 689f68f8d5e..e1550780b15 100644 --- a/dbms/programs/performance-test/PerformanceTest.cpp +++ b/dbms/programs/performance-test/PerformanceTest.cpp @@ -85,16 +85,6 @@ bool PerformanceTest::checkPreconditions() const for (const std::string & precondition : preconditions) { - if (precondition == "flush_disk_cache") - { - if (system( - "(>&2 echo 'Flushing disk cache...') && (sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches') && (>&2 echo 'Flushed.')")) - { - LOG_WARNING(log, "Failed to flush disk cache"); - return false; - } - } - if (precondition == "ram_size") { size_t ram_size_needed = config->getUInt64("preconditions.ram_size"); @@ -337,7 +327,7 @@ void PerformanceTest::runQueries( { statistics.exception = "Code: " + std::to_string(e.code()) + ", e.displayText() = " + e.displayText(); LOG_WARNING(log, "Code: " << e.code() << ", e.displayText() = " << e.displayText() - << ", Stack trace:\n\n" << e.getStackTrace().toString()); + << ", Stack trace:\n\n" << e.getStackTraceString()); } if (!statistics.got_SIGINT) diff --git a/dbms/programs/performance-test/PerformanceTestInfo.cpp b/dbms/programs/performance-test/PerformanceTestInfo.cpp index 8435b29a67a..b0f877abfc7 100644 --- a/dbms/programs/performance-test/PerformanceTestInfo.cpp +++ b/dbms/programs/performance-test/PerformanceTestInfo.cpp @@ -45,21 +45,11 @@ namespace fs = std::filesystem; PerformanceTestInfo::PerformanceTestInfo( XMLConfigurationPtr config, - const std::string & profiles_file_, const Settings & global_settings_) - : profiles_file(profiles_file_) - , settings(global_settings_) + : settings(global_settings_) { path = config->getString("path"); test_name = fs::path(path).stem().string(); - if (config->has("main_metric")) - { - Strings main_metrics; - config->keys("main_metric", main_metrics); - if (main_metrics.size()) - main_metric = main_metrics[0]; - } - applySettings(config); extractQueries(config); extractAuxiliaryQueries(config); @@ -75,38 +65,8 @@ void PerformanceTestInfo::applySettings(XMLConfigurationPtr config) SettingsChanges settings_to_apply; Strings config_settings; config->keys("settings", config_settings); - - auto settings_contain = [&config_settings] (const std::string & setting) - { - auto position = std::find(config_settings.begin(), config_settings.end(), setting); - return position != config_settings.end(); - - }; - /// Preprocess configuration file - if (settings_contain("profile")) - { - if (!profiles_file.empty()) - { - std::string profile_name = config->getString("settings.profile"); - XMLConfigurationPtr profiles_config(new XMLConfiguration(profiles_file)); - - Strings profile_settings; - profiles_config->keys("profiles." + profile_name, profile_settings); - - extractSettings(profiles_config, "profiles." 
+ profile_name, profile_settings, settings_to_apply); - } - } - extractSettings(config, "settings", config_settings, settings_to_apply); settings.applyChanges(settings_to_apply); - - if (settings_contain("average_rows_speed_precision")) - TestStats::avg_rows_speed_precision = - config->getDouble("settings.average_rows_speed_precision"); - - if (settings_contain("average_bytes_speed_precision")) - TestStats::avg_bytes_speed_precision = - config->getDouble("settings.average_bytes_speed_precision"); } } diff --git a/dbms/programs/performance-test/PerformanceTestInfo.h b/dbms/programs/performance-test/PerformanceTestInfo.h index 4483e56bbfe..8e6b1c5f43a 100644 --- a/dbms/programs/performance-test/PerformanceTestInfo.h +++ b/dbms/programs/performance-test/PerformanceTestInfo.h @@ -26,15 +26,13 @@ using StringToVector = std::map; class PerformanceTestInfo { public: - PerformanceTestInfo(XMLConfigurationPtr config, const std::string & profiles_file_, const Settings & global_settings_); + PerformanceTestInfo(XMLConfigurationPtr config, const Settings & global_settings_); std::string test_name; std::string path; - std::string main_metric; Strings queries; - std::string profiles_file; Settings settings; ExecutionType exec_type; StringToVector substitutions; diff --git a/dbms/programs/performance-test/PerformanceTestSuite.cpp b/dbms/programs/performance-test/PerformanceTestSuite.cpp index 594f04a3906..66ef8eb51c0 100644 --- a/dbms/programs/performance-test/PerformanceTestSuite.cpp +++ b/dbms/programs/performance-test/PerformanceTestSuite.cpp @@ -64,7 +64,6 @@ public: const std::string & password_, const Settings & cmd_settings, const bool lite_output_, - const std::string & profiles_file_, Strings && input_files_, Strings && tests_tags_, Strings && skip_tags_, @@ -86,7 +85,6 @@ public: , skip_names_regexp(std::move(skip_names_regexp_)) , query_indexes(query_indexes_) , lite_output(lite_output_) - , profiles_file(profiles_file_) , input_files(input_files_) , log(&Poco::Logger::get("PerformanceTestSuite")) { @@ -139,7 +137,6 @@ private: using XMLConfigurationPtr = Poco::AutoPtr; bool lite_output; - std::string profiles_file; Strings input_files; std::vector tests_configurations; @@ -197,7 +194,7 @@ private: std::pair runTest(XMLConfigurationPtr & test_config) { - PerformanceTestInfo info(test_config, profiles_file, global_context.getSettingsRef()); + PerformanceTestInfo info(test_config, global_context.getSettingsRef()); LOG_INFO(log, "Config for test '" << info.test_name << "' parsed"); PerformanceTest current(test_config, connection, timeouts, interrupt_listener, info, global_context, query_indexes[info.path]); @@ -332,7 +329,6 @@ try desc.add_options() ("help", "produce help message") ("lite", "use lite version of output") - ("profiles-file", value()->default_value(""), "Specify a file with global profiles") ("host,h", value()->default_value("localhost"), "") ("port", value()->default_value(9000), "") ("secure,s", "Use TLS connection") @@ -401,7 +397,6 @@ try options["password"].as(), cmd_settings, options.count("lite") > 0, - options["profiles-file"].as(), std::move(input_files), std::move(tests_tags), std::move(skip_tags), diff --git a/dbms/programs/performance-test/ReportBuilder.cpp b/dbms/programs/performance-test/ReportBuilder.cpp index cfefc37c470..f95aa025095 100644 --- a/dbms/programs/performance-test/ReportBuilder.cpp +++ b/dbms/programs/performance-test/ReportBuilder.cpp @@ -17,23 +17,25 @@ namespace DB namespace { -const std::regex QUOTE_REGEX{"\""}; std::string getMainMetric(const 
PerformanceTestInfo & test_info) { - std::string main_metric; - if (test_info.main_metric.empty()) - if (test_info.exec_type == ExecutionType::Loop) - main_metric = "min_time"; - else - main_metric = "rows_per_second"; + if (test_info.exec_type == ExecutionType::Loop) + return "min_time"; else - main_metric = test_info.main_metric; - return main_metric; + return "rows_per_second"; } + bool isASCIIString(const std::string & str) { return std::all_of(str.begin(), str.end(), isASCII); } + +String jsonString(const String & str, FormatSettings & settings) +{ + WriteBufferFromOwnString buffer; + writeJSONString(str, buffer, settings); + return std::move(buffer.str()); +} } ReportBuilder::ReportBuilder(const std::string & server_version_) @@ -55,6 +57,8 @@ std::string ReportBuilder::buildFullReport( std::vector & stats, const std::vector & queries_to_run) const { + FormatSettings settings; + JSONString json_output; json_output.set("hostname", hostname); @@ -65,22 +69,18 @@ std::string ReportBuilder::buildFullReport( json_output.set("time", getCurrentTime()); json_output.set("test_name", test_info.test_name); json_output.set("path", test_info.path); - json_output.set("main_metric", getMainMetric(test_info)); - if (test_info.substitutions.size()) + if (!test_info.substitutions.empty()) { JSONString json_parameters(2); /// here, 2 is the size of \t padding - for (auto it = test_info.substitutions.begin(); it != test_info.substitutions.end(); ++it) + for (auto & [parameter, values] : test_info.substitutions) { - std::string parameter = it->first; - Strings values = it->second; - std::ostringstream array_string; array_string << "["; for (size_t i = 0; i != values.size(); ++i) { - array_string << '"' << std::regex_replace(values[i], QUOTE_REGEX, "\\\"") << '"'; + array_string << jsonString(values[i], settings); if (i != values.size() - 1) { array_string << ", "; @@ -110,13 +110,12 @@ std::string ReportBuilder::buildFullReport( JSONString runJSON; - auto query = std::regex_replace(test_info.queries[query_index], QUOTE_REGEX, "\\\""); - runJSON.set("query", query); + runJSON.set("query", jsonString(test_info.queries[query_index], settings), false); runJSON.set("query_index", query_index); if (!statistics.exception.empty()) { if (isASCIIString(statistics.exception)) - runJSON.set("exception", std::regex_replace(statistics.exception, QUOTE_REGEX, "\\\"")); + runJSON.set("exception", jsonString(statistics.exception, settings), false); else runJSON.set("exception", "Some exception occured with non ASCII message. This may produce invalid JSON. 
Try reproduce locally."); } @@ -183,7 +182,7 @@ std::string ReportBuilder::buildCompactReport( std::vector & stats, const std::vector & queries_to_run) const { - + FormatSettings settings; std::ostringstream output; for (size_t query_index = 0; query_index < test_info.queries.size(); ++query_index) @@ -194,7 +193,7 @@ std::string ReportBuilder::buildCompactReport( for (size_t number_of_launch = 0; number_of_launch < test_info.times_to_run; ++number_of_launch) { if (test_info.queries.size() > 1) - output << "query \"" << test_info.queries[query_index] << "\", "; + output << "query " << jsonString(test_info.queries[query_index], settings) << ", "; output << "run " << std::to_string(number_of_launch + 1) << ": "; diff --git a/dbms/programs/server/HTTPHandler.cpp b/dbms/programs/server/HTTPHandler.cpp index 29d186def2d..b2b3298693e 100644 --- a/dbms/programs/server/HTTPHandler.cpp +++ b/dbms/programs/server/HTTPHandler.cpp @@ -20,8 +20,6 @@ #include #include #include -#include -#include #include #include #include @@ -300,32 +298,24 @@ void HTTPHandler::processQuery( /// The client can pass a HTTP header indicating supported compression method (gzip or deflate). String http_response_compression_methods = request.get("Accept-Encoding", ""); - bool client_supports_http_compression = false; - CompressionMethod http_response_compression_method {}; + CompressionMethod http_response_compression_method = CompressionMethod::None; if (!http_response_compression_methods.empty()) { + /// If client supports brotli - it's preferred. /// Both gzip and deflate are supported. If the client supports both, gzip is preferred. /// NOTE parsing of the list of methods is slightly incorrect. - if (std::string::npos != http_response_compression_methods.find("gzip")) - { - client_supports_http_compression = true; - http_response_compression_method = CompressionMethod::Gzip; - } - else if (std::string::npos != http_response_compression_methods.find("deflate")) - { - client_supports_http_compression = true; - http_response_compression_method = CompressionMethod::Zlib; - } -#if USE_BROTLI - else if (http_response_compression_methods == "br") - { - client_supports_http_compression = true; + + if (std::string::npos != http_response_compression_methods.find("br")) http_response_compression_method = CompressionMethod::Brotli; - } -#endif + else if (std::string::npos != http_response_compression_methods.find("gzip")) + http_response_compression_method = CompressionMethod::Gzip; + else if (std::string::npos != http_response_compression_methods.find("deflate")) + http_response_compression_method = CompressionMethod::Zlib; } + bool client_supports_http_compression = http_response_compression_method != CompressionMethod::None; + /// Client can pass a 'compress' flag in the query string. In this case the query result is /// compressed using internal algorithm. This is not reflected in HTTP headers. 
bool internal_compression = params.getParsed("compress", false); @@ -344,8 +334,8 @@ void HTTPHandler::processQuery( unsigned keep_alive_timeout = config.getUInt("keep_alive_timeout", 10); used_output.out = std::make_shared( - request, response, keep_alive_timeout, - client_supports_http_compression, http_response_compression_method, buffer_size_http); + request, response, keep_alive_timeout, client_supports_http_compression, http_response_compression_method); + if (internal_compression) used_output.out_maybe_compressed = std::make_shared(*used_output.out); else @@ -400,32 +390,9 @@ void HTTPHandler::processQuery( std::unique_ptr in_post_raw = std::make_unique(istr); /// Request body can be compressed using algorithm specified in the Content-Encoding header. - std::unique_ptr in_post; String http_request_compression_method_str = request.get("Content-Encoding", ""); - if (!http_request_compression_method_str.empty()) - { - if (http_request_compression_method_str == "gzip") - { - in_post = std::make_unique(std::move(in_post_raw), CompressionMethod::Gzip); - } - else if (http_request_compression_method_str == "deflate") - { - in_post = std::make_unique(std::move(in_post_raw), CompressionMethod::Zlib); - } -#if USE_BROTLI - else if (http_request_compression_method_str == "br") - { - in_post = std::make_unique(std::move(in_post_raw)); - } -#endif - else - { - throw Exception("Unknown Content-Encoding of HTTP request: " + http_request_compression_method_str, - ErrorCodes::UNKNOWN_COMPRESSION_METHOD); - } - } - else - in_post = std::move(in_post_raw); + std::unique_ptr in_post = wrapReadBufferWithCompressionMethod( + std::make_unique(istr), chooseCompressionMethod({}, http_request_compression_method_str)); /// The data can also be compressed using incompatible internal algorithm. This is indicated by /// 'decompress' query parameter. diff --git a/dbms/programs/server/Server.cpp b/dbms/programs/server/Server.cpp index 972401c19d5..bb08abf2161 100644 --- a/dbms/programs/server/Server.cpp +++ b/dbms/programs/server/Server.cpp @@ -947,6 +947,7 @@ int Server::main(const std::vector & /*args*/) }); /// try to load dictionaries immediately, throw on error and die + ext::scope_guard dictionaries_xmls, models_xmls; try { if (!config().getBool("dictionaries_lazy_load", true)) @@ -954,12 +955,10 @@ int Server::main(const std::vector & /*args*/) global_context->tryCreateEmbeddedDictionaries(); global_context->getExternalDictionariesLoader().enableAlwaysLoadEverything(true); } - - auto dictionaries_repository = std::make_unique(config(), "dictionaries_config"); - global_context->getExternalDictionariesLoader().addConfigRepository("", std::move(dictionaries_repository)); - - auto models_repository = std::make_unique(config(), "models_config"); - global_context->getExternalModelsLoader().addConfigRepository("", std::move(models_repository)); + dictionaries_xmls = global_context->getExternalDictionariesLoader().addConfigRepository( + std::make_unique(config(), "dictionaries_config")); + models_xmls = global_context->getExternalModelsLoader().addConfigRepository( + std::make_unique(config(), "models_config")); } catch (...) 
{ diff --git a/dbms/programs/server/TCPHandler.cpp b/dbms/programs/server/TCPHandler.cpp index cb215eb0af8..01c5ae59cea 100644 --- a/dbms/programs/server/TCPHandler.cpp +++ b/dbms/programs/server/TCPHandler.cpp @@ -112,7 +112,7 @@ void TCPHandler::runImpl() { Exception e("Database " + backQuote(default_database) + " doesn't exist", ErrorCodes::UNKNOWN_DATABASE); LOG_ERROR(log, "Code: " << e.code() << ", e.displayText() = " << e.displayText() - << ", Stack trace:\n\n" << e.getStackTrace().toString()); + << ", Stack trace:\n\n" << e.getStackTraceString()); sendException(e, connection_context.getSettingsRef().calculate_text_stack_trace); return; } @@ -158,7 +158,7 @@ void TCPHandler::runImpl() /** An exception during the execution of request (it must be sent over the network to the client). * The client will be able to accept it, if it did not happen while sending another packet and the client has not disconnected yet. */ - std::unique_ptr exception; + std::optional exception; bool network_error = false; bool send_exception_with_stack_trace = connection_context.getSettingsRef().calculate_text_stack_trace; @@ -280,7 +280,7 @@ void TCPHandler::runImpl() catch (const Exception & e) { state.io.onException(); - exception.reset(e.clone()); + exception.emplace(e); if (e.code() == ErrorCodes::UNKNOWN_PACKET_FROM_CLIENT) throw; @@ -298,22 +298,22 @@ void TCPHandler::runImpl() * We will try to send exception to the client in any case - see below. */ state.io.onException(); - exception = std::make_unique(e.displayText(), ErrorCodes::POCO_EXCEPTION); + exception.emplace(Exception::CreateFromPoco, e); } catch (const Poco::Exception & e) { state.io.onException(); - exception = std::make_unique(e.displayText(), ErrorCodes::POCO_EXCEPTION); + exception.emplace(Exception::CreateFromPoco, e); } catch (const std::exception & e) { state.io.onException(); - exception = std::make_unique(e.what(), ErrorCodes::STD_EXCEPTION); + exception.emplace(Exception::CreateFromSTD, e); } catch (...) { state.io.onException(); - exception = std::make_unique("Unknown exception", ErrorCodes::UNKNOWN_EXCEPTION); + exception.emplace("Unknown exception", ErrorCodes::UNKNOWN_EXCEPTION); } try @@ -546,7 +546,7 @@ void TCPHandler::processOrdinaryQueryWithProcessors(size_t num_threads) auto & pipeline = state.io.pipeline; if (pipeline.getMaxThreads()) - num_threads = pipeline.getMaxThreads(); + num_threads = std::min(num_threads, pipeline.getMaxThreads()); /// Send header-block, to allow client to prepare output format for data to send. { diff --git a/dbms/src/Access/SettingsConstraints.cpp b/dbms/src/Access/SettingsConstraints.cpp index a044b7a0dc1..64460aaa8f1 100644 --- a/dbms/src/Access/SettingsConstraints.cpp +++ b/dbms/src/Access/SettingsConstraints.cpp @@ -217,9 +217,18 @@ const SettingsConstraints::Constraint * SettingsConstraints::tryGetConstraint(si void SettingsConstraints::setProfile(const String & profile_name, const Poco::Util::AbstractConfiguration & config) { - String parent_profile = "profiles." + profile_name + ".profile"; - if (config.has(parent_profile)) - setProfile(parent_profile, config); // Inheritance of one profile from another. + String elem = "profiles." + profile_name; + + Poco::Util::AbstractConfiguration::Keys config_keys; + config.keys(elem, config_keys); + + for (const std::string & key : config_keys) + { + if (key == "profile" || 0 == key.compare(0, strlen("profile["), "profile[")) /// Inheritance of profiles from the current one. + setProfile(config.getString(elem + "." 
+ key), config); + else + continue; + } String path_to_constraints = "profiles." + profile_name + ".constraints"; if (config.has(path_to_constraints)) diff --git a/dbms/src/AggregateFunctions/AggregateFunctionAggThrow.cpp b/dbms/src/AggregateFunctions/AggregateFunctionAggThrow.cpp new file mode 100644 index 00000000000..2bf00676d77 --- /dev/null +++ b/dbms/src/AggregateFunctions/AggregateFunctionAggThrow.cpp @@ -0,0 +1,119 @@ +#include +#include + +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int AGGREGATE_FUNCTION_THROW; + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; +} + +namespace +{ + +struct AggregateFunctionThrowData +{ + bool allocated; + + AggregateFunctionThrowData() : allocated(true) {} + ~AggregateFunctionThrowData() + { + volatile bool * allocated_ptr = &allocated; + + if (*allocated_ptr) + *allocated_ptr = false; + else + abort(); + } +}; + +/** Throw on creation with probability specified in parameter. + * It will check correct destruction of the state. + * This is intended to check for exception safety. + */ +class AggregateFunctionThrow final : public IAggregateFunctionDataHelper +{ +private: + Float64 throw_probability; + +public: + AggregateFunctionThrow(const DataTypes & argument_types_, const Array & parameters_, Float64 throw_probability_) + : IAggregateFunctionDataHelper(argument_types_, parameters_), throw_probability(throw_probability_) {} + + String getName() const override + { + return "aggThrow"; + } + + DataTypePtr getReturnType() const override + { + return std::make_shared(); + } + + void create(AggregateDataPtr place) const override + { + if (std::uniform_real_distribution<>(0.0, 1.0)(thread_local_rng) <= throw_probability) + throw Exception("Aggregate function " + getName() + " has thrown exception successfully", ErrorCodes::AGGREGATE_FUNCTION_THROW); + + new (place) Data; + } + + void destroy(AggregateDataPtr place) const noexcept override + { + data(place).~Data(); + } + + void add(AggregateDataPtr, const IColumn **, size_t, Arena *) const override + { + } + + void merge(AggregateDataPtr, ConstAggregateDataPtr, Arena *) const override + { + } + + void serialize(ConstAggregateDataPtr, WriteBuffer & buf) const override + { + char c = 0; + buf.write(c); + } + + void deserialize(AggregateDataPtr, ReadBuffer & buf, Arena *) const override + { + char c = 0; + buf.read(c); + } + + void insertResultInto(ConstAggregateDataPtr, IColumn & to) const override + { + to.insertDefault(); + } +}; + +} + +void registerAggregateFunctionAggThrow(AggregateFunctionFactory & factory) +{ + factory.registerFunction("aggThrow", [](const std::string & name, const DataTypes & argument_types, const Array & parameters) + { + Float64 throw_probability = 1.0; + if (parameters.size() == 1) + throw_probability = parameters[0].safeGet(); + else if (parameters.size() > 1) + throw Exception("Aggregate function " + name + " cannot have more than one parameter", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + return std::make_shared(argument_types, parameters, throw_probability); + }); +} + +} + diff --git a/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h b/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h index ecb4d686e59..8451d6532f6 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h +++ b/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h @@ -129,9 +129,9 @@ public: void add(AggregateDataPtr place, const IColumn ** columns, const size_t row_num, 
Arena *) const override { - /// TODO Inefficient. - const auto x = applyVisitor(FieldVisitorConvertToNumber(), (*columns[0])[row_num]); - const auto y = applyVisitor(FieldVisitorConvertToNumber(), (*columns[1])[row_num]); + /// NOTE Slightly inefficient. + const auto x = columns[0]->getFloat64(row_num); + const auto y = columns[1]->getFloat64(row_num); data(place).add(x, y); } diff --git a/dbms/src/AggregateFunctions/AggregateFunctionResample.h b/dbms/src/AggregateFunctions/AggregateFunctionResample.h index 33b03fcdee0..0f348899884 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionResample.h +++ b/dbms/src/AggregateFunctions/AggregateFunctionResample.h @@ -100,7 +100,18 @@ public: void create(AggregateDataPtr place) const override { for (size_t i = 0; i < total; ++i) - nested_function->create(place + i * size_of_data); + { + try + { + nested_function->create(place + i * size_of_data); + } + catch (...) + { + for (size_t j = 0; j < i; ++j) + nested_function->destroy(place + j * size_of_data); + throw; + } + } } void destroy(AggregateDataPtr place) const noexcept override diff --git a/dbms/src/AggregateFunctions/FactoryHelpers.h b/dbms/src/AggregateFunctions/FactoryHelpers.h index 183116df54e..aff7ff0ff36 100644 --- a/dbms/src/AggregateFunctions/FactoryHelpers.h +++ b/dbms/src/AggregateFunctions/FactoryHelpers.h @@ -23,13 +23,13 @@ inline void assertNoParameters(const std::string & name, const Array & parameter inline void assertUnary(const std::string & name, const DataTypes & argument_types) { if (argument_types.size() != 1) - throw Exception("Aggregate function " + name + " require single argument", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + throw Exception("Aggregate function " + name + " requires single argument", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); } inline void assertBinary(const std::string & name, const DataTypes & argument_types) { if (argument_types.size() != 2) - throw Exception("Aggregate function " + name + " require two arguments", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + throw Exception("Aggregate function " + name + " requires two arguments", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); } template diff --git a/dbms/src/AggregateFunctions/HelpersMinMaxAny.h b/dbms/src/AggregateFunctions/HelpersMinMaxAny.h index 6c027b5a6de..dc165f50d8e 100644 --- a/dbms/src/AggregateFunctions/HelpersMinMaxAny.h +++ b/dbms/src/AggregateFunctions/HelpersMinMaxAny.h @@ -55,7 +55,7 @@ static IAggregateFunction * createAggregateFunctionArgMinMaxSecond(const DataTyp #define DISPATCH(TYPE) \ if (which.idx == TypeIndex::TYPE) \ - return new AggregateFunctionArgMinMax>>>(res_type, val_type); \ + return new AggregateFunctionArgMinMax>>>(res_type, val_type); FOR_NUMERIC_TYPES(DISPATCH) #undef DISPATCH diff --git a/dbms/src/AggregateFunctions/IAggregateFunction.h b/dbms/src/AggregateFunctions/IAggregateFunction.h index 94dcf4cbcab..9ac8bf7f34d 100644 --- a/dbms/src/AggregateFunctions/IAggregateFunction.h +++ b/dbms/src/AggregateFunctions/IAggregateFunction.h @@ -131,9 +131,7 @@ public: /** Contains a loop with calls to "add" function. You can collect arguments into array "places" * and do a single call to "addBatch" for devirtualization and inlining. */ - virtual void - addBatch(size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, Arena * arena) - const = 0; + virtual void addBatch(size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, Arena * arena) const = 0; /** The same for single place. 
*/ @@ -144,9 +142,8 @@ public: * -Array combinator. It might also be used generally to break data dependency when array * "places" contains a large number of same values consecutively. */ - virtual void - addBatchArray(size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, const UInt64 * offsets, Arena * arena) - const = 0; + virtual void addBatchArray( + size_t batch_size, AggregateDataPtr * places, size_t place_offset, const IColumn ** columns, const UInt64 * offsets, Arena * arena) const = 0; const DataTypes & getArgumentTypes() const { return argument_types; } const Array & getParameters() const { return parameters; } @@ -213,7 +210,7 @@ protected: public: IAggregateFunctionDataHelper(const DataTypes & argument_types_, const Array & parameters_) - : IAggregateFunctionHelper(argument_types_, parameters_) {} + : IAggregateFunctionHelper(argument_types_, parameters_) {} void create(AggregateDataPtr place) const override { diff --git a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp index d36603df081..a4fc41e9c06 100644 --- a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp +++ b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp @@ -42,6 +42,7 @@ void registerAggregateFunctions() registerAggregateFunctionSimpleLinearRegression(factory); registerAggregateFunctionMoving(factory); registerAggregateFunctionCategoricalIV(factory); + registerAggregateFunctionAggThrow(factory); } { diff --git a/dbms/src/AggregateFunctions/registerAggregateFunctions.h b/dbms/src/AggregateFunctions/registerAggregateFunctions.h index 897e5d52a61..88cdf4a504d 100644 --- a/dbms/src/AggregateFunctions/registerAggregateFunctions.h +++ b/dbms/src/AggregateFunctions/registerAggregateFunctions.h @@ -34,6 +34,7 @@ void registerAggregateFunctionEntropy(AggregateFunctionFactory &); void registerAggregateFunctionSimpleLinearRegression(AggregateFunctionFactory &); void registerAggregateFunctionMoving(AggregateFunctionFactory &); void registerAggregateFunctionCategoricalIV(AggregateFunctionFactory &); +void registerAggregateFunctionAggThrow(AggregateFunctionFactory &); class AggregateFunctionCombinatorFactory; void registerAggregateFunctionCombinatorIf(AggregateFunctionCombinatorFactory &); diff --git a/dbms/src/Columns/ColumnVector.h b/dbms/src/Columns/ColumnVector.h index a90f1bdb6e8..a157a184974 100644 --- a/dbms/src/Columns/ColumnVector.h +++ b/dbms/src/Columns/ColumnVector.h @@ -212,21 +212,23 @@ public: Float64 getFloat64(size_t n) const override; Float32 getFloat32(size_t n) const override; - UInt64 getUInt(size_t n) const override + /// Out of range conversion is permitted. + UInt64 NO_SANITIZE_UNDEFINED getUInt(size_t n) const override { return UInt64(data[n]); } + /// Out of range conversion is permitted. 
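// As with getUInt above, the conversion below is deliberately permissive. A hedged,
// standalone illustration (not ClickHouse code) of what the annotation is for:
// casting an out-of-range floating value to an integer is undefined behaviour in
// C++, so UBSan would report it; NO_SANITIZE_UNDEFINED is assumed to expand to an
// attribute like the one below, which opts the function out of that check.
#if defined(__clang__)
#define DEMO_NO_SANITIZE_UNDEFINED __attribute__((no_sanitize("undefined")))
#else
#define DEMO_NO_SANITIZE_UNDEFINED
#endif

DEMO_NO_SANITIZE_UNDEFINED inline long long demoToInt64(double x)
{
    return static_cast<long long>(x); // e.g. x = 1e300: unspecified result, but no sanitizer trap
}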
+ Int64 NO_SANITIZE_UNDEFINED getInt(size_t n) const override + { + return Int64(data[n]); + } + bool getBool(size_t n) const override { return bool(data[n]); } - Int64 getInt(size_t n) const override - { - return Int64(data[n]); - } - void insert(const Field & x) override { data.push_back(DB::get>(x)); diff --git a/dbms/src/Common/ErrorCodes.cpp b/dbms/src/Common/ErrorCodes.cpp index 25d1b015a03..f20c7920a12 100644 --- a/dbms/src/Common/ErrorCodes.cpp +++ b/dbms/src/Common/ErrorCodes.cpp @@ -138,7 +138,6 @@ namespace ErrorCodes extern const int FUNCTION_IS_SPECIAL = 129; extern const int CANNOT_READ_ARRAY_FROM_TEXT = 130; extern const int TOO_LARGE_STRING_SIZE = 131; - extern const int CANNOT_CREATE_TABLE_FROM_METADATA = 132; extern const int AGGREGATE_FUNCTION_DOESNT_ALLOW_PARAMETERS = 133; extern const int PARAMETERS_TO_AGGREGATE_FUNCTIONS_MUST_BE_LITERALS = 134; extern const int ZERO_ARRAY_OR_TUPLE_INDEX = 135; @@ -474,9 +473,9 @@ namespace ErrorCodes extern const int NOT_ENOUGH_PRIVILEGES = 497; extern const int LIMIT_BY_WITH_TIES_IS_NOT_SUPPORTED = 498; extern const int S3_ERROR = 499; - extern const int CANNOT_CREATE_DICTIONARY_FROM_METADATA = 500; extern const int CANNOT_CREATE_DATABASE = 501; extern const int CANNOT_SIGQUEUE = 502; + extern const int AGGREGATE_FUNCTION_THROW = 503; extern const int KEEPER_EXCEPTION = 999; extern const int POCO_EXCEPTION = 1000; diff --git a/dbms/src/Common/Exception.cpp b/dbms/src/Common/Exception.cpp index 28a86c620d1..a7a3d58c04f 100644 --- a/dbms/src/Common/Exception.cpp +++ b/dbms/src/Common/Exception.cpp @@ -25,6 +25,55 @@ namespace ErrorCodes extern const int NOT_IMPLEMENTED; } + +Exception::Exception() +{ +} + +Exception::Exception(const std::string & msg, int code) + : Poco::Exception(msg, code) +{ +} + +Exception::Exception(CreateFromPocoTag, const Poco::Exception & exc) + : Poco::Exception(exc.displayText(), ErrorCodes::POCO_EXCEPTION) +{ +#ifdef STD_EXCEPTION_HAS_STACK_TRACE + set_stack_trace(exc.get_stack_trace_frames(), exc.get_stack_trace_size()); +#endif +} + +Exception::Exception(CreateFromSTDTag, const std::exception & exc) + : Poco::Exception(String(typeid(exc).name()) + ": " + String(exc.what()), ErrorCodes::STD_EXCEPTION) +{ +#ifdef STD_EXCEPTION_HAS_STACK_TRACE + set_stack_trace(exc.get_stack_trace_frames(), exc.get_stack_trace_size()); +#endif +} + + +std::string getExceptionStackTraceString(const std::exception & e) +{ +#ifdef STD_EXCEPTION_HAS_STACK_TRACE + return StackTrace::toString(e.get_stack_trace_frames(), 0, e.get_stack_trace_size()); +#else + if (const auto * db_exception = dynamic_cast(&e)) + return db_exception->getStackTraceString(); + return {}; +#endif +} + + +std::string Exception::getStackTraceString() const +{ +#ifdef STD_EXCEPTION_HAS_STACK_TRACE + return StackTrace::toString(get_stack_trace_frames(), 0, get_stack_trace_size()); +#else + return trace.toString(); +#endif +} + + std::string errnoToString(int code, int e) { const size_t buf_size = 128; @@ -141,6 +190,7 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded { stream << "Poco::Exception. Code: " << ErrorCodes::POCO_EXCEPTION << ", e.code() = " << e.code() << ", e.displayText() = " << e.displayText() + << (with_stacktrace ? getExceptionStackTraceString(e) : "") << (with_extra_info ? 
getExtraExceptionInfo(e) : "") << " (version " << VERSION_STRING << VERSION_OFFICIAL; } @@ -157,8 +207,9 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded name += " (demangling status: " + toString(status) + ")"; stream << "std::exception. Code: " << ErrorCodes::STD_EXCEPTION << ", type: " << name << ", e.what() = " << e.what() - << (with_extra_info ? getExtraExceptionInfo(e) : "") - << ", version = " << VERSION_STRING << VERSION_OFFICIAL; + << (with_stacktrace ? getExceptionStackTraceString(e) : "") + << (with_extra_info ? getExtraExceptionInfo(e) : "") + << ", version = " << VERSION_STRING << VERSION_OFFICIAL; } catch (...) {} } @@ -261,7 +312,7 @@ std::string getExceptionMessage(const Exception & e, bool with_stacktrace, bool stream << "Code: " << e.code() << ", e.displayText() = " << text; if (with_stacktrace && !has_embedded_stack_trace) - stream << ", Stack trace (when copying this message, always include the lines below):\n\n" << e.getStackTrace().toString(); + stream << ", Stack trace (when copying this message, always include the lines below):\n\n" << e.getStackTraceString(); } catch (...) {} diff --git a/dbms/src/Common/Exception.h b/dbms/src/Common/Exception.h index 5df2879a16d..d14b2eb77a1 100644 --- a/dbms/src/Common/Exception.h +++ b/dbms/src/Common/Exception.h @@ -22,13 +22,14 @@ namespace ErrorCodes class Exception : public Poco::Exception { public: - Exception() {} /// For deferred initialization. - Exception(const std::string & msg, int code) : Poco::Exception(msg, code) {} - Exception(const std::string & msg, const Exception & nested_exception, int code) - : Poco::Exception(msg, nested_exception, code), trace(nested_exception.trace) {} + Exception(); + Exception(const std::string & msg, int code); enum CreateFromPocoTag { CreateFromPoco }; - Exception(CreateFromPocoTag, const Poco::Exception & exc) : Poco::Exception(exc.displayText(), ErrorCodes::POCO_EXCEPTION) {} + enum CreateFromSTDTag { CreateFromSTD }; + + Exception(CreateFromPocoTag, const Poco::Exception & exc); + Exception(CreateFromSTDTag, const std::exception & exc); Exception * clone() const override { return new Exception(*this); } void rethrow() const override { throw *this; } @@ -38,15 +39,20 @@ public: /// Add something to the existing message. void addMessage(const std::string & arg) { extendedMessage(arg); } - const StackTrace & getStackTrace() const { return trace; } + std::string getStackTraceString() const; private: +#ifndef STD_EXCEPTION_HAS_STACK_TRACE StackTrace trace; +#endif const char * className() const throw() override { return "DB::Exception"; } }; +std::string getExceptionStackTraceString(const std::exception & e); + + /// Contains an additional member `saved_errno`. See the throwFromErrno function. class ErrnoException : public Exception { diff --git a/dbms/src/Common/LRUCache.h b/dbms/src/Common/LRUCache.h index 98e88d9c59f..5bcfc8fc2db 100644 --- a/dbms/src/Common/LRUCache.h +++ b/dbms/src/Common/LRUCache.h @@ -23,11 +23,10 @@ struct TrivialWeightFunction }; -/// Thread-safe cache that evicts entries which are not used for a long time or are expired. +/// Thread-safe cache that evicts entries which are not used for a long time. /// WeightFunction is a functor that takes Mapped as a parameter and returns "weight" (approximate size) /// of that value. -/// Cache starts to evict entries when their total weight exceeds max_size and when expiration time of these -/// entries is due. +/// Cache starts to evict entries when their total weight exceeds max_size. 
/// Value weight should not change after insertion. template , typename WeightFunction = TrivialWeightFunction> class LRUCache @@ -36,15 +35,13 @@ public: using Key = TKey; using Mapped = TMapped; using MappedPtr = std::shared_ptr; - using Delay = std::chrono::seconds; private: using Clock = std::chrono::steady_clock; - using Timestamp = Clock::time_point; public: - LRUCache(size_t max_size_, const Delay & expiration_delay_ = Delay::zero()) - : max_size(std::max(static_cast(1), max_size_)), expiration_delay(expiration_delay_) {} + LRUCache(size_t max_size_) + : max_size(std::max(static_cast(1), max_size_)) {} MappedPtr get(const Key & key) { @@ -167,16 +164,9 @@ protected: struct Cell { - bool expired(const Timestamp & last_timestamp, const Delay & delay) const - { - return (delay == Delay::zero()) || - ((last_timestamp > timestamp) && ((last_timestamp - timestamp) > delay)); - } - MappedPtr value; size_t size; LRUQueueIterator queue_iterator; - Timestamp timestamp; }; using Cells = std::unordered_map; @@ -257,7 +247,6 @@ private: /// Total weight of values. size_t current_size = 0; const size_t max_size; - const Delay expiration_delay; std::atomic hits {0}; std::atomic misses {0}; @@ -273,7 +262,6 @@ private: } Cell & cell = it->second; - updateCellTimestamp(cell); /// Move the key to the end of the queue. The iterator remains valid. queue.splice(queue.end(), queue, cell.queue_iterator); @@ -303,18 +291,11 @@ private: cell.value = mapped; cell.size = cell.value ? weight_function(*cell.value) : 0; current_size += cell.size; - updateCellTimestamp(cell); - removeOverflow(cell.timestamp); + removeOverflow(); } - void updateCellTimestamp(Cell & cell) - { - if (expiration_delay != Delay::zero()) - cell.timestamp = Clock::now(); - } - - void removeOverflow(const Timestamp & last_timestamp) + void removeOverflow() { size_t current_weight_lost = 0; size_t queue_size = cells.size(); @@ -330,8 +311,6 @@ private: } const auto & cell = it->second; - if (!cell.expired(last_timestamp, expiration_delay)) - break; current_size -= cell.size; current_weight_lost += cell.size; diff --git a/dbms/src/Common/ProfileEvents.cpp b/dbms/src/Common/ProfileEvents.cpp index 6cbbc07d8d8..99723cccfb4 100644 --- a/dbms/src/Common/ProfileEvents.cpp +++ b/dbms/src/Common/ProfileEvents.cpp @@ -37,6 +37,8 @@ M(CreatedReadBufferOrdinary, "") \ M(CreatedReadBufferAIO, "") \ M(CreatedReadBufferAIOFailed, "") \ + M(CreatedReadBufferMMap, "") \ + M(CreatedReadBufferMMapFailed, "") \ M(CreatedWriteBufferOrdinary, "") \ M(CreatedWriteBufferAIO, "") \ M(CreatedWriteBufferAIOFailed, "") \ diff --git a/dbms/src/Common/StackTrace.cpp b/dbms/src/Common/StackTrace.cpp index 597ed2028fa..e43bc4c287e 100644 --- a/dbms/src/Common/StackTrace.cpp +++ b/dbms/src/Common/StackTrace.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -226,6 +227,7 @@ void StackTrace::tryCapture() size = 0; #if USE_UNWIND size = unw_backtrace(frames.data(), capacity); + __msan_unpoison(frames.data(), size * sizeof(frames[0])); #endif } @@ -328,3 +330,15 @@ std::string StackTrace::toString() const static SimpleCache func_cached; return func_cached(frames, offset, size); } + +std::string StackTrace::toString(void ** frames_, size_t offset, size_t size) +{ + __msan_unpoison(frames_, size * sizeof(*frames_)); + + StackTrace::Frames frames_copy{}; + for (size_t i = 0; i < size; ++i) + frames_copy[i] = frames_[i]; + + static SimpleCache func_cached; + return func_cached(frames_copy, offset, size); +} diff --git 
a/dbms/src/Common/StackTrace.h b/dbms/src/Common/StackTrace.h index 13147587a19..401c8344f2d 100644 --- a/dbms/src/Common/StackTrace.h +++ b/dbms/src/Common/StackTrace.h @@ -41,6 +41,8 @@ public: const Frames & getFrames() const; std::string toString() const; + static std::string toString(void ** frames, size_t offset, size_t size); + void toStringEveryLine(std::function callback) const; protected: diff --git a/dbms/src/Common/tests/CMakeLists.txt b/dbms/src/Common/tests/CMakeLists.txt index 36206cf55d6..bce78dbe4a6 100644 --- a/dbms/src/Common/tests/CMakeLists.txt +++ b/dbms/src/Common/tests/CMakeLists.txt @@ -13,9 +13,6 @@ target_link_libraries (sip_hash_perf PRIVATE clickhouse_common_io) add_executable (auto_array auto_array.cpp) target_link_libraries (auto_array PRIVATE clickhouse_common_io) -add_executable (lru_cache lru_cache.cpp) -target_link_libraries (lru_cache PRIVATE clickhouse_common_io) - add_executable (hash_table hash_table.cpp) target_link_libraries (hash_table PRIVATE clickhouse_common_io) diff --git a/dbms/src/Common/tests/lru_cache.cpp b/dbms/src/Common/tests/lru_cache.cpp deleted file mode 100644 index e50d6ad9786..00000000000 --- a/dbms/src/Common/tests/lru_cache.cpp +++ /dev/null @@ -1,317 +0,0 @@ -#include -#include - -#include -#include -#include -#include -#include - - -namespace -{ - -void run(); -void runTest(unsigned int num, const std::function & func); -bool test1(); -bool test2(); -bool test_concurrent(); - -#define ASSERT_CHECK(cond, res) \ -do \ -{ \ - if (!(cond)) \ - { \ - std::cout << __FILE__ << ":" << __LINE__ << ":" \ - << "Assertion " << #cond << " failed.\n"; \ - if ((res)) { (res) = false; } \ - } \ -} \ -while (0) - -void run() -{ - const std::vector> tests = - { - test1, - test2, - test_concurrent - }; - - unsigned int num = 0; - for (const auto & test : tests) - { - ++num; - runTest(num, test); - } -} - -void runTest(unsigned int num, const std::function & func) -{ - bool ok; - - try - { - ok = func(); - } - catch (const DB::Exception & ex) - { - ok = false; - std::cout << "Caught exception " << ex.displayText() << "\n"; - } - catch (const std::exception & ex) - { - ok = false; - std::cout << "Caught exception " << ex.what() << "\n"; - } - catch (...) 
- { - ok = false; - std::cout << "Caught unhandled exception\n"; - } - - if (ok) - std::cout << "Test " << num << " passed\n"; - else - std::cout << "Test " << num << " failed\n"; -} - -struct Weight -{ - size_t operator()(const std::string & s) const - { - return s.size(); - } -}; - -bool test1() -{ - using Cache = DB::LRUCache, Weight>; - using MappedPtr = Cache::MappedPtr; - - auto ptr = [](const std::string & s) - { - return MappedPtr(new std::string(s)); - }; - - Cache cache(10); - - bool res = true; - - ASSERT_CHECK(!cache.get("asd"), res); - - cache.set("asd", ptr("qwe")); - - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - - cache.set("zxcv", ptr("12345")); - cache.set("01234567891234567", ptr("--")); - - ASSERT_CHECK((*cache.get("zxcv") == "12345"), res); - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - ASSERT_CHECK((*cache.get("01234567891234567") == "--"), res); - ASSERT_CHECK(!cache.get("123x"), res); - - cache.set("321x", ptr("+")); - - ASSERT_CHECK(!cache.get("zxcv"), res); - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - ASSERT_CHECK((*cache.get("01234567891234567") == "--"), res); - ASSERT_CHECK(!cache.get("123x"), res); - ASSERT_CHECK((*cache.get("321x") == "+"), res); - - ASSERT_CHECK((cache.weight() == 6), res); - ASSERT_CHECK((cache.count() == 3), res); - - return res; -} - -bool test2() -{ - using namespace std::literals; - using Cache = DB::LRUCache, Weight>; - using MappedPtr = Cache::MappedPtr; - - auto ptr = [](const std::string & s) - { - return MappedPtr(new std::string(s)); - }; - - Cache cache(10, 3s); - - bool res = true; - - ASSERT_CHECK(!cache.get("asd"), res); - - cache.set("asd", ptr("qwe")); - - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - - cache.set("zxcv", ptr("12345")); - cache.set("01234567891234567", ptr("--")); - - ASSERT_CHECK((*cache.get("zxcv") == "12345"), res); - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - ASSERT_CHECK((*cache.get("01234567891234567") == "--"), res); - ASSERT_CHECK(!cache.get("123x"), res); - - cache.set("321x", ptr("+")); - - ASSERT_CHECK((cache.get("zxcv")), res); - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - ASSERT_CHECK((*cache.get("01234567891234567") == "--"), res); - ASSERT_CHECK(!cache.get("123x"), res); - ASSERT_CHECK((*cache.get("321x") == "+"), res); - - ASSERT_CHECK((cache.weight() == 11), res); - ASSERT_CHECK((cache.count() == 4), res); - - std::this_thread::sleep_for(5s); - - cache.set("123x", ptr("2769")); - - ASSERT_CHECK(!cache.get("zxcv"), res); - ASSERT_CHECK((*cache.get("asd") == "qwe"), res); - ASSERT_CHECK((*cache.get("01234567891234567") == "--"), res); - ASSERT_CHECK((*cache.get("321x") == "+"), res); - - ASSERT_CHECK((cache.weight() == 10), res); - ASSERT_CHECK((cache.count() == 4), res); - - return res; -} - -bool test_concurrent() -{ - using namespace std::literals; - - using Cache = DB::LRUCache, Weight>; - Cache cache(2); - - bool res = true; - - auto load_func = [](const std::string & result, std::chrono::seconds sleep_for, bool throw_exc) - { - std::this_thread::sleep_for(sleep_for); - if (throw_exc) - throw std::runtime_error("Exception!"); - return std::make_shared(result); - }; - - /// Case 1: Both threads are able to load the value. 
- - std::pair result1; - std::thread thread1([&]() - { - result1 = cache.getOrSet("key", [&]() { return load_func("val1", 1s, false); }); - }); - - std::pair result2; - std::thread thread2([&]() - { - result2 = cache.getOrSet("key", [&]() { return load_func("val2", 1s, false); }); - }); - - thread1.join(); - thread2.join(); - - ASSERT_CHECK((result1.first == result2.first), res); - ASSERT_CHECK((result1.second != result2.second), res); - - /// Case 2: One thread throws an exception during loading. - - cache.reset(); - - bool thrown = false; - thread1 = std::thread([&]() - { - try - { - cache.getOrSet("key", [&]() { return load_func("val1", 2s, true); }); - } - catch (...) - { - thrown = true; - } - }); - - thread2 = std::thread([&]() - { - std::this_thread::sleep_for(1s); - result2 = cache.getOrSet("key", [&]() { return load_func("val2", 1s, false); }); - }); - - thread1.join(); - thread2.join(); - - ASSERT_CHECK((thrown == true), res); - ASSERT_CHECK((result2.second == true), res); - ASSERT_CHECK((result2.first.get() == cache.get("key").get()), res); - ASSERT_CHECK((*result2.first == "val2"), res); - - /// Case 3: All threads throw an exception. - - cache.reset(); - - bool thrown1 = false; - thread1 = std::thread([&]() - { - try - { - cache.getOrSet("key", [&]() { return load_func("val1", 1s, true); }); - } - catch (...) - { - thrown1 = true; - } - }); - - bool thrown2 = false; - thread2 = std::thread([&]() - { - try - { - cache.getOrSet("key", [&]() { return load_func("val1", 1s, true); }); - } - catch (...) - { - thrown2 = true; - } - }); - - thread1.join(); - thread2.join(); - - ASSERT_CHECK((thrown1 == true), res); - ASSERT_CHECK((thrown2 == true), res); - ASSERT_CHECK((cache.get("key") == nullptr), res); - - /// Case 4: Concurrent reset. - - cache.reset(); - - thread1 = std::thread([&]() - { - result1 = cache.getOrSet("key", [&]() { return load_func("val1", 2s, false); }); - }); - - std::this_thread::sleep_for(1s); - cache.reset(); - - thread1.join(); - - ASSERT_CHECK((result1.second == true), res); - ASSERT_CHECK((*result1.first == "val1"), res); - ASSERT_CHECK((cache.get("key") == nullptr), res); - - return res; -} - -} - -int main() -{ - run(); - return 0; -} - diff --git a/dbms/src/Compression/CachedCompressedReadBuffer.cpp b/dbms/src/Compression/CachedCompressedReadBuffer.cpp index b39d04cf03f..a1bcb8a7d66 100644 --- a/dbms/src/Compression/CachedCompressedReadBuffer.cpp +++ b/dbms/src/Compression/CachedCompressedReadBuffer.cpp @@ -19,7 +19,7 @@ void CachedCompressedReadBuffer::initInput() { if (!file_in) { - file_in = createReadBufferFromFileBase(path, estimated_size, aio_threshold, buf_size); + file_in = createReadBufferFromFileBase(path, estimated_size, aio_threshold, mmap_threshold, buf_size); compressed_in = file_in.get(); if (profile_callback) @@ -73,10 +73,11 @@ bool CachedCompressedReadBuffer::nextImpl() CachedCompressedReadBuffer::CachedCompressedReadBuffer( - const std::string & path_, UncompressedCache * cache_, size_t estimated_size_, size_t aio_threshold_, + const std::string & path_, UncompressedCache * cache_, + size_t estimated_size_, size_t aio_threshold_, size_t mmap_threshold_, size_t buf_size_) : ReadBuffer(nullptr, 0), path(path_), cache(cache_), buf_size(buf_size_), estimated_size(estimated_size_), - aio_threshold(aio_threshold_), file_pos(0) + aio_threshold(aio_threshold_), mmap_threshold(mmap_threshold_), file_pos(0) { } diff --git a/dbms/src/Compression/CachedCompressedReadBuffer.h b/dbms/src/Compression/CachedCompressedReadBuffer.h index 
174ddb98587..52ef750ff19 100644 --- a/dbms/src/Compression/CachedCompressedReadBuffer.h +++ b/dbms/src/Compression/CachedCompressedReadBuffer.h @@ -26,6 +26,7 @@ private: size_t buf_size; size_t estimated_size; size_t aio_threshold; + size_t mmap_threshold; std::unique_ptr file_in; size_t file_pos; @@ -42,7 +43,8 @@ private: public: CachedCompressedReadBuffer( - const std::string & path_, UncompressedCache * cache_, size_t estimated_size_, size_t aio_threshold_, + const std::string & path_, UncompressedCache * cache_, + size_t estimated_size_, size_t aio_threshold_, size_t mmap_threshold_, size_t buf_size_ = DBMS_DEFAULT_BUFFER_SIZE); diff --git a/dbms/src/Compression/CompressedReadBufferFromFile.cpp b/dbms/src/Compression/CompressedReadBufferFromFile.cpp index e413c5e1086..22eaf9e15e8 100644 --- a/dbms/src/Compression/CompressedReadBufferFromFile.cpp +++ b/dbms/src/Compression/CompressedReadBufferFromFile.cpp @@ -33,9 +33,9 @@ bool CompressedReadBufferFromFile::nextImpl() CompressedReadBufferFromFile::CompressedReadBufferFromFile( - const std::string & path, size_t estimated_size, size_t aio_threshold, size_t buf_size) + const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, size_t buf_size) : BufferWithOwnMemory(0), - p_file_in(createReadBufferFromFileBase(path, estimated_size, aio_threshold, buf_size)), + p_file_in(createReadBufferFromFileBase(path, estimated_size, aio_threshold, mmap_threshold, buf_size)), file_in(*p_file_in) { compressed_in = &file_in; diff --git a/dbms/src/Compression/CompressedReadBufferFromFile.h b/dbms/src/Compression/CompressedReadBufferFromFile.h index 288a66e321a..641e3d6ed1b 100644 --- a/dbms/src/Compression/CompressedReadBufferFromFile.h +++ b/dbms/src/Compression/CompressedReadBufferFromFile.h @@ -30,7 +30,7 @@ private: public: CompressedReadBufferFromFile( - const std::string & path, size_t estimated_size, size_t aio_threshold, size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE); + const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE); void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block); diff --git a/dbms/src/Compression/CompressionCodecDoubleDelta.cpp b/dbms/src/Compression/CompressionCodecDoubleDelta.cpp index 17eeba9a152..dc4a5084c83 100644 --- a/dbms/src/Compression/CompressionCodecDoubleDelta.cpp +++ b/dbms/src/Compression/CompressionCodecDoubleDelta.cpp @@ -26,7 +26,7 @@ extern const int CANNOT_DECOMPRESS; namespace { -Int64 getMaxValueForByteSize(UInt8 byte_size) +inline Int64 getMaxValueForByteSize(Int8 byte_size) { switch (byte_size) { @@ -51,11 +51,56 @@ struct WriteSpec const UInt8 data_bits; }; -const std::array DELTA_SIZES{7, 9, 12, 32, 64}; +// delta size prefix and data lengths based on few high bits peeked from binary stream +static const WriteSpec WRITE_SPEC_LUT[32] = { + // 0b0 - 1-bit prefix, no data to read + /* 00000 */ {1, 0b0, 0}, + /* 00001 */ {1, 0b0, 0}, + /* 00010 */ {1, 0b0, 0}, + /* 00011 */ {1, 0b0, 0}, + /* 00100 */ {1, 0b0, 0}, + /* 00101 */ {1, 0b0, 0}, + /* 00110 */ {1, 0b0, 0}, + /* 00111 */ {1, 0b0, 0}, + /* 01000 */ {1, 0b0, 0}, + /* 01001 */ {1, 0b0, 0}, + /* 01010 */ {1, 0b0, 0}, + /* 01011 */ {1, 0b0, 0}, + /* 01100 */ {1, 0b0, 0}, + /* 01101 */ {1, 0b0, 0}, + /* 01110 */ {1, 0b0, 0}, + /* 01111 */ {1, 0b0, 0}, + + // 0b10 - 2 bit prefix, 7 bits of data + /* 10000 */ {2, 0b10, 7}, + /* 10001 */ {2, 0b10, 7}, + /* 10010 */ {2, 0b10, 7}, + /* 10011 */ {2, 0b10, 7}, + /* 10100 */ 
{2, 0b10, 7}, + /* 10101 */ {2, 0b10, 7}, + /* 10110 */ {2, 0b10, 7}, + /* 10111 */ {2, 0b10, 7}, + + // 0b110 - 3 bit prefix, 9 bits of data + /* 11000 */ {3, 0b110, 9}, + /* 11001 */ {3, 0b110, 9}, + /* 11010 */ {3, 0b110, 9}, + /* 11011 */ {3, 0b110, 9}, + + // 0b1110 - 4 bit prefix, 12 bits of data + /* 11100 */ {4, 0b1110, 12}, + /* 11101 */ {4, 0b1110, 12}, + + // 5-bit prefixes + /* 11110 */ {5, 0b11110, 32}, + /* 11111 */ {5, 0b11111, 64}, +}; + template WriteSpec getDeltaWriteSpec(const T & value) { + // TODO: speed things up a bit by counting the number of leading zeroes instead of doing lots of comparisons if (value > -63 && value < 64) { return WriteSpec{2, 0b10, 7}; @@ -107,14 +152,15 @@ UInt32 getCompressedDataSize(UInt8 data_bytes_size, UInt32 uncompressed_size) template UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) { - // Since only unsinged int has granted 2-compliment overflow handling, we are doing math here on unsigned types. - // To simplify and booletproof code, we operate enforce ValueType to be unsigned too. + // Since only unsigned int has guaranteed two's-complement overflow handling, + // we are doing math here only on unsigned types. + // To simplify and bulletproof the code, we enforce ValueType to be unsigned too. static_assert(is_unsigned_v, "ValueType must be unsigned."); using UnsignedDeltaType = ValueType; // We use signed delta type to turn huge unsigned values into smaller signed: // ffffffff => -1 - using SignedDeltaType = typename std::make_signed::type; + using SignedDeltaType = typename std::make_signed_t; if (source_size % sizeof(ValueType) != 0) throw Exception("Cannot compress, data size " + toString(source_size) @@ -149,8 +195,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) prev_value = curr_value; } - WriteBuffer buffer(dest, getCompressedDataSize(sizeof(ValueType), source_size - sizeof(ValueType)*2)); - BitWriter writer(buffer); + BitWriter writer(dest, getCompressedDataSize(sizeof(ValueType), source_size - sizeof(ValueType)*2)); int item = 2; for (; source < source_end; source += sizeof(ValueType), ++item) @@ -170,7 +215,8 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) else { const SignedDeltaType signed_dd = static_cast(double_delta); - const auto sign = std::signbit(signed_dd); + const auto sign = signed_dd < 0; + // -1 shrinks dd down to fit into the number of bits, and it can't be 0, so it is OK. const auto abs_value = static_cast(std::abs(signed_dd) - 1); const auto write_spec = getDeltaWriteSpec(signed_dd); @@ -183,7 +229,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest) writer.flush(); - return sizeof(items_count) + sizeof(prev_value) + sizeof(prev_delta) + buffer.count(); + return sizeof(items_count) + sizeof(prev_value) + sizeof(prev_delta) + writer.count() / 8; } template @@ -220,35 +266,28 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) dest += sizeof(prev_value); } - ReadBufferFromMemory buffer(source, source_size - sizeof(prev_value) - sizeof(prev_delta) - sizeof(items_count)); - BitReader reader(buffer); + BitReader reader(source, source_size - sizeof(prev_value) - sizeof(prev_delta) - sizeof(items_count)); // since data is tightly packed, up to 1 bit per value, and last byte is padded with zeroes, // we have to keep track of items to avoid reading more than there is.
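// An equivalent, branchy formulation of the 5-bit table lookup used in the loop
// below (illustrative names only; the LUT trades these comparisons for a single
// indexed load). The argument is the top 5 bits of the stream, i.e. the value of
// reader.peekByte() >> (8 - 5).
struct DemoSpec { unsigned prefix_bits; unsigned data_bits; };

inline DemoSpec demoDecodePrefix(unsigned high5)
{
    if ((high5 & 0b10000) == 0) return {1, 0};  // 0xxxx: double delta is zero, no payload
    if ((high5 & 0b01000) == 0) return {2, 7};  // 10xxx: sign bit + 6 payload bits
    if ((high5 & 0b00100) == 0) return {3, 9};  // 110xx: sign bit + 8 payload bits
    if ((high5 & 0b00010) == 0) return {4, 12}; // 1110x: sign bit + 11 payload bits
    if ((high5 & 0b00001) == 0) return {5, 32}; // 11110: 32-bit delta
    return {5, 64};                             // 11111: 64-bit delta
}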
for (UInt32 items_read = 2; items_read < items_count && !reader.eof(); ++items_read) { UnsignedDeltaType double_delta = 0; - if (reader.readBit() == 1) - { - UInt8 i = 0; - for (; i < sizeof(DELTA_SIZES) - 1; ++i) - { - const auto next_bit = reader.readBit(); - if (next_bit == 0) - { - break; - } - } + static_assert(sizeof(WRITE_SPEC_LUT)/sizeof(WRITE_SPEC_LUT[0]) == 32); // 5-bit prefix lookup table + const auto write_spec = WRITE_SPEC_LUT[reader.peekByte() >> (8 - 5)]; // only 5 high bits of peeked byte value + + reader.skipBufferedBits(write_spec.prefix_bits); // discard the prefix value, since we've already used it + if (write_spec.data_bits != 0) + { const UInt8 sign = reader.readBit(); - SignedDeltaType signed_dd = static_cast(reader.readBits(DELTA_SIZES[i] - 1) + 1); + SignedDeltaType signed_dd = static_cast(reader.readBits(write_spec.data_bits - 1) + 1); if (sign) { signed_dd *= -1; } double_delta = static_cast(signed_dd); } - // else if first bit is zero, no need to read more data. const UnsignedDeltaType delta = double_delta + prev_delta; const ValueType curr_value = prev_value + delta; diff --git a/dbms/src/Compression/CompressionCodecDoubleDelta.h b/dbms/src/Compression/CompressionCodecDoubleDelta.h index 3fdaf5f76d8..edbe2eea01d 100644 --- a/dbms/src/Compression/CompressionCodecDoubleDelta.h +++ b/dbms/src/Compression/CompressionCodecDoubleDelta.h @@ -5,6 +5,92 @@ namespace DB { +/** DoubleDelta column codec implementation. + * + * Based on Gorilla paper: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf, which was extended + * to support 64-bit types. The drawback is 1 extra bit for 32-bit wide deltas: 5-bit prefix + * instead of 4-bit prefix. + * + * This codec is best used against monotonic integer sequences with constant (or almost constant) + * stride, like an event timestamp in some monitoring application. + * + * Given input sequence a: [a0, a1, ... an]: + * + * First, write number of items (sizeof(int32)*8 bits): n + * Then write first item as is (sizeof(a[0])*8 bits): a[0] + * Second item is written as delta (sizeof(a[0])*8 bits): a[1] - a[0] + * Loop over remaining items and calculate double delta: + * double_delta = a[i] - 2 * a[i - 1] + a[i - 2] + * Write it in compact binary form with `BitWriter` + * if double_delta == 0: + * write 1 bit: 0 + * else if -63 < double_delta < 64: + * write 2 bit prefix: 10 + * write sign bit (1 if negative): x + * write 7-1 bits of abs(double_delta) - 1: xxxxxx + * else if -255 < double_delta < 256: + * write 3 bit prefix: 110 + * write sign bit (1 if negative): x + * write 9-1 bits of abs(double_delta) - 1: xxxxxxxx + * else if -2047 < double_delta < 2048: + * write 4 bit prefix: 1110 + * write sign bit (1 if negative): x + * write 12-1 bits of abs(double_delta) - 1: xxxxxxxxxxx + * else if double_delta fits into 32-bit int: + * write 5 bit prefix: 11110 + * write sign bit (1 if negative): x + * write 32-1 bits of abs(double_delta) - 1: xxxxxxxxxxx... + * else + * write 5 bit prefix: 11111 + * write sign bit (1 if negative): x + * write 64-1 bits of abs(double_delta) - 1: xxxxxxxxxxx...
+ * + * @example sequence of UInt8 values [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] is encoded as (codec header is omitted): + * + * .- 4-byte little-endian sequence length (10 == 0xa) + * | .- 1 byte (sizeof(UInt8)) a[0] : 0x01 + * | | .- 1 byte of delta: a[1] - a[0] = 2 - 1 = 1 : 0x01 + * | | | .- 8 zero bits since double delta for remaining 8 elements was 0 : 0x00 + * v_______________v___v___v___ + * \x0a\x00\x00\x00\x01\x01\x00 + * + * @example sequence of Int16 values [-10, 10, -20, 20, -40, 40] is encoded as: + * + * .- 4-byte little endian sequence length = 6 : 0x00000006 + * | .- 2 bytes (sizeof(Int16)) a[0] as UInt16 = -10 : 0xfff6 + * | | .- 2 bytes of delta: a[1] - a[0] = 10 - (-10) = 20 : 0x0014 + * | | | .- 4 encoded double deltas (see below) + * v_______________ v______ v______ v______________________ + * \x06\x00\x00\x00\xf6\xff\x14\x00\xb8\xe2\x2e\xb1\xe4\x58 + * + * 4 binary encoded double deltas (\xb8\xe2\x2e\xb1\xe4\x58): + * double_delta (DD) = -20 - 2 * 10 + (-10) = -50 + * .- 2-bit prefix : 0b10 + * | .- sign-bit : 0b1 + * | |.- abs(DD) - 1 = 49 : 0b110001 + * | || + * | || DD = 20 - 2 * (-20) + 10 = 70 + * | || .- 3-bit prefix : 0b110 + * | || | .- sign bit : 0b0 + * | || | |.- abs(DD) - 1 = 69 : 0b1000101 + * | || | || + * | || | || DD = -40 - 2 * 20 + (-20) = -100 + * | || | || .- 3-bit prefix : 0b110 + * | || | || | .- sign-bit : 0b0 + * | || | || | |.- abs(DD) - 1 = 99 : 0b1100011 + * | || | || | || + * | || | || | || DD = 40 - 2 * (-40) + 20 = 140 + * | || | || | || .- 3-bit prefix : 0b110 + * | || | || | || | .- sign bit : 0b0 + * | || | || | || | |.- abs(DD) - 1 = 139 : 0b10001011 + * | || | || | || | || + * V_vv______V__vv________V____vv_______V__vv________,- padding bits + * 10111000 11100010 00101110 10110001 11100100 01011000 + * + * Please also see unit tests for: + * * Examples on what output `BitWriter` produces on predefined input. + * * Compatibility tests solidifying encoded binary output on a set of predefined sequences.
+ */ class CompressionCodecDoubleDelta : public ICompressionCodec { public: diff --git a/dbms/src/Compression/CompressionCodecGorilla.cpp b/dbms/src/Compression/CompressionCodecGorilla.cpp index 574e40b06bf..62e7a81aae9 100644 --- a/dbms/src/Compression/CompressionCodecGorilla.cpp +++ b/dbms/src/Compression/CompressionCodecGorilla.cpp @@ -112,8 +112,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest, dest += sizeof(prev_value); } - WriteBuffer buffer(dest, dest_end - dest); - BitWriter writer(buffer); + BitWriter writer(dest, dest_end - dest); while (source < source_end) { @@ -148,7 +147,7 @@ UInt32 compressDataForType(const char * source, UInt32 source_size, char * dest, writer.flush(); - return sizeof(items_count) + sizeof(prev_value) + buffer.count(); + return sizeof(items_count) + sizeof(prev_value) + writer.count() / 8; } template @@ -174,8 +173,7 @@ void decompressDataForType(const char * source, UInt32 source_size, char * dest) dest += sizeof(prev_value); } - ReadBufferFromMemory buffer(source, source_size - sizeof(items_count) - sizeof(prev_value)); - BitReader reader(buffer); + BitReader reader(source, source_size - sizeof(items_count) - sizeof(prev_value)); binary_value_info prev_xored_info{0, 0, 0}; diff --git a/dbms/src/Compression/CompressionCodecGorilla.h b/dbms/src/Compression/CompressionCodecGorilla.h index 0bbd220cb59..94941b7aa06 100644 --- a/dbms/src/Compression/CompressionCodecGorilla.h +++ b/dbms/src/Compression/CompressionCodecGorilla.h @@ -5,6 +5,89 @@ namespace DB { +/** Gorilla column codec implementation. + * + * Based on Gorilla paper: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf + * + * This codec is best used against monotonic floating-point sequences, like CPU usage percentage + * or any other gauge. + * + * Given input sequence a: [a0, a1, ... an] + * + * First, write number of items (sizeof(int32)*8 bits): n + * Then write first item as is (sizeof(a[0])*8 bits): a[0] + * Loop over remaining items and calculate xor_diff: + * xor_diff = a[i] ^ a[i - 1] (e.g. 00000011'10110100) + * Write it in compact binary form with `BitWriter` + * if xor_diff == 0: + * write 1 bit: 0 + * else: + * calculate leading zero bits (lzb) + * and trailing zero bits (tzb) of xor_diff, + * compare to lzb and tzb of previous xor_diff + * (X = sizeof(a[i]) * 8, e.g. X = 16, lzb = 6, tzb = 2) + * if lzb >= prev_lzb && tzb >= prev_tzb: + * (e.g. prev_lzb=4, prev_tzb=1) + * write 2 bit prefix: 0b10 + * write xor_diff >> prev_tzb (X - prev_lzb - prev_tzb bits): 0b00111011010 + * (where X = sizeof(a[i]) * 8, e.g. 16) + * else: + * write 2 bit prefix: 0b11 + * write 5 bits of lzb: 0b00110 + * write 6 bits of (X - lzb - tzb)=(16-6-2)=8: 0b001000 + * write (X - lzb - tzb) non-zero bits of xor_diff: 0b11101101 + * prev_lzb = lzb + * prev_tzb = tzb + * + * @example sequence of Float32 values [0.1, 0.1, 0.11, 0.2, 0.1] is encoded as: + * + * .- 4-byte little endian sequence length: 5 : 0x00000005 + * | .- 4 byte (sizeof(Float32)) a[0] as UInt32 : 0.1 : 0xcdcccc3d + * | | .- 4 encoded xor diffs (see below) + * v_______________ v______________ v__________________________________________________ + * \x05\x00\x00\x00\xcd\xcc\xcc\x3d\x6a\x5a\xd8\xb6\x3c\xcd\x75\xb1\x6c\x77\x00\x00\x00 + * + * 4 binary encoded xor diffs (\x6a\x5a\xd8\xb6\x3c\xcd\x75\xb1\x6c\x77\x00\x00\x00): + * + * ...........................................
+ a[i-1] = 00111101110011001100110011001101 + a[i] = 00111101110011001100110011001101 + xor_diff = 00000000000000000000000000000000 + .- 1-bit prefix : 0b0 + | + | ........................................... + | a[i-1] = 00111101110011001100110011001101 + | a[i] = 00111101111000010100011110101110 + | xor_diff = 00000000001011011000101101100011 + | lzb = 10 + | tzb = 0 + |.- 2-bit prefix : 0b11 + || .- lzb (10) : 0b1010 + || | .- data length (32-10-0): 22 : 0b010110 + || | | .- data : 0b1011011000101101100011 + || | | | + || | | | ........................................... + || | | | a[i-1] = 00111101111000010100011110101110 + || | | | a[i] = 00111110010011001100110011001101 + || | | | xor_diff = 00000011101011011000101101100011 + || | | | .- 2-bit prefix : 0b11 + || | | | | .- lzb = 6 : 0b00110 + || | | | | | .- data length = (32 - 6) = 26 : 0b011010 + || | | | | | | .- data : 0b11101011011000101101100011 + || | | | | | | | + || | | | | | | | ........................................... + || | | | | | | | a[i-1] = 00111110010011001100110011001101 + || | | | | | | | a[i] = 00111101110011001100110011001101 + || | | | | | | | xor_diff = 00000011100000000000000000000000 + || | | | | | | | .- 2-bit prefix : 0b10 + || | | | | | | | | .- data : 0b11100000000000000000000000 + VV_v____ v_____v________________________V_v_____v______v____________________________V_v_____________________________ + 01101010 01011010 11011000 10110110 00111100 11001101 01110101 10110001 01101100 01110111 00000000 00000000 00000000 + + Please also see unit tests for: + * Examples on what output `BitWriter` produces on predefined input. + * Compatibility tests solidifying encoded binary output on a set of predefined sequences.
+ */ class CompressionCodecGorilla : public ICompressionCodec { public: diff --git a/dbms/src/Compression/tests/cached_compressed_read_buffer.cpp b/dbms/src/Compression/tests/cached_compressed_read_buffer.cpp index fb30d691745..01dcd5a9fcd 100644 --- a/dbms/src/Compression/tests/cached_compressed_read_buffer.cpp +++ b/dbms/src/Compression/tests/cached_compressed_read_buffer.cpp @@ -32,7 +32,7 @@ int main(int argc, char ** argv) { Stopwatch watch; - CachedCompressedReadBuffer in(path, &cache, 0, 0); + CachedCompressedReadBuffer in(path, &cache, 0, 0, 0); WriteBufferFromFile out("/dev/null"); copyData(in, out); @@ -44,7 +44,7 @@ int main(int argc, char ** argv) { Stopwatch watch; - CachedCompressedReadBuffer in(path, &cache, 0, 0); + CachedCompressedReadBuffer in(path, &cache, 0, 0, 0); WriteBufferFromFile out("/dev/null"); copyData(in, out); diff --git a/dbms/src/Compression/tests/gtest_compressionCodec.cpp b/dbms/src/Compression/tests/gtest_compressionCodec.cpp index 32fff70d564..95bef3b691e 100644 --- a/dbms/src/Compression/tests/gtest_compressionCodec.cpp +++ b/dbms/src/Compression/tests/gtest_compressionCodec.cpp @@ -1,6 +1,7 @@ #include #include +#include #include #include #include @@ -62,6 +63,32 @@ std::vector operator+(std::vector && left, std::vector && right) namespace { +template +struct AsHexStringHelper +{ + const T & container; +}; + +template +std::ostream & operator << (std::ostream & ostr, const AsHexStringHelper & helper) +{ + ostr << std::hex; + for (const auto & e : helper.container) + { + ostr << "\\x" << std::setw(2) << std::setfill('0') << (static_cast(e) & 0xFF); + } + + return ostr; +} + +template +AsHexStringHelper AsHexString(const T & container) +{ + static_assert (sizeof(container[0]) == 1 && std::is_pod>::value, "Only works on containers of byte-size PODs."); + + return AsHexStringHelper{container}; +} + template std::string bin(const T & value, size_t bits = sizeof(T)*8) { @@ -113,10 +140,71 @@ DataTypePtr makeDataType() #undef MAKE_DATA_TYPE - assert(false && "unsupported size"); + assert(false && "unknown datatype"); return nullptr; } +template +class BinaryDataAsSequenceOfValuesIterator +{ + const Container & container; + const void * data; + const void * data_end; + + T current_value; + +public: + using Self = BinaryDataAsSequenceOfValuesIterator; + + explicit BinaryDataAsSequenceOfValuesIterator(const Container & container_) + : container(container_), + data(&container[0]), + data_end(reinterpret_cast(data) + container.size()), + current_value(T{}) + { + static_assert(sizeof(container[0]) == 1 && std::is_pod>::value, "Only works on containers of byte-size PODs."); + read(); + } + + const T & operator*() const + { + return current_value; + } + + size_t ItemsLeft() const + { + return reinterpret_cast(data_end) - reinterpret_cast(data); + } + + Self & operator++() + { + read(); + return *this; + } + + operator bool() const + { + return ItemsLeft() > 0; + } + +private: + void read() + { + if (!*this) + { + throw std::runtime_error("No more data to read"); + } + + current_value = unalignedLoad(data); + data = reinterpret_cast(data) + sizeof(T); + } +}; + +template +BinaryDataAsSequenceOfValuesIterator AsSequenceOf(const Container & container) +{ + return BinaryDataAsSequenceOfValuesIterator(container); +} template ::testing::AssertionResult EqualByteContainersAs(const ContainerLeft & left, const ContainerRight & right) @@ -126,9 +214,6 @@ template ::testing::AssertionResult result = ::testing::AssertionSuccess(); - ReadBufferFromMemory 
left_read_buffer(left.data(), left.size()); - ReadBufferFromMemory right_read_buffer(right.data(), right.size()); - const auto l_size = left.size() / sizeof(T); const auto r_size = right.size() / sizeof(T); const auto size = std::min(l_size, r_size); @@ -137,16 +222,25 @@ template { result = ::testing::AssertionFailure() << "size mismatch" << " expected: " << l_size << " got:" << r_size; } + if (l_size == 0 || r_size == 0) + { + return result; + } + + auto l = AsSequenceOf(left); + auto r = AsSequenceOf(right); const auto MAX_MISMATCHING_ITEMS = 5; int mismatching_items = 0; - for (int i = 0; i < size; ++i) - { - T left_value{}; - left_read_buffer.readStrict(reinterpret_cast(&left_value), sizeof(left_value)); + size_t i = 0; - T right_value{}; - right_read_buffer.readStrict(reinterpret_cast(&right_value), sizeof(right_value)); + while (l && r) + { + const auto left_value = *l; + const auto right_value = *r; + ++l; + ++r; + ++i; if (left_value != right_value) { @@ -157,25 +251,47 @@ template if (++mismatching_items <= MAX_MISMATCHING_ITEMS) { - result << "mismatching " << sizeof(T) << "-byte item #" << i + result << "\nmismatching " << sizeof(T) << "-byte item #" << i << "\nexpected: " << bin(left_value) << " (0x" << std::hex << left_value << ")" - << "\ngot : " << bin(right_value) << " (0x" << std::hex << right_value << ")" - << std::endl; + << "\ngot : " << bin(right_value) << " (0x" << std::hex << right_value << ")"; if (mismatching_items == MAX_MISMATCHING_ITEMS) { - result << "..." << std::endl; + result << "\n..." << std::endl; } } } } if (mismatching_items > 0) { - result << "\ntotal mismatching items:" << mismatching_items << " of " << size; + result << "total mismatching items:" << mismatching_items << " of " << size; } return result; } +template +::testing::AssertionResult EqualByteContainers(UInt8 element_size, const ContainerLeft & left, const ContainerRight & right) +{ + switch (element_size) + { + case 1: + return EqualByteContainersAs(left, right); + break; + case 2: + return EqualByteContainersAs(left, right); + break; + case 4: + return EqualByteContainersAs(left, right); + break; + case 8: + return EqualByteContainersAs(left, right); + break; + default: + assert(false && "Invalid element_size"); + return ::testing::AssertionFailure() << "Invalid element_size: " << element_size; + } +} + struct Codec { std::string codec_statement; @@ -214,20 +330,23 @@ struct CodecTestSequence CodecTestSequence & operator=(const CodecTestSequence &) = default; CodecTestSequence(CodecTestSequence &&) = default; CodecTestSequence & operator=(CodecTestSequence &&) = default; + + CodecTestSequence & append(const CodecTestSequence & other) + { + assert(data_type->equals(*other.data_type)); + + serialized_data.insert(serialized_data.end(), other.serialized_data.begin(), other.serialized_data.end()); + if (!name.empty()) + name += " + "; + name += other.name; + + return *this; + } }; -CodecTestSequence operator+(CodecTestSequence && left, CodecTestSequence && right) +CodecTestSequence operator+(CodecTestSequence && left, const CodecTestSequence & right) { - assert(left.data_type->equals(*right.data_type)); - - std::vector data(std::move(left.serialized_data)); - data.insert(data.end(), right.serialized_data.begin(), right.serialized_data.end()); - - return CodecTestSequence{ - left.name + " + " + right.name, - std::move(data), - std::move(left.data_type) - }; + return left.append(right); } template @@ -288,17 +407,22 @@ CodecTestSequence makeSeq(Args && ... 
args) }; } -template -CodecTestSequence generateSeq(Generator gen, const char* gen_name, size_t Begin = 0, size_t End = 10000) +template +CodecTestSequence generateSeq(Generator gen, const char* gen_name, B Begin = 0, E End = 10000) { - assert (End >= Begin); - + const auto direction = std::signbit(End - Begin) ? -1 : 1; std::vector data(sizeof(T) * (End - Begin)); char * write_pos = data.data(); - for (size_t i = Begin; i < End; ++i) + for (auto i = Begin; i < End; i += direction) { const T v = gen(static_cast(i)); + +// if constexpr (debug_log_items) +// { +// std::cerr << "#" << i << " " << type_name() << "(" << sizeof(T) << " bytes) : " << v << std::endl; +// } + unalignedStore(write_pos, v); write_pos += sizeof(v); } @@ -310,6 +434,96 @@ CodecTestSequence generateSeq(Generator gen, const char* gen_name, size_t Begin }; } +struct NoOpTimer +{ + void start() {} + void report(const char*) {} +}; + +struct StopwatchTimer +{ + explicit StopwatchTimer(clockid_t clock_type, size_t estimated_marks = 32) + : stopwatch(clock_type) + { + results.reserve(estimated_marks); + } + + void start() + { + stopwatch.restart(); + } + + void report(const char * mark) + { + results.emplace_back(mark, stopwatch.elapsed()); + } + + void stop() + { + stopwatch.stop(); + } + + const std::vector> & getResults() const + { + return results; + } + +private: + Stopwatch stopwatch; + std::vector> results; +}; + +CompressionCodecPtr makeCodec(const std::string & codec_string, const DataTypePtr data_type) +{ + const std::string codec_statement = "(" + codec_string + ")"; + Tokens tokens(codec_statement.begin().base(), codec_statement.end().base()); + IParser::Pos token_iterator(tokens); + + Expected expected; + ASTPtr codec_ast; + ParserCodec parser; + + parser.parse(token_iterator, codec_ast, expected); + + return CompressionCodecFactory::instance().get(codec_ast, data_type); +} + +template +void testTranscoding(Timer & timer, ICompressionCodec & codec, const CodecTestSequence & test_sequence, std::optional expected_compression_ratio = std::optional{}) +{ + const auto & source_data = test_sequence.serialized_data; + + const UInt32 encoded_max_size = codec.getCompressedReserveSize(source_data.size()); + PODArray encoded(encoded_max_size); + + timer.start(); + + const UInt32 encoded_size = codec.compress(source_data.data(), source_data.size(), encoded.data()); + timer.report("encoding"); + + encoded.resize(encoded_size); + + PODArray decoded(source_data.size()); + + timer.start(); + const UInt32 decoded_size = codec.decompress(encoded.data(), encoded.size(), decoded.data()); + timer.report("decoding"); + + decoded.resize(decoded_size); + + ASSERT_TRUE(EqualByteContainers(test_sequence.data_type->getSizeOfValueInMemory(), source_data, decoded)); + + const auto header_size = codec.getHeaderSize(); + const auto compression_ratio = (encoded_size - header_size) / (source_data.size() * 1.0); + + if (expected_compression_ratio) + { + ASSERT_LE(compression_ratio, *expected_compression_ratio) + << "\n\tdecoded size: " << source_data.size() + << "\n\tencoded size: " << encoded_size + << "(no header: " << encoded_size - header_size << ")"; + } +} class CodecTest : public ::testing::TestWithParam> { @@ -320,67 +534,18 @@ public: CODEC_WITHOUT_DATA_TYPE, }; - CompressionCodecPtr makeCodec(MakeCodecParam with_data_type) const + CompressionCodecPtr makeCodec(MakeCodecParam with_data_type) { const auto & codec_string = std::get<0>(GetParam()).codec_statement; const auto & data_type = with_data_type == CODEC_WITH_DATA_TYPE ? 
std::get<1>(GetParam()).data_type : nullptr; - const std::string codec_statement = "(" + codec_string + ")"; - Tokens tokens(codec_statement.begin().base(), codec_statement.end().base()); - IParser::Pos token_iterator(tokens); - - Expected expected; - ASTPtr codec_ast; - ParserCodec parser; - - parser.parse(token_iterator, codec_ast, expected); - - return CompressionCodecFactory::instance().get(codec_ast, data_type); + return ::makeCodec(codec_string, data_type); } void testTranscoding(ICompressionCodec & codec) { - const auto & test_sequence = std::get<1>(GetParam()); - const auto & source_data = test_sequence.serialized_data; - - const UInt32 encoded_max_size = codec.getCompressedReserveSize(source_data.size()); - PODArray encoded(encoded_max_size); - - const UInt32 encoded_size = codec.compress(source_data.data(), source_data.size(), encoded.data()); - encoded.resize(encoded_size); - - PODArray decoded(source_data.size()); - const UInt32 decoded_size = codec.decompress(encoded.data(), encoded.size(), decoded.data()); - decoded.resize(decoded_size); - - switch (test_sequence.data_type->getSizeOfValueInMemory()) - { - case 1: - ASSERT_TRUE(EqualByteContainersAs(source_data, decoded)); - break; - case 2: - ASSERT_TRUE(EqualByteContainersAs(source_data, decoded)); - break; - case 4: - ASSERT_TRUE(EqualByteContainersAs(source_data, decoded)); - break; - case 8: - ASSERT_TRUE(EqualByteContainersAs(source_data, decoded)); - break; - default: - FAIL() << "Invalid test sequence data type: " << test_sequence.data_type->getName(); - } - const auto header_size = codec.getHeaderSize(); - const auto compression_ratio = (encoded_size - header_size) / (source_data.size() * 1.0); - - const auto & codec_spec = std::get<0>(GetParam()); - if (codec_spec.expected_compression_ratio) - { - ASSERT_LE(compression_ratio, *codec_spec.expected_compression_ratio) - << "\n\tdecoded size: " << source_data.size() - << "\n\tencoded size: " << encoded_size - << "(no header: " << encoded_size - header_size << ")"; - } + NoOpTimer timer; + ::testTranscoding(timer, codec, std::get<1>(GetParam()), std::get<0>(GetParam()).expected_compression_ratio); } }; @@ -396,10 +561,121 @@ TEST_P(CodecTest, TranscodingWithoutDataType) testTranscoding(*codec); } +// Param is a tuple-of-tuples to simplify instantiating with values, since typically a group of cases tests only one codec. +class CodecTest_Compatibility : public ::testing::TestWithParam>> +{}; + +// Check that the input sequence, when encoded, matches the expected binary string. +TEST_P(CodecTest_Compatibility, Encoding) +{ + const auto & codec_spec = std::get<0>(GetParam()); + const auto & [data_sequence, expected] = std::get<1>(GetParam()); + const auto codec = makeCodec(codec_spec.codec_statement, data_sequence.data_type); + + const auto & source_data = data_sequence.serialized_data; + + // Just encode the data with codec + const UInt32 encoded_max_size = codec->getCompressedReserveSize(source_data.size()); + PODArray encoded(encoded_max_size); + + const UInt32 encoded_size = codec->compress(source_data.data(), source_data.size(), encoded.data()); + encoded.resize(encoded_size); + SCOPED_TRACE(::testing::Message("encoded: ") << AsHexString(encoded)); + + ASSERT_TRUE(EqualByteContainersAs(expected, encoded)); +} +
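For illustration only, not part of the patch: the pinned byte strings used by these compatibility cases can be regenerated after a deliberate change of the encoded format by compressing the reference sequence and printing it with the AsHexString helper defined above. A minimal sketch, assuming the file-scope makeCodec introduced by this patch; the dumpEncoded name is hypothetical:

void dumpEncoded(const std::string & codec_string, const CodecTestSequence & seq)
{
    /// Build the codec the same way the tests do and compress the reference sequence.
    const auto codec = makeCodec(codec_string, seq.data_type);
    const auto & source_data = seq.serialized_data;

    PODArray<char> encoded(codec->getCompressedReserveSize(source_data.size()));
    const UInt32 encoded_size = codec->compress(source_data.data(), source_data.size(), encoded.data());
    encoded.resize(encoded_size);

    /// Print as \xNN escapes, ready to paste into a BIN_STR(...) literal.
    std::cerr << AsHexString(encoded) << std::endl;
}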
+// Check that the binary string is decoded exactly back into the input sequence. +TEST_P(CodecTest_Compatibility, Decoding) +{ + const auto & codec_spec = std::get<0>(GetParam()); + const auto & [expected, encoded_data] = std::get<1>(GetParam()); + const auto codec = makeCodec(codec_spec.codec_statement, expected.data_type); + + PODArray decoded(expected.serialized_data.size()); + const UInt32 decoded_size = codec->decompress(encoded_data.c_str(), encoded_data.size(), decoded.data()); + decoded.resize(decoded_size); + + ASSERT_TRUE(EqualByteContainers(expected.data_type->getSizeOfValueInMemory(), expected.serialized_data, decoded)); +} + +class CodecTest_Performance : public ::testing::TestWithParam> +{}; + +TEST_P(CodecTest_Performance, TranscodingWithDataType) +{ + const auto & [codec_spec, test_seq] = GetParam(); + const auto codec = ::makeCodec(codec_spec.codec_statement, test_seq.data_type); + + const auto runs = 10; + std::map> results; + + for (size_t i = 0; i < runs; ++i) + { + StopwatchTimer timer{CLOCK_THREAD_CPUTIME_ID}; + ::testTranscoding(timer, *codec, test_seq); + timer.stop(); + + for (const auto & [label, value] : timer.getResults()) + { + results[label].push_back(value); + } + } + + auto computeMeanAndStdDev = [](const auto & values) + { + double mean{}; + + if (values.size() < 2) + return std::make_tuple(mean, double{}); + + using ValueType = typename std::decay_t::value_type; + std::vector tmp_v(std::begin(values), std::end(values)); + std::sort(tmp_v.begin(), tmp_v.end()); + + // remove min and max + tmp_v.erase(tmp_v.begin()); + tmp_v.erase(tmp_v.end() - 1); + + for (const auto & v : tmp_v) + { + mean += v; + } + + mean = mean / tmp_v.size(); + double std_dev = 0.0; + for (const auto & v : tmp_v) + { + const auto d = (v - mean); + std_dev += (d * d); + } + std_dev = std::sqrt(std_dev / tmp_v.size()); + + return std::make_tuple(mean, std_dev); + }; + + std::cerr << codec_spec.codec_statement + << " " << test_seq.data_type->getName() + << " (" << test_seq.serialized_data.size() << " bytes, " + << std::hex << CityHash_v1_0_2::CityHash64(test_seq.serialized_data.data(), test_seq.serialized_data.size()) << std::dec + << ", average of " << runs << " runs, μs)"; + + for (const auto & k : {"encoding", "decoding"}) + { + const auto & values = results[k]; + const auto & [mean, std_dev] = computeMeanAndStdDev(values); + // Ensure that the coefficient of variation is reasonably low, otherwise these numbers are meaningless + EXPECT_GT(0.05, std_dev / mean); + std::cerr << "\t" << std::fixed << std::setprecision(1) << mean / 1000.0; + } + + std::cerr << std::endl; +} + /////////////////////////////////////////////////////////////////////////////////////////////////// // Here we use generators to produce test payload for codecs. // Generator is a callable that can produce infinite number of values, -// output value MUST be of the same type input value. +// output value MUST be of the same type as input value. /////////////////////////////////////////////////////////////////////////////////////////////////// auto SameValueGenerator = [](auto value) @@ -543,6 +819,23 @@ std::vector generatePyramidOfSequences(const size_t sequences return sequences; }; +// Just as if all sequences from generatePyramidOfSequences were appended one-by-one to the first one. 
+template +CodecTestSequence generatePyramidSequence(const size_t sequences_count, Generator && generator, const char* generator_name) +{ + CodecTestSequence sequence; + sequence.data_type = makeDataType(); + sequence.serialized_data.reserve(sequences_count * sequences_count * sizeof(T)); + + for (size_t i = 1; i < sequences_count; ++i) + { + std::string name = generator_name + std::string(" from 0 to ") + std::to_string(i); + sequence.append(generateSeq(std::forward(generator), name.c_str(), 0, i)); + } + + return sequence; +}; + // helper macro to produce human-friendly sequence name from generator #define G(generator) generator, #generator @@ -575,7 +868,7 @@ INSTANTIATE_TEST_CASE_P(SmallSequences, ::testing::Combine( DefaultCodecsToTest, ::testing::ValuesIn( - generatePyramidOfSequences(42, G(SequentialGenerator(1))) + generatePyramidOfSequences(42, G(SequentialGenerator(1))) + generatePyramidOfSequences(42, G(SequentialGenerator(1))) + generatePyramidOfSequences(42, G(SequentialGenerator(1))) + generatePyramidOfSequences(42, G(SequentialGenerator(1))) @@ -609,7 +902,7 @@ INSTANTIATE_TEST_CASE_P(SameValueInt, ::testing::Combine( DefaultCodecsToTest, ::testing::Values( - generateSeq(G(SameValueGenerator(1000))), + generateSeq(G(SameValueGenerator(1000))), generateSeq(G(SameValueGenerator(1000))), generateSeq(G(SameValueGenerator(1000))), generateSeq(G(SameValueGenerator(1000))), @@ -626,7 +919,7 @@ INSTANTIATE_TEST_CASE_P(SameNegativeValueInt, ::testing::Combine( DefaultCodecsToTest, ::testing::Values( - generateSeq(G(SameValueGenerator(-1000))), + generateSeq(G(SameValueGenerator(-1000))), generateSeq(G(SameValueGenerator(-1000))), generateSeq(G(SameValueGenerator(-1000))), generateSeq(G(SameValueGenerator(-1000))), @@ -671,7 +964,7 @@ INSTANTIATE_TEST_CASE_P(SequentialInt, ::testing::Combine( DefaultCodecsToTest, ::testing::Values( - generateSeq(G(SequentialGenerator(1))), + generateSeq(G(SequentialGenerator(1))), generateSeq(G(SequentialGenerator(1))), generateSeq(G(SequentialGenerator(1))), generateSeq(G(SequentialGenerator(1))), @@ -690,7 +983,7 @@ INSTANTIATE_TEST_CASE_P(SequentialReverseInt, ::testing::Combine( DefaultCodecsToTest, ::testing::Values( - generateSeq(G(SequentialGenerator(-1))), + generateSeq(G(SequentialGenerator(-1))), generateSeq(G(SequentialGenerator(-1))), generateSeq(G(SequentialGenerator(-1))), generateSeq(G(SequentialGenerator(-1))), @@ -735,10 +1028,10 @@ INSTANTIATE_TEST_CASE_P(MonotonicInt, ::testing::Combine( DefaultCodecsToTest, ::testing::Values( - generateSeq(G(MonotonicGenerator(1, 5))), - generateSeq(G(MonotonicGenerator(1, 5))), - generateSeq(G(MonotonicGenerator(1, 5))), - generateSeq(G(MonotonicGenerator(1, 5))), + generateSeq(G(MonotonicGenerator(1, 5))), + generateSeq(G(MonotonicGenerator(1, 5))), + generateSeq(G(MonotonicGenerator(1, 5))), + generateSeq(G(MonotonicGenerator(1, 5))), generateSeq(G(MonotonicGenerator(1, 5))), generateSeq(G(MonotonicGenerator(1, 5))), generateSeq(G(MonotonicGenerator(1, 5))), @@ -752,11 +1045,11 @@ INSTANTIATE_TEST_CASE_P(MonotonicReverseInt, ::testing::Combine( DefaultCodecsToTest, ::testing::Values( - generateSeq(G(MonotonicGenerator(-1, 5))), - generateSeq(G(MonotonicGenerator(-1, 5))), - generateSeq(G(MonotonicGenerator(-1, 5))), - generateSeq(G(MonotonicGenerator(-1, 5))), - generateSeq(G(MonotonicGenerator(-1, 5))), + generateSeq(G(MonotonicGenerator(-1, 5))), + generateSeq(G(MonotonicGenerator(-1, 5))), + generateSeq(G(MonotonicGenerator(-1, 5))), + generateSeq(G(MonotonicGenerator(-1, 5))), + 
generateSeq(G(MonotonicGenerator(-1, 5))), generateSeq(G(MonotonicGenerator(-1, 5))), generateSeq(G(MonotonicGenerator(-1, 5))), generateSeq(G(MonotonicGenerator(-1, 5))) @@ -862,4 +1155,191 @@ INSTANTIATE_TEST_CASE_P(OverflowFloat, ), ); +template +auto DDCompatibilityTestSequence() +{ + // Generates sequences with double delta in given range. + auto ddGenerator = [prev_delta = static_cast(0), prev = static_cast(0)](auto dd) mutable + { + const auto curr = dd + prev + prev_delta; + prev = curr; + prev_delta = dd + prev_delta; + return curr; + }; + + auto ret = generateSeq(G(SameValueGenerator(42)), 0, 3); + + // These values are from DoubleDelta paper (and implementation) and represent points at which DD encoded length is changed. + // A DD value less than this point is encoded in shorter binary form (bigger - longer binary). + const Int64 dd_corner_points[] = {-63, 64, -255, 256, -2047, 2048, std::numeric_limits::min(), std::numeric_limits::max()}; + for (const auto & p : dd_corner_points) + { + if (std::abs(p) > std::numeric_limits::max()) + { + break; + } + + // - 4 is to allow DD value to settle before transitioning through important point, + // since DD depends on 2 previous values of data, + 2 is arbitrary. + ret.append(generateSeq(G(ddGenerator), p - 4, p + 2)); + } + + return ret; +} + +#define BIN_STR(x) std::string{x, sizeof(x) - 1} + +INSTANTIATE_TEST_CASE_P(DoubleDelta, + CodecTest_Compatibility, + ::testing::Combine( + ::testing::Values(Codec("DoubleDelta")), + ::testing::ValuesIn(std::initializer_list>{ + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\x21\x00\x00\x00\x0f\x00\x00\x00\x01\x00\x0f\x00\x00\x00\x2a\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xb1\xaa\xf4\xf6\x7d\x87\xf8\x80") + }, + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\x27\x00\x00\x00\x15\x00\x00\x00\x01\x00\x15\x00\x00\x00\x2a\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xb1\xaa\xf4\xf6\x7d\x87\xf8\x81\x8e\xd0\xca\x02\x01\x01") + }, + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\x70\x00\x00\x00\x4e\x00\x00\x00\x02\x00\x27\x00\x00\x00\x2a\x00\x00\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xbc\xe3\x5d\xa3\xd3\xd9\xf6\x1f\xe2\x07\x7c\x47\x20\x67\x48\x07\x47\xff\x47\xf6\xfe\xf8\x00\x00\x70\x6b\xd0\x00\x02\x83\xd9\xfb\x9f\xdc\x1f\xfc\x20\x1e\x80\x00\x22\xc8\xf0\x00\x00\x66\x67\xa0\x00\x02\x00\x3d\x00\x00\x0f\xff\xe8\x00\x00\x7f\xee\xff\xdf\x40\x00\x0f\xf2\x78\x00\x01\x7f\x83\x9f\xf7\x9f\xfb\xc0\x00\x00\xff\xfe\x00\x00\x08\x00") + }, + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\x70\x00\x00\x00\x4e\x00\x00\x00\x02\x00\x27\x00\x00\x00\x2a\x00\x00\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xbc\xe3\x5d\xa3\xd3\xd9\xf6\x1f\xe2\x07\x7c\x47\x20\x67\x48\x07\x47\xff\x47\xf6\xfe\xf8\x00\x00\x70\x6b\xd0\x00\x02\x83\xd9\xfb\x9f\xdc\x1f\xfc\x20\x1e\x80\x00\x22\xc8\xf0\x00\x00\x66\x67\xa0\x00\x02\x00\x3d\x00\x00\x0f\xff\xe8\x00\x00\x7f\xee\xff\xdf\x40\x00\x0f\xf2\x78\x00\x01\x7f\x83\x9f\xf7\x9f\xfb\xc0\x00\x00\xff\xfe\x00\x00\x08\x00") + }, + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\x74\x00\x00\x00\x9c\x00\x00\x00\x04\x00\x27\x00\x00\x00\x2a\x00\x00\x00\x00\x00\x00\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xbc\xe3\x5d\xa3\xd3\xd9\xf6\x1f\xe2\x07\x7c\x47\x20\x67\x48\x07\x47\xff\x47\xf6\xfe\xf8\x00\x00\x70\x6b\xd0\x00\x02\x83\xd9\xfb\x9f\xdc\x1f\xfc\x20\x1e\x80\x00\x22\xc8\xf0\x00\x00\x66\x67\xa0\x00\x02\x00\x3d\x00\x00\x0f\xff\xe8\x00\x00\x7f\xee\xff\xdf\x00\x00\x70\x0d\x7a\x00\x02\x80\x7b\x9f\xf7\x9f\xfb\xc0\x00\x00\xff\xfe\x00\x00\x08\x00") + }, + { + DDCompatibilityTestSequence(), 
BIN_STR("\x94\xb5\x00\x00\x00\xcc\x00\x00\x00\x04\x00\x33\x00\x00\x00\x2a\x00\x00\x00\x00\x00\x00\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xbc\xe3\x5d\xa3\xd3\xd9\xf6\x1f\xe2\x07\x7c\x47\x20\x67\x48\x07\x47\xff\x47\xf6\xfe\xf8\x00\x00\x70\x6b\xd0\x00\x02\x83\xd9\xfb\x9f\xdc\x1f\xfc\x20\x1e\x80\x00\x22\xc8\xf0\x00\x00\x66\x67\xa0\x00\x02\x00\x3d\x00\x00\x0f\xff\xe8\x00\x00\x7f\xee\xff\xdf\x00\x00\x70\x0d\x7a\x00\x02\x80\x7b\x9f\xf7\x9f\xfb\xc0\x00\x00\xff\xfe\x00\x00\x08\x00\xf3\xff\xf9\x41\xaf\xbf\xff\xd6\x0c\xfc\xff\xff\xff\xfb\xf0\x00\x00\x00\x07\xff\xff\xff\xef\xc0\x00\x00\x00\x3f\xff\xff\xff\xfb\xff\xff\xff\xfa\x69\x74\xf3\xff\xff\xff\xe7\x9f\xff\xff\xff\x7e\x00\x00\x00\x00\xff\xff\xff\xfd\xf8\x00\x00\x00\x07\xff\xff\xff\xf0") + }, + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\xd4\x00\x00\x00\x98\x01\x00\x00\x08\x00\x33\x00\x00\x00\x2a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xbc\xe3\x5d\xa3\xd3\xd9\xf6\x1f\xe2\x07\x7c\x47\x20\x67\x48\x07\x47\xff\x47\xf6\xfe\xf8\x00\x00\x70\x6b\xd0\x00\x02\x83\xd9\xfb\x9f\xdc\x1f\xfc\x20\x1e\x80\x00\x22\xc8\xf0\x00\x00\x66\x67\xa0\x00\x02\x00\x3d\x00\x00\x0f\xff\xe8\x00\x00\x7f\xee\xff\xdf\x00\x00\x70\x0d\x7a\x00\x02\x80\x7b\x9f\xf7\x9f\xfb\xc0\x00\x00\xff\xfe\x00\x00\x08\x00\xfc\x00\x00\x00\x04\x00\x06\xbe\x4f\xbf\xff\xd6\x0c\xff\x00\x00\x00\x01\x00\x00\x00\x03\xf8\x00\x00\x00\x08\x00\x00\x00\x0f\xc0\x00\x00\x00\x3f\xff\xff\xff\xfb\xff\xff\xff\xfb\xe0\x00\x00\x01\xc0\x00\x00\x06\x9f\x80\x00\x00\x0a\x00\x00\x00\x34\xf3\xff\xff\xff\xe7\x9f\xff\xff\xff\x7e\x00\x00\x00\x00\xff\xff\xff\xfd\xf0\x00\x00\x00\x07\xff\xff\xff\xf0") + }, + { + DDCompatibilityTestSequence(), + BIN_STR("\x94\xd4\x00\x00\x00\x98\x01\x00\x00\x08\x00\x33\x00\x00\x00\x2a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x6b\x65\x5f\x50\x34\xff\x4f\xaf\xbc\xe3\x5d\xa3\xd3\xd9\xf6\x1f\xe2\x07\x7c\x47\x20\x67\x48\x07\x47\xff\x47\xf6\xfe\xf8\x00\x00\x70\x6b\xd0\x00\x02\x83\xd9\xfb\x9f\xdc\x1f\xfc\x20\x1e\x80\x00\x22\xc8\xf0\x00\x00\x66\x67\xa0\x00\x02\x00\x3d\x00\x00\x0f\xff\xe8\x00\x00\x7f\xee\xff\xdf\x00\x00\x70\x0d\x7a\x00\x02\x80\x7b\x9f\xf7\x9f\xfb\xc0\x00\x00\xff\xfe\x00\x00\x08\x00\xfc\x00\x00\x00\x04\x00\x06\xbe\x4f\xbf\xff\xd6\x0c\xff\x00\x00\x00\x01\x00\x00\x00\x03\xf8\x00\x00\x00\x08\x00\x00\x00\x0f\xc0\x00\x00\x00\x3f\xff\xff\xff\xfb\xff\xff\xff\xfb\xe0\x00\x00\x01\xc0\x00\x00\x06\x9f\x80\x00\x00\x0a\x00\x00\x00\x34\xf3\xff\xff\xff\xe7\x9f\xff\xff\xff\x7e\x00\x00\x00\x00\xff\xff\xff\xfd\xf0\x00\x00\x00\x07\xff\xff\xff\xf0") + }, + }) + ), +); + +template +auto DDperformanceTestSequence() +{ + const auto times = 100'000; + return DDCompatibilityTestSequence() * times // average case + + generateSeq(G(MinMaxGenerator()), 0, times) // worst + + generateSeq(G(SameValueGenerator(42)), 0, times); // best +} + +// prime numbers in ascending order with some random repitions hit all the cases of Gorilla. +auto PrimesWithMultiplierGenerator = [](int multiplier = 1) +{ + return [multiplier](auto i) + { + static const int vals[] = { + 2, 3, 5, 7, 11, 11, 13, 17, 19, 23, 29, 29, 31, 37, 41, 43, + 47, 47, 53, 59, 61, 61, 67, 71, 73, 79, 83, 89, 89, 97, 101, 103, + 107, 107, 109, 113, 113, 127, 127, 127 + }; + static const size_t count = sizeof(vals)/sizeof(vals[0]); + + using T = decltype(i); + return static_cast(vals[i % count] * static_cast(multiplier)); + }; +}; + +template +auto GCompatibilityTestSequence() +{ + // Also multiply result by some factor to test large values on types that can hold those. 
+ return generateSeq(G(PrimesWithMultiplierGenerator(intExp10(sizeof(ValueType)))), 0, 42); +} + +INSTANTIATE_TEST_CASE_P(Gorilla, + CodecTest_Compatibility, + ::testing::Combine( + ::testing::Values(Codec("Gorilla")), + ::testing::ValuesIn(std::initializer_list>{ + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x35\x00\x00\x00\x2a\x00\x00\x00\x01\x00\x2a\x00\x00\x00\x14\xe1\xdd\x25\xe5\x7b\x29\x86\xee\x2a\x16\x5a\xc5\x0b\x23\x75\x1b\x3c\xb1\x97\x8b\x5f\xcb\x43\xd9\xc5\x48\xab\x23\xaf\x62\x93\x71\x4a\x73\x0f\xc6\x0a") + }, + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x35\x00\x00\x00\x2a\x00\x00\x00\x01\x00\x2a\x00\x00\x00\x14\xe1\xdd\x25\xe5\x7b\x29\x86\xee\x2a\x16\x5a\xc5\x0b\x23\x75\x1b\x3c\xb1\x97\x8b\x5f\xcb\x43\xd9\xc5\x48\xab\x23\xaf\x62\x93\x71\x4a\x73\x0f\xc6\x0a") + }, + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x52\x00\x00\x00\x54\x00\x00\x00\x02\x00\x2a\x00\x00\x00\xc8\x00\xdc\xfe\x66\xdb\x1f\x4e\xa7\xde\xdc\xd5\xec\x6e\xf7\x37\x3a\x23\xe7\x63\xf5\x6a\x8e\x99\x37\x34\xf9\xf8\x2e\x76\x35\x2d\x51\xbb\x3b\xc3\x6d\x13\xbf\x86\x53\x9e\x25\xe4\xaf\xaf\x63\xd5\x6a\x6e\x76\x35\x3a\x27\xd3\x0f\x91\xae\x6b\x33\x57\x6e\x64\xcc\x55\x81\xe4") + }, + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x52\x00\x00\x00\x54\x00\x00\x00\x02\x00\x2a\x00\x00\x00\xc8\x00\xdc\xfe\x66\xdb\x1f\x4e\xa7\xde\xdc\xd5\xec\x6e\xf7\x37\x3a\x23\xe7\x63\xf5\x6a\x8e\x99\x37\x34\xf9\xf8\x2e\x76\x35\x2d\x51\xbb\x3b\xc3\x6d\x13\xbf\x86\x53\x9e\x25\xe4\xaf\xaf\x63\xd5\x6a\x6e\x76\x35\x3a\x27\xd3\x0f\x91\xae\x6b\x33\x57\x6e\x64\xcc\x55\x81\xe4") + }, + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x65\x00\x00\x00\xa8\x00\x00\x00\x04\x00\x2a\x00\x00\x00\x20\x4e\x00\x00\xe4\x57\x63\xc0\xbb\x67\xbc\xce\x91\x97\x99\x15\x9e\xe3\x36\x3f\x89\x5f\x8e\xf2\xec\x8e\xd3\xbf\x75\x43\x58\xc4\x7e\xcf\x93\x43\x38\xc6\x91\x36\x1f\xe7\xb6\x11\x6f\x02\x73\x46\xef\xe0\xec\x50\xfb\x79\xcb\x9c\x14\xfa\x13\xea\x8d\x66\x43\x48\xa0\xde\x3a\xcf\xff\x26\xe0\x5f\x93\xde\x5e\x7f\x6e\x36\x5e\xe6\xb4\x66\x5d\xb0\x0e\xc4") + }, + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x65\x00\x00\x00\xa8\x00\x00\x00\x04\x00\x2a\x00\x00\x00\x20\x4e\x00\x00\xe4\x57\x63\xc0\xbb\x67\xbc\xce\x91\x97\x99\x15\x9e\xe3\x36\x3f\x89\x5f\x8e\xf2\xec\x8e\xd3\xbf\x75\x43\x58\xc4\x7e\xcf\x93\x43\x38\xc6\x91\x36\x1f\xe7\xb6\x11\x6f\x02\x73\x46\xef\xe0\xec\x50\xfb\x79\xcb\x9c\x14\xfa\x13\xea\x8d\x66\x43\x48\xa0\xde\x3a\xcf\xff\x26\xe0\x5f\x93\xde\x5e\x7f\x6e\x36\x5e\xe6\xb4\x66\x5d\xb0\x0e\xc4") + }, + { + GCompatibilityTestSequence(), + BIN_STR("\x95\x91\x00\x00\x00\x50\x01\x00\x00\x08\x00\x2a\x00\x00\x00\x00\xc2\xeb\x0b\x00\x00\x00\x00\xe3\x2b\xa0\xa6\x19\x85\x98\xdc\x45\x74\x74\x43\xc2\x57\x41\x4c\x6e\x42\x79\xd9\x8f\x88\xa5\x05\xf3\xf1\x94\xa3\x62\x1e\x02\xdf\x05\x10\xf1\x15\x97\x35\x2a\x50\x71\x0f\x09\x6c\x89\xf7\x65\x1d\x11\xb7\xcc\x7d\x0b\x70\xc1\x86\x88\x48\x47\x87\xb6\x32\x26\xa7\x86\x87\x88\xd3\x93\x3d\xfc\x28\x68\x85\x05\x0b\x13\xc6\x5f\xd4\x70\xe1\x5e\x76\xf1\x9f\xf3\x33\x2a\x14\x14\x5e\x40\xc1\x5c\x28\x3f\xec\x43\x03\x05\x11\x91\xe8\xeb\x8e\x0a\x0e\x27\x21\x55\xcb\x39\xbc\x6a\xff\x11\x5d\x81\xa0\xa6\x10") + }, + { + GCompatibilityTestSequence(), + 
BIN_STR("\x95\x91\x00\x00\x00\x50\x01\x00\x00\x08\x00\x2a\x00\x00\x00\x00\xc2\xeb\x0b\x00\x00\x00\x00\xe3\x2b\xa0\xa6\x19\x85\x98\xdc\x45\x74\x74\x43\xc2\x57\x41\x4c\x6e\x42\x79\xd9\x8f\x88\xa5\x05\xf3\xf1\x94\xa3\x62\x1e\x02\xdf\x05\x10\xf1\x15\x97\x35\x2a\x50\x71\x0f\x09\x6c\x89\xf7\x65\x1d\x11\xb7\xcc\x7d\x0b\x70\xc1\x86\x88\x48\x47\x87\xb6\x32\x26\xa7\x86\x87\x88\xd3\x93\x3d\xfc\x28\x68\x85\x05\x0b\x13\xc6\x5f\xd4\x70\xe1\x5e\x76\xf1\x9f\xf3\x33\x2a\x14\x14\x5e\x40\xc1\x5c\x28\x3f\xec\x43\x03\x05\x11\x91\xe8\xeb\x8e\x0a\x0e\x27\x21\x55\xcb\x39\xbc\x6a\xff\x11\x5d\x81\xa0\xa6\x10") + }, + }) + ), +); + +// These 'tests' try to measure performance of encoding and decoding and hence only make sence to be run locally, +// also they require pretty big data to run agains and generating this data slows down startup of unit test process. +// So un-comment only at your discretion. + +//INSTANTIATE_TEST_CASE_P(DoubleDelta, +// CodecTest_Performance, +// ::testing::Combine( +// ::testing::Values(Codec("DoubleDelta")), +// ::testing::Values( +// DDperformanceTestSequence(), +// DDperformanceTestSequence(), +// DDperformanceTestSequence(), +// DDperformanceTestSequence(), +// DDperformanceTestSequence(), +// DDperformanceTestSequence(), +// DDperformanceTestSequence(), +// DDperformanceTestSequence() +// ) +// ), +//); + +//INSTANTIATE_TEST_CASE_P(Gorilla, +// CodecTest_Performance, +// ::testing::Combine( +// ::testing::Values(Codec("Gorilla")), +// ::testing::Values( +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000, +// generatePyramidSequence(42, G(PrimesWithMultiplierGenerator())) * 6'000 +// ) +// ), +//); + } diff --git a/dbms/src/Core/Settings.cpp b/dbms/src/Core/Settings.cpp index 717261e298d..fe6e0c18b85 100644 --- a/dbms/src/Core/Settings.cpp +++ b/dbms/src/Core/Settings.cpp @@ -37,7 +37,7 @@ void Settings::setProfile(const String & profile_name, const Poco::Util::Abstrac { if (key == "constraints") continue; - if (key == "profile") /// Inheritance of one profile from another. + if (key == "profile" || 0 == key.compare(0, strlen("profile["), "profile[")) /// Inheritance of profiles from the current one. setProfile(config.getString(elem + "." + key), config); else set(key, config.getString(elem + "." + key)); diff --git a/dbms/src/Core/Settings.h b/dbms/src/Core/Settings.h index db1b3e6da59..724b31ca642 100644 --- a/dbms/src/Core/Settings.h +++ b/dbms/src/Core/Settings.h @@ -127,12 +127,11 @@ struct Settings : public SettingsCollection M(SettingUInt64, optimize_min_equality_disjunction_chain_length, 3, "The minimum length of the expression `expr = x1 OR ... expr = xN` for optimization ", 0) \ \ M(SettingUInt64, min_bytes_to_use_direct_io, 0, "The minimum number of bytes for reading the data with O_DIRECT option during SELECT queries execution. 0 - disabled.", 0) \ + M(SettingUInt64, min_bytes_to_use_mmap_io, 0, "The minimum number of bytes for reading the data with mmap option during SELECT queries execution. 
diff --git a/dbms/src/Core/Settings.h b/dbms/src/Core/Settings.h index db1b3e6da59..724b31ca642 100644 --- a/dbms/src/Core/Settings.h +++ b/dbms/src/Core/Settings.h @@ -127,12 +127,11 @@ struct Settings : public SettingsCollection M(SettingUInt64, optimize_min_equality_disjunction_chain_length, 3, "The minimum length of the expression `expr = x1 OR ... expr = xN` for optimization ", 0) \ \ M(SettingUInt64, min_bytes_to_use_direct_io, 0, "The minimum number of bytes for reading the data with O_DIRECT option during SELECT queries execution. 0 - disabled.", 0) \ + M(SettingUInt64, min_bytes_to_use_mmap_io, 0, "The minimum number of bytes for reading the data with mmap option during SELECT queries execution. 0 - disabled.", 0) \ \ M(SettingBool, force_index_by_date, 0, "Throw an exception if there is a partition key in a table, and it is not used.", 0) \ M(SettingBool, force_primary_key, 0, "Throw an exception if there is primary key in a table, and it is not used.", 0) \ \ - M(SettingUInt64, mark_cache_min_lifetime, 10000, "If the maximum size of mark_cache is exceeded, delete only records older than mark_cache_min_lifetime seconds.", 0) \ - \ M(SettingFloat, max_streams_to_max_threads_ratio, 1, "Allows you to use more sources than the number of threads - to more evenly distribute work across threads. It is assumed that this is a temporary solution, since it will be possible in the future to make the number of sources equal to the number of threads, but for each source to dynamically select available work for itself.", 0) \ M(SettingFloat, max_streams_multiplier_for_merge_tables, 5, "Ask more streams when reading from Merge table. Streams will be spread across tables that Merge table will use. This allows more even distribution of work across threads and especially helpful when merged tables differ in size.", 0) \ \ @@ -358,11 +357,8 @@ struct Settings : public SettingsCollection M(SettingBool, enable_unaligned_array_join, false, "Allow ARRAY JOIN with multiple arrays that have different sizes. When this setting is enabled, arrays will be resized to the longest one.", 0) \ M(SettingBool, optimize_read_in_order, true, "Enable ORDER BY optimization for reading data in corresponding order in MergeTree tables.", 0) \ M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.", 0) \ - M(SettingBool, allow_experimental_multiple_joins_emulation, true, "Emulate multiple joins using subselects", 0) \ - M(SettingBool, allow_experimental_cross_to_join_conversion, true, "Convert CROSS JOIN to INNER JOIN if possible", 0) \ M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.", 0) \ M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only by 'mysql' and 'odbc' table functions.", 0) \ - M(SettingBool, allow_experimental_data_skipping_indices, false, "If it is set to true, data skipping indices can be used in CREATE TABLE/ALTER TABLE queries.", 0) \ \ M(SettingBool, experimental_use_processors, false, "Use processors pipeline.", 0) \ \ @@ -386,14 +382,18 @@ struct Settings : public SettingsCollection M(SettingBool, enable_scalar_subquery_optimization, true, "If it is set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once.", 0) \ M(SettingBool, optimize_trivial_count_query, true, "Process trivial 'SELECT count() FROM table' query from metadata.", 0) \ M(SettingUInt64, mutations_sync, 0, "Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 - execute asynchronously. 1 - wait current server. 2 - wait all replicas if they exist.", 0) \ + M(SettingBool, optimize_if_chain_to_miltiif, false, "Replace if(cond1, then1, if(cond2, ...)) chains to multiIf. Currently it's not beneficial for numeric types.", 0) \
\ /** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \ \ M(SettingBool, allow_experimental_low_cardinality_type, true, "Obsolete setting, does nothing. Will be removed after 2019-08-13", 0) \ - M(SettingBool, compile, false, "Obsolete setting, does nothing. Will be removed after 2020-03-13", 0) \ + M(SettingBool, compile, false, "Whether query compilation is enabled. Will be removed after 2020-03-13", 0) \ M(SettingUInt64, min_count_to_compile, 0, "Obsolete setting, does nothing. Will be removed after 2020-03-13", 0) \ + M(SettingBool, allow_experimental_multiple_joins_emulation, true, "Obsolete setting, does nothing. Will be removed after 2020-05-31", 0) \ + M(SettingBool, allow_experimental_cross_to_join_conversion, true, "Obsolete setting, does nothing. Will be removed after 2020-05-31", 0) \ + M(SettingBool, allow_experimental_data_skipping_indices, true, "Obsolete setting, does nothing. Will be removed after 2020-05-31", 0) \ M(SettingBool, merge_tree_uniform_read_distribution, true, "Obsolete setting, does nothing. Will be removed after 2020-05-20", 0) \ - + M(SettingUInt64, mark_cache_min_lifetime, 0, "Obsolete setting, does nothing. Will be removed after 2020-05-31", 0) \ DECLARE_SETTINGS_COLLECTION(LIST_OF_SETTINGS) diff --git a/dbms/src/Core/SortCursor.h b/dbms/src/Core/SortCursor.h index 5b4db43024f..84b788c8845 100644 --- a/dbms/src/Core/SortCursor.h +++ b/dbms/src/Core/SortCursor.h @@ -22,8 +22,8 @@ namespace DB */ struct SortCursorImpl { - ColumnRawPtrs all_columns; ColumnRawPtrs sort_columns; + ColumnRawPtrs all_columns; SortDescription desc; size_t sort_columns_size = 0; size_t pos = 0; @@ -110,21 +110,52 @@ using SortCursorImpls = std::vector; /// For easy copying. -struct SortCursor +template +struct SortCursorHelper { SortCursorImpl * impl; - SortCursor(SortCursorImpl * impl_) : impl(impl_) {} + const Derived & derived() const { return static_cast(*this); } + + SortCursorHelper(SortCursorImpl * impl_) : impl(impl_) {} SortCursorImpl * operator-> () { return impl; } const SortCursorImpl * operator-> () const { return impl; } + bool ALWAYS_INLINE greater(const SortCursorHelper & rhs) const + { + return derived().greaterAt(rhs.derived(), impl->pos, rhs.impl->pos); + } + + /// Inverted so that the priority queue elements are removed in ascending order. + bool ALWAYS_INLINE operator< (const SortCursorHelper & rhs) const + { + return derived().greater(rhs.derived()); + } + + /// Checks that all rows in the current block of this cursor are less than or equal to all the rows of the current block of another cursor. + bool ALWAYS_INLINE totallyLessOrEquals(const SortCursorHelper & rhs) const + { + if (impl->rows == 0 || rhs.impl->rows == 0) + return false; + + /// The last row of this cursor is no larger than the first row of another cursor. + return !derived().greaterAt(rhs.derived(), impl->rows - 1, 0); + } +}; + + +struct SortCursor : SortCursorHelper +{ + using SortCursorHelper::SortCursorHelper; + /// The specified row of this cursor is greater than the specified row of another cursor. 
- bool greaterAt(const SortCursor & rhs, size_t lhs_pos, size_t rhs_pos) const + bool ALWAYS_INLINE greaterAt(const SortCursor & rhs, size_t lhs_pos, size_t rhs_pos) const { for (size_t i = 0; i < impl->sort_columns_size; ++i) { - int direction = impl->desc[i].direction; - int nulls_direction = impl->desc[i].nulls_direction; + const auto & desc = impl->desc[i]; + int direction = desc.direction; + int nulls_direction = desc.nulls_direction; int res = direction * impl->sort_columns[i]->compareAt(lhs_pos, rhs_pos, *(rhs.impl->sort_columns[i]), nulls_direction); if (res > 0) return true; @@ -133,45 +164,37 @@ struct SortCursor } return impl->order > rhs.impl->order; } +}; - /// Checks that all rows in the current block of this cursor are less than or equal to all the rows of the current block of another cursor. - bool totallyLessOrEquals(const SortCursor & rhs) const + +/// For the case with a single column and when there is no order between different cursors. +struct SimpleSortCursor : SortCursorHelper +{ + using SortCursorHelper::SortCursorHelper; + + bool ALWAYS_INLINE greaterAt(const SimpleSortCursor & rhs, size_t lhs_pos, size_t rhs_pos) const { - if (impl->rows == 0 || rhs.impl->rows == 0) - return false; - - /// The last row of this cursor is no larger than the first row of the another cursor. - return !greaterAt(rhs, impl->rows - 1, 0); - } - - bool greater(const SortCursor & rhs) const - { - return greaterAt(rhs, impl->pos, rhs.impl->pos); - } - - /// Inverted so that the priority queue elements are removed in ascending order. - bool operator< (const SortCursor & rhs) const - { - return greater(rhs); + const auto & desc = impl->desc[0]; + int direction = desc.direction; + int nulls_direction = desc.nulls_direction; + int res = impl->sort_columns[0]->compareAt(lhs_pos, rhs_pos, *(rhs.impl->sort_columns[0]), nulls_direction); + return res != 0 && ((res > 0) == (direction > 0)); } }; /// Separate comparator for locale-sensitive string comparisons -struct SortCursorWithCollation +struct SortCursorWithCollation : SortCursorHelper { - SortCursorImpl * impl; + using SortCursorHelper::SortCursorHelper; - SortCursorWithCollation(SortCursorImpl * impl_) : impl(impl_) {} - SortCursorImpl * operator-> () { return impl; } - const SortCursorImpl * operator-> () const { return impl; } - - bool greaterAt(const SortCursorWithCollation & rhs, size_t lhs_pos, size_t rhs_pos) const + bool ALWAYS_INLINE greaterAt(const SortCursorWithCollation & rhs, size_t lhs_pos, size_t rhs_pos) const { for (size_t i = 0; i < impl->sort_columns_size; ++i) { - int direction = impl->desc[i].direction; - int nulls_direction = impl->desc[i].nulls_direction; + const auto & desc = impl->desc[i]; + int direction = desc.direction; + int nulls_direction = desc.nulls_direction; int res; if (impl->need_collation[i]) { @@ -189,29 +212,11 @@ struct SortCursorWithCollation } return impl->order > rhs.impl->order; } - - bool totallyLessOrEquals(const SortCursorWithCollation & rhs) const - { - if (impl->rows == 0 || rhs.impl->rows == 0) - return false; - - /// The last row of this cursor is no larger than the first row of the another cursor. - return !greaterAt(rhs, impl->rows - 1, 0); - } - - bool greater(const SortCursorWithCollation & rhs) const - { - return greaterAt(rhs, impl->pos, rhs.impl->pos); - } - - bool operator< (const SortCursorWithCollation & rhs) const - { - return greater(rhs); - } }; /** Allows to fetch data from multiple sort cursors in sorted order (merging sorted data streams). 
+ * TODO: Replace with "Loser Tree", see https://en.wikipedia.org/wiki/K-way_merge_algorithm */ template class SortingHeap @@ -225,7 +230,8 @@ public: size_t size = cursors.size(); queue.reserve(size); for (size_t i = 0; i < size; ++i) - queue.emplace_back(&cursors[i]); + if (!cursors[i].empty()) + queue.emplace_back(&cursors[i]); std::make_heap(queue.begin(), queue.end()); } @@ -233,7 +239,11 @@ public: Cursor & current() { return queue.front(); } - void next() + size_t size() { return queue.size(); } + + Cursor & nextChild() { return queue[nextChildIndex()]; } + + void ALWAYS_INLINE next() { assert(isValid()); @@ -246,34 +256,67 @@ public: removeTop(); } + void replaceTop(Cursor new_top) + { + current() = new_top; + updateTop(); + } + + void removeTop() + { + std::pop_heap(queue.begin(), queue.end()); + queue.pop_back(); + next_idx = 0; + } + + void push(SortCursorImpl & cursor) + { + queue.emplace_back(&cursor); + std::push_heap(queue.begin(), queue.end()); + next_idx = 0; + } + private: using Container = std::vector; Container queue; + /// Cache comparison between first and second child if the order in queue has not been changed. + size_t next_idx = 0; + + size_t ALWAYS_INLINE nextChildIndex() + { + if (next_idx == 0) + { + next_idx = 1; + + if (queue.size() > 2 && queue[1] < queue[2]) + ++next_idx; + } + + return next_idx; + } + /// This is adapted version of the function __sift_down from libc++. /// Why cannot simply use std::priority_queue? /// - because it doesn't support updating the top element and requires pop and push instead. - void updateTop() + /// Also look at "Boost.Heap" library. + void ALWAYS_INLINE updateTop() { size_t size = queue.size(); if (size < 2) return; - size_t child_idx = 1; auto begin = queue.begin(); - auto child_it = begin + 1; - /// Right child exists and is greater than left child. - if (size > 2 && *child_it < *(child_it + 1)) - { - ++child_it; - ++child_idx; - } + size_t child_idx = nextChildIndex(); + auto child_it = begin + child_idx; /// Check if we are in order. 
if (*child_it < *begin) return; + next_idx = 0; + auto curr_it = begin; auto top(std::move(*begin)); do @@ -282,11 +325,12 @@ private: *curr_it = std::move(*child_it); curr_it = child_it; - if ((size - 2) / 2 < child_idx) - break; - // recompute the child based off of the updated parent child_idx = 2 * child_idx + 1; + + if (child_idx >= size) + break; + child_it = begin + child_idx; if ((child_idx + 1) < size && *child_it < *(child_it + 1)) @@ -300,12 +344,6 @@ private: } while (!(*child_it < top)); *curr_it = std::move(top); } - - void removeTop() - { - std::pop_heap(queue.begin(), queue.end()); - queue.pop_back(); - } }; } diff --git a/dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp b/dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp index 3607d1f917f..d23d93e7e5c 100644 --- a/dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp @@ -138,14 +138,14 @@ Block AggregatingSortedBlockInputStream::readImpl() } -void AggregatingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +void AggregatingSortedBlockInputStream::merge(MutableColumns & merged_columns, SortingHeap & queue) { size_t merged_rows = 0; /// We take the rows in the correct order and put them in `merged_block`, while the rows are no more than `max_block_size` - while (!queue.empty()) + while (queue.isValid()) { - SortCursor current = queue.top(); + SortCursor current = queue.current(); setPrimaryKeyRef(next_key, current); @@ -167,8 +167,6 @@ void AggregatingSortedBlockInputStream::merge(MutableColumns & merged_columns, s return; } - queue.pop(); - if (key_differs) { current_key.swap(next_key); @@ -202,8 +200,7 @@ void AggregatingSortedBlockInputStream::merge(MutableColumns & merged_columns, s if (!current->isLast()) { - current->next(); - queue.push(current); + queue.next(); } else { diff --git a/dbms/src/DataStreams/AggregatingSortedBlockInputStream.h b/dbms/src/DataStreams/AggregatingSortedBlockInputStream.h index 0cf4bd64d87..6ef1259e458 100644 --- a/dbms/src/DataStreams/AggregatingSortedBlockInputStream.h +++ b/dbms/src/DataStreams/AggregatingSortedBlockInputStream.h @@ -55,7 +55,7 @@ private: /** We support two different cursors - with Collation and without. * Templates are used instead of polymorphic SortCursor and calls to virtual functions. */ - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + void merge(MutableColumns & merged_columns, SortingHeap & queue); /** Extract all states of aggregate functions and merge them with the current group. 
*/ diff --git a/dbms/src/DataStreams/CollapsingSortedBlockInputStream.cpp b/dbms/src/DataStreams/CollapsingSortedBlockInputStream.cpp index 7e4ad04b806..ef82a6d8c5e 100644 --- a/dbms/src/DataStreams/CollapsingSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/CollapsingSortedBlockInputStream.cpp @@ -105,15 +105,15 @@ Block CollapsingSortedBlockInputStream::readImpl() } -void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, SortingHeap & queue) { MergeStopCondition stop_condition(average_block_sizes, max_block_size); size_t current_block_granularity; /// Take rows in correct order and put them into `merged_columns` until the rows no more than `max_block_size` - for (; !queue.empty(); ++current_pos) + for (; queue.isValid(); ++current_pos) { - SortCursor current = queue.top(); + SortCursor current = queue.current(); current_block_granularity = current->rows; if (current_key.empty()) @@ -131,8 +131,6 @@ void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, st return; } - queue.pop(); - if (key_differs) { /// We write data for the previous primary key. @@ -185,8 +183,7 @@ void CollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, st if (!current->isLast()) { - current->next(); - queue.push(current); + queue.next(); } else { diff --git a/dbms/src/DataStreams/CollapsingSortedBlockInputStream.h b/dbms/src/DataStreams/CollapsingSortedBlockInputStream.h index 7e114e614f6..2b528d27339 100644 --- a/dbms/src/DataStreams/CollapsingSortedBlockInputStream.h +++ b/dbms/src/DataStreams/CollapsingSortedBlockInputStream.h @@ -73,7 +73,7 @@ private: /** We support two different cursors - with Collation and without. * Templates are used instead of polymorphic SortCursors and calls to virtual functions. */ - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + void merge(MutableColumns & merged_columns, SortingHeap & queue); /// Output to result rows for the current primary key. void insertRows(MutableColumns & merged_columns, size_t block_size, MergeStopCondition & condition); diff --git a/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.cpp b/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.cpp index 340e10df12f..64a0d52c1aa 100644 --- a/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.cpp @@ -161,7 +161,7 @@ Block GraphiteRollupSortedBlockInputStream::readImpl() } -void GraphiteRollupSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +void GraphiteRollupSortedBlockInputStream::merge(MutableColumns & merged_columns, SortingHeap & queue) { const DateLUTImpl & date_lut = DateLUT::instance(); @@ -173,9 +173,9 @@ void GraphiteRollupSortedBlockInputStream::merge(MutableColumns & merged_columns /// contribute towards current output row. /// Variables starting with next_* refer to the row at the top of the queue. 
- while (!queue.empty()) + while (queue.isValid()) { - SortCursor next_cursor = queue.top(); + SortCursor next_cursor = queue.current(); StringRef next_path = next_cursor->all_columns[path_column_num]->getDataAt(next_cursor->pos); bool new_path = is_first || next_path != current_group_path; @@ -253,12 +253,9 @@ void GraphiteRollupSortedBlockInputStream::merge(MutableColumns & merged_columns current_group_path = next_path; } - queue.pop(); - if (!next_cursor->isLast()) { - next_cursor->next(); - queue.push(next_cursor); + queue.next(); } else { diff --git a/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.h b/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.h index 533c267ff02..0dfdf7c300c 100644 --- a/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.h +++ b/dbms/src/DataStreams/GraphiteRollupSortedBlockInputStream.h @@ -225,7 +225,7 @@ private: UInt32 selectPrecision(const Graphite::Retentions & retentions, time_t time) const; - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + void merge(MutableColumns & merged_columns, SortingHeap & queue); /// Insert the values into the resulting columns, which will not be changed in the future. template diff --git a/dbms/src/DataStreams/MergeSortingBlockInputStream.cpp b/dbms/src/DataStreams/MergeSortingBlockInputStream.cpp index 1c50316fc3f..52f85f1349c 100644 --- a/dbms/src/DataStreams/MergeSortingBlockInputStream.cpp +++ b/dbms/src/DataStreams/MergeSortingBlockInputStream.cpp @@ -150,10 +150,12 @@ MergeSortingBlocksBlockInputStream::MergeSortingBlocksBlockInputStream( blocks.swap(nonempty_blocks); - if (!has_collation) + if (has_collation) + queue_with_collation = SortingHeap(cursors); + else if (description.size() > 1) queue_without_collation = SortingHeap(cursors); else - queue_with_collation = SortingHeap(cursors); + queue_simple = SortingHeap(cursors); } @@ -169,9 +171,12 @@ Block MergeSortingBlocksBlockInputStream::readImpl() return res; } - return !has_collation - ? mergeImpl(queue_without_collation) - : mergeImpl(queue_with_collation); + if (has_collation) + return mergeImpl(queue_with_collation); + else if (description.size() > 1) + return mergeImpl(queue_without_collation); + else + return mergeImpl(queue_simple); } @@ -179,9 +184,18 @@ template Block MergeSortingBlocksBlockInputStream::mergeImpl(TSortingHeap & queue) { size_t num_columns = header.columns(); - MutableColumns merged_columns = header.cloneEmptyColumns(); - /// TODO: reserve (in each column) + + /// Reserve + if (queue.isValid() && !blocks.empty()) + { + /// The expected size of output block is the same as input block + size_t size_to_reserve = blocks[0].rows(); + for (auto & column : merged_columns) + column->reserve(size_to_reserve); + } + + /// TODO: Optimization when a single block left. /// Take rows from queue in right order and push to 'merged'. 
size_t merged_rows = 0; @@ -210,6 +224,9 @@ Block MergeSortingBlocksBlockInputStream::mergeImpl(TSortingHeap & queue) break; } + if (!queue.isValid()) + blocks.clear(); + if (merged_rows == 0) return {}; diff --git a/dbms/src/DataStreams/MergeSortingBlockInputStream.h b/dbms/src/DataStreams/MergeSortingBlockInputStream.h index 9492bdb074b..ce82f6bb120 100644 --- a/dbms/src/DataStreams/MergeSortingBlockInputStream.h +++ b/dbms/src/DataStreams/MergeSortingBlockInputStream.h @@ -59,6 +59,7 @@ private: bool has_collation = false; SortingHeap queue_without_collation; + SortingHeap queue_simple; SortingHeap queue_with_collation; /** Two different cursors are supported - with and without Collation. diff --git a/dbms/src/DataStreams/MergingSortedBlockInputStream.cpp b/dbms/src/DataStreams/MergingSortedBlockInputStream.cpp index 8c0707e09b0..3614d9c1d66 100644 --- a/dbms/src/DataStreams/MergingSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/MergingSortedBlockInputStream.cpp @@ -59,9 +59,9 @@ void MergingSortedBlockInputStream::init(MutableColumns & merged_columns) } if (has_collation) - initQueue(queue_with_collation); + queue_with_collation = SortingHeap(cursors); else - initQueue(queue_without_collation); + queue_without_collation = SortingHeap(cursors); } /// Let's check that all source blocks have the same structure. @@ -82,15 +82,6 @@ void MergingSortedBlockInputStream::init(MutableColumns & merged_columns) } -template -void MergingSortedBlockInputStream::initQueue(std::priority_queue & queue) -{ - for (size_t i = 0; i < cursors.size(); ++i) - if (!cursors[i].empty()) - queue.push(TSortCursor(&cursors[i])); -} - - Block MergingSortedBlockInputStream::readImpl() { if (finished) @@ -115,7 +106,7 @@ Block MergingSortedBlockInputStream::readImpl() template -void MergingSortedBlockInputStream::fetchNextBlock(const TSortCursor & current, std::priority_queue & queue) +void MergingSortedBlockInputStream::fetchNextBlock(const TSortCursor & current, SortingHeap & queue) { size_t order = current->order; size_t size = cursors.size(); @@ -125,15 +116,19 @@ void MergingSortedBlockInputStream::fetchNextBlock(const TSortCursor & current, while (true) { - source_blocks[order] = new detail::SharedBlock(children[order]->read()); + source_blocks[order] = new detail::SharedBlock(children[order]->read()); /// intrusive ptr if (!*source_blocks[order]) + { + queue.removeTop(); break; + } if (source_blocks[order]->rows()) { cursors[order].reset(*source_blocks[order]); - queue.push(TSortCursor(&cursors[order])); + queue.replaceTop(&cursors[order]); + source_blocks[order]->all_columns = cursors[order].all_columns; source_blocks[order]->sort_columns = cursors[order].sort_columns; break; @@ -154,19 +149,14 @@ bool MergingSortedBlockInputStream::MergeStopCondition::checkStop() const return sum_rows_count >= average; } -template -void MergingSortedBlockInputStream::fetchNextBlock(const SortCursor & current, std::priority_queue & queue); -template -void MergingSortedBlockInputStream::fetchNextBlock(const SortCursorWithCollation & current, std::priority_queue & queue); - - -template -void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +template +void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, TSortingHeap & queue) { size_t merged_rows = 0; MergeStopCondition stop_condition(average_block_sizes, max_block_size); + /** Increase row counters. * Return true if it's time to finish generating the current data block. 
*/ @@ -186,123 +176,100 @@ void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, std:: return stop_condition.checkStop(); }; - /// Take rows in required order and put them into `merged_columns`, while the rows are no more than `max_block_size` - while (!queue.empty()) + /// Take rows in required order and put them into `merged_columns`, while the number of rows is no more than `max_block_size` + while (queue.isValid()) { - TSortCursor current = queue.top(); + auto current = queue.current(); size_t current_block_granularity = current->rows; - queue.pop(); - while (true) + /** And what if the block is totally less or equal than the rest for the current cursor? + * Or is there only one data source left in the queue? Then you can take the entire block on current cursor. + */ + if (current->isFirst() + && (queue.size() == 1 + || (queue.size() >= 2 && current.totallyLessOrEquals(queue.nextChild())))) { - /** And what if the block is totally less or equal than the rest for the current cursor? - * Or is there only one data source left in the queue? Then you can take the entire block on current cursor. - */ - if (current->isFirst() && (queue.empty() || current.totallyLessOrEquals(queue.top()))) +// std::cerr << "current block is totally less or equals\n"; + + /// If there are already data in the current block, we first return it. We'll get here again the next time we call the merge function. + if (merged_rows != 0) { - // std::cerr << "current block is totally less or equals\n"; - - /// If there are already data in the current block, we first return it. We'll get here again the next time we call the merge function. - if (merged_rows != 0) - { - //std::cerr << "merged rows is non-zero\n"; - queue.push(current); - return; - } - - /// Actually, current->order stores source number (i.e. cursors[current->order] == current) - size_t source_num = current->order; - - if (source_num >= cursors.size()) - throw Exception("Logical error in MergingSortedBlockInputStream", ErrorCodes::LOGICAL_ERROR); - - for (size_t i = 0; i < num_columns; ++i) - merged_columns[i] = (*std::move(source_blocks[source_num]->getByPosition(i).column)).mutate(); - - // std::cerr << "copied columns\n"; - - merged_rows = merged_columns.at(0)->size(); - - /// Limit output - if (limit && total_merged_rows + merged_rows > limit) - { - merged_rows = limit - total_merged_rows; - for (size_t i = 0; i < num_columns; ++i) - { - auto & column = merged_columns[i]; - column = (*column->cut(0, merged_rows)).mutate(); - } - - cancel(false); - finished = true; - } - - /// Write order of rows for other columns - /// this data will be used in grather stream - if (out_row_sources_buf) - { - RowSourcePart row_source(source_num); - for (size_t i = 0; i < merged_rows; ++i) - out_row_sources_buf->write(row_source.data); - } - - //std::cerr << "fetching next block\n"; - - total_merged_rows += merged_rows; - fetchNextBlock(current, queue); return; } - // std::cerr << "total_merged_rows: " << total_merged_rows << ", merged_rows: " << merged_rows << "\n"; - // std::cerr << "Inserting row\n"; - for (size_t i = 0; i < num_columns; ++i) - merged_columns[i]->insertFrom(*current->all_columns[i], current->pos); + /// Actually, current->order stores source number (i.e. 
cursors[current->order] == current) + size_t source_num = current->order; + if (source_num >= cursors.size()) + throw Exception("Logical error in MergingSortedBlockInputStream", ErrorCodes::LOGICAL_ERROR); + + for (size_t i = 0; i < num_columns; ++i) + merged_columns[i] = (*std::move(source_blocks[source_num]->getByPosition(i).column)).mutate(); + +// std::cerr << "copied columns\n"; + + merged_rows = merged_columns.at(0)->size(); + + /// Limit output + if (limit && total_merged_rows + merged_rows > limit) + { + merged_rows = limit - total_merged_rows; + for (size_t i = 0; i < num_columns; ++i) + { + auto & column = merged_columns[i]; + column = (*column->cut(0, merged_rows)).mutate(); + } + + cancel(false); + finished = true; + } + + /// Write order of rows for other columns + /// this data will be used in gather stream if (out_row_sources_buf) { - /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) - RowSourcePart row_source(current->order); - out_row_sources_buf->write(row_source.data); + RowSourcePart row_source(source_num); + for (size_t i = 0; i < merged_rows; ++i) + out_row_sources_buf->write(row_source.data); } - if (!current->isLast()) - { - // std::cerr << "moving to next row\n"; - current->next(); + //std::cerr << "fetching next block\n"; - if (queue.empty() || !(current.greater(queue.top()))) - { - if (count_row_and_check_limit(current_block_granularity)) - { - // std::cerr << "pushing back to queue\n"; - queue.push(current); - return; - } + total_merged_rows += merged_rows; + fetchNextBlock(current, queue); + return; + } - /// Do not put the cursor back in the queue, but continue to work with the current cursor. - // std::cerr << "current is still on top, using current row\n"; - continue; - } - else - { - // std::cerr << "next row is not least, pushing back to queue\n"; - queue.push(current); - } - } - else - { - /// We get the next block from the corresponding source, if there is one. - // std::cerr << "It was last row, fetching next block\n"; - fetchNextBlock(current, queue); - } +// std::cerr << "total_merged_rows: " << total_merged_rows << ", merged_rows: " << merged_rows << "\n"; +// std::cerr << "Inserting row\n"; + for (size_t i = 0; i < num_columns; ++i) + merged_columns[i]->insertFrom(*current->all_columns[i], current->pos); - break; + if (out_row_sources_buf) + { + /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) + RowSourcePart row_source(current->order); + out_row_sources_buf->write(row_source.data); + } + + if (!current->isLast()) + { +// std::cerr << "moving to next row\n"; + queue.next(); + } + else + { + /// We get the next block from the corresponding source, if there is one. +// std::cerr << "It was last row, fetching next block\n"; + fetchNextBlock(current, queue); } if (count_row_and_check_limit(current_block_granularity)) return; } + /// We have read all data. Ask children to cancel providing more data. cancel(false); finished = true; } diff --git a/dbms/src/DataStreams/MergingSortedBlockInputStream.h b/dbms/src/DataStreams/MergingSortedBlockInputStream.h index beb3c7afc52..e6c2b257013 100644 --- a/dbms/src/DataStreams/MergingSortedBlockInputStream.h +++ b/dbms/src/DataStreams/MergingSortedBlockInputStream.h @@ -1,7 +1,5 @@ #pragma once -#include - #include #include @@ -87,7 +85,7 @@ protected: /// Gets the next block from the source corresponding to the `current`. 
template - void fetchNextBlock(const TSortCursor & current, std::priority_queue & queue); + void fetchNextBlock(const TSortCursor & current, SortingHeap & queue); Block header; @@ -109,14 +107,10 @@ protected: size_t num_columns = 0; std::vector source_blocks; - using CursorImpls = std::vector; - CursorImpls cursors; + SortCursorImpls cursors; - using Queue = std::priority_queue; - Queue queue_without_collation; - - using QueueWithCollation = std::priority_queue; - QueueWithCollation queue_with_collation; + SortingHeap queue_without_collation; + SortingHeap queue_with_collation; /// Used in Vertical merge algorithm to gather non-PK/non-index columns (on next step) /// If it is not nullptr then it should be populated during execution @@ -177,13 +171,10 @@ protected: private: /** We support two different cursors - with Collation and without. - * Templates are used instead of polymorphic SortCursor and calls to virtual functions. - */ - template - void initQueue(std::priority_queue & queue); - - template - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + * Templates are used instead of polymorphic SortCursor and calls to virtual functions. + */ + template + void merge(MutableColumns & merged_columns, TSortingHeap & queue); Logger * log = &Logger::get("MergingSortedBlockInputStream"); diff --git a/dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp b/dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp index d7fb7bad343..3d5fb426218 100644 --- a/dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp +++ b/dbms/src/DataStreams/PushingToViewsBlockOutputStream.cpp @@ -129,7 +129,7 @@ void PushingToViewsBlockOutputStream::write(const Block & block) for (size_t view_num = 0; view_num < views.size(); ++view_num) { auto thread_group = CurrentThread::getGroup(); - pool.scheduleOrThrowOnError([=] + pool.scheduleOrThrowOnError([=, this] { setThreadName("PushingToViews"); if (thread_group) diff --git a/dbms/src/DataStreams/ReplacingSortedBlockInputStream.cpp b/dbms/src/DataStreams/ReplacingSortedBlockInputStream.cpp index e2e99815b93..967b4ebb046 100644 --- a/dbms/src/DataStreams/ReplacingSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/ReplacingSortedBlockInputStream.cpp @@ -48,13 +48,14 @@ Block ReplacingSortedBlockInputStream::readImpl() } -void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, SortingHeap & queue) { MergeStopCondition stop_condition(average_block_sizes, max_block_size); + /// Take the rows in the needed order and put them into `merged_columns` until the number of rows is no more than `max_block_size` - while (!queue.empty()) + while (queue.isValid()) { - SortCursor current = queue.top(); + SortCursor current = queue.current(); size_t current_block_granularity = current->rows; if (current_key.empty()) @@ -68,8 +69,6 @@ void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std if (key_differs && stop_condition.checkStop()) return; - queue.pop(); - if (key_differs) { /// Write the data for the previous primary key. 
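
All the merge() rewrites in this diff follow the same pattern: the std::priority_queue top()/pop()/push() cycle is replaced by the SortingHeap interface, which exposes the top cursor via current() and advances it in place with next(). The following is a minimal stand-in sketch of that contract, not the real ClickHouse class: the single-int cursor and hand-rolled heap are simplifying assumptions made for illustration only.

    // A stand-in for the SortingHeap contract used by the merge loops above.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct CursorSketch
    {
        const std::vector<int> * rows = nullptr;
        size_t pos = 0;

        int value() const { return (*rows)[pos]; }
        bool isLast() const { return pos + 1 >= rows->size(); }
    };

    class SortingHeapSketch
    {
    public:
        explicit SortingHeapSketch(std::vector<CursorSketch> cursors_) : queue(std::move(cursors_))
        {
            auto greater = [](const CursorSketch & a, const CursorSketch & b) { return a.value() > b.value(); };
            std::make_heap(queue.begin(), queue.end(), greater);   // "greater" comparator => min-heap on value()
        }

        bool isValid() const { return !queue.empty(); }
        CursorSketch & current() { return queue.front(); }
        size_t size() const { return queue.size(); }

        /// Advance the top cursor in place and sift it down. The caller must have
        /// checked !current().isLast(). This single sift replaces the old
        /// pop(); cursor->next(); push(cursor); sequence on std::priority_queue.
        void next()
        {
            ++current().pos;
            siftDown();
        }

        /// Drop an exhausted cursor (the streams call fetchNextBlock() here,
        /// which refills the cursor from the source or removes it).
        void removeTop()
        {
            std::swap(queue.front(), queue.back());
            queue.pop_back();
            if (!queue.empty())
                siftDown();
        }

    private:
        std::vector<CursorSketch> queue;

        /// Restore the min-heap property after the front element changed.
        void siftDown()
        {
            size_t i = 0;
            while (true)
            {
                size_t smallest = i;
                size_t left = 2 * i + 1;
                size_t right = 2 * i + 2;
                if (left < queue.size() && queue[left].value() < queue[smallest].value())
                    smallest = left;
                if (right < queue.size() && queue[right].value() < queue[smallest].value())
                    smallest = right;
                if (smallest == i)
                    return;
                std::swap(queue[i], queue[smallest]);
                i = smallest;
            }
        }
    };

    // Loop shape shared by all the merge() rewrites in this diff:
    //     while (heap.isValid())
    //     {
    //         auto & cur = heap.current();
    //         /* emit cur's row */
    //         if (!cur.isLast())
    //             heap.next();        // stay within the same block
    //         else
    //             heap.removeTop();   // the real code fetches the next block instead
    //     }
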
@@ -98,8 +97,7 @@ void ReplacingSortedBlockInputStream::merge(MutableColumns & merged_columns, std if (!current->isLast()) { - current->next(); - queue.push(current); + queue.next(); } else { diff --git a/dbms/src/DataStreams/ReplacingSortedBlockInputStream.h b/dbms/src/DataStreams/ReplacingSortedBlockInputStream.h index 7d85542520d..22920c2eb20 100644 --- a/dbms/src/DataStreams/ReplacingSortedBlockInputStream.h +++ b/dbms/src/DataStreams/ReplacingSortedBlockInputStream.h @@ -52,7 +52,7 @@ private: /// Sources of rows with the current primary key. PODArray current_row_sources; - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + void merge(MutableColumns & merged_columns, SortingHeap & queue); /// Output into result the rows for the current primary key. void insertRow(MutableColumns & merged_columns); diff --git a/dbms/src/DataStreams/SummingSortedBlockInputStream.cpp b/dbms/src/DataStreams/SummingSortedBlockInputStream.cpp index 9ac7d6a3397..fe29dc55916 100644 --- a/dbms/src/DataStreams/SummingSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/SummingSortedBlockInputStream.cpp @@ -314,14 +314,14 @@ Block SummingSortedBlockInputStream::readImpl() } -void SummingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +void SummingSortedBlockInputStream::merge(MutableColumns & merged_columns, SortingHeap & queue) { merged_rows = 0; /// Take the rows in the needed order and put them in `merged_columns` until the number of rows is no more than `max_block_size` - while (!queue.empty()) + while (queue.isValid()) { - SortCursor current = queue.top(); + SortCursor current = queue.current(); setPrimaryKeyRef(next_key, current); @@ -383,12 +383,9 @@ void SummingSortedBlockInputStream::merge(MutableColumns & merged_columns, std:: current_row_is_zero = false; } - queue.pop(); - if (!current->isLast()) { - current->next(); - queue.push(current); + queue.next(); } else { diff --git a/dbms/src/DataStreams/SummingSortedBlockInputStream.h b/dbms/src/DataStreams/SummingSortedBlockInputStream.h index 4412e5529f8..fc02d36d3fd 100644 --- a/dbms/src/DataStreams/SummingSortedBlockInputStream.h +++ b/dbms/src/DataStreams/SummingSortedBlockInputStream.h @@ -1,5 +1,7 @@ #pragma once +#include + #include #include #include @@ -140,7 +142,7 @@ private: /** We support two different cursors - with Collation and without. * Templates are used instead of polymorphic SortCursor and calls to virtual functions. */ - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + void merge(MutableColumns & merged_columns, SortingHeap & queue); /// Insert the summed row for the current group into the result and update some of the per-block flags if the row is not "zero". 
void insertCurrentRowIfNeeded(MutableColumns & merged_columns); diff --git a/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.cpp b/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.cpp index 4dda97597bd..de6f7027243 100644 --- a/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.cpp +++ b/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.cpp @@ -82,21 +82,18 @@ Block VersionedCollapsingSortedBlockInputStream::readImpl() } -void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::priority_queue & queue) +void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_columns, SortingHeap & queue) { MergeStopCondition stop_condition(average_block_sizes, max_block_size); auto update_queue = [this, & queue](SortCursor & cursor) { - queue.pop(); - if (out_row_sources_buf) current_row_sources.emplace(cursor->order, true); if (!cursor->isLast()) { - cursor->next(); - queue.push(cursor); + queue.next(); } else { @@ -106,9 +103,9 @@ void VersionedCollapsingSortedBlockInputStream::merge(MutableColumns & merged_co }; /// Take rows in the correct order and put them into `merged_columns` until the number of rows is no more than `max_block_size` - while (!queue.empty()) + while (queue.isValid()) { - SortCursor current = queue.top(); + SortCursor current = queue.current(); size_t current_block_granularity = current->rows; SharedBlockRowRef next_key; diff --git a/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.h b/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.h index f79b564063d..c64972d9266 100644 --- a/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.h +++ b/dbms/src/DataStreams/VersionedCollapsingSortedBlockInputStream.h @@ -5,7 +5,7 @@ #include #include -#include +#include namespace DB @@ -204,7 +204,7 @@ private: /// Sources of rows for VERTICAL merge algorithm. Size equals (size + number of gaps) in current_keys. std::queue current_row_sources; - void merge(MutableColumns & merged_columns, std::priority_queue & queue); + void merge(MutableColumns & merged_columns, SortingHeap & queue); /// Output to result the row for the current primary key. 
void insertRow(size_t skip_rows, const SharedBlockRowRef & row, MutableColumns & merged_columns); diff --git a/dbms/src/DataStreams/tests/union_stream2.cpp b/dbms/src/DataStreams/tests/union_stream2.cpp index 3eb1927f80a..ab0b583b8e5 100644 --- a/dbms/src/DataStreams/tests/union_stream2.cpp +++ b/dbms/src/DataStreams/tests/union_stream2.cpp @@ -57,6 +57,6 @@ catch (const Exception & e) std::cerr << e.what() << ", " << e.displayText() << std::endl << std::endl << "Stack trace:" << std::endl - << e.getStackTrace().toString(); + << e.getStackTraceString(); return 1; } diff --git a/dbms/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp b/dbms/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp index 21e5200c1dd..fecf09f08f7 100644 --- a/dbms/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp +++ b/dbms/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp @@ -30,7 +30,7 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } -static const std::vector supported_functions{"any", "anyLast", "min", "max", "sum"}; +static const std::vector supported_functions{"any", "anyLast", "min", "max", "sum", "groupBitAnd", "groupBitOr", "groupBitXor"}; String DataTypeCustomSimpleAggregateFunction::getName() const diff --git a/dbms/src/Databases/DatabaseDictionary.cpp b/dbms/src/Databases/DatabaseDictionary.cpp index 9299a75ad37..9409fdc584a 100644 --- a/dbms/src/Databases/DatabaseDictionary.cpp +++ b/dbms/src/Databases/DatabaseDictionary.cpp @@ -23,12 +23,8 @@ namespace ErrorCodes } DatabaseDictionary::DatabaseDictionary(const String & name_) - : name(name_), - log(&Logger::get("DatabaseDictionary(" + name + ")")) -{ -} - -void DatabaseDictionary::loadStoredObjects(Context &, bool) + : IDatabase(name_), + log(&Logger::get("DatabaseDictionary(" + database_name + ")")) { } @@ -69,65 +65,6 @@ bool DatabaseDictionary::isTableExist( return context.getExternalDictionariesLoader().getCurrentStatus(table_name) != ExternalLoader::Status::NOT_EXIST; } - -bool DatabaseDictionary::isDictionaryExist( - const Context & /*context*/, - const String & /*table_name*/) const -{ - return false; -} - - -DatabaseDictionariesIteratorPtr DatabaseDictionary::getDictionariesIterator( - const Context & /*context*/, - const FilterByNameFunction & /*filter_by_dictionary_name*/) -{ - return std::make_unique(); -} - - -void DatabaseDictionary::createDictionary( - const Context & /*context*/, - const String & /*dictionary_name*/, - const ASTPtr & /*query*/) -{ - throw Exception("Dictionary engine doesn't support dictionaries.", ErrorCodes::UNSUPPORTED_METHOD); -} - -void DatabaseDictionary::removeDictionary( - const Context & /*context*/, - const String & /*table_name*/) -{ - throw Exception("Dictionary engine doesn't support dictionaries.", ErrorCodes::UNSUPPORTED_METHOD); -} - -void DatabaseDictionary::attachDictionary( - const String & /*dictionary_name*/, const Context & /*context*/) -{ - throw Exception("Dictionary engine doesn't support dictionaries.", ErrorCodes::UNSUPPORTED_METHOD); -} - -void DatabaseDictionary::detachDictionary(const String & /*dictionary_name*/, const Context & /*context*/) -{ - throw Exception("Dictionary engine doesn't support dictionaries.", ErrorCodes::UNSUPPORTED_METHOD); -} - - -ASTPtr DatabaseDictionary::tryGetCreateDictionaryQuery( - const Context & /*context*/, - const String & /*table_name*/) const -{ - return nullptr; -} - - -ASTPtr DatabaseDictionary::getCreateDictionaryQuery( - const Context & /*context*/, - const String & /*table_name*/) const -{ - throw Exception("Dictionary 
engine doesn't support dictionaries.", ErrorCodes::UNSUPPORTED_METHOD); -} - StoragePtr DatabaseDictionary::tryGetTable( const Context & context, const String & table_name) const @@ -153,39 +90,6 @@ bool DatabaseDictionary::empty(const Context & context) const return !context.getExternalDictionariesLoader().hasCurrentlyLoadedObjects(); } -StoragePtr DatabaseDictionary::detachTable(const String & /*table_name*/) -{ - throw Exception("DatabaseDictionary: detachTable() is not supported", ErrorCodes::NOT_IMPLEMENTED); -} - -void DatabaseDictionary::attachTable(const String & /*table_name*/, const StoragePtr & /*table*/) -{ - throw Exception("DatabaseDictionary: attachTable() is not supported", ErrorCodes::NOT_IMPLEMENTED); -} - -void DatabaseDictionary::createTable( - const Context &, - const String &, - const StoragePtr &, - const ASTPtr &) -{ - throw Exception("DatabaseDictionary: createTable() is not supported", ErrorCodes::NOT_IMPLEMENTED); -} - -void DatabaseDictionary::removeTable( - const Context &, - const String &) -{ - throw Exception("DatabaseDictionary: removeTable() is not supported", ErrorCodes::NOT_IMPLEMENTED); -} - -time_t DatabaseDictionary::getObjectMetadataModificationTime( - const Context &, - const String &) -{ - return static_cast(0); -} - ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const { @@ -196,9 +100,11 @@ ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context, const auto & dictionaries = context.getExternalDictionariesLoader(); auto dictionary = throw_on_error ? dictionaries.getDictionary(table_name) : dictionaries.tryGetDictionary(table_name); + if (!dictionary) + return {}; auto names_and_types = StorageDictionary::getNamesAndTypes(dictionary->getStructure()); - buffer << "CREATE TABLE " << backQuoteIfNeed(name) << '.' << backQuoteIfNeed(table_name) << " ("; + buffer << "CREATE TABLE " << backQuoteIfNeed(database_name) << '.' 
<< backQuoteIfNeed(table_name) << " ("; buffer << StorageDictionary::generateNamesAndTypesDescription(names_and_types.begin(), names_and_types.end()); buffer << ") Engine = Dictionary(" << backQuoteIfNeed(table_name) << ")"; } @@ -215,22 +121,12 @@ ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context, return ast; } -ASTPtr DatabaseDictionary::getCreateTableQuery(const Context & context, const String & table_name) const -{ - return getCreateTableQueryImpl(context, table_name, true); -} - -ASTPtr DatabaseDictionary::tryGetCreateTableQuery(const Context & context, const String & table_name) const -{ - return getCreateTableQueryImpl(context, table_name, false); -} - -ASTPtr DatabaseDictionary::getCreateDatabaseQuery(const Context & /*context*/) const +ASTPtr DatabaseDictionary::getCreateDatabaseQuery() const { String query; { WriteBufferFromString buffer(query); - buffer << "CREATE DATABASE " << backQuoteIfNeed(name) << " ENGINE = Dictionary"; + buffer << "CREATE DATABASE " << backQuoteIfNeed(database_name) << " ENGINE = Dictionary"; } ParserCreateQuery parser; return parseQuery(parser, query.data(), query.data() + query.size(), "", 0); @@ -240,9 +136,4 @@ void DatabaseDictionary::shutdown() { } -String DatabaseDictionary::getDatabaseName() const -{ - return name; -} - } diff --git a/dbms/src/Databases/DatabaseDictionary.h b/dbms/src/Databases/DatabaseDictionary.h index 64acdad8645..3155e12b862 100644 --- a/dbms/src/Databases/DatabaseDictionary.h +++ b/dbms/src/Databases/DatabaseDictionary.h @@ -24,85 +24,36 @@ class DatabaseDictionary : public IDatabase public: DatabaseDictionary(const String & name_); - String getDatabaseName() const override; - String getEngineName() const override { return "Dictionary"; } - void loadStoredObjects( - Context & context, - bool has_force_restore_data_flag) override; - bool isTableExist( const Context & context, const String & table_name) const override; - bool isDictionaryExist(const Context & context, const String & table_name) const override; - StoragePtr tryGetTable( const Context & context, const String & table_name) const override; DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name = {}) override; - DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) override; - bool empty(const Context & context) const override; - void createTable( - const Context & context, - const String & table_name, - const StoragePtr & table, - const ASTPtr & query) override; - - void createDictionary( - const Context & context, const String & dictionary_name, const ASTPtr & query) override; - - void removeTable( - const Context & context, - const String & table_name) override; - - void removeDictionary(const Context & context, const String & table_name) override; - - void attachTable(const String & table_name, const StoragePtr & table) override; - - StoragePtr detachTable(const String & table_name) override; - - time_t getObjectMetadataModificationTime( - const Context & context, - const String & table_name) override; - - ASTPtr getCreateTableQuery( - const Context & context, - const String & table_name) const override; - - ASTPtr tryGetCreateTableQuery( - const Context & context, - const String & table_name) const override; - - ASTPtr getCreateDatabaseQuery(const Context & context) const override; - - ASTPtr getCreateDictionaryQuery(const Context & context, const String & table_name) const override; - - 
ASTPtr tryGetCreateDictionaryQuery(const Context & context, const String & table_name) const override; - - - void attachDictionary(const String & dictionary_name, const Context & context) override; - - void detachDictionary(const String & dictionary_name, const Context & context) override; + ASTPtr getCreateDatabaseQuery() const override; void shutdown() override; +protected: + ASTPtr getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const override; + private: - const String name; mutable std::mutex mutex; Poco::Logger * log; Tables listTables(const Context & context, const FilterByNameFunction & filter_by_name); - ASTPtr getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const; }; } diff --git a/dbms/src/Databases/DatabaseLazy.cpp b/dbms/src/Databases/DatabaseLazy.cpp index 5e1f2577367..fc71b3a63a7 100644 --- a/dbms/src/Databases/DatabaseLazy.cpp +++ b/dbms/src/Databases/DatabaseLazy.cpp @@ -3,7 +3,6 @@ #include #include #include -#include #include #include #include @@ -24,18 +23,14 @@ namespace ErrorCodes extern const int TABLE_ALREADY_EXISTS; extern const int UNKNOWN_TABLE; extern const int UNSUPPORTED_METHOD; - extern const int CANNOT_CREATE_TABLE_FROM_METADATA; extern const int LOGICAL_ERROR; } DatabaseLazy::DatabaseLazy(const String & name_, const String & metadata_path_, time_t expiration_time_, const Context & context_) - : name(name_) - , metadata_path(metadata_path_) - , data_path("data/" + escapeForFileName(name) + "/") + : DatabaseOnDisk(name_, metadata_path_, "DatabaseLazy (" + name_ + ")") , expiration_time(expiration_time_) - , log(&Logger::get("DatabaseLazy (" + name + ")")) { Poco::File(context_.getPath() + getDataPath()).createDirectories(); } @@ -45,7 +40,7 @@ void DatabaseLazy::loadStoredObjects( Context & context, bool /* has_force_restore_data_flag */) { - DatabaseOnDisk::iterateMetadataFiles(*this, log, context, [this](const String & file_name) + iterateMetadataFiles(context, [this](const String & file_name) { const std::string table_name = file_name.substr(0, file_name.size() - 4); attachTable(table_name, nullptr); @@ -62,75 +57,21 @@ void DatabaseLazy::createTable( SCOPE_EXIT({ clearExpiredTables(); }); if (!endsWith(table->getName(), "Log")) throw Exception("Lazy engine can be used only with *Log tables.", ErrorCodes::UNSUPPORTED_METHOD); - DatabaseOnDisk::createTable(*this, context, table_name, table, query); + DatabaseOnDisk::createTable(context, table_name, table, query); /// DatabaseOnDisk::createTable renames file, so we need to get new metadata_modification_time. 
- std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); auto it = tables_cache.find(table_name); if (it != tables_cache.end()) - it->second.metadata_modification_time = DatabaseOnDisk::getObjectMetadataModificationTime(*this, table_name); + it->second.metadata_modification_time = DatabaseOnDisk::getObjectMetadataModificationTime(table_name); } - -void DatabaseLazy::createDictionary( - const Context & /*context*/, - const String & /*dictionary_name*/, - const ASTPtr & /*query*/) -{ - throw Exception("Lazy engine can be used only with *Log tables.", ErrorCodes::UNSUPPORTED_METHOD); -} - - void DatabaseLazy::removeTable( const Context & context, const String & table_name) { SCOPE_EXIT({ clearExpiredTables(); }); - DatabaseOnDisk::removeTable(*this, context, table_name, log); -} - -void DatabaseLazy::removeDictionary( - const Context & /*context*/, - const String & /*table_name*/) -{ - throw Exception("Lazy engine can be used only with *Log tables.", ErrorCodes::UNSUPPORTED_METHOD); -} - -ASTPtr DatabaseLazy::getCreateDictionaryQuery( - const Context & /*context*/, - const String & /*table_name*/) const -{ - throw Exception("Lazy engine can be used only with *Log tables.", ErrorCodes::UNSUPPORTED_METHOD); -} - -ASTPtr DatabaseLazy::tryGetCreateDictionaryQuery(const Context & /*context*/, const String & /*table_name*/) const -{ - return nullptr; -} - -bool DatabaseLazy::isDictionaryExist(const Context & /*context*/, const String & /*table_name*/) const -{ - return false; -} - - -DatabaseDictionariesIteratorPtr DatabaseLazy::getDictionariesIterator( - const Context & /*context*/, - const FilterByNameFunction & /*filter_by_dictionary_name*/) -{ - return std::make_unique(); -} - -void DatabaseLazy::attachDictionary( - const String & /*dictionary_name*/, - const Context & /*context*/) -{ - throw Exception("Lazy engine can be used only with *Log tables.", ErrorCodes::UNSUPPORTED_METHOD); -} - -void DatabaseLazy::detachDictionary(const String & /*dictionary_name*/, const Context & /*context*/) -{ - throw Exception("Lazy engine can be used only with *Log tables.", ErrorCodes::UNSUPPORTED_METHOD); + DatabaseOnDisk::removeTable(context, table_name); } void DatabaseLazy::renameTable( @@ -141,61 +82,34 @@ void DatabaseLazy::renameTable( TableStructureWriteLockHolder & lock) { SCOPE_EXIT({ clearExpiredTables(); }); - DatabaseOnDisk::renameTable(*this, context, table_name, to_database, to_table_name, lock); + DatabaseOnDisk::renameTable(context, table_name, to_database, to_table_name, lock); } -time_t DatabaseLazy::getObjectMetadataModificationTime( - const Context & /* context */, - const String & table_name) +time_t DatabaseLazy::getObjectMetadataModificationTime(const String & table_name) const { - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); auto it = tables_cache.find(table_name); if (it != tables_cache.end()) return it->second.metadata_modification_time; - else - throw Exception("Table " + backQuote(getDatabaseName()) + "." 
+ backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); -} - -ASTPtr DatabaseLazy::getCreateTableQuery(const Context & context, const String & table_name) const -{ - return DatabaseOnDisk::getCreateTableQuery(*this, context, table_name); -} - -ASTPtr DatabaseLazy::tryGetCreateTableQuery(const Context & context, const String & table_name) const -{ - return DatabaseOnDisk::tryGetCreateTableQuery(*this, context, table_name); -} - -ASTPtr DatabaseLazy::getCreateDatabaseQuery(const Context & context) const -{ - return DatabaseOnDisk::getCreateDatabaseQuery(*this, context); + throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); } void DatabaseLazy::alterTable( const Context & /* context */, const String & /* table_name */, - const ColumnsDescription & /* columns */, - const IndicesDescription & /* indices */, - const ConstraintsDescription & /* constraints */, - const ASTModifier & /* storage_modifier */) + const StorageInMemoryMetadata & /* metadata */) { - SCOPE_EXIT({ clearExpiredTables(); }); + clearExpiredTables(); throw Exception("ALTER query is not supported for Lazy database.", ErrorCodes::UNSUPPORTED_METHOD); } - -void DatabaseLazy::drop(const Context & context) -{ - DatabaseOnDisk::drop(*this, context); -} - bool DatabaseLazy::isTableExist( const Context & /* context */, const String & table_name) const { SCOPE_EXIT({ clearExpiredTables(); }); - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); return tables_cache.find(table_name) != tables_cache.end(); } @@ -205,7 +119,7 @@ StoragePtr DatabaseLazy::tryGetTable( { SCOPE_EXIT({ clearExpiredTables(); }); { - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); auto it = tables_cache.find(table_name); if (it == tables_cache.end()) throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); @@ -225,7 +139,7 @@ StoragePtr DatabaseLazy::tryGetTable( DatabaseTablesIteratorPtr DatabaseLazy::getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name) { - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); Strings filtered_tables; for (const auto & [table_name, cached_table] : tables_cache) { @@ -244,12 +158,12 @@ bool DatabaseLazy::empty(const Context & /* context */) const void DatabaseLazy::attachTable(const String & table_name, const StoragePtr & table) { LOG_DEBUG(log, "Attach table " << backQuote(table_name) << "."); - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); time_t current_time = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); auto [it, inserted] = tables_cache.emplace(std::piecewise_construct, std::forward_as_tuple(table_name), - std::forward_as_tuple(table, current_time, DatabaseOnDisk::getObjectMetadataModificationTime(*this, table_name))); + std::forward_as_tuple(table, current_time, DatabaseOnDisk::getObjectMetadataModificationTime(table_name))); if (!inserted) throw Exception("Table " + backQuote(getDatabaseName()) + "." 
+ backQuote(table_name) + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); @@ -261,7 +175,7 @@ StoragePtr DatabaseLazy::detachTable(const String & table_name) StoragePtr res; { LOG_DEBUG(log, "Detach table " << backQuote(table_name) << "."); - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); auto it = tables_cache.find(table_name); if (it == tables_cache.end()) throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); @@ -277,7 +191,7 @@ void DatabaseLazy::shutdown() { TablesCache tables_snapshot; { - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); tables_snapshot = tables_cache; } @@ -287,7 +201,7 @@ void DatabaseLazy::shutdown() kv.second.table->shutdown(); } - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); tables_cache.clear(); } @@ -303,26 +217,6 @@ DatabaseLazy::~DatabaseLazy() } } -String DatabaseLazy::getDataPath() const -{ - return data_path; -} - -String DatabaseLazy::getMetadataPath() const -{ - return metadata_path; -} - -String DatabaseLazy::getDatabaseName() const -{ - return name; -} - -String DatabaseLazy::getObjectMetadataPath(const String & table_name) const -{ - return DatabaseOnDisk::getObjectMetadataPath(*this, table_name); -} - StoragePtr DatabaseLazy::loadTable(const Context & context, const String & table_name) const { SCOPE_EXIT({ clearExpiredTables(); }); @@ -333,19 +227,21 @@ StoragePtr DatabaseLazy::loadTable(const Context & context, const String & table try { - String table_name_; StoragePtr table; Context context_copy(context); /// some tables can change context, but not LogTables - auto ast = parseCreateQueryFromMetadataFile(table_metadata_path, log); + auto ast = parseQueryFromMetadata(table_metadata_path, /*throw_on_error*/ true, /*remove_empty*/false); if (ast) - std::tie(table_name_, table) = createTableFromAST( - ast->as(), name, getDataPath(), context_copy, false); + { + auto & ast_create = ast->as(); + String table_data_path_relative = getTableDataPath(ast_create); + table = createTableFromAST(ast_create, database_name, table_data_path_relative, context_copy, false).second; + } if (!ast || !endsWith(table->getName(), "Log")) throw Exception("Only *Log tables can be used with Lazy database engine.", ErrorCodes::LOGICAL_ERROR); { - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); auto it = tables_cache.find(table_name); if (it == tables_cache.end()) throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); @@ -358,16 +254,16 @@ StoragePtr DatabaseLazy::loadTable(const Context & context, const String & table return it->second.table = table; } } - catch (const Exception & e) + catch (Exception & e) { - throw Exception("Cannot create table from metadata file " + table_metadata_path + ". 
Error: " + DB::getCurrentExceptionMessage(true), - e, DB::ErrorCodes::CANNOT_CREATE_TABLE_FROM_METADATA); + e.addMessage("Cannot create table from metadata file " + table_metadata_path); + throw; } } void DatabaseLazy::clearExpiredTables() const { - std::lock_guard lock(tables_mutex); + std::lock_guard lock(mutex); auto time_now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); CacheExpirationQueue expired_tables; diff --git a/dbms/src/Databases/DatabaseLazy.h b/dbms/src/Databases/DatabaseLazy.h index 95605984a1c..8d1f20c068d 100644 --- a/dbms/src/Databases/DatabaseLazy.h +++ b/dbms/src/Databases/DatabaseLazy.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include @@ -15,7 +15,7 @@ class DatabaseLazyIterator; * Works like DatabaseOrdinary, but stores in memory only cache. * Can be used only with *Log engines. */ -class DatabaseLazy : public IDatabase +class DatabaseLazy : public DatabaseOnDisk { public: DatabaseLazy(const String & name_, const String & metadata_path_, time_t expiration_time_, const Context & context_); @@ -32,19 +32,10 @@ public: const StoragePtr & table, const ASTPtr & query) override; - void createDictionary( - const Context & context, - const String & dictionary_name, - const ASTPtr & query) override; - void removeTable( const Context & context, const String & table_name) override; - void removeDictionary( - const Context & context, - const String & table_name) override; - void renameTable( const Context & context, const String & table_name, @@ -55,48 +46,14 @@ public: void alterTable( const Context & context, const String & name, - const ColumnsDescription & columns, - const IndicesDescription & indices, - const ConstraintsDescription & constraints, - const ASTModifier & engine_modifier) override; + const StorageInMemoryMetadata & metadata) override; - time_t getObjectMetadataModificationTime( - const Context & context, - const String & table_name) override; - - ASTPtr getCreateTableQuery( - const Context & context, - const String & table_name) const override; - - ASTPtr tryGetCreateTableQuery( - const Context & context, - const String & table_name) const override; - - ASTPtr getCreateDictionaryQuery( - const Context & context, - const String & dictionary_name) const override; - - ASTPtr tryGetCreateDictionaryQuery( - const Context & context, - const String & dictionary_name) const override; - - ASTPtr getCreateDatabaseQuery(const Context & context) const override; - - String getDataPath() const override; - String getDatabaseName() const override; - String getMetadataPath() const override; - String getObjectMetadataPath(const String & table_name) const override; - - void drop(const Context & context) override; + time_t getObjectMetadataModificationTime(const String & table_name) const override; bool isTableExist( const Context & context, const String & table_name) const override; - bool isDictionaryExist( - const Context & context, - const String & table_name) const override; - StoragePtr tryGetTable( const Context & context, const String & table_name) const override; @@ -105,16 +62,10 @@ public: DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name = {}) override; - DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) override; - void attachTable(const String & table_name, const StoragePtr & table) override; StoragePtr detachTable(const String & table_name) override; - void 
attachDictionary(const String & dictionary_name, const Context & context) override; - - void detachDictionary(const String & dictionary_name, const Context & context) override; - void shutdown() override; ~DatabaseLazy() override; @@ -146,19 +97,12 @@ private: using TablesCache = std::unordered_map; - - String name; - const String metadata_path; - const String data_path; - const time_t expiration_time; - mutable std::mutex tables_mutex; + /// TODO use DatabaseWithOwnTablesBase::tables mutable TablesCache tables_cache; mutable CacheExpirationQueue cache_expiration_queue; - Poco::Logger * log; - StoragePtr loadTable(const Context & context, const String & table_name) const; void clearExpiredTables() const; diff --git a/dbms/src/Databases/DatabaseMemory.cpp b/dbms/src/Databases/DatabaseMemory.cpp index 7d7f101a88c..996d5ca7c84 100644 --- a/dbms/src/Databases/DatabaseMemory.cpp +++ b/dbms/src/Databases/DatabaseMemory.cpp @@ -1,30 +1,16 @@ #include #include #include +#include namespace DB { -namespace ErrorCodes -{ - extern const int CANNOT_GET_CREATE_TABLE_QUERY; - extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY; - extern const int UNSUPPORTED_METHOD; -} - -DatabaseMemory::DatabaseMemory(String name_) - : DatabaseWithOwnTablesBase(std::move(name_)) - , log(&Logger::get("DatabaseMemory(" + name + ")")) +DatabaseMemory::DatabaseMemory(const String & name_) + : DatabaseWithOwnTablesBase(name_, "DatabaseMemory(" + name_ + ")") {} -void DatabaseMemory::loadStoredObjects( - Context & /*context*/, - bool /*has_force_restore_data_flag*/) -{ - /// Nothing to load. -} - void DatabaseMemory::createTable( const Context & /*context*/, const String & table_name, @@ -34,21 +20,6 @@ void DatabaseMemory::createTable( attachTable(table_name, table); } - -void DatabaseMemory::attachDictionary(const String & /*name*/, const Context & /*context*/) -{ - throw Exception("There is no ATTACH DICTIONARY query for DatabaseMemory", ErrorCodes::UNSUPPORTED_METHOD); -} - -void DatabaseMemory::createDictionary( - const Context & /*context*/, - const String & /*dictionary_name*/, - const ASTPtr & /*query*/) -{ - throw Exception("There is no CREATE DICTIONARY query for DatabaseMemory", ErrorCodes::UNSUPPORTED_METHOD); -} - - void DatabaseMemory::removeTable( const Context & /*context*/, const String & table_name) @@ -56,52 +27,13 @@ void DatabaseMemory::removeTable( detachTable(table_name); } - -void DatabaseMemory::detachDictionary(const String & /*name*/, const Context & /*context*/) +ASTPtr DatabaseMemory::getCreateDatabaseQuery() const { - throw Exception("There is no DETACH DICTIONARY query for DatabaseMemory", ErrorCodes::UNSUPPORTED_METHOD); -} - - -void DatabaseMemory::removeDictionary( - const Context & /*context*/, - const String & /*dictionary_name*/) -{ - throw Exception("There is no DROP DICTIONARY query for DatabaseMemory", ErrorCodes::UNSUPPORTED_METHOD); -} - - -time_t DatabaseMemory::getObjectMetadataModificationTime( - const Context &, const String &) -{ - return static_cast(0); -} - -ASTPtr DatabaseMemory::getCreateTableQuery( - const Context &, - const String &) const -{ - throw Exception("There is no CREATE TABLE query for DatabaseMemory tables", ErrorCodes::CANNOT_GET_CREATE_TABLE_QUERY); -} - - -ASTPtr DatabaseMemory::getCreateDictionaryQuery( - const Context &, - const String &) const -{ - throw Exception("There is no CREATE DICTIONARY query for DatabaseMemory dictionaries", ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY); -} - - -ASTPtr DatabaseMemory::getCreateDatabaseQuery( - const Context &) 
const -{ - throw Exception("There is no CREATE DATABASE query for DatabaseMemory", ErrorCodes::CANNOT_GET_CREATE_TABLE_QUERY); -} - -String DatabaseMemory::getDatabaseName() const -{ - return name; + auto create_query = std::make_shared(); + create_query->database = database_name; + create_query->set(create_query->storage, std::make_shared()); + create_query->storage->set(create_query->storage->engine, makeASTFunction(getEngineName())); + return create_query; } } diff --git a/dbms/src/Databases/DatabaseMemory.h b/dbms/src/Databases/DatabaseMemory.h index 40f54c793e6..5609e6053ce 100644 --- a/dbms/src/Databases/DatabaseMemory.h +++ b/dbms/src/Databases/DatabaseMemory.h @@ -17,54 +17,21 @@ namespace DB class DatabaseMemory : public DatabaseWithOwnTablesBase { public: - DatabaseMemory(String name_); - - String getDatabaseName() const override; + DatabaseMemory(const String & name_); String getEngineName() const override { return "Memory"; } - void loadStoredObjects( - Context & context, - bool has_force_restore_data_flag) override; - void createTable( const Context & context, const String & table_name, const StoragePtr & table, const ASTPtr & query) override; - void createDictionary( - const Context & context, - const String & dictionary_name, - const ASTPtr & query) override; - - void attachDictionary( - const String & name, - const Context & context) override; - void removeTable( const Context & context, const String & table_name) override; - void removeDictionary( - const Context & context, - const String & dictionary_name) override; - - void detachDictionary( - const String & name, - const Context & context) override; - - time_t getObjectMetadataModificationTime(const Context & context, const String & table_name) override; - - ASTPtr getCreateTableQuery(const Context & context, const String & table_name) const override; - ASTPtr getCreateDictionaryQuery(const Context & context, const String & table_name) const override; - ASTPtr tryGetCreateTableQuery(const Context &, const String &) const override { return nullptr; } - ASTPtr tryGetCreateDictionaryQuery(const Context &, const String &) const override { return nullptr; } - - ASTPtr getCreateDatabaseQuery(const Context & context) const override; - -private: - Poco::Logger * log; + ASTPtr getCreateDatabaseQuery() const override; }; } diff --git a/dbms/src/Databases/DatabaseMySQL.cpp b/dbms/src/Databases/DatabaseMySQL.cpp index fa8b7c7159c..9ae42f08d8a 100644 --- a/dbms/src/Databases/DatabaseMySQL.cpp +++ b/dbms/src/Databases/DatabaseMySQL.cpp @@ -9,10 +9,8 @@ #include #include #include -#include #include #include -#include #include #include #include @@ -63,8 +61,12 @@ static String toQueryStringWithQuote(const std::vector & quote_list) DatabaseMySQL::DatabaseMySQL( const Context & global_context_, const String & database_name_, const String & metadata_path_, const ASTStorage * database_engine_define_, const String & database_name_in_mysql_, mysqlxx::Pool && pool) - : global_context(global_context_), database_name(database_name_), metadata_path(metadata_path_), - database_engine_define(database_engine_define_->clone()), database_name_in_mysql(database_name_in_mysql_), mysql_pool(std::move(pool)) + : IDatabase(database_name_) + , global_context(global_context_) + , metadata_path(metadata_path_) + , database_engine_define(database_engine_define_->clone()) + , database_name_in_mysql(database_name_in_mysql_) + , mysql_pool(std::move(pool)) { } @@ -150,19 +152,24 @@ static ASTPtr getCreateQueryFromStorage(const StoragePtr & storage, const ASTPtr 
return create_table_query; } -ASTPtr DatabaseMySQL::tryGetCreateTableQuery(const Context &, const String & table_name) const +ASTPtr DatabaseMySQL::getCreateTableQueryImpl(const Context &, const String & table_name, bool throw_on_error) const { std::lock_guard lock(mutex); fetchTablesIntoLocalCache(); if (local_tables_cache.find(table_name) == local_tables_cache.end()) - throw Exception("MySQL table " + database_name_in_mysql + "." + table_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); + { + if (throw_on_error) + throw Exception("MySQL table " + database_name_in_mysql + "." + table_name + " doesn't exist.", + ErrorCodes::UNKNOWN_TABLE); + return nullptr; + } return getCreateQueryFromStorage(local_tables_cache[table_name].second, database_engine_define); } -time_t DatabaseMySQL::getObjectMetadataModificationTime(const Context &, const String & table_name) +time_t DatabaseMySQL::getObjectMetadataModificationTime(const String & table_name) const { std::lock_guard lock(mutex); @@ -174,7 +181,7 @@ time_t DatabaseMySQL::getObjectMetadataModificationTime(const Context &, const S return time_t(local_tables_cache[table_name].first); } -ASTPtr DatabaseMySQL::getCreateDatabaseQuery(const Context &) const +ASTPtr DatabaseMySQL::getCreateDatabaseQuery() const { const auto & create_query = std::make_shared(); create_query->database = database_name; diff --git a/dbms/src/Databases/DatabaseMySQL.h b/dbms/src/Databases/DatabaseMySQL.h index 82adba8f44c..a327cf143eb 100644 --- a/dbms/src/Databases/DatabaseMySQL.h +++ b/dbms/src/Databases/DatabaseMySQL.h @@ -28,35 +28,17 @@ public: String getEngineName() const override { return "MySQL"; } - String getDatabaseName() const override { return database_name; } - bool empty(const Context & context) const override; DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name = {}) override; - DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context &, const FilterByNameFunction & = {}) override - { - return std::make_unique(); - } - - ASTPtr getCreateDatabaseQuery(const Context & context) const override; + ASTPtr getCreateDatabaseQuery() const override; bool isTableExist(const Context & context, const String & name) const override; - bool isDictionaryExist(const Context &, const String &) const override { return false; } - StoragePtr tryGetTable(const Context & context, const String & name) const override; - ASTPtr tryGetCreateTableQuery(const Context & context, const String & name) const override; - - ASTPtr getCreateDictionaryQuery(const Context &, const String &) const override - { - throw Exception("MySQL database engine does not support dictionaries.", ErrorCodes::NOT_IMPLEMENTED); - } - - ASTPtr tryGetCreateDictionaryQuery(const Context &, const String &) const override { return nullptr; } - - time_t getObjectMetadataModificationTime(const Context & context, const String & name) override; + time_t getObjectMetadataModificationTime(const String & name) const override; void shutdown() override; @@ -74,29 +56,12 @@ public: void attachTable(const String & table_name, const StoragePtr & storage) override; - void detachDictionary(const String &, const Context &) override - { - throw Exception("MySQL database engine does not support detach dictionary.", ErrorCodes::NOT_IMPLEMENTED); - } - void removeDictionary(const Context &, const String &) override - { - throw Exception("MySQL database engine does not support remove dictionary.", ErrorCodes::NOT_IMPLEMENTED); - } - - void 
attachDictionary(const String &, const Context &) override - { - throw Exception("MySQL database engine does not support attach dictionary.", ErrorCodes::NOT_IMPLEMENTED); - } - - void createDictionary(const Context &, const String &, const ASTPtr &) override - { - throw Exception("MySQL database engine does not support create dictionary.", ErrorCodes::NOT_IMPLEMENTED); - } +protected: + ASTPtr getCreateTableQueryImpl(const Context & context, const String & name, bool throw_on_error) const override; private: Context global_context; - String database_name; String metadata_path; ASTPtr database_engine_define; String database_name_in_mysql; diff --git a/dbms/src/Databases/DatabaseOnDisk.cpp b/dbms/src/Databases/DatabaseOnDisk.cpp index b0d5a7a3f30..dfeb8746a2f 100644 --- a/dbms/src/Databases/DatabaseOnDisk.cpp +++ b/dbms/src/Databases/DatabaseOnDisk.cpp @@ -6,9 +6,6 @@ #include #include #include -#include -#include -#include #include #include #include @@ -19,7 +16,6 @@ #include #include -#include #include @@ -31,8 +27,6 @@ static constexpr size_t METADATA_FILE_BUFFER_SIZE = 32768; namespace ErrorCodes { - extern const int CANNOT_GET_CREATE_TABLE_QUERY; - extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY; extern const int FILE_DOESNT_EXIST; extern const int INCORRECT_FILE_NAME; extern const int SYNTAX_ERROR; @@ -43,93 +37,10 @@ namespace ErrorCodes } -namespace detail -{ - String getObjectMetadataPath(const String & base_path, const String & table_name) - { - return base_path + (endsWith(base_path, "/") ? "" : "/") + escapeForFileName(table_name) + ".sql"; - } - - String getDatabaseMetadataPath(const String & base_path) - { - return (endsWith(base_path, "/") ? base_path.substr(0, base_path.size() - 1) : base_path) + ".sql"; - } - - ASTPtr getQueryFromMetadata(const String & metadata_path, bool throw_on_error) - { - String query; - - try - { - ReadBufferFromFile in(metadata_path, 4096); - readStringUntilEOF(query, in); - } - catch (const Exception & e) - { - if (!throw_on_error && e.code() == ErrorCodes::FILE_DOESNT_EXIST) - return nullptr; - else - throw; - } - - ParserCreateQuery parser; - const char * pos = query.data(); - std::string error_message; - auto ast = tryParseQuery(parser, pos, pos + query.size(), error_message, /* hilite = */ false, - "in file " + metadata_path, /* allow_multi_statements = */ false, 0); - - if (!ast && throw_on_error) - throw Exception(error_message, ErrorCodes::SYNTAX_ERROR); - - return ast; - } - - ASTPtr getCreateQueryFromMetadata(const String & metadata_path, const String & database, bool throw_on_error) - { - ASTPtr ast = getQueryFromMetadata(metadata_path, throw_on_error); - - if (ast) - { - auto & ast_create_query = ast->as(); - ast_create_query.attach = false; - ast_create_query.database = database; - } - - return ast; - } -} - - -ASTPtr parseCreateQueryFromMetadataFile(const String & filepath, Poco::Logger * log) -{ - String definition; - { - char in_buf[METADATA_FILE_BUFFER_SIZE]; - ReadBufferFromFile in(filepath, METADATA_FILE_BUFFER_SIZE, -1, in_buf); - readStringUntilEOF(definition, in); - } - - /** Empty files with metadata are generated after a rough restart of the server. - * Remove these files to slightly reduce the work of the admins on startup. - */ - if (definition.empty()) - { - LOG_ERROR(log, "File " << filepath << " is empty. 
Removing."); - Poco::File(filepath).remove(); - return nullptr; - } - - ParserCreateQuery parser_create; - ASTPtr result = parseQuery(parser_create, definition, "in file " + filepath, 0); - return result; -} - - - std::pair createTableFromAST( ASTCreateQuery ast_create_query, const String & database_name, - const String & database_data_path_relative, + const String & table_data_path_relative, Context & context, bool has_force_restore_data_flag) { @@ -144,7 +55,7 @@ std::pair createTableFromAST( return {ast_create_query.table, storage}; } /// We do not directly use `InterpreterCreateQuery::execute`, because - /// - the database has not been created yet; + /// - the database has not been loaded yet; /// - the code is simpler, since the query is already brought to a suitable form. if (!ast_create_query.columns_list || !ast_create_query.columns_list->columns) throw Exception("Missing definition of columns.", ErrorCodes::EMPTY_LIST_OF_COLUMNS_PASSED); @@ -152,7 +63,6 @@ std::pair createTableFromAST( ColumnsDescription columns = InterpreterCreateQuery::getColumnsDescription(*ast_create_query.columns_list->columns, context); ConstraintsDescription constraints = InterpreterCreateQuery::getConstraintsDescription(ast_create_query.columns_list->constraints); - String table_data_path_relative = database_data_path_relative + escapeForFileName(ast_create_query.table) + '/'; return { ast_create_query.table, @@ -202,7 +112,6 @@ String getObjectDefinitionFromCreateQuery(const ASTPtr & query) } void DatabaseOnDisk::createTable( - IDatabase & database, const Context & context, const String & table_name, const StoragePtr & table, @@ -222,14 +131,14 @@ void DatabaseOnDisk::createTable( /// A race condition would be possible if a table with the same name is simultaneously created using CREATE and using ATTACH. /// But there is protection from it - see using DDLGuard in InterpreterCreateQuery. - if (database.isDictionaryExist(context, table_name)) - throw Exception("Dictionary " + backQuote(database.getDatabaseName()) + "." + backQuote(table_name) + " already exists.", + if (isDictionaryExist(context, table_name)) + throw Exception("Dictionary " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); - if (database.isTableExist(context, table_name)) - throw Exception("Table " + backQuote(database.getDatabaseName()) + "." + backQuote(table_name) + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); + if (isTableExist(context, table_name)) + throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); - String table_metadata_path = database.getObjectMetadataPath(table_name); + String table_metadata_path = getObjectMetadataPath(table_name); String table_metadata_tmp_path = table_metadata_path + ".tmp"; String statement; @@ -248,7 +157,7 @@ void DatabaseOnDisk::createTable( try { /// Add a table to the map of known tables. - database.attachTable(table_name, table); + attachTable(table_name, table); /// If it was ATTACH query and file with table metadata already exist /// (so, ATTACH is done after DETACH), then rename atomically replaces old file with new one. 
@@ -261,107 +170,11 @@ void DatabaseOnDisk::createTable( } } - -void DatabaseOnDisk::createDictionary( - IDatabase & database, - const Context & context, - const String & dictionary_name, - const ASTPtr & query) +void DatabaseOnDisk::removeTable(const Context & /* context */, const String & table_name) { - const auto & settings = context.getSettingsRef(); + StoragePtr res = detachTable(table_name); - /** The code is based on the assumption that all threads share the same order of operations: - * - create the .sql.tmp file; - * - add the dictionary to ExternalDictionariesLoader; - * - load the dictionary in case dictionaries_lazy_load == false; - * - attach the dictionary; - * - rename .sql.tmp to .sql. - */ - - /// A race condition would be possible if a dictionary with the same name is simultaneously created using CREATE and using ATTACH. - /// But there is protection from it - see using DDLGuard in InterpreterCreateQuery. - if (database.isDictionaryExist(context, dictionary_name)) - throw Exception("Dictionary " + backQuote(database.getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); - - /// A dictionary with the same full name could be defined in *.xml config files. - String full_name = database.getDatabaseName() + "." + dictionary_name; - auto & external_loader = const_cast(context.getExternalDictionariesLoader()); - if (external_loader.getCurrentStatus(full_name) != ExternalLoader::Status::NOT_EXIST) - throw Exception( - "Dictionary " + backQuote(database.getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", - ErrorCodes::DICTIONARY_ALREADY_EXISTS); - - if (database.isTableExist(context, dictionary_name)) - throw Exception("Table " + backQuote(database.getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); - - String dictionary_metadata_path = database.getObjectMetadataPath(dictionary_name); - String dictionary_metadata_tmp_path = dictionary_metadata_path + ".tmp"; - String statement = getObjectDefinitionFromCreateQuery(query); - - { - /// Exclusive flags guarantees, that table is not created right now in another thread. Otherwise, exception will be thrown. - WriteBufferFromFile out(dictionary_metadata_tmp_path, statement.size(), O_WRONLY | O_CREAT | O_EXCL); - writeString(statement, out); - out.next(); - if (settings.fsync_metadata) - out.sync(); - out.close(); - } - - bool succeeded = false; - SCOPE_EXIT({ - if (!succeeded) - Poco::File(dictionary_metadata_tmp_path).remove(); - }); - - /// Add a temporary repository containing the dictionary. - /// We need this temp repository to try loading the dictionary before actually attaching it to the database. - static std::atomic counter = 0; - String temp_repository_name = String(IExternalLoaderConfigRepository::INTERNAL_REPOSITORY_NAME_PREFIX) + " creating " + full_name + " " - + std::to_string(++counter); - external_loader.addConfigRepository( - temp_repository_name, - std::make_unique( - std::vector{std::pair{dictionary_metadata_tmp_path, - getDictionaryConfigurationFromAST(query->as(), database.getDatabaseName())}})); - SCOPE_EXIT({ external_loader.removeConfigRepository(temp_repository_name); }); - - bool lazy_load = context.getConfigRef().getBool("dictionaries_lazy_load", true); - if (!lazy_load) - { - /// load() is called here to force loading the dictionary, wait until the loading is finished, - /// and throw an exception if the loading is failed. 
- external_loader.load(full_name); - } - - database.attachDictionary(dictionary_name, context); - SCOPE_EXIT({ - if (!succeeded) - database.detachDictionary(dictionary_name, context); - }); - - /// If it was ATTACH query and file with dictionary metadata already exist - /// (so, ATTACH is done after DETACH), then rename atomically replaces old file with new one. - Poco::File(dictionary_metadata_tmp_path).renameTo(dictionary_metadata_path); - - /// ExternalDictionariesLoader doesn't know we renamed the metadata path. - /// So we have to manually call reloadConfig() here. - external_loader.reloadConfig(database.getDatabaseName(), full_name); - - /// Everything's ok. - succeeded = true; -} - - -void DatabaseOnDisk::removeTable( - IDatabase & database, - const Context & /* context */, - const String & table_name, - Poco::Logger * log) -{ - StoragePtr res = database.detachTable(table_name); - - String table_metadata_path = database.getObjectMetadataPath(table_name); + String table_metadata_path = getObjectMetadataPath(table_name); try { @@ -378,51 +191,64 @@ void DatabaseOnDisk::removeTable( { LOG_WARNING(log, getCurrentExceptionMessage(__PRETTY_FUNCTION__)); } - database.attachTable(table_name, res); + attachTable(table_name, res); throw; } } - -void DatabaseOnDisk::removeDictionary( - IDatabase & database, - const Context & context, - const String & dictionary_name, - Poco::Logger * /*log*/) +void DatabaseOnDisk::renameTable( + const Context & context, + const String & table_name, + IDatabase & to_database, + const String & to_table_name, + TableStructureWriteLockHolder & lock) { - database.detachDictionary(dictionary_name, context); + if (typeid(*this) != typeid(to_database)) + throw Exception("Moving tables between databases of different engines is not supported", ErrorCodes::NOT_IMPLEMENTED); - String dictionary_metadata_path = database.getObjectMetadataPath(dictionary_name); - if (Poco::File(dictionary_metadata_path).exists()) + StoragePtr table = tryGetTable(context, table_name); + + if (!table) + throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); + + ASTPtr ast = parseQueryFromMetadata(getObjectMetadataPath(table_name)); + if (!ast) + throw Exception("There is no metadata file for table " + backQuote(table_name) + ".", ErrorCodes::FILE_DOESNT_EXIST); + auto & create = ast->as(); + create.table = to_table_name; + + /// Notify the table that it is renamed. If the table does not support renaming, exception is thrown. + try { - try - { - Poco::File(dictionary_metadata_path).remove(); - } - catch (...) - { - /// If remove was not possible for some reason - database.attachDictionary(dictionary_name, context); - throw; - } + table->rename(to_database.getTableDataPath(create), + to_database.getDatabaseName(), + to_table_name, lock); } + catch (const Exception &) + { + throw; + } + catch (const Poco::Exception & e) + { + /// Better diagnostics. + throw Exception{Exception::CreateFromPoco, e}; + } + + /// NOTE Non-atomic. 
+ to_database.createTable(context, to_table_name, table, ast); + removeTable(context, table_name); } - -ASTPtr DatabaseOnDisk::getCreateTableQueryImpl( - const IDatabase & database, - const Context & context, - const String & table_name, - bool throw_on_error) +ASTPtr DatabaseOnDisk::getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const { ASTPtr ast; - auto table_metadata_path = detail::getObjectMetadataPath(database.getMetadataPath(), table_name); - ast = detail::getCreateQueryFromMetadata(table_metadata_path, database.getDatabaseName(), throw_on_error); + auto table_metadata_path = getObjectMetadataPath(table_name); + ast = getCreateQueryFromMetadata(table_metadata_path, throw_on_error); if (!ast && throw_on_error) { /// Handle system.* tables for which there are no table.sql files. - bool has_table = database.tryGetTable(context, table_name) != nullptr; + bool has_table = tryGetTable(context, table_name) != nullptr; auto msg = has_table ? "There is no CREATE TABLE query for table " @@ -434,61 +260,18 @@ ASTPtr DatabaseOnDisk::getCreateTableQueryImpl( return ast; } - -ASTPtr DatabaseOnDisk::getCreateDictionaryQueryImpl( - const IDatabase & database, - const Context & context, - const String & dictionary_name, - bool throw_on_error) +ASTPtr DatabaseOnDisk::getCreateDatabaseQuery() const { ASTPtr ast; - auto dictionary_metadata_path = detail::getObjectMetadataPath(database.getMetadataPath(), dictionary_name); - ast = detail::getCreateQueryFromMetadata(dictionary_metadata_path, database.getDatabaseName(), throw_on_error); - if (!ast && throw_on_error) - { - /// Handle system.* tables for which there are no table.sql files. - bool has_dictionary = database.isDictionaryExist(context, dictionary_name); - - auto msg = has_dictionary ? "There is no CREATE DICTIONARY query for table " : "There is no metadata file for dictionary "; - - throw Exception(msg + backQuote(dictionary_name), ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY); - } - - return ast; -} - -ASTPtr DatabaseOnDisk::getCreateTableQuery(const IDatabase & database, const Context & context, const String & table_name) -{ - return getCreateTableQueryImpl(database, context, table_name, true); -} - -ASTPtr DatabaseOnDisk::tryGetCreateTableQuery(const IDatabase & database, const Context & context, const String & table_name) -{ - return getCreateTableQueryImpl(database, context, table_name, false); -} - - -ASTPtr DatabaseOnDisk::getCreateDictionaryQuery(const IDatabase & database, const Context & context, const String & dictionary_name) -{ - return getCreateDictionaryQueryImpl(database, context, dictionary_name, true); -} - -ASTPtr DatabaseOnDisk::tryGetCreateDictionaryQuery(const IDatabase & database, const Context & context, const String & dictionary_name) -{ - return getCreateDictionaryQueryImpl(database, context, dictionary_name, false); -} - -ASTPtr DatabaseOnDisk::getCreateDatabaseQuery(const IDatabase & database, const Context & /*context*/) -{ - ASTPtr ast; - - auto database_metadata_path = detail::getDatabaseMetadataPath(database.getMetadataPath()); - ast = detail::getCreateQueryFromMetadata(database_metadata_path, database.getDatabaseName(), true); + auto metadata_dir_path = getMetadataPath(); + auto database_metadata_path = metadata_dir_path.substr(0, metadata_dir_path.size() - 1) + ".sql"; + ast = getCreateQueryFromMetadata(database_metadata_path, true); if (!ast) { /// Handle databases (such as default) for which there are no database.sql files. 
- String query = "CREATE DATABASE " + backQuoteIfNeed(database.getDatabaseName()) + " ENGINE = Lazy"; + /// If database.sql doesn't exist, then engine is Ordinary + String query = "CREATE DATABASE " + backQuoteIfNeed(getDatabaseName()) + " ENGINE = Ordinary"; ParserCreateQuery parser; ast = parseQuery(parser, query.data(), query.data() + query.size(), "", 0); } @@ -496,22 +279,20 @@ ASTPtr DatabaseOnDisk::getCreateDatabaseQuery(const IDatabase & database, const return ast; } -void DatabaseOnDisk::drop(const IDatabase & database, const Context & context) +void DatabaseOnDisk::drop(const Context & context) { - Poco::File(context.getPath() + database.getDataPath()).remove(false); - Poco::File(database.getMetadataPath()).remove(false); + Poco::File(context.getPath() + getDataPath()).remove(false); + Poco::File(getMetadataPath()).remove(false); } -String DatabaseOnDisk::getObjectMetadataPath(const IDatabase & database, const String & table_name) +String DatabaseOnDisk::getObjectMetadataPath(const String & table_name) const { - return detail::getObjectMetadataPath(database.getMetadataPath(), table_name); + return getMetadataPath() + escapeForFileName(table_name) + ".sql"; } -time_t DatabaseOnDisk::getObjectMetadataModificationTime( - const IDatabase & database, - const String & table_name) +time_t DatabaseOnDisk::getObjectMetadataModificationTime(const String & table_name) const { - String table_metadata_path = getObjectMetadataPath(database, table_name); + String table_metadata_path = getObjectMetadataPath(table_name); Poco::File meta_file(table_metadata_path); if (meta_file.exists()) @@ -520,10 +301,10 @@ time_t DatabaseOnDisk::getObjectMetadataModificationTime( return static_cast(0); } -void DatabaseOnDisk::iterateMetadataFiles(const IDatabase & database, Poco::Logger * log, const Context & context, const IteratingFunction & iterating_function) +void DatabaseOnDisk::iterateMetadataFiles(const Context & context, const IteratingFunction & iterating_function) const { Poco::DirectoryIterator dir_end; - for (Poco::DirectoryIterator dir_it(database.getMetadataPath()); dir_it != dir_end; ++dir_it) + for (Poco::DirectoryIterator dir_it(getMetadataPath()); dir_it != dir_end; ++dir_it) { /// For '.svn', '.gitignore' directory and similar. 
if (dir_it.name().at(0) == '.') @@ -538,10 +319,10 @@ void DatabaseOnDisk::iterateMetadataFiles(const IDatabase & database, Poco::Logg if (endsWith(dir_it.name(), tmp_drop_ext)) { const std::string object_name = dir_it.name().substr(0, dir_it.name().size() - strlen(tmp_drop_ext)); - if (Poco::File(context.getPath() + database.getDataPath() + '/' + object_name).exists()) + if (Poco::File(context.getPath() + getDataPath() + '/' + object_name).exists()) { /// TODO maybe complete table drop and remove all table data (including data on other volumes and metadata in ZK) - Poco::File(dir_it->path()).renameTo(database.getMetadataPath() + object_name + ".sql"); + Poco::File(dir_it->path()).renameTo(getMetadataPath() + object_name + ".sql"); LOG_WARNING(log, "Object " << backQuote(object_name) << " was not dropped previously and will be restored"); iterating_function(object_name + ".sql"); } @@ -567,9 +348,64 @@ void DatabaseOnDisk::iterateMetadataFiles(const IDatabase & database, Poco::Logg iterating_function(dir_it.name()); } else - throw Exception("Incorrect file extension: " + dir_it.name() + " in metadata directory " + database.getMetadataPath(), + throw Exception("Incorrect file extension: " + dir_it.name() + " in metadata directory " + getMetadataPath(), ErrorCodes::INCORRECT_FILE_NAME); } } +ASTPtr DatabaseOnDisk::parseQueryFromMetadata(const String & metadata_file_path, bool throw_on_error /*= true*/, bool remove_empty /*= false*/) const +{ + String query; + + try + { + ReadBufferFromFile in(metadata_file_path, METADATA_FILE_BUFFER_SIZE); + readStringUntilEOF(query, in); + } + catch (const Exception & e) + { + if (!throw_on_error && e.code() == ErrorCodes::FILE_DOESNT_EXIST) + return nullptr; + else + throw; + } + + /** Empty files with metadata are generated after a rough restart of the server. + * Remove these files to slightly reduce the work of the admins on startup. + */ + if (remove_empty && query.empty()) + { + LOG_ERROR(log, "File " << metadata_file_path << " is empty. 
Removing."); + Poco::File(metadata_file_path).remove(); + return nullptr; + } + + ParserCreateQuery parser; + const char * pos = query.data(); + std::string error_message; + auto ast = tryParseQuery(parser, pos, pos + query.size(), error_message, /* hilite = */ false, + "in file " + getMetadataPath(), /* allow_multi_statements = */ false, 0); + + if (!ast && throw_on_error) + throw Exception(error_message, ErrorCodes::SYNTAX_ERROR); + else if (!ast) + return nullptr; + + return ast; +} + +ASTPtr DatabaseOnDisk::getCreateQueryFromMetadata(const String & database_metadata_path, bool throw_on_error) const +{ + ASTPtr ast = parseQueryFromMetadata(database_metadata_path, throw_on_error); + + if (ast) + { + auto & ast_create_query = ast->as(); + ast_create_query.attach = false; + ast_create_query.database = database_name; + } + + return ast; +} + } diff --git a/dbms/src/Databases/DatabaseOnDisk.h b/dbms/src/Databases/DatabaseOnDisk.h index c138d43971b..fa77e80a3a4 100644 --- a/dbms/src/Databases/DatabaseOnDisk.h +++ b/dbms/src/Databases/DatabaseOnDisk.h @@ -11,24 +11,14 @@ namespace DB { -namespace detail -{ - String getObjectMetadataPath(const String & base_path, const String & dictionary_name); - String getDatabaseMetadataPath(const String & base_path); - ASTPtr getQueryFromMetadata(const String & metadata_path, bool throw_on_error = true); - ASTPtr getCreateQueryFromMetadata(const String & metadata_path, const String & database, bool throw_on_error); -} - -ASTPtr parseCreateQueryFromMetadataFile(const String & filepath, Poco::Logger * log); - std::pair createTableFromAST( ASTCreateQuery ast_create_query, const String & database_name, - const String & database_data_path_relative, + const String & table_data_path_relative, Context & context, bool has_force_restore_data_flag); -/** Get the row with the table definition based on the CREATE query. +/** Get the string with the table definition based on the CREATE query. * It is an ATTACH query that you can execute to create a table from the correspondent database. * See the implementation. */ @@ -37,147 +27,59 @@ String getObjectDefinitionFromCreateQuery(const ASTPtr & query); /* Class to provide basic operations with tables when metadata is stored on disk in .sql files. 
*/ -class DatabaseOnDisk +class DatabaseOnDisk : public DatabaseWithOwnTablesBase { public: - static void createTable( - IDatabase & database, + DatabaseOnDisk(const String & name, const String & metadata_path_, const String & logger) + : DatabaseWithOwnTablesBase(name, logger) + , metadata_path(metadata_path_) + , data_path("data/" + escapeForFileName(database_name) + "/") {} + + void createTable( const Context & context, const String & table_name, const StoragePtr & table, - const ASTPtr & query); + const ASTPtr & query) override; - static void createDictionary( - IDatabase & database, + void removeTable( const Context & context, - const String & dictionary_name, - const ASTPtr & query); + const String & table_name) override; - static void removeTable( - IDatabase & database, - const Context & context, - const String & table_name, - Poco::Logger * log); - - static void removeDictionary( - IDatabase & database, - const Context & context, - const String & dictionary_name, - Poco::Logger * log); - - template - static void renameTable( - IDatabase & database, + void renameTable( const Context & context, const String & table_name, IDatabase & to_database, const String & to_table_name, - TableStructureWriteLockHolder & lock); + TableStructureWriteLockHolder & lock) override; - static ASTPtr getCreateTableQuery( - const IDatabase & database, - const Context & context, - const String & table_name); + ASTPtr getCreateDatabaseQuery() const override; - static ASTPtr tryGetCreateTableQuery( - const IDatabase & database, - const Context & context, - const String & table_name); + void drop(const Context & context) override; - static ASTPtr getCreateDictionaryQuery( - const IDatabase & database, - const Context & context, - const String & dictionary_name); + String getObjectMetadataPath(const String & object_name) const override; - static ASTPtr tryGetCreateDictionaryQuery( - const IDatabase & database, - const Context & context, - const String & dictionary_name); - - static ASTPtr getCreateDatabaseQuery( - const IDatabase & database, - const Context & context); - - static void drop(const IDatabase & database, const Context & context); - - static String getObjectMetadataPath( - const IDatabase & database, - const String & object_name); - - static time_t getObjectMetadataModificationTime( - const IDatabase & database, - const String & object_name); + time_t getObjectMetadataModificationTime(const String & object_name) const override; + String getDataPath() const override { return data_path; } + String getTableDataPath(const String & table_name) const override { return data_path + escapeForFileName(table_name) + "/"; } + String getTableDataPath(const ASTCreateQuery & query) const override { return getTableDataPath(query.table); } + String getMetadataPath() const override { return metadata_path; } +protected: using IteratingFunction = std::function; - static void iterateMetadataFiles(const IDatabase & database, Poco::Logger * log, const Context & context, const IteratingFunction & iterating_function); + void iterateMetadataFiles(const Context & context, const IteratingFunction & iterating_function) const; -private: - static ASTPtr getCreateTableQueryImpl( - const IDatabase & database, + ASTPtr getCreateTableQueryImpl( const Context & context, const String & table_name, - bool throw_on_error); + bool throw_on_error) const override; - static ASTPtr getCreateDictionaryQueryImpl( - const IDatabase & database, - const Context & context, - const String & dictionary_name, - bool throw_on_error); + ASTPtr 
parseQueryFromMetadata(const String & metadata_file_path, bool throw_on_error = true, bool remove_empty = false) const; + ASTPtr getCreateQueryFromMetadata(const String & metadata_path, bool throw_on_error) const; + + + const String metadata_path; + const String data_path; }; - -namespace ErrorCodes -{ - extern const int NOT_IMPLEMENTED; - extern const int UNKNOWN_TABLE; - extern const int FILE_DOESNT_EXIST; -} - -template -void DatabaseOnDisk::renameTable( - IDatabase & database, - const Context & context, - const String & table_name, - IDatabase & to_database, - const String & to_table_name, - TableStructureWriteLockHolder & lock) -{ - Database * to_database_concrete = typeid_cast(&to_database); - - if (!to_database_concrete) - throw Exception("Moving tables between databases of different engines is not supported", ErrorCodes::NOT_IMPLEMENTED); - - StoragePtr table = database.tryGetTable(context, table_name); - - if (!table) - throw Exception("Table " + backQuote(database.getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); - - /// Notify the table that it is renamed. If the table does not support renaming, exception is thrown. - try - { - table->rename("/data/" + escapeForFileName(to_database_concrete->getDatabaseName()) + "/" + escapeForFileName(to_table_name) + '/', - to_database_concrete->getDatabaseName(), - to_table_name, lock); - } - catch (const Exception &) - { - throw; - } - catch (const Poco::Exception & e) - { - /// Better diagnostics. - throw Exception{Exception::CreateFromPoco, e}; - } - - ASTPtr ast = detail::getQueryFromMetadata(detail::getObjectMetadataPath(database.getMetadataPath(), table_name)); - if (!ast) - throw Exception("There is no metadata file for table " + backQuote(table_name) + ".", ErrorCodes::FILE_DOESNT_EXIST); - ast->as().table = to_table_name; - - /// NOTE Non-atomic. - to_database_concrete->createTable(context, to_table_name, table, ast); - database.removeTable(context, table_name); -} - } diff --git a/dbms/src/Databases/DatabaseOrdinary.cpp b/dbms/src/Databases/DatabaseOrdinary.cpp index 387c8a6335f..f9f6983604c 100644 --- a/dbms/src/Databases/DatabaseOrdinary.cpp +++ b/dbms/src/Databases/DatabaseOrdinary.cpp @@ -1,7 +1,6 @@ #include #include -#include #include #include #include @@ -11,22 +10,19 @@ #include #include #include -#include -#include #include #include -#include #include -#include #include #include -#include +#include #include +#include + #include #include #include -#include #include #include #include @@ -40,11 +36,9 @@ namespace DB namespace ErrorCodes { - extern const int CANNOT_CREATE_TABLE_FROM_METADATA; extern const int CANNOT_CREATE_DICTIONARY_FROM_METADATA; extern const int EMPTY_LIST_OF_COLUMNS_PASSED; extern const int CANNOT_PARSE_TEXT; - extern const int EMPTY_LIST_OF_ATTRIBUTES_PASSED; } @@ -68,16 +62,13 @@ namespace String table_name; StoragePtr table; std::tie(table_name, table) - = createTableFromAST(query, database_name, database.getDataPath(), context, has_force_restore_data_flag); + = createTableFromAST(query, database_name, database.getTableDataPath(query), context, has_force_restore_data_flag); database.attachTable(table_name, table); } - catch (const Exception & e) + catch (Exception & e) { - throw Exception( - "Cannot attach table '" + query.table + "' from query " + serializeAST(query) - + ". 
Error: " + DB::getCurrentExceptionMessage(true), - e, - DB::ErrorCodes::CANNOT_CREATE_TABLE_FROM_METADATA); + e.addMessage("Cannot attach table '" + backQuote(query.table) + "' from query " + serializeAST(query)); + throw; } } @@ -92,13 +83,10 @@ namespace { database.attachDictionary(query.table, context); } - catch (const Exception & e) + catch (Exception & e) { - throw Exception( - "Cannot create dictionary '" + query.table + "' from query " + serializeAST(query) - + ". Error: " + DB::getCurrentExceptionMessage(true), - e, - DB::ErrorCodes::CANNOT_CREATE_DICTIONARY_FROM_METADATA); + e.addMessage("Cannot attach table '" + backQuote(query.table) + "' from query " + serializeAST(query)); + throw; } } @@ -114,11 +102,8 @@ namespace } -DatabaseOrdinary::DatabaseOrdinary(String name_, const String & metadata_path_, const Context & context_) - : DatabaseWithOwnTablesBase(std::move(name_)) - , metadata_path(metadata_path_) - , data_path("data/" + escapeForFileName(name) + "/") - , log(&Logger::get("DatabaseOrdinary (" + name + ")")) +DatabaseOrdinary::DatabaseOrdinary(const String & name_, const String & metadata_path_, const Context & context_) + : DatabaseWithDictionaries(name_, metadata_path_, "DatabaseOrdinary (" + name_ + ")") { Poco::File(context_.getPath() + getDataPath()).createDirectories(); } @@ -137,12 +122,12 @@ void DatabaseOrdinary::loadStoredObjects( FileNames file_names; size_t total_dictionaries = 0; - DatabaseOnDisk::iterateMetadataFiles(*this, log, context, [&file_names, &total_dictionaries, this](const String & file_name) + iterateMetadataFiles(context, [&file_names, &total_dictionaries, this](const String & file_name) { - String full_path = metadata_path + "/" + file_name; + String full_path = getMetadataPath() + file_name; try { - auto ast = parseCreateQueryFromMetadataFile(full_path, log); + auto ast = parseQueryFromMetadata(full_path, /*throw_on_error*/ true, /*remove_empty*/false); if (ast) { auto * create_query = ast->as(); @@ -150,10 +135,10 @@ void DatabaseOrdinary::loadStoredObjects( total_dictionaries += create_query->is_dictionary; } } - catch (const Exception & e) + catch (Exception & e) { - throw Exception( - "Cannot parse definition from metadata file " + full_path + ". Error: " + DB::getCurrentExceptionMessage(true), e, ErrorCodes::CANNOT_PARSE_TEXT); + e.addMessage("Cannot parse definition from metadata file " + full_path); + throw; } }); @@ -187,12 +172,8 @@ void DatabaseOrdinary::loadStoredObjects( /// After all tables was basically initialized, startup them. startupTables(pool); - /// Add database as repository - auto dictionaries_repository = std::make_unique(shared_from_this(), context); - auto & external_loader = context.getExternalDictionariesLoader(); - external_loader.addConfigRepository(getDatabaseName(), std::move(dictionaries_repository)); - /// Attach dictionaries. 
+ attachToExternalDictionariesLoader(context); for (const auto & name_with_query : file_names) { auto create_query = name_with_query.second->as(); @@ -237,94 +218,14 @@ void DatabaseOrdinary::startupTables(ThreadPool & thread_pool) thread_pool.wait(); } -void DatabaseOrdinary::createTable( - const Context & context, - const String & table_name, - const StoragePtr & table, - const ASTPtr & query) -{ - DatabaseOnDisk::createTable(*this, context, table_name, table, query); -} - -void DatabaseOrdinary::createDictionary( - const Context & context, - const String & dictionary_name, - const ASTPtr & query) -{ - DatabaseOnDisk::createDictionary(*this, context, dictionary_name, query); -} - -void DatabaseOrdinary::removeTable( - const Context & context, - const String & table_name) -{ - DatabaseOnDisk::removeTable(*this, context, table_name, log); -} - -void DatabaseOrdinary::removeDictionary( - const Context & context, - const String & table_name) -{ - DatabaseOnDisk::removeDictionary(*this, context, table_name, log); -} - -void DatabaseOrdinary::renameTable( - const Context & context, - const String & table_name, - IDatabase & to_database, - const String & to_table_name, - TableStructureWriteLockHolder & lock) -{ - DatabaseOnDisk::renameTable(*this, context, table_name, to_database, to_table_name, lock); -} - - -time_t DatabaseOrdinary::getObjectMetadataModificationTime( - const Context & /* context */, - const String & table_name) -{ - return DatabaseOnDisk::getObjectMetadataModificationTime(*this, table_name); -} - -ASTPtr DatabaseOrdinary::getCreateTableQuery(const Context & context, const String & table_name) const -{ - return DatabaseOnDisk::getCreateTableQuery(*this, context, table_name); -} - -ASTPtr DatabaseOrdinary::tryGetCreateTableQuery(const Context & context, const String & table_name) const -{ - return DatabaseOnDisk::tryGetCreateTableQuery(*this, context, table_name); -} - - -ASTPtr DatabaseOrdinary::getCreateDictionaryQuery(const Context & context, const String & dictionary_name) const -{ - return DatabaseOnDisk::getCreateDictionaryQuery(*this, context, dictionary_name); -} - -ASTPtr DatabaseOrdinary::tryGetCreateDictionaryQuery(const Context & context, const String & dictionary_name) const -{ - return DatabaseOnDisk::tryGetCreateTableQuery(*this, context, dictionary_name); -} - -ASTPtr DatabaseOrdinary::getCreateDatabaseQuery(const Context & context) const -{ - return DatabaseOnDisk::getCreateDatabaseQuery(*this, context); -} - void DatabaseOrdinary::alterTable( const Context & context, const String & table_name, - const ColumnsDescription & columns, - const IndicesDescription & indices, - const ConstraintsDescription & constraints, - const ASTModifier & storage_modifier) + const StorageInMemoryMetadata & metadata) { /// Read the definition of the table and replace the necessary parts with new ones. 
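// The alterTable() body that follows reads the .sql file, patches the AST,
// and persists it with the classic write-to-temp-then-rename scheme. A
// sketch of that scheme using <cstdio> rename() (the original goes through
// WriteBufferFromFile and optionally fsyncs when fsync_metadata is set):
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

void rewriteMetadata(const std::string & table_metadata_path, const std::string & new_statement)
{
    const std::string tmp_path = table_metadata_path + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::trunc);
        if (!out)
            throw std::runtime_error("Cannot open " + tmp_path);
        out << new_statement;  // the regenerated ATTACH/CREATE statement
    }
    /// rename() atomically replaces the old .sql file, so a crash leaves
    /// either the old or the new definition, never a half-written one.
    if (std::rename(tmp_path.c_str(), table_metadata_path.c_str()) != 0)
    {
        std::remove(tmp_path.c_str());
        throw std::runtime_error("Cannot rename " + tmp_path);
    }
}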
- - String table_name_escaped = escapeForFileName(table_name); - String table_metadata_tmp_path = getMetadataPath() + "/" + table_name_escaped + ".sql.tmp"; - String table_metadata_path = getMetadataPath() + "/" + table_name_escaped + ".sql"; + String table_metadata_path = getObjectMetadataPath(table_name); + String table_metadata_tmp_path = table_metadata_path + ".tmp"; String statement; { @@ -338,19 +239,30 @@ void DatabaseOrdinary::alterTable( const auto & ast_create_query = ast->as(); - ASTPtr new_columns = InterpreterCreateQuery::formatColumns(columns); - ASTPtr new_indices = InterpreterCreateQuery::formatIndices(indices); - ASTPtr new_constraints = InterpreterCreateQuery::formatConstraints(constraints); + ASTPtr new_columns = InterpreterCreateQuery::formatColumns(metadata.columns); + ASTPtr new_indices = InterpreterCreateQuery::formatIndices(metadata.indices); + ASTPtr new_constraints = InterpreterCreateQuery::formatConstraints(metadata.constraints); ast_create_query.columns_list->replace(ast_create_query.columns_list->columns, new_columns); ast_create_query.columns_list->setOrReplace(ast_create_query.columns_list->indices, new_indices); ast_create_query.columns_list->setOrReplace(ast_create_query.columns_list->constraints, new_constraints); - if (storage_modifier) - storage_modifier(*ast_create_query.storage); + ASTStorage & storage_ast = *ast_create_query.storage; + /// ORDER BY may change, but cannot appear, it's required construction + if (metadata.order_by_ast && storage_ast.order_by) + storage_ast.set(storage_ast.order_by, metadata.order_by_ast); + + if (metadata.primary_key_ast) + storage_ast.set(storage_ast.primary_key, metadata.primary_key_ast); + + if (metadata.ttl_for_table_ast) + storage_ast.set(storage_ast.ttl_table, metadata.ttl_for_table_ast); + + if (metadata.settings_ast) + storage_ast.set(storage_ast.settings, metadata.settings_ast); + statement = getObjectDefinitionFromCreateQuery(ast); - { WriteBufferFromFile out(table_metadata_tmp_path, statement.size(), O_WRONLY | O_CREAT | O_EXCL); writeString(statement, out); @@ -372,31 +284,4 @@ void DatabaseOrdinary::alterTable( } } - -void DatabaseOrdinary::drop(const Context & context) -{ - DatabaseOnDisk::drop(*this, context); -} - - -String DatabaseOrdinary::getDataPath() const -{ - return data_path; -} - -String DatabaseOrdinary::getMetadataPath() const -{ - return metadata_path; -} - -String DatabaseOrdinary::getDatabaseName() const -{ - return name; -} - -String DatabaseOrdinary::getObjectMetadataPath(const String & table_name) const -{ - return DatabaseOnDisk::getObjectMetadataPath(*this, table_name); -} - } diff --git a/dbms/src/Databases/DatabaseOrdinary.h b/dbms/src/Databases/DatabaseOrdinary.h index 7809d63caba..41c03b5103e 100644 --- a/dbms/src/Databases/DatabaseOrdinary.h +++ b/dbms/src/Databases/DatabaseOrdinary.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include @@ -11,10 +11,10 @@ namespace DB * It stores tables list in filesystem using list of .sql files, * that contain declaration of table represented by SQL ATTACH TABLE query. 
*/ -class DatabaseOrdinary : public DatabaseWithOwnTablesBase +class DatabaseOrdinary : public DatabaseWithDictionaries //DatabaseWithOwnTablesBase { public: - DatabaseOrdinary(String name_, const String & metadata_path_, const Context & context); + DatabaseOrdinary(const String & name_, const String & metadata_path_, const Context & context); String getEngineName() const override { return "Ordinary"; } @@ -22,73 +22,12 @@ public: Context & context, bool has_force_restore_data_flag) override; - void createTable( - const Context & context, - const String & table_name, - const StoragePtr & table, - const ASTPtr & query) override; - - void createDictionary( - const Context & context, - const String & dictionary_name, - const ASTPtr & query) override; - - void removeTable( - const Context & context, - const String & table_name) override; - - void removeDictionary( - const Context & context, - const String & table_name) override; - - void renameTable( - const Context & context, - const String & table_name, - IDatabase & to_database, - const String & to_table_name, - TableStructureWriteLockHolder &) override; - void alterTable( const Context & context, const String & name, - const ColumnsDescription & columns, - const IndicesDescription & indices, - const ConstraintsDescription & constraints, - const ASTModifier & engine_modifier) override; - - time_t getObjectMetadataModificationTime( - const Context & context, - const String & table_name) override; - - ASTPtr getCreateTableQuery( - const Context & context, - const String & table_name) const override; - - ASTPtr tryGetCreateTableQuery( - const Context & context, - const String & table_name) const override; - - ASTPtr tryGetCreateDictionaryQuery( - const Context & context, - const String & name) const override; - - ASTPtr getCreateDictionaryQuery( - const Context & context, - const String & name) const override; - - ASTPtr getCreateDatabaseQuery(const Context & context) const override; - - String getDataPath() const override; - String getDatabaseName() const override; - String getMetadataPath() const override; - String getObjectMetadataPath(const String & table_name) const override; - - void drop(const Context & context) override; + const StorageInMemoryMetadata & metadata) override; private: - const String metadata_path; - const String data_path; - Poco::Logger * log; void startupTables(ThreadPool & thread_pool); }; diff --git a/dbms/src/Databases/DatabaseWithDictionaries.cpp b/dbms/src/Databases/DatabaseWithDictionaries.cpp new file mode 100644 index 00000000000..716ed32b676 --- /dev/null +++ b/dbms/src/Databases/DatabaseWithDictionaries.cpp @@ -0,0 +1,271 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int EMPTY_LIST_OF_COLUMNS_PASSED; + extern const int TABLE_ALREADY_EXISTS; + extern const int UNKNOWN_TABLE; + extern const int LOGICAL_ERROR; + extern const int DICTIONARY_ALREADY_EXISTS; +} + + +void DatabaseWithDictionaries::attachDictionary(const String & dictionary_name, const Context & context) +{ + String full_name = getDatabaseName() + "." 
+ dictionary_name; + { + std::lock_guard lock(mutex); + if (!dictionaries.emplace(dictionary_name).second) + throw Exception("Dictionary " + full_name + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); + } + + /// ExternalLoader::reloadConfig() will find out that the dictionary's config has been added + /// and in case `dictionaries_lazy_load == false` it will load the dictionary. + const auto & external_loader = context.getExternalDictionariesLoader(); + external_loader.reloadConfig(getDatabaseName(), full_name); +} + +void DatabaseWithDictionaries::detachDictionary(const String & dictionary_name, const Context & context) +{ + String full_name = getDatabaseName() + "." + dictionary_name; + { + std::lock_guard lock(mutex); + auto it = dictionaries.find(dictionary_name); + if (it == dictionaries.end()) + throw Exception("Dictionary " + full_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); + dictionaries.erase(it); + } + + /// ExternalLoader::reloadConfig() will find out that the dictionary's config has been removed + /// and therefore it will unload the dictionary. + const auto & external_loader = context.getExternalDictionariesLoader(); + external_loader.reloadConfig(getDatabaseName(), full_name); + +} + +void DatabaseWithDictionaries::createDictionary(const Context & context, const String & dictionary_name, const ASTPtr & query) +{ + const auto & settings = context.getSettingsRef(); + + /** The code is based on the assumption that all threads share the same order of operations: + * - create the .sql.tmp file; + * - add the dictionary to ExternalDictionariesLoader; + * - load the dictionary in case dictionaries_lazy_load == false; + * - attach the dictionary; + * - rename .sql.tmp to .sql. + */ + + /// A race condition would be possible if a dictionary with the same name is simultaneously created using CREATE and using ATTACH. + /// But there is protection from it - see using DDLGuard in InterpreterCreateQuery. + if (isDictionaryExist(context, dictionary_name)) + throw Exception("Dictionary " + backQuote(getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); + + /// A dictionary with the same full name could be defined in *.xml config files. + String full_name = getDatabaseName() + "." + dictionary_name; + const auto & external_loader = context.getExternalDictionariesLoader(); + if (external_loader.getCurrentStatus(full_name) != ExternalLoader::Status::NOT_EXIST) + throw Exception( + "Dictionary " + backQuote(getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", + ErrorCodes::DICTIONARY_ALREADY_EXISTS); + + if (isTableExist(context, dictionary_name)) + throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); + + + String dictionary_metadata_path = getObjectMetadataPath(dictionary_name); + String dictionary_metadata_tmp_path = dictionary_metadata_path + ".tmp"; + String statement = getObjectDefinitionFromCreateQuery(query); + + { + /// Exclusive flags guarantees, that table is not created right now in another thread. Otherwise, exception will be thrown. 
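// The WriteBufferFromFile call that follows relies on O_CREAT | O_EXCL for
// the guarantee described in the comment above. A POSIX-only sketch of the
// exclusive create (illustrative, not the ClickHouse wrapper):
#include <fcntl.h>
#include <stdexcept>
#include <string>

int createExclusive(const std::string & path)
{
    /// open() fails with EEXIST if the file is already there, so two
    /// concurrent CREATE DICTIONARY queries cannot both win the .sql.tmp file.
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        throw std::runtime_error("Cannot create " + path + " exclusively");
    return fd;  // the caller writes the statement, optionally fsyncs, and closes
}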
+ WriteBufferFromFile out(dictionary_metadata_tmp_path, statement.size(), O_WRONLY | O_CREAT | O_EXCL); + writeString(statement, out); + out.next(); + if (settings.fsync_metadata) + out.sync(); + out.close(); + } + + bool succeeded = false; + SCOPE_EXIT({ + if (!succeeded) + Poco::File(dictionary_metadata_tmp_path).remove(); + }); + + /// Add a temporary repository containing the dictionary. + /// We need this temp repository to try loading the dictionary before actually attaching it to the database. + auto temp_repository + = const_cast(external_loader) /// the change of ExternalDictionariesLoader is temporary + .addConfigRepository(std::make_unique( + getDatabaseName(), dictionary_metadata_tmp_path, getDictionaryConfigurationFromAST(query->as()))); + + bool lazy_load = context.getConfigRef().getBool("dictionaries_lazy_load", true); + if (!lazy_load) + { + /// load() is called here to force loading the dictionary, wait until the loading is finished, + /// and throw an exception if the loading is failed. + external_loader.load(full_name); + } + + attachDictionary(dictionary_name, context); + SCOPE_EXIT({ + if (!succeeded) + detachDictionary(dictionary_name, context); + }); + + /// If it was ATTACH query and file with dictionary metadata already exist + /// (so, ATTACH is done after DETACH), then rename atomically replaces old file with new one. + Poco::File(dictionary_metadata_tmp_path).renameTo(dictionary_metadata_path); + + /// ExternalDictionariesLoader doesn't know we renamed the metadata path. + /// So we have to manually call reloadConfig() here. + external_loader.reloadConfig(getDatabaseName(), full_name); + + /// Everything's ok. + succeeded = true; +} + +void DatabaseWithDictionaries::removeDictionary(const Context & context, const String & dictionary_name) +{ + detachDictionary(dictionary_name, context); + + String dictionary_metadata_path = getObjectMetadataPath(dictionary_name); + + try + { + Poco::File(dictionary_metadata_path).remove(); + } + catch (...) 
+ { + /// If remove was not possible for some reason + attachDictionary(dictionary_name, context); + throw; + } +} + +StoragePtr DatabaseWithDictionaries::tryGetTable(const Context & context, const String & table_name) const +{ + if (auto table_ptr = DatabaseWithOwnTablesBase::tryGetTable(context, table_name)) + return table_ptr; + + if (isDictionaryExist(context, table_name)) + /// We don't need lock database here, because database doesn't store dictionary itself + /// just metadata + return getDictionaryStorage(context, table_name); + + return {}; +} + +DatabaseTablesIteratorPtr DatabaseWithDictionaries::getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_name) +{ + /// NOTE: it's not atomic + auto tables_it = getTablesIterator(context, filter_by_name); + auto dictionaries_it = getDictionariesIterator(context, filter_by_name); + + Tables result; + while (tables_it && tables_it->isValid()) + { + result.emplace(tables_it->name(), tables_it->table()); + tables_it->next(); + } + + while (dictionaries_it && dictionaries_it->isValid()) + { + auto table_name = dictionaries_it->name(); + auto table_ptr = getDictionaryStorage(context, table_name); + if (table_ptr) + result.emplace(table_name, table_ptr); + dictionaries_it->next(); + } + + return std::make_unique(result); +} + +DatabaseDictionariesIteratorPtr DatabaseWithDictionaries::getDictionariesIterator(const Context & /*context*/, const FilterByNameFunction & filter_by_dictionary_name) +{ + std::lock_guard lock(mutex); + if (!filter_by_dictionary_name) + return std::make_unique(dictionaries); + + Dictionaries filtered_dictionaries; + for (const auto & dictionary_name : dictionaries) + if (filter_by_dictionary_name(dictionary_name)) + filtered_dictionaries.emplace(dictionary_name); + return std::make_unique(std::move(filtered_dictionaries)); +} + +bool DatabaseWithDictionaries::isDictionaryExist(const Context & /*context*/, const String & dictionary_name) const +{ + std::lock_guard lock(mutex); + return dictionaries.find(dictionary_name) != dictionaries.end(); +} + +StoragePtr DatabaseWithDictionaries::getDictionaryStorage(const Context & context, const String & table_name) const +{ + auto dict_name = database_name + "." + table_name; + const auto & external_loader = context.getExternalDictionariesLoader(); + auto dict_ptr = external_loader.tryGetDictionary(dict_name); + if (dict_ptr) + { + const DictionaryStructure & dictionary_structure = dict_ptr->getStructure(); + auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure); + return StorageDictionary::create(database_name, table_name, ColumnsDescription{columns}, context, true, dict_name); + } + return nullptr; +} + +ASTPtr DatabaseWithDictionaries::getCreateDictionaryQueryImpl( + const Context & context, + const String & dictionary_name, + bool throw_on_error) const +{ + ASTPtr ast; + + auto dictionary_metadata_path = getObjectMetadataPath(dictionary_name); + ast = getCreateQueryFromMetadata(dictionary_metadata_path, throw_on_error); + if (!ast && throw_on_error) + { + /// Handle system.* tables for which there are no table.sql files. + bool has_dictionary = isDictionaryExist(context, dictionary_name); + + auto msg = has_dictionary ? 
"There is no CREATE DICTIONARY query for table " : "There is no metadata file for dictionary "; + + throw Exception(msg + backQuote(dictionary_name), ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY); + } + + return ast; +} + +void DatabaseWithDictionaries::shutdown() +{ + detachFromExternalDictionariesLoader(); + DatabaseOnDisk::shutdown(); +} + +DatabaseWithDictionaries::~DatabaseWithDictionaries() = default; + +void DatabaseWithDictionaries::attachToExternalDictionariesLoader(Context & context) +{ + database_as_config_repo_for_external_loader = context.getExternalDictionariesLoader().addConfigRepository( + std::make_unique(*this, context)); +} + +void DatabaseWithDictionaries::detachFromExternalDictionariesLoader() +{ + database_as_config_repo_for_external_loader = {}; +} + +} diff --git a/dbms/src/Databases/DatabaseWithDictionaries.h b/dbms/src/Databases/DatabaseWithDictionaries.h new file mode 100644 index 00000000000..c16e11f24c5 --- /dev/null +++ b/dbms/src/Databases/DatabaseWithDictionaries.h @@ -0,0 +1,50 @@ +#include +#include + +namespace DB +{ + + +class DatabaseWithDictionaries : public DatabaseOnDisk +{ +public: + void attachDictionary(const String & name, const Context & context) override; + + void detachDictionary(const String & name, const Context & context) override; + + void createDictionary(const Context & context, + const String & dictionary_name, + const ASTPtr & query) override; + + void removeDictionary(const Context & context, const String & dictionary_name) override; + + StoragePtr tryGetTable(const Context & context, const String & table_name) const override; + + DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) override; + + DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) override; + + bool isDictionaryExist(const Context & context, const String & dictionary_name) const override; + + void shutdown() override; + + ~DatabaseWithDictionaries() override; + +protected: + DatabaseWithDictionaries(const String & name, const String & metadata_path_, const String & logger) + : DatabaseOnDisk(name, metadata_path_, logger) {} + + void attachToExternalDictionariesLoader(Context & context); + void detachFromExternalDictionariesLoader(); + + StoragePtr getDictionaryStorage(const Context & context, const String & table_name) const; + + ASTPtr getCreateDictionaryQueryImpl(const Context & context, + const String & dictionary_name, + bool throw_on_error) const override; + +private: + ext::scope_guard database_as_config_repo_for_external_loader; +}; + +} diff --git a/dbms/src/Databases/DatabasesCommon.cpp b/dbms/src/Databases/DatabasesCommon.cpp index 5942009dd31..becdad672a1 100644 --- a/dbms/src/Databases/DatabasesCommon.cpp +++ b/dbms/src/Databases/DatabasesCommon.cpp @@ -1,6 +1,4 @@ #include -#include -#include #include #include #include @@ -22,23 +20,9 @@ namespace ErrorCodes extern const int DICTIONARY_ALREADY_EXISTS; } -namespace +DatabaseWithOwnTablesBase::DatabaseWithOwnTablesBase(const String & name_, const String & logger) + : IDatabase(name_), log(&Logger::get(logger)) { - -StoragePtr getDictionaryStorage(const Context & context, const String & table_name, const String & db_name) -{ - auto dict_name = db_name + "." 
+ table_name; - const auto & external_loader = context.getExternalDictionariesLoader(); - auto dict_ptr = external_loader.tryGetDictionary(dict_name); - if (dict_ptr) - { - const DictionaryStructure & dictionary_structure = dict_ptr->getStructure(); - auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure); - return StorageDictionary::create(db_name, table_name, ColumnsDescription{columns}, context, true, dict_name); - } - return nullptr; -} - } bool DatabaseWithOwnTablesBase::isTableExist( @@ -49,57 +33,17 @@ bool DatabaseWithOwnTablesBase::isTableExist( return tables.find(table_name) != tables.end() || dictionaries.find(table_name) != dictionaries.end(); } -bool DatabaseWithOwnTablesBase::isDictionaryExist( - const Context & /*context*/, - const String & dictionary_name) const -{ - std::lock_guard lock(mutex); - return dictionaries.find(dictionary_name) != dictionaries.end(); -} - StoragePtr DatabaseWithOwnTablesBase::tryGetTable( - const Context & context, + const Context & /*context*/, const String & table_name) const { - { - std::lock_guard lock(mutex); - auto it = tables.find(table_name); - if (it != tables.end()) - return it->second; - } - - if (isDictionaryExist(context, table_name)) - /// We don't need lock database here, because database doesn't store dictionary itself - /// just metadata - return getDictionaryStorage(context, table_name, getDatabaseName()); - + std::lock_guard lock(mutex); + auto it = tables.find(table_name); + if (it != tables.end()) + return it->second; return {}; } -DatabaseTablesIteratorPtr DatabaseWithOwnTablesBase::getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_name) -{ - auto tables_it = getTablesIterator(context, filter_by_name); - auto dictionaries_it = getDictionariesIterator(context, filter_by_name); - - Tables result; - while (tables_it && tables_it->isValid()) - { - result.emplace(tables_it->name(), tables_it->table()); - tables_it->next(); - } - - while (dictionaries_it && dictionaries_it->isValid()) - { - auto table_name = dictionaries_it->name(); - auto table_ptr = getDictionaryStorage(context, table_name, getDatabaseName()); - if (table_ptr) - result.emplace(table_name, table_ptr); - dictionaries_it->next(); - } - - return std::make_unique(result); -} - DatabaseTablesIteratorPtr DatabaseWithOwnTablesBase::getTablesIterator(const Context & /*context*/, const FilterByNameFunction & filter_by_table_name) { std::lock_guard lock(mutex); @@ -114,20 +58,6 @@ DatabaseTablesIteratorPtr DatabaseWithOwnTablesBase::getTablesIterator(const Con return std::make_unique(std::move(filtered_tables)); } - -DatabaseDictionariesIteratorPtr DatabaseWithOwnTablesBase::getDictionariesIterator(const Context & /*context*/, const FilterByNameFunction & filter_by_dictionary_name) -{ - std::lock_guard lock(mutex); - if (!filter_by_dictionary_name) - return std::make_unique(dictionaries); - - Dictionaries filtered_dictionaries; - for (const auto & dictionary_name : dictionaries) - if (filter_by_dictionary_name(dictionary_name)) - filtered_dictionaries.emplace(dictionary_name); - return std::make_unique(std::move(filtered_dictionaries)); -} - bool DatabaseWithOwnTablesBase::empty(const Context & /*context*/) const { std::lock_guard lock(mutex); @@ -140,11 +70,11 @@ StoragePtr DatabaseWithOwnTablesBase::detachTable(const String & table_name) { std::lock_guard lock(mutex); if (dictionaries.count(table_name)) - throw Exception("Cannot detach dictionary " + name + "." 
+ table_name + " as table, use DETACH DICTIONARY query.", ErrorCodes::UNKNOWN_TABLE); + throw Exception("Cannot detach dictionary " + database_name + "." + table_name + " as table, use DETACH DICTIONARY query.", ErrorCodes::UNKNOWN_TABLE); auto it = tables.find(table_name); if (it == tables.end()) - throw Exception("Table " + backQuote(name) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); + throw Exception("Table " + backQuote(database_name) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); res = it->second; tables.erase(it); } @@ -152,44 +82,11 @@ StoragePtr DatabaseWithOwnTablesBase::detachTable(const String & table_name) return res; } -void DatabaseWithOwnTablesBase::detachDictionary(const String & dictionary_name, const Context & context) -{ - String full_name = getDatabaseName() + "." + dictionary_name; - { - std::lock_guard lock(mutex); - auto it = dictionaries.find(dictionary_name); - if (it == dictionaries.end()) - throw Exception("Dictionary " + full_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); - dictionaries.erase(it); - } - - /// ExternalLoader::reloadConfig() will find out that the dictionary's config has been removed - /// and therefore it will unload the dictionary. - const auto & external_loader = context.getExternalDictionariesLoader(); - external_loader.reloadConfig(getDatabaseName(), full_name); -} - void DatabaseWithOwnTablesBase::attachTable(const String & table_name, const StoragePtr & table) { std::lock_guard lock(mutex); if (!tables.emplace(table_name, table).second) - throw Exception("Table " + name + "." + table_name + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); -} - - -void DatabaseWithOwnTablesBase::attachDictionary(const String & dictionary_name, const Context & context) -{ - String full_name = getDatabaseName() + "." + dictionary_name; - { - std::lock_guard lock(mutex); - if (!dictionaries.emplace(dictionary_name).second) - throw Exception("Dictionary " + full_name + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); - } - - /// ExternalLoader::reloadConfig() will find out that the dictionary's config has been added - /// and in case `dictionaries_lazy_load == false` it will load the dictionary. - const auto & external_loader = context.getExternalDictionariesLoader(); - external_loader.reloadConfig(getDatabaseName(), full_name); + throw Exception("Table " + database_name + "." + table_name + " already exists.", ErrorCodes::TABLE_ALREADY_EXISTS); } void DatabaseWithOwnTablesBase::shutdown() @@ -217,7 +114,7 @@ DatabaseWithOwnTablesBase::~DatabaseWithOwnTablesBase() { try { - shutdown(); + DatabaseWithOwnTablesBase::shutdown(); } catch (...) 
    {
diff --git a/dbms/src/Databases/DatabasesCommon.h b/dbms/src/Databases/DatabasesCommon.h
index b277e8cd3d1..291c0e56e8f 100644
--- a/dbms/src/Databases/DatabasesCommon.h
+++ b/dbms/src/Databases/DatabasesCommon.h
@@ -23,8 +23,6 @@ public:
         const Context & context,
         const String & table_name) const override;

-    bool isDictionaryExist(const Context & context, const String & dictionary_name) const override;
-
     StoragePtr tryGetTable(
         const Context & context,
         const String & table_name) const override;
@@ -33,30 +31,21 @@ public:

     void attachTable(const String & table_name, const StoragePtr & table) override;

-    void attachDictionary(const String & name, const Context & context) override;
-
     StoragePtr detachTable(const String & table_name) override;

-    void detachDictionary(const String & name, const Context & context) override;
-
     DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name = {}) override;

-    DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) override;
-
-    DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) override;
-
     void shutdown() override;

     virtual ~DatabaseWithOwnTablesBase() override;

 protected:
-    String name;
-
     mutable std::mutex mutex;
     Tables tables;
     Dictionaries dictionaries;
+    Poco::Logger * log;

-    DatabaseWithOwnTablesBase(String name_) : name(std::move(name_)) { }
+    DatabaseWithOwnTablesBase(const String & name_, const String & logger);
 };

 }
diff --git a/dbms/src/Databases/IDatabase.h b/dbms/src/Databases/IDatabase.h
index 3b4f774afd3..e9211560c51 100644
--- a/dbms/src/Databases/IDatabase.h
+++ b/dbms/src/Databases/IDatabase.h
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -20,11 +21,15 @@ struct ConstraintsDescription;
 class ColumnsDescription;
 struct IndicesDescription;
 struct TableStructureWriteLockHolder;
+class ASTCreateQuery;

 using Dictionaries = std::set<String>;

 namespace ErrorCodes
 {
     extern const int NOT_IMPLEMENTED;
+    extern const int CANNOT_GET_CREATE_TABLE_QUERY;
+    extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY;
 }

 class IDatabaseTablesIterator
@@ -96,14 +101,15 @@ using DatabaseDictionariesIteratorPtr = std::unique_ptr<DatabaseDictionariesSnapshotIterator>;

 class IDatabase : public std::enable_shared_from_this<IDatabase>
 {
 public:
+    IDatabase() = delete;
+    IDatabase(String database_name_) : database_name(std::move(database_name_)) {}
+
     /// Get name of database engine.
     virtual String getEngineName() const = 0;

     /// Load a set of existing tables.
     /// You can call only once, right after the object is created.
-    virtual void loadStoredObjects(
-        Context & context,
-        bool has_force_restore_data_flag) = 0;
+    virtual void loadStoredObjects(Context & /*context*/, bool /*has_force_restore_data_flag*/) {}

     /// Check the existence of the table.
     virtual bool isTableExist(
@@ -112,8 +118,11 @@ public:

     /// Check the existence of the dictionary
     virtual bool isDictionaryExist(
-        const Context & context,
-        const String & name) const = 0;
+        const Context & /*context*/,
+        const String & /*name*/) const
+    {
+        return false;
+    }

     /// Get the table for work. Return nullptr if there is no table.
     virtual StoragePtr tryGetTable(
@@ -127,7 +136,10 @@ public:
     virtual DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name = {}) = 0;

     /// Get an iterator to pass through all the dictionaries.
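// From this point on, the interface replaces pure-virtual methods with safe
// defaults: an engine without dictionary support simply inherits "there are
// no dictionaries" instead of stubbing every method. A compilable sketch of
// that empty-default pattern (names are illustrative, not the real interface):
#include <memory>
#include <set>
#include <string>

using Names = std::set<std::string>;

class IDatabaseLike
{
public:
    /// Defaults: pretend there are no dictionaries at all.
    virtual bool isDictionaryExist(const std::string & /*name*/) const { return false; }
    virtual std::unique_ptr<Names> getDictionariesIterator() const
    {
        return std::make_unique<Names>();  // empty snapshot
    }
    virtual ~IDatabaseLike() = default;
};

/// An engine that only stores tables compiles without mentioning dictionaries.
class TablesOnlyDatabase : public IDatabaseLike {};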
- virtual DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name = {}) = 0; + virtual DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & /*context*/, [[maybe_unused]] const FilterByNameFunction & filter_by_dictionary_name = {}) + { + return std::make_unique(); + } /// Get an iterator to pass through all the tables and dictionary tables. virtual DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_name = {}) @@ -140,39 +152,63 @@ public: /// Add the table to the database. Record its presence in the metadata. virtual void createTable( - const Context & context, - const String & name, - const StoragePtr & table, - const ASTPtr & query) = 0; + const Context & /*context*/, + const String & /*name*/, + const StoragePtr & /*table*/, + const ASTPtr & /*query*/) + { + throw Exception("There is no CREATE TABLE query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Add the dictionary to the database. Record its presence in the metadata. virtual void createDictionary( - const Context & context, - const String & dictionary_name, - const ASTPtr & query) = 0; + const Context & /*context*/, + const String & /*dictionary_name*/, + const ASTPtr & /*query*/) + { + throw Exception("There is no CREATE DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Delete the table from the database. Delete the metadata. virtual void removeTable( - const Context & context, - const String & name) = 0; + const Context & /*context*/, + const String & /*name*/) + { + throw Exception("There is no DROP TABLE query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Delete the dictionary from the database. Delete the metadata. virtual void removeDictionary( - const Context & context, - const String & dictionary_name) = 0; + const Context & /*context*/, + const String & /*dictionary_name*/) + { + throw Exception("There is no DROP DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Add a table to the database, but do not add it to the metadata. The database may not support this method. - virtual void attachTable(const String & name, const StoragePtr & table) = 0; + virtual void attachTable(const String & /*name*/, const StoragePtr & /*table*/) + { + throw Exception("There is no ATTACH TABLE query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Add dictionary to the database, but do not add it to the metadata. The database may not support this method. /// If dictionaries_lazy_load is false it also starts loading the dictionary asynchronously. - virtual void attachDictionary(const String & name, const Context & context) = 0; + virtual void attachDictionary(const String & /*name*/, const Context & /*context*/) + { + throw Exception("There is no ATTACH DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Forget about the table without deleting it, and return it. The database may not support this method. - virtual StoragePtr detachTable(const String & name) = 0; + virtual StoragePtr detachTable(const String & /*name*/) + { + throw Exception("There is no DETACH TABLE query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Forget about the dictionary without deleting it. The database may not support this method. 
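// In the hunks just below, the get/tryGet pairs stop being virtual: both
// forward to a single protected getCreateTableQueryImpl(..., throw_on_error),
// so a derived engine overrides one method instead of two. A sketch of that
// non-virtual-interface shape (Query/QueryPtr are stand-ins for ASTPtr):
#include <memory>
#include <stdexcept>
#include <string>

struct Query { std::string text; };
using QueryPtr = std::shared_ptr<Query>;

class DatabaseBase
{
public:
    QueryPtr tryGetCreateTableQuery(const std::string & name) const noexcept
    {
        return getCreateTableQueryImpl(name, /* throw_on_error = */ false);
    }
    QueryPtr getCreateTableQuery(const std::string & name) const
    {
        return getCreateTableQueryImpl(name, /* throw_on_error = */ true);
    }
    virtual ~DatabaseBase() = default;

protected:
    /// Default implementation: the engine cannot produce a CREATE query.
    virtual QueryPtr getCreateTableQueryImpl(const std::string & /*name*/, bool throw_on_error) const
    {
        if (throw_on_error)
            throw std::runtime_error("SHOW CREATE TABLE is not supported by this engine");
        return nullptr;
    }
};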
- virtual void detachDictionary(const String & name, const Context & context) = 0; + virtual void detachDictionary(const String & /*name*/, const Context & /*context*/) + { + throw Exception("There is no DETACH DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); + } /// Rename the table and possibly move the table to another database. virtual void renameTable( @@ -192,42 +228,50 @@ public: virtual void alterTable( const Context & /*context*/, const String & /*name*/, - const ColumnsDescription & /*columns*/, - const IndicesDescription & /*indices*/, - const ConstraintsDescription & /*constraints*/, - const ASTModifier & /*engine_modifier*/) + const StorageInMemoryMetadata & /*metadata*/) { - throw Exception(getEngineName() + ": renameTable() is not supported", ErrorCodes::NOT_IMPLEMENTED); + throw Exception(getEngineName() + ": alterTable() is not supported", ErrorCodes::NOT_IMPLEMENTED); } /// Returns time of table's metadata change, 0 if there is no corresponding metadata file. - virtual time_t getObjectMetadataModificationTime( - const Context & context, - const String & name) = 0; + virtual time_t getObjectMetadataModificationTime(const String & /*name*/) const + { + return static_cast(0); + } /// Get the CREATE TABLE query for the table. It can also provide information for detached tables for which there is metadata. - virtual ASTPtr tryGetCreateTableQuery(const Context & context, const String & name) const = 0; - - virtual ASTPtr getCreateTableQuery(const Context & context, const String & name) const + ASTPtr tryGetCreateTableQuery(const Context & context, const String & name) const noexcept { - return tryGetCreateTableQuery(context, name); + return getCreateTableQueryImpl(context, name, false); + } + + ASTPtr getCreateTableQuery(const Context & context, const String & name) const + { + return getCreateTableQueryImpl(context, name, true); } /// Get the CREATE DICTIONARY query for the dictionary. Returns nullptr if dictionary doesn't exists. - virtual ASTPtr tryGetCreateDictionaryQuery(const Context & context, const String & name) const = 0; - - virtual ASTPtr getCreateDictionaryQuery(const Context & context, const String & name) const + ASTPtr tryGetCreateDictionaryQuery(const Context & context, const String & name) const noexcept { - return tryGetCreateDictionaryQuery(context, name); + return getCreateDictionaryQueryImpl(context, name, false); + } + + ASTPtr getCreateDictionaryQuery(const Context & context, const String & name) const + { + return getCreateDictionaryQueryImpl(context, name, true); } /// Get the CREATE DATABASE query for current database. - virtual ASTPtr getCreateDatabaseQuery(const Context & context) const = 0; + virtual ASTPtr getCreateDatabaseQuery() const = 0; /// Get name of database. - virtual String getDatabaseName() const = 0; + String getDatabaseName() const { return database_name; } /// Returns path for persistent data storage if the database supports it, empty string otherwise virtual String getDataPath() const { return {}; } + /// Returns path for persistent data storage for table if the database supports it, empty string otherwise. 
Table must exist + virtual String getTableDataPath(const String & /*table_name*/) const { return {}; } + /// Returns path for persistent data storage for CREATE/ATTACH query if the database supports it, empty string otherwise + virtual String getTableDataPath(const ASTCreateQuery & /*query*/) const { return {}; } /// Returns metadata path if the database supports it, empty string otherwise virtual String getMetadataPath() const { return {}; } /// Returns metadata path of a concrete table if the database supports it, empty string otherwise @@ -240,6 +284,23 @@ public: virtual void drop(const Context & /*context*/) {} virtual ~IDatabase() {} + +protected: + virtual ASTPtr getCreateTableQueryImpl(const Context & /*context*/, const String & /*name*/, bool throw_on_error) const + { + if (throw_on_error) + throw Exception("There is no SHOW CREATE TABLE query for Database" + getEngineName(), ErrorCodes::CANNOT_GET_CREATE_TABLE_QUERY); + return nullptr; + } + + virtual ASTPtr getCreateDictionaryQueryImpl(const Context & /*context*/, const String & /*name*/, bool throw_on_error) const + { + if (throw_on_error) + throw Exception("There is no SHOW CREATE DICTIONARY query for Database" + getEngineName(), ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY); + return nullptr; + } + + String database_name; }; using DatabasePtr = std::shared_ptr; diff --git a/dbms/src/Dictionaries/CacheDictionary.cpp b/dbms/src/Dictionaries/CacheDictionary.cpp index 4dcb87c7b8a..78ab9964e5b 100644 --- a/dbms/src/Dictionaries/CacheDictionary.cpp +++ b/dbms/src/Dictionaries/CacheDictionary.cpp @@ -57,12 +57,15 @@ inline size_t CacheDictionary::getCellIdx(const Key id) const CacheDictionary::CacheDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, const size_t size_) - : name{name_} + : database(database_) + , name(name_) + , full_name{database_.empty() ? name_ : (database_ + "." 
+ name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -73,7 +76,7 @@ CacheDictionary::CacheDictionary( , rnd_engine(randomSeed()) { if (!this->source_ptr->supportsSelectiveLoad()) - throw Exception{name + ": source cannot be used with CacheDictionary", ErrorCodes::UNSUPPORTED_METHOD}; + throw Exception{full_name + ": source cannot be used with CacheDictionary", ErrorCodes::UNSUPPORTED_METHOD}; createAttributes(); } @@ -204,7 +207,7 @@ void CacheDictionary::isInConstantVector(const Key child_id, const PaddedPODArra void CacheDictionary::getString(const std::string & attribute_name, const PaddedPODArray & ids, ColumnString * out) const { auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); const auto null_value = StringRef{std::get(attribute.null_values)}; @@ -215,7 +218,7 @@ void CacheDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const ColumnString * const def, ColumnString * const out) const { auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsString(attribute, ids, out, [&](const size_t row) { return def->getDataAt(row); }); } @@ -224,7 +227,7 @@ void CacheDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const String & def, ColumnString * const out) const { auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsString(attribute, ids, out, [&](const size_t) { return StringRef{def}; }); } @@ -352,7 +355,7 @@ void CacheDictionary::createAttributes() hierarchical_attribute = &attributes.back(); if (hierarchical_attribute->type != AttributeUnderlyingType::utUInt64) - throw Exception{name + ": hierarchical attribute must be UInt64.", ErrorCodes::TYPE_MISMATCH}; + throw Exception{full_name + ": hierarchical attribute must be UInt64.", ErrorCodes::TYPE_MISMATCH}; } } } @@ -539,7 +542,7 @@ CacheDictionary::Attribute & CacheDictionary::getAttribute(const std::string & a { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -580,7 +583,7 @@ std::exception_ptr CacheDictionary::getLastException() const void registerDictionaryCache(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string & full_name, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, @@ -590,22 +593,24 @@ void registerDictionaryCache(DictionaryFactory & factory) throw Exception{"'key' is not supported for dictionary of layout 'cache'", ErrorCodes::UNSUPPORTED_METHOD}; if (dict_struct.range_min || dict_struct.range_max) - throw Exception{name + throw Exception{full_name + ": elements 
.structure.range_min and .structure.range_max should be defined only " "for a dictionary of layout 'range_hashed'", ErrorCodes::BAD_ARGUMENTS}; const auto & layout_prefix = config_prefix + ".layout"; const auto size = config.getInt(layout_prefix + ".cache.size_in_cells"); if (size == 0) - throw Exception{name + ": dictionary of layout 'cache' cannot have 0 cells", ErrorCodes::TOO_SMALL_BUFFER_SIZE}; + throw Exception{full_name + ": dictionary of layout 'cache' cannot have 0 cells", ErrorCodes::TOO_SMALL_BUFFER_SIZE}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); if (require_nonempty) - throw Exception{name + ": dictionary of layout 'cache' cannot have 'require_nonempty' attribute set", + throw Exception{full_name + ": dictionary of layout 'cache' cannot have 'require_nonempty' attribute set", ErrorCodes::BAD_ARGUMENTS}; + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; - return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, size); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, size); }; factory.registerLayout("cache", create_layout, false); } diff --git a/dbms/src/Dictionaries/CacheDictionary.h b/dbms/src/Dictionaries/CacheDictionary.h index b5065a63922..d780b557c03 100644 --- a/dbms/src/Dictionaries/CacheDictionary.h +++ b/dbms/src/Dictionaries/CacheDictionary.h @@ -25,13 +25,16 @@ class CacheDictionary final : public IDictionary { public: CacheDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, const size_t size_); - std::string getName() const override { return name; } + const std::string & getDatabase() const override { return database; } + const std::string & getName() const override { return name; } + const std::string & getFullName() const override { return full_name; } std::string getTypeName() const override { return "Cache"; } @@ -52,7 +55,7 @@ public: std::shared_ptr clone() const override { - return std::make_shared(name, dict_struct, source_ptr->clone(), dict_lifetime, size); + return std::make_shared(database, name, dict_struct, source_ptr->clone(), dict_lifetime, size); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -254,7 +257,9 @@ private: template void isInImpl(const PaddedPODArray & child_ids, const AncestorType & ancestor_ids, PaddedPODArray & out) const; + const std::string database; const std::string name; + const std::string full_name; const DictionaryStructure dict_struct; mutable DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; diff --git a/dbms/src/Dictionaries/CacheDictionary.inc.h b/dbms/src/Dictionaries/CacheDictionary.inc.h index 87005ac821f..a3a6937e9c5 100644 --- a/dbms/src/Dictionaries/CacheDictionary.inc.h +++ b/dbms/src/Dictionaries/CacheDictionary.inc.h @@ -333,7 +333,7 @@ void CacheDictionary::update( last_exception = std::current_exception(); backoff_end_time = now + std::chrono::seconds(calculateDurationWithBackoff(rnd_engine, error_count)); - tryLogException(last_exception, log, "Could not update cache dictionary '" + getName() + + tryLogException(last_exception, log, "Could not update cache dictionary '" + getFullName() + "', next update is scheduled at " + 
ext::to_string(backoff_end_time)); } } diff --git a/dbms/src/Dictionaries/ComplexKeyCacheDictionary.cpp b/dbms/src/Dictionaries/ComplexKeyCacheDictionary.cpp index 8ed917e8f89..e16f389e1ce 100644 --- a/dbms/src/Dictionaries/ComplexKeyCacheDictionary.cpp +++ b/dbms/src/Dictionaries/ComplexKeyCacheDictionary.cpp @@ -51,12 +51,15 @@ inline UInt64 ComplexKeyCacheDictionary::getCellIdx(const StringRef key) const ComplexKeyCacheDictionary::ComplexKeyCacheDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, const size_t size_) - : name{name_} + : database(database_) + , name(name_) + , full_name{database_.empty() ? name_ : (database_ + "." + name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -65,7 +68,7 @@ ComplexKeyCacheDictionary::ComplexKeyCacheDictionary( , rnd_engine(randomSeed()) { if (!this->source_ptr->supportsSelectiveLoad()) - throw Exception{name + ": source cannot be used with ComplexKeyCacheDictionary", ErrorCodes::UNSUPPORTED_METHOD}; + throw Exception{full_name + ": source cannot be used with ComplexKeyCacheDictionary", ErrorCodes::UNSUPPORTED_METHOD}; createAttributes(); } @@ -77,7 +80,7 @@ void ComplexKeyCacheDictionary::getString( dict_struct.validateKeyTypes(key_types); auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); const auto null_value = StringRef{std::get(attribute.null_values)}; @@ -94,7 +97,7 @@ void ComplexKeyCacheDictionary::getString( dict_struct.validateKeyTypes(key_types); auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsString(attribute, key_columns, out, [&](const size_t row) { return def->getDataAt(row); }); } @@ -109,7 +112,7 @@ void ComplexKeyCacheDictionary::getString( dict_struct.validateKeyTypes(key_types); auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsString(attribute, key_columns, out, [&](const size_t) { return StringRef{def}; }); } @@ -249,7 +252,7 @@ void ComplexKeyCacheDictionary::createAttributes() attributes.push_back(createAttributeWithType(attribute.underlying_type, attribute.null_value)); if (attribute.hierarchical) - throw Exception{name + ": hierarchical attributes not supported for dictionary of type " + getTypeName(), + throw Exception{full_name + ": hierarchical attributes not supported for dictionary of type " + getTypeName(), ErrorCodes::TYPE_MISMATCH}; } } @@ -258,7 +261,7 @@ ComplexKeyCacheDictionary::Attribute & ComplexKeyCacheDictionary::getAttribute(c { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -394,7 +397,7 @@ BlockInputStreamPtr 
ComplexKeyCacheDictionary::getBlockInputStream(const Names & void registerDictionaryComplexKeyCache(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string & full_name, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, @@ -405,15 +408,17 @@ void registerDictionaryComplexKeyCache(DictionaryFactory & factory) const auto & layout_prefix = config_prefix + ".layout"; const auto size = config.getInt(layout_prefix + ".complex_key_cache.size_in_cells"); if (size == 0) - throw Exception{name + ": dictionary of layout 'cache' cannot have 0 cells", ErrorCodes::TOO_SMALL_BUFFER_SIZE}; + throw Exception{full_name + ": dictionary of layout 'cache' cannot have 0 cells", ErrorCodes::TOO_SMALL_BUFFER_SIZE}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); if (require_nonempty) - throw Exception{name + ": dictionary of layout 'cache' cannot have 'require_nonempty' attribute set", + throw Exception{full_name + ": dictionary of layout 'cache' cannot have 'require_nonempty' attribute set", ErrorCodes::BAD_ARGUMENTS}; + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; - return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, size); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, size); }; factory.registerLayout("complex_key_cache", create_layout, true); } diff --git a/dbms/src/Dictionaries/ComplexKeyCacheDictionary.h b/dbms/src/Dictionaries/ComplexKeyCacheDictionary.h index e9269cb165a..4547a305f1d 100644 --- a/dbms/src/Dictionaries/ComplexKeyCacheDictionary.h +++ b/dbms/src/Dictionaries/ComplexKeyCacheDictionary.h @@ -42,6 +42,7 @@ class ComplexKeyCacheDictionary final : public IDictionaryBase { public: ComplexKeyCacheDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, @@ -50,7 +51,9 @@ public: std::string getKeyDescription() const { return key_description; } - std::string getName() const override { return name; } + const std::string & getDatabase() const override { return database; } + const std::string & getName() const override { return name; } + const std::string & getFullName() const override { return full_name; } std::string getTypeName() const override { return "ComplexKeyCache"; } @@ -75,7 +78,7 @@ public: std::shared_ptr clone() const override { - return std::make_shared(name, dict_struct, source_ptr->clone(), dict_lifetime, size); + return std::make_shared(database, name, dict_struct, source_ptr->clone(), dict_lifetime, size); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -668,7 +671,9 @@ private: bool isEmptyCell(const UInt64 idx) const; + const std::string database; const std::string name; + const std::string full_name; const DictionaryStructure dict_struct; const DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; diff --git a/dbms/src/Dictionaries/ComplexKeyHashedDictionary.cpp b/dbms/src/Dictionaries/ComplexKeyHashedDictionary.cpp index 1dafde39a24..7d8d481e2fa 100644 --- a/dbms/src/Dictionaries/ComplexKeyHashedDictionary.cpp +++ b/dbms/src/Dictionaries/ComplexKeyHashedDictionary.cpp @@ -15,13 +15,16 @@ namespace 
ErrorCodes } ComplexKeyHashedDictionary::ComplexKeyHashedDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, bool require_nonempty_, BlockPtr saved_block_) - : name{name_} + : database(database_) + , name(name_) + , full_name{database_.empty() ? name_ : (database_ + "." + name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -40,7 +43,7 @@ ComplexKeyHashedDictionary::ComplexKeyHashedDictionary( dict_struct.validateKeyTypes(key_types); \ \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ const auto null_value = std::get(attribute.null_values); \ \ @@ -72,7 +75,7 @@ void ComplexKeyHashedDictionary::getString( dict_struct.validateKeyTypes(key_types); const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); const auto & null_value = StringRef{std::get(attribute.null_values)}; @@ -94,7 +97,7 @@ void ComplexKeyHashedDictionary::getString( dict_struct.validateKeyTypes(key_types); \ \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, \ @@ -128,7 +131,7 @@ void ComplexKeyHashedDictionary::getString( dict_struct.validateKeyTypes(key_types); const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -148,7 +151,7 @@ void ComplexKeyHashedDictionary::getString( dict_struct.validateKeyTypes(key_types); \ \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, key_columns, [&](const size_t row, const auto value) { out[row] = value; }, [&](const size_t) { return def; }); \ @@ -179,7 +182,7 @@ void ComplexKeyHashedDictionary::getString( dict_struct.validateKeyTypes(key_types); const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -256,7 +259,7 @@ void ComplexKeyHashedDictionary::createAttributes() attributes.push_back(createAttributeWithType(attribute.underlying_type, attribute.null_value)); if (attribute.hierarchical) - throw Exception{name + ": hierarchical attributes not supported for dictionary of type " + getTypeName(), + throw Exception{full_name + ": hierarchical attributes not supported for dictionary of type " + getTypeName(), ErrorCodes::TYPE_MISMATCH}; } } @@ -397,7 +400,7 @@ void ComplexKeyHashedDictionary::loadData() 
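Aside, for readers skimming this refactor: every dictionary type touched in this patch gains the same three fields and the same constructor invariant. A minimal self-contained sketch of that invariant (DictionaryNaming is an illustrative name, not a class from this patch):

#include <cassert>
#include <string>

// The (database, name, full_name) invariant repeated across the dictionaries:
// full_name is "database.name" when a database is set, otherwise just "name"
// (dictionaries defined in old-style XML configs carry no database).
struct DictionaryNaming
{
    const std::string database;
    const std::string name;
    const std::string full_name;

    DictionaryNaming(const std::string & database_, const std::string & name_)
        : database(database_)
        , name(name_)
        , full_name(database_.empty() ? name_ : database_ + "." + name_)
    {
    }
};

int main()
{
    assert(DictionaryNaming("", "dict1").full_name == "dict1");
    assert(DictionaryNaming("test", "dict1").full_name == "test.dict1");
}

The exception texts switch from name to full_name for the same reason: "test.dict1: no such attribute 'x'" stays unambiguous when several databases define a dictionary called dict1.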
updateData(); if (require_nonempty && 0 == element_count) - throw Exception{name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; + throw Exception{full_name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; } template @@ -630,7 +633,7 @@ const ComplexKeyHashedDictionary::Attribute & ComplexKeyHashedDictionary::getAtt { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -742,7 +745,7 @@ BlockInputStreamPtr ComplexKeyHashedDictionary::getBlockInputStream(const Names void registerDictionaryComplexKeyHashed(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string &, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, @@ -751,12 +754,13 @@ void registerDictionaryComplexKeyHashed(DictionaryFactory & factory) if (!dict_struct.key) throw Exception{"'key' is required for dictionary of layout 'complex_key_hashed'", ErrorCodes::BAD_ARGUMENTS}; + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); - return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); }; factory.registerLayout("complex_key_hashed", create_layout, true); } - } diff --git a/dbms/src/Dictionaries/ComplexKeyHashedDictionary.h b/dbms/src/Dictionaries/ComplexKeyHashedDictionary.h index 77941d6c5df..82b2a93b010 100644 --- a/dbms/src/Dictionaries/ComplexKeyHashedDictionary.h +++ b/dbms/src/Dictionaries/ComplexKeyHashedDictionary.h @@ -23,6 +23,7 @@ class ComplexKeyHashedDictionary final : public IDictionaryBase { public: ComplexKeyHashedDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, @@ -32,7 +33,9 @@ public: std::string getKeyDescription() const { return key_description; } - std::string getName() const override { return name; } + const std::string & getDatabase() const override { return database; } + const std::string & getName() const override { return name; } + const std::string & getFullName() const override { return full_name; } std::string getTypeName() const override { return "ComplexKeyHashed"; } @@ -48,7 +51,7 @@ public: std::shared_ptr clone() const override { - return std::make_shared(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, saved_block); + return std::make_shared(database, name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, saved_block); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -233,7 +236,9 @@ private: template std::vector getKeys(const Attribute & attribute) const; + const std::string database; const std::string name; + const std::string full_name; const DictionaryStructure 
dict_struct; const DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; diff --git a/dbms/src/Dictionaries/FlatDictionary.cpp b/dbms/src/Dictionaries/FlatDictionary.cpp index 68afdd355b8..a26d566e10c 100644 --- a/dbms/src/Dictionaries/FlatDictionary.cpp +++ b/dbms/src/Dictionaries/FlatDictionary.cpp @@ -21,13 +21,16 @@ static const auto max_array_size = 500000; FlatDictionary::FlatDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, bool require_nonempty_, BlockPtr saved_block_) - : name{name_} + : database(database_) + , name(name_) + , full_name{database_.empty() ? name_ : (database_ + "." + name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -107,7 +110,7 @@ void FlatDictionary::isInConstantVector(const Key child_id, const PaddedPODArray void FlatDictionary::get##TYPE(const std::string & attribute_name, const PaddedPODArray & ids, ResultArrayType & out) const \ { \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ const auto null_value = std::get(attribute.null_values); \ \ @@ -133,7 +136,7 @@ DECLARE(Decimal128) void FlatDictionary::getString(const std::string & attribute_name, const PaddedPODArray & ids, ColumnString * out) const { const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); const auto & null_value = std::get(attribute.null_values); @@ -152,7 +155,7 @@ void FlatDictionary::getString(const std::string & attribute_name, const PaddedP ResultArrayType & out) const \ { \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, ids, [&](const size_t row, const auto value) { out[row] = value; }, [&](const size_t row) { return def[row]; }); \ @@ -177,7 +180,7 @@ void FlatDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const ColumnString * const def, ColumnString * const out) const { const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -191,7 +194,7 @@ void FlatDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const TYPE def, ResultArrayType & out) const \ { \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, ids, [&](const size_t row, const auto value) { out[row] = value; }, [&](const size_t) { return def; }); \ @@ -216,7 +219,7 @@ void FlatDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const 
String & def, ColumnString * const out) const { const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); FlatDictionary::getItemsImpl( attribute, @@ -297,7 +300,7 @@ void FlatDictionary::createAttributes() hierarchical_attribute = &attributes.back(); if (hierarchical_attribute->type != AttributeUnderlyingType::utUInt64) - throw Exception{name + ": hierarchical attribute must be UInt64.", ErrorCodes::TYPE_MISMATCH}; + throw Exception{full_name + ": hierarchical attribute must be UInt64.", ErrorCodes::TYPE_MISMATCH}; } } } @@ -404,7 +407,7 @@ void FlatDictionary::loadData() updateData(); if (require_nonempty && 0 == element_count) - throw Exception{name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; + throw Exception{full_name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; } @@ -578,7 +581,7 @@ template void FlatDictionary::resize(Attribute & attribute, const Key id) { if (id >= max_array_size) - throw Exception{name + ": identifier should be less than " + toString(max_array_size), ErrorCodes::ARGUMENT_OUT_OF_BOUND}; + throw Exception{full_name + ": identifier should be less than " + toString(max_array_size), ErrorCodes::ARGUMENT_OUT_OF_BOUND}; auto & array = std::get>(attribute.arrays); if (id >= array.size()) @@ -666,7 +669,7 @@ const FlatDictionary::Attribute & FlatDictionary::getAttribute(const std::string { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -706,7 +709,7 @@ BlockInputStreamPtr FlatDictionary::getBlockInputStream(const Names & column_nam void registerDictionaryFlat(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string & full_name, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, @@ -716,13 +719,16 @@ void registerDictionaryFlat(DictionaryFactory & factory) throw Exception{"'key' is not supported for dictionary of layout 'flat'", ErrorCodes::UNSUPPORTED_METHOD}; if (dict_struct.range_min || dict_struct.range_max) - throw Exception{name + throw Exception{full_name + ": elements .structure.range_min and .structure.range_max should be defined only " "for a dictionary of layout 'range_hashed'", ErrorCodes::BAD_ARGUMENTS}; + + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); - return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); }; factory.registerLayout("flat", create_layout, false); } diff --git a/dbms/src/Dictionaries/FlatDictionary.h b/dbms/src/Dictionaries/FlatDictionary.h index 1bb06348aab..636c7b9d092 100644 
--- a/dbms/src/Dictionaries/FlatDictionary.h
+++ b/dbms/src/Dictionaries/FlatDictionary.h
@@ -22,6 +22,7 @@ class FlatDictionary final : public IDictionary
 {
 public:
     FlatDictionary(
+        const std::string & database_,
         const std::string & name_,
         const DictionaryStructure & dict_struct_,
         DictionarySourcePtr source_ptr_,
@@ -29,7 +30,9 @@ public:
         bool require_nonempty_,
         BlockPtr saved_block_ = nullptr);
-    std::string getName() const override { return name; }
+    const std::string & getDatabase() const override { return database; }
+    const std::string & getName() const override { return name; }
+    const std::string & getFullName() const override { return full_name; }
     std::string getTypeName() const override { return "Flat"; }
@@ -45,7 +48,7 @@ public:
     std::shared_ptr<const IExternalLoadable> clone() const override
     {
-        return std::make_shared<FlatDictionary>(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, saved_block);
+        return std::make_shared<FlatDictionary>(database, name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, saved_block);
     }
     const IDictionarySource * getSource() const override { return source_ptr.get(); }
@@ -222,7 +225,9 @@ private:
     PaddedPODArray<Key> getIds() const;
+    const std::string database;
     const std::string name;
+    const std::string full_name;
     const DictionaryStructure dict_struct;
     const DictionarySourcePtr source_ptr;
     const DictionaryLifetime dict_lifetime;
diff --git a/dbms/src/Dictionaries/HashedDictionary.cpp b/dbms/src/Dictionaries/HashedDictionary.cpp
index 78c871bebc4..025e2e040b9 100644
--- a/dbms/src/Dictionaries/HashedDictionary.cpp
+++ b/dbms/src/Dictionaries/HashedDictionary.cpp
@@ -31,6 +31,7 @@ namespace ErrorCodes
 HashedDictionary::HashedDictionary(
+    const std::string & database_,
     const std::string & name_,
     const DictionaryStructure & dict_struct_,
     DictionarySourcePtr source_ptr_,
@@ -38,7 +39,9 @@ HashedDictionary::HashedDictionary(
     bool require_nonempty_,
     bool sparse_,
     BlockPtr saved_block_)
-    : name{name_}
+    : database(database_)
+    , name(name_)
+    , full_name{database_.empty() ? name_ : (database_ + "."
+ name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -129,7 +132,7 @@ void HashedDictionary::isInConstantVector(const Key child_id, const PaddedPODArr const \ { \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ const auto null_value = std::get(attribute.null_values); \ \ @@ -155,7 +158,7 @@ DECLARE(Decimal128) void HashedDictionary::getString(const std::string & attribute_name, const PaddedPODArray & ids, ColumnString * out) const { const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); const auto & null_value = StringRef{std::get(attribute.null_values)}; @@ -174,7 +177,7 @@ void HashedDictionary::getString(const std::string & attribute_name, const Padde ResultArrayType & out) const \ { \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, ids, [&](const size_t row, const auto value) { out[row] = value; }, [&](const size_t row) { return def[row]; }); \ @@ -199,7 +202,7 @@ void HashedDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const ColumnString * const def, ColumnString * const out) const { const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -213,7 +216,7 @@ void HashedDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const TYPE & def, ResultArrayType & out) const \ { \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, ids, [&](const size_t row, const auto value) { out[row] = value; }, [&](const size_t) { return def; }); \ @@ -238,7 +241,7 @@ void HashedDictionary::getString( const std::string & attribute_name, const PaddedPODArray & ids, const String & def, ColumnString * const out) const { const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -317,7 +320,7 @@ void HashedDictionary::createAttributes() hierarchical_attribute = &attributes.back(); if (hierarchical_attribute->type != AttributeUnderlyingType::utUInt64) - throw Exception{name + ": hierarchical attribute must be UInt64.", ErrorCodes::TYPE_MISMATCH}; + throw Exception{full_name + ": hierarchical attribute must be UInt64.", ErrorCodes::TYPE_MISMATCH}; } } } @@ -424,7 +427,7 @@ void HashedDictionary::loadData() updateData(); if (require_nonempty && 0 == element_count) - throw Exception{name + ": 
dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; + throw Exception{full_name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; } template @@ -684,7 +687,7 @@ const HashedDictionary::Attribute & HashedDictionary::getAttribute(const std::st { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -768,27 +771,31 @@ BlockInputStreamPtr HashedDictionary::getBlockInputStream(const Names & column_n void registerDictionaryHashed(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string & full_name, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, - DictionarySourcePtr source_ptr) -> DictionaryPtr + DictionarySourcePtr source_ptr, + bool sparse) -> DictionaryPtr { if (dict_struct.key) throw Exception{"'key' is not supported for dictionary of layout 'hashed'", ErrorCodes::UNSUPPORTED_METHOD}; if (dict_struct.range_min || dict_struct.range_max) - throw Exception{name + throw Exception{full_name + ": elements .structure.range_min and .structure.range_max should be defined only " "for a dictionary of layout 'range_hashed'", ErrorCodes::BAD_ARGUMENTS}; + + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); - const bool sparse = name == "sparse_hashed"; - return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty, sparse); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty, sparse); }; - factory.registerLayout("hashed", create_layout, false); - factory.registerLayout("sparse_hashed", create_layout, false); + using namespace std::placeholders; + factory.registerLayout("hashed", std::bind(create_layout, _1, _2, _3, _4, _5, /* sparse = */ false), false); + factory.registerLayout("sparse_hashed", std::bind(create_layout, _1, _2, _3, _4, _5, /* sparse = */ true), false); } } diff --git a/dbms/src/Dictionaries/HashedDictionary.h b/dbms/src/Dictionaries/HashedDictionary.h index d4f55dc8e39..3f8eec979bb 100644 --- a/dbms/src/Dictionaries/HashedDictionary.h +++ b/dbms/src/Dictionaries/HashedDictionary.h @@ -26,6 +26,7 @@ class HashedDictionary final : public IDictionary { public: HashedDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, @@ -34,7 +35,9 @@ public: bool sparse_, BlockPtr saved_block_ = nullptr); - std::string getName() const override { return name; } + const std::string & getDatabase() const override { return database; } + const std::string & getName() const override { return name; } + const std::string & getFullName() const override { return full_name; } std::string getTypeName() const override { return sparse ? 
"SparseHashed" : "Hashed"; } @@ -50,7 +53,7 @@ public: std::shared_ptr clone() const override { - return std::make_shared(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, sparse, saved_block); + return std::make_shared(database, name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty, sparse, saved_block); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -262,7 +265,9 @@ private: template void isInImpl(const ChildType & child_ids, const AncestorType & ancestor_ids, PaddedPODArray & out) const; + const std::string database; const std::string name; + const std::string full_name; const DictionaryStructure dict_struct; const DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; diff --git a/dbms/src/Dictionaries/IDictionary.h b/dbms/src/Dictionaries/IDictionary.h index 9c74c98e88a..e2b6f078a8e 100644 --- a/dbms/src/Dictionaries/IDictionary.h +++ b/dbms/src/Dictionaries/IDictionary.h @@ -25,6 +25,11 @@ struct IDictionaryBase : public IExternalLoadable { using Key = UInt64; + virtual const std::string & getDatabase() const = 0; + virtual const std::string & getName() const = 0; + virtual const std::string & getFullName() const = 0; + const std::string & getLoadableName() const override { return getFullName(); } + virtual std::string getTypeName() const = 0; virtual size_t getBytesAllocated() const = 0; diff --git a/dbms/src/Dictionaries/MongoDBDictionarySource.cpp b/dbms/src/Dictionaries/MongoDBDictionarySource.cpp index 75e5987b80c..97391a1ee59 100644 --- a/dbms/src/Dictionaries/MongoDBDictionarySource.cpp +++ b/dbms/src/Dictionaries/MongoDBDictionarySource.cpp @@ -316,7 +316,7 @@ BlockInputStreamPtr MongoDBDictionarySource::loadKeys(const Columns & key_column case AttributeUnderlyingType::utFloat32: case AttributeUnderlyingType::utFloat64: - key.add(attr.second.name, applyVisitor(FieldVisitorConvertToNumber(), (*key_columns[attr.first])[row_idx])); + key.add(attr.second.name, key_columns[attr.first]->getFloat64(row_idx)); break; case AttributeUnderlyingType::utString: diff --git a/dbms/src/Dictionaries/RangeHashedDictionary.cpp b/dbms/src/Dictionaries/RangeHashedDictionary.cpp index 1d80ea8c497..8fc16da2e32 100644 --- a/dbms/src/Dictionaries/RangeHashedDictionary.cpp +++ b/dbms/src/Dictionaries/RangeHashedDictionary.cpp @@ -68,12 +68,15 @@ static bool operator<(const RangeHashedDictionary::Range & left, const RangeHash RangeHashedDictionary::RangeHashedDictionary( - const std::string & dictionary_name_, + const std::string & database_, + const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, bool require_nonempty_) - : dictionary_name{dictionary_name_} + : database(database_) + , name(name_) + , full_name{database_.empty() ? name_ : (database_ + "." 
+ name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -156,7 +159,7 @@ void RangeHashedDictionary::createAttributes() attributes.push_back(createAttributeWithType(attribute.underlying_type, attribute.null_value)); if (attribute.hierarchical) - throw Exception{dictionary_name + ": hierarchical attributes not supported by " + getName() + " dictionary.", + throw Exception{full_name + ": hierarchical attributes not supported by " + getName() + " dictionary.", ErrorCodes::BAD_ARGUMENTS}; } } @@ -207,7 +210,7 @@ void RangeHashedDictionary::loadData() stream->readSuffix(); if (require_nonempty && 0 == element_count) - throw Exception{dictionary_name + ": dictionary source is empty and 'require_nonempty' property is set.", + throw Exception{full_name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; } @@ -520,7 +523,7 @@ const RangeHashedDictionary::Attribute & RangeHashedDictionary::getAttribute(con { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{dictionary_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -674,7 +677,7 @@ BlockInputStreamPtr RangeHashedDictionary::getBlockInputStream(const Names & col void registerDictionaryRangeHashed(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string & full_name, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, @@ -684,12 +687,14 @@ void registerDictionaryRangeHashed(DictionaryFactory & factory) throw Exception{"'key' is not supported for dictionary of layout 'range_hashed'", ErrorCodes::UNSUPPORTED_METHOD}; if (!dict_struct.range_min || !dict_struct.range_max) - throw Exception{name + ": dictionary of layout 'range_hashed' requires .structure.range_min and .structure.range_max", + throw Exception{full_name + ": dictionary of layout 'range_hashed' requires .structure.range_min and .structure.range_max", ErrorCodes::BAD_ARGUMENTS}; + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); - return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); }; factory.registerLayout("range_hashed", create_layout, false); } diff --git a/dbms/src/Dictionaries/RangeHashedDictionary.h b/dbms/src/Dictionaries/RangeHashedDictionary.h index 829553c68b3..eba10bbbdbb 100644 --- a/dbms/src/Dictionaries/RangeHashedDictionary.h +++ b/dbms/src/Dictionaries/RangeHashedDictionary.h @@ -18,13 +18,16 @@ class RangeHashedDictionary final : public IDictionaryBase { public: RangeHashedDictionary( - const std::string & dictionary_name_, + const std::string & database_, + const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, bool require_nonempty_); - std::string getName() const override { 
return dictionary_name; } + const std::string & getDatabase() const override { return database; } + const std::string & getName() const override { return name; } + const std::string & getFullName() const override { return full_name; } std::string getTypeName() const override { return "RangeHashed"; } @@ -40,7 +43,7 @@ public: std::shared_ptr clone() const override { - return std::make_shared(dictionary_name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty); + return std::make_shared(database, name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -208,7 +211,9 @@ private: friend struct RangeHashedDIctionaryCallGetBlockInputStreamImpl; - const std::string dictionary_name; + const std::string database; + const std::string name; + const std::string full_name; const DictionaryStructure dict_struct; const DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; diff --git a/dbms/src/Dictionaries/TrieDictionary.cpp b/dbms/src/Dictionaries/TrieDictionary.cpp index 4432a1ec548..a16d30e32e9 100644 --- a/dbms/src/Dictionaries/TrieDictionary.cpp +++ b/dbms/src/Dictionaries/TrieDictionary.cpp @@ -35,12 +35,15 @@ namespace ErrorCodes } TrieDictionary::TrieDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, const DictionaryLifetime dict_lifetime_, bool require_nonempty_) - : name{name_} + : database(database_) + , name(name_) + , full_name{database_.empty() ? name_ : (database_ + "." + name_)} , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) @@ -75,7 +78,7 @@ TrieDictionary::~TrieDictionary() validateKeyTypes(key_types); \ \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ const auto null_value = std::get(attribute.null_values); \ \ @@ -107,7 +110,7 @@ void TrieDictionary::getString( validateKeyTypes(key_types); const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); const auto & null_value = StringRef{std::get(attribute.null_values)}; @@ -129,7 +132,7 @@ void TrieDictionary::getString( validateKeyTypes(key_types); \ \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, \ @@ -163,7 +166,7 @@ void TrieDictionary::getString( validateKeyTypes(key_types); const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -183,7 +186,7 @@ void TrieDictionary::getString( validateKeyTypes(key_types); \ \ const auto & attribute = getAttribute(attribute_name); \ - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::ut##TYPE); \ + checkAttributeType(full_name, attribute_name, 
attribute.type, AttributeUnderlyingType::ut##TYPE); \ \ getItemsImpl( \ attribute, key_columns, [&](const size_t row, const auto value) { out[row] = value; }, [&](const size_t) { return def; }); \ @@ -214,7 +217,7 @@ void TrieDictionary::getString( validateKeyTypes(key_types); const auto & attribute = getAttribute(attribute_name); - checkAttributeType(name, attribute_name, attribute.type, AttributeUnderlyingType::utString); + checkAttributeType(full_name, attribute_name, attribute.type, AttributeUnderlyingType::utString); getItemsImpl( attribute, @@ -291,7 +294,7 @@ void TrieDictionary::createAttributes() attributes.push_back(createAttributeWithType(attribute.underlying_type, attribute.null_value)); if (attribute.hierarchical) - throw Exception{name + ": hierarchical attributes not supported for dictionary of type " + getTypeName(), + throw Exception{full_name + ": hierarchical attributes not supported for dictionary of type " + getTypeName(), ErrorCodes::TYPE_MISMATCH}; } } @@ -337,7 +340,7 @@ void TrieDictionary::loadData() stream->readSuffix(); if (require_nonempty && 0 == element_count) - throw Exception{name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; + throw Exception{full_name + ": dictionary source is empty and 'require_nonempty' property is set.", ErrorCodes::DICTIONARY_IS_EMPTY}; } template @@ -627,7 +630,7 @@ const TrieDictionary::Attribute & TrieDictionary::getAttribute(const std::string { const auto it = attribute_index_by_name.find(attribute_name); if (it == std::end(attribute_index_by_name)) - throw Exception{name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; + throw Exception{full_name + ": no such attribute '" + attribute_name + "'", ErrorCodes::BAD_ARGUMENTS}; return attributes[it->second]; } @@ -767,7 +770,7 @@ BlockInputStreamPtr TrieDictionary::getBlockInputStream(const Names & column_nam void registerDictionaryTrie(DictionaryFactory & factory) { - auto create_layout = [=](const std::string & name, + auto create_layout = [=](const std::string &, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, @@ -776,10 +779,12 @@ void registerDictionaryTrie(DictionaryFactory & factory) if (!dict_struct.key) throw Exception{"'key' is required for dictionary of layout 'ip_trie'", ErrorCodes::BAD_ARGUMENTS}; + const String database = config.getString(config_prefix + ".database", ""); + const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); // This is specialised trie for storing IPv4 and IPv6 prefixes. 
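A recurring detail in the registerDictionary* lambdas in this patch: the factory-supplied full_name argument is kept only for error messages (or ignored outright), and the two name components are re-read from the dictionary's configuration subtree. A sketch of that lookup, assuming nothing beyond Poco's AbstractConfiguration interface (resolveDictionaryName is a hypothetical helper, not part of the patch):

#include <Poco/Util/AbstractConfiguration.h>
#include <string>
#include <utility>

// "<config_prefix>.database" is optional, so old-style XML dictionaries
// (which have no database) keep working; "<config_prefix>.name" stays
// required, and getString() without a default throws when it is absent.
std::pair<std::string, std::string> resolveDictionaryName(
    const Poco::Util::AbstractConfiguration & config,
    const std::string & config_prefix)
{
    const std::string database = config.getString(config_prefix + ".database", "");
    const std::string name = config.getString(config_prefix + ".name");
    return {database, name};
}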
- return std::make_unique(name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); + return std::make_unique(database, name, dict_struct, std::move(source_ptr), dict_lifetime, require_nonempty); }; factory.registerLayout("ip_trie", create_layout, true); } diff --git a/dbms/src/Dictionaries/TrieDictionary.h b/dbms/src/Dictionaries/TrieDictionary.h index 7e41942b873..5168eec5a74 100644 --- a/dbms/src/Dictionaries/TrieDictionary.h +++ b/dbms/src/Dictionaries/TrieDictionary.h @@ -23,6 +23,7 @@ class TrieDictionary final : public IDictionaryBase { public: TrieDictionary( + const std::string & database_, const std::string & name_, const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, @@ -33,7 +34,9 @@ public: std::string getKeyDescription() const { return key_description; } - std::string getName() const override { return name; } + const std::string & getDatabase() const override { return database; } + const std::string & getName() const override { return name; } + const std::string & getFullName() const override { return full_name; } std::string getTypeName() const override { return "Trie"; } @@ -49,7 +52,7 @@ public: std::shared_ptr clone() const override { - return std::make_shared(name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty); + return std::make_shared(database, name, dict_struct, source_ptr->clone(), dict_lifetime, require_nonempty); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -232,7 +235,9 @@ private: Columns getKeyColumns() const; + const std::string database; const std::string name; + const std::string full_name; const DictionaryStructure dict_struct; const DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; diff --git a/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.cpp b/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.cpp index f0e49bcc4ac..dbbcc0e41a8 100644 --- a/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.cpp +++ b/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.cpp @@ -421,7 +421,7 @@ void checkPrimaryKey(const NamesToTypeNames & all_attrs, const Names & key_attrs } -DictionaryConfigurationPtr getDictionaryConfigurationFromAST(const ASTCreateQuery & query, const String & database_name) +DictionaryConfigurationPtr getDictionaryConfigurationFromAST(const ASTCreateQuery & query) { checkAST(query); @@ -434,10 +434,14 @@ DictionaryConfigurationPtr getDictionaryConfigurationFromAST(const ASTCreateQuer AutoPtr name_element(xml_document->createElement("name")); current_dictionary->appendChild(name_element); - String full_name = (!database_name.empty() ? database_name : query.database) + "." 
+ query.table; - AutoPtr name(xml_document->createTextNode(full_name)); + AutoPtr name(xml_document->createTextNode(query.table)); name_element->appendChild(name); + AutoPtr database_element(xml_document->createElement("database")); + current_dictionary->appendChild(database_element); + AutoPtr database(xml_document->createTextNode(query.database)); + database_element->appendChild(database); + AutoPtr structure_element(xml_document->createElement("structure")); current_dictionary->appendChild(structure_element); Names pk_attrs = getPrimaryKeyColumns(query.dictionary->primary_key); diff --git a/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.h b/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.h index adfcd7f2768..bb48765c492 100644 --- a/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.h +++ b/dbms/src/Dictionaries/getDictionaryConfigurationFromAST.h @@ -10,6 +10,6 @@ using DictionaryConfigurationPtr = Poco::AutoPtrgetString("dictionary.name"), "test.dict1"); + EXPECT_EQ(config->getString("dictionary.database"), "test"); + EXPECT_EQ(config->getString("dictionary.name"), "dict1"); /// lifetime EXPECT_EQ(config->getInt("dictionary.lifetime.min"), 1); diff --git a/dbms/src/Disks/DiskSpaceMonitor.cpp b/dbms/src/Disks/DiskSpaceMonitor.cpp index 2feb289cfc2..59b8c21119a 100644 --- a/dbms/src/Disks/DiskSpaceMonitor.cpp +++ b/dbms/src/Disks/DiskSpaceMonitor.cpp @@ -208,6 +208,30 @@ StoragePolicy::StoragePolicy(String name_, Volumes volumes_, double move_factor_ } +bool StoragePolicy::isDefaultPolicy() const +{ + /// Guessing if this policy is default, not 100% correct though. + + if (getName() != "default") + return false; + + if (volumes.size() != 1) + return false; + + if (volumes[0]->getName() != "default") + return false; + + const auto & disks = volumes[0]->disks; + if (disks.size() != 1) + return false; + + if (disks[0]->getName() != "default") + return false; + + return true; +} + + Disks StoragePolicy::getDisks() const { Disks res; diff --git a/dbms/src/Disks/DiskSpaceMonitor.h b/dbms/src/Disks/DiskSpaceMonitor.h index 252fb72f3f4..3d2216b545b 100644 --- a/dbms/src/Disks/DiskSpaceMonitor.h +++ b/dbms/src/Disks/DiskSpaceMonitor.h @@ -103,6 +103,8 @@ public: StoragePolicy(String name_, Volumes volumes_, double move_factor_); + bool isDefaultPolicy() const; + /// Returns disks ordered by volumes priority Disks getDisks() const; diff --git a/dbms/src/Functions/CRC.cpp b/dbms/src/Functions/CRC.cpp index 0af35387639..a16888ffe46 100644 --- a/dbms/src/Functions/CRC.cpp +++ b/dbms/src/Functions/CRC.cpp @@ -2,7 +2,7 @@ #include #include #include -#include "registerFunctions.h" + namespace { diff --git a/dbms/src/Functions/FunctionFQDN.cpp b/dbms/src/Functions/FunctionFQDN.cpp index 90aa7d35383..ed49b43632e 100644 --- a/dbms/src/Functions/FunctionFQDN.cpp +++ b/dbms/src/Functions/FunctionFQDN.cpp @@ -3,7 +3,7 @@ #include #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionFactory.h b/dbms/src/Functions/FunctionFactory.h index 75930f92c46..401e774939d 100644 --- a/dbms/src/Functions/FunctionFactory.h +++ b/dbms/src/Functions/FunctionFactory.h @@ -2,8 +2,6 @@ #include #include -#include "URL/registerFunctionsURL.h" -#include "registerFunctions.h" #include #include diff --git a/dbms/src/Functions/FunctionJoinGet.cpp b/dbms/src/Functions/FunctionJoinGet.cpp index 3bcbf69e21e..83d0cca1694 100644 --- a/dbms/src/Functions/FunctionJoinGet.cpp +++ b/dbms/src/Functions/FunctionJoinGet.cpp @@ -5,7 +5,6 @@ #include #include #include 
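For context on the getDictionaryConfigurationFromAST change and the updated test expectations earlier in this patch: the configuration generated from a CREATE DICTIONARY query now carries the database as its own element instead of baking "db.name" into the name element. Judging from the createElement calls and the EXPECT_EQ assertions, the generated structure looks roughly like this (all other elements elided):

<dictionary>
    <database>test</database>
    <name>dict1</name>
    <structure>...</structure>
    <lifetime>
        <min>1</min>
        ...
    </lifetime>
</dictionary>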
-#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/FunctionsBitmap.cpp b/dbms/src/Functions/FunctionsBitmap.cpp index 240299e5ced..c94566b04b0 100644 --- a/dbms/src/Functions/FunctionsBitmap.cpp +++ b/dbms/src/Functions/FunctionsBitmap.cpp @@ -1,5 +1,4 @@ #include -#include "registerFunctions.h" // TODO include this last because of a broken roaring header. See the comment // inside. diff --git a/dbms/src/Functions/FunctionsCoding.cpp b/dbms/src/Functions/FunctionsCoding.cpp index 997c42e55ca..97add9bf32a 100644 --- a/dbms/src/Functions/FunctionsCoding.cpp +++ b/dbms/src/Functions/FunctionsCoding.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsConversion.cpp b/dbms/src/Functions/FunctionsConversion.cpp index 1d6a24d99b4..0bd7d1a27e8 100644 --- a/dbms/src/Functions/FunctionsConversion.cpp +++ b/dbms/src/Functions/FunctionsConversion.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp b/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp index 683de258ef7..eeaea9a32a5 100644 --- a/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp +++ b/dbms/src/Functions/FunctionsEmbeddedDictionaries.cpp @@ -1,6 +1,6 @@ #include "FunctionFactory.h" #include "FunctionsEmbeddedDictionaries.h" -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsExternalDictionaries.cpp b/dbms/src/Functions/FunctionsExternalDictionaries.cpp index 65909564702..3d536630d7a 100644 --- a/dbms/src/Functions/FunctionsExternalDictionaries.cpp +++ b/dbms/src/Functions/FunctionsExternalDictionaries.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsExternalDictionaries.h b/dbms/src/Functions/FunctionsExternalDictionaries.h index 33cb05e2e7b..94bc861de13 100644 --- a/dbms/src/Functions/FunctionsExternalDictionaries.h +++ b/dbms/src/Functions/FunctionsExternalDictionaries.h @@ -127,10 +127,10 @@ private: auto dict = dictionaries_loader.getDictionary(dict_name_col->getValue()); const auto dict_ptr = dict.get(); - if (!context.hasDictionaryAccessRights(dict_ptr->getName())) + if (!context.hasDictionaryAccessRights(dict_ptr->getFullName())) { throw Exception{"For function " + getName() + ", cannot access dictionary " - + dict->getName() + " on database " + context.getCurrentDatabase(), ErrorCodes::DICTIONARY_ACCESS_DENIED}; + + dict->getFullName() + " on database " + context.getCurrentDatabase(), ErrorCodes::DICTIONARY_ACCESS_DENIED}; } if (!executeDispatchSimple(block, arguments, result, dict_ptr) && @@ -302,10 +302,10 @@ private: auto dict = dictionaries_loader.getDictionary(dict_name_col->getValue()); const auto dict_ptr = dict.get(); - if (!context.hasDictionaryAccessRights(dict_ptr->getName())) + if (!context.hasDictionaryAccessRights(dict_ptr->getFullName())) { throw Exception{"For function " + getName() + ", cannot access dictionary " - + dict->getName() + " on database " + context.getCurrentDatabase(), ErrorCodes::DICTIONARY_ACCESS_DENIED}; + + dict->getFullName() + " on database " + context.getCurrentDatabase(), ErrorCodes::DICTIONARY_ACCESS_DENIED}; } if (!executeDispatch(block, arguments, result, dict_ptr) && @@ -488,10 +488,10 @@ private: auto dict = dictionaries_loader.getDictionary(dict_name_col->getValue()); const auto dict_ptr = dict.get(); - if (!context.hasDictionaryAccessRights(dict_ptr->getName())) + if 
(!context.hasDictionaryAccessRights(dict_ptr->getFullName())) { throw Exception{"For function " + getName() + ", cannot access dictionary " - + dict->getName() + " on database " + context.getCurrentDatabase(), ErrorCodes::DICTIONARY_ACCESS_DENIED}; + + dict->getFullName() + " on database " + context.getCurrentDatabase(), ErrorCodes::DICTIONARY_ACCESS_DENIED}; } if (!executeDispatch(block, arguments, result, dict_ptr) && diff --git a/dbms/src/Functions/FunctionsExternalModels.cpp b/dbms/src/Functions/FunctionsExternalModels.cpp index a9d3ee45c91..df9c438d4ca 100644 --- a/dbms/src/Functions/FunctionsExternalModels.cpp +++ b/dbms/src/Functions/FunctionsExternalModels.cpp @@ -15,7 +15,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/FunctionsFindCluster.cpp b/dbms/src/Functions/FunctionsFindCluster.cpp index c05740e5f97..4f7caf8d536 100644 --- a/dbms/src/Functions/FunctionsFindCluster.cpp +++ b/dbms/src/Functions/FunctionsFindCluster.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsFormatting.cpp b/dbms/src/Functions/FunctionsFormatting.cpp index aca4df091db..7582e234622 100644 --- a/dbms/src/Functions/FunctionsFormatting.cpp +++ b/dbms/src/Functions/FunctionsFormatting.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsHashing.cpp b/dbms/src/Functions/FunctionsHashing.cpp index aab0f6e1e16..8705e6bfaa3 100644 --- a/dbms/src/Functions/FunctionsHashing.cpp +++ b/dbms/src/Functions/FunctionsHashing.cpp @@ -1,7 +1,7 @@ #include "FunctionsHashing.h" #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsJSON.cpp b/dbms/src/Functions/FunctionsJSON.cpp index 4e62a06c0a9..79dea768f61 100644 --- a/dbms/src/Functions/FunctionsJSON.cpp +++ b/dbms/src/Functions/FunctionsJSON.cpp @@ -1,6 +1,5 @@ #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/FunctionsRandom.cpp b/dbms/src/Functions/FunctionsRandom.cpp index a6865b96e0b..19b2f08cdba 100644 --- a/dbms/src/Functions/FunctionsRandom.cpp +++ b/dbms/src/Functions/FunctionsRandom.cpp @@ -3,7 +3,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/FunctionsReinterpret.cpp b/dbms/src/Functions/FunctionsReinterpret.cpp index 61a1e56ceac..31acae0c6ea 100644 --- a/dbms/src/Functions/FunctionsReinterpret.cpp +++ b/dbms/src/Functions/FunctionsReinterpret.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsRound.cpp b/dbms/src/Functions/FunctionsRound.cpp index 3c48ff26c1a..b1349bd2164 100644 --- a/dbms/src/Functions/FunctionsRound.cpp +++ b/dbms/src/Functions/FunctionsRound.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsRound.h b/dbms/src/Functions/FunctionsRound.h index 4ba3a9d24db..1a85db5ecb2 100644 --- a/dbms/src/Functions/FunctionsRound.h +++ b/dbms/src/Functions/FunctionsRound.h @@ -12,6 +12,7 @@ #include "IFunctionImpl.h" #include #include +#include #include #include #include @@ -702,7 +703,7 @@ private: } template - void executeImplNumToNum(const Container & src, Container & dst, const Array & boundaries) + void NO_INLINE executeImplNumToNum(const Container & src, Container & dst, const Array & boundaries) { using ValueType = typename 
Container::value_type; std::vector boundary_values(boundaries.size()); @@ -714,20 +715,53 @@ private: size_t size = src.size(); dst.resize(size); - for (size_t i = 0; i < size; ++i) + + if (boundary_values.size() < 32) /// Just a guess { - auto it = std::upper_bound(boundary_values.begin(), boundary_values.end(), src[i]); - if (it == boundary_values.end()) + /// Linear search with value on previous iteration as a hint. + /// Not optimal if the size of list is large and distribution of values is uniform random. + + auto begin = boundary_values.begin(); + auto end = boundary_values.end(); + auto it = begin + (end - begin) / 2; + + for (size_t i = 0; i < size; ++i) { - dst[i] = boundary_values.back(); + auto value = src[i]; + + if (*it < value) + { + while (it != end && *it <= value) + ++it; + if (it != begin) + --it; + } + else + { + while (*it > value && it != begin) + --it; + } + + dst[i] = *it; } - else if (it == boundary_values.begin()) + } + else + { + for (size_t i = 0; i < size; ++i) { - dst[i] = boundary_values.front(); - } - else - { - dst[i] = *(it - 1); + auto it = std::upper_bound(boundary_values.begin(), boundary_values.end(), src[i]); + if (it == boundary_values.end()) + { + dst[i] = boundary_values.back(); + } + else if (it == boundary_values.begin()) + { + dst[i] = boundary_values.front(); + } + else + { + dst[i] = *(it - 1); + } } } } diff --git a/dbms/src/Functions/FunctionsStringArray.cpp b/dbms/src/Functions/FunctionsStringArray.cpp index 6f50369d52f..247d9ffb704 100644 --- a/dbms/src/Functions/FunctionsStringArray.cpp +++ b/dbms/src/Functions/FunctionsStringArray.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/FunctionsStringRegex.cpp b/dbms/src/Functions/FunctionsStringRegex.cpp index 6f0c52347fe..dc8bbc9a937 100644 --- a/dbms/src/Functions/FunctionsStringRegex.cpp +++ b/dbms/src/Functions/FunctionsStringRegex.cpp @@ -1,6 +1,4 @@ #include "FunctionsStringRegex.h" -#include "registerFunctions.h" - #include "FunctionsStringSearch.h" #include #include diff --git a/dbms/src/Functions/FunctionsStringSearch.cpp b/dbms/src/Functions/FunctionsStringSearch.cpp index 25ef3c7c800..c39d536927c 100644 --- a/dbms/src/Functions/FunctionsStringSearch.cpp +++ b/dbms/src/Functions/FunctionsStringSearch.cpp @@ -1,5 +1,4 @@ #include "FunctionsStringSearch.h" -#include "registerFunctions.h" #include #include diff --git a/dbms/src/Functions/FunctionsStringSimilarity.cpp b/dbms/src/Functions/FunctionsStringSimilarity.cpp index 464458f6288..17f7b004fef 100644 --- a/dbms/src/Functions/FunctionsStringSimilarity.cpp +++ b/dbms/src/Functions/FunctionsStringSimilarity.cpp @@ -1,6 +1,4 @@ #include -#include "registerFunctions.h" - #include #include #include diff --git a/dbms/src/Functions/GeoUtils.h b/dbms/src/Functions/GeoUtils.h index b13faa0f014..3a5e0202469 100644 --- a/dbms/src/Functions/GeoUtils.h +++ b/dbms/src/Functions/GeoUtils.h @@ -261,6 +261,10 @@ void PointInPolygonWithGrid::buildGrid() for (size_t row = 0; row < grid_size; ++row) { +#pragma GCC diagnostic push +#if !__clang__ +#pragma GCC diagnostic ignored "-Wmaybe-uninitialized" +#endif CoordinateType y_min = min_corner.y() + row * cell_height; CoordinateType y_max = min_corner.y() + (row + 1) * cell_height; @@ -268,6 +272,7 @@ void PointInPolygonWithGrid::buildGrid() { CoordinateType x_min = min_corner.x() + col * cell_width; CoordinateType x_max = min_corner.x() + (col + 1) * cell_width; +#pragma GCC diagnostic pop Box cell_box(Point(x_min, y_min), Point(x_max, 
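The `executeImplNumToNum` hunk above (rounding each value down to a fixed set of boundaries) stops calling `std::upper_bound` per row when the boundary list is small: below 32 entries ("just a guess", per the comment) it does a linear scan that starts from the iterator position left by the previous row, which is cheap when consecutive input values are close to each other. A standalone sketch of the same search under simplified assumptions (sorted, non-empty boundaries; values below the first boundary clamp to it; the names are illustrative, not the ClickHouse code):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    /// Round each value down to the nearest boundary (boundaries are sorted, non-empty).
    /// Values below the first boundary clamp to it. The iterator position from the
    /// previous row is reused as a hint, which pays off for correlated inputs.
    template <typename T>
    std::vector<T> roundDownHinted(const std::vector<T> & src, const std::vector<T> & boundaries)
    {
        assert(!boundaries.empty());
        std::vector<T> dst(src.size());

        auto begin = boundaries.begin();
        auto end = boundaries.end();
        auto it = begin + (end - begin) / 2;  /// Start from the middle, as in the patch.

        for (size_t i = 0; i < src.size(); ++i)
        {
            T value = src[i];
            if (*it < value)
            {
                while (it != end && *it <= value)   /// Walk right past all boundaries <= value...
                    ++it;
                if (it != begin)
                    --it;                           /// ...then step back to the last one <= value.
            }
            else
            {
                while (*it > value && it != begin)  /// Walk left until a boundary <= value (or the front).
                    --it;
            }
            dst[i] = *it;
        }
        return dst;
    }

    int main()
    {
        std::vector<int> boundaries{0, 10, 20, 30};
        auto out = roundDownHinted<int>({7, 9, 25, -5}, boundaries);
        assert(out == (std::vector<int>{0, 0, 20, 0}));
    }
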
y_max)); Polygon cell_bound; diff --git a/dbms/src/Functions/URL/basename.cpp b/dbms/src/Functions/URL/basename.cpp index a180b2899a8..1d7a3a3bc61 100644 --- a/dbms/src/Functions/URL/basename.cpp +++ b/dbms/src/Functions/URL/basename.cpp @@ -1,6 +1,5 @@ #include #include -#include #include #include "FunctionsURL.h" diff --git a/dbms/src/Functions/URL/registerFunctionsURL.cpp b/dbms/src/Functions/URL/registerFunctionsURL.cpp index 8ca5131abbf..66a847185f3 100644 --- a/dbms/src/Functions/URL/registerFunctionsURL.cpp +++ b/dbms/src/Functions/URL/registerFunctionsURL.cpp @@ -1,8 +1,31 @@ -#include "registerFunctionsURL.h" - namespace DB { +class FunctionFactory; + +void registerFunctionProtocol(FunctionFactory & factory); +void registerFunctionDomain(FunctionFactory & factory); +void registerFunctionDomainWithoutWWW(FunctionFactory & factory); +void registerFunctionFirstSignificantSubdomain(FunctionFactory & factory); +void registerFunctionTopLevelDomain(FunctionFactory & factory); +void registerFunctionPath(FunctionFactory & factory); +void registerFunctionPathFull(FunctionFactory & factory); +void registerFunctionQueryString(FunctionFactory & factory); +void registerFunctionFragment(FunctionFactory & factory); +void registerFunctionQueryStringAndFragment(FunctionFactory & factory); +void registerFunctionExtractURLParameter(FunctionFactory & factory); +void registerFunctionExtractURLParameters(FunctionFactory & factory); +void registerFunctionExtractURLParameterNames(FunctionFactory & factory); +void registerFunctionURLHierarchy(FunctionFactory & factory); +void registerFunctionURLPathHierarchy(FunctionFactory & factory); +void registerFunctionCutToFirstSignificantSubdomain(FunctionFactory & factory); +void registerFunctionCutWWW(FunctionFactory & factory); +void registerFunctionCutQueryString(FunctionFactory & factory); +void registerFunctionCutFragment(FunctionFactory & factory); +void registerFunctionCutQueryStringAndFragment(FunctionFactory & factory); +void registerFunctionCutURLParameter(FunctionFactory & factory); +void registerFunctionDecodeURLComponent(FunctionFactory & factory); + void registerFunctionsURL(FunctionFactory & factory) { registerFunctionProtocol(factory); diff --git a/dbms/src/Functions/URL/registerFunctionsURL.h b/dbms/src/Functions/URL/registerFunctionsURL.h deleted file mode 100644 index 94ba5a037a4..00000000000 --- a/dbms/src/Functions/URL/registerFunctionsURL.h +++ /dev/null @@ -1,32 +0,0 @@ -#pragma once - -namespace DB -{ -class FunctionFactory; - -void registerFunctionProtocol(FunctionFactory &); -void registerFunctionDomain(FunctionFactory &); -void registerFunctionDomainWithoutWWW(FunctionFactory &); -void registerFunctionFirstSignificantSubdomain(FunctionFactory &); -void registerFunctionTopLevelDomain(FunctionFactory &); -void registerFunctionPath(FunctionFactory &); -void registerFunctionPathFull(FunctionFactory &); -void registerFunctionQueryString(FunctionFactory &); -void registerFunctionFragment(FunctionFactory &); -void registerFunctionQueryStringAndFragment(FunctionFactory &); -void registerFunctionExtractURLParameter(FunctionFactory &); -void registerFunctionExtractURLParameters(FunctionFactory &); -void registerFunctionExtractURLParameterNames(FunctionFactory &); -void registerFunctionURLHierarchy(FunctionFactory &); -void registerFunctionURLPathHierarchy(FunctionFactory &); -void registerFunctionCutToFirstSignificantSubdomain(FunctionFactory &); -void registerFunctionCutWWW(FunctionFactory &); -void registerFunctionCutQueryString(FunctionFactory 
&); -void registerFunctionCutFragment(FunctionFactory &); -void registerFunctionCutQueryStringAndFragment(FunctionFactory &); -void registerFunctionCutURLParameter(FunctionFactory &); -void registerFunctionDecodeURLComponent(FunctionFactory &); - -void registerFunctionsURL(FunctionFactory &); - -} diff --git a/dbms/src/Functions/acos.cpp b/dbms/src/Functions/acos.cpp index e4fc8146eda..61e213acabf 100644 --- a/dbms/src/Functions/acos.cpp +++ b/dbms/src/Functions/acos.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/addDays.cpp b/dbms/src/Functions/addDays.cpp index f8d384a9c08..da85377323f 100644 --- a/dbms/src/Functions/addDays.cpp +++ b/dbms/src/Functions/addDays.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addHours.cpp b/dbms/src/Functions/addHours.cpp index f0a70b91328..3052f7d0acd 100644 --- a/dbms/src/Functions/addHours.cpp +++ b/dbms/src/Functions/addHours.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addMinutes.cpp b/dbms/src/Functions/addMinutes.cpp index fc1c4a50d45..5c22059f792 100644 --- a/dbms/src/Functions/addMinutes.cpp +++ b/dbms/src/Functions/addMinutes.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addMonths.cpp b/dbms/src/Functions/addMonths.cpp index 661a4daf75e..d2f44d4efbd 100644 --- a/dbms/src/Functions/addMonths.cpp +++ b/dbms/src/Functions/addMonths.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addQuarters.cpp b/dbms/src/Functions/addQuarters.cpp index eaf64f0b85d..dd158186383 100644 --- a/dbms/src/Functions/addQuarters.cpp +++ b/dbms/src/Functions/addQuarters.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addSeconds.cpp b/dbms/src/Functions/addSeconds.cpp index 1fedcad7f28..efc7129a62e 100644 --- a/dbms/src/Functions/addSeconds.cpp +++ b/dbms/src/Functions/addSeconds.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addWeeks.cpp b/dbms/src/Functions/addWeeks.cpp index 9751913bfc9..050091c0b74 100644 --- a/dbms/src/Functions/addWeeks.cpp +++ b/dbms/src/Functions/addWeeks.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addYears.cpp b/dbms/src/Functions/addYears.cpp index fd338483b77..f47e13a144b 100644 --- a/dbms/src/Functions/addYears.cpp +++ b/dbms/src/Functions/addYears.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addressToLine.cpp b/dbms/src/Functions/addressToLine.cpp index d1e2ecb7658..b87a3816fc5 100644 --- a/dbms/src/Functions/addressToLine.cpp +++ b/dbms/src/Functions/addressToLine.cpp @@ -18,7 +18,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/addressToSymbol.cpp b/dbms/src/Functions/addressToSymbol.cpp index 91c95252909..e9e9f8e6ca6 100644 --- a/dbms/src/Functions/addressToSymbol.cpp +++ b/dbms/src/Functions/addressToSymbol.cpp @@ -9,7 +9,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/appendTrailingCharIfAbsent.cpp 
b/dbms/src/Functions/appendTrailingCharIfAbsent.cpp index b3829e87116..1c3267343ca 100644 --- a/dbms/src/Functions/appendTrailingCharIfAbsent.cpp +++ b/dbms/src/Functions/appendTrailingCharIfAbsent.cpp @@ -5,7 +5,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/array/array.cpp b/dbms/src/Functions/array/array.cpp index 0dc0196357a..d517ced8203 100644 --- a/dbms/src/Functions/array/array.cpp +++ b/dbms/src/Functions/array/array.cpp @@ -4,7 +4,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayAll.cpp b/dbms/src/Functions/array/arrayAll.cpp index 6f9771d4f0a..43d10f0eb4f 100644 --- a/dbms/src/Functions/array/arrayAll.cpp +++ b/dbms/src/Functions/array/arrayAll.cpp @@ -2,7 +2,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayCompact.cpp b/dbms/src/Functions/array/arrayCompact.cpp index d57b108a597..489d18440e0 100644 --- a/dbms/src/Functions/array/arrayCompact.cpp +++ b/dbms/src/Functions/array/arrayCompact.cpp @@ -4,7 +4,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayConcat.cpp b/dbms/src/Functions/array/arrayConcat.cpp index 06e5db6e93e..30da20c7766 100644 --- a/dbms/src/Functions/array/arrayConcat.cpp +++ b/dbms/src/Functions/array/arrayConcat.cpp @@ -8,7 +8,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayCount.cpp b/dbms/src/Functions/array/arrayCount.cpp index cbb27b8857b..49623cf0446 100644 --- a/dbms/src/Functions/array/arrayCount.cpp +++ b/dbms/src/Functions/array/arrayCount.cpp @@ -2,7 +2,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayCumSum.cpp b/dbms/src/Functions/array/arrayCumSum.cpp index d8be7aa3562..79f705e74fa 100644 --- a/dbms/src/Functions/array/arrayCumSum.cpp +++ b/dbms/src/Functions/array/arrayCumSum.cpp @@ -4,7 +4,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayCumSumNonNegative.cpp b/dbms/src/Functions/array/arrayCumSumNonNegative.cpp index b07fe7b6faf..88a2b258571 100644 --- a/dbms/src/Functions/array/arrayCumSumNonNegative.cpp +++ b/dbms/src/Functions/array/arrayCumSumNonNegative.cpp @@ -4,7 +4,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayDifference.cpp b/dbms/src/Functions/array/arrayDifference.cpp index fe01ab5a366..545749e5ec0 100644 --- a/dbms/src/Functions/array/arrayDifference.cpp +++ b/dbms/src/Functions/array/arrayDifference.cpp @@ -4,7 +4,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayDistinct.cpp b/dbms/src/Functions/array/arrayDistinct.cpp index e2bcb532b08..3246539d497 100644 --- a/dbms/src/Functions/array/arrayDistinct.cpp +++ b/dbms/src/Functions/array/arrayDistinct.cpp @@ -9,7 +9,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayElement.cpp b/dbms/src/Functions/array/arrayElement.cpp index 876f7a49755..2921a4bd02a 100644 --- a/dbms/src/Functions/array/arrayElement.cpp +++ 
b/dbms/src/Functions/array/arrayElement.cpp @@ -12,7 +12,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayEnumerate.cpp b/dbms/src/Functions/array/arrayEnumerate.cpp index a35a9e63b69..a228c310fbc 100644 --- a/dbms/src/Functions/array/arrayEnumerate.cpp +++ b/dbms/src/Functions/array/arrayEnumerate.cpp @@ -5,7 +5,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayEnumerateDense.cpp b/dbms/src/Functions/array/arrayEnumerateDense.cpp index be2cb2cb69e..4539aed18ab 100644 --- a/dbms/src/Functions/array/arrayEnumerateDense.cpp +++ b/dbms/src/Functions/array/arrayEnumerateDense.cpp @@ -1,6 +1,5 @@ #include "arrayEnumerateExtended.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayEnumerateDenseRanked.cpp b/dbms/src/Functions/array/arrayEnumerateDenseRanked.cpp index 0b6eba4639f..735211fb3df 100644 --- a/dbms/src/Functions/array/arrayEnumerateDenseRanked.cpp +++ b/dbms/src/Functions/array/arrayEnumerateDenseRanked.cpp @@ -1,6 +1,5 @@ #include #include "arrayEnumerateRanked.h" -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayEnumerateRanked.cpp b/dbms/src/Functions/array/arrayEnumerateRanked.cpp index 758c2db4414..7be0cbc44ce 100644 --- a/dbms/src/Functions/array/arrayEnumerateRanked.cpp +++ b/dbms/src/Functions/array/arrayEnumerateRanked.cpp @@ -2,7 +2,6 @@ #include #include #include "arrayEnumerateRanked.h" -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayEnumerateUniq.cpp b/dbms/src/Functions/array/arrayEnumerateUniq.cpp index cb4e72dd1cd..848b29064c4 100644 --- a/dbms/src/Functions/array/arrayEnumerateUniq.cpp +++ b/dbms/src/Functions/array/arrayEnumerateUniq.cpp @@ -1,6 +1,5 @@ #include "arrayEnumerateExtended.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayEnumerateUniqRanked.cpp b/dbms/src/Functions/array/arrayEnumerateUniqRanked.cpp index 75da44105a6..2cd1fe40c2e 100644 --- a/dbms/src/Functions/array/arrayEnumerateUniqRanked.cpp +++ b/dbms/src/Functions/array/arrayEnumerateUniqRanked.cpp @@ -1,6 +1,5 @@ #include "Functions/FunctionFactory.h" #include "arrayEnumerateRanked.h" -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayExists.cpp b/dbms/src/Functions/array/arrayExists.cpp index b8ec0c6317b..770e19ceec2 100644 --- a/dbms/src/Functions/array/arrayExists.cpp +++ b/dbms/src/Functions/array/arrayExists.cpp @@ -2,7 +2,7 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" + namespace DB { diff --git a/dbms/src/Functions/array/arrayFill.cpp b/dbms/src/Functions/array/arrayFill.cpp index f2ea9d9ad70..544cd0a8849 100644 --- a/dbms/src/Functions/array/arrayFill.cpp +++ b/dbms/src/Functions/array/arrayFill.cpp @@ -2,7 +2,7 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" + namespace DB { diff --git a/dbms/src/Functions/array/arrayFilter.cpp b/dbms/src/Functions/array/arrayFilter.cpp index 6140fc7e053..e5db2b34e23 100644 --- a/dbms/src/Functions/array/arrayFilter.cpp +++ b/dbms/src/Functions/array/arrayFilter.cpp @@ -2,7 +2,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayFirst.cpp b/dbms/src/Functions/array/arrayFirst.cpp index 
b7ec1f09254..98de4f8f1e5 100644 --- a/dbms/src/Functions/array/arrayFirst.cpp +++ b/dbms/src/Functions/array/arrayFirst.cpp @@ -2,7 +2,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayFirstIndex.cpp b/dbms/src/Functions/array/arrayFirstIndex.cpp index 1f61dd0b8aa..fccbf05c66c 100644 --- a/dbms/src/Functions/array/arrayFirstIndex.cpp +++ b/dbms/src/Functions/array/arrayFirstIndex.cpp @@ -2,7 +2,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayFlatten.cpp b/dbms/src/Functions/array/arrayFlatten.cpp index 939142aeff7..230dc551007 100644 --- a/dbms/src/Functions/array/arrayFlatten.cpp +++ b/dbms/src/Functions/array/arrayFlatten.cpp @@ -3,7 +3,7 @@ #include #include #include -#include "registerFunctionsArray.h" + namespace DB { diff --git a/dbms/src/Functions/array/arrayIntersect.cpp b/dbms/src/Functions/array/arrayIntersect.cpp index 933567cebf2..4673f4a7a05 100644 --- a/dbms/src/Functions/array/arrayIntersect.cpp +++ b/dbms/src/Functions/array/arrayIntersect.cpp @@ -22,7 +22,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayJoin.cpp b/dbms/src/Functions/array/arrayJoin.cpp index 62b2848cde3..6302f01b762 100644 --- a/dbms/src/Functions/array/arrayJoin.cpp +++ b/dbms/src/Functions/array/arrayJoin.cpp @@ -2,7 +2,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayMap.cpp b/dbms/src/Functions/array/arrayMap.cpp index ea456c29fd5..e3afaf7fb66 100644 --- a/dbms/src/Functions/array/arrayMap.cpp +++ b/dbms/src/Functions/array/arrayMap.cpp @@ -1,6 +1,5 @@ #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayPopBack.cpp b/dbms/src/Functions/array/arrayPopBack.cpp index a2421cc86cc..d69e59e7128 100644 --- a/dbms/src/Functions/array/arrayPopBack.cpp +++ b/dbms/src/Functions/array/arrayPopBack.cpp @@ -1,6 +1,5 @@ #include "arrayPop.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayPopFront.cpp b/dbms/src/Functions/array/arrayPopFront.cpp index 61c250403ec..ca9ce923aaa 100644 --- a/dbms/src/Functions/array/arrayPopFront.cpp +++ b/dbms/src/Functions/array/arrayPopFront.cpp @@ -1,6 +1,5 @@ #include "arrayPop.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayPushBack.cpp b/dbms/src/Functions/array/arrayPushBack.cpp index ad91cfdfd26..a9c4ed88a7a 100644 --- a/dbms/src/Functions/array/arrayPushBack.cpp +++ b/dbms/src/Functions/array/arrayPushBack.cpp @@ -1,6 +1,5 @@ #include "arrayPush.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayPushFront.cpp b/dbms/src/Functions/array/arrayPushFront.cpp index d79990fb7e8..e0cc56c8ae2 100644 --- a/dbms/src/Functions/array/arrayPushFront.cpp +++ b/dbms/src/Functions/array/arrayPushFront.cpp @@ -1,6 +1,5 @@ #include "arrayPush.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayReduce.cpp b/dbms/src/Functions/array/arrayReduce.cpp index 103d0fe5fa8..e97607af135 100644 --- a/dbms/src/Functions/array/arrayReduce.cpp +++ b/dbms/src/Functions/array/arrayReduce.cpp @@ -12,7 +12,6 @@ #include #include #include -#include 
"registerFunctionsArray.h" #include @@ -168,7 +167,8 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum } catch (...) { - agg_func.destroy(places[i]); + for (size_t j = 0; j < i; ++j) + agg_func.destroy(places[j]); throw; } } diff --git a/dbms/src/Functions/array/arrayResize.cpp b/dbms/src/Functions/array/arrayResize.cpp index e7dde32bd68..e7cda17cd27 100644 --- a/dbms/src/Functions/array/arrayResize.cpp +++ b/dbms/src/Functions/array/arrayResize.cpp @@ -9,7 +9,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayReverse.cpp b/dbms/src/Functions/array/arrayReverse.cpp index 931785e1198..a4f2f1ab90a 100644 --- a/dbms/src/Functions/array/arrayReverse.cpp +++ b/dbms/src/Functions/array/arrayReverse.cpp @@ -8,7 +8,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arraySlice.cpp b/dbms/src/Functions/array/arraySlice.cpp index 8233a5cdf2d..a952aa72d2e 100644 --- a/dbms/src/Functions/array/arraySlice.cpp +++ b/dbms/src/Functions/array/arraySlice.cpp @@ -7,7 +7,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arraySort.cpp b/dbms/src/Functions/array/arraySort.cpp index 0b0a76c941e..17a711e8902 100644 --- a/dbms/src/Functions/array/arraySort.cpp +++ b/dbms/src/Functions/array/arraySort.cpp @@ -1,6 +1,5 @@ #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arraySplit.cpp b/dbms/src/Functions/array/arraySplit.cpp index 7a29136e513..c23f3b0af21 100644 --- a/dbms/src/Functions/array/arraySplit.cpp +++ b/dbms/src/Functions/array/arraySplit.cpp @@ -2,7 +2,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arraySum.cpp b/dbms/src/Functions/array/arraySum.cpp index fcb5796c592..ea4101fa556 100644 --- a/dbms/src/Functions/array/arraySum.cpp +++ b/dbms/src/Functions/array/arraySum.cpp @@ -4,7 +4,6 @@ #include #include "FunctionArrayMapped.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayUniq.cpp b/dbms/src/Functions/array/arrayUniq.cpp index ad2b0044aee..1b66a3da318 100644 --- a/dbms/src/Functions/array/arrayUniq.cpp +++ b/dbms/src/Functions/array/arrayUniq.cpp @@ -11,7 +11,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayWithConstant.cpp b/dbms/src/Functions/array/arrayWithConstant.cpp index 6d816939a6d..0396f007aae 100644 --- a/dbms/src/Functions/array/arrayWithConstant.cpp +++ b/dbms/src/Functions/array/arrayWithConstant.cpp @@ -4,7 +4,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/arrayZip.cpp b/dbms/src/Functions/array/arrayZip.cpp index 93b3ceeee1c..20fca29bae8 100644 --- a/dbms/src/Functions/array/arrayZip.cpp +++ b/dbms/src/Functions/array/arrayZip.cpp @@ -5,7 +5,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/countEqual.cpp b/dbms/src/Functions/array/countEqual.cpp index dfb0f902714..fd4914e90f4 100644 --- a/dbms/src/Functions/array/countEqual.cpp +++ b/dbms/src/Functions/array/countEqual.cpp @@ -1,6 +1,5 @@ #include "arrayIndex.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git 
a/dbms/src/Functions/array/emptyArray.cpp b/dbms/src/Functions/array/emptyArray.cpp index c981ff339ae..0a5b6473112 100644 --- a/dbms/src/Functions/array/emptyArray.cpp +++ b/dbms/src/Functions/array/emptyArray.cpp @@ -9,7 +9,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/emptyArrayToSingle.cpp b/dbms/src/Functions/array/emptyArrayToSingle.cpp index 404aded8fa2..27f4e01c547 100644 --- a/dbms/src/Functions/array/emptyArrayToSingle.cpp +++ b/dbms/src/Functions/array/emptyArrayToSingle.cpp @@ -8,7 +8,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/has.cpp b/dbms/src/Functions/array/has.cpp index 112ef2b85c7..772facea52d 100644 --- a/dbms/src/Functions/array/has.cpp +++ b/dbms/src/Functions/array/has.cpp @@ -1,6 +1,5 @@ #include "arrayIndex.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/hasAll.cpp b/dbms/src/Functions/array/hasAll.cpp index bb67f21a0dd..6ae1640e382 100644 --- a/dbms/src/Functions/array/hasAll.cpp +++ b/dbms/src/Functions/array/hasAll.cpp @@ -1,6 +1,5 @@ #include "hasAllAny.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/hasAny.cpp b/dbms/src/Functions/array/hasAny.cpp index b71542d4eca..756e5311b50 100644 --- a/dbms/src/Functions/array/hasAny.cpp +++ b/dbms/src/Functions/array/hasAny.cpp @@ -1,6 +1,5 @@ #include "hasAllAny.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/indexOf.cpp b/dbms/src/Functions/array/indexOf.cpp index cc47d885762..d180a9f65d4 100644 --- a/dbms/src/Functions/array/indexOf.cpp +++ b/dbms/src/Functions/array/indexOf.cpp @@ -1,6 +1,5 @@ #include "arrayIndex.h" #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/length.cpp b/dbms/src/Functions/array/length.cpp index 3243e78dfb9..67267434794 100644 --- a/dbms/src/Functions/array/length.cpp +++ b/dbms/src/Functions/array/length.cpp @@ -1,7 +1,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/range.cpp b/dbms/src/Functions/array/range.cpp index 9ed508ce18d..b04dcce7519 100644 --- a/dbms/src/Functions/array/range.cpp +++ b/dbms/src/Functions/array/range.cpp @@ -8,7 +8,6 @@ #include #include #include -#include "registerFunctionsArray.h" namespace DB diff --git a/dbms/src/Functions/array/registerFunctionsArray.cpp b/dbms/src/Functions/array/registerFunctionsArray.cpp index 06a36d47f5c..ababc7603e3 100644 --- a/dbms/src/Functions/array/registerFunctionsArray.cpp +++ b/dbms/src/Functions/array/registerFunctionsArray.cpp @@ -1,8 +1,40 @@ -#include "registerFunctionsArray.h" - namespace DB { +class FunctionFactory; + +void registerFunctionArray(FunctionFactory & factory); +void registerFunctionArrayElement(FunctionFactory & factory); +void registerFunctionArrayResize(FunctionFactory & factory); +void registerFunctionHas(FunctionFactory & factory); +void registerFunctionHasAll(FunctionFactory & factory); +void registerFunctionHasAny(FunctionFactory & factory); +void registerFunctionIndexOf(FunctionFactory & factory); +void registerFunctionCountEqual(FunctionFactory & factory); +void registerFunctionArrayIntersect(FunctionFactory & factory); +void registerFunctionArrayPushFront(FunctionFactory & factory); +void registerFunctionArrayPushBack(FunctionFactory & factory); +void 
registerFunctionArrayPopFront(FunctionFactory & factory); +void registerFunctionArrayPopBack(FunctionFactory & factory); +void registerFunctionArrayConcat(FunctionFactory & factory); +void registerFunctionArraySlice(FunctionFactory & factory); +void registerFunctionArrayReverse(FunctionFactory & factory); +void registerFunctionArrayReduce(FunctionFactory & factory); +void registerFunctionRange(FunctionFactory & factory); +void registerFunctionsEmptyArray(FunctionFactory & factory); +void registerFunctionEmptyArrayToSingle(FunctionFactory & factory); +void registerFunctionArrayEnumerate(FunctionFactory & factory); +void registerFunctionArrayEnumerateUniq(FunctionFactory & factory); +void registerFunctionArrayEnumerateDense(FunctionFactory & factory); +void registerFunctionArrayEnumerateUniqRanked(FunctionFactory & factory); +void registerFunctionArrayEnumerateDenseRanked(FunctionFactory & factory); +void registerFunctionArrayUniq(FunctionFactory & factory); +void registerFunctionArrayDistinct(FunctionFactory & factory); +void registerFunctionArrayFlatten(FunctionFactory & factory); +void registerFunctionArrayWithConstant(FunctionFactory & factory); +void registerFunctionArrayZip(FunctionFactory & factory); + + void registerFunctionsArray(FunctionFactory & factory) { registerFunctionArray(factory); diff --git a/dbms/src/Functions/array/registerFunctionsArray.h b/dbms/src/Functions/array/registerFunctionsArray.h deleted file mode 100644 index ab8fa210106..00000000000 --- a/dbms/src/Functions/array/registerFunctionsArray.h +++ /dev/null @@ -1,58 +0,0 @@ -#pragma once - -namespace DB -{ -class FunctionFactory; - -void registerFunctionArray(FunctionFactory &); -void registerFunctionArrayElement(FunctionFactory &); -void registerFunctionArrayResize(FunctionFactory &); -void registerFunctionHas(FunctionFactory &); -void registerFunctionHasAll(FunctionFactory &); -void registerFunctionHasAny(FunctionFactory &); -void registerFunctionIndexOf(FunctionFactory &); -void registerFunctionCountEqual(FunctionFactory &); -void registerFunctionArrayIntersect(FunctionFactory &); -void registerFunctionArrayPushFront(FunctionFactory &); -void registerFunctionArrayPushBack(FunctionFactory &); -void registerFunctionArrayPopFront(FunctionFactory &); -void registerFunctionArrayPopBack(FunctionFactory &); -void registerFunctionArrayConcat(FunctionFactory &); -void registerFunctionArraySlice(FunctionFactory &); -void registerFunctionArrayReverse(FunctionFactory &); -void registerFunctionArrayReduce(FunctionFactory &); -void registerFunctionRange(FunctionFactory &); -void registerFunctionsEmptyArray(FunctionFactory &); -void registerFunctionEmptyArrayToSingle(FunctionFactory &); -void registerFunctionArrayEnumerate(FunctionFactory &); -void registerFunctionArrayEnumerateUniq(FunctionFactory &); -void registerFunctionArrayEnumerateDense(FunctionFactory &); -void registerFunctionArrayEnumerateUniqRanked(FunctionFactory &); -void registerFunctionArrayEnumerateDenseRanked(FunctionFactory &); -void registerFunctionArrayUniq(FunctionFactory &); -void registerFunctionArrayDistinct(FunctionFactory &); -void registerFunctionArrayFlatten(FunctionFactory &); -void registerFunctionArrayWithConstant(FunctionFactory &); -void registerFunctionArrayZip(FunctionFactory &); - -void registerFunctionArrayMap(FunctionFactory &); -void registerFunctionArrayFilter(FunctionFactory &); -void registerFunctionArrayCount(FunctionFactory &); -void registerFunctionArrayExists(FunctionFactory &); -void registerFunctionArrayAll(FunctionFactory &); 
-void registerFunctionArrayCompact(FunctionFactory &); -void registerFunctionArraySum(FunctionFactory &); -void registerFunctionArrayFirst(FunctionFactory &); -void registerFunctionArrayFirstIndex(FunctionFactory &); -void registerFunctionsArrayFill(FunctionFactory &); -void registerFunctionsArraySplit(FunctionFactory &); -void registerFunctionsArraySort(FunctionFactory &); -void registerFunctionArrayCumSum(FunctionFactory &); -void registerFunctionArrayCumSumNonNegative(FunctionFactory &); -void registerFunctionArrayDifference(FunctionFactory &); -void registerFunctionArrayJoin(FunctionFactory &); -void registerFunctionLength(FunctionFactory &); - -void registerFunctionsArray(FunctionFactory & factory); - -} diff --git a/dbms/src/Functions/asin.cpp b/dbms/src/Functions/asin.cpp index 577256e3fbd..cccd3fc05d4 100644 --- a/dbms/src/Functions/asin.cpp +++ b/dbms/src/Functions/asin.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/assumeNotNull.cpp b/dbms/src/Functions/assumeNotNull.cpp index fb277e71a94..4fc98e43b12 100644 --- a/dbms/src/Functions/assumeNotNull.cpp +++ b/dbms/src/Functions/assumeNotNull.cpp @@ -4,7 +4,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/atan.cpp b/dbms/src/Functions/atan.cpp index a29a5ca7ffd..00e871b9a84 100644 --- a/dbms/src/Functions/atan.cpp +++ b/dbms/src/Functions/atan.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/bar.cpp b/dbms/src/Functions/bar.cpp index a95ad079a6f..7c0f962cd80 100644 --- a/dbms/src/Functions/bar.cpp +++ b/dbms/src/Functions/bar.cpp @@ -7,7 +7,6 @@ #include #include #include -#include "registerFunctions.h" namespace DB diff --git a/dbms/src/Functions/base64Decode.cpp b/dbms/src/Functions/base64Decode.cpp index a81eeb8e96f..cedbd1e82f3 100644 --- a/dbms/src/Functions/base64Decode.cpp +++ b/dbms/src/Functions/base64Decode.cpp @@ -2,7 +2,7 @@ #if USE_BASE64 #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/bitAnd.cpp b/dbms/src/Functions/bitAnd.cpp index 180c9456c70..c779d9a90ed 100644 --- a/dbms/src/Functions/bitAnd.cpp +++ b/dbms/src/Functions/bitAnd.cpp @@ -1,6 +1,6 @@ #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/bitBoolMaskAnd.cpp b/dbms/src/Functions/bitBoolMaskAnd.cpp index 3d7a3354fae..7ca183386cf 100644 --- a/dbms/src/Functions/bitBoolMaskAnd.cpp +++ b/dbms/src/Functions/bitBoolMaskAnd.cpp @@ -1,7 +1,7 @@ #include #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/bitBoolMaskOr.cpp b/dbms/src/Functions/bitBoolMaskOr.cpp index 4e5b0d9c0c4..ab1e9544d01 100644 --- a/dbms/src/Functions/bitBoolMaskOr.cpp +++ b/dbms/src/Functions/bitBoolMaskOr.cpp @@ -1,7 +1,7 @@ #include #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/bitNot.cpp b/dbms/src/Functions/bitNot.cpp index d86daeb9187..eebc88a6329 100644 --- a/dbms/src/Functions/bitNot.cpp +++ b/dbms/src/Functions/bitNot.cpp @@ -1,7 +1,7 @@ #include #include #include -#include "registerFunctions.h" + namespace DB { diff --git a/dbms/src/Functions/formatString.h b/dbms/src/Functions/formatString.h index 1df8d090d22..c1f9b6d3783 100644 --- a/dbms/src/Functions/formatString.h +++ b/dbms/src/Functions/formatString.h @@ -12,6 +12,7 @@ #include #include + namespace DB { namespace ErrorCodes @@ -60,7 
+61,7 @@ struct FormatImpl UInt64 * index_positions_ptr, std::vector & substrings) { - /// Is current position is after open curly brace. + /// Is current position after open curly brace. bool is_open_curly = false; /// The position of last open token. size_t last_open = -1; diff --git a/dbms/src/Functions/randomPrintableASCII.cpp b/dbms/src/Functions/randomPrintableASCII.cpp new file mode 100644 index 00000000000..a6f0fd65e42 --- /dev/null +++ b/dbms/src/Functions/randomPrintableASCII.cpp @@ -0,0 +1,113 @@ +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int TOO_LARGE_STRING_SIZE; +} + + +/** Generate random string of specified length with printable ASCII characters, almost uniformly distributed. + * First argument is length, other optional arguments are ignored and used to prevent common subexpression elimination to get different values. + */ +class FunctionRandomPrintableASCII : public IFunction +{ +public: + static constexpr auto name = "randomPrintableASCII"; + static FunctionPtr create(const Context &) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + bool isVariadic() const override { return true; } + size_t getNumberOfArguments() const override { return 0; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (arguments.size() < 1) + throw Exception("Function " + getName() + " requires at least one argument: the size of resulting string", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + if (arguments.size() > 2) + throw Exception("Function " + getName() + " requires at most two arguments: the size of resulting string and optional disambiguation tag", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + const IDataType & length_type = *arguments[0]; + if (!isNumber(length_type)) + throw Exception("First argument of function " + getName() + " must have numeric type", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool isDeterministic() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return false; } + + void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override + { + auto col_to = ColumnString::create(); + ColumnString::Chars & data_to = col_to->getChars(); + ColumnString::Offsets & offsets_to = col_to->getOffsets(); + offsets_to.resize(input_rows_count); + + const IColumn & length_column = *block.getByPosition(arguments[0]).column; + + IColumn::Offset offset = 0; + for (size_t row_num = 0; row_num < input_rows_count; ++row_num) + { + size_t length = length_column.getUInt(row_num); + if (length > (1 << 30)) + throw Exception("Too large string size in function " + getName(), ErrorCodes::TOO_LARGE_STRING_SIZE); + + IColumn::Offset next_offset = offset + length + 1; + data_to.resize(next_offset); + offsets_to[row_num] = next_offset; + + auto * data_to_ptr = data_to.data(); /// avoid assert on array indexing after end + for (size_t pos = offset, end = offset + length; pos < end; pos += 4) /// We have padding in column buffers that we can overwrite. + { + UInt64 rand = thread_local_rng(); + + UInt16 rand1 = rand; + UInt16 rand2 = rand >> 16; + UInt16 rand3 = rand >> 32; + UInt16 rand4 = rand >> 48; + + /// Printable characters are from range [32; 126]. 
+ /// https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/ + + data_to_ptr[pos + 0] = 32 + ((rand1 * 95) >> 16); + data_to_ptr[pos + 1] = 32 + ((rand2 * 95) >> 16); + data_to_ptr[pos + 2] = 32 + ((rand3 * 95) >> 16); + data_to_ptr[pos + 3] = 32 + ((rand4 * 95) >> 16); + + /// NOTE gcc failed to vectorize this code (aliasing of char?) + /// TODO Implement SIMD optimizations from Danila Kutenin. + } + + data_to[offset + length] = 0; + + offset = next_offset; + } + + block.getByPosition(result).column = std::move(col_to); + } +}; + +void registerFunctionRandomPrintableASCII(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +} diff --git a/dbms/src/Functions/registerFunctions.cpp b/dbms/src/Functions/registerFunctions.cpp index 178c92236af..652e9a8b8af 100644 --- a/dbms/src/Functions/registerFunctions.cpp +++ b/dbms/src/Functions/registerFunctions.cpp @@ -1,10 +1,45 @@ #include -#include -#include + namespace DB { +void registerFunctionsArithmetic(FunctionFactory &); +void registerFunctionsArray(FunctionFactory &); +void registerFunctionsTuple(FunctionFactory &); +void registerFunctionsBitmap(FunctionFactory &); +void registerFunctionsCoding(FunctionFactory &); +void registerFunctionsComparison(FunctionFactory &); +void registerFunctionsConditional(FunctionFactory &); +void registerFunctionsConversion(FunctionFactory &); +void registerFunctionsDateTime(FunctionFactory &); +void registerFunctionsEmbeddedDictionaries(FunctionFactory &); +void registerFunctionsExternalDictionaries(FunctionFactory &); +void registerFunctionsExternalModels(FunctionFactory &); +void registerFunctionsFormatting(FunctionFactory &); +void registerFunctionsHashing(FunctionFactory &); +void registerFunctionsHigherOrder(FunctionFactory &); +void registerFunctionsLogical(FunctionFactory &); +void registerFunctionsMiscellaneous(FunctionFactory &); +void registerFunctionsRandom(FunctionFactory &); +void registerFunctionsReinterpret(FunctionFactory &); +void registerFunctionsRound(FunctionFactory &); +void registerFunctionsString(FunctionFactory &); +void registerFunctionsStringArray(FunctionFactory &); +void registerFunctionsStringSearch(FunctionFactory &); +void registerFunctionsStringRegex(FunctionFactory &); +void registerFunctionsStringSimilarity(FunctionFactory &); +void registerFunctionsURL(FunctionFactory &); +void registerFunctionsVisitParam(FunctionFactory &); +void registerFunctionsMath(FunctionFactory &); +void registerFunctionsGeo(FunctionFactory &); +void registerFunctionsIntrospection(FunctionFactory &); +void registerFunctionsNull(FunctionFactory &); +void registerFunctionsFindCluster(FunctionFactory &); +void registerFunctionsJSON(FunctionFactory &); +void registerFunctionsConsistentHashing(FunctionFactory & factory); + + void registerFunctions() { auto & factory = FunctionFactory::instance(); diff --git a/dbms/src/Functions/registerFunctions.h b/dbms/src/Functions/registerFunctions.h index 5827ae5894c..d426fb9ebef 100644 --- a/dbms/src/Functions/registerFunctions.h +++ b/dbms/src/Functions/registerFunctions.h @@ -1,308 +1,7 @@ #pragma once -#include "config_core.h" -#include "config_functions.h" namespace DB { -class FunctionFactory; - -void registerFunctionCurrentDatabase(FunctionFactory &); -void registerFunctionCurrentUser(FunctionFactory &); -void registerFunctionCurrentQuota(FunctionFactory &); -void registerFunctionCurrentRowPolicies(FunctionFactory &); -void registerFunctionHostName(FunctionFactory &); -void registerFunctionFQDN(FunctionFactory &); -void 
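In the new `randomPrintableASCII` above, each 16-bit random value is mapped onto the 95 printable ASCII codes `[32, 126]` as `32 + ((rand16 * 95) >> 16)`: multiply by the range size and keep the high 16 bits, the fast alternative to `rand16 % 95` from the linked Lemire post. A quick exhaustive check that the mapping stays in range and is almost uniform (each character receives either 689 or 690 of the 65536 possible inputs):

    #include <array>
    #include <cassert>
    #include <cstdint>
    #include <iostream>

    int main()
    {
        std::array<uint32_t, 95> counts{};

        /// Exhaustively map every 16-bit value the way the patch does.
        for (uint32_t r = 0; r < 65536; ++r)
        {
            uint32_t c = 32 + ((r * 95) >> 16);  /// High 16 bits of r * 95: a value in [0, 95).
            assert(c >= 32 && c <= 126);
            ++counts[c - 32];
        }

        /// Each character gets floor(65536 / 95) = 689 or 690 hits: almost uniform.
        for (uint32_t n : counts)
            assert(n == 689 || n == 690);

        std::cout << "all 95 printable characters covered; counts differ by at most 1\n";
    }
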
registerFunctionVisibleWidth(FunctionFactory &); -void registerFunctionToTypeName(FunctionFactory &); -void registerFunctionGetSizeOfEnumType(FunctionFactory &); -void registerFunctionToColumnTypeName(FunctionFactory &); -void registerFunctionDumpColumnStructure(FunctionFactory &); -void registerFunctionDefaultValueOfArgumentType(FunctionFactory &); -void registerFunctionBlockSize(FunctionFactory &); -void registerFunctionBlockNumber(FunctionFactory &); -void registerFunctionRowNumberInBlock(FunctionFactory &); -void registerFunctionRowNumberInAllBlocks(FunctionFactory &); -void registerFunctionNeighbor(FunctionFactory &); -void registerFunctionSleep(FunctionFactory &); -void registerFunctionSleepEachRow(FunctionFactory &); -void registerFunctionMaterialize(FunctionFactory &); -void registerFunctionIgnore(FunctionFactory &); -void registerFunctionIgnoreExceptNull(FunctionFactory &); -void registerFunctionIndexHint(FunctionFactory &); -void registerFunctionIdentity(FunctionFactory &); -void registerFunctionReplicate(FunctionFactory &); -void registerFunctionBar(FunctionFactory &); -void registerFunctionHasColumnInTable(FunctionFactory &); -void registerFunctionIsFinite(FunctionFactory &); -void registerFunctionIsInfinite(FunctionFactory &); -void registerFunctionIsNaN(FunctionFactory &); -void registerFunctionThrowIf(FunctionFactory &); -void registerFunctionVersion(FunctionFactory &); -void registerFunctionUptime(FunctionFactory &); -void registerFunctionTimeZone(FunctionFactory &); -void registerFunctionRunningAccumulate(FunctionFactory &); -void registerFunctionRunningDifference(FunctionFactory &); -void registerFunctionRunningDifferenceStartingWithFirstValue(FunctionFactory &); -void registerFunctionFinalizeAggregation(FunctionFactory &); -void registerFunctionToLowCardinality(FunctionFactory &); -void registerFunctionLowCardinalityIndices(FunctionFactory &); -void registerFunctionLowCardinalityKeys(FunctionFactory &); -void registerFunctionsIn(FunctionFactory &); -void registerFunctionJoinGet(FunctionFactory &); -void registerFunctionFilesystem(FunctionFactory &); -void registerFunctionEvalMLMethod(FunctionFactory &); -void registerFunctionBasename(FunctionFactory &); -void registerFunctionTransform(FunctionFactory &); -void registerFunctionGetMacro(FunctionFactory &); -void registerFunctionGetScalar(FunctionFactory &); - -#if USE_ICU -void registerFunctionConvertCharset(FunctionFactory &); -#endif - -void registerFunctionsArithmetic(FunctionFactory &); -void registerFunctionsTuple(FunctionFactory &); -void registerFunctionsBitmap(FunctionFactory &); -void registerFunctionsCoding(FunctionFactory &); -void registerFunctionsComparison(FunctionFactory &); -void registerFunctionsConditional(FunctionFactory &); -void registerFunctionsConversion(FunctionFactory &); -void registerFunctionsDateTime(FunctionFactory &); -void registerFunctionsEmbeddedDictionaries(FunctionFactory &); -void registerFunctionsExternalDictionaries(FunctionFactory &); -void registerFunctionsExternalModels(FunctionFactory &); -void registerFunctionsFormatting(FunctionFactory &); -void registerFunctionsHashing(FunctionFactory &); -void registerFunctionsHigherOrder(FunctionFactory &); -void registerFunctionsLogical(FunctionFactory &); -void registerFunctionsMiscellaneous(FunctionFactory &); -void registerFunctionsRandom(FunctionFactory &); -void registerFunctionsReinterpret(FunctionFactory &); -void registerFunctionsRound(FunctionFactory &); -void registerFunctionsString(FunctionFactory &); -void 
registerFunctionsStringArray(FunctionFactory &); -void registerFunctionsStringSearch(FunctionFactory &); -void registerFunctionsStringRegex(FunctionFactory &); -void registerFunctionsStringSimilarity(FunctionFactory &); -void registerFunctionsVisitParam(FunctionFactory &); -void registerFunctionsMath(FunctionFactory &); -void registerFunctionsGeo(FunctionFactory &); -void registerFunctionsIntrospection(FunctionFactory &); -void registerFunctionsNull(FunctionFactory &); -void registerFunctionsFindCluster(FunctionFactory &); -void registerFunctionsJSON(FunctionFactory &); -void registerFunctionsConsistentHashing(FunctionFactory & factory); - -void registerFunctionPlus(FunctionFactory & factory); -void registerFunctionMinus(FunctionFactory & factory); -void registerFunctionMultiply(FunctionFactory & factory); -void registerFunctionDivide(FunctionFactory & factory); -void registerFunctionIntDiv(FunctionFactory & factory); -void registerFunctionIntDivOrZero(FunctionFactory & factory); -void registerFunctionModulo(FunctionFactory & factory); -void registerFunctionNegate(FunctionFactory & factory); -void registerFunctionAbs(FunctionFactory & factory); -void registerFunctionBitAnd(FunctionFactory & factory); -void registerFunctionBitOr(FunctionFactory & factory); -void registerFunctionBitXor(FunctionFactory & factory); -void registerFunctionBitNot(FunctionFactory & factory); -void registerFunctionBitShiftLeft(FunctionFactory & factory); -void registerFunctionBitShiftRight(FunctionFactory & factory); -void registerFunctionBitRotateLeft(FunctionFactory & factory); -void registerFunctionBitRotateRight(FunctionFactory & factory); -void registerFunctionLeast(FunctionFactory & factory); -void registerFunctionGreatest(FunctionFactory & factory); -void registerFunctionBitTest(FunctionFactory & factory); -void registerFunctionBitTestAny(FunctionFactory & factory); -void registerFunctionBitTestAll(FunctionFactory & factory); -void registerFunctionGCD(FunctionFactory & factory); -void registerFunctionLCM(FunctionFactory & factory); -void registerFunctionIntExp2(FunctionFactory & factory); -void registerFunctionIntExp10(FunctionFactory & factory); -void registerFunctionRoundToExp2(FunctionFactory & factory); -void registerFunctionRoundDuration(FunctionFactory & factory); -void registerFunctionRoundAge(FunctionFactory & factory); - -void registerFunctionBitBoolMaskOr(FunctionFactory & factory); -void registerFunctionBitBoolMaskAnd(FunctionFactory & factory); -void registerFunctionBitWrapperFunc(FunctionFactory & factory); -void registerFunctionBitSwapLastTwo(FunctionFactory & factory); - -void registerFunctionEquals(FunctionFactory & factory); -void registerFunctionNotEquals(FunctionFactory & factory); -void registerFunctionLess(FunctionFactory & factory); -void registerFunctionGreater(FunctionFactory & factory); -void registerFunctionLessOrEquals(FunctionFactory & factory); -void registerFunctionGreaterOrEquals(FunctionFactory & factory); - -void registerFunctionIf(FunctionFactory & factory); -void registerFunctionMultiIf(FunctionFactory & factory); -void registerFunctionCaseWithExpression(FunctionFactory & factory); - -void registerFunctionYandexConsistentHash(FunctionFactory & factory); -void registerFunctionJumpConsistentHash(FunctionFactory & factory); -void registerFunctionSumburConsistentHash(FunctionFactory & factory); - -void registerFunctionToYear(FunctionFactory &); -void registerFunctionToQuarter(FunctionFactory &); -void registerFunctionToMonth(FunctionFactory &); -void 
registerFunctionToDayOfMonth(FunctionFactory &); -void registerFunctionToDayOfWeek(FunctionFactory &); -void registerFunctionToDayOfYear(FunctionFactory &); -void registerFunctionToHour(FunctionFactory &); -void registerFunctionToMinute(FunctionFactory &); -void registerFunctionToSecond(FunctionFactory &); -void registerFunctionToStartOfDay(FunctionFactory &); -void registerFunctionToMonday(FunctionFactory &); -void registerFunctionToISOWeek(FunctionFactory &); -void registerFunctionToISOYear(FunctionFactory &); -void registerFunctionToCustomWeek(FunctionFactory &); -void registerFunctionToStartOfMonth(FunctionFactory &); -void registerFunctionToStartOfQuarter(FunctionFactory &); -void registerFunctionToStartOfYear(FunctionFactory &); -void registerFunctionToStartOfMinute(FunctionFactory &); -void registerFunctionToStartOfFiveMinute(FunctionFactory &); -void registerFunctionToStartOfTenMinutes(FunctionFactory &); -void registerFunctionToStartOfFifteenMinutes(FunctionFactory &); -void registerFunctionToStartOfHour(FunctionFactory &); -void registerFunctionToStartOfInterval(FunctionFactory &); -void registerFunctionToStartOfISOYear(FunctionFactory &); -void registerFunctionToRelativeYearNum(FunctionFactory &); -void registerFunctionToRelativeQuarterNum(FunctionFactory &); -void registerFunctionToRelativeMonthNum(FunctionFactory &); -void registerFunctionToRelativeWeekNum(FunctionFactory &); -void registerFunctionToRelativeDayNum(FunctionFactory &); -void registerFunctionToRelativeHourNum(FunctionFactory &); -void registerFunctionToRelativeMinuteNum(FunctionFactory &); -void registerFunctionToRelativeSecondNum(FunctionFactory &); -void registerFunctionToTime(FunctionFactory &); -void registerFunctionNow(FunctionFactory &); -void registerFunctionNow64(FunctionFactory &); -void registerFunctionToday(FunctionFactory &); -void registerFunctionYesterday(FunctionFactory &); -void registerFunctionTimeSlot(FunctionFactory &); -void registerFunctionTimeSlots(FunctionFactory &); -void registerFunctionToYYYYMM(FunctionFactory &); -void registerFunctionToYYYYMMDD(FunctionFactory &); -void registerFunctionToYYYYMMDDhhmmss(FunctionFactory &); -void registerFunctionAddSeconds(FunctionFactory &); -void registerFunctionAddMinutes(FunctionFactory &); -void registerFunctionAddHours(FunctionFactory &); -void registerFunctionAddDays(FunctionFactory &); -void registerFunctionAddWeeks(FunctionFactory &); -void registerFunctionAddMonths(FunctionFactory &); -void registerFunctionAddQuarters(FunctionFactory &); -void registerFunctionAddYears(FunctionFactory &); -void registerFunctionSubtractSeconds(FunctionFactory &); -void registerFunctionSubtractMinutes(FunctionFactory &); -void registerFunctionSubtractHours(FunctionFactory &); -void registerFunctionSubtractDays(FunctionFactory &); -void registerFunctionSubtractWeeks(FunctionFactory &); -void registerFunctionSubtractMonths(FunctionFactory &); -void registerFunctionSubtractQuarters(FunctionFactory &); -void registerFunctionSubtractYears(FunctionFactory &); -void registerFunctionDateDiff(FunctionFactory &); -void registerFunctionToTimeZone(FunctionFactory &); -void registerFunctionFormatDateTime(FunctionFactory &); - -void registerFunctionGeoDistance(FunctionFactory & factory); -void registerFunctionPointInEllipses(FunctionFactory & factory); -void registerFunctionPointInPolygon(FunctionFactory & factory); -void registerFunctionGeohashEncode(FunctionFactory & factory); -void registerFunctionGeohashDecode(FunctionFactory & factory); -void 
registerFunctionGeohashesInBox(FunctionFactory & factory); - -#if USE_H3 -void registerFunctionGeoToH3(FunctionFactory &); -void registerFunctionH3EdgeAngle(FunctionFactory &); -void registerFunctionH3EdgeLengthM(FunctionFactory &); -void registerFunctionH3GetResolution(FunctionFactory &); -void registerFunctionH3IsValid(FunctionFactory &); -void registerFunctionH3KRing(FunctionFactory &); -#endif - -#if defined(OS_LINUX) -void registerFunctionAddressToSymbol(FunctionFactory & factory); -void registerFunctionAddressToLine(FunctionFactory & factory); -#endif -void registerFunctionDemangle(FunctionFactory & factory); -void registerFunctionTrap(FunctionFactory & factory); - -void registerFunctionE(FunctionFactory & factory); -void registerFunctionPi(FunctionFactory & factory); -void registerFunctionExp(FunctionFactory & factory); -void registerFunctionLog(FunctionFactory & factory); -void registerFunctionExp2(FunctionFactory & factory); -void registerFunctionLog2(FunctionFactory & factory); -void registerFunctionExp10(FunctionFactory & factory); -void registerFunctionLog10(FunctionFactory & factory); -void registerFunctionSqrt(FunctionFactory & factory); -void registerFunctionCbrt(FunctionFactory & factory); -void registerFunctionErf(FunctionFactory & factory); -void registerFunctionErfc(FunctionFactory & factory); -void registerFunctionLGamma(FunctionFactory & factory); -void registerFunctionTGamma(FunctionFactory & factory); -void registerFunctionSin(FunctionFactory & factory); -void registerFunctionCos(FunctionFactory & factory); -void registerFunctionTan(FunctionFactory & factory); -void registerFunctionAsin(FunctionFactory & factory); -void registerFunctionAcos(FunctionFactory & factory); -void registerFunctionAtan(FunctionFactory & factory); -void registerFunctionSigmoid(FunctionFactory & factory); -void registerFunctionTanh(FunctionFactory & factory); -void registerFunctionPow(FunctionFactory & factory); - -void registerFunctionIsNull(FunctionFactory & factory); -void registerFunctionIsNotNull(FunctionFactory & factory); -void registerFunctionCoalesce(FunctionFactory & factory); -void registerFunctionIfNull(FunctionFactory & factory); -void registerFunctionNullIf(FunctionFactory & factory); -void registerFunctionAssumeNotNull(FunctionFactory & factory); -void registerFunctionToNullable(FunctionFactory & factory); - -void registerFunctionRand(FunctionFactory & factory); -void registerFunctionRand64(FunctionFactory & factory); -void registerFunctionRandConstant(FunctionFactory & factory); -void registerFunctionGenerateUUIDv4(FunctionFactory & factory); - -void registerFunctionRepeat(FunctionFactory &); -void registerFunctionEmpty(FunctionFactory &); -void registerFunctionNotEmpty(FunctionFactory &); -void registerFunctionLengthUTF8(FunctionFactory &); -void registerFunctionIsValidUTF8(FunctionFactory &); -void registerFunctionToValidUTF8(FunctionFactory &); -void registerFunctionLower(FunctionFactory &); -void registerFunctionUpper(FunctionFactory &); -void registerFunctionLowerUTF8(FunctionFactory &); -void registerFunctionUpperUTF8(FunctionFactory &); -void registerFunctionReverse(FunctionFactory &); -void registerFunctionReverseUTF8(FunctionFactory &); -void registerFunctionsConcat(FunctionFactory &); -void registerFunctionFormat(FunctionFactory &); -void registerFunctionSubstring(FunctionFactory &); -void registerFunctionCRC(FunctionFactory &); -void registerFunctionAppendTrailingCharIfAbsent(FunctionFactory &); -void registerFunctionStartsWith(FunctionFactory &); -void 
registerFunctionEndsWith(FunctionFactory &); -void registerFunctionTrim(FunctionFactory &); -void registerFunctionRegexpQuoteMeta(FunctionFactory &); - -#if USE_BASE64 -void registerFunctionBase64Encode(FunctionFactory &); -void registerFunctionBase64Decode(FunctionFactory &); -void registerFunctionTryBase64Decode(FunctionFactory &); -#endif - -void registerFunctionTuple(FunctionFactory &); -void registerFunctionTupleElement(FunctionFactory &); - -void registerFunctionVisitParamHas(FunctionFactory & factory); -void registerFunctionVisitParamExtractUInt(FunctionFactory & factory); -void registerFunctionVisitParamExtractInt(FunctionFactory & factory); -void registerFunctionVisitParamExtractFloat(FunctionFactory & factory); -void registerFunctionVisitParamExtractBool(FunctionFactory & factory); -void registerFunctionVisitParamExtractRaw(FunctionFactory & factory); -void registerFunctionVisitParamExtractString(FunctionFactory & factory); void registerFunctions(); diff --git a/dbms/src/Functions/registerFunctionsArithmetic.cpp b/dbms/src/Functions/registerFunctionsArithmetic.cpp index 4d05fe0d885..eb68fc32fa1 100644 --- a/dbms/src/Functions/registerFunctionsArithmetic.cpp +++ b/dbms/src/Functions/registerFunctionsArithmetic.cpp @@ -1,6 +1,44 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionPlus(FunctionFactory & factory); +void registerFunctionMinus(FunctionFactory & factory); +void registerFunctionMultiply(FunctionFactory & factory); +void registerFunctionDivide(FunctionFactory & factory); +void registerFunctionIntDiv(FunctionFactory & factory); +void registerFunctionIntDivOrZero(FunctionFactory & factory); +void registerFunctionModulo(FunctionFactory & factory); +void registerFunctionNegate(FunctionFactory & factory); +void registerFunctionAbs(FunctionFactory & factory); +void registerFunctionBitAnd(FunctionFactory & factory); +void registerFunctionBitOr(FunctionFactory & factory); +void registerFunctionBitXor(FunctionFactory & factory); +void registerFunctionBitNot(FunctionFactory & factory); +void registerFunctionBitShiftLeft(FunctionFactory & factory); +void registerFunctionBitShiftRight(FunctionFactory & factory); +void registerFunctionBitRotateLeft(FunctionFactory & factory); +void registerFunctionBitRotateRight(FunctionFactory & factory); +void registerFunctionLeast(FunctionFactory & factory); +void registerFunctionGreatest(FunctionFactory & factory); +void registerFunctionBitTest(FunctionFactory & factory); +void registerFunctionBitTestAny(FunctionFactory & factory); +void registerFunctionBitTestAll(FunctionFactory & factory); +void registerFunctionGCD(FunctionFactory & factory); +void registerFunctionLCM(FunctionFactory & factory); +void registerFunctionIntExp2(FunctionFactory & factory); +void registerFunctionIntExp10(FunctionFactory & factory); +void registerFunctionRoundToExp2(FunctionFactory & factory); +void registerFunctionRoundDuration(FunctionFactory & factory); +void registerFunctionRoundAge(FunctionFactory & factory); + +void registerFunctionBitBoolMaskOr(FunctionFactory & factory); +void registerFunctionBitBoolMaskAnd(FunctionFactory & factory); +void registerFunctionBitWrapperFunc(FunctionFactory & factory); +void registerFunctionBitSwapLastTwo(FunctionFactory & factory); + + void registerFunctionsArithmetic(FunctionFactory & factory) { registerFunctionPlus(factory); diff --git a/dbms/src/Functions/registerFunctionsComparison.cpp b/dbms/src/Functions/registerFunctionsComparison.cpp index 5ea2f2034a0..af5cbed6191 100644 --- 
a/dbms/src/Functions/registerFunctionsComparison.cpp +++ b/dbms/src/Functions/registerFunctionsComparison.cpp @@ -1,6 +1,16 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionEquals(FunctionFactory & factory); +void registerFunctionNotEquals(FunctionFactory & factory); +void registerFunctionLess(FunctionFactory & factory); +void registerFunctionGreater(FunctionFactory & factory); +void registerFunctionLessOrEquals(FunctionFactory & factory); +void registerFunctionGreaterOrEquals(FunctionFactory & factory); + + void registerFunctionsComparison(FunctionFactory & factory) { registerFunctionEquals(factory); diff --git a/dbms/src/Functions/registerFunctionsConditional.cpp b/dbms/src/Functions/registerFunctionsConditional.cpp index 23704ae5d65..d58d2508dee 100644 --- a/dbms/src/Functions/registerFunctionsConditional.cpp +++ b/dbms/src/Functions/registerFunctionsConditional.cpp @@ -1,6 +1,13 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionIf(FunctionFactory & factory); +void registerFunctionMultiIf(FunctionFactory & factory); +void registerFunctionCaseWithExpression(FunctionFactory & factory); + + void registerFunctionsConditional(FunctionFactory & factory) { registerFunctionIf(factory); diff --git a/dbms/src/Functions/registerFunctionsConsistentHashing.cpp b/dbms/src/Functions/registerFunctionsConsistentHashing.cpp index b5eb232558c..95a856b6d3c 100644 --- a/dbms/src/Functions/registerFunctionsConsistentHashing.cpp +++ b/dbms/src/Functions/registerFunctionsConsistentHashing.cpp @@ -1,6 +1,13 @@ -#include "registerFunctions.h" namespace DB + { +class FunctionFactory; + +void registerFunctionYandexConsistentHash(FunctionFactory & factory); +void registerFunctionJumpConsistentHash(FunctionFactory & factory); +void registerFunctionSumburConsistentHash(FunctionFactory & factory); + + void registerFunctionsConsistentHashing(FunctionFactory & factory) { registerFunctionYandexConsistentHash(factory); diff --git a/dbms/src/Functions/registerFunctionsDateTime.cpp b/dbms/src/Functions/registerFunctionsDateTime.cpp index 3714f62a410..1dd1b991337 100644 --- a/dbms/src/Functions/registerFunctionsDateTime.cpp +++ b/dbms/src/Functions/registerFunctionsDateTime.cpp @@ -1,6 +1,70 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionToYear(FunctionFactory &); +void registerFunctionToQuarter(FunctionFactory &); +void registerFunctionToMonth(FunctionFactory &); +void registerFunctionToDayOfMonth(FunctionFactory &); +void registerFunctionToDayOfWeek(FunctionFactory &); +void registerFunctionToDayOfYear(FunctionFactory &); +void registerFunctionToHour(FunctionFactory &); +void registerFunctionToMinute(FunctionFactory &); +void registerFunctionToSecond(FunctionFactory &); +void registerFunctionToStartOfDay(FunctionFactory &); +void registerFunctionToMonday(FunctionFactory &); +void registerFunctionToISOWeek(FunctionFactory &); +void registerFunctionToISOYear(FunctionFactory &); +void registerFunctionToCustomWeek(FunctionFactory &); +void registerFunctionToStartOfMonth(FunctionFactory &); +void registerFunctionToStartOfQuarter(FunctionFactory &); +void registerFunctionToStartOfYear(FunctionFactory &); +void registerFunctionToStartOfMinute(FunctionFactory &); +void registerFunctionToStartOfFiveMinute(FunctionFactory &); +void registerFunctionToStartOfTenMinutes(FunctionFactory &); +void registerFunctionToStartOfFifteenMinutes(FunctionFactory &); +void 
registerFunctionToStartOfHour(FunctionFactory &); +void registerFunctionToStartOfInterval(FunctionFactory &); +void registerFunctionToStartOfISOYear(FunctionFactory &); +void registerFunctionToRelativeYearNum(FunctionFactory &); +void registerFunctionToRelativeQuarterNum(FunctionFactory &); +void registerFunctionToRelativeMonthNum(FunctionFactory &); +void registerFunctionToRelativeWeekNum(FunctionFactory &); +void registerFunctionToRelativeDayNum(FunctionFactory &); +void registerFunctionToRelativeHourNum(FunctionFactory &); +void registerFunctionToRelativeMinuteNum(FunctionFactory &); +void registerFunctionToRelativeSecondNum(FunctionFactory &); +void registerFunctionToTime(FunctionFactory &); +void registerFunctionNow(FunctionFactory &); +void registerFunctionNow64(FunctionFactory &); +void registerFunctionToday(FunctionFactory &); +void registerFunctionYesterday(FunctionFactory &); +void registerFunctionTimeSlot(FunctionFactory &); +void registerFunctionTimeSlots(FunctionFactory &); +void registerFunctionToYYYYMM(FunctionFactory &); +void registerFunctionToYYYYMMDD(FunctionFactory &); +void registerFunctionToYYYYMMDDhhmmss(FunctionFactory &); +void registerFunctionAddSeconds(FunctionFactory &); +void registerFunctionAddMinutes(FunctionFactory &); +void registerFunctionAddHours(FunctionFactory &); +void registerFunctionAddDays(FunctionFactory &); +void registerFunctionAddWeeks(FunctionFactory &); +void registerFunctionAddMonths(FunctionFactory &); +void registerFunctionAddQuarters(FunctionFactory &); +void registerFunctionAddYears(FunctionFactory &); +void registerFunctionSubtractSeconds(FunctionFactory &); +void registerFunctionSubtractMinutes(FunctionFactory &); +void registerFunctionSubtractHours(FunctionFactory &); +void registerFunctionSubtractDays(FunctionFactory &); +void registerFunctionSubtractWeeks(FunctionFactory &); +void registerFunctionSubtractMonths(FunctionFactory &); +void registerFunctionSubtractQuarters(FunctionFactory &); +void registerFunctionSubtractYears(FunctionFactory &); +void registerFunctionDateDiff(FunctionFactory &); +void registerFunctionToTimeZone(FunctionFactory &); +void registerFunctionFormatDateTime(FunctionFactory &); + void registerFunctionsDateTime(FunctionFactory & factory) { registerFunctionToYear(factory); diff --git a/dbms/src/Functions/registerFunctionsGeo.cpp b/dbms/src/Functions/registerFunctionsGeo.cpp index 7dc7ab471a8..d7c54986f8d 100644 --- a/dbms/src/Functions/registerFunctionsGeo.cpp +++ b/dbms/src/Functions/registerFunctionsGeo.cpp @@ -1,7 +1,27 @@ -#include "registerFunctions.h" +#include "config_functions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionGeoDistance(FunctionFactory & factory); +void registerFunctionPointInEllipses(FunctionFactory & factory); +void registerFunctionPointInPolygon(FunctionFactory & factory); +void registerFunctionGeohashEncode(FunctionFactory & factory); +void registerFunctionGeohashDecode(FunctionFactory & factory); +void registerFunctionGeohashesInBox(FunctionFactory & factory); + +#if USE_H3 +void registerFunctionGeoToH3(FunctionFactory &); +void registerFunctionH3EdgeAngle(FunctionFactory &); +void registerFunctionH3EdgeLengthM(FunctionFactory &); +void registerFunctionH3GetResolution(FunctionFactory &); +void registerFunctionH3IsValid(FunctionFactory &); +void registerFunctionH3KRing(FunctionFactory &); +#endif + + void registerFunctionsGeo(FunctionFactory & factory) { registerFunctionGeoDistance(factory); diff --git a/dbms/src/Functions/registerFunctionsHigherOrder.cpp 
b/dbms/src/Functions/registerFunctionsHigherOrder.cpp index 42c639f6926..08938ef6534 100644 --- a/dbms/src/Functions/registerFunctionsHigherOrder.cpp +++ b/dbms/src/Functions/registerFunctionsHigherOrder.cpp @@ -1,8 +1,24 @@ -#include "registerFunctions.h" -#include "array/registerFunctionsArray.h" - namespace DB { + +class FunctionFactory; + +void registerFunctionArrayMap(FunctionFactory & factory); +void registerFunctionArrayFilter(FunctionFactory & factory); +void registerFunctionArrayCount(FunctionFactory & factory); +void registerFunctionArrayExists(FunctionFactory & factory); +void registerFunctionArrayAll(FunctionFactory & factory); +void registerFunctionArrayCompact(FunctionFactory & factory); +void registerFunctionArraySum(FunctionFactory & factory); +void registerFunctionArrayFirst(FunctionFactory & factory); +void registerFunctionArrayFirstIndex(FunctionFactory & factory); +void registerFunctionsArrayFill(FunctionFactory & factory); +void registerFunctionsArraySplit(FunctionFactory & factory); +void registerFunctionsArraySort(FunctionFactory & factory); +void registerFunctionArrayCumSum(FunctionFactory & factory); +void registerFunctionArrayCumSumNonNegative(FunctionFactory & factory); +void registerFunctionArrayDifference(FunctionFactory & factory); + void registerFunctionsHigherOrder(FunctionFactory & factory) { registerFunctionArrayMap(factory); diff --git a/dbms/src/Functions/registerFunctionsIntrospection.cpp b/dbms/src/Functions/registerFunctionsIntrospection.cpp index 9eb92ef57c5..fe76c96d62d 100644 --- a/dbms/src/Functions/registerFunctionsIntrospection.cpp +++ b/dbms/src/Functions/registerFunctionsIntrospection.cpp @@ -1,7 +1,17 @@ -#include "registerFunctions.h" - namespace DB { + +class FunctionFactory; + +#if defined(OS_LINUX) +void registerFunctionAddressToSymbol(FunctionFactory & factory); +void registerFunctionAddressToLine(FunctionFactory & factory); +#endif + +void registerFunctionDemangle(FunctionFactory & factory); +void registerFunctionTrap(FunctionFactory & factory); + + void registerFunctionsIntrospection(FunctionFactory & factory) { #if defined(OS_LINUX) diff --git a/dbms/src/Functions/registerFunctionsMath.cpp b/dbms/src/Functions/registerFunctionsMath.cpp index a34bf9fb987..e102c725050 100644 --- a/dbms/src/Functions/registerFunctionsMath.cpp +++ b/dbms/src/Functions/registerFunctionsMath.cpp @@ -1,6 +1,33 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionE(FunctionFactory & factory); +void registerFunctionPi(FunctionFactory & factory); +void registerFunctionExp(FunctionFactory & factory); +void registerFunctionLog(FunctionFactory & factory); +void registerFunctionExp2(FunctionFactory & factory); +void registerFunctionLog2(FunctionFactory & factory); +void registerFunctionExp10(FunctionFactory & factory); +void registerFunctionLog10(FunctionFactory & factory); +void registerFunctionSqrt(FunctionFactory & factory); +void registerFunctionCbrt(FunctionFactory & factory); +void registerFunctionErf(FunctionFactory & factory); +void registerFunctionErfc(FunctionFactory & factory); +void registerFunctionLGamma(FunctionFactory & factory); +void registerFunctionTGamma(FunctionFactory & factory); +void registerFunctionSin(FunctionFactory & factory); +void registerFunctionCos(FunctionFactory & factory); +void registerFunctionTan(FunctionFactory & factory); +void registerFunctionAsin(FunctionFactory & factory); +void registerFunctionAcos(FunctionFactory & factory); +void registerFunctionAtan(FunctionFactory & 
factory); +void registerFunctionSigmoid(FunctionFactory & factory); +void registerFunctionTanh(FunctionFactory & factory); +void registerFunctionPow(FunctionFactory & factory); + + void registerFunctionsMath(FunctionFactory & factory) { registerFunctionE(factory); diff --git a/dbms/src/Functions/registerFunctionsMiscellaneous.cpp b/dbms/src/Functions/registerFunctionsMiscellaneous.cpp index 98c749189d4..ef93d4554d5 100644 --- a/dbms/src/Functions/registerFunctionsMiscellaneous.cpp +++ b/dbms/src/Functions/registerFunctionsMiscellaneous.cpp @@ -1,8 +1,65 @@ -#include -#include "registerFunctions.h" +#include namespace DB { + +class FunctionFactory; + +void registerFunctionCurrentDatabase(FunctionFactory &); +void registerFunctionCurrentUser(FunctionFactory &); +void registerFunctionCurrentQuota(FunctionFactory &); +void registerFunctionCurrentRowPolicies(FunctionFactory &); +void registerFunctionHostName(FunctionFactory &); +void registerFunctionFQDN(FunctionFactory &); +void registerFunctionVisibleWidth(FunctionFactory &); +void registerFunctionToTypeName(FunctionFactory &); +void registerFunctionGetSizeOfEnumType(FunctionFactory &); +void registerFunctionToColumnTypeName(FunctionFactory &); +void registerFunctionDumpColumnStructure(FunctionFactory &); +void registerFunctionDefaultValueOfArgumentType(FunctionFactory &); +void registerFunctionBlockSize(FunctionFactory &); +void registerFunctionBlockNumber(FunctionFactory &); +void registerFunctionRowNumberInBlock(FunctionFactory &); +void registerFunctionRowNumberInAllBlocks(FunctionFactory &); +void registerFunctionNeighbor(FunctionFactory &); +void registerFunctionSleep(FunctionFactory &); +void registerFunctionSleepEachRow(FunctionFactory &); +void registerFunctionMaterialize(FunctionFactory &); +void registerFunctionIgnore(FunctionFactory &); +void registerFunctionIgnoreExceptNull(FunctionFactory &); +void registerFunctionIndexHint(FunctionFactory &); +void registerFunctionIdentity(FunctionFactory &); +void registerFunctionArrayJoin(FunctionFactory &); +void registerFunctionReplicate(FunctionFactory &); +void registerFunctionBar(FunctionFactory &); +void registerFunctionHasColumnInTable(FunctionFactory &); +void registerFunctionIsFinite(FunctionFactory &); +void registerFunctionIsInfinite(FunctionFactory &); +void registerFunctionIsNaN(FunctionFactory &); +void registerFunctionThrowIf(FunctionFactory &); +void registerFunctionVersion(FunctionFactory &); +void registerFunctionUptime(FunctionFactory &); +void registerFunctionTimeZone(FunctionFactory &); +void registerFunctionRunningAccumulate(FunctionFactory &); +void registerFunctionRunningDifference(FunctionFactory &); +void registerFunctionRunningDifferenceStartingWithFirstValue(FunctionFactory &); +void registerFunctionFinalizeAggregation(FunctionFactory &); +void registerFunctionToLowCardinality(FunctionFactory &); +void registerFunctionLowCardinalityIndices(FunctionFactory &); +void registerFunctionLowCardinalityKeys(FunctionFactory &); +void registerFunctionsIn(FunctionFactory &); +void registerFunctionJoinGet(FunctionFactory &); +void registerFunctionFilesystem(FunctionFactory &); +void registerFunctionEvalMLMethod(FunctionFactory &); +void registerFunctionBasename(FunctionFactory &); +void registerFunctionTransform(FunctionFactory &); +void registerFunctionGetMacro(FunctionFactory &); +void registerFunctionGetScalar(FunctionFactory &); + +#if USE_ICU +void registerFunctionConvertCharset(FunctionFactory &); +#endif + void registerFunctionsMiscellaneous(FunctionFactory & factory) { 
registerFunctionCurrentDatabase(factory); diff --git a/dbms/src/Functions/registerFunctionsNull.cpp b/dbms/src/Functions/registerFunctionsNull.cpp index dd73ea2c16e..e8894e19907 100644 --- a/dbms/src/Functions/registerFunctionsNull.cpp +++ b/dbms/src/Functions/registerFunctionsNull.cpp @@ -1,6 +1,17 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionIsNull(FunctionFactory & factory); +void registerFunctionIsNotNull(FunctionFactory & factory); +void registerFunctionCoalesce(FunctionFactory & factory); +void registerFunctionIfNull(FunctionFactory & factory); +void registerFunctionNullIf(FunctionFactory & factory); +void registerFunctionAssumeNotNull(FunctionFactory & factory); +void registerFunctionToNullable(FunctionFactory & factory); + + void registerFunctionsNull(FunctionFactory & factory) { registerFunctionIsNull(factory); diff --git a/dbms/src/Functions/registerFunctionsRandom.cpp b/dbms/src/Functions/registerFunctionsRandom.cpp index 6bf41df6b22..7b72c1cf305 100644 --- a/dbms/src/Functions/registerFunctionsRandom.cpp +++ b/dbms/src/Functions/registerFunctionsRandom.cpp @@ -1,12 +1,21 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionRand(FunctionFactory & factory); +void registerFunctionRand64(FunctionFactory & factory); +void registerFunctionRandConstant(FunctionFactory & factory); +void registerFunctionGenerateUUIDv4(FunctionFactory & factory); +void registerFunctionRandomPrintableASCII(FunctionFactory & factory); + void registerFunctionsRandom(FunctionFactory & factory) { registerFunctionRand(factory); registerFunctionRand64(factory); registerFunctionRandConstant(factory); registerFunctionGenerateUUIDv4(factory); + registerFunctionRandomPrintableASCII(factory); } } diff --git a/dbms/src/Functions/registerFunctionsString.cpp b/dbms/src/Functions/registerFunctionsString.cpp index 3c8020432a9..cc94e877bbf 100644 --- a/dbms/src/Functions/registerFunctionsString.cpp +++ b/dbms/src/Functions/registerFunctionsString.cpp @@ -1,9 +1,39 @@ -#include #include "config_functions.h" -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionRepeat(FunctionFactory &); +void registerFunctionEmpty(FunctionFactory &); +void registerFunctionNotEmpty(FunctionFactory &); +void registerFunctionLength(FunctionFactory &); +void registerFunctionLengthUTF8(FunctionFactory &); +void registerFunctionIsValidUTF8(FunctionFactory &); +void registerFunctionToValidUTF8(FunctionFactory &); +void registerFunctionLower(FunctionFactory &); +void registerFunctionUpper(FunctionFactory &); +void registerFunctionLowerUTF8(FunctionFactory &); +void registerFunctionUpperUTF8(FunctionFactory &); +void registerFunctionReverse(FunctionFactory &); +void registerFunctionReverseUTF8(FunctionFactory &); +void registerFunctionsConcat(FunctionFactory &); +void registerFunctionFormat(FunctionFactory &); +void registerFunctionSubstring(FunctionFactory &); +void registerFunctionCRC(FunctionFactory &); +void registerFunctionAppendTrailingCharIfAbsent(FunctionFactory &); +void registerFunctionStartsWith(FunctionFactory &); +void registerFunctionEndsWith(FunctionFactory &); +void registerFunctionTrim(FunctionFactory &); +void registerFunctionRegexpQuoteMeta(FunctionFactory &); + +#if USE_BASE64 +void registerFunctionBase64Encode(FunctionFactory &); +void registerFunctionBase64Decode(FunctionFactory &); +void registerFunctionTryBase64Decode(FunctionFactory &); +#endif + void 
registerFunctionsString(FunctionFactory & factory) { registerFunctionRepeat(factory); diff --git a/dbms/src/Functions/registerFunctionsTuple.cpp b/dbms/src/Functions/registerFunctionsTuple.cpp index d5a16734dd1..12092e1e7e0 100644 --- a/dbms/src/Functions/registerFunctionsTuple.cpp +++ b/dbms/src/Functions/registerFunctionsTuple.cpp @@ -1,6 +1,11 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionTuple(FunctionFactory &); +void registerFunctionTupleElement(FunctionFactory &); + void registerFunctionsTuple(FunctionFactory & factory) { registerFunctionTuple(factory); diff --git a/dbms/src/Functions/registerFunctionsVisitParam.cpp b/dbms/src/Functions/registerFunctionsVisitParam.cpp index db3fffc9dcc..01084594f08 100644 --- a/dbms/src/Functions/registerFunctionsVisitParam.cpp +++ b/dbms/src/Functions/registerFunctionsVisitParam.cpp @@ -1,6 +1,16 @@ -#include "registerFunctions.h" namespace DB { + +class FunctionFactory; + +void registerFunctionVisitParamHas(FunctionFactory & factory); +void registerFunctionVisitParamExtractUInt(FunctionFactory & factory); +void registerFunctionVisitParamExtractInt(FunctionFactory & factory); +void registerFunctionVisitParamExtractFloat(FunctionFactory & factory); +void registerFunctionVisitParamExtractBool(FunctionFactory & factory); +void registerFunctionVisitParamExtractRaw(FunctionFactory & factory); +void registerFunctionVisitParamExtractString(FunctionFactory & factory); + void registerFunctionsVisitParam(FunctionFactory & factory) { registerFunctionVisitParamHas(factory); diff --git a/dbms/src/Functions/runningAccumulate.cpp b/dbms/src/Functions/runningAccumulate.cpp index a4ccc1e1553..53dc5e19777 100644 --- a/dbms/src/Functions/runningAccumulate.cpp +++ b/dbms/src/Functions/runningAccumulate.cpp @@ -15,6 +15,7 @@ namespace ErrorCodes { extern const int ILLEGAL_COLUMN; extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; } @@ -46,10 +47,9 @@ public: return true; } - size_t getNumberOfArguments() const override - { - return 1; - } + bool isVariadic() const override { return true; } + + size_t getNumberOfArguments() const override { return 0; } bool isDeterministic() const override { return false; } @@ -60,6 +60,10 @@ public: DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override { + if (arguments.size() < 1 || arguments.size() > 2) + throw Exception("Incorrect number of arguments of function " + getName() + ". 
Must be 1 or 2.", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + const DataTypeAggregateFunction * type = checkAndGetDataType(arguments[0].get()); if (!type) throw Exception("Argument for function " + getName() + " must have type AggregateFunction - state of aggregate function.", @@ -72,19 +76,24 @@ public: { const ColumnAggregateFunction * column_with_states = typeid_cast(&*block.getByPosition(arguments.at(0)).column); + if (!column_with_states) throw Exception("Illegal column " + block.getByPosition(arguments.at(0)).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); + ColumnPtr column_with_groups; + + if (arguments.size() == 2) + column_with_groups = block.getByPosition(arguments[1]).column; + AggregateFunctionPtr aggregate_function_ptr = column_with_states->getAggregateFunction(); const IAggregateFunction & agg_func = *aggregate_function_ptr; AlignedBuffer place(agg_func.sizeOfData(), agg_func.alignOfData()); - agg_func.create(place.data()); - SCOPE_EXIT(agg_func.destroy(place.data())); + /// Will pass empty arena if agg_func does not allocate memory in arena std::unique_ptr arena = agg_func.allocatesMemoryInArena() ? std::make_unique() : nullptr; auto result_column_ptr = agg_func.getReturnType()->createColumn(); @@ -92,11 +101,32 @@ public: result_column.reserve(column_with_states->size()); const auto & states = column_with_states->getData(); + + bool state_created = false; + SCOPE_EXIT({ + if (state_created) + agg_func.destroy(place.data()); + }); + + size_t row_number = 0; for (const auto & state_to_add : states) { - /// Will pass empty arena if agg_func does not allocate memory in arena + if (row_number == 0 || (column_with_groups && column_with_groups->compareAt(row_number, row_number - 1, *column_with_groups, 1) != 0)) + { + if (state_created) + { + agg_func.destroy(place.data()); + state_created = false; + } + + agg_func.create(place.data()); + state_created = true; + } + agg_func.merge(place.data(), state_to_add, arena.get()); agg_func.insertResultInto(place.data(), result_column); + + ++row_number; } block.getByPosition(result).column = std::move(result_column_ptr); diff --git a/dbms/src/Functions/trap.cpp b/dbms/src/Functions/trap.cpp index 9176a8656af..a7f8d81576d 100644 --- a/dbms/src/Functions/trap.cpp +++ b/dbms/src/Functions/trap.cpp @@ -1,4 +1,3 @@ -#include "registerFunctions.h" #if 0 #include @@ -125,6 +124,10 @@ public: t1.join(); t2.join(); } + else if (mode == "throw exception") + { + std::vector().at(0); + } else if (mode == "access context") { (void)context.getCurrentQueryId(); diff --git a/dbms/src/IO/BitHelpers.h b/dbms/src/IO/BitHelpers.h index 1947d9d99ba..321fb4d254e 100644 --- a/dbms/src/IO/BitHelpers.h +++ b/dbms/src/IO/BitHelpers.h @@ -1,9 +1,10 @@ #pragma once -#include -#include #include #include +#include + +#include #if defined(__OpenBSD__) || defined(__FreeBSD__) # include @@ -14,9 +15,16 @@ # define be64toh(x) OSSwapBigToHostInt64(x) #endif + namespace DB { +namespace ErrorCodes +{ +extern const int CANNOT_WRITE_AFTER_END_OF_BUFFER; +extern const int ATTEMPT_TO_READ_AFTER_EOF; +} + /** Reads data from underlying ReadBuffer bit by bit, max 64 bits at once. 
* * reads MSB bits first, imagine that you have a data: @@ -34,15 +42,20 @@ namespace DB class BitReader { - ReadBuffer & buf; + using BufferType = unsigned __int128; - UInt64 bits_buffer; + const char * source_begin; + const char * source_current; + const char * source_end; + + BufferType bits_buffer; UInt8 bits_count; - static constexpr UInt8 BIT_BUFFER_SIZE = sizeof(bits_buffer) * 8; public: - BitReader(ReadBuffer & buf_) - : buf(buf_), + BitReader(const char * begin, size_t size) + : source_begin(begin), + source_current(begin), + source_end(begin + size), bits_buffer(0), bits_count(0) {} @@ -50,44 +63,21 @@ public: ~BitReader() {} - inline UInt64 readBits(UInt8 bits) + // reads bits_to_read high-bits from bits_buffer + inline UInt64 readBits(UInt8 bits_to_read) { - UInt64 result = 0; - bits = std::min(static_cast(sizeof(result) * 8), bits); + if (bits_to_read > bits_count) + fillBitBuffer(); - while (bits != 0) - { - if (bits_count == 0) - { - fillBuffer(); - if (bits_count == 0) - { - // EOF. - break; - } - } - - const auto to_read = std::min(bits, bits_count); - - const UInt64 v = bits_buffer >> (bits_count - to_read); - const UInt64 mask = maskLowBits(to_read); - const UInt64 value = v & mask; - result |= value; - - // unset bits that were read - bits_buffer &= ~(mask << (bits_count - to_read)); - bits_count -= to_read; - bits -= to_read; - - result <<= std::min(bits, BIT_BUFFER_SIZE); - } - - return result; + return getBitsFromBitBuffer(bits_to_read); } - inline UInt64 peekBits(UInt8 /*bits*/) + inline UInt8 peekByte() { - return 0; + if (bits_count < 8) + fillBitBuffer(); + + return getBitsFromBitBuffer(8); } inline UInt8 readBit() @@ -95,34 +85,95 @@ public: return static_cast(readBits(1)); } + // skip bits from bits_buffer + inline void skipBufferedBits(UInt8 bits) + { + bits_buffer <<= bits; + bits_count -= bits; + } + + inline bool eof() const { - return bits_count == 0 && buf.eof(); + return bits_count == 0 && source_current >= source_end; + } + + // number of bits that was already read by clients with readBits() + inline UInt64 count() const + { + return (source_current - source_begin) * 8 - bits_count; + } + + inline UInt64 remaining() const + { + return (source_end - source_current) * 8 + bits_count; } private: - void fillBuffer() + enum GetBitsMode {CONSUME, PEEK}; + // read data from internal buffer, if it has not enough bits, result is undefined. 
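+    //
+    // For illustration (assumed input, not from a test in this patch) - the reader
+    // returns the most significant bits of the stream first:
+    //
+    //     const char data[1] = {char(0b10110010)};
+    //     BitReader reader(data, sizeof(data));
+    //     reader.readBits(3);   // 0b101   - the three highest bits
+    //     reader.readBits(5);   // 0b10010 - the remaining five bits
+    //     reader.count();       // 8 bits consumed, eof() is now true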
+    template <GetBitsMode mode>
+    inline UInt64 getBitsFromBitBuffer(UInt8 bits_to_read)
     {
-        auto read = buf.read(reinterpret_cast<char *>(&bits_buffer), BIT_BUFFER_SIZE / 8);
-        bits_buffer = be64toh(bits_buffer);
-        bits_buffer >>= BIT_BUFFER_SIZE - read * 8;
+        // push down the high-bits
+        const UInt64 result = static_cast<UInt64>(bits_buffer >> (sizeof(bits_buffer) * 8 - bits_to_read));

-        bits_count = static_cast<UInt8>(read) * 8;
+        if constexpr (mode == CONSUME)
+        {
+            // 'erase' high-bits that have been read
+            skipBufferedBits(bits_to_read);
+        }
+
+        return result;
+    }
+
+
+    // Fills internal bits_buffer with data from source, reads at most 64 bits
+    size_t fillBitBuffer()
+    {
+        const size_t available = source_end - source_current;
+        const auto bytes_to_read = std::min<size_t>(64 / 8, available);
+        if (available == 0)
+        {
+            if (bytes_to_read == 0)
+                return 0;
+
+            throw Exception("Buffer is empty, but requested to read "
+                    + std::to_string(bytes_to_read) + " more bytes.",
+                    ErrorCodes::ATTEMPT_TO_READ_AFTER_EOF);
+        }
+
+        UInt64 tmp_buffer = 0;
+        memcpy(&tmp_buffer, source_current, bytes_to_read);
+        source_current += bytes_to_read;
+
+        tmp_buffer = be64toh(tmp_buffer);
+
+        bits_buffer |= BufferType(tmp_buffer) << ((sizeof(BufferType) - sizeof(tmp_buffer)) * 8 - bits_count);
+        bits_count += static_cast<UInt8>(bytes_to_read) * 8;
+
+        return bytes_to_read;
+    }
 };


 class BitWriter
 {
-    WriteBuffer & buf;
+    using BufferType = unsigned __int128;

-    UInt64 bits_buffer;
+    char * dest_begin;
+    char * dest_current;
+    char * dest_end;
+
+    BufferType bits_buffer;
     UInt8 bits_count;

     static constexpr UInt8 BIT_BUFFER_SIZE = sizeof(bits_buffer) * 8;

 public:
-    BitWriter(WriteBuffer & buf_)
-        : buf(buf_),
+    BitWriter(char * begin, size_t size)
+        : dest_begin(begin),
+          dest_current(begin),
+          dest_end(begin + size),
           bits_buffer(0),
           bits_count(0)
     {}

@@ -132,54 +183,59 @@ public:
         flush();
     }

-    inline void writeBits(UInt8 bits, UInt64 value)
+    // write `bits_to_write` low-bits of `value` to the buffer
+    inline void writeBits(UInt8 bits_to_write, UInt64 value)
     {
-        bits = std::min(static_cast<UInt8>(sizeof(value) * 8), bits);
-
-        while (bits > 0)
+        UInt32 capacity = BIT_BUFFER_SIZE - bits_count;
+        if (capacity < bits_to_write)
         {
-            auto v = value;
-            auto to_write = bits;
-
-            const UInt8 capacity = BIT_BUFFER_SIZE - bits_count;
-            if (capacity < bits)
-            {
-                v >>= bits - capacity;
-                to_write = capacity;
-            }
-
-            const UInt64 mask = maskLowBits<UInt64>(to_write);
-            v &= mask;
-
-            bits_buffer <<= to_write;
-            bits_buffer |= v;
-            bits_count += to_write;
-
-            if (bits_count < BIT_BUFFER_SIZE)
-                break;
-
-            doFlush();
-            bits -= to_write;
+            doFlush();
+            capacity = BIT_BUFFER_SIZE - bits_count;
         }
+
+        // write low bits of value as high bits of bits_buffer
+        const UInt64 mask = maskLowBits<UInt64>(bits_to_write);
+        BufferType v = value & mask;
+        v <<= capacity - bits_to_write;
+
+        bits_buffer |= v;
+        bits_count += bits_to_write;
     }

+    // flush contents of bits_buffer to the dest_current, partial bytes are completed with zeroes.
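+    //
+    // For illustration (assumed values, mirroring the reader example above):
+    //
+    //     char out[8] = {};
+    //     BitWriter writer(out, sizeof(out));
+    //     writer.writeBits(3, 0b101);
+    //     writer.writeBits(5, 0b10010);
+    //     writer.flush();   // out[0] == char(0b10110010), count() == 8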
inline void flush() { - if (bits_count != 0) - { - bits_buffer <<= (BIT_BUFFER_SIZE - bits_count); + bits_count = (bits_count + 8 - 1) & ~(8 - 1); // align UP to 8-bytes, so doFlush will write ALL data from bits_buffer + while (bits_count != 0) doFlush(); - } + } + + inline UInt64 count() const + { + return (dest_current - dest_begin) * 8 + bits_count; } private: void doFlush() { - bits_buffer = htobe64(bits_buffer); - buf.write(reinterpret_cast(&bits_buffer), (bits_count + 7) / 8); + // write whole bytes to the dest_current, leaving partial bits in bits_buffer + const size_t available = dest_end - dest_current; + const size_t to_write = std::min(sizeof(UInt64), bits_count / 8); // align to 8-bit boundary - bits_count = 0; - bits_buffer = 0; + if (available < to_write) + { + throw Exception("Can not write past end of buffer. Space available " + + std::to_string(available) + " bytes, required to write: " + + std::to_string(to_write) + ".", + ErrorCodes::CANNOT_WRITE_AFTER_END_OF_BUFFER); + } + + const auto tmp_buffer = htobe64(static_cast(bits_buffer >> (sizeof(bits_buffer) - sizeof(UInt64)) * 8)); + memcpy(dest_current, &tmp_buffer, to_write); + dest_current += to_write; + + bits_buffer <<= to_write * 8; + bits_count -= to_write * 8; } }; diff --git a/dbms/src/IO/BrotliWriteBuffer.cpp b/dbms/src/IO/BrotliWriteBuffer.cpp index 0a0eeb52956..ac1e2b3c188 100644 --- a/dbms/src/IO/BrotliWriteBuffer.cpp +++ b/dbms/src/IO/BrotliWriteBuffer.cpp @@ -30,14 +30,14 @@ public: BrotliEncoderState * state; }; -BrotliWriteBuffer::BrotliWriteBuffer(WriteBuffer & out_, int compression_level, size_t buf_size, char * existing_memory, size_t alignment) - : BufferWithOwnMemory(buf_size, existing_memory, alignment) - , brotli(std::make_unique()) - , in_available(0) - , in_data(nullptr) - , out_capacity(0) - , out_data(nullptr) - , out(out_) +BrotliWriteBuffer::BrotliWriteBuffer(std::unique_ptr out_, int compression_level, size_t buf_size, char * existing_memory, size_t alignment) + : BufferWithOwnMemory(buf_size, existing_memory, alignment) + , brotli(std::make_unique()) + , in_available(0) + , in_data(nullptr) + , out_capacity(0) + , out_data(nullptr) + , out(std::move(out_)) { BrotliEncoderSetParameter(brotli->state, BROTLI_PARAM_QUALITY, static_cast(compression_level)); // Set LZ77 window size. 
According to brotli sources default value is 24 (c/tools/brotli.c:81) @@ -68,9 +68,9 @@ void BrotliWriteBuffer::nextImpl() do { - out.nextIfAtEnd(); - out_data = reinterpret_cast(out.position()); - out_capacity = out.buffer().end() - out.position(); + out->nextIfAtEnd(); + out_data = reinterpret_cast(out->position()); + out_capacity = out->buffer().end() - out->position(); int result = BrotliEncoderCompressStream( brotli->state, @@ -81,7 +81,7 @@ void BrotliWriteBuffer::nextImpl() &out_data, nullptr); - out.position() = out.buffer().end() - out_capacity; + out->position() = out->buffer().end() - out_capacity; if (result == 0) { @@ -100,9 +100,9 @@ void BrotliWriteBuffer::finish() while (true) { - out.nextIfAtEnd(); - out_data = reinterpret_cast(out.position()); - out_capacity = out.buffer().end() - out.position(); + out->nextIfAtEnd(); + out_data = reinterpret_cast(out->position()); + out_capacity = out->buffer().end() - out->position(); int result = BrotliEncoderCompressStream( brotli->state, @@ -113,7 +113,7 @@ void BrotliWriteBuffer::finish() &out_data, nullptr); - out.position() = out.buffer().end() - out_capacity; + out->position() = out->buffer().end() - out_capacity; if (BrotliEncoderIsFinished(brotli->state)) { diff --git a/dbms/src/IO/BrotliWriteBuffer.h b/dbms/src/IO/BrotliWriteBuffer.h index 6cc2a4ec4b7..5a294354f49 100644 --- a/dbms/src/IO/BrotliWriteBuffer.h +++ b/dbms/src/IO/BrotliWriteBuffer.h @@ -10,11 +10,11 @@ class BrotliWriteBuffer : public BufferWithOwnMemory { public: BrotliWriteBuffer( - WriteBuffer & out_, - int compression_level, - size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, - char * existing_memory = nullptr, - size_t alignment = 0); + std::unique_ptr out_, + int compression_level, + size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, + char * existing_memory = nullptr, + size_t alignment = 0); ~BrotliWriteBuffer() override; @@ -30,9 +30,9 @@ private: const uint8_t * in_data; size_t out_capacity; - uint8_t * out_data; + uint8_t * out_data; - WriteBuffer & out; + std::unique_ptr out; bool finished = false; }; diff --git a/dbms/src/IO/CompressionMethod.cpp b/dbms/src/IO/CompressionMethod.cpp new file mode 100644 index 00000000000..20f1ea44301 --- /dev/null +++ b/dbms/src/IO/CompressionMethod.cpp @@ -0,0 +1,104 @@ +#include + +#include +#include +#include +#include +#include +#include + +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NOT_IMPLEMENTED; +} + + +std::string toContentEncodingName(CompressionMethod method) +{ + switch (method) + { + case CompressionMethod::Gzip: return "gzip"; + case CompressionMethod::Zlib: return "deflate"; + case CompressionMethod::Brotli: return "br"; + case CompressionMethod::None: return ""; + } + __builtin_unreachable(); +} + + +CompressionMethod chooseCompressionMethod(const std::string & path, const std::string & hint) +{ + std::string file_extension; + if (hint.empty() || hint == "auto") + { + auto pos = path.find_last_of('.'); + if (pos != std::string::npos) + file_extension = path.substr(pos + 1, std::string::npos); + } + + const std::string * method_str = file_extension.empty() ? &hint : &file_extension; + + if (*method_str == "gzip" || *method_str == "gz") + return CompressionMethod::Gzip; + if (*method_str == "deflate") + return CompressionMethod::Zlib; + if (*method_str == "brotli" || *method_str == "br") + return CompressionMethod::Brotli; + if (hint.empty() || hint == "auto" || hint == "none") + return CompressionMethod::None; + + throw Exception("Unknown compression method " + hint + ". 
Only 'auto', 'none', 'gzip', 'br' are supported as compression methods", + ErrorCodes::NOT_IMPLEMENTED); +} + + +std::unique_ptr wrapReadBufferWithCompressionMethod( + std::unique_ptr nested, + CompressionMethod method, + size_t buf_size, + char * existing_memory, + size_t alignment) +{ + if (method == CompressionMethod::Gzip || method == CompressionMethod::Zlib) + return std::make_unique(std::move(nested), method, buf_size, existing_memory, alignment); +#if USE_BROTLI + if (method == CompressionMethod::Brotli) + return std::make_unique(std::move(nested), buf_size, existing_memory, alignment); +#endif + + if (method == CompressionMethod::None) + return nested; + + throw Exception("Unsupported compression method", ErrorCodes::NOT_IMPLEMENTED); +} + + +std::unique_ptr wrapWriteBufferWithCompressionMethod( + std::unique_ptr nested, + CompressionMethod method, + int level, + size_t buf_size, + char * existing_memory, + size_t alignment) +{ + if (method == DB::CompressionMethod::Gzip || method == CompressionMethod::Zlib) + return std::make_unique(std::move(nested), method, level, buf_size, existing_memory, alignment); + +#if USE_BROTLI + if (method == DB::CompressionMethod::Brotli) + return std::make_unique(std::move(nested), level, buf_size, existing_memory, alignment); +#endif + + if (method == CompressionMethod::None) + return nested; + + throw Exception("Unsupported compression method", ErrorCodes::NOT_IMPLEMENTED); +} + +} diff --git a/dbms/src/IO/CompressionMethod.h b/dbms/src/IO/CompressionMethod.h index c54d2b581fd..64c2ba3341f 100644 --- a/dbms/src/IO/CompressionMethod.h +++ b/dbms/src/IO/CompressionMethod.h @@ -1,18 +1,57 @@ #pragma once +#include +#include + +#include + + namespace DB { +class ReadBuffer; +class WriteBuffer; + +/** These are "generally recognizable" compression methods for data import/export. + * Do not mess with more efficient compression methods used by ClickHouse internally + * (they use non-standard framing, indexes, checksums...) + */ + enum class CompressionMethod { + None, /// DEFLATE compression with gzip header and CRC32 checksum. /// This option corresponds to files produced by gzip(1) or HTTP Content-Encoding: gzip. Gzip, /// DEFLATE compression with zlib header and Adler32 checksum. /// This option corresponds to HTTP Content-Encoding: deflate. Zlib, - Brotli, - None + Brotli }; +/// How the compression method is named in HTTP. +std::string toContentEncodingName(CompressionMethod method); + +/** Choose compression method from path and hint. + * if hint is "auto" or empty string, then path is analyzed, + * otherwise path parameter is ignored and hint is used as compression method name. + * path is arbitrary string that will be analyzed for file extension (gz, br...) that determines compression. 
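+ *
+ * For example (illustrative values, not from a test in this patch):
+ *     chooseCompressionMethod("data.csv.gz", "auto")  -> Gzip  (by file extension)
+ *     chooseCompressionMethod("data.csv", "deflate")  -> Zlib  (explicit hint wins)
+ *     chooseCompressionMethod("data.csv", "")         -> None  (unknown extension, empty hint)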
+ */ +CompressionMethod chooseCompressionMethod(const std::string & path, const std::string & hint); + +std::unique_ptr wrapReadBufferWithCompressionMethod( + std::unique_ptr nested, + CompressionMethod method, + size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, + char * existing_memory = nullptr, + size_t alignment = 0); + +std::unique_ptr wrapWriteBufferWithCompressionMethod( + std::unique_ptr nested, + CompressionMethod method, + int level, + size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, + char * existing_memory = nullptr, + size_t alignment = 0); + } diff --git a/dbms/src/IO/MMapReadBufferFromFile.cpp b/dbms/src/IO/MMapReadBufferFromFile.cpp index 45558b540e5..32271103724 100644 --- a/dbms/src/IO/MMapReadBufferFromFile.cpp +++ b/dbms/src/IO/MMapReadBufferFromFile.cpp @@ -22,7 +22,7 @@ namespace ErrorCodes } -void MMapReadBufferFromFile::open(const std::string & file_name) +void MMapReadBufferFromFile::open() { ProfileEvents::increment(ProfileEvents::FileOpen); @@ -34,16 +34,24 @@ void MMapReadBufferFromFile::open(const std::string & file_name) } -MMapReadBufferFromFile::MMapReadBufferFromFile(const std::string & file_name, size_t offset, size_t length_) +std::string MMapReadBufferFromFile::getFileName() const { - open(file_name); + return file_name; +} + + +MMapReadBufferFromFile::MMapReadBufferFromFile(const std::string & file_name_, size_t offset, size_t length_) + : file_name(file_name_) +{ + open(); init(fd, offset, length_); } -MMapReadBufferFromFile::MMapReadBufferFromFile(const std::string & file_name, size_t offset) +MMapReadBufferFromFile::MMapReadBufferFromFile(const std::string & file_name_, size_t offset) + : file_name(file_name_) { - open(file_name); + open(); init(fd, offset); } diff --git a/dbms/src/IO/MMapReadBufferFromFile.h b/dbms/src/IO/MMapReadBufferFromFile.h index 6790f817b93..bc566a0489c 100644 --- a/dbms/src/IO/MMapReadBufferFromFile.h +++ b/dbms/src/IO/MMapReadBufferFromFile.h @@ -16,21 +16,24 @@ namespace DB class MMapReadBufferFromFile : public MMapReadBufferFromFileDescriptor { public: - MMapReadBufferFromFile(const std::string & file_name, size_t offset, size_t length_); + MMapReadBufferFromFile(const std::string & file_name_, size_t offset, size_t length_); /// Map till end of file. 
- MMapReadBufferFromFile(const std::string & file_name, size_t offset); + MMapReadBufferFromFile(const std::string & file_name_, size_t offset); ~MMapReadBufferFromFile() override; void close(); + std::string getFileName() const override; + private: int fd = -1; + std::string file_name; CurrentMetrics::Increment metric_increment{CurrentMetrics::OpenFileForRead}; - void open(const std::string & file_name); + void open(); }; } diff --git a/dbms/src/IO/MMapReadBufferFromFileDescriptor.cpp b/dbms/src/IO/MMapReadBufferFromFileDescriptor.cpp index 4852f9e57e9..034c8524f83 100644 --- a/dbms/src/IO/MMapReadBufferFromFileDescriptor.cpp +++ b/dbms/src/IO/MMapReadBufferFromFileDescriptor.cpp @@ -5,6 +5,8 @@ #include #include +#include +#include #include @@ -18,6 +20,8 @@ namespace ErrorCodes extern const int CANNOT_STAT; extern const int BAD_ARGUMENTS; extern const int LOGICAL_ERROR; + extern const int ARGUMENT_OUT_OF_BOUND; + extern const int CANNOT_SEEK_THROUGH_FILE; } @@ -34,6 +38,7 @@ void MMapReadBufferFromFileDescriptor::init(int fd_, size_t offset, size_t lengt ErrorCodes::CANNOT_ALLOCATE_MEMORY); BufferBase::set(static_cast(buf), length, 0); + ReadBuffer::padded = (length % 4096) > 0 && (length % 4096) <= (4096 - 15); /// TODO determine page size } } @@ -58,14 +63,12 @@ void MMapReadBufferFromFileDescriptor::init(int fd_, size_t offset) MMapReadBufferFromFileDescriptor::MMapReadBufferFromFileDescriptor(int fd_, size_t offset_, size_t length_) - : MMapReadBufferFromFileDescriptor() { init(fd_, offset_, length_); } MMapReadBufferFromFileDescriptor::MMapReadBufferFromFileDescriptor(int fd_, size_t offset_) - : MMapReadBufferFromFileDescriptor() { init(fd_, offset_); } @@ -87,4 +90,39 @@ void MMapReadBufferFromFileDescriptor::finish() length = 0; } +std::string MMapReadBufferFromFileDescriptor::getFileName() const +{ + return "(fd = " + toString(fd) + ")"; +} + +int MMapReadBufferFromFileDescriptor::getFD() const +{ + return fd; +} + +off_t MMapReadBufferFromFileDescriptor::getPositionInFile() +{ + return count(); +} + +off_t MMapReadBufferFromFileDescriptor::doSeek(off_t offset, int whence) +{ + off_t new_pos; + if (whence == SEEK_SET) + new_pos = offset; + else if (whence == SEEK_CUR) + new_pos = count() + offset; + else + throw Exception("MMapReadBufferFromFileDescriptor::seek expects SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND); + + working_buffer = internal_buffer; + if (new_pos < 0 || new_pos > off_t(working_buffer.size())) + throw Exception("Cannot seek through file " + getFileName() + + " because seek position (" + toString(new_pos) + ") is out of bounds [0, " + toString(working_buffer.size()) + "]", + ErrorCodes::CANNOT_SEEK_THROUGH_FILE); + + position() = working_buffer.begin() + new_pos; + return new_pos; +} + } diff --git a/dbms/src/IO/MMapReadBufferFromFileDescriptor.h b/dbms/src/IO/MMapReadBufferFromFileDescriptor.h index aaef8c3212a..3cf5e89de7a 100644 --- a/dbms/src/IO/MMapReadBufferFromFileDescriptor.h +++ b/dbms/src/IO/MMapReadBufferFromFileDescriptor.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace DB @@ -11,14 +11,16 @@ namespace DB * Also you cannot control whether and how long actual IO take place, * so this method is not manageable and not recommended for anything except benchmarks. 
*/ -class MMapReadBufferFromFileDescriptor : public ReadBuffer +class MMapReadBufferFromFileDescriptor : public ReadBufferFromFileBase { protected: - MMapReadBufferFromFileDescriptor() : ReadBuffer(nullptr, 0) {} + MMapReadBufferFromFileDescriptor() {} void init(int fd_, size_t offset, size_t length_); void init(int fd_, size_t offset); + off_t doSeek(off_t off, int whence) override; + public: MMapReadBufferFromFileDescriptor(int fd_, size_t offset_, size_t length_); @@ -30,6 +32,10 @@ public: /// unmap memory before call to destructor void finish(); + off_t getPositionInFile() override; + std::string getFileName() const override; + int getFD() const override; + private: size_t length = 0; int fd = -1; diff --git a/dbms/src/IO/ReadBufferFromFileBase.cpp b/dbms/src/IO/ReadBufferFromFileBase.cpp index 320053bad12..e1f9694c85d 100644 --- a/dbms/src/IO/ReadBufferFromFileBase.cpp +++ b/dbms/src/IO/ReadBufferFromFileBase.cpp @@ -3,6 +3,11 @@ namespace DB { +ReadBufferFromFileBase::ReadBufferFromFileBase() + : BufferWithOwnMemory(0) +{ +} + ReadBufferFromFileBase::ReadBufferFromFileBase(size_t buf_size, char * existing_memory, size_t alignment) : BufferWithOwnMemory(buf_size, existing_memory, alignment) { diff --git a/dbms/src/IO/ReadBufferFromFileBase.h b/dbms/src/IO/ReadBufferFromFileBase.h index 755f0269664..cf1dff9a574 100644 --- a/dbms/src/IO/ReadBufferFromFileBase.h +++ b/dbms/src/IO/ReadBufferFromFileBase.h @@ -14,6 +14,7 @@ namespace DB class ReadBufferFromFileBase : public BufferWithOwnMemory { public: + ReadBufferFromFileBase(); ReadBufferFromFileBase(size_t buf_size, char * existing_memory, size_t alignment); ReadBufferFromFileBase(ReadBufferFromFileBase &&) = default; ~ReadBufferFromFileBase() override; diff --git a/dbms/src/IO/ReadBufferFromFileDescriptor.cpp b/dbms/src/IO/ReadBufferFromFileDescriptor.cpp index db79d078c65..776d5fc828d 100644 --- a/dbms/src/IO/ReadBufferFromFileDescriptor.cpp +++ b/dbms/src/IO/ReadBufferFromFileDescriptor.cpp @@ -101,10 +101,12 @@ bool ReadBufferFromFileDescriptor::nextImpl() /// If 'offset' is small enough to stay in buffer after seek, then true seek in file does not happen. off_t ReadBufferFromFileDescriptor::doSeek(off_t offset, int whence) { - off_t new_pos = offset; - if (whence == SEEK_CUR) + off_t new_pos; + if (whence == SEEK_SET) + new_pos = offset; + else if (whence == SEEK_CUR) new_pos = pos_in_file - (working_buffer.end() - pos) + offset; - else if (whence != SEEK_SET) + else throw Exception("ReadBufferFromFileDescriptor::seek expects SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND); /// Position is unchanged. 
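Both seek implementations above share the lseek-style contract: SEEK_SET is absolute, SEEK_CUR is relative to the current position, and any other whence throws. A minimal sketch of the resulting behaviour (file name and offsets are assumed for illustration, not taken from this patch):

    // Map the whole file, then move the cursor; out-of-bounds positions throw
    // CANNOT_SEEK_THROUGH_FILE instead of clamping.
    MMapReadBufferFromFile in("/tmp/data.bin", 0);   // map till end of file
    in.seek(10, SEEK_SET);    // absolute: position 10
    in.seek(5, SEEK_CUR);     // relative: position 15
    assert(in.getPositionInFile() == 15);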
diff --git a/dbms/src/IO/ReadHelpers.cpp b/dbms/src/IO/ReadHelpers.cpp index ea54d37b1b1..9ad6cf72171 100644 --- a/dbms/src/IO/ReadHelpers.cpp +++ b/dbms/src/IO/ReadHelpers.cpp @@ -965,7 +965,7 @@ void readException(Exception & e, ReadBuffer & buf, const String & additional_me String name; String message; String stack_trace; - bool has_nested = false; + bool has_nested = false; /// Obsolete readBinary(code, buf); readBinary(name, buf); @@ -986,14 +986,7 @@ void readException(Exception & e, ReadBuffer & buf, const String & additional_me if (!stack_trace.empty()) out << " Stack trace:\n\n" << stack_trace; - if (has_nested) - { - Exception nested; - readException(nested, buf); - e = Exception(out.str(), nested, code); - } - else - e = Exception(out.str(), code); + e = Exception(out.str(), code); } void readAndThrowException(ReadBuffer & buf, const String & additional_message) diff --git a/dbms/src/IO/ReadHelpers.h b/dbms/src/IO/ReadHelpers.h index 47206039435..7e5b5ce804f 100644 --- a/dbms/src/IO/ReadHelpers.h +++ b/dbms/src/IO/ReadHelpers.h @@ -29,22 +29,13 @@ #include #include #include +#include #include -#include #include -#ifdef __clang__ -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wdouble-promotion" -#endif - #include -#ifdef __clang__ -#pragma clang diagnostic pop -#endif - /// 1 GiB #define DEFAULT_MAX_STRING_SIZE (1ULL << 30) @@ -1024,21 +1015,11 @@ void skipToNextLineOrEOF(ReadBuffer & buf); /// Skip to next character after next unescaped \n. If no \n in stream, skip to end. Does not throw on invalid escape sequences. void skipToUnescapedNextLineOrEOF(ReadBuffer & buf); -template -std::unique_ptr getReadBuffer(const DB::CompressionMethod method, Types&&... args) -{ - if (method == DB::CompressionMethod::Gzip) - { - auto read_buf = std::make_unique(std::forward(args)...); - return std::make_unique(std::move(read_buf), method); - } - return std::make_unique(args...); -} /** This function just copies the data from buffer's internal position (in.position()) * to current position (from arguments) into memory. */ -void saveUpToPosition(ReadBuffer & in, DB::Memory<> & memory, char * current); +void saveUpToPosition(ReadBuffer & in, Memory<> & memory, char * current); /** This function is negative to eof(). * In fact it returns whether the data was loaded to internal ReadBuffers's buffer or not. @@ -1047,6 +1028,6 @@ void saveUpToPosition(ReadBuffer & in, DB::Memory<> & memory, char * current); * of our buffer and the current cursor in the end of the buffer. When we call eof() it calls next(). * And this function can fill the buffer with new data, so we will lose the data from previous buffer state. 
*/ -bool loadAtPosition(ReadBuffer & in, DB::Memory<> & memory, char * & current); +bool loadAtPosition(ReadBuffer & in, Memory<> & memory, char * & current); } diff --git a/dbms/src/IO/WriteBufferFromHTTPServerResponse.cpp b/dbms/src/IO/WriteBufferFromHTTPServerResponse.cpp index f8bd166a4dd..7fbdeab7ab5 100644 --- a/dbms/src/IO/WriteBufferFromHTTPServerResponse.cpp +++ b/dbms/src/IO/WriteBufferFromHTTPServerResponse.cpp @@ -105,67 +105,41 @@ void WriteBufferFromHTTPServerResponse::nextImpl() { if (compress) { - if (compression_method == CompressionMethod::Gzip) - { -#if defined(POCO_CLICKHOUSE_PATCH) - *response_header_ostr << "Content-Encoding: gzip\r\n"; -#else - response.set("Content-Encoding", "gzip"); - response_body_ostr = &(response.send()); -#endif - out_raw = std::make_unique(*response_body_ostr); - deflating_buf.emplace(std::move(out_raw), compression_method, compression_level, working_buffer.size(), working_buffer.begin()); - out = &*deflating_buf; - } - else if (compression_method == CompressionMethod::Zlib) - { -#if defined(POCO_CLICKHOUSE_PATCH) - *response_header_ostr << "Content-Encoding: deflate\r\n"; -#else - response.set("Content-Encoding", "deflate"); - response_body_ostr = &(response.send()); -#endif - out_raw = std::make_unique(*response_body_ostr); - deflating_buf.emplace(std::move(out_raw), compression_method, compression_level, working_buffer.size(), working_buffer.begin()); - out = &*deflating_buf; - } -#if USE_BROTLI - else if (compression_method == CompressionMethod::Brotli) - { -#if defined(POCO_CLICKHOUSE_PATCH) - *response_header_ostr << "Content-Encoding: br\r\n"; -#else - response.set("Content-Encoding", "br"); - response_body_ostr = &(response.send()); -#endif - out_raw = std::make_unique(*response_body_ostr); - brotli_buf.emplace(*out_raw, compression_level, working_buffer.size(), working_buffer.begin()); - out = &*brotli_buf; - } -#endif + auto content_encoding_name = toContentEncodingName(compression_method); - else - throw Exception("Logical error: unknown compression method passed to WriteBufferFromHTTPServerResponse", - ErrorCodes::LOGICAL_ERROR); - /// Use memory allocated for the outer buffer in the buffer pointed to by out. This avoids extra allocation and copy. +#if defined(POCO_CLICKHOUSE_PATCH) + *response_header_ostr << "Content-Encoding: " << content_encoding_name << "\r\n"; +#else + response.set("Content-Encoding", content_encoding_name); +#endif } - else - { + #if !defined(POCO_CLICKHOUSE_PATCH) - response_body_ostr = &(response.send()); + response_body_ostr = &(response.send()); #endif - out_raw = std::make_unique(*response_body_ostr, working_buffer.size(), working_buffer.begin()); - out = &*out_raw; - } + /// We reuse our buffer in "out" to avoid extra allocations and copies. + + if (compress) + out = wrapWriteBufferWithCompressionMethod( + std::make_unique(*response_body_ostr), + compress ? 
compression_method : CompressionMethod::None, + compression_level, + working_buffer.size(), + working_buffer.begin()); + else + out = std::make_unique( + *response_body_ostr, + working_buffer.size(), + working_buffer.begin()); } finishSendHeaders(); - } if (out) { + out->buffer() = buffer(); out->position() = position(); out->next(); } @@ -177,9 +151,8 @@ WriteBufferFromHTTPServerResponse::WriteBufferFromHTTPServerResponse( Poco::Net::HTTPServerResponse & response_, unsigned keep_alive_timeout_, bool compress_, - CompressionMethod compression_method_, - size_t size) - : BufferWithOwnMemory(size) + CompressionMethod compression_method_) + : BufferWithOwnMemory(DBMS_DEFAULT_BUFFER_SIZE) , request(request_) , response(response_) , keep_alive_timeout(keep_alive_timeout_) @@ -215,6 +188,9 @@ void WriteBufferFromHTTPServerResponse::finalize() if (offset()) { next(); + + if (out) + out.reset(); } else { diff --git a/dbms/src/IO/WriteBufferFromHTTPServerResponse.h b/dbms/src/IO/WriteBufferFromHTTPServerResponse.h index 642e59e4921..f0b614c7406 100644 --- a/dbms/src/IO/WriteBufferFromHTTPServerResponse.h +++ b/dbms/src/IO/WriteBufferFromHTTPServerResponse.h @@ -8,8 +8,6 @@ #include #include #include -#include -#include #include #include #include @@ -52,7 +50,7 @@ private: unsigned keep_alive_timeout = 0; bool compress = false; CompressionMethod compression_method; - int compression_level = Z_DEFAULT_COMPRESSION; + int compression_level = 1; std::ostream * response_body_ostr = nullptr; @@ -60,13 +58,7 @@ private: std::ostream * response_header_ostr = nullptr; #endif - std::unique_ptr out_raw; - std::optional deflating_buf; -#if USE_BROTLI - std::optional brotli_buf; -#endif - - WriteBuffer * out = nullptr; /// Uncompressed HTTP body is written to this buffer. Points to out_raw or possibly to deflating_buf. + std::unique_ptr out; bool headers_started_sending = false; bool headers_finished_sending = false; /// If true, you could not add any headers. @@ -99,8 +91,7 @@ public: Poco::Net::HTTPServerResponse & response_, unsigned keep_alive_timeout_, bool compress_ = false, /// If true - set Content-Encoding header and compress the result. - CompressionMethod compression_method_ = CompressionMethod::Gzip, - size_t size = DBMS_DEFAULT_BUFFER_SIZE); + CompressionMethod compression_method_ = CompressionMethod::None); /// Writes progess in repeating HTTP headers. 
    void onProgress(const Progress & progress);
diff --git a/dbms/src/IO/WriteHelpers.cpp b/dbms/src/IO/WriteHelpers.cpp
index fe64983c18a..d2605dce9fe 100644
--- a/dbms/src/IO/WriteHelpers.cpp
+++ b/dbms/src/IO/WriteHelpers.cpp
@@ -48,7 +48,6 @@ void formatUUID(std::reverse_iterator<const UInt8 *> src16, UInt8 * dst36)
 }

-
 void writeException(const Exception & e, WriteBuffer & buf, bool with_stack_trace)
 {
     writeBinary(e.code(), buf);
@@ -56,14 +55,11 @@ void writeException(const Exception & e, WriteBuffer & buf, bool with_stack_trac
     writeBinary(e.displayText(), buf);

     if (with_stack_trace)
-        writeBinary(e.getStackTrace().toString(), buf);
+        writeBinary(e.getStackTraceString(), buf);
     else
         writeBinary(String(), buf);

-    bool has_nested = e.nested() != nullptr;
+    bool has_nested = false;
     writeBinary(has_nested, buf);
-
-    if (has_nested)
-        writeException(Exception(Exception::CreateFromPoco, *e.nested()), buf, with_stack_trace);
 }

 }
diff --git a/dbms/src/IO/WriteHelpers.h b/dbms/src/IO/WriteHelpers.h
index 082bf63e6b7..3e6d579cb16 100644
--- a/dbms/src/IO/WriteHelpers.h
+++ b/dbms/src/IO/WriteHelpers.h
@@ -26,10 +26,12 @@
 #include
 #include
 #include
-#include
+
+#include
 #include
+
 namespace DB
 {

@@ -115,21 +117,108 @@ inline void writeBoolText(bool x, WriteBuffer & buf)
     writeChar(x ? '1' : '0', buf);
 }

-template <typename T>
-inline size_t writeFloatTextFastPath(T x, char * buffer, int len)
+
+struct DecomposedFloat64
 {
-    using Converter = DoubleConverter;
-    double_conversion::StringBuilder builder{buffer, len};
+    DecomposedFloat64(double x)
+    {
+        memcpy(&x_uint, &x, sizeof(x));
+    }
+
+    uint64_t x_uint;
+
+    bool sign() const
+    {
+        return x_uint >> 63;
+    }
+
+    uint16_t exponent() const
+    {
+        return (x_uint >> 52) & 0x7FF;
+    }
+
+    int16_t normalized_exponent() const
+    {
+        return int16_t(exponent()) - 1023;
+    }
+
+    uint64_t mantissa() const
+    {
+        return x_uint & 0x000fffffffffffffull;
+    }
+
+    /// NOTE Probably floating point instructions can be better.
+    bool is_inside_int64() const
+    {
+        return x_uint == 0
+            || (normalized_exponent() >= 0 && normalized_exponent() <= 52
+                && ((mantissa() & ((1ULL << (52 - normalized_exponent())) - 1)) == 0));
+    }
+};
+
+struct DecomposedFloat32
+{
+    DecomposedFloat32(float x)
+    {
+        memcpy(&x_uint, &x, sizeof(x));
+    }
+
+    uint32_t x_uint;
+
+    bool sign() const
+    {
+        return x_uint >> 31;
+    }
+
+    uint16_t exponent() const
+    {
+        return (x_uint >> 23) & 0xFF;
+    }
+
+    int16_t normalized_exponent() const
+    {
+        return int16_t(exponent()) - 127;
+    }
+
+    uint32_t mantissa() const
+    {
+        return x_uint & 0x7fffff;
+    }
+
+    bool is_inside_int32() const
+    {
+        return x_uint == 0
+            || (normalized_exponent() >= 0 && normalized_exponent() <= 23
+                && ((mantissa() & ((1ULL << (23 - normalized_exponent())) - 1)) == 0));
+    }
+};
+
+template <typename T>
+inline size_t writeFloatTextFastPath(T x, char * buffer)
+{
+    int result = 0;

-    bool result = false;
     if constexpr (std::is_same_v<T, double>)
-        result = Converter::instance().ToShortest(x, &builder);
-    else
-        result = Converter::instance().ToShortestSingle(x, &builder);
+    {
+        /// The library Ryu has low performance on integers.
+        /// This workaround improves performance 6..10 times.
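+        ///
+        /// For illustration (assumed values, not from a test in this patch):
+        ///     writeFloatTextFastPath(1234567.0, buf);  /// whole number inside the mantissa -> itoa path, "1234567"
+        ///     writeFloatTextFastPath(0.5, buf);        /// fractional -> ryu d2s_buffered_n path, "0.5"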
- if (!result) + if (DecomposedFloat64(x).is_inside_int64()) + result = itoa(Int64(x), buffer) - buffer; + else + result = d2s_buffered_n(x, buffer); + } + else + { + if (DecomposedFloat32(x).is_inside_int32()) + result = itoa(Int32(x), buffer) - buffer; + else + result = f2s_buffered_n(x, buffer); + } + + if (result <= 0) throw Exception("Cannot print floating point number", ErrorCodes::CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER); - return builder.position(); + return result; } template @@ -140,23 +229,13 @@ inline void writeFloatText(T x, WriteBuffer & buf) using Converter = DoubleConverter; if (likely(buf.available() >= Converter::MAX_REPRESENTATION_LENGTH)) { - buf.position() += writeFloatTextFastPath(x, buf.position(), Converter::MAX_REPRESENTATION_LENGTH); + buf.position() += writeFloatTextFastPath(x, buf.position()); return; } Converter::BufferType buffer; - double_conversion::StringBuilder builder{buffer, sizeof(buffer)}; - - bool result = false; - if constexpr (std::is_same_v) - result = Converter::instance().ToShortest(x, &builder); - else - result = Converter::instance().ToShortestSingle(x, &builder); - - if (!result) - throw Exception("Cannot print floating point number", ErrorCodes::CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER); - - buf.write(buffer, builder.position()); + size_t result = writeFloatTextFastPath(x, buffer); + buf.write(buffer, result); } @@ -955,15 +1034,4 @@ inline String toString(const T & x) return buf.str(); } -template -std::unique_ptr getWriteBuffer(const DB::CompressionMethod method, Types&&... args) -{ - if (method == DB::CompressionMethod::Gzip) - { - auto write_buf = std::make_unique(std::forward(args)...); - return std::make_unique(std::move(write_buf), method, 1 /* compression level */); - } - return std::make_unique(args...); -} - } diff --git a/dbms/src/IO/ZlibDeflatingWriteBuffer.cpp b/dbms/src/IO/ZlibDeflatingWriteBuffer.cpp index c4d7fac56a6..8efe96877e4 100644 --- a/dbms/src/IO/ZlibDeflatingWriteBuffer.cpp +++ b/dbms/src/IO/ZlibDeflatingWriteBuffer.cpp @@ -5,6 +5,12 @@ namespace DB { +namespace ErrorCodes +{ + extern const int ZLIB_DEFLATE_FAILED; +} + + ZlibDeflatingWriteBuffer::ZlibDeflatingWriteBuffer( std::unique_ptr out_, CompressionMethod compression_method, @@ -84,6 +90,21 @@ void ZlibDeflatingWriteBuffer::finish() next(); + /// https://github.com/zlib-ng/zlib-ng/issues/494 + do + { + out->nextIfAtEnd(); + zstr.next_out = reinterpret_cast(out->position()); + zstr.avail_out = out->buffer().end() - out->position(); + + int rc = deflate(&zstr, Z_FULL_FLUSH); + out->position() = out->buffer().end() - zstr.avail_out; + + if (rc != Z_OK) + throw Exception(std::string("deflate failed: ") + zError(rc), ErrorCodes::ZLIB_DEFLATE_FAILED); + } + while (zstr.avail_out == 0); + while (true) { out->nextIfAtEnd(); diff --git a/dbms/src/IO/ZlibDeflatingWriteBuffer.h b/dbms/src/IO/ZlibDeflatingWriteBuffer.h index 86eee1cffe5..f9df8f8157b 100644 --- a/dbms/src/IO/ZlibDeflatingWriteBuffer.h +++ b/dbms/src/IO/ZlibDeflatingWriteBuffer.h @@ -10,11 +10,6 @@ namespace DB { -namespace ErrorCodes -{ - extern const int ZLIB_DEFLATE_FAILED; -} - /// Performs compression using zlib library and writes compressed data to out_ WriteBuffer. 
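/// A minimal usage sketch (the file name and compression level are assumed):
///
///     auto file = std::make_unique<WriteBufferFromFile>("/tmp/out.gz");
///     ZlibDeflatingWriteBuffer gz(std::move(file), CompressionMethod::Gzip, 3);
///     writeString("hello", gz);
///     gz.finish();   /// completes the zlib stream, including the extra
///                    /// Z_FULL_FLUSH round from the zlib-ng workaround above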
class ZlibDeflatingWriteBuffer : public BufferWithOwnMemory { diff --git a/dbms/src/IO/createReadBufferFromFileBase.cpp b/dbms/src/IO/createReadBufferFromFileBase.cpp index 8fc5923e6ff..9fa560620dd 100644 --- a/dbms/src/IO/createReadBufferFromFileBase.cpp +++ b/dbms/src/IO/createReadBufferFromFileBase.cpp @@ -3,6 +3,7 @@ #if defined(__linux__) || defined(__FreeBSD__) #include #endif +#include #include @@ -11,13 +12,17 @@ namespace ProfileEvents extern const Event CreatedReadBufferOrdinary; extern const Event CreatedReadBufferAIO; extern const Event CreatedReadBufferAIOFailed; + extern const Event CreatedReadBufferMMap; + extern const Event CreatedReadBufferMMapFailed; } namespace DB { -std::unique_ptr createReadBufferFromFileBase(const std::string & filename_, size_t estimated_size, - size_t aio_threshold, size_t buffer_size_, int flags_, char * existing_memory_, size_t alignment) +std::unique_ptr createReadBufferFromFileBase( + const std::string & filename_, + size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, + size_t buffer_size_, int flags_, char * existing_memory_, size_t alignment) { #if defined(__linux__) || defined(__FreeBSD__) if (aio_threshold && estimated_size >= aio_threshold) @@ -40,6 +45,21 @@ std::unique_ptr createReadBufferFromFileBase(const std:: (void)estimated_size; #endif + if (!existing_memory_ && mmap_threshold && estimated_size >= mmap_threshold) + { + try + { + auto res = std::make_unique(filename_, 0); + ProfileEvents::increment(ProfileEvents::CreatedReadBufferMMap); + return res; + } + catch (const ErrnoException &) + { + /// Fallback if mmap is not supported (example: pipe). + ProfileEvents::increment(ProfileEvents::CreatedReadBufferMMapFailed); + } + } + ProfileEvents::increment(ProfileEvents::CreatedReadBufferOrdinary); return std::make_unique(filename_, buffer_size_, flags_, existing_memory_, alignment); } diff --git a/dbms/src/IO/createReadBufferFromFileBase.h b/dbms/src/IO/createReadBufferFromFileBase.h index fa98e536a46..61dfde6229f 100644 --- a/dbms/src/IO/createReadBufferFromFileBase.h +++ b/dbms/src/IO/createReadBufferFromFileBase.h @@ -19,6 +19,7 @@ std::unique_ptr createReadBufferFromFileBase( const std::string & filename_, size_t estimated_size, size_t aio_threshold, + size_t mmap_threshold, size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE, int flags_ = -1, char * existing_memory_ = nullptr, diff --git a/dbms/src/IO/tests/CMakeLists.txt b/dbms/src/IO/tests/CMakeLists.txt index 38802718dd1..e168e704814 100644 --- a/dbms/src/IO/tests/CMakeLists.txt +++ b/dbms/src/IO/tests/CMakeLists.txt @@ -78,7 +78,7 @@ add_executable (parse_date_time_best_effort parse_date_time_best_effort.cpp) target_link_libraries (parse_date_time_best_effort PRIVATE clickhouse_common_io) add_executable (zlib_ng_bug zlib_ng_bug.cpp) -target_link_libraries (zlib_ng_bug PRIVATE ${Poco_Foundation_LIBRARY}) -if(NOT USE_INTERNAL_POCO_LIBRARY) - target_include_directories(zlib_ng_bug SYSTEM BEFORE PRIVATE ${Poco_INCLUDE_DIRS}) -endif() +target_link_libraries (zlib_ng_bug PRIVATE ${Poco_Foundation_LIBRARY} ${ZLIB_LIBRARY}) + +add_executable (ryu_test ryu_test.cpp) +target_link_libraries (ryu_test PRIVATE ryu) diff --git a/dbms/src/IO/tests/gtest_bit_io.cpp b/dbms/src/IO/tests/gtest_bit_io.cpp index 3def664231f..994e08214cc 100644 --- a/dbms/src/IO/tests/gtest_bit_io.cpp +++ b/dbms/src/IO/tests/gtest_bit_io.cpp @@ -36,11 +36,11 @@ std::string bin(const T & value, size_t bits = sizeof(T)*8) .to_string().substr(MAX_BITS - bits, bits); } +// gets N low bits of value template T 
getBits(UInt8 bits, const T & value) { - const T mask = ((static_cast(1) << static_cast(bits)) - 1); - return value & mask; + return value & maskLowBits(bits); } template @@ -83,12 +83,36 @@ std::string dumpContents(const T& container, return sstr.str(); } +template +::testing::AssertionResult BinaryEqual(const ValueLeft & left, const ValueRight & right) +{ +// ::testing::AssertionResult result = ::testing::AssertionSuccess(); + if (sizeof(left) != sizeof(right)) + return ::testing::AssertionFailure() + << "Sizes do not match, expected: " << sizeof(left) << " actual: " << sizeof(right); + + const auto size = std::min(sizeof(left), sizeof(right)); + if (memcmp(&left, &right, size) != 0) + { + const auto l_bits = left ? static_cast(std::log2(left)) : 0; + const auto r_bits = right ? static_cast(std::log2(right)) : 0; + const size_t bits = std::max(l_bits, r_bits) + 1; + + return ::testing::AssertionFailure() + << "Values are binary different,\n" + << "\texpected: 0b" << bin(left, bits) << " (" << std::hex << left << "),\n" + << "\tactual : 0b" << bin(right, bits) << " (" <> bits_and_vals; std::string expected_buffer_binary; - explicit TestCaseParameter(std::vector> vals, std::string binary = std::string{}) + TestCaseParameter(std::vector> vals, std::string binary = std::string{}) : bits_and_vals(std::move(vals)), expected_buffer_binary(binary) {} @@ -114,8 +138,7 @@ TEST_P(BitIO, WriteAndRead) PODArray data(max_buffer_size); { - WriteBuffer write_buffer(data.data(), data.size()); - BitWriter writer(write_buffer); + BitWriter writer(data.data(), data.size()); for (const auto & bv : bits_and_vals) { writer.writeBits(bv.first, bv.second); @@ -133,38 +156,73 @@ TEST_P(BitIO, WriteAndRead) ASSERT_EQ(expected_buffer_binary, actual_buffer_binary); } - BitReader reader(read_buffer); + BitReader reader(data.data(), data.size()); + int bitpos = 0; int item = 0; for (const auto & bv : bits_and_vals) { SCOPED_TRACE(::testing::Message() - << "item #" << item << ", width: " << static_cast(bv.first) - << ", value: " << bin(bv.second) - << ".\n\n\nBuffer memory:\n" << dumpContents(data)); + << "item #" << item << " of " << bits_and_vals.size() << ", width: " << static_cast(bv.first) + << ", value: " << bv.second << "(" << bin(bv.second) << ")" + << ", at bit position: " << std::dec << reader.count() + << ".\nBuffer memory:\n" << dumpContents(data)); - //EXPECT_EQ(getBits(bv.first, bv.second), reader.peekBits(bv.first)); - EXPECT_EQ(getBits(bv.first, bv.second), reader.readBits(bv.first)); +// const UInt8 next_byte = getBits(bv.first, bv.second) & + ASSERT_TRUE(BinaryEqual(getBits(bv.first, bv.second), reader.readBits(bv.first))); ++item; + bitpos += bv.first; } } } INSTANTIATE_TEST_CASE_P(Simple, - BitIO, - ::testing::Values( - TestCaseParameter( - {{9, 0xFFFFFFFF}, {9, 0x00}, {9, 0xFFFFFFFF}, {9, 0x00}, {9, 0xFFFFFFFF}}, - "11111111 10000000 00111111 11100000 00001111 11111000 "), - TestCaseParameter( - {{7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {3, 0xFFFF}}, - "01111110 11111101 11111011 11110111 11101111 11011111 10111111 01111111 11000000 "), - TestCaseParameter({{33, 0xFF110d0b07050300}, {33, 0xAAEE29251f1d1713}}), - TestCaseParameter({{33, BIT_PATTERN}, {33, BIT_PATTERN}}), - TestCaseParameter({{24, 0xFFFFFFFF}}, - "11111111 11111111 11111111 ") -),); + BitIO, + ::testing::ValuesIn(std::initializer_list{ + { + {{9, 0xFFFFFFFF}, {9, 0x00}, {9, 0xFFFFFFFF}, {9, 0x00}, {9, 0xFFFFFFFF}}, + "11111111 10000000 00111111 11100000 00001111 11111000 " + 
}, + { + {{7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {7, 0x3f}, {3, 0xFFFF}}, + "01111110 11111101 11111011 11110111 11101111 11011111 10111111 01111111 11000000 " + }, + { + {{33, 0xFF110d0b07050300}, {33, 0xAAEE29251f1d1713}} + }, + { + {{33, BIT_PATTERN}, {33, BIT_PATTERN}} + }, + { + {{24, 0xFFFFFFFF}}, + "11111111 11111111 11111111 " + }, + { + // Note that we take only N lower bits of the number: {3, 0b01011} => 011 + {{5, 0b01010}, {3, 0b111}, {7, 0b11001100}, {6, 0}, {5, 0b11111111}, {4, 0}, {3, 0b101}, {2, 0}, {1, 0b11111111}}, + "01010111 10011000 00000111 11000010 10010000 " + }, + { + {{64, BIT_PATTERN}, {56, BIT_PATTERN} , {4, 0b1111}, {4, 0}, // 128 + {8, 0b11111111}, {64, BIT_PATTERN}, {48, BIT_PATTERN}, {8, 0}}, // 256 + "11101011 11101111 10111010 11101111 10101111 10111010 11101011 10101001 " // 64 + "11101111 10111010 11101111 10101111 10111010 11101011 10101001 11110000 " // 128 + "11111111 11101011 11101111 10111010 11101111 10101111 10111010 11101011 " // 192 + "10101001 10111010 11101111 10101111 10111010 11101011 10101001 00000000 " // 256 + }, + { + {{64, BIT_PATTERN}, {56, BIT_PATTERN} , {5, 0b11111}, {3, 0}, // 128 + {8, 0b11111111}, {64, BIT_PATTERN}, {48, BIT_PATTERN}, {8, 0}, //256 + {32, BIT_PATTERN}, {12, 0xff}, {8, 0}, {12, 0xAEff}}, + "11101011 11101111 10111010 11101111 10101111 10111010 11101011 10101001 " // 64 + "11101111 10111010 11101111 10101111 10111010 11101011 10101001 11111000 " // 128 + "11111111 11101011 11101111 10111010 11101111 10101111 10111010 11101011 " // 192 + "10101001 10111010 11101111 10101111 10111010 11101011 10101001 00000000 " // 256 + "10101111 10111010 11101011 10101001 00001111 11110000 00001110 11111111 " // 320 + } + }), +); TestCaseParameter primes_case(UInt8 repeat_times, UInt64 pattern) { diff --git a/dbms/src/IO/tests/ryu_test.cpp b/dbms/src/IO/tests/ryu_test.cpp new file mode 100644 index 00000000000..0866f6afd3d --- /dev/null +++ b/dbms/src/IO/tests/ryu_test.cpp @@ -0,0 +1,92 @@ +#include +#include +#include + + +struct DecomposedFloat64 +{ + DecomposedFloat64(double x) + { + memcpy(&x_uint, &x, sizeof(x)); + } + + uint64_t x_uint; + + bool sign() const + { + return x_uint >> 63; + } + + uint16_t exponent() const + { + return (x_uint >> 52) & 0x7FF; + } + + int16_t normalized_exponent() const + { + return int16_t(exponent()) - 1023; + } + + uint64_t mantissa() const + { + return x_uint & 0x5affffffffffffful; + } + + bool is_inside_int64() const + { + return x_uint == 0 + || (normalized_exponent() >= 0 && normalized_exponent() <= 52 + && ((mantissa() & ((1ULL << (52 - normalized_exponent())) - 1)) == 0)); + } +}; + +struct DecomposedFloat32 +{ + DecomposedFloat32(float x) + { + memcpy(&x_uint, &x, sizeof(x)); + } + + uint32_t x_uint; + + bool sign() const + { + return x_uint >> 31; + } + + uint16_t exponent() const + { + return (x_uint >> 23) & 0xFF; + } + + int16_t normalized_exponent() const + { + return int16_t(exponent()) - 127; + } + + uint32_t mantissa() const + { + return x_uint & 0x7fffff; + } + + bool is_inside_int32() const + { + return x_uint == 0 + || (normalized_exponent() >= 0 && normalized_exponent() <= 23 + && ((mantissa() & ((1ULL << (23 - normalized_exponent())) - 1)) == 0)); + } +}; + + +int main(int argc, char ** argv) +{ + double x = argc > 1 ? 
std::stod(argv[1]) : 0; + char buf[32]; + + d2s_buffered(x, buf); + std::cout << buf << "\n"; + + std::cout << DecomposedFloat64(x).is_inside_int64() << "\n"; + + return 0; +} diff --git a/dbms/src/IO/tests/zlib_ng_bug.cpp b/dbms/src/IO/tests/zlib_ng_bug.cpp index 8b94b4e49d2..e9b3c448b88 100644 --- a/dbms/src/IO/tests/zlib_ng_bug.cpp +++ b/dbms/src/IO/tests/zlib_ng_bug.cpp @@ -1,32 +1,50 @@ -#include -#include -#include -#include +#include +#include +#include +#include -/** This script reproduces the bug in zlib-ng library. - * Put the following content to "data.bin" file: -abcdefghijklmn!@Aab#AAabcdefghijklmn$% -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - * There are two lines. First line make sense. Second line contains padding to make file size large enough. - * Compile with - * cmake -D SANITIZE=address - * and run: +#pragma GCC diagnostic ignored "-Wold-style-cast" -./zlib_ng_bug data2.bin -================================================================= -==204952==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6310000147ff at pc 0x000000596d7a bp 0x7ffd139edd50 sp 0x7ffd139edd48 -READ of size 1 at 0x6310000147ff thread T0 - */ -int main(int argc, char ** argv) +/// https://github.com/zlib-ng/zlib-ng/issues/494 +int main(int, char **) { - using namespace Poco; + std::vector in(1048576); + std::vector out(1048576); - std::string filename(argc >= 2 ? argv[1] : "data.bin"); - FileInputStream istr(filename); - NullOutputStream ostr; - DeflatingOutputStream deflater(ostr, DeflatingStreamBuf::STREAM_GZIP); - StreamCopier::copyStream(istr, deflater); + ssize_t in_size = read(STDIN_FILENO, in.data(), 1048576); + if (in_size < 0) + throw std::runtime_error("Cannot read"); + in.resize(in_size); + + z_stream zstr{}; + if (Z_OK != deflateInit2(&zstr, 1, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY)) + throw std::runtime_error("Cannot deflateInit2"); + + zstr.next_in = in.data(); + zstr.avail_in = in.size(); + zstr.next_out = out.data(); + zstr.avail_out = out.size(); + + while (zstr.avail_in > 0) + if (Z_OK != deflate(&zstr, Z_NO_FLUSH)) + throw std::runtime_error("Cannot deflate"); + + while (true) + { + int rc = deflate(&zstr, Z_FINISH); + + if (rc == Z_STREAM_END) + break; + + if (rc != Z_OK) + throw std::runtime_error("Cannot finish deflate"); + } + + deflateEnd(&zstr); + + if (ssize_t(zstr.total_out) != write(STDOUT_FILENO, out.data(), zstr.total_out)) + throw std::runtime_error("Cannot write"); return 0; } diff --git a/dbms/src/Interpreters/BloomFilterHash.h b/dbms/src/Interpreters/BloomFilterHash.h index bd1100c7c68..77bd5cc7ffd 100644 --- a/dbms/src/Interpreters/BloomFilterHash.h +++ b/dbms/src/Interpreters/BloomFilterHash.h @@ -85,8 +85,16 @@ struct BloomFilterHash throw Exception("Unexpected type " + data_type->getName() + " of bloom filter index.", ErrorCodes::LOGICAL_ERROR); const auto & offsets = array_col->getOffsets(); - size_t offset = (pos == 0) ? 0 : offsets[pos - 1]; - limit = std::max(array_col->getData().size() - offset, limit); + limit = offsets[pos + limit - 1] - offsets[pos - 1]; /// PaddedPODArray allows access on index -1. 
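        /// Editorial note, not part of the patch: offsets[] is cumulative, so
        /// row i of the Array column spans values[offsets[i - 1] .. offsets[i]),
        /// and offsets[-1] reads as 0 thanks to PaddedPODArray's front padding.
        /// Worked example: offsets = [2, 2, 5] (rows of sizes 2, 0, 3); for
        /// pos = 1, limit = 2 the value count is offsets[2] - offsets[0]
        /// = 5 - 2 = 3, and scanning starts at value index offsets[0] = 2.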
+ pos = offsets[pos - 1]; + + if (limit == 0) + { + auto index_column = ColumnUInt64::create(1); + ColumnUInt64::Container & index_column_vec = index_column->getData(); + index_column_vec[0] = 0; + return index_column; + } } const ColumnPtr actual_col = BloomFilter::getPrimitiveColumn(column); diff --git a/dbms/src/Interpreters/CatBoostModel.h b/dbms/src/Interpreters/CatBoostModel.h index 541dd111c82..820c26a1fb4 100644 --- a/dbms/src/Interpreters/CatBoostModel.h +++ b/dbms/src/Interpreters/CatBoostModel.h @@ -60,7 +60,7 @@ public: const ExternalLoadableLifetime & getLifetime() const override; - std::string getName() const override { return name; } + const std::string & getLoadableName() const override { return name; } bool supportUpdates() const override { return true; } @@ -69,7 +69,7 @@ public: std::shared_ptr clone() const override; private: - std::string name; + const std::string name; std::string model_path; std::string lib_path; ExternalLoadableLifetime lifetime; diff --git a/dbms/src/Interpreters/Context.cpp b/dbms/src/Interpreters/Context.cpp index e5aab7075ca..69ceb24e570 100644 --- a/dbms/src/Interpreters/Context.cpp +++ b/dbms/src/Interpreters/Context.cpp @@ -33,8 +33,6 @@ #include #include #include -#include -#include #include #include #include @@ -1088,34 +1086,12 @@ DatabasePtr Context::detachDatabase(const String & database_name) { auto lock = getLock(); auto res = getDatabase(database_name); - getExternalDictionariesLoader().removeConfigRepository(database_name); shared->databases.erase(database_name); return res; } -ASTPtr Context::getCreateTableQuery(const String & database_name, const String & table_name) const -{ - auto lock = getLock(); - - String db = resolveDatabase(database_name, current_database); - assertDatabaseExists(db); - - return shared->databases[db]->getCreateTableQuery(*this, table_name); -} - - -ASTPtr Context::getCreateDictionaryQuery(const String & database_name, const String & dictionary_name) const -{ - auto lock = getLock(); - - String db = resolveDatabase(database_name, current_database); - assertDatabaseExists(db); - - return shared->databases[db]->getCreateDictionaryQuery(*this, dictionary_name); -} - ASTPtr Context::getCreateExternalTableQuery(const String & table_name) const { TableAndCreateASTs::const_iterator jt = external_tables.find(table_name); @@ -1125,16 +1101,6 @@ ASTPtr Context::getCreateExternalTableQuery(const String & table_name) const return jt->second.second; } -ASTPtr Context::getCreateDatabaseQuery(const String & database_name) const -{ - auto lock = getLock(); - - String db = resolveDatabase(database_name, current_database); - assertDatabaseExists(db); - - return shared->databases[db]->getCreateDatabaseQuery(*this); -} - Settings Context::getSettings() const { return settings; @@ -1467,7 +1433,7 @@ void Context::setMarkCache(size_t cache_size_in_bytes) if (shared->mark_cache) throw Exception("Mark cache has been already created.", ErrorCodes::LOGICAL_ERROR); - shared->mark_cache = std::make_shared(cache_size_in_bytes, std::chrono::seconds(settings.mark_cache_min_lifetime)); + shared->mark_cache = std::make_shared(cache_size_in_bytes); } diff --git a/dbms/src/Interpreters/Context.h b/dbms/src/Interpreters/Context.h index 79e2936d7d2..b0a4b5bb580 100644 --- a/dbms/src/Interpreters/Context.h +++ b/dbms/src/Interpreters/Context.h @@ -153,6 +153,7 @@ private: String default_format; /// Format, used when server formats data by itself and if query does not have FORMAT specification. /// Thus, used in HTTP interface. 
If not specified - then some globally default format is used. + // TODO maybe replace with DatabaseMemory? TableAndCreateASTs external_tables; /// Temporary tables. Scalars scalars; StoragePtr view_source; /// Temporary StorageValues used to generate alias columns for materialized views @@ -373,10 +374,7 @@ public: std::optional getTCPPortSecure() const; /// Get query for the CREATE table. - ASTPtr getCreateTableQuery(const String & database_name, const String & table_name) const; ASTPtr getCreateExternalTableQuery(const String & table_name) const; - ASTPtr getCreateDatabaseQuery(const String & database_name) const; - ASTPtr getCreateDictionaryQuery(const String & database_name, const String & dictionary_name) const; const DatabasePtr getDatabase(const String & database_name) const; DatabasePtr getDatabase(const String & database_name); diff --git a/dbms/src/Interpreters/DDLWorker.cpp b/dbms/src/Interpreters/DDLWorker.cpp index 861a6b5ff03..3077290e3fe 100644 --- a/dbms/src/Interpreters/DDLWorker.cpp +++ b/dbms/src/Interpreters/DDLWorker.cpp @@ -941,22 +941,19 @@ void DDLWorker::runMainThread() { try { - try - { - auto zookeeper = getAndSetZooKeeper(); - zookeeper->createAncestors(queue_dir + "/"); - initialized = true; - } - catch (const Coordination::Exception & e) - { - if (!Coordination::isHardwareError(e.code)) - throw; /// A logical error. + auto zookeeper = getAndSetZooKeeper(); + zookeeper->createAncestors(queue_dir + "/"); + initialized = true; + } + catch (const Coordination::Exception & e) + { + if (!Coordination::isHardwareError(e.code)) + throw; /// A logical error. - tryLogCurrentException(__PRETTY_FUNCTION__); + tryLogCurrentException(__PRETTY_FUNCTION__); - /// Avoid busy loop when ZooKeeper is not available. - sleepForSeconds(1); - } + /// Avoid busy loop when ZooKeeper is not available. + sleepForSeconds(1); } catch (...) 
{ diff --git a/dbms/src/Interpreters/EmbeddedDictionaries.cpp b/dbms/src/Interpreters/EmbeddedDictionaries.cpp index c73850073cd..9ab3cf2dcbe 100644 --- a/dbms/src/Interpreters/EmbeddedDictionaries.cpp +++ b/dbms/src/Interpreters/EmbeddedDictionaries.cpp @@ -72,7 +72,7 @@ bool EmbeddedDictionaries::reloadImpl(const bool throw_on_error, const bool forc bool was_exception = false; - DictionaryReloader reload_regions_hierarchies = [=] (const Poco::Util::AbstractConfiguration & config) + DictionaryReloader reload_regions_hierarchies = [=, this] (const Poco::Util::AbstractConfiguration & config) { return geo_dictionaries_loader->reloadRegionsHierarchies(config); }; @@ -80,7 +80,7 @@ bool EmbeddedDictionaries::reloadImpl(const bool throw_on_error, const bool forc if (!reloadDictionary(regions_hierarchies, std::move(reload_regions_hierarchies), throw_on_error, force_reload)) was_exception = true; - DictionaryReloader reload_regions_names = [=] (const Poco::Util::AbstractConfiguration & config) + DictionaryReloader reload_regions_names = [=, this] (const Poco::Util::AbstractConfiguration & config) { return geo_dictionaries_loader->reloadRegionsNames(config); }; diff --git a/dbms/src/Interpreters/ExternalDictionariesLoader.cpp b/dbms/src/Interpreters/ExternalDictionariesLoader.cpp index 53dc70fe5d4..d5f995a8db3 100644 --- a/dbms/src/Interpreters/ExternalDictionariesLoader.cpp +++ b/dbms/src/Interpreters/ExternalDictionariesLoader.cpp @@ -9,6 +9,7 @@ ExternalDictionariesLoader::ExternalDictionariesLoader(Context & context_) : ExternalLoader("external dictionary", &Logger::get("ExternalDictionariesLoader")) , context(context_) { + setConfigSettings({"dictionary", "name", "database"}); enableAsyncLoading(true); enablePeriodicUpdates(true); } @@ -23,11 +24,4 @@ ExternalLoader::LoadablePtr ExternalDictionariesLoader::create( bool dictionary_from_database = !repository_name.empty(); return DictionaryFactory::instance().create(name, config, key_in_config, context, dictionary_from_database); } - -void ExternalDictionariesLoader::addConfigRepository( - const std::string & repository_name, std::unique_ptr config_repository) -{ - ExternalLoader::addConfigRepository(repository_name, std::move(config_repository), {"dictionary", "name"}); -} - } diff --git a/dbms/src/Interpreters/ExternalDictionariesLoader.h b/dbms/src/Interpreters/ExternalDictionariesLoader.h index 15293ac09c0..90c49876ca4 100644 --- a/dbms/src/Interpreters/ExternalDictionariesLoader.h +++ b/dbms/src/Interpreters/ExternalDictionariesLoader.h @@ -29,10 +29,6 @@ public: return std::static_pointer_cast(tryLoad(name)); } - void addConfigRepository( - const std::string & repository_name, - std::unique_ptr config_repository); - protected: LoadablePtr create(const std::string & name, const Poco::Util::AbstractConfiguration & config, const std::string & key_in_config, const std::string & repository_name) const override; diff --git a/dbms/src/Interpreters/ExternalLoader.cpp b/dbms/src/Interpreters/ExternalLoader.cpp index 215263d8b3c..4b907b521e9 100644 --- a/dbms/src/Interpreters/ExternalLoader.cpp +++ b/dbms/src/Interpreters/ExternalLoader.cpp @@ -92,6 +92,7 @@ struct ExternalLoader::ObjectConfig Poco::AutoPtr config; String key_in_config; String repository_name; + bool from_temp_repository = false; String path; }; @@ -107,26 +108,30 @@ public: } ~LoadablesConfigReader() = default; - using RepositoryPtr = std::unique_ptr; + using Repository = IExternalLoaderConfigRepository; - void addConfigRepository(const String & repository_name, RepositoryPtr 
repository, const ExternalLoaderConfigSettings & settings) + void addConfigRepository(std::unique_ptr repository) { std::lock_guard lock{mutex}; - RepositoryInfo repository_info{std::move(repository), settings, {}}; - repositories.emplace(repository_name, std::move(repository_info)); + auto * ptr = repository.get(); + repositories.emplace(ptr, RepositoryInfo{std::move(repository), {}}); need_collect_object_configs = true; } - RepositoryPtr removeConfigRepository(const String & repository_name) + void removeConfigRepository(Repository * repository) { std::lock_guard lock{mutex}; - auto it = repositories.find(repository_name); + auto it = repositories.find(repository); if (it == repositories.end()) - return nullptr; - auto repository = std::move(it->second.repository); + return; repositories.erase(it); need_collect_object_configs = true; - return repository; + } + + void setConfigSettings(const ExternalLoaderConfigSettings & settings_) + { + std::lock_guard lock{mutex}; + settings = settings_; } using ObjectConfigsPtr = std::shared_ptr>; @@ -170,8 +175,7 @@ private: struct RepositoryInfo { - RepositoryPtr repository; - ExternalLoaderConfigSettings settings; + std::unique_ptr repository; std::unordered_map files; }; @@ -179,18 +183,10 @@ private: /// Checks last modification times of files and read those files which are new or changed. void readRepositories(const std::optional & only_repository_name = {}, const std::optional & only_path = {}) { - Strings repository_names; - if (only_repository_name) + for (auto & [repository, repository_info] : repositories) { - if (repositories.count(*only_repository_name)) - repository_names.push_back(*only_repository_name); - } - else - boost::copy(repositories | boost::adaptors::map_keys, std::back_inserter(repository_names)); - - for (const auto & repository_name : repository_names) - { - auto & repository_info = repositories[repository_name]; + if (only_repository_name && (repository->getName() != *only_repository_name)) + continue; for (auto & file_info : repository_info.files | boost::adaptors::map_values) file_info.in_use = false; @@ -198,11 +194,11 @@ private: Strings existing_paths; if (only_path) { - if (repository_info.repository->exists(*only_path)) + if (repository->exists(*only_path)) existing_paths.push_back(*only_path); } else - boost::copy(repository_info.repository->getAllLoadablesDefinitionNames(), std::back_inserter(existing_paths)); + boost::copy(repository->getAllLoadablesDefinitionNames(), std::back_inserter(existing_paths)); for (const auto & path : existing_paths) { @@ -210,13 +206,13 @@ private: if (it != repository_info.files.end()) { FileInfo & file_info = it->second; - if (readFileInfo(file_info, *repository_info.repository, path, repository_info.settings)) + if (readFileInfo(file_info, *repository, path)) need_collect_object_configs = true; } else { FileInfo file_info; - if (readFileInfo(file_info, *repository_info.repository, path, repository_info.settings)) + if (readFileInfo(file_info, *repository, path)) { repository_info.files.emplace(path, std::move(file_info)); need_collect_object_configs = true; @@ -249,8 +245,7 @@ private: bool readFileInfo( FileInfo & file_info, IExternalLoaderConfigRepository & repository, - const String & path, - const ExternalLoaderConfigSettings & settings) const + const String & path) const { try { @@ -293,7 +288,13 @@ private: continue; } - object_configs_from_file.emplace_back(object_name, ObjectConfig{file_contents, key, {}, {}}); + String database; + if (!settings.external_database.empty()) + 
database = file_contents->getString(key + "." + settings.external_database, ""); + if (!database.empty()) + object_name = database + "." + object_name; + + object_configs_from_file.emplace_back(object_name, ObjectConfig{file_contents, key, {}, {}, {}}); } file_info.objects = std::move(object_configs_from_file); @@ -318,7 +319,7 @@ private: // Generate new result. auto new_configs = std::make_shared>(); - for (const auto & [repository_name, repository_info] : repositories) + for (const auto & [repository, repository_info] : repositories) { for (const auto & [path, file_info] : repository_info.files) { @@ -328,19 +329,19 @@ private: if (already_added_it == new_configs->end()) { auto & new_config = new_configs->emplace(object_name, object_config).first->second; - new_config.repository_name = repository_name; + new_config.from_temp_repository = repository->isTemporary(); + new_config.repository_name = repository->getName(); new_config.path = path; } else { const auto & already_added = already_added_it->second; - if (!startsWith(repository_name, IExternalLoaderConfigRepository::INTERNAL_REPOSITORY_NAME_PREFIX) && - !startsWith(already_added.repository_name, IExternalLoaderConfigRepository::INTERNAL_REPOSITORY_NAME_PREFIX)) + if (!already_added.from_temp_repository && !repository->isTemporary()) { LOG_WARNING( log, type_name << " '" << object_name << "' is found " - << (((path == already_added.path) && repository_name == already_added.repository_name) + << (((path == already_added.path) && (repository->getName() == already_added.repository_name)) ? ("twice in the same file '" + path + "'") : ("both in file '" + already_added.path + "' and '" + path + "'"))); } @@ -356,7 +357,8 @@ private: Logger * log; std::mutex mutex; - std::unordered_map repositories; + ExternalLoaderConfigSettings settings; + std::unordered_map repositories; ObjectConfigsPtr object_configs; bool need_collect_object_configs = false; }; @@ -613,7 +615,7 @@ public: } catch (...) 
{ - tryLogCurrentException(log, "Could not check if " + type_name + " '" + object->getName() + "' was modified"); + tryLogCurrentException(log, "Could not check if " + type_name + " '" + object->getLoadableName() + "' was modified"); /// Cannot check isModified, so update should_update_flag = true; } @@ -1151,20 +1153,23 @@ ExternalLoader::ExternalLoader(const String & type_name_, Logger * log_) ExternalLoader::~ExternalLoader() = default; -void ExternalLoader::addConfigRepository( - const std::string & repository_name, - std::unique_ptr config_repository, - const ExternalLoaderConfigSettings & config_settings) +ext::scope_guard ExternalLoader::addConfigRepository(std::unique_ptr repository) { - config_files_reader->addConfigRepository(repository_name, std::move(config_repository), config_settings); - reloadConfig(repository_name); + auto * ptr = repository.get(); + String name = ptr->getName(); + config_files_reader->addConfigRepository(std::move(repository)); + reloadConfig(name); + + return [this, ptr, name]() + { + config_files_reader->removeConfigRepository(ptr); + reloadConfig(name); + }; } -std::unique_ptr ExternalLoader::removeConfigRepository(const std::string & repository_name) +void ExternalLoader::setConfigSettings(const ExternalLoaderConfigSettings & settings) { - auto repository = config_files_reader->removeConfigRepository(repository_name); - reloadConfig(repository_name); - return repository; + config_files_reader->setConfigSettings(settings); } void ExternalLoader::enableAlwaysLoadEverything(bool enable) diff --git a/dbms/src/Interpreters/ExternalLoader.h b/dbms/src/Interpreters/ExternalLoader.h index 9ccdc4bf39c..1f65cb80f94 100644 --- a/dbms/src/Interpreters/ExternalLoader.h +++ b/dbms/src/Interpreters/ExternalLoader.h @@ -7,6 +7,7 @@ #include #include #include +#include namespace DB @@ -24,9 +25,9 @@ struct ExternalLoaderConfigSettings { std::string external_config; std::string external_name; + std::string external_database; }; - /** Interface for manage user-defined objects. * Monitors configuration file and automatically reloads objects in separate threads. * The monitoring thread wakes up every 'check_period_sec' seconds and checks @@ -87,13 +88,9 @@ public: virtual ~ExternalLoader(); /// Adds a repository which will be used to read configurations from. - void addConfigRepository( - const std::string & repository_name, - std::unique_ptr config_repository, - const ExternalLoaderConfigSettings & config_settings); + ext::scope_guard addConfigRepository(std::unique_ptr config_repository); - /// Removes a repository which were used to read configurations. - std::unique_ptr removeConfigRepository(const std::string & repository_name); + void setConfigSettings(const ExternalLoaderConfigSettings & settings_); /// Sets whether all the objects from the configuration should be always loaded (even those which are never used). 
void enableAlwaysLoadEverything(bool enable); diff --git a/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp b/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp index bd89f27def1..33469d95e08 100644 --- a/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp +++ b/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp @@ -1,4 +1,5 @@ #include +#include #include namespace DB @@ -11,40 +12,46 @@ namespace ErrorCodes namespace { -String trimDatabaseName(const std::string & loadable_definition_name, const DatabasePtr database) +String trimDatabaseName(const std::string & loadable_definition_name, const IDatabase & database) { - const auto & dbname = database->getDatabaseName(); + const auto & dbname = database.getDatabaseName(); if (!startsWith(loadable_definition_name, dbname)) throw Exception( - "Loadable '" + loadable_definition_name + "' is not from database '" + database->getDatabaseName(), ErrorCodes::UNKNOWN_DICTIONARY); + "Loadable '" + loadable_definition_name + "' is not from database '" + database.getDatabaseName(), ErrorCodes::UNKNOWN_DICTIONARY); /// dbname.loadable_name ///--> remove <--- return loadable_definition_name.substr(dbname.length() + 1); } } -LoadablesConfigurationPtr ExternalLoaderDatabaseConfigRepository::load(const std::string & loadable_definition_name) const +ExternalLoaderDatabaseConfigRepository::ExternalLoaderDatabaseConfigRepository(IDatabase & database_, const Context & context_) + : name(database_.getDatabaseName()) + , database(database_) + , context(context_) { - String dictname = trimDatabaseName(loadable_definition_name, database); - return getDictionaryConfigurationFromAST(database->getCreateDictionaryQuery(context, dictname)->as()); } -bool ExternalLoaderDatabaseConfigRepository::exists(const std::string & loadable_definition_name) const +LoadablesConfigurationPtr ExternalLoaderDatabaseConfigRepository::load(const std::string & loadable_definition_name) { - return database->isDictionaryExist( - context, trimDatabaseName(loadable_definition_name, database)); + String dictname = trimDatabaseName(loadable_definition_name, database); + return getDictionaryConfigurationFromAST(database.getCreateDictionaryQuery(context, dictname)->as()); +} + +bool ExternalLoaderDatabaseConfigRepository::exists(const std::string & loadable_definition_name) +{ + return database.isDictionaryExist(context, trimDatabaseName(loadable_definition_name, database)); } Poco::Timestamp ExternalLoaderDatabaseConfigRepository::getUpdateTime(const std::string & loadable_definition_name) { - return database->getObjectMetadataModificationTime(context, trimDatabaseName(loadable_definition_name, database)); + return database.getObjectMetadataModificationTime(trimDatabaseName(loadable_definition_name, database)); } -std::set ExternalLoaderDatabaseConfigRepository::getAllLoadablesDefinitionNames() const +std::set ExternalLoaderDatabaseConfigRepository::getAllLoadablesDefinitionNames() { std::set result; - const auto & dbname = database->getDatabaseName(); - auto itr = database->getDictionariesIterator(context); + const auto & dbname = database.getDatabaseName(); + auto itr = database.getDictionariesIterator(context); while (itr && itr->isValid()) { result.insert(dbname + "." 
+ itr->name()); diff --git a/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.h b/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.h index 343ed8cf038..2afff035d9d 100644 --- a/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.h +++ b/dbms/src/Interpreters/ExternalLoaderDatabaseConfigRepository.h @@ -12,22 +12,21 @@ namespace DB class ExternalLoaderDatabaseConfigRepository : public IExternalLoaderConfigRepository { public: - ExternalLoaderDatabaseConfigRepository(const DatabasePtr & database_, const Context & context_) - : database(database_) - , context(context_) - { - } + ExternalLoaderDatabaseConfigRepository(IDatabase & database_, const Context & context_); - std::set getAllLoadablesDefinitionNames() const override; + const std::string & getName() const override { return name; } - bool exists(const std::string & loadable_definition_name) const override; + std::set getAllLoadablesDefinitionNames() override; + + bool exists(const std::string & loadable_definition_name) override; Poco::Timestamp getUpdateTime(const std::string & loadable_definition_name) override; - LoadablesConfigurationPtr load(const std::string & loadable_definition_name) const override; + LoadablesConfigurationPtr load(const std::string & loadable_definition_name) override; private: - DatabasePtr database; + const String name; + IDatabase & database; Context context; }; diff --git a/dbms/src/Interpreters/ExternalLoaderPresetConfigRepository.cpp b/dbms/src/Interpreters/ExternalLoaderPresetConfigRepository.cpp deleted file mode 100644 index 16c8a3aa59c..00000000000 --- a/dbms/src/Interpreters/ExternalLoaderPresetConfigRepository.cpp +++ /dev/null @@ -1,49 +0,0 @@ -#include -#include -#include -#include - - -namespace DB -{ -namespace ErrorCodes -{ - extern const int BAD_ARGUMENTS; -} - -ExternalLoaderPresetConfigRepository::ExternalLoaderPresetConfigRepository(const std::vector> & preset_) -{ - boost::range::copy(preset_, std::inserter(preset, preset.end())); -} - -ExternalLoaderPresetConfigRepository::~ExternalLoaderPresetConfigRepository() = default; - -std::set ExternalLoaderPresetConfigRepository::getAllLoadablesDefinitionNames() const -{ - std::set paths; - boost::range::copy(preset | boost::adaptors::map_keys, std::inserter(paths, paths.end())); - return paths; -} - -bool ExternalLoaderPresetConfigRepository::exists(const String& path) const -{ - return preset.count(path); -} - -Poco::Timestamp ExternalLoaderPresetConfigRepository::getUpdateTime(const String & path) -{ - if (!exists(path)) - throw Exception("Loadable " + path + " not found", ErrorCodes::BAD_ARGUMENTS); - return creation_time; -} - -/// May contain definition about several entities (several dictionaries in one .xml file) -LoadablesConfigurationPtr ExternalLoaderPresetConfigRepository::load(const String & path) const -{ - auto it = preset.find(path); - if (it == preset.end()) - throw Exception("Loadable " + path + " not found", ErrorCodes::BAD_ARGUMENTS); - return it->second; -} - -} diff --git a/dbms/src/Interpreters/ExternalLoaderPresetConfigRepository.h b/dbms/src/Interpreters/ExternalLoaderPresetConfigRepository.h deleted file mode 100644 index b35209a7fb9..00000000000 --- a/dbms/src/Interpreters/ExternalLoaderPresetConfigRepository.h +++ /dev/null @@ -1,28 +0,0 @@ -#pragma once - -#include -#include -#include -#include - - -namespace DB -{ -/// A config repository filled with preset loadables used by ExternalLoader. 
-class ExternalLoaderPresetConfigRepository : public IExternalLoaderConfigRepository -{ -public: - ExternalLoaderPresetConfigRepository(const std::vector> & preset_); - ~ExternalLoaderPresetConfigRepository() override; - - std::set getAllLoadablesDefinitionNames() const override; - bool exists(const String & path) const override; - Poco::Timestamp getUpdateTime(const String & path) override; - LoadablesConfigurationPtr load(const String & path) const override; - -private: - std::unordered_map preset; - Poco::Timestamp creation_time; -}; - -} diff --git a/dbms/src/Interpreters/ExternalLoaderTempConfigRepository.cpp b/dbms/src/Interpreters/ExternalLoaderTempConfigRepository.cpp new file mode 100644 index 00000000000..c4210875867 --- /dev/null +++ b/dbms/src/Interpreters/ExternalLoaderTempConfigRepository.cpp @@ -0,0 +1,46 @@ +#include +#include + + +namespace DB +{ +namespace ErrorCodes +{ + extern const int BAD_ARGUMENTS; +} + + +ExternalLoaderTempConfigRepository::ExternalLoaderTempConfigRepository(const String & repository_name_, const String & path_, const LoadablesConfigurationPtr & config_) + : name(repository_name_), path(path_), config(config_) {} + + +std::set ExternalLoaderTempConfigRepository::getAllLoadablesDefinitionNames() +{ + std::set paths; + paths.insert(path); + return paths; +} + + +bool ExternalLoaderTempConfigRepository::exists(const String & path_) +{ + return path == path_; +} + + +Poco::Timestamp ExternalLoaderTempConfigRepository::getUpdateTime(const String & path_) +{ + if (!exists(path_)) + throw Exception("Loadable " + path_ + " not found", ErrorCodes::BAD_ARGUMENTS); + return creation_time; +} + + +LoadablesConfigurationPtr ExternalLoaderTempConfigRepository::load(const String & path_) +{ + if (!exists(path_)) + throw Exception("Loadable " + path_ + " not found", ErrorCodes::BAD_ARGUMENTS); + return config; +} + +} diff --git a/dbms/src/Interpreters/ExternalLoaderTempConfigRepository.h b/dbms/src/Interpreters/ExternalLoaderTempConfigRepository.h new file mode 100644 index 00000000000..6ee717631cc --- /dev/null +++ b/dbms/src/Interpreters/ExternalLoaderTempConfigRepository.h @@ -0,0 +1,31 @@ +#pragma once + +#include +#include +#include + + +namespace DB +{ +/// A config repository filled with preset loadables used by ExternalLoader. 
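/// (Editorial note: the sentence above appears to be carried over from the
/// deleted ExternalLoaderPresetConfigRepository; this class actually serves a
/// single temporary definition. A hedged usage sketch, with a made-up
/// repository name and path:
///
///     auto repo = std::make_unique<ExternalLoaderTempConfigRepository>(
///         "temp repo #1",    /// repository_name_; also returned by getName()
///         "db.dict",         /// the one path answered by exists()/load()
///         config);           /// a LoadablesConfigurationPtr built elsewhere
///
/// getUpdateTime() always reports the construction time, so the loadable is
/// never considered modified, and isTemporary() == true lets
/// collectObjectConfigs() skip the duplicate-definition warning.)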
+class ExternalLoaderTempConfigRepository : public IExternalLoaderConfigRepository +{ +public: + ExternalLoaderTempConfigRepository(const String & repository_name_, const String & path_, const LoadablesConfigurationPtr & config_); + + const String & getName() const override { return name; } + bool isTemporary() const override { return true; } + + std::set getAllLoadablesDefinitionNames() override; + bool exists(const String & path) override; + Poco::Timestamp getUpdateTime(const String & path) override; + LoadablesConfigurationPtr load(const String & path) override; + +private: + String name; + String path; + LoadablesConfigurationPtr config; + Poco::Timestamp creation_time; +}; + +} diff --git a/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.cpp b/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.cpp index 9a5e32697df..63755ee1839 100644 --- a/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.cpp +++ b/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.cpp @@ -11,13 +11,18 @@ namespace DB { +ExternalLoaderXMLConfigRepository::ExternalLoaderXMLConfigRepository( + const Poco::Util::AbstractConfiguration & main_config_, const std::string & config_key_) + : main_config(main_config_), config_key(config_key_) +{ +} Poco::Timestamp ExternalLoaderXMLConfigRepository::getUpdateTime(const std::string & definition_entity_name) { return Poco::File(definition_entity_name).getLastModified(); } -std::set ExternalLoaderXMLConfigRepository::getAllLoadablesDefinitionNames() const +std::set ExternalLoaderXMLConfigRepository::getAllLoadablesDefinitionNames() { std::set files; @@ -52,13 +57,13 @@ std::set ExternalLoaderXMLConfigRepository::getAllLoadablesDefiniti return files; } -bool ExternalLoaderXMLConfigRepository::exists(const std::string & definition_entity_name) const +bool ExternalLoaderXMLConfigRepository::exists(const std::string & definition_entity_name) { return Poco::File(definition_entity_name).exists(); } Poco::AutoPtr ExternalLoaderXMLConfigRepository::load( - const std::string & config_file) const + const std::string & config_file) { ConfigProcessor config_processor{config_file}; ConfigProcessor::LoadedConfig preprocessed = config_processor.loadConfig(); diff --git a/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.h b/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.h index b8676209c14..808fa66fdbf 100644 --- a/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.h +++ b/dbms/src/Interpreters/ExternalLoaderXMLConfigRepository.h @@ -13,26 +13,25 @@ namespace DB class ExternalLoaderXMLConfigRepository : public IExternalLoaderConfigRepository { public: + ExternalLoaderXMLConfigRepository(const Poco::Util::AbstractConfiguration & main_config_, const std::string & config_key_); - ExternalLoaderXMLConfigRepository(const Poco::Util::AbstractConfiguration & main_config_, const std::string & config_key_) - : main_config(main_config_) - , config_key(config_key_) - { - } + const String & getName() const override { return name; } /// Return set of .xml files from path in main_config (config_key) - std::set getAllLoadablesDefinitionNames() const override; + std::set getAllLoadablesDefinitionNames() override; /// Checks that file with name exists on filesystem - bool exists(const std::string & definition_entity_name) const override; + bool exists(const std::string & definition_entity_name) override; /// Return xml-file modification time via stat call Poco::Timestamp getUpdateTime(const std::string & definition_entity_name) override; /// May contain definition 
about several entities (several dictionaries in one .xml file) - LoadablesConfigurationPtr load(const std::string & definition_entity_name) const override; + LoadablesConfigurationPtr load(const std::string & definition_entity_name) override; private: + const String name; + /// Main server config (config.xml). const Poco::Util::AbstractConfiguration & main_config; diff --git a/dbms/src/Interpreters/ExternalModelsLoader.cpp b/dbms/src/Interpreters/ExternalModelsLoader.cpp index 2a83b8324a4..7334cfbaa3a 100644 --- a/dbms/src/Interpreters/ExternalModelsLoader.cpp +++ b/dbms/src/Interpreters/ExternalModelsLoader.cpp @@ -14,6 +14,7 @@ ExternalModelsLoader::ExternalModelsLoader(Context & context_) : ExternalLoader("external model", &Logger::get("ExternalModelsLoader")) , context(context_) { + setConfigSettings({"models", "name", {}}); enablePeriodicUpdates(true); } @@ -38,9 +39,4 @@ std::shared_ptr ExternalModelsLoader::create( throw Exception("Unknown model type: " + type, ErrorCodes::INVALID_CONFIG_PARAMETER); } } - -void ExternalModelsLoader::addConfigRepository(const String & name, std::unique_ptr config_repository) -{ - ExternalLoader::addConfigRepository(name, std::move(config_repository), {"models", "name"}); -} } diff --git a/dbms/src/Interpreters/ExternalModelsLoader.h b/dbms/src/Interpreters/ExternalModelsLoader.h index 753bad20ca0..4d09cebc307 100644 --- a/dbms/src/Interpreters/ExternalModelsLoader.h +++ b/dbms/src/Interpreters/ExternalModelsLoader.h @@ -25,10 +25,6 @@ public: return std::static_pointer_cast(load(name)); } - void addConfigRepository(const String & name, - std::unique_ptr config_repository); - - protected: LoadablePtr create(const std::string & name, const Poco::Util::AbstractConfiguration & config, const std::string & key_in_config, const std::string & repository_name) const override; diff --git a/dbms/src/Interpreters/IExternalLoadable.h b/dbms/src/Interpreters/IExternalLoadable.h index d4b93c56d2a..f9f24a9bbac 100644 --- a/dbms/src/Interpreters/IExternalLoadable.h +++ b/dbms/src/Interpreters/IExternalLoadable.h @@ -36,7 +36,7 @@ public: virtual const ExternalLoadableLifetime & getLifetime() const = 0; - virtual std::string getName() const = 0; + virtual const std::string & getLoadableName() const = 0; /// True if object can be updated when lifetime exceeded. virtual bool supportUpdates() const = 0; /// If lifetime exceeded and isModified(), ExternalLoader replace current object with the result of clone(). 
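Editorial aside before the IExternalLoaderConfigRepository diff below: with the ExternalLoader changes above, addConfigRepository() now takes ownership of the repository and returns an ext::scope_guard that unregisters it and reloads the configuration when destroyed, replacing the old explicit removeConfigRepository() call. A hedged sketch of the resulting RAII usage, where `loader` and `repo` are placeholders for an ExternalLoader and a freshly constructed repository:

{
    /// Registration: the loader owns the repository from here on.
    ext::scope_guard remover = loader.addConfigRepository(std::move(repo));

    /// While the guard is alive, objects defined by the repository are
    /// visible to the loader and get loaded/updated as usual.
}
/// Leaving the scope runs the guard: the repository is removed and
/// reloadConfig() is invoked for its name.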
diff --git a/dbms/src/Interpreters/IExternalLoaderConfigRepository.cpp b/dbms/src/Interpreters/IExternalLoaderConfigRepository.cpp deleted file mode 100644 index 968a9bca9de..00000000000 --- a/dbms/src/Interpreters/IExternalLoaderConfigRepository.cpp +++ /dev/null @@ -1,7 +0,0 @@ -#include - - -namespace DB -{ -const char * IExternalLoaderConfigRepository::INTERNAL_REPOSITORY_NAME_PREFIX = "\xFF internal repo "; -} diff --git a/dbms/src/Interpreters/IExternalLoaderConfigRepository.h b/dbms/src/Interpreters/IExternalLoaderConfigRepository.h index bcac36d9807..866aa0b877f 100644 --- a/dbms/src/Interpreters/IExternalLoaderConfigRepository.h +++ b/dbms/src/Interpreters/IExternalLoaderConfigRepository.h @@ -4,13 +4,13 @@ #include #include +#include #include #include #include namespace DB { - using LoadablesConfigurationPtr = Poco::AutoPtr; /// Base interface for configurations source for Loadble objects, which can be @@ -22,24 +22,27 @@ using LoadablesConfigurationPtr = Poco::AutoPtr getAllLoadablesDefinitionNames() const = 0; + virtual std::set getAllLoadablesDefinitionNames() = 0; /// Checks that source of loadables configuration exist. - virtual bool exists(const std::string & loadable_definition_name) const = 0; + virtual bool exists(const std::string & path) = 0; /// Returns entity last update time - virtual Poco::Timestamp getUpdateTime(const std::string & loadable_definition_name) = 0; + virtual Poco::Timestamp getUpdateTime(const std::string & path) = 0; /// Load configuration from some concrete source to AbstractConfiguration - virtual LoadablesConfigurationPtr load(const std::string & loadable_definition_name) const = 0; + virtual LoadablesConfigurationPtr load(const std::string & path) = 0; - virtual ~IExternalLoaderConfigRepository() = default; - - static const char * INTERNAL_REPOSITORY_NAME_PREFIX; + virtual ~IExternalLoaderConfigRepository() {} }; -using ExternalLoaderConfigRepositoryPtr = std::unique_ptr; - } diff --git a/dbms/src/Interpreters/InterpreterAlterQuery.cpp b/dbms/src/Interpreters/InterpreterAlterQuery.cpp index 94d27a7157b..e821b56de2e 100644 --- a/dbms/src/Interpreters/InterpreterAlterQuery.cpp +++ b/dbms/src/Interpreters/InterpreterAlterQuery.cpp @@ -103,7 +103,10 @@ BlockIO InterpreterAlterQuery::execute() if (!alter_commands.empty()) { auto table_lock_holder = table->lockAlterIntention(context.getCurrentQueryId()); - alter_commands.validate(*table, context); + StorageInMemoryMetadata metadata = table->getInMemoryMetadata(); + alter_commands.validate(metadata, context); + alter_commands.prepare(metadata, context); + table->checkAlterIsPossible(alter_commands, context.getSettingsRef()); table->alter(alter_commands, context, table_lock_holder); } diff --git a/dbms/src/Interpreters/InterpreterCreateQuery.cpp b/dbms/src/Interpreters/InterpreterCreateQuery.cpp index b43f3de35dc..db48c3bed92 100644 --- a/dbms/src/Interpreters/InterpreterCreateQuery.cpp +++ b/dbms/src/Interpreters/InterpreterCreateQuery.cpp @@ -511,17 +511,24 @@ void InterpreterCreateQuery::setEngine(ASTCreateQuery & create) const String as_database_name = create.as_database.empty() ? context.getCurrentDatabase() : create.as_database; String as_table_name = create.as_table; - ASTPtr as_create_ptr = context.getCreateTableQuery(as_database_name, as_table_name); + ASTPtr as_create_ptr = context.getDatabase(as_database_name)->getCreateTableQuery(context, as_table_name); const auto & as_create = as_create_ptr->as(); + const String qualified_name = backQuoteIfNeed(as_database_name) + "." 
+ backQuoteIfNeed(as_table_name); + if (as_create.is_view) throw Exception( - "Cannot CREATE a table AS " + as_database_name + "." + as_table_name + ", it is a View", + "Cannot CREATE a table AS " + qualified_name + ", it is a View", ErrorCodes::INCORRECT_QUERY); if (as_create.is_live_view) throw Exception( - "Cannot CREATE a table AS " + as_database_name + "." + as_table_name + ", it is a Live View", + "Cannot CREATE a table AS " + qualified_name + ", it is a Live View", + ErrorCodes::INCORRECT_QUERY); + + if (as_create.is_dictionary) + throw Exception( + "Cannot CREATE a table AS " + qualified_name + ", it is a Dictionary", ErrorCodes::INCORRECT_QUERY); create.set(create.storage, as_create.storage->ptr()); @@ -549,7 +556,7 @@ BlockIO InterpreterCreateQuery::createTable(ASTCreateQuery & create) if (create.attach && !create.storage && !create.columns_list) { // Table SQL definition is available even if the table is detached - auto query = context.getCreateTableQuery(create.database, create.table); + auto query = context.getDatabase(create.database)->getCreateTableQuery(context, create.table); create = query->as(); // Copy the saved create query, but use ATTACH instead of CREATE create.attach = true; } @@ -583,7 +590,6 @@ bool InterpreterCreateQuery::doCreateTable(const ASTCreateQuery & create, { std::unique_ptr guard; - String data_path; DatabasePtr database; const String & table_name = create.table; @@ -591,7 +597,6 @@ bool InterpreterCreateQuery::doCreateTable(const ASTCreateQuery & create, if (need_add_to_database) { database = context.getDatabase(create.database); - data_path = database->getDataPath(); /** If the request specifies IF NOT EXISTS, we allow concurrent CREATE queries (which do nothing). * If table doesnt exist, one thread is creating table, while others wait in DDLGuard. @@ -632,7 +637,7 @@ bool InterpreterCreateQuery::doCreateTable(const ASTCreateQuery & create, else { res = StorageFactory::instance().get(create, - data_path + escapeForFileName(table_name) + "/", + database ? database->getTableDataPath(create) : "", table_name, create.database, context, @@ -693,7 +698,9 @@ BlockIO InterpreterCreateQuery::createDictionary(ASTCreateQuery & create) String dictionary_name = create.table; - String database_name = !create.database.empty() ? 
create.database : context.getCurrentDatabase(); + if (create.database.empty()) + create.database = context.getCurrentDatabase(); + const String & database_name = create.database; auto guard = context.getDDLGuard(database_name, dictionary_name); DatabasePtr database = context.getDatabase(database_name); @@ -710,7 +717,7 @@ BlockIO InterpreterCreateQuery::createDictionary(ASTCreateQuery & create) if (create.attach) { - auto query = context.getCreateDictionaryQuery(database_name, dictionary_name); + auto query = context.getDatabase(database_name)->getCreateDictionaryQuery(context, dictionary_name); create = query->as(); create.attach = true; } diff --git a/dbms/src/Interpreters/InterpreterDropQuery.cpp b/dbms/src/Interpreters/InterpreterDropQuery.cpp index 03171202fd1..6b1f3088636 100644 --- a/dbms/src/Interpreters/InterpreterDropQuery.cpp +++ b/dbms/src/Interpreters/InterpreterDropQuery.cpp @@ -128,17 +128,16 @@ BlockIO InterpreterDropQuery::executeToTable( throw; } + String table_data_path_relative = database_and_table.first->getTableDataPath(table_name); + /// Delete table metadata and table itself from memory database_and_table.first->removeTable(context, database_and_table.second->getTableName()); database_and_table.second->is_dropped = true; - String database_data_path = database_and_table.first->getDataPath(); - /// If it is not virtual database like Dictionary then drop remaining data dir - if (!database_data_path.empty()) + if (!table_data_path_relative.empty()) { - String table_data_path = context.getPath() + database_data_path + "/" + escapeForFileName(table_name); - + String table_data_path = context.getPath() + table_data_path_relative; if (Poco::File(table_data_path).exists()) Poco::File(table_data_path).remove(true); } diff --git a/dbms/src/Interpreters/InterpreterSelectQuery.cpp b/dbms/src/Interpreters/InterpreterSelectQuery.cpp index b8e4be1f1d9..f5971d7edbf 100644 --- a/dbms/src/Interpreters/InterpreterSelectQuery.cpp +++ b/dbms/src/Interpreters/InterpreterSelectQuery.cpp @@ -242,17 +242,11 @@ InterpreterSelectQuery::InterpreterSelectQuery( throw Exception("Too deep subqueries. 
Maximum: " + settings.max_subquery_depth.toString(), ErrorCodes::TOO_DEEP_SUBQUERIES); - if (settings.allow_experimental_cross_to_join_conversion) - { - CrossToInnerJoinVisitor::Data cross_to_inner; - CrossToInnerJoinVisitor(cross_to_inner).visit(query_ptr); - } + CrossToInnerJoinVisitor::Data cross_to_inner; + CrossToInnerJoinVisitor(cross_to_inner).visit(query_ptr); - if (settings.allow_experimental_multiple_joins_emulation) - { - JoinToSubqueryTransformVisitor::Data join_to_subs_data{*context}; - JoinToSubqueryTransformVisitor(join_to_subs_data).visit(query_ptr); - } + JoinToSubqueryTransformVisitor::Data join_to_subs_data{*context}; + JoinToSubqueryTransformVisitor(join_to_subs_data).visit(query_ptr); max_streams = settings.max_threads; auto & query = getSelectQuery(); @@ -411,9 +405,19 @@ InterpreterSelectQuery::InterpreterSelectQuery( query.setExpression(ASTSelectQuery::Expression::WHERE, std::make_shared(0u)); need_analyze_again = true; } + if (query.prewhere() && query.where()) + { + /// Filter block in WHERE instead to get better performance + query.setExpression(ASTSelectQuery::Expression::WHERE, makeASTFunction("and", query.prewhere()->clone(), query.where()->clone())); + need_analyze_again = true; + } if (need_analyze_again) analyze(); + /// If there is no WHERE, filter blocks as usual + if (query.prewhere() && !query.where()) + analysis_result.prewhere_info->need_filter = true; + /// Blocks used in expression analysis contains size 1 const columns for constant folding and /// null non-const columns to avoid useless memory allocations. However, a valid block sample /// requires all columns to be of size 0, thus we need to sanitize the block here. @@ -484,8 +488,8 @@ BlockInputStreams InterpreterSelectQuery::executeWithMultipleStreams(QueryPipeli QueryPipeline InterpreterSelectQuery::executeWithProcessors() { QueryPipeline query_pipeline; - query_pipeline.setMaxThreads(context->getSettingsRef().max_threads); executeImpl(query_pipeline, input, query_pipeline); + query_pipeline.setMaxThreads(max_streams); query_pipeline.addInterpreterContext(context); query_pipeline.addStorageHolder(storage); return query_pipeline; @@ -1789,6 +1793,9 @@ void InterpreterSelectQuery::executeFetchColumns( // pipes[i].pinSources(i); // } + for (auto & pipe : pipes) + pipe.enableQuota(); + pipeline.init(std::move(pipes)); } else @@ -1967,6 +1974,8 @@ void InterpreterSelectQuery::executeAggregation(QueryPipeline & pipeline, const return std::make_shared(header, transform_params); }); } + + pipeline.enableQuotaForCurrentStreams(); } @@ -2080,6 +2089,8 @@ void InterpreterSelectQuery::executeMergeAggregated(QueryPipeline & pipeline, bo pipeline.addPipe(std::move(pipe)); } + + pipeline.enableQuotaForCurrentStreams(); } @@ -2263,17 +2274,17 @@ void InterpreterSelectQuery::executeOrder(Pipeline & pipeline, InputSortingInfoP limits.size_limits = SizeLimits(settings.max_rows_to_sort, settings.max_bytes_to_sort, settings.sort_overflow_mode); sorting_stream->setLimits(limits); - stream = sorting_stream; + auto merging_stream = std::make_shared( + sorting_stream, output_order_descr, settings.max_block_size, limit, + settings.max_bytes_before_remerge_sort, + settings.max_bytes_before_external_sort / pipeline.streams.size(), + context->getTemporaryPath(), settings.min_free_disk_space_for_temporary_data); + + stream = merging_stream; }); /// If there are several streams, we merge them into one - executeUnion(pipeline, {}); - - /// Merge the sorted blocks. 
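/// Editorial note on this hunk: the lines being removed here unioned all
/// streams and then sorted the single combined stream with one
/// MergeSortingBlockInputStream; the replacement earlier in the hunk sorts
/// each stream independently and lets executeMergeSorted() merge the
/// pre-sorted results. The external-sort budget is accordingly divided per
/// stream:
///     max_bytes_before_external_sort / pipeline.streams.size()
/// e.g. a 10 GiB budget over 4 streams lets each stream buffer 2.5 GiB
/// before spilling to context->getTemporaryPath().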
- pipeline.firstStream() = std::make_shared( - pipeline.firstStream(), output_order_descr, settings.max_block_size, limit, - settings.max_bytes_before_remerge_sort, - settings.max_bytes_before_external_sort, context->getTemporaryPath(), settings.min_free_disk_space_for_temporary_data); + executeMergeSorted(pipeline, output_order_descr, limit); } } @@ -2313,6 +2324,8 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, InputSorting pipeline.addPipe({ std::move(transform) }); } + pipeline.enableQuotaForCurrentStreams(); + if (need_finish_sorting) { pipeline.addSimpleTransform([&](const Block & header, QueryPipeline::StreamType stream_type) @@ -2352,6 +2365,8 @@ void InterpreterSelectQuery::executeOrder(QueryPipeline & pipeline, InputSorting settings.max_bytes_before_remerge_sort, settings.max_bytes_before_external_sort, context->getTemporaryPath(), settings.min_free_disk_space_for_temporary_data); }); + + pipeline.enableQuotaForCurrentStreams(); } @@ -2413,6 +2428,8 @@ void InterpreterSelectQuery::executeMergeSorted(QueryPipeline & pipeline, const settings.max_block_size, limit); pipeline.addPipe({ std::move(transform) }); + + pipeline.enableQuotaForCurrentStreams(); } } diff --git a/dbms/src/Interpreters/InterpreterShowCreateQuery.cpp b/dbms/src/Interpreters/InterpreterShowCreateQuery.cpp index 95ebd8cc959..6f76016725f 100644 --- a/dbms/src/Interpreters/InterpreterShowCreateQuery.cpp +++ b/dbms/src/Interpreters/InterpreterShowCreateQuery.cpp @@ -49,19 +49,19 @@ BlockInputStreamPtr InterpreterShowCreateQuery::executeImpl() if (show_query->temporary) create_query = context.getCreateExternalTableQuery(show_query->table); else - create_query = context.getCreateTableQuery(show_query->database, show_query->table); + create_query = context.getDatabase(show_query->database)->getCreateTableQuery(context, show_query->table); } else if ((show_query = query_ptr->as())) { if (show_query->temporary) throw Exception("Temporary databases are not possible.", ErrorCodes::SYNTAX_ERROR); - create_query = context.getCreateDatabaseQuery(show_query->database); + create_query = context.getDatabase(show_query->database)->getCreateDatabaseQuery(); } else if ((show_query = query_ptr->as())) { if (show_query->temporary) throw Exception("Temporary dictionaries are not possible.", ErrorCodes::SYNTAX_ERROR); - create_query = context.getCreateDictionaryQuery(show_query->database, show_query->table); + create_query = context.getDatabase(show_query->database)->getCreateDictionaryQuery(context, show_query->table); } if (!create_query && show_query && show_query->temporary) diff --git a/dbms/src/Interpreters/InterpreterSystemQuery.cpp b/dbms/src/Interpreters/InterpreterSystemQuery.cpp index 05999117b7c..d5c04750302 100644 --- a/dbms/src/Interpreters/InterpreterSystemQuery.cpp +++ b/dbms/src/Interpreters/InterpreterSystemQuery.cpp @@ -283,7 +283,7 @@ StoragePtr InterpreterSystemQuery::tryRestartReplica(const String & database_nam /// If table was already dropped by anyone, an exception will be thrown auto table_lock = table->lockExclusively(context.getCurrentQueryId()); - create_ast = system_context.getCreateTableQuery(database_name, table_name); + create_ast = database->getCreateTableQuery(system_context, table_name); database->detachTable(table_name); } @@ -294,12 +294,11 @@ StoragePtr InterpreterSystemQuery::tryRestartReplica(const String & database_nam auto & create = create_ast->as(); create.attach = true; - std::string data_path = database->getDataPath(); auto columns = 
InterpreterCreateQuery::getColumnsDescription(*create.columns_list->columns, system_context); auto constraints = InterpreterCreateQuery::getConstraintsDescription(create.columns_list->constraints); StoragePtr table = StorageFactory::instance().get(create, - data_path + escapeForFileName(table_name) + "/", + database->getTableDataPath(create), table_name, database_name, system_context, diff --git a/dbms/src/Interpreters/OptimizeIfChains.cpp b/dbms/src/Interpreters/OptimizeIfChains.cpp new file mode 100644 index 00000000000..d440b204d54 --- /dev/null +++ b/dbms/src/Interpreters/OptimizeIfChains.cpp @@ -0,0 +1,92 @@ +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; + extern const int UNEXPECTED_AST_STRUCTURE; +} + +void OptimizeIfChainsVisitor::visit(ASTPtr & current_ast) +{ + if (!current_ast) + return; + + for (ASTPtr & child : current_ast->children) + { + /// Fallthrough cases + + const auto * function_node = child->as<ASTFunction>(); + if (!function_node || function_node->name != "if" || !function_node->arguments) + { + visit(child); + continue; + } + + const auto * function_args = function_node->arguments->as<ASTExpressionList>(); + if (!function_args || function_args->children.size() != 3 || !function_args->children[2]) + { + visit(child); + continue; + } + + const auto * else_arg = function_args->children[2]->as<ASTFunction>(); + if (!else_arg || else_arg->name != "if") + { + visit(child); + continue; + } + + /// The case of: + /// if(cond, a, if(...)) + + auto chain = ifChain(child); + std::reverse(chain.begin(), chain.end()); + child->as<ASTFunction>()->name = "multiIf"; + child->as<ASTFunction>()->arguments->children = std::move(chain); + } +} + +ASTs OptimizeIfChainsVisitor::ifChain(const ASTPtr & child) +{ + const auto * function_node = child->as<ASTFunction>(); + if (!function_node || !function_node->arguments) + throw Exception("Unexpected AST for function 'if'", ErrorCodes::UNEXPECTED_AST_STRUCTURE); + + const auto * function_args = function_node->arguments->as<ASTExpressionList>(); + + if (!function_args || function_args->children.size() != 3) + throw Exception("Wrong number of arguments for function 'if' (" + toString(function_args->children.size()) + " instead of 3)", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + const auto * else_arg = function_args->children[2]->as<ASTFunction>(); + + /// Recursively collect arguments from the innermost if ("head recursion"). + /// Arguments will be returned in reverse order. + + if (else_arg && else_arg->name == "if") + { + auto cur = ifChain(function_node->arguments->children[2]); + cur.push_back(function_node->arguments->children[1]); + cur.push_back(function_node->arguments->children[0]); + return cur; + } + else + { + ASTs end; + end.reserve(3); + end.push_back(function_node->arguments->children[2]); + end.push_back(function_node->arguments->children[1]); + end.push_back(function_node->arguments->children[0]); + return end; + } +} + +} diff --git a/dbms/src/Interpreters/OptimizeIfChains.h b/dbms/src/Interpreters/OptimizeIfChains.h new file mode 100644 index 00000000000..5dbdb9bee50 --- /dev/null +++ b/dbms/src/Interpreters/OptimizeIfChains.h @@ -0,0 +1,19 @@ +#pragma once + +#include + +namespace DB +{ + +/// It converts if-chain to multiIf.
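// Illustrative sketch (standalone, not part of the patch): the visitor rewrites a chain
//     if(c1, a, if(c2, b, d))   into   multiIf(c1, a, c2, b, d).
// ifChain() collects arguments innermost-first, which is why visit() reverses the list
// once at the end. The same collect-then-reverse idea on a toy structure (all names
// hypothetical):
#include <algorithm>
#include <string>
#include <vector>

struct IfNode
{
    std::string cond;
    std::string then_value;
    const IfNode * else_if = nullptr; // nested if(...) in the else slot
    std::string else_value;           // used when else_if == nullptr
};

// Innermost arguments first, mirroring OptimizeIfChainsVisitor::ifChain().
static std::vector<std::string> collectReversed(const IfNode & node)
{
    std::vector<std::string> args;
    if (node.else_if)
        args = collectReversed(*node.else_if);
    else
        args.push_back(node.else_value);
    args.push_back(node.then_value);
    args.push_back(node.cond);
    return args;
}

std::vector<std::string> flattenIfChain(const IfNode & root)
{
    auto args = collectReversed(root);
    std::reverse(args.begin(), args.end());
    return args; // {c1, a, c2, b, ..., else}: ready-made multiIf argument order
}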
+class OptimizeIfChainsVisitor +{ +public: + OptimizeIfChainsVisitor() = default; + void visit(ASTPtr & ast); + +private: + ASTs ifChain(const ASTPtr & child); +}; + +} diff --git a/dbms/src/Interpreters/ReplaceQueryParameterVisitor.cpp b/dbms/src/Interpreters/ReplaceQueryParameterVisitor.cpp index 1cbcb758bf3..5c29c722f88 100644 --- a/dbms/src/Interpreters/ReplaceQueryParameterVisitor.cpp +++ b/dbms/src/Interpreters/ReplaceQueryParameterVisitor.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -54,10 +55,12 @@ void ReplaceQueryParameterVisitor::visitQueryParameter(ASTPtr & ast) IColumn & temp_column = *temp_column_ptr; ReadBufferFromString read_buffer{value}; FormatSettings format_settings; - data_type->deserializeAsWholeText(temp_column, read_buffer, format_settings); + data_type->deserializeAsTextEscaped(temp_column, read_buffer, format_settings); if (!read_buffer.eof()) - throw Exception("Value " + value + " cannot be parsed as " + type_name + " for query parameter '" + ast_param.name + "'", ErrorCodes::BAD_QUERY_PARAMETER); + throw Exception("Value " + value + " cannot be parsed as " + type_name + " for query parameter '" + ast_param.name + "'" + " because it isn't parsed completely: only " + toString(read_buffer.count()) + " of " + toString(value.size()) + " bytes was parsed: " + + value.substr(0, read_buffer.count()), ErrorCodes::BAD_QUERY_PARAMETER); ast = addTypeConversionToAST(std::make_shared(temp_column[0]), type_name); } diff --git a/dbms/src/Interpreters/SyntaxAnalyzer.cpp b/dbms/src/Interpreters/SyntaxAnalyzer.cpp index a26d8b8253a..85135c71c6f 100644 --- a/dbms/src/Interpreters/SyntaxAnalyzer.cpp +++ b/dbms/src/Interpreters/SyntaxAnalyzer.cpp @@ -21,6 +21,7 @@ #include #include /// getSmallestColumn() #include +#include #include #include @@ -914,6 +915,9 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyze( /// Optimize if with constant condition after constants was substituted instead of scalar subqueries. OptimizeIfWithConstantConditionVisitor(result.aliases).visit(query); + if (settings.optimize_if_chain_to_miltiif) + OptimizeIfChainsVisitor().visit(query); + if (select_query) { /// GROUP BY injective function elimination. diff --git a/dbms/src/Interpreters/executeQuery.cpp b/dbms/src/Interpreters/executeQuery.cpp index 2c6bf087f8d..77b0256d121 100644 --- a/dbms/src/Interpreters/executeQuery.cpp +++ b/dbms/src/Interpreters/executeQuery.cpp @@ -129,9 +129,9 @@ static void setExceptionStackTrace(QueryLogElement & elem) { throw; } - catch (const Exception & e) + catch (const std::exception & e) { - elem.stack_trace = e.getStackTrace().toString(); + elem.stack_trace = getExceptionStackTraceString(e); } catch (...) 
{} } diff --git a/dbms/src/Interpreters/tests/create_query.cpp b/dbms/src/Interpreters/tests/create_query.cpp index fc487f4b7bb..e159afced58 100644 --- a/dbms/src/Interpreters/tests/create_query.cpp +++ b/dbms/src/Interpreters/tests/create_query.cpp @@ -97,6 +97,6 @@ catch (const Exception & e) std::cerr << e.what() << ", " << e.displayText() << std::endl << std::endl << "Stack trace:" << std::endl - << e.getStackTrace().toString(); + << e.getStackTraceString(); return 1; } diff --git a/dbms/src/Interpreters/tests/select_query.cpp b/dbms/src/Interpreters/tests/select_query.cpp index 54613fffd8e..197d1c55faf 100644 --- a/dbms/src/Interpreters/tests/select_query.cpp +++ b/dbms/src/Interpreters/tests/select_query.cpp @@ -55,6 +55,6 @@ catch (const Exception & e) std::cerr << e.what() << ", " << e.displayText() << std::endl << std::endl << "Stack trace:" << std::endl - << e.getStackTrace().toString(); + << e.getStackTraceString(); return 1; } diff --git a/dbms/src/Parsers/ASTSelectQuery.cpp b/dbms/src/Parsers/ASTSelectQuery.cpp index 802046114a2..e9a4eae73c2 100644 --- a/dbms/src/Parsers/ASTSelectQuery.cpp +++ b/dbms/src/Parsers/ASTSelectQuery.cpp @@ -239,7 +239,7 @@ static const ASTTablesInSelectQueryElement * getFirstTableJoin(const ASTSelectQu if (!joined_table) joined_table = &tables_element; else - throw Exception("Multiple JOIN disabled or does not support the query.", ErrorCodes::NOT_IMPLEMENTED); + throw Exception("Multiple JOIN does not support the query.", ErrorCodes::NOT_IMPLEMENTED); } } diff --git a/dbms/src/Processors/Executors/PipelineExecutor.cpp b/dbms/src/Processors/Executors/PipelineExecutor.cpp index 6addec11975..bc0de1fb81d 100644 --- a/dbms/src/Processors/Executors/PipelineExecutor.cpp +++ b/dbms/src/Processors/Executors/PipelineExecutor.cpp @@ -177,10 +177,20 @@ void PipelineExecutor::addJob(ExecutionState * execution_state) execution_state->job = std::move(job); } -void PipelineExecutor::expandPipeline(Stack & stack, UInt64 pid) +bool PipelineExecutor::expandPipeline(Stack & stack, UInt64 pid) { auto & cur_node = graph[pid]; - auto new_processors = cur_node.processor->expandPipeline(); + Processors new_processors; + + try + { + new_processors = cur_node.processor->expandPipeline(); + } + catch (...) + { + cur_node.execution_state->exception = std::current_exception(); + return false; + } for (const auto & processor : new_processors) { @@ -220,20 +230,22 @@ void PipelineExecutor::expandPipeline(Stack & stack, UInt64 pid) } } } + + return true; } -bool PipelineExecutor::tryAddProcessorToStackIfUpdated(Edge & edge, Stack & stack) +bool PipelineExecutor::tryAddProcessorToStackIfUpdated(Edge & edge, Queue & queue, size_t thread_number) { /// In this method we have ownership on edge, but node can be concurrently accessed. 
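// Illustrative sketch (standalone, not part of the patch): expandPipeline() and
// prepareProcessor() now catch exceptions themselves, park them in the node's
// ExecutionState, and report failure through a bool instead of unwinding a worker
// thread. The underlying pattern, with hypothetical names:
#include <exception>

struct ExecutionStateSketch
{
    std::exception_ptr exception;
};

template <typename F>
bool runGuarded(ExecutionStateSketch & state, F && step)
{
    try
    {
        step();
        return true;
    }
    catch (...)
    {
        /// Remember the error; the coordinating code rethrows it once all threads stop.
        state.exception = std::current_exception();
        return false;
    }
}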
auto & node = graph[edge.to]; - std::lock_guard guard(node.status_mutex); + std::unique_lock lock(node.status_mutex); ExecStatus status = node.status; if (status == ExecStatus::Finished) - return false; + return true; if (edge.backward) node.updated_output_ports.push_back(edge.output_port_number); @@ -243,14 +255,13 @@ bool PipelineExecutor::tryAddProcessorToStackIfUpdated(Edge & edge, Stack & stac if (status == ExecStatus::Idle) { node.status = ExecStatus::Preparing; - stack.push(edge.to); - return true; + return prepareProcessor(edge.to, thread_number, queue, std::move(lock)); } - return false; + return true; } -bool PipelineExecutor::prepareProcessor(UInt64 pid, Stack & children, Stack & parents, size_t thread_number, bool async) +bool PipelineExecutor::prepareProcessor(UInt64 pid, size_t thread_number, Queue & queue, std::unique_lock node_lock) { /// In this method we have ownership on node. auto & node = graph[pid]; @@ -264,14 +275,22 @@ bool PipelineExecutor::prepareProcessor(UInt64 pid, Stack & children, Stack & pa { /// Stopwatch watch; - std::lock_guard guard(node.status_mutex); + std::unique_lock lock(std::move(node_lock)); - auto status = node.processor->prepare(node.updated_input_ports, node.updated_output_ports); - node.updated_input_ports.clear(); - node.updated_output_ports.clear(); + try + { + node.last_processor_status = node.processor->prepare(node.updated_input_ports, node.updated_output_ports); + } + catch (...) + { + node.execution_state->exception = std::current_exception(); + return false; + } /// node.execution_state->preparation_time_ns += watch.elapsed(); - node.last_processor_status = status; + + node.updated_input_ports.clear(); + node.updated_output_ports.clear(); switch (node.last_processor_status) { @@ -291,7 +310,8 @@ bool PipelineExecutor::prepareProcessor(UInt64 pid, Stack & children, Stack & pa case IProcessor::Status::Ready: { node.status = ExecStatus::Executing; - return true; + queue.push(node.execution_state.get()); + break; } case IProcessor::Status::Async: { @@ -303,9 +323,7 @@ bool PipelineExecutor::prepareProcessor(UInt64 pid, Stack & children, Stack & pa } case IProcessor::Status::Wait: { - if (!async) - throw Exception("Processor returned status Wait before Async.", ErrorCodes::LOGICAL_ERROR); - break; + throw Exception("Wait is temporary not supported.", ErrorCodes::LOGICAL_ERROR); } case IProcessor::Status::ExpandPipeline: { @@ -337,18 +355,26 @@ bool PipelineExecutor::prepareProcessor(UInt64 pid, Stack & children, Stack & pa if (need_traverse) { - for (auto & edge : updated_back_edges) - tryAddProcessorToStackIfUpdated(*edge, parents); - for (auto & edge : updated_direct_edges) - tryAddProcessorToStackIfUpdated(*edge, children); + { + if (!tryAddProcessorToStackIfUpdated(*edge, queue, thread_number)) + return false; + } + + for (auto & edge : updated_back_edges) + { + if (!tryAddProcessorToStackIfUpdated(*edge, queue, thread_number)) + return false; + } } if (need_expand_pipeline) { + Stack stack; + executor_contexts[thread_number]->task_list.emplace_back( node.execution_state.get(), - &parents + &stack ); ExpandPipelineTask * desired = &executor_contexts[thread_number]->task_list.back(); @@ -356,20 +382,32 @@ bool PipelineExecutor::prepareProcessor(UInt64 pid, Stack & children, Stack & pa while (!expand_pipeline_task.compare_exchange_strong(expected, desired)) { - doExpandPipeline(expected, true); + if (!doExpandPipeline(expected, true)) + return false; + expected = nullptr; } - doExpandPipeline(desired, true); + if 
(!doExpandPipeline(desired, true)) + return false; /// Add itself back to be prepared again. - children.push(pid); + stack.push(pid); + + while (!stack.empty()) + { + auto item = stack.top(); + if (!prepareProcessor(item, thread_number, queue, std::unique_lock(graph[item].status_mutex))) + return false; + + stack.pop(); + } } - return false; + return true; } -void PipelineExecutor::doExpandPipeline(ExpandPipelineTask * task, bool processing) +bool PipelineExecutor::doExpandPipeline(ExpandPipelineTask * task, bool processing) { std::unique_lock lock(task->mutex); @@ -381,16 +419,20 @@ void PipelineExecutor::doExpandPipeline(ExpandPipelineTask * task, bool processi return task->num_waiting_processing_threads >= num_processing_executors || expand_pipeline_task != task; }); + bool result = true; + /// After condvar.wait() task may point to trash. Can change it only if it is still in expand_pipeline_task. if (expand_pipeline_task == task) { - expandPipeline(*task->stack, task->node_to_expand->processors_id); + result = expandPipeline(*task->stack, task->node_to_expand->processors_id); expand_pipeline_task = nullptr; lock.unlock(); task->condvar.notify_all(); } + + return result; } void PipelineExecutor::cancel() @@ -459,49 +501,31 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads #if !defined(__APPLE__) && !defined(__FreeBSD__) /// Specify CPU core for thread if can. /// It may reduce the number of context swithches. - cpu_set_t cpu_set; - CPU_ZERO(&cpu_set); - CPU_SET(thread_num, &cpu_set); - if (sched_setaffinity(0, sizeof(cpu_set_t), &cpu_set) == -1) - LOG_TRACE(log, "Cannot set affinity for thread " << num_threads); + /* + if (num_threads > 1) + { + cpu_set_t cpu_set; + CPU_ZERO(&cpu_set); + CPU_SET(thread_num, &cpu_set); + if (sched_setaffinity(0, sizeof(cpu_set_t), &cpu_set) == -1) + LOG_TRACE(log, "Cannot set affinity for thread " << num_threads); + } + */ #endif - UInt64 total_time_ns = 0; - UInt64 execution_time_ns = 0; - UInt64 processing_time_ns = 0; - UInt64 wait_time_ns = 0; +// UInt64 total_time_ns = 0; +// UInt64 execution_time_ns = 0; +// UInt64 processing_time_ns = 0; +// UInt64 wait_time_ns = 0; - Stopwatch total_time_watch; +// Stopwatch total_time_watch; ExecutionState * state = nullptr; - auto prepare_processor = [&](UInt64 pid, Stack & children, Stack & parents) + auto prepare_processor = [&](UInt64 pid, Queue & queue) { - try - { - return prepareProcessor(pid, children, parents, thread_num, false); - } - catch (...) 
- { - graph[pid].execution_state->exception = std::current_exception(); + if (!prepareProcessor(pid, thread_num, queue, std::unique_lock(graph[pid].status_mutex))) finish(); - } - - return false; - }; - - using Queue = std::queue; - - auto prepare_all_processors = [&](Queue & queue, Stack & stack, Stack & children, Stack & parents) - { - while (!stack.empty() && !finished) - { - auto current_processor = stack.top(); - stack.pop(); - - if (prepare_processor(current_processor, children, parents)) - queue.push(graph[current_processor].execution_state.get()); - } }; auto wake_up_executor = [&](size_t executor) @@ -511,63 +535,6 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads executor_contexts[executor]->condvar.notify_one(); }; - auto process_pinned_tasks = [&](Queue & queue) - { - Queue tmp_queue; - - struct PinnedTask - { - ExecutionState * task; - size_t thread_num; - }; - - std::stack pinned_tasks; - - while (!queue.empty()) - { - auto task = queue.front(); - queue.pop(); - - auto stream = task->processor->getStream(); - if (stream != IProcessor::NO_STREAM) - pinned_tasks.push({.task = task, .thread_num = stream % num_threads}); - else - tmp_queue.push(task); - } - - if (!pinned_tasks.empty()) - { - std::stack threads_to_wake; - - { - std::lock_guard lock(task_queue_mutex); - - while (!pinned_tasks.empty()) - { - auto & pinned_task = pinned_tasks.top(); - auto thread = pinned_task.thread_num; - - executor_contexts[thread]->pinned_tasks.push(pinned_task.task); - pinned_tasks.pop(); - - if (threads_queue.has(thread)) - { - threads_queue.pop(thread); - threads_to_wake.push(thread); - } - } - } - - while (!threads_to_wake.empty()) - { - wake_up_executor(threads_to_wake.top()); - threads_to_wake.pop(); - } - } - - queue.swap(tmp_queue); - }; - while (!finished) { /// First, find any processor to execute. @@ -577,20 +544,11 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads { std::unique_lock lock(task_queue_mutex); - if (!executor_contexts[thread_num]->pinned_tasks.empty()) - { - state = executor_contexts[thread_num]->pinned_tasks.front(); - executor_contexts[thread_num]->pinned_tasks.pop(); - - break; - } - if (!task_queue.empty()) { - state = task_queue.front(); - task_queue.pop(); + state = task_queue.pop(thread_num); - if (!task_queue.empty() && !threads_queue.empty()) + if (!task_queue.empty() && !threads_queue.empty() /*&& task_queue.quota() > threads_queue.size()*/) { auto thread_to_wake = threads_queue.pop_any(); lock.unlock(); @@ -648,8 +606,6 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads /// Try to execute neighbour processor. { - Stack children; - Stack parents; Queue queue; ++num_processing_executors; @@ -657,36 +613,16 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads doExpandPipeline(task, true); /// Execute again if can. - if (!prepare_processor(state->processors_id, children, parents)) - state = nullptr; - - /// Process all neighbours. Children will be on the top of stack, then parents. - prepare_all_processors(queue, children, children, parents); - process_pinned_tasks(queue); + prepare_processor(state->processors_id, queue); + state = nullptr; /// Take local task from queue if has one. - if (!state && !queue.empty()) + if (!queue.empty()) { state = queue.front(); queue.pop(); } - prepare_all_processors(queue, parents, parents, parents); - process_pinned_tasks(queue); - - /// Take pinned task if has one. 
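// Illustrative sketch (standalone, not part of the patch): with pinned tasks gone,
// an idle executor simply parks on its per-thread condition variable and a producer
// wakes exactly one parked thread, as wake_up_executor() does above. Minimal form of
// that handshake (hypothetical names):
#include <condition_variable>
#include <mutex>

struct ExecutorSlot
{
    std::mutex mutex;
    std::condition_variable condvar;
    bool wake_flag = false;
};

void wakeUp(ExecutorSlot & slot)
{
    std::lock_guard<std::mutex> guard(slot.mutex);
    slot.wake_flag = true;
    slot.condvar.notify_one();
}

void parkUntilWoken(ExecutorSlot & slot)
{
    std::unique_lock<std::mutex> lock(slot.mutex);
    slot.condvar.wait(lock, [&] { return slot.wake_flag; });
    slot.wake_flag = false; // consume the wake-up
}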
- { - std::lock_guard guard(task_queue_mutex); - if (!executor_contexts[thread_num]->pinned_tasks.empty()) - { - if (state) - queue.push(state); - - state = executor_contexts[thread_num]->pinned_tasks.front(); - executor_contexts[thread_num]->pinned_tasks.pop(); - } - } - /// Push other tasks to global queue. if (!queue.empty()) { @@ -694,14 +630,15 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads while (!queue.empty() && !finished) { - task_queue.push(queue.front()); + task_queue.push(queue.front(), thread_num); queue.pop(); } - if (!threads_queue.empty()) + if (!threads_queue.empty() /* && task_queue.quota() > threads_queue.size()*/) { auto thread_to_wake = threads_queue.pop_any(); lock.unlock(); + wake_up_executor(thread_to_wake); } } @@ -715,14 +652,15 @@ void PipelineExecutor::executeSingleThread(size_t thread_num, size_t num_threads } } - total_time_ns = total_time_watch.elapsed(); - wait_time_ns = total_time_ns - execution_time_ns - processing_time_ns; - +// total_time_ns = total_time_watch.elapsed(); +// wait_time_ns = total_time_ns - execution_time_ns - processing_time_ns; +/* LOG_TRACE(log, "Thread finished." << " Total time: " << (total_time_ns / 1e9) << " sec." << " Execution time: " << (execution_time_ns / 1e9) << " sec." << " Processing time: " << (processing_time_ns / 1e9) << " sec." << " Wait time: " << (wait_time_ns / 1e9) << "sec."); +*/ } void PipelineExecutor::executeImpl(size_t num_threads) @@ -730,6 +668,7 @@ void PipelineExecutor::executeImpl(size_t num_threads) Stack stack; threads_queue.init(num_threads); + task_queue.init(num_threads); { std::lock_guard guard(executor_contexts_mutex); @@ -763,42 +702,57 @@ void PipelineExecutor::executeImpl(size_t num_threads) { std::lock_guard lock(task_queue_mutex); + Queue queue; + size_t next_thread = 0; + while (!stack.empty()) { UInt64 proc = stack.top(); stack.pop(); - if (prepareProcessor(proc, stack, stack, 0, false)) + prepareProcessor(proc, 0, queue, std::unique_lock(graph[proc].status_mutex)); + + while (!queue.empty()) { - auto cur_state = graph[proc].execution_state.get(); - task_queue.push(cur_state); + task_queue.push(queue.front(), next_thread); + queue.pop(); + + ++next_thread; + if (next_thread >= num_threads) + next_thread = 0; } } } - for (size_t i = 0; i < num_threads; ++i) + if (num_threads > 1) { - threads.emplace_back([this, thread_group, thread_num = i, num_threads] + + for (size_t i = 0; i < num_threads; ++i) { - /// ThreadStatus thread_status; + threads.emplace_back([this, thread_group, thread_num = i, num_threads] + { + /// ThreadStatus thread_status; - setThreadName("QueryPipelineEx"); + setThreadName("QueryPipelineEx"); - if (thread_group) - CurrentThread::attachTo(thread_group); + if (thread_group) + CurrentThread::attachTo(thread_group); - SCOPE_EXIT( - if (thread_group) - CurrentThread::detachQueryIfNotDetached(); - ); + SCOPE_EXIT( + if (thread_group) + CurrentThread::detachQueryIfNotDetached(); + ); - executeSingleThread(thread_num, num_threads); - }); + executeSingleThread(thread_num, num_threads); + }); + } + + for (auto & thread : threads) + if (thread.joinable()) + thread.join(); } - - for (auto & thread : threads) - if (thread.joinable()) - thread.join(); + else + executeSingleThread(0, num_threads); finished_flag = true; } diff --git a/dbms/src/Processors/Executors/PipelineExecutor.h b/dbms/src/Processors/Executors/PipelineExecutor.h index aded3de3008..2231c19284b 100644 --- a/dbms/src/Processors/Executors/PipelineExecutor.h +++ 
b/dbms/src/Processors/Executors/PipelineExecutor.h @@ -84,6 +84,7 @@ private: IProcessor * processor = nullptr; UInt64 processors_id = 0; + bool has_quota = false; /// Counters for profiling. size_t num_executed_jobs = 0; @@ -117,6 +118,7 @@ private: execution_state = std::make_unique<ExecutionState>(); execution_state->processor = processor; execution_state->processors_id = processor_id; + execution_state->has_quota = processor->hasQuota(); } Node(Node && other) noexcept @@ -132,7 +134,59 @@ private: using Stack = std::stack<UInt64>; - using TaskQueue = std::queue<ExecutionState *>; + class TaskQueue + { + public: + void init(size_t num_threads) { queues.resize(num_threads); } + + void push(ExecutionState * state, size_t thread_num) + { + queues[thread_num].push(state); + + ++size_; + + if (state->has_quota) + ++quota_; + } + + ExecutionState * pop(size_t thread_num) + { + if (size_ == 0) + throw Exception("TaskQueue is empty.", ErrorCodes::LOGICAL_ERROR); + + for (size_t i = 0; i < queues.size(); ++i) + { + if (!queues[thread_num].empty()) + { + ExecutionState * state = queues[thread_num].front(); + queues[thread_num].pop(); + + --size_; + + if (state->has_quota) + --quota_; + + return state; + } + + ++thread_num; + if (thread_num >= queues.size()) + thread_num = 0; + } + + throw Exception("TaskQueue is inconsistent: size is non-zero, but all queues are empty.", ErrorCodes::LOGICAL_ERROR); + } + + size_t size() const { return size_; } + bool empty() const { return size_ == 0; } + size_t quota() const { return quota_; } + + private: + using Queue = std::queue<ExecutionState *>; + std::vector<Queue> queues; + size_t size_ = 0; + size_t quota_ = 0; + }; /// Queue with pointers to tasks. Each thread will concurrently read from it until finished flag is set. /// Stores processors that need to be prepared. Preparing status is already set for them. @@ -173,7 +227,7 @@ private: std::mutex mutex; bool wake_flag = false; - std::queue<ExecutionState *> pinned_tasks; + /// std::queue<ExecutionState *> pinned_tasks; }; std::vector<std::unique_ptr<ExecutorContext>> executor_contexts; @@ -186,19 +240,21 @@ private: /// Graph related methods. bool addEdges(UInt64 node); void buildGraph(); - void expandPipeline(Stack & stack, UInt64 pid); + bool expandPipeline(Stack & stack, UInt64 pid); + + using Queue = std::queue<ExecutionState *>; /// Pipeline execution related methods. void addChildlessProcessorsToStack(Stack & stack); - bool tryAddProcessorToStackIfUpdated(Edge & edge, Stack & stack); + bool tryAddProcessorToStackIfUpdated(Edge & edge, Queue & queue, size_t thread_number); static void addJob(ExecutionState * execution_state); // TODO: void addAsyncJob(UInt64 pid); /// Prepare processor with pid number. /// Check parents and children of current processor and push them to stacks if they also need to be prepared. /// If processor wants to be expanded, ExpandPipelineTask from thread_number's execution context will be used.
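// Note on the TaskQueue above: pop(thread_num) drains the calling thread's own queue
// first and only then scans the other threads' queues round-robin, so tasks tend to
// stay on the thread that produced them while idle threads can still steal work.
// Hypothetical usage:
//
//     TaskQueue task_queue;
//     task_queue.init(num_threads);
//     task_queue.push(state, /* thread_num = */ 0); // produced by thread 0
//     ExecutionState * next = task_queue.pop(1);    // thread 1 steals it if idle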
- bool prepareProcessor(UInt64 pid, Stack & children, Stack & parents, size_t thread_number, bool async); - void doExpandPipeline(ExpandPipelineTask * task, bool processing); + bool prepareProcessor(UInt64 pid, size_t thread_number, Queue & queue, std::unique_lock node_lock); + bool doExpandPipeline(ExpandPipelineTask * task, bool processing); void executeImpl(size_t num_threads); void executeSingleThread(size_t thread_num, size_t num_threads); diff --git a/dbms/src/Processors/IProcessor.h b/dbms/src/Processors/IProcessor.h index 852bde2d467..5296f36de87 100644 --- a/dbms/src/Processors/IProcessor.h +++ b/dbms/src/Processors/IProcessor.h @@ -272,12 +272,17 @@ public: size_t getStream() const { return stream_number; } constexpr static size_t NO_STREAM = std::numeric_limits::max(); + void enableQuota() { has_quota = true; } + bool hasQuota() const { return has_quota; } + private: std::atomic is_cancelled{false}; std::string processor_description; size_t stream_number = NO_STREAM; + + bool has_quota = false; }; diff --git a/dbms/src/Processors/Pipe.cpp b/dbms/src/Processors/Pipe.cpp index 17b44a48ea1..27daadac8a0 100644 --- a/dbms/src/Processors/Pipe.cpp +++ b/dbms/src/Processors/Pipe.cpp @@ -115,4 +115,13 @@ void Pipe::pinSources(size_t executor_number) } } +void Pipe::enableQuota() +{ + for (auto & processor : processors) + { + if (auto * source = dynamic_cast(processor.get())) + source->enableQuota(); + } +} + } diff --git a/dbms/src/Processors/Pipe.h b/dbms/src/Processors/Pipe.h index d734c89f485..3d121d3b2e3 100644 --- a/dbms/src/Processors/Pipe.h +++ b/dbms/src/Processors/Pipe.h @@ -42,6 +42,8 @@ public: /// Set information about preferred executor number for sources. void pinSources(size_t executor_number); + void enableQuota(); + void setTotalsPort(OutputPort * totals_) { totals = totals_; } OutputPort * getTotalsPort() const { return totals; } diff --git a/dbms/src/Processors/QueryPipeline.cpp b/dbms/src/Processors/QueryPipeline.cpp index fd75d7f57cf..13e91ac718d 100644 --- a/dbms/src/Processors/QueryPipeline.cpp +++ b/dbms/src/Processors/QueryPipeline.cpp @@ -278,6 +278,12 @@ void QueryPipeline::resize(size_t num_streams, bool force) processors.emplace_back(std::move(resize)); } +void QueryPipeline::enableQuotaForCurrentStreams() +{ + for (auto & stream : streams) + stream->getProcessor().enableQuota(); +} + void QueryPipeline::addTotalsHavingTransform(ProcessorPtr transform) { checkInitialized(); @@ -490,6 +496,8 @@ void QueryPipeline::unitePipelines( table_locks.insert(table_locks.end(), std::make_move_iterator(pipeline.table_locks.begin()), std::make_move_iterator(pipeline.table_locks.end())); interpreter_context.insert(interpreter_context.end(), pipeline.interpreter_context.begin(), pipeline.interpreter_context.end()); storage_holder.insert(storage_holder.end(), pipeline.storage_holder.begin(), pipeline.storage_holder.end()); + + max_threads = std::max(max_threads, pipeline.max_threads); } if (!extremes.empty()) diff --git a/dbms/src/Processors/QueryPipeline.h b/dbms/src/Processors/QueryPipeline.h index e32ed6a0abe..c27e570018f 100644 --- a/dbms/src/Processors/QueryPipeline.h +++ b/dbms/src/Processors/QueryPipeline.h @@ -63,6 +63,8 @@ public: void resize(size_t num_streams, bool force = false); + void enableQuotaForCurrentStreams(); + void unitePipelines(std::vector && pipelines, const Block & common_header, const Context & context); PipelineExecutorPtr execute(); diff --git a/dbms/src/Processors/Transforms/AggregatingTransform.cpp 
b/dbms/src/Processors/Transforms/AggregatingTransform.cpp index 72a5ff3bb7c..7a5bf77da68 100644 --- a/dbms/src/Processors/Transforms/AggregatingTransform.cpp +++ b/dbms/src/Processors/Transforms/AggregatingTransform.cpp @@ -72,15 +72,33 @@ namespace class ConvertingAggregatedToChunksSource : public ISource { public: + static constexpr UInt32 NUM_BUCKETS = 256; + + struct SharedData + { + std::atomic next_bucket_to_merge = 0; + std::array, NUM_BUCKETS> source_for_bucket; + + SharedData() + { + for (auto & source : source_for_bucket) + source = -1; + } + }; + + using SharedDataPtr = std::shared_ptr; + ConvertingAggregatedToChunksSource( AggregatingTransformParamsPtr params_, ManyAggregatedDataVariantsPtr data_, - Arena * arena_, - std::shared_ptr> next_bucket_to_merge_) + SharedDataPtr shared_data_, + Int32 source_number_, + Arena * arena_) : ISource(params_->getHeader()) , params(std::move(params_)) , data(std::move(data_)) - , next_bucket_to_merge(std::move(next_bucket_to_merge_)) + , shared_data(std::move(shared_data_)) + , source_number(source_number_) , arena(arena_) {} @@ -89,23 +107,25 @@ public: protected: Chunk generate() override { - UInt32 bucket_num = next_bucket_to_merge->fetch_add(1); + UInt32 bucket_num = shared_data->next_bucket_to_merge.fetch_add(1); if (bucket_num >= NUM_BUCKETS) return {}; Block block = params->aggregator.mergeAndConvertOneBucketToBlock(*data, arena, params->final, bucket_num); + Chunk chunk = convertToChunk(block); - return convertToChunk(block); + shared_data->source_for_bucket[bucket_num] = source_number; + + return chunk; } private: AggregatingTransformParamsPtr params; ManyAggregatedDataVariantsPtr data; - std::shared_ptr> next_bucket_to_merge; + SharedDataPtr shared_data; + Int32 source_number; Arena * arena; - - static constexpr UInt32 NUM_BUCKETS = 256; }; /// Generates chunks with aggregated data. @@ -159,6 +179,7 @@ public: auto & out = source->getOutputs().front(); inputs.emplace_back(out.getHeader(), this); connect(out, inputs.back()); + inputs.back().setNeeded(); } return std::move(processors); @@ -200,7 +221,7 @@ public: return Status::Ready; /// Two-level case. - return preparePullFromInputs(); + return prepareTwoLevel(); } private: @@ -220,38 +241,37 @@ private: } /// Read all sources and try to push current bucket. - IProcessor::Status preparePullFromInputs() + IProcessor::Status prepareTwoLevel() { - bool all_inputs_are_finished = true; + auto & output = outputs.front(); - for (auto & input : inputs) + Int32 next_input_num = shared_data->source_for_bucket[current_bucket_num]; + if (next_input_num < 0) + return Status::NeedData; + + auto next_input = std::next(inputs.begin(), next_input_num); + /// next_input can't be finished till data was not pulled. + if (!next_input->hasData()) + return Status::NeedData; + + output.push(next_input->pull()); + + ++current_bucket_num; + if (current_bucket_num == NUM_BUCKETS) { - if (input.isFinished()) - continue; - - all_inputs_are_finished = false; - - input.setNeeded(); - - if (input.hasData()) - ready_chunks.emplace_back(input.pull()); + output.finish(); + /// Do not close inputs, they must be finished. 
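// Illustrative sketch (standalone, not part of the patch): the sources above claim
// bucket numbers through one shared atomic counter and publish which source produced
// each bucket, so prepareTwoLevel() can pull buckets strictly in order 0..255.
// A minimal model of that coordination (hypothetical names):
#include <array>
#include <atomic>
#include <cstdint>

constexpr uint32_t NUM_BUCKETS_SKETCH = 256;

struct SharedSketch
{
    std::atomic<uint32_t> next_bucket{0};
    std::array<std::atomic<int32_t>, NUM_BUCKETS_SKETCH> source_for_bucket{};
};

void initShared(SharedSketch & shared)
{
    for (auto & source : shared.source_for_bucket)
        source = -1; // -1 means "bucket not merged yet", as in SharedData()
}

// Each worker source runs this loop; merge_bucket stands in for the real merge step.
template <typename MergeFn>
void workerLoop(SharedSketch & shared, int32_t my_source_number, MergeFn merge_bucket)
{
    while (true)
    {
        uint32_t bucket = shared.next_bucket.fetch_add(1);
        if (bucket >= NUM_BUCKETS_SKETCH)
            return;
        merge_bucket(bucket);
        shared.source_for_bucket[bucket] = my_source_number; // publish readiness last
    }
}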
+ return Status::Finished; } - moveReadyChunksToMap(); - - if (trySetCurrentChunkFromCurrentBucket()) - return preparePushToOutput(); - - if (all_inputs_are_finished) - throw Exception("All sources have finished before getting enough data in " - "ConvertingAggregatedToChunksTransform.", ErrorCodes::LOGICAL_ERROR); - - return Status::NeedData; + return Status::PortFull; } private: AggregatingTransformParamsPtr params; ManyAggregatedDataVariantsPtr data; + ConvertingAggregatedToChunksSource::SharedDataPtr shared_data; + size_t num_threads; bool is_initialized = false; @@ -259,49 +279,12 @@ private: bool finished = false; Chunk current_chunk; - Chunks ready_chunks; UInt32 current_bucket_num = 0; static constexpr Int32 NUM_BUCKETS = 256; - std::map bucket_to_chunk; Processors processors; - static Int32 getBucketFromChunk(const Chunk & chunk) - { - auto & info = chunk.getChunkInfo(); - if (!info) - throw Exception("Chunk info was not set for chunk in " - "ConvertingAggregatedToChunksTransform.", ErrorCodes::LOGICAL_ERROR); - - auto * agg_info = typeid_cast(info.get()); - if (!agg_info) - throw Exception("Chunk should have AggregatedChunkInfo in " - "ConvertingAggregatedToChunksTransform.", ErrorCodes::LOGICAL_ERROR); - - return agg_info->bucket_num; - } - - void moveReadyChunksToMap() - { - for (auto & chunk : ready_chunks) - { - auto bucket = getBucketFromChunk(chunk); - - if (bucket < 0 || bucket >= NUM_BUCKETS) - throw Exception("Invalid bucket number " + toString(bucket) + " in " - "ConvertingAggregatedToChunksTransform.", ErrorCodes::LOGICAL_ERROR); - - if (bucket_to_chunk.count(bucket)) - throw Exception("Found several chunks with the same bucket number in " - "ConvertingAggregatedToChunksTransform.", ErrorCodes::LOGICAL_ERROR); - - bucket_to_chunk[bucket] = std::move(chunk); - } - - ready_chunks.clear(); - } - void setCurrentChunk(Chunk chunk) { if (has_input) @@ -366,34 +349,17 @@ private: void createSources() { AggregatedDataVariantsPtr & first = data->at(0); - auto next_bucket_to_merge = std::make_shared>(0); + shared_data = std::make_shared(); for (size_t thread = 0; thread < num_threads; ++thread) { Arena * arena = first->aggregates_pools.at(thread).get(); auto source = std::make_shared( - params, data, arena, next_bucket_to_merge); + params, data, shared_data, thread, arena); processors.emplace_back(std::move(source)); } } - - bool trySetCurrentChunkFromCurrentBucket() - { - auto it = bucket_to_chunk.find(current_bucket_num); - if (it != bucket_to_chunk.end()) - { - setCurrentChunk(std::move(it->second)); - ++current_bucket_num; - - if (current_bucket_num == NUM_BUCKETS) - finished = true; - - return true; - } - - return false; - } }; AggregatingTransform::AggregatingTransform(Block header, AggregatingTransformParamsPtr params_) diff --git a/dbms/src/Processors/Transforms/MergeSortingTransform.cpp b/dbms/src/Processors/Transforms/MergeSortingTransform.cpp index 83d80d42e05..39da24ba149 100644 --- a/dbms/src/Processors/Transforms/MergeSortingTransform.cpp +++ b/dbms/src/Processors/Transforms/MergeSortingTransform.cpp @@ -1,11 +1,10 @@ -#include #include #include #include -#include #include -#include #include +#include +#include #include #include #include @@ -21,6 +20,13 @@ namespace ProfileEvents namespace DB { +namespace ErrorCodes +{ + extern const int NOT_ENOUGH_SPACE; +} +class MergeSorter; + + class BufferingToFileTransform : public IAccumulatingTransform { public: diff --git a/dbms/src/Processors/Transforms/MergeSortingTransform.h 
b/dbms/src/Processors/Transforms/MergeSortingTransform.h index ee51f29565a..ecfaeb4f272 100644 --- a/dbms/src/Processors/Transforms/MergeSortingTransform.h +++ b/dbms/src/Processors/Transforms/MergeSortingTransform.h @@ -1,25 +1,14 @@ #pragma once + #include #include #include -#include -#include -#include -#include - #include -#include namespace DB { -namespace ErrorCodes -{ - extern const int NOT_ENOUGH_SPACE; -} -class MergeSorter; - class MergeSortingTransform : public SortingTransform { public: diff --git a/dbms/src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp b/dbms/src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp index d9b5fbc330e..42a94ea0bf9 100644 --- a/dbms/src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp +++ b/dbms/src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp @@ -429,21 +429,30 @@ IProcessor::Status SortingAggregatedTransform::prepare() continue; } - all_finished = false; + //all_finished = false; in->setNeeded(); if (!in->hasData()) { need_data = true; + all_finished = false; continue; } auto chunk = in->pull(); - /// If chunk was pulled, then we need data from this port. - need_data = true; - addChunk(std::move(chunk), input_num); + + if (in->isFinished()) + { + is_input_finished[input_num] = true; + } + else + { + /// If chunk was pulled, then we need data from this port. + need_data = true; + all_finished = false; + } } if (pushed_to_output) diff --git a/dbms/src/Processors/Transforms/MergingSortedTransform.cpp b/dbms/src/Processors/Transforms/MergingSortedTransform.cpp index 705116ca081..ddbd91b38d1 100644 --- a/dbms/src/Processors/Transforms/MergingSortedTransform.cpp +++ b/dbms/src/Processors/Transforms/MergingSortedTransform.cpp @@ -148,9 +148,9 @@ IProcessor::Status MergingSortedTransform::prepare() return Status::NeedData; if (has_collation) - initQueue(queue_with_collation); + queue_with_collation = SortingHeap(cursors); else - initQueue(queue_without_collation); + queue_without_collation = SortingHeap(cursors); is_initialized = true; return Status::Ready; @@ -169,7 +169,6 @@ IProcessor::Status MergingSortedTransform::prepare() if (need_data) { - auto & input = *std::next(inputs.begin(), next_input_to_read); if (!input.isFinished()) { @@ -183,7 +182,11 @@ IProcessor::Status MergingSortedTransform::prepare() return Status::NeedData; updateCursor(std::move(chunk), next_input_to_read); - pushToQueue(next_input_to_read); + + if (has_collation) + queue_with_collation.push(cursors[next_input_to_read]); + else + queue_without_collation.push(cursors[next_input_to_read]); } need_data = false; @@ -201,8 +204,8 @@ void MergingSortedTransform::work() merge(queue_without_collation); } -template -void MergingSortedTransform::merge(std::priority_queue & queue) +template +void MergingSortedTransform::merge(TSortingHeap & queue) { /// Returns MergeStatus which we should return if we are going to finish now. auto can_read_another_row = [&, this]() @@ -224,77 +227,66 @@ void MergingSortedTransform::merge(std::priority_queue & queue) }; /// Take rows in required order and put them into `merged_data`, while the rows are no more than `max_block_size` - while (!queue.empty()) + while (queue.isValid()) { /// Shouldn't happen at first iteration, but check just in case. 
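// Illustrative sketch (standalone, not part of the patch): the rewrite replaces
// std::priority_queue's copy-top/pop/maybe-push-back cycle with a SortingHeap-style
// interface that inspects and removes the top element in place: isValid() checks
// emptiness, current() peeks, removeTop() drops an exhausted cursor (the real
// next() additionally advances the top cursor one row and restores the heap
// property without a pop/push pair). A simplified int version:
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

class MinHeapSketch
{
public:
    explicit MinHeapSketch(std::vector<int> values) : data(std::move(values))
    {
        std::make_heap(data.begin(), data.end(), std::greater<>());
    }

    bool isValid() const { return !data.empty(); }
    int current() const { return data.front(); }

    void removeTop()
    {
        std::pop_heap(data.begin(), data.end(), std::greater<>());
        data.pop_back();
    }

private:
    std::vector<int> data;
};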
if (!can_read_another_row()) return; - TSortCursor current = queue.top(); - queue.pop(); - bool first_iteration = true; + auto current = queue.current(); - while (true) + /** And what if the block is totally less or equal than the rest for the current cursor? + * Or is there only one data source left in the queue? Then you can take the entire block on current cursor. + */ + if (current.impl->isFirst() + && (queue.size() == 1 + || (queue.size() >= 2 && current.totallyLessOrEquals(queue.nextChild())))) { - if (!first_iteration && !can_read_another_row()) + //std::cerr << "current block is totally less or equals\n"; + + /// If there are already data in the current block, we first return it. We'll get here again the next time we call the merge function. + if (merged_data.mergedRows() != 0) { - queue.push(current); - return; - } - first_iteration = false; - - /** And what if the block is totally less or equal than the rest for the current cursor? - * Or is there only one data source left in the queue? Then you can take the entire block on current cursor. - */ - if (current.impl->isFirst() && (queue.empty() || current.totallyLessOrEquals(queue.top()))) - { - //std::cerr << "current block is totally less or equals\n"; - - /// If there are already data in the current block, we first return it. We'll get here again the next time we call the merge function. - if (merged_data.mergedRows() != 0) - { - //std::cerr << "merged rows is non-zero\n"; - queue.push(current); - return; - } - - /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) - size_t source_num = current.impl->order; - insertFromChunk(source_num); + //std::cerr << "merged rows is non-zero\n"; return; } - //std::cerr << "total_merged_rows: " << total_merged_rows << ", merged_rows: " << merged_rows << "\n"; - //std::cerr << "Inserting row\n"; - merged_data.insertRow(current->all_columns, current->pos); + /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) + size_t source_num = current.impl->order; + insertFromChunk(source_num); + queue.removeTop(); + return; + } - if (out_row_sources_buf) - { - /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) - RowSourcePart row_source(current.impl->order); - out_row_sources_buf->write(row_source.data); - } + //std::cerr << "total_merged_rows: " << total_merged_rows << ", merged_rows: " << merged_rows << "\n"; + //std::cerr << "Inserting row\n"; + merged_data.insertRow(current->all_columns, current->pos); - if (current->isLast()) - { - need_data = true; - next_input_to_read = current.impl->order; + if (out_row_sources_buf) + { + /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) + RowSourcePart row_source(current.impl->order); + out_row_sources_buf->write(row_source.data); + } - if (limit && merged_data.totalMergedRows() >= limit) - is_finished = true; + if (!current->isLast()) + { +// std::cerr << "moving to next row\n"; + queue.next(); + } + else + { + /// We will get the next block from the corresponding source, if there is one. 
+ queue.removeTop(); - return; - } +// std::cerr << "It was last row, fetching next block\n"; + need_data = true; + next_input_to_read = current.impl->order; - //std::cerr << "moving to next row\n"; - current->next(); + if (limit && merged_data.totalMergedRows() >= limit) + is_finished = true; - if (!queue.empty() && current.greater(queue.top())) - { - //std::cerr << "next row is not least, pushing back to queue\n"; - queue.push(current); - break; - } + return; } } is_finished = true; diff --git a/dbms/src/Processors/Transforms/MergingSortedTransform.h b/dbms/src/Processors/Transforms/MergingSortedTransform.h index b32dd076c5f..aa88fb09623 100644 --- a/dbms/src/Processors/Transforms/MergingSortedTransform.h +++ b/dbms/src/Processors/Transforms/MergingSortedTransform.h @@ -1,10 +1,10 @@ #pragma once + #include #include #include #include -#include namespace DB { @@ -111,14 +111,10 @@ protected: /// Chunks currently being merged. std::vector source_chunks; - using CursorImpls = std::vector; - CursorImpls cursors; + SortCursorImpls cursors; - using Queue = std::priority_queue; - Queue queue_without_collation; - - using QueueWithCollation = std::priority_queue; - QueueWithCollation queue_with_collation; + SortingHeap queue_without_collation; + SortingHeap queue_with_collation; private: @@ -128,8 +124,8 @@ private: bool need_data = false; size_t next_input_to_read = 0; - template - void merge(std::priority_queue & queue); + template + void merge(TSortingHeap & queue); void insertFromChunk(size_t source_num); @@ -159,22 +155,6 @@ private: shared_chunk_ptr->all_columns = cursors[source_num].all_columns; shared_chunk_ptr->sort_columns = cursors[source_num].sort_columns; } - - void pushToQueue(size_t source_num) - { - if (has_collation) - queue_with_collation.push(SortCursorWithCollation(&cursors[source_num])); - else - queue_without_collation.push(SortCursor(&cursors[source_num])); - } - - template - void initQueue(std::priority_queue & queue) - { - for (auto & cursor : cursors) - if (!cursor.empty()) - queue.push(TSortCursor(&cursor)); - } }; } diff --git a/dbms/src/Processors/Transforms/SortingTransform.cpp b/dbms/src/Processors/Transforms/SortingTransform.cpp index ab87591c0d6..30f53742ec0 100644 --- a/dbms/src/Processors/Transforms/SortingTransform.cpp +++ b/dbms/src/Processors/Transforms/SortingTransform.cpp @@ -40,16 +40,12 @@ MergeSorter::MergeSorter(Chunks chunks_, SortDescription & description_, size_t chunks.swap(nonempty_chunks); - if (!has_collation) - { - for (auto & cursor : cursors) - queue_without_collation.push(SortCursor(&cursor)); - } + if (has_collation) + queue_with_collation = SortingHeap(cursors); + else if (description.size() > 1) + queue_without_collation = SortingHeap(cursors); else - { - for (auto & cursor : cursors) - queue_with_collation.push(SortCursorWithCollation(&cursor)); - } + queue_simple = SortingHeap(cursors); } @@ -65,50 +61,61 @@ Chunk MergeSorter::read() return res; } - return !has_collation - ? 
mergeImpl(queue_without_collation) - : mergeImpl(queue_with_collation); + if (has_collation) + return mergeImpl(queue_with_collation); + else if (description.size() > 1) + return mergeImpl(queue_without_collation); + else + return mergeImpl(queue_simple); } -template -Chunk MergeSorter::mergeImpl(std::priority_queue & queue) +template +Chunk MergeSorter::mergeImpl(TSortingHeap & queue) { size_t num_columns = chunks[0].getNumColumns(); - MutableColumns merged_columns = chunks[0].cloneEmptyColumns(); - /// TODO: reserve (in each column) + + /// Reserve + if (queue.isValid()) + { + /// The expected size of output block is the same as input block + size_t size_to_reserve = chunks[0].getNumRows(); + for (auto & column : merged_columns) + column->reserve(size_to_reserve); + } + + /// TODO: Optimization when a single block left. /// Take rows from queue in right order and push to 'merged'. size_t merged_rows = 0; - while (!queue.empty()) + while (queue.isValid()) { - TSortCursor current = queue.top(); - queue.pop(); + auto current = queue.current(); + /// Append a row from queue. for (size_t i = 0; i < num_columns; ++i) merged_columns[i]->insertFrom(*current->all_columns[i], current->pos); ++total_merged_rows; ++merged_rows; - if (!current->isLast()) - { - current->next(); - queue.push(current); - } - + /// We don't need more rows because of limit has reached. if (limit && total_merged_rows == limit) { chunks.clear(); - return Chunk(std::move(merged_columns), merged_rows); + break; } + queue.next(); + + /// It's enough for current output block but we will continue. if (merged_rows == max_merged_block_size) - return Chunk(std::move(merged_columns), merged_rows); + break; } - chunks.clear(); + if (!queue.isValid()) + chunks.clear(); if (merged_rows == 0) return {}; diff --git a/dbms/src/Processors/Transforms/SortingTransform.h b/dbms/src/Processors/Transforms/SortingTransform.h index 2703501c81a..49bdf303c7f 100644 --- a/dbms/src/Processors/Transforms/SortingTransform.h +++ b/dbms/src/Processors/Transforms/SortingTransform.h @@ -1,10 +1,10 @@ #pragma once + #include #include #include #include #include -#include namespace DB @@ -27,19 +27,19 @@ private: UInt64 limit; size_t total_merged_rows = 0; - using CursorImpls = std::vector; - CursorImpls cursors; + SortCursorImpls cursors; bool has_collation = false; - std::priority_queue queue_without_collation; - std::priority_queue queue_with_collation; + SortingHeap queue_without_collation; + SortingHeap queue_simple; + SortingHeap queue_with_collation; /** Two different cursors are supported - with and without Collation. * Templates are used (instead of virtual functions in SortCursor) for zero-overhead. 
*/ - template - Chunk mergeImpl(std::priority_queue & queue); + template + Chunk mergeImpl(TSortingHeap & queue); }; diff --git a/dbms/src/Storages/AlterCommands.cpp b/dbms/src/Storages/AlterCommands.cpp index 217f7787d75..c586ea54c98 100644 --- a/dbms/src/Storages/AlterCommands.cpp +++ b/dbms/src/Storages/AlterCommands.cpp @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -213,9 +214,7 @@ std::optional AlterCommand::parse(const ASTAlterCommand * command_ } -void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescription & indices_description, - ConstraintsDescription & constraints_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast, - ASTPtr & ttl_table_ast, SettingsChanges & changes) const +void AlterCommand::apply(StorageInMemoryMetadata & metadata) const { if (type == ADD_COLUMN) { @@ -231,18 +230,18 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri column.codec = codec; column.ttl = ttl; - columns_description.add(column, after_column); + metadata.columns.add(column, after_column); /// Slow, because each time a list is copied - columns_description.flattenNested(); + metadata.columns.flattenNested(); } else if (type == DROP_COLUMN) { - columns_description.remove(column_name); + metadata.columns.remove(column_name); } else if (type == MODIFY_COLUMN) { - columns_description.modify(column_name, [&](ColumnDescription & column) + metadata.columns.modify(column_name, [&](ColumnDescription & column) { if (codec) { @@ -273,24 +272,24 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri } else if (type == MODIFY_ORDER_BY) { - if (!primary_key_ast && order_by_ast) + if (!metadata.primary_key_ast && metadata.order_by_ast) { /// Primary and sorting key become independent after this ALTER so we have to /// save the old ORDER BY expression as the new primary key. - primary_key_ast = order_by_ast->clone(); + metadata.primary_key_ast = metadata.order_by_ast->clone(); } - order_by_ast = order_by; + metadata.order_by_ast = order_by; } else if (type == COMMENT_COLUMN) { - columns_description.modify(column_name, [&](ColumnDescription & column) { column.comment = *comment; }); + metadata.columns.modify(column_name, [&](ColumnDescription & column) { column.comment = *comment; }); } else if (type == ADD_INDEX) { if (std::any_of( - indices_description.indices.cbegin(), - indices_description.indices.cend(), + metadata.indices.indices.cbegin(), + metadata.indices.indices.cend(), [this](const ASTPtr & index_ast) { return index_ast->as().name == index_name; @@ -303,38 +302,38 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri ErrorCodes::ILLEGAL_COLUMN}; } - auto insert_it = indices_description.indices.end(); + auto insert_it = metadata.indices.indices.end(); if (!after_index_name.empty()) { insert_it = std::find_if( - indices_description.indices.begin(), - indices_description.indices.end(), + metadata.indices.indices.begin(), + metadata.indices.indices.end(), [this](const ASTPtr & index_ast) { return index_ast->as().name == after_index_name; }); - if (insert_it == indices_description.indices.end()) + if (insert_it == metadata.indices.indices.end()) throw Exception("Wrong index name. 
Cannot find index " + backQuote(after_index_name) + " to insert after.", ErrorCodes::LOGICAL_ERROR); ++insert_it; } - indices_description.indices.emplace(insert_it, std::dynamic_pointer_cast(index_decl)); + metadata.indices.indices.emplace(insert_it, std::dynamic_pointer_cast(index_decl)); } else if (type == DROP_INDEX) { auto erase_it = std::find_if( - indices_description.indices.begin(), - indices_description.indices.end(), + metadata.indices.indices.begin(), + metadata.indices.indices.end(), [this](const ASTPtr & index_ast) { return index_ast->as().name == index_name; }); - if (erase_it == indices_description.indices.end()) + if (erase_it == metadata.indices.indices.end()) { if (if_exists) return; @@ -342,13 +341,13 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri ErrorCodes::LOGICAL_ERROR); } - indices_description.indices.erase(erase_it); + metadata.indices.indices.erase(erase_it); } else if (type == ADD_CONSTRAINT) { if (std::any_of( - constraints_description.constraints.cbegin(), - constraints_description.constraints.cend(), + metadata.constraints.constraints.cbegin(), + metadata.constraints.constraints.cend(), [this](const ASTPtr & constraint_ast) { return constraint_ast->as().name == constraint_name; @@ -360,36 +359,46 @@ void AlterCommand::apply(ColumnsDescription & columns_description, IndicesDescri ErrorCodes::ILLEGAL_COLUMN); } - auto insert_it = constraints_description.constraints.end(); + auto insert_it = metadata.constraints.constraints.end(); - constraints_description.constraints.emplace(insert_it, std::dynamic_pointer_cast(constraint_decl)); + metadata.constraints.constraints.emplace(insert_it, std::dynamic_pointer_cast(constraint_decl)); } else if (type == DROP_CONSTRAINT) { auto erase_it = std::find_if( - constraints_description.constraints.begin(), - constraints_description.constraints.end(), + metadata.constraints.constraints.begin(), + metadata.constraints.constraints.end(), [this](const ASTPtr & constraint_ast) { return constraint_ast->as().name == constraint_name; }); - if (erase_it == constraints_description.constraints.end()) + if (erase_it == metadata.constraints.constraints.end()) { if (if_exists) return; throw Exception("Wrong constraint name. 
Cannot find constraint `" + constraint_name + "` to drop.", ErrorCodes::LOGICAL_ERROR); } - constraints_description.constraints.erase(erase_it); + metadata.constraints.constraints.erase(erase_it); } else if (type == MODIFY_TTL) { - ttl_table_ast = ttl; + metadata.ttl_for_table_ast = ttl; } else if (type == MODIFY_SETTING) { - changes.insert(changes.end(), settings_changes.begin(), settings_changes.end()); + auto & settings_from_storage = metadata.settings_ast->as<ASTSetQuery &>().changes; + for (const auto & change : settings_changes) + { + auto finder = [&change](const SettingChange & c) { return c.name == change.name; }; + auto it = std::find_if(settings_from_storage.begin(), settings_from_storage.end(), finder); + + if (it != settings_from_storage.end()) + it->value = change.value; + else + settings_from_storage.push_back(change); + } } else throw Exception("Wrong parameter type in ALTER query", ErrorCodes::LOGICAL_ERROR); @@ -411,35 +420,72 @@ bool AlterCommand::isSettingsAlter() const return type == MODIFY_SETTING; } -void AlterCommands::apply(ColumnsDescription & columns_description, IndicesDescription & indices_description, - ConstraintsDescription & constraints_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast, - ASTPtr & ttl_table_ast, SettingsChanges & changes) const +bool AlterCommand::isCommentAlter() const { - auto new_columns_description = columns_description; - auto new_indices_description = indices_description; - auto new_constraints_description = constraints_description; - auto new_order_by_ast = order_by_ast; - auto new_primary_key_ast = primary_key_ast; - auto new_ttl_table_ast = ttl_table_ast; - auto new_changes = changes; - - for (const AlterCommand & command : *this) - if (!command.ignore) - command.apply(new_columns_description, new_indices_description, new_constraints_description, new_order_by_ast, new_primary_key_ast, new_ttl_table_ast, new_changes); - - columns_description = std::move(new_columns_description); - indices_description = std::move(new_indices_description); - constraints_description = std::move(new_constraints_description); - order_by_ast = std::move(new_order_by_ast); - primary_key_ast = std::move(new_primary_key_ast); - ttl_table_ast = std::move(new_ttl_table_ast); - changes = std::move(new_changes); + if (type == COMMENT_COLUMN) + { + return true; + } + else if (type == MODIFY_COLUMN) + { + return comment.has_value() + && codec == nullptr + && data_type == nullptr + && default_expression == nullptr + && ttl == nullptr; + } + return false; } -void AlterCommands::validate(const IStorage & table, const Context & context) + +String alterTypeToString(const AlterCommand::Type type) +{ + switch (type) + { + case AlterCommand::Type::ADD_COLUMN: + return "ADD COLUMN"; + case AlterCommand::Type::ADD_CONSTRAINT: + return "ADD CONSTRAINT"; + case AlterCommand::Type::ADD_INDEX: + return "ADD INDEX"; + case AlterCommand::Type::COMMENT_COLUMN: + return "COMMENT COLUMN"; + case AlterCommand::Type::DROP_COLUMN: + return "DROP COLUMN"; + case AlterCommand::Type::DROP_CONSTRAINT: + return "DROP CONSTRAINT"; + case AlterCommand::Type::DROP_INDEX: + return "DROP INDEX"; + case AlterCommand::Type::MODIFY_COLUMN: + return "MODIFY COLUMN"; + case AlterCommand::Type::MODIFY_ORDER_BY: + return "MODIFY ORDER BY"; + case AlterCommand::Type::MODIFY_TTL: + return "MODIFY TTL"; + case AlterCommand::Type::MODIFY_SETTING: + return "MODIFY SETTING"; + } + __builtin_unreachable(); +} + +void AlterCommands::apply(StorageInMemoryMetadata & metadata) const +{ + if (!prepared) + throw 
DB::Exception("Alter commands are not prepared. Cannot apply. It's a bug", ErrorCodes::LOGICAL_ERROR); + + auto metadata_copy = metadata; + for (const AlterCommand & command : *this) + if (!command.ignore) + command.apply(metadata_copy); + + metadata = std::move(metadata_copy); +} + + +void AlterCommands::prepare(const StorageInMemoryMetadata & metadata, const Context & context) { /// A temporary object that is used to keep track of the current state of columns after applying a subset of commands. - auto columns = table.getColumns(); + auto columns = metadata.columns; /// Default expressions will be added to this list for type deduction. auto default_expr_list = std::make_shared<ASTExpressionList>(); @@ -461,19 +507,13 @@ void AlterCommands::validate(const IStorage & table, const Context & context) { if (command.if_not_exists) command.ignore = true; - else - throw Exception{"Cannot add column " + column_name + ": column with this name already exists", ErrorCodes::ILLEGAL_COLUMN}; } } else if (command.type == AlterCommand::MODIFY_COLUMN) { if (!columns.has(column_name)) - { if (command.if_exists) command.ignore = true; - else - throw Exception{"Wrong column name. Cannot find column " + column_name + " to modify", ErrorCodes::ILLEGAL_COLUMN}; - } if (!command.ignore) columns.remove(column_name); @@ -513,45 +553,15 @@ void AlterCommands::validate(const IStorage & table, const Context & context) else if (command.type == AlterCommand::DROP_COLUMN) { if (columns.has(command.column_name) || columns.hasNested(command.column_name)) - { - for (const ColumnDescription & column : columns) - { - const auto & default_expression = column.default_desc.expression; - if (!default_expression) - continue; - - ASTPtr query = default_expression->clone(); - auto syntax_result = SyntaxAnalyzer(context).analyze(query, columns.getAll()); - const auto actions = ExpressionAnalyzer(query, syntax_result, context).getActions(true); - const auto required_columns = actions->getRequiredColumns(); - - if (required_columns.end() != std::find(required_columns.begin(), required_columns.end(), command.column_name)) - throw Exception( - "Cannot drop column " + command.column_name + ", because column " + column.name + - " depends on it", ErrorCodes::ILLEGAL_COLUMN); - } - columns.remove(command.column_name); - } else if (command.if_exists) command.ignore = true; - else - throw Exception("Wrong column name. Cannot find column " + command.column_name + " to drop", - ErrorCodes::ILLEGAL_COLUMN); } else if (command.type == AlterCommand::COMMENT_COLUMN) { - if (!columns.has(command.column_name)) - { - if (command.if_exists) - command.ignore = true; - else - throw Exception{"Wrong column name. 
Cannot find column " + command.column_name + " to comment", ErrorCodes::ILLEGAL_COLUMN}; - } + if (!columns.has(command.column_name) && command.if_exists) + command.ignore = true; } - else if (command.type == AlterCommand::MODIFY_SETTING) - for (const auto & change : command.settings_changes) - table.checkSettingCanBeChanged(change.name); } /** Existing defaulted columns may require default expression extensions with a type conversion, @@ -596,10 +606,25 @@ void AlterCommands::validate(const IStorage & table, const Context & context) { if (!command) { +#if !__clang__ +# pragma GCC diagnostic push +# pragma GCC diagnostic ignored "-Wmissing-field-initializers" +#endif + /// We completely sure, that we initialize all required fields + AlterCommand aux_command{ + .type = AlterCommand::MODIFY_COLUMN, + .column_name = column.name, + .data_type = explicit_type, + .default_kind = column.default_desc.kind, + .default_expression = column.default_desc.expression + }; +#if !__clang__ +# pragma GCC diagnostic pop +#endif + /// column has no associated alter command, let's create it /// add a new alter command to modify existing column - this->emplace_back(AlterCommand{AlterCommand::MODIFY_COLUMN, - column.name, explicit_type, column.default_desc.kind, column.default_desc.expression, {}, {}, {}, {}}); + this->emplace_back(aux_command); command = &back(); } @@ -615,62 +640,66 @@ void AlterCommands::validate(const IStorage & table, const Context & context) command->data_type = block.getByName(column.name).type; } } + prepared = true; } -void AlterCommands::applyForColumnsOnly(ColumnsDescription & columns_description) const +void AlterCommands::validate(const StorageInMemoryMetadata & metadata, const Context & context) const { - auto out_columns_description = columns_description; - IndicesDescription indices_description; - ConstraintsDescription constraints_description; - ASTPtr out_order_by; - ASTPtr out_primary_key; - ASTPtr out_ttl_table; - SettingsChanges out_changes; - apply(out_columns_description, indices_description, constraints_description, - out_order_by, out_primary_key, out_ttl_table, out_changes); + for (size_t i = 0; i < size(); ++i) + { + auto & command = (*this)[i]; + if (command.type == AlterCommand::ADD_COLUMN || command.type == AlterCommand::MODIFY_COLUMN) + { + const auto & column_name = command.column_name; - if (out_order_by) - throw Exception("Storage doesn't support modifying ORDER BY expression", ErrorCodes::NOT_IMPLEMENTED); - if (out_primary_key) - throw Exception("Storage doesn't support modifying PRIMARY KEY expression", ErrorCodes::NOT_IMPLEMENTED); - if (!indices_description.indices.empty()) - throw Exception("Storage doesn't support modifying indices", ErrorCodes::NOT_IMPLEMENTED); - if (!constraints_description.constraints.empty()) - throw Exception("Storage doesn't support modifying constraints", ErrorCodes::NOT_IMPLEMENTED); - if (out_ttl_table) - throw Exception("Storage doesn't support modifying TTL expression", ErrorCodes::NOT_IMPLEMENTED); - if (!out_changes.empty()) - throw Exception("Storage doesn't support modifying settings", ErrorCodes::NOT_IMPLEMENTED); + if (command.type == AlterCommand::ADD_COLUMN) + { + if (metadata.columns.has(column_name) || metadata.columns.hasNested(column_name)) + if (!command.if_not_exists) + throw Exception{"Cannot add column " + column_name + ": column with this name already exists", ErrorCodes::ILLEGAL_COLUMN}; + } + else if (command.type == AlterCommand::MODIFY_COLUMN) + { + if (!metadata.columns.has(column_name)) + if 
(!command.if_exists) + throw Exception{"Wrong column name. Cannot find column " + column_name + " to modify", ErrorCodes::ILLEGAL_COLUMN}; + } + } + else if (command.type == AlterCommand::DROP_COLUMN) + { + if (metadata.columns.has(command.column_name) || metadata.columns.hasNested(command.column_name)) + { + for (const ColumnDescription & column : metadata.columns) + { + const auto & default_expression = column.default_desc.expression; + if (!default_expression) + continue; - columns_description = std::move(out_columns_description); -} + ASTPtr query = default_expression->clone(); + auto syntax_result = SyntaxAnalyzer(context).analyze(query, metadata.columns.getAll()); + const auto actions = ExpressionAnalyzer(query, syntax_result, context).getActions(true); + const auto required_columns = actions->getRequiredColumns(); - -void AlterCommands::applyForSettingsOnly(SettingsChanges & changes) const -{ - ColumnsDescription out_columns_description; - IndicesDescription indices_description; - ConstraintsDescription constraints_description; - ASTPtr out_order_by; - ASTPtr out_primary_key; - ASTPtr out_ttl_table; - SettingsChanges out_changes; - apply(out_columns_description, indices_description, constraints_description, out_order_by, - out_primary_key, out_ttl_table, out_changes); - - if (out_columns_description.begin() != out_columns_description.end()) - throw Exception("Alter modifying columns, but only settings change applied.", ErrorCodes::LOGICAL_ERROR); - if (out_order_by) - throw Exception("Alter modifying ORDER BY expression, but only settings change applied.", ErrorCodes::LOGICAL_ERROR); - if (out_primary_key) - throw Exception("Alter modifying PRIMARY KEY expression, but only settings change applied.", ErrorCodes::LOGICAL_ERROR); - if (!indices_description.indices.empty()) - throw Exception("Alter modifying indices, but only settings change applied.", ErrorCodes::NOT_IMPLEMENTED); - if (out_ttl_table) - throw Exception("Alter modifying TTL, but only settings change applied.", ErrorCodes::NOT_IMPLEMENTED); - - changes = std::move(out_changes); + if (required_columns.end() != std::find(required_columns.begin(), required_columns.end(), command.column_name)) + throw Exception( + "Cannot drop column " + command.column_name + ", because column " + column.name + + " depends on it", ErrorCodes::ILLEGAL_COLUMN); + } + } + else if (!command.if_exists) + throw Exception("Wrong column name. Cannot find column " + command.column_name + " to drop", + ErrorCodes::ILLEGAL_COLUMN); + } + else if (command.type == AlterCommand::COMMENT_COLUMN) + { + if (!metadata.columns.has(command.column_name)) + { + if (!command.if_exists) + throw Exception{"Wrong column name. 
Cannot find column " + command.column_name + " to comment", ErrorCodes::ILLEGAL_COLUMN}; + } + } + } } bool AlterCommands::isModifyingData() const @@ -688,4 +717,9 @@ bool AlterCommands::isSettingsAlter() const { return std::all_of(begin(), end(), [](const AlterCommand & c) { return c.isSettingsAlter(); }); } + +bool AlterCommands::isCommentAlter() const +{ + return std::all_of(begin(), end(), [](const AlterCommand & c) { return c.isCommentAlter(); }); +} } diff --git a/dbms/src/Storages/AlterCommands.h b/dbms/src/Storages/AlterCommands.h index 1217d96dc29..e547752fa09 100644 --- a/dbms/src/Storages/AlterCommands.h +++ b/dbms/src/Storages/AlterCommands.h @@ -2,10 +2,9 @@ #include #include -#include #include -#include -#include +#include + #include @@ -31,11 +30,10 @@ struct AlterCommand ADD_CONSTRAINT, DROP_CONSTRAINT, MODIFY_TTL, - UKNOWN_TYPE, MODIFY_SETTING, }; - Type type = UKNOWN_TYPE; + Type type; String column_name; @@ -43,10 +41,12 @@ struct AlterCommand String partition_name; /// For ADD and MODIFY, a new column type. - DataTypePtr data_type; + DataTypePtr data_type = nullptr; ColumnDefaultKind default_kind{}; ASTPtr default_expression{}; + + /// For COMMENT column std::optional comment; /// For ADD - after which column to add a new one. If an empty string, add to the end. To add to the beginning now it is impossible. @@ -59,48 +59,36 @@ struct AlterCommand bool if_not_exists = false; /// For MODIFY_ORDER_BY - ASTPtr order_by; + ASTPtr order_by = nullptr; /// For ADD INDEX - ASTPtr index_decl; + ASTPtr index_decl = nullptr; String after_index_name; /// For ADD/DROP INDEX String index_name; // For ADD CONSTRAINT - ASTPtr constraint_decl; + ASTPtr constraint_decl = nullptr; // For ADD/DROP CONSTRAINT String constraint_name; /// For MODIFY TTL - ASTPtr ttl; + ASTPtr ttl = nullptr; /// indicates that this command should not be applied, for example in case of if_exists=true and column doesn't exist. bool ignore = false; /// For ADD and MODIFY - CompressionCodecPtr codec; + CompressionCodecPtr codec = nullptr; /// For MODIFY SETTING SettingsChanges settings_changes; - AlterCommand() = default; - AlterCommand(const Type type_, const String & column_name_, const DataTypePtr & data_type_, - const ColumnDefaultKind default_kind_, const ASTPtr & default_expression_, - const String & after_column_, const String & comment_, - const bool if_exists_, const bool if_not_exists_) - : type{type_}, column_name{column_name_}, data_type{data_type_}, default_kind{default_kind_}, - default_expression{default_expression_}, comment(comment_), after_column{after_column_}, - if_exists(if_exists_), if_not_exists(if_not_exists_) - {} - static std::optional parse(const ASTAlterCommand * command); - void apply(ColumnsDescription & columns_description, IndicesDescription & indices_description, - ConstraintsDescription & constraints_description, ASTPtr & order_by_ast, - ASTPtr & primary_key_ast, ASTPtr & ttl_table_ast, SettingsChanges & changes) const; + void apply(StorageInMemoryMetadata & metadata) const; /// Checks that alter query changes data. For MergeTree: /// * column files (data and marks) @@ -108,27 +96,46 @@ struct AlterCommand /// in each part on disk (it's not lightweight alter). 
bool isModifyingData() const; - /// checks that only settings changed by alter + /// Checks that only settings are changed by alter bool isSettingsAlter() const; + + /// Checks that only comments are changed by alter + bool isCommentAlter() const; }; +/// Return string representation of AlterCommand::Type +String alterTypeToString(const AlterCommand::Type type); + class Context; +/// Vector of AlterCommand with several additional functions class AlterCommands : public std::vector<AlterCommand> { +private: + bool prepared = false; public: - /// Used for primitive table engines, where only columns metadata can be changed - void applyForColumnsOnly(ColumnsDescription & columns_description) const; - void apply(ColumnsDescription & columns_description, IndicesDescription & indices_description, - ConstraintsDescription & constraints_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast, - ASTPtr & ttl_table_ast, SettingsChanges & changes) const; + /// Validate that commands can be applied to metadata. + /// Checks that all columns exist and that dependencies between them are satisfied. + /// This check is lightweight and based only on metadata. + /// A more accurate check has to be performed with storage->checkAlterIsPossible. + void validate(const StorageInMemoryMetadata & metadata, const Context & context) const; - /// Apply alter commands only for settings. Exception will be thrown if any other part of table structure will be modified. - void applyForSettingsOnly(SettingsChanges & changes) const; + /// Prepare alter commands: set the ignore flag on some of them + /// and add additional commands for dependent columns. + void prepare(const StorageInMemoryMetadata & metadata, const Context & context); - void validate(const IStorage & table, const Context & context); + /// Apply all alter commands in sequential order to storage metadata. + /// Commands have to be prepared before being applied. + void apply(StorageInMemoryMetadata & metadata) const; + + /// At least one command modifies data on disk. bool isModifyingData() const; + + /// At least one command modifies settings. bool isSettingsAlter() const; + + /// At least one command modifies comments. 
+ bool isCommentAlter() const; }; } diff --git a/dbms/src/Storages/IStorage.cpp b/dbms/src/Storages/IStorage.cpp index 169117f7b44..e48e9896597 100644 --- a/dbms/src/Storages/IStorage.cpp +++ b/dbms/src/Storages/IStorage.cpp @@ -27,6 +27,7 @@ namespace ErrorCodes extern const int SETTINGS_ARE_NOT_SUPPORTED; extern const int UNKNOWN_SETTING; extern const int TABLE_IS_DROPPED; + extern const int NOT_IMPLEMENTED; } IStorage::IStorage(ColumnsDescription virtuals_) : virtuals(std::move(virtuals_)) @@ -313,12 +314,6 @@ bool IStorage::isVirtualColumn(const String & column_name) const return getColumns().get(column_name).is_virtual; } -void IStorage::checkSettingCanBeChanged(const String & /* setting_name */) const -{ - if (!supportsSettings()) - throw Exception("Storage '" + getName() + "' doesn't support settings.", ErrorCodes::SETTINGS_ARE_NOT_SUPPORTED); -} - TableStructureReadLockHolder IStorage::lockStructureForShare(bool will_add_new_data, const String & query_id) { TableStructureReadLockHolder result; @@ -373,57 +368,41 @@ TableStructureWriteLockHolder IStorage::lockExclusively(const String & query_id) return result; } - -IDatabase::ASTModifier IStorage::getSettingsModifier(const SettingsChanges & new_changes) const +StorageInMemoryMetadata IStorage::getInMemoryMetadata() const { - return [&] (IAST & ast) + return { - if (!new_changes.empty()) - { - auto & storage_changes = ast.as().settings->changes; - /// Make storage settings unique - for (const auto & change : new_changes) - { - checkSettingCanBeChanged(change.name); - - auto finder = [&change] (const SettingChange & c) { return c.name == change.name; }; - if (auto it = std::find_if(storage_changes.begin(), storage_changes.end(), finder); it != storage_changes.end()) - it->value = change.value; - else - storage_changes.push_back(change); - } - } + .columns = getColumns(), + .indices = getIndices(), + .constraints = getConstraints(), }; } - void IStorage::alter( const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) { - if (params.isModifyingData()) - throw Exception("Method alter supports only change comment of column for storage " + getName(), ErrorCodes::NOT_IMPLEMENTED); - const String database_name = getDatabaseName(); const String table_name = getTableName(); - if (params.isSettingsAlter()) + lockStructureExclusively(table_lock_holder, context.getCurrentQueryId()); + + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + params.apply(metadata); + context.getDatabase(database_name)->alterTable(context, table_name, metadata); + setColumns(std::move(metadata.columns)); +} + + +void IStorage::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) +{ + for (const auto & command : commands) { - SettingsChanges new_changes; - params.applyForSettingsOnly(new_changes); - IDatabase::ASTModifier settings_modifier = getSettingsModifier(new_changes); - context.getDatabase(database_name)->alterTable(context, table_name, getColumns(), getIndices(), getConstraints(), settings_modifier); - } - else - { - lockStructureExclusively(table_lock_holder, context.getCurrentQueryId()); - auto new_columns = getColumns(); - auto new_indices = getIndices(); - auto new_constraints = getConstraints(); - params.applyForColumnsOnly(new_columns); - context.getDatabase(database_name)->alterTable(context, table_name, new_columns, new_indices, new_constraints, {}); - setColumns(std::move(new_columns)); + if (!command.isCommentAlter()) + throw Exception( + "Alter of type '" + 
alterTypeToString(command.type) + "' is not supported by storage " + getName(), + ErrorCodes::NOT_IMPLEMENTED); } } @@ -446,21 +425,4 @@ BlockInputStreams IStorage::read( return res; } -DB::CompressionMethod IStorage::chooseCompressionMethod(const String & uri, const String & compression_method) -{ - if (compression_method == "auto" || compression_method == "") - { - if (endsWith(uri, ".gz")) - return DB::CompressionMethod::Gzip; - else - return DB::CompressionMethod::None; - } - else if (compression_method == "gzip") - return DB::CompressionMethod::Gzip; - else if (compression_method == "none") - return DB::CompressionMethod::None; - else - throw Exception("Only auto, none, gzip supported as compression method", ErrorCodes::NOT_IMPLEMENTED); -} - } diff --git a/dbms/src/Storages/IStorage.h b/dbms/src/Storages/IStorage.h index 255d53e5b0a..69bbca86879 100644 --- a/dbms/src/Storages/IStorage.h +++ b/dbms/src/Storages/IStorage.h @@ -5,7 +5,6 @@ #include #include #include -#include #include #include #include @@ -13,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -127,6 +127,10 @@ public: /// thread-unsafe part. lockStructure must be acquired const ConstraintsDescription & getConstraints() const; void setConstraints(ConstraintsDescription constraints_); + /// Returns a copy of the storage metadata. Direct modification of + /// the result structure doesn't affect the storage. + virtual StorageInMemoryMetadata getInMemoryMetadata() const; + /// NOTE: these methods should include virtual columns, /// but should NOT include ALIAS columns (they are treated separately). virtual NameAndTypePair getColumn(const String & column_name) const; @@ -152,9 +156,6 @@ public: /// thread-unsafe part. lockStructure must be acquired /// If |need_all| is set, then checks that all the columns of the table are in the block. void check(const Block & block, bool need_all = false) const; - /// Check storage has setting and setting can be modified. - virtual void checkSettingCanBeChanged(const String & setting_name) const; - protected: /// still thread-unsafe part. void setIndices(IndicesDescription indices_); @@ -162,8 +163,6 @@ protected: /// still thread-unsafe part. /// Initially reserved virtual column name may be shadowed by real column. virtual bool isVirtualColumn(const String & column_name) const; - /// Returns modifier of settings in storage definition - IDatabase::ASTModifier getSettingsModifier(const SettingsChanges & new_changes) const; private: ColumnsDescription columns; /// combined real and virtual columns @@ -316,6 +315,11 @@ public: */ virtual void alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder); + /** Checks that alter commands can be applied to storage. For example, columns can be modified, + * or primary key can be changed, etc. + */ + virtual void checkAlterIsPossible(const AlterCommands & commands, const Settings & settings); + /** ALTER tables with regard to its partitions. * Should handle locks for each command on its own. */ @@ -435,8 +439,6 @@ public: return {}; } - static DB::CompressionMethod chooseCompressionMethod(const String & uri, const String & compression_method); - private: /// You always need to take the next three locks in this order. 
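
The IStorage changes above collapse the old special-cased alter paths (comment-only, settings-only, columns-only) into one metadata round-trip: take a StorageInMemoryMetadata copy, let the AlterCommands rewrite it, persist it through the database, then update the in-memory state. Below is a minimal caller-side sketch of that flow; the function name executeAlterSketch and the exact validate → prepare → alter ordering are illustrative assumptions inferred from the comments in AlterCommands.h, not part of this diff (the real driver is the ALTER interpreter):

#include <Interpreters/Context.h>
#include <Storages/AlterCommands.h>
#include <Storages/IStorage.h>

namespace DB
{

/// Hypothetical sketch of how a caller drives the new alter API.
void executeAlterSketch(StoragePtr table, AlterCommands commands, const Context & context,
                        TableStructureWriteLockHolder & table_lock_holder)
{
    /// Storage-specific gate: the IStorage default rejects everything except
    /// comment-only alters; engines like MergeTree override checkAlterIsPossible.
    table->checkAlterIsPossible(commands, context.getSettingsRef());

    /// Lightweight, metadata-only checks (column existence, default-expression
    /// dependencies) against a copy of the current metadata.
    StorageInMemoryMetadata metadata = table->getInMemoryMetadata();
    commands.validate(metadata, context);

    /// Sets ignore flags and adds commands for dependent columns;
    /// apply() throws LOGICAL_ERROR unless this was called first.
    commands.prepare(metadata, context);

    /// alter() applies the commands to a metadata copy, persists it via
    /// IDatabase::alterTable, and only then mutates the in-memory columns.
    table->alter(commands, context, table_lock_holder);
}

}
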
diff --git a/dbms/src/Storages/Kafka/StorageKafka.cpp b/dbms/src/Storages/Kafka/StorageKafka.cpp index d732243c370..c1fc156ca37 100644 --- a/dbms/src/Storages/Kafka/StorageKafka.cpp +++ b/dbms/src/Storages/Kafka/StorageKafka.cpp @@ -418,15 +418,6 @@ bool StorageKafka::streamToViews() return limits_applied; } - -void StorageKafka::checkSettingCanBeChanged(const String & setting_name) const -{ - if (KafkaSettings::findIndex(setting_name) == KafkaSettings::npos) - throw Exception{"Storage '" + getName() + "' doesn't have setting '" + setting_name + "'", ErrorCodes::UNKNOWN_SETTING}; - - throw Exception{"Setting '" + setting_name + "' is readonly for storage '" + getName() + "'", ErrorCodes::READONLY_SETTING}; -} - void registerStorageKafka(StorageFactory & factory) { factory.registerStorage("Kafka", [](const StorageFactory::Arguments & args) diff --git a/dbms/src/Storages/Kafka/StorageKafka.h b/dbms/src/Storages/Kafka/StorageKafka.h index 224b5c0d709..7c1ba219245 100644 --- a/dbms/src/Storages/Kafka/StorageKafka.h +++ b/dbms/src/Storages/Kafka/StorageKafka.h @@ -64,8 +64,6 @@ public: const auto & getSchemaName() const { return schema_name; } const auto & skipBroken() const { return skip_broken; } - void checkSettingCanBeChanged(const String & setting_name) const override; - protected: StorageKafka( const std::string & table_name_, diff --git a/dbms/src/Storages/LiveView/LiveViewCommands.h b/dbms/src/Storages/LiveView/LiveViewCommands.h index 54048c28a5f..6757acdadab 100644 --- a/dbms/src/Storages/LiveView/LiveViewCommands.h +++ b/dbms/src/Storages/LiveView/LiveViewCommands.h @@ -9,13 +9,13 @@ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */ - #pragma once #include #include #include + namespace DB { diff --git a/dbms/src/Storages/LiveView/LiveViewEventsBlockInputStream.h b/dbms/src/Storages/LiveView/LiveViewEventsBlockInputStream.h index 14cfade338d..bf971d75d01 100644 --- a/dbms/src/Storages/LiveView/LiveViewEventsBlockInputStream.h +++ b/dbms/src/Storages/LiveView/LiveViewEventsBlockInputStream.h @@ -9,7 +9,6 @@ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
*/ - #pragma once #include diff --git a/dbms/src/Storages/LiveView/ProxyStorage.h b/dbms/src/Storages/LiveView/ProxyStorage.h deleted file mode 100644 index 60faa907209..00000000000 --- a/dbms/src/Storages/LiveView/ProxyStorage.h +++ /dev/null @@ -1,66 +0,0 @@ -#pragma once - -#include - -namespace DB -{ - -class ProxyStorage : public IStorage -{ -public: - ProxyStorage(StoragePtr storage_, BlockInputStreams streams_, QueryProcessingStage::Enum to_stage_) - : storage(std::move(storage_)), streams(std::move(streams_)), to_stage(to_stage_) {} - -public: - std::string getName() const override { return "ProxyStorage(" + storage->getName() + ")"; } - std::string getTableName() const override { return storage->getTableName(); } - - bool isRemote() const override { return storage->isRemote(); } - bool supportsSampling() const override { return storage->supportsSampling(); } - bool supportsFinal() const override { return storage->supportsFinal(); } - bool supportsPrewhere() const override { return storage->supportsPrewhere(); } - bool supportsReplication() const override { return storage->supportsReplication(); } - bool supportsDeduplication() const override { return storage->supportsDeduplication(); } - - QueryProcessingStage::Enum getQueryProcessingStage(const Context & /*context*/) const override { return to_stage; } - - BlockInputStreams read( - const Names & /*column_names*/, - const SelectQueryInfo & /*query_info*/, - const Context & /*context*/, - QueryProcessingStage::Enum /*processed_stage*/, - size_t /*max_block_size*/, - unsigned /*num_streams*/) override - { - return streams; - } - - bool supportsIndexForIn() const override { return storage->supportsIndexForIn(); } - bool mayBenefitFromIndexForIn(const ASTPtr & left_in_operand, const Context & query_context) const override { return storage->mayBenefitFromIndexForIn(left_in_operand, query_context); } - ASTPtr getPartitionKeyAST() const override { return storage->getPartitionKeyAST(); } - ASTPtr getSortingKeyAST() const override { return storage->getSortingKeyAST(); } - ASTPtr getPrimaryKeyAST() const override { return storage->getPrimaryKeyAST(); } - ASTPtr getSamplingKeyAST() const override { return storage->getSamplingKeyAST(); } - Names getColumnsRequiredForPartitionKey() const override { return storage->getColumnsRequiredForPartitionKey(); } - Names getColumnsRequiredForSortingKey() const override { return storage->getColumnsRequiredForSortingKey(); } - Names getColumnsRequiredForPrimaryKey() const override { return storage->getColumnsRequiredForPrimaryKey(); } - Names getColumnsRequiredForSampling() const override { return storage->getColumnsRequiredForSampling(); } - Names getColumnsRequiredForFinal() const override { return storage->getColumnsRequiredForFinal(); } - - const ColumnsDescription & getColumns() const override { return storage->getColumns(); } - void setColumns(ColumnsDescription columns_) override { return storage->setColumns(columns_); } - NameAndTypePair getColumn(const String & column_name) const override { return storage->getColumn(column_name); } - bool hasColumn(const String & column_name) const override { return storage->hasColumn(column_name); } - static StoragePtr createProxyStorage(StoragePtr storage, BlockInputStreams streams, QueryProcessingStage::Enum to_stage) - { - return std::make_shared(std::move(storage), std::move(streams), to_stage); - } -private: - StoragePtr storage; - BlockInputStreams streams; - QueryProcessingStage::Enum to_stage; -}; - - - -} diff --git 
a/dbms/src/Storages/LiveView/StorageBlocks.h b/dbms/src/Storages/LiveView/StorageBlocks.h new file mode 100644 index 00000000000..bd60e6f0b97 --- /dev/null +++ b/dbms/src/Storages/LiveView/StorageBlocks.h @@ -0,0 +1,51 @@ +#pragma once + +#include + + +namespace DB +{ + +class StorageBlocks : public IStorage +{ +/* Storage based on the prepared streams that already contain data blocks. + * Used by Live Views to complete the stored query based on the mergeable blocks. + */ +public: + StorageBlocks(const std::string & database_name_, const std::string & table_name_, + const ColumnsDescription & columns_, BlockInputStreams streams_, + QueryProcessingStage::Enum to_stage_) + : database_name(database_name_), table_name(table_name_), streams(streams_), to_stage(to_stage_) + { + setColumns(columns_); + } + static StoragePtr createStorage(const std::string & database_name, const std::string & table_name, + const ColumnsDescription & columns, BlockInputStreams streams, QueryProcessingStage::Enum to_stage) + { + return std::make_shared<StorageBlocks>(database_name, table_name, columns, streams, to_stage); + } + std::string getName() const override { return "Blocks"; } + std::string getTableName() const override { return table_name; } + std::string getDatabaseName() const override { return database_name; } + QueryProcessingStage::Enum getQueryProcessingStage(const Context & /*context*/) const override { return to_stage; } + + BlockInputStreams read( + const Names & /*column_names*/, + const SelectQueryInfo & /*query_info*/, + const Context & /*context*/, + QueryProcessingStage::Enum /*processed_stage*/, + size_t /*max_block_size*/, + unsigned /*num_streams*/) override + { + return streams; + } + +private: + std::string database_name; + std::string table_name; + Block res_block; + BlockInputStreams streams; + QueryProcessingStage::Enum to_stage; +}; + +} diff --git a/dbms/src/Storages/LiveView/StorageLiveView.cpp b/dbms/src/Storages/LiveView/StorageLiveView.cpp index 6118ef26bba..db410eeb5e4 100644 --- a/dbms/src/Storages/LiveView/StorageLiveView.cpp +++ b/dbms/src/Storages/LiveView/StorageLiveView.cpp @@ -1,4 +1,4 @@ -/* Copyright (c) 2018 BlackBerry Limited +/* Copyright (c) 2018 BlackBerry Limited Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. @@ -9,7 +9,6 @@ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
*/ #include #include #include -#include +#include #include #include #include +#include +#include #include #include + namespace DB { @@ -52,13 +54,16 @@ namespace ErrorCodes extern const int SUPPORT_IS_DISABLED; } -static void extractDependentTable(ASTSelectQuery & query, String & select_database_name, String & select_table_name) +static void extractDependentTable(ASTPtr & query, String & select_database_name, String & select_table_name, const String & table_name, ASTPtr & inner_subquery) { - auto db_and_table = getDatabaseAndTable(query, 0); - ASTPtr subquery = extractTableExpression(query, 0); + ASTSelectQuery & select_query = typeid_cast<ASTSelectQuery &>(*query); + auto db_and_table = getDatabaseAndTable(select_query, 0); + ASTPtr subquery = extractTableExpression(select_query, 0); if (!db_and_table && !subquery) + { return; + } if (db_and_table) { @@ -68,19 +73,21 @@ static void extractDependentTable(ASTSelectQuery & query, String & select_databa { db_and_table->database = select_database_name; AddDefaultDatabaseVisitor visitor(select_database_name); - visitor.visit(query); + visitor.visit(select_query); } else select_database_name = db_and_table->database; + + select_query.replaceDatabaseAndTable("", table_name + "_blocks"); } else if (auto * ast_select = subquery->as<ASTSelectWithUnionQuery>()) { if (ast_select->list_of_selects->children.size() != 1) throw Exception("UNION is not supported for LIVE VIEW", ErrorCodes::QUERY_IS_NOT_SUPPORTED_IN_LIVE_VIEW); - auto & inner_query = ast_select->list_of_selects->children.at(0); + inner_subquery = ast_select->list_of_selects->children.at(0)->clone(); - extractDependentTable(inner_query->as<ASTSelectQuery &>(), select_database_name, select_table_name); + extractDependentTable(ast_select->list_of_selects->children.at(0), select_database_name, select_table_name, table_name, inner_subquery); } else throw Exception("Logical error while creating StorageLiveView." 
@@ -95,6 +102,8 @@ void StorageLiveView::writeIntoLiveView( const Context & context) { BlockOutputStreamPtr output = std::make_shared(live_view); + auto block_context = std::make_unique(context.getGlobalContext()); + block_context->makeQueryContext(); /// Check if live view has any readers if not /// just reset blocks to empty and do nothing else @@ -112,6 +121,10 @@ void StorageLiveView::writeIntoLiveView( BlockInputStreams from; BlocksPtrs mergeable_blocks; BlocksPtr new_mergeable_blocks = std::make_shared(); + ASTPtr mergeable_query = live_view.getInnerQuery(); + + if (live_view.getInnerSubQuery()) + mergeable_query = live_view.getInnerSubQuery(); { std::lock_guard lock(live_view.mutex); @@ -121,7 +134,7 @@ void StorageLiveView::writeIntoLiveView( { mergeable_blocks = std::make_shared>(); BlocksPtr base_mergeable_blocks = std::make_shared(); - InterpreterSelectQuery interpreter(live_view.getInnerQuery(), context, SelectQueryOptions(QueryProcessingStage::WithMergeableState), Names()); + InterpreterSelectQuery interpreter(mergeable_query, context, SelectQueryOptions(QueryProcessingStage::WithMergeableState), Names()); auto view_mergeable_stream = std::make_shared( interpreter.execute().in); while (Block this_block = view_mergeable_stream->read()) @@ -143,13 +156,14 @@ void StorageLiveView::writeIntoLiveView( } } + auto parent_storage = context.getTable(live_view.getSelectDatabaseName(), live_view.getSelectTableName()); + if (!is_block_processed) { - auto parent_storage = context.getTable(live_view.getSelectDatabaseName(), live_view.getSelectTableName()); BlockInputStreams streams = {std::make_shared(block)}; - auto proxy_storage = std::make_shared(parent_storage, std::move(streams), QueryProcessingStage::FetchColumns); - InterpreterSelectQuery select_block(live_view.getInnerQuery(), - context, proxy_storage, + auto blocks_storage = StorageBlocks::createStorage(live_view.database_name, live_view.table_name, + parent_storage->getColumns(), std::move(streams), QueryProcessingStage::FetchColumns); + InterpreterSelectQuery select_block(mergeable_query, context, blocks_storage, QueryProcessingStage::WithMergeableState); auto data_mergeable_stream = std::make_shared( select_block.execute().in); @@ -177,9 +191,10 @@ void StorageLiveView::writeIntoLiveView( } } - auto parent_storage = context.getTable(live_view.getSelectDatabaseName(), live_view.getSelectTableName()); - auto proxy_storage = std::make_shared(parent_storage, std::move(from), QueryProcessingStage::WithMergeableState); - InterpreterSelectQuery select(live_view.getInnerQuery(), context, proxy_storage, QueryProcessingStage::Complete); + auto blocks_storage = StorageBlocks::createStorage(live_view.database_name, live_view.table_name, parent_storage->getColumns(), std::move(from), QueryProcessingStage::WithMergeableState); + block_context->addExternalTable(live_view.table_name + "_blocks", blocks_storage); + + InterpreterSelectQuery select(live_view.getInnerBlocksQuery(), *block_context, StoragePtr(), SelectQueryOptions(QueryProcessingStage::Complete)); BlockInputStreamPtr data = std::make_shared(select.execute().in); /// Squashing is needed here because the view query can generate a lot of blocks @@ -201,6 +216,9 @@ StorageLiveView::StorageLiveView( : table_name(table_name_), database_name(database_name_), global_context(local_context.getGlobalContext()) { + live_view_context = std::make_unique(global_context); + live_view_context->makeQueryContext(); + setColumns(columns_); if (!query.select) @@ -212,9 +230,11 @@ 
StorageLiveView::StorageLiveView( throw Exception("UNION is not supported for LIVE VIEW", ErrorCodes::QUERY_IS_NOT_SUPPORTED_IN_LIVE_VIEW); inner_query = query.select->list_of_selects->children.at(0); + inner_blocks_query = inner_query->clone(); - ASTSelectQuery & select_query = typeid_cast(*inner_query); - extractDependentTable(select_query, select_database_name, select_table_name); + InterpreterSelectQuery(inner_blocks_query, *live_view_context, SelectQueryOptions().modify().analyze()); + + extractDependentTable(inner_blocks_query, select_database_name, select_table_name, table_name, inner_subquery); /// If the table is not specified - use the table `system.one` if (select_table_name.empty()) @@ -257,9 +277,7 @@ Block StorageLiveView::getHeader() const if (!sample_block) { - auto storage = global_context.getTable(select_database_name, select_table_name); - sample_block = InterpreterSelectQuery(inner_query, global_context, storage, - SelectQueryOptions(QueryProcessingStage::Complete)).getSampleBlock(); + sample_block = InterpreterSelectQuery(inner_query->clone(), *live_view_context, SelectQueryOptions(QueryProcessingStage::Complete)).getSampleBlock(); sample_block.insert({DataTypeUInt64().createColumnConst( sample_block.rows(), 0)->convertToFullColumnIfConst(), std::make_shared(), @@ -271,7 +289,6 @@ Block StorageLiveView::getHeader() const sample_block.safeGetByPosition(i).column = sample_block.safeGetByPosition(i).column->convertToFullColumnIfConst(); } } - return sample_block; } @@ -282,18 +299,28 @@ bool StorageLiveView::getNewBlocks() BlocksPtr new_blocks = std::make_shared(); BlocksMetadataPtr new_blocks_metadata = std::make_shared(); BlocksPtr new_mergeable_blocks = std::make_shared(); + ASTPtr mergeable_query = inner_query; - InterpreterSelectQuery interpreter(inner_query->clone(), global_context, SelectQueryOptions(QueryProcessingStage::WithMergeableState), Names()); + if (inner_subquery) + mergeable_query = inner_subquery; + + InterpreterSelectQuery interpreter(mergeable_query->clone(), *live_view_context, SelectQueryOptions(QueryProcessingStage::WithMergeableState), Names()); auto mergeable_stream = std::make_shared(interpreter.execute().in); while (Block block = mergeable_stream->read()) new_mergeable_blocks->push_back(block); + auto block_context = std::make_unique(global_context); + block_context->makeQueryContext(); + mergeable_blocks = std::make_shared>(); mergeable_blocks->push_back(new_mergeable_blocks); BlockInputStreamPtr from = std::make_shared(std::make_shared(new_mergeable_blocks), mergeable_stream->getHeader()); - auto proxy_storage = ProxyStorage::createProxyStorage(global_context.getTable(select_database_name, select_table_name), {from}, QueryProcessingStage::WithMergeableState); - InterpreterSelectQuery select(inner_query->clone(), global_context, proxy_storage, SelectQueryOptions(QueryProcessingStage::Complete)); + + auto blocks_storage = StorageBlocks::createStorage(database_name, table_name, global_context.getTable(select_database_name, select_table_name)->getColumns(), {from}, QueryProcessingStage::WithMergeableState); + block_context->addExternalTable(table_name + "_blocks", blocks_storage); + + InterpreterSelectQuery select(inner_blocks_query->clone(), *block_context, StoragePtr(), SelectQueryOptions(QueryProcessingStage::Complete)); BlockInputStreamPtr data = std::make_shared(select.execute().in); /// Squashing is needed here because the view query can generate a lot of blocks diff --git a/dbms/src/Storages/LiveView/StorageLiveView.h 
b/dbms/src/Storages/LiveView/StorageLiveView.h index 3f1dffb898c..a5b0f15e879 100644 --- a/dbms/src/Storages/LiveView/StorageLiveView.h +++ b/dbms/src/Storages/LiveView/StorageLiveView.h @@ -49,8 +49,19 @@ public: NameAndTypePair getColumn(const String & column_name) const override; bool hasColumn(const String & column_name) const override; - // const NamesAndTypesList & getColumnsListImpl() const override { return *columns; } ASTPtr getInnerQuery() const { return inner_query->clone(); } + ASTPtr getInnerSubQuery() const + { + if (inner_subquery) + return inner_subquery->clone(); + return nullptr; + } + ASTPtr getInnerBlocksQuery() const + { + if (inner_blocks_query) + return inner_blocks_query->clone(); + return nullptr; + } /// It is passed inside the query and solved at its level. bool supportsSampling() const override { return true; } @@ -146,8 +157,12 @@ private: String select_table_name; String table_name; String database_name; - ASTPtr inner_query; + ASTPtr inner_query; /// stored query : SELECT * FROM ( SELECT a FROM A) + ASTPtr inner_subquery; /// stored query's innermost subquery if any + ASTPtr inner_blocks_query; /// query over the mergeable blocks to produce final result Context & global_context; + std::unique_ptr live_view_context; + bool is_temporary = false; /// Mutex to protect access to sample block mutable std::mutex sample_block_lock; diff --git a/dbms/src/Storages/MarkCache.h b/dbms/src/Storages/MarkCache.h index 9ce04c01e43..6e36a941fff 100644 --- a/dbms/src/Storages/MarkCache.h +++ b/dbms/src/Storages/MarkCache.h @@ -38,8 +38,8 @@ private: using Base = LRUCache; public: - MarkCache(size_t max_size_in_bytes, const Delay & expiration_delay_) - : Base(max_size_in_bytes, expiration_delay_) {} + MarkCache(size_t max_size_in_bytes) + : Base(max_size_in_bytes) {} /// Calculate key from path to file and offset. 
static UInt128 hash(const String & path_to_file) diff --git a/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp b/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp index af43a0b8a6f..a519e2a4b71 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp @@ -25,6 +25,7 @@ MergeTreeBaseSelectProcessor::MergeTreeBaseSelectProcessor( UInt64 preferred_block_size_bytes_, UInt64 preferred_max_column_in_block_size_bytes_, UInt64 min_bytes_to_use_direct_io_, + UInt64 min_bytes_to_use_mmap_io_, UInt64 max_read_buffer_size_, bool use_uncompressed_cache_, bool save_marks_in_cache_, @@ -37,6 +38,7 @@ MergeTreeBaseSelectProcessor::MergeTreeBaseSelectProcessor( preferred_block_size_bytes(preferred_block_size_bytes_), preferred_max_column_in_block_size_bytes(preferred_max_column_in_block_size_bytes_), min_bytes_to_use_direct_io(min_bytes_to_use_direct_io_), + min_bytes_to_use_mmap_io(min_bytes_to_use_mmap_io_), max_read_buffer_size(max_read_buffer_size_), use_uncompressed_cache(use_uncompressed_cache_), save_marks_in_cache(save_marks_in_cache_), @@ -76,35 +78,23 @@ void MergeTreeBaseSelectProcessor::initializeRangeReaders(MergeTreeReadTask & cu { if (reader->getColumns().empty()) { - current_task.range_reader = MergeTreeRangeReader( - pre_reader.get(), nullptr, - prewhere_info->alias_actions, prewhere_info->prewhere_actions, - &prewhere_info->prewhere_column_name, - current_task.remove_prewhere_column, true); + current_task.range_reader = MergeTreeRangeReader(pre_reader.get(), nullptr, prewhere_info, true); } else { MergeTreeRangeReader * pre_reader_ptr = nullptr; if (pre_reader != nullptr) { - current_task.pre_range_reader = MergeTreeRangeReader( - pre_reader.get(), nullptr, - prewhere_info->alias_actions, prewhere_info->prewhere_actions, - &prewhere_info->prewhere_column_name, - current_task.remove_prewhere_column, false); + current_task.pre_range_reader = MergeTreeRangeReader(pre_reader.get(), nullptr, prewhere_info, false); pre_reader_ptr = ¤t_task.pre_range_reader; } - current_task.range_reader = MergeTreeRangeReader( - reader.get(), pre_reader_ptr, nullptr, nullptr, - nullptr, false, true); + current_task.range_reader = MergeTreeRangeReader(reader.get(), pre_reader_ptr, nullptr, true); } } else { - current_task.range_reader = MergeTreeRangeReader( - reader.get(), nullptr, nullptr, nullptr, - nullptr, false, true); + current_task.range_reader = MergeTreeRangeReader(reader.get(), nullptr, nullptr, true); } } @@ -333,6 +323,12 @@ void MergeTreeBaseSelectProcessor::executePrewhereActions(Block & block, const P prewhere_info->prewhere_actions->execute(block); if (prewhere_info->remove_prewhere_column) block.erase(prewhere_info->prewhere_column_name); + else + { + auto & ctn = block.getByName(prewhere_info->prewhere_column_name); + ctn.type = std::make_shared(); + ctn.column = ctn.type->createColumnConst(block.rows(), 1u)->convertToFullColumnIfConst(); + } if (!block) block.insert({nullptr, std::make_shared(), "_nothing"}); diff --git a/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h b/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h index 7f3367b74c8..ba6404bfb4e 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h +++ b/dbms/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h @@ -27,6 +27,7 @@ public: UInt64 preferred_block_size_bytes_, UInt64 preferred_max_column_in_block_size_bytes_, UInt64 min_bytes_to_use_direct_io_, + UInt64 min_bytes_to_use_mmap_io_, 
UInt64 max_read_buffer_size_, bool use_uncompressed_cache_, bool save_marks_in_cache_ = true, @@ -64,6 +65,7 @@ protected: UInt64 preferred_max_column_in_block_size_bytes; UInt64 min_bytes_to_use_direct_io; + UInt64 min_bytes_to_use_mmap_io; UInt64 max_read_buffer_size; bool use_uncompressed_cache; diff --git a/dbms/src/Storages/MergeTree/MergeTreeData.cpp b/dbms/src/Storages/MergeTree/MergeTreeData.cpp index a8f6f0a8aad..d4451af3273 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeData.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeData.cpp @@ -114,16 +114,9 @@ MergeTreeData::MergeTreeData( const String & database_, const String & table_, const String & relative_data_path_, - const ColumnsDescription & columns_, - const IndicesDescription & indices_, - const ConstraintsDescription & constraints_, + const StorageInMemoryMetadata & metadata, Context & context_, const String & date_column_name, - const ASTPtr & partition_by_ast_, - const ASTPtr & order_by_ast_, - const ASTPtr & primary_key_ast_, - const ASTPtr & sample_by_ast_, - const ASTPtr & ttl_table_ast_, const MergingParams & merging_params_, std::unique_ptr storage_settings_, bool require_part_metadata_, @@ -131,8 +124,9 @@ MergeTreeData::MergeTreeData( BrokenPartCallback broken_part_callback_) : global_context(context_) , merging_params(merging_params_) - , partition_by_ast(partition_by_ast_) - , sample_by_ast(sample_by_ast_) + , partition_by_ast(metadata.partition_by_ast) + , sample_by_ast(metadata.sample_by_ast) + , settings_ast(metadata.settings_ast) , require_part_metadata(require_part_metadata_) , database_name(database_) , table_name(table_) @@ -147,7 +141,7 @@ MergeTreeData::MergeTreeData( , parts_mover(this) { const auto settings = getSettings(); - setProperties(order_by_ast_, primary_key_ast_, columns_, indices_, constraints_); + setProperties(metadata); /// NOTE: using the same columns list as is read when performing actual merges. 
merging_params.check(getColumns().getAllPhysical()); @@ -189,7 +183,7 @@ MergeTreeData::MergeTreeData( min_format_version = MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING; } - setTTLExpressions(columns_.getColumnTTLs(), ttl_table_ast_); + setTTLExpressions(metadata.columns.getColumnTTLs(), metadata.ttl_for_table_ast); // format_file always contained on any data path String version_file_path; @@ -244,6 +238,35 @@ MergeTreeData::MergeTreeData( } +StorageInMemoryMetadata MergeTreeData::getInMemoryMetadata() const +{ + StorageInMemoryMetadata metadata{ + .columns = getColumns(), + .indices = getIndices(), + .constraints = getConstraints(), + }; + + if (partition_by_ast) + metadata.partition_by_ast = partition_by_ast->clone(); + + if (order_by_ast) + metadata.order_by_ast = order_by_ast->clone(); + + if (primary_key_ast) + metadata.primary_key_ast = primary_key_ast->clone(); + + if (ttl_table_ast) + metadata.ttl_for_table_ast = ttl_table_ast->clone(); + + if (sample_by_ast) + metadata.sample_by_ast = sample_by_ast->clone(); + + if (settings_ast) + metadata.settings_ast = settings_ast->clone(); + + return metadata; +} + static void checkKeyExpression(const ExpressionActions & expr, const Block & sample_block, const String & key_name) { for (const ExpressionAction & action : expr.getActions()) @@ -272,18 +295,14 @@ static void checkKeyExpression(const ExpressionActions & expr, const Block & sam } } - -void MergeTreeData::setProperties( - const ASTPtr & new_order_by_ast, const ASTPtr & new_primary_key_ast, - const ColumnsDescription & new_columns, const IndicesDescription & indices_description, - const ConstraintsDescription & constraints_description, bool only_check) +void MergeTreeData::setProperties(const StorageInMemoryMetadata & metadata, bool only_check) { - if (!new_order_by_ast) + if (!metadata.order_by_ast) throw Exception("ORDER BY cannot be empty", ErrorCodes::BAD_ARGUMENTS); - ASTPtr new_sorting_key_expr_list = extractKeyExpressionList(new_order_by_ast); - ASTPtr new_primary_key_expr_list = new_primary_key_ast - ? extractKeyExpressionList(new_primary_key_ast) : new_sorting_key_expr_list->clone(); + ASTPtr new_sorting_key_expr_list = extractKeyExpressionList(metadata.order_by_ast); + ASTPtr new_primary_key_expr_list = metadata.primary_key_ast + ? extractKeyExpressionList(metadata.primary_key_ast) : new_sorting_key_expr_list->clone(); if (merging_params.mode == MergeTreeData::MergingParams::VersionedCollapsing) new_sorting_key_expr_list->children.push_back(std::make_shared(merging_params.version_column)); @@ -315,8 +334,9 @@ void MergeTreeData::setProperties( } } - auto all_columns = new_columns.getAllPhysical(); + auto all_columns = metadata.columns.getAllPhysical(); + /// Order by check AST if (order_by_ast && only_check) { /// This is ALTER, not CREATE/ATTACH TABLE. Let us check that all new columns used in the sorting key @@ -352,7 +372,7 @@ void MergeTreeData::setProperties( "added to the sorting key. 
You can add expressions that use only the newly added columns", ErrorCodes::BAD_ARGUMENTS); - if (new_columns.getDefaults().count(col)) + if (metadata.columns.getDefaults().count(col)) throw Exception("Newly added column " + col + " has a default expression, so adding " "expressions that use it to the sorting key is forbidden", ErrorCodes::BAD_ARGUMENTS); @@ -387,11 +407,11 @@ void MergeTreeData::setProperties( MergeTreeIndices new_indices; - if (!indices_description.indices.empty()) + if (!metadata.indices.indices.empty()) { std::set indices_names; - for (const auto & index_ast : indices_description.indices) + for (const auto & index_ast : metadata.indices.indices) { const auto & index_decl = std::dynamic_pointer_cast(index_ast); @@ -428,24 +448,24 @@ void MergeTreeData::setProperties( if (!only_check) { - setColumns(std::move(new_columns)); + setColumns(std::move(metadata.columns)); - order_by_ast = new_order_by_ast; + order_by_ast = metadata.order_by_ast; sorting_key_columns = std::move(new_sorting_key_columns); sorting_key_expr_ast = std::move(new_sorting_key_expr_list); sorting_key_expr = std::move(new_sorting_key_expr); - primary_key_ast = new_primary_key_ast; + primary_key_ast = metadata.primary_key_ast; primary_key_columns = std::move(new_primary_key_columns); primary_key_expr_ast = std::move(new_primary_key_expr_list); primary_key_expr = std::move(new_primary_key_expr); primary_key_sample = std::move(new_primary_key_sample); primary_key_data_types = std::move(new_primary_key_data_types); - setIndices(indices_description); + setIndices(metadata.indices); skip_indices = std::move(new_indices); - setConstraints(constraints_description); + setConstraints(metadata.constraints); primary_key_and_skip_indices_expr = new_indices_with_primary_key_expr; sorting_key_and_skip_indices_expr = new_indices_with_sorting_key_expr; @@ -803,7 +823,8 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) auto disks = storage_policy->getDisks(); - if (getStoragePolicy()->getName() != "default") + /// Only check if user did touch storage configuration for this table. + if (!getStoragePolicy()->isDefaultPolicy() && !skip_sanity_checks) { /// Check extra parts at different disks, in order to not allow to miss data parts at undefined disks. std::unordered_set defined_disk_names; @@ -1357,19 +1378,13 @@ bool isMetadataOnlyConversion(const IDataType * from, const IDataType * to) } -void MergeTreeData::checkAlter(const AlterCommands & commands, const Context & context) +void MergeTreeData::checkAlterIsPossible(const AlterCommands & commands, const Settings & settings) { /// Check that needed transformations can be applied to the list of columns without considering type conversions. 
- auto new_columns = getColumns(); - auto new_indices = getIndices(); - auto new_constraints = getConstraints(); - ASTPtr new_order_by_ast = order_by_ast; - ASTPtr new_primary_key_ast = primary_key_ast; - ASTPtr new_ttl_table_ast = ttl_table_ast; - SettingsChanges new_changes; - commands.apply(new_columns, new_indices, new_constraints, new_order_by_ast, new_primary_key_ast, new_ttl_table_ast, new_changes); - if (getIndices().empty() && !new_indices.empty() && - !context.getSettingsRef().allow_experimental_data_skipping_indices) + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + commands.apply(metadata); + if (getIndices().empty() && !metadata.indices.empty() && + !settings.allow_experimental_data_skipping_indices) throw Exception("You must set the setting `allow_experimental_data_skipping_indices` to 1 " \ "before using data skipping indices.", ErrorCodes::BAD_ARGUMENTS); @@ -1440,13 +1455,32 @@ void MergeTreeData::checkAlter(const AlterCommands & commands, const Context & c } } - setProperties(new_order_by_ast, new_primary_key_ast, - new_columns, new_indices, new_constraints, /* only_check = */ true); + setProperties(metadata, /* only_check = */ true); - setTTLExpressions(new_columns.getColumnTTLs(), new_ttl_table_ast, /* only_check = */ true); + setTTLExpressions(metadata.columns.getColumnTTLs(), metadata.ttl_for_table_ast, /* only_check = */ true); - for (const auto & setting : new_changes) - checkSettingCanBeChanged(setting.name); + if (settings_ast) + { + const auto & current_changes = settings_ast->as().changes; + for (const auto & changed_setting : metadata.settings_ast->as().changes) + { + if (MergeTreeSettings::findIndex(changed_setting.name) == MergeTreeSettings::npos) + throw Exception{"Storage '" + getName() + "' doesn't have setting '" + changed_setting.name + "'", + ErrorCodes::UNKNOWN_SETTING}; + + auto comparator = [&changed_setting](const auto & change) { return change.name == changed_setting.name; }; + + auto current_setting_it + = std::find_if(current_changes.begin(), current_changes.end(), comparator); + + if ((current_setting_it == current_changes.end() || *current_setting_it != changed_setting) + && MergeTreeSettings::isReadonlySetting(changed_setting.name)) + { + throw Exception{"Setting '" + changed_setting.name + "' is readonly for storage '" + getName() + "'", + ErrorCodes::READONLY_SETTING}; + } + } + } if (commands.isModifyingData()) { @@ -1454,8 +1488,8 @@ void MergeTreeData::checkAlter(const AlterCommands & commands, const Context & c ExpressionActionsPtr unused_expression; NameToNameMap unused_map; bool unused_bool; - createConvertExpression(nullptr, getColumns().getAllPhysical(), new_columns.getAllPhysical(), - getIndices().indices, new_indices.indices, unused_expression, unused_map, unused_bool); + createConvertExpression(nullptr, getColumns().getAllPhysical(), metadata.columns.getAllPhysical(), + getIndices().indices, metadata.indices.indices, unused_expression, unused_map, unused_bool); } } @@ -1771,26 +1805,19 @@ void MergeTreeData::alterDataPart( } void MergeTreeData::changeSettings( - const SettingsChanges & new_changes, + const ASTPtr & new_settings, TableStructureWriteLockHolder & /* table_lock_holder */) { - if (!new_changes.empty()) + if (new_settings) { + const auto & new_changes = new_settings->as().changes; MergeTreeSettings copy = *getSettings(); copy.applyChanges(new_changes); storage_settings.set(std::make_unique(copy)); + settings_ast = new_settings; } } -void MergeTreeData::checkSettingCanBeChanged(const String & setting_name) 
const -{ - if (MergeTreeSettings::findIndex(setting_name) == MergeTreeSettings::npos) - throw Exception{"Storage '" + getName() + "' doesn't have setting '" + setting_name + "'", ErrorCodes::UNKNOWN_SETTING}; - if (MergeTreeSettings::isReadonlySetting(setting_name)) - throw Exception{"Setting '" + setting_name + "' is readonly for storage '" + getName() + "'", ErrorCodes::READONLY_SETTING}; - -} - void MergeTreeData::removeEmptyColumnsFromPart(MergeTreeData::MutableDataPartPtr & data_part) { auto & empty_columns = data_part->empty_columns; diff --git a/dbms/src/Storages/MergeTree/MergeTreeData.h b/dbms/src/Storages/MergeTree/MergeTreeData.h index 3ff12d69391..4fb09277b1e 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeData.h +++ b/dbms/src/Storages/MergeTree/MergeTreeData.h @@ -332,22 +332,17 @@ public: /// attach - whether the existing table is attached or the new table is created. MergeTreeData(const String & database_, const String & table_, const String & relative_data_path_, - const ColumnsDescription & columns_, - const IndicesDescription & indices_, - const ConstraintsDescription & constraints_, + const StorageInMemoryMetadata & metadata, Context & context_, const String & date_column_name, - const ASTPtr & partition_by_ast_, - const ASTPtr & order_by_ast_, - const ASTPtr & primary_key_ast_, - const ASTPtr & sample_by_ast_, /// nullptr, if sampling is not supported. - const ASTPtr & ttl_table_ast_, const MergingParams & merging_params_, std::unique_ptr settings_, bool require_part_metadata_, bool attach, BrokenPartCallback broken_part_callback_ = [](const String &){}); + + StorageInMemoryMetadata getInMemoryMetadata() const override; ASTPtr getPartitionKeyAST() const override { return partition_by_ast; } ASTPtr getSortingKeyAST() const override { return sorting_key_expr_ast; } ASTPtr getPrimaryKeyAST() const override { return primary_key_expr_ast; } @@ -545,7 +540,7 @@ public: /// - all type conversions can be done. /// - columns corresponding to primary key, indices, sign, sampling expression and date are not affected. /// If something is wrong, throws an exception. - void checkAlter(const AlterCommands & commands, const Context & context); + void checkAlterIsPossible(const AlterCommands & commands, const Settings & settings) override; /// Performs ALTER of the data part, writes the result to temporary files. /// Returns an object allowing to rename temporary files to permanent files. @@ -559,12 +554,9 @@ public: /// Change MergeTreeSettings void changeSettings( - const SettingsChanges & new_changes, + const ASTPtr & new_changes, TableStructureWriteLockHolder & table_lock_holder); - /// All MergeTreeData children have settings. 
- void checkSettingCanBeChanged(const String & setting_name) const override; - /// Remove columns, that have been marked as empty after zeroing values with expired ttl void removeEmptyColumnsFromPart(MergeTreeData::MutableDataPartPtr & data_part); @@ -787,6 +779,7 @@ protected: ASTPtr primary_key_ast; ASTPtr sample_by_ast; ASTPtr ttl_table_ast; + ASTPtr settings_ast; bool require_part_metadata; @@ -899,10 +892,7 @@ protected: std::mutex clear_old_temporary_directories_mutex; /// Mutex for settings usage - void setProperties(const ASTPtr & new_order_by_ast, const ASTPtr & new_primary_key_ast, - const ColumnsDescription & new_columns, - const IndicesDescription & indices_description, - const ConstraintsDescription & constraints_description, bool only_check = false); + void setProperties(const StorageInMemoryMetadata & metadata, bool only_check = false); void initPartitionKey(); diff --git a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index 5919e5a2670..8975535f31b 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -816,7 +816,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor + ") differs from number of bytes written to rows_sources file (" + toString(rows_sources_count) + "). It is a bug.", ErrorCodes::LOGICAL_ERROR); - CompressedReadBufferFromFile rows_sources_read_buf(rows_sources_file_path, 0, 0); + CompressedReadBufferFromFile rows_sources_read_buf(rows_sources_file_path, 0, 0, 0); IMergedBlockOutputStream::WrittenOffsetColumns written_offset_columns; for (size_t column_num = 0, gathering_column_names_size = gathering_column_names.size(); diff --git a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp index 36969ed18fe..09c4fe835d6 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp @@ -793,7 +793,8 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreams( auto source = std::make_shared( data, part.data_part, max_block_size, settings.preferred_block_size_bytes, settings.preferred_max_column_in_block_size_bytes, column_names, part.ranges, use_uncompressed_cache, - query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, settings.max_read_buffer_size, true, + query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, settings.min_bytes_to_use_mmap_io, + settings.max_read_buffer_size, true, virt_columns, part.part_index_in_query); res.emplace_back(std::move(source)); @@ -973,7 +974,7 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsWithOrder( pipes.emplace_back(std::make_shared( data, part.data_part, max_block_size, settings.preferred_block_size_bytes, settings.preferred_max_column_in_block_size_bytes, column_names, ranges_to_get_from_part, - use_uncompressed_cache, query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, + use_uncompressed_cache, query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, settings.min_bytes_to_use_mmap_io, settings.max_read_buffer_size, true, virt_columns, part.part_index_in_query)); } else @@ -981,7 +982,7 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsWithOrder( pipes.emplace_back(std::make_shared( data, part.data_part, max_block_size, settings.preferred_block_size_bytes, 
settings.preferred_max_column_in_block_size_bytes, column_names, ranges_to_get_from_part, - use_uncompressed_cache, query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, + use_uncompressed_cache, query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, settings.min_bytes_to_use_mmap_io, settings.max_read_buffer_size, true, virt_columns, part.part_index_in_query)); pipes.back().addSimpleTransform(std::make_shared(pipes.back().getHeader())); @@ -1054,7 +1055,8 @@ Pipes MergeTreeDataSelectExecutor::spreadMarkRangesAmongStreamsFinal( auto source_processor = std::make_shared( data, part.data_part, max_block_size, settings.preferred_block_size_bytes, settings.preferred_max_column_in_block_size_bytes, column_names, part.ranges, use_uncompressed_cache, - query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, settings.max_read_buffer_size, true, + query_info.prewhere_info, true, settings.min_bytes_to_use_direct_io, settings.min_bytes_to_use_mmap_io, + settings.max_read_buffer_size, true, virt_columns, part.part_index_in_query); Pipe pipe(std::move(source_processor)); diff --git a/dbms/src/Storages/MergeTree/MergeTreeIndexReader.cpp b/dbms/src/Storages/MergeTree/MergeTreeIndexReader.cpp index 05f09041fed..8b567b39304 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeIndexReader.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeIndexReader.cpp @@ -9,7 +9,7 @@ MergeTreeIndexReader::MergeTreeIndexReader( : index(index_), stream( part_->getFullPath() + index->getFileName(), ".idx", marks_count_, all_mark_ranges_, nullptr, false, nullptr, - part_->getFileSizeOrZero(index->getFileName() + ".idx"), 0, DBMS_DEFAULT_BUFFER_SIZE, + part_->getFileSizeOrZero(index->getFileName() + ".idx"), 0, 0, DBMS_DEFAULT_BUFFER_SIZE, &part_->index_granularity_info, ReadBufferFromFileBase::ProfileCallback{}, CLOCK_MONOTONIC_COARSE) { diff --git a/dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp b/dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp index d03160d7ec2..a09bd548b64 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeRangeReader.cpp @@ -255,6 +255,36 @@ void MergeTreeRangeReader::ReadResult::clear() filter = nullptr; } +void MergeTreeRangeReader::ReadResult::shrink(Columns & old_columns) +{ + for (size_t i = 0; i < old_columns.size(); ++i) + { + if (!old_columns[i]) + continue; + auto new_column = old_columns[i]->cloneEmpty(); + new_column->reserve(total_rows_per_granule); + for (size_t j = 0, pos = 0; j < rows_per_granule_original.size(); pos += rows_per_granule_original[j], ++j) + { + if (rows_per_granule[j]) + new_column->insertRangeFrom(*old_columns[i], pos, rows_per_granule[j]); + } + old_columns[i] = std::move(new_column); + } +} + +void MergeTreeRangeReader::ReadResult::setFilterConstTrue() +{ + clearFilter(); + filter_holder = DataTypeUInt8().createColumnConst(num_rows, 1u); +} + +void MergeTreeRangeReader::ReadResult::setFilterConstFalse() +{ + clearFilter(); + columns.clear(); + num_rows = 0; +} + void MergeTreeRangeReader::ReadResult::optimize() { if (total_rows_per_granule == 0 || filter == nullptr) @@ -268,30 +298,47 @@ void MergeTreeRangeReader::ReadResult::optimize() clear(); return; } - else if (total_zero_rows_in_tails == 0 && countBytesInFilter(filter->getData()) == filter->size()) + else if (total_zero_rows_in_tails == 0 && countBytesInResultFilter(filter->getData()) == filter->size()) { - filter_holder = nullptr; - filter = nullptr; + setFilterConstTrue(); return; } - /// Just a guess.
If only a few rows may be skipped, it's better not to skip at all. - if (2 * total_zero_rows_in_tails > filter->size()) + else if (2 * total_zero_rows_in_tails > filter->size()) { + for (auto i : ext::range(0, rows_per_granule.size())) + { + rows_per_granule_original.push_back(rows_per_granule[i]); + rows_per_granule[i] -= zero_tails[i]; + } + num_rows_to_skip_in_last_granule += rows_per_granule_original.back() - rows_per_granule.back(); - auto new_filter = ColumnUInt8::create(filter->size() - total_zero_rows_in_tails); - IColumn::Filter & new_data = new_filter->getData(); + /// Check if the filter became const 1 after the shrink + if (countBytesInResultFilter(filter->getData()) + total_zero_rows_in_tails == total_rows_per_granule) + { + total_rows_per_granule = total_rows_per_granule - total_zero_rows_in_tails; + num_rows = total_rows_per_granule; + setFilterConstTrue(); + shrink(columns); /// shrink acts as filtering in such a case + } + else + { + auto new_filter = ColumnUInt8::create(filter->size() - total_zero_rows_in_tails); + IColumn::Filter & new_data = new_filter->getData(); - size_t rows_in_last_granule = rows_per_granule.back(); - - collapseZeroTails(filter->getData(), new_data, zero_tails); - - total_rows_per_granule = new_filter->size(); - num_rows_to_skip_in_last_granule += rows_in_last_granule - rows_per_granule.back(); - - filter = new_filter.get(); - filter_holder = std::move(new_filter); + collapseZeroTails(filter->getData(), new_data); + total_rows_per_granule = new_filter->size(); + num_rows = total_rows_per_granule; + filter_original = filter; + filter_holder_original = std::move(filter_holder); + filter = new_filter.get(); + filter_holder = std::move(new_filter); + } + need_filter = true; } + /// Another guess: whether it's worth filtering already at PREWHERE + else if (countBytesInResultFilter(filter->getData()) < 0.6 * filter->size()) + need_filter = true; } size_t MergeTreeRangeReader::ReadResult::countZeroTails(const IColumn::Filter & filter_vec, NumRows & zero_tails) const @@ -314,24 +361,16 @@ size_t MergeTreeRangeReader::ReadResult::countZeroTails(const IColumn::Filter & return total_zero_rows_in_tails; } -void MergeTreeRangeReader::ReadResult::collapseZeroTails(const IColumn::Filter & filter_vec, IColumn::Filter & new_filter_vec, - const NumRows & zero_tails) +void MergeTreeRangeReader::ReadResult::collapseZeroTails(const IColumn::Filter & filter_vec, IColumn::Filter & new_filter_vec) { auto filter_data = filter_vec.data(); auto new_filter_data = new_filter_vec.data(); for (auto i : ext::range(0, rows_per_granule.size())) { - auto & rows_to_read = rows_per_granule[i]; - auto filtered_rows_num_at_granule_end = zero_tails[i]; - - rows_to_read -= filtered_rows_num_at_granule_end; - - memcpySmallAllowReadWriteOverflow15(new_filter_data, filter_data, rows_to_read); - filter_data += rows_to_read; - new_filter_data += rows_to_read; - - filter_data += filtered_rows_num_at_granule_end; + memcpySmallAllowReadWriteOverflow15(new_filter_data, filter_data, rows_per_granule[i]); + filter_data += rows_per_granule_original[i]; + new_filter_data += rows_per_granule[i]; } new_filter_vec.resize(new_filter_data - new_filter_vec.data()); @@ -405,15 +444,27 @@ void MergeTreeRangeReader::ReadResult::setFilter(const ColumnPtr & new_filter) } +size_t MergeTreeRangeReader::ReadResult::countBytesInResultFilter(const IColumn::Filter & filter_) +{ + auto it = filter_bytes_map.find(&filter_); + if (it == filter_bytes_map.end()) + { + auto bytes = countBytesInFilter(filter_); + filter_bytes_map[&filter_] = bytes; + return
bytes; + } + else + return it->second; +} + MergeTreeRangeReader::MergeTreeRangeReader( - MergeTreeReader * merge_tree_reader_, MergeTreeRangeReader * prev_reader_, - ExpressionActionsPtr alias_actions_, ExpressionActionsPtr prewhere_actions_, - const String * prewhere_column_name_, bool remove_prewhere_column_, bool last_reader_in_chain_) - : merge_tree_reader(merge_tree_reader_), index_granularity(&(merge_tree_reader->data_part->index_granularity)) - , prev_reader(prev_reader_), prewhere_column_name(prewhere_column_name_) - , alias_actions(std::move(alias_actions_)), prewhere_actions(std::move(prewhere_actions_)) - , remove_prewhere_column(remove_prewhere_column_) - , last_reader_in_chain(last_reader_in_chain_), is_initialized(true) + MergeTreeReader * merge_tree_reader_, + MergeTreeRangeReader * prev_reader_, + const PrewhereInfoPtr & prewhere_, + bool last_reader_in_chain_) + : merge_tree_reader(merge_tree_reader_) + , index_granularity(&(merge_tree_reader->data_part->index_granularity)), prev_reader(prev_reader_) + , prewhere(prewhere_), last_reader_in_chain(last_reader_in_chain_), is_initialized(true) { if (prev_reader) sample_block = prev_reader->getSampleBlock(); @@ -421,14 +472,18 @@ MergeTreeRangeReader::MergeTreeRangeReader( for (auto & name_and_type : merge_tree_reader->getColumns()) sample_block.insert({name_and_type.type->createColumn(), name_and_type.type, name_and_type.name}); - if (alias_actions) - alias_actions->execute(sample_block, true); + if (prewhere) + { + if (prewhere->alias_actions) + prewhere->alias_actions->execute(sample_block, true); - if (prewhere_actions) - prewhere_actions->execute(sample_block, true); + sample_block_before_prewhere = sample_block; + if (prewhere->prewhere_actions) + prewhere->prewhere_actions->execute(sample_block, true); - if (remove_prewhere_column) - sample_block.erase(*prewhere_column_name); + if (prewhere->remove_prewhere_column) + sample_block.erase(prewhere->prewhere_column_name); + } } bool MergeTreeRangeReader::isReadingFinished() const @@ -488,12 +543,10 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar throw Exception("Expected at least 1 row to read, got 0.", ErrorCodes::LOGICAL_ERROR); ReadResult read_result; - size_t prev_bytes = 0; if (prev_reader) { read_result = prev_reader->read(max_rows, ranges); - prev_bytes = read_result.numBytesRead(); size_t num_read_rows; Columns columns = continueReadingChain(read_result, num_read_rows); @@ -509,6 +562,15 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar has_columns = true; } + size_t total_bytes = 0; + for (auto & column : columns) + { + if (column) + total_bytes += column->byteSize(); + } + + read_result.addNumBytesRead(total_bytes); + bool should_evaluate_missing_defaults = false; if (has_columns) @@ -533,8 +595,30 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar } if (!columns.empty() && should_evaluate_missing_defaults) - merge_tree_reader->evaluateMissingDefaults( - prev_reader->getSampleBlock().cloneWithColumns(read_result.columns), columns); + { + auto block = prev_reader->sample_block.cloneWithColumns(read_result.columns); + auto block_before_prewhere = read_result.block_before_prewhere; + for (auto & ctn : block) + { + if (block_before_prewhere.has(ctn.name)) + block_before_prewhere.erase(ctn.name); + } + + if (block_before_prewhere) + { + if (read_result.need_filter) + { + auto old_columns = block_before_prewhere.getColumns(); + filterColumns(old_columns, 
read_result.getFilter()->getData()); + block_before_prewhere.setColumns(std::move(old_columns)); + } + + for (auto && ctn : block_before_prewhere) + block.insert(std::move(ctn)); + } + + merge_tree_reader->evaluateMissingDefaults(block, columns); + } read_result.columns.reserve(read_result.columns.size() + columns.size()); for (auto & column : columns) @@ -556,17 +640,17 @@ MergeTreeRangeReader::ReadResult MergeTreeRangeReader::read(size_t max_rows, Mar } else read_result.columns.clear(); + + size_t total_bytes = 0; + for (auto & column : read_result.columns) + total_bytes += column->byteSize(); + + read_result.addNumBytesRead(total_bytes); } if (read_result.num_rows == 0) return read_result; - size_t total_bytes = 0; - for (auto & column : read_result.columns) - total_bytes += column->byteSize(); - - read_result.addNumBytesRead(total_bytes - prev_bytes); - executePrewhereActionsAndFilterColumns(read_result); return read_result; @@ -674,7 +758,7 @@ Columns MergeTreeRangeReader::continueReadingChain(ReadResult & result, size_t & void MergeTreeRangeReader::executePrewhereActionsAndFilterColumns(ReadResult & result) { - if (!prewhere_actions) + if (!prewhere) return; auto & header = merge_tree_reader->getColumns(); @@ -705,12 +789,14 @@ void MergeTreeRangeReader::executePrewhereActionsAndFilterColumns(ReadResult & r for (auto name_and_type = header.begin(); pos < num_columns; ++pos, ++name_and_type) block.insert({result.columns[pos], name_and_type->type, name_and_type->name}); - if (alias_actions) - alias_actions->execute(block); + if (prewhere && prewhere->alias_actions) + prewhere->alias_actions->execute(block); - prewhere_actions->execute(block); + /// Columns might be projected out. We need to store them here so that default columns can be evaluated later. 
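To make the comment above concrete (the save it describes is the result.block_before_prewhere assignment just below): PREWHERE actions may project source columns away, yet DEFAULT expressions for columns missing on disk can still reference them, so a copy of the block taken before PREWHERE runs is later filtered with the same mask and re-attached. A toy model under those assumptions, not the real Block/IColumn API:

#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Column = std::vector<int64_t>;
using Filter = std::vector<uint8_t>;
using Block = std::map<std::string, Column>; /// toy stand-in for DB::Block

/// Keep only rows whose filter byte is nonzero (what filterColumns() does).
Column filterColumn(const Column & column, const Filter & filter)
{
    Column result;
    for (std::size_t i = 0; i < column.size(); ++i)
        if (filter[i])
            result.push_back(column[i]);
    return result;
}

int main()
{
    Block block{{"a", {1, 2, 3, 4}}};
    Block block_before_prewhere = block; /// saved before PREWHERE executes

    block.erase("a");                    /// PREWHERE actions projected `a` away
    Filter filter{1, 0, 1, 0};           /// PREWHERE result for the 4 rows

    /// Later, a default like `b DEFAULT a + 1` still needs `a`:
    /// restore it from the saved block, filtered with the same mask.
    block["a"] = filterColumn(block_before_prewhere["a"], filter);
}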
+ result.block_before_prewhere = block; + prewhere->prewhere_actions->execute(block); - prewhere_column_pos = block.getPositionByName(*prewhere_column_name); + prewhere_column_pos = block.getPositionByName(prewhere->prewhere_column_name); result.columns.clear(); result.columns.reserve(block.columns()); @@ -729,51 +815,38 @@ void MergeTreeRangeReader::executePrewhereActionsAndFilterColumns(ReadResult & r } result.setFilter(filter); + + /// If there is a WHERE, we filter there, and here we only optimize IO and shrink the columns if (!last_reader_in_chain) result.optimize(); - bool filter_always_true = !result.getFilter() && result.totalRowsPerGranule() == filter->size(); - + /// If we read nothing or the filter was optimized to nothing if (result.totalRowsPerGranule() == 0) + result.setFilterConstFalse(); + /// If we need to filter in PREWHERE + else if (prewhere->need_filter || result.need_filter) { - result.columns.clear(); - result.num_rows = 0; - } - else if (!filter_always_true) - { - FilterDescription filter_description(*filter); - - size_t num_bytes_in_filter = 0; - bool calculated_num_bytes_in_filter = false; - - auto getNumBytesInFilter = [&]() + /// If there is a filter and optimize() was not called + if (result.getFilter() && last_reader_in_chain) { - if (!calculated_num_bytes_in_filter) - num_bytes_in_filter = countBytesInFilter(*filter_description.data); - - calculated_num_bytes_in_filter = true; - return num_bytes_in_filter; - }; - - if (last_reader_in_chain) - { - size_t bytes_in_filter = getNumBytesInFilter(); + auto result_filter = result.getFilter(); + /// optimize() was not called, so we need to check for const 1 and const 0 + size_t bytes_in_filter = result.countBytesInResultFilter(result_filter->getData()); if (bytes_in_filter == 0) - { - result.columns.clear(); - result.num_rows = 0; - } - else if (bytes_in_filter == filter->size()) - filter_always_true = true; + result.setFilterConstFalse(); + else if (bytes_in_filter == result.num_rows) + result.setFilterConstTrue(); } - if (!filter_always_true) + /// If there is still a filter, do the filtering now + if (result.getFilter()) { - filterColumns(result.columns, *filter_description.data); + /// The filter might have been shrunk while the columns were not + auto result_filter = result.getFilterOriginal() ? result.getFilterOriginal() : result.getFilter(); + filterColumns(result.columns, result_filter->getData()); + result.need_filter = true; - /// Get num rows after filtration. bool has_column = false; - for (auto & column : result.columns) { if (column) @@ -784,19 +857,26 @@ void MergeTreeRangeReader::executePrewhereActionsAndFilterColumns(ReadResult & r } } + /// There is only one filter column.
Record the actual number of rows. if (!has_column) - result.num_rows = getNumBytesInFilter(); + result.num_rows = result.countBytesInResultFilter(result_filter->getData()); + } + + /// Check if the PREWHERE column is needed + if (result.columns.size()) + { + if (prewhere->remove_prewhere_column) + result.columns.erase(result.columns.begin() + prewhere_column_pos); + else + result.columns[prewhere_column_pos] = DataTypeUInt8().createColumnConst(result.num_rows, 1u)->convertToFullColumnIfConst(); + } } - - if (result.num_rows == 0) - return; - - if (remove_prewhere_column) - result.columns.erase(result.columns.begin() + prewhere_column_pos); + /// Filter in WHERE instead else - result.columns[prewhere_column_pos] = - DataTypeUInt8().createColumnConst(result.num_rows, 1u)->convertToFullColumnIfConst(); + { + result.columns[prewhere_column_pos] = result.getFilterHolder()->convertToFullColumnIfConst(); + result.clearFilter(); // Acting as a flag to not filter in PREWHERE + } } } diff --git a/dbms/src/Storages/MergeTree/MergeTreeRangeReader.h b/dbms/src/Storages/MergeTree/MergeTreeRangeReader.h index 67d5cbc3908..345f537d2aa 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeRangeReader.h +++ b/dbms/src/Storages/MergeTree/MergeTreeRangeReader.h @@ -13,6 +13,8 @@ using ColumnUInt8 = ColumnVector<UInt8>; class MergeTreeReader; class MergeTreeIndexGranularity; +struct PrewhereInfo; +using PrewhereInfoPtr = std::shared_ptr<PrewhereInfo>; /// MergeTreeReader iterator which allows sequential reading for arbitrary number of rows between pairs of marks in the same part. /// Stores reading state, which can be inside granule. Can skip rows in current granule and start reading from next mark. @@ -20,9 +22,11 @@ class MergeTreeIndexGranularity; class MergeTreeRangeReader { public: - MergeTreeRangeReader(MergeTreeReader * merge_tree_reader_, MergeTreeRangeReader * prev_reader_, - ExpressionActionsPtr alias_actions_, ExpressionActionsPtr prewhere_actions_, - const String * prewhere_column_name_, bool remove_prewhere_column_, bool last_reader_in_chain_); + MergeTreeRangeReader( + MergeTreeReader * merge_tree_reader_, + MergeTreeRangeReader * prev_reader_, + const PrewhereInfoPtr & prewhere_, + bool last_reader_in_chain_); MergeTreeRangeReader() = default; @@ -140,7 +144,9 @@ public: /// The number of bytes read from disk. size_t numBytesRead() const { return num_bytes_read; } /// Filter you need to apply to newly-read columns in order to add them to block. + const ColumnUInt8 * getFilterOriginal() const { return filter_original; } const ColumnUInt8 * getFilter() const { return filter; } + ColumnPtr & getFilterHolder() { return filter_holder; } void addGranule(size_t num_rows_); void adjustLastGranule(); @@ -154,10 +160,21 @@ public: /// Remove all rows from granules. void clear(); + void clearFilter() { filter = nullptr; } + void setFilterConstTrue(); + void setFilterConstFalse(); + void addNumBytesRead(size_t count) { num_bytes_read += count; } + void shrink(Columns & old_columns); + + size_t countBytesInResultFilter(const IColumn::Filter & filter); + Columns columns; size_t num_rows = 0; + bool need_filter = false; + + Block block_before_prewhere; private: RangesInfo started_ranges; @@ -165,6 +182,7 @@ public: /// Granule here is not number of rows between two marks /// It's amount of rows per single reading act NumRows rows_per_granule; + NumRows rows_per_granule_original; /// Sum(rows_per_granule) size_t total_rows_per_granule = 0; /// The number of rows was read at first step. May be zero if no read columns present in part.
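Before the next hunk, it may help to see the decision ladder that the new optimize() encodes, reduced to a standalone sketch. Only the two thresholds visible in the .cpp hunk above are taken from the diff (the 2x zero-tails test and the 0.6 selectivity guess); the types and output are stand-ins, not the real ReadResult:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

using Filter = std::vector<uint8_t>;

/// For each granule, count the run of zeros at its end (cf. countZeroTails()).
static std::size_t countZeroTails(const Filter & filter, const std::vector<std::size_t> & rows_per_granule)
{
    std::size_t total = 0, pos = 0;
    for (std::size_t rows : rows_per_granule)
    {
        std::size_t tail = 0;
        while (tail < rows && filter[pos + rows - 1 - tail] == 0)
            ++tail;
        total += tail;
        pos += rows;
    }
    return total;
}

int main()
{
    /// Two granules of 4 rows; both end in three filtered-out rows.
    Filter filter{1, 0, 0, 0, 1, 0, 0, 0};
    std::vector<std::size_t> rows_per_granule{4, 4};

    std::size_t ones = std::accumulate(filter.begin(), filter.end(), std::size_t{0});
    std::size_t zero_tails = countZeroTails(filter, rows_per_granule);

    if (ones == 0)
        std::cout << "nothing passes: set the filter to const 0 and clear the columns\n";
    else if (zero_tails == 0 && ones == filter.size())
        std::cout << "everything passes: set the filter to const 1\n";
    else if (2 * zero_tails > filter.size())
        std::cout << "mostly zero tails: collapse them and shrink the granules\n";
    else if (ones < 0.6 * filter.size())
        std::cout << "selective filter: set need_filter and filter in PREWHERE\n";
    else
        std::cout << "weak filter: defer the actual filtering to WHERE\n";
}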
@@ -175,11 +193,15 @@ public: size_t num_bytes_read = 0; /// nullptr if prev reader hasn't prewhere_actions. Otherwise filter.size() >= total_rows_per_granule. ColumnPtr filter_holder; + ColumnPtr filter_holder_original; const ColumnUInt8 * filter = nullptr; + const ColumnUInt8 * filter_original = nullptr; - void collapseZeroTails(const IColumn::Filter & filter, IColumn::Filter & new_filter, const NumRows & zero_tails); + void collapseZeroTails(const IColumn::Filter & filter, IColumn::Filter & new_filter); size_t countZeroTails(const IColumn::Filter & filter, NumRows & zero_tails) const; static size_t numZerosInTail(const UInt8 * begin, const UInt8 * end); + + std::map filter_bytes_map; }; ReadResult read(size_t max_rows, MarkRanges & ranges); @@ -196,16 +218,13 @@ private: MergeTreeReader * merge_tree_reader = nullptr; const MergeTreeIndexGranularity * index_granularity = nullptr; MergeTreeRangeReader * prev_reader = nullptr; /// If not nullptr, read from prev_reader firstly. - - const String * prewhere_column_name = nullptr; - ExpressionActionsPtr alias_actions = nullptr; /// If not nullptr, calculate aliases. - ExpressionActionsPtr prewhere_actions = nullptr; /// If not nullptr, calculate filter. + PrewhereInfoPtr prewhere; Stream stream; Block sample_block; + Block sample_block_before_prewhere; - bool remove_prewhere_column = false; bool last_reader_in_chain = false; bool is_initialized = false; }; diff --git a/dbms/src/Storages/MergeTree/MergeTreeReader.cpp b/dbms/src/Storages/MergeTree/MergeTreeReader.cpp index 29d1dac7587..72c31a9dcb1 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeReader.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeReader.cpp @@ -1,7 +1,6 @@ #include #include #include -#include #include #include #include @@ -40,6 +39,7 @@ MergeTreeReader::MergeTreeReader( const MergeTreeData & storage_, MarkRanges all_mark_ranges_, size_t aio_threshold_, + size_t mmap_threshold_, size_t max_read_buffer_size_, ValueSizeMap avg_value_size_hints_, const ReadBufferFromFileBase::ProfileCallback & profile_callback_, @@ -53,6 +53,7 @@ MergeTreeReader::MergeTreeReader( , storage(storage_) , all_mark_ranges(std::move(all_mark_ranges_)) , aio_threshold(aio_threshold_) + , mmap_threshold(mmap_threshold_) , max_read_buffer_size(max_read_buffer_size_) { try @@ -198,7 +199,7 @@ void MergeTreeReader::addStreams(const String & name, const IDataType & type, path + stream_name, DATA_FILE_EXTENSION, data_part->getMarksCount(), all_mark_ranges, mark_cache, save_marks_in_cache, uncompressed_cache, data_part->getFileSizeOrZero(stream_name + DATA_FILE_EXTENSION), - aio_threshold, max_read_buffer_size, + aio_threshold, mmap_threshold, max_read_buffer_size, &data_part->index_granularity_info, profile_callback, clock_type)); }; diff --git a/dbms/src/Storages/MergeTree/MergeTreeReader.h b/dbms/src/Storages/MergeTree/MergeTreeReader.h index 140fbcb51b0..b0642c06108 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeReader.h +++ b/dbms/src/Storages/MergeTree/MergeTreeReader.h @@ -28,6 +28,7 @@ public: const MergeTreeData & storage_, MarkRanges all_mark_ranges_, size_t aio_threshold_, + size_t mmap_threshold_, size_t max_read_buffer_size_, ValueSizeMap avg_value_size_hints_ = ValueSizeMap{}, const ReadBufferFromFileBase::ProfileCallback & profile_callback_ = ReadBufferFromFileBase::ProfileCallback{}, @@ -81,6 +82,7 @@ private: const MergeTreeData & storage; MarkRanges all_mark_ranges; size_t aio_threshold; + size_t mmap_threshold; size_t max_read_buffer_size; void addStreams(const String & name, const 
IDataType & type, diff --git a/dbms/src/Storages/MergeTree/MergeTreeReaderStream.cpp b/dbms/src/Storages/MergeTree/MergeTreeReaderStream.cpp index 3dbfc61a00b..1691f01d794 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeReaderStream.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeReaderStream.cpp @@ -1,4 +1,5 @@ #include +#include #include @@ -19,7 +20,7 @@ MergeTreeReaderStream::MergeTreeReaderStream( const MarkRanges & all_mark_ranges, MarkCache * mark_cache_, bool save_marks_in_cache_, UncompressedCache * uncompressed_cache, - size_t file_size, size_t aio_threshold, size_t max_read_buffer_size, + size_t file_size, size_t aio_threshold, size_t mmap_threshold, size_t max_read_buffer_size, const MergeTreeIndexGranularityInfo * index_granularity_info_, const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type) : path_prefix(path_prefix_), data_file_extension(data_file_extension_), marks_count(marks_count_) @@ -79,7 +80,7 @@ MergeTreeReaderStream::MergeTreeReaderStream( if (uncompressed_cache) { auto buffer = std::make_unique( - path_prefix + data_file_extension, uncompressed_cache, sum_mark_range_bytes, aio_threshold, buffer_size); + path_prefix + data_file_extension, uncompressed_cache, sum_mark_range_bytes, aio_threshold, mmap_threshold, buffer_size); if (profile_callback) buffer->setProfileCallback(profile_callback, clock_type); @@ -90,7 +91,7 @@ MergeTreeReaderStream::MergeTreeReaderStream( else { auto buffer = std::make_unique( - path_prefix + data_file_extension, sum_mark_range_bytes, aio_threshold, buffer_size); + path_prefix + data_file_extension, sum_mark_range_bytes, aio_threshold, mmap_threshold, buffer_size); if (profile_callback) buffer->setProfileCallback(profile_callback, clock_type); diff --git a/dbms/src/Storages/MergeTree/MergeTreeReaderStream.h b/dbms/src/Storages/MergeTree/MergeTreeReaderStream.h index d60689f1e91..1b0a973797b 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeReaderStream.h +++ b/dbms/src/Storages/MergeTree/MergeTreeReaderStream.h @@ -15,13 +15,13 @@ class MergeTreeReaderStream { public: MergeTreeReaderStream( - const String & path_prefix_, const String & data_file_extension_, size_t marks_count_, - const MarkRanges & all_mark_ranges, - MarkCache * mark_cache, bool save_marks_in_cache, - UncompressedCache * uncompressed_cache, - size_t file_size, size_t aio_threshold, size_t max_read_buffer_size, - const MergeTreeIndexGranularityInfo * index_granularity_info_, - const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type); + const String & path_prefix_, const String & data_file_extension_, size_t marks_count_, + const MarkRanges & all_mark_ranges, + MarkCache * mark_cache, bool save_marks_in_cache, + UncompressedCache * uncompressed_cache, + size_t file_size, size_t aio_threshold, size_t mmap_threshold, size_t max_read_buffer_size, + const MergeTreeIndexGranularityInfo * index_granularity_info_, + const ReadBufferFromFileBase::ProfileCallback & profile_callback, clockid_t clock_type); void seekToMark(size_t index); diff --git a/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp b/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp index af8c02318d7..25425b62aa9 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp @@ -43,6 +43,7 @@ MergeTreeReverseSelectProcessor::MergeTreeReverseSelectProcessor( const PrewhereInfoPtr & prewhere_info_, bool check_columns, size_t 
min_bytes_to_use_direct_io_, + size_t min_bytes_to_use_mmap_io_, size_t max_read_buffer_size_, bool save_marks_in_cache_, const Names & virt_column_names_, @@ -52,7 +53,7 @@ MergeTreeReverseSelectProcessor::MergeTreeReverseSelectProcessor( MergeTreeBaseSelectProcessor{ replaceTypes(storage_.getSampleBlockForColumns(required_columns_), owned_data_part_), storage_, prewhere_info_, max_block_size_rows_, - preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, min_bytes_to_use_direct_io_, + preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, min_bytes_to_use_direct_io_, min_bytes_to_use_mmap_io_, max_read_buffer_size_, use_uncompressed_cache_, save_marks_in_cache_, virt_column_names_}, required_columns{std::move(required_columns_)}, data_part{owned_data_part_}, @@ -93,13 +94,13 @@ MergeTreeReverseSelectProcessor::MergeTreeReverseSelectProcessor( reader = std::make_unique( path, data_part, task_columns.columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, storage, - all_mark_ranges, min_bytes_to_use_direct_io, max_read_buffer_size); + all_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size); if (prewhere_info) pre_reader = std::make_unique( path, data_part, task_columns.pre_columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, storage, - all_mark_ranges, min_bytes_to_use_direct_io, max_read_buffer_size); + all_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size); } bool MergeTreeReverseSelectProcessor::getNewTask() @@ -111,7 +112,7 @@ try return false; } - /// We have some blocks to return in buffer. + /// We have some blocks to return in buffer. /// Return true to continue reading, but actually don't create a task. 
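As an aside to the mmap plumbing running through these hunks: min_bytes_to_use_mmap_io ends up as an mmap_threshold argument next to the existing aio_threshold, which suggests a buffer-selection rule of roughly the shape below. The class names are placeholders, not the real ReadBufferFromFileBase hierarchy, and a threshold of 0 is assumed to disable the corresponding mode:

#include <cstddef>
#include <memory>
#include <string>

struct ReadBuffer { virtual ~ReadBuffer() = default; };
struct DirectIOReadBuffer : ReadBuffer { explicit DirectIOReadBuffer(const std::string &) {} };
struct MMapReadBuffer : ReadBuffer { explicit MMapReadBuffer(const std::string &) {} };
struct CachedReadBuffer : ReadBuffer { explicit CachedReadBuffer(const std::string &) {} };

/// Large reads may bypass the page cache via O_DIRECT, medium ones can be
/// mmap'ed, everything else goes through the ordinary buffered path.
std::unique_ptr<ReadBuffer> createReadBuffer(const std::string & path, std::size_t estimated_bytes,
                                             std::size_t aio_threshold, std::size_t mmap_threshold)
{
    if (aio_threshold && estimated_bytes >= aio_threshold)
        return std::make_unique<DirectIOReadBuffer>(path);
    if (mmap_threshold && estimated_bytes >= mmap_threshold)
        return std::make_unique<MMapReadBuffer>(path);
    return std::make_unique<CachedReadBuffer>(path);
}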
if (all_mark_ranges.empty()) return true; diff --git a/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h b/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h index 58202988e4c..9c37e60ab10 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h +++ b/dbms/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h @@ -28,6 +28,7 @@ public: const PrewhereInfoPtr & prewhere_info, bool check_columns, size_t min_bytes_to_use_direct_io, + size_t min_bytes_to_use_mmap_io, size_t max_read_buffer_size, bool save_marks_in_cache, const Names & virt_column_names = {}, diff --git a/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp b/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp index 51ed337367d..dac42859eef 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp @@ -43,6 +43,7 @@ MergeTreeSelectProcessor::MergeTreeSelectProcessor( const PrewhereInfoPtr & prewhere_info_, bool check_columns_, size_t min_bytes_to_use_direct_io_, + size_t min_bytes_to_use_mmap_io_, size_t max_read_buffer_size_, bool save_marks_in_cache_, const Names & virt_column_names_, @@ -52,7 +53,7 @@ MergeTreeSelectProcessor::MergeTreeSelectProcessor( MergeTreeBaseSelectProcessor{ replaceTypes(storage_.getSampleBlockForColumns(required_columns_), owned_data_part_), storage_, prewhere_info_, max_block_size_rows_, - preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, min_bytes_to_use_direct_io_, + preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, min_bytes_to_use_direct_io_, min_bytes_to_use_mmap_io_, max_read_buffer_size_, use_uncompressed_cache_, save_marks_in_cache_, virt_column_names_}, required_columns{std::move(required_columns_)}, data_part{owned_data_part_}, @@ -122,13 +123,13 @@ try reader = std::make_unique( path, data_part, task_columns.columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, storage, - all_mark_ranges, min_bytes_to_use_direct_io, max_read_buffer_size); + all_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size); if (prewhere_info) pre_reader = std::make_unique( path, data_part, task_columns.pre_columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, storage, - all_mark_ranges, min_bytes_to_use_direct_io, max_read_buffer_size); + all_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size); } return true; diff --git a/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.h b/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.h index c0d93842a81..9b0aac9bab1 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.h +++ b/dbms/src/Storages/MergeTree/MergeTreeSelectProcessor.h @@ -28,6 +28,7 @@ public: const PrewhereInfoPtr & prewhere_info, bool check_columns, size_t min_bytes_to_use_direct_io, + size_t min_bytes_to_use_mmap_io, size_t max_read_buffer_size, bool save_marks_in_cache, const Names & virt_column_names = {}, diff --git a/dbms/src/Storages/MergeTree/MergeTreeSequentialBlockInputStream.cpp b/dbms/src/Storages/MergeTree/MergeTreeSequentialBlockInputStream.cpp index 081ad289d28..372c29a3ac3 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeSequentialBlockInputStream.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeSequentialBlockInputStream.cpp @@ -57,7 +57,7 @@ MergeTreeSequentialBlockInputStream::MergeTreeSequentialBlockInputStream( MarkRanges{MarkRange(0, data_part->getMarksCount())}, /* bytes 
to use AIO (this is hack) */ read_with_direct_io ? 1UL : std::numeric_limits::max(), - DBMS_DEFAULT_BUFFER_SIZE); + 0, DBMS_DEFAULT_BUFFER_SIZE); } diff --git a/dbms/src/Storages/MergeTree/MergeTreeSettings.h b/dbms/src/Storages/MergeTree/MergeTreeSettings.h index 67e58e6083f..10f221b8e2a 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeSettings.h +++ b/dbms/src/Storages/MergeTree/MergeTreeSettings.h @@ -78,7 +78,7 @@ struct MergeTreeSettings : public SettingsCollection /** Compatibility settings */ \ M(SettingBool, compatibility_allow_sampling_expression_not_in_primary_key, false, "Allow to create a table with sampling expression not in primary key. This is needed only to temporarily allow to run the server with wrong tables for backward compatibility.", 0) \ M(SettingBool, use_minimalistic_checksums_in_zookeeper, true, "Use small format (dozens bytes) for part checksums in ZooKeeper instead of ordinary ones (dozens KB). Before enabling check that all replicas support new format.", 0) \ - M(SettingBool, use_minimalistic_part_header_in_zookeeper, false, "Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (/columns and /checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format.", 0) \ + M(SettingBool, use_minimalistic_part_header_in_zookeeper, true, "Store part header (checksums and columns) in a compact format and a single part znode instead of separate znodes (/columns and /checksums). This can dramatically reduce snapshot size in ZooKeeper. Before enabling check that all replicas support new format.", 0) \ M(SettingUInt64, finished_mutations_to_keep, 100, "How many records about mutations that are done to keep. If zero, then keep all of them.", 0) \ M(SettingUInt64, min_merge_bytes_to_use_direct_io, 10ULL * 1024 * 1024 * 1024, "Minimal amount of bytes to enable O_DIRECT in merge (0 - disabled).", 0) \ M(SettingUInt64, index_granularity_bytes, 10 * 1024 * 1024, "Approximate amount of bytes in single granule (0 - disabled).", 0) \ diff --git a/dbms/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp b/dbms/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp index cc090833f1e..ff2c058ede8 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp @@ -20,10 +20,11 @@ MergeTreeThreadSelectBlockInputProcessor::MergeTreeThreadSelectBlockInputProcess const Settings & settings, const Names & virt_column_names_) : - MergeTreeBaseSelectProcessor{pool_->getHeader(), storage_, prewhere_info_, max_block_size_rows_, - preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, - settings.min_bytes_to_use_direct_io, settings.max_read_buffer_size, - use_uncompressed_cache_, true, virt_column_names_}, + MergeTreeBaseSelectProcessor{ + pool_->getHeader(), storage_, prewhere_info_, max_block_size_rows_, + preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, + settings.min_bytes_to_use_direct_io, settings.min_bytes_to_use_mmap_io, settings.max_read_buffer_size, + use_uncompressed_cache_, true, virt_column_names_}, thread{thread_}, pool{pool_} { @@ -73,12 +74,12 @@ bool MergeTreeThreadSelectBlockInputProcessor::getNewTask() reader = std::make_unique( path, task->data_part, task->columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, - storage, rest_mark_ranges, 
min_bytes_to_use_direct_io, max_read_buffer_size, MergeTreeReader::ValueSizeMap{}, profile_callback); + storage, rest_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size, MergeTreeReader::ValueSizeMap{}, profile_callback); if (prewhere_info) pre_reader = std::make_unique( path, task->data_part, task->pre_columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, - storage, rest_mark_ranges, min_bytes_to_use_direct_io, + storage, rest_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size, MergeTreeReader::ValueSizeMap{}, profile_callback); } else @@ -90,13 +91,13 @@ bool MergeTreeThreadSelectBlockInputProcessor::getNewTask() /// retain avg_value_size_hints reader = std::make_unique( path, task->data_part, task->columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, - storage, rest_mark_ranges, min_bytes_to_use_direct_io, max_read_buffer_size, + storage, rest_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size, reader->getAvgValueSizeHints(), profile_callback); if (prewhere_info) pre_reader = std::make_unique( path, task->data_part, task->pre_columns, owned_uncompressed_cache.get(), owned_mark_cache.get(), save_marks_in_cache, - storage, rest_mark_ranges, min_bytes_to_use_direct_io, + storage, rest_mark_ranges, min_bytes_to_use_direct_io, min_bytes_to_use_mmap_io, max_read_buffer_size, pre_reader->getAvgValueSizeHints(), profile_callback); } } diff --git a/dbms/src/Storages/MergeTree/ReplicatedMergeTreeTableMetadata.cpp b/dbms/src/Storages/MergeTree/ReplicatedMergeTreeTableMetadata.cpp index edc031bc53b..703659bb4ea 100644 --- a/dbms/src/Storages/MergeTree/ReplicatedMergeTreeTableMetadata.cpp +++ b/dbms/src/Storages/MergeTree/ReplicatedMergeTreeTableMetadata.cpp @@ -34,6 +34,12 @@ ReplicatedMergeTreeTableMetadata::ReplicatedMergeTreeTableMetadata(const MergeTr merging_params_mode = static_cast<int>(data.merging_params.mode); sign_column = data.merging_params.sign_column; + /// This code may look strange, but previously we had only one entity: PRIMARY KEY (or ORDER BY, it doesn't matter) + /// Now we have two different entities: ORDER BY and its optional prefix -- PRIMARY KEY. + /// In most cases the user doesn't specify PRIMARY KEY, and semantically it's equal to ORDER BY.
+ /// So the rules for the ZooKeeper metadata are as follows: + /// - When we have only ORDER BY, then store it in the "primary key:" row of /metadata + /// - When we have both, then store PRIMARY KEY in the "primary key:" row and ORDER BY in the "sorting key:" row of /metadata if (!data.primary_key_ast) primary_key = formattedAST(MergeTreeData::extractKeyExpressionList(data.order_by_ast)); else diff --git a/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp b/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp index 633699d1529..858a49f015f 100644 --- a/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp +++ b/dbms/src/Storages/MergeTree/registerStorageMergeTree.cpp @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -573,6 +574,7 @@ static StoragePtr create(const StorageFactory::Arguments & args) ASTPtr primary_key_ast; ASTPtr sample_by_ast; ASTPtr ttl_table_ast; + ASTPtr settings_ast; IndicesDescription indices_description; ConstraintsDescription constraints_description; @@ -599,12 +601,16 @@ static StoragePtr create(const StorageFactory::Arguments & args) if (args.storage_def->ttl_table) ttl_table_ast = args.storage_def->ttl_table->ptr(); + if (args.query.columns_list && args.query.columns_list->indices) for (const auto & index : args.query.columns_list->indices->children) indices_description.indices.push_back( std::dynamic_pointer_cast<ASTIndexDeclaration>(index->clone())); storage_settings->loadFromQuery(*args.storage_def); + + if (args.storage_def->settings) + settings_ast = args.storage_def->settings->ptr(); } else { @@ -637,18 +643,26 @@ static StoragePtr create(const StorageFactory::Arguments & args) throw Exception("You must set the setting `allow_experimental_data_skipping_indices` to 1 " \ "before using data skipping indices.", ErrorCodes::BAD_ARGUMENTS); + StorageInMemoryMetadata metadata{ + .columns = args.columns, + .indices = indices_description, + .constraints = args.constraints, + .partition_by_ast = partition_by_ast, + .order_by_ast = order_by_ast, + .primary_key_ast = primary_key_ast, + .ttl_for_table_ast = ttl_table_ast, + .sample_by_ast = sample_by_ast, + .settings_ast = settings_ast, + }; if (replicated) return StorageReplicatedMergeTree::create( zookeeper_path, replica_name, args.attach, args.database_name, args.table_name, args.relative_data_path, - args.columns, indices_description, args.constraints, - args.context, date_column_name, partition_by_ast, order_by_ast, primary_key_ast, - sample_by_ast, ttl_table_ast, merging_params, std::move(storage_settings), + metadata, args.context, date_column_name, merging_params, std::move(storage_settings), args.has_force_restore_data_flag); else return StorageMergeTree::create( - args.database_name, args.table_name, args.relative_data_path, args.columns, indices_description, - args.constraints, args.attach, args.context, date_column_name, partition_by_ast, order_by_ast, - primary_key_ast, sample_by_ast, ttl_table_ast, merging_params, std::move(storage_settings), + args.database_name, args.table_name, args.relative_data_path, metadata, args.attach, args.context, + date_column_name, merging_params, std::move(storage_settings), args.has_force_restore_data_flag); } diff --git a/dbms/src/Storages/SelectQueryInfo.h b/dbms/src/Storages/SelectQueryInfo.h index 11907151575..84cf3a32aa1 100644 --- a/dbms/src/Storages/SelectQueryInfo.h +++ b/dbms/src/Storages/SelectQueryInfo.h @@ -20,6 +20,7 @@ struct PrewhereInfo ExpressionActionsPtr remove_columns_actions; String prewhere_column_name; bool remove_prewhere_column = false; + bool
need_filter = false; PrewhereInfo() = default; explicit PrewhereInfo(ExpressionActionsPtr prewhere_actions_, String prewhere_column_name_) diff --git a/dbms/src/Storages/StorageBuffer.cpp b/dbms/src/Storages/StorageBuffer.cpp index 811a4ff3e59..0433f8848b6 100644 --- a/dbms/src/Storages/StorageBuffer.cpp +++ b/dbms/src/Storages/StorageBuffer.cpp @@ -699,6 +699,18 @@ void StorageBuffer::flushThread() } while (!shutdown_event.tryWait(1000)); } +void StorageBuffer::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) +{ + for (const auto & command : commands) + { + if (command.type != AlterCommand::Type::ADD_COLUMN && command.type != AlterCommand::Type::MODIFY_COLUMN + && command.type != AlterCommand::Type::DROP_COLUMN && command.type != AlterCommand::Type::COMMENT_COLUMN) + throw Exception( + "Alter of type '" + alterTypeToString(command.type) + "' is not supported by storage " + getName(), + ErrorCodes::NOT_IMPLEMENTED); + } +} + void StorageBuffer::alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) { @@ -710,12 +722,10 @@ void StorageBuffer::alter(const AlterCommands & params, const Context & context, /// So that no blocks of the old structure remain. optimize({} /*query*/, {} /*partition_id*/, false /*final*/, false /*deduplicate*/, context); - auto new_columns = getColumns(); - auto new_indices = getIndices(); - auto new_constraints = getConstraints(); - params.applyForColumnsOnly(new_columns); - context.getDatabase(database_name_)->alterTable(context, table_name_, new_columns, new_indices, new_constraints, {}); - setColumns(std::move(new_columns)); + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + params.apply(metadata); + context.getDatabase(database_name_)->alterTable(context, table_name_, metadata); + setColumns(std::move(metadata.columns)); } diff --git a/dbms/src/Storages/StorageBuffer.h b/dbms/src/Storages/StorageBuffer.h index 1c565a7d8f0..e7bdbc947f5 100644 --- a/dbms/src/Storages/StorageBuffer.h +++ b/dbms/src/Storages/StorageBuffer.h @@ -94,9 +94,10 @@ public: bool mayBenefitFromIndexForIn(const ASTPtr & left_in_operand, const Context & query_context) const override; - /// The structure of the subordinate table is not checked and does not change. - void alter( - const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override; + void checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) override; + + /// The structure of the subordinate table is not checked and does not change. 
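The checkAlterIsPossible()/alter() pair added here for StorageBuffer recurs for StorageDistributed and StorageMerge below, always in the same validate-then-apply shape: reject unsupported command kinds up front, then apply the commands to an in-memory metadata copy and commit it only on success. A reduced model with stub types (the real code goes through AlterCommands::apply() and IDatabase::alterTable()):

#include <stdexcept>
#include <string>
#include <vector>

enum class AlterType { ADD_COLUMN, DROP_COLUMN, MODIFY_COLUMN, COMMENT_COLUMN, MODIFY_TTL };

struct Metadata { std::vector<std::string> column_names; };

struct BufferLikeStorage
{
    Metadata metadata;

    /// Phase 1: reject unsupported command kinds before anything is modified.
    void checkAlterIsPossible(const std::vector<AlterType> & commands) const
    {
        for (AlterType type : commands)
            if (type != AlterType::ADD_COLUMN && type != AlterType::DROP_COLUMN
                && type != AlterType::MODIFY_COLUMN && type != AlterType::COMMENT_COLUMN)
                throw std::runtime_error("Alter of this type is not supported by this storage");
    }

    /// Phase 2: mutate a copy, then swap it in only once everything succeeded.
    void alter(const std::vector<AlterType> & commands)
    {
        checkAlterIsPossible(commands);
        Metadata copy = metadata;                 /// getInMemoryMetadata()
        copy.column_names.push_back("new_column"); /// stands in for commands.apply(copy)
        metadata = std::move(copy);                /// setColumns(std::move(copy.columns))
    }
};

int main()
{
    BufferLikeStorage storage;
    storage.alter({AlterType::ADD_COLUMN});                                    /// accepted
    try { storage.alter({AlterType::MODIFY_TTL}); }
    catch (const std::exception &) { /* rejected before any state changed */ }
}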
+ void alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override; ~StorageBuffer() override; diff --git a/dbms/src/Storages/StorageDistributed.cpp b/dbms/src/Storages/StorageDistributed.cpp index a12da69c2e6..86ef945d49f 100644 --- a/dbms/src/Storages/StorageDistributed.cpp +++ b/dbms/src/Storages/StorageDistributed.cpp @@ -393,20 +393,31 @@ BlockOutputStreamPtr StorageDistributed::write(const ASTPtr &, const Context & c } -void StorageDistributed::alter( - const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) +void StorageDistributed::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) +{ + for (const auto & command : commands) + { + if (command.type != AlterCommand::Type::ADD_COLUMN + && command.type != AlterCommand::Type::MODIFY_COLUMN + && command.type != AlterCommand::Type::DROP_COLUMN + && command.type != AlterCommand::Type::COMMENT_COLUMN) + + throw Exception("Alter of type '" + alterTypeToString(command.type) + "' is not supported by storage " + getName(), + ErrorCodes::NOT_IMPLEMENTED); + } +} + +void StorageDistributed::alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) { lockStructureExclusively(table_lock_holder, context.getCurrentQueryId()); const String current_database_name = getDatabaseName(); const String current_table_name = getTableName(); - auto new_columns = getColumns(); - auto new_indices = getIndices(); - auto new_constraints = getConstraints(); - params.applyForColumnsOnly(new_columns); - context.getDatabase(current_database_name)->alterTable(context, current_table_name, new_columns, new_indices, new_constraints, {}); - setColumns(std::move(new_columns)); + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + params.apply(metadata); + context.getDatabase(current_database_name)->alterTable(context, current_table_name, metadata); + setColumns(std::move(metadata.columns)); } diff --git a/dbms/src/Storages/StorageDistributed.h b/dbms/src/Storages/StorageDistributed.h index 2a1a4fce9ce..e29587691d4 100644 --- a/dbms/src/Storages/StorageDistributed.h +++ b/dbms/src/Storages/StorageDistributed.h @@ -84,10 +84,12 @@ public: void rename(const String & new_path_to_table_data, const String & new_database_name, const String & new_table_name, TableStructureWriteLockHolder &) override; + + void checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) override; + /// in the sub-tables, you need to manually add and delete columns /// the structure of the sub-table is not checked - void alter( - const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override; + void alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override; void startup() override; void shutdown() override; diff --git a/dbms/src/Storages/StorageFile.cpp b/dbms/src/Storages/StorageFile.cpp index 5b7112c8651..a15d7ba414c 100644 --- a/dbms/src/Storages/StorageFile.cpp +++ b/dbms/src/Storages/StorageFile.cpp @@ -23,6 +23,8 @@ #include #include +#include +#include #include #include @@ -39,6 +41,7 @@ namespace ErrorCodes { extern const int CANNOT_WRITE_TO_FILE_DESCRIPTOR; extern const int CANNOT_SEEK_THROUGH_FILE; + extern const int CANNOT_TRUNCATE_FILE; extern const int DATABASE_ACCESS_DENIED; extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int 
UNKNOWN_IDENTIFIER; @@ -61,11 +64,12 @@ static std::vector listFilesWithRegexpMatching(const std::string & const std::string suffix_with_globs = for_match.substr(end_of_path_without_globs); /// begin with '/' const size_t next_slash = suffix_with_globs.find('/', 1); - re2::RE2 matcher(makeRegexpPatternFromGlobs(suffix_with_globs.substr(0, next_slash))); + auto regexp = makeRegexpPatternFromGlobs(suffix_with_globs.substr(0, next_slash)); + re2::RE2 matcher(regexp); std::vector result; const std::string prefix_without_globs = path_for_ls + for_match.substr(1, end_of_path_without_globs); - if (!fs::exists(fs::path(prefix_without_globs.data()))) + if (!fs::exists(fs::path(prefix_without_globs))) { return result; } @@ -108,12 +112,13 @@ static void checkCreationIsAllowed(const Context & context_global, const std::st if (context_global.getApplicationType() != Context::ApplicationType::SERVER) return; - if (!startsWith(table_path, db_dir_path)) - throw Exception("Part path " + table_path + " is not inside " + db_dir_path, ErrorCodes::DATABASE_ACCESS_DENIED); + /// "/dev/null" is allowed for perf testing + if (!startsWith(table_path, db_dir_path) && table_path != "/dev/null") + throw Exception("File is not inside " + db_dir_path, ErrorCodes::DATABASE_ACCESS_DENIED); Poco::File table_path_poco_file = Poco::File(table_path); if (table_path_poco_file.exists() && table_path_poco_file.isDirectory()) - throw Exception("File " + table_path + " must not be a directory", ErrorCodes::INCORRECT_FILE_NAME); + throw Exception("File must not be a directory", ErrorCodes::INCORRECT_FILE_NAME); } } @@ -144,11 +149,10 @@ StorageFile::StorageFile(const std::string & table_path_, const std::string & us const std::string path = poco_path.absolute().toString(); if (path.find_first_of("*?{") == std::string::npos) - { paths.push_back(path); - } else paths = listFilesWithRegexpMatching("/", path); + for (const auto & cur_path : paths) checkCreationIsAllowed(args.context, user_files_absolute_path, cur_path); } @@ -199,12 +203,12 @@ public: } storage->table_fd_was_used = true; - read_buf = getReadBuffer(compression_method, storage->table_fd); + read_buf = wrapReadBufferWithCompressionMethod(std::make_unique(storage->table_fd), compression_method); } else { shared_lock = std::shared_lock(storage->rwlock); - read_buf = getReadBuffer(compression_method, file_path); + read_buf = wrapReadBufferWithCompressionMethod(std::make_unique(file_path), compression_method); } reader = FormatFactory::instance().getInput(storage->format_name, *read_buf, storage->getSampleBlock(), context, max_block_size); @@ -265,7 +269,7 @@ BlockInputStreams StorageFile::read( for (const auto & file_path : paths) { BlockInputStreamPtr cur_block = std::make_shared( - std::static_pointer_cast(shared_from_this()), context, max_block_size, file_path, IStorage::chooseCompressionMethod(file_path, compression_method)); + std::static_pointer_cast(shared_from_this()), context, max_block_size, file_path, chooseCompressionMethod(file_path, compression_method)); blocks_input.push_back(column_defaults.empty() ? 
cur_block : std::make_shared(cur_block, column_defaults, context)); } return narrowBlockInputStreams(blocks_input, num_streams); @@ -287,13 +291,15 @@ public: * INSERT data; SELECT *; last SELECT returns only insert_data */ storage.table_fd_was_used = true; - write_buf = getWriteBuffer(compression_method, storage.table_fd); + write_buf = wrapWriteBufferWithCompressionMethod(std::make_unique(storage.table_fd), compression_method, 3); } else { if (storage.paths.size() != 1) throw Exception("Table '" + storage.table_name + "' is in readonly mode because of globs in filepath", ErrorCodes::DATABASE_ACCESS_DENIED); - write_buf = getWriteBuffer(compression_method, storage.paths[0], DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_APPEND | O_CREAT); + write_buf = wrapWriteBufferWithCompressionMethod( + std::make_unique(storage.paths[0], DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_APPEND | O_CREAT), + compression_method, 3); } writer = FormatFactory::instance().getOutput(storage.format_name, *write_buf, storage.getSampleBlock(), context); @@ -332,8 +338,7 @@ BlockOutputStreamPtr StorageFile::write( const ASTPtr & /*query*/, const Context & context) { - return std::make_shared(*this, - IStorage::chooseCompressionMethod(paths[0], compression_method), context); + return std::make_shared(*this, chooseCompressionMethod(paths[0], compression_method), context); } Strings StorageFile::getDataPaths() const @@ -362,6 +367,28 @@ void StorageFile::rename(const String & new_path_to_table_data, const String & n database_name = new_database_name; } +void StorageFile::truncate(const ASTPtr & /*query*/, const Context & /* context */, TableStructureWriteLockHolder &) +{ + if (paths.size() != 1) + throw Exception("Can't truncate table '" + table_name + "' in readonly mode", ErrorCodes::DATABASE_ACCESS_DENIED); + + std::unique_lock lock(rwlock); + + if (use_table_fd) + { + if (0 != ::ftruncate(table_fd, 0)) + throwFromErrno("Cannot truncate file at fd " + toString(table_fd), ErrorCodes::CANNOT_TRUNCATE_FILE); + } + else + { + if (!Poco::File(paths[0]).exists()) + return; + + if (0 != ::truncate(paths[0].c_str(), 0)) + throwFromErrnoWithPath("Cannot truncate file " + paths[0], paths[0], ErrorCodes::CANNOT_TRUNCATE_FILE); + } +} + void registerStorageFile(StorageFactory & factory) { diff --git a/dbms/src/Storages/StorageFile.h b/dbms/src/Storages/StorageFile.h index e3871166f03..23a6d6e7ff5 100644 --- a/dbms/src/Storages/StorageFile.h +++ b/dbms/src/Storages/StorageFile.h @@ -38,6 +38,8 @@ public: const ASTPtr & query, const Context & context) override; + void truncate(const ASTPtr & /*query*/, const Context & /* context */, TableStructureWriteLockHolder &) override; + void rename(const String & new_path_to_table_data, const String & new_database_name, const String & new_table_name, TableStructureWriteLockHolder &) override; Strings getDataPaths() const override; diff --git a/dbms/src/Storages/StorageHDFS.cpp b/dbms/src/Storages/StorageHDFS.cpp index 3f1386cca5e..8e5db910092 100644 --- a/dbms/src/Storages/StorageHDFS.cpp +++ b/dbms/src/Storages/StorageHDFS.cpp @@ -67,7 +67,7 @@ public: UInt64 max_block_size, const CompressionMethod compression_method) { - auto read_buf = getReadBuffer(compression_method, uri); + auto read_buf = wrapReadBufferWithCompressionMethod(std::make_unique(uri), compression_method); auto input_stream = FormatFactory::instance().getInput(format, *read_buf, sample_block, context, max_block_size); reader = std::make_shared>(input_stream, std::move(read_buf)); @@ -112,7 +112,7 @@ public: const CompressionMethod 
compression_method) : sample_block(sample_block_) { - write_buf = getWriteBuffer(compression_method, uri); + write_buf = wrapWriteBufferWithCompressionMethod(std::make_unique(uri), compression_method, 3); writer = FormatFactory::instance().getOutput(format, *write_buf, sample_block, context); } @@ -213,7 +213,7 @@ BlockInputStreams StorageHDFS::read( for (const auto & res_path : res_paths) { result.push_back(std::make_shared(uri_without_path + res_path, format_name, getSampleBlock(), context_, - max_block_size, IStorage::chooseCompressionMethod(res_path, compression_method))); + max_block_size, chooseCompressionMethod(res_path, compression_method))); } return narrowBlockInputStreams(result, num_streams); @@ -231,7 +231,7 @@ BlockOutputStreamPtr StorageHDFS::write(const ASTPtr & /*query*/, const Context format_name, getSampleBlock(), context, - IStorage::chooseCompressionMethod(uri, compression_method)); + chooseCompressionMethod(uri, compression_method)); } void registerStorageHDFS(StorageFactory & factory) diff --git a/dbms/src/Storages/StorageInMemoryMetadata.h b/dbms/src/Storages/StorageInMemoryMetadata.h new file mode 100644 index 00000000000..42fdfa45fec --- /dev/null +++ b/dbms/src/Storages/StorageInMemoryMetadata.h @@ -0,0 +1,38 @@ +#pragma once + +#include +#include +#include +#include + +namespace DB +{ + +/// Structure representing table metadata stored in memory. +/// Only one storage engine supports all fields -- MergeTree. +/// Complete table AST can be recreated from this struct. +struct StorageInMemoryMetadata +{ + /// Columns of the table with their names, types, + /// defaults, comments, etc. All table engines have columns. + ColumnsDescription columns; + /// Table indices. Currently supported for MergeTree only. + IndicesDescription indices; + /// Table constraints. Currently supported for MergeTree only. + ConstraintsDescription constraints; + /// PARTITION BY expression. Currently supported for MergeTree only. + ASTPtr partition_by_ast = nullptr; + /// ORDER BY expression. Required field for all MergeTree tables + /// even in old syntax MergeTree(partition_key, order_by, ...) + ASTPtr order_by_ast = nullptr; + /// PRIMARY KEY expression. If absent, then equal to order_by_ast. + ASTPtr primary_key_ast = nullptr; + /// TTL expression for the whole table. Supported for MergeTree only. + ASTPtr ttl_for_table_ast = nullptr; + /// SAMPLE BY expression. Supported for MergeTree only. + ASTPtr sample_by_ast = nullptr; + /// SETTINGS expression. Supported for MergeTree, Buffer and Kafka.
diff --git a/dbms/src/Storages/StorageInMemoryMetadata.h b/dbms/src/Storages/StorageInMemoryMetadata.h
new file mode 100644
index 00000000000..42fdfa45fec
--- /dev/null
+++ b/dbms/src/Storages/StorageInMemoryMetadata.h
@@ -0,0 +1,38 @@
+#pragma once
+
+#include <Storages/ColumnsDescription.h>
+#include <Storages/IndicesDescription.h>
+#include <Storages/ConstraintsDescription.h>
+#include <Parsers/IAST_fwd.h>
+
+namespace DB
+{
+
+/// Structure representing table metadata stored in memory.
+/// Only one storage engine supports all fields -- MergeTree.
+/// Complete table AST can be recreated from this struct.
+struct StorageInMemoryMetadata
+{
+    /// Columns of table with their names, types,
+    /// defaults, comments, etc. All table engines have columns.
+    ColumnsDescription columns;
+    /// Table indices. Currently supported for MergeTree only.
+    IndicesDescription indices;
+    /// Table constraints. Currently supported for MergeTree only.
+    ConstraintsDescription constraints;
+    /// PARTITION BY expression. Currently supported for MergeTree only.
+    ASTPtr partition_by_ast = nullptr;
+    /// ORDER BY expression. Required field for all MergeTree tables
+    /// even in old syntax MergeTree(partition_key, order_by, ...)
+    ASTPtr order_by_ast = nullptr;
+    /// PRIMARY KEY expression. If absent, then equal to order_by_ast.
+    ASTPtr primary_key_ast = nullptr;
+    /// TTL expression for whole table. Supported for MergeTree only.
+    ASTPtr ttl_for_table_ast = nullptr;
+    /// SAMPLE BY expression. Supported for MergeTree only.
+    ASTPtr sample_by_ast = nullptr;
+    /// SETTINGS expression. Supported for MergeTree, Buffer and Kafka.
+    ASTPtr settings_ast = nullptr;
+};
+
+}
diff --git a/dbms/src/Storages/StorageJoin.cpp b/dbms/src/Storages/StorageJoin.cpp
index 954d096cbd7..871bd595f38 100644
--- a/dbms/src/Storages/StorageJoin.cpp
+++ b/dbms/src/Storages/StorageJoin.cpp
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include /// toLower
 #include
@@ -71,7 +72,12 @@ void StorageJoin::truncate(const ASTPtr &, const Context &, TableStructureWriteL
 HashJoinPtr StorageJoin::getJoin(std::shared_ptr analyzed_join) const
 {
     if (kind != analyzed_join->kind() || strictness != analyzed_join->strictness())
-        throw Exception("Table " + table_name + " has incompatible type of JOIN.", ErrorCodes::INCOMPATIBLE_TYPE_OF_JOIN);
+        throw Exception("Table " + backQuote(table_name) + " has incompatible type of JOIN.", ErrorCodes::INCOMPATIBLE_TYPE_OF_JOIN);
+
+    if ((analyzed_join->forceNullableRight() && !use_nulls) ||
+        (!analyzed_join->forceNullableRight() && isLeftOrFull(analyzed_join->kind()) && use_nulls))
+        throw Exception("Table " + backQuote(table_name) + " needs the same join_use_nulls setting as present in LEFT or FULL JOIN.",
+            ErrorCodes::INCOMPATIBLE_TYPE_OF_JOIN);

     /// TODO: check key columns
@@ -140,35 +146,36 @@ void registerStorageJoin(StorageFactory & factory)
     {
         const String strictness_str = Poco::toLower(*opt_strictness_id);

-        if (strictness_str == "any" || strictness_str == "\'any\'")
+        if (strictness_str == "any")
         {
             if (old_any_join)
                 strictness = ASTTableJoin::Strictness::RightAny;
             else
                 strictness = ASTTableJoin::Strictness::Any;
         }
-        else if (strictness_str == "all" || strictness_str == "\'all\'")
+        else if (strictness_str == "all")
             strictness = ASTTableJoin::Strictness::All;
-        else if (strictness_str == "semi" || strictness_str == "\'semi\'")
+        else if (strictness_str == "semi")
             strictness = ASTTableJoin::Strictness::Semi;
-        else if (strictness_str == "anti" || strictness_str == "\'anti\'")
+        else if (strictness_str == "anti")
             strictness = ASTTableJoin::Strictness::Anti;
     }

     if (strictness == ASTTableJoin::Strictness::Unspecified)
-        throw Exception("First parameter of storage Join must be ANY or ALL or SEMI or ANTI.", ErrorCodes::BAD_ARGUMENTS);
+        throw Exception("First parameter of storage Join must be ANY or ALL or SEMI or ANTI (without quotes).",
+            ErrorCodes::BAD_ARGUMENTS);

     if (auto opt_kind_id = tryGetIdentifierName(engine_args[1]))
     {
         const String kind_str = Poco::toLower(*opt_kind_id);

-        if (kind_str == "left" || kind_str == "\'left\'")
+        if (kind_str == "left")
             kind = ASTTableJoin::Kind::Left;
-        else if (kind_str == "inner" || kind_str == "\'inner\'")
+        else if (kind_str == "inner")
             kind = ASTTableJoin::Kind::Inner;
-        else if (kind_str == "right" || kind_str == "\'right\'")
+        else if (kind_str == "right")
             kind = ASTTableJoin::Kind::Right;
-        else if (kind_str == "full" || kind_str == "\'full\'")
+        else if (kind_str == "full")
         {
             if (strictness == ASTTableJoin::Strictness::Any)
                 strictness = ASTTableJoin::Strictness::RightAny;
@@ -177,7 +184,8 @@ void registerStorageJoin(StorageFactory & factory)
     }

     if (kind == ASTTableJoin::Kind::Comma)
-        throw Exception("Second parameter of storage Join must be LEFT or INNER or RIGHT or FULL.", ErrorCodes::BAD_ARGUMENTS);
+        throw Exception("Second parameter of storage Join must be LEFT or INNER or RIGHT or FULL (without quotes).",
+            ErrorCodes::BAD_ARGUMENTS);

     Names key_names;
     key_names.reserve(engine_args.size() - 2);
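The new guard in StorageJoin::getJoin rejects mixing a Join table created with one join_use_nulls value into a query executed with the other: LEFT/FULL JOIN pads non-joined rows with NULLs in one mode and with default values in the other, so the stored hash table and the query must agree. The predicate, restated as a standalone check with illustrative types:

```cpp
#include <cassert>

enum class Kind { Left, Inner, Right, Full };

bool isLeftOrFull(Kind kind) { return kind == Kind::Left || kind == Kind::Full; }

// Sketch of the compatibility rule: a query that forces nullable right-side
// columns (LEFT/FULL JOIN under join_use_nulls = 1) may only use a Join table
// that was also created with join_use_nulls = 1, and vice versa.
bool joinUseNullsCompatible(bool query_forces_nullable_right, Kind query_kind, bool table_use_nulls)
{
    if (query_forces_nullable_right && !table_use_nulls)
        return false;  // query expects NULL padding, table stores defaults
    if (!query_forces_nullable_right && isLeftOrFull(query_kind) && table_use_nulls)
        return false;  // table stores NULL padding, query expects defaults
    return true;
}

int main()
{
    assert(joinUseNullsCompatible(true, Kind::Left, true));    // both sides agree
    assert(!joinUseNullsCompatible(true, Kind::Left, false));  // query wants NULLs, table cannot produce them
    assert(!joinUseNullsCompatible(false, Kind::Full, true));  // table padded with NULLs, query expects defaults
    assert(joinUseNullsCompatible(false, Kind::Inner, true));  // INNER never pads, so no conflict
}
```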
diff --git a/dbms/src/Storages/StorageMerge.cpp b/dbms/src/Storages/StorageMerge.cpp
index f2cfa62a375..5be6353514e 100644
--- a/dbms/src/Storages/StorageMerge.cpp
+++ b/dbms/src/Storages/StorageMerge.cpp
@@ -252,7 +252,7 @@ BlockInputStreams StorageMerge::read(
         else
         {
             source_streams.emplace_back(std::make_shared(
-                header, [=]() mutable -> BlockInputStreamPtr
+                header, [=, this]() mutable -> BlockInputStreamPtr
                 {
                     BlockInputStreams streams = createSourceStreams(query_info, processed_stage, max_block_size, header, storage, struct_lock, real_column_names,
@@ -413,17 +413,27 @@ DatabaseTablesIteratorPtr StorageMerge::getDatabaseIterator(const Context & cont
 }

+void StorageMerge::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */)
+{
+    for (const auto & command : commands)
+    {
+        if (command.type != AlterCommand::Type::ADD_COLUMN && command.type != AlterCommand::Type::MODIFY_COLUMN
+            && command.type != AlterCommand::Type::DROP_COLUMN && command.type != AlterCommand::Type::COMMENT_COLUMN)
+            throw Exception(
+                "Alter of type '" + alterTypeToString(command.type) + "' is not supported by storage " + getName(),
+                ErrorCodes::NOT_IMPLEMENTED);
+    }
+}
+
 void StorageMerge::alter(
     const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder)
 {
     lockStructureExclusively(table_lock_holder, context.getCurrentQueryId());

-    auto new_columns = getColumns();
-    auto new_indices = getIndices();
-    auto new_constraints = getConstraints();
-    params.applyForColumnsOnly(new_columns);
-    context.getDatabase(database_name)->alterTable(context, table_name, new_columns, new_indices, new_constraints, {});
-    setColumns(new_columns);
+    StorageInMemoryMetadata storage_metadata = getInMemoryMetadata();
+    params.apply(storage_metadata);
+    context.getDatabase(database_name)->alterTable(context, table_name, storage_metadata);
+    setColumns(storage_metadata.columns);
 }

 Block StorageMerge::getQueryHeader(
diff --git a/dbms/src/Storages/StorageMerge.h b/dbms/src/Storages/StorageMerge.h
index debcb4da58e..70bed6498f1 100644
--- a/dbms/src/Storages/StorageMerge.h
+++ b/dbms/src/Storages/StorageMerge.h
@@ -48,10 +48,12 @@ public:
         database_name = new_database_name;
     }

+
+    void checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) override;
+
     /// you need to add and remove columns in the sub-tables manually
     /// the structure of sub-tables is not checked
-    void alter(
-        const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override;
+    void alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override;

     bool mayBenefitFromIndexForIn(const ASTPtr & left_in_operand, const Context & query_context) const override;
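checkAlterIsPossible gives storages a validation hook that runs before any command is applied; engines that only understand column-level changes (Merge here, Null below) whitelist the four column commands and reject everything else up front, instead of failing halfway through an ALTER. Restated as a self-contained sketch with stand-in types:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

enum class AlterType { AddColumn, ModifyColumn, DropColumn, CommentColumn, ModifyOrderBy, ModifyTTL };

std::string alterTypeToString(AlterType type)
{
    switch (type)
    {
        case AlterType::AddColumn: return "ADD COLUMN";
        case AlterType::ModifyColumn: return "MODIFY COLUMN";
        case AlterType::DropColumn: return "DROP COLUMN";
        case AlterType::CommentColumn: return "COMMENT COLUMN";
        case AlterType::ModifyOrderBy: return "MODIFY ORDER BY";
        case AlterType::ModifyTTL: return "MODIFY TTL";
    }
    return "UNKNOWN";
}

// Sketch of the validation hook: reject unsupported command types before
// any of them mutate table metadata.
void checkAlterIsPossible(const std::vector<AlterType> & commands, const std::string & storage_name)
{
    for (auto type : commands)
    {
        bool is_column_level = type == AlterType::AddColumn || type == AlterType::ModifyColumn
            || type == AlterType::DropColumn || type == AlterType::CommentColumn;
        if (!is_column_level)
            throw std::runtime_error(
                "Alter of type '" + alterTypeToString(type) + "' is not supported by storage " + storage_name);
    }
}
```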
diff --git a/dbms/src/Storages/StorageMergeTree.cpp b/dbms/src/Storages/StorageMergeTree.cpp
index 8680e076d13..9a0583a464a 100644
--- a/dbms/src/Storages/StorageMergeTree.cpp
+++ b/dbms/src/Storages/StorageMergeTree.cpp
@@ -56,27 +56,27 @@ StorageMergeTree::StorageMergeTree(
     const String & database_name_,
     const String & table_name_,
     const String & relative_data_path_,
-    const ColumnsDescription & columns_,
-    const IndicesDescription & indices_,
-    const ConstraintsDescription & constraints_,
+    const StorageInMemoryMetadata & metadata,
     bool attach,
     Context & context_,
     const String & date_column_name,
-    const ASTPtr & partition_by_ast_,
-    const ASTPtr & order_by_ast_,
-    const ASTPtr & primary_key_ast_,
-    const ASTPtr & sample_by_ast_, /// nullptr, if sampling is not supported.
-    const ASTPtr & ttl_table_ast_,
     const MergingParams & merging_params_,
     std::unique_ptr<MergeTreeSettings> storage_settings_,
     bool has_force_restore_data_flag)
-    : MergeTreeData(database_name_, table_name_, relative_data_path_,
-        columns_, indices_, constraints_,
-        context_, date_column_name, partition_by_ast_, order_by_ast_, primary_key_ast_,
-        sample_by_ast_, ttl_table_ast_, merging_params_,
-        std::move(storage_settings_), false, attach),
-    reader(*this), writer(*this),
-    merger_mutator(*this, global_context.getBackgroundPool().getNumberOfThreads())
+    : MergeTreeData(
+        database_name_,
+        table_name_,
+        relative_data_path_,
+        metadata,
+        context_,
+        date_column_name,
+        merging_params_,
+        std::move(storage_settings_),
+        false,
+        attach)
+    , reader(*this)
+    , writer(*this)
+    , merger_mutator(*this, global_context.getBackgroundPool().getNumberOfThreads())
 {
     loadDataParts(has_force_restore_data_flag);
@@ -252,47 +252,19 @@ void StorageMergeTree::alter(
     lockNewDataStructureExclusively(table_lock_holder, context.getCurrentQueryId());

-    checkAlter(params, context);
+    StorageInMemoryMetadata metadata = getInMemoryMetadata();

-    auto new_columns = getColumns();
-    auto new_indices = getIndices();
-    auto new_constraints = getConstraints();
-    ASTPtr new_order_by_ast = order_by_ast;
-    ASTPtr new_primary_key_ast = primary_key_ast;
-    ASTPtr new_ttl_table_ast = ttl_table_ast;
-    SettingsChanges new_changes;
-
-    params.apply(new_columns, new_indices, new_constraints, new_order_by_ast, new_primary_key_ast, new_ttl_table_ast, new_changes);
-
-    /// Modifier for storage AST in /metadata/storage_db/storage.sql
-    IDatabase::ASTModifier storage_modifier = [&](IAST & ast)
-    {
-        auto & storage_ast = ast.as();
-
-        if (new_order_by_ast.get() != order_by_ast.get())
-            storage_ast.set(storage_ast.order_by, new_order_by_ast);
-
-        if (new_primary_key_ast.get() != primary_key_ast.get())
-            storage_ast.set(storage_ast.primary_key, new_primary_key_ast);
-
-        if (new_ttl_table_ast.get() != ttl_table_ast.get())
-            storage_ast.set(storage_ast.ttl_table, new_ttl_table_ast);
-
-        if (!new_changes.empty())
-        {
-            auto settings_modifier = getSettingsModifier(new_changes);
-            settings_modifier(ast);
-        }
-    };
+    params.apply(metadata);

     /// Update metadata in memory
-    auto update_metadata = [&]()
+    auto update_metadata = [&metadata, &table_lock_holder, this]()
     {
-        changeSettings(new_changes, table_lock_holder);
-        /// Reinitialize primary key because primary key column types might have changed.
-        setProperties(new_order_by_ast, new_primary_key_ast, new_columns, new_indices, new_constraints);
-        setTTLExpressions(new_columns.getColumnTTLs(), new_ttl_table_ast);
+        changeSettings(metadata.settings_ast, table_lock_holder);
+        /// Reinitialize primary key because primary key column types might have changed.
+        setProperties(metadata);
+
+        setTTLExpressions(metadata.columns.getColumnTTLs(), metadata.ttl_for_table_ast);
     };

     /// This alter can be performed at metadata level only
@@ -300,7 +272,7 @@
     {
         lockStructureExclusively(table_lock_holder, context.getCurrentQueryId());

-        context.getDatabase(current_database_name)->alterTable(context, current_table_name, new_columns, new_indices, new_constraints, storage_modifier);
+        context.getDatabase(current_database_name)->alterTable(context, current_table_name, metadata);

         update_metadata();
     }
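This is the shape of every alter() in the patch: take a metadata snapshot, let AlterCommands::apply mutate it, persist it via IDatabase::alterTable, then swap it into memory. It replaces the old seven-output-parameter apply() and the per-storage ASTModifier lambdas. A compressed sketch of the flow, using reduced stand-in types:

```cpp
#include <string>
#include <vector>

// Reduced stand-in for the new struct (the real one also carries indices,
// constraints, and partition/order/primary-key/TTL/settings ASTs).
struct StorageInMemoryMetadata
{
    std::vector<std::string> columns;
    std::string order_by;
    std::string ttl_for_table;
};

struct AlterCommands
{
    std::vector<std::string> add_columns;

    // The key simplification: one apply() that mutates the whole snapshot,
    // instead of one output parameter per metadata field.
    void apply(StorageInMemoryMetadata & metadata) const
    {
        for (const auto & column : add_columns)
            metadata.columns.push_back(column);
    }
};

struct Storage
{
    StorageInMemoryMetadata metadata;

    StorageInMemoryMetadata getInMemoryMetadata() const { return metadata; }
    void setProperties(const StorageInMemoryMetadata & m) { metadata = m; }
};

// The new ALTER flow, sketched: snapshot -> apply -> persist -> swap in memory.
void alter(Storage & storage, const AlterCommands & params /*, database, locks, ... */)
{
    StorageInMemoryMetadata metadata = storage.getInMemoryMetadata();
    params.apply(metadata);
    // database->alterTable(context, table_name, metadata);  // persist the .sql metadata
    storage.setProperties(metadata);                          // then update in memory
}
```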
@@ -308,16 +280,16 @@
     {
         /// NOTE: Here, as in ReplicatedMergeTree, you can do ALTER which does not block the writing of data for a long time.
-        /// Also block moves, because they can replace part with old state
+        /// Also block moves, because they can replace part with old state.
         auto merge_blocker = merger_mutator.merges_blocker.cancel();
         auto moves_blocked = parts_mover.moves_blocker.cancel();

-        auto transactions = prepareAlterTransactions(new_columns, new_indices, context);
+        auto transactions = prepareAlterTransactions(metadata.columns, metadata.indices, context);

         lockStructureExclusively(table_lock_holder, context.getCurrentQueryId());

-        context.getDatabase(current_database_name)->alterTable(context, current_table_name, new_columns, new_indices, new_constraints, storage_modifier);
+        context.getDatabase(current_database_name)->alterTable(context, current_table_name, metadata);

         update_metadata();
@@ -930,25 +902,18 @@ void StorageMergeTree::clearColumnOrIndexInPartition(const ASTPtr & partition, c
     std::vector transactions;

-    auto new_columns = getColumns();
-    auto new_indices = getIndices();
-    auto new_constraints = getConstraints();
-    ASTPtr ignored_order_by_ast;
-    ASTPtr ignored_primary_key_ast;
-    ASTPtr ignored_ttl_table_ast;
-    SettingsChanges ignored_settings_changes;
-    alter_command.apply(new_columns, new_indices, new_constraints, ignored_order_by_ast,
-        ignored_primary_key_ast, ignored_ttl_table_ast, ignored_settings_changes);
+    StorageInMemoryMetadata metadata = getInMemoryMetadata();
+    alter_command.apply(metadata);

-    auto columns_for_parts = new_columns.getAllPhysical();
+    auto columns_for_parts = metadata.columns.getAllPhysical();
     for (const auto & part : parts)
     {
         if (part->info.partition_id != partition_id)
             throw Exception("Unexpected partition ID " + part->info.partition_id + ". This is a bug.", ErrorCodes::LOGICAL_ERROR);

         MergeTreeData::AlterDataPartTransactionPtr transaction(new MergeTreeData::AlterDataPartTransaction(part));
-        alterDataPart(columns_for_parts, new_indices.indices, false, transaction);
+        alterDataPart(columns_for_parts, metadata.indices.indices, false, transaction);
         if (transaction->isValid())
             transactions.push_back(std::move(transaction));
diff --git a/dbms/src/Storages/StorageMergeTree.h b/dbms/src/Storages/StorageMergeTree.h
index 0c26d287c72..2a30c5ffe25 100644
--- a/dbms/src/Storages/StorageMergeTree.h
+++ b/dbms/src/Storages/StorageMergeTree.h
@@ -67,7 +67,7 @@ public:
     void drop(TableStructureWriteLockHolder &) override;
     void truncate(const ASTPtr &, const Context &, TableStructureWriteLockHolder &) override;

-    void alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override;
+    void alter(const AlterCommands & commands, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override;

     void checkTableCanBeDropped() const override;
@@ -161,17 +161,10 @@ protected:
     const String & database_name_,
     const String & table_name_,
     const String & relative_data_path_,
-    const ColumnsDescription & columns_,
-    const IndicesDescription & indices_,
-    const ConstraintsDescription & constraints_,
+    const StorageInMemoryMetadata & metadata,
     bool attach,
     Context & context_,
     const String & date_column_name,
-    const ASTPtr & partition_by_ast_,
-    const ASTPtr & order_by_ast_,
-    const ASTPtr & primary_key_ast_,
-    const ASTPtr & sample_by_ast_, /// nullptr, if sampling is not supported.
- const ASTPtr & ttl_table_ast_, const MergingParams & merging_params_, std::unique_ptr settings_, bool has_force_restore_data_flag); diff --git a/dbms/src/Storages/StorageNull.cpp b/dbms/src/Storages/StorageNull.cpp index d3b97f9ad46..342f786bf4f 100644 --- a/dbms/src/Storages/StorageNull.cpp +++ b/dbms/src/Storages/StorageNull.cpp @@ -30,6 +30,19 @@ void registerStorageNull(StorageFactory & factory) }); } +void StorageNull::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) +{ + for (const auto & command : commands) + { + if (command.type != AlterCommand::Type::ADD_COLUMN && command.type != AlterCommand::Type::MODIFY_COLUMN + && command.type != AlterCommand::Type::DROP_COLUMN && command.type != AlterCommand::Type::COMMENT_COLUMN) + throw Exception( + "Alter of type '" + alterTypeToString(command.type) + "' is not supported by storage " + getName(), + ErrorCodes::NOT_IMPLEMENTED); + } +} + + void StorageNull::alter( const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) { @@ -38,12 +51,10 @@ void StorageNull::alter( const String current_database_name = getDatabaseName(); const String current_table_name = getTableName(); - ColumnsDescription new_columns = getColumns(); - IndicesDescription new_indices = getIndices(); - ConstraintsDescription new_constraints = getConstraints(); - params.applyForColumnsOnly(new_columns); - context.getDatabase(current_database_name)->alterTable(context, current_table_name, new_columns, new_indices, new_constraints, {}); - setColumns(std::move(new_columns)); + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + params.apply(metadata); + context.getDatabase(current_database_name)->alterTable(context, current_table_name, metadata); + setColumns(std::move(metadata.columns)); } } diff --git a/dbms/src/Storages/StorageNull.h b/dbms/src/Storages/StorageNull.h index e1a80f3fbaf..0122632fb69 100644 --- a/dbms/src/Storages/StorageNull.h +++ b/dbms/src/Storages/StorageNull.h @@ -44,8 +44,9 @@ public: database_name = new_database_name; } - void alter( - const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override; + void checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) override; + + void alter(const AlterCommands & params, const Context & context, TableStructureWriteLockHolder & table_lock_holder) override; private: String table_name; diff --git a/dbms/src/Storages/StorageReplicatedMergeTree.cpp b/dbms/src/Storages/StorageReplicatedMergeTree.cpp index d7f7a22895b..45b6cdcebf8 100644 --- a/dbms/src/Storages/StorageReplicatedMergeTree.cpp +++ b/dbms/src/Storages/StorageReplicatedMergeTree.cpp @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -194,25 +195,16 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree( const String & database_name_, const String & table_name_, const String & relative_data_path_, - const ColumnsDescription & columns_, - const IndicesDescription & indices_, - const ConstraintsDescription & constraints_, + const StorageInMemoryMetadata & metadata, Context & context_, const String & date_column_name, - const ASTPtr & partition_by_ast_, - const ASTPtr & order_by_ast_, - const ASTPtr & primary_key_ast_, - const ASTPtr & sample_by_ast_, - const ASTPtr & ttl_table_ast_, const MergingParams & merging_params_, std::unique_ptr settings_, bool has_force_restore_data_flag) - : MergeTreeData(database_name_, table_name_, relative_data_path_, - 
columns_, indices_, constraints_, - context_, date_column_name, partition_by_ast_, order_by_ast_, primary_key_ast_, - sample_by_ast_, ttl_table_ast_, merging_params_, - std::move(settings_), true, attach, + : MergeTreeData(database_name_, table_name_, relative_data_path_, metadata, + context_, date_column_name, merging_params_, std::move(settings_), true, attach, [this] (const std::string & name) { enqueuePartForCheck(name); }), + zookeeper_path(global_context.getMacros()->expand(zookeeper_path_, database_name_, table_name_)), replica_name(global_context.getMacros()->expand(replica_name_, database_name_, table_name_)), reader(*this), writer(*this), merger_mutator(*this, global_context.getBackgroundPool().getNumberOfThreads()), @@ -496,12 +488,10 @@ void StorageReplicatedMergeTree::checkTableStructure(bool skip_sanity_checks, bo void StorageReplicatedMergeTree::setTableStructure(ColumnsDescription new_columns, const ReplicatedMergeTreeTableMetadata::Diff & metadata_diff) { - ASTPtr new_primary_key_ast = primary_key_ast; - ASTPtr new_order_by_ast = order_by_ast; - auto new_indices = getIndices(); - auto new_constraints = getConstraints(); - ASTPtr new_ttl_table_ast = ttl_table_ast; - IDatabase::ASTModifier storage_modifier; + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + if (new_columns != metadata.columns) + metadata.columns = new_columns; + if (!metadata_diff.empty()) { if (metadata_diff.sorting_key_changed) @@ -510,59 +500,41 @@ void StorageReplicatedMergeTree::setTableStructure(ColumnsDescription new_column auto new_sorting_key_expr_list = parseQuery(parser, metadata_diff.new_sorting_key, 0); if (new_sorting_key_expr_list->children.size() == 1) - new_order_by_ast = new_sorting_key_expr_list->children[0]; + metadata.order_by_ast = new_sorting_key_expr_list->children[0]; else { auto tuple = makeASTFunction("tuple"); tuple->arguments->children = new_sorting_key_expr_list->children; - new_order_by_ast = tuple; + metadata.order_by_ast = tuple; } if (!primary_key_ast) { /// Primary and sorting key become independent after this ALTER so we have to /// save the old ORDER BY expression as the new primary key. 
- new_primary_key_ast = order_by_ast->clone(); + metadata.primary_key_ast = order_by_ast->clone(); } } if (metadata_diff.skip_indices_changed) - new_indices = IndicesDescription::parse(metadata_diff.new_skip_indices); + metadata.indices = IndicesDescription::parse(metadata_diff.new_skip_indices); if (metadata_diff.constraints_changed) - new_constraints = ConstraintsDescription::parse(metadata_diff.new_constraints); + metadata.constraints = ConstraintsDescription::parse(metadata_diff.new_constraints); if (metadata_diff.ttl_table_changed) { - ParserExpression parser; - new_ttl_table_ast = parseQuery(parser, metadata_diff.new_ttl_table, 0); + ParserTTLExpressionList parser; + metadata.ttl_for_table_ast = parseQuery(parser, metadata_diff.new_ttl_table, 0); } - - storage_modifier = [&](IAST & ast) - { - auto & storage_ast = ast.as(); - - if (!storage_ast.order_by) - throw Exception( - "ALTER MODIFY ORDER BY of default-partitioned tables is not supported", - ErrorCodes::LOGICAL_ERROR); - - if (new_primary_key_ast.get() != primary_key_ast.get()) - storage_ast.set(storage_ast.primary_key, new_primary_key_ast); - - if (new_ttl_table_ast.get() != ttl_table_ast.get()) - storage_ast.set(storage_ast.ttl_table, new_ttl_table_ast); - - storage_ast.set(storage_ast.order_by, new_order_by_ast); - }; } - global_context.getDatabase(database_name)->alterTable(global_context, table_name, new_columns, new_indices, new_constraints, storage_modifier); + global_context.getDatabase(database_name)->alterTable(global_context, table_name, metadata); /// Even if the primary/sorting keys didn't change we must reinitialize it /// because primary key column types might have changed. - setProperties(new_order_by_ast, new_primary_key_ast, new_columns, new_indices, new_constraints); - setTTLExpressions(new_columns.getColumnTTLs(), new_ttl_table_ast); + setProperties(metadata); + setTTLExpressions(new_columns.getColumnTTLs(), metadata.ttl_for_table_ast); } @@ -1550,18 +1522,12 @@ void StorageReplicatedMergeTree::executeClearColumnOrIndexInPartition(const LogE alter_command.index_name = entry.index_name; } - auto new_columns = getColumns(); - auto new_indices = getIndices(); - auto new_constraints = getConstraints(); - ASTPtr ignored_order_by_ast; - ASTPtr ignored_primary_key_ast; - ASTPtr ignored_ttl_table_ast; - SettingsChanges ignored_changes; - alter_command.apply(new_columns, new_indices, new_constraints, ignored_order_by_ast, ignored_primary_key_ast, ignored_ttl_table_ast, ignored_changes); + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + alter_command.apply(metadata); size_t modified_parts = 0; auto parts = getDataParts(); - auto columns_for_parts = new_columns.getAllPhysical(); + auto columns_for_parts = metadata.columns.getAllPhysical(); /// Check there are no merges in range again /// TODO: Currently, there are no guarantees that a merge covering entry_part_info will happen during the execution. @@ -1581,7 +1547,7 @@ void StorageReplicatedMergeTree::executeClearColumnOrIndexInPartition(const LogE LOG_DEBUG(log, "Clearing index " << alter_command.index_name << " in part " << part->name); MergeTreeData::AlterDataPartTransactionPtr transaction(new MergeTreeData::AlterDataPartTransaction(part)); - alterDataPart(columns_for_parts, new_indices.indices, false, transaction); + alterDataPart(columns_for_parts, metadata.indices.indices, false, transaction); if (!transaction->isValid()) continue; @@ -3241,14 +3207,12 @@ void StorageReplicatedMergeTree::alter( /// We don't replicate storage_settings_ptr ALTER. 
It's local operation. /// Also we don't upgrade alter lock to table structure lock. LOG_DEBUG(log, "ALTER storage_settings_ptr only"); - SettingsChanges new_changes; - params.applyForSettingsOnly(new_changes); + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + params.apply(metadata); - changeSettings(new_changes, table_lock_holder); + changeSettings(metadata.settings_ast, table_lock_holder); - IDatabase::ASTModifier settings_modifier = getSettingsModifier(new_changes); - global_context.getDatabase(current_database_name)->alterTable( - query_context, current_table_name, getColumns(), getIndices(), getConstraints(), settings_modifier); + global_context.getDatabase(current_database_name)->alterTable(query_context, current_table_name, metadata); return; } @@ -3278,6 +3242,13 @@ void StorageReplicatedMergeTree::alter( int32_t new_version = -1; /// Initialization is to suppress (useless) false positive warning found by cppcheck. }; + auto ast_to_str = [](ASTPtr query) -> String + { + if (!query) + return ""; + return queryToString(query); + }; + /// /columns and /metadata nodes std::vector changed_nodes; @@ -3288,33 +3259,25 @@ void StorageReplicatedMergeTree::alter( if (is_readonly) throw Exception("Can't ALTER readonly table", ErrorCodes::TABLE_IS_READ_ONLY); - checkAlter(params, query_context); + StorageInMemoryMetadata metadata = getInMemoryMetadata(); + params.apply(metadata); - ColumnsDescription new_columns = getColumns(); - IndicesDescription new_indices = getIndices(); - ConstraintsDescription new_constraints = getConstraints(); - ASTPtr new_order_by_ast = order_by_ast; - ASTPtr new_primary_key_ast = primary_key_ast; - ASTPtr new_ttl_table_ast = ttl_table_ast; - SettingsChanges new_changes; - params.apply(new_columns, new_indices, new_constraints, new_order_by_ast, new_primary_key_ast, new_ttl_table_ast, new_changes); - - String new_columns_str = new_columns.toString(); + String new_columns_str = metadata.columns.toString(); if (new_columns_str != getColumns().toString()) changed_nodes.emplace_back(zookeeper_path, "columns", new_columns_str); ReplicatedMergeTreeTableMetadata new_metadata(*this); - if (new_order_by_ast.get() != order_by_ast.get()) - new_metadata.sorting_key = serializeAST(*extractKeyExpressionList(new_order_by_ast)); + if (ast_to_str(metadata.order_by_ast) != ast_to_str(order_by_ast)) + new_metadata.sorting_key = serializeAST(*extractKeyExpressionList(metadata.order_by_ast)); - if (new_ttl_table_ast.get() != ttl_table_ast.get()) - new_metadata.ttl_table = serializeAST(*new_ttl_table_ast); + if (ast_to_str(metadata.ttl_for_table_ast) != ast_to_str(ttl_table_ast)) + new_metadata.ttl_table = serializeAST(*metadata.ttl_for_table_ast); - String new_indices_str = new_indices.toString(); + String new_indices_str = metadata.indices.toString(); if (new_indices_str != getIndices().toString()) new_metadata.skip_indices = new_indices_str; - String new_constraints_str = new_constraints.toString(); + String new_constraints_str = metadata.constraints.toString(); if (new_constraints_str != getConstraints().toString()) new_metadata.constraints = new_constraints_str; @@ -3323,16 +3286,11 @@ void StorageReplicatedMergeTree::alter( changed_nodes.emplace_back(zookeeper_path, "metadata", new_metadata_str); /// Perform settings update locally - if (!new_changes.empty()) - { - IDatabase::ASTModifier settings_modifier = getSettingsModifier(new_changes); - changeSettings(new_changes, table_lock_holder); - - global_context.getDatabase(current_database_name)->alterTable( - query_context, 
current_table_name, getColumns(), getIndices(), getConstraints(), settings_modifier); - - } + auto old_metadata = getInMemoryMetadata(); + old_metadata.settings_ast = metadata.settings_ast; + changeSettings(metadata.settings_ast, table_lock_holder); + global_context.getDatabase(current_database_name)->alterTable(query_context, current_table_name, old_metadata); /// Modify shared metadata nodes in ZooKeeper. Coordination::Requests ops; diff --git a/dbms/src/Storages/StorageReplicatedMergeTree.h b/dbms/src/Storages/StorageReplicatedMergeTree.h index 6c7f1276175..9c97abdff40 100644 --- a/dbms/src/Storages/StorageReplicatedMergeTree.h +++ b/dbms/src/Storages/StorageReplicatedMergeTree.h @@ -545,16 +545,9 @@ protected: bool attach, const String & database_name_, const String & name_, const String & relative_data_path_, - const ColumnsDescription & columns_, - const IndicesDescription & indices_, - const ConstraintsDescription & constraints_, + const StorageInMemoryMetadata & metadata, Context & context_, const String & date_column_name, - const ASTPtr & partition_by_ast_, - const ASTPtr & order_by_ast_, - const ASTPtr & primary_key_ast_, - const ASTPtr & sample_by_ast_, - const ASTPtr & table_ttl_ast_, const MergingParams & merging_params_, std::unique_ptr settings_, bool has_force_restore_data_flag); diff --git a/dbms/src/Storages/StorageS3.cpp b/dbms/src/Storages/StorageS3.cpp index cf0b3df44fd..14732a291b1 100644 --- a/dbms/src/Storages/StorageS3.cpp +++ b/dbms/src/Storages/StorageS3.cpp @@ -49,7 +49,7 @@ namespace const String & key) : name(name_) { - read_buf = getReadBuffer(compression_method, client, bucket, key); + read_buf = wrapReadBufferWithCompressionMethod(std::make_unique(client, bucket, key), compression_method); reader = FormatFactory::instance().getInput(format, *read_buf, sample_block, context, max_block_size); } @@ -98,7 +98,8 @@ namespace const String & key) : sample_block(sample_block_) { - write_buf = getWriteBuffer(compression_method, client, bucket, key, min_upload_part_size); + write_buf = wrapWriteBufferWithCompressionMethod( + std::make_unique(client, bucket, key, min_upload_part_size), compression_method, 3); writer = FormatFactory::instance().getOutput(format, *write_buf, sample_block, context); } @@ -173,7 +174,7 @@ BlockInputStreams StorageS3::read( getHeaderBlock(column_names), context, max_block_size, - IStorage::chooseCompressionMethod(uri.endpoint, compression_method), + chooseCompressionMethod(uri.endpoint, compression_method), client, uri.bucket, uri.key); @@ -194,7 +195,7 @@ BlockOutputStreamPtr StorageS3::write(const ASTPtr & /*query*/, const Context & { return std::make_shared( format_name, min_upload_part_size, getSampleBlock(), context_global, - IStorage::chooseCompressionMethod(uri.endpoint, compression_method), + chooseCompressionMethod(uri.endpoint, compression_method), client, uri.bucket, uri.key); } diff --git a/dbms/src/Storages/StorageStripeLog.cpp b/dbms/src/Storages/StorageStripeLog.cpp index 1be6adb8037..90dac4a53d7 100644 --- a/dbms/src/Storages/StorageStripeLog.cpp +++ b/dbms/src/Storages/StorageStripeLog.cpp @@ -250,7 +250,7 @@ BlockInputStreams StorageStripeLog::read( if (!Poco::File(fullPath() + "index.mrk").exists()) return { std::make_shared(getSampleBlockForColumns(column_names)) }; - CompressedReadBufferFromFile index_in(fullPath() + "index.mrk", 0, 0, INDEX_BUFFER_SIZE); + CompressedReadBufferFromFile index_in(fullPath() + "index.mrk", 0, 0, 0, INDEX_BUFFER_SIZE); std::shared_ptr index{std::make_shared(index_in, column_names_set)}; 
BlockInputStreams res; diff --git a/dbms/src/Storages/StorageURL.cpp b/dbms/src/Storages/StorageURL.cpp index 907e18b21cf..efe15dc1928 100644 --- a/dbms/src/Storages/StorageURL.cpp +++ b/dbms/src/Storages/StorageURL.cpp @@ -60,17 +60,18 @@ namespace const CompressionMethod compression_method) : name(name_) { - read_buf = getReadBuffer( - compression_method, - uri, - method, - callback, - timeouts, - context.getSettingsRef().max_http_get_redirects, - Poco::Net::HTTPBasicCredentials{}, - DBMS_DEFAULT_BUFFER_SIZE, - ReadWriteBufferFromHTTP::HTTPHeaderEntries{}, - context.getRemoteHostFilter()); + read_buf = wrapReadBufferWithCompressionMethod( + std::make_unique( + uri, + method, + callback, + timeouts, + context.getSettingsRef().max_http_get_redirects, + Poco::Net::HTTPBasicCredentials{}, + DBMS_DEFAULT_BUFFER_SIZE, + ReadWriteBufferFromHTTP::HTTPHeaderEntries{}, + context.getRemoteHostFilter()), + compression_method); reader = FormatFactory::instance().getInput(format, *read_buf, sample_block, context, max_block_size); } @@ -117,7 +118,9 @@ namespace const CompressionMethod compression_method) : sample_block(sample_block_) { - write_buf = getWriteBuffer(compression_method, uri, Poco::Net::HTTPRequest::HTTP_POST, timeouts); + write_buf = wrapWriteBufferWithCompressionMethod( + std::make_unique(uri, Poco::Net::HTTPRequest::HTTP_POST, timeouts), + compression_method, 3); writer = FormatFactory::instance().getOutput(format, *write_buf, sample_block, context); } @@ -196,7 +199,7 @@ BlockInputStreams IStorageURLBase::read(const Names & column_names, context, max_block_size, ConnectionTimeouts::getHTTPTimeouts(context), - IStorage::chooseCompressionMethod(request_uri.getPath(), compression_method)); + chooseCompressionMethod(request_uri.getPath(), compression_method)); auto column_defaults = getColumns().getDefaults(); if (column_defaults.empty()) @@ -215,7 +218,7 @@ BlockOutputStreamPtr IStorageURLBase::write(const ASTPtr & /*query*/, const Cont return std::make_shared( uri, format_name, getSampleBlock(), context_global, ConnectionTimeouts::getHTTPTimeouts(context_global), - IStorage::chooseCompressionMethod(uri.toString(), compression_method)); + chooseCompressionMethod(uri.toString(), compression_method)); } void registerStorageURL(StorageFactory & factory) diff --git a/dbms/src/Storages/StorageXDBC.cpp b/dbms/src/Storages/StorageXDBC.cpp index 222eebd6377..0dcbf372b28 100644 --- a/dbms/src/Storages/StorageXDBC.cpp +++ b/dbms/src/Storages/StorageXDBC.cpp @@ -7,7 +7,6 @@ #include #include #include -#include #include #include #include diff --git a/dbms/src/Storages/System/StorageSystemDictionaries.cpp b/dbms/src/Storages/System/StorageSystemDictionaries.cpp index c8e19fed086..60ae65427a1 100644 --- a/dbms/src/Storages/System/StorageSystemDictionaries.cpp +++ b/dbms/src/Storages/System/StorageSystemDictionaries.cpp @@ -50,19 +50,25 @@ void StorageSystemDictionaries::fillData(MutableColumns & res_columns, const Con const auto & external_dictionaries = context.getExternalDictionariesLoader(); for (const auto & load_result : external_dictionaries.getCurrentLoadResults()) { - if (startsWith(load_result.repository_name, IExternalLoaderConfigRepository::INTERNAL_REPOSITORY_NAME_PREFIX)) - continue; + const auto dict_ptr = std::dynamic_pointer_cast(load_result.object); - size_t i = 0; - String database; - String short_name = load_result.name; - - if (!load_result.repository_name.empty() && startsWith(load_result.name, load_result.repository_name + ".")) + String database, short_name; + if (dict_ptr) { - 
database = load_result.repository_name; - short_name = load_result.name.substr(load_result.repository_name.length() + 1); + database = dict_ptr->getDatabase(); + short_name = dict_ptr->getName(); + } + else + { + short_name = load_result.name; + if (!load_result.repository_name.empty() && startsWith(short_name, load_result.repository_name + ".")) + { + database = load_result.repository_name; + short_name = short_name.substr(database.length() + 1); + } } + size_t i = 0; res_columns[i++]->insert(database); res_columns[i++]->insert(short_name); res_columns[i++]->insert(static_cast(load_result.status)); @@ -70,7 +76,6 @@ void StorageSystemDictionaries::fillData(MutableColumns & res_columns, const Con std::exception_ptr last_exception = load_result.exception; - const auto dict_ptr = std::dynamic_pointer_cast(load_result.object); if (dict_ptr) { res_columns[i++]->insert(dict_ptr->getTypeName()); diff --git a/dbms/src/Storages/System/StorageSystemTables.cpp b/dbms/src/Storages/System/StorageSystemTables.cpp index 01f8704f681..cfa417f24a3 100644 --- a/dbms/src/Storages/System/StorageSystemTables.cpp +++ b/dbms/src/Storages/System/StorageSystemTables.cpp @@ -258,7 +258,7 @@ protected: res_columns[res_index++]->insert(database->getObjectMetadataPath(table_name)); if (columns_mask[src_index++]) - res_columns[res_index++]->insert(static_cast(database->getObjectMetadataModificationTime(context, table_name))); + res_columns[res_index++]->insert(static_cast(database->getObjectMetadataModificationTime(table_name))); { Array dependencies_table_name_array; diff --git a/dbms/src/Storages/tests/get_abandonable_lock_in_all_partitions.cpp b/dbms/src/Storages/tests/get_abandonable_lock_in_all_partitions.cpp index f0c5a3d158e..533db3236fd 100644 --- a/dbms/src/Storages/tests/get_abandonable_lock_in_all_partitions.cpp +++ b/dbms/src/Storages/tests/get_abandonable_lock_in_all_partitions.cpp @@ -54,7 +54,7 @@ try catch (const Exception & e) { std::cerr << e.what() << ", " << e.displayText() << ": " << std::endl - << e.getStackTrace().toString() << std::endl; + << e.getStackTraceString() << std::endl; throw; } catch (Poco::Exception & e) diff --git a/dbms/src/Storages/tests/get_current_inserts_in_replicated.cpp b/dbms/src/Storages/tests/get_current_inserts_in_replicated.cpp index 9012ccb6eb8..aba69684045 100644 --- a/dbms/src/Storages/tests/get_current_inserts_in_replicated.cpp +++ b/dbms/src/Storages/tests/get_current_inserts_in_replicated.cpp @@ -112,7 +112,7 @@ try catch (const Exception & e) { std::cerr << e.what() << ", " << e.displayText() << ": " << std::endl - << e.getStackTrace().toString() << std::endl; + << e.getStackTraceString() << std::endl; throw; } catch (Poco::Exception & e) diff --git a/dbms/src/TableFunctions/ITableFunctionFileLike.cpp b/dbms/src/TableFunctions/ITableFunctionFileLike.cpp index 3e0ddafaa90..7b1d342a64a 100644 --- a/dbms/src/TableFunctions/ITableFunctionFileLike.cpp +++ b/dbms/src/TableFunctions/ITableFunctionFileLike.cpp @@ -42,12 +42,10 @@ StoragePtr ITableFunctionFileLike::executeImpl(const ASTPtr & ast_function, cons std::string filename = args[0]->as().value.safeGet(); std::string format = args[1]->as().value.safeGet(); std::string structure = args[2]->as().value.safeGet(); - std::string compression_method; + std::string compression_method = "auto"; if (args.size() == 4) - { compression_method = args[3]->as().value.safeGet(); - } else compression_method = "auto"; ColumnsDescription columns = parseColumnsListFromString(structure, context); diff --git 
a/dbms/tests/integration/test_dictionary_ddl_on_cluster/test.py b/dbms/tests/integration/test_dictionary_ddl_on_cluster/test.py index 76cdfb458ee..909d2e06377 100644 --- a/dbms/tests/integration/test_dictionary_ddl_on_cluster/test.py +++ b/dbms/tests/integration/test_dictionary_ddl_on_cluster/test.py @@ -61,7 +61,6 @@ def test_dictionary_ddl_on_cluster(started_cluster): node.query("ALTER TABLE sometbl UPDATE value = 'new_key' WHERE 1") ch1.query("SYSTEM RELOAD DICTIONARY ON CLUSTER 'cluster' `default.somedict`") - time.sleep(2) # SYSTEM RELOAD DICTIONARY is an asynchronous query for num, node in enumerate([ch1, ch2, ch3, ch4]): assert node.query("SELECT dictGetString('default.somedict', 'value', toUInt64({}))".format(num)) == 'new_key' + '\n' diff --git a/dbms/tests/integration/test_inherit_multiple_profiles/__init__.py b/dbms/tests/integration/test_inherit_multiple_profiles/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/dbms/tests/integration/test_inherit_multiple_profiles/configs/combined_profile.xml b/dbms/tests/integration/test_inherit_multiple_profiles/configs/combined_profile.xml new file mode 100644 index 00000000000..4fbb7dcf3ff --- /dev/null +++ b/dbms/tests/integration/test_inherit_multiple_profiles/configs/combined_profile.xml @@ -0,0 +1,59 @@ + + + + 2 + + 200000000 + + + 100000000 + + + + + + + + 1234567890 + 300000000 + + + 200000000 + + + 1234567889 + 1234567891 + + + + + 654321 + 400000000 + + + 300000000 + + + 654320 + 654322 + + + + + profile_1 + profile_2 + profile_3 + 2 + + + + + + + ::/0 + + default + combined_profile + + + diff --git a/dbms/tests/integration/test_inherit_multiple_profiles/test.py b/dbms/tests/integration/test_inherit_multiple_profiles/test.py new file mode 100644 index 00000000000..1540196f9b6 --- /dev/null +++ b/dbms/tests/integration/test_inherit_multiple_profiles/test.py @@ -0,0 +1,74 @@ +import pytest + +from helpers.client import QueryRuntimeException +from helpers.cluster import ClickHouseCluster +from helpers.test_tools import TSV + + +cluster = ClickHouseCluster(__file__) +instance = cluster.add_instance('instance', + user_configs=['configs/combined_profile.xml']) +q = instance.query + + +@pytest.fixture(scope="module") +def started_cluster(): + try: + cluster.start() + + yield cluster + + finally: + cluster.shutdown() + + +def test_combined_profile(started_cluster): + settings = q(''' +SELECT name, value FROM system.settings + WHERE name IN + ('max_insert_block_size', 'max_network_bytes', 'max_query_size', + 'max_parallel_replicas', 'readonly') + AND changed +ORDER BY name +''', user='test_combined_profile') + + expected1 = '''\ +max_insert_block_size 654321 +max_network_bytes 1234567890 +max_parallel_replicas 2 +max_query_size 400000000 +readonly 2''' + + assert TSV(settings) == TSV(expected1) + + with pytest.raises(QueryRuntimeException) as exc: + q(''' + SET max_insert_block_size = 1000; + ''', user='test_combined_profile') + + assert ("max_insert_block_size shouldn't be less than 654320." in + str(exc.value)) + + with pytest.raises(QueryRuntimeException) as exc: + q(''' + SET max_network_bytes = 2000000000; + ''', user='test_combined_profile') + + assert ("max_network_bytes shouldn't be greater than 1234567891." in + str(exc.value)) + + with pytest.raises(QueryRuntimeException) as exc: + q(''' + SET max_parallel_replicas = 1000; + ''', user='test_combined_profile') + + assert ('max_parallel_replicas should not be changed.' 
in + str(exc.value)) + + with pytest.raises(QueryRuntimeException) as exc: + q(''' + SET max_memory_usage = 1000; + ''', user='test_combined_profile') + + assert ("max_memory_usage shouldn't be less than 300000000." in + str(exc.value)) diff --git a/dbms/tests/integration/test_ttl_move/test.py b/dbms/tests/integration/test_ttl_move/test.py index a03575cde98..071257d24ca 100644 --- a/dbms/tests/integration/test_ttl_move/test.py +++ b/dbms/tests/integration/test_ttl_move/test.py @@ -538,3 +538,85 @@ def test_ttls_do_not_work_after_alter(started_cluster, name, engine, positive): finally: node1.query("DROP TABLE IF EXISTS {}".format(name)) + + +@pytest.mark.parametrize("name,engine,positive", [ + ("mt_test_alter_multiple_ttls_positive", "MergeTree()", True), + ("mt_replicated_test_alter_multiple_ttls_positive", "ReplicatedMergeTree('/clickhouse/replicated_test_alter_multiple_ttls_positive', '1')", True), + ("mt_test_alter_multiple_ttls_negative", "MergeTree()", False), + ("mt_replicated_test_alter_multiple_ttls_negative", "ReplicatedMergeTree('/clickhouse/replicated_test_alter_multiple_ttls_negative', '1')", False), +]) +def test_alter_multiple_ttls(started_cluster, name, engine, positive): + """Copyright 2019, Altinity LTD + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.""" + """Check that when multiple TTL expressions are set + and before any parts are inserted the TTL expressions + are changed with ALTER command then all old + TTL expressions are removed and the + the parts are moved to the specified disk or volume or + deleted if the new TTL expression is triggered + and are not moved or deleted when it is not. 
+ """ + now = time.time() + try: + node1.query(""" + CREATE TABLE {name} ( + p1 Int64, + s1 String, + d1 DateTime + ) ENGINE = {engine} + ORDER BY tuple() + PARTITION BY p1 + TTL d1 + INTERVAL 30 SECOND TO DISK 'jbod2', + d1 + INTERVAL 60 SECOND TO VOLUME 'external' + SETTINGS storage_policy='jbods_with_external', merge_with_ttl_timeout=0 + """.format(name=name, engine=engine)) + + node1.query(""" + ALTER TABLE {name} MODIFY + TTL d1 + INTERVAL 0 SECOND TO DISK 'jbod2', + d1 + INTERVAL 5 SECOND TO VOLUME 'external', + d1 + INTERVAL 10 SECOND DELETE + """.format(name=name)) + + for p in range(3): + data = [] # 6MB in total + now = time.time() + for i in range(2): + p1 = p + s1 = get_random_string(1024 * 1024) # 1MB + d1 = now - 1 if i > 0 or positive else now + 300 + data.append("({}, '{}', toDateTime({}))".format(p1, s1, d1)) + node1.query("INSERT INTO {name} (p1, s1, d1) VALUES {values}".format(name=name, values=",".join(data))) + + used_disks = get_used_disks_for_table(node1, name) + assert set(used_disks) == {"jbod2"} if positive else {"jbod1", "jbod2"} + + assert node1.query("SELECT count() FROM {name}".format(name=name)).splitlines() == ["6"] + + time.sleep(5) + + used_disks = get_used_disks_for_table(node1, name) + assert set(used_disks) == {"external"} if positive else {"jbod1", "jbod2"} + + assert node1.query("SELECT count() FROM {name}".format(name=name)).splitlines() == ["6"] + + time.sleep(5) + + node1.query("OPTIMIZE TABLE {name} FINAL".format(name=name)) + + assert node1.query("SELECT count() FROM {name}".format(name=name)).splitlines() == ["0"] if positive else ["3"] + + finally: + node1.query("DROP TABLE IF EXISTS {name}".format(name=name)) diff --git a/dbms/tests/performance/README.md b/dbms/tests/performance/README.md index ecda08a80b1..d436eb7bce3 100644 --- a/dbms/tests/performance/README.md +++ b/dbms/tests/performance/README.md @@ -16,7 +16,7 @@ After you have choosen type, you have to specify `preconditions`. It contains ta The most important part of test is `stop_conditions`. For `loop` test you should always use `min_time_not_changing_for_ms` stop condition. For `once` test you can choose between `average_speed_not_changing_for_ms` and `max_speed_not_changing_for_ms`, but first is preferable. Also you should always specify `total_time_ms` metric. Endless tests will be ignored by CI. -`metrics` and `main_metric` settings are not important and can be ommited, because `loop` tests are always compared by `min_time` metric and `once` tests compared by `max_rows_per_second`. +`loop` tests are always compared by `min_time` metric and `once` tests compared by `max_rows_per_second`. You can use `substitions`, `create`, `fill` and `drop` queries to prepare test. You can find examples in this folder. 
diff --git a/dbms/tests/performance/agg_functions_min_max_any.xml b/dbms/tests/performance/agg_functions_min_max_any.xml index 85799a9a94a..8a132bb79a9 100644 --- a/dbms/tests/performance/agg_functions_min_max_any.xml +++ b/dbms/tests/performance/agg_functions_min_max_any.xml @@ -12,9 +12,6 @@ - - - default.hits_1000m_single diff --git a/dbms/tests/performance/and_function.xml b/dbms/tests/performance/and_function.xml index 08fd07ea7e5..fb1bcb9fcf8 100644 --- a/dbms/tests/performance/and_function.xml +++ b/dbms/tests/performance/and_function.xml @@ -12,9 +12,6 @@ - - - select count() from numbers(10000000) where number != 96594 AND number != 18511 AND number != 98085 AND number != 84177 AND number != 70314 AND number != 28083 AND number != 54202 AND number != 66522 AND number != 66939 AND number != 99469 AND number != 65776 AND number != 22876 AND number != 42151 AND number != 19924 AND number != 66681 AND number != 63022 AND number != 17487 AND number != 83914 AND number != 59754 AND number != 968 AND number != 73334 AND number != 68569 AND number != 49853 AND number != 33155 AND number != 31777 AND number != 99698 AND number != 26708 AND number != 76409 AND number != 42191 AND number != 55397 AND number != 25724 AND number != 39170 AND number != 22728 AND number != 98238 AND number != 86052 AND number != 12756 AND number != 13948 AND number != 57774 AND number != 82511 AND number != 11337 AND number != 23506 AND number != 11875 AND number != 58536 AND number != 56919 AND number != 25986 AND number != 80710 AND number != 61797 AND number != 99244 AND number != 11665 AND number != 15758 AND number != 82899 AND number != 63150 AND number != 7198 AND number != 40071 AND number != 46310 AND number != 78488 AND number != 9273 AND number != 91878 AND number != 57904 AND number != 53941 AND number != 75675 AND number != 12093 AND number != 50090 AND number != 59675 AND number != 41632 AND number != 81448 AND number != 46821 AND number != 51919 AND number != 49028 AND number != 71059 AND number != 15673 AND number != 6132 AND number != 15473 AND number != 32527 AND number != 63842 AND number != 33121 AND number != 53271 AND number != 86033 AND number != 96807 AND number != 4791 AND number != 80089 AND number != 51616 AND number != 46311 AND number != 82844 AND number != 59353 AND number != 63538 AND number != 64857 AND number != 58471 AND number != 29870 AND number != 80209 AND number != 61000 AND number != 75991 AND number != 44506 AND number != 11283 AND number != 6335 AND number != 73502 AND number != 22354 AND number != 72816 AND number != 66399 AND number != 61703 diff --git a/dbms/tests/performance/array_element.xml b/dbms/tests/performance/array_element.xml index 92dcf0bb5e1..672683fe146 100644 --- a/dbms/tests/performance/array_element.xml +++ b/dbms/tests/performance/array_element.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore([[1], [2]][number % 2 + 2]) SELECT count() FROM system.numbers WHERE NOT ignore([[], [2]][number % 2 + 2]) diff --git a/dbms/tests/performance/array_fill.xml b/dbms/tests/performance/array_fill.xml index c4c0955dfc6..25ed745158b 100644 --- a/dbms/tests/performance/array_fill.xml +++ b/dbms/tests/performance/array_fill.xml @@ -7,9 +7,6 @@ - - - SELECT arraySlice(arrayFill(x -> ((x % 2) >= 0), range(100000000)), 1, 10) SELECT arraySlice(arrayFill(x -> (((x.1) % 2) >= 0), arrayMap(x -> (x, toString(x)), range(100000000))), 1, 10) diff --git a/dbms/tests/performance/array_join.xml b/dbms/tests/performance/array_join.xml index 
7220f35d881..d2eb213ce03 100644 --- a/dbms/tests/performance/array_join.xml +++ b/dbms/tests/performance/array_join.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM (SELECT [number] a, [number * 2] b FROM system.numbers) AS t ARRAY JOIN a, b WHERE NOT ignore(a + b) SELECT count() FROM (SELECT [number] a, [number * 2] b FROM system.numbers) AS t LEFT ARRAY JOIN a, b WHERE NOT ignore(a + b) diff --git a/dbms/tests/performance/base64.xml b/dbms/tests/performance/base64.xml index a479b10d48a..651412c2752 100644 --- a/dbms/tests/performance/base64.xml +++ b/dbms/tests/performance/base64.xml @@ -11,9 +11,6 @@ - - - diff --git a/dbms/tests/performance/base64_hits.xml b/dbms/tests/performance/base64_hits.xml index be693a16bee..7b07f3badb7 100644 --- a/dbms/tests/performance/base64_hits.xml +++ b/dbms/tests/performance/base64_hits.xml @@ -15,9 +15,6 @@ - - - diff --git a/dbms/tests/performance/basename.xml b/dbms/tests/performance/basename.xml index e204f050bfe..6af67bc94c4 100644 --- a/dbms/tests/performance/basename.xml +++ b/dbms/tests/performance/basename.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/bloom_filter.xml b/dbms/tests/performance/bloom_filter.xml new file mode 100644 index 00000000000..f08674cb268 --- /dev/null +++ b/dbms/tests/performance/bloom_filter.xml @@ -0,0 +1,16 @@ + + once + + + + 30000 + + + + DROP TABLE IF EXISTS test_bf + CREATE TABLE test_bf (`id` int, `ary` Array(String), INDEX idx_ary ary TYPE bloom_filter(0.01) GRANULARITY 8192) ENGINE = MergeTree() ORDER BY id + SYSTEM STOP MERGES + INSERT INTO test_bf SELECT number AS id, [CAST(id, 'String'), CAST(id + 1, 'String'), CAST(id + 2, 'String')] FROM system.numbers LIMIT 3000000 + SYSTEM START MERGES + DROP TABLE IF EXISTS test_bf + diff --git a/dbms/tests/performance/cidr.xml b/dbms/tests/performance/cidr.xml index 257fcf6fb0d..1ca7f691881 100644 --- a/dbms/tests/performance/cidr.xml +++ b/dbms/tests/performance/cidr.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/collations.xml b/dbms/tests/performance/collations.xml index 9bc48d76bce..1bec38dd103 100644 --- a/dbms/tests/performance/collations.xml +++ b/dbms/tests/performance/collations.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/column_column_comparison.xml b/dbms/tests/performance/column_column_comparison.xml index f80a673ce22..9d4446d7c2d 100644 --- a/dbms/tests/performance/column_column_comparison.xml +++ b/dbms/tests/performance/column_column_comparison.xml @@ -40,7 +40,4 @@ - - - diff --git a/dbms/tests/performance/columns_hashing.xml b/dbms/tests/performance/columns_hashing.xml index fe9777b8a38..138855dae89 100644 --- a/dbms/tests/performance/columns_hashing.xml +++ b/dbms/tests/performance/columns_hashing.xml @@ -40,7 +40,4 @@ - - - diff --git a/dbms/tests/performance/complex_array_creation.xml b/dbms/tests/performance/complex_array_creation.xml index b572041e8ec..a5ff824d6de 100644 --- a/dbms/tests/performance/complex_array_creation.xml +++ b/dbms/tests/performance/complex_array_creation.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore([[number], [number]]) SELECT count() FROM system.numbers WHERE NOT ignore([[], [number]]) diff --git a/dbms/tests/performance/concat_hits.xml b/dbms/tests/performance/concat_hits.xml index 0c4c52afa02..e2c6fc23c08 100644 --- a/dbms/tests/performance/concat_hits.xml +++ b/dbms/tests/performance/concat_hits.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/consistent_hashes.xml 
b/dbms/tests/performance/consistent_hashes.xml index 98a621e4248..7219aa00c1a 100644 --- a/dbms/tests/performance/consistent_hashes.xml +++ b/dbms/tests/performance/consistent_hashes.xml @@ -8,10 +8,6 @@ - - - - diff --git a/dbms/tests/performance/constant_column_comparison.xml b/dbms/tests/performance/constant_column_comparison.xml index 14fa7fa623e..f32ed444a0c 100644 --- a/dbms/tests/performance/constant_column_comparison.xml +++ b/dbms/tests/performance/constant_column_comparison.xml @@ -42,7 +42,4 @@ - - - diff --git a/dbms/tests/performance/constant_column_search.xml b/dbms/tests/performance/constant_column_search.xml index 76a4d2b7e74..9953c2797a2 100644 --- a/dbms/tests/performance/constant_column_search.xml +++ b/dbms/tests/performance/constant_column_search.xml @@ -61,7 +61,4 @@ - - - diff --git a/dbms/tests/performance/count.xml b/dbms/tests/performance/count.xml index da972c69059..0244adf4b38 100644 --- a/dbms/tests/performance/count.xml +++ b/dbms/tests/performance/count.xml @@ -12,9 +12,6 @@ - - - CREATE TABLE data(k UInt64, v UInt64) ENGINE = MergeTree ORDER BY k diff --git a/dbms/tests/performance/cpu_synthetic.xml b/dbms/tests/performance/cpu_synthetic.xml index cb6e41b34ac..16bef3fd42e 100644 --- a/dbms/tests/performance/cpu_synthetic.xml +++ b/dbms/tests/performance/cpu_synthetic.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/cryptographic_hashes.xml b/dbms/tests/performance/cryptographic_hashes.xml index 2f5a0a1d779..7840a7b382a 100644 --- a/dbms/tests/performance/cryptographic_hashes.xml +++ b/dbms/tests/performance/cryptographic_hashes.xml @@ -11,9 +11,6 @@ - - - diff --git a/dbms/tests/performance/date_parsing.xml b/dbms/tests/performance/date_parsing.xml index 10a2812b067..8ecf3681804 100644 --- a/dbms/tests/performance/date_parsing.xml +++ b/dbms/tests/performance/date_parsing.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/date_time_64.xml b/dbms/tests/performance/date_time_64.xml index 60c77ca22f8..b345550b335 100644 --- a/dbms/tests/performance/date_time_64.xml +++ b/dbms/tests/performance/date_time_64.xml @@ -21,9 +21,6 @@ - - - SELECT count() FROM dt where not ignore(x) diff --git a/dbms/tests/performance/decimal_aggregates.xml b/dbms/tests/performance/decimal_aggregates.xml index f22cb89de36..86830fedce6 100644 --- a/dbms/tests/performance/decimal_aggregates.xml +++ b/dbms/tests/performance/decimal_aggregates.xml @@ -11,9 +11,6 @@ - - - SELECT min(d32), max(d32), argMin(x, d32), argMax(x, d32) FROM t SELECT min(d64), max(d64), argMin(x, d64), argMax(x, d64) FROM t diff --git a/dbms/tests/performance/early_constant_folding.xml b/dbms/tests/performance/early_constant_folding.xml index 04fb4057d17..ad2d1619eb9 100644 --- a/dbms/tests/performance/early_constant_folding.xml +++ b/dbms/tests/performance/early_constant_folding.xml @@ -11,9 +11,6 @@ - - - default.hits_100m_single diff --git a/dbms/tests/performance/entropy.xml b/dbms/tests/performance/entropy.xml index a7b8f76fcaf..dcede345792 100644 --- a/dbms/tests/performance/entropy.xml +++ b/dbms/tests/performance/entropy.xml @@ -15,9 +15,6 @@ - - - diff --git a/dbms/tests/performance/first_significant_subdomain.xml b/dbms/tests/performance/first_significant_subdomain.xml index 8d38b871c62..705e70b86f9 100644 --- a/dbms/tests/performance/first_significant_subdomain.xml +++ b/dbms/tests/performance/first_significant_subdomain.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/fixed_string16.xml b/dbms/tests/performance/fixed_string16.xml 
index 34fa6a94707..398f09aba3d 100644 --- a/dbms/tests/performance/fixed_string16.xml +++ b/dbms/tests/performance/fixed_string16.xml @@ -22,7 +22,4 @@ - - - diff --git a/dbms/tests/performance/float_formatting.xml b/dbms/tests/performance/float_formatting.xml new file mode 100644 index 00000000000..0216e524735 --- /dev/null +++ b/dbms/tests/performance/float_formatting.xml @@ -0,0 +1,58 @@ + + once + + long + + + + + 10000 + + + 5000 + 20000 + + + + + + + expr + + 1 / rand() + rand() / 0xFFFFFFFF + 0xFFFFFFFF / rand() + toFloat64(number) + toFloat64(number % 2) + toFloat64(number % 10) + toFloat64(number % 100) + toFloat64(number % 1000) + toFloat64(number % 10000) + toFloat64(number % 100 + 0.5) + toFloat64(number % 100 + 0.123) + toFloat64(number % 1000 + 0.123456) + number / 2 + number / 3 + number / 7 + number / 16 + toFloat64(rand()) + toFloat64(rand64()) + toFloat32(number) + toFloat32(number % 2) + toFloat32(number % 10) + toFloat32(number % 100) + toFloat32(number % 1000) + toFloat32(number % 10000) + toFloat32(number % 100 + 0.5) + toFloat32(number % 100 + 0.123) + toFloat32(number % 1000 + 0.123456) + toFloat32(rand()) + toFloat32(rand64()) + reinterpretAsFloat32(reinterpretAsString(rand())) + reinterpretAsFloat64(reinterpretAsString(rand64())) + + + + + SELECT count() FROM system.numbers WHERE NOT ignore(toString({expr})) + diff --git a/dbms/tests/performance/float_parsing.xml b/dbms/tests/performance/float_parsing.xml index 1cded361871..81f30540dd1 100644 --- a/dbms/tests/performance/float_parsing.xml +++ b/dbms/tests/performance/float_parsing.xml @@ -14,9 +14,6 @@ - - - diff --git a/dbms/tests/performance/general_purpose_hashes.xml b/dbms/tests/performance/general_purpose_hashes.xml index b7a1b915ff0..122e69f374c 100644 --- a/dbms/tests/performance/general_purpose_hashes.xml +++ b/dbms/tests/performance/general_purpose_hashes.xml @@ -12,9 +12,6 @@ - - - diff --git a/dbms/tests/performance/general_purpose_hashes_on_UUID.xml b/dbms/tests/performance/general_purpose_hashes_on_UUID.xml index 23e00909bbe..c7fb0a3676b 100644 --- a/dbms/tests/performance/general_purpose_hashes_on_UUID.xml +++ b/dbms/tests/performance/general_purpose_hashes_on_UUID.xml @@ -12,9 +12,6 @@ - - - diff --git a/dbms/tests/performance/group_array_moving_sum.xml b/dbms/tests/performance/group_array_moving_sum.xml index ee2686c9af8..504a8b133a1 100644 --- a/dbms/tests/performance/group_array_moving_sum.xml +++ b/dbms/tests/performance/group_array_moving_sum.xml @@ -12,9 +12,6 @@ - - - CREATE TABLE moving_sum_1m(k UInt64, v UInt64) ENGINE = MergeTree ORDER BY k CREATE TABLE moving_sum_10m(k UInt64, v UInt64) ENGINE = MergeTree ORDER BY k diff --git a/dbms/tests/performance/if_array_num.xml b/dbms/tests/performance/if_array_num.xml index 3d1359bb55a..417b82a9d0c 100644 --- a/dbms/tests/performance/if_array_num.xml +++ b/dbms/tests/performance/if_array_num.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore(rand() % 2 ? [1, 2, 3] : [4, 5]) SELECT count() FROM system.numbers WHERE NOT ignore(rand() % 2 ? [1, 2, 3] : materialize([4, 5])) diff --git a/dbms/tests/performance/if_array_string.xml b/dbms/tests/performance/if_array_string.xml index c135cf9c8ce..e1d8485adc2 100644 --- a/dbms/tests/performance/if_array_string.xml +++ b/dbms/tests/performance/if_array_string.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore(rand() % 2 ? ['Hello', 'World'] : ['a', 'b', 'c']) SELECT count() FROM system.numbers WHERE NOT ignore(rand() % 2 ? 
materialize(['Hello', 'World']) : ['a', 'b', 'c']) diff --git a/dbms/tests/performance/if_string_const.xml b/dbms/tests/performance/if_string_const.xml index be2c4629519..15a281685ae 100644 --- a/dbms/tests/performance/if_string_const.xml +++ b/dbms/tests/performance/if_string_const.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore(rand() % 2 ? 'hello' : 'world') SELECT count() FROM system.numbers WHERE NOT ignore(rand() % 2 ? 'hello' : '') diff --git a/dbms/tests/performance/if_string_hits.xml b/dbms/tests/performance/if_string_hits.xml index e0ee7109f0c..267c8b039e5 100644 --- a/dbms/tests/performance/if_string_hits.xml +++ b/dbms/tests/performance/if_string_hits.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/if_to_multiif.xml b/dbms/tests/performance/if_to_multiif.xml new file mode 100644 index 00000000000..54d4b8ba842 --- /dev/null +++ b/dbms/tests/performance/if_to_multiif.xml @@ -0,0 +1,19 @@ + + once + + + + 1000 + 10000 + + + + + + + + + + diff --git a/dbms/tests/performance/information_value.xml b/dbms/tests/performance/information_value.xml index 63d61f6a432..ed054eda40d 100644 --- a/dbms/tests/performance/information_value.xml +++ b/dbms/tests/performance/information_value.xml @@ -15,9 +15,6 @@ - - - SELECT categoricalInformationValue(Age < 15, IsMobile) SELECT categoricalInformationValue(Age < 15, Age >= 15 and Age < 30, Age >= 30 and Age < 45, Age >= 45 and Age < 60, Age >= 60, IsMobile) diff --git a/dbms/tests/performance/int_parsing.xml b/dbms/tests/performance/int_parsing.xml index 625bdaadc86..51f740523ba 100644 --- a/dbms/tests/performance/int_parsing.xml +++ b/dbms/tests/performance/int_parsing.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/jit_small_requests.xml b/dbms/tests/performance/jit_small_requests.xml index 9c481994b97..d65e14cb97e 100644 --- a/dbms/tests/performance/jit_small_requests.xml +++ b/dbms/tests/performance/jit_small_requests.xml @@ -12,9 +12,6 @@ - - - @@ -25,10 +22,11 @@ bitXor(x2, bitShiftRight(x2, 33)) AS x3, x3 * 0xc4ceb9fe1a85ec53 AS x4, bitXor(x4, bitShiftRight(x4, 33)) AS x5 - SELECT x5, intHash64(number) FROM system.numbers LIMIT 10 + SELECT count() FROM numbers(10000000) WHERE NOT ignore(x5) SETTINGS compile_expressions = 0 + WITH bitXor(number, 0x4CF2D2BAAE6DA887) AS x0, @@ -37,8 +35,13 @@ bitXor(x2, bitShiftRight(x2, 33)) AS x3, x3 * 0xc4ceb9fe1a85ec53 AS x4, bitXor(x4, bitShiftRight(x4, 33)) AS x5 - SELECT x5, intHash64(number) FROM system.numbers LIMIT 10 + SELECT count() FROM numbers(10000000) WHERE NOT ignore(x5) SETTINGS compile_expressions = 1 + + + + SELECT count() FROM numbers(10000000) WHERE NOT ignore(intHash64(number)) + diff --git a/dbms/tests/performance/joins_in_memory.xml b/dbms/tests/performance/joins_in_memory.xml index 1da400c48f4..f624030d7d4 100644 --- a/dbms/tests/performance/joins_in_memory.xml +++ b/dbms/tests/performance/joins_in_memory.xml @@ -7,9 +7,6 @@ - - - CREATE TABLE ints (i64 Int64, i32 Int32, i16 Int16, i8 Int8) ENGINE = Memory diff --git a/dbms/tests/performance/joins_in_memory_pmj.xml b/dbms/tests/performance/joins_in_memory_pmj.xml index 3908df8c978..0352268c846 100644 --- a/dbms/tests/performance/joins_in_memory_pmj.xml +++ b/dbms/tests/performance/joins_in_memory_pmj.xml @@ -7,9 +7,6 @@ - - - CREATE TABLE ints (i64 Int64, i32 Int32, i16 Int16, i8 Int8) ENGINE = Memory SET partial_merge_join = 1 diff --git a/dbms/tests/performance/json_extract_rapidjson.xml b/dbms/tests/performance/json_extract_rapidjson.xml index 
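The jit_small_requests.xml change above is the most substantive of these test rewrites: the WITH chain of bitXor / bitShiftRight / multiply steps is a 64-bit hash finalizer of the same shape as the one intHash64 applies natively, and the query now evaluates it over ten million rows twice, once with compile_expressions = 0 and once with 1, so interpreted and LLVM-JIT-compiled expression evaluation can each be compared against the built-in function. A hedged, self-contained sketch of the compiled variant; the x1 and x2 steps are inferred from the standard finalizer structure, since the hunk shows only the head and tail of the chain:

    WITH
        bitXor(number, 0x4CF2D2BAAE6DA887) AS x0,
        bitXor(x0, bitShiftRight(x0, 33)) AS x1,  -- x1 and x2 inferred, not visible in the hunk
        x1 * 0xff51afd7ed558ccd AS x2,
        bitXor(x2, bitShiftRight(x2, 33)) AS x3,
        x3 * 0xc4ceb9fe1a85ec53 AS x4,
        bitXor(x4, bitShiftRight(x4, 33)) AS x5
    SELECT count() FROM numbers(10000000) WHERE NOT ignore(x5)
    SETTINGS compile_expressions = 1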
74f04c9736d..8a2718d4a56 100644 --- a/dbms/tests/performance/json_extract_rapidjson.xml +++ b/dbms/tests/performance/json_extract_rapidjson.xml @@ -12,9 +12,6 @@ - - - diff --git a/dbms/tests/performance/json_extract_simdjson.xml b/dbms/tests/performance/json_extract_simdjson.xml index 6ba1746d5e3..f3a38912b0f 100644 --- a/dbms/tests/performance/json_extract_simdjson.xml +++ b/dbms/tests/performance/json_extract_simdjson.xml @@ -12,9 +12,6 @@ - - - diff --git a/dbms/tests/performance/leftpad.xml b/dbms/tests/performance/leftpad.xml index a38668c745d..a0717adbbd8 100644 --- a/dbms/tests/performance/leftpad.xml +++ b/dbms/tests/performance/leftpad.xml @@ -21,9 +21,6 @@ - - - diff --git a/dbms/tests/performance/linear_regression.xml b/dbms/tests/performance/linear_regression.xml index a04683c2b60..50634b6a60a 100644 --- a/dbms/tests/performance/linear_regression.xml +++ b/dbms/tests/performance/linear_regression.xml @@ -12,9 +12,6 @@ test.hits - - - DROP TABLE IF EXISTS test_model CREATE TABLE test_model engine = Memory as select stochasticLinearRegressionState(0.0001)(Age, Income, ParamPrice, Robotness, RefererHash) as state from test.hits diff --git a/dbms/tests/performance/math.xml b/dbms/tests/performance/math.xml index ea6d79b696d..5f4f302a0e8 100644 --- a/dbms/tests/performance/math.xml +++ b/dbms/tests/performance/math.xml @@ -8,9 +8,6 @@ - - - diff --git a/dbms/tests/performance/merge_table_streams.xml b/dbms/tests/performance/merge_table_streams.xml index 3f19c21109e..f1816e85097 100644 --- a/dbms/tests/performance/merge_table_streams.xml +++ b/dbms/tests/performance/merge_table_streams.xml @@ -15,9 +15,6 @@ - - - 5 diff --git a/dbms/tests/performance/merge_tree_many_partitions.xml b/dbms/tests/performance/merge_tree_many_partitions.xml index 8450a34dc59..6eb110bfab9 100644 --- a/dbms/tests/performance/merge_tree_many_partitions.xml +++ b/dbms/tests/performance/merge_tree_many_partitions.xml @@ -15,9 +15,6 @@ - - - 0 diff --git a/dbms/tests/performance/merge_tree_many_partitions_2.xml b/dbms/tests/performance/merge_tree_many_partitions_2.xml index 1ad06fcbf2f..038bff93057 100644 --- a/dbms/tests/performance/merge_tree_many_partitions_2.xml +++ b/dbms/tests/performance/merge_tree_many_partitions_2.xml @@ -15,9 +15,6 @@ - - - 0 diff --git a/dbms/tests/performance/modulo.xml b/dbms/tests/performance/modulo.xml index 931b160ea00..8e6674d0980 100644 --- a/dbms/tests/performance/modulo.xml +++ b/dbms/tests/performance/modulo.xml @@ -7,9 +7,6 @@ - - - SELECT number % 128 FROM numbers(300000000) FORMAT Null SELECT number % 255 FROM numbers(300000000) FORMAT Null diff --git a/dbms/tests/performance/ngram_distance.xml b/dbms/tests/performance/ngram_distance.xml index 8204a9e9d0b..9a8e4ac72a2 100644 --- a/dbms/tests/performance/ngram_distance.xml +++ b/dbms/tests/performance/ngram_distance.xml @@ -43,7 +43,4 @@ SELECT DISTINCT URL, ngramDistanceCaseInsensitiveUTF8(URL, 'как дЕлА') AS distance FROM hits_100m_single ORDER BY distance ASC LIMIT 50 SELECT DISTINCT URL, ngramDistanceCaseInsensitiveUTF8(URL, 'Чем зАнимаешЬся') AS distance FROM hits_100m_single ORDER BY distance ASC LIMIT 50 - - - diff --git a/dbms/tests/performance/number_formatting_formats.xml b/dbms/tests/performance/number_formatting_formats.xml index df83c5cbf11..aa9929464fb 100644 --- a/dbms/tests/performance/number_formatting_formats.xml +++ b/dbms/tests/performance/number_formatting_formats.xml @@ -14,9 +14,6 @@ - - - diff --git a/dbms/tests/performance/nyc_taxi.xml b/dbms/tests/performance/nyc_taxi.xml index 
ac97a8e1475..7648e377433 100644 --- a/dbms/tests/performance/nyc_taxi.xml +++ b/dbms/tests/performance/nyc_taxi.xml @@ -12,9 +12,6 @@ - - - default.trips_mergetree diff --git a/dbms/tests/performance/order_by_decimals.xml b/dbms/tests/performance/order_by_decimals.xml index ad6937cd1d6..faf2841e993 100644 --- a/dbms/tests/performance/order_by_decimals.xml +++ b/dbms/tests/performance/order_by_decimals.xml @@ -24,7 +24,4 @@ SELECT toDecimal64(number, 8) AS n FROM numbers(1000000) ORDER BY n DESC SELECT toDecimal128(number, 10) AS n FROM numbers(1000000) ORDER BY n DESC - - - diff --git a/dbms/tests/performance/order_by_read_in_order.xml b/dbms/tests/performance/order_by_read_in_order.xml index d0c5350b3c6..a99dd89846e 100644 --- a/dbms/tests/performance/order_by_read_in_order.xml +++ b/dbms/tests/performance/order_by_read_in_order.xml @@ -12,17 +12,7 @@ - - - - - - - - - - default.hits_100m_single diff --git a/dbms/tests/performance/order_by_single_column.xml b/dbms/tests/performance/order_by_single_column.xml index 98f2bdac17e..ed247641ca8 100644 --- a/dbms/tests/performance/order_by_single_column.xml +++ b/dbms/tests/performance/order_by_single_column.xml @@ -29,7 +29,4 @@ SELECT PageCharset as col FROM hits_100m_single ORDER BY col LIMIT 10000,1 SELECT Title as col FROM hits_100m_single ORDER BY col LIMIT 1000,1 - - - diff --git a/dbms/tests/performance/parse_engine_file.xml b/dbms/tests/performance/parse_engine_file.xml index 8308d8f049f..080acbd53f2 100644 --- a/dbms/tests/performance/parse_engine_file.xml +++ b/dbms/tests/performance/parse_engine_file.xml @@ -16,9 +16,6 @@ - - - diff --git a/dbms/tests/performance/prewhere.xml b/dbms/tests/performance/prewhere.xml index 2ba028562e5..e9a7c4c5a7f 100644 --- a/dbms/tests/performance/prewhere.xml +++ b/dbms/tests/performance/prewhere.xml @@ -12,9 +12,6 @@ - - - default.hits_10m_single diff --git a/dbms/tests/performance/random_printable_ascii.xml b/dbms/tests/performance/random_printable_ascii.xml new file mode 100644 index 00000000000..b37469c0aee --- /dev/null +++ b/dbms/tests/performance/random_printable_ascii.xml @@ -0,0 +1,19 @@ + + once + + + + 4000 + 10000 + + + + + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(10)) + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(100)) + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(1000)) + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(10000)) + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(rand() % 10)) + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(rand() % 100)) + SELECT count() FROM system.numbers WHERE NOT ignore(randomPrintableASCII(rand() % 1000)) + diff --git a/dbms/tests/performance/range.xml b/dbms/tests/performance/range.xml index b075bad5e43..48463b535ef 100644 --- a/dbms/tests/performance/range.xml +++ b/dbms/tests/performance/range.xml @@ -8,9 +8,6 @@ - - - SELECT count() FROM (SELECT range(number % 100) FROM system.numbers limit 10000000) SELECT count() FROM (SELECT range(0, number % 100, 1) FROM system.numbers limit 10000000) diff --git a/dbms/tests/performance/right.xml b/dbms/tests/performance/right.xml index 8d1304a4604..06d4bdaa93f 100644 --- a/dbms/tests/performance/right.xml +++ b/dbms/tests/performance/right.xml @@ -15,9 +15,6 @@ - - - diff --git a/dbms/tests/performance/round_down.xml b/dbms/tests/performance/round_down.xml index 34c030672b2..5275d69ad84 100644 --- a/dbms/tests/performance/round_down.xml +++ 
b/dbms/tests/performance/round_down.xml @@ -11,9 +11,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore(roundDuration(rand() % 65536)) SELECT count() FROM system.numbers WHERE NOT ignore(roundDown(rand() % 65536, [0, 1, 10, 30, 60, 120, 180, 240, 300, 600, 1200, 1800, 3600, 7200, 18000, 36000])) diff --git a/dbms/tests/performance/round_methods.xml b/dbms/tests/performance/round_methods.xml index d999feaedd4..b80a8977c33 100644 --- a/dbms/tests/performance/round_methods.xml +++ b/dbms/tests/performance/round_methods.xml @@ -11,9 +11,6 @@ - - - SELECT count() FROM system.numbers WHERE NOT ignore(round(toInt64(number), -2)) SELECT count() FROM system.numbers WHERE NOT ignore(roundBankers(toInt64(number), -2)) diff --git a/dbms/tests/performance/scalar.xml b/dbms/tests/performance/scalar.xml index bb8044685d3..d1bc661c58f 100644 --- a/dbms/tests/performance/scalar.xml +++ b/dbms/tests/performance/scalar.xml @@ -11,9 +11,6 @@ - - - CREATE TABLE cdp_tags (tag_id String, mid_seqs AggregateFunction(groupBitmap, UInt32)) engine=MergeTree() ORDER BY (tag_id) SETTINGS index_granularity=1 CREATE TABLE cdp_orders(order_id UInt64, order_complete_time DateTime, order_total_sales Float32, mid_seq UInt32) engine=MergeTree() PARTITION BY toYYYYMMDD(order_complete_time) ORDER BY (order_complete_time, order_id) diff --git a/dbms/tests/performance/select_format.xml b/dbms/tests/performance/select_format.xml index 55ab7b2d458..621247fee1e 100644 --- a/dbms/tests/performance/select_format.xml +++ b/dbms/tests/performance/select_format.xml @@ -1,7 +1,7 @@ loop - CREATE TABLE IF NOT EXISTS table_{format} ENGINE = File({format}) AS test.hits + CREATE TABLE IF NOT EXISTS table_{format} ENGINE = File({format}, '/dev/null') AS test.hits @@ -14,12 +14,10 @@ - - - 1000000 + 1 diff --git a/dbms/tests/performance/set.xml b/dbms/tests/performance/set.xml index 1e62840d8d1..7f3ee4fd4c1 100644 --- a/dbms/tests/performance/set.xml +++ b/dbms/tests/performance/set.xml @@ -14,9 +14,6 @@ - - - diff --git a/dbms/tests/performance/set_hits.xml b/dbms/tests/performance/set_hits.xml index b4afc111ad1..f124de84e64 100644 --- a/dbms/tests/performance/set_hits.xml +++ b/dbms/tests/performance/set_hits.xml @@ -15,9 +15,6 @@ - - - SELECT count() FROM hits_100m_single WHERE UserID IN (SELECT UserID FROM hits_100m_single WHERE AdvEngineID != 0) SELECT count() FROM hits_100m_single WHERE UserID IN (SELECT UserID FROM hits_100m_single) diff --git a/dbms/tests/performance/simple_join_query.xml b/dbms/tests/performance/simple_join_query.xml index 7d8981db2ff..1f6d6ba74d6 100644 --- a/dbms/tests/performance/simple_join_query.xml +++ b/dbms/tests/performance/simple_join_query.xml @@ -11,9 +11,6 @@ - - - CREATE TABLE join_table(A Int64, S0 String, S1 String, S2 String, S3 String)ENGINE = MergeTree ORDER BY A diff --git a/dbms/tests/performance/slices_hits.xml b/dbms/tests/performance/slices_hits.xml index 84aee81c8a1..ad01a607b8a 100644 --- a/dbms/tests/performance/slices_hits.xml +++ b/dbms/tests/performance/slices_hits.xml @@ -12,9 +12,6 @@ - - - test.hits diff --git a/dbms/tests/performance/string_join.xml b/dbms/tests/performance/string_join.xml index 6c0ad83d5b4..228fe3182b8 100644 --- a/dbms/tests/performance/string_join.xml +++ b/dbms/tests/performance/string_join.xml @@ -7,9 +7,6 @@ - - - default.hits_10m_single diff --git a/dbms/tests/performance/string_set.xml b/dbms/tests/performance/string_set.xml index 08a74e3e8f9..cf6261d6d60 100644 --- a/dbms/tests/performance/string_set.xml +++ b/dbms/tests/performance/string_set.xml 
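One change in select_format.xml just above is worth unpacking: the File table engine takes an optional second argument giving an explicit path, so File({format}, '/dev/null') makes every insert format the rows and then discard the bytes. That isolates serialization cost from disk I/O and keeps a long benchmark run from filling the data directory. A sketch with one concrete format substituted for the {format} test parameter; the INSERT and its LIMIT are illustrative, not copied from the test:

    CREATE TABLE IF NOT EXISTS table_CSV ENGINE = File(CSV, '/dev/null') AS test.hits;
    -- Rows are serialized to CSV and written to /dev/null: formatting is measured, storage is not.
    INSERT INTO table_CSV SELECT * FROM test.hits LIMIT 1000000;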
@@ -7,9 +7,6 @@ - - - hits_10m_single diff --git a/dbms/tests/performance/string_sort.xml b/dbms/tests/performance/string_sort.xml index 7b33bb20d52..c4c9463a9aa 100644 --- a/dbms/tests/performance/string_sort.xml +++ b/dbms/tests/performance/string_sort.xml @@ -47,7 +47,4 @@ - - - diff --git a/dbms/tests/performance/trim_numbers.xml b/dbms/tests/performance/trim_numbers.xml index 48fdb710cd9..997272e95f6 100644 --- a/dbms/tests/performance/trim_numbers.xml +++ b/dbms/tests/performance/trim_numbers.xml @@ -11,9 +11,6 @@ - - - diff --git a/dbms/tests/performance/trim_urls.xml b/dbms/tests/performance/trim_urls.xml index 2b672d9d97d..23dd3f77f6e 100644 --- a/dbms/tests/performance/trim_urls.xml +++ b/dbms/tests/performance/trim_urls.xml @@ -15,9 +15,6 @@ - - - diff --git a/dbms/tests/performance/trim_whitespace.xml b/dbms/tests/performance/trim_whitespace.xml index 26d89859432..e886359a368 100644 --- a/dbms/tests/performance/trim_whitespace.xml +++ b/dbms/tests/performance/trim_whitespace.xml @@ -10,9 +10,6 @@ - - - diff --git a/dbms/tests/performance/uniq.xml b/dbms/tests/performance/uniq.xml index c44a4e2ca58..307ad6f88ef 100644 --- a/dbms/tests/performance/uniq.xml +++ b/dbms/tests/performance/uniq.xml @@ -13,9 +13,6 @@ - - - 30000000000 diff --git a/dbms/tests/performance/url_hits.xml b/dbms/tests/performance/url_hits.xml index a251a2706b8..d4e504cd1b8 100644 --- a/dbms/tests/performance/url_hits.xml +++ b/dbms/tests/performance/url_hits.xml @@ -15,9 +15,6 @@ - - - diff --git a/dbms/tests/performance/vectorize_aggregation_combinators.xml b/dbms/tests/performance/vectorize_aggregation_combinators.xml index a1afb2e6cc8..73024f454f9 100644 --- a/dbms/tests/performance/vectorize_aggregation_combinators.xml +++ b/dbms/tests/performance/vectorize_aggregation_combinators.xml @@ -12,9 +12,6 @@ - - - 1 diff --git a/dbms/tests/performance/visit_param_extract_raw.xml b/dbms/tests/performance/visit_param_extract_raw.xml index 02b8224c361..0faa43088e7 100644 --- a/dbms/tests/performance/visit_param_extract_raw.xml +++ b/dbms/tests/performance/visit_param_extract_raw.xml @@ -8,12 +8,6 @@ - - - - - - diff --git a/dbms/tests/performance/website.xml b/dbms/tests/performance/website.xml index 4cb350a60a1..83a1c3607c7 100644 --- a/dbms/tests/performance/website.xml +++ b/dbms/tests/performance/website.xml @@ -16,9 +16,6 @@ - - - 20000000000 diff --git a/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.reference b/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.reference index db5cdb698b3..b5b696fc82c 100644 --- a/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.reference +++ b/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.reference @@ -1,13 +1,5 @@ -columns format version: 1 -1 columns: -`d` Date - 2014-01-01 2014-01-01 0 2014-02-01 1 2014-01-01 2014-02-01 -columns format version: 1 -1 columns: -`d` Date - diff --git a/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.sql b/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.sql index e31dd1d2872..ff05cea9c84 100644 --- a/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.sql +++ b/dbms/tests/queries/0_stateless/00121_drop_column_zookeeper.sql @@ -4,8 +4,6 @@ CREATE TABLE alter_00121 (d Date, x UInt8) ENGINE = ReplicatedMergeTree('/clickh INSERT INTO alter_00121 VALUES ('2014-01-01', 1); ALTER TABLE alter_00121 DROP COLUMN x; -SELECT value FROM system.zookeeper WHERE path = '/clickhouse/tables/test/alter_00121/replicas/r1/parts/20140101_20140101_0_0_0' AND name = 'columns' FORMAT 
TabSeparatedRaw; - DROP TABLE alter_00121; @@ -22,6 +20,4 @@ SELECT * FROM alter_00121 ORDER BY d; ALTER TABLE alter_00121 DROP COLUMN x; SELECT * FROM alter_00121 ORDER BY d; -SELECT value FROM system.zookeeper WHERE path = '/clickhouse/tables/test/alter_00121/replicas/r1/parts/20140201_20140201_0_0_0' AND name = 'columns' FORMAT TabSeparatedRaw; - DROP TABLE alter_00121; diff --git a/dbms/tests/queries/0_stateless/00611_zookeeper_different_checksums_formats.reference b/dbms/tests/queries/0_stateless/00611_zookeeper_different_checksums_formats.reference deleted file mode 100644 index da05f197b7e..00000000000 --- a/dbms/tests/queries/0_stateless/00611_zookeeper_different_checksums_formats.reference +++ /dev/null @@ -1,9 +0,0 @@ -1 -1 -0 [] -1 [] -0 [] -1 [] -DETACH -0 [] -1 [] diff --git a/dbms/tests/queries/0_stateless/00611_zookeeper_different_checksums_formats.sql b/dbms/tests/queries/0_stateless/00611_zookeeper_different_checksums_formats.sql deleted file mode 100644 index 3fbb0d400f3..00000000000 --- a/dbms/tests/queries/0_stateless/00611_zookeeper_different_checksums_formats.sql +++ /dev/null @@ -1,24 +0,0 @@ -DROP TABLE IF EXISTS table_old; -DROP TABLE IF EXISTS table_new; - -CREATE TABLE table_old (k UInt64, d Array(String)) ENGINE = ReplicatedMergeTree('/clickhouse/test/tables/checksums_test', 'old') ORDER BY k SETTINGS use_minimalistic_checksums_in_zookeeper=0; -CREATE TABLE table_new (k UInt64, d Array(String)) ENGINE = ReplicatedMergeTree('/clickhouse/test/tables/checksums_test', 'new') ORDER BY k SETTINGS use_minimalistic_checksums_in_zookeeper=1; - -SET insert_quorum=2; -INSERT INTO table_old VALUES (0, []); -SELECT value LIKE '%checksums format version: 4%' FROM system.zookeeper WHERE path='/clickhouse/test/tables/checksums_test/replicas/old/parts/all_0_0_0' AND name = 'checksums'; - -INSERT INTO table_new VALUES (1, []); -SELECT value LIKE '%checksums format version: 5%' FROM system.zookeeper WHERE path='/clickhouse/test/tables/checksums_test/replicas/new/parts/all_1_1_0' AND name = 'checksums'; - -OPTIMIZE TABLE table_old; -SELECT * FROM table_old ORDER BY k; -SELECT * FROM table_new ORDER BY k; - -SELECT 'DETACH'; -DETACH TABLE table_old; -ATTACH TABLE table_old (k UInt64, d Array(String)) ENGINE = ReplicatedMergeTree('/clickhouse/test/tables/checksums_test', 'old') ORDER BY k SETTINGS use_minimalistic_checksums_in_zookeeper=1; -SELECT * FROM table_old ORDER BY k; - -DROP TABLE IF EXISTS table_old; -DROP TABLE IF EXISTS table_new; \ No newline at end of file diff --git a/dbms/tests/queries/0_stateless/00820_multiple_joins.sql b/dbms/tests/queries/0_stateless/00820_multiple_joins.sql index 309b369ade3..890d003a124 100644 --- a/dbms/tests/queries/0_stateless/00820_multiple_joins.sql +++ b/dbms/tests/queries/0_stateless/00820_multiple_joins.sql @@ -13,8 +13,6 @@ INSERT INTO table2 SELECT number * 2, number * 20 FROM numbers(11); INSERT INTO table3 SELECT number * 30, number * 300 FROM numbers(10); INSERT INTO table5 SELECT number * 5, number * 50, number * 500 FROM numbers(10); -SET allow_experimental_multiple_joins_emulation = 1; - select t1.a, t2.b, t3.c from table1 as t1 join table2 as t2 on t1.a = t2.a join table3 as t3 on t2.b = t3.b; select t1.a, t2.b, t5.c from table1 as t1 join table2 as t2 on t1.a = t2.a join table5 as t5 on t1.a = t5.a AND t2.b = t5.b; diff --git a/dbms/tests/queries/0_stateless/00820_multiple_joins_subquery_requires_alias.sql b/dbms/tests/queries/0_stateless/00820_multiple_joins_subquery_requires_alias.sql index ee7910ab41f..ad59e02ecad 100644 --- 
a/dbms/tests/queries/0_stateless/00820_multiple_joins_subquery_requires_alias.sql +++ b/dbms/tests/queries/0_stateless/00820_multiple_joins_subquery_requires_alias.sql @@ -13,7 +13,6 @@ INSERT INTO table2 SELECT number * 2, number * 20 FROM numbers(11); INSERT INTO table3 SELECT number * 30, number * 300 FROM numbers(10); INSERT INTO table5 SELECT number * 5, number * 50, number * 500 FROM numbers(10); -SET allow_experimental_multiple_joins_emulation = 1; SET joined_subquery_requires_alias = 1; select t1.a, t2.b, t3.c from table1 as t1 join table2 as t2 on t1.a = t2.a join table3 as t3 on t2.b = t3.b; diff --git a/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.reference b/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.reference index df21becc999..32b1c42ca2c 100644 --- a/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.reference +++ b/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.reference @@ -1,27 +1,17 @@ 0 0 -0 0 cross 1 1 1 1 1 1 1 2 2 2 2 \N -1 1 1 1 -1 1 1 2 -2 2 2 \N cross nullable 1 1 1 1 2 2 1 2 -1 1 1 1 -2 2 1 2 cross nullable vs not nullable 1 1 1 1 2 2 1 2 -1 1 1 1 -2 2 1 2 cross self 1 1 1 1 2 2 2 2 -1 1 1 1 -2 2 2 2 cross one table expr 1 1 1 1 1 1 1 2 @@ -31,23 +21,12 @@ cross one table expr 2 2 1 2 2 2 2 \N 2 2 3 \N -1 1 1 1 -1 1 1 2 -1 1 2 \N -1 1 3 \N -2 2 1 1 -2 2 1 2 -2 2 2 \N -2 2 3 \N cross multiple ands 1 1 1 1 -1 1 1 1 cross and inside and 1 1 1 1 -1 1 1 1 cross split conjunction 1 1 1 1 -1 1 1 1 comma 1 1 1 1 1 1 1 2 @@ -56,26 +35,18 @@ comma nullable 1 1 1 1 2 2 1 2 cross -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE a = t2_00826.a SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nALL INNER JOIN t2_00826 ON a = t2_00826.a\nWHERE a = t2_00826.a cross nullable -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\n, t2_00826\nWHERE a = t2_00826.a SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nALL INNER JOIN t2_00826 ON a = t2_00826.a\nWHERE a = t2_00826.a cross nullable vs not nullable -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE a = t2_00826.b SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nALL INNER JOIN t2_00826 ON a = t2_00826.b\nWHERE a = t2_00826.b cross self -SELECT \n a, \n b, \n y.a, \n y.b\nFROM t1_00826 AS x\nCROSS JOIN t1_00826 AS y\nWHERE (a = y.a) AND (b = y.b) SELECT \n a, \n b, \n y.a, \n y.b\nFROM t1_00826 AS x\nALL INNER JOIN t1_00826 AS y ON (a = y.a) AND (b = y.b)\nWHERE (a = y.a) AND (b = y.b) cross one table expr SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE a = b -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE a = b cross multiple ands -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE (a = t2_00826.a) AND (b = t2_00826.b) SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nALL INNER JOIN t2_00826 ON (a = t2_00826.a) AND (b = t2_00826.b)\nWHERE (a = t2_00826.a) AND (b = t2_00826.b) cross and inside and -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE (a = t2_00826.a) AND ((a = t2_00826.a) AND ((a = t2_00826.a) AND (b = t2_00826.b))) SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nALL INNER JOIN t2_00826 ON (a = t2_00826.a) AND (a = t2_00826.a) AND (a = t2_00826.a) AND (b = t2_00826.b)\nWHERE (a = t2_00826.a) AND ((a = t2_00826.a) AND ((a = t2_00826.a) AND (b = 
t2_00826.b))) cross split conjunction -SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nCROSS JOIN t2_00826\nWHERE (a = t2_00826.a) AND (b = t2_00826.b) AND (a >= 1) AND (t2_00826.b > 0) SELECT \n a, \n b, \n t2_00826.a, \n t2_00826.b\nFROM t1_00826\nALL INNER JOIN t2_00826 ON (a = t2_00826.a) AND (b = t2_00826.b)\nWHERE (a = t2_00826.a) AND (b = t2_00826.b) AND (a >= 1) AND (t2_00826.b > 0) diff --git a/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.sql b/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.sql index e21d257d2da..618c0374a28 100644 --- a/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.sql +++ b/dbms/tests/queries/0_stateless/00826_cross_to_inner_join.sql @@ -1,9 +1,6 @@ SET enable_debug_queries = 1; SET enable_optimize_predicate_expression = 0; -set allow_experimental_cross_to_join_conversion = 0; -select * from system.one l cross join system.one r; -set allow_experimental_cross_to_join_conversion = 1; select * from system.one l cross join system.one r; DROP TABLE IF EXISTS t1_00826; @@ -17,50 +14,21 @@ INSERT INTO t2_00826 values (1,1), (1,2); INSERT INTO t2_00826 (a) values (2), (3); SELECT 'cross'; -SET allow_experimental_cross_to_join_conversion = 0; -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a; -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a; SELECT 'cross nullable'; -SET allow_experimental_cross_to_join_conversion = 0; -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.b = t2_00826.b; -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.b = t2_00826.b; SELECT 'cross nullable vs not nullable'; -SET allow_experimental_cross_to_join_conversion = 0; -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.b; -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.b; SELECT 'cross self'; -SET allow_experimental_cross_to_join_conversion = 0; -SELECT * FROM t1_00826 x cross join t1_00826 y where x.a = y.a and x.b = y.b; -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 x cross join t1_00826 y where x.a = y.a and x.b = y.b; SELECT 'cross one table expr'; -SET allow_experimental_cross_to_join_conversion = 0; -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t1_00826.b order by (t1_00826.a, t2_00826.a, t2_00826.b); -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t1_00826.b order by (t1_00826.a, t2_00826.a, t2_00826.b); SELECT 'cross multiple ands'; -SET allow_experimental_cross_to_join_conversion = 0; ---SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b and t1_00826.a = t2_00826.a; -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b; -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b; SELECT 'cross and inside and'; -SET allow_experimental_cross_to_join_conversion = 0; ---SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b)); ---SELECT * FROM t1_00826 x cross join t2_00826 y where t1_00826.a = t2_00826.a and (t1_00826.b = t2_00826.b and (x.a = 
y.a and x.b = y.b)); -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and (t1_00826.b = t2_00826.b and 1); -SET allow_experimental_cross_to_join_conversion = 1; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and (t1_00826.b = t2_00826.b and 1); SELECT 'cross split conjunction'; -SET allow_experimental_cross_to_join_conversion = 0; SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b and t1_00826.a >= 1 and t2_00826.b = 1; -SET allow_experimental_cross_to_join_conversion = 1; -SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b and t1_00826.a >= 1 and t2_00826.b = 1; - -SET allow_experimental_cross_to_join_conversion = 1; SELECT 'comma'; SELECT * FROM t1_00826, t2_00826 where t1_00826.a = t2_00826.a; @@ -69,30 +37,22 @@ SELECT * FROM t1_00826, t2_00826 where t1_00826.b = t2_00826.b; SELECT 'cross'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a; +ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a; SELECT 'cross nullable'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826, t2_00826 where t1_00826.a = t2_00826.a; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826, t2_00826 where t1_00826.a = t2_00826.a; +ANALYZE SELECT * FROM t1_00826, t2_00826 where t1_00826.a = t2_00826.a; SELECT 'cross nullable vs not nullable'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.b; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.b; +ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.b; SELECT 'cross self'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 x cross join t1_00826 y where x.a = y.a and x.b = y.b; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 x cross join t1_00826 y where x.a = y.a and x.b = y.b; +ANALYZE SELECT * FROM t1_00826 x cross join t1_00826 y where x.a = y.a and x.b = y.b; SELECT 'cross one table expr'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t1_00826.b; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t1_00826.b; +ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t1_00826.b; SELECT 'cross multiple ands'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b; +ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b; SELECT 'cross and inside and'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and t1_00826.b = 
t2_00826.b)); -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b)); +ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and (t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b)); SELECT 'cross split conjunction'; -SET allow_experimental_cross_to_join_conversion = 0; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b and t1_00826.a >= 1 and t2_00826.b > 0; -SET allow_experimental_cross_to_join_conversion = 1; ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b and t1_00826.a >= 1 and t2_00826.b > 0; +ANALYZE SELECT * FROM t1_00826 cross join t2_00826 where t1_00826.a = t2_00826.a and t1_00826.b = t2_00826.b and t1_00826.a >= 1 and t2_00826.b > 0; DROP TABLE t1_00826; DROP TABLE t2_00826; diff --git a/dbms/tests/queries/0_stateless/00836_indices_alter.sql b/dbms/tests/queries/0_stateless/00836_indices_alter.sql index e277e0e7bb4..c059c6210da 100644 --- a/dbms/tests/queries/0_stateless/00836_indices_alter.sql +++ b/dbms/tests/queries/0_stateless/00836_indices_alter.sql @@ -1,7 +1,6 @@ DROP TABLE IF EXISTS minmax_idx; DROP TABLE IF EXISTS minmax_idx2; -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE minmax_idx ( diff --git a/dbms/tests/queries/0_stateless/00836_indices_alter_replicated_zookeeper.sql b/dbms/tests/queries/0_stateless/00836_indices_alter_replicated_zookeeper.sql index 4240348f7de..a08d6a20a87 100644 --- a/dbms/tests/queries/0_stateless/00836_indices_alter_replicated_zookeeper.sql +++ b/dbms/tests/queries/0_stateless/00836_indices_alter_replicated_zookeeper.sql @@ -3,7 +3,6 @@ DROP TABLE IF EXISTS test.minmax_idx_r; DROP TABLE IF EXISTS test.minmax_idx2; DROP TABLE IF EXISTS test.minmax_idx2_r; -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE test.minmax_idx ( diff --git a/dbms/tests/queries/0_stateless/00837_minmax_index.sh b/dbms/tests/queries/0_stateless/00837_minmax_index.sh index 210e36603b5..75b1d8d9725 100755 --- a/dbms/tests/queries/0_stateless/00837_minmax_index.sh +++ b/dbms/tests/queries/0_stateless/00837_minmax_index.sh @@ -1,6 +1,5 @@ #!/usr/bin/env bash -CLICKHOUSE_CLIENT_OPT="--allow_experimental_data_skipping_indices=1" CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) . 
$CURDIR/../shell_config.sh @@ -9,7 +8,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS minmax_idx;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE minmax_idx ( u64 UInt64, diff --git a/dbms/tests/queries/0_stateless/00837_minmax_index_replicated_zookeeper.sql b/dbms/tests/queries/0_stateless/00837_minmax_index_replicated_zookeeper.sql index 6c4d2b95a8e..bd7f9d91694 100644 --- a/dbms/tests/queries/0_stateless/00837_minmax_index_replicated_zookeeper.sql +++ b/dbms/tests/queries/0_stateless/00837_minmax_index_replicated_zookeeper.sql @@ -1,7 +1,6 @@ DROP TABLE IF EXISTS minmax_idx1; DROP TABLE IF EXISTS minmax_idx2; -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE minmax_idx1 ( diff --git a/dbms/tests/queries/0_stateless/00838_unique_index.sh b/dbms/tests/queries/0_stateless/00838_unique_index.sh index fbc265f7a6e..c2e288b7402 100755 --- a/dbms/tests/queries/0_stateless/00838_unique_index.sh +++ b/dbms/tests/queries/0_stateless/00838_unique_index.sh @@ -1,6 +1,5 @@ #!/usr/bin/env bash -CLICKHOUSE_CLIENT_OPT="--allow_experimental_data_skipping_indices=1" CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) . $CURDIR/../shell_config.sh @@ -8,7 +7,6 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS set_idx;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE set_idx ( u64 UInt64, diff --git a/dbms/tests/queries/0_stateless/00855_join_with_array_join.reference b/dbms/tests/queries/0_stateless/00855_join_with_array_join.reference index f5405e564c8..386bde518ea 100644 --- a/dbms/tests/queries/0_stateless/00855_join_with_array_join.reference +++ b/dbms/tests/queries/0_stateless/00855_join_with_array_join.reference @@ -4,9 +4,3 @@ 4 0 5 0 6 0 -1 0 -2 0 -3 0 -4 0 -5 0 -6 0 diff --git a/dbms/tests/queries/0_stateless/00855_join_with_array_join.sql b/dbms/tests/queries/0_stateless/00855_join_with_array_join.sql index 2170e0a67d3..10b03fec062 100644 --- a/dbms/tests/queries/0_stateless/00855_join_with_array_join.sql +++ b/dbms/tests/queries/0_stateless/00855_join_with_array_join.sql @@ -1,13 +1,5 @@ SET joined_subquery_requires_alias = 0; -set allow_experimental_multiple_joins_emulation = 0; -set allow_experimental_cross_to_join_conversion = 0; -select ax, c from (select [1,2] ax, 0 c) array join ax join (select 0 c) using(c); -select ax, c from (select [3,4] ax, 0 c) join (select 0 c) using(c) array join ax; -select ax, c from (select [5,6] ax, 0 c) s1 join system.one s2 ON s1.c = s2.dummy array join ax; - -set allow_experimental_multiple_joins_emulation = 1; -set allow_experimental_cross_to_join_conversion = 1; select ax, c from (select [1,2] ax, 0 c) array join ax join (select 0 c) using(c); select ax, c from (select [3,4] ax, 0 c) join (select 0 c) using(c) array join ax; select ax, c from (select [5,6] ax, 0 c) s1 join system.one s2 ON s1.c = s2.dummy array join ax; diff --git a/dbms/tests/queries/0_stateless/00907_set_index_max_rows.sh b/dbms/tests/queries/0_stateless/00907_set_index_max_rows.sh index 98c0086488d..d32b4c22e04 100755 --- a/dbms/tests/queries/0_stateless/00907_set_index_max_rows.sh +++ b/dbms/tests/queries/0_stateless/00907_set_index_max_rows.sh @@ -6,7 +6,6 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS set_idx;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE set_idx ( u64 UInt64, diff --git 
a/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality.sql b/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality.sql index 21bbbf1e6e9..9ef5662c112 100644 --- a/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality.sql +++ b/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices=1; drop table if exists nullable_set_index; create table nullable_set_index (a UInt64, b Nullable(String), INDEX b_index b TYPE set(0) GRANULARITY 8192) engine = MergeTree order by a; diff --git a/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality_bug.sql b/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality_bug.sql index 6af826163a3..75e0e482566 100644 --- a/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality_bug.sql +++ b/dbms/tests/queries/0_stateless/00907_set_index_with_nullable_and_low_cardinality_bug.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices=1; drop table if exists null_lc_set_index; diff --git a/dbms/tests/queries/0_stateless/00908_bloom_filter_index.sh b/dbms/tests/queries/0_stateless/00908_bloom_filter_index.sh index bc3ae763c41..3fa0a483610 100755 --- a/dbms/tests/queries/0_stateless/00908_bloom_filter_index.sh +++ b/dbms/tests/queries/0_stateless/00908_bloom_filter_index.sh @@ -10,7 +10,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS bloom_filter_idx2;" # NGRAM BF $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE bloom_filter_idx ( k UInt64, @@ -21,7 +20,6 @@ ORDER BY k SETTINGS index_granularity = 2;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE bloom_filter_idx2 ( k UInt64, @@ -105,7 +103,6 @@ $CLICKHOUSE_CLIENT --query="SELECT * FROM bloom_filter_idx WHERE (s, lower(s)) I # TOKEN BF $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE bloom_filter_idx3 ( k UInt64, @@ -144,7 +141,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE bloom_filter_idx3" $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS bloom_filter_idx_na;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE bloom_filter_idx_na ( na Array(Array(String)), diff --git a/dbms/tests/queries/0_stateless/00909_kill_not_initialized_query.sh b/dbms/tests/queries/0_stateless/00909_kill_not_initialized_query.sh index 677709dd2c0..b5982fd0dc8 100755 --- a/dbms/tests/queries/0_stateless/00909_kill_not_initialized_query.sh +++ b/dbms/tests/queries/0_stateless/00909_kill_not_initialized_query.sh @@ -34,7 +34,7 @@ $CLICKHOUSE_CLIENT -q "KILL QUERY WHERE query='$query_to_kill' ASYNC" &>/dev/nul sleep 1 # Kill $query_for_pending SYNC. This query is not blocker, so it should be killed fast. 
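Background for the kill-query timeout bump that follows: KILL QUERY selects its victims with a WHERE clause evaluated against system.processes. The default ASYNC form returns as soon as the matching queries are flagged for cancellation, while SYNC blocks until each one has actually stopped, and a running query only notices the flag at its next cancellation checkpoint, so the SYNC wait can legitimately take a while. A short sketch of both forms; the query text in the WHERE clause is illustrative:

    KILL QUERY WHERE query LIKE 'SELECT sleep(1)%' ASYNC;  -- flag matching queries, return immediately
    KILL QUERY WHERE query LIKE 'SELECT sleep(1)%' SYNC;   -- block until they have actually stopped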
-timeout 10 $CLICKHOUSE_CLIENT -q "KILL QUERY WHERE query='$query_for_pending' SYNC" &>/dev/null +timeout 20 $CLICKHOUSE_CLIENT -q "KILL QUERY WHERE query='$query_for_pending' SYNC" &>/dev/null # Both queries have to be killed, doesn't matter with SYNC or ASYNC kill for run in {1..15} diff --git a/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.reference b/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.reference index f2965419d0a..f60d942d406 100644 --- a/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.reference +++ b/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.reference @@ -39,6 +39,6 @@ SimpleAggregateFunction(sum, Double) 7 14 8 16 9 18 -1 1 2 2.2.2.2 -10 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 20 20.20.20.20 -SimpleAggregateFunction(anyLast, Nullable(String)) SimpleAggregateFunction(anyLast, LowCardinality(Nullable(String))) SimpleAggregateFunction(anyLast, IPv4) +1 1 2 2.2.2.2 3 +10 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 20 20.20.20.20 5 +SimpleAggregateFunction(anyLast, Nullable(String)) SimpleAggregateFunction(anyLast, LowCardinality(Nullable(String))) SimpleAggregateFunction(anyLast, IPv4) SimpleAggregateFunction(groupBitOr, UInt32) diff --git a/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.sql b/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.sql index c010aad7e5a..037032a84cc 100644 --- a/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.sql +++ b/dbms/tests/queries/0_stateless/00915_simple_aggregate_function.sql @@ -19,14 +19,20 @@ select * from simple; -- complex types drop table if exists simple; -create table simple (id UInt64,nullable_str SimpleAggregateFunction(anyLast,Nullable(String)),low_str SimpleAggregateFunction(anyLast,LowCardinality(Nullable(String))),ip SimpleAggregateFunction(anyLast,IPv4)) engine=AggregatingMergeTree order by id; -insert into simple values(1,'1','1','1.1.1.1'); -insert into simple values(1,null,'2','2.2.2.2'); +create table simple ( + id UInt64, + nullable_str SimpleAggregateFunction(anyLast,Nullable(String)), + low_str SimpleAggregateFunction(anyLast,LowCardinality(Nullable(String))), + ip SimpleAggregateFunction(anyLast,IPv4), + status SimpleAggregateFunction(groupBitOr, UInt32) +) engine=AggregatingMergeTree order by id; +insert into simple values(1,'1','1','1.1.1.1', 1); +insert into simple values(1,null,'2','2.2.2.2', 2); -- String longer then MAX_SMALL_STRING_SIZE (actual string length is 100) -insert into simple values(10,'10','10','10.10.10.10'); -insert into simple values(10,'2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222','20','20.20.20.20'); +insert into simple values(10,'10','10','10.10.10.10', 4); +insert into simple values(10,'2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222','20','20.20.20.20', 1); select * from simple final; -select toTypeName(nullable_str),toTypeName(low_str),toTypeName(ip) from simple limit 1; +select toTypeName(nullable_str),toTypeName(low_str),toTypeName(ip),toTypeName(status) from simple limit 1; drop table simple; diff --git a/dbms/tests/queries/0_stateless/00942_mutate_index.sh b/dbms/tests/queries/0_stateless/00942_mutate_index.sh index 467eb9ab671..30ac7e8821b 100755 --- a/dbms/tests/queries/0_stateless/00942_mutate_index.sh +++ 
b/dbms/tests/queries/0_stateless/00942_mutate_index.sh @@ -8,7 +8,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS test.minmax_idx;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices=1; CREATE TABLE test.minmax_idx ( u64 UInt64, diff --git a/dbms/tests/queries/0_stateless/00943_materialize_index.sh b/dbms/tests/queries/0_stateless/00943_materialize_index.sh index bc59b41b005..b406f3894eb 100755 --- a/dbms/tests/queries/0_stateless/00943_materialize_index.sh +++ b/dbms/tests/queries/0_stateless/00943_materialize_index.sh @@ -35,7 +35,6 @@ $CLICKHOUSE_CLIENT --query="SELECT count() FROM test.minmax_idx WHERE i64 = 2;" $CLICKHOUSE_CLIENT --query="SELECT count() FROM test.minmax_idx WHERE i64 = 2 FORMAT JSON" | grep "rows_read" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices=1; ALTER TABLE test.minmax_idx ADD INDEX idx (i64, u64 * i64) TYPE minmax GRANULARITY 1;" $CLICKHOUSE_CLIENT --query="ALTER TABLE test.minmax_idx MATERIALIZE INDEX idx IN PARTITION 1;" diff --git a/dbms/tests/queries/0_stateless/00944_clear_index_in_partition.sh b/dbms/tests/queries/0_stateless/00944_clear_index_in_partition.sh index 74f15e63545..5cdf4c4bbfd 100755 --- a/dbms/tests/queries/0_stateless/00944_clear_index_in_partition.sh +++ b/dbms/tests/queries/0_stateless/00944_clear_index_in_partition.sh @@ -8,7 +8,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS test.minmax_idx;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices=1; CREATE TABLE test.minmax_idx ( u64 UInt64, diff --git a/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh b/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh index 52246b50b7a..c0fee503f08 100755 --- a/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh +++ b/dbms/tests/queries/0_stateless/00944_create_bloom_filter_index_with_merge_tree.sh @@ -8,5 +8,5 @@ set -e for sequence in 1 10 100 1000 10000 100000 1000000 10000000 100000000 1000000000; do \ rate=`echo "1 $sequence" | awk '{printf("%0.9f\n",$1/$2)}'` $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS test.bloom_filter_idx"; -$CLICKHOUSE_CLIENT --allow_experimental_data_skipping_indices=1 --query="CREATE TABLE test.bloom_filter_idx ( u64 UInt64, i32 Int32, f64 Float64, d Decimal(10, 2), s String, e Enum8('a' = 1, 'b' = 2, 'c' = 3), dt Date, INDEX bloom_filter_a i32 TYPE bloom_filter($rate) GRANULARITY 1 ) ENGINE = MergeTree() ORDER BY u64 SETTINGS index_granularity = 8192" +$CLICKHOUSE_CLIENT --query="CREATE TABLE test.bloom_filter_idx ( u64 UInt64, i32 Int32, f64 Float64, d Decimal(10, 2), s String, e Enum8('a' = 1, 'b' = 2, 'c' = 3), dt Date, INDEX bloom_filter_a i32 TYPE bloom_filter($rate) GRANULARITY 1 ) ENGINE = MergeTree() ORDER BY u64 SETTINGS index_granularity = 8192" done diff --git a/dbms/tests/queries/0_stateless/00944_minmax_null.sql b/dbms/tests/queries/0_stateless/00944_minmax_null.sql index ad3cf5f5c61..01b86775481 100644 --- a/dbms/tests/queries/0_stateless/00944_minmax_null.sql +++ b/dbms/tests/queries/0_stateless/00944_minmax_null.sql @@ -1,5 +1,4 @@ DROP TABLE IF EXISTS min_max_with_nullable_string; -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE min_max_with_nullable_string ( t DateTime, diff --git a/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference index 7e9362b5d33..a00ae5f2d5b 100755 --- 
a/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference +++ b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.reference @@ -175,3 +175,6 @@ 1 1 1 +5000 +5000 +5000 diff --git a/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql index 268574a609f..b306f2ed7ed 100755 --- a/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql +++ b/dbms/tests/queries/0_stateless/00945_bloom_filter_index.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices = 1; DROP TABLE IF EXISTS test.single_column_bloom_filter; @@ -246,3 +245,21 @@ SELECT COUNT() FROM test.bloom_filter_array_lc_null_types_test WHERE has(str, '1 SELECT COUNT() FROM test.bloom_filter_array_lc_null_types_test WHERE has(fixed_string, toFixedString('100', 5)); DROP TABLE IF EXISTS test.bloom_filter_array_lc_null_types_test; + +DROP TABLE IF EXISTS test.bloom_filter_array_offsets_lc_str; +CREATE TABLE test.bloom_filter_array_offsets_lc_str (order_key int, str Array(LowCardinality((String))), INDEX idx str TYPE bloom_filter(1.01) GRANULARITY 1024) ENGINE = MergeTree() ORDER BY order_key SETTINGS index_granularity = 1024; +INSERT INTO test.bloom_filter_array_offsets_lc_str SELECT number AS i, if(i%2, ['value'], []) FROM system.numbers LIMIT 10000; +SELECT count() FROM test.bloom_filter_array_offsets_lc_str WHERE has(str, 'value'); +DROP TABLE IF EXISTS test.bloom_filter_array_offsets_lc_str; + +DROP TABLE IF EXISTS test.bloom_filter_array_offsets_str; +CREATE TABLE test.bloom_filter_array_offsets_str (order_key int, str Array(String), INDEX idx str TYPE bloom_filter(1.01) GRANULARITY 1024) ENGINE = MergeTree() ORDER BY order_key SETTINGS index_granularity = 1024; +INSERT INTO test.bloom_filter_array_offsets_str SELECT number AS i, if(i%2, ['value'], []) FROM system.numbers LIMIT 10000; +SELECT count() FROM test.bloom_filter_array_offsets_str WHERE has(str, 'value'); +DROP TABLE IF EXISTS test.bloom_filter_array_offsets_str; + +DROP TABLE IF EXISTS test.bloom_filter_array_offsets_i; +CREATE TABLE test.bloom_filter_array_offsets_i (order_key int, i Array(int), INDEX idx i TYPE bloom_filter(1.01) GRANULARITY 1024) ENGINE = MergeTree() ORDER BY order_key SETTINGS index_granularity = 1024; +INSERT INTO test.bloom_filter_array_offsets_i SELECT number AS i, if(i%2, [99999], []) FROM system.numbers LIMIT 10000; +SELECT count() FROM test.bloom_filter_array_offsets_i WHERE has(i, 99999); +DROP TABLE IF EXISTS test.bloom_filter_array_offsets_i; diff --git a/dbms/tests/queries/0_stateless/00955_test_final_mark.sql b/dbms/tests/queries/0_stateless/00955_test_final_mark.sql index ff712b829cd..d58bdec7472 100644 --- a/dbms/tests/queries/0_stateless/00955_test_final_mark.sql +++ b/dbms/tests/queries/0_stateless/00955_test_final_mark.sql @@ -1,5 +1,4 @@ SET send_logs_level = 'none'; -SET allow_experimental_data_skipping_indices = 1; DROP TABLE IF EXISTS mt_with_pk; diff --git a/dbms/tests/queries/0_stateless/00964_bloom_index_string_functions.sh b/dbms/tests/queries/0_stateless/00964_bloom_index_string_functions.sh index 28120e782c1..ef35b3a7ff6 100755 --- a/dbms/tests/queries/0_stateless/00964_bloom_index_string_functions.sh +++ b/dbms/tests/queries/0_stateless/00964_bloom_index_string_functions.sh @@ -7,7 +7,6 @@ $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS bloom_filter_idx;" # NGRAM BF $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE bloom_filter_idx ( k UInt64, diff --git 
a/dbms/tests/queries/0_stateless/00965_set_index_string_functions.sh b/dbms/tests/queries/0_stateless/00965_set_index_string_functions.sh index 056915c0e7c..c064f680715 100755 --- a/dbms/tests/queries/0_stateless/00965_set_index_string_functions.sh +++ b/dbms/tests/queries/0_stateless/00965_set_index_string_functions.sh @@ -6,7 +6,6 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS set_idx;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices = 1; CREATE TABLE set_idx ( k UInt64, diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select.reference new file mode 100644 index 00000000000..ebf18a51290 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select.reference @@ -0,0 +1,18 @@ +1 1 +2 1 +3 1 +1 1 +2 1 +3 1 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select.sql new file mode 100644 index 00000000000..2e256615925 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT a FROM (SELECT a FROM test.mt); + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join.reference new file mode 100644 index 00000000000..7a596e87ed6 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join.reference @@ -0,0 +1,6 @@ +1 hello 2 +1 hello 2 +1 hello 3 +2 hello 3 +1 hello 3 +2 hello 3 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join.sql new file mode 100644 index 00000000000..f89455803f3 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join.sql @@ -0,0 +1,29 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.A; +DROP TABLE IF EXISTS test.B; + +CREATE TABLE test.A (id Int32) Engine=Memory; +CREATE TABLE test.B (id Int32, name String) Engine=Memory; + +CREATE LIVE VIEW test.lv AS SELECT id, name FROM ( SELECT A.id, B.name FROM test.A as A, test.B as B WHERE A.id = B.id ); + +SELECT * FROM test.lv; + +INSERT INTO test.A VALUES (1); +INSERT INTO test.B VALUES (1, 'hello'); + +SELECT *,_version FROM test.lv ORDER BY id; +SELECT *,_version FROM test.lv ORDER BY id; + +INSERT INTO test.A VALUES (2) +INSERT INTO test.B VALUES (2, 'hello') + +SELECT *,_version FROM test.lv ORDER BY id; +SELECT *,_version FROM test.lv ORDER BY id; + +DROP TABLE test.lv; +DROP TABLE test.A; +DROP TABLE test.B; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join_no_alias.reference 
b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join_no_alias.reference new file mode 100644 index 00000000000..7a596e87ed6 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join_no_alias.reference @@ -0,0 +1,6 @@ +1 hello 2 +1 hello 2 +1 hello 3 +2 hello 3 +1 hello 3 +2 hello 3 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join_no_alias.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join_no_alias.sql new file mode 100644 index 00000000000..b8eea8c71e5 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_join_no_alias.sql @@ -0,0 +1,29 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.A; +DROP TABLE IF EXISTS test.B; + +CREATE TABLE test.A (id Int32) Engine=Memory; +CREATE TABLE test.B (id Int32, name String) Engine=Memory; + +CREATE LIVE VIEW test.lv AS SELECT id, name FROM ( SELECT test.A.id, test.B.name FROM test.A, test.B WHERE test.A.id = test.B.id); + +SELECT * FROM test.lv; + +INSERT INTO test.A VALUES (1); +INSERT INTO test.B VALUES (1, 'hello'); + +SELECT *,_version FROM test.lv ORDER BY id; +SELECT *,_version FROM test.lv ORDER BY id; + +INSERT INTO test.A VALUES (2) +INSERT INTO test.B VALUES (2, 'hello') + +SELECT *,_version FROM test.lv ORDER BY id; +SELECT *,_version FROM test.lv ORDER BY id; + +DROP TABLE test.lv; +DROP TABLE test.A; +DROP TABLE test.B; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested.reference new file mode 100644 index 00000000000..ebf18a51290 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested.reference @@ -0,0 +1,18 @@ +1 1 +2 1 +3 1 +1 1 +2 1 +3 1 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested.sql new file mode 100644 index 00000000000..f2decda148b --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT a FROM ( SELECT * FROM ( SELECT a FROM (SELECT a FROM test.mt) ) ); + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation.reference new file mode 100644 index 00000000000..75236c0daf7 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation.reference @@ -0,0 +1,4 @@ +6 1 +6 1 +12 2 +12 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation.sql new file mode 100644 index 00000000000..4dc7b02fc51 --- /dev/null +++ 
b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT * FROM ( SELECT sum(a) FROM ( SELECT a FROM (SELECT a FROM test.mt) ) ); + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation_table_alias.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation_table_alias.reference new file mode 100644 index 00000000000..75236c0daf7 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation_table_alias.reference @@ -0,0 +1,4 @@ +6 1 +6 1 +12 2 +12 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation_table_alias.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation_table_alias.sql new file mode 100644 index 00000000000..2e10eefda49 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_nested_with_aggregation_table_alias.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT * FROM ( SELECT sum(boo.x) FROM ( SELECT foo.x FROM (SELECT a AS x FROM test.mt) AS foo) AS boo ); + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_table_alias.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_table_alias.reference new file mode 100644 index 00000000000..ebf18a51290 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_table_alias.reference @@ -0,0 +1,18 @@ +1 1 +2 1 +3 1 +1 1 +2 1 +3 1 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 +1 2 +2 2 +3 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_table_alias.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_table_alias.sql new file mode 100644 index 00000000000..d5da0854899 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_table_alias.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT foo.x FROM (SELECT a AS x FROM test.mt) AS foo; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git 
a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation.reference new file mode 100644 index 00000000000..75236c0daf7 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation.reference @@ -0,0 +1,4 @@ +6 1 +6 1 +12 2 +12 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation.sql new file mode 100644 index 00000000000..bc15e8a7356 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT sum(a) FROM (SELECT a FROM test.mt); + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation_in_subquery.reference b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation_in_subquery.reference new file mode 100644 index 00000000000..75236c0daf7 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation_in_subquery.reference @@ -0,0 +1,4 @@ +6 1 +6 1 +12 2 +12 2 diff --git a/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation_in_subquery.sql b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation_in_subquery.sql new file mode 100644 index 00000000000..4dd7a12b190 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00973_live_view_with_subquery_select_with_aggregation_in_subquery.sql @@ -0,0 +1,21 @@ +SET allow_experimental_live_view = 1; + +DROP TABLE IF EXISTS test.lv; +DROP TABLE IF EXISTS test.mt; + +CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); +CREATE LIVE VIEW test.lv AS SELECT * FROM (SELECT sum(a) FROM test.mt); + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +INSERT INTO test.mt VALUES (1),(2),(3); + +SELECT *,_version FROM test.lv; +SELECT *,_version FROM test.lv; + +DROP TABLE test.lv; +DROP TABLE test.mt; + diff --git a/dbms/tests/queries/0_stateless/00974_adaptive_granularity_secondary_index.sql b/dbms/tests/queries/0_stateless/00974_adaptive_granularity_secondary_index.sql index 328ec86f060..567bf3cf58d 100644 --- a/dbms/tests/queries/0_stateless/00974_adaptive_granularity_secondary_index.sql +++ b/dbms/tests/queries/0_stateless/00974_adaptive_granularity_secondary_index.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices = 1; DROP TABLE IF EXISTS indexed_table; diff --git a/dbms/tests/queries/0_stateless/00975_indices_mutation_replicated_zookeeper.sh b/dbms/tests/queries/0_stateless/00975_indices_mutation_replicated_zookeeper.sh index 765dfb6abe5..c6e16fc5148 100755 --- a/dbms/tests/queries/0_stateless/00975_indices_mutation_replicated_zookeeper.sh +++ b/dbms/tests/queries/0_stateless/00975_indices_mutation_replicated_zookeeper.sh @@ -9,7 +9,6 @@ $CLICKHOUSE_CLIENT --query="DROP 
TABLE IF EXISTS test.indices_mutaions2;" $CLICKHOUSE_CLIENT -n --query=" -SET allow_experimental_data_skipping_indices=1; CREATE TABLE test.indices_mutaions1 ( u64 UInt64, diff --git a/dbms/tests/queries/0_stateless/00979_live_view_watch_live_moving_avg.py b/dbms/tests/queries/0_stateless/00979_live_view_watch_live_moving_avg.py new file mode 100755 index 00000000000..30d5e6d67b3 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00979_live_view_watch_live_moving_avg.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +import os +import sys +import signal + +CURDIR = os.path.dirname(os.path.realpath(__file__)) +sys.path.insert(0, os.path.join(CURDIR, 'helpers')) + +from client import client, prompt, end_of_block + +log = None +# uncomment the line below for debugging +#log=sys.stdout + +with client(name='client1>', log=log) as client1, client(name='client2>', log=log) as client2: + client1.expect(prompt) + client2.expect(prompt) + + client1.send('SET allow_experimental_live_view = 1') + client1.expect(prompt) + client2.send('SET allow_experimental_live_view = 1') + client2.expect(prompt) + + client1.send('DROP TABLE IF EXISTS test.lv') + client1.expect(prompt) + client1.send(' DROP TABLE IF EXISTS test.mt') + client1.expect(prompt) + client1.send('CREATE TABLE test.mt (a Int32, id Int32) Engine=Memory') + client1.expect(prompt) + client1.send('CREATE LIVE VIEW test.lv AS SELECT sum(a)/2 FROM (SELECT a, id FROM ( SELECT a, id FROM test.mt ORDER BY id DESC LIMIT 2 ) ORDER BY id DESC LIMIT 2)') + client1.expect(prompt) + client1.send('WATCH test.lv') + client1.expect(r'0.*1' + end_of_block) + client2.send('INSERT INTO test.mt VALUES (1, 1),(2, 2),(3, 3)') + client1.expect(r'2\.5.*2' + end_of_block) + client2.expect(prompt) + client2.send('INSERT INTO test.mt VALUES (4, 4),(5, 5),(6, 6)') + client1.expect(r'5\.5.*3' + end_of_block) + client2.expect(prompt) + for v, i in enumerate(range(7,129)): + client2.send('INSERT INTO test.mt VALUES (%d, %d)' % (i, i)) + client1.expect(r'%.1f.*%d' % (i-0.5, 4+v) + end_of_block) + client2.expect(prompt) + # send Ctrl-C + client1.send('\x03', eol='') + match = client1.expect('(%s)|([#\$] )' % prompt) + if match.groups()[1]: + client1.send(client1.command) + client1.expect(prompt) + client1.send('DROP TABLE test.lv') + client1.expect(prompt) + client1.send('DROP TABLE test.mt') + client1.expect(prompt) diff --git a/dbms/tests/queries/0_stateless/00979_live_view_watch_live_moving_avg.reference b/dbms/tests/queries/0_stateless/00979_live_view_watch_live_moving_avg.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/dbms/tests/queries/0_stateless/00979_live_view_watch_live_with_subquery.py b/dbms/tests/queries/0_stateless/00979_live_view_watch_live_with_subquery.py new file mode 100755 index 00000000000..44c923d75d8 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00979_live_view_watch_live_with_subquery.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +import os +import sys +import signal + +CURDIR = os.path.dirname(os.path.realpath(__file__)) +sys.path.insert(0, os.path.join(CURDIR, 'helpers')) + +from client import client, prompt, end_of_block + +log = None +# uncomment the line below for debugging +#log=sys.stdout + +with client(name='client1>', log=log) as client1, client(name='client2>', log=log) as client2: + client1.expect(prompt) + client2.expect(prompt) + + client1.send('SET allow_experimental_live_view = 1') + client1.expect(prompt) + client2.send('SET allow_experimental_live_view = 1') + client2.expect(prompt) + + client1.send('DROP TABLE 
IF EXISTS test.lv') + client1.expect(prompt) + client1.send(' DROP TABLE IF EXISTS test.mt') + client1.expect(prompt) + client1.send('CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple()') + client1.expect(prompt) + client1.send('CREATE LIVE VIEW test.lv AS SELECT * FROM ( SELECT sum(A.a) FROM (SELECT * FROM test.mt) AS A )') + client1.expect(prompt) + client1.send('WATCH test.lv') + client1.expect(r'0.*1' + end_of_block) + client2.send('INSERT INTO test.mt VALUES (1),(2),(3)') + client1.expect(r'6.*2' + end_of_block) + client2.expect(prompt) + client2.send('INSERT INTO test.mt VALUES (4),(5),(6)') + client1.expect(r'21.*3' + end_of_block) + client2.expect(prompt) + for i in range(1,129): + client2.send('INSERT INTO test.mt VALUES (1)') + client1.expect(r'%d.*%d' % (21+i, 3+i) + end_of_block) + client2.expect(prompt) + # send Ctrl-C + client1.send('\x03', eol='') + match = client1.expect('(%s)|([#\$] )' % prompt) + if match.groups()[1]: + client1.send(client1.command) + client1.expect(prompt) + client1.send('DROP TABLE test.lv') + client1.expect(prompt) + client1.send('DROP TABLE test.mt') + client1.expect(prompt) diff --git a/dbms/tests/queries/0_stateless/00979_live_view_watch_live_with_subquery.reference b/dbms/tests/queries/0_stateless/00979_live_view_watch_live_with_subquery.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/dbms/tests/queries/0_stateless/00979_set_index_not.sql b/dbms/tests/queries/0_stateless/00979_set_index_not.sql index 6b34457e575..fd8f9ce2f73 100644 --- a/dbms/tests/queries/0_stateless/00979_set_index_not.sql +++ b/dbms/tests/queries/0_stateless/00979_set_index_not.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices = 1; DROP TABLE IF EXISTS test.set_index_not; diff --git a/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.reference b/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.reference index 7f9fcbb2e9c..49d86fc2fbf 100644 --- a/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.reference +++ b/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.reference @@ -1,3 +1,4 @@ temporary_live_view_timeout 5 live_view_heartbeat_interval 15 +lv 0 diff --git a/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.sql b/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.sql index 037c2a9e587..0c3bb7f815d 100644 --- a/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.sql +++ b/dbms/tests/queries/0_stateless/00980_create_temporary_live_view.sql @@ -10,8 +10,8 @@ SET temporary_live_view_timeout=1; CREATE TABLE test.mt (a Int32) Engine=MergeTree order by tuple(); CREATE TEMPORARY LIVE VIEW test.lv AS SELECT sum(a) FROM test.mt; -SHOW TABLES LIKE 'lv'; +SHOW TABLES FROM test LIKE 'lv'; SELECT sleep(2); -SHOW TABLES LIKE 'lv'; +SHOW TABLES FROM test LIKE 'lv'; DROP TABLE test.mt; diff --git a/dbms/tests/queries/0_stateless/00980_merge_alter_settings.sql b/dbms/tests/queries/0_stateless/00980_merge_alter_settings.sql index ed42a79ebbf..d650218e99a 100644 --- a/dbms/tests/queries/0_stateless/00980_merge_alter_settings.sql +++ b/dbms/tests/queries/0_stateless/00980_merge_alter_settings.sql @@ -5,7 +5,7 @@ CREATE TABLE log_for_alter ( Data String ) ENGINE = Log(); -ALTER TABLE log_for_alter MODIFY SETTING aaa=123; -- { serverError 471 } +ALTER TABLE log_for_alter MODIFY SETTING aaa=123; -- { serverError 48 } DROP TABLE IF EXISTS log_for_alter; diff --git a/dbms/tests/queries/0_stateless/00990_hasToken_and_tokenbf.sql 
b/dbms/tests/queries/0_stateless/00990_hasToken_and_tokenbf.sql index 60e4d959417..ad50420b6ae 100644 --- a/dbms/tests/queries/0_stateless/00990_hasToken_and_tokenbf.sql +++ b/dbms/tests/queries/0_stateless/00990_hasToken_and_tokenbf.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices = 1; DROP TABLE IF EXISTS bloom_filter; diff --git a/dbms/tests/queries/0_stateless/00997_set_index_array.sql b/dbms/tests/queries/0_stateless/00997_set_index_array.sql index c57507ce22d..1692bbb2055 100644 --- a/dbms/tests/queries/0_stateless/00997_set_index_array.sql +++ b/dbms/tests/queries/0_stateless/00997_set_index_array.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices = 1; DROP TABLE IF EXISTS test.set_array; diff --git a/dbms/tests/queries/0_stateless/00999_test_skip_indices_with_alter_and_merge.sql b/dbms/tests/queries/0_stateless/00999_test_skip_indices_with_alter_and_merge.sql index 55b2f21dc32..596e0d9cbcb 100644 --- a/dbms/tests/queries/0_stateless/00999_test_skip_indices_with_alter_and_merge.sql +++ b/dbms/tests/queries/0_stateless/00999_test_skip_indices_with_alter_and_merge.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices=1; DROP TABLE IF EXISTS test_vertical_merge; CREATE TABLE test_vertical_merge ( diff --git a/dbms/tests/queries/0_stateless/01000_bad_size_of_marks_skip_idx.sql b/dbms/tests/queries/0_stateless/01000_bad_size_of_marks_skip_idx.sql index 7af19fec695..10464ca5eaf 100644 --- a/dbms/tests/queries/0_stateless/01000_bad_size_of_marks_skip_idx.sql +++ b/dbms/tests/queries/0_stateless/01000_bad_size_of_marks_skip_idx.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices=1; DROP TABLE IF EXISTS bad_skip_idx; diff --git a/dbms/tests/queries/0_stateless/01011_test_create_as_skip_indices.sql b/dbms/tests/queries/0_stateless/01011_test_create_as_skip_indices.sql index b702fc3654c..9ac9e2d0a70 100644 --- a/dbms/tests/queries/0_stateless/01011_test_create_as_skip_indices.sql +++ b/dbms/tests/queries/0_stateless/01011_test_create_as_skip_indices.sql @@ -1,4 +1,3 @@ -SET allow_experimental_data_skipping_indices=1; CREATE TABLE foo (key int, INDEX i1 key TYPE minmax GRANULARITY 1) Engine=MergeTree() ORDER BY key; CREATE TABLE as_foo AS foo; CREATE TABLE dist (key int, INDEX i1 key TYPE minmax GRANULARITY 1) Engine=Distributed(test_shard_localhost, currentDatabase(), 'foo'); -- { serverError 36 } diff --git a/dbms/tests/queries/0_stateless/01012_reset_running_accumulate.reference b/dbms/tests/queries/0_stateless/01012_reset_running_accumulate.reference new file mode 100644 index 00000000000..98d21902f5c --- /dev/null +++ b/dbms/tests/queries/0_stateless/01012_reset_running_accumulate.reference @@ -0,0 +1,30 @@ +0 0 0 +0 6 6 +0 12 18 +0 18 36 +0 24 60 +1 1 1 +1 7 8 +1 13 21 +1 19 40 +1 25 65 +2 2 2 +2 8 10 +2 14 24 +2 20 44 +2 26 70 +3 3 3 +3 9 12 +3 15 27 +3 21 48 +3 27 75 +4 4 4 +4 10 14 +4 16 30 +4 22 52 +4 28 80 +5 5 5 +5 11 16 +5 17 33 +5 23 56 +5 29 85 diff --git a/dbms/tests/queries/0_stateless/01012_reset_running_accumulate.sql b/dbms/tests/queries/0_stateless/01012_reset_running_accumulate.sql new file mode 100644 index 00000000000..b9336b2f50c --- /dev/null +++ b/dbms/tests/queries/0_stateless/01012_reset_running_accumulate.sql @@ -0,0 +1,11 @@ +SELECT grouping, + item, + runningAccumulate(state, grouping) +FROM ( + SELECT number % 6 AS grouping, + number AS item, + sumState(number) AS state + FROM (SELECT number FROM system.numbers LIMIT 30) + GROUP BY grouping, item + ORDER BY grouping, item +); \ No newline at end of file 
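A note on the `01012_reset_running_accumulate` test just above: it exercises the two-argument form of `runningAccumulate`, where the second argument is a grouping key and the accumulator restarts whenever that key changes between consecutive rows. A minimal sketch of the same pattern, assuming only the built-in `numbers` table function:

```sql
-- Running sum within each group of three consecutive numbers;
-- the accumulator resets when the grouping key `g` changes.
SELECT
    g,
    n,
    runningAccumulate(s, g) AS acc
FROM
(
    SELECT
        intDiv(number, 3) AS g,
        number AS n,
        sumState(number) AS s
    FROM numbers(9)
    GROUP BY g, n
    ORDER BY g, n
);
```

With the single-argument form, `runningAccumulate(s)` would keep accumulating across all nine rows instead of restarting at each group boundary, which is exactly the behaviour the reference output above distinguishes.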
diff --git a/dbms/tests/queries/0_stateless/01018_ddl_dictionaries_create.sql b/dbms/tests/queries/0_stateless/01018_ddl_dictionaries_create.sql index 45cc0e7eaf7..4e151a9f6e6 100644 --- a/dbms/tests/queries/0_stateless/01018_ddl_dictionaries_create.sql +++ b/dbms/tests/queries/0_stateless/01018_ddl_dictionaries_create.sql @@ -89,7 +89,7 @@ CREATE DICTIONARY memory_db.dict2 PRIMARY KEY key_column SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'table_for_dict' PASSWORD '' DB 'database_for_dict')) LIFETIME(MIN 1 MAX 10) -LAYOUT(FLAT()); -- {serverError 1} +LAYOUT(FLAT()); -- {serverError 48} SHOW CREATE DICTIONARY memory_db.dict2; -- {serverError 487} @@ -114,7 +114,7 @@ CREATE DICTIONARY lazy_db.dict3 PRIMARY KEY key_column, second_column SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'table_for_dict' PASSWORD '' DB 'database_for_dict')) LIFETIME(MIN 1 MAX 10) -LAYOUT(COMPLEX_KEY_HASHED()); -- {serverError 1} +LAYOUT(COMPLEX_KEY_HASHED()); -- {serverError 48} DROP DATABASE IF EXISTS lazy_db; diff --git a/dbms/tests/queries/0_stateless/01045_bloom_filter_null_array.reference b/dbms/tests/queries/0_stateless/01045_bloom_filter_null_array.reference new file mode 100644 index 00000000000..019468dc91b --- /dev/null +++ b/dbms/tests/queries/0_stateless/01045_bloom_filter_null_array.reference @@ -0,0 +1,5 @@ +7 +1 +2 +1 +0 diff --git a/dbms/tests/queries/0_stateless/01045_bloom_filter_null_array.sql b/dbms/tests/queries/0_stateless/01045_bloom_filter_null_array.sql new file mode 100644 index 00000000000..bf97e6be838 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01045_bloom_filter_null_array.sql @@ -0,0 +1,18 @@ +SET allow_experimental_data_skipping_indices = 1; + + +DROP TABLE IF EXISTS test.bloom_filter_null_array; + +CREATE TABLE test.bloom_filter_null_array (v Array(LowCardinality(Nullable(String))), INDEX idx v TYPE bloom_filter(0.1) GRANULARITY 1) ENGINE = MergeTree() ORDER BY v; + +INSERT INTO test.bloom_filter_null_array VALUES ([]); +INSERT INTO test.bloom_filter_null_array VALUES (['1', '2']) ([]) ([]); +INSERT INTO test.bloom_filter_null_array VALUES ([]) ([]) (['2', '3']); + +SELECT COUNT() FROM test.bloom_filter_null_array; +SELECT COUNT() FROM test.bloom_filter_null_array WHERE has(v, '1'); +SELECT COUNT() FROM test.bloom_filter_null_array WHERE has(v, '2'); +SELECT COUNT() FROM test.bloom_filter_null_array WHERE has(v, '3'); +SELECT COUNT() FROM test.bloom_filter_null_array WHERE has(v, '4'); + +DROP TABLE IF EXISTS test.bloom_filter_null_array; diff --git a/dbms/tests/queries/0_stateless/01051_all_join_engine.reference b/dbms/tests/queries/0_stateless/01051_all_join_engine.reference new file mode 100644 index 00000000000..fbb9eca348d --- /dev/null +++ b/dbms/tests/queries/0_stateless/01051_all_join_engine.reference @@ -0,0 +1,90 @@ +left +0 a1 +1 a2 +2 a3 b1 +2 a3 b2 +3 a4 +4 a5 b3 +4 a5 b4 +4 a5 b5 +inner +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +right +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +5 b6 +full +0 a1 +1 a2 +2 a3 b1 +2 a3 b2 +3 a4 +4 a5 b3 +4 a5 b4 +4 a5 b5 +5 b6 +inner (join_use_nulls mix) +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +right (join_use_nulls mix) +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +5 \N b6 +left (join_use_nulls) +0 a1 \N +1 a2 \N +2 a3 b1 +2 a3 b2 +3 a4 \N +4 a5 b3 +4 a5 b4 +4 a5 b5 +inner (join_use_nulls) +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +right (join_use_nulls) +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +5 \N b6 +full (join_use_nulls) +0 a1 \N +1 a2 \N +2 a3 b1 +2 a3 b2 +3 a4 \N +4 
a5 b3 +4 a5 b4 +4 a5 b5 +5 \N b6 +inner (join_use_nulls mix2) +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +right (join_use_nulls mix2) +2 a3 b1 +2 a3 b2 +4 a5 b3 +4 a5 b4 +4 a5 b5 +5 b6 diff --git a/dbms/tests/queries/0_stateless/01051_all_join_engine.sql b/dbms/tests/queries/0_stateless/01051_all_join_engine.sql new file mode 100644 index 00000000000..f894ea84962 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01051_all_join_engine.sql @@ -0,0 +1,90 @@ +DROP TABLE IF EXISTS t1; + +DROP TABLE IF EXISTS left_join; +DROP TABLE IF EXISTS inner_join; +DROP TABLE IF EXISTS right_join; +DROP TABLE IF EXISTS full_join; + +CREATE TABLE t1 (x UInt32, str String) engine = Memory; + +CREATE TABLE left_join (x UInt32, s String) engine = Join(ALL, LEFT, x); +CREATE TABLE inner_join (x UInt32, s String) engine = Join(ALL, INNER, x); +CREATE TABLE right_join (x UInt32, s String) engine = Join(ALL, RIGHT, x); +CREATE TABLE full_join (x UInt32, s String) engine = Join(ALL, FULL, x); + +INSERT INTO t1 (x, str) VALUES (0, 'a1'), (1, 'a2'), (2, 'a3'), (3, 'a4'), (4, 'a5'); + +INSERT INTO left_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); +INSERT INTO inner_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); +INSERT INTO right_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); +INSERT INTO full_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); + +SET join_use_nulls = 0; + +SELECT 'left'; +SELECT * FROM t1 LEFT JOIN left_join j USING(x) ORDER BY x, str, s; + +SELECT 'inner'; +SELECT * FROM t1 INNER JOIN inner_join j USING(x) ORDER BY x, str, s; + +SELECT 'right'; +SELECT * FROM t1 RIGHT JOIN right_join j USING(x) ORDER BY x, str, s; + +SELECT 'full'; +SELECT * FROM t1 FULL JOIN full_join j USING(x) ORDER BY x, str, s; + +SET join_use_nulls = 1; + +SELECT * FROM t1 LEFT JOIN left_join j USING(x) ORDER BY x, str, s; -- { serverError 264 } +SELECT * FROM t1 FULL JOIN full_join j USING(x) ORDER BY x, str, s; -- { serverError 264 } + +SELECT 'inner (join_use_nulls mix)'; +SELECT * FROM t1 INNER JOIN inner_join j USING(x) ORDER BY x, str, s; + +SELECT 'right (join_use_nulls mix)'; +SELECT * FROM t1 RIGHT JOIN right_join j USING(x) ORDER BY x, str, s; + +DROP TABLE left_join; +DROP TABLE inner_join; +DROP TABLE right_join; +DROP TABLE full_join; + +CREATE TABLE left_join (x UInt32, s String) engine = Join(ALL, LEFT, x) SETTINGS join_use_nulls = 1; +CREATE TABLE inner_join (x UInt32, s String) engine = Join(ALL, INNER, x) SETTINGS join_use_nulls = 1; +CREATE TABLE right_join (x UInt32, s String) engine = Join(ALL, RIGHT, x) SETTINGS join_use_nulls = 1; +CREATE TABLE full_join (x UInt32, s String) engine = Join(ALL, FULL, x) SETTINGS join_use_nulls = 1; + +INSERT INTO left_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); +INSERT INTO inner_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); +INSERT INTO right_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); +INSERT INTO full_join (x, s) VALUES (2, 'b1'), (2, 'b2'), (4, 'b3'), (4, 'b4'), (4, 'b5'), (5, 'b6'); + +SELECT 'left (join_use_nulls)'; +SELECT * FROM t1 LEFT JOIN left_join j USING(x) ORDER BY x, str, s; + +SELECT 'inner (join_use_nulls)'; +SELECT * FROM t1 INNER JOIN inner_join j USING(x) ORDER BY x, str, s; + +SELECT 'right (join_use_nulls)'; +SELECT * FROM t1 RIGHT JOIN right_join j USING(x) ORDER BY x, str, s; + 
+SELECT 'full (join_use_nulls)'; +SELECT * FROM t1 FULL JOIN full_join j USING(x) ORDER BY x, str, s; + +SET join_use_nulls = 0; + +SELECT * FROM t1 LEFT JOIN left_join j USING(x) ORDER BY x, str, s; -- { serverError 264 } +SELECT * FROM t1 FULL JOIN full_join j USING(x) ORDER BY x, str, s; -- { serverError 264 } + +SELECT 'inner (join_use_nulls mix2)'; +SELECT * FROM t1 INNER JOIN inner_join j USING(x) ORDER BY x, str, s; + +SELECT 'right (join_use_nulls mix2)'; +SELECT * FROM t1 RIGHT JOIN right_join j USING(x) ORDER BY x, str, s; + +DROP TABLE t1; + +DROP TABLE left_join; +DROP TABLE inner_join; +DROP TABLE right_join; +DROP TABLE full_join; diff --git a/dbms/tests/queries/0_stateless/01051_new_any_join_engine.reference b/dbms/tests/queries/0_stateless/01051_new_any_join_engine.reference index fe207c56ed1..635ae641a63 100644 --- a/dbms/tests/queries/0_stateless/01051_new_any_join_engine.reference +++ b/dbms/tests/queries/0_stateless/01051_new_any_join_engine.reference @@ -16,7 +16,6 @@ any right 5 b6 semi left 2 a3 b1 -2 a6 b1 4 a5 b3 semi right 2 a3 b1 diff --git a/dbms/tests/queries/0_stateless/01051_new_any_join_engine.sql b/dbms/tests/queries/0_stateless/01051_new_any_join_engine.sql index 5ca321135a3..8662d8532d4 100644 --- a/dbms/tests/queries/0_stateless/01051_new_any_join_engine.sql +++ b/dbms/tests/queries/0_stateless/01051_new_any_join_engine.sql @@ -45,9 +45,6 @@ SELECT * FROM t1 ANY INNER JOIN any_inner_join j USING(x) ORDER BY x, str, s; SELECT 'any right'; SELECT * FROM t1 ANY RIGHT JOIN any_right_join j USING(x) ORDER BY x, str, s; - -INSERT INTO t1 (x, str) VALUES (2, 'a6'); - SELECT 'semi left'; SELECT * FROM t1 SEMI LEFT JOIN semi_left_join j USING(x) ORDER BY x, str, s; diff --git a/dbms/tests/queries/0_stateless/01051_random_printable_ascii.reference b/dbms/tests/queries/0_stateless/01051_random_printable_ascii.reference new file mode 100644 index 00000000000..9ce84d2e150 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01051_random_printable_ascii.reference @@ -0,0 +1,2 @@ +String +1000 diff --git a/dbms/tests/queries/0_stateless/01051_random_printable_ascii.sql b/dbms/tests/queries/0_stateless/01051_random_printable_ascii.sql new file mode 100644 index 00000000000..8c259671b4f --- /dev/null +++ b/dbms/tests/queries/0_stateless/01051_random_printable_ascii.sql @@ -0,0 +1,2 @@ +SELECT toTypeName(randomPrintableASCII(1000)); +SELECT length(randomPrintableASCII(1000)); diff --git a/dbms/tests/queries/0_stateless/01052_array_reduce_exception.reference b/dbms/tests/queries/0_stateless/01052_array_reduce_exception.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/dbms/tests/queries/0_stateless/01052_array_reduce_exception.sql b/dbms/tests/queries/0_stateless/01052_array_reduce_exception.sql new file mode 100644 index 00000000000..71c030a055c --- /dev/null +++ b/dbms/tests/queries/0_stateless/01052_array_reduce_exception.sql @@ -0,0 +1 @@ +SELECT arrayReduce('aggThrow(0.0001)', range(number % 10)) FROM system.numbers; -- { serverError 503 } diff --git a/dbms/tests/queries/0_stateless/01053_drop_database_mat_view.reference b/dbms/tests/queries/0_stateless/01053_drop_database_mat_view.reference new file mode 100644 index 00000000000..bb70980e620 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01053_drop_database_mat_view.reference @@ -0,0 +1,4 @@ +my_table +.inner.my_materialized_view +my_materialized_view +my_table diff --git a/dbms/tests/queries/0_stateless/01053_drop_database_mat_view.sql 
b/dbms/tests/queries/0_stateless/01053_drop_database_mat_view.sql new file mode 100644 index 00000000000..d5763461b42 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01053_drop_database_mat_view.sql @@ -0,0 +1,10 @@ +DROP DATABASE IF EXISTS some_tests; +CREATE DATABASE some_tests; + +create table some_tests.my_table ENGINE = MergeTree(day, (day), 8192) as select today() as day, 'mystring' as str; +show tables from some_tests; +create materialized view some_tests.my_materialized_view ENGINE = MergeTree(day, (day), 8192) as select * from some_tests.my_table; +show tables from some_tests; +select * from some_tests.my_materialized_view; + +DROP DATABASE some_tests; diff --git a/dbms/tests/queries/0_stateless/01053_if_chain_check.reference b/dbms/tests/queries/0_stateless/01053_if_chain_check.reference new file mode 100644 index 00000000000..4211be303d5 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01053_if_chain_check.reference @@ -0,0 +1,2002 @@ +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +1 +2 +4 +5 +7 +8 +10 +11 +14 +17 +19 +20 +23 +25 +29 +31 +35 +38 +40 +41 +43 +47 +49 +50 +53 +55 +59 +61 +62 +67 +70 +71 +73 +77 +79 +82 +83 +85 +86 +89 +94 +95 +97 +98 +100 +101 +103 +106 +107 +109 +113 +115 +118 +119 +121 +122 +124 +125 +127 +131 +133 +134 +137 +139 +142 +145 +146 +149 +151 +155 +157 +158 +161 +163 +164 +166 +167 +172 +173 +175 +178 
+179 +181 +187 +188 +190 +191 +193 +194 +197 +199 +200 +202 +203 +205 +206 +209 +211 +212 +214 +215 +217 +218 +223 +226 +227 +229 +233 +235 +236 +239 +241 +244 +245 +248 +250 +251 +253 +254 +257 +262 +263 +265 +266 +268 +269 +271 +274 +275 +277 +278 +281 +283 +284 +287 +289 +292 +293 +295 +298 +301 +302 +305 +307 +310 +311 +313 +314 +316 +317 +319 +323 +326 +328 +329 +331 +332 +334 +335 +337 +341 +343 +344 +346 +347 +349 +350 +353 +355 +356 +358 +359 +361 +362 +365 +367 +371 +373 +376 +379 +382 +383 +385 +386 +388 +389 +391 +394 +395 +397 +398 +401 +404 +409 +410 +412 +413 +415 +419 +421 +422 +424 +425 +427 +428 +430 +431 +433 +434 +436 +437 +439 +443 +445 +446 +449 +451 +452 +454 +457 +458 +461 +463 +466 +467 +469 +470 +472 +473 +475 +478 +479 +482 +485 +487 +488 +490 +491 +493 +497 +499 +500 +502 +503 +505 +508 +509 +511 +514 +515 +517 +521 +523 +524 +526 +527 +529 +530 +535 +536 +538 +539 +541 +542 +545 +547 +548 +551 +553 +554 +556 +557 +562 +563 +565 +566 +568 +569 +571 +574 +575 +577 +581 +583 +584 +586 +587 +589 +590 +593 +595 +596 +599 +601 +602 +604 +605 +607 +610 +613 +614 +617 +619 +620 +622 +623 +625 +626 +628 +631 +632 +634 +635 +641 +643 +647 +649 +652 +653 +655 +658 +659 +661 +662 +664 +665 +667 +668 +670 +671 +673 +674 +677 +679 +683 +685 +686 +691 +692 +694 +695 +697 +698 +701 +706 +707 +709 +710 +712 +713 +716 +718 +719 +721 +722 +724 +725 +727 +730 +731 +733 +734 +737 +739 +742 +743 +745 +746 +749 +751 +755 +757 +758 +761 +763 +764 +766 +769 +772 +773 +775 +776 +778 +779 +781 +785 +787 +788 +790 +791 +794 +796 +797 +799 +802 +803 +805 +808 +809 +811 +815 +817 +818 +820 +821 +823 +824 +826 +827 +829 +830 +833 +835 +838 +839 +841 +842 +844 +847 +853 +854 +856 +857 +859 +860 +862 +863 +865 +866 +869 +872 +875 +877 +878 +881 +883 +886 +887 +889 +890 +892 +893 +895 +898 +899 +901 +904 +905 +907 +908 +911 +913 +914 +916 +917 +919 +922 +926 +929 +931 +932 +934 +935 +937 +938 +940 +941 +943 +947 +950 +953 +955 +956 +958 +959 +961 +964 +965 +967 +970 +971 +973 +974 +977 +979 +982 +983 +985 +989 +991 +994 +995 +997 +998 +1000 +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +\N +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan 
+nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +nan +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 +42 diff --git a/dbms/tests/queries/0_stateless/01053_if_chain_check.sql 
b/dbms/tests/queries/0_stateless/01053_if_chain_check.sql new file mode 100644 index 00000000000..3a98b85c473 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01053_if_chain_check.sql @@ -0,0 +1,3 @@ +SELECT x FROM (SELECT number % 16 = 0 ? nan : (number % 24 = 0 ? NULL : (number % 37 = 0 ? nan : (number % 34 = 0 ? nan : (number % 3 = 0 ? NULL : (number % 68 = 0 ? 42 : (number % 28 = 0 ? nan : (number % 46 = 0 ? nan : (number % 13 = 0 ? nan : (number % 27 = 0 ? NULL : (number % 39 = 0 ? NULL : (number % 27 = 0 ? NULL : (number % 30 = 0 ? NULL : (number % 72 = 0 ? NULL : (number % 36 = 0 ? NULL : (number % 51 = 0 ? NULL : (number % 58 = 0 ? nan : (number % 26 = 0 ? 42 : (number % 13 = 0 ? nan : (number % 12 = 0 ? NULL : (number % 22 = 0 ? nan : (number % 36 = 0 ? NULL : (number % 63 = 0 ? NULL : (number % 27 = 0 ? NULL : (number % 18 = 0 ? NULL : (number % 69 = 0 ? NULL : (number % 76 = 0 ? nan : (number % 42 = 0 ? NULL : (number % 9 = 0 ? NULL : (toFloat64(number)))))))))))))))))))))))))))))) AS x FROM system.numbers LIMIT 1001) ORDER BY x ASC NULLS FIRST; + +SELECT x FROM (SELECT number % 22 = 0 ? nan : (number % 56 = 0 ? 42 : (number % 45 = 0 ? NULL : (number % 47 = 0 ? 42 : (number % 39 = 0 ? NULL : (number % 1 = 0 ? nan : (number % 43 = 0 ? nan : (number % 40 = 0 ? nan : (number % 42 = 0 ? NULL : (number % 26 = 0 ? 42 : (number % 41 = 0 ? 42 : (number % 6 = 0 ? NULL : (number % 39 = 0 ? NULL : (number % 34 = 0 ? nan : (number % 74 = 0 ? 42 : (number % 40 = 0 ? nan : (number % 37 = 0 ? nan : (number % 51 = 0 ? NULL : (number % 46 = 0 ? nan : (toFloat64(number)))))))))))))))))))) AS x FROM system.numbers LIMIT 1001) ORDER BY x ASC NULLS FIRST; \ No newline at end of file diff --git a/dbms/tests/queries/0_stateless/01054_random_printable_ascii_ubsan.reference b/dbms/tests/queries/0_stateless/01054_random_printable_ascii_ubsan.reference new file mode 100644 index 00000000000..c6deeedd330 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01054_random_printable_ascii_ubsan.reference @@ -0,0 +1 @@ + Ok diff --git a/dbms/tests/queries/0_stateless/01054_random_printable_ascii_ubsan.sh b/dbms/tests/queries/0_stateless/01054_random_printable_ascii_ubsan.sh new file mode 100755 index 00000000000..b0189445d66 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01054_random_printable_ascii_ubsan.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +. $CURDIR/../shell_config.sh + +# Implementation specific behaviour on overflow. We may return error or produce empty string. 
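+# Either outcome is acceptable here: output goes to /dev/null and the exit status is discarded with "||:", so only a crash (e.g. a sanitizer trap) fails the test.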
+${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(nan);" >/dev/null 2>&1 ||: +${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(inf);" >/dev/null 2>&1 ||: +${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(-inf);" >/dev/null 2>&1 ||: +${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(1e300);" >/dev/null 2>&1 ||: +${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(-123.456);" >/dev/null 2>&1 ||: +${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(-1);" >/dev/null 2>&1 ||: + +${CLICKHOUSE_CLIENT} --query="SELECT randomPrintableASCII(0), 'Ok';" diff --git a/dbms/tests/queries/0_stateless/01055_prewhere_bugs.reference b/dbms/tests/queries/0_stateless/01055_prewhere_bugs.reference new file mode 100644 index 00000000000..cd0e6a397a1 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01055_prewhere_bugs.reference @@ -0,0 +1,2 @@ +43 + 1 diff --git a/dbms/tests/queries/0_stateless/01055_prewhere_bugs.sql b/dbms/tests/queries/0_stateless/01055_prewhere_bugs.sql new file mode 100644 index 00000000000..d9a0256ce52 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01055_prewhere_bugs.sql @@ -0,0 +1,17 @@ +DROP TABLE IF EXISTS test_prewhere_default_column; +DROP TABLE IF EXISTS test_prewhere_column_type; + +CREATE TABLE test_prewhere_default_column (APIKey UInt8, SessionType UInt8) ENGINE = MergeTree() PARTITION BY APIKey ORDER BY tuple(); +INSERT INTO test_prewhere_default_column VALUES( 42, 42 ); +ALTER TABLE test_prewhere_default_column ADD COLUMN OperatingSystem UInt64 DEFAULT SessionType+1; + +SELECT OperatingSystem FROM test_prewhere_default_column PREWHERE SessionType = 42; + + +CREATE TABLE test_prewhere_column_type (`a` LowCardinality(String), `x` Nullable(Int32)) ENGINE = MergeTree ORDER BY tuple(); +INSERT INTO test_prewhere_column_type VALUES ('', 2); + +SELECT a, y FROM test_prewhere_column_type prewhere (x = 2) AS y; + +DROP TABLE test_prewhere_default_column; +DROP TABLE test_prewhere_column_type; diff --git a/dbms/tests/queries/0_stateless/01056_create_table_as.reference b/dbms/tests/queries/0_stateless/01056_create_table_as.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/dbms/tests/queries/0_stateless/01056_create_table_as.sql b/dbms/tests/queries/0_stateless/01056_create_table_as.sql new file mode 100644 index 00000000000..868e1f082dd --- /dev/null +++ b/dbms/tests/queries/0_stateless/01056_create_table_as.sql @@ -0,0 +1,32 @@ +DROP TABLE IF EXISTS t1; +CREATE TABLE t1 (key Int) Engine=Memory(); +CREATE TABLE t2 AS t1; +DROP TABLE t2; + +-- live view +SET allow_experimental_live_view=1; +CREATE LIVE VIEW lv AS SELECT * FROM t1; +CREATE TABLE t3 AS lv; -- { serverError 80; } +DROP TABLE lv; + +-- view +CREATE VIEW v AS SELECT * FROM t1; +CREATE TABLE t3 AS v; -- { serverError 80; } +DROP TABLE v; + +-- dictionary +DROP DATABASE if exists test_01056_dict_data; +CREATE DATABASE test_01056_dict_data; +CREATE TABLE test_01056_dict_data.dict_data (key Int, value UInt16) Engine=Memory(); +CREATE DICTIONARY dict +( + `key` UInt64, + `value` UInt16 +) +PRIMARY KEY key +SOURCE(CLICKHOUSE( + HOST '127.0.0.1' PORT 9000 + TABLE 'dict_data' DB 'test_01056_dict_data' USER 'default' PASSWORD '')) +LIFETIME(MIN 0 MAX 0) +LAYOUT(SPARSE_HASHED()); +CREATE TABLE t3 AS dict; -- { serverError 80; } diff --git a/dbms/tests/queries/0_stateless/01056_prepared_statements_null_and_escaping.reference b/dbms/tests/queries/0_stateless/01056_prepared_statements_null_and_escaping.reference new file mode 100644 index 00000000000..b1ec6c8a4ed 
--- /dev/null +++ b/dbms/tests/queries/0_stateless/01056_prepared_statements_null_and_escaping.reference @@ -0,0 +1,8 @@ +Hello, World +Hello,\tWorld +Hello,\nWorld +Hello,\tWorld +Hello,\nWorld +\N +457 +457 diff --git a/dbms/tests/queries/0_stateless/01056_prepared_statements_null_and_escaping.sh b/dbms/tests/queries/0_stateless/01056_prepared_statements_null_and_escaping.sh new file mode 100755 index 00000000000..4840a8f672d --- /dev/null +++ b/dbms/tests/queries/0_stateless/01056_prepared_statements_null_and_escaping.sh @@ -0,0 +1,28 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +. $CURDIR/../shell_config.sh + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%20World" \ + -d "SELECT {x:Nullable(String)}"; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%5CtWorld" \ + -d "SELECT {x:Nullable(String)}"; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%5CnWorld" \ + -d "SELECT {x:Nullable(String)}"; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%5C%09World" \ + -d "SELECT {x:Nullable(String)}"; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%5C%0AWorld" \ + -d "SELECT {x:Nullable(String)}"; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=%5CN" \ + -d "SELECT {x:Nullable(String)}"; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%09World" \ + -d "SELECT {x:Nullable(String)}" 2>&1 | grep -oF '457'; + +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&param_x=Hello,%0AWorld" \ + -d "SELECT {x:Nullable(String)}" 2>&1 | grep -oF '457'; diff --git a/dbms/tests/queries/0_stateless/01057_http_compression_prefer_brotli.reference b/dbms/tests/queries/0_stateless/01057_http_compression_prefer_brotli.reference new file mode 100644 index 00000000000..5dd396a38c9 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01057_http_compression_prefer_brotli.reference @@ -0,0 +1,11 @@ +1 +1 +1 +1 +1 +999997 +999998 +999999 +999997 +999998 +999999 diff --git a/dbms/tests/queries/0_stateless/01057_http_compression_prefer_brotli.sh b/dbms/tests/queries/0_stateless/01057_http_compression_prefer_brotli.sh new file mode 100755 index 00000000000..419f774e502 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01057_http_compression_prefer_brotli.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +.
$CURDIR/../shell_config.sh + +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: br' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT 1' | brotli -d +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: br,gzip' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT 1' | brotli -d +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: gzip,br' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT 1' | brotli -d +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: gzip,deflate,br' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT 1' | brotli -d +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: gzip,deflate' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT 1' | gzip -d +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: gzip' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT number FROM numbers(1000000)' | gzip -d | tail -n3 +${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: br' "${CLICKHOUSE_URL}&enable_http_compression=1" -d 'SELECT number FROM numbers(1000000)' | brotli -d | tail -n3 diff --git a/dbms/tests/queries/0_stateless/01058_zlib_ng_level1_bug.reference b/dbms/tests/queries/0_stateless/01058_zlib_ng_level1_bug.reference new file mode 100644 index 00000000000..1036eecb9b0 --- /dev/null +++ b/dbms/tests/queries/0_stateless/01058_zlib_ng_level1_bug.reference @@ -0,0 +1,101 @@ +34529 +34530 +34531 +34532 +34533 +34534 +34535 +34536 +34537 +34538 +34539 +34540 +34541 +34542 +34543 +34544 +34545 +34546 +34547 +34548 +34549 +34550 +34551 +34552 +34553 +34554 +34555 +34556 +34557 +34558 +34559 +34560 +34561 +34562 +34563 +34564 +34565 +34566 +34567 +34568 +34569 +34570 +34571 +34572 +34573 +34574 +34575 +34576 +34577 +34578 +34579 +34580 +34581 +34582 +34583 +34584 +34585 +34586 +34587 +34588 +34589 +34590 +34591 +34592 +34593 +34594 +34595 +34596 +34597 +34598 +34599 +34600 +34601 +34602 +34603 +34604 +34605 +34606 +34607 +34608 +34609 +34610 +34611 +34612 +34613 +34614 +34615 +34616 +34617 +34618 +34619 +34620 +34621 +34622 +34623 +34624 +34625 +34626 +34627 +34628 +34629 diff --git a/dbms/tests/queries/0_stateless/01058_zlib_ng_level1_bug.sh b/dbms/tests/queries/0_stateless/01058_zlib_ng_level1_bug.sh new file mode 100755 index 00000000000..f554aec4fca --- /dev/null +++ b/dbms/tests/queries/0_stateless/01058_zlib_ng_level1_bug.sh @@ -0,0 +1,6 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +. 
$CURDIR/../shell_config.sh + +for i in $(seq 34530 1 34630); do ${CLICKHOUSE_CURL} -sS -H 'Accept-Encoding: gzip' "${CLICKHOUSE_URL}&enable_http_compression=1&http_zlib_compression_level=1" -d "SELECT * FROM numbers($i)" | gzip -d | tail -n1; done diff --git a/dbms/tests/queries/0_stateless/01059_storage_file_brotli.reference b/dbms/tests/queries/0_stateless/01059_storage_file_brotli.reference new file mode 100644 index 00000000000..6c545e9faec --- /dev/null +++ b/dbms/tests/queries/0_stateless/01059_storage_file_brotli.reference @@ -0,0 +1,5 @@ +1000000 999999 +1000000 999999 +2000000 999999 +1 255 +1 255 diff --git a/dbms/tests/queries/0_stateless/01059_storage_file_brotli.sql b/dbms/tests/queries/0_stateless/01059_storage_file_brotli.sql new file mode 100644 index 00000000000..e7d5a87b2af --- /dev/null +++ b/dbms/tests/queries/0_stateless/01059_storage_file_brotli.sql @@ -0,0 +1,22 @@ +DROP TABLE IF EXISTS file; +CREATE TABLE file (x UInt64) ENGINE = File(TSV, 'data1.tsv.br'); +TRUNCATE TABLE file; + +INSERT INTO file SELECT * FROM numbers(1000000); +SELECT count(), max(x) FROM file; + +DROP TABLE file; + +CREATE TABLE file (x UInt64) ENGINE = File(TSV, 'data2.tsv.gz'); +TRUNCATE TABLE file; + +INSERT INTO file SELECT * FROM numbers(1000000); +SELECT count(), max(x) FROM file; + +DROP TABLE file; + +SELECT count(), max(x) FROM file('data{1,2}.tsv.{gz,br}', TSV, 'x UInt64'); + +-- check that they are compressed +SELECT count() < 1000000, max(x) FROM file('data1.tsv.br', RowBinary, 'x UInt8', 'none'); +SELECT count() < 3000000, max(x) FROM file('data2.tsv.gz', RowBinary, 'x UInt8', 'none'); diff --git a/dbms/tests/queries/1_stateful/00091_prewhere_two_conditions.sql b/dbms/tests/queries/1_stateful/00091_prewhere_two_conditions.sql index cc660ed3f24..201ff788006 100644 --- a/dbms/tests/queries/1_stateful/00091_prewhere_two_conditions.sql +++ b/dbms/tests/queries/1_stateful/00091_prewhere_two_conditions.sql @@ -1,4 +1,4 @@ -SET max_bytes_to_read = 200000000; +SET max_bytes_to_read = 600000000; SET optimize_move_to_prewhere = 1; diff --git a/docker/packager/packager b/docker/packager/packager index 8e385786c5f..5e8ffbf1cb9 100755 --- a/docker/packager/packager +++ b/docker/packager/packager @@ -124,6 +124,7 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ if is_cross_darwin: cc = compiler[:-len(DARWIN_SUFFIX)] cmake_flags.append("-DCMAKE_AR:FILEPATH=/cctools/bin/x86_64-apple-darwin-ar") + cmake_flags.append("-DCMAKE_INSTALL_NAME_TOOL=/cctools/bin/x86_64-apple-darwin-install_name_tool") cmake_flags.append("-DCMAKE_RANLIB:FILEPATH=/cctools/bin/x86_64-apple-darwin-ranlib") cmake_flags.append("-DLINKER_NAME=/cctools/bin/x86_64-apple-darwin-ld") cmake_flags.append("-DCMAKE_TOOLCHAIN_FILE=/build/cmake/darwin/toolchain-x86_64.cmake") diff --git a/docker/test/performance-comparison/Dockerfile b/docker/test/performance-comparison/Dockerfile index 45900d414b2..1e08ec0f521 100644 --- a/docker/test/performance-comparison/Dockerfile +++ b/docker/test/performance-comparison/Dockerfile @@ -1,9 +1,17 @@ # docker build -t yandex/clickhouse-performance-comparison . 
-FROM alpine -RUN apk update && apk add --no-cache bash wget python3 python3-dev g++ -RUN pip3 --no-cache-dir install clickhouse_driver -RUN apk del g++ python3-dev +RUN apt-get update \ + && apt-get install --yes --no-install-recommends \ + p7zip-full bash ncdu wget python3 python3-pip python3-dev g++ \ + && pip3 --no-cache-dir install clickhouse_driver \ + && apt-get purge --yes python3-dev g++ \ + && apt-get autoremove --yes \ + && apt-get clean COPY * / +CMD /entrypoint.sh + +# docker run --network=host --volume :/workspace --volume=:/output -e LEFT_PR=<> -e LEFT_SHA=<> -e RIGHT_PR=<> -e RIGHT_SHA=<> yandex/clickhouse-performance-comparison + diff --git a/docker/test/performance-comparison/compare.sh b/docker/test/performance-comparison/compare.sh index 5e7fa7e79fb..7ecf715403f 100755 --- a/docker/test/performance-comparison/compare.sh +++ b/docker/test/performance-comparison/compare.sh @@ -6,8 +6,6 @@ trap "kill 0" EXIT script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" -mkdir left ||: -mkdir right ||: mkdir db0 ||: left_pr=$1 @@ -18,19 +16,21 @@ right_sha=$4 function download { + rm -r left ||: + mkdir left ||: + rm -r right ||: + mkdir right ||: + la="$left_pr-$left_sha.tgz" ra="$right_pr-$right_sha.tgz" - wget -nd -c "https://clickhouse-builds.s3.yandex.net/$left_pr/$left_sha/performance/performance.tgz" -O "$la" && tar -C left --strip-components=1 -zxvf "$la" & - wget -nd -c "https://clickhouse-builds.s3.yandex.net/$right_pr/$right_sha/performance/performance.tgz" -O "$ra" && tar -C right --strip-components=1 -zxvf "$ra" & - cd db0 && wget -nd -c "https://s3.mds.yandex.net/clickhouse-private-datasets/hits_10m_single/partitions/hits_10m_single.tar" && tar -xvf hits_10m_single.tar & - cd db0 && wget -nd -c "https://s3.mds.yandex.net/clickhouse-private-datasets/hits_100m_single/partitions/hits_100m_single.tar" && tar -xvf hits_100m_single.tar & - cd db0 && wget -nd -c "https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_v1.tar" && tar -xvf hits_v1.tar & - cd db0 && wget -nd -c "https://clickhouse-datasets.s3.yandex.net/visits/partitions/visits_v1.tar" && tar -xvf visits_v1.tar & + wget -q -nd -c "https://clickhouse-builds.s3.yandex.net/$left_pr/$left_sha/performance/performance.tgz" -O "$la" && tar -C left --strip-components=1 -zxvf "$la" & + wget -q -nd -c "https://clickhouse-builds.s3.yandex.net/$right_pr/$right_sha/performance/performance.tgz" -O "$ra" && tar -C right --strip-components=1 -zxvf "$ra" & + cd db0 && wget -q -nd -c "https://s3.mds.yandex.net/clickhouse-private-datasets/hits_10m_single/partitions/hits_10m_single.tar" && tar -xvf hits_10m_single.tar & + cd db0 && wget -q -nd -c "https://s3.mds.yandex.net/clickhouse-private-datasets/hits_100m_single/partitions/hits_100m_single.tar" && tar -xvf hits_100m_single.tar & + cd db0 && wget -q -nd -c "https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_v1.tar" && tar -xvf hits_v1.tar & wait # Use hardlinks instead of copying - rm -r left/db ||: - rm -r right/db ||: cp -al db0/ left/db/ cp -al db0/ right/db/ } @@ -40,16 +40,26 @@ function configure { sed -i 's/9000/9001/g' right/config/config.xml - cat > right/config/config.d/perf-test-tweaks.xml <<EOF + cat > right/config/config.d/zz-perf-test-tweaks.xml <<EOF <yandex> <logger> <console>true</console> </logger> + <text_log remove="remove"> + <table remove="remove"/> + </text_log> + <metric_log remove="remove"> + <table remove="remove"/> + </metric_log> </yandex> EOF - cp right/config/config.d/perf-test-tweaks.xml left/config/config.d/perf-test-tweaks.xml + cp right/config/config.d/zz-perf-test-tweaks.xml left/config/config.d/zz-perf-test-tweaks.xml + + rm left/config/config.d/metric_log.xml ||: + rm left/config/config.d/text_log.xml ||: + rm right/config/config.d/metric_log.xml ||: + rm right/config/config.d/text_log.xml ||: } configure @@ -78,6 +88,11 @@ function restart while ! right/clickhouse client --port 9001 --query "select 1" ; do kill -0 $right_pid ; echo . ; sleep 1 ; done echo right ok + + right/clickhouse client --port 9001 --query "create database test" ||: + right/clickhouse client --port 9001 --query "rename table datasets.hits_v1 to test.hits" ||: + left/clickhouse client --port 9000 --query "create database test" ||: + left/clickhouse client --port 9000 --query "rename table datasets.hits_v1 to test.hits" ||: } restart @@ -90,13 +105,14 @@ function run_tests for test in left/performance/*.xml do test_name=$(basename $test ".xml") - "$script_dir/perf.py" "$test" > "$test_name-raw.tsv" || continue + "$script_dir/perf.py" "$test" > "$test_name-raw.tsv" 2> "$test_name-err.log" || continue right/clickhouse local --file "$test_name-raw.tsv" --structure 'query text, run int, version UInt32, time float' --query "$(cat $script_dir/eqmed.sql)" > "$test_name-report.tsv" done } run_tests # Analyze results -result_structure="fail int, left float, right float, diff float, rd Array(float), query text" +result_structure="left float, right float, diff float, rd Array(float), query text" right/clickhouse local --file '*-report.tsv' -S "$result_structure" --query "select * from table where rd[3] > 0.05 order by rd[3] desc" > flap-prone.tsv -right/clickhouse local --file '*-report.tsv' -S "$result_structure" --query "select * from table where diff > 0.05 and diff > rd[3] order by diff desc" > failed.tsv +right/clickhouse local --file '*-report.tsv' -S "$result_structure" --query "select * from table where diff > 0.05 and diff > rd[3] order by diff desc" > bad-perf.tsv +grep Exception:[^:] *-err.log > run-errors.log diff --git a/docker/test/performance-comparison/entrypoint.sh b/docker/test/performance-comparison/entrypoint.sh new file mode 100755 index 00000000000..7ef5a9553a0 --- /dev/null +++ b/docker/test/performance-comparison/entrypoint.sh @@ -0,0 +1,8 @@ +#!/bin/bash + +cd /workspace + +../compare.sh $LEFT_PR $LEFT_SHA $RIGHT_PR $RIGHT_SHA > compare.log 2>&1 + +7z a /output/output.7z *.log *.tsv +cp compare.log /output diff --git a/docker/test/performance-comparison/eqmed.sql b/docker/test/performance-comparison/eqmed.sql index 22df87a2890..5e8d842b7df 100644 --- a/docker/test/performance-comparison/eqmed.sql +++ b/docker/test/performance-comparison/eqmed.sql @@ -38,4 +38,4 @@ from group by query ) original_medians_array where rd.query = original_medians_array.query -order by fail desc, rd_quantiles_percent[3] asc; +order by rd_quantiles_percent[3] desc; diff --git a/docker/test/performance-comparison/perf.py b/docker/test/performance-comparison/perf.py index 7a3e50e2046..5517a71cc44 100755 --- a/docker/test/performance-comparison/perf.py +++ b/docker/test/performance-comparison/perf.py @@ -15,8 +15,13 @@ root = tree.getroot() # Check main metric main_metric_element = root.find('main_metric/*') -if main_metric_element and main_metric_element.tag != 'min_time': - raise Exception('Only the min_time main metric is supported.
This test uses \'{}\''.format(main_metric))
+if main_metric_element is not None and main_metric_element.tag != 'min_time':
+    raise Exception('Only the min_time main metric is supported. This test uses \'{}\''.format(main_metric_element.tag))
+
+# FIXME another way to detect infinite tests. They should have an appropriate main_metric but sometimes they don't.
+infinite_sign = root.find('.//average_speed_not_changing_for_ms')
+if infinite_sign is not None:
+    raise Exception('Looks like the test is infinite (sign 1)')

# Open connections
servers = [{'host': 'localhost', 'port': 9000, 'client_name': 'left'}, {'host': 'localhost', 'port': 9001, 'client_name': 'right'}]
@@ -24,12 +29,9 @@ connections = [clickhouse_driver.Client(**server) for server in servers]

# Check tables that should exist
tables = [e.text for e in root.findall('preconditions/table_exists')]
-if tables:
+for t in tables:
    for c in connections:
-        tables_list = ", ".join("'{}'".format(t) for t in tables)
-        res = c.execute("select t from values('t text', {}) anti join system.tables on database = currentDatabase() and name = t".format(tables_list))
-        if res:
-            raise Exception('Some tables are not found: {}'.format(res))
+        res = c.execute("select 1 from {}".format(t))

# Apply settings
settings = root.findall('settings/*')
@@ -76,6 +78,9 @@ for c in connections:
        c.execute(q)

# Run test queries
+def tsv_escape(s):
+    return s.replace('\\', '\\\\').replace('\t', '\\t').replace('\n', '\\n').replace('\r','')
+
test_query_templates = [q.text for q in root.findall('query')]
test_queries = substitute_parameters(test_query_templates, parameter_combinations)

@@ -83,7 +88,7 @@ for q in test_queries:
    for run in range(0, 7):
        for conn_index, c in enumerate(connections):
            res = c.execute(q)
-            print(q + '\t' + str(run) + '\t' + str(conn_index) + '\t' + str(c.last_query.elapsed))
+            print(tsv_escape(q) + '\t' + str(run) + '\t' + str(conn_index) + '\t' + str(c.last_query.elapsed))

# Run drop queries
drop_query_templates = [q.text for q in root.findall('drop_query')]
diff --git a/docs/en/faq/general.md b/docs/en/faq/general.md
index 41026e54a08..3e6daf6ed9a 100644
--- a/docs/en/faq/general.md
+++ b/docs/en/faq/general.md
@@ -21,4 +21,36 @@ If you use Oracle through the ODBC driver as a source of external dictionaries,
NLS_LANG=RUSSIAN_RUSSIA.UTF8
```

+## How to export data from ClickHouse to a file?
+
+### Using the INTO OUTFILE Clause
+
+Add the [INTO OUTFILE](../query_language/select/#into-outfile-clause) clause to your query.
+
+For example:
+
+```sql
+SELECT * FROM table INTO OUTFILE 'file'
+```
+
+By default, ClickHouse uses the [TabSeparated](../interfaces/formats.md#tabseparated) format for output data. To select the [data format](../interfaces/formats.md), use the [FORMAT clause](../query_language/select/#format-clause).
+
+For example:
+
+```sql
+SELECT * FROM table INTO OUTFILE 'file' FORMAT CSV
+```
+
+### Using a File-Engine Table
+
+See [File](../operations/table_engines/file.md).
+
+### Using Command-Line Redirection
+
+```bash
+$ clickhouse-client --query "SELECT * from table" > result.txt
+```
+
+See [clickhouse-client](../interfaces/cli.md).
+
[Original article](https://clickhouse.yandex/docs/en/faq/general/)
diff --git a/docs/en/getting_started/example_datasets/star_schema.md b/docs/en/getting_started/example_datasets/star_schema.md
index 2e66ced7149..43c1b1df6fa 100644
--- a/docs/en/getting_started/example_datasets/star_schema.md
+++ b/docs/en/getting_started/example_datasets/star_schema.md
@@ -10,6 +10,10 @@ $ make

Generating data:

+!!!
warning "Attention" + -s 100 -- dbgen generates 600 million rows (67 GB) + -s 1000 -- dbgen generates 6 billion rows (takes a lot of time) + ```bash $ ./dbgen -s 1000 -T c $ ./dbgen -s 1000 -T l @@ -101,68 +105,236 @@ CREATE TABLE lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS -SELECT l.*, c.*, s.*, p.* -FROM lineorder l - ANY INNER JOIN customer c ON (c.C_CUSTKEY = l.LO_CUSTKEY) - ANY INNER JOIN supplier s ON (s.S_SUPPKEY = l.LO_SUPPKEY) - ANY INNER JOIN part p ON (p.P_PARTKEY = l.LO_PARTKEY); +SELECT + l.LO_ORDERKEY AS LO_ORDERKEY, + l.LO_LINENUMBER AS LO_LINENUMBER, + l.LO_CUSTKEY AS LO_CUSTKEY, + l.LO_PARTKEY AS LO_PARTKEY, + l.LO_SUPPKEY AS LO_SUPPKEY, + l.LO_ORDERDATE AS LO_ORDERDATE, + l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY, + l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY, + l.LO_QUANTITY AS LO_QUANTITY, + l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE, + l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE, + l.LO_DISCOUNT AS LO_DISCOUNT, + l.LO_REVENUE AS LO_REVENUE, + l.LO_SUPPLYCOST AS LO_SUPPLYCOST, + l.LO_TAX AS LO_TAX, + l.LO_COMMITDATE AS LO_COMMITDATE, + l.LO_SHIPMODE AS LO_SHIPMODE, + c.C_NAME AS C_NAME, + c.C_ADDRESS AS C_ADDRESS, + c.C_CITY AS C_CITY, + c.C_NATION AS C_NATION, + c.C_REGION AS C_REGION, + c.C_PHONE AS C_PHONE, + c.C_MKTSEGMENT AS C_MKTSEGMENT, + s.S_NAME AS S_NAME, + s.S_ADDRESS AS S_ADDRESS, + s.S_CITY AS S_CITY, + s.S_NATION AS S_NATION, + s.S_REGION AS S_REGION, + s.S_PHONE AS S_PHONE, + p.P_NAME AS P_NAME, + p.P_MFGR AS P_MFGR, + p.P_CATEGORY AS P_CATEGORY, + p.P_BRAND AS P_BRAND, + p.P_COLOR AS P_COLOR, + p.P_TYPE AS P_TYPE, + p.P_SIZE AS P_SIZE, + p.P_CONTAINER AS P_CONTAINER +FROM lineorder AS l +INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY +INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY +INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY; -ALTER TABLE lineorder_flat DROP COLUMN C_CUSTKEY, DROP COLUMN S_SUPPKEY, DROP COLUMN P_PARTKEY; ``` Running the queries: Q1.1 ```sql -SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25; +SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue +FROM lineorder_flat +WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25; ``` Q1.2 ```sql -SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35; +SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue +FROM lineorder_flat +WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35; ``` Q1.3 ```sql -SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35; +SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue +FROM lineorder_flat +WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 + AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35; ``` Q2.1 ```sql -SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND; +SELECT + sum(LO_REVENUE), + toYear(LO_ORDERDATE) AS year, + P_BRAND +FROM lineorder_flat +WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' +GROUP BY + year, + P_BRAND +ORDER BY + year, + P_BRAND; ``` Q2.2 ```sql 
-SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND; +SELECT + sum(LO_REVENUE), + toYear(LO_ORDERDATE) AS year, + P_BRAND +FROM lineorder_flat +WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' +GROUP BY + year, + P_BRAND +ORDER BY + year, + P_BRAND; ``` Q2.3 ```sql -SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year, P_BRAND ORDER BY year, P_BRAND; +SELECT + sum(LO_REVENUE), + toYear(LO_ORDERDATE) AS year, + P_BRAND +FROM lineorder_flat +WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' +GROUP BY + year, + P_BRAND +ORDER BY + year, + P_BRAND; ``` Q3.1 ```sql -SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION, S_NATION, year ORDER BY year asc, revenue desc; +SELECT + C_NATION, + S_NATION, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 +GROUP BY + C_NATION, + S_NATION, + year +ORDER BY + year ASC, + revenue DESC; ``` Q3.2 ```sql -SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year asc, revenue desc; +SELECT + C_CITY, + S_CITY, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 +GROUP BY + C_CITY, + S_CITY, + year +ORDER BY + year ASC, + revenue DESC; ``` Q3.3 ```sql -SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year asc, revenue desc; +SELECT + C_CITY, + S_CITY, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 +GROUP BY + C_CITY, + S_CITY, + year +ORDER BY + year ASC, + revenue DESC; ``` Q3.4 ```sql -SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = '199712' GROUP BY C_CITY, S_CITY, year ORDER BY year asc, revenue desc; +SELECT + C_CITY, + S_CITY, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 +GROUP BY + C_CITY, + S_CITY, + year +ORDER BY + year ASC, + revenue DESC; ``` Q4.1 ```sql -SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, C_NATION ORDER BY year, C_NATION; +SELECT + toYear(LO_ORDERDATE) AS year, + 
C_NATION, + sum(LO_REVENUE - LO_SUPPLYCOST) AS profit +FROM lineorder_flat +WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') +GROUP BY + year, + C_NATION +ORDER BY + year ASC, + C_NATION ASC; ``` Q4.2 ```sql -SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, S_NATION, P_CATEGORY ORDER BY year, S_NATION, P_CATEGORY; +SELECT + toYear(LO_ORDERDATE) AS year, + S_NATION, + P_CATEGORY, + sum(LO_REVENUE - LO_SUPPLYCOST) AS profit +FROM lineorder_flat +WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') +GROUP BY + year, + S_NATION, + P_CATEGORY +ORDER BY + year ASC, + S_NATION ASC, + P_CATEGORY ASC; ``` Q4.3 ```sql -SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year, S_CITY, P_BRAND ORDER BY year, S_CITY, P_BRAND; +SELECT + toYear(LO_ORDERDATE) AS year, + S_CITY, + P_BRAND, + sum(LO_REVENUE - LO_SUPPLYCOST) AS profit +FROM lineorder_flat +WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' +GROUP BY + year, + S_CITY, + P_BRAND +ORDER BY + year ASC, + S_CITY ASC, + P_BRAND ASC; ``` [Original article](https://clickhouse.yandex/docs/en/getting_started/example_datasets/star_schema/) diff --git a/docs/en/interfaces/cli.md b/docs/en/interfaces/cli.md index 198e5f5c094..86c7a104670 100644 --- a/docs/en/interfaces/cli.md +++ b/docs/en/interfaces/cli.md @@ -117,7 +117,7 @@ You can pass parameters to `clickhouse-client` (all parameters have a default va - `--query, -q` – The query to process when using non-interactive mode. - `--database, -d` – Select the current default database. Default value: the current database from the server settings ('default' by default). - `--multiline, -m` – If specified, allow multiline queries (do not send the query on Enter). -- `--multiquery, -n` – If specified, allow processing multiple queries separated by commas. Only works in non-interactive mode. +- `--multiquery, -n` – If specified, allow processing multiple queries separated by semicolons. - `--format, -f` – Use the specified default format to output the result. - `--vertical, -E` – If specified, use the Vertical format by default to output the result. This is the same as '--format=Vertical'. In this format, each value is printed on a separate line, which is helpful when displaying wide tables. - `--time, -t` – If specified, print the query execution time to 'stderr' in non-interactive mode. 
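
As a quick, hedged illustration of the semicolon-separated `--multiquery` behavior described in the option list above (the table `t` and its values are made up for this sketch):

```bash
# Three semicolon-separated queries processed in one non-interactive call
clickhouse-client --multiquery --query "
    CREATE TABLE t (x UInt8) ENGINE = Memory;
    INSERT INTO t VALUES (1), (2);
    SELECT sum(x) FROM t"
```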
diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md
index eebdf10702d..b37c9cdddb2 100644
--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@@ -29,6 +29,7 @@ The supported formats are:
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
+| [ORC](#data-format-orc) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [Native](#native) | ✔ | ✔ |
@@ -954,16 +955,57 @@ Data types of a ClickHouse table columns can differ from the corresponding field

You can insert Parquet data from a file into a ClickHouse table with the following command:

```bash
-cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
+$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
```

You can select data from a ClickHouse table and save it into a file in the Parquet format with the following command:

-```sql
-clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
+```bash
+$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
```

-To exchange data with the Hadoop, you can use [HDFS table engine](../operations/table_engines/hdfs.md).
+To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).
+
+## ORC {#data-format-orc}
+
+[Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports only read operations for this format.
+
+### Data Types Matching
+
+The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` queries.
+
+| ORC data type (`INSERT`) | ClickHouse data type |
+| ------------------------ | -------------------- |
+| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) |
+| `INT8` | [Int8](../data_types/int_uint.md) |
+| `UINT16` | [UInt16](../data_types/int_uint.md) |
+| `INT16` | [Int16](../data_types/int_uint.md) |
+| `UINT32` | [UInt32](../data_types/int_uint.md) |
+| `INT32` | [Int32](../data_types/int_uint.md) |
+| `UINT64` | [UInt64](../data_types/int_uint.md) |
+| `INT64` | [Int64](../data_types/int_uint.md) |
+| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) |
+| `DOUBLE` | [Float64](../data_types/float.md) |
+| `DATE32` | [Date](../data_types/date.md) |
+| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) |
+| `STRING`, `BINARY` | [String](../data_types/string.md) |
+| `DECIMAL` | [Decimal](../data_types/decimal.md) |
+
+ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type.
+
+Unsupported ORC data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
+
+Data types of ClickHouse table columns can differ from the corresponding fields of the ORC data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
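
To make the mapping concrete, here is a minimal, hedged sketch of the `DECIMAL` path just described: the ORC value is read as `Decimal128` and then cast to the narrower type declared for the column (`orc_demo` and `data.orc` are hypothetical names):

```bash
# Hypothetical table with a Decimal(18, 4) column; the ORC DECIMAL field is
# read as Decimal128 and cast to Decimal(18, 4) on insert.
clickhouse-client --query "CREATE TABLE orc_demo (d Decimal(18, 4)) ENGINE = Memory"
cat data.orc | clickhouse-client --query "INSERT INTO orc_demo FORMAT ORC"
```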
+
+### Inserting Data
+
+You can insert ORC data from a file into a ClickHouse table with the following command:
+
+```bash
+$ cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT ORC"
+```
+
+To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).

## Format Schema {#formatschema}
diff --git a/docs/en/interfaces/third-party/client_libraries.md b/docs/en/interfaces/third-party/client_libraries.md
index e0842ab36ef..514d2bd0305 100644
--- a/docs/en/interfaces/third-party/client_libraries.md
+++ b/docs/en/interfaces/third-party/client_libraries.md
@@ -41,6 +41,7 @@
- C#
    - [ClickHouse.Ado](https://github.com/killwort/ClickHouse-Net)
    - [ClickHouse.Net](https://github.com/ilyabreev/ClickHouse.Net)
+    - [ClickHouse.Client](https://github.com/DarkWanderer/ClickHouse.Client)
- Elixir
    - [clickhousex](https://github.com/appodeal/clickhousex/)
- Nim
diff --git a/docs/en/interfaces/third-party/integrations.md b/docs/en/interfaces/third-party/integrations.md
index f96507320a5..692dfae9776 100644
--- a/docs/en/interfaces/third-party/integrations.md
+++ b/docs/en/interfaces/third-party/integrations.md
@@ -33,7 +33,9 @@
- Monitoring
    - [Graphite](https://graphiteapp.org)
        - [graphouse](https://github.com/yandex/graphouse)
-        - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse)
+        - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse)
+        - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse)
+        - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - optimizes stale partitions in [\*GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#graphitemergetree) if rules from the [rollup configuration](../../operations/table_engines/graphitemergetree.md#rollup-configuration) can be applied
    - [Grafana](https://grafana.com/)
        - [clickhouse-grafana](https://github.com/Vertamedia/clickhouse-grafana)
    - [Prometheus](https://prometheus.io/)
diff --git a/docs/en/operations/index.md b/docs/en/operations/index.md
index 547fc4de260..3ce48c8e57e 100644
--- a/docs/en/operations/index.md
+++ b/docs/en/operations/index.md
@@ -13,6 +13,7 @@ ClickHouse operations manual consists of the following major sections:
    - [Quotas](quotas.md)
    - [System Tables](system_tables.md)
    - [Server Configuration Parameters](server_settings/index.md)
+    - [How To Test Your Hardware With ClickHouse](performance_test.md)
    - [Settings](settings/index.md)
    - [Utilities](utils/index.md)
diff --git a/docs/en/operations/performance_test.md b/docs/en/operations/performance_test.md
new file mode 100644
index 00000000000..a56490ac8ba
--- /dev/null
+++ b/docs/en/operations/performance_test.md
@@ -0,0 +1,72 @@
+# How To Test Your Hardware With ClickHouse
+
+Draft.
+
+With this instruction you can run a basic ClickHouse performance test on any server without installing ClickHouse packages.
+
+1. Go to the "commits" page: https://github.com/ClickHouse/ClickHouse/commits/master
+
+2. Click the first green check mark or red cross with a green "ClickHouse Build Check", then click the "Details" link near "ClickHouse Build Check".
+
+3. Copy the link to the "clickhouse" binary for amd64 or aarch64.
+
+4.
ssh to the server and download it with wget:
+```bash
+# For amd64:
+wget https://clickhouse-builds.s3.yandex.net/0/00ba767f5d2a929394ea3be193b1f79074a1c4bc/1578163263_binary/clickhouse
+# For aarch64:
+wget https://clickhouse-builds.s3.yandex.net/0/00ba767f5d2a929394ea3be193b1f79074a1c4bc/1578161264_binary/clickhouse
+# Then do:
+chmod a+x clickhouse
+```
+
+5. Download configs:
+```bash
+wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/dbms/programs/server/config.xml
+wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/dbms/programs/server/users.xml
+mkdir config.d
+wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/dbms/programs/server/config.d/path.xml -O config.d/path.xml
+wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/dbms/programs/server/config.d/log_to_console.xml -O config.d/log_to_console.xml
+```
+
+6. Download benchmark files:
+```bash
+wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/dbms/benchmark/clickhouse/benchmark-new.sh
+chmod a+x benchmark-new.sh
+wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/dbms/benchmark/clickhouse/queries.sql
+```
+
+7. Download test data:
+According to the instruction:
+https://clickhouse.yandex/docs/en/getting_started/example_datasets/metrica/
+("hits" table containing 100 million rows)
+
+```bash
+wget https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_100m_obfuscated_v1.tar.xz
+tar xvf hits_100m_obfuscated_v1.tar.xz -C .
+mv hits_100m_obfuscated_v1/* .
+```
+
+8. Run the server:
+```bash
+./clickhouse server
+```
+
+9. Check the data:
+ssh to the server in another terminal
+```bash
+./clickhouse client --query "SELECT count() FROM hits_100m_obfuscated"
+100000000
+```
+
+10. Edit the benchmark-new.sh, change "clickhouse-client" to "./clickhouse client" and add the "--max_memory_usage 100000000000" parameter.
+```bash
+mcedit benchmark-new.sh
+```
+
+11. Run the benchmark:
+```bash
+./benchmark-new.sh hits_100m_obfuscated
+```
+
+12. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com
diff --git a/docs/en/operations/server_settings/settings.md b/docs/en/operations/server_settings/settings.md
index c76637cc927..89bb7ef33ae 100644
--- a/docs/en/operations/server_settings/settings.md
+++ b/docs/en/operations/server_settings/settings.md
@@ -372,9 +372,6 @@ Approximate size (in bytes) of the cache of marks used by table engines of the [

The cache is shared for the server and memory is allocated as needed. The cache size must be at least 5368709120.

-!!! warning "Warning"
-    This parameter could be exceeded by the [mark_cache_min_lifetime](../settings/settings.md#settings-mark_cache_min_lifetime) setting.
-
**Example**

```xml
diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md
index ba7370fef03..21d2f46e71b 100644
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -80,7 +80,7 @@ This parameter is useful when you are using formats that require a schema defini

Enables or disables [fsync](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html) when writing `.sql` files. Enabled by default.

-It makes sense to disable it if the server has millions of tiny table chunks that are constantly being created and destroyed.
+It makes sense to disable it if the server has millions of tiny tables that are constantly being created and destroyed.
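
A minimal sketch of disabling it for a single session, assuming `fsync_metadata` can be passed on the command line like the other settings documented on this page:

```bash
# Create a short-lived table without waiting for fsync of its .sql metadata file
clickhouse-client --fsync_metadata 0 \
    --query "CREATE TABLE tiny (x UInt8) ENGINE = Memory"
```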
## enable_http_compression {#settings-enable_http_compression}
@@ -218,11 +218,11 @@ Ok.

Enables or disables template deduction for SQL expressions in the [Values](../../interfaces/formats.md#data-format-values) format. It allows parsing and interpreting expressions in `Values` much faster if expressions in consecutive rows have the same structure. ClickHouse will try to deduce the template of an expression, parse the following rows using this template and evaluate the expression on a batch of successfully parsed rows. For the following query:
```sql
INSERT INTO test VALUES (lower('Hello')), (lower('world')), (lower('INSERT')), (upper('Values')), ...
-```
+```
- if `input_format_values_interpret_expressions=1` and `format_values_deduce_templates_of_expressions=0`, expressions will be interpreted separately for each row (this is very slow for a large number of rows)
- if `input_format_values_interpret_expressions=0` and `format_values_deduce_templates_of_expressions=1`, expressions in the first, second and third rows will be parsed using the template `lower(String)` and interpreted together; the expression in the fourth row will be parsed with another template (`upper(String)`)
- if `input_format_values_interpret_expressions=1` and `format_values_deduce_templates_of_expressions=1` — the same as in the previous case, but also allows falling back to interpreting expressions separately if it's not possible to deduce a template.
-
+
Enabled by default.

## input_format_values_accurate_types_of_literals {#settings-input_format_values_accurate_types_of_literals}
@@ -232,7 +232,7 @@ This setting is used only when `input_format_values_deduce_templates_of_expressi
(..., abs(3.141592654), ...), -- Float64 literal
(..., abs(-1), ...),          -- Int64 literal
```
-When this setting is enabled, ClickHouse will check actual type of literal and will use expression template of the corresponding type. In some cases it may significantly slow down expression evaluation in `Values`.
+When this setting is enabled, ClickHouse will check the actual type of a literal and will use an expression template of the corresponding type. In some cases it may significantly slow down expression evaluation in `Values`.
When disabled, ClickHouse may use a more general type for some literals (e.g. `Float64` or `Int64` instead of `UInt64` for `42`), but it may cause overflow and precision issues.

Enabled by default.

@@ -477,7 +477,7 @@ Default value: 8.

## merge_tree_max_rows_to_use_cache {#setting-merge_tree_max_rows_to_use_cache}

-If ClickHouse should read more than `merge_tree_max_rows_to_use_cache` rows in one query, it doesn't use the cache of uncompressed blocks.
+If ClickHouse should read more than `merge_tree_max_rows_to_use_cache` rows in one query, it doesn't use the cache of uncompressed blocks.

The cache of uncompressed blocks stores data extracted for queries. ClickHouse uses this cache to speed up responses to repeated small queries. This setting protects the cache from thrashing by queries that read a large amount of data. The [uncompressed_cache_size](../server_settings/settings.md#server-settings-uncompressed_cache_size) server setting defines the size of the cache of uncompressed blocks.

@@ -591,12 +591,6 @@ We are writing a URL column with the String type (average size of 60 bytes per value)

There usually isn't any reason to change this setting.
-## mark_cache_min_lifetime {#settings-mark_cache_min_lifetime}
-
-If the value of [mark_cache_size](../server_settings/settings.md#server-mark-cache-size) setting is exceeded, delete only records older than mark_cache_min_lifetime seconds. If your hosts have low amount of RAM, it makes sense to lower this parameter.
-
-Default value: 10000 seconds.
-
## max_query_size {#settings-max_query_size}

The maximum part of a query that can be taken to RAM for parsing with the SQL parser.

@@ -960,7 +954,7 @@ Possible values:

    - 1 — skipping enabled.

-      If a shard is unavailable, ClickHouse returns a result based on partial data and doesn't report node availability issues.
+      If a shard is unavailable, ClickHouse returns a result based on partial data and doesn't report node availability issues.

    - 0 — skipping disabled.

@@ -987,7 +981,7 @@ Default value: 0.

- Type: seconds
- Default value: 60 seconds

-Controls how fast errors of distributed tables are zeroed. Given that currently a replica was unavailabe for some time and accumulated 5 errors and distributed_replica_error_half_life is set to 1 second, then said replica is considered back to normal in 3 seconds since last error.
+Controls how fast errors in distributed tables are zeroed. If a replica is unavailable for some time, accumulates 5 errors, and distributed_replica_error_half_life is set to 1 second, then the replica is considered normal 3 seconds after the last error.

**See also**

@@ -1000,7 +994,7 @@ Controls how fast errors of distributed tables are zeroed. Given that currently

- Type: unsigned int
- Default value: 1000

-Error count of each replica is capped at this value, preventing a single replica from accumulating to many errors.
+The error count of each replica is capped at this value, preventing a single replica from accumulating too many errors.

**See also**

@@ -1010,7 +1004,7 @@ Error count of each replica is capped at this value, preventing a single replica

## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}

-Base interval of data sending by the [Distributed](../table_engines/distributed.md) table engine. Actual interval grows exponentially in case of any errors.
+Base interval for the [Distributed](../table_engines/distributed.md) table engine to send data. The actual interval grows exponentially in the event of errors.

Possible values:

@@ -1021,7 +1015,7 @@ Default value: 100 milliseconds.

## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}

-Maximum interval of data sending by the [Distributed](../table_engines/distributed.md) table engine. Limits exponential growth of the interval set in the [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) setting.
+Maximum interval for the [Distributed](../table_engines/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) setting.

Possible values:

@@ -1033,14 +1027,14 @@ Default value: 30000 milliseconds (30 seconds).

Enables/disables sending of inserted data in batches.

-When batch sending is enabled, [Distributed](../table_engines/distributed.md) table engine tries to send multiple files of inserted data in one operation instead of sending them separately. Batch sending improves cluster performance by better server and network resources utilization.
+When batch sending is enabled, the [Distributed](../table_engines/distributed.md) table engine tries to send multiple files of inserted data in one operation instead of sending them separately. Batch sending improves cluster performance by better utilizing server and network resources.

Possible values:

- 1 — Enabled.
- 0 — Disabled.

-Defaule value: 0.
+Default value: 0.

## os_thread_priority {#setting-os_thread_priority}

@@ -1067,7 +1061,7 @@ Possible values:

- Positive integer number, in nanoseconds.

  Recommended values:
-
+
  - 10000000 (100 times a second) nanoseconds or less for single queries.
  - 1000000000 (once a second) for cluster-wide profiling.

@@ -1090,7 +1084,7 @@ Possible values:

- Positive integer number of nanoseconds.

  Recommended values:
-
+
  - 10000000 (100 times a second) nanoseconds or more for single queries.
  - 1000000000 (once a second) for cluster-wide profiling.

diff --git a/docs/en/operations/settings/settings_profiles.md b/docs/en/operations/settings/settings_profiles.md
index c335e249212..095bb362936 100644
--- a/docs/en/operations/settings/settings_profiles.md
+++ b/docs/en/operations/settings/settings_profiles.md
@@ -60,7 +60,7 @@ Example:

The example specifies two profiles: `default` and `web`. The `default` profile has a special purpose: it must always be present and is applied when starting the server. In other words, the `default` profile contains default settings. The `web` profile is a regular profile that can be set using the `SET` query or using a URL parameter in an HTTP query.

-Settings profiles can inherit from each other. To use inheritance, indicate the `profile` setting before the other settings that are listed in the profile.
+Settings profiles can inherit from each other. To use inheritance, indicate one or multiple `profile` settings before the other settings that are listed in the profile. If the same setting is defined in several profiles, the latest definition is used.

[Original article](https://clickhouse.yandex/docs/en/operations/settings/settings_profiles/)
diff --git a/docs/en/operations/table_engines/distributed.md b/docs/en/operations/table_engines/distributed.md
index a22fd43b34f..24a61998b39 100644
--- a/docs/en/operations/table_engines/distributed.md
+++ b/docs/en/operations/table_engines/distributed.md
@@ -87,9 +87,9 @@ The Distributed engine requires writing clusters to the config file. Clusters fr

There are two methods for writing data to a cluster:

-First, you can define which servers to write which data to, and perform the write directly on each shard. In other words, perform INSERT in the tables that the distributed table "looks at". This is the most flexible solution – you can use any sharding scheme, which could be non-trivial due to the requirements of the subject area. This is also the most optimal solution, since data can be written to different shards completely independently.
+First, you can define which servers to write which data to and perform the write directly on each shard. In other words, perform INSERT in the tables that the distributed table "looks at". This is the most flexible solution, as you can use any sharding scheme, which could be non-trivial due to the requirements of the subject area. It is also the optimal solution, since data can be written to different shards completely independently.

-Second, you can perform INSERT in a Distributed table. In this case, the table will distribute the inserted data across servers itself.
In order to write to a Distributed table, it must have a sharding key set (the last parameter). In addition, if there is only one shard, the write operation works without specifying the sharding key, since it doesn't have any meaning in this case. +Second, you can perform INSERT in a Distributed table. In this case, the table will distribute the inserted data across servers itself. In order to write to a Distributed table, it must have a sharding key set (the last parameter). In addition, if there is only one shard, the write operation works without specifying the sharding key, since it doesn't mean anything in this case. Each shard can have a weight defined in the config file. By default, the weight is equal to one. Data is distributed across shards in the amount proportional to the shard weight. For example, if there are two shards and the first has a weight of 9 while the second has a weight of 10, the first will be sent 9 / 19 parts of the rows, and the second will be sent 10 / 19. @@ -112,7 +112,7 @@ You should be concerned about the sharding scheme in the following cases: - Queries are used that require joining data (IN or JOIN) by a specific key. If data is sharded by this key, you can use local IN or JOIN instead of GLOBAL IN or GLOBAL JOIN, which is much more efficient. - A large number of servers is used (hundreds or more) with a large number of small queries (queries of individual clients - websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we've done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into "layers", where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. Distributed tables are created for each layer, and a single shared distributed table is created for global queries. -Data is written asynchronously. When inserted to the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The period of data sending is managed by the [distributed_directory_monitor_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better local server and network resources utilization. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. +Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The period for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. 
The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. If the server ceased to exist or had a rough restart (for example, after a device failure) after an INSERT to a Distributed table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the 'broken' subdirectory and no longer used. diff --git a/docs/en/operations/table_engines/graphitemergetree.md b/docs/en/operations/table_engines/graphitemergetree.md index a8ed8aaaddf..7a47eabfe22 100644 --- a/docs/en/operations/table_engines/graphitemergetree.md +++ b/docs/en/operations/table_engines/graphitemergetree.md @@ -1,5 +1,4 @@ - -# GraphiteMergeTree +# GraphiteMergeTree {#graphitemergetree} This engine is designed for thinning and aggregating/averaging (rollup) [Graphite](http://graphite.readthedocs.io/en/latest/index.html) data. It may be helpful to developers who want to use ClickHouse as a data store for Graphite. @@ -7,7 +6,7 @@ You can use any ClickHouse table engine to store the Graphite data if you don't The engine inherits properties from [MergeTree](mergetree.md). -## Creating a Table +## Creating a Table {#creating-table} ```sql CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] @@ -67,7 +66,7 @@ All of the parameters excepting `config_section` have the same meaning as in `Me - `config_section` — Name of the section in the configuration file, where are the rules of rollup set. -## Rollup configuration +## Rollup configuration {#rollup-configuration} The settings for rollup are defined by the [graphite_rollup](../server_settings/settings.md#server_settings-graphite_rollup) parameter in the server configuration. The name of the parameter could be any. You can create several configurations and use them for different tables. @@ -78,14 +77,14 @@ required-columns patterns ``` -### Required Columns +### Required Columns {#required-columns} - `path_column_name` — The name of the column storing the metric name (Graphite sensor). Default value: `Path`. - `time_column_name` — The name of the column storing the time of measuring the metric. Default value: `Time`. - `value_column_name` — The name of the column storing the value of the metric at the time set in `time_column_name`. Default value: `Value`. - `version_column_name` — The name of the column storing the version of the metric. Default value: `Timestamp`. -### Patterns +### Patterns {#patterns} Structure of the `patterns` section: @@ -127,7 +126,7 @@ Fields for `pattern` and `default` sections: - `function` – The name of the aggregating function to apply to data whose age falls within the range `[age, age + precision]`. -### Configuration Example +### Configuration Example {#configuration-example} ```xml diff --git a/docs/en/operations/table_engines/jdbc.md b/docs/en/operations/table_engines/jdbc.md index e2ceb12641d..293f3c10dad 100644 --- a/docs/en/operations/table_engines/jdbc.md +++ b/docs/en/operations/table_engines/jdbc.md @@ -10,6 +10,9 @@ This engine supports the [Nullable](../../data_types/nullable.md) data type. 
```sql CREATE TABLE [IF NOT EXISTS] [db.]table_name +( + columns list... +) ENGINE = JDBC(dbms_uri, external_database, external_table) ``` diff --git a/docs/en/operations/table_engines/mergetree.md b/docs/en/operations/table_engines/mergetree.md index f1c888e4480..e9c9ec26c53 100644 --- a/docs/en/operations/table_engines/mergetree.md +++ b/docs/en/operations/table_engines/mergetree.md @@ -504,21 +504,25 @@ Disks, volumes and storage policies should be declared inside the ` - - /mnt/fast_ssd/clickhouse - - - /mnt/hdd1/clickhouse - 10485760_ - - - /mnt/hdd2/clickhouse - 10485760_ - + + + + /mnt/fast_ssd/clickhouse + + + /mnt/hdd1/clickhouse + 10485760 + + + /mnt/hdd2/clickhouse + 10485760 + + ... + + ... - + ``` Tags: @@ -532,26 +536,30 @@ The order of the disk definition is not important. Storage policies configuration markup: ```xml - - - - - disk_name_from_disks_configuration - 1073741824 - - - - - - - 0.2 - - - - + + ... + + + + + disk_name_from_disks_configuration + 1073741824 + + + + + + + 0.2 + + + + - - + + + ... + ``` Tags: @@ -565,29 +573,33 @@ Tags: Cofiguration examples: ```xml - - - - - disk1 - disk2 - - - + + ... + + + + + disk1 + disk2 + + + - - - - fast_ssd - 1073741824 - - - disk1 - - - 0.2 - - + + + + fast_ssd + 1073741824 + + + disk1 + + + 0.2 + + + ... + ``` In given example, the `hdd_in_order` policy implements the [round-robin](https://en.wikipedia.org/wiki/Round-robin_scheduling) approach. Thus this policy defines only one volume (`single`), the data parts are stored on all its disks in circular order. Such policy can be quite useful if there are several similar disks are mounted to the system, but RAID is not configured. Keep in mind that each individual disk drive is not reliable and you might want to compensate it with replication factor of 3 or more. diff --git a/docs/en/query_language/functions/other_functions.md b/docs/en/query_language/functions/other_functions.md index ba1e0c0cc9a..5542369e738 100644 --- a/docs/en/query_language/functions/other_functions.md +++ b/docs/en/query_language/functions/other_functions.md @@ -6,7 +6,7 @@ Returns a string with the name of the host that this function was performed on. ## FQDN {#fqdn} -Returns the fully qualified domain name. +Returns the fully qualified domain name. **Syntax** @@ -392,7 +392,7 @@ neighbor(column, offset[, default_value]) The result of the function depends on the affected data blocks and the order of data in the block. If you make a subquery with ORDER BY and call the function from outside the subquery, you can get the expected result. -**Parameters** +**Parameters** - `column` — A column name or scalar expression. - `offset` — The number of rows forwards or backwards from the current row of `column`. [Int64](../../data_types/int_uint.md). @@ -400,7 +400,7 @@ If you make a subquery with ORDER BY and call the function from outside the subq **Returned values** -- Value for `column` in `offset` distance from current row if `offset` value is not outside block bounds. +- Value for `column` in `offset` distance from current row if `offset` value is not outside block bounds. - Default value for `column` if `offset` value is outside block bounds. If `default_value` is given, then it will be used. Type: type of data blocks affected or default value type. @@ -545,7 +545,7 @@ WHERE diff != 1 └────────┴──────┘ ``` ```sql -set max_block_size=100000 -- default value is 65536! +set max_block_size=100000 -- default value is 65536! SELECT number, @@ -886,7 +886,7 @@ Code: 395. DB::Exception: Received from localhost:9000. 
DB::Exception: Too many.

## identity()

-Returns the same value that was used as its argument.
+Returns the same value that was used as its argument.

```sql
SELECT identity(42)
@@ -898,4 +898,39 @@ SELECT identity(42)
```

Used for debugging and testing; it allows one to "break" access by index and to get the result and query performance for a full scan.

+## randomPrintableASCII {#randomascii}
+
+Generates a string with a random set of [ASCII](https://en.wikipedia.org/wiki/ASCII#Printable_characters) printable characters.
+
+**Syntax**
+
+```sql
+randomPrintableASCII(length)
+```
+
+**Parameters**
+
+- `length` — Resulting string length. Positive integer.
+
+    If you pass `length < 0`, the behavior of the function is undefined.
+
+**Returned value**
+
+- String with a random set of [ASCII](https://en.wikipedia.org/wiki/ASCII#Printable_characters) printable characters.
+
+Type: [String](../../data_types/string.md)
+
+**Example**
+
+```sql
+SELECT number, randomPrintableASCII(30) as str, length(str) FROM system.numbers LIMIT 3
+```
+```text
+┌─number─┬─str────────────────────────────┬─length(randomPrintableASCII(30))─┐
+│      0 │ SuiCOSTvC0csfABSw=UcSzp2.`rv8x │                               30 │
+│      1 │ 1Ag NlJ &RCN:*>HVPG;PE-nO"SUFD │                               30 │
+│      2 │ /"+<"wUTh:=LjJ Vm!c&hI*m#XTfzz │                               30 │
+└────────┴────────────────────────────────┴──────────────────────────────────┘
+```
+
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/other_functions/)
diff --git a/docs/en/query_language/functions/string_functions.md b/docs/en/query_language/functions/string_functions.md
index 33e5700f355..8b73f1b3c19 100644
--- a/docs/en/query_language/functions/string_functions.md
+++ b/docs/en/query_language/functions/string_functions.md
@@ -217,6 +217,44 @@ Result:
└───────────────────────────────────┘
```

+## trim {#trim}
+
+Removes all specified characters from the start or end of a string.
+By default, removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.
+
+**Syntax**
+
+```sql
+trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)
+```
+
+**Parameters**
+
+- `trim_character` — the characters to trim. [String](../../data_types/string.md).
+- `input_string` — the string to trim. [String](../../data_types/string.md).
+
+**Returned value**
+
+A string without the specified leading and (or) trailing characters.
+
+Type: `String`.
+
+**Example**
+
+Query:
+
+```sql
+SELECT trim(BOTH ' ()' FROM '( Hello, world! )')
+```
+
+Result:
+
+```text
+┌─trim(BOTH ' ()' FROM '( Hello, world! )')─┐
+│ Hello, world!                             │
+└───────────────────────────────────────────┘
+```
+
## trimLeft {#trimleft}

Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string. It doesn't remove other kinds of whitespace characters (tab, no-break space, etc.).

**Syntax**

```sql
-trimLeft()
+trimLeft(input_string)
```

-Alias: `ltrim`.
+Alias: `ltrim(input_string)`.

**Parameters**

-- `string` — string to trim. [String](../../data_types/string.md).
+- `input_string` — the string to trim. [String](../../data_types/string.md).

**Returned value**

@@ -262,14 +300,14 @@ Removes all consecutive occurrences of common whitespace (ASCII character 32) fr

**Syntax**

```sql
-trimRight()
+trimRight(input_string)
```

-Alias: `rtrim`.
+Alias: `rtrim(input_string)`.

**Parameters**

-- `string` — string to trim. [String](../../data_types/string.md).
+- `input_string` — the string to trim.
[String](../../data_types/string.md). **Returned value** @@ -300,14 +338,14 @@ Removes all consecutive occurrences of common whitespace (ASCII character 32) fr **Syntax** ```sql -trimBoth() +trimBoth(input_string) ``` -Alias: `trim`. +Alias: `trim(input_string)`. **Parameters** -- `string` — string to trim. [String](../../data_types/string.md). +- `input_string` — string to trim. [String](../../data_types/string.md). **Returned value** diff --git a/docs/en/query_language/select.md b/docs/en/query_language/select.md index e6e7676c643..6e053332d83 100644 --- a/docs/en/query_language/select.md +++ b/docs/en/query_language/select.md @@ -1120,7 +1120,7 @@ The structure of results (the number and type of columns) must match for the que Queries that are parts of UNION ALL can't be enclosed in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause. -### INTO OUTFILE Clause +### INTO OUTFILE Clause {#into-outfile-clause} Add the `INTO OUTFILE filename` clause (where filename is a string literal) to redirect query output to the specified file. In contrast to MySQL, the file is created on the client side. The query will fail if a file with the same filename already exists. @@ -1128,7 +1128,7 @@ This functionality is available in the command-line client and clickhouse-local The default output format is TabSeparated (the same as in the command-line client batch mode). -### FORMAT Clause +### FORMAT Clause {#format-clause} Specify 'FORMAT format' to get data in any specified format. You can use this for convenience, or for creating dumps. diff --git a/docs/fa/interfaces/cli.md b/docs/fa/interfaces/cli.md index 7680348aef6..e5f869e1c0d 100644 --- a/docs/fa/interfaces/cli.md +++ b/docs/fa/interfaces/cli.md @@ -91,7 +91,7 @@ command line برا پایه 'readline' (و 'history' یا 'libedit'، یه بد - `--query, -q` – مشخص کردن query برای پردازش در هنگام استفاده از حالت non-interactive. - `--database, -d` – انتخاب دیتابیس در بدو ورود به کلاینت. مقدار پیش فرض: دیتابیس مشخص شده در تنظیمات سرور (پیش فرض 'default') - `--multiline, -m` – اگر مشخص شود، یعنی اجازه ی نوشتن query های چند خطی را بده. (بعد از Enter، query را ارسال نکن). -- `--multiquery, -n` – اگر مشخص شود، اجازه ی اجرای چندین query که از طریق کاما جدا شده اند را می دهد. فقط در حالت non-interactive کار می کند. +- `--multiquery, -n` – اگر مشخص شود، اجازه ی اجرای چندین query که از طریق جمع و حلقه ها جدا شده اند را می دهد. فقط در حالت non-interactive کار می کند. - `--format, -f` مشخص کردن نوع فرمت خروجی - `--vertical, -E` اگر مشخص شود، از فرمت Vertical برای نمایش خروجی استفاده می شود. این گزینه مشابه '--format=Vertical' می باشد. در این فرمت، هر مقدار در یک خط جدید چاپ می شود، که در هنگام نمایش جداول عریض مفید است. - `--time, -t` اگر مشخص شود، در حالت non-interactive زمان اجرای query در 'stderr' جاپ می شود. 
diff --git a/docs/fa/interfaces/third-party/client_libraries.md b/docs/fa/interfaces/third-party/client_libraries.md
index c31998191e5..c6ee8785499 100644
--- a/docs/fa/interfaces/third-party/client_libraries.md
+++ b/docs/fa/interfaces/third-party/client_libraries.md
@@ -40,6 +40,7 @@
- C#
    - [ClickHouse.Ado](https://github.com/killwort/ClickHouse-Net)
    - [ClickHouse.Net](https://github.com/ilyabreev/ClickHouse.Net)
+    - [ClickHouse.Client](https://github.com/DarkWanderer/ClickHouse.Client)
- Elixir
    - [clickhousex](https://github.com/appodeal/clickhousex/)
- Nim
diff --git a/docs/fa/interfaces/third-party/integrations.md b/docs/fa/interfaces/third-party/integrations.md
index e7def6bca58..7aed10d3762 100644
--- a/docs/fa/interfaces/third-party/integrations.md
+++ b/docs/fa/interfaces/third-party/integrations.md
@@ -34,7 +34,9 @@
- نظارت بر
    - [Graphite](https://graphiteapp.org)
        - [graphouse](https://github.com/yandex/graphouse)
-        - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse)
+        - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse)
+        - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse)
+        - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - optimizes stale partitions in [\*GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#graphitemergetree) if rules from the [rollup configuration](../../operations/table_engines/graphitemergetree.md#rollup-configuration) can be applied
    - [Grafana](https://grafana.com/)
        - [clickhouse-grafana](https://github.com/Vertamedia/clickhouse-grafana)
    - [Prometheus](https://prometheus.io/)
diff --git a/docs/fa/operations/performance_test.md b/docs/fa/operations/performance_test.md
new file mode 120000
index 00000000000..a74c126c63f
--- /dev/null
+++ b/docs/fa/operations/performance_test.md
@@ -0,0 +1 @@
+../../en/operations/performance_test.md
\ No newline at end of file
diff --git a/docs/ja/operations/performance_test.md b/docs/ja/operations/performance_test.md
new file mode 120000
index 00000000000..a74c126c63f
--- /dev/null
+++ b/docs/ja/operations/performance_test.md
@@ -0,0 +1 @@
+../../en/operations/performance_test.md
\ No newline at end of file
diff --git a/docs/ru/extended_roadmap.md b/docs/ru/extended_roadmap.md
index 5fb0378f896..d5cbcf5f8c8 100644
--- a/docs/ru/extended_roadmap.md
+++ b/docs/ru/extended_roadmap.md
@@ -495,6 +495,8 @@ Fuzzing тестирование - это тестирование случай

Изначально занимался Олег Алексеенков. Сейчас он перешёл работать в дружественный отдел, но обещает продолжать синхронизацию. Затем, возможно, [Иван Лежанкин](https://github.com/abyss7). Но сейчас приостановлено, так как Максим из YT должен исправить регрессию производительности в анализе индекса.

+Максим из YT сказал, что сделает это после нового года.
+
### 7.26. Побайтовая идентичность репозитория с Аркадией.

Команда DevTools. Прогресс по задаче под вопросом.
@@ -669,11 +671,13 @@ ClickHouse предоставляет возможность обратитьс

## 10. Внешние словари.

-### 10.1. Исправление зависания в библиотеке доступа к YT.
+### 10.1. + Исправление зависания в библиотеке доступа к YT.

Библиотека для доступа к YT не переживает учения. Нужно для БК и Метрики. Поиск причин - [Александр Сапин](https://github.com/alesapin). Дальнейшее исправление возможно на стороне YT.

+Цитата: "Оказывается для YT-клиента зависания на несколько минут это нормально. Убрал внутренние ретраи, снизил таймауты. Однозначно станет лучше".
+
### 10.2. Исправление SIGILL в библиотеке доступа к YT.
Код YT использует SIGILL вместо abort. Это, опять же, происходит при учениях. @@ -1409,7 +1413,7 @@ ClickHouse поддерживает LZ4 и ZSTD для сжатия данных ### 24.4. Шифрование в ClickHouse на уровне кусков данных. -Yuchen Dong, ICS. +Yuchen Dong, ICT. Данные в ClickHouse хранятся без шифрования. При наличии доступа к дискам, злоумышленник может прочитать данные. Предлагается реализовать два подхода к шифрованию: @@ -1418,7 +1422,7 @@ Yuchen Dong, ICS. ### 24.5. Поддержка функций шифрования для отдельных значений. -Yuchen Dong, ICS. +Yuchen Dong, ICT. Смотрите также 24.5. @@ -1570,7 +1574,7 @@ https://github.com/yandex/ClickHouse/issues/6874 ### 24.27. Реализация алгоритмов min-hash, sim-hash для нечёткого поиска полудубликатов. -ucasFL, ICS. +ucasFL, ICT. Алгоритмы min-hash и sim-hash позволяют вычислить для текста несколько хэш-значений таких, что при небольшом изменении текста, по крайней мере один из хэшей не меняется. Вычисления можно реализовать на n-грамах и словарных шинглах. Предлагается добавить поддержку этих алгоритмов в виде функций в ClickHouse и изучить их применимость для задачи нечёткого поиска полудубликатов. @@ -1677,7 +1681,7 @@ Amos Bird, но его решение слишком громоздкое и п Требуется проработать вопрос безопасности и изоляции инстансов (поднятие в контейнерах с ограничениями по сети), подключение тестовых датасетов с помощью copy-on-write файловой системы; органичения ресурсов. -### 25.17. Взаимодействие с ВУЗами: ВШЭ, УрФУ, ICS Beijing. +### 25.17. Взаимодействие с ВУЗами: ВШЭ, УрФУ, ICT Beijing. Алексей Миловидов и вся группа разработки diff --git a/docs/ru/getting_started/example_datasets/star_schema.md b/docs/ru/getting_started/example_datasets/star_schema.md index 2e66ced7149..28ab1e0fd2b 100644 --- a/docs/ru/getting_started/example_datasets/star_schema.md +++ b/docs/ru/getting_started/example_datasets/star_schema.md @@ -1,6 +1,6 @@ # Star Schema Benchmark -Compiling dbgen: +Компиляция dbgen: ```bash $ git clone git@github.com:vadimtk/ssb-dbgen.git @@ -8,7 +8,12 @@ $ cd ssb-dbgen $ make ``` -Generating data: +Генерация данных: + +!!! 
warning "Внимание" + -s 100 -- dbgen генерирует 600 миллионов строк (67 ГБ) + -s 1000 -- dbgen генерирует 6 миллиардов строк (занимает много времени) + ```bash $ ./dbgen -s 1000 -T c @@ -18,7 +23,7 @@ $ ./dbgen -s 1000 -T s $ ./dbgen -s 1000 -T d ``` -Creating tables in ClickHouse: +Создание таблиц в Кликхауз: ```sql CREATE TABLE customer @@ -83,7 +88,7 @@ CREATE TABLE supplier ENGINE = MergeTree ORDER BY S_SUPPKEY; ``` -Inserting data: +Вставка данных: ```bash $ clickhouse-client --query "INSERT INTO customer FORMAT CSV" < customer.tbl @@ -92,77 +97,244 @@ $ clickhouse-client --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl $ clickhouse-client --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl ``` -Converting "star schema" to denormalized "flat schema": +Конвертация схемы-звезда в денормализованную плоскую схему: ```sql SET max_memory_usage = 20000000000, allow_experimental_multiple_joins_emulation = 1; - CREATE TABLE lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS -SELECT l.*, c.*, s.*, p.* -FROM lineorder l - ANY INNER JOIN customer c ON (c.C_CUSTKEY = l.LO_CUSTKEY) - ANY INNER JOIN supplier s ON (s.S_SUPPKEY = l.LO_SUPPKEY) - ANY INNER JOIN part p ON (p.P_PARTKEY = l.LO_PARTKEY); +SELECT + l.LO_ORDERKEY AS LO_ORDERKEY, + l.LO_LINENUMBER AS LO_LINENUMBER, + l.LO_CUSTKEY AS LO_CUSTKEY, + l.LO_PARTKEY AS LO_PARTKEY, + l.LO_SUPPKEY AS LO_SUPPKEY, + l.LO_ORDERDATE AS LO_ORDERDATE, + l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY, + l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY, + l.LO_QUANTITY AS LO_QUANTITY, + l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE, + l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE, + l.LO_DISCOUNT AS LO_DISCOUNT, + l.LO_REVENUE AS LO_REVENUE, + l.LO_SUPPLYCOST AS LO_SUPPLYCOST, + l.LO_TAX AS LO_TAX, + l.LO_COMMITDATE AS LO_COMMITDATE, + l.LO_SHIPMODE AS LO_SHIPMODE, + c.C_NAME AS C_NAME, + c.C_ADDRESS AS C_ADDRESS, + c.C_CITY AS C_CITY, + c.C_NATION AS C_NATION, + c.C_REGION AS C_REGION, + c.C_PHONE AS C_PHONE, + c.C_MKTSEGMENT AS C_MKTSEGMENT, + s.S_NAME AS S_NAME, + s.S_ADDRESS AS S_ADDRESS, + s.S_CITY AS S_CITY, + s.S_NATION AS S_NATION, + s.S_REGION AS S_REGION, + s.S_PHONE AS S_PHONE, + p.P_NAME AS P_NAME, + p.P_MFGR AS P_MFGR, + p.P_CATEGORY AS P_CATEGORY, + p.P_BRAND AS P_BRAND, + p.P_COLOR AS P_COLOR, + p.P_TYPE AS P_TYPE, + p.P_SIZE AS P_SIZE, + p.P_CONTAINER AS P_CONTAINER +FROM lineorder AS l +INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY +INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY +INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY; -ALTER TABLE lineorder_flat DROP COLUMN C_CUSTKEY, DROP COLUMN S_SUPPKEY, DROP COLUMN P_PARTKEY; ``` Running the queries: Q1.1 ```sql -SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25; +SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue +FROM lineorder_flat +WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25; ``` Q1.2 ```sql -SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35; +SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue +FROM lineorder_flat +WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35; ``` Q1.3 ```sql -SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND 
toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35; +SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue +FROM lineorder_flat +WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 + AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35; ``` Q2.1 ```sql -SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND; +SELECT + sum(LO_REVENUE), + toYear(LO_ORDERDATE) AS year, + P_BRAND +FROM lineorder_flat +WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' +GROUP BY + year, + P_BRAND +ORDER BY + year, + P_BRAND; ``` Q2.2 ```sql -SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND; +SELECT + sum(LO_REVENUE), + toYear(LO_ORDERDATE) AS year, + P_BRAND +FROM lineorder_flat +WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' +GROUP BY + year, + P_BRAND +ORDER BY + year, + P_BRAND; ``` Q2.3 ```sql -SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year, P_BRAND ORDER BY year, P_BRAND; +SELECT + sum(LO_REVENUE), + toYear(LO_ORDERDATE) AS year, + P_BRAND +FROM lineorder_flat +WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' +GROUP BY + year, + P_BRAND +ORDER BY + year, + P_BRAND; ``` Q3.1 ```sql -SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION, S_NATION, year ORDER BY year asc, revenue desc; +SELECT + C_NATION, + S_NATION, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 +GROUP BY + C_NATION, + S_NATION, + year +ORDER BY + year ASC, + revenue DESC; ``` Q3.2 ```sql -SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year asc, revenue desc; +SELECT + C_CITY, + S_CITY, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 +GROUP BY + C_CITY, + S_CITY, + year +ORDER BY + year ASC, + revenue DESC; ``` Q3.3 ```sql -SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year asc, revenue desc; +SELECT + C_CITY, + S_CITY, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 +GROUP BY + C_CITY, + S_CITY, + year +ORDER BY + year ASC, + revenue DESC; ``` Q3.4 ```sql -SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND 
toYYYYMM(LO_ORDERDATE) = '199712' GROUP BY C_CITY, S_CITY, year ORDER BY year asc, revenue desc; +SELECT + C_CITY, + S_CITY, + toYear(LO_ORDERDATE) AS year, + sum(LO_REVENUE) AS revenue +FROM lineorder_flat +WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 +GROUP BY + C_CITY, + S_CITY, + year +ORDER BY + year ASC, + revenue DESC; ``` Q4.1 ```sql -SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, C_NATION ORDER BY year, C_NATION; +SELECT + toYear(LO_ORDERDATE) AS year, + C_NATION, + sum(LO_REVENUE - LO_SUPPLYCOST) AS profit +FROM lineorder_flat +WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') +GROUP BY + year, + C_NATION +ORDER BY + year ASC, + C_NATION ASC; ``` Q4.2 ```sql -SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, S_NATION, P_CATEGORY ORDER BY year, S_NATION, P_CATEGORY; +SELECT + toYear(LO_ORDERDATE) AS year, + S_NATION, + P_CATEGORY, + sum(LO_REVENUE - LO_SUPPLYCOST) AS profit +FROM lineorder_flat +WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') +GROUP BY + year, + S_NATION, + P_CATEGORY +ORDER BY + year ASC, + S_NATION ASC, + P_CATEGORY ASC; ``` Q4.3 ```sql -SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year, S_CITY, P_BRAND ORDER BY year, S_CITY, P_BRAND; +SELECT + toYear(LO_ORDERDATE) AS year, + S_CITY, + P_BRAND, + sum(LO_REVENUE - LO_SUPPLYCOST) AS profit +FROM lineorder_flat +WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' +GROUP BY + year, + S_CITY, + P_BRAND +ORDER BY + year ASC, + S_CITY ASC, + P_BRAND ASC; ``` [Original article](https://clickhouse.yandex/docs/en/getting_started/example_datasets/star_schema/) diff --git a/docs/ru/interfaces/cli.md b/docs/ru/interfaces/cli.md index a67ae87f6ab..71742d02740 100644 --- a/docs/ru/interfaces/cli.md +++ b/docs/ru/interfaces/cli.md @@ -119,7 +119,7 @@ $ clickhouse-client --param_tuple_in_tuple="(10, ('dt', 10))" -q "SELECT * FROM - `--query, -q` — запрос для выполнения, при использовании в неинтерактивном режиме. - `--database, -d` — выбрать текущую БД, по умолчанию — текущая БД из настроек сервера (по умолчанию — БД default). - `--multiline, -m` — если указано — разрешить многострочные запросы, не отправлять запрос по нажатию Enter. -- `--multiquery, -n` — если указано — разрешить выполнять несколько запросов, разделённых точкой с запятой. Работает только в неинтерактивном режиме. +- `--multiquery, -n` — если указано — разрешить выполнять несколько запросов, разделённых точкой с запятой. - `--format, -f` — использовать указанный формат по умолчанию для вывода результата. - `--vertical, -E` — если указано, использовать формат Vertical по умолчанию для вывода результата. То же самое, что --format=Vertical. В этом формате каждое значение выводится на отдельной строке, что удобно для отображения широких таблиц. 
- `--time, -t` — если указано, в неинтерактивном режиме вывести время выполнения запроса в stderr. diff --git a/docs/ru/interfaces/third-party/client_libraries.md b/docs/ru/interfaces/third-party/client_libraries.md index 13b7b9d243e..1860b074123 100644 --- a/docs/ru/interfaces/third-party/client_libraries.md +++ b/docs/ru/interfaces/third-party/client_libraries.md @@ -39,6 +39,7 @@ - C# - [ClickHouse.Ado](https://github.com/killwort/ClickHouse-Net) - [ClickHouse.Net](https://github.com/ilyabreev/ClickHouse.Net) + - [ClickHouse.Client](https://github.com/DarkWanderer/ClickHouse.Client) - Elixir - [clickhousex](https://github.com/appodeal/clickhousex/) - Nim diff --git a/docs/ru/interfaces/third-party/integrations.md b/docs/ru/interfaces/third-party/integrations.md index 5ab706da67b..470d02bea7d 100644 --- a/docs/ru/interfaces/third-party/integrations.md +++ b/docs/ru/interfaces/third-party/integrations.md @@ -32,7 +32,9 @@ - Мониторинг - [Graphite](https://graphiteapp.org) - [graphouse](https://github.com/yandex/graphouse) - - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + + - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse) + - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - оптимизирует партиции таблиц [\*GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#graphitemergetree) согласно правилам в [конфигурации rollup](../../operations/table_engines/graphitemergetree.md#rollup-configuration) - [Grafana](https://grafana.com/) - [clickhouse-grafana](https://github.com/Vertamedia/clickhouse-grafana) - [Prometheus](https://prometheus.io/) diff --git a/docs/ru/operations/index.md b/docs/ru/operations/index.md index a10f7c377b1..371afaf2af0 100644 --- a/docs/ru/operations/index.md +++ b/docs/ru/operations/index.md @@ -13,6 +13,7 @@ - [Квоты](quotas.md) - [Системные таблицы](system_tables.md) - [Конфигурационные параметры сервера](server_settings/index.md) + - [Тестирование серверов с помощью ClickHouse](performance_test.md) - [Настройки](settings/index.md) - [Утилиты](utils/index.md) diff --git a/docs/ru/operations/performance_test.md b/docs/ru/operations/performance_test.md new file mode 120000 index 00000000000..a74c126c63f --- /dev/null +++ b/docs/ru/operations/performance_test.md @@ -0,0 +1 @@ +../../en/operations/performance_test.md \ No newline at end of file diff --git a/docs/ru/operations/server_settings/settings.md b/docs/ru/operations/server_settings/settings.md index ca1c255bee3..9a0fa781f63 100644 --- a/docs/ru/operations/server_settings/settings.md +++ b/docs/ru/operations/server_settings/settings.md @@ -370,10 +370,8 @@ ClickHouse проверит условия `min_part_size` и `min_part_size_rat Приблизительный размер (в байтах) кэша засечек, используемых движками таблиц семейства [MergeTree](../../operations/table_engines/mergetree.md). -Кэш общий для сервера, память выделяется по мере необходимости. Кэш не может быть меньше, чем 5368709120. +Кэш общий для сервера, память выделяется по мере необходимости. - -!!! warning "Внимание" - Этот параметр может быть превышен при большом значении настройки [mark_cache_min_lifetime](../settings/settings.md#settings-mark_cache_min_lifetime). 
**Пример** diff --git a/docs/ru/operations/settings/settings.md b/docs/ru/operations/settings/settings.md index 2d5a11bec86..71ff9b494a4 100644 --- a/docs/ru/operations/settings/settings.md +++ b/docs/ru/operations/settings/settings.md @@ -566,12 +566,6 @@ ClickHouse использует этот параметр при чтении д Как правило, не имеет смысла менять эту настройку. -## mark_cache_min_lifetime {#settings-mark_cache_min_lifetime} - -Если превышено значение параметра [mark_cache_size](../server_settings/settings.md#server-mark-cache-size), то будут удалены только записи старше чем значение этого параметра. Имеет смысл понижать данный параметр при малом количестве RAM на хост-системах. - -Default value: 10000 seconds. - ## max_query_size {#settings-max_query_size} Максимальный кусок запроса, который будет считан в оперативку для разбора парсером языка SQL. @@ -957,6 +951,39 @@ load_balancing = first_or_random Значение по умолчанию — 0. +## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms} + +Основной интервал отправки данных движком таблиц [Distributed](../table_engines/distributed.md). Фактический интервал растёт экспоненциально при возникновении ошибок. + +Возможные значения: + +- Положительное целое количество миллисекунд. + +Значение по умолчанию: 100 миллисекунд. + + +## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms} + +Максимальный интервал отправки данных движком таблиц [Distributed](../table_engines/distributed.md). Ограничивает экспоненциальный рост интервала, установленного настройкой [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms). + +Возможные значения: + +- Положительное целое количество миллисекунд. + +Значение по умолчанию: 30000 миллисекунд (30 секунд). + +## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts} + +Включает/выключает пакетную отправку вставленных данных. + +Если пакетная отправка включена, то движок таблиц [Distributed](../table_engines/distributed.md) вместо того, чтобы отправлять каждый файл со вставленными данными по отдельности, старается отправить их все за одну операцию. Пакетная отправка улучшает производительность кластера за счет более оптимального использования ресурсов сервера и сети. + +Возможные значения: + +- 1 — включено. +- 0 — выключено. + +Значение по умолчанию: 0. ## os_thread_priority {#setting-os_thread_priority} diff --git a/docs/ru/operations/settings/settings_profiles.md b/docs/ru/operations/settings/settings_profiles.md index a120c388880..8b4e2316fe6 100644 --- a/docs/ru/operations/settings/settings_profiles.md +++ b/docs/ru/operations/settings/settings_profiles.md @@ -60,6 +60,6 @@ SET profile = 'web' В примере задано два профиля: `default` и `web`. Профиль `default` имеет специальное значение - он всегда обязан присутствовать и применяется при запуске сервера. То есть, профиль `default` содержит настройки по умолчанию. Профиль `web` - обычный профиль, который может быть установлен с помощью запроса `SET` или с помощью параметра URL при запросе по HTTP. -Профили настроек могут наследоваться от друг-друга - это реализуется указанием настройки `profile` перед остальными настройками, перечисленными в профиле. +Профили настроек могут наследоваться друг от друга - это реализуется указанием одной или нескольких настроек `profile` перед остальными настройками, перечисленными в профиле. Если одна настройка указана в нескольких профилях, используется последнее из значений. 
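The inheritance rule in the hunk above is easier to see in a concrete configuration. Below is a minimal hypothetical `users.xml` sketch (the profile names `base`, `restricted` and `web` are invented for illustration; `max_threads` and `readonly` are real settings): `web` lists two parent profiles and then its own override, and since values are applied in order, the last one wins.

```xml
<profiles>
    <base>
        <max_threads>8</max_threads>
    </base>
    <restricted>
        <max_threads>4</max_threads>
        <readonly>1</readonly>
    </restricted>
    <web>
        <!-- Parents are applied in order: max_threads becomes 8, then 4. -->
        <profile>base</profile>
        <profile>restricted</profile>
        <!-- A setting listed after the profile references overrides the inherited value: final max_threads is 16. -->
        <max_threads>16</max_threads>
    </web>
</profiles>
```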
[Оригинальная статья](https://clickhouse.yandex/docs/ru/operations/settings/settings_profiles/) diff --git a/docs/ru/operations/table_engines/distributed.md b/docs/ru/operations/table_engines/distributed.md index ceea785d84e..53391fe8125 100644 --- a/docs/ru/operations/table_engines/distributed.md +++ b/docs/ru/operations/table_engines/distributed.md @@ -87,12 +87,10 @@ logs - имя кластера в конфигурационном файле с Есть два способа записывать данные на кластер: -Во первых, вы можете самостоятельно определять, на какие серверы какие данные записывать, и выполнять запись непосредственно на каждый шард. То есть, делать INSERT в те таблицы, на которые "смотрит" распределённая таблица. -Это наиболее гибкое решение - вы можете использовать любую схему шардирования, которая может быть нетривиальной из-за требований предметной области. +Во-первых, вы можете самостоятельно определять, на какие серверы какие данные записывать, и выполнять запись непосредственно на каждый шард. То есть, делать INSERT в те таблицы, на которые "смотрит" распределённая таблица. Это наиболее гибкое решение, поскольку вы можете использовать любую схему шардирования, которая может быть нетривиальной из-за требований предметной области. Также это является наиболее оптимальным решением, так как данные могут записываться на разные шарды полностью независимо. -Во вторых, вы можете делать INSERT в Distributed таблицу. В этом случае, таблица будет сама распределять вставляемые данные по серверам. -Для того, чтобы писать в Distributed таблицу, у неё должен быть задан ключ шардирования (последний параметр). Также, если шард всего-лишь один, то запись работает и без указания ключа шардирования (так как в этом случае он не имеет смысла). +Во-вторых, вы можете делать INSERT в Distributed таблицу. В этом случае, таблица будет сама распределять вставляемые данные по серверам. Для того, чтобы писать в Distributed таблицу, у неё должен быть задан ключ шардирования (последний параметр). Также, если шард всего лишь один, то запись работает и без указания ключа шардирования (так как в этом случае он не имеет смысла). У каждого шарда в конфигурационном файле может быть задан "вес" (weight). По умолчанию, вес равен единице. Данные будут распределяться по шардам в количестве, пропорциональном весу шарда. Например, если есть два шарда, и у первого выставлен вес 9, а у второго 10, то на первый будет отправляться 9 / 19 доля строк, а на второй - 10 / 19. @@ -114,7 +112,7 @@ logs - имя кластера в конфигурационном файле с - используются запросы, требующие соединение данных (IN, JOIN) по определённому ключу - тогда если данные шардированы по этому ключу, то можно использовать локальные IN, JOIN вместо GLOBAL IN, GLOBAL JOIN, что кардинально более эффективно. - используется большое количество серверов (сотни и больше) и большое количество маленьких запросов (запросы отдельных клиентов - сайтов, рекламодателей, партнёров) - тогда, для того, чтобы маленькие запросы не затрагивали весь кластер, имеет смысл располагать данные одного клиента на одном шарде, или (вариант, который используется в Яндекс.Метрике) сделать двухуровневое шардирование: разбить весь кластер на "слои", где слой может состоять из нескольких шардов; данные для одного клиента располагаются на одном слое, но в один слой можно по мере необходимости добавлять шарды, в рамках которых данные распределены произвольным образом; создаются распределённые таблицы на каждый слой и одна общая распределённая таблица для глобальных запросов. -Запись данных осуществляется полностью асинхронно. 
При INSERT-е в Distributed таблицу, блок данных всего лишь записывается в локальную файловую систему. Данные отправляются на удалённые серверы в фоне, при первой возможности. Вы должны проверять, успешно ли отправляются данные, проверяя список файлов (данные, ожидающие отправки) в директории таблицы: /var/lib/clickhouse/data/database/table/. +Запись данных осуществляется полностью асинхронно. При вставке в таблицу, блок данных сначала записывается в файловую систему. Затем данные отправляются в фоновом режиме на удалённые серверы при первой возможности. Период отправки регулируется настройками [distributed_directory_monitor_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_sleep_time_ms) и [distributed_directory_monitor_max_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_max_sleep_time_ms). Движок таблиц `Distributed` отправляет каждый файл со вставленными данными отдельно, но можно включить пакетную отправку данных настройкой [distributed_directory_monitor_batch_inserts](../settings/settings.md#distributed_directory_monitor_batch_inserts). Эта настройка улучшает производительность кластера за счет более оптимального использования ресурсов сервера-отправителя и сети. Необходимо проверять, что данные отправлены успешно, для этого проверьте список файлов (данные, ожидающие отправки) в каталоге таблицы `/var/lib/clickhouse/data/database/table/`. Если после INSERT-а в Distributed таблицу, сервер перестал существовать или был грубо перезапущен (например, вследствие аппаратного сбоя), то записанные данные могут быть потеряны. Если в директории таблицы обнаружен повреждённый кусок данных, то он переносится в поддиректорию broken и больше не используется. diff --git a/docs/ru/operations/table_engines/graphitemergetree.md b/docs/ru/operations/table_engines/graphitemergetree.md index 40948512a2c..cbb4cc746df 100644 --- a/docs/ru/operations/table_engines/graphitemergetree.md +++ b/docs/ru/operations/table_engines/graphitemergetree.md @@ -1,4 +1,4 @@ -# GraphiteMergeTree +# GraphiteMergeTree {#graphitemergetree} Движок предназначен для прореживания и агрегирования/усреднения (rollup) данных [Graphite](http://graphite.readthedocs.io/en/latest/index.html). Он может быть интересен разработчикам, которые хотят использовать ClickHouse как хранилище данных для Graphite. @@ -6,7 +6,7 @@ Движок наследует свойства от [MergeTree](mergetree.md). -## Создание таблицы +## Создание таблицы {#creating-table} ```sql CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] @@ -70,7 +70,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] -## Конфигурация rollup +## Конфигурация rollup {#rollup-configuration} Настройки прореживания данных задаются параметром [graphite_rollup](../server_settings/settings.md#server_settings-graphite_rollup) в конфигурации сервера. Имя параметра может быть любым. Можно создать несколько конфигураций и использовать их для разных таблиц. @@ -81,14 +81,14 @@ required-columns patterns ``` -### Требуемые столбцы (required-columns) +### Требуемые столбцы (required-columns) {#required-columns} - `path_column_name` — столбец, в котором хранится название метрики (сенсор Graphite). Значение по умолчанию: `Path`. - `time_column_name` — столбец, в котором хранится время измерения метрики. Значение по умолчанию: `Time`. - `value_column_name` — столбец со значением метрики в момент времени, установленный в `time_column_name`. Значение по умолчанию: `Value`. - `version_column_name` — столбец, в котором хранится версия метрики. 
Значение по умолчанию: `Timestamp`. -### Правила (patterns) +### Правила (patterns) {#patterns} Структура раздела `patterns`: @@ -129,7 +129,7 @@ default - `precision` – точность определения возраста данных в секундах. Должен быть делителем для 86400 (количество секунд в сутках). - `function` – имя агрегирующей функции, которую следует применить к данным, чей возраст оказался в интервале `[age, age + precision]`. -### Пример конфигурации +### Пример конфигурации {#configuration-example} ```xml diff --git a/docs/ru/operations/table_engines/mergetree.md b/docs/ru/operations/table_engines/mergetree.md index f3eba70f0e2..058e46eed99 100644 --- a/docs/ru/operations/table_engines/mergetree.md +++ b/docs/ru/operations/table_engines/mergetree.md @@ -491,21 +491,25 @@ ALTER TABLE example_table Структура конфигурации:

```xml
-<disks>
-    <fast_ssd>
-        <path>/mnt/fast_ssd/clickhouse</path>
-    </fast_ssd>
-    <disk1>
-        <path>/mnt/hdd1/clickhouse</path>
-        <keep_free_space_bytes>10485760</keep_free_space_bytes>
-    </disk1>
-    <disk2>
-        <path>/mnt/hdd2/clickhouse</path>
-        <keep_free_space_bytes>10485760</keep_free_space_bytes>
-    </disk2>
-</disks>
+<storage_configuration>
+    <disks>
+        <fast_ssd>
+            <path>/mnt/fast_ssd/clickhouse</path>
+        </fast_ssd>
+        <disk1>
+            <path>/mnt/hdd1/clickhouse</path>
+            <keep_free_space_bytes>10485760</keep_free_space_bytes>
+        </disk1>
+        <disk2>
+            <path>/mnt/hdd2/clickhouse</path>
+            <keep_free_space_bytes>10485760</keep_free_space_bytes>
+        </disk2>
+
+        ...
+    </disks>
+
+    ...
+</storage_configuration>
```

Теги: @@ -519,26 +523,30 @@ ALTER TABLE example_table Общий вид конфигурации политик хранения:

```xml
-<policies>
-    <policy_name_1>
-        <volumes>
-            <volume_name_1>
-                <disk>disk_name_from_disks_configuration</disk>
-                <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
-            </volume_name_1>
-            <volume_name_2>
-                ...
-            </volume_name_2>
-        </volumes>
-        <move_factor>0.2</move_factor>
-    </policy_name_1>
-    <policy_name_2>
-        ...
-    </policy_name_2>
-</policies>
+<storage_configuration>
+    ...
+    <policies>
+        <policy_name_1>
+            <volumes>
+                <volume_name_1>
+                    <disk>disk_name_from_disks_configuration</disk>
+                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
+                </volume_name_1>
+                <volume_name_2>
+                    ...
+                </volume_name_2>
+            </volumes>
+            <move_factor>0.2</move_factor>
+        </policy_name_1>
+        <policy_name_2>
+            ...
+        </policy_name_2>
+    </policies>
+    ...
+</storage_configuration>
```

Тэги: @@ -552,29 +560,33 @@ ALTER TABLE example_table Примеры конфигураций:

```xml
-<policies>
-    <hdd_in_order>
-        <volumes>
-            <single>
-                <disk>disk1</disk>
-                <disk>disk2</disk>
-            </single>
-        </volumes>
-    </hdd_in_order>
-
-    <moving_from_ssd_to_hdd>
-        <volumes>
-            <hot>
-                <disk>fast_ssd</disk>
-                <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
-            </hot>
-            <cold>
-                <disk>disk1</disk>
-            </cold>
-        </volumes>
-        <move_factor>0.2</move_factor>
-    </moving_from_ssd_to_hdd>
-</policies>
+<storage_configuration>
+    ...
+    <policies>
+        <hdd_in_order>
+            <volumes>
+                <single>
+                    <disk>disk1</disk>
+                    <disk>disk2</disk>
+                </single>
+            </volumes>
+        </hdd_in_order>
+
+        <moving_from_ssd_to_hdd>
+            <volumes>
+                <hot>
+                    <disk>fast_ssd</disk>
+                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
+                </hot>
+                <cold>
+                    <disk>disk1</disk>
+                </cold>
+            </volumes>
+            <move_factor>0.2</move_factor>
+        </moving_from_ssd_to_hdd>
+    </policies>
+    ...
+</storage_configuration>
```

В приведенном примере, политика `hdd_in_order` реализует принцип [round-robin](https://ru.wikipedia.org/wiki/Round-robin_(%D0%B0%D0%BB%D0%B3%D0%BE%D1%80%D0%B8%D1%82%D0%BC)). Так как в политике есть всего один том (`single`), то все записи производятся на его диски по круговому циклу. Такая политика может быть полезна при наличии в системе нескольких похожих дисков, но при этом не сконфигурирован RAID. Учтите, что каждый отдельный диск ненадёжен и чтобы не потерять важные данные это необходимо скомпенсировать за счет хранения данных в трёх копиях. diff --git a/docs/ru/query_language/functions/other_functions.md b/docs/ru/query_language/functions/other_functions.md index f784748ee29..bf3ee2a8420 100644 --- a/docs/ru/query_language/functions/other_functions.md +++ b/docs/ru/query_language/functions/other_functions.md @@ -6,7 +6,7 @@ ## FQDN {#fqdn} -Возвращает полное имя домена. +Возвращает полное имя домена. **Синтаксис** @@ -377,7 +377,7 @@ neighbor(column, offset[, default_value]) **Возвращаемое значение** -- Значение `column` в смещении от текущей строки, если значение `offset` не выходит за пределы блока. +- Значение `column` в смещении от текущей строки, если значение `offset` не выходит за пределы блока. - Значение по умолчанию для `column`, если значение `offset` выходит за пределы блока данных. Если передан параметр `default_value`, то значение берется из него. Тип: зависит от данных в `column` или переданного значения по умолчанию в `default_value`. @@ -850,10 +850,12 @@ SELECT filesystemAvailable() AS "Free space", toTypeName(filesystemAvailable()) Поддержаны только таблицы, созданные запросом с `ENGINE = Join(ANY, LEFT, <join_keys>)`. -## modelEvaluate(model_name, ...) +## modelEvaluate(model_name, ...) {#function-modelevaluate} + +Оценивает внешнюю модель. + +Принимает на вход имя и аргументы модели. Возвращает Float64. -Вычислить модель. -Принимает имя модели и аргументы модели. 
Возвращает Float64. ## throwIf(x\[, custom_message\]) @@ -883,10 +885,39 @@ SELECT identity(42) ``` Используется для отладки и тестирования, позволяет "сломать" доступ по индексу, и получить результат и производительность запроса для полного сканирования. +## randomPrintableASCII {#randomascii} + +Генерирует строку со случайным набором печатных символов [ASCII](https://en.wikipedia.org/wiki/ASCII#Printable_characters). + +**Синтаксис** + +```sql +randomPrintableASCII(length) +``` + +**Параметры** + +- `length` — Длина результирующей строки. Положительное целое число. + + Если передать `length < 0`, то поведение функции не определено. + +**Возвращаемое значение** + + - Строка со случайным набором печатных символов [ASCII](https://en.wikipedia.org/wiki/ASCII#Printable_characters). + +Тип: [String](../../data_types/string.md) + +**Пример** + +```sql +SELECT number, randomPrintableASCII(30) as str, length(str) FROM system.numbers LIMIT 3 +```
+```text
+┌─number─┬─str────────────────────────────┬─length(randomPrintableASCII(30))─┐
+│      0 │ SuiCOSTvC0csfABSw=UcSzp2.`rv8x │                               30 │
+│      1 │ 1Ag NlJ &RCN:*>HVPG;PE-nO"SUFD │                               30 │
+│      2 │ /"+<"wUTh:=LjJ Vm!c&hI*m#XTfzz │                               30 │
+└────────┴────────────────────────────────┴──────────────────────────────────┘
+```
+ [Оригинальная статья](https://clickhouse.yandex/docs/ru/query_language/functions/other_functions/) - -## modelEvaluate(model_name, ...) {#function-modelevaluate} - -Оценивает внешнюю модель. - -Принимает на вход имя и аргументы модели. Возвращает Float64. diff --git a/docs/ru/query_language/functions/string_functions.md b/docs/ru/query_language/functions/string_functions.md index 2169cb794e0..56886e83a3d 100644 --- a/docs/ru/query_language/functions/string_functions.md +++ b/docs/ru/query_language/functions/string_functions.md @@ -189,6 +189,44 @@ SELECT startsWith('Hello, world!', 'He'); └───────────────────────────────────┘ ``` +## trim {#trim} + +Удаляет все указанные символы с начала или окончания строки. +По умолчанию удаляет все последовательные вхождения обычных пробелов (32 символ ASCII) с обоих концов строки. + +**Синтаксис** + +```sql +trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string) +``` + +**Параметры** + +- `trim_character` — один или несколько символов, подлежащие удалению. [String](../../data_types/string.md). +- `input_string` — строка для обрезки. [String](../../data_types/string.md). + +**Возвращаемое значение** + +Исходную строку после обрезки с левого и (или) правого концов строки. + +Тип: `String`. + +**Пример** + +Запрос: + +```sql +SELECT trim(BOTH ' ()' FROM '(   Hello, world!   )')
+```
+
+Ответ:
+
+```text
+┌─trim(BOTH ' ()' FROM '(   Hello, world!   )')─┐
+│ Hello, world!                                 │
+└───────────────────────────────────────────────┘
+```
+ ## trimLeft {#trimleft} Удаляет все последовательные вхождения обычных пробелов (32 символ ASCII) с левого конца строки. Не удаляет другие виды пробелов (табуляция, пробел без разрыва и т. д.). **Синтаксис** ```sql -trimLeft() +trimLeft(input_string) ``` -Алиас: `ltrim`. +Алиас: `ltrim(input_string)`. **Параметры** -- `string` — строка для обрезки. [String](../../data_types/string.md). +- `input_string` — строка для обрезки. [String](../../data_types/string.md). **Возвращаемое значение** @@ -234,14 +272,14 @@ SELECT trimLeft('     Hello, world!     ') **Синтаксис** ```sql -trimRight() +trimRight(input_string) ``` -Алиас: `rtrim`. +Алиас: `rtrim(input_string)`. **Параметры** -- `string` — строка для обрезки. 
[String](../../data_types/string.md). +- `input_string` — строка для обрезки. [String](../../data_types/string.md). **Возвращаемое значение** @@ -272,14 +310,14 @@ SELECT trimRight('     Hello, world!     ') **Синтаксис** ```sql -trimBoth() +trimBoth(input_string) ``` -Алиас: `trim`. +Алиас: `trim(input_string)`. **Параметры** -- `string` — строка для обрезки. [String](../../data_types/string.md). +- `input_string` — строка для обрезки. [String](../../data_types/string.md). **Возвращаемое значение** diff --git a/docs/toc_en.yml b/docs/toc_en.yml index a11c40e4907..604aca5d18e 100644 --- a/docs/toc_en.yml +++ b/docs/toc_en.yml @@ -197,6 +197,7 @@ nav: - 'Configuration Files': 'operations/configuration_files.md' - 'Quotas': 'operations/quotas.md' - 'System Tables': 'operations/system_tables.md' + - 'Testing Hardware': 'operations/performance_test.md' - 'Server Configuration Parameters': - 'Introduction': 'operations/server_settings/index.md' - 'Server Settings': 'operations/server_settings/settings.md' diff --git a/docs/toc_fa.yml b/docs/toc_fa.yml index 710a2ee20f8..b13c8092889 100644 --- a/docs/toc_fa.yml +++ b/docs/toc_fa.yml @@ -193,6 +193,7 @@ nav: - 'Configuration Files': 'operations/configuration_files.md' - 'Quotas': 'operations/quotas.md' - 'System Tables': 'operations/system_tables.md' + - 'Testing Hardware': 'operations/performance_test.md' - 'Server Configuration Parameters': - 'Introduction': 'operations/server_settings/index.md' - 'Server Settings': 'operations/server_settings/settings.md' diff --git a/docs/toc_ja.yml b/docs/toc_ja.yml index 945042f0fef..31b384f97b5 100644 --- a/docs/toc_ja.yml +++ b/docs/toc_ja.yml @@ -197,6 +197,7 @@ nav: - 'Configuration Files': 'operations/configuration_files.md' - 'Quotas': 'operations/quotas.md' - 'System Tables': 'operations/system_tables.md' + - 'Testing Hardware': 'operations/performance_test.md' - 'Server Configuration Parameters': - 'Introduction': 'operations/server_settings/index.md' - 'Server Settings': 'operations/server_settings/settings.md' diff --git a/docs/toc_ru.yml b/docs/toc_ru.yml index 469590b6bc8..dc6a0d9227c 100644 --- a/docs/toc_ru.yml +++ b/docs/toc_ru.yml @@ -196,6 +196,7 @@ nav: - 'Конфигурационные файлы': 'operations/configuration_files.md' - 'Квоты': 'operations/quotas.md' - 'Системные таблицы': 'operations/system_tables.md' + - 'Тестирование оборудования': 'operations/performance_test.md' - 'Конфигурационные параметры сервера': - 'Введение': 'operations/server_settings/index.md' - 'Серверные настройки': 'operations/server_settings/settings.md' diff --git a/docs/toc_zh.yml b/docs/toc_zh.yml index 09f9875069b..c07c3bb2711 100644 --- a/docs/toc_zh.yml +++ b/docs/toc_zh.yml @@ -192,6 +192,7 @@ nav: - '配置文件': 'operations/configuration_files.md' - '配额': 'operations/quotas.md' - '系统表': 'operations/system_tables.md' + - 'Testing Hardware': 'operations/performance_test.md' - 'Server参数配置': - '介绍': 'operations/server_settings/index.md' - 'Server参数说明': 'operations/server_settings/settings.md' diff --git a/docs/zh/interfaces/third-party/client_libraries.md b/docs/zh/interfaces/third-party/client_libraries.md index a8625c0d4ac..3a814f05237 100644 --- a/docs/zh/interfaces/third-party/client_libraries.md +++ b/docs/zh/interfaces/third-party/client_libraries.md @@ -39,6 +39,7 @@ - C# - [ClickHouse.Ado](https://github.com/killwort/ClickHouse-Net) - [ClickHouse.Net](https://github.com/ilyabreev/ClickHouse.Net) + - [ClickHouse.Client](https://github.com/DarkWanderer/ClickHouse.Client) - Elixir - 
[clickhousex](https://github.com/appodeal/clickhousex/) - Nim diff --git a/docs/zh/interfaces/third-party/integrations.md b/docs/zh/interfaces/third-party/integrations.md index 3e9fcfcd410..253939827df 100644 --- a/docs/zh/interfaces/third-party/integrations.md +++ b/docs/zh/interfaces/third-party/integrations.md @@ -31,7 +31,9 @@ - 监控 - [Graphite](https://graphiteapp.org) - [graphouse](https://github.com/yandex/graphouse) - - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + + - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse) + - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - optimizes stale partitions in [\*GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#graphitemergetree) if rules from [rollup configuration](../../operations/table_engines/graphitemergetree.md#rollup-configuration) could be applied - [Grafana](https://grafana.com/) - [clickhouse-grafana](https://github.com/Vertamedia/clickhouse-grafana) - [Prometheus](https://prometheus.io/) diff --git a/docs/zh/operations/performance_test.md b/docs/zh/operations/performance_test.md new file mode 120000 index 00000000000..a74c126c63f --- /dev/null +++ b/docs/zh/operations/performance_test.md @@ -0,0 +1 @@ +../../en/operations/performance_test.md \ No newline at end of file diff --git a/docs/zh/operations/settings/settings_profiles.md b/docs/zh/operations/settings/settings_profiles.md deleted file mode 100644 index c335e249212..00000000000 --- a/docs/zh/operations/settings/settings_profiles.md +++ /dev/null @@ -1,66 +0,0 @@ - -# Settings profiles - -A settings profile is a collection of settings grouped under the same name. Each ClickHouse user has a profile. -To apply all the settings in a profile, set the `profile` setting. - -Example: - -Install the `web` profile. - -``` sql -SET profile = 'web' -``` - -Settings profiles are declared in the user config file. This is usually `users.xml`. - -Example:
-
-```xml
-<!-- Settings profiles -->
-<profiles>
-    <!-- Default settings -->
-    <default>
-        <!-- The maximum number of threads when running a single query. -->
-        <max_threads>8</max_threads>
-    </default>
-
-    <!-- Settings for queries from the user interface -->
-    <web>
-        <max_rows_to_read>1000000000</max_rows_to_read>
-        <max_bytes_to_read>100000000000</max_bytes_to_read>
-
-        <max_rows_to_group_by>1000000</max_rows_to_group_by>
-        <group_by_overflow_mode>any</group_by_overflow_mode>
-
-        <max_rows_to_sort>1000000</max_rows_to_sort>
-        <max_bytes_to_sort>1000000000</max_bytes_to_sort>
-
-        <max_result_rows>100000</max_result_rows>
-        <max_result_bytes>100000000</max_result_bytes>
-        <result_overflow_mode>break</result_overflow_mode>
-
-        <max_execution_time>600</max_execution_time>
-        <min_execution_speed>1000000</min_execution_speed>
-        <timeout_before_checking_execution_speed>15</timeout_before_checking_execution_speed>
-
-        <max_columns_to_read>25</max_columns_to_read>
-        <max_temporary_columns>100</max_temporary_columns>
-        <max_temporary_non_const_columns>50</max_temporary_non_const_columns>
-
-        <max_subquery_depth>2</max_subquery_depth>
-        <max_pipeline_depth>25</max_pipeline_depth>
-        <max_ast_depth>50</max_ast_depth>
-        <max_ast_elements>100</max_ast_elements>
-
-        <readonly>1</readonly>
-    </web>
-</profiles>
-```
-
-The example specifies two profiles: `default` and `web`. The `default` profile has a special purpose: it must always be present and is applied when starting the server. In other words, the `default` profile contains default settings. The `web` profile is a regular profile that can be set using the `SET` query or using a URL parameter in an HTTP query. -Settings profiles can inherit from each other. To use inheritance, indicate the `profile` setting before the other settings that are listed in the profile. 
- - -[Original article](https://clickhouse.yandex/docs/en/operations/settings/settings_profiles/) diff --git a/docs/zh/operations/settings/settings_profiles.md b/docs/zh/operations/settings/settings_profiles.md new file mode 120000 index 00000000000..35d9747ad56 --- /dev/null +++ b/docs/zh/operations/settings/settings_profiles.md @@ -0,0 +1 @@ +../../../en/operations/settings/settings_profiles.md \ No newline at end of file diff --git a/docs/zh/roadmap.md b/docs/zh/roadmap.md index 3be2aa01533..1f23d5f0ab4 100644 --- a/docs/zh/roadmap.md +++ b/docs/zh/roadmap.md @@ -1,14 +1,7 @@ # 规划 -## Q3 2019 +## Q1 2020 -- 字典表的DDL -- 与类S3对象存储集成 -- 冷热数据存储分离，支持JBOD - -## Q4 2019 - -- JOIN 不受可用内存限制 - 更精确的用户资源池，可以在用户之间合理分配集群资源 - 细粒度的授权管理 - 与外部认证服务集成 diff --git a/libs/libcommon/include/ext/scope_guard.h b/libs/libcommon/include/ext/scope_guard.h index c2c7e5ec630..c12d17d0398 100644 --- a/libs/libcommon/include/ext/scope_guard.h +++ b/libs/libcommon/include/ext/scope_guard.h @@ -1,22 +1,56 @@
 #pragma once

 #include <utility>
+#include <functional>
+
 namespace ext
 {
-
-template <class F> class scope_guard {
-    const F function;
-
+template <class F>
+class [[nodiscard]] basic_scope_guard
+{
 public:
-    constexpr scope_guard(const F & function_) : function{function_} {}
-    constexpr scope_guard(F && function_) : function{std::move(function_)} {}
-    ~scope_guard() { function(); }
+    constexpr basic_scope_guard() = default;
+    constexpr basic_scope_guard(basic_scope_guard && src) : function{std::exchange(src.function, F{})} {}
+
+    constexpr basic_scope_guard & operator=(basic_scope_guard && src)
+    {
+        if (this != &src)
+        {
+            invoke();
+            function = std::exchange(src.function, F{});
+        }
+        return *this;
+    }
+
+    template <typename G, typename = std::enable_if_t<std::is_convertible_v<G, F>, void>>
+    constexpr basic_scope_guard(const G & function_) : function{function_} {}
+
+    template <typename G, typename = std::enable_if_t<std::is_convertible_v<G, F>, void>>
+    constexpr basic_scope_guard(G && function_) : function{std::move(function_)} {}
+
+    ~basic_scope_guard() { invoke(); }
+
+private:
+    void invoke()
+    {
+        if constexpr (std::is_constructible_v<bool, F>)
+        {
+            if (!function)
+                return;
+        }
+
+        function();
+    }
+
+    F function = F{};
 };

-template <class F>
-inline scope_guard<F> make_scope_guard(F && function_) { return std::forward<F>(function_); }
+using scope_guard = basic_scope_guard<std::function<void(void)>>;
+
+template <class F>
+inline basic_scope_guard<F> make_scope_guard(F && function_) { return std::forward<F>(function_); }
 }

 #define SCOPE_EXIT_CONCAT(n, ...) \
diff --git a/libs/libcommon/src/LineReader.cpp b/libs/libcommon/src/LineReader.cpp index 569c1579d2e..6df0be0b32f 100644 --- a/libs/libcommon/src/LineReader.cpp +++ b/libs/libcommon/src/LineReader.cpp @@ -2,6 +2,8 @@
 #include <iostream>
+#include <sys/select.h>
+
 namespace
 {
@@ -11,6 +13,17 @@ void trim(String & s)
 {
     s.erase(std::find_if(s.rbegin(), s.rend(), [](int ch) { return !std::isspace(ch); }).base(), s.end());
 }

+/// Check if multi-line query is inserted from the paste buffer.
+/// Allows delaying the start of query execution until the entirety of query is inserted.
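+/// The check below relies on select() with a zero timeout, so it never blocks:
+/// it only reports whether stdin already has buffered, unread bytes.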
+bool hasInputData() +{ + timeval timeout = {0, 0}; + fd_set fds; + FD_ZERO(&fds); + FD_SET(STDIN_FILENO, &fds); + return select(1, &fds, nullptr, nullptr, &timeout) == 1; +} + } LineReader::Suggest::WordsRange LineReader::Suggest::getCompletions(const String & prefix, size_t prefix_length) const @@ -73,7 +86,7 @@ String LineReader::readLine(const String & first_prompt, const String & second_p if (input.empty()) continue; - is_multiline = (input.back() == extender) || (delimiter && input.back() != delimiter); + is_multiline = (input.back() == extender) || (delimiter && input.back() != delimiter) || hasInputData(); if (input.back() == extender) { diff --git a/libs/libdaemon/src/BaseDaemon.cpp b/libs/libdaemon/src/BaseDaemon.cpp index 70cc7157344..e93e2ab2f8b 100644 --- a/libs/libdaemon/src/BaseDaemon.cpp +++ b/libs/libdaemon/src/BaseDaemon.cpp @@ -209,7 +209,7 @@ public: /// This allows to receive more signals if failure happens inside onFault function. /// Example: segfault while symbolizing stack trace. - std::thread([=] { onFault(sig, info, context, stack_trace, thread_num, query_id); }).detach(); + std::thread([=, this] { onFault(sig, info, context, stack_trace, thread_num, query_id); }).detach(); } } } @@ -686,7 +686,7 @@ void BaseDaemon::initialize(Application & self) } // sensitive data masking rules are not used here - buildLoggers(config(), logger()); + buildLoggers(config(), logger(), self.commandName()); if (is_daemon) { diff --git a/libs/libmysqlxx/cmake/find_mysqlclient.cmake b/libs/libmysqlxx/cmake/find_mysqlclient.cmake index 30cd0a586b3..80e7806ef47 100644 --- a/libs/libmysqlxx/cmake/find_mysqlclient.cmake +++ b/libs/libmysqlxx/cmake/find_mysqlclient.cmake @@ -1,17 +1,17 @@ -option(ENABLE_MYSQL "Enable MySQL" ${ENABLE_LIBRARIES}) +if(OS_LINUX) + option(ENABLE_MYSQL "Enable MySQL" ${ENABLE_LIBRARIES}) +else () + option(ENABLE_MYSQL "Enable MySQL" FALSE) +endif () + if(ENABLE_MYSQL) - if(OS_LINUX) - option(USE_INTERNAL_MYSQL_LIBRARY "Set to FALSE to use system mysqlclient library instead of bundled" ${NOT_UNBUNDLED}) - else() - option(USE_INTERNAL_MYSQL_LIBRARY "Set to FALSE to use system mysqlclient library instead of bundled" OFF) - endif() + option(USE_INTERNAL_MYSQL_LIBRARY "Set to FALSE to use system mysqlclient library instead of bundled" ${NOT_UNBUNDLED}) if(USE_INTERNAL_MYSQL_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/mariadb-connector-c/README") message(WARNING "submodule contrib/mariadb-connector-c is missing. 
to fix try run: \n git submodule update --init --recursive") set(USE_INTERNAL_MYSQL_LIBRARY 0) endif() - if (USE_INTERNAL_MYSQL_LIBRARY) set (MYSQLCLIENT_LIBRARIES mariadbclient) set (USE_MYSQL 1) diff --git a/utils/check-marks/main.cpp b/utils/check-marks/main.cpp index 0e075f6be4e..a0a204a0185 100644 --- a/utils/check-marks/main.cpp +++ b/utils/check-marks/main.cpp @@ -76,7 +76,7 @@ void checkCompressedHeaders(const std::string & mrk_path, const std::string & bi void checkByCompressedReadBuffer(const std::string & mrk_path, const std::string & bin_path) { DB::ReadBufferFromFile mrk_in(mrk_path); - DB::CompressedReadBufferFromFile bin_in(bin_path, 0, 0); + DB::CompressedReadBufferFromFile bin_in(bin_path, 0, 0, 0); DB::WriteBufferFromFileDescriptor out(STDOUT_FILENO); bool mrk2_format = boost::algorithm::ends_with(mrk_path, ".mrk2"); @@ -133,7 +133,7 @@ int main(int argc, char ** argv) std::cerr << e.what() << ", " << e.message() << std::endl << std::endl << "Stack trace:" << std::endl - << e.getStackTrace().toString() + << e.getStackTraceString() << std::endl; throw; } diff --git a/utils/fill-factor/main.cpp b/utils/fill-factor/main.cpp index fa7ec6252ce..b492be1be85 100644 --- a/utils/fill-factor/main.cpp +++ b/utils/fill-factor/main.cpp @@ -53,7 +53,7 @@ int main(int argc, char ** argv) std::cerr << e.what() << ", " << e.message() << std::endl << std::endl << "Stack trace:" << std::endl - << e.getStackTrace().toString() + << e.getStackTraceString() << std::endl; throw; } diff --git a/utils/make_changelog.py b/utils/make_changelog.py index a47706767e3..90d12844c33 100755 --- a/utils/make_changelog.py +++ b/utils/make_changelog.py @@ -110,9 +110,29 @@ def get_commits_from_branch(repo, branch, base_sha, commits_info, max_pages, tok return commits +# Get list of commits a specified commit is cherry-picked from. Can return an empty list. +def parse_original_commits_from_cherry_pick_message(commit_message): + prefix = '(cherry picked from commits' + pos = commit_message.find(prefix) + if pos == -1: + prefix = '(cherry picked from commit' + pos = commit_message.find(prefix) + if pos == -1: + return [] + pos += len(prefix) + endpos = commit_message.find(')', pos) + if endpos == -1: + return [] + lst = [x.strip() for x in commit_message[pos:endpos].split(',')] + lst = [x for x in lst if x] + return lst + + # Use GitHub search api to check if commit from any pull request. Update pull_requests info. -def find_pull_request_for_commit(commit_sha, pull_requests, token, max_retries, retry_timeout): - resp = github_api_get_json('search/issues?q={}+type:pr+repo:{}&sort=created&order=asc'.format(commit_sha, repo), token, max_retries, retry_timeout) +def find_pull_request_for_commit(commit_info, pull_requests, token, max_retries, retry_timeout): + commits = [commit_info['sha']] + parse_original_commits_from_cherry_pick_message(commit_info['commit']['message']) + query = 'search/issues?q={}+type:pr+repo:{}&sort=created&order=asc'.format(' '.join(commits), repo) + resp = github_api_get_json(query, token, max_retries, retry_timeout) found = False for item in resp['items']: @@ -130,14 +150,14 @@ def find_pull_request_for_commit(commit_sha, pull_requests, token, max_retries, # Find pull requests from list of commits. If no pull request found, add commit to not_found_commits list. 
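+# Note: the whole commit_info object (not just the sha) is needed here so that
+# cherry-picked commits can be traced back to their original commits via
+# parse_original_commits_from_cherry_pick_message.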
-def find_pull_requests(commits, token, max_retries, retry_timeout): +def find_pull_requests(commits, commits_info, token, max_retries, retry_timeout): not_found_commits = [] pull_requests = {} for i, commit in enumerate(commits): if (i + 1) % 10 == 0: logging.info('Processed %d commits', i + 1) - if not find_pull_request_for_commit(commit, pull_requests, token, max_retries, retry_timeout): + if not find_pull_request_for_commit(commits_info[commit], pull_requests, token, max_retries, retry_timeout): not_found_commits.append(commit) return not_found_commits, pull_requests @@ -187,7 +207,7 @@ def get_users_info(pull_requests, commits_info, token, max_retries, retry_timeou # List of unknown commits -> text description. def process_unknown_commits(commits, commits_info, users): - pattern = 'Commit: [{}]({})\nAuthor: {}\nMessage: {}' + pattern = u'Commit: [{}]({})\nAuthor: {}\nMessage: {}' texts = [] @@ -410,7 +430,7 @@ def make_changelog(new_tag, prev_tag, pull_requests_nums, repo, repo_folder, sta if not is_pull_requests_loaded: logging.info('Searching for pull requests using github api.') - unknown_commits, pull_requests = find_pull_requests(commits, token, max_retries, retry_timeout) + unknown_commits, pull_requests = find_pull_requests(commits, commits_info, token, max_retries, retry_timeout) state['unknown_commits'] = unknown_commits state['pull_requests'] = pull_requests else: diff --git a/utils/package/arch/PKGBUILD.in b/utils/package/arch/PKGBUILD.in index b3482b04907..20de555f8a7 100644 --- a/utils/package/arch/PKGBUILD.in +++ b/utils/package/arch/PKGBUILD.in @@ -7,6 +7,12 @@ url='https://clickhouse.yandex/' license=('Apache') package() { + install -dm 755 $pkgdir/usr/lib/tmpfiles.d + install -dm 755 $pkgdir/usr/lib/sysusers.d + install -Dm 644 ${CMAKE_CURRENT_SOURCE_DIR}/clickhouse.tmpfiles $pkgdir/usr/lib/tmpfiles.d/clickhouse.conf + install -Dm 644 ${CMAKE_CURRENT_SOURCE_DIR}/clickhouse.sysusers $pkgdir/usr/lib/sysusers.d/clickhouse.conf + install -dm 755 $pkgdir/etc/clickhouse-server/config.d + install -Dm 644 ${CMAKE_CURRENT_SOURCE_DIR}/logging.xml $pkgdir/etc/clickhouse-server/config.d/logging.xml # This code was requisited from kmeaw@ https://aur.archlinux.org/packages/clickhouse/ . 
SRC=${ClickHouse_SOURCE_DIR} BIN=${ClickHouse_BINARY_DIR} diff --git a/utils/package/arch/README.md b/utils/package/arch/README.md index 10bdae7367a..0db5aac8080 100644 --- a/utils/package/arch/README.md +++ b/utils/package/arch/README.md @@ -1,9 +1,17 @@ -### Build Arch linux package +### Build Arch Linux package From binary directory: ``` make -cd arch +cd utils/package/arch makepkg ``` + +### Install and start ClickHouse server + +``` +pacman -U clickhouse-*.pkg.tar.xz +systemctl enable clickhouse-server +systemctl start clickhouse-server +``` diff --git a/utils/package/arch/clickhouse.sysusers b/utils/package/arch/clickhouse.sysusers new file mode 100644 index 00000000000..4381c52c4f2 --- /dev/null +++ b/utils/package/arch/clickhouse.sysusers @@ -0,0 +1,3 @@ +u clickhouse - "ClickHouse user" /nonexistent /bin/false +g clickhouse - "ClickHouse group" +m clickhouse clickhouse diff --git a/utils/package/arch/clickhouse.tmpfiles b/utils/package/arch/clickhouse.tmpfiles new file mode 100644 index 00000000000..631aa895f2f --- /dev/null +++ b/utils/package/arch/clickhouse.tmpfiles @@ -0,0 +1 @@ +d /var/lib/clickhouse 0700 clickhouse clickhouse diff --git a/utils/package/arch/logging.xml b/utils/package/arch/logging.xml new file mode 100644 index 00000000000..0f9c51dff80 --- /dev/null +++ b/utils/package/arch/logging.xml @@ -0,0 +1,6 @@
+<yandex>
+    <logger>
+        <log remove="remove"/>
+        <errorlog remove="remove"/>
+    </logger>
+</yandex>
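The `basic_scope_guard` rework in `ext/scope_guard.h` above is easiest to judge in use. The following is a self-contained sketch, not ClickHouse code: the `guard` class here is a simplified stand-in that mirrors the two behaviours the patch introduces, namely that a moved-from guard is emptied via `std::exchange` and safely skipped, and that `[[nodiscard]]` discourages creating a guard that dies immediately.

```cpp
#include <cstdio>
#include <functional>
#include <type_traits>
#include <utility>

// Simplified stand-in for ext::basic_scope_guard (illustration only).
template <class F>
class [[nodiscard]] guard
{
public:
    guard(F f) : function(std::move(f)) {}
    guard(guard && src) : function(std::exchange(src.function, F{})) {}
    ~guard()
    {
        // Skip empty callables, e.g. a moved-from std::function.
        if constexpr (std::is_constructible_v<bool, F>)
            if (!function)
                return;
        function();
    }

private:
    F function;
};

int main()
{
    {
        guard g{std::function<void()>{[] { std::puts("cleanup runs once"); }}};
        guard moved = std::move(g); // 'g' is emptied; only 'moved' fires at scope exit.
    }
    // guard{std::function<void()>{[] {}}}; // a discarded [[nodiscard]] temporary would be flagged
    return 0;
}
```

Run as written, this prints "cleanup runs once" exactly once, which is the property the empty-callable check in `invoke()` protects.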